Re: Beam Website Feedback
Thank you for the feedback, Peter. /cc @Reza Rokni @David Cavazos for visibility.

On Fri, Sep 9, 2022 at 2:51 PM Austin Bennett wrote:
> Any chance you've seen a GH Issue for that? If not, please feel free to
> file one :-)
>
> https://github.com/apache/beam/issues
>
> On Fri, Sep 9, 2022 at 12:09 PM Peter McArthur wrote:
>
>> I wish there were a python example for "Slowly updating global window
>> side inputs" on this page
>> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
>>
>> thank you
Re: Beam Website Feedback
Any chance you've seen a GH Issue for that? If not, please feel free to file one :-)

https://github.com/apache/beam/issues

On Fri, Sep 9, 2022 at 12:09 PM Peter McArthur wrote:
> I wish there were a python example for "Slowly updating global window
> side inputs" on this page
> https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs
>
> thank you
Beam Website Feedback
I wish there were a python example for "Slowly updating global window side inputs" on this page
https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs

thank you
Re: Incomplete Beam Schema -> Avro Schema conversion
It is still better to have an asymmetric conversion that supports more data types than not having these implemented at all, right? This contribution seems simple enough, but that's definitely not true for the other direction (... and I'm also biased, I only need Beam->Avro).

On Tue, Aug 23, 2022 at 1:53 AM, Brian Hulette via dev wrote:
> I don't think there's a reason for this, it's just that these logical
> types were defined after the Avro <-> Beam schema conversion. I think it
> would be worthwhile to add support for them, but we'd also need to look at
> the reverse (Avro to Beam) direction, which would map back to the catch-all
> DATETIME primitive type [1]. Changing that could break backwards
> compatibility.
>
> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776
>
> On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh wrote:
>
>> java.lang.RuntimeException: Unhandled logical type beam:logical_type:date:v1
>>     at org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
>>     at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
>>     at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
>>     at org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java
>>
>> In https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
>> it seems to me there are some missing options.
>>
>> For example
>> - FixedBytes.IDENTIFIER,
>> - EnumerationType.IDENTIFIER,
>> - OneOfType.IDENTIFIER
>> are there, but:
>> - org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER ("beam:logical_type:date:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER ("beam:logical_type:datetime:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER ("beam:logical_type:time:v1")
>> are missing.
>>
>> This is an example that fails:
>>
>>> import java.time.LocalDate;
>>> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
>>> import org.apache.beam.sdk.schemas.Schema;
>>> import org.apache.beam.sdk.schemas.Schema.FieldType;
>>> import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
>>> import org.apache.beam.sdk.schemas.utils.AvroUtils;
>>> import org.apache.beam.sdk.values.Row;
>>>
>>> // ...
>>>
>>> final Schema schema =
>>>     Schema.builder()
>>>         .addField("ymd", FieldType.logicalType(SqlTypes.DATE))
>>>         .build();
>>>
>>> final Row row =
>>>     Row.withSchema(schema)
>>>         .withFieldValue("ymd", LocalDate.now())
>>>         .build();
>>>
>>> System.out.println(BigQueryUtils.toTableSchema(schema)); // works
>>> System.out.println(BigQueryUtils.toTableRow(row)); // works
>>>
>>> System.out.println(AvroUtils.toAvroSchema(schema)); // fails
>>> System.out.println(AvroUtils.toGenericRecord(row)); // fails
>>
>> Am I missing a reason for that or is it just not done properly yet? If
>> this is the case, am I right to assume that they should be represented in
>> the Avro format the same way as the already existing cases?
>> "beam:logical_type:date:v1" vs "DATE"
>> "beam:logical_type:time:v1" vs "TIME"
Beam High Priority Issue Report (69)
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/23106 [Feature Request][Go SDK]: Support Slowly Changing Side Input Pattern
https://github.com/apache/beam/issues/23020 [Bug]: Kubernetes service not cleaned up and reaching quota causing multiple performance test failure
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is failing
https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update causes Invisible parameter type error
https://github.com/apache/beam/issues/22743 [Bug]: Test flake: org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow SideInput LoadTests failing
https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at getConnection() in WriteFn
https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner test items cost exactly 5 seconds to run
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 --dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure: java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21468 beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/21270 org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked Dataflow Pipeline
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21261 org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClien