Re: Forward StackOverflow questions with the apache-beam tag to a new mailing list
Hi folks, Thanks for the comments here. I created https://issues.apache.org/jira/browse/INFRA-23616 to request Apache INFRA to create a new email list. I'll update this thread when INFRA comments on the feasibility of this. Thanks, Cham On Wed, Aug 17, 2022 at 6:35 PM Ahmet Altay wrote: > +1 to this idea. > > I think both options are reasonable. At the same time I also agree with > checking with infra first on the feasibility of creating another mailing > list. > > On Wed, Aug 17, 2022 at 10:00 AM Chamikara Jayalath > wrote: > >> >> >> On Wed, Aug 17, 2022 at 9:18 AM Robert Bradshaw >> wrote: >> >>> On Wed, Aug 17, 2022 at 5:15 AM Alexey Romanenko >>> wrote: >>> > >>> > Good point about unanswered SO questions. +1 that we need to improve a >>> situation there. >>> > >>> > Yes, we may try to stream them to a new dedicated list but it will >>> require people here to subscribe to and check regularly one more list which >>> perhaps won’t be so efficient as well. >>> >>> I view the value of a new list as making it easy to choose to >>> subscribe or not, and being able to respond to questions as they come >>> in. You're right it's less useful for people who manually check lists >>> rather than get them in their inbox. >>> >>> > I believe that a digest of the N latest unanswered questions to dev@ >>> (or to user@? or both?) every 3-4 days should be a better option. >>> >>> We could do this too. Ideally a list of all recent questions (answered >>> or not, answers can be improved and voted on) if that's technically >>> feasible. >>> >>> IIRC, the current rate is around 10-20 questions a week, so not huge, >>> but still non-trivial. >>> >> >> Yeah, ideally we can do both. >> >> (1) Individual questions at a high high frequency to >> stackoverf...@beam.apache.org so that folks who are interested in >> answering such questions (and subscribed to the list) get notified early. >> > > As an anecdotal data point: Google as a company has an internal Q&A site > and creating a similar mailing list had a noticeable impact on the number > of questions answered. It was a good way to include the broader community > as well. That said, I do not know if this would similarly apply to stack > overflow or not. > > >> >> (2) Aggregated digests at a low frequency to the dev@ and user@ so that >> we get more eyeballs into questions that go unanswered. Also this will be a >> good way to get more upvotes for the correct answers. >> >> Thanks, >> Cham >> >> >>> >>> > On 17 Aug 2022, at 04:42, Chamikara Jayalath via dev < >>> dev@beam.apache.org> wrote: >>> > >>> > Hi folks, >>> > >>> > It seems like many of the questions posted to StackOverflow with the >>> apache-beam tag [1] go unanswered or take more than they should to receive >>> an acceptable answer. >>> > >>> > What do you all think about creating a new mailing list, >>> stackoverf...@beam.apache.org (assuming Apache Infra is OK with this >>> and it's technically feasible to do so), where notifications regarding such >>> questions will be forwarded to ? >>> > >>> > This should allow folks who are interested in answering related >>> questions to get notified early. Hopefully getting more eyeballs on these >>> questions will increase the response rate (and the quality of the answers). >>> > >>> > Another option might be to post such notifications (or aggregations) >>> to dev@ but this might unnecessarily spam all members of the dev list. >>> > >>> > Thanks, >>> > Cham >>> > >>> > [1] https://stackoverflow.com/questions/tagged/apache-beam >>> > >>> > >>> >>
Re: Incomplete Beam Schema -> Avro Schema conversion
I don't think there's a reason for this, it's just that these logical types were defined after the Avro <-> Beam schema conversion. I think it would be worthwhile to add support for them, but we'd also need to look at the reverse (avro to beam) direction, which would map back to the catch-all DATETIME primitive type [1]. Changing that could break backwards compatibility. [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776 On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh wrote: > java.lang.RuntimeException: Unhandled logical type > beam:logical_type:date:v1 > at > org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943) > at > org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306) > at > org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341) > at > org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java > > In > https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944 > it seems to me there are some missing options. > > For example > - FixedBytes.IDENTIFIER, > - EnumerationType.IDENTIFIER, > - OneOfType.IDENTIFIER > is there, but: > - org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER > ("beam:logical_type:date:v1") > - org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER > ("beam:logical_type:datetime:v1") > - org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER > ("beam:logical_type:time:v1") > is missing. > > This in an example that fails: > >> import java.time.LocalDate; >> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils; >> import org.apache.beam.sdk.schemas.Schema; >> import org.apache.beam.sdk.schemas.Schema.FieldType; >> import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes; >> import org.apache.beam.sdk.schemas.utils.AvroUtils; >> import org.apache.beam.sdk.values.Row; > > // ... > > final Schema schema = >> Schema.builder() >> .addField("ymd", >> FieldType.logicalType(SqlTypes.DATE)) >> .build(); >> >> final Row row = >> Row.withSchema(schema) >> .withFieldValue("ymd", LocalDate.now()) >> .build(); >> >> System.out.println(BigQueryUtils.toTableSchema(schema)); // works >> System.out.println(BigQueryUtils.toTableRow(row)); // works >> >> System.out.println(AvroUtils.toAvroSchema(schema)); // fails >> System.out.println(AvroUtils.toGenericRecord(row)); // fails > > > Am I missing a reason for that or is it just not done properly yet? If > this is the case, am I right to assume that they should be represented in > the Avro format as the already existing cases? > "beam:logical_type:date:v1" vs "DATE" > "beam:logical_type:time:v1" vs "TIME" > > >
Re: Beam Website Feedback
Thank you Tianzi. Adding relevant folks: @John Casey On Mon, Aug 22, 2022 at 3:03 PM Tianzi Cai via dev wrote: > According to this page, > https://beam.apache.org/documentation/io/connectors/, KinesisIO doesn't > have Python implementation, but I found > https://beam.apache.org/releases/pydoc/current/apache_beam.io.kinesis.html#apache_beam.io.kinesis.ReadDataFromKinesis > . > > The page is probably outdated. >
Beam Website Feedback
According to this page, https://beam.apache.org/documentation/io/connectors/, KinesisIO doesn't have Python implementation, but I found https://beam.apache.org/releases/pydoc/current/apache_beam.io.kinesis.html#apache_beam.io.kinesis.ReadDataFromKinesis . The page is probably outdated.
Beam High Priority Issue Report (73)
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/22808 [Bug]: Dataflow job stuckness related to big query storage api https://github.com/apache/beam/issues/22779 [Bug]: SpannerIO.readChangeStream() stops forwarding change records and starts continuously throwing (large number) of Operation ongoing errors https://github.com/apache/beam/issues/22773 [Bug]: ElasticsearchIO.Write fails when calling outputWithTimestamp() https://github.com/apache/beam/issues/22749 [Bug]: Bytebuddy version update causes Invisible parameter type error https://github.com/apache/beam/issues/22743 [Bug]: Test flake: org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImplTest.testInsertWithinRowCountLimits https://github.com/apache/beam/issues/22642 [Bug]: Dataflow fails to drain a job when using BigQuery (java sdk v.2.38) https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow SideInput LoadTests failing https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and fix known and discovered issues https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at getConnection() in WriteFn https://github.com/apache/beam/issues/22283 [Bug]: Python Lots of fn runner test items cost exactly 5 seconds to run https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer whenever the output timestamp is change https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 failures parent bug https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2 https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam PostCommit Java V1 https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors https://github.com/apache/beam/issues/21700 --dataflowServiceOptions=use_runner_v2 is broken https://github.com/apache/beam/issues/21696 Flink Tests failure : java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states. https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with writeResult retry and write to error table https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing new AfterSynchronizedProcessingTime test https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused https://github.com/apache/beam/issues/21468 beam_PostCommit_Python_Examples_Dataflow failing https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load tests failing https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle https://github.com/apache/beam/issues/21463 NPE in Flink Portable ValidatesRunner streaming suite https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2 https://github.com/apache/beam/issues/21270 org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2 https://github.com/apache/beam/issues/21268 Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi https://github.com/apache/beam/issues/21266 org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite. https://github.com/apache/beam/issues/21265 apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked Dataflow Pipeline https://github.com/apache/beam/issues/21262 Python AfterAny, A