Re: Join a meeting to help coordinate implementing a Dask Runner for Beam
Looks/Sounds great!

On Tue, Jun 21, 2022 at 11:06 AM Alex Merose wrote:

> We had a great meeting last week on this topic! Here is a proposal /
> meeting notes doc:
>
> https://docs.google.com/document/d/1Awj_eNmH-WRSte3bKcCcUlQDiZ5mMKmCO_xV-mHWAak/edit#heading=h.y0pwg4polebc
>
> Tomorrow, another engineer (https://github.com/cisaacstern) and I are
> meeting to create an initial prototype of the Dask runner in the main Beam
> repo. Let us know if you'd like to help out in any way. We'll post updates
> in this mailing list + the above doc.
>
> Best,
> Alex Merose
>
> On 2022/06/08 14:22:41 Ryan Abernathey wrote:
> > Dear Beamer,
> >
> > Thank you for all of your work on this amazing project. I am new to Beam
> > and am quite excited about its potential to help with some data processing
> > challenges in my field of climate science.
> >
> > Our community is interested in running Beam on Dask Distributed clusters,
> > which we already know how to deploy. This has been discussed at
> > https://issues.apache.org/jira/browse/BEAM-5336 and
> > https://github.com/apache/beam/issues/18962. It seems technically feasible.
> >
> > We are trying to organize a meeting next week to kickstart and coordinate
> > this effort. It would be great if we could entrain some Beam maintainers
> > into this meeting. If you have interest in this topic and are available
> > next week, please share your availability here:
> > https://www.when2meet.com/?15861604-jLnA4
> >
> > Alternatively, if you have any guidance or suggestions you wish to provide
> > by email or GitHub discussion, we welcome your input.
> >
> > Thanks again for your open source work.
> >
> > Best,
> > Ryan Abernathey
Flaky test issue report (56)
This is your daily summary of Beam's current flaky tests. These are P1 issues because they have a major negative impact on the community and make it hard to determine the quality of the software.

https://api.github.com/repos/apache/beam/issues/21714: PulsarIOTest.testReadFromSimpleTopic is very flaky
https://api.github.com/repos/apache/beam/issues/21709: beam_PostCommit_Java_ValidatesRunner_Samza Failing
https://api.github.com/repos/apache/beam/issues/21708: beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://api.github.com/repos/apache/beam/issues/21707: GroupByKeyTest BasicTests testLargeKeys100MB flake (on ULR)
https://api.github.com/repos/apache/beam/issues/21706: Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://api.github.com/repos/apache/beam/issues/21704: beam_PostCommit_Java_DataflowV2 failures parent bug
https://api.github.com/repos/apache/beam/issues/21701: beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
https://api.github.com/repos/apache/beam/issues/21698: Docker Snapshots failing to be published since April 14th
https://api.github.com/repos/apache/beam/issues/21696: Flink Tests failure: java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
https://api.github.com/repos/apache/beam/issues/21643: FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://api.github.com/repos/apache/beam/issues/21629: Multiple XVR Suites having similar flakes simultaneously
https://api.github.com/repos/apache/beam/issues/21587: beam_PreCommit_PythonDocs failing (jinja2)
https://api.github.com/repos/apache/beam/issues/21540: Jenkins worker sometimes crashes while running Python Flink pipeline
https://api.github.com/repos/apache/beam/issues/21480: flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://api.github.com/repos/apache/beam/issues/21474: Flaky tests: Gradle build daemon disappeared unexpectedly
https://api.github.com/repos/apache/beam/issues/21472: Dataflow streaming tests failing new AfterSynchronizedProcessingTime test
https://api.github.com/repos/apache/beam/issues/21471: Flakes: Failed to load cache entry
https://api.github.com/repos/apache/beam/issues/21470: Test flake: test_split_half_sdf
https://api.github.com/repos/apache/beam/issues/21469: beam_PostCommit_XVR_Flink flaky: Connection refused
https://api.github.com/repos/apache/beam/issues/21468: beam_PostCommit_Python_Examples_Dataflow failing
https://api.github.com/repos/apache/beam/issues/21467: GBK and CoGBK streaming Java load tests failing
https://api.github.com/repos/apache/beam/issues/21464: GroupIntoBatchesTest is failing
https://api.github.com/repos/apache/beam/issues/21463: NPE in Flink Portable ValidatesRunner streaming suite
https://api.github.com/repos/apache/beam/issues/21462: Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://api.github.com/repos/apache/beam/issues/21333: Flink testParDoRequiresStableInput flaky
https://api.github.com/repos/apache/beam/issues/21271: pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2
https://api.github.com/repos/apache/beam/issues/21270: org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2
https://api.github.com/repos/apache/beam/issues/21266: org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite.
https://api.github.com/repos/apache/beam/issues/21264: beam_PostCommit_Python36 - CrossLanguageSpannerIOTest - flakey failing
https://api.github.com/repos/apache/beam/issues/21261: org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky
https://api.github.com/repos/apache/beam/issues/21242: org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle is flaky in Java Spark ValidatesRunner suite
https://api.github.com/repos/apache/beam/issues/21121: apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://api.github.com/repos/apache/beam/issues/21120: beam_PostRelease_NightlySnapshot failed
https://api.github.com/repos/apache/beam/issues/21118: PortableRunnerTestWithExternalEnv.test_pardo_timers flaky
https://api.github.com/repos/apache/beam/issues/21116: Python PreCommit flaking in PipelineOptionsTest.test_display_data
https://api.github.com/repos/apache/beam/issues/21114: Already Exists: Dataset apache-beam-testing:python_bq_file_loads_NNN
https://api.github.com/repos/apache/beam/issues/21113: testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://api.github.com/re
P1 issues report (71)
This is your daily summary of Beam's current P1 issues, not including flaky tests. See https://beam.apache.org/contribute/issue-priorities/#p1-critical for the meaning and expectations around P1 issues.

https://api.github.com/repos/apache/beam/issues/21946: [Bug]: No way to read or write to file when running Beam in Flink
https://api.github.com/repos/apache/beam/issues/21941: [Bug]: No output timestamp incorrectly handled in Dataflow runner
https://api.github.com/repos/apache/beam/issues/21935: [Bug]: Reject illformed GBK Coders
https://api.github.com/repos/apache/beam/issues/21897: [Feature Request]: Flink runner savepoint backward compatibility
https://api.github.com/repos/apache/beam/issues/21893: [Bug]: BigQuery Storage Write API implementation does not support table partitioning
https://api.github.com/repos/apache/beam/issues/21794: Dataflow runner creates a new timer whenever the output timestamp is change
https://api.github.com/repos/apache/beam/issues/21763: [Playground Task]: Migrate from Google Analytics to Matomo Cloud
https://api.github.com/repos/apache/beam/issues/21715: Data missing when using CassandraIO.Read
https://api.github.com/repos/apache/beam/issues/21713: 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://api.github.com/repos/apache/beam/issues/21711: Python Streaming job failing to drain with BigQueryIO write errors
https://api.github.com/repos/apache/beam/issues/21703: pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
https://api.github.com/repos/apache/beam/issues/21702: SpannerWriteIT failing in beam PostCommit Java V1
https://api.github.com/repos/apache/beam/issues/21700: --dataflowServiceOptions=use_runner_v2 is broken
https://api.github.com/repos/apache/beam/issues/21699: Changing the output timestamp of a timer does not clear the previously set timer
https://api.github.com/repos/apache/beam/issues/21695: DataflowPipelineResult does not raise exception for unsuccessful states.
https://api.github.com/repos/apache/beam/issues/21694: BigQuery Storage API insert with writeResult retry and write to error table
https://api.github.com/repos/apache/beam/issues/21479: Install Python wheel and dependencies to local venv in SDK harness
https://api.github.com/repos/apache/beam/issues/21478: KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions
https://api.github.com/repos/apache/beam/issues/21477: Add integration testing for BQ Storage API write modes
https://api.github.com/repos/apache/beam/issues/21476: WriteToBigQuery Dynamic table destinations returns wrong tableId
https://api.github.com/repos/apache/beam/issues/21475: Beam x-lang Dataflow tests failing due to _InactiveRpcError
https://api.github.com/repos/apache/beam/issues/21473: PVR_Spark2_Streaming perma-red
https://api.github.com/repos/apache/beam/issues/21466: Simplify version override for Dev versions of the Go SDK.
https://api.github.com/repos/apache/beam/issues/21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
https://api.github.com/repos/apache/beam/issues/21269: Delete orphaned files
https://api.github.com/repos/apache/beam/issues/21268: Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory
https://api.github.com/repos/apache/beam/issues/21267: WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi
https://api.github.com/repos/apache/beam/issues/21265: apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://api.github.com/repos/apache/beam/issues/21263: (Broken Pipe induced) Bricked Dataflow Pipeline
https://api.github.com/repos/apache/beam/issues/21262: Python AfterAny, AfterAll do not follow spec
https://api.github.com/repos/apache/beam/issues/21260: Python DirectRunner does not emit data at GC time
https://api.github.com/repos/apache/beam/issues/21259: Consumer group with random prefix
https://api.github.com/repos/apache/beam/issues/21258: Dataflow error in CombinePerKey operation
https://api.github.com/repos/apache/beam/issues/21257: Either Create or DirectRunner fails to produce all elements to the following transform
https://api.github.com/repos/apache/beam/issues/21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.
https://api.github.com/repos/apache/beam/issues/21119: Migrate to the next version of Python `requests` when released
https://api.github.com/repos/apache/beam/issues/21117: "Java IO IT Tests" - missing data in grafana
https://api.github.com/repos/apache/beam/issues/21115: JdbcIO date conversion is sensitive to OS
https://api.github.com/repos/apache/beam/issues/21112: Dataflow SocketException (SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery
https:/
P0 issues report (2)
This is your daily summary of Beam's current P0 issues, not including flaky tests. See https://beam.apache.org/contribute/issue-priorities/#p0-outage for the meaning and expectations around P0 issues.

https://api.github.com/repos/apache/beam/issues/21948: [Bug]: KinesisIO javadoc is no longer up-to-date
https://api.github.com/repos/apache/beam/issues/21824: [Bug]: Disable PR comment trigger
Re: Write data to Jdbc from BigQueryIO | Issue
Hello Team,

Is anyone aware of this problem?

Thanks,
Ravi

On Sun, Jun 19, 2022 at 10:59 PM Ravi Kapoor wrote:

> Hi Team,
>
> I am trying to write a PCollection read from BQ with the schema
>
>     final Schema schema =
>         Schema.builder().addInt64Field("user_id").addStringField("user_name").build();
>
> to a JDBC datasource (Oracle) whose table schema is:
>
>     Table_A (user_id NUMBER(3), user_name VARCHAR(10))
>
> The code flow invokes
>
>     PCollection expand(PCollection input)
>
> which internally calls
>
>     List fields = spec.getFilteredFields(input.getSchema());
>
> This converts the ResultSetMetaData to a Beam schema, and for the numeric
> column the jdbcTypeToBeamFieldConverter method sets the FieldType to
> LogicalTypes.FixedPrecisionNumeric.of, whose base type is DECIMAL.
>
>     SchemaUtil.compareSchemaField(tableField, f)
>
> then compares the field types via
>
>     static boolean compareSchemaFieldType(Schema.FieldType a, Schema.FieldType b) {
>
> and the following line returns false:
>
>     return a.getLogicalType().getBaseType().getTypeName().equals(b.getTypeName());
>
> because the base type is DECIMAL for the user_id field coming from the
> Oracle datasource. The write eventually fails with
>
>     "Provided schema doesn't match with database schema. " + " Table has fields: "
>
> Can you please check whether this is a bug in matching the BQ data type to
> Oracle? Or do I need to handle this differently when writing to a JDBC sink?
>
> --
> Thanks,
> Ravi Kapoor
> +91-9818764564
> kapoorrav...@gmail.com

--
Thanks,
Ravi Kapoor
+91-9818764564
kapoorrav...@gmail.com
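For illustration, here is a minimal, self-contained Java sketch of the type comparison described in the report above. It is not the JdbcIO code itself: the class name SchemaMismatchSketch is a placeholder, and the assumption that Oracle NUMBER(3) is mapped to a logical type with base type DECIMAL is taken from the message, not verified independently.

    import org.apache.beam.sdk.schemas.Schema;

    public class SchemaMismatchSketch {
      public static void main(String[] args) {
        // Field type of user_id in the PCollection read from BigQuery.
        Schema.FieldType inputType = Schema.FieldType.INT64;

        // Per the report, JdbcIO maps Oracle NUMBER(3) to a fixed-precision
        // numeric logical type whose base type is DECIMAL, so the type name
        // used in the comparison is DECIMAL rather than INT64.
        Schema.TypeName jdbcBaseTypeName = Schema.TypeName.DECIMAL;

        // Mirrors the check quoted from SchemaUtil.compareSchemaFieldType:
        // DECIMAL.equals(INT64) is false, which triggers
        // "Provided schema doesn't match with database schema".
        boolean matches = jdbcBaseTypeName.equals(inputType.getTypeName());
        System.out.println("Types match: " + matches); // prints "Types match: false"
      }
    }

If this reading is right, the failure comes from the schema comparison rather than the data itself; whether declaring user_id as a decimal field on the Beam side works around it would still need to be verified.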