Re: Join a meeting to help coordinate implementing a Dask Runner for Beam

2022-06-21 Thread Austin Bennett
Looks/Sounds great!

On Tue, Jun 21, 2022 at 11:06 AM Alex Merose  wrote:

> We had a great meeting last week on this topic! Here is a proposal /
> meeting notes doc:
>
> https://docs.google.com/document/d/1Awj_eNmH-WRSte3bKcCcUlQDiZ5mMKmCO_xV-mHWAak/edit#heading=h.y0pwg4polebc
>
> Tomorrow, another engineer (https://github.com/cisaacstern) and I are
> meeting to create an initial prototype of the Dask runner in the main Beam
> repo. Let us know if you'd like to help out in any way. We'll post updates
> in this mailing list + the above doc.
>
> Best,
> Alex Merose
>
> On 2022/06/08 14:22:41 Ryan Abernathey wrote:
> > Dear Beamer,
> >
> > Thank you for all of your work on this amazing project. I am new to Beam
> > and am quite excited about its potential to help with some data
> processing
> > challenges in my field of climate science.
> >
> > Our community is interested in running Beam on Dask Distributed clusters,
> > which we already know how to deploy. This has been discussed at
> > https://issues.apache.org/jira/browse/BEAM-5336 and
> > https://github.com/apache/beam/issues/18962. It seems technically
> feasible.
> >
> > We are trying to organize a meeting next week to kickstart and coordinate
> > this effort. It would be great if we could entrain some Beam maintainers
> > into this meeting. If you have interest in this topic and are available
> > next week, please share your availability here -
> > https://www.when2meet.com/?15861604-jLnA4
> >
> > Alternatively, if you have any guidance or suggestions you wish to
> provide
> > by email or GitHub discussion, we welcome your input.
> >
> > Thanks again for your open source work.
> >
> > Best,
> > Ryan Abernathey
> >
>


Flaky test issue report (56)

2022-06-21 Thread beamactions
This is your daily summary of Beam's current flaky tests.

These are P1 issues because they have a major negative impact on the 
community and make it hard to determine the quality of the software.



https://api.github.com/repos/apache/beam/issues/21714: 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://api.github.com/repos/apache/beam/issues/21709: 
beam_PostCommit_Java_ValidatesRunner_Samza Failing
https://api.github.com/repos/apache/beam/issues/21708: 
beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing 
consistently
https://api.github.com/repos/apache/beam/issues/21707: GroupByKeyTest 
BasicTests testLargeKeys100MB flake (on ULR)
https://api.github.com/repos/apache/beam/issues/21706: Flaky timeout in github 
Python unit test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://api.github.com/repos/apache/beam/issues/21704: 
beam_PostCommit_Java_DataflowV2 failures parent bug
https://api.github.com/repos/apache/beam/issues/21701: 
beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
https://api.github.com/repos/apache/beam/issues/21698: Docker Snapshots failing 
to be published since April 14th
https://api.github.com/repos/apache/beam/issues/21696: Flink Tests failure :  
java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.beam.runners.core.construction.SerializablePipelineOptions 
https://api.github.com/repos/apache/beam/issues/21643: FnRunnerTest with 
non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://api.github.com/repos/apache/beam/issues/21629: Multiple XVR Suites 
having similar flakes simultaneously
https://api.github.com/repos/apache/beam/issues/21587: 
beam_PreCommit_PythonDocs failing (jinja2)
https://api.github.com/repos/apache/beam/issues/21540: Jenkins worker sometimes 
crashes while running Python Flink pipeline
https://api.github.com/repos/apache/beam/issues/21480: flake: 
FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://api.github.com/repos/apache/beam/issues/21474: Flaky tests: Gradle 
build daemon disappeared unexpectedly
https://api.github.com/repos/apache/beam/issues/21472: Dataflow streaming tests 
failing new AfterSynchronizedProcessingTime test
https://api.github.com/repos/apache/beam/issues/21471: Flakes: Failed to load 
cache entry
https://api.github.com/repos/apache/beam/issues/21470: Test flake: 
test_split_half_sdf
https://api.github.com/repos/apache/beam/issues/21469: 
beam_PostCommit_XVR_Flink flaky: Connection refused
https://api.github.com/repos/apache/beam/issues/21468: 
beam_PostCommit_Python_Examples_Dataflow failing
https://api.github.com/repos/apache/beam/issues/21467: GBK and CoGBK streaming 
Java load tests failing
https://api.github.com/repos/apache/beam/issues/21464: GroupIntoBatchesTest is 
failing
https://api.github.com/repos/apache/beam/issues/21463: NPE in Flink Portable 
ValidatesRunner streaming suite
https://api.github.com/repos/apache/beam/issues/21462: Flake in 
org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://api.github.com/repos/apache/beam/issues/21333: Flink 
testParDoRequiresStableInput flaky
https://api.github.com/repos/apache/beam/issues/21271: pubsublite.ReadWriteIT 
flaky in beam_PostCommit_Java_DataflowV2  
https://api.github.com/repos/apache/beam/issues/21270: 
org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView
 flaky on Dataflow Runner V2
https://api.github.com/repos/apache/beam/issues/21266: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
 is flaky in Java ValidatesRunner Flink suite.
https://api.github.com/repos/apache/beam/issues/21264: beam_PostCommit_Python36 
- CrossLanguageSpannerIOTest - flakey failing
https://api.github.com/repos/apache/beam/issues/21261: 
org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
 is flaky
https://api.github.com/repos/apache/beam/issues/21242: 
org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
 is flaky in Java Spark ValidatesRunner suite 
https://api.github.com/repos/apache/beam/issues/21121: 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://api.github.com/repos/apache/beam/issues/21120: 
beam_PostRelease_NightlySnapshot failed
https://api.github.com/repos/apache/beam/issues/21118: 
PortableRunnerTestWithExternalEnv.test_pardo_timers flaky
https://api.github.com/repos/apache/beam/issues/21116: Python PreCommit flaking 
in PipelineOptionsTest.test_display_data
https://api.github.com/repos/apache/beam/issues/21114: Already Exists: Dataset 
apache-beam-testing:python_bq_file_loads_NNN
https://api.github.com/repos/apache/beam/issues/21113: 
testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky
https://api.github.com/re

P1 issues report (71)

2022-06-21 Thread beamactions
This is your daily summary of Beam's current P1 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/issue-priorities/#p1-critical for 
the meaning and expectations around P1 issues.



https://api.github.com/repos/apache/beam/issues/21946: [Bug]: No way to read or 
write to file when running Beam in Flink
https://api.github.com/repos/apache/beam/issues/21941: [Bug]: No output 
timestamp incorrectly handled in Dataflow runner
https://api.github.com/repos/apache/beam/issues/21935: [Bug]: Reject illformed 
GBK Coders
https://api.github.com/repos/apache/beam/issues/21897: [Feature Request]: Flink 
runner savepoint backward compatibility 
https://api.github.com/repos/apache/beam/issues/21893: [Bug]: BigQuery Storage 
Write API implementation does not support table partitioning
https://api.github.com/repos/apache/beam/issues/21794: Dataflow runner creates 
a new timer whenever the output timestamp is change
https://api.github.com/repos/apache/beam/issues/21763: [Playground Task]: 
Migrate from Google Analytics to Matomo Cloud
https://api.github.com/repos/apache/beam/issues/21715: Data missing when using 
CassandraIO.Read
https://api.github.com/repos/apache/beam/issues/21713: 404s in BigQueryIO don't 
get output to Failed Inserts PCollection
https://api.github.com/repos/apache/beam/issues/21711: Python Streaming job 
failing to drain with BigQueryIO write errors
https://api.github.com/repos/apache/beam/issues/21703: pubsublite.ReadWriteIT 
failing in beam_PostCommit_Java_DataflowV1 and V2
https://api.github.com/repos/apache/beam/issues/21702: SpannerWriteIT failing 
in beam PostCommit Java V1
https://api.github.com/repos/apache/beam/issues/21700: 
--dataflowServiceOptions=use_runner_v2 is broken
https://api.github.com/repos/apache/beam/issues/21699: Changing the output 
timestamp of a timer does not clear the previously set timer
https://api.github.com/repos/apache/beam/issues/21695: DataflowPipelineResult 
does not raise exception for unsuccessful states.
https://api.github.com/repos/apache/beam/issues/21694: BigQuery Storage API 
insert with writeResult retry and write to error table
https://api.github.com/repos/apache/beam/issues/21479: Install Python wheel and 
dependencies to local venv in SDK harness
https://api.github.com/repos/apache/beam/issues/21478: 
KafkaIO.read.withDynamicRead() doesn't pick up new TopicPartitions
https://api.github.com/repos/apache/beam/issues/21477: Add integration testing 
for BQ Storage API  write modes
https://api.github.com/repos/apache/beam/issues/21476: WriteToBigQuery Dynamic 
table destinations returns wrong tableId
https://api.github.com/repos/apache/beam/issues/21475: Beam x-lang Dataflow 
tests failing due to _InactiveRpcError
https://api.github.com/repos/apache/beam/issues/21473: PVR_Spark2_Streaming 
perma-red
https://api.github.com/repos/apache/beam/issues/21466: Simplify version 
override for Dev versions of the Go SDK.
https://api.github.com/repos/apache/beam/issues/21465: Kafka commit offset drop 
data on failure for runners that have non-checkpointing shuffle
https://api.github.com/repos/apache/beam/issues/21269: Delete orphaned files
https://api.github.com/repos/apache/beam/issues/21268: Race between member 
variable being accessed due to leaking uninitialized state via 
OutboundObserverFactory
https://api.github.com/repos/apache/beam/issues/21267: WriteToBigQuery submits 
a duplicate BQ load job if a 503 error code is returned from googleapi
https://api.github.com/repos/apache/beam/issues/21265: 
apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally
 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://api.github.com/repos/apache/beam/issues/21263: (Broken Pipe induced) 
Bricked Dataflow Pipeline 
https://api.github.com/repos/apache/beam/issues/21262: Python AfterAny, 
AfterAll do not follow spec
https://api.github.com/repos/apache/beam/issues/21260: Python DirectRunner does 
not emit data at GC time
https://api.github.com/repos/apache/beam/issues/21259: Consumer group with 
random prefix
https://api.github.com/repos/apache/beam/issues/21258: Dataflow error in 
CombinePerKey operation
https://api.github.com/repos/apache/beam/issues/21257: Either Create or 
DirectRunner fails to produce all elements to the following transform
https://api.github.com/repos/apache/beam/issues/21123: Multiple jobs running on 
Flink session cluster reuse the persistent Python environment.
https://api.github.com/repos/apache/beam/issues/21119: Migrate to the next 
version of Python `requests` when released
https://api.github.com/repos/apache/beam/issues/21117: "Java IO IT Tests" - 
missing data in grafana
https://api.github.com/repos/apache/beam/issues/21115: JdbcIO date conversion 
is sensitive to OS
https://api.github.com/repos/apache/beam/issues/21112: Dataflow SocketException 
(SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery
https:/

P0 issues report (2)

2022-06-21 Thread beamactions
This is your daily summary of Beam's current P0 issues, not including flaky 
tests.

See https://beam.apache.org/contribute/issue-priorities/#p0-outage for the 
meaning and expectations around P0 issues.



https://api.github.com/repos/apache/beam/issues/21948: [Bug]: KinesisIO javadoc 
is no longer up-to-date
https://api.github.com/repos/apache/beam/issues/21824: [Bug]: Disable PR 
comment trigger


Re: Write data to Jdbc from BigQueryIO | Issue

2022-06-21 Thread Ravi Kapoor
Hello Team,
Has anyone aware of this problem?

Thanks,
Ravi

On Sun, Jun 19, 2022 at 10:59 PM Ravi Kapoor  wrote:

> Hi Team
>
> I am trying writing a PCollection from BQ with Schema as
>
>
> *final Schema schema =
> Schema.builder().addInt64Field("user_id").addStringField("user_name").build();*
>  to a JDBC datasource (oracle)
> having table schema as below on Oracle :
>
> Table_A ( user_id NUMBER(3), user_name varchar(10))
> The code flow is such that this will invoke
>
>
> *PCollection expand(PCollection input)*and internally this will
> call
>
>
> *List fields =
> spec.getFilteredFields(input.getSchema());*which converts the
> resultsetmetadata to BeamSchema and for the numeric type this is what is set
> *LogicalTypes.FixedPrecisionNumeric.of* as FieldType in
> jdbcTypeToBeamFieldConverter method which has baseType as Decimal
>
>
> *SchemaUtil.compareSchemaField(tableField, f)*which subsequently compare
> schemaFieldType with below method:
>
>
> *static boolean compareSchemaFieldType(Schema.FieldType a,
> Schema.FieldType b) {*The below code returns false:
>
> *return
> a.getLogicalType().getBaseType().getTypeName().equals(b.getTypeName());*
>  as the base type is set as DECIMAL for the user_id field in oracle
> datasource.
>
> And it eventually fails with
>
> *"Provided schema doesn't match with database schema. " + " Table has
> fields: ",*
> Can you please check whether it's a bug in matching BQ data type to Oracle?
> Or do I require different handling in writing to Jdbc source?
>
> --
> Thanks,
> Ravi Kapoor
> +91-9818764564
> kapoorrav...@gmail.com
>


-- 
Thanks,
Ravi Kapoor
+91-9818764564
kapoorrav...@gmail.com