Re: Why is Avro Date field using InstantCoder?
If I had to change things, I would: 1. When deriving the SCHEMA add a few new types (JAVA_TIME, JAVA_DATE or something along those lines). 2. RowCoderGenerator around line 159 calls "SchemaCoder.coderForFieldType(schema.getField(rowIndex).getType().withNullable(false));" which eventually gets to SchemaCoderHelpers.coderForFieldType. There, CODER_MAP has a hard reference on InstantCoder for DATETIME. Maybe that map can be augmented (possibly dynamically) with new fieldtypes-coder combinations to take care of the new types from #1. I would also like to ask. Looking through the Beam code, I see a lot of static calls. Just wondering why it's done this way. I'm used to projects having some form of dependency injection involved and static calls being frowned upon (lack of mockability, hidden dependencies, tight coupling etc). The only reason I can think of is serializability given Beam's multi-node processing? On Sat, Oct 16, 2021 at 3:11 AM Reuven Lax wrote: > Is the Schema inference the only reason we can't upgrade Avro, or are > there other blockers? Is there any way we can tell at runtime which version > of Avro is running? Since we generate the conversion code at runtime with > ByteBuddy, we could potentially just generate different conversions > depending on the Avro version. > > On Fri, Oct 15, 2021 at 11:56 PM Cristian Constantinescu > wrote: > >> Those are fair points. However please consider that there might be new >> users who will decide that Beam isn't suitable because of things like >> requiring Avro 1.8, Joda time, old Confluent libraries, and, when I started >> using Beam about a year ago, Java 8 (I think we're okay with Java 11 now). >> >> I guess what I'm saying is that there's definitely a non-negligible cost >> associated with old 3rd party libs in Beam's code (even if efforts are put >> in to minimize them). >> >> On Sat, Oct 16, 2021 at 2:33 AM Reuven Lax wrote: >> >>> >>> >>> On Fri, Oct 15, 2021 at 11:13 PM Cristian Constantinescu < >>> zei...@gmail.com> wrote: >>> To my knowledge and reading through AVRO's Jira[1], it does not support jodatime anymore. It seems everything related to this Avro 1.8 dependency is tricky. If you recall, it also prevents us from upgrading to the latest Confluent libs... for enabling Beam to use protobufs schemas with the schema registry. (I was also the one who brought that issue up, also made an exploratory PR to move AVRO outside of Beam core.) I understand that Beam tries to maintain public APIs stable, but I'd like to put forward two points: 1) Schemas are experimental, hence there shouldn't be any API stability guarantees there. >>> >>> Unfortunately at this point, they aren't really. As a community we've >>> been bad about removing the Experimental label - many core, core parts of >>> Beam are still labeled experimental (sources, triggering, state, timers). >>> Realistically they are no longer experimental. >>> >>> 2) Maybe this is the perfect opportunity for the Beam community to think about the long term effects of old dependencies within Beam's codebase, and especially how to deal with them. Perhaps starting/maintaining an "experimental" branch/maven-published-artifacts where Beam does not guarantee backwards compatibility (or maintains it for a shorter period of time) is something to think about. >>> >>> This is why we usually try to prevent third-party libraries from being >>> in our public API. However in this case, that's tricky. >>> >>> The beam community can of course decide to break backwards >>> compatibility. However as stated today, it is maintained. The last time we >>> broke backwards compatibility was when the old Dataflow API was >>> transitioned to Beam, and it was very painful. It took multiple years to >>> get some users onto the Beam API due to the code changes required to >>> migrate (and those required code changes weren't terribly invasive). >>> >>> [1] https://issues.apache.org/jira/browse/AVRO-2335 On Sat, Oct 16, 2021 at 12:40 AM Reuven Lax wrote: > Does this mean more recent versions of avro aren't backwards > compatible with avro 1.8? If so, this might be tricky to fix, since Beam > maintains backwards compatibility on its public API. > > On Fri, Oct 15, 2021 at 5:38 PM Cristian Constantinescu < > zei...@gmail.com> wrote: > >> Hi all, >> >> I've created a small demo project to show the issue[1]. >> >> I've looked at the beam code and all the avro tests use avro 1.8... >> which is hardcoded to Joda, hence why all the tests pass. Avro changed to >> java.time (as most other recent java projects) after 1.8. Is there a plan >> for Beam to do the same? >> >> Does anyone use Avro with java.time instead of joda.time that could >> give me ideas how to make it work? >> >> Thank you, >> Cristian >> >> [1] https://
Performance of Apache Beam
Hi Team Could you please let me know following below answers . I need to know performance of apache beam vs flink if we use flink as runner for Beam, what will be the additional overhead converting Beam to flink How fault tolerance and resiliency handled in apache beam. How apache beam handles backpressure? Thanks Azhar
Flaky test issue report (30)
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake) These are P1 issues because they have a major negative impact on the community and make it hard to determine the quality of the software. https://issues.apache.org/jira/browse/BEAM-13025: beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 2021-10-08) https://issues.apache.org/jira/browse/BEAM-12928: beam_PostCommit_Python36 - CrossLanguageSpannerIOTest - flakey failing (created 2021-09-21) https://issues.apache.org/jira/browse/BEAM-12859: org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky (created 2021-09-08) https://issues.apache.org/jira/browse/BEAM-12809: testTwoTimersSettingEachOtherWithCreateAsInputBounded flaky (created 2021-08-26) https://issues.apache.org/jira/browse/BEAM-12794: PortableRunnerTestWithExternalEnv.test_pardo_timers flaky (created 2021-08-24) https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16) https://issues.apache.org/jira/browse/BEAM-12540: beam_PostRelease_NightlySnapshot - Task :runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25) https://issues.apache.org/jira/browse/BEAM-12515: Python PreCommit flaking in PipelineOptionsTest.test_display_data (created 2021-06-18) https://issues.apache.org/jira/browse/BEAM-12322: Python precommit flaky: Failed to read inputs in the data plane (created 2021-05-10) https://issues.apache.org/jira/browse/BEAM-12320: PubsubTableProviderIT.testSQLSelectsArrayAttributes[0] failing in SQL PostCommit (created 2021-05-10) https://issues.apache.org/jira/browse/BEAM-12291: org.apache.beam.runners.flink.ReadSourcePortableTest.testExecution[streaming: false] is flaky (created 2021-05-05) https://issues.apache.org/jira/browse/BEAM-12200: SamzaStoreStateInternalsTest is flaky (created 2021-04-20) https://issues.apache.org/jira/browse/BEAM-12163: Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup (created 2021-04-13) https://issues.apache.org/jira/browse/BEAM-12061: beam_PostCommit_SQL failing on KafkaTableProviderIT.testFakeNested (created 2021-03-27) https://issues.apache.org/jira/browse/BEAM-11837: Java build flakes: "Memory constraints are impeding performance" (created 2021-02-18) https://issues.apache.org/jira/browse/BEAM-11661: hdfsIntegrationTest flake: network not found (py38 postcommit) (created 2021-01-19) https://issues.apache.org/jira/browse/BEAM-11645: beam_PostCommit_XVR_Flink failing (created 2021-01-15) https://issues.apache.org/jira/browse/BEAM-11641: Bigquery Read tests are flaky on Flink runner in Python PostCommit suites (created 2021-01-15) https://issues.apache.org/jira/browse/BEAM-11541: testTeardownCalledAfterExceptionInProcessElement flakes on direct runner. (created 2020-12-30) https://issues.apache.org/jira/browse/BEAM-10955: Flink Java Runner test flake: Could not find Flink job (FlinkJobNotFoundException) (created 2020-09-23) https://issues.apache.org/jira/browse/BEAM-10866: PortableRunnerTestWithSubprocesses.test_register_finalizations flaky on macOS (created 2020-09-09) https://issues.apache.org/jira/browse/BEAM-10485: Failure / flake: ElasticsearchIOTest > testWriteWithIndexFn (created 2020-07-14) https://issues.apache.org/jira/browse/BEAM-9649: beam_python_mongoio_load_test started failing due to mismatched results (created 2020-03-31) https://issues.apache.org/jira/browse/BEAM-8101: Flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful for Direct, Spark, Flink (created 2019-08-27) https://issues.apache.org/jira/browse/BEAM-8035: WatchTest.testMultiplePollsWithManyResults flake: Outputs must be in timestamp order (sickbayed) (created 2019-08-22) https://issues.apache.org/jira/browse/BEAM-7827: MetricsTest$AttemptedMetricTests.testAllAttemptedMetrics is flaky on DirectRunner (created 2019-07-26) https://issues.apache.org/jira/browse/BEAM-7752: Java Validates DirectRunner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky (created 2019-07-16) https://issues.apache.org/jira/browse/BEAM-6804: [beam_PostCommit_Java] [PubsubReadIT.testReadPublicData] Timeout waiting on Sub (created 2019-03-11) https://issues.apache.org/jira/browse/BEAM-5286: [beam_PostCommit_Java_GradleBuild][org.apache.beam.examples.subprocess.ExampleEchoPipelineTest.testExampleEchoPipeline][Flake] .sh script: text file busy. (created 2018-09-01) https://issues.apache.org/jira/browse/BEAM-5172: org.apache.beam.sdk.io.elasticsearch/ElasticsearchIOTest is flaky (created 2018-08-20)
P1 issues report (49)
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake). See https://beam.apache.org/contribute/jira-priorities/#p1-critical for the meaning and expectations around P1 issues. https://issues.apache.org/jira/browse/BEAM-13060: Daily Python SDK build is not publicly accessible (created 2021-10-15) https://issues.apache.org/jira/browse/BEAM-13059: Migrate GKE workloads to Containerd (created 2021-10-15) https://issues.apache.org/jira/browse/BEAM-13058: Upgrade Kubernetes APIs (created 2021-10-15) https://issues.apache.org/jira/browse/BEAM-13056: DoFnSignature.fieldAccessDeclarations missing implicit accesses (created 2021-10-14) https://issues.apache.org/jira/browse/BEAM-13053: Avoid runner v2 when streaming engine explicitly disabled. (created 2021-10-14) https://issues.apache.org/jira/browse/BEAM-13025: beam_PostCommit_Java_DataflowV2 failing pubsublite.ReadWriteIT (created 2021-10-08) https://issues.apache.org/jira/browse/BEAM-13010: Delete orphaned files (created 2021-10-06) https://issues.apache.org/jira/browse/BEAM-12995: Consumer group with random prefix (created 2021-10-04) https://issues.apache.org/jira/browse/BEAM-12959: Dataflow error in CombinePerKey operation (created 2021-09-26) https://issues.apache.org/jira/browse/BEAM-12867: Either Create or DirectRunner fails to produce all elements to the following transform (created 2021-09-09) https://issues.apache.org/jira/browse/BEAM-12843: (Broken Pipe induced) Bricked Dataflow Pipeline (created 2021-09-06) https://issues.apache.org/jira/browse/BEAM-12818: When writing to GCS, spread prefix of temporary files and reuse autoscaling of the temporary directory (created 2021-08-30) https://issues.apache.org/jira/browse/BEAM-12807: Java creates an incorrect pipeline proto when core-construction-java jar is not in the CLASSPATH (created 2021-08-26) https://issues.apache.org/jira/browse/BEAM-12792: Beam worker only installs --extra_package once (created 2021-08-24) https://issues.apache.org/jira/browse/BEAM-12766: Already Exists: Dataset apache-beam-testing:python_bq_file_loads_NNN (created 2021-08-16) https://issues.apache.org/jira/browse/BEAM-12632: ElasticsearchIO: Enabling both User/Pass auth and SSL overwrites User/Pass (created 2021-07-16) https://issues.apache.org/jira/browse/BEAM-12540: beam_PostRelease_NightlySnapshot - Task :runners:direct-java:runMobileGamingJavaDirect FAILED (created 2021-06-25) https://issues.apache.org/jira/browse/BEAM-12525: SDF BoundedSource seems to execute significantly slower than 'normal' BoundedSource (created 2021-06-22) https://issues.apache.org/jira/browse/BEAM-12505: codecov/patch has poor behavior (created 2021-06-17) https://issues.apache.org/jira/browse/BEAM-12500: Dataflow SocketException (SSLException) error while trying to send message from Cloud Pub/Sub to BigQuery (created 2021-06-16) https://issues.apache.org/jira/browse/BEAM-12484: JdbcIO date conversion is sensitive to OS (created 2021-06-14) https://issues.apache.org/jira/browse/BEAM-12467: java.io.InvalidClassException With Flink Kafka (created 2021-06-09) https://issues.apache.org/jira/browse/BEAM-12279: Implement destination-dependent sharding in FileIO.writeDynamic (created 2021-05-04) https://issues.apache.org/jira/browse/BEAM-12256: PubsubIO.readAvroGenericRecord creates SchemaCoder that fails to decode some Avro logical types (created 2021-04-29) https://issues.apache.org/jira/browse/BEAM-11959: Python Beam SDK Harness hangs when installing pip packages (created 2021-03-11) https://issues.apache.org/jira/browse/BEAM-11906: No trigger early repeatedly for session windows (created 2021-03-01) https://issues.apache.org/jira/browse/BEAM-11875: XmlIO.Read does not handle XML encoding per spec (created 2021-02-26) https://issues.apache.org/jira/browse/BEAM-11828: JmsIO is not acknowledging messages correctly (created 2021-02-17) https://issues.apache.org/jira/browse/BEAM-11755: Cross-language consistency (RequiresStableInputs) is quietly broken (at least on portable flink runner) (created 2021-02-05) https://issues.apache.org/jira/browse/BEAM-11578: `dataflow_metrics` (python) fails with TypeError (when int overflowing?) (created 2021-01-06) https://issues.apache.org/jira/browse/BEAM-11148: Kafka commitOffsetsInFinalize OOM on Flink (created 2020-10-28) https://issues.apache.org/jira/browse/BEAM-11017: Timer with dataflow runner can be set multiple times (dataflow runner) (created 2020-10-05) https://issues.apache.org/jira/browse/BEAM-10670: Make non-portable Splittable DoFn the only option when executing Java "Read" transforms (created 2020-08-10) https://issues