Re: Disabling Jenkins Jobs
Thank you for doing this. Is there a list of jobs that will be disabled? I am particularly curious about the website publishing job (which I need to use manually sometimes) and the job that publishes daily staging builds (which we share with users sometimes).

Thank you.

Ahmet

On Tue, Sep 12, 2023 at 11:14 AM Danny McCormick via dev <dev@beam.apache.org> wrote:

> Hey everyone, I wanted to let you know that as part of the migration from
> Jenkins to GitHub Actions we are going to start disabling Jenkins jobs if
> they have a corresponding GitHub Actions job that has been running
> successfully for a while. We are starting with Yi's PR here -
> https://github.com/apache/beam/pull/28316. This is the next step in the
> process we kicked off last year [1] now that self-hosted runners have been
> in place and working for a while [2].
>
> We will not migrate jobs until we've confirmed we have parity with the
> existing Jenkins implementations (for example, some jobs are still missing
> test publishing and we won't remove the Jenkins version until they have
> it). In the meantime, migrating some load off should help reduce the
> overall load on Jenkins so that it experiences fewer issues.
>
> If you have any objections with this approach, please respond here. If you
> run into any problems, please file an issue and tag me (@damccorm), Yi
> (@abacn), Andrey (@andreydevyatkin), or Vlado (@volatilemolotov) - or just
> tag all of us :).
>
> Thanks,
> Danny
>
> [1] https://lists.apache.org/thread/0brbkmbd522d1ow43gx5b13dmywt2dgn
> [2] https://lists.apache.org/thread/3k1owt5k16byv39b9lszd3l7qv7od4r8
Disabling Jenkins Jobs
Hey everyone,

I wanted to let you know that as part of the migration from Jenkins to GitHub Actions we are going to start disabling Jenkins jobs if they have a corresponding GitHub Actions job that has been running successfully for a while. We are starting with Yi's PR here - https://github.com/apache/beam/pull/28316. This is the next step in the process we kicked off last year [1] now that self-hosted runners have been in place and working for a while [2].

We will not migrate jobs until we've confirmed we have parity with the existing Jenkins implementations (for example, some jobs are still missing test publishing and we won't remove the Jenkins version until they have it). In the meantime, migrating some load off should help reduce the overall load on Jenkins so that it experiences fewer issues.

If you have any objections to this approach, please respond here. If you run into any problems, please file an issue and tag me (@damccorm), Yi (@abacn), Andrey (@andreydevyatkin), or Vlado (@volatilemolotov) - or just tag all of us :).

Thanks,
Danny

[1] https://lists.apache.org/thread/0brbkmbd522d1ow43gx5b13dmywt2dgn
[2] https://lists.apache.org/thread/3k1owt5k16byv39b9lszd3l7qv7od4r8
Re: Contribution of Asgarde: Error Handling for Beam?
Thanks Mazlum, this sounds great.

I think there are two ways we can proceed if we decide to integrate the Asgarde library into Beam.

(1) Directly import the code into Beam without significant modifications and/or a review (though we may add tests).

(2) Go through a design/code review to determine whether this is the best approach for implementing error handling / DLQ in Beam transforms or whether there are other alternatives/modifications to Asgarde we want to consider.

If we do (1), I prefer adding Asgarde as a separate Gradle module in Beam. We can later integrate it into the core module after a design/code review.

Thanks,
Cham

On Tue, Sep 12, 2023 at 10:26 AM Mazlum TOSUN wrote:

> Hello Austin and everyone,
>
> I am open for discussion.
>
> My first intention with Asgarde was to help the Beam community, because a
> Dead Letter Queue is so important in Beam and in all data pipeline
> frameworks.
> When I worked with Beam in production with my customers, we needed to
> catch errors with side outputs and a dead letter queue.
>
> This library really helped us keep the code less verbose while applying
> all the error handling logic, which is error-prone and verbose when it is
> repeated.
>
> As Kenneth said, my intention was to stay as close as possible to Beam,
> with a Wrapper and a Failure Monad on top of a PCollection, to handle all
> the code and complexity for try catch blocks and side output.
>
> For the governance, even if I am the creator of this library, the most
> important thing isn't me but the community and helping the community.
> If the best solution to help the community is including the library
> directly in Beam, we can go in this direction, with of course your reviews
> and recommendations.
>
> Then the library will belong to the community and we will continue to
> improve it.
>
> For the decision about the best place, I will comply with the majority.
Re: Contribution of Asgarde: Error Handling for Beam?
Hello Austin and everyone,

I am open for discussion.

My first intention with Asgarde was to help the Beam community, because a Dead Letter Queue is so important in Beam and in all data pipeline frameworks. When I worked with Beam in production with my customers, we needed to catch errors with side outputs and a dead letter queue.

This library really helped us keep the code less verbose while applying all the error handling logic, which is error-prone and verbose when it is repeated.

As Kenneth said, my intention was to stay as close as possible to Beam, with a Wrapper and a Failure Monad on top of a PCollection, to handle all the code and complexity for try catch blocks and side output.

For the governance, even if I am the creator of this library, the most important thing isn't me but the community and helping the community. If the best solution to help the community is including the library directly in Beam, we can go in this direction, with of course your reviews and recommendations.

Then the library will belong to the community and we will continue to improve it.

For the decision about the best place, I will comply with the majority.

Best regards,

Mazlum

On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett wrote:

> @Mazlum TOSUN -- you and I have spoken a few times about this. It'd be good for you to comment here on list, on any of your concerns with governance, and/or other thoughts. Ex: if you think contributing asgarde directly is the thing [ or perhaps expressing any interest helping write/contribute the relevant functionality into beam ... it is possible that by adding the actual functionality into beam - like Kenn's mentioned 'other place' we could make asgarde as a separate add-on obsolete ].
>
> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles wrote:
>
>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of it is that it adds the "failure monad" aka "andThen" style error/result handling on top of chaining of PCollections. So it is at a similar level of abstraction to our basic transforms and generally useful for chaining dead-letter side outputs. It is no more or less appropriate for the core SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we actually aspired to have a thin core with the accessories like that in another place, then it should go to that other place.
>>
>> Kenn
>>
>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev <dev@beam.apache.org> wrote:
>>
>>> > until we *require* Asgarde on a core transform, it shouldn't be in the main repo
>>>
>>> I don't think this is necessarily true if it solves end user use cases. If there is a specific transform that solves a specific use case, we could include it in the transforms folder for end-users, even if it isn't utilized in the I/Os at present. Hence the suggestion to take the most promising transforms and propose adding them with documentation, APIs and rationale.
>>>
>>> -Daniel
>>>
>>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke wrote:
>>>
>>>> I would say until we *require* Asgarde on a core transform, it shouldn't be in the main repo.
>>>>
>>>> Incorporating something before there's a need for it is premature abstraction. We can't do things because they *might* be useful. Let's see concrete places where they are useful, or we're already having a similar need solved a different way.
>>>>
>>>> Beam is complicated by itself, and we do encourage multiple ways of solving problems, but that says to me that having an out of repo ecosystem is the right path, rather than incorporation.
>>>>
>>>> On Fri, Sep 8, 2023, 8:14 AM Daniel Collins via dev <dev@beam.apache.org> wrote:
>>>>
>>>>> I think there are a lot of interesting and relatively isolated components of the project, it might make sense to write per-transform one pagers for isolated things like the most useful pieces (just basically copying the documentation and justifying the API) instead of doing a one-shot import or having it live forever in an external project.
>>>>>
>>>>> -Daniel
>>>>>
>>>>> On Fri, Sep 8, 2023 at 11:10 AM Kenneth Knowles wrote:
>>>>>
>>>>>> I agree with everyone about "not everything has to be in the Beam repo". I really like the idea of having a clearer "ecosystem" section of the website, which is sort of started at https://beam.apache.org/community/integrations/ but that is not very prominent.
>>>>>>
>>>>>> Agree with John though. The transforms in Asgarde could potentially be used in Beam. Potentially best accomplished by just adding them as transforms to the core Java SDK?
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Wed, Sep 6, 2023 at 1:46 PM John Casey via dev <dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Agreed on documentation and on keeping it in a separate repo.
>>>>>>>
>>>>>>> We have a few pretty significant beam extensions (scio and Datafl
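Illustration (not part of the thread): to make the "failure monad" / "andThen" description above concrete, here is a minimal sketch in plain Python. It is not Asgarde's API and it does not use Beam. Plain lists stand in for PCollections, and the names `Failure`, `Result`, `and_then`, and `run_example` are hypothetical, chosen only for this example. Each step applies a function to the good elements and routes any exception into an accumulated failure list, which plays the role of the dead-letter side output.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Failure:
    """Hypothetical dead-letter record: which step failed, on what, and why."""
    step: str
    element: str
    error: str


class Result(Generic[T]):
    """Hypothetical wrapper: good outputs plus failures accumulated so far."""

    def __init__(self, outputs: List[T], failures: List[Failure]):
        self.outputs = outputs
        self.failures = failures

    @staticmethod
    def of(elements: List[T]) -> "Result[T]":
        # Start a chain with no failures recorded yet.
        return Result(list(elements), [])

    def and_then(self, step: str, fn: Callable[[T], U]) -> "Result[U]":
        """Apply fn to each good element; send exceptions to the failure list."""
        outputs: List[U] = []
        failures = list(self.failures)  # keep failures from earlier steps
        for element in self.outputs:
            try:
                outputs.append(fn(element))
            except Exception as exc:  # broad catch is deliberate in this sketch
                failures.append(Failure(step, repr(element), str(exc)))
        return Result(outputs, failures)


def run_example() -> Tuple[List[float], List[Failure]]:
    # Two chained steps; neither needs its own try/except block.
    result = (
        Result.of(["4", "0", "abc", "10"])
        .and_then("parse", int)              # "abc" fails here
        .and_then("invert", lambda n: 1 / n) # 0 fails here
    )
    return result.outputs, result.failures
```

Calling `run_example()` returns the two successfully inverted values together with two failure records, one per step. The point of the pattern is that the error handling lives once in `and_then` rather than being repeated in every step; in a real pipeline the failures would presumably be written to a dead letter queue rather than returned.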
Re: DRAFT - Apache Beam Board Report - September 2023
As you can probably tell, I copy/pasted. There will be no Beam Summit next week :-)

On Tue, Sep 12, 2023 at 11:11 AM Kenneth Knowles wrote:

> Hi all,
>
> The next Beam board report is due tomorrow, Wednesday, September 13.
> Please help me to draft it at
> https://s.apache.org/beam-draft-report-2023-09. The doc is open for
> anyone to edit.
>
> Ideas:
>
> - highlights from CHANGES.md
> - interesting technical discussions
> - integrations with other projects
> - community events
> - major user facing addition/deprecation
> - stuff that will be presented at Beam Summit next week :-)
>
> Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for
> examples.
>
> Thanks,
>
> Kenn
DRAFT - Apache Beam Board Report - September 2023
Hi all,

The next Beam board report is due tomorrow, Wednesday, September 13. Please help me to draft it at https://s.apache.org/beam-draft-report-2023-09. The doc is open for anyone to edit.

Ideas:

- highlights from CHANGES.md
- interesting technical discussions
- integrations with other projects
- community events
- major user facing addition/deprecation
- stuff that will be presented at Beam Summit next week :-)

Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for examples.

Thanks,

Kenn
Beam High Priority Issue Report (42)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing "beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28168 [Bug]: BigQuery Storage Write API does not write with no complaint
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_te