Re: [PROPOSAL] Preparing for 2.50.0 Release
Despite my best efforts, Python continues to vex me. RC1 is almost ready; it is just missing the Beam site and docs update PR and, optionally, the TypeScript container. So I'm calling it a night, and will build and send out a partial docs PR in the morning. Robert Burke 2.50.0 Release Manager On Wed, Aug 16, 2023, 8:08 AM Robert Burke wrote: > Just a status update: The branch is cut and tagged: > > https://github.com/apache/beam/tree/release-2.50.0 > https://github.com/apache/beam/tree/v2.50.0-RC1 > > I'm working on the remaining bits to have an RC. The GitHub > build-release-artifacts action failed to > build and publish the Java artifacts and to stage the Docker containers. > > The former says: > > Execution failed for task ':sdks:java:io:solr:compileTestJava'. > GC overhead limit exceeded > > The latter is due to a partial application of the multi-arch build to the > GitHub Actions workflows, which has already been fixed. > > The Dataflow Legacy Java worker and associated containers have been built > and published, and we apologize for the delay this caused. We're discussing > how we presently interleave Google internal processes with the release, and > how we can streamline things now that Dataflow is transitioning to RunnerV2 > by default. In future releases, we may build the non-portable Dataflow Java > workers after the first RC is tagged and the open side is on its way. > > The hope is RC1 will be available tonight. Either way, this thread will be > updated with the status. > > Robert Burke > Beam 2.50.0 Release Manager > > On 2023/08/14 21:51:47 Robert Burke wrote: > > +1 to what XQ says. > > > > There will be a voting email thread once I've done the appropriate due > > diligence to the branch, and finished with the Dataflow artifacts. > > > > Generally speaking, the best validation is something you're using > > already, to make sure that the new version of Beam works for your usage.
[Request for Feedback] Swift SDK Prototype
Hello everyone, A couple of months ago I decided that I wanted to really understand how the Beam FnApi works and how it interacts with the Portable Runner. For me, at least, that usually means writing some code so I can see things happening in a debugger. To really prove to myself that I understood what was going on, I decided I couldn't use an existing SDK language, since there would be the temptation to read some code and convince myself that I actually understood it. One thing led to another, and it turns out that to get a minimal FnApi integration going you end up writing a fair bit of an SDK. So I decided to take things to the point where I had an SDK that could execute a word count example via a portable runner backend. I've now reached that point and would like to submit my prototype SDK to the list for feedback. It currently lives in a branch on my fork here: https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift At the moment it runs via the most recent Xcode beta using Swift 5.9 on Intel Macs, but it should also work using beta builds of Swift 5.9 for Linux on Intel hardware. I haven't had a chance to try it on ARM hardware and make sure all of the endian checks are complete. The "IntegrationTests.swift" file contains a word count example that reads some local files (as well as a missing file, to exercise DLQ functionality) and outputs counts through two separate group by operations, to get it past the "map reduce" size of pipeline. I've tested it against the Python Portable Runner. Since my goal was to learn FnApi, there is no Direct Runner at this time. I've shown it to a couple of folks already and incorporated some of their feedback (for example, pardo was originally called dofn when defining pipelines).
In general I've tried to make the API as "Swift-y" as possible, hence the heavy reliance on closures. While there aren't yet composite PTransforms, there are the beginnings of what would be needed for a SwiftUI-like declarative API for creating them. There are of course a ton of missing bits still to be implemented, like counters, metrics, windowing, state, timers, etc. Any and all feedback is welcome, and I'm happy to submit a PR if folks are interested, though the "Swift Way" would be to have it in its own repo so that it can easily be used from the Swift Package Manager. Best, B
Re: [RFC] Bootloader Buffered Logging
Thanks, Jack! I left some comments; looking forward to this work! On Wed, Aug 16, 2023 at 10:31 AM Robert Burke wrote: > I've added some comments but generally +1 on this. > > A later change might be able to build from this to ensure the various > STDErr and STDOut logs from the SDK harness executions are always plumbed > as described. > > But that would take more thought since other incidental logs from the > users worker binary (sic) might be misconstrued as serious when they were > largely benign noise previously ignored (since they were invisible). > > On Wed, Aug 16, 2023, 9:57 AM Jack McCluskey via dev > wrote: > >> Hey everyone, >> >> I've written a small design doc around implementing some buffered logging >> for the Beam boot.go scripts that is available at >> https://s.apache.org/beam-buffered-logging. This should help surface >> errors that occur during worker set-up (like issues with dependency >> installation) that tend to be logged improperly at INFO. >> >> Thanks, >> >> Jack McCluskey >> >> -- >> >> >> Jack McCluskey >> SWE - DataPLS PLAT/ Dataflow ML >> RDU >> jrmcclus...@google.com >> >> >>
Re: [RFC] Bootloader Buffered Logging
I've added some comments but generally +1 on this. A later change might be able to build from this to ensure the various STDErr and STDOut logs from the SDK harness executions are always plumbed as described. But that would take more thought since other incidental logs from the users worker binary (sic) might be misconstrued as serious when they were largely benign noise previously ignored (since they were invisible). On Wed, Aug 16, 2023, 9:57 AM Jack McCluskey via dev wrote: > Hey everyone, > > I've written a small design doc around implementing some buffered logging > for the Beam boot.go scripts that is available at > https://s.apache.org/beam-buffered-logging. This should help surface > errors that occur during worker set-up (like issues with dependency > installation) that tend to be logged improperly at INFO. > > Thanks, > > Jack McCluskey > > -- > > > Jack McCluskey > SWE - DataPLS PLAT/ Dataflow ML > RDU > jrmcclus...@google.com > > >
[RFC] Bootloader Buffered Logging
Hey everyone, I've written a small design doc around implementing some buffered logging for the Beam boot.go scripts that is available at https://s.apache.org/beam-buffered-logging. This should help surface errors that occur during worker set-up (like issues with dependency installation) that tend to be logged improperly at INFO. Thanks, Jack McCluskey -- Jack McCluskey SWE - DataPLS PLAT/ Dataflow ML RDU jrmcclus...@google.com
Re: [PROPOSAL] Preparing for 2.50.0 Release
Just a status update: The branch is cut and tagged: https://github.com/apache/beam/tree/release-2.50.0 https://github.com/apache/beam/tree/v2.50.0-RC1 I'm working on the remaining bits to have an RC. The GitHub build-release-artifacts action failed to build and publish the Java artifacts and to stage the Docker containers. The former says: Execution failed for task ':sdks:java:io:solr:compileTestJava'. GC overhead limit exceeded The latter is due to a partial application of the multi-arch build to the GitHub Actions workflows, which has already been fixed. The Dataflow Legacy Java worker and associated containers have been built and published, and we apologize for the delay this caused. We're discussing how we presently interleave Google internal processes with the release, and how we can streamline things now that Dataflow is transitioning to RunnerV2 by default. In future releases, we may build the non-portable Dataflow Java workers after the first RC is tagged and the open side is on its way. The hope is RC1 will be available tonight. Either way, this thread will be updated with the status. Robert Burke Beam 2.50.0 Release Manager On 2023/08/14 21:51:47 Robert Burke wrote: > +1 to what XQ says. > > There will be a voting email thread once I've done the appropriate due > diligence to the branch, and finished with the Dataflow artifacts. > > Generally speaking, the best validation is something you're using already, > to make sure that the new version of Beam works for your usage. > > > On Mon, Aug 14, 2023, 2:41 PM XQ Hu via dev wrote: > > > Welcome to the Beam community! Our release managers usually follow this > > https://beam.apache.org/contribute/release-guide/#10-vote-and-validate-release-candidate > > to send the votes out and ask for any feedback regarding the release > > candidate. If you could help run any validation on your side and cast your > > vote, it would be greatly appreciated and helpful for the community.
> > > > On Mon, Aug 14, 2023 at 12:23 PM Hong wrote: > > > >> I see, thanks for clarifying, Robert! > >> > >> Is there anything I can help with for validation? Is there a wiki page with > >> the expected validations I can help with? > >> > >> Best > >> Hong > >> > >> On 14 Aug 2023, at 14:34, Robert Burke wrote: > >> > >> > >> The release branch was cut. Before the weekend, I was working on getting > >> the non-portable Dataflow Java worker built and available before producing > >> the RC1. The actual building bit doesn't take that long, but there's a > >> bunch of additional validation that goes along with it. > >> > >> The current target date for 2.50.0 is September 13th, but ultimately it's > >> as soon as we have a validated and voted-on RC. > >> > >> On Mon, Aug 14, 2023, 3:43 AM Hong Liang wrote: > >> > >>> Thanks for driving this, Robert! > >>> > >>> It seems the two PRs specified have been merged. A little new to Beam, > >>> do we have an expected release date for the 2.50 release? > >>> > >>> Best, > >>> Hong > >>> > >>> On Thu, Aug 10, 2023 at 3:08 AM Robert Burke > >>> wrote: > >>> > I'm in the process of producing the cut branch, but due to various > delays on my part, it will not be cut today. > > There are two outstanding PRs blocking the cut, > https://github.com/apache/beam/pull/27947 and > https://github.com/apache/beam/pull/27939, but once those are in, I'll > proceed. Remaining new issues will be cherry-picked as required. > > Thanks > Robert Burke > Beam 2.50.0 Release Manager > > On 2023/07/26 15:49:37 Robert Burke wrote: > > Hey Beam community, > > > > The next release (2.50.0) branch cut is scheduled for August 9th, 2023, > > according to > > the release calendar [1]. > > > > I volunteer to perform this release. My plan is to cut the branch on > > that date, and cherry-pick release-blocking fixes afterwards, if any.
> > > > Please help me make sure the release goes smoothly by: > > - Making sure that any unresolved release-blocking issues for 2.50.0 > > have their "Milestone" marked as "2.50.0 Release" as soon as possible. > > - Reviewing the current release blockers [2] and removing the Milestone > > if they don't meet the criteria at [3]. > > > > Let me know if you have any comments/objections/questions. > > > > Thanks, > > > > Robert Burke (he/him) > > Beam Go Busybody > > > > [1] > > > https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com > > [2] https://github.com/apache/beam/milestone/14 > > [3] https://beam.apache.org/contribute/release-blocking/ > > > > >>> >
Beam High Priority Issue Report (39)
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterException