Re: [VOTE] Release 2.52.0, release candidate #5
+1 (non binding)

Tested the Python SDK RC5 using the ML pipeline under https://github.com/google/dataflow-ml-starter; the run at https://github.com/google/dataflow-ml-starter/actions/runs/6898545809/job/18768732434 ran well.

On Thu, Nov 16, 2023 at 7:46 PM Robert Bradshaw via dev wrote:
> +1 (binding)
>
> The artifacts all look good, as does Python installation into a fresh environment.
>
> On Thu, Nov 16, 2023 at 2:41 PM Svetak Sundhar via dev <dev@beam.apache.org> wrote:
>> +1 (non binding)
>>
>> validated on python use cases.
>>
>> Svetak Sundhar
>> Data Engineer
>> s vetaksund...@google.com
>>
>> On Wed, Nov 15, 2023 at 8:52 AM Jan Lukavský wrote:
>>> +1 (binding)
>>>
>>> Validated Java SDK with Flink runner on own use cases.
>>>
>>> Jan
>>>
>>> On 11/15/23 11:35, Jean-Baptiste Onofré wrote:
>>> > +1 (binding)
>>> >
>>> > Quickly tested Java SDK and checked the legal part (hash, signatures, headers).
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On Tue, Nov 14, 2023 at 12:06 AM Danny McCormick via dev wrote:
>>> >> Hi everyone,
>>> >> Please review and vote on the release candidate #5 for the version 2.52.0, as follows:
>>> >> [ ] +1, Approve the release
>>> >> [ ] -1, Do not approve the release (please provide specific comments)
>>> >>
>>> >> Reviewers are encouraged to test their own use cases with the release candidate, and vote +1 if no issues are found. Only PMC member votes will count towards the final vote, but votes from all community members is encouraged and helpful for finding regressions; you can either test your own use cases or use cases from the validation sheet [10].
>>> >>
>>> >> The complete staging area is available for your review, which includes:
>>> >>
>>> >> GitHub Release notes [1]
>>> >> the official Apache source release to be deployed to dist.apache.org [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>> >> all artifacts to be deployed to the Maven Central Repository [4]
>>> >> source code tag "v2.52.0-RC5" [5]
>>> >> website pull request listing the release [6], the blog post [6], and publishing the API reference manual [7]
>>> >> Python artifacts are deployed along with the source release to the dist.apache.org [2] and PyPI[8].
>>> >> Go artifacts and documentation are available at pkg.go.dev [9]
>>> >> Validation sheet with a tab for 2.52.0 release to help with validation [10]
>>> >> Docker images published to Docker Hub [11]
>>> >> PR to run tests against release branch [12]
>>> >>
>>> >> The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
>>> >>
>>> >> For guidelines on how to try the release in your projects, check out our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >>
>>> >> Thanks,
>>> >> Danny
>>> >>
>>> >> [1] https://github.com/apache/beam/milestone/16
>>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >> [4] https://repository.apache.org/content/repositories/orgapachebeam-1363/
>>> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
>>> >> [6] https://github.com/apache/beam/pull/29331
>>> >> [7] https://github.com/apache/beam-site/pull/655
>>> >> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
>>> >> [9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
>>> >> [10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>>> >> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> >> [12] https://github.com/apache/beam/pull/29418
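For anyone who wants to repeat the "hash" part of the checks mentioned above, here is a minimal sketch using only the Python standard library. It assumes the staging area ships a `.sha512` file next to each artifact and that the file starts with the hex digest (the `sha512sum` layout); both the layout and the file names in the demo are assumptions for illustration, not taken from this thread. It does not cover the GPG signature check against the KEYS file.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact, checksum_file):
    """Compare an artifact against a checksum file.

    Assumption: the first whitespace-separated token of the checksum
    file is the expected hex digest ("<digest>  <file name>").
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected


if __name__ == "__main__":
    # Self-contained demo with a throwaway file; the names are hypothetical.
    with tempfile.TemporaryDirectory() as tmp:
        artifact = os.path.join(tmp, "example-source-release.zip")
        Path(artifact).write_bytes(b"release candidate bytes")
        checksum = artifact + ".sha512"
        Path(checksum).write_text(
            sha512_of(artifact) + "  example-source-release.zip\n"
        )
        print("checksum ok:", matches_checksum_file(artifact, checksum))
```

In practice you would point `matches_checksum_file` at an artifact downloaded from the staging area [2] and its accompanying checksum file.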
Re: [VOTE] Release 2.52.0, release candidate #5
+1 (binding)

The artifacts all look good, as does Python installation into a fresh environment.
Re: [VOTE] Release 2.52.0, release candidate #5
+1 (non binding)

validated on python use cases.

Svetak Sundhar
Data Engineer
s vetaksund...@google.com
Re: Hiding logging for beam playground examples
Good idea, though it ended up being a shallow trace:
```
File "/opt/playground/backend/executable_files/91e3e49b-7197-4252-a8bd-93c5b252ed55/91e3e49b-7197-4252-a8bd-93c5b252ed55.py", line 57, in assert False
```
I think I found where the log level is set anyways:
```
https://github.com/apache/beam/blob/master/playground/infrastructure/logger.py#L39
```
When I have some time, I'll try doing a local deployment of Playground and modifying those log levels.

On Wed, Nov 15, 2023 at 10:52 PM Valentyn Tymofieiev wrote:
> I am also not familiar with Playground. I suspect you could try to make it crash and maybe find a stacktrace? Setting logging could look like so:
> https://github.com/apache/beam/blob/729c4de416b8252ec99f0a1253ac7af3023733df/sdks/python/apache_beam/examples/wordcount.py#L110
>
> On Wed, Nov 15, 2023 at 12:06 PM Joey Tran wrote:
>> The motivating example does not use LogElements, just Map(print)
>> https://beam.apache.org/documentation/transforms/python/aggregation/combineglobally/#example-2-combining-with-a-lambda-function
>>
>> Some examples of the extraneous logging:
>> ```
>> 2023-09-08 22:46:37,334 [INFO] > populate_data_channel_coders at 0x7ff2665e1a20>
>> 2023-09-08 22:46:37,336 [INFO] Creating state cache with size 104857600
>> 2023-09-08 22:46:37,338 [INFO] Created Worker handler > object at 0x7ff2664c9870> for environment ref_Environment_default_environment_2 (beam:env:embedded_python:v1, b'')
>> ```
>>
>> The example code itself doesn't set the log level in some playground code. Does anyone have a pointer to where? I'm not familiar
>>
>> On Wed, Nov 15, 2023 at 2:10 PM Valentyn Tymofieiev via dev <dev@beam.apache.org> wrote:
>>> Are the examples using LogElements?
>>> https://github.com/apache/beam/blob/2012107a0fa2bb3fedf1b5aedcb49445534b2dad/sdks/python/apache_beam/transforms/util.py#L1271
>>>
>>> Note that LogElements by default prints to stdout, but can be configured to use a different logger. We could also change the default.
>>>
>>> On Tue, Nov 14, 2023 at 9:48 AM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
>>>> +1 to at least setting the log level to higher than info. Some runner logging (e.g. job started/done) may be useful.
>>>>
>>>> On Tue, Nov 14, 2023 at 9:37 AM Joey Tran wrote:
>>>>> Hi all,
>>>>>
>>>>> I just had a workshop to demo beam for people at my company and there was a bit of confusion about whether the beam python playground examples were even working and it turned out they just got confused by all the runner logging that is output.
>>>>>
>>>>> Is this worth keeping? It seems like it'd be a common source of confusion for new users
>>>>>
>>>>> Cheers,
>>>>> Joey
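To make the "set the log level higher than info" suggestion concrete, here is a minimal sketch using only the standard `logging` module. It is not the Playground's actual configuration (that lives in the `logger.py` linked above); the helper name `quiet_runner_logging` is hypothetical, and the sketch assumes the runner messages go through ordinary Python loggers that inherit the root logger's level.

```python
import logging


def quiet_runner_logging(level=logging.WARNING):
    """Raise the root logger threshold so INFO-level messages are dropped.

    Loggers without an explicit level of their own inherit this threshold,
    so messages like the "[INFO] Creating state cache ..." lines quoted
    above would no longer be emitted, while warnings and errors still are.
    """
    root = logging.getLogger()
    root.setLevel(level)
    return root


if __name__ == "__main__":
    quiet_runner_logging()
    logging.info("Creating state cache with size 104857600")  # suppressed
    logging.warning("this warning is still emitted")
```

An example script could call `quiet_runner_logging()` before building its pipeline, so that output from `Map(print)` is not buried under runner logging. Passing `logging.INFO` instead restores the current behaviour.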
Beam High Priority Issue Report (50)
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29429 [Failing Test]: Jenkins job beam_PostCommit_Java_Nexmark_Flink not being finished
https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github actions tests are failing due to update of pip
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28410 Support new versions of pyarrow in apache-beam
https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing "beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python