Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-16 Thread XQ Hu via dev
+1 (non-binding)

Tested the Python SDK RC5 using the ML pipeline under
https://github.com/google/dataflow-ml-starter. The run at
https://github.com/google/dataflow-ml-starter/actions/runs/6898545809/job/18768732434
ran well.

On Thu, Nov 16, 2023 at 7:46 PM Robert Bradshaw via dev 
wrote:

> +1 (binding)
>
> The artifacts all look good, as does Python installation into a fresh
> environment.
>
>
> On Thu, Nov 16, 2023 at 2:41 PM Svetak Sundhar via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non binding)
>>
>> validated on python use cases.
>>
>>
>> Svetak Sundhar
>>
>>   Data Engineer
>> s vetaksund...@google.com
>>
>>
>>
>> On Wed, Nov 15, 2023 at 8:52 AM Jan Lukavský  wrote:
>>
>>> +1 (binding)
>>>
>>> Validated Java SDK with Flink runner on own use cases.
>>>
>>>   Jan
>>>
>>> On 11/15/23 11:35, Jean-Baptiste Onofré wrote:
>>> > +1 (binding)
>>> >
>>> > Quickly tested Java SDK and checked the legal part (hash, signatures,
>>> headers).
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On Tue, Nov 14, 2023 at 12:06 AM Danny McCormick via dev
>>> >  wrote:
>>> >> Hi everyone,
>>> >> Please review and vote on the release candidate #5 for the version
>>> 2.52.0, as follows:
>>> >> [ ] +1, Approve the release
>>> >> [ ] -1, Do not approve the release (please provide specific comments)
>>> >>
>>> >>
>>> >> Reviewers are encouraged to test their own use cases with the release
>>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>>> count towards the final vote, but votes from all community members are
>>> encouraged and helpful for finding regressions; you can either test your
>>> own use cases or use cases from the validation sheet [10].
>>> >>
>>> >> The complete staging area is available for your review, which
>>> includes:
>>> >>
>>> >> GitHub Release notes [1]
>>> >> the official Apache source release to be deployed to dist.apache.org
>>> [2], which is signed with the key with fingerprint D20316F712213422 [3]
>>> >> all artifacts to be deployed to the Maven Central Repository [4]
>>> >> source code tag "v2.52.0-RC5" [5]
>>> >> website pull request listing the release [6], the blog post [6], and
>>> publishing the API reference manual [7]
>>> >> Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> >> Go artifacts and documentation are available at pkg.go.dev [9]
>>> >> Validation sheet with a tab for 2.52.0 release to help with
>>> validation [10]
>>> >> Docker images published to Docker Hub [11]
>>> >> PR to run tests against release branch [12]
>>> >>
>>> >>
>>> >> The vote will be open for at least 72 hours. It is adopted by
>>> majority approval, with at least 3 PMC affirmative votes.
>>> >>
>>> >> For guidelines on how to try the release in your projects, check out
>>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >>
>>> >> Thanks,
>>> >> Danny
>>> >>
>>> >> [1] https://github.com/apache/beam/milestone/16
>>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> >> [4]
>>> https://repository.apache.org/content/repositories/orgapachebeam-1363/
>>> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
>>> >> [6] https://github.com/apache/beam/pull/29331
>>> >> [7] https://github.com/apache/beam-site/pull/655
>>> >> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
>>> >> [9]
>>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
>>> >> [10]
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>>> >> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>>> >> [12] https://github.com/apache/beam/pull/29418
>>>
>>


Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-16 Thread Robert Bradshaw via dev
+1 (binding)

The artifacts all look good, as does Python installation into a fresh
environment.
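
For anyone who wants to repeat the hash part of the artifact check, here is a
minimal sketch using only the Python standard library. It assumes the
published .sha512 file starts with the hex digest, optionally followed by the
file name. The helper names (sha512_of, matches_published) are hypothetical
and not part of any Beam tooling.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large artifacts are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, sha512_file_text):
    """Compare a local file against the text of its .sha512 file.

    Assumption: the first whitespace-separated token is the hex digest.
    """
    expected = sha512_file_text.split()[0].lower()
    return sha512_of(path) == expected
```

This covers only the checksum. Verifying the signature against the KEYS file
is a separate step and is not shown here.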


On Thu, Nov 16, 2023 at 2:41 PM Svetak Sundhar via dev 
wrote:

> +1 (non binding)
>
> validated on python use cases.
>
>
> Svetak Sundhar
>
>   Data Engineer
> s vetaksund...@google.com
>
>
>
> On Wed, Nov 15, 2023 at 8:52 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Validated Java SDK with Flink runner on own use cases.
>>
>>   Jan
>>
>> On 11/15/23 11:35, Jean-Baptiste Onofré wrote:
>> > +1 (binding)
>> >
>> > Quickly tested Java SDK and checked the legal part (hash, signatures,
>> headers).
>> >
>> > Regards
>> > JB
>> >
>> > On Tue, Nov 14, 2023 at 12:06 AM Danny McCormick via dev
>> >  wrote:
>> >> Hi everyone,
>> >> Please review and vote on the release candidate #5 for the version
>> 2.52.0, as follows:
>> >> [ ] +1, Approve the release
>> >> [ ] -1, Do not approve the release (please provide specific comments)
>> >>
>> >>
>> >> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found. Only PMC member votes will
>> count towards the final vote, but votes from all community members are
>> encouraged and helpful for finding regressions; you can either test your
>> own use cases or use cases from the validation sheet [10].
>> >>
>> >> The complete staging area is available for your review, which includes:
>> >>
>> >> GitHub Release notes [1]
>> >> the official Apache source release to be deployed to dist.apache.org
>> [2], which is signed with the key with fingerprint D20316F712213422 [3]
>> >> all artifacts to be deployed to the Maven Central Repository [4]
>> >> source code tag "v2.52.0-RC5" [5]
>> >> website pull request listing the release [6], the blog post [6], and
>> publishing the API reference manual [7]
>> >> Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> >> Go artifacts and documentation are available at pkg.go.dev [9]
>> >> Validation sheet with a tab for 2.52.0 release to help with validation
>> [10]
>> >> Docker images published to Docker Hub [11]
>> >> PR to run tests against release branch [12]
>> >>
>> >>
>> >> The vote will be open for at least 72 hours. It is adopted by majority
>> approval, with at least 3 PMC affirmative votes.
>> >>
>> >> For guidelines on how to try the release in your projects, check out
>> our blog post at https://beam.apache.org/blog/validate-beam-release/.
>> >>
>> >> Thanks,
>> >> Danny
>> >>
>> >> [1] https://github.com/apache/beam/milestone/16
>> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
>> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> >> [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1363/
>> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
>> >> [6] https://github.com/apache/beam/pull/29331
>> >> [7] https://github.com/apache/beam-site/pull/655
>> >> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
>> >> [9]
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
>> >> [10]
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
>> >> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
>> >> [12] https://github.com/apache/beam/pull/29418
>>
>


Re: [VOTE] Release 2.52.0, release candidate #5

2023-11-16 Thread Svetak Sundhar via dev
+1 (non binding)

validated on python use cases.


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com



On Wed, Nov 15, 2023 at 8:52 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Validated Java SDK with Flink runner on own use cases.
>
>   Jan
>
> On 11/15/23 11:35, Jean-Baptiste Onofré wrote:
> > +1 (binding)
> >
> > Quickly tested Java SDK and checked the legal part (hash, signatures,
> headers).
> >
> > Regards
> > JB
> >
> > On Tue, Nov 14, 2023 at 12:06 AM Danny McCormick via dev
> >  wrote:
> >> Hi everyone,
> >> Please review and vote on the release candidate #5 for the version
> 2.52.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> Reviewers are encouraged to test their own use cases with the release
> candidate, and vote +1 if no issues are found. Only PMC member votes will
> count towards the final vote, but votes from all community members are
> encouraged and helpful for finding regressions; you can either test your
> own use cases or use cases from the validation sheet [10].
> >>
> >> The complete staging area is available for your review, which includes:
> >>
> >> GitHub Release notes [1]
> >> the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint D20316F712213422 [3]
> >> all artifacts to be deployed to the Maven Central Repository [4]
> >> source code tag "v2.52.0-RC5" [5]
> >> website pull request listing the release [6], the blog post [6], and
> publishing the API reference manual [7]
> >> Python artifacts are deployed along with the source release to the
> dist.apache.org [2] and PyPI[8].
> >> Go artifacts and documentation are available at pkg.go.dev [9]
> >> Validation sheet with a tab for 2.52.0 release to help with validation
> [10]
> >> Docker images published to Docker Hub [11]
> >> PR to run tests against release branch [12]
> >>
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >> For guidelines on how to try the release in your projects, check out
> our blog post at https://beam.apache.org/blog/validate-beam-release/.
> >>
> >> Thanks,
> >> Danny
> >>
> >> [1] https://github.com/apache/beam/milestone/16
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.52.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1363/
> >> [5] https://github.com/apache/beam/tree/v2.52.0-RC5
> >> [6] https://github.com/apache/beam/pull/29331
> >> [7] https://github.com/apache/beam-site/pull/655
> >> [8] https://pypi.org/project/apache-beam/2.52.0rc5/
> >> [9]
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.52.0-RC5/go/pkg/beam
> >> [10]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1387982510
> >> [11] https://hub.docker.com/search?q=apache%2Fbeam=image
> >> [12] https://github.com/apache/beam/pull/29418
>


Re: Hiding logging for beam playground examples

2023-11-16 Thread Joey Tran
Good idea though it ended up being a shallow trace
```
  File
"/opt/playground/backend/executable_files/91e3e49b-7197-4252-a8bd-93c5b252ed55/91e3e49b-7197-4252-a8bd-93c5b252ed55.py",
line 57, in 
assert False
```
I think I found where the log level is set anyways
```
https://github.com/apache/beam/blob/master/playground/infrastructure/logger.py#L39
```

When I have some time, I'll try doing a local deployment of playground and
modifying those log levels
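
For reference, here is a minimal sketch of what raising a log level does,
using only the standard library logging module. The logger name "demo_runner"
and the messages are hypothetical stand-ins for the runner output. This is not
the playground's actual code.

```python
import io
import logging


def run_with_level(level):
    """Emit one INFO and one WARNING record; return the lines that get through."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))

    logger = logging.getLogger("demo_runner")  # hypothetical logger name
    logger.handlers = [handler]   # replace handlers so repeated calls do not stack
    logger.propagate = False      # keep the output out of the root logger
    logger.setLevel(level)

    logger.info("Creating state cache with size 104857600")
    logger.warning("something worth surfacing")
    return stream.getvalue().splitlines()


noisy = run_with_level(logging.INFO)     # both records are emitted
quiet = run_with_level(logging.WARNING)  # the INFO record is filtered out
```

With the level at WARNING, only the warning line survives. That is the effect
I am hoping to get in the playground output.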

On Wed, Nov 15, 2023 at 10:52 PM Valentyn Tymofieiev 
wrote:

> I am also not familiar with Playground. I suspect you could try to make it
> crash and maybe find a stacktrace? Setting logging could look like so:
> https://github.com/apache/beam/blob/729c4de416b8252ec99f0a1253ac7af3023733df/sdks/python/apache_beam/examples/wordcount.py#L110
>
> On Wed, Nov 15, 2023 at 12:06 PM Joey Tran 
> wrote:
>
>> The motivating example does not use LogElements, just Map(print)
>>
>> https://beam.apache.org/documentation/transforms/python/aggregation/combineglobally/#example-2-combining-with-a-lambda-function
>>
>> Some examples of the extraneous logging:
>> ```
>> 2023-09-08 22:46:37,334 [INFO]  > populate_data_channel_coders at 0x7ff2665e1a20> 
>> 2023-09-08 22:46:37,336 [INFO] Creating state cache with size 104857600
>> 2023-09-08 22:46:37,338 [INFO] Created Worker handler
>> > object at 0x7ff2664c9870> for environment
>> ref_Environment_default_environment_2 (beam:env:embedded_python:v1, b'')
>> ```
>>
>> The example code itself doesn't set the log level, so I assume it is set in
>> some playground code. Does anyone have a pointer to where? I'm not familiar
>> with Playground.
>>
>> On Wed, Nov 15, 2023 at 2:10 PM Valentyn Tymofieiev via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Are the examples using LogElements?
>>> https://github.com/apache/beam/blob/2012107a0fa2bb3fedf1b5aedcb49445534b2dad/sdks/python/apache_beam/transforms/util.py#L1271
>>>
>>> Note that LogElements by default prints to stdout, but can be configured
>>> to use a different logger. We could also change the default.
>>>
>>> On Tue, Nov 14, 2023 at 9:48 AM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> +1 to at least setting the log level to higher than info. Some runner
>>>> logging (e.g. job started/done) may be useful.
>>>>
>>>> On Tue, Nov 14, 2023 at 9:37 AM Joey Tran 
>>>> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > I just had a workshop to demo beam for people at my company and there
>>>> was a bit of confusion about whether the beam python playground examples
>>>> were even working and it turned out they just got confused by all the
>>>> runner logging that is output.
>>>> >
>>>> > Is this worth keeping? It seems like it'd be a common source of
>>>> confusion for new users
>>>> >
>>>> > Cheers,
>>>> > Joey

>>>


Beam High Priority Issue Report (50)

2023-11-16 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29429 [Failing Test]: Jenkins job 
beam_PostCommit_Java_Nexmark_Flink not being finished
https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 
with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github 
actions tests are failing due to update of pip 
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get 
stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28410 Support new versions of pyarrow in 
apache-beam
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python