Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-28 Thread Chamikara Jayalath via dev
+1 (binding)

Validated by running some multi-lang jobs.

Thanks,
Cham

On Mon, Aug 28, 2023 at 10:40 AM Yi Hu via dev  wrote:

> +1 (non-binding)
>
> Verified Java IO load tests (TextIO, BigQuery, Bigtable) on Dataflow
> runner (legacy and V2) using https://github.com/apache/beam/tree/master/it
>
> On Mon, Aug 28, 2023 at 1:13 PM Ahmet Altay via dev 
> wrote:
>
>> +1 (binding).
>>
>> I validated python quick starts on direct and dataflow runners. Thank you
>> for working on the release!
>>
>> On Mon, Aug 28, 2023 at 8:48 AM Robert Burke  wrote:
>>
>>> Good morning!
>>>
>>> RC2 validation and vote is still open!
>>>
>>> On Sun, Aug 27, 2023, 1:28 PM XQ Hu via dev  wrote:
>>>
 +1
 Ran the simple Dataflow ML GPU batch job using
 https://github.com/google/dataflow-ml-starter with Python 2.50.0rc2 to
 validate the RC works well.

 On Sat, Aug 26, 2023 at 12:16 AM Valentyn Tymofieiev via dev <
 dev@beam.apache.org> wrote:

> +1
>
> Verified that the issue detected in RC0 has been resolved.
> Successfully ran a Python pipeline on ARM Dataflow workers.
>
> Noted that Dataflow runner logs became less verbose as the result of
> https://github.com/apache/beam/pull/27788. One line that I often pay
> attention to no longer appears at the default  INFO log level:
>
> ```
> INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
> JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
> began to receive work requests.
> ```
>
> Dataflow service can be adjusted to compensate for this (internal
> change: http://cl/560265419 ).
>
> On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding).
>>
>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>> (Java SDK 11, Dataflow runner).
>>
>> Thanks Robert!
>>
>> On Thu, Aug 24, 2023 at 7:12 PM Robert Burke 
>> wrote:
>>
>>> Two minor errata from the previous email:
>>>
>>> The validation spreadsheet link should be:
>>>
>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>>>
>>> And the source code tag is: "v2.50.0-RC2"
>>>
>>> On 2023/08/24 23:09:23 Robert Burke wrote:
>>> > Hi everyone,
>>> > Please review and vote on the release candidate #2 for the version
>>> 2.50.0,
>>> > as follows:
>>> > [ ] +1, Approve the release
>>> > [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> >
>>> >
>>> > Reviewers are encouraged to test their own use cases with the
>>> release
>>> > candidate, and vote +1 if
>>> > no issues are found. Only PMC member votes will count towards the
>>> final
>>> > vote, but votes from all
>>> > community members are encouraged and helpful for finding
>>> regressions; you
>>> > can either test your own
>>> > use cases or use cases from the validation sheet [10].
>>> >
>>> > Issues noted in RC1 vote proposal [13] have now been resolved.
>>> >
>>> > The staging area is available for your review, which includes:
>>> > * GitHub Release notes [1],
>>> > * the official Apache source release to be deployed to
>>> dist.apache.org [2],
>>> > which is signed with the key with fingerprint 02677FF4371A3756 (
>>> > lostl...@apache.org) or D20316F712213422
>>> > (GitHub Action automated) [3],
>>> > * all artifacts to be deployed to the Maven Central Repository [4],
>>> > * source code tag "v2.50.0-RC2" [5],
>>> > * website pull request listing the release [6], the blog post [6],
>>> and
>>> > publishing the API reference manual [7].
>>> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK
>>> (Temurin)(build
>>> > 1.8.0_382-b05).
>>> > * Python artifacts are deployed along with the source release to
>>> the
>>> > dist.apache.org [2] and PyPI[8].
>>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>>> > * Validation sheet with a tab for 2.50.0 release to help with
>>> validation
>>> > [10].
>>> > * Docker images published to Docker Hub [11].
>>> > * PR to run tests against release branch [12].
>>> >
>>> > The vote will be open for at least 72 hours. It is adopted by
>>> majority
>>> > approval, with at least 3 PMC affirmative votes.
>>> >
>>> > For guidelines on how to try the release in your projects, check
>>> out our
>>> > blog post at https://beam.apache.org/blog/validate-beam-release/.
>>> >
>>> > Thanks,
>>> > Robert Burke
>>> > Apache Beam 2.50.0 Release Manager
>>> >
>>> > [1] https://github.com/apache/beam/milestone/14
>>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
>>> > [3] 

Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-28 Thread Yi Hu via dev
+1 (non-binding)

Verified Java IO load tests (TextIO, BigQuery, Bigtable) on Dataflow runner
(legacy and V2) using https://github.com/apache/beam/tree/master/it

On Mon, Aug 28, 2023 at 1:13 PM Ahmet Altay via dev 
wrote:

> +1 (binding).
>
> I validated python quick starts on direct and dataflow runners. Thank you
> for working on the release!
>
> On Mon, Aug 28, 2023 at 8:48 AM Robert Burke  wrote:
>
>> Good morning!
>>
>> RC2 validation and vote is still open!
>>
>> On Sun, Aug 27, 2023, 1:28 PM XQ Hu via dev  wrote:
>>
>>> +1
>>> Ran the simple Dataflow ML GPU batch job using
>>> https://github.com/google/dataflow-ml-starter with Python 2.50.0rc2 to
>>> validate the RC works well.
>>>
>>> On Sat, Aug 26, 2023 at 12:16 AM Valentyn Tymofieiev via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1

 Verified that the issue detected in RC0 has been resolved. Successfully
 ran a Python pipeline on ARM Dataflow workers.

 Noted that Dataflow runner logs became less verbose as the result of
 https://github.com/apache/beam/pull/27788. One line that I often pay
 attention to no longer appears at the default  INFO log level:

 ```
 INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
 JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
 began to receive work requests.
 ```

 Dataflow service can be adjusted to compensate for this (internal
 change: http://cl/560265419 ).

 On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev <
 dev@beam.apache.org> wrote:

> +1 (non-binding).
>
> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
> (Java SDK 11, Dataflow runner).
>
> Thanks Robert!
>
> On Thu, Aug 24, 2023 at 7:12 PM Robert Burke 
> wrote:
>
>> Two minor errata from the previous email:
>>
>> The validation spreadsheet link should be:
>>
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>>
>> And the source code tag is: "v2.50.0-RC2"
>>
>> On 2023/08/24 23:09:23 Robert Burke wrote:
>> > Hi everyone,
>> > Please review and vote on the release candidate #2 for the version
>> 2.50.0,
>> > as follows:
>> > [ ] +1, Approve the release
>> > [ ] -1, Do not approve the release (please provide specific
>> comments)
>> >
>> >
>> > Reviewers are encouraged to test their own use cases with the
>> release
>> > candidate, and vote +1 if
>> > no issues are found. Only PMC member votes will count towards the
>> final
>> > vote, but votes from all
>> > community members are encouraged and helpful for finding
>> regressions; you
>> > can either test your own
>> > use cases or use cases from the validation sheet [10].
>> >
>> > Issues noted in RC1 vote proposal [13] have now been resolved.
>> >
>> > The staging area is available for your review, which includes:
>> > * GitHub Release notes [1],
>> > * the official Apache source release to be deployed to
>> dist.apache.org [2],
>> > which is signed with the key with fingerprint 02677FF4371A3756 (
>> > lostl...@apache.org) or D20316F712213422
>> > (GitHub Action automated) [3],
>> > * all artifacts to be deployed to the Maven Central Repository [4],
>> > * source code tag "v2.50.0-RC2" [5],
>> > * website pull request listing the release [6], the blog post [6],
>> and
>> > publishing the API reference manual [7].
>> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK
>> (Temurin)(build
>> > 1.8.0_382-b05).
>> > * Python artifacts are deployed along with the source release to the
>> > dist.apache.org [2] and PyPI[8].
>> > * Go artifacts and documentation are available at pkg.go.dev [9]
>> > * Validation sheet with a tab for 2.50.0 release to help with
>> validation
>> > [10].
>> > * Docker images published to Docker Hub [11].
>> > * PR to run tests against release branch [12].
>> >
>> > The vote will be open for at least 72 hours. It is adopted by
>> majority
>> > approval, with at least 3 PMC affirmative votes.
>> >
>> > For guidelines on how to try the release in your projects, check
>> out our
>> > blog post at https://beam.apache.org/blog/validate-beam-release/.
>> >
>> > Thanks,
>> > Robert Burke
>> > Apache Beam 2.50.0 Release Manager
>> >
>> > [1] https://github.com/apache/beam/milestone/14
>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
>> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > [4]
>> https://repository.apache.org/content/repositories/orgapachebeam-1355/
>> > [5] https://github.com/apache/beam/tree/v2.50.0-RC2
>> > [6] https://github.com/apache/beam/pull/28055
>> > [7] 

Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-28 Thread Ahmet Altay via dev
+1 (binding).

I validated python quick starts on direct and dataflow runners. Thank you
for working on the release!

On Mon, Aug 28, 2023 at 8:48 AM Robert Burke  wrote:

> Good morning!
>
> RC2 validation and vote is still open!
>
> On Sun, Aug 27, 2023, 1:28 PM XQ Hu via dev  wrote:
>
>> +1
>> Ran the simple Dataflow ML GPU batch job using
>> https://github.com/google/dataflow-ml-starter with Python 2.50.0rc2 to
>> validate the RC works well.
>>
>> On Sat, Aug 26, 2023 at 12:16 AM Valentyn Tymofieiev via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1
>>>
>>> Verified that the issue detected in RC0 has been resolved. Successfully
>>> ran a Python pipeline on ARM Dataflow workers.
>>>
>>> Noted that Dataflow runner logs became less verbose as the result of
>>> https://github.com/apache/beam/pull/27788. One line that I often pay
>>> attention to no longer appears at the default  INFO log level:
>>>
>>> ```
>>> INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
>>> JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
>>> began to receive work requests.
>>> ```
>>>
>>> Dataflow service can be adjusted to compensate for this (internal
>>> change: http://cl/560265419 ).
>>>
>>> On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (non-binding).

 Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
 (Java SDK 11, Dataflow runner).

 Thanks Robert!

 On Thu, Aug 24, 2023 at 7:12 PM Robert Burke 
 wrote:

> Two minor errata from the previous email:
>
> The validation spreadsheet link should be:
>
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
>
> And the source code tag is: "v2.50.0-RC2"
>
> On 2023/08/24 23:09:23 Robert Burke wrote:
> > Hi everyone,
> > Please review and vote on the release candidate #2 for the version
> 2.50.0,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > Reviewers are encouraged to test their own use cases with the release
> > candidate, and vote +1 if
> > no issues are found. Only PMC member votes will count towards the
> final
> > vote, but votes from all
> > community members are encouraged and helpful for finding regressions;
> you
> > can either test your own
> > use cases or use cases from the validation sheet [10].
> >
> > Issues noted in RC1 vote proposal [13] have now been resolved.
> >
> > The staging area is available for your review, which includes:
> > * GitHub Release notes [1],
> > * the official Apache source release to be deployed to
> dist.apache.org [2],
> > which is signed with the key with fingerprint 02677FF4371A3756 (
> > lostl...@apache.org) or D20316F712213422
> > (GitHub Action automated) [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.50.0-RC2" [5],
> > * website pull request listing the release [6], the blog post [6],
> and
> > publishing the API reference manual [7].
> > * Java artifacts were built with Gradle 7.5.1 and OpenJDK
> (Temurin)(build
> > 1.8.0_382-b05).
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org [2] and PyPI[8].
> > * Go artifacts and documentation are available at pkg.go.dev [9]
> > * Validation sheet with a tab for 2.50.0 release to help with
> validation
> > [10].
> > * Docker images published to Docker Hub [11].
> > * PR to run tests against release branch [12].
> >
> > The vote will be open for at least 72 hours. It is adopted by
> majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > For guidelines on how to try the release in your projects, check out
> our
> > blog post at https://beam.apache.org/blog/validate-beam-release/.
> >
> > Thanks,
> > Robert Burke
> > Apache Beam 2.50.0 Release Manager
> >
> > [1] https://github.com/apache/beam/milestone/14
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1355/
> > [5] https://github.com/apache/beam/tree/v2.50.0-RC2
> > [6] https://github.com/apache/beam/pull/28055
> > [7] https://github.com/apache/beam-site/pull/648
> > [8] https://pypi.org/project/apache-beam/2.50.0rc2/
> > [9]
> >
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0-RC2/go/pkg/beam
> > [10]
> >
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
> > [11] https://hub.docker.com/search?q=apache%2Fbeam=image

Re: Suspected memory leak in Python Pubsub ReadFromPubsub

2023-08-28 Thread Valentyn Tymofieiev via dev
This appears to be a recent issue that others have also reported (e.g.
https://github.com/apache/beam/issues/28142), and it is being actively
investigated. Therefore, it is unlikely that memory fragmentation is the
issue.

On Tue, Aug 22, 2023 at 5:21 PM Valentyn Tymofieiev 
wrote:

> Hi, thanks for reaching out.
>
> I'd be curious to see whether the memory consumption patterns you observe
> change if you switch the memory allocator library.
>
> For example, you could try to use a custom container, install jemalloc and
> enable it. See: https://beam.apache.org/documentation/runtime/environments
> , https://cloud.google.com/dataflow/docs/guides/using-custom-containers
>
> Your Dockerfile might look like the following:
>
> FROM apache/beam_python3.10_sdk:2.49.0
>
> # Prebuilt other dependencies
> RUN apt-get update \
>   && apt-get install -y libjemalloc-dev
>
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
>
> # Set the entrypoint to the Apache Beam SDK launcher.
> ENTRYPOINT ["/opt/apache/beam/boot"]
>
>
> On Tue, Aug 22, 2023 at 10:42 AM Cheng Han Lee  wrote:
>
>> Hello!
>>
>> I'm an avid apache beam user (on Dataflow) and we use beam to stream
>> blockchain data to various sinks. I recently noticed some memory issues
>> across all our pipelines but have yet to be able to find the root cause and
>> was hoping someone on your team might be able to help. If this isn't the
>> right avenue for it, please let me know how I should reach out.
>>
>> The details are here in stackoverflow:
>>
>>
>> https://stackoverflow.com/questions/76950068/memory-leak-in-apache-beam-python-readfrompubsub-io
>>
>> Thanks,
>> Chenghan
>> CTO | Allium
>>
>


Re: Proposal for pyproject.toml Support in Apache Beam Python

2023-08-28 Thread Austin Bennett
I've thought about this a ton, but haven't been in a position to undertake
the work.  Thanks for bringing this up, @Anand Inguva!

I'd point us to https://python-poetry.org/  ... [ which is where I'd look to
take us, but I'm also not able to do all the work, so my
suggestion/preference doesn't matter that much ]

https://python-poetry.org/docs/pyproject#the-pyprojecttoml-file <- for info
on pyproject.toml file.

Notice the use of a 'lock' file is very valuable, ex:
https://python-poetry.org/docs/basic-usage/#committing-your-poetrylock-file-to-version-control

I haven't come across `build`, that might be great too.  I'd highlight that
Poetry is pretty common across industry these days, rock-solid, ecosystem
of interoperability, users, etc...   If not familiar, PLEASE have a look at
that.




On Mon, Aug 28, 2023 at 8:04 AM Kerry Donny-Clark via dev <
dev@beam.apache.org> wrote:

> +1
> Hi Anand,
> I appreciate this effort. Managing python dependencies has been a major
> pain point for me, and I think this approach would help.
> Kerry
>
> On Mon, Aug 28, 2023 at 10:14 AM Anand Inguva via dev 
> wrote:
>
>> Hello Beam Dev Team,
>>
>> I've compiled a design document
>> [1]
>> proposing the integration of pyproject.toml into Apache Beam's Python build
>> process. Your insights and feedback would be invaluable.
>>
>> What is pyproject.toml?
>> pyproject.toml is a configuration file that specifies a project's build
>> dependencies and other project-related metadata in a standardized
>> format. Before pyproject.toml, Python projects often had multiple
>> configuration files (like setup.py, setup.cfg, and requirements.txt).
>> pyproject.toml aims to centralize these configurations into one place,
>> making project setups more organized and straightforward. One of the
>> significant features enabled by pyproject.toml is the ability to perform
>> isolated builds. This ensures that build dependencies are separated from
>> the project's runtime dependencies, leading to more consistent and
>> reproducible builds.
>>
>> [1]
>> https://docs.google.com/document/d/17-y48WW25-VGBWZNyTdoN0WUN03k9ZhJjLp9wtyG1Wc/edit#heading=h.wskna8eurvjv
>>
>> Thanks,
>> Anand
>>
>


Re: [VOTE] Release 2.50.0, release candidate #2

2023-08-28 Thread Robert Burke
Good morning!

RC2 validation and vote is still open!

On Sun, Aug 27, 2023, 1:28 PM XQ Hu via dev  wrote:

> +1
> Ran the simple Dataflow ML GPU batch job using
> https://github.com/google/dataflow-ml-starter with Python 2.50.0rc2 to
> validate the RC works well.
>
> On Sat, Aug 26, 2023 at 12:16 AM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> +1
>>
>> Verified that the issue detected in RC0 has been resolved. Successfully
>> ran a Python pipeline on ARM Dataflow workers.
>>
>> Noted that Dataflow runner logs became less verbose as the result of
>> https://github.com/apache/beam/pull/27788. One line that I often pay
>> attention to no longer appears at the default  INFO log level:
>>
>> ```
>> INFO:apache_beam.runners.dataflow.dataflow_runner:2023-08-26T03:45:35.126Z:
>> JOB_MESSAGE_DETAILED: All workers have finished the startup processes and
>> began to receive work requests.
>> ```
>>
>> Dataflow service can be adjusted to compensate for this (internal change:
>> http://cl/560265419 ).
>>
>> On Fri, Aug 25, 2023 at 3:05 PM Bruno Volpato via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (non-binding).
>>>
>>> Tested with https://github.com/GoogleCloudPlatform/DataflowTemplates
>>> (Java SDK 11, Dataflow runner).
>>>
>>> Thanks Robert!
>>>
>>> On Thu, Aug 24, 2023 at 7:12 PM Robert Burke 
>>> wrote:
>>>
 Two minor errata from the previous email:

 The validation spreadsheet link should be:

 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464

 And the source code tag is: "v2.50.0-RC2"

 On 2023/08/24 23:09:23 Robert Burke wrote:
 > Hi everyone,
 > Please review and vote on the release candidate #2 for the version
 2.50.0,
 > as follows:
 > [ ] +1, Approve the release
 > [ ] -1, Do not approve the release (please provide specific comments)
 >
 >
 > Reviewers are encouraged to test their own use cases with the release
 > candidate, and vote +1 if
 > no issues are found. Only PMC member votes will count towards the
 final
 > vote, but votes from all
 > community members are encouraged and helpful for finding regressions;
 you
 > can either test your own
 > use cases or use cases from the validation sheet [10].
 >
 > Issues noted in RC1 vote proposal [13] have now been resolved.
 >
 > The staging area is available for your review, which includes:
 > * GitHub Release notes [1],
 > * the official Apache source release to be deployed to
 dist.apache.org [2],
 > which is signed with the key with fingerprint 02677FF4371A3756 (
 > lostl...@apache.org) or D20316F712213422
 > (GitHub Action automated) [3],
 > * all artifacts to be deployed to the Maven Central Repository [4],
 > * source code tag "v2.50.0-RC2" [5],
 > * website pull request listing the release [6], the blog post [6], and
 > publishing the API reference manual [7].
 > * Java artifacts were built with Gradle 7.5.1 and OpenJDK
 (Temurin)(build
 > 1.8.0_382-b05).
 > * Python artifacts are deployed along with the source release to the
 > dist.apache.org [2] and PyPI[8].
 > * Go artifacts and documentation are available at pkg.go.dev [9]
 > * Validation sheet with a tab for 2.50.0 release to help with
 validation
 > [10].
 > * Docker images published to Docker Hub [11].
 > * PR to run tests against release branch [12].
 >
 > The vote will be open for at least 72 hours. It is adopted by majority
 > approval, with at least 3 PMC affirmative votes.
 >
 > For guidelines on how to try the release in your projects, check out
 our
 > blog post at https://beam.apache.org/blog/validate-beam-release/.
 >
 > Thanks,
 > Robert Burke
 > Apache Beam 2.50.0 Release Manager
 >
 > [1] https://github.com/apache/beam/milestone/14
 > [2] https://dist.apache.org/repos/dist/dev/beam/2.50.0/
 > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
 > [4]
 https://repository.apache.org/content/repositories/orgapachebeam-1355/
 > [5] https://github.com/apache/beam/tree/v2.50.0-RC2
 > [6] https://github.com/apache/beam/pull/28055
 > [7] https://github.com/apache/beam-site/pull/648
 > [8] https://pypi.org/project/apache-beam/2.50.0rc2/
 > [9]
 >
 https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0-RC2/go/pkg/beam
 > [10]
 >
 https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1014811464
 > [11] https://hub.docker.com/search?q=apache%2Fbeam=image
 > [12] https://github.com/apache/beam/pull/27962
 > [13] https://lists.apache.org/thread/xgx49zshms7253lfx6d6lsnvwf7tyyfp
 >

>>>


Re: Proposal for pyproject.toml Support in Apache Beam Python

2023-08-28 Thread Kerry Donny-Clark via dev
+1
Hi Anand,
I appreciate this effort. Managing python dependencies has been a major
pain point for me, and I think this approach would help.
Kerry

On Mon, Aug 28, 2023 at 10:14 AM Anand Inguva via dev 
wrote:

> Hello Beam Dev Team,
>
> I've compiled a design document
> [1]
> proposing the integration of pyproject.toml into Apache Beam's Python build
> process. Your insights and feedback would be invaluable.
>
> What is pyproject.toml?
> pyproject.toml is a configuration file that specifies a project's build
> dependencies and other project-related metadata in a standardized
> format. Before pyproject.toml, Python projects often had multiple
> configuration files (like setup.py, setup.cfg, and requirements.txt).
> pyproject.toml aims to centralize these configurations into one place,
> making project setups more organized and straightforward. One of the
> significant features enabled by pyproject.toml is the ability to perform
> isolated builds. This ensures that build dependencies are separated from
> the project's runtime dependencies, leading to more consistent and
> reproducible builds.
>
> [1]
> https://docs.google.com/document/d/17-y48WW25-VGBWZNyTdoN0WUN03k9ZhJjLp9wtyG1Wc/edit#heading=h.wskna8eurvjv
>
> Thanks,
> Anand
>


Re: Proposal for pyproject.toml Support in Apache Beam Python

2023-08-28 Thread Danny McCormick via dev
Thanks Anand! This generally sounds good to me. I left a few questions
before giving a full +1, but it definitely seems like we need some
migration effort here and this seems like a good route.

Thanks,
Danny

On Mon, Aug 28, 2023 at 10:14 AM Anand Inguva via dev 
wrote:

> Hello Beam Dev Team,
>
> I've compiled a design document
> [1]
> proposing the integration of pyproject.toml into Apache Beam's Python build
> process. Your insights and feedback would be invaluable.
>
> What is pyproject.toml?
> pyproject.toml is a configuration file that specifies a project's build
> dependencies and other project-related metadata in a standardized
> format. Before pyproject.toml, Python projects often had multiple
> configuration files (like setup.py, setup.cfg, and requirements.txt).
> pyproject.toml aims to centralize these configurations into one place,
> making project setups more organized and straightforward. One of the
> significant features enabled by pyproject.toml is the ability to perform
> isolated builds. This ensures that build dependencies are separated from
> the project's runtime dependencies, leading to more consistent and
> reproducible builds.
>
> [1]
> https://docs.google.com/document/d/17-y48WW25-VGBWZNyTdoN0WUN03k9ZhJjLp9wtyG1Wc/edit#heading=h.wskna8eurvjv
>
> Thanks,
> Anand
>


Proposal for pyproject.toml Support in Apache Beam Python

2023-08-28 Thread Anand Inguva via dev
Hello Beam Dev Team,

I've compiled a design document
[1]
proposing the integration of pyproject.toml into Apache Beam's Python build
process. Your insights and feedback would be invaluable.

What is pyproject.toml?
pyproject.toml is a configuration file that specifies a project's build
dependencies and other project-related metadata in a standardized
format. Before pyproject.toml, Python projects often had multiple
configuration files (like setup.py, setup.cfg, and requirements.txt).
pyproject.toml aims to centralize these configurations into one place,
making project setups more organized and straightforward. One of the
significant features enabled by pyproject.toml is the ability to perform
isolated builds. This ensures that build dependencies are separated from
the project's runtime dependencies, leading to more consistent and
reproducible builds.
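
For readers unfamiliar with the format, a minimal pyproject.toml might look
roughly like the sketch below. This is only an illustration of the standard
[build-system] and [project] tables, not Beam's actual or proposed
configuration: the package name, version, version pins, and dependency list
are hypothetical, and setuptools is assumed as the build backend purely for
the sake of the example.

```toml
# Hypothetical example; not taken from the Beam repository or the design doc.

[build-system]
# Build-time dependencies, installed into an isolated build environment.
requires = ["setuptools>=61", "wheel"]
# The backend that build frontends (e.g. pip) call to produce sdists/wheels.
build-backend = "setuptools.build_meta"

[project]
# Project metadata that previously tended to live in setup.py or setup.cfg.
name = "example-package"
version = "0.1.0"
# Runtime dependencies, kept separate from the build requirements above.
dependencies = ["numpy"]
```

The split between `requires` (build time) and `dependencies` (run time) is
what makes the isolated builds described above possible.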

[1]
https://docs.google.com/document/d/17-y48WW25-VGBWZNyTdoN0WUN03k9ZhJjLp9wtyG1Wc/edit#heading=h.wskna8eurvjv

Thanks,
Anand


Re: [ANNOUNCE] New committer: Ahmed Abualsaud

2023-08-28 Thread Ahmed Abualsaud via dev
Thanks to the PMC for these responsibilities, and thank you all for guiding
me along this journey. I'm looking forward to helping this community
however I can :)

Best,
Ahmed

On Sun, Aug 27, 2023 at 8:48 PM Reza Rokni via dev 
wrote:

> Congrats Ahmed!
>
> On Fri, Aug 25, 2023 at 2:34 PM John Casey via dev 
> wrote:
>
>> Congrats Ahmed!
>>
>> On Fri, Aug 25, 2023 at 10:43 AM Bjorn Pedersen via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Ahmed! Well deserved!
>>>
>>> On Fri, Aug 25, 2023 at 10:36 AM Yi Hu via dev 
>>> wrote:
>>>
 Congrats Ahmed!

 On Fri, Aug 25, 2023 at 10:11 AM Ritesh Ghorse via dev <
 dev@beam.apache.org> wrote:

> Congrats Ahmed!
>
> On Fri, Aug 25, 2023 at 9:53 AM Kerry Donny-Clark via dev <
> dev@beam.apache.org> wrote:
>
>> Well done Ahmed!
>>
>> On Fri, Aug 25, 2023 at 9:17 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Congrats Ahmed!
>>>
>>> On Fri, Aug 25, 2023 at 3:16 AM Jan Lukavský 
>>> wrote:
>>>
 Congrats Ahmed!
 On 8/25/23 07:56, Anand Inguva via dev wrote:

 Congratulations Ahmed :)

 On Fri, Aug 25, 2023 at 1:17 AM Damon Douglas <
 damondoug...@apache.org> wrote:

> Well deserved! Congratulations, Ahmed! I'm so happy for you.
>
> On Thu, Aug 24, 2023, 5:46 PM Byron Ellis via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations!
>>
>> On Thu, Aug 24, 2023 at 5:34 PM Robert Burke 
>> wrote:
>>
>>> Congratulations Ahmed!!
>>>
>>> On Thu, Aug 24, 2023, 4:08 PM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Congrats Ahmed!!

 On Thu, Aug 24, 2023 at 4:06 PM Bruno Volpato via dev <
 dev@beam.apache.org> wrote:

> Congratulations, Ahmed!
>
> Very well deserved!
>
>
> On Thu, Aug 24, 2023 at 6:09 PM XQ Hu via dev <
> dev@beam.apache.org> wrote:
>
>> Congratulations, Ahmed!
>>
>> On Thu, Aug 24, 2023, 5:49 PM Ahmet Altay via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> Please join me and the rest of the Beam PMC in welcoming a
>>> new committer: Ahmed Abualsaud (ahmedabuals...@apache.org).
>>>
>>> Ahmed has been part of the Beam community since January 2022,
>>> working mostly on IO connectors, and has made a large number of
>>> contributions to make Beam IOs more usable, performant, and
>>> reliable. At the same time, Ahmed has been active on the user list
>>> and at the Beam Summit, helping users by sharing his knowledge.
>>>
>>> Considering their contributions to the project over this
>>> timeframe, the Beam PMC trusts Ahmed with the responsibilities of
>>> a Beam committer. [1]
>>>
>>> Thank you Ahmed! And we are looking forward to seeing more of your
>>> contributions!
>>>
>>> Ahmet, on behalf of the Apache Beam PMC
>>>
>>> [1]
>>>
>>> https://beam.apache.org/contribute/become-a-committer/#an-apache-beam-committer
>>>
>>>


Beam High Priority Issue Report (40)

2023-08-28 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28104 [Failing Test]: gradlew 
:sdks:python:test-suites:tox:pycommon:docs is failing
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it