Re: [PROPOSAL] Preparing for 2.47.0 Release

2023-04-12 Thread Ahmet Altay via dev
Jack, how is the release coming along?

On Tue, Apr 4, 2023 at 12:23 PM Jack McCluskey via dev 
wrote:

> Hey everyone,
>
> I need a PMC member's help adding my pubkey to
> https://dist.apache.org/repos/dist/release/beam/KEYS as well as adding
> PyPI user jrmccluskey to the maintainers of the Apache Beam package. These
> are the last steps I have to do to complete prep for the release.
>
> Thanks,
>
> Jack McCluskey
>
> On Wed, Mar 22, 2023 at 11:38 AM Jack McCluskey 
> wrote:
>
>> Hey all,
>>
>> The next (2.47.0) release branch cut is scheduled for April 5th, 2023,
>> according to the release calendar [1].
>>
>> I will be performing this release. My plan is to cut the branch on that
>> date, and cherrypick release-blocking fixes afterwards, if any.
>>
>> Please help me make sure the release goes smoothly by:
>> - Making sure that any unresolved release-blocking issues for 2.47.0
>> have their "Milestone" set to "2.47.0 Release" as soon as possible.
>> - Reviewing the current release blockers [2] and removing the Milestone
>> if they don't meet the criteria at [3].
>>
>> Let me know if you have any comments/objections/questions.
>>
>> Thanks,
>>
>> Jack McCluskey
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/10
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>> --
>>
>>
>> Jack McCluskey
>> SWE - DataPLS PLAT/ Dataflow ML
>> RDU
>> jrmcclus...@google.com
>>
>>
>>


Re: [DISCUSS] @Experimental, @Internal, @Stable, etc annotations

2023-04-12 Thread Ahmet Altay via dev
I agree with Alexey and Byron.
1. We do not have any concrete evidence of our users paying attention to
any of those annotations. Experimental APIs that stayed in that state for a
long while are good examples. A possible exception is the deprecated
annotation. My preference would be to simplify annotations down to nothing
(stable enough for use, and evolving in a backward-compatible way), plus
perhaps a deprecated annotation.
2. If you all think that an Experimental annotation is needed, Byron's
suggestion (more or less what we do today), combined with some concrete
lifecycle definitions for those annotations, would be useful to our users.
(An example could be: experimental APIs either need to graduate or be
removed within X releases.)
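
For illustration only (a hypothetical sketch, not an existing Beam API): one
way such a lifecycle could be made concrete in the Python SDK is a decorator
that records the release by which an experimental API must either graduate
or be removed, and warns users accordingly. All names below are illustrative.

  import functools
  import warnings


  def experimental(graduate_or_remove_by):
      """Hypothetical marker: API must graduate or be removed by the given release."""
      def decorator(func):
          @functools.wraps(func)
          def wrapper(*args, **kwargs):
              warnings.warn(
                  f"{func.__name__} is experimental and must graduate or be "
                  f"removed by release {graduate_or_remove_by}.",
                  FutureWarning,
                  stacklevel=2)
              return func(*args, **kwargs)
          return wrapper
      return decorator


  @experimental(graduate_or_remove_by="2.50.0")
  def read_from_new_source(path):
      """Illustrative experimental entry point, not a real Beam transform."""
      return path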



On Tue, Apr 4, 2023 at 9:01 AM Alexey Romanenko 
wrote:

> A great and long-awaited topic to discuss.
>
> My personal opinion, based on what I have seen in different open-source
> projects, is that all such annotations, like @Experimental or @Stable, lose
> their usefulness over time and can even become misleading. What actually
> matters is artifact publishing and the public API, regardless of how it was
> annotated. Once a class/method is published and available for users to
> use, it should be considered "stable" (even if it's not yet stable from
> its developers' point of view) and can't be easily removed/changed in the
> next releases.
>
> At Beam, we have a "good" example with @Experimental: it was used to
> annotate many parts of the code early on, but then it was often forgotten
> and never removed, even though that code is already used by many users and
> the API can't simply be changed despite the annotation.
>
> So, I'm in favor of dismissing such annotations and considering all public,
> user-available API as "stable". If a public API needs to be changed or
> removed, then we should follow a procedure of API deprecation and final
> removal, at the earliest, after 3 major (x.y) Beam releases. That should
> give us clear rules for API changes and help avoid breaking changes for
> users.
>
> —
> Alexey
>
>
> On 3 Apr 2023, at 17:04, Byron Ellis via dev  wrote:
>
> Honestly, I think APIs could be pretty simply defined if you think of them
> in terms of the user:
>
> @Deprecated = this was either stable or evolving, but the
> functionality/interface will go away at a future date
>
> @Stable = the user of this API is opting out of changes to functionality
> and interface. For example, default options don't change for a transform
> annotated this way.
>
> Evolving (No Annotation) = the user is opting in to changes to
> functionality but not to interface. We should generally try to write
> backwards-compatible code, but on the other hand the release model does
> not force users into an upgrade.
>
> @Experimental = this functionality / interface might be a bad idea and
> could go away at any time
>
>
> On Mon, Apr 3, 2023 at 7:22 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> *tl;dr - I'd like "evolving" to be further defined, specifically around
>> how we will make decisions about breaking behavior and API changes*
>>
>> I don't particularly care what tags we use as long as they're well
>> documented. With that said, I think the following framing needs to be
>> documented with more definition to flesh out the underlying philosophy:
>>
>> *>  - new code is changeable/evolving by default (so we don't have to
>> always remember to annotate it) but users have confidence they can use it
>> in production (because we have good software engineering practices)*
>>
>> * > - Experimental would be reserved for more risky things*
>> * > - after we are confident an API is stable, because it has been the
>> same across a couple releases, we mark it*
>>
>> Here, we have 3 classes of APIs - "experimental", "stable", and
>> "evolving" (or alternatively "undefined").
>>
>> "Experimental" seems clear - we can make any changes we want. "Stable" is
>> reasonably straightforward as well - we will only make non-breaking changes
>> except in exceptional cases (e.g. security hole, total failure of
>> functionality, etc...)
>>
>> With "evolving" is the idea that we can still make any changes we want,
>> but we think it's less likely we'll need to? Are silent behavior changes
>> acceptable here (my vote would be no)? What about breaking API changes (my
>> vote would be rarely)?
>>
>> I think being able to change our APIs is an ok goal, but outside of a
>> true experimental context we should still be weighing the cost of API
>> changes against the benefit; we have a problem of people not updating to
>> newer SDKs, and introducing more breaking changes will just exacerbate that
>> problem. Maybe my concerns are just a consequence of me not really seeing
>> the same things that you're seeing, specifically: "*I'm seeing a culture
>> of being afraid to change things, even when it would be good for users,
>> because our API surface area is far too large and not explicitly chosen.*"
>> Mostly what I've seen is a healthy concern about 

Re: Python 3.11 support in Apache Beam

2023-04-12 Thread Ahmet Altay via dev
Thank you, this is great!

The Python 3.11 announcement had a claim about performance [1]:

"CPython 3.11 is an average of 25% faster than CPython 3.10 as measured
with the pyperformance benchmark suite, when compiled with GCC on Ubuntu
Linux. Depending on your workload, the overall speedup could be 10-60%."

Have we measured this in Beam? Are we seeing any benefits? If not, why? If
yes, this would be a cool blog post as well.
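
As a rough sketch only (not a rigorous benchmark, and assuming apache_beam is
installed under both interpreters), one could time the same small
DirectRunner pipeline under python3.10 and python3.11 and compare wall-clock
times:

  import time

  import apache_beam as beam

  start = time.perf_counter()
  with beam.Pipeline() as p:  # DirectRunner by default
      _ = (
          p
          | beam.Create(range(1_000_000))
          | beam.Map(lambda x: x * x)
          | beam.CombineGlobally(sum))
  print(f"wall time: {time.perf_counter() - start:.2f}s")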

Ahmet


On Wed, Apr 5, 2023 at 1:12 PM Anand Inguva via dev 
wrote:

> Python 3.11 support has been merged at
> https://github.com/apache/beam/pull/26121 targeting Beam 2.47.0 release.
>
> Please let me know if you have any questions.
>
> Thanks,
> Anand
>
> On Tue, Feb 21, 2023 at 6:04 PM Valentyn Tymofieiev 
> wrote:
>
>> Thanks a lot Anand. I'll take a look at the PRs.
>>
>> On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva 
>> wrote:
>>
>>> I was able to spin up a PR: https://github.com/apache/beam/pull/24599
>>> that updates the build dependencies of Apache Beam.
>>>
>>> Several GCP dependencies needed to be updated as well. I covered them in
>>> the PR: https://github.com/apache/beam/pull/24599
>>>
>>> On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
>>> wrote:
>>>
 Yes, we may need to update all of them.
 I can add more information once I dig into the issue(most likely next
 week). I will comment on my findings on the issue:
 https://github.com/apache/beam/issues/24569 and will periodically
 update this thread.

 On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev 
 wrote:

> On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
> wrote:
>
>> Yes, it is related to protobuf only. But I think the update of these
>> dependencies is required for Python 3.11, since the newer versions ship
>> Python 3.11 wheels.
>>
> Assuming you are referring to protobuf: yes, there are no Python 3.11
> wheels for protobuf==3.x.x, and that can cause friction.
> https://pypi.org/project/protobuf/3.20.3/#files
>
> I would probably narrow the problem further to demonstrate which stubs
> are not being generated, and if the reason is not obvious we can also ask
> for feedback from the protobuf maintainers. Also - do we by chance need to
> update some other deps from
> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
> for this to work?
>
> Also: tracking issue for protobuf4 support in Beam:
> https://github.com/apache/beam/issues/24569.
>
>> If we use older versions of these packages, then we have to depend on
>> installing those packages on Python 3.11 from source distributions, which
>> is not desirable.
>>
>> I am working in parallel on that issue in a different PR
>> https://github.com/apache/beam/pull/24599 but I think this issue
>> should be a blocker for the Python 3.11 update.
>>
>> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>>
>>> Hi Anand,
>>>
>>> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Hi all,

 We are planning to work on adding support for Python 3.11[1] to
 Apache Beam Python SDK.

 As part of this effort, we are going to update the python build
 dependencies defined at [2].

 Right now, there is an error with the newer version of
 protobuf (4.21.11): it is not generating the _urn files.

 It can be reproduced by

>>>
 1. python setup.py sdist
 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
 3. switch to python interpreter and run import apache_beam as beam

>>> I think the error you are describing is related to protobuf 4, so
>>> the repro should focus on the portion where generation of stubs is
>>> happening. Presumably some stubs are not generated on protobuf 4 + 
>>> Python
>>> 3.11?
>>>
>>>

 will lead to *ImportError: cannot import name
 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'*. Running
 `python gen_protos.py` to forcefully generate files didn't help either.

 If you have encountered this error and found a resolution, please
 let me know(that would be super helpful).

 I am going to work on this soon. Please let me know if you want to
 collaborate.

 Thanks,
 Anand Inguva

 *[1] *https://github.com/apache/beam/pull/24721
 [2]
 https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt

>>>
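
As a rough diagnostic sketch for the ImportError above (assuming the sdist
was installed into the current environment and that the package itself still
imports cleanly; the module names come from the error message, everything
else is illustrative): list which generated modules actually exist under
apache_beam.portability.api and whether any *_pb2_urns stubs were produced.

  import pkgutil

  import apache_beam.portability.api as api

  generated = sorted(mod.name for mod in pkgutil.iter_modules(api.__path__))
  print("\n".join(generated))
  print("urns stubs present:",
        any(name.endswith("_pb2_urns") for name in generated))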


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Yi Hu via dev
Sounds good, thanks!

Best,
Yi

On Wed, Apr 12, 2023 at 2:20 PM Anand Inguva  wrote:

> @Yi Hu  I think adding them to Jenkins or GitHub
> Actions is okay with me. With GitHub Actions, since we don't use
> self-hosted runners yet, I worry that action workers might get queued up.
>
> Also, I plan not to run these on every commit but as a cron job (maybe
> once per day) and via trigger phrases, and only on the lowest and highest
> Python versions. Also, migrating this workflow would be trivial in the
> future once Beam starts the migration. For now, I think it might be best
> to run it on Jenkins.
>
> On Wed, Apr 12, 2023 at 1:32 PM Valentyn Tymofieiev 
> wrote:
>
>> I think a case-in-point dependency that would benefit from this testing is
>> grpcio, which publishes pre-releases, has broken us before, and has had
>> multiple of its released versions yanked:
>> https://pypi.org/project/grpcio/#history
>>
>> We can look at how grpcio affected Beam previously. A couple of issues:
>>
>> - https://github.com/grpc/grpc/issues/30446 -- affected XLang tests
>> - https://github.com/apache/beam/issues/23734 -- affected MacOS suites
>> - https://github.com/apache/beam/issues/22159 -- (not detected by us,
>> but potentially could have affected a performance test).
>>
>> I'm afraid a dedicated suite may not give us the desired test coverage to
>> catch regressions at the RC stage.
>>
>> On Wed, Apr 12, 2023 at 10:19 AM Yi Hu via dev 
>> wrote:
>>
>>> Thanks Anand,
>>>
>>> This would be very helpful to avoid experiencing issues like
>>> https://s.apache.org/beam-python-dependencies-pm multiple times. One thing
>>> to note is that Beam Jenkins CI has been experiencing many issues recently,
>>> mostly because multiple Jenkins plugins do not scale (draining the GitHub
>>> API call limit, disk usage, etc.), so more PreCommit suites may add more
>>> pressure on Jenkins if we go ahead with Option 1. As we have started the
>>> GitHub Actions migration, have we considered adding these new tests to
>>> GitHub Actions?
>>>
>>> Best,
>>> Yi
>>>
>>> On Wed, Apr 12, 2023 at 10:46 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Thanks for doing this Anand, I'm +1 on option 1 as well - I think
 having the clear signal of the normal suite succeeding and the prerelease
 one failing would be helpful and there shouldn't be too much additional
 code necessary. That makes it really easy to treat the prerelease suite as
 a (at least temporary) signal on needing upper bounds on our dependencies.

 Thanks,
 Danny

 On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev <
 dev@beam.apache.org> wrote:

> Hi all,
>
> For Apache Beam Python we are considering using pre-released
> dependencies for unit testing by using the --pre flag to install
> pre-released dependencies of packages.
>
> We believe that using pre-released dependencies may help us to
> identify and resolve bugs more quickly, and to take advantage of new
> features or bug fixes that are not yet available in stable releases.
> However, we also understand that using pre-released dependencies may
> introduce new risks and challenges, including potential code duplication
> and stability issues.
>
> Before proceeding, we wanted to get your feedback on this approach.
>
> 1. Create a new PreCommit test suite and a PostCommit test suite that
> runs tests by installing pre-released dependencies.
>
> Pros:
>
>- stable and pre-released test suites are separate and it will be
>easier to debug if the pre-released test suite fails.
>
> Cons:
>
>- More test infra code to maintain. More tests to monitor.
>
>
> 2. Make use of the current PreCommit and PostCommit test suite and
> modify it so that it installs pre-released dependencies.
>
> Pros:
>
>- Less infra code and less tests to monitor.
>
> Cons:
>
>- Leads to noisy test signals if the pre-release candidate is
>unstable.
>
> I am in favor of approach 1 since this approach would ensure that any
> issues encountered during pre-release testing do not impact the stable
> release environment, and vice versa.
>
> If you have experience or done any testing work using pre-released
> dependencies, please let me know if you took any different approaches. It
> will be really helpful.
>
> Thanks,
> Anand
>
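
For context, a minimal sketch of what "installing pre-released dependencies"
could look like in such a suite (run from a Beam checkout; the paths and the
test target below are illustrative assumptions, not the actual suite
definition). The pip --pre flag is what allows pre-release versions to be
selected.

  # Install Beam's Python build deps, letting pip pick pre-releases,
  # then run one unit-test module against them.
  import subprocess
  import sys

  subprocess.check_call(
      [sys.executable, "-m", "pip", "install", "--pre", "--upgrade",
       "-r", "sdks/python/build-requirements.txt"])
  subprocess.check_call(
      [sys.executable, "-m", "pytest", "-q",
       "sdks/python/apache_beam/coders/coders_test.py"])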



Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Anand Inguva via dev
@Yi Hu  I think adding them to Jenkins or GitHub Actions
is okay with me. With GitHub Actions, since we don't use self-hosted
runners yet, I worry that action workers might get queued up.

Also, I plan not to run these on every commit but as a cron job (maybe once
per day) and via trigger phrases, and only on the lowest and highest Python
versions. Also, migrating this workflow would be trivial in the future once
Beam starts the migration. For now, I think it might be best to run it on
Jenkins.

On Wed, Apr 12, 2023 at 1:32 PM Valentyn Tymofieiev 
wrote:

> I think a case-in-point dependency that would benefit from this testing is
> grpcio, which publishes pre-releases, has broken us before, and has had
> multiple of its released versions yanked:
> https://pypi.org/project/grpcio/#history
>
> We can look at how grpcio affected Beam previously. A couple of issues:
>
> - https://github.com/grpc/grpc/issues/30446 -- affected XLang tests
> - https://github.com/apache/beam/issues/23734 -- affected MacOS suites
> - https://github.com/apache/beam/issues/22159 -- (not detected by us, but
> potentially could have affected a performance test).
>
> I'm afraid a dedicated suite may not give us the desired test coverage to
> catch regressions at the RC stage.
>
> On Wed, Apr 12, 2023 at 10:19 AM Yi Hu via dev 
> wrote:
>
>> Thanks Anand,
>>
>> This would be very helpful to avoid experiencing issues like
>> https://s.apache.org/beam-python-dependencies-pm multiple times. One thing
>> to note is that Beam Jenkins CI has been experiencing many issues recently,
>> mostly because multiple Jenkins plugins do not scale (draining the GitHub
>> API call limit, disk usage, etc.), so more PreCommit suites may add more
>> pressure on Jenkins if we go ahead with Option 1. As we have started the
>> GitHub Actions migration, have we considered adding these new tests to
>> GitHub Actions?
>>
>> Best,
>> Yi
>>
>> On Wed, Apr 12, 2023 at 10:46 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
>>> the clear signal of the normal suite succeeding and the prerelease one
>>> failing would be helpful and there shouldn't be too much additional code
>>> necessary. That makes it really easy to treat the prerelease suite as a (at
>>> least temporary) signal on needing upper bounds on our dependencies.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev <
>>> dev@beam.apache.org> wrote:
>>>
 Hi all,

 For Apache Beam Python we are considering using pre-released
 dependencies for unit testing by using the --pre flag to install
 pre-released dependencies of packages.

 We believe that using pre-released dependencies may help us to identify
 and resolve bugs more quickly, and to take advantage of new features or bug
 fixes that are not yet available in stable releases. However, we also
 understand that using pre-released dependencies may introduce new risks and
 challenges, including potential code duplication and stability issues.

 Before proceeding, we wanted to get your feedback on this approach.

 1. Create a new PreCommit test suite and a PostCommit test suite that
 runs tests by installing pre-released dependencies.

 Pros:

- stable and pre-released test suites are separate and it will be
easier to debug if the pre-released test suite fails.

 Cons:

- More test infra code to maintain. More tests to monitor.


 2. Make use of the current PreCommit and PostCommit test suite and
 modify it so that it installs pre-released dependencies.

 Pros:

- Less infra code and less tests to monitor.

 Cons:

- Leads to noisy test signals if the pre-release candidate is
unstable.

 I am in favor of approach 1 since this approach would ensure that any
 issues encountered during pre-release testing do not impact the stable
 release environment, and vice versa.

 If you have experience or done any testing work using pre-released
 dependencies, please let me know if you took any different approaches. It
 will be really helpful.

 Thanks,
 Anand

>>>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
I think a case-in-point dependency that would benefit from this testing is
grpcio, which publishes pre-releases, has broken us before, and has had
multiple of its released versions yanked:
https://pypi.org/project/grpcio/#history

We can look at how grpcio affected Beam previously. A couple of issues:

- https://github.com/grpc/grpc/issues/30446 -- affected XLang tests
- https://github.com/apache/beam/issues/23734 -- affected MacOS suites
- https://github.com/apache/beam/issues/22159 -- (not detected by us, but
potentially could have affected a performance test).

I'm afraid a dedicated suite may not give us the desired test coverage to
catch regressions at the RC stage.

On Wed, Apr 12, 2023 at 10:19 AM Yi Hu via dev  wrote:

> Thanks Anand,
>
> This would be very helpful to avoid experiencing issues like
> https://s.apache.org/beam-python-dependencies-pm multiple times. One thing
> to note is that Beam Jenkins CI has been experiencing many issues recently,
> mostly because multiple Jenkins plugins do not scale (draining the GitHub
> API call limit, disk usage, etc.), so more PreCommit suites may add more
> pressure on Jenkins if we go ahead with Option 1. As we have started the
> GitHub Actions migration, have we considered adding these new tests to
> GitHub Actions?
>
> Best,
> Yi
>
> On Wed, Apr 12, 2023 at 10:46 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
>> the clear signal of the normal suite succeeding and the prerelease one
>> failing would be helpful and there shouldn't be too much additional code
>> necessary. That makes it really easy to treat the prerelease suite as a (at
>> least temporary) signal on needing upper bounds on our dependencies.
>>
>> Thanks,
>> Danny
>>
>> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> For Apache Beam Python we are considering using pre-released
>>> dependencies for unit testing by using the --pre flag to install
>>> pre-released dependencies of packages.
>>>
>>> We believe that using pre-released dependencies may help us to identify
>>> and resolve bugs more quickly, and to take advantage of new features or bug
>>> fixes that are not yet available in stable releases. However, we also
>>> understand that using pre-released dependencies may introduce new risks and
>>> challenges, including potential code duplication and stability issues.
>>>
>>> Before proceeding, we wanted to get your feedback on this approach.
>>>
>>> 1. Create a new PreCommit test suite and a PostCommit test suite that
>>> runs tests by installing pre-released dependencies.
>>>
>>> Pros:
>>>
>>>- stable and pre-released test suites are separate and it will be
>>>easier to debug if the pre-released test suite fails.
>>>
>>> Cons:
>>>
>>>- More test infra code to maintain. More tests to monitor.
>>>
>>>
>>> 2. Make use of the current PreCommit and PostCommit test suite and
>>> modify it so that it installs pre-released dependencies.
>>>
>>> Pros:
>>>
>>>- Less infra code and less tests to monitor.
>>>
>>> Cons:
>>>
>>>- Leads to noisy test signals if the pre-release candidate is
>>>unstable.
>>>
>>> I am in favor of approach 1 since this approach would ensure that any
>>> issues encountered during pre-release testing do not impact the stable
>>> release environment, and vice versa.
>>>
>>> If you have experience or done any testing work using pre-released
>>> dependencies, please let me know if you took any different approaches. It
>>> will be really helpful.
>>>
>>> Thanks,
>>> Anand
>>>
>>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Yi Hu via dev
Thanks Anand,

This would be very helpful to avoid experiencing issues like
https://s.apache.org/beam-python-dependencies-pm multiple times. One thing to
note is that Beam Jenkins CI has been experiencing many issues recently,
mostly because multiple Jenkins plugins do not scale (draining the GitHub API
call limit, disk usage, etc.), so more PreCommit suites may add more pressure
on Jenkins if we go ahead with Option 1. As we have started the GitHub Actions
migration, have we considered adding these new tests to GitHub Actions?

Best,
Yi

On Wed, Apr 12, 2023 at 10:46 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
> the clear signal of the normal suite succeeding and the prerelease one
> failing would be helpful and there shouldn't be too much additional code
> necessary. That makes it really easy to treat the prerelease suite as a (at
> least temporary) signal on needing upper bounds on our dependencies.
>
> Thanks,
> Danny
>
> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev 
> wrote:
>
>> Hi all,
>>
>> For Apache Beam Python we are considering using pre-released dependencies
>> for unit testing by using the --pre flag to install pre-released
>> dependencies of packages.
>>
>> We believe that using pre-released dependencies may help us to identify
>> and resolve bugs more quickly, and to take advantage of new features or bug
>> fixes that are not yet available in stable releases. However, we also
>> understand that using pre-released dependencies may introduce new risks and
>> challenges, including potential code duplication and stability issues.
>>
>> Before proceeding, we wanted to get your feedback on this approach.
>>
>> 1. Create a new PreCommit test suite and a PostCommit test suite that
>> runs tests by installing pre-released dependencies.
>>
>> Pros:
>>
>>- stable and pre-released test suites are separate and it will be
>>easier to debug if the pre-released test suite fails.
>>
>> Cons:
>>
>>- More test infra code to maintain. More tests to monitor.
>>
>>
>> 2. Make use of the current PreCommit and PostCommit test suite and modify
>> it so that it installs pre-released dependencies.
>>
>> Pros:
>>
>>- Less infra code and less tests to monitor.
>>
>> Cons:
>>
>>- Leads to noisy test signals if the pre-release candidate is
>>unstable.
>>
>> I am in favor of approach 1 since this approach would ensure that any
>> issues encountered during pre-release testing do not impact the stable
>> release environment, and vice versa.
>>
>> If you have experience or done any testing work using pre-released
>> dependencies, please let me know if you took any different approaches. It
>> will be really helpful.
>>
>> Thanks,
>> Anand
>>
>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Valentyn Tymofieiev via dev
2. Make use of the current PreCommit and PostCommit test suite and modify
it so that it installs pre-released dependencies.

> Leads to noisy test signals if the pre-release candidate is unstable.

I am in favor of option 2 since it's a simple solution that is easy to
implement and try out. The disadvantage rests on the assumption
that pre-released candidates would be unstable, which may not be the case.
We could try this and pivot if we find it creates too much noise. @Jarek
Potiuk  - curious, from your experience with Airflow
dependency management and testing, which option do you use (if you have a
similar scenario)?

On Wed, Apr 12, 2023 at 7:45 AM Danny McCormick via dev 
wrote:

> Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
> the clear signal of the normal suite succeeding and the prerelease one
> failing would be helpful and there shouldn't be too much additional code
> necessary. That makes it really easy to treat the prerelease suite as a (at
> least temporary) signal on needing upper bounds on our dependencies.
>
> Thanks,
> Danny
>
> On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev 
> wrote:
>
>> Hi all,
>>
>> For Apache Beam Python we are considering using pre-released dependencies
>> for unit testing by using the --pre flag to install pre-released
>> dependencies of packages.
>>
>> We believe that using pre-released dependencies may help us to identify
>> and resolve bugs more quickly, and to take advantage of new features or bug
>> fixes that are not yet available in stable releases. However, we also
>> understand that using pre-released dependencies may introduce new risks and
>> challenges, including potential code duplication and stability issues.
>>
>> Before proceeding, we wanted to get your feedback on this approach.
>>
>> 1. Create a new PreCommit test suite and a PostCommit test suite that
>> runs tests by installing pre-released dependencies.
>>
>> Pros:
>>
>>- stable and pre-released test suites are separate and it will be
>>easier to debug if the pre-released test suite fails.
>>
>> Cons:
>>
>>- More test infra code to maintain. More tests to monitor.
>>
>>
>> 2. Make use of the current PreCommit and PostCommit test suite and modify
>> it so that it installs pre-released dependencies.
>>
>> Pros:
>>
>>- Less infra code and less tests to monitor.
>>
>> Cons:
>>
>>- Leads to noisy test signals if the pre-release candidate is
>>unstable.
>>
>> I am in favor of approach 1 since this approach would ensure that any
>> issues encountered during pre-release testing do not impact the stable
>> release environment, and vice versa.
>>
>> If you have experience or done any testing work using pre-released
>> dependencies, please let me know if you took any different approaches. It
>> will be really helpful.
>>
>> Thanks,
>> Anand
>>
>


Re: [Python SDK] Use pre-released dependencies for Beam python unit testing

2023-04-12 Thread Danny McCormick via dev
Thanks for doing this Anand, I'm +1 on option 1 as well - I think having
the clear signal of the normal suite succeeding and the prerelease one
failing would be helpful and there shouldn't be too much additional code
necessary. That makes it really easy to treat the prerelease suite as a (at
least temporary) signal on needing upper bounds on our dependencies.

Thanks,
Danny

On Wed, Apr 12, 2023 at 12:36 AM Anand Inguva via dev 
wrote:

> Hi all,
>
> For Apache Beam Python we are considering using pre-released dependencies
> for unit testing by using the --pre flag to install pre-released
> dependencies of packages.
>
> We believe that using pre-released dependencies may help us to identify
> and resolve bugs more quickly, and to take advantage of new features or bug
> fixes that are not yet available in stable releases. However, we also
> understand that using pre-released dependencies may introduce new risks and
> challenges, including potential code duplication and stability issues.
>
> Before proceeding, we wanted to get your feedback on this approach.
>
> 1. Create a new PreCommit test suite and a PostCommit test suite that runs
> tests by installing pre-released dependencies.
>
> Pros:
>
>- stable and pre-released test suites are separate and it will be
>easier to debug if the pre-released test suite fails.
>
> Cons:
>
>- More test infra code to maintain. More tests to monitor.
>
>
> 2. Make use of the current PreCommit and PostCommit test suite and modify
> it so that it installs pre-released dependencies.
>
> Pros:
>
>- Less infra code and less tests to monitor.
>
> Cons:
>
>- Leads to noisy test signals if the pre-release candidate is
>unstable.
>
> I am in favor of approach 1 since this approach would ensure that any
> issues encountered during pre-release testing do not impact the stable
> release environment, and vice versa.
>
> If you have experience or done any testing work using pre-released
> dependencies, please let me know if you took any different approaches. It
> will be really helpful.
>
> Thanks,
> Anand
>


Beam High Priority Issue Report (26)

2023-04-12 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/26126 [Failing Test]: 
beam_PostCommit_XVR_Samza permared validatesCrossLanguageRunnerGoUsingJava 
TestDebeziumIO_BasicRead
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId