Re: [VOTE] Release 2.46.0, release candidate #1

2023-04-28 Thread Reuven Lax via dev
Those particular errors are often expected in the sink due to the protocol
used. If a work item retries before committing (which could happen for many
reasons including worker crashes), it will experience those errors.
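
As a rough illustration of why a retried work item can see this error, here
is a minimal sketch of an offset-checked append. It is a toy model, not the
Beam sink or the BigQuery client: the names OffsetStream, AlreadyExistsError
and write_with_retry are hypothetical, and it assumes the protocol rejects an
append whose offset was already written.

```python
class AlreadyExistsError(Exception):
    """Raised when an append targets an offset that was already written."""


class OffsetStream:
    """Toy append-only stream that validates the caller's expected offset."""

    def __init__(self):
        self.rows = []

    def append(self, offset, batch):
        end = len(self.rows)
        if offset < end:
            # A retry of an append that already succeeded lands here.
            raise AlreadyExistsError(
                f"offset {offset} is within stream, expected offset {end}")
        if offset > end:
            raise ValueError(
                f"offset {offset} is past the end, expected offset {end}")
        self.rows.extend(batch)


def write_with_retry(stream, offset, batch):
    """Append a batch, treating an already-written offset as success."""
    try:
        stream.append(offset, batch)
        return "appended"
    except AlreadyExistsError:
        return "already written"
```

In this model the second attempt at the same offset is rejected rather than
duplicated, so the error is a sign that the retry was deduplicated, not that
data was lost.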

On Fri, Apr 28, 2023 at 12:55 PM Ahmed Abualsaud 
wrote:

> @Danny McCormick @Reuven Lax sorry, it's been a while since you looked
> into this, but do you remember whether the issue fixed in #25642 is
> related to the recent "ALREADY_EXISTS: The offset is within stream,
> expected offset..." errors?
>
> On Fri, Mar 10, 2023 at 7:47 PM Ahmet Altay via dev 
> wrote:
>
>> Thank you!
>>
>> Is there a tracking issue for this known issue? And would the known
>> issues section of the release notes link to that?
>>
>>
>> On Fri, Mar 10, 2023 at 11:38 AM Danny McCormick via dev <
>> dev@beam.apache.org> wrote:
>>
>>> We determined that the same issue exists in the 2.45 release, so we are
>>> going to continue finalizing the release candidate. Thank you for your
>>> patience.
>>>
>>> Thanks,
>>> Danny
>>>
>>> On Wed, Mar 8, 2023 at 3:15 PM Reuven Lax  wrote:
>>>
 We are trying to reproduce and debug the issue we saw to validate
 whether it was a real regression or not. Will update when we know more.

 On Wed, Mar 8, 2023 at 11:31 AM Danny McCormick <
 dannymccorm...@google.com> wrote:

>
> @Reuven Lax  found a new potential regression in
> BigQuery I/O, so I have paused the release rollout. I had already pushed
> the Python artifacts and Go tags, but not the Java ones. We have since
> temporarily yanked the Python release
> and deleted the Go tags; they were live for around an hour. The possible
> regression is in Java, so neither of those releases should be affected,
> but x-lang may not work properly because it depends on versioning. I will
> update this thread with next steps when we know more.
>
> Thanks,
> Danny
> On Wed, Mar 8, 2023 at 5:59 AM Jan Lukavský  wrote:
>
>> +1 (binding)
>>
>> Tested Java SDK with Flink and Spark 3 runner.
>>
>> Thanks,
>>  Jan
>>
>> On 3/8/23 01:53, Valentyn Tymofieiev via dev wrote:
>>
>> +1. Verified the composition of Python containers and ran Python
>> pipelines on Dataflow runner v1 and runner v2.
>>
>> On Tue, Mar 7, 2023 at 4:11 PM Ritesh Ghorse via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1 (non-binding)
>>> Validated Go SDK quickstart on direct and dataflow runner
>>>
>>> On Tue, Mar 7, 2023 at 10:54 AM Alexey Romanenko <
>>> aromanenko@gmail.com> wrote:
>>>
 +1 (binding)

 Tested with  https://github.com/Talend/beam-samples/
 (Java SDK v8/v11/v17, Spark 3.x runner).

 ---
 Alexey

 On 7 Mar 2023, at 07:38, Ahmet Altay via dev 
 wrote:

 +1 (binding) - I validated python quickstarts on direct & dataflow
 runners.

 Thank you for doing the release!

 On Sat, Mar 4, 2023 at 8:01 AM Chamikara Jayalath via dev <
 dev@beam.apache.org> wrote:

> +1 (binding)
>
> Validated multi-language Java and Python pipelines.
>
> On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > I have encountered a failure in a Python pipeline running with
>> Runner v1:
>>
>> > RuntimeError: Beam SDK base version 2.46.0 does not match
>> Dataflow Python worker version 2.45.0. Please check Dataflow worker 
>> startup
>> logs and make sure that correct version of Beam SDK is installed.
>>
>> > We should understand why Python ValidatesRunner tests (which
>> have passed)  didn't catch this error.
>>
>> > This can be remediated in Dataflow containers without  changes
>> to the release candidate.
>>
>> Good catch! I've kicked off a release to fix this, it should be
>> done later this evening - I won't be available when it completes, 
>> but I
>> would expect it to be around 5:00 PST.
>>
>> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick <
>> dannymccorm...@google.com> wrote:
>>
>>> Hey Reuven, could you provide some more context on the bug/why
>>> it is important? Does it meet the standard in
>>> https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github?
>>>
>>>
>>> The release branch was cut last Wednesday, so that is why it is
>>> not included.
>>>
>>
> Seems like this was a revert of a previous commit that was also
> not included in the 2.46.0 release branch (
> 

Re: [VOTE] Release 2.46.0, release candidate #1

2023-04-28 Thread Ahmed Abualsaud via dev
@Danny McCormick @Reuven Lax sorry, it's been a while since you looked
into this, but do you remember whether the issue fixed in #25642 is related
to the recent "ALREADY_EXISTS: The offset is within stream, expected
offset..." errors?

On Fri, Mar 10, 2023 at 7:47 PM Ahmet Altay via dev 
wrote:

> Thank you!
>
> Is there a tracking issue for this known issue? And would the known issues
> section of the release notes link to that?
>
>
> On Fri, Mar 10, 2023 at 11:38 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> We determined that the same issue exists in the 2.45 release, so we are
>> going to continue finalizing the release candidate. Thank you for your
>> patience.
>>
>> Thanks,
>> Danny
>>
>> On Wed, Mar 8, 2023 at 3:15 PM Reuven Lax  wrote:
>>
>>> We are trying to reproduce and debug the issue we saw to validate
>>> whether it was a real regression or not. Will update when we know more.
>>>
>>> On Wed, Mar 8, 2023 at 11:31 AM Danny McCormick <
>>> dannymccorm...@google.com> wrote:
>>>

 @Reuven Lax  found a new potential regression in
 BigQuery I/O, so I have paused the release rollout. I had already pushed
 the Python artifacts and Go tags, but not the Java ones. We have since
 temporarily yanked the Python release
 and deleted the Go tags; they were live for around an hour. The possible
 regression is in Java, so neither of those releases should be affected, but
 x-lang may not work properly because it depends on versioning. I will
 update this thread with next steps when we know more.

 Thanks,
 Danny
 On Wed, Mar 8, 2023 at 5:59 AM Jan Lukavský  wrote:

> +1 (binding)
>
> Tested Java SDK with Flink and Spark 3 runner.
>
> Thanks,
>  Jan
>
> On 3/8/23 01:53, Valentyn Tymofieiev via dev wrote:
>
> +1. Verified the composition of Python containers and ran Python
> pipelines on Dataflow runner v1 and runner v2.
>
> On Tue, Mar 7, 2023 at 4:11 PM Ritesh Ghorse via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (non-binding)
>> Validated Go SDK quickstart on direct and dataflow runner
>>
>> On Tue, Mar 7, 2023 at 10:54 AM Alexey Romanenko <
>> aromanenko@gmail.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Tested with  https://github.com/Talend/beam-samples/
>>> (Java SDK v8/v11/v17, Spark 3.x runner).
>>>
>>> ---
>>> Alexey
>>>
>>> On 7 Mar 2023, at 07:38, Ahmet Altay via dev 
>>> wrote:
>>>
>>> +1 (binding) - I validated python quickstarts on direct & dataflow
>>> runners.
>>>
>>> Thank you for doing the release!
>>>
>>> On Sat, Mar 4, 2023 at 8:01 AM Chamikara Jayalath via dev <
>>> dev@beam.apache.org> wrote:
>>>
 +1 (binding)

 Validated multi-language Java and Python pipelines.

 On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> > I have encountered a failure in a Python pipeline running with
> Runner v1:
>
> > RuntimeError: Beam SDK base version 2.46.0 does not match
> Dataflow Python worker version 2.45.0. Please check Dataflow worker 
> startup
> logs and make sure that correct version of Beam SDK is installed.
>
> > We should understand why Python ValidatesRunner tests (which
> have passed)  didn't catch this error.
>
> > This can be remediated in Dataflow containers without  changes
> to the release candidate.
>
> Good catch! I've kicked off a release to fix this, it should be
> done later this evening - I won't be available when it completes, but 
> I
> would expect it to be around 5:00 PST.
>
> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> Hey Reuven, could you provide some more context on the bug/why it
>> is important? Does it meet the standard in
>> https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github?
>>
>>
>> The release branch was cut last Wednesday, so that is why it is
>> not included.
>>
>
 Seems like this was a revert of a previous commit that was also not
 included in the 2.46.0 release branch (
 https://github.com/apache/beam/pull/25627) ?

 If so we might not need a new RC but good to confirm.

 Thanks,
 Cham


>> On Fri, Mar 3, 2023 at 3:24 PM Reuven Lax 
>> wrote:
>>
>>> If possible, I would like to see if we could include
>>> https://github.com/apache/beam/pull/25642 as we believe this
>>> bug has been impacting multiple users. This was 

Re: Starter projects for Beam

2023-04-28 Thread Tariq Hasan
Hello Svetak,

Thanks for the suggestion.

I will look into them.

Sincerely,

Tariq Hasan

On Fri, Apr 28, 2023 at 2:21 PM Svetak Sundhar via dev 
wrote:

> Hi Tariq,
>
> Thanks for your interest! A good starting point is the list of good first
> issues:
> https://github.com/apache/beam/labels/good%20first%20issue?page=2&q=is%3Aopen+label%3A%22good+first+issue%22
>
> Feel free to assign an issue to yourself and put up a PR/ask any needed
> questions when ready.
>
> Thanks,
>
>
> Svetak Sundhar
>
>   Technical Solutions Engineer, Data
> s vetaksund...@google.com
>
>
>
> On Fri, Apr 28, 2023 at 2:17 PM Tariq Hasan 
> wrote:
>
>> Hello,
>>
>> I am reaching out as a new entrant into the Apache Beam project.
>>
>> As a developer with a few years of experience, I was looking to grow my
>> passion around software development through open-source contributions.
>>
>> With Apache Beam, I am quite interested in working across multiple areas,
>> including but not limited to Java and Python SDKs and the various runners
>> and transforms on the roadmap.
>>
>> I was reaching out here for some guidance with regards to starter
>> projects that could be a viable starting point.
>>
>> If anyone can offer suggestions on possible scope to contribute to the
>> project and resources to get going, that would be very helpful.
>>
>> Sincerely,
>>
>> Tariq Hasan
>>
>>


Re: Starter projects for Beam

2023-04-28 Thread Svetak Sundhar via dev
Hi Tariq,

Thanks for your interest! A good starting point is the list of good first
issues:
https://github.com/apache/beam/labels/good%20first%20issue?page=2&q=is%3Aopen+label%3A%22good+first+issue%22

Feel free to assign an issue to yourself and put up a PR/ask any needed
questions when ready.

Thanks,


Svetak Sundhar

  Technical Solutions Engineer, Data
s vetaksund...@google.com



On Fri, Apr 28, 2023 at 2:17 PM Tariq Hasan  wrote:

> Hello,
>
> I am reaching out as a new entrant into the Apache Beam project.
>
> As a developer with a few years of experience, I was looking to grow my
> passion around software development through open-source contributions.
>
> With Apache Beam, I am quite interested in working across multiple areas,
> including but not limited to Java and Python SDKs and the various runners
> and transforms on the roadmap.
>
> I was reaching out here for some guidance with regards to starter projects
> that could be a viable starting point.
>
> If anyone can offer suggestions on possible scope to contribute to the
> project and resources to get going, that would be very helpful.
>
> Sincerely,
>
> Tariq Hasan
>
>


Starter projects for Beam

2023-04-28 Thread Tariq Hasan
Hello,

I am reaching out as a new entrant into the Apache Beam project.

As a developer with a few years of experience, I was looking to grow my
passion around software development through open-source contributions.

With Apache Beam, I am quite interested in working across multiple areas,
including but not limited to Java and Python SDKs and the various runners
and transforms on the roadmap.

I was reaching out here for some guidance with regards to starter projects
that could be a viable starting point.

If anyone can offer suggestions on possible scope to contribute to the
project and resources to get going, that would be very helpful.

Sincerely,

Tariq Hasan


Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-28 Thread Jack McCluskey via dev
The move to support and require protobuf v4 in the Python SDK has
inadvertently broken downstream users who depend on Beam, and they have asked
whether that requirement can be lowered to also allow protobuf v3. Further
investigation has determined that the lower bound of this requirement can
be relaxed without any problems. As a result, we will be building an RC2 to
resolve this issue and unblock users. This vote is closed, and I'll send a
new vote out once RC2 is available.
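
To make the effect of relaxing a lower bound concrete, here is a small
sketch of a version-range check. The bounds below are hypothetical
placeholders chosen only for illustration; they are not the actual protobuf
pins used by Beam, and parse_version and is_allowed are made-up helper names.

```python
def parse_version(text):
    """Turn a dotted version string such as "3.20.3" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_allowed(installed, lower, upper):
    """Return True when lower <= installed < upper."""
    return parse_version(lower) <= parse_version(installed) < parse_version(upper)


# Hypothetical bounds, for illustration only.
STRICT = ("4.0.0", "5.0.0")    # v4 required
RELAXED = ("3.20.0", "5.0.0")  # lower bound relaxed to also admit v3
```

Under these assumed bounds, a downstream project pinned to a v3 release is
rejected by the strict range but accepted by the relaxed one, while v4
installs pass in both cases.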

On Fri, Apr 28, 2023 at 10:52 AM Alexey Romanenko 
wrote:

> +1 (binding)
>
> Tested with  https://github.com/Talend/beam-samples/
> (Java SDK v8/v11/v17, Spark 3.x runner).
>
> ---
> Alexey
>
> On 28 Apr 2023, at 16:06, Jack McCluskey via dev 
> wrote:
>
> There was a GCP outage that impacted pushing containers to GCR. I expected
> it to impact Java containers specifically, but it looks like it also
> affected Python containers. I believe the situation is resolved and I can
> get the containers pushed now; if that continues to be an issue, I'll
> follow up.
>
> On Thu, Apr 27, 2023 at 7:21 PM Chamikara Jayalath 
> wrote:
>
>> I tried to run a Java multi-lang pipeline and it's failing due to the
>> following error during worker setup.
>>
>> Error syncing pod, skipping" err="failed to \"StartContainer\" for
>> \"sdk-1-0\" with ImagePullBackOff: \"Back-off pulling image \\\"
>> gcr.io/cloud-dataflow/v1beta3/beam_python3.8_sdk:2.47.0\\\
>> "\""
>> pod="default/df-runinferenceexample-chami-04271607-gwf8-harness-vj8w"
>> podUID=37d8de0a068391920b98dce559c4886f
>>
>> Are these containers not available yet to test Dataflow ?
>>
>> Thanks,
>> Cham
>>
>> On Thu, Apr 27, 2023 at 2:17 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> The artifacts and signatures all look good, and I validated a couple of
>>> Python pipelines in a fresh install.
>>>
>>> Assuming all the tests (including the Dataflow ones) pass (modulo the
>>> two mentioned above; seems a fair justification to not block on those)
>>> I'm +1 (binding) on this release.
>>>
>>> On Wed, Apr 26, 2023 at 12:39 PM Jack McCluskey via dev <
>>> dev@beam.apache.org> wrote:
>>>
 There's also a good chance that newer test suites haven't been included
 in mass_comment.py (
 https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py)
 and as a result they were not executed.

 On Wed, Apr 26, 2023 at 3:29 PM Jack McCluskey 
 wrote:

> The Dataflow CrossLanguageValidatesRunner GoUsingJava Tests have been
> broken for quite some time (
> https://github.com/apache/beam/issues/21645) and the Kafka issue is
> tied to a test timeout that John Casey has fixed but didn't get
> cherrypicked (just fell through the cracks while waiting on tests to pass,
> but conversations with them led to the conclusion that we would just get 
> it
> into an RC2 if necessary since it's a matter of how the tests run not how
> the code under test functions.)
>
> The tests still marked "pending" passed but did not get updated on the
> GitHub side from when Jenkins was straining under load, I'm guessing those
> builds have since been deleted under our new retention policy to
> alleviate the OOM Jenkins issues. I will try to re-run those for the sake
> of having clear and obvious results.
>
> On Wed, Apr 26, 2023 at 3:23 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> Thanks, Jack!
>>
>> re [12]:
>>
>> I am seeing some test errors - have they been investigated?
>> Also, did all test suites run? I think I am not seeing output of some
>> of the suites, like
>> Run Python Dataflow V2 ValidatesRunner
>>
>>
>> On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #1 for the version
>>> 2.47.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>
>>> Reviewers are encouraged to test their own use cases with the
>>> release candidate, and vote +1 if no issues are found.
>>>
>>> The complete staging area is available for your review, which
>>> includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> DF3CBA4F3F4199F4 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.47.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6],
>>> and publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
>>> 8.0.322.
>>> * Python artifacts are deployed along with 

Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-28 Thread Alexey Romanenko
+1 (binding)

Tested with  https://github.com/Talend/beam-samples/ 
(Java SDK v8/v11/v17, Spark 3.x runner).

---
Alexey

> On 28 Apr 2023, at 16:06, Jack McCluskey via dev  wrote:
> 
> There was a GCP outage that impacted pushing containers to GCR. I expected it
> to impact Java containers specifically, but it looks like it also affected
> Python containers. I believe the situation is resolved and I can get the
> containers pushed now; if that continues to be an issue, I'll follow up.
> 
> On Thu, Apr 27, 2023 at 7:21 PM Chamikara Jayalath wrote:
>> I tried to run a Java multi-lang pipeline and it's failing due to the 
>> following error during worker setup.
>> 
>> Error syncing pod, skipping" err="failed to \"StartContainer\" for 
>> \"sdk-1-0\" with ImagePullBackOff: \"Back-off pulling image 
>> \\\"gcr.io/cloud-dataflow/v1beta3/beam_python3.8_sdk:2.47.0\\\ 
>> "\""
>>  pod="default/df-runinferenceexample-chami-04271607-gwf8-harness-vj8w" 
>> podUID=37d8de0a068391920b98dce559c4886f
>> 
>> Are these containers not available yet to test Dataflow ?
>> 
>> Thanks,
>> Cham
>> 
>> On Thu, Apr 27, 2023 at 2:17 PM Robert Bradshaw via dev wrote:
>>> The artifacts and signatures all look good, and I validated a couple of 
>>> Python pipelines in a fresh install. 
>>> 
>>> Assuming all the tests (including the Dataflow ones) pass (modulo the two 
>>> mentioned above; seems a fair justification to not block on those) I'm +1 
>>> (binding) on this release. 
>>> 
>>> On Wed, Apr 26, 2023 at 12:39 PM Jack McCluskey via dev
>>> <dev@beam.apache.org> wrote:
 There's also a good chance that newer test suites haven't been included in 
 mass_comment.py 
 (https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py)
  and as a result they were not executed. 
 
 On Wed, Apr 26, 2023 at 3:29 PM Jack McCluskey wrote:
> The Dataflow CrossLanguageValidatesRunner GoUsingJava Tests have been 
> broken for quite some time (https://github.com/apache/beam/issues/21645) 
> and the Kafka issue is tied to a test timeout that John Casey has fixed 
> but didn't get cherrypicked (just fell through the cracks while waiting 
> on tests to pass, but conversations with them led to the conclusion that 
> we would just get it into an RC2 if necessary since it's a matter of how 
> the tests run not how the code under test functions.) 
> 
> The tests still marked "pending" passed but did not get updated on the 
> GitHub side from when Jenkins was straining under load, I'm guessing 
> those builds have since been deleted under our new retention policy to 
> alleviate the OOM Jenkins issues. I will try to re-run those for the sake 
> of having clear and obvious results.
> 
> On Wed, Apr 26, 2023 at 3:23 PM Valentyn Tymofieiev wrote:
>> Thanks, Jack!
>> 
>> re [12]: 
>> 
>> I am seeing some test errors - have they been investigated?
>> Also, did all test suites run? I think I am not seeing output of some of 
>> the suites, like 
>> Run Python Dataflow V2 ValidatesRunner
>> 
>> 
>> On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev
>> <dev@beam.apache.org> wrote:
>>> Hi everyone,
>>> 
>>> Please review and vote on the release candidate #1 for the version
>>> 2.47.0, as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> Reviewers are encouraged to test their own use cases with the release 
>>> candidate, and vote +1 if no issues are found.
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * GitHub Release notes [1],
>>> * the official Apache source release to be deployed to
>>> dist.apache.org [2], which is signed with the key with fingerprint
>>> DF3CBA4F3F4199F4 [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "v2.47.0-RC1" [5],
>>> * website pull request listing the release [6], the blog post [6], and 
>>> publishing the API reference manual [7].
>>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK 
>>> 8.0.322.
>>> * Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2] and PyPI[8].
>>> * Go artifacts and documentation are available at pkg.go.dev [9]
>>> * Validation sheet with a tab for 2.47.0 release to help with 
>>> validation [10].
>>> * Docker images published to Docker Hub [11].
>>> * PR to run tests against release branch [12].
>>> 
>>> The vote 

Re: [VOTE] Release 2.47.0, release candidate #1

2023-04-28 Thread Jack McCluskey via dev
There was a GCP outage that impacted pushing containers to GCR. I expected
it to impact Java containers specifically, but it looks like it also
affected Python containers. I believe the situation is resolved and I can
get the containers pushed now; if that continues to be an issue, I'll
follow up.

On Thu, Apr 27, 2023 at 7:21 PM Chamikara Jayalath 
wrote:

> I tried to run a Java multi-lang pipeline and it's failing due to the
> following error during worker setup.
>
> Error syncing pod, skipping" err="failed to \"StartContainer\" for
> \"sdk-1-0\" with ImagePullBackOff: \"Back-off pulling image \\\"
> gcr.io/cloud-dataflow/v1beta3/beam_python3.8_sdk:2.47.0\\\
> "\""
> pod="default/df-runinferenceexample-chami-04271607-gwf8-harness-vj8w"
> podUID=37d8de0a068391920b98dce559c4886f
>
> Are these containers not available yet to test Dataflow ?
>
> Thanks,
> Cham
>
> On Thu, Apr 27, 2023 at 2:17 PM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> The artifacts and signatures all look good, and I validated a couple of
>> Python pipelines in a fresh install.
>>
>> Assuming all the tests (including the Dataflow ones) pass (modulo the two
>> mentioned above; seems a fair justification to not block on those) I'm +1
>> (binding) on this release.
>>
>> On Wed, Apr 26, 2023 at 12:39 PM Jack McCluskey via dev <
>> dev@beam.apache.org> wrote:
>>
>>> There's also a good chance that newer test suites haven't been included
>>> in mass_comment.py (
>>> https://github.com/apache/beam/blob/master/release/src/main/scripts/mass_comment.py)
>>> and as a result they were not executed.
>>>
>>> On Wed, Apr 26, 2023 at 3:29 PM Jack McCluskey 
>>> wrote:
>>>
 The Dataflow CrossLanguageValidatesRunner GoUsingJava Tests have been
 broken for quite some time (https://github.com/apache/beam/issues/21645)
 and the Kafka issue is tied to a test timeout that John Casey has fixed but
 didn't get cherrypicked (just fell through the cracks while waiting on
 tests to pass, but conversations with them led to the conclusion that we
 would just get it into an RC2 if necessary since it's a matter of how the
 tests run not how the code under test functions.)

 The tests still marked "pending" passed but did not get updated on the
 GitHub side from when Jenkins was straining under load, I'm guessing those
 builds have since been deleted under our new retention policy to
 alleviate the OOM Jenkins issues. I will try to re-run those for the sake
 of having clear and obvious results.

 On Wed, Apr 26, 2023 at 3:23 PM Valentyn Tymofieiev <
 valen...@google.com> wrote:

> Thanks, Jack!
>
> re [12]:
>
> I am seeing some test errors - have they been investigated?
> Also, did all test suites run? I think I am not seeing output of some
> of the suites, like
>
> Run Python Dataflow V2 ValidatesRunner
>
>
>
> On Wed, Apr 26, 2023 at 9:14 PM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> Hi everyone,
>>
>> Please review and vote on the release candidate #1 for the version
>> 2.47.0, as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>> Reviewers are encouraged to test their own use cases with the release
>> candidate, and vote +1 if no issues are found.
>>
>> The complete staging area is available for your review, which
>> includes:
>> * GitHub Release notes [1],
>> * the official Apache source release to be deployed to
>> dist.apache.org [2], which is signed with the key with fingerprint
>> DF3CBA4F3F4199F4 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.47.0-RC1" [5],
>> * website pull request listing the release [6], the blog post [6],
>> and publishing the API reference manual [7].
>> * Java artifacts were built with Gradle 7.5.1 and OpenJDK/Oracle JDK
>> 8.0.322.
>> * Python artifacts are deployed along with the source release to the
>> dist.apache.org [2] and PyPI[8].
>> * Go artifacts and documentation are available at pkg.go.dev [9]
>> * Validation sheet with a tab for 2.47.0 release to help with
>> validation [10].
>> * Docker images published to Docker Hub [11].
>> * PR to run tests against release branch [12].
>>
>> The vote will be open for at least 72 hours. It is adopted by
>> majority approval, with at least 3 PMC affirmative votes.
>>
>> For guidelines on how to try the release in your projects, check out
>> our blog post at /blog/validate-beam-release/.
>>
>> *Note: Dataflow containers for Java are still being finalized. I will
>> follow up once that is completed; however, this should not block 
>> validation
>> for other SDKs and runners. *
>>

Beam High Priority Issue Report (30)

2023-04-28 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/26458 [Bug]: Error installing Beam in 
Python 3.11
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26126 [Failing Test]: 
beam_PostCommit_XVR_Samza permared validatesCrossLanguageRunnerGoUsingJava 
TestDebeziumIO_BasicRead
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.




Re: Regarding Project proposal review and feedback

2023-04-28 Thread Siddharth Aryan
Hello Jeff,
Thank you for the idea; it would allow Beam users to write SQL queries
using the Beam SQL API and execute them on the Flink Table API. I will look
into it later, as my current focus is to implement an integration between
Apache Beam and the Flink DataStream API. While the existing Flink runner
is based on the DataStream and Operator APIs, my project aims to create a
new runner that specifically uses the Flink DataStream API.
Thanks also for the feedback.

Best Regards,
Siddharth Aryan

On Thu, Apr 27, 2023 at 1:39 PM Jeff Zhang  wrote:

> Same question as David,  one idea in my mind is to integrate the beam sql
> api with flink table api, this does not exist in the current flink runner.
>
> On Thu, Apr 27, 2023 at 3:46 PM David Morávek  wrote:
>
>> Hi Siddharth,
>>
>> Thanks for your interest in the Flink Runner for Beam. Reading through
>> the project, one thing that immediately strikes me is that there already is
>> a Flink runner based on DataStream and Operator (one level below
>> DataStream) API in the code base. Are you aware of this? If yes, how does
>> the runner you want to introduce differ from the existing one?
>>
>> Best,
>> D.
>>
>> On Sun, Apr 2, 2023 at 9:41 PM Svetak Sundhar via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi Siddharth,
>>> I left some comments as well on the sentiment analysis proposal.
>>>
>>> Thanks,
>>>
>>>
>>> Svetak Sundhar
>>>
>>>   Technical Solutions Engineer, Data
>>> s vetaksund...@google.com
>>>
>>>
>>>
>>> On Sun, Apr 2, 2023 at 1:58 PM Anand Inguva via dev 
>>> wrote:
>>>
>>>> I left some comments on the sentiment analysis proposal.
>>>>
>>>> Thanks,
>>>> Anand
>>>>
>>>> On Thu, Mar 30, 2023 at 9:59 AM Danny McCormick via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Thanks Siddharth! I left some comments on the sentiment analysis
>>>>> proposal. I am probably not the best person to comment on the Flink
>>>>> DataStream API one, though.
>>>>>
>>>>> Thanks,
>>>>> Danny
>>>>>
>>>>> On Fri, Mar 24, 2023 at 11:53 PM Siddharth Aryan <
>>>>> siddhartharyan...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>> I am Siddharth Aryan, an undergraduate, and I am looking for someone
>>>>>> who can review my proposals and give me feedback on them to help me
>>>>>> improve them.
>>>>>> I am attaching both of my project proposals here:
>>>>>> > Sentimental Analysis Pipeline with the help of Machine Learning:
>>>>>>
>>>>>> https://docs.google.com/document/d/1U6zcXAWsDCrWlbf14f5VlLqPZFucwXR48tD7mrERW-g/edit?usp=sharing
>>>>>>
>>>>>> > Integrating Apache Beam with Flink DataStream API:
>>>>>>
>>>>>> https://docs.google.com/document/d/1sQEe9eVuoHX9QWS9Zj5wVl7MLmfk7QO09pjZOsk-TFY/edit?usp=sharing
>>>>>>
>>>>>> Best Regards
>>>>>> Siddharth Aryan
>>>>>>
>>>>>> GitHub: https://github.com/nervoussidd
>>>>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Regarding Project proposal review and feedback

2023-04-28 Thread Siddharth Aryan
Hello David,
First of all, thanks for the reply. To answer your question: yes, I am
aware that there is an existing Flink runner based on the DataStream and
Operator APIs in the Beam codebase. However, the runner I propose to
develop will differ from the existing one in two significant ways.

Firstly, the existing runner uses Flink's lower-level Operator API, which
provides more control over the execution of the pipeline but requires more
manual configuration. In contrast, the runner I propose to develop will use
Flink's higher-level DataStream API, which provides a more user-friendly
and streamlined way to define streaming data processing pipelines. This
will make it easier for Beam users to write and execute streaming pipelines
using Flink.

Secondly, the existing runner does not support Beam's windowing and
triggering semantics in the Flink DataStream API, which can be a
significant limitation for users who want to take advantage of these
features. In contrast, the runner I propose to develop will provide support
for these features, enabling Beam users to write streaming pipelines that
use Beam's windowing and triggering semantics and execute them on the Flink
streaming runtime.
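
As a rough illustration of what "windowing and triggering" mean here, the
following is a minimal sketch in plain Python. It does not use the Beam or
Flink APIs and says nothing about how either runner is implemented; the
function names and the sample events are hypothetical. It assigns
timestamped elements to fixed windows and emits a pane whenever a window has
buffered a given number of elements (an element-count trigger in discarding
mode).

```python
from collections import defaultdict


def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains the timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def window_with_count_trigger(events, size, trigger_count):
    """Group (timestamp, value) events into fixed windows.

    A pane is emitted each time a window has buffered `trigger_count`
    elements; the buffer is then cleared (discarding mode). Whatever is
    still buffered at the end of input is emitted as a final pane.
    """
    buffers = defaultdict(list)
    panes = []
    for timestamp, value in events:
        window = assign_fixed_window(timestamp, size)
        buffers[window].append(value)
        if len(buffers[window]) == trigger_count:
            panes.append((window, list(buffers[window])))
            buffers[window].clear()
    # End of input: flush the remaining elements, in window order.
    for window in sorted(buffers):
        if buffers[window]:
            panes.append((window, list(buffers[window])))
    return panes


if __name__ == "__main__":
    # Hypothetical sample input: (event timestamp in seconds, value).
    sample = [(1, "a"), (2, "b"), (11, "c"), (3, "d"), (12, "e"), (13, "f")]
    for window, values in window_with_count_trigger(sample, 10, 2):
        print(window, values)
```

Note that the element with timestamp 3 arrives after an element with
timestamp 11 and still lands in the (0, 10) window, because assignment is
based on the event timestamp, not on arrival order.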

Overall, the runner I propose to develop will provide a more user-friendly
and feature-rich integration between Beam and Flink, making it easier for
Beam users to take advantage of Flink's high-performance capabilities for
data processing.

I hope this answers your question.

Best Regards,
Siddharth Aryan.
