Beam High Priority Issue Report (34)

2023-03-06 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25675 [Bug]: Reenable 
GroupIntoBatchesTest.testWithShardedKeyInGlobalWindow: causes dataflow suite to 
be permared
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23848 Support for Python 3.11
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId




Re: new contributor messaging: behaviorbot/welcome

2023-03-06 Thread Austin Bennett
Nudge on https://github.com/apache/beam/pull/25586 ...

Can a PMC member install the bot [ or work with infra to make that happen,
ex: via https://github.com/apps/welcome/installations/new ]?  I'd be happy
to, but do not believe I have those permissions - do advise if I should
message/create-tickets and copy any individual from PMC specifically.  Once
that's done, we can merge the code for the bot to be configured - imagining
that is a better second step, so we do not have code in the codebase that
doesn't do anything.


On Tue, Feb 21, 2023 at 8:42 PM Austin Bennett  wrote:

> A PR: https://github.com/apache/beam/pull/25586
>
> text could likely be improved ( open to suggestions/changes ), but this
> captures at least the intent.
>
> For this to work, we need to install the bot as also mentioned in the PR.
>
>
>
> On Tue, Feb 21, 2023 at 6:02 PM Robert Burke  wrote:
>
>> I agree that the bot is better than nothing at all.
>>
>> +1 to getting a PR with messaging out for review.
>>
>> On Tue, Feb 21, 2023, 5:29 PM Robert Bradshaw via dev <
>> dev@beam.apache.org> wrote:
>>
>>> FWIW, I'm generally in favor of such a bot. I think it really boils
>>> down to a concrete proposal of what the content (and triggers) would
>>> be.
>>>
>>> On Tue, Feb 21, 2023 at 1:36 PM Austin Bennett
>>>  wrote:
>>> >
>>> > It is fantastic if generally able to address welcoming newcomers
>>> manually [ @Robert Burke ! ] .  Community communication, human connection [
>>> ex: community > code ] ideal!!  In this particular case, I imagine
>>> automation does not contradict - nor detract from - the manual/human touch.
>>> >
>>> > As shared, the very specific use case I had in mind was to support -->
>>> https://news.apache.org/foundation/entry/the-asf-launches-firstasfcontribution-campaign
>>> ...  I wanted to send a message thanking someone for their first PR merge, and
>>> encourage them to fill out the form ( while that campaign is active ).  In
>>> that case, I did imagine a static [ meaning hardcoded, non-changing ]
>>> message that prompts them at the moment that they make their real first
>>> code contribution [ as it gets merged ], since that would be most relevant
>>> and immediate feedback.
>>> >
>>> > If we think overkill, no problem either.  If an issue with choosing to
>>> use a bot, vs a GH action - I can also spend time to create a custom GH
>>> Action that accommodates that.  But, that might not be worthwhile if the
>>> discussed use case isn't functionality we even want as part of the project.
>>> >
>>> > On Tue, Feb 21, 2023 at 12:28 PM Robert Bradshaw 
>>> wrote:
>>> >>
>>> >> On Tue, Feb 21, 2023 at 10:59 AM Kenneth Knowles 
>>> wrote:
>>> >> >
>>> >> > Agree with Robert here. The human connection is important. Can we
>>> have a behaviorbot that reminds the reviewer to be extra welcoming up
>>> front, and then thankful afterwards, instead? :-)
>>> >>
>>> >> +1
>>> >>
>>> >> > That said, a bot comment would at least state our intention of
>>> being welcoming and grateful, even if we then do not live up to it
>>> perfectly. It isn't very different than having it in the PR template or
>>> https://beam.apache.org/contribute/ or CONTRIBUTING.md which GitHub
>>> presents to first time contributors. I tend to favor static text that can
>>> be referred to over dynamic text posted by code in special circumstances.
>>> But I think hitting this from all angles, for different sorts of people in
>>> the world, is fine, if the maintenance burden is very low (which it appears
>>> to be)
>>> >>
>>> >> I think the primary value in such a bot is to set expectations/inform
>>> >> the contributor of something they might not know but is relevant to
>>> >> their action. Otherwise, I am more in favor of static text somewhere
>>> >> they're sure to encounter it (and there are benefits to doing it
>>> >> before they create a PR, e.g. as part of a template, rather than
>>> >> after).
>>> >>
>>> >>
>>> >> > On Tue, Feb 21, 2023 at 10:01 AM Robert Burke 
>>> wrote:
>>> >> >>
>>> >> >> I can't speak for all committers but I'm always aware when it's
>>> someone's first time contributing to beam (the First Time Contributor badge
>>> is instrumental here), and manually thank them and welcome them to Beam.
>>> >> >>
>>> >> >> Seems more meaningful for the merging committer to do it rather
>>> than an automated process.
>>> >> >>
>>> >> >> Maybe I just have bad experiences with automated phone trees
>>> >> >>
>>> >> >> On Tue, Feb 21, 2023, 9:02 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>> >> >>>
>>> >> >>> If the merge message is a key part of this then I'm fine using
>>> behaviorbot (though I think a PMC member would need to install it, I don't
>>> have the right permission set).
>>> >> >>>
>>> >> >>> > I'd also be happy to leverage first-interaction for everything
>>> it can do, and only use welcome-bot for the things that aren't met
>>> elsewhere [ also happy to eventually remove welcome-bot, ex: after that ASF
>>> campaign or 

Re: [VOTE] Release 2.46.0, release candidate #1

2023-03-06 Thread Ahmet Altay via dev
+1 (binding) - I validated python quickstarts on direct & dataflow runners.

Thank you for doing the release!

On Sat, Mar 4, 2023 at 8:01 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:

> +1 (binding)
>
> Validated multi-language Java and Python pipelines.
>
> On Fri, Mar 3, 2023 at 1:59 PM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> > I have encountered a failure in a Python pipeline running with Runner
>> v1:
>>
>> > RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
>> Python worker version 2.45.0. Please check Dataflow worker startup logs and
>> make sure that correct version of Beam SDK is installed.
>>
>> > We should understand why Python ValidatesRunner tests (which have
>> passed)  didn't catch this error.
>>
>> > This can be remediated in Dataflow containers without  changes to the
>> release candidate.
>>
>> Good catch! I've kicked off a release to fix this, it should be done
>> later this evening - I won't be available when it completes, but I would
>> expect it to be around 5:00 PST.
>>
>> On Fri, Mar 3, 2023 at 3:49 PM Danny McCormick 
>> wrote:
>>
>>> Hey Reuven, could you provide some more context on the bug/why it is
>>> important? Does it meet the standard in
>>> https://beam.apache.org/contribute/release-guide/#7-triage-release-blocking-issues-in-github
>>> ?
>>>
>>> The release branch was cut last Wednesday, so that is why it is not
>>> included.
>>>
>>
> Seems like this was a revert of a previous commit that was also not
> included in the 2.46.0 release branch (
> https://github.com/apache/beam/pull/25627) ?
>
> If so we might not need a new RC but good to confirm.
>
> Thanks,
> Cham
>
>
>>> On Fri, Mar 3, 2023 at 3:24 PM Reuven Lax  wrote:
>>>
>>>> If possible, I would like to see if we could include
>>>> https://github.com/apache/beam/pull/25642 as we believe this bug has
>>>> been impacting multiple users. This was merged 4 days ago, but this RC cut
>>>> does not seem to include it.
>>>>
>>>> On Fri, Mar 3, 2023 at 12:18 PM Valentyn Tymofieiev via dev <
>>>> dev@beam.apache.org> wrote:

> I have encountered a failure in a Python pipeline running with Runner
> v1:
>
> RuntimeError: Beam SDK base version 2.46.0 does not match Dataflow
> Python worker version 2.45.0. Please check Dataflow worker startup logs and
> make sure that correct version of Beam SDK is installed.
>
> We should understand why Python ValidatesRunner tests (which have
> passed)  didn't catch this error.
>
> This can be remediated in Dataflow containers without  changes to the
> release candidate.
>
> On Fri, Mar 3, 2023 at 11:22 AM Robert Bradshaw via dev <
> dev@beam.apache.org> wrote:
>
>> +1 (binding).
>>
>> I verified that the artifacts and signatures all look good, all the
>> containers are pushed, and tested some pipelines with a fresh install
>> from one of the Python wheels.
>>
>> On Fri, Mar 3, 2023 at 11:13 AM Danny McCormick
>>  wrote:
>> >
>> > > The released artifacts seem to be missing the last commit at
>> > >
>> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
>> > > . Is this essential to the release, or just useful for validating
>> it?
>> >
>> > It's strictly a test infrastructure change, it has no functional
>> impact. For context, the changes included were from
>> https://github.com/apache/beam/pull/25661 and
>> https://github.com/apache/beam/pull/25654, both were keeping
>> integration tests from running correctly.
>>
>> Thanks.
>>
>> > On Fri, Mar 3, 2023 at 2:09 PM Robert Bradshaw 
>> wrote:
>> >>
>> >> The released artifacts seem to be missing the last commit at
>> >>
>> https://github.com/apache/beam/commit/c528eab18b32342daed53b750fe330d30c7e5224
>> >> . Is this essential to the release, or just useful for validating
>> it?
>> >>
>> >> On Fri, Mar 3, 2023 at 11:02 AM Danny McCormick
>> >>  wrote:
>> >> >
>> >> > Thanks for calling that out, and thanks for helping me fix it!
>> We should be all set now
>> >> >
>> >> > On Fri, Mar 3, 2023 at 1:38 PM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >> >>
>> >> >> It appears your public key is not published in
>> >> >> https://dist.apache.org/repos/dist/release/beam/KEYS .
>> >> >>
>> >> >> On Fri, Mar 3, 2023 at 8:33 AM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>> >> >> >
>> >> >> > +1 (non-binding)
>> >> >> > Tested python wordcount quick start
>> https://beam.apache.org/get-started/quickstart-py/ on Direct Runner
>> and Dataflow Runner.
>> >> >> >
>> >> >> > Thanks!
>> >> >> >
>> >> >> > On Fri, Mar 3, 2023 at 11:21 AM Bruno Volpato via dev <
>> dev@beam.apache.org> wrote:
>> >> >> >>
>> >> >> >> +1 (non-binding)
>> >> >> >>
>> >> >> >> Test