Re: [PROPOSAL] Preparing for 2.49.0 Release

2023-06-30 Thread Yi Hu via dev
Hi everyone,

The release branch for 2.49.0 has been cut. There is currently no
outstanding change that needs to be cherry-picked, and no open milestone
[1]. Verification of the release branch was done in [2]. There were a
few postcommit failures and flaky tests, all known issues identified as not
release blocking (see the comment in that PR for more detail).

As such, I plan to move ahead with building RC1.

Thanks,
Yi Hu

[1] https://github.com/apache/beam/milestone/13
[2] https://github.com/apache/beam/pull/27307


On Mon, Jun 26, 2023 at 4:34 PM Yi Hu wrote:

> Hi,
>
> To prepare for the release, could one of the owners of the apache-beam
> project on PyPI please add me as a maintainer?
>
> Username: abacn
> email: y...@apache.org
>
> Best,
> Yi
>
> On Thu, Jun 15, 2023 at 10:44 AM Yi Hu wrote:
>
>> Hey Beam community,
>>
>> The next release (2.49.0) branch cut is scheduled for June 28th, 2023,
>> according to the release calendar [1].
>>
>> I volunteer to perform this release. My plan is to cut the branch on
>> that date, and cherry-pick release-blocking fixes afterwards, if any.
>>
>> Please help me make sure the release goes smoothly by:
>> - Making sure that any unresolved release-blocking issues for 2.49.0
>> have their "Milestone" marked as "2.49.0 Release" as soon as possible.
>> - Reviewing the current release blockers [2] and removing the Milestone if
>> they don't meet the criteria at [3].
>>
>> Let me know if you have any comments/objections/questions.
>>
>> Thanks,
>>
>> Yi
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/13
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>> --
>>
>> Yi Hu, (he/him/his)
>>
>> Software Engineer
>>
>>
>>


Re: [DISCUSS] Enable Github Discussions?

2023-06-30 Thread Danny McCormick via dev
Thanks for starting this discussion!

I'm a weak -1 for this proposal. While I think that GH Discussions can be a
good forum, I think most of the things that Discussions do are covered by
some combination of the dev/user lists and GitHub issues, and the net
outcome of this will be creating one more forum to pay attention to. I know
in the past we've had a hard time keeping up with Stack Overflow questions
for a similar reason. With that said, I'm not opposed to trying it out and
experimenting as long as we have (a) clear criteria for understanding if
the change is effective or not (can be subjective), (b) a clear idea of
when we'd revisit the discussion, and (c) a clear path to roll back the
decision without it being *too* much work (this might mean something like
disabling future discussions and keeping the history or somehow moving the
history to the dev or user list). If we do this, I also think we should
update https://beam.apache.org/community/contact-us/ with a clear taxonomy
of what goes where (this is what I'm unsure of today).

FWIW, if we were proposing cutting either the user list or both the user
and dev list in favor of discussions, I would be +1. I do think the
advantages of discussions over email are real (threaded, easy to convert
to/from issues, markdown, one place for all things Beam).

Thanks,
Danny

On Fri, Jun 30, 2023 at 10:23 AM Svetak Sundhar via dev wrote:

> Hi all,
>
> I wanted to start a discussion to gauge interest in enabling GitHub
> Discussions in Apache Beam.
>
>
> Pros:
>
> + GH Discussions allows folks to get unblocked on small/medium
> implementation blockers (Google employees can often get this help by
> scheduling a call with teammates whereas there is a larger barrier for
> non-Google employees to get this help).
>
> + On the above point, more visibility into the development blockers that
> others have previously faced.
>
> + GH Discussions is more discoverable and approachable for new users and
> contributors.
>
> + A centralized place to have discussions. Long term, it makes sense to
> eventually fully migrate to GH Discussions.
>
>
> Cons:
>
> - For a period of time when we use both the dev list and GH Discussions,
> context can be confusing.
>
> - Anything else?
>
>
> To be clear, I’m not advocating that we move off the dev list immediately.
> I propose that over time we slowly start moving discussions over to GH
> discussions, utilizing things such as the poll feature.
>
>
> I am aware that the Airflow project uses both GH Discussions [1] and
> a dev@ list [2] today.
>
>
> [1] https://github.com/apache/airflow/discussions
>
> [2] https://lists.apache.org/list.html?d...@airflow.apache.org
>
> Thanks,
>
>
> Svetak Sundhar
>
>   Data Engineer
> s vetaksund...@google.com
>
>


[DISCUSS] Enable Github Discussions?

2023-06-30 Thread Svetak Sundhar via dev
Hi all,

I wanted to start a discussion to gauge interest in enabling GitHub
Discussions in Apache Beam.


Pros:

+ GH Discussions allows folks to get unblocked on small/medium
implementation blockers (Google employees can often get this help by
scheduling a call with teammates whereas there is a larger barrier for
non-Google employees to get this help).

+ On the above point, more visibility into the development blockers that
others have previously faced.

+ GH Discussions is more discoverable and approachable for new users and
contributors.

+ A centralized place to have discussions. Long term, it makes sense to
eventually fully migrate to GH Discussions.


Cons:

- For a period of time when we use both the dev list and GH Discussions,
context can be confusing.

- Anything else?


To be clear, I’m not advocating that we move off the dev list immediately.
I propose that over time we slowly start moving discussions over to GH
discussions, utilizing things such as the poll feature.


I am aware that the Airflow project uses both GH Discussions [1] and
a dev@ list [2] today.


[1] https://github.com/apache/airflow/discussions

[2] https://lists.apache.org/list.html?d...@airflow.apache.org

Thanks,


Svetak Sundhar

  Data Engineer
s vetaksund...@google.com


Beam High Priority Issue Report (37)

2023-06-30 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/27312 [Bug]: JmsIO create connection 
based on the number of threads
https://github.com/apache/beam/issues/27292 [Failing Test]: Java Unit Test 
GitHub Action failing on MacOS and Windows
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26547 [Failing Test]: 
beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/26902 [Bug]: Images built not saved in 
the local image store
https://github.com/apache/beam/issues/26723 [Failing Test]: Tour of Beam 
Frontend Test suite is