2.47.0 Release Update and Brief Code Freeze

2023-04-05 Thread Jack McCluskey via dev
Hey everyone,

While cutting the release branch this afternoon I hit a snag with the
release script, and while I was trying to revert its half-completed work
the entire Beam repo was (briefly) deleted. This has been fixed, but
because of the deletion and the permissions on the repo, the repo history
is now messed up: every file has a removal and a restoration in its
history. While we work with infra to gain permission to amend the history
(tracked at https://issues.apache.org/jira/browse/INFRA-24433), we will
have a brief code freeze to ensure that everything is restored properly.

I take full responsibility for the incident and will be working on
diagnosing how this happened + improving the release process so this can't
happen again.

Thanks,

Jack McCluskey
-- 


Jack McCluskey
SWE - DataPLS PLAT/ Dataflow ML
RDU
jrmcclus...@google.com


Re: Python 3.11 support in Apache Beam

2023-04-05 Thread Anand Inguva via dev
Python 3.11 support has been merged at
https://github.com/apache/beam/pull/26121 targeting Beam 2.47.0 release.

Please let me know if you have any questions.

Thanks,
Anand

On Tue, Feb 21, 2023 at 6:04 PM Valentyn Tymofieiev 
wrote:

> Thanks a lot Anand. I'll take a look at the PRs.
>
> On Tue, Feb 21, 2023 at 1:56 PM Anand Inguva 
> wrote:
>
>> I was able to spin up a PR: https://github.com/apache/beam/pull/24599
>> that updates the build dependencies of Apache Beam.
>>
>> Several GCP dependencies needed to be updated as well. I covered them in
>> the PR: https://github.com/apache/beam/pull/24599
>>
>> On Thu, Feb 9, 2023 at 3:29 PM Anand Inguva 
>> wrote:
>>
>>> Yes, we may need to update all of them.
>>> I can add more information once I dig into the issue (most likely next
>>> week). I will comment on my findings on the issue:
>>> https://github.com/apache/beam/issues/24569 and will periodically
>>> update this thread.
>>>
>>> On Tue, Feb 7, 2023 at 5:47 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 On Tue, Feb 7, 2023 at 2:35 PM Anand Inguva 
 wrote:

> Yes, it is related to protobuf only. But I think updating these
> dependencies is required for Python 3.11, since the newer versions
> ship Python 3.11 wheels.
>
 Assuming you refer to protobuf. Yes, there are no wheels for 3.11 for
 protobuf==3.x.x and that can cause friction.
 https://pypi.org/project/protobuf/3.20.3/#files

 I would probably narrow the problem further to demonstrate which stubs
 are not being generated, and if the reason is not obvious we can also ask
 for feedback from protobuf maintainers. Also, do we by chance need to
 update some other deps from
 https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt#L28-L33
 for this to work?

 Also: tracking issue for protobuf4 support in Beam:
 https://github.com/apache/beam/issues/24569.

> If we use older versions of these packages, then we have to depend on
> installing those packages on Python 3.11 from source distributions, which
> is not desired.
>
> I am working in parallel on that issue in a different PR
> https://github.com/apache/beam/pull/24599 but I think this issue
> should be a blocker for the Python 3.11 update.
>
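
The wheel concern in the message above can be checked mechanically. What follows is a minimal sketch, assuming you already have the list of file names published for a release (for example, copied from the project's PyPI files page). The helper names `parse_wheel_tags` and `has_binary_wheel_for`, the package name `examplepkg`, and the sample file names are all hypothetical and do not come from the thread.

```python
def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) for a wheel file name.

    Wheel names follow name-version[-build]-python-abi-platform.whl, so the
    last three dash-separated fields are the tags. Returns None for files
    that are not wheels (for example, an sdist tarball).
    """
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return None
    return tuple(parts[-3:])


def has_binary_wheel_for(filenames, python_tag):
    """True if any wheel in `filenames` targets `python_tag` (e.g. "cp311")."""
    for name in filenames:
        tags = parse_wheel_tags(name)
        if tags is None:
            continue
        # A python tag may be a dot-separated set such as "cp310.cp311".
        if python_tag in tags[0].split("."):
            return True
    return False


# Hypothetical file list for illustration only.
SAMPLE_FILES = [
    "examplepkg-1.0.0-cp310-cp310-manylinux_2_17_x86_64.whl",
    "examplepkg-1.0.0.tar.gz",
]

if __name__ == "__main__":
    # No cp311 wheel in the sample, so pip would fall back to the sdist.
    print(has_binary_wheel_for(SAMPLE_FILES, "cp311"))
```

If the check returns False for `cp311`, installing that release on Python 3.11 means building from the source distribution, which is the situation the message describes as undesirable.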
> On Tue, Feb 7, 2023 at 5:25 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> Hi Anand,
>>
>> On Tue, Feb 7, 2023 at 1:35 PM Anand Inguva via dev <
>> dev@beam.apache.org> wrote:
>>
>>> Hi all,
>>>
>>> We are planning to work on adding support for Python 3.11[1] to
>>> Apache Beam Python SDK.
>>>
>>> As part of this effort, we are going to update the python build
>>> dependencies defined at [2].
>>>
>>> Right now, there is an error with the newer version of
>>> protobuf (4.21.11): it is not generating the _urns files.
>>>
>>> It can be reproduced by
>>>
>>
>>> 1. python setup.py sdist
>>> 2. pip install dist/apache-beam-x.xx.x.dev0.tar.gz
>>> 3. start a Python interpreter and run `import apache_beam as beam`
>>>
>> I think the error you are describing is related to protobuf 4, so the
>> repro should focus on the portion where generation of stubs is happening.
>> Presumably some stubs are not generated on protobuf 4 + Python 3.11?
>>
>>
>>>
>>> will lead to ImportError: cannot import name
>>> 'beam_runner_api_pb2_urns' from 'apache_beam.portability.api'. Running
>>> `python gen_protos.py` to forcefully generate files didn't help either.
>>>
>>> If you have encountered this error and found a resolution, please
>>> let me know (that would be super helpful).
>>>
>>> I am going to work on this soon. Please let me know if you want to
>>> collaborate.
>>>
>>> Thanks,
>>> Anand Inguva
>>>
>>> [1] https://github.com/apache/beam/pull/24721
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/build-requirements.txt
>>>
>>
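
To narrow the repro to the stub-generation step, as suggested in the thread, one option is to look for the generated files on disk instead of importing `apache_beam`, since the import itself fails when a stub is missing. The following is a minimal sketch under two assumptions that are not confirmed by the thread: that each generated module corresponds to a `<module>.py` file, and that the package `apache_beam.portability.api` lives at `sdks/python/apache_beam/portability/api` in a source checkout. The helper name `missing_generated_files` is hypothetical.

```python
from pathlib import Path


def missing_generated_files(api_dir, expected_modules):
    """Return the module names that have no `<name>.py` file in `api_dir`.

    Only the file system is inspected, so this works even when importing
    the package would raise ImportError.
    """
    api_dir = Path(api_dir)
    return [m for m in expected_modules if not (api_dir / f"{m}.py").is_file()]


if __name__ == "__main__":
    # Assumed location of the package in a source checkout (see lead-in).
    api_dir = "sdks/python/apache_beam/portability/api"
    # Module name taken from the ImportError quoted in the thread.
    expected = ["beam_runner_api_pb2_urns"]
    print(missing_generated_files(api_dir, expected))
```

Running this before and after `python gen_protos.py`, on both protobuf 3.x and 4.x, would show whether the `_urns` file is never written or is written somewhere the import does not look.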


Beam High Priority Issue Report (26)

2023-04-05 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21645 
beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId