Re: [RESULT] [VOTE] Release 2.53.0, release candidate #2

2024-01-05 Thread Jack McCluskey via dev
And with that, the 2.53.0 release is complete. Thank you, everyone!

On Fri, Jan 5, 2024 at 2:39 PM Robert Burke  wrote:

> Done!
>
> On Fri, Jan 5, 2024, 11:30 AM Robert Burke  wrote:
>
>> Going to try to get this done. Will report back when completed (or I get
>> pulled elsewhere).
>>
>> On Thu, Jan 4, 2024, 11:23 AM Jack McCluskey via dev 
>> wrote:
>>
>>> Hey everyone,
>>>
>>> Following up on this, I do need help from a PMC member for the PMC-only
>>> finalization steps (
>>> https://github.com/apache/beam/blob/master/contributor-docs/release-guide.md#pmc-only-finalization
>>> )
>>>
>>> Thanks,
>>>
>>> Jack McCluskey
>>>
>>> On Thu, Jan 4, 2024 at 10:27 AM Jack McCluskey 
>>> wrote:
>>>
 Hey everyone,

 I'm happy to announce that we have unanimously approved this release.

 There are nine approving votes, three of which are binding:
 * Jan Lukavský (binding)
 * Chamikara Jayalath (binding)
 * Robert Burke (binding)
 * XQ Hu
 * Danny McCormick
 * Bruno Volpato
 * Svetak Sundhar
 * Yi Hu
 * Johanna Öjeling

 There are no disapproving votes. I will begin finalizing the release.

 Thanks everyone!

 --


 Jack McCluskey
 SWE - DataPLS PLAT/ Dataflow ML
 RDU
 jrmcclus...@google.com





Re: (python SDK) "Any" coder bypasses registry coders

2024-01-05 Thread Robert Bradshaw via dev
On Fri, Jan 5, 2024 at 9:42 AM Joey Tran  wrote:
>
>
> I think my original message made it sound like what I thought was confusing 
> was how `Any` works. The scenario that I actually think is confusing is *if a 
> user registers a coder for a data type, this preference will get ignored in 
> non-obvious situations and can (and in my scenario, has) result in 
> non-obvious downstream issues.*


I agree this can be confusing. Essentially, Coders are attached to
PCollections (which are assumed to be of homogeneous type) at compile
time.
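
To make that concrete, here is a minimal sketch of the dispatch involved
(DeviceHandle and DeviceHandleCoder are hypothetical stand-ins, not SDK
names):

    from apache_beam import coders, typehints

    class DeviceHandle(object):
        # Hypothetical type that does not survive pickling.
        def __init__(self, raw_bytes):
            self.raw_bytes = raw_bytes

    class DeviceHandleCoder(coders.Coder):
        # Hypothetical coder registered for DeviceHandle.
        def encode(self, value):
            return value.raw_bytes

        def decode(self, encoded):
            return DeviceHandle(encoded)

        def is_deterministic(self):
            return True

    coders.registry.register_coder(DeviceHandle, DeviceHandleCoder)

    # Coder choice is keyed off the PCollection's element type hint, so a
    # concrete hint finds the registered coder while Any does not:
    print(coders.registry.get_coder(DeviceHandle))   # DeviceHandleCoder
    print(coders.registry.get_coder(typehints.Any))  # pickle-based fallback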

>
> On Fri, Jan 5, 2024 at 12:05 PM Robert Bradshaw via dev  
> wrote:
>>
>> On Fri, Jan 5, 2024 at 7:38 AM Joey Tran  wrote:
>>>
>>> I've been working with a few data types that are in practice unpicklable,
>>> and I've run into a couple of issues stemming from the `Any` type hint,
>>> which, when used, results in the PickleCoder being used even if there's a
>>> coder in the coder registry that matches the data element.
>>
>>
>> This is likely because we don't know the data type at the time we choose the 
>> coder.
>>
>>>
>>> This was pretty unexpected to me and can result in pretty cryptic 
>>> downstream issues. In the best case, you get an error at pickling time [1], 
>>> and in the worst case, the pickling "succeeds" (since many objects can get
>>> (de)pickled without obvious error) but then results in downstream issues
>>> (e.g. some data doesn't survive depickling).
>>
>>
>> It shouldn't be the case that an object depickles successfully but 
>> incorrectly; sounds like a bug in some custom pickling code.
>
> You don't need custom pickling code for this to happen. For a contrived 
> example, you could imagine some class that caches some state specific to a 
> local system and saves it to a private local variable. If you pickle one of 
> these and then unpickle it on a different system, it would unpickle
> successfully but be in a bad state.
>
> Rather than mucking around with custom pickling, someone might want to just 
> implement a coder for their special class instead.


It's the same work in both cases, though I can see a coder being
preferable if one does not own the class. (Though copyreg should work
just as well.) In that case, I'd consider explicitly making the class
throw an exception on pickling (though I agree it's hard to see how
one could know to do this by default).
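
For illustration, the two approaches might look like this (a sketch only;
DeviceHandle is the hypothetical type from the sketch above, and copyreg
is the stdlib hook that pickle consults):

    import copyreg

    # Option A: teach pickle about the type via copyreg instead of a coder.
    def _rebuild_device_handle(raw_bytes):
        return DeviceHandle(raw_bytes)

    def _reduce_device_handle(handle):
        # pickle will call _rebuild_device_handle(handle.raw_bytes) on load.
        return (_rebuild_device_handle, (handle.raw_bytes,))

    copyreg.pickle(DeviceHandle, _reduce_device_handle)

    # Option B: make accidental pickling fail loudly rather than silently
    # producing a broken object.
    class LoudlyUnpicklableHandle(object):
        def __reduce__(self):
            raise TypeError('encode this type with its registered coder')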

>>>
>>> One example case of the latter is if you flatten a few pcollections 
>>> including a pcollection generated by `beam.Create([])` (the inferred output
>>> type of an empty Create becomes Any)
>>
>>
>> Can you add a type hint to the Create?
>
> Yeah, this fixes the issue; it's just not obvious (to me, at least) that (1)
> beam.Create([]) will have an output type of Any (oftentimes the parameter to
> beam.Create will be some local variable, which makes it less obvious) and that


We could update Beam to let the type hint be the empty union, which
would correspond to a coder that can't encode/decode anything, but
when unioned with others (e.g. in a Flatten) does not "taint" the
rest. This doesn't solve the case where unioning two other disjoint types
resolves to the Any coder, though.

>
> (2) in this particular case, _how_ downstream pcollections get decoded will 
> be slightly different. In the worst case, the issue won't even result in an
> error at decoding time (as mentioned before), so then you have to backtrack
> from some possibly unrelated-sounding traceback.
>
>>>
>>> Would it make sense to introduce a new fallback coder that takes precedence 
>>> over the `PickleCoder` that encodes both the data type (by just pickling 
>>> it) and the data encoded using the registry-found coder?
>>
>>
>> This is essentially re-implementing pickle :)
>
> Pickle doesn't use coders from the coder registry, which I think is the key
> distinction here.

Pickle is just a dynamic dispatch of type -> encoder.

>>>
>>> This would have some space ramifications for storing the data type for 
>>> every element. Of course, this coder would only kick in _if_ the data type
>>> was found in the registry; otherwise we'd proceed to the PickleCoder like
>>> we do currently.
>>
>>
>> I do not think we'd want to introduce this as the default--that'd likely 
>> make common cases much more expensive. IIRC you can manually override the 
>> fallback coder with one of your own choosing. Alternatively, you could look 
>> at using copyreg for your problematic types.
>>
>
> Ah, you can indeed override the fallback coder. Okay, I'll just do that for
> our use of Beam.
>
> For the sake of discussion, though, I think it'd be a small-ish cost incurred
> once per data type in the collection. The first time we run into a data type,
> we see if it's in the registry and, if it is, use that coder; otherwise cache
> the result (`types_not_in_registry: set`). All data types not in the registry
> could then just be fast-tracked to the PickleCoder as before.
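
Concretely, the proposed per-type cache might look like this (a
hypothetical sketch; lookup_coder stands in for whatever registry probe
the SDK would use, returning None for unregistered types):

    import pickle

    def make_encoder(lookup_coder):
        seen = {}  # type -> Coder or None, resolved once per type

        def encode(element):
            t = type(element)
            if t not in seen:
                seen[t] = lookup_coder(t)
            coder = seen[t]
            return coder.encode(element) if coder else pickle.dumps(element)

        return encode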

The risk is that if I register a coder for unpicklable type T and I
have a 

Re: (python SDK) "Any" coder bypasses registry coders

2024-01-05 Thread Joey Tran
Oh, actually, overriding the fallback coder doesn't do anything here, because
the issue is not with the fallback coders in the registry but with
FastPrimitivesCoder's own fallback coder.
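
To make the distinction concrete, a sketch (register_fallback_coder and the
pickle-based default inside FastPrimitivesCoder reflect my reading of the
SDK; MyCoder is a degenerate stub just to make the calls runnable):

    from apache_beam import coders
    from apache_beam.coders.coders import FastPrimitivesCoder, PickleCoder

    class MyCoder(coders.Coder):
        # Hypothetical stand-in for a real custom coder.
        def encode(self, value):
            return b''

        def decode(self, encoded):
            return None

    # This overrides the *registry's* fallback, consulted when no coder is
    # registered for a concrete type:
    coders.registry.register_fallback_coder(MyCoder())

    # But an Any-typed PCollection gets FastPrimitivesCoder, which carries
    # its own fallback for unknown element types and does not consult the
    # registry's fallback list:
    fpc = FastPrimitivesCoder()  # i.e. FastPrimitivesCoder(PickleCoder())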


(python SDK) "Any" coder bypasses registry coders

2024-01-05 Thread Joey Tran
I've been working with a few data types that are in practice
unpicklable, and I've run into a couple of issues stemming from the `Any` type
hint, which, when used, results in the PickleCoder being used even if
there's a coder in the coder registry that matches the data element.

This was pretty unexpected to me and can result in pretty cryptic
downstream issues. In the best case, you get an error at pickling time [1],
and in the worst case, the pickling "succeeds" (since many objects can get
(de)pickled without obvious error) but then results in downstream issues
(e.g. some data doesn't survive depickling). One example case of the latter
is if you flatten a few pcollections including a pcollection generated by
`beam.Create([])` (the inferred output type of an empty Create becomes Any).

Would it make sense to introduce a new fallback coder that takes precedence
over the `PickleCoder` that encodes both the data type (by just pickling
it) and the data encoded using the registry-found coder? This would have
some space ramifications for storing the data type for every element. Of
course, this coder would only kick in _if_ the data type was found in the
registry; otherwise we'd proceed to the PickleCoder like we do currently.

[1] https://github.com/apache/beam/issues/29908 (Issue arises from
ReshuffleFromKey using `Any` as a pcollection type)
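
For reference, a minimal sketch of the flatten scenario and the type-hint
workaround (DeviceHandle is a hypothetical stand-in for a
registered-but-unpicklable type):

    import apache_beam as beam

    class DeviceHandle(object):
        def __init__(self, raw_bytes):
            self.raw_bytes = raw_bytes

    with beam.Pipeline() as p:
        handles = beam.Create([DeviceHandle(b'x')]).with_output_types(DeviceHandle)
        typed = p | 'Typed' >> handles
        empty = p | 'Empty' >> beam.Create([])  # element type inferred as Any

        # Flatten unifies DeviceHandle with Any into Any, so downstream stages
        # fall back to pickling even if a coder is registered for DeviceHandle.
        merged = (typed, empty) | 'Bad' >> beam.Flatten()

        # Workaround: pin the element type on the empty Create as well.
        empty_t = p | 'EmptyTyped' >> beam.Create([]).with_output_types(DeviceHandle)
        fixed = (typed, empty_t) | 'Good' >> beam.Flatten()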


[ACTION REQUESTED] Help me draft the Beam Board Report for January 2024

2024-01-05 Thread Kenneth Knowles
Hi all,

The next Beam board report is due next Wednesday, January 10. Please help
me to draft it at https://s.apache.org/beam-draft-report-2024-01. The doc
is open for anyone to edit.

Ideas:

 - highlights from CHANGES.md
 - interesting technical discussions
 - integrations with other projects
 - community events
 - major user-facing additions/deprecations

Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for
examples.

Thanks,

Kenn


Re: Setting up beam locally

2024-01-05 Thread Shunping Huang via dev
That's right. You need to put quotes (single quotes or double quotes)
around .[gcp,test], since the square brackets are interpreted as a
glob pattern by the shell.

We will also need to add the quotes to the command listed in the Beam wiki
page:
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-VirtualEnvironmentSetup




On Fri, Jan 5, 2024 at 7:44 AM XQ Hu via dev  wrote:

> Assuming you run this under beam/sdks/python, since I use zsh, I have to
> do this `pip install -e ".[gcp,test]"`.
>
> On Fri, Jan 5, 2024 at 1:33 AM G Gautam  wrote:
>
>> Hi everyone,
>>
>> Need help setting up Beam locally.
>>
>> When trying to set it up locally,
>> on entering this command: pip install -e .[gcp,test]
>> I am getting: no matches found: .[gcp,test]
>> Am I missing anything?
>>
>> I am following these steps
>> -
>> $ python3 -m venv ~/.virtualenvs/env
>>
>> # Activate virtual environment.
>> $ . ~/.virtualenvs/env/bin/activate
>>
>> # Upgrade other tools. (Optional)
>> pip install --upgrade pip
>> pip install --upgrade setuptools
>>
>> # Install Apache Beam package in editable mode.
>> (env) $ pip install -e .[gcp,test]
>> -
>> Slack link
>> 
>>
>> Thanks,
>>
>> Gautam
>>
>>
>



Beam High Priority Issue Report (51)

2024-01-05 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/29926 [Bug]: FileIO: lack of timeouts may 
cause the pipeline to get stuck indefinitely
https://github.com/apache/beam/issues/29912 [Bug]: floatValueExtractor judge 
float and double equality directly
https://github.com/apache/beam/issues/29825 [Bug]: Usage of logical types 
breaks Beam YAML Sql
https://github.com/apache/beam/issues/29413 [Bug]: Can not use Avro over 1.8.2 
with Beam 2.52.0
https://github.com/apache/beam/issues/29099 [Bug]: FnAPI Java SDK Harness 
doesn't update user counters in OnTimer callback functions
https://github.com/apache/beam/issues/29022 [Failing Test]: Python Github 
actions tests are failing due to update of pip 
https://github.com/apache/beam/issues/28760 [Bug]: EFO Kinesis IO reader 
provided by apache beam does not pick the event time for watermarking
https://github.com/apache/beam/issues/28715 [Bug]: Python WriteToBigtable get 
stuck for large jobs due to client dead lock
https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests