Dependabot questions

2023-02-27 Thread Valentyn Tymofieiev via dev
I noticed that human-readable dependency reports are not being generated.
Can this functionality be replaced with Dependabot?

Does Dependabot provide a view of what is currently outdated from its
standpoint?

Also, I noticed that some dependencies are outdated yet have not been updated
by Dependabot, possibly because a prior update PR was silenced. Is it possible
to see which dependencies are currently opted out?


Thanks!


Re: Consuming one PCollection before consuming another with Beam

2023-02-27 Thread Reuven Lax via dev
How large is this state spec stored in BQ? If the size isn't too large, you
can read it from BQ and make it a side input into the DoFn.

On Mon, Feb 27, 2023 at 11:06 AM Sahil Modak 
wrote:

> We are trying to re-initialize our state specs in the BusinessLogic() DoFn
> from BQ.
> BQ has data about the state spec, and we would like to make sure that the
> state specs in our BusinessLogic() dofn are initialized before it starts
> consuming the pub/sub.
>
> This is for handling the case of redeployment of the dataflow jobs so that
> the states are preserved and the BusinessLogic() can work seamlessly as it
> was previously. All our dofns are operating in a global window and do not
> perform any aggregation.
>
> We are currently using Redis to preserve the state spec information but
> would like to explore using BQ as an alternative to Redis.


Re: Consuming one PCollection before consuming another with Beam

2023-02-27 Thread Niel Markwick via dev
Why not pass the BQ data as a side input to your transform?

Side inputs are read fully and materialised before the transform starts.

This will allow your transform to initialize its state before processing
any elements from the PubSub input.
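
The shape of this suggestion can be modeled in plain Java, without Beam on the classpath. This is only a sketch under assumptions: the class and method names are hypothetical, and the BQ rows are assumed to be (key, stateSpec) string pairs small enough to hold in memory. In an actual pipeline the snapshot would be a PCollectionView (for example built with View.asMap()) attached through ParDo.of(...).withSideInputs(...) and read with ProcessContext.sideInput(...); that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the side-input idea; no Beam classes are used here.
// All names in this class are hypothetical and exist only for illustration.
final class SideInputSketch {

    // Stand-in for the BQ read: collapse all (key, stateSpec) rows into one
    // read-only map before any streaming element is touched.
    static Map<String, String> materialize(List<String[]> bqRows) {
        Map<String, String> snapshot = new HashMap<>();
        for (String[] row : bqRows) {
            snapshot.put(row[0], row[1]);
        }
        return Collections.unmodifiableMap(snapshot);
    }

    // Stand-in for the per-element logic: each Pub/Sub message key is looked
    // up in the already complete snapshot; unknown keys fall back to "new".
    static List<String> process(List<String> pubsubKeys, Map<String, String> snapshot) {
        List<String> out = new ArrayList<>();
        for (String key : pubsubKeys) {
            out.add(key + "=" + snapshot.getOrDefault(key, "new"));
        }
        return out;
    }
}
```

Note that the snapshot is read-only from the point of view of the processing step, which matches how a side input behaves; any state that must change per element would still have to live somewhere else.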

On Mon, 27 Feb 2023, 20:43 Daniel Collins via dev, 
wrote:

> It sounds like what you're doing here might be best done outside the beam
> model. Instead of performing the initial computation reading from BQ into a
> PCollection, perform it using the BigQuery client library in the same
> manner as you currently do to load the data from redis.

Re: Consuming one PCollection before consuming another with Beam

2023-02-27 Thread Daniel Collins via dev
It sounds like what you're doing here might be best done outside the Beam
model. Instead of performing the initial computation by reading from BQ into a
PCollection, perform it using the BigQuery client library, in the same
manner as you currently load the data from Redis.
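
As a rough illustration of that pattern, here is a plain-Java sketch with no Beam or BigQuery dependencies. Everything in it is an assumption made for the example: the class name, the use of a Supplier as a stand-in for the BigQuery client query, and the choice of a per-key counter as the "state". In a Beam DoFn the load would most likely sit in a method annotated with @Setup, which runs once per DoFn instance before that instance processes elements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java sketch; the Supplier is a hypothetical stand-in for a query
// issued through the BigQuery client library (or the existing Redis lookup).
final class SetupLoadSketch {
    private final Supplier<Map<String, Long>> stateSource;
    private Map<String, Long> state; // null until setup() has run

    SetupLoadSketch(Supplier<Map<String, Long>> stateSource) {
        this.stateSource = stateSource;
    }

    // Plays the role of a DoFn setup hook: load the saved state once,
    // before this instance handles any element.
    void setup() {
        state = new HashMap<>(stateSource.get());
    }

    // Plays the role of per-element processing: bump the restored counter
    // for the key and return its new value.
    long process(String key) {
        if (state == null) {
            throw new IllegalStateException("setup() must run before process()");
        }
        return state.merge(key, 1L, Long::sum);
    }
}
```

One caveat on this design: each instance loads and mutates its own in-memory copy, so this is not equivalent to Beam's per-key state, and several instances would not see each other's updates.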

On Mon, Feb 27, 2023 at 2:07 PM Sahil Modak via dev 
wrote:

> We are trying to re-initialize our state specs in the BusinessLogic() DoFn
> from BQ.
> BQ has data about the state spec, and we would like to make sure that the
> state specs in our BusinessLogic() dofn are initialized before it starts
> consuming the pub/sub.
>
> This is for handling the case of redeployment of the dataflow jobs so that
> the states are preserved and the BusinessLogic() can work seamlessly as it
> was previously. All our dofns are operating in a global window and do not
> perform any aggregation.
>
> We are currently using Redis to preserve the state spec information but
> would like to explore using BQ as an alternative to Redis.


Re: Consuming one PCollection before consuming another with Beam

2023-02-27 Thread Sahil Modak via dev
We are trying to re-initialize our state specs in the BusinessLogic() DoFn
from BQ.
BQ has data about the state spec, and we would like to make sure that the
state specs in our BusinessLogic() DoFn are initialized before it starts
consuming from Pub/Sub.

This is for handling redeployment of the Dataflow jobs, so that the states
are preserved and BusinessLogic() can work seamlessly as it did previously.
All our DoFns operate in the global window and do not perform any
aggregation.

We are currently using Redis to preserve the state spec information but
would like to explore using BQ as an alternative to Redis.

On Fri, Feb 24, 2023 at 12:51 PM Kenneth Knowles  wrote:

> My suggestion is to try to solve the problem in terms of what you want to
> compute. Instead of trying to control the operational aspects like "read
> all the BQ before reading Pubsub" there is presumably some reason that the
> BQ data naturally "comes first", for example if its timestamps are earlier
> or if there is a join or an aggregation that must include it. Whenever you
> think you want to set up an operational dependency between two things that
> "happen" in a pipeline, it is often best to pivot your thinking to the data
> and what you are trying to compute, and the built-in dependencies will
> solve the ordering problems.
>
> So - is there a way to describe your problem in terms of the data and what
> you are trying to compute?
>
> Kenn
>
> On Fri, Feb 24, 2023 at 10:46 AM Reuven Lax via dev 
> wrote:
>
>> First, PCollections are completely unordered, so there is no guarantee on
>> what order you'll see events in the flattened PCollection.
>>
>> There may be ways to process the BigQuery data in a separate transform
>> first, but it depends on the structure of the data. How large is the
>> BigQuery table? Are you doing any windowed aggregations here?
>>
>> Reuven
>>
>> On Fri, Feb 24, 2023 at 10:40 AM Sahil Modak 
>> wrote:
>>
>>> Yes, this is a streaming pipeline.
>>>
>>> Some more details about existing implementation v/s what we want to
>>> achieve.
>>>
>>> Current implementation:
>>> Reading from pub-sub:
>>>
>>> Pipeline input = Pipeline.create(options);
>>>
>>> PCollection pubsubStream = input.apply("Read From Pubsub",
>>>     PubsubIO.readMessagesWithAttributesAndMessageId()
>>>         .fromSubscription(inputSubscriptionId));
>>>
>>>
>>> Reading from bigquery:
>>>
>>> PCollection bqStream = input.apply("Read from BQ", BigQueryIO
>>>         .readTableRows().fromQuery(bqQuery).usingStandardSql())
>>>     .apply("JSon Transform", AsJsons.of(TableRow.class));
>>>
>>>
>>> Merge the inputs:
>>>
>>> PCollection mergedInput =
>>>     PCollectionList.of(pubsubStream).and(bqStream).apply("Merge Input",
>>>         Flatten.pCollections());
>>>
>>>
>>> Business Logic:
>>>
>>> mergedInput.apply("Business Logic", ParDo.of(new BusinessLogic()));
>>>
>>>
>>>
>>> Above logic is what we use currently in our pipeline.
>>>
>>> We want to make sure that we read from BigQuery first & pass the bqStream 
>>> through our BusinessLogic() before we start consuming pubsubStream.
>>>
>>> Is there a way to achieve this?
>>>
>>>
>>> Thanks,
>>>
>>> Sahil
>>>
>>>
>>> On Thu, Feb 23, 2023 at 10:21 PM Reuven Lax  wrote:
>>>
>>>> Can you explain this use case some more? Is this a streaming pipeline?
>>>> If so, how are you reading from BigQuery?
>>>>
>>>> On Thu, Feb 23, 2023 at 10:06 PM Sahil Modak via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a requirement wherein we are consuming input from pub/sub
>>>>> (PubSubIO) as well as BQ (BQIO)
>>>>>
>>>>> We want to make sure that we consume the BQ stream first before we
>>>>> start consuming the data from pub-sub. Is there a way to achieve this? Can
>>>>> you please help with some code samples?
>>>>>
>>>>> Currently, we read data from big query using BigQueryIO into a
>>>>> PCollection & also read data from pubsub using PubsubIO. We then use the
>>>>> flatten transform in this manner.
>>>>>
>>>>> PCollection pubsubKvPairs = reads from pubsub using PubsubIO
>>>>> PCollection bigqueryKvPairs = reads from bigquery using BigQueryIO
>>>>>
>>>>> kvPairs =
>>>>> PCollectionList.of(pubsubKvPairs).and(bigqueryKvPairs).apply("Merge
>>>>> Input", Flatten.pCollections());
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Sahil
>>>>>
>>>>>


Re: Introduction and Idea for GSOC 2023.

2023-02-27 Thread Siddharth Aryan
Hello Pabloem,
I'm waiting for your reply and am eager to work under your guidance. I hope
you are doing well.

On Thu, Feb 23, 2023, 8:03 PM Siddharth Aryan 
wrote:

> Hello Pablo,
>
> I hope this email finds you well. My name is Siddharth Aryan. I am a
> second-year undergraduate student pursuing a Bachelor's in Computer
> Application, and I am very interested in participating in Google Summer of
> Code this year. I am writing to introduce myself and express my interest in
> your organization.
>
> I started contributing to Beam in February 2023. Since then I have been
> focused on Apache Beam, sharing and learning about its components such as
> PTransforms, Pipelines, PCollections, watermarks, and more. I have a rough
> idea: create an online Python SDK where every Apache Beam runner, or
> Dataflow, can be run. The idea is still rough, and it comes from my own
> limitations. My laptop still runs Windows 7 with 2 GB of RAM and an Intel 2
> processor, so I was unable to install many of the things needed to run
> Beam pipelines.
>
> I would love to discuss this idea further with you and get your thoughts
> on it. Additionally, I would appreciate any feedback you may have on my
> proposal and any advice on how I can better prepare myself for Google
> Summer of Code.
>
> Thank you for taking the time to read my email. I look forward to hearing
> back from you soon.
>
> Best regards,
> Siddharth Aryan
>
> I also shared a post about Apache Beam that helped upcoming contributors:
>
> https://twitter.com/Siddhar33812778/status/1626234564648787969?t=EsYzDyQq5jGKS17V4ko94w=19
>


Beam High Priority Issue Report (36)

2023-02-27 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to lock gradle
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of `DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests should export the pipeline Job ID and console output to Jenkins Test Result section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/25412 [Feature Request]: Google Cloud Bigtable Change Stream Connector
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21645 beam_PostCommit_XVR_GoUsingJava_Dataflow fails on some test transforms
https://github.com/apache/beam/issues/21476