Re: Consuming one PCollection before consuming another with Beam

2023-02-23 Thread Reuven Lax via dev
Can you explain this use case some more? Is this a streaming pipeline? If
so, how are you reading from BigQuery?

On Thu, Feb 23, 2023 at 10:06 PM Sahil Modak via dev 
wrote:

> Hi,
>
> We have a requirement wherein we are consuming input from pub/sub
> (PubSubIO) as well as BQ (BQIO)
>
> We want to make sure that we consume the BQ stream first before we start
> consuming the data from pub-sub. Is there a way to achieve this? Can you
> please help with some code samples?
>
> Currently, we read data from big query using BigQueryIO into a PCollection
> & also read data from pubsub using PubsubIO. We then use the flatten
> transform in this manner.
>
> PCollection pubsubKvPairs = reads from pubsub using PubsubIO
> PCollection bigqueryKvPairs = reads from bigquery using BigQueryIO
>
> kvPairs = PCollectionList.of(pubsubKvPairs).and(bigqueryKvPairs).apply("Merge 
> Input", Flatten.pCollections());
>
>
> Thanks,
> Sahil
>
>


Consuming one PCollection before consuming another with Beam

2023-02-23 Thread Sahil Modak via dev
Hi,

We have a requirement wherein we are consuming input from pub/sub
(PubSubIO) as well as BQ (BQIO)

We want to make sure that we consume the BQ stream first before we start
consuming the data from pub-sub. Is there a way to achieve this? Can you
please help with some code samples?

Currently, we read data from big query using BigQueryIO into a PCollection
& also read data from pubsub using PubsubIO. We then use the flatten
transform in this manner.

PCollection pubsubKvPairs = reads from pubsub using PubsubIO
PCollection bigqueryKvPairs = reads from bigquery using BigQueryIO

kvPairs = PCollectionList.of(pubsubKvPairs).and(bigqueryKvPairs).apply("Merge
Input", Flatten.pCollections());


Thanks,
Sahil


Beam Website Feedback: Request: Docon Error handling patterns

2023-02-23 Thread nirav patel
It’d be nice to see some common error handling patterns;
1) how to handle per op bases : map, flatMap, parDo
2) how to handle uniformly across multiple ops
..




Re: Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Talat Uyarer via dev
Would you like to be a volunteer  +Andrew Pilloud  :)


On Thu, Feb 23, 2023 at 4:51 PM Andrew Pilloud  wrote:

> The bot says there are no reviewers for Flink. Possibly you'll find a
> volunteer to review it here?
>
> On Thu, Feb 23, 2023 at 4:47 PM Talat Uyarer via dev 
> wrote:
>
>> Hi,
>>
>> I created a bugfix for Flink Runner backlog metrics. I asked OWNERs and
>> try to run assign reviewer command. But I am not sure. I pressed the right
>> button :)
>>
>> If you know some who can review this change
>> https://github.com/apache/beam/pull/25554
>> 
>>
>> Could you assign him/her to this mr ?
>>
>> Thanks
>>
>


Re: Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Andrew Pilloud via dev
The bot says there are no reviewers for Flink. Possibly you'll find a
volunteer to review it here?

On Thu, Feb 23, 2023 at 4:47 PM Talat Uyarer via dev 
wrote:

> Hi,
>
> I created a bugfix for Flink Runner backlog metrics. I asked OWNERs and
> try to run assign reviewer command. But I am not sure. I pressed the right
> button :)
>
> If you know some who can review this change
> https://github.com/apache/beam/pull/25554
>
> Could you assign him/her to this mr ?
>
> Thanks
>


Review ask for Flink Runner Backlog Metric Bug Fix

2023-02-23 Thread Talat Uyarer via dev
Hi,

I created a bugfix for Flink Runner backlog metrics. I asked OWNERs and try
to run assign reviewer command. But I am not sure. I pressed the right
button :)

If you know some who can review this change
https://github.com/apache/beam/pull/25554

Could you assign him/her to this mr ?

Thanks


Re: Beam Release 2.46

2023-02-23 Thread Valentyn Tymofieiev via dev
Thanks for the update!

I'd like to suggest that we include in the release voting email template a
link to a PR that runs all tests against the release branch. I think we
used to include it, but I haven't seen it in recent voting threads.

Thanks,
Valentyn

On Thu, Feb 23, 2023 at 9:28 AM Danny McCormick via dev 
wrote:

> I cut the release branch yesterday at commit
> 4ce8eeda19699cc64ae8cf310267a478cfe9e4b8
> .
> There is currently one open release blocking issue
> :
>
> - #25601: [Failing Test]: Python PostCommit failing due to duplicate
> AvroSchemaIO autoservice 
>  - @Alexey Romanenko , @Yi Hu ,
> and Moritz Mack have been working together on resolving that issue (Alexey
> has a pr up here - #25611 ).
>
> Once that blocker is resolved/cherry picked, I will work on generating the
> first release candidate
>
> Thanks,
> Danny
>
>
>
>
> On Wed, Feb 15, 2023 at 10:05 AM Danny McCormick <
> dannymccorm...@google.com> wrote:
>
>> > Do you mind if I shadow you while you do this?
>>
>> Sure!
>>
>> On Tue, Feb 14, 2023 at 12:32 PM Damon Douglas 
>> wrote:
>>
>>> Hello Danny,
>>>
>>> Do you mind if I shadow you while you do this?
>>>
>>> Best,
>>>
>>> Damon
>>>
>>> On Thu, Feb 9, 2023 at 3:17 PM Kenneth Knowles  wrote:
>>>
 Excellent! Keep that release train rolling.

 On Thu, Feb 9, 2023 at 9:28 AM Ahmet Altay via dev 
 wrote:

> Thank you Danny!
>
> On Wed, Feb 8, 2023 at 6:46 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> Hey everyone, I would like to volunteer myself to do the 2.46.0
>> release.
>>
>> I will cut the branch Feb 22 [1], and cherrypick any blocking fixes
>> afterwards. Please review the current release blockers [2] and remove the
>> 2.46 milestone if they don't meet the criteria at [3].
>>
>> Thanks,
>> Danny
>>
>> [1]
>> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>> [2] https://github.com/apache/beam/milestone/9
>> [3] https://beam.apache.org/contribute/release-blocking/
>>
>


Introduction and Idea for GSOC 2023.

2023-02-23 Thread Siddharth Aryan
HELLO Pablo,

I hope this email finds you well. My name is Siddharth Aryan a 2 year
undergrad student in Bachelors in Computer Application, and a student who
is very interested in participating in Google Summer of Code this year. I
am writing to introduce myself and express my interest in your organization.

I started contributing to Beam from Febuary,2023.Since then it's all Apache
Beam,sharing and learn about its components like
Ptransforms,Pipeline,Pcollections ,watermark and many more.There was a
rough idea that to create online Python SDK where every Apache runner or
Beam Dataflow can be run.The idea is bit rough but I got this from my
limitations. So what actually happened was i have laptop which still has
Windows 7,2gb ram,and Intel 2,So because of these i was unable to install
many things In respect to Apache to run piplines.

I would love to discuss this idea further with you and get your thoughts on
it. Additionally, I would appreciate any feedback you may have on my
proposal and any advice on how I can better prepare myself for Google
Summer of Code.

Thank you for taking the time to read my email. I look forward to hearing
back from you soon.

Best regards,
Siddharth Aryan

Shared about APACHE BEAM WHICH  HELPED THE UPCOMING CONTRIBUTORS
https://twitter.com/Siddhar33812778/status/1626234564648787969?t=EsYzDyQq5jGKS17V4ko94w=19


Beam Dependency Check Report (2023-02-23)

2023-02-23 Thread Apache Jenkins Server
<<< text/html; charset=UTF-8: Unrecognized >>>


Re: Beam Release 2.46

2023-02-23 Thread Danny McCormick via dev
I cut the release branch yesterday at commit
4ce8eeda19699cc64ae8cf310267a478cfe9e4b8
.
There is currently one open release blocking issue
:

- #25601: [Failing Test]: Python PostCommit failing due to duplicate
AvroSchemaIO autoservice 
 - @Alexey Romanenko , @Yi Hu ,
and Moritz Mack have been working together on resolving that issue (Alexey
has a pr up here - #25611 ).

Once that blocker is resolved/cherry picked, I will work on generating the
first release candidate

Thanks,
Danny




On Wed, Feb 15, 2023 at 10:05 AM Danny McCormick 
wrote:

> > Do you mind if I shadow you while you do this?
>
> Sure!
>
> On Tue, Feb 14, 2023 at 12:32 PM Damon Douglas 
> wrote:
>
>> Hello Danny,
>>
>> Do you mind if I shadow you while you do this?
>>
>> Best,
>>
>> Damon
>>
>> On Thu, Feb 9, 2023 at 3:17 PM Kenneth Knowles  wrote:
>>
>>> Excellent! Keep that release train rolling.
>>>
>>> On Thu, Feb 9, 2023 at 9:28 AM Ahmet Altay via dev 
>>> wrote:
>>>
 Thank you Danny!

 On Wed, Feb 8, 2023 at 6:46 AM Danny McCormick via dev <
 dev@beam.apache.org> wrote:

> Hey everyone, I would like to volunteer myself to do the 2.46.0
> release.
>
> I will cut the branch Feb 22 [1], and cherrypick any blocking fixes
> afterwards. Please review the current release blockers [2] and remove the
> 2.46 milestone if they don't meet the criteria at [3].
>
> Thanks,
> Danny
>
> [1]
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
> [2] https://github.com/apache/beam/milestone/9
> [3] https://beam.apache.org/contribute/release-blocking/
>



Beam High Priority Issue Report (37)

2023-02-23 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/25601 [Failing Test]: Python PostCommit 
failing due to duplicate AvroSchemaIO autoservice
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24367 [Bug]: workflow.tar.gz cannot be 
passed to flink runner
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/24267 [Failing Test]: Timeout waiting to 
lock gradle
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/22969 Discrepancy in behavior of 
`DoFn.process()` when `yield` is combined with `return` statement, or vice versa
https://github.com/apache/beam/issues/22961 [Bug]: WriteToBigQuery silently 
skips most of records without job fail
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22115 [Bug]: 
apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses
 is flaky
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output 
to Failed Inserts PCollection
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not 
raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it
 flakey
https://github.com/apache/beam/issues/21104 Flaky: 
apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 
apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics
 is flaky
https://github.com/apache/beam/issues/20974 Python GHA PreCommits flake with 
grpc.FutureTimeoutError on SDK harness startup
https://github.com/apache/beam/issues/20689 Kafka commitOffsetsInFinalize OOM 
on Flink
https://github.com/apache/beam/issues/20108 Python direct runner doesn't emit 
empty pane when it should
https://github.com/apache/beam/issues/19814 Flink streaming flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful
https://github.com/apache/beam/issues/19465 Explore possibilities to lower 
in-use IP address quota footprint.
https://github.com/apache/beam/issues/19241 Python Dataflow integration tests 
should export the pipeline Job ID and console output to Jenkins Test Result 
section


P1 Issues with no update in the last week:

https://github.com/apache/beam/issues/25412 [Feature Request]: Google Cloud 
Bigtable Change Stream Connector
https://github.com/apache/beam/issues/23875 [Bug]: beam.Row.__eq__ returns true 
for unequal rows
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently