Re: Disabling Jenkins Jobs

2023-09-12 Thread Ahmet Altay via dev
Thank you for doing this.

Is there a list of jobs that will be disabled? I am particularly curious
about: website publishing job (which I need to use manually sometimes) and
the job that publishes daily staging builds (which we share with users
sometimes.)

Thank you.
Ahmet

On Tue, Sep 12, 2023 at 11:14 AM Danny McCormick via dev <
dev@beam.apache.org> wrote:

> Hey everyone, I wanted to let you know that as part of the migration from
> Jenkins to GitHub Actions we are going to start disabling Jenkins jobs if
> they have a corresponding GitHub Actions job that has been running
> successfully for a while. We are starting with Yi's PR here -
> https://github.com/apache/beam/pull/28316. This is the next step in the
> process we kicked off last year [1] now that self-hosted runners have been
> in place and working for a while [2].
>
> We will not migrate jobs until we've confirmed we have parity with the
> existing Jenkins implementations (for example, some jobs are still missing
> test publishing and we won't remove the Jenkins version until they have
> it). In the meantime, migrating some load off should help reduce the
> overall load on Jenkins so that it experiences fewer issues.
>
> If you have any objections to this approach, please respond here. If you
> run into any problems, please file an issue and tag me (@damccorm), Yi
> (@abacn), Andrey (@andreydevyatkin), or Vlado (@volatilemolotov) - or just
> tag all of us :).
>
> Thanks,
> Danny
>
> [1] https://lists.apache.org/thread/0brbkmbd522d1ow43gx5b13dmywt2dgn
> [2] https://lists.apache.org/thread/3k1owt5k16byv39b9lszd3l7qv7od4r8
>


Disabling Jenkins Jobs

2023-09-12 Thread Danny McCormick via dev
Hey everyone, I wanted to let you know that as part of the migration from
Jenkins to GitHub Actions we are going to start disabling Jenkins jobs if
they have a corresponding GitHub Actions job that has been running
successfully for a while. We are starting with Yi's PR here -
https://github.com/apache/beam/pull/28316. This is the next step in the
process we kicked off last year [1] now that self-hosted runners have been
in place and working for a while [2].

We will not migrate jobs until we've confirmed we have parity with the
existing Jenkins implementations (for example, some jobs are still missing
test publishing and we won't remove the Jenkins version until they have
it). In the meantime, migrating some load off should help reduce the
overall load on Jenkins so that it experiences fewer issues.

If you have any objections to this approach, please respond here. If you
run into any problems, please file an issue and tag me (@damccorm), Yi
(@abacn), Andrey (@andreydevyatkin), or Vlado (@volatilemolotov) - or just
tag all of us :).

Thanks,
Danny

[1] https://lists.apache.org/thread/0brbkmbd522d1ow43gx5b13dmywt2dgn
[2] https://lists.apache.org/thread/3k1owt5k16byv39b9lszd3l7qv7od4r8


Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-12 Thread Chamikara Jayalath via dev
Thanks Mazlum, this sounds great. I think there are two ways we can proceed
if we decide to integrate the Asgarde library into Beam.

(1) Directly import the code into Beam without significant modifications
and/or a review (though we may add tests).

(2) Go through a design/code review to determine whether this is the best
approach for implementing error handling / DLQ in Beam transforms or
whether there are other alternatives/modifications to Asgarde we want to
consider.

If we do (1) I prefer adding Asgarde as a separate Gradle module in Beam.
We can later integrate it into the core module after a design/code review.

Thanks,
Cham



On Tue, Sep 12, 2023 at 10:26 AM Mazlum TOSUN 
wrote:

> Hello Austin and everyone,
>
> I am open to discussion.
>
> My first intention with Asgarde was to help the Beam community, because
> dead letter queues are so important in Beam and in all data pipeline
> frameworks.
> When I worked with Beam in production with my customers, we needed to
> catch errors with side outputs and a dead letter queue.
>
> This library really helped us keep our code less verbose while applying
> all the error handling logic, which is error-prone and verbose when it is
> repeated.
>
> As Kenn said, my intention was to stay as close as possible to Beam,
> with a wrapper and a failure monad on top of a PCollection, to handle all
> the code and complexity of try/catch blocks and side outputs.
>
> For the governance, even if I am the creator of this library, what matters
> most isn't me but the community and helping the community.
> If the best way to help the community is to include the library
> directly in Beam, we can go in this direction, with of course your reviews
> and recommendations.
>
> Then the library will belong to the community and we will continue to
> improve it.
>
> For the decision about the best place, I will comply with the majority.
>
> Best regards,
>
> Mazlum
>
> On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett  wrote:
>
>> @Mazlum TOSUN -- you and I have spoken a few times about this. It'd be
>> good for you to comment here on the list about any concerns with
>> governance and/or other thoughts. For example: whether you think
>> contributing Asgarde directly is the right thing [ or perhaps expressing any
>> interest in helping write/contribute the relevant functionality into Beam ...
>> it is possible that by adding the actual functionality into Beam, like the
>> 'other place' Kenn mentioned, we could make Asgarde as a separate add-on
>> obsolete ].
>>
>>
>>
>> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles  wrote:
>>
>>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of
>>> it is that it adds the "failure monad" aka "andThen" style error/result
>>> handling on top of chaining of PCollections. So it is at a similar level of
>>> abstraction to our basic transforms and generally useful for chaining
>>> dead-letter side outputs. It is no more or less appropriate for the core
>>> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we
>>> actually aspired to have a thin core with accessories like that in
>>> another place, then it should go to that other place.
>>>
>>> Kenn
>>>
>>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> > until we *require* Asgarde on a core transform, it shouldn't be in the
>>>> main repo
>>>>
>>>> I don't think this is necessarily true if it solves end user use cases.
>>>> If there is a specific transform that solves a specific use case, we could
>>>> include it in the transforms folder for end-users, even if it isn't
>>>> utilized in the I/Os at present. Hence the suggestion to take the most
>>>> promising transforms and propose adding them with documentation, APIs and
>>>> rationale.
>>>>
>>>> -Daniel
>>>>
>>>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke
>>>> wrote:
>>>>
>>>>> I would say until we *require* Asgarde on a core transform, it
>>>>> shouldn't be in the main repo.
>>>>>
>>>>> Incorporating something before there's a need for it is premature
>>>>> abstraction. We can't do things because they *might* be useful. Let's see
>>>>> concrete places where they are useful, or where we already have a similar
>>>>> need solved a different way.
>>>>>
>>>>> Beam is complicated by itself, and we do encourage multiple ways of
>>>>> solving problems, but that says to me that having an out of repo ecosystem
>>>>> is the right path, rather than incorporation.
>>>>>
>>>>> On Fri, Sep 8, 2023, 8:14 AM Daniel Collins via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>>> I think there are a lot of interesting and relatively isolated
>>>>>> components of the project. It might make sense to write per-transform
>>>>>> one-pagers for isolated things like the most useful pieces (just basically
>>>>>> copying the documentation and justifying the API) instead of doing a
>>>>>> one-shot import or having it live forever in an external project.
>>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>> On Fri, Sep 8

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-12 Thread Mazlum TOSUN
Hello Austin and everyone,

I am open to discussion.

My first intention with Asgarde was to help the Beam community, because
dead letter queues are so important in Beam and in all data pipeline
frameworks.
When I worked with Beam in production with my customers, we needed to catch
errors with side outputs and a dead letter queue.

This library really helped us keep our code less verbose while applying
all the error handling logic, which is error-prone and verbose when it is
repeated.

As Kenn said, my intention was to stay as close as possible to Beam, with
a wrapper and a failure monad on top of a PCollection, to handle all the
code and complexity of try/catch blocks and side outputs.

For the governance, even if I am the creator of this library, what matters
most isn't me but the community and helping the community.
If the best way to help the community is to include the library
directly in Beam, we can go in this direction, with of course your reviews
and recommendations.

Then the library will belong to the community and we will continue to
improve it.

For the decision about the best place, I will comply with the majority.

Best regards,

Mazlum
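
[Editor's sketch] For readers unfamiliar with the pattern discussed in this
thread, here is a minimal illustration of "failure monad" / "andThen" style
chaining with a dead-letter output. It is plain Python, with lists standing in
for PCollections. The names Result, Failure, and and_then are hypothetical and
chosen for illustration only; this is not the Asgarde API or the Beam API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Failure:
    """A rejected element, with the step name and exception that rejected it."""
    step: str
    element: Any
    error: Exception


@dataclass
class Result:
    """Good outputs plus accumulated failures (the dead-letter side output)."""
    outputs: List[Any]
    failures: List[Failure] = field(default_factory=list)

    def and_then(self, step: str, fn: Callable[[Any], Any]) -> "Result":
        """Apply fn to each good element; route any exception to failures."""
        next_outputs: List[Any] = []
        next_failures = list(self.failures)  # carry earlier failures forward
        for element in self.outputs:
            try:
                next_outputs.append(fn(element))
            except Exception as exc:  # broad on purpose: this is the DLQ boundary
                next_failures.append(Failure(step, element, exc))
        return Result(next_outputs, next_failures)


# Hypothetical two-step pipeline: parse strings, then divide 10 by each value.
result = (
    Result(["1", "2", "oops", "0"])
    .and_then("parse", int)
    .and_then("invert", lambda n: 10 // n)
)
print(result.outputs)                                  # [10, 5]
print([(f.step, f.element) for f in result.failures])  # [('parse', 'oops'), ('invert', 0)]
```

The point of the sketch is that each step contains no try/except of its own:
the chaining wrapper catches errors once and keeps the failed element together
with the step that rejected it, which is the repetition the thread says the
library removes.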

On Mon, Sep 11, 2023 at 11:15 PM Austin Bennett  wrote:

> @Mazlum TOSUN -- you and I have spoken a few times about this. It'd be
> good for you to comment here on the list about any concerns with
> governance and/or other thoughts. For example: whether you think
> contributing Asgarde directly is the right thing [ or perhaps expressing any
> interest in helping write/contribute the relevant functionality into Beam ...
> it is possible that by adding the actual functionality into Beam, like the
> 'other place' Kenn mentioned, we could make Asgarde as a separate add-on
> obsolete ].
>
>
>
> On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles  wrote:
>
>> For anyone who hasn't clicked over to Asgarde, my TL;DR description of
>> it is that it adds the "failure monad" aka "andThen" style error/result
>> handling on top of chaining of PCollections. So it is at a similar level of
>> abstraction to our basic transforms and generally useful for chaining
>> dead-letter side outputs. It is no more or less appropriate for the core
>> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we
>> actually aspired to have a thin core with accessories like that in
>> another place, then it should go to that other place.
>>
>> Kenn
>>
>> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev <
>> dev@beam.apache.org> wrote:
>>
>>> > until we *require* Asgarde on a core transform, it shouldn't be in the
>>> main repo
>>>
>>> I don't think this is necessarily true if it solves end user use cases.
>>> If there is a specific transform that solves a specific use case, we could
>>> include it in the transforms folder for end-users, even if it isn't
>>> utilized in the I/Os at present. Hence the suggestion to take the most
>>> promising transforms and propose adding them with documentation, APIs and
>>> rationale.
>>>
>>> -Daniel
>>>
>>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke  wrote:
>>>
>>>> I would say until we *require* Asgarde on a core transform, it shouldn't
>>>> be in the main repo.
>>>>
>>>> Incorporating something before there's a need for it is premature
>>>> abstraction. We can't do things because they *might* be useful. Let's see
>>>> concrete places where they are useful, or where we already have a similar
>>>> need solved a different way.
>>>>
>>>> Beam is complicated by itself, and we do encourage multiple ways of
>>>> solving problems, but that says to me that having an out of repo ecosystem
>>>> is the right path, rather than incorporation.
>>>>
>>>> On Fri, Sep 8, 2023, 8:14 AM Daniel Collins via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> I think there are a lot of interesting and relatively isolated
>>>>> components of the project. It might make sense to write per-transform
>>>>> one-pagers for isolated things like the most useful pieces (just basically
>>>>> copying the documentation and justifying the API) instead of doing a
>>>>> one-shot import or having it live forever in an external project.
>>>>>
>>>>> -Daniel
>>>>>
>>>>> On Fri, Sep 8, 2023 at 11:10 AM Kenneth Knowles
>>>>> wrote:
>>>>>
>>>>>> I agree with everyone about "not everything has to be in the Beam
>>>>>> repo". I really like the idea of having a clearer "ecosystem" section of
>>>>>> the website, which is sort of started at
>>>>>> https://beam.apache.org/community/integrations/ but that is not very
>>>>>> prominent.
>>>>>>
>>>>>> Agree with John though. The transforms in Asgarde could potentially
>>>>>> be used in Beam. Potentially best accomplished by just adding them as
>>>>>> transforms to the core Java SDK?
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Wed, Sep 6, 2023 at 1:46 PM John Casey via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> Agreed on documentation and on keeping it in a separate repo.
>>>>>>>
>>>>>>> We have a few pretty significant beam extensions (scio and Datafl

Re: DRAFT - Apache Beam Board Report - September 2023

2023-09-12 Thread Kenneth Knowles
As you can probably tell, I copy/pasted. There will be no Beam Summit next
week :-)

On Tue, Sep 12, 2023 at 11:11 AM Kenneth Knowles  wrote:

> Hi all,
>
> The next Beam board report is due tomorrow, Wednesday, September 13.
> Please help me to draft it at
> https://s.apache.org/beam-draft-report-2023-09. The doc is open for
> anyone to edit.
>
> Ideas:
>
>  - highlights from CHANGES.md
>  - interesting technical discussions
>  - integrations with other projects
>  - community events
>  - major user facing addition/deprecation
>  - stuff that will be presented at Beam Summit next week :-)
>
> Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for
> examples.
>
> Thanks,
>
> Kenn
>


DRAFT - Apache Beam Board Report - September 2023

2023-09-12 Thread Kenneth Knowles
Hi all,

The next Beam board report is due tomorrow, Wednesday, September 13. Please
help me to draft it at https://s.apache.org/beam-draft-report-2023-09. The
doc is open for anyone to edit.

Ideas:

 - highlights from CHANGES.md
 - interesting technical discussions
 - integrations with other projects
 - community events
 - major user facing addition/deprecation
 - stuff that will be presented at Beam Summit next week :-)

Past reports are at https://whimsy.apache.org/board/minutes/Beam.html for
examples.

Thanks,

Kenn


Beam High Priority Issue Report (42)

2023-09-12 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28383 [Failing Test]: 
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing 
"beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: 
apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28168 [Bug]: BigQuery Storage Write API 
does not write with no complaint
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be 
leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not 
working when using CreateDisposition.CREATE_IF_NEEDED 
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. 
PeriodicImpulse) running in Flink and polling using tracker.defer_remainder 
have checkpoint size growing indefinitely 
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use 
applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with 
inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: 
bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when 
using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to 
SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit 
is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested 
ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not 
reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: 
apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is 
flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not 
propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create 
exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in 
FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK 
Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: 
HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError 
ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: 
apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944  beam_PreCommit_Python_Cron 
regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and 
ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder 
will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: 
beam_PostCommit_Java_ValidatesRunner_Flink is flakes in 
org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for 
dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 
PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, 
testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit 
test action 
StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial 
(order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table 
destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: 
Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) 
failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not 
follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit 
data at GC time
https://github.com/apache/beam/issues/21121 
apache_beam.examples.streaming_wordcount_it_te