Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-06 Thread John Casey via dev
Agreed on documentation and on keeping it in a separate repo.

We have a few pretty significant Beam extensions (Scio and Dataflow
Templates also come to mind) that Beam should highlight but that live in
separate repos for governance, contribution, and release reasons.

The difference with Asgarde is that we might want to use it in Beam itself,
which makes it more reasonable to include in the main repo.

On Tue, Sep 5, 2023 at 8:36 PM Robert Bradshaw via dev 
wrote:

> I think this is a great library. I'm on the fence about whether it makes
> sense to include it with Beam proper vs. keep it as a library that builds on
> top of Beam. (Would there be benefits of tighter integration? There is the
> maintenance/loss of governance issue.) I am definitely not on the side that
> the entire Beam ecosystem needs to be distributed/maintained by Beam
> itself.
>
> Regardless of the direction we go, I think it could make a lot of sense to
> put pointers to it in our documentation.
>
>
> On Tue, Sep 5, 2023 at 7:21 AM Danny McCormick via dev <
> dev@beam.apache.org> wrote:
>
>> I think my only concerns here are around the toil we'll be taking on, and
>> whether we'll be leaving the Asgarde project in a better or worse place.
>>
>> From a release standpoint, we would need to release it with the same
>> cadence as Beam. Adding Asgarde into our standard release process seems
>> fairly straightforward, though, so I'm not too worried about it; it looks
>> like it's basically (1) add a commit like this, (2) run this workflow,
>> and (3) tag/mark the release as released on GitHub.
>>
>> In terms of bug fixes and improvements, though, I'm a little worried that
>> we might be leaving things in a worse state since Mazlum has been the only
>> contributor thus far, and he would lose some governance (and possibly the
>> ability to commit code on his own). An extra motivated community member or
>> two could change the math a bit, but I'm not sure if there are actually
>> clear advantages to including it in Apache other than visibility. Would
>> adding links to our docs calling Asgarde out as an option accomplish the
>> same purpose?
>>
>> > Let's be careful about whether these tests are included in our
>> presubmits. Contrib code with flaky tests has been a major pain point in
>> the past.
>>
>> +1 - I think if we do this I'd vote that it be in a separate repo (
>> github.com/apache/beam-asgarde made sense to me).
>>
>> ---
>>
>> Overall, I'm probably a slight -1 to adding this to the Apache workspace,
>> but +1 to at least adding links from the Beam docs to Asgarde.
>>
>> Thanks,
>> Danny
>>
>>
>>
>> On Tue, Sep 5, 2023 at 12:03 AM Reuven Lax via dev 
>> wrote:
>>
>>> Let's be careful about whether these tests are included in our
>>> presubmits. Contrib code with flaky tests has been a major pain point in
>>> the past.
>>>
>>> On Sat, Sep 2, 2023 at 12:02 PM Austin Bennett 
>>> wrote:
>>>
 Wanting to make sure we don't miss this. @Mazlum TOSUN is
 happy to donate Asgarde to our project.

 It looks like he'd need an SGA and a CCLA [1] on file; anything else?

 I recalled the donation of Euphoria [2], so I looked at those
 threads [3] for insights into the process. It didn't look like a VOTE
 was needed, so it is mostly a matter of ensuring the necessary signatures
 and, ideally, some sort of consensus [or non-opposition] to the donation.


 [1] https://www.apache.org/licenses/contributor-agreements.html
 [2] https://beam.apache.org/documentation/sdks/java/euphoria/
 [3] https://lists.apache.org/thread/xzlx4rm2tvc36mmwvhyvtdvsw7bnjscp



 On Thu, Jun 15, 2023 at 7:05 AM Kerry Donny-Clark via dev <
 dev@beam.apache.org> wrote:

> This looks like an excellent contribution. I can easily understand the
> motivation, and I think Beam would benefit from a higher level abstraction
> for error handling.
> Kerry
>
> On Wed, Jun 14, 2023, 6:31 PM Austin Bennett 
> wrote:
>
>> Hi Beam Devs,
>>
>> @Mazlum was encouraged to consider donating Asgarde to Beam for
>> Java/Kotlin error handling [see
>> https://2022.beamsummit.org/sessions/error-handling-asgarde/ for
>> last year's Beam Summit talk]. He is also the author of Pasgarde
>> [for Python] and Milgard [for a simplified Kotlin API].
>>
>> Would Asgarde be a good contribution, something the Beam community
>> would be willing to accept? I imagine we might want it to live at
>> github.com/apache/beam-asgarde? Or perhaps there is a good place
>> in github.com/apache/beam?
>>
>

Beam High Priority Issue Report (40)

2023-09-06 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28168 [Bug]: BigQuery Storage Write API does not write with no complaint
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121 apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT.test_streaming_wordcount_it flakey
https://github.com/apache/beam/issues/21104 Flaky: apache_beam.runners.portability.fn_api_runner.fn_runner_test.FnApiRunnerTestWithGrpcAndMultiWorkers
https://github.com/apache/beam/issues/20976 apache_beam.runners.portability.flink_runner_test.FlinkRunnerTestOptimized.test_flink_metrics is flaky
htt