Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-11 Thread Austin Bennett
@Mazlum TOSUN -- you and I have spoken a few times about this. It'd be good
for you to comment here on the list on any of your concerns with governance,
and/or other thoughts. Ex: whether you think contributing Asgarde directly is
the thing [or perhaps express any interest in helping write/contribute the
relevant functionality into Beam ... it is possible that by adding the actual
functionality into Beam - like Kenn's mentioned 'other place' - we could make
Asgarde as a separate add-on obsolete].



On Fri, Sep 8, 2023 at 8:55 AM Kenneth Knowles wrote:

> For anyone who hasn't clicked over to Asgarde, my TL;DR description of it
> is that it adds the "failure monad" aka "andThen" style of error/result
> handling on top of chaining of PCollections. So it is at a similar level of
> abstraction as our basic transforms and generally useful for chaining
> dead-letter side outputs. It is no more or less appropriate for the core
> SDK than, say, the Project/Filter/Join transforms, or Watch, etc. If we
> actually aspired to have a thin core with accessories like that in
> another place, then it should go to that other place.
>
> Kenn
>
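
The "andThen" style described in Kenn's message can be illustrated with a small, library-free sketch. This is not Asgarde's API and it does not use Beam: `Chain`, `Failure`, and `and_then` are hypothetical names, and plain Python lists stand in for a PCollection and its dead-letter side output. It only shows the general idea, namely that each step maps the good elements while exceptions are collected alongside instead of aborting the chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class Failure:
    """A dead-letter record: the offending input plus the error it raised."""
    element: object
    error: Exception


@dataclass
class Chain(Generic[T]):
    """Hypothetical stand-in for a PCollection plus its dead-letter output."""
    good: List[T]
    failures: List[Failure] = field(default_factory=list)

    def and_then(self, fn: Callable[[T], U]) -> "Chain[U]":
        """Apply fn to each good element; route exceptions to failures."""
        next_good: List[U] = []
        next_failures = list(self.failures)  # carry earlier failures forward
        for element in self.good:
            try:
                next_good.append(fn(element))
            except Exception as exc:
                next_failures.append(Failure(element, exc))
        return Chain(next_good, next_failures)


# Two chained steps: parse strings to ints, then divide 10 by each value.
result = (
    Chain(["1", "2", "x", "0"])
    .and_then(int)
    .and_then(lambda n: 10 // n)
)
print(result.good)                           # [10, 5]
print([f.element for f in result.failures])  # ['x', 0]
```

The failures from both steps end up in one place, which is the "chaining dead-letter side outputs" part: the caller handles bad records once, at the end of the chain, rather than after every step.
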
> On Fri, Sep 8, 2023 at 11:24 AM Daniel Collins via dev <
> dev@beam.apache.org> wrote:
>
>> > until we *require* Asgarde on a core transform, it shouldn't be in the
>> main repo
>>
>> I don't think this is necessarily true if it solves end user use cases.
>> If there is a specific transform that solves a specific use case, we could
>> include it in the transforms folder for end-users, even if it isn't
>> utilized in the I/Os at present. Hence the suggestion to take the most
>> promising transforms and propose adding them with documentation, APIs and
>> rationale.
>>
>> -Daniel
>>
>> On Fri, Sep 8, 2023 at 11:20 AM Robert Burke wrote:
>>
>>> I would say until we *require* Asgarde on a core transform, it shouldn't
>>> be in the main repo.
>>>
>>> Incorporating something before there's a need for it is premature
>>> abstraction. We can't do things because they *might* be useful. Let's see
>>> concrete places where they are useful, or where we're already solving a
>>> similar need a different way.
>>>
>>> Beam is complicated by itself, and we do encourage multiple ways of
>>> solving problems, but that says to me that having an out of repo ecosystem
>>> is the right path, rather than incorporation.
>>>
>>> On Fri, Sep 8, 2023, 8:14 AM Daniel Collins via dev 
>>> wrote:
>>>
>>>> I think there are a lot of interesting and relatively isolated
>>>> components of the project. It might make sense to write per-transform
>>>> one-pagers for isolated things like the most useful pieces (basically
>>>> just copying the documentation and justifying the API) instead of doing
>>>> a one-shot import or having it live forever in an external project.
>>>>
>>>> -Daniel
>>>>
>>>> On Fri, Sep 8, 2023 at 11:10 AM Kenneth Knowles wrote:
>>>>
>>>>> I agree with everyone about "not everything has to be in the Beam
>>>>> repo". I really like the idea of having a clearer "ecosystem" section of
>>>>> the website, which is sort of started at
>>>>> https://beam.apache.org/community/integrations/ but that is not very
>>>>> prominent.
>>>>>
>>>>> Agree with John though. The transforms in Asgarde could potentially be
>>>>> used in Beam. Potentially best accomplished by just adding them as
>>>>> transforms to the core Java SDK?
>>>>>
>>>>> Kenn
>>>>>
>>>>> On Wed, Sep 6, 2023 at 1:46 PM John Casey via dev wrote:
>>>>>
>>>>>> Agreed on documentation and on keeping it in a separate repo.
>>>>>>
>>>>>> We have a few pretty significant Beam extensions (Scio and Dataflow
>>>>>> Templates also come to mind) that Beam should highlight, but that are
>>>>>> separate repos for their own governance, contribution, and release
>>>>>> reasons.
>>>>>>
>>>>>> The difference with Asgarde is that we might want to use it in Beam
>>>>>> itself, which makes it more reasonable to include in the main repo.
>>>>>>
>>>>>> On Tue, Sep 5, 2023 at 8:36 PM Robert Bradshaw via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> I think this is a great library. I'm on the fence about whether it
>>>>>>> makes sense to include it with Beam proper vs. being a library that
>>>>>>> builds on top of Beam. (Would there be benefits of tighter
>>>>>>> integration? There is the maintenance/loss of governance issue.) I am
>>>>>>> definitely not on the side that the entire Beam ecosystem needs to be
>>>>>>> distributed/maintained by Beam itself.
>>>>>>>
>>>>>>> Regardless of the direction we go, I think it could make a lot of
>>>>>>> sense to put pointers to it in our documentation.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 5, 2023 at 7:21 AM Danny McCormick via dev <
>>>>>>> dev@beam.apache.org> wrote:
>>>>>>>
>>>>>>>> I think my only concerns here are around the toil we'll be taking
>>>>>>>> on, and whether we'll be leaving the Asgarde project in a better or
>>>>>>>> worse place.
>>>>>>>>
>>>>>>>> From a release standpoint, we would need to release it

Re: [Proposal] Enable EnricoMi/publish-unit-test-result-action

2023-09-11 Thread Kenneth Knowles
Awesome, thanks!

On Fri, Sep 8, 2023 at 1:22 PM Yi Hu via dev wrote:

> Thanks for the feedback! A request has been sent to Apache Infra.
>
> I checked that GitHub Actions workflows actually do publish the Gradle scan
> (done by [1]), but some workflows added after [1] simply missed that
> setting. Opened [2] for cleanup and improvements.
>
> Best,
> Yi
>
> [1] https://github.com/apache/beam/pull/28212
> [2] https://github.com/apache/beam/issues/28378
>
> On Tue, Sep 5, 2023 at 12:26 PM Kenneth Knowles wrote:
>
>> +1 this seems useful.
>>
>> Some of the same functionality is also done pretty well or even more in
>> depth via gradle scan. If I recall, some GHA jobs do not upload those. Is
>> that also on the roadmap or is it blocked for some reason?
>>
>> Kenn
>>
>> On Tue, Sep 5, 2023 at 11:54 AM Bruno Volpato via dev <
>> dev@beam.apache.org> wrote:
>>
>>> +1, this is helpful.
>>>
>>> We had a similar situation with DataflowTemplates. Even though we used a
>>> different action (mikepenz/action-junit-report), this strategy was
>>> invaluable for reducing troubleshooting time (sample report).
>>>
>>> Thanks Yi!
>>>
>>>
>>>
>>> On Tue, Sep 5, 2023 at 11:26 AM Danny McCormick via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> Thanks Yi, I'm definitely +1 on adding this; this is a real gap
>>>> in our GitHub Actions infra today.
>>>>
>>>> Thanks,
>>>> Danny
>>>>
>>>> On Tue, Sep 5, 2023 at 10:35 AM Yi Hu via dev wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> As you may have noticed, GitHub Actions workflows for test suites are
>>>>> being set up in the Beam repository. A current gap is that Jenkins has
>>>>> a pretty convenient test result page showing all tests / failed tests
>>>>> and stack traces, while these are not available in GitHub workflow logs.
>>>>>
>>>>> Here we propose to introduce EnricoMi/publish-unit-test-result-action (
>>>>> https://github.com/EnricoMi/publish-unit-test-result-action) to
>>>>> publish Java (and possibly Python in the future) test results. An
>>>>> example PR can be found at [1] and an INFRA ticket at [2]. Currently
>>>>> both Java and Python test reports are supported by this action (note
>>>>> that the Jenkins test report page is also available only for Java and
>>>>> Python currently).
>>>>>
>>>>> Please feel free to comment if you have any questions or suggestions.
>>>>>
>>>>>
>>>>> [1] https://github.com/apache/beam/pull/28075
>>>>> [2] https://issues.apache.org/jira/browse/INFRA-24950
>>>>>
>>>>> Regards,
>>>>> Yi
>>>>> --
>>>>>
>>>>> Yi Hu, (he/him/his)
>>>>>
>>>>> Software Engineer

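The kind of summary Yi describes (all tests, failed tests, stack traces) can be derived from JUnit-style XML reports. The sketch below is only an illustration of that idea using the Python standard library. It is not how the proposed action is implemented, and `SAMPLE_REPORT` and `summarize` are made-up names with made-up sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report; real reports are written by the test runner.
SAMPLE_REPORT = """\
<testsuite name="ExampleTest" tests="3" failures="1" errors="0">
  <testcase classname="ExampleTest" name="testPasses" time="0.01"/>
  <testcase classname="ExampleTest" name="testAlsoPasses" time="0.02"/>
  <testcase classname="ExampleTest" name="testFails" time="0.03">
    <failure message="expected 1 but was 2">java.lang.AssertionError: expected 1 but was 2
    at ExampleTest.testFails(ExampleTest.java:42)</failure>
  </testcase>
</testsuite>
"""


def summarize(report_xml):
    """Return (total, failed); failed holds (test id, message, trace) tuples."""
    root = ET.fromstring(report_xml)
    total = 0
    failed = []
    for case in root.iter("testcase"):
        total += 1
        # A failing test carries a <failure> or <error> child element.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is not None:
            test_id = f"{case.get('classname')}.{case.get('name')}"
            trace = (problem.text or "").strip()
            failed.append((test_id, problem.get("message"), trace))
    return total, failed


total, failed = summarize(SAMPLE_REPORT)
print(f"{total} tests, {len(failed)} failed")
for test_id, message, trace in failed:
    print(f"FAILED {test_id}: {message}")
    print(trace)
```
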

Beam High Priority Issue Report (43)

2023-09-11 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/28383 [Failing Test]: org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testMaxThreadMetric
https://github.com/apache/beam/issues/28339 Fix failing "beam_PostCommit_XVR_GoUsingJava_Dataflow" job
https://github.com/apache/beam/issues/28326 Bug: apache_beam.io.gcp.pubsublite.ReadFromPubSubLite not working
https://github.com/apache/beam/issues/28168 [Bug]: BigQuery Storage Write API does not write with no complaint
https://github.com/apache/beam/issues/28142 [Bug]: [Go SDK] Memory seems to be leaking on 2.49.0 with Dataflow
https://github.com/apache/beam/issues/27892 [Bug]: ignoreUnknownValues not working when using CreateDisposition.CREATE_IF_NEEDED
https://github.com/apache/beam/issues/27648 [Bug]: Python SDFs (e.g. PeriodicImpulse) running in Flink and polling using tracker.defer_remainder have checkpoint size growing indefinitely
https://github.com/apache/beam/issues/27616 [Bug]: Unable to use applyRowMutations() in bigquery IO apache beam java
https://github.com/apache/beam/issues/27486 [Bug]: Read from datastore with inequality filters
https://github.com/apache/beam/issues/27314 [Failing Test]: bigquery.StorageApiSinkCreateIfNeededIT.testCreateManyTables[1]
https://github.com/apache/beam/issues/27238 [Bug]: Window trigger has lag when using Kafka and GroupByKey on Dataflow Runner
https://github.com/apache/beam/issues/26981 [Bug]: Getting an error related to SchemaCoder after upgrading to 2.48
https://github.com/apache/beam/issues/26969 [Failing Test]: Python PostCommit is failing due to exceeded rate limits
https://github.com/apache/beam/issues/26911 [Bug]: UNNEST ARRAY with a nested ROW (described below)
https://github.com/apache/beam/issues/26354 [Bug]: BigQueryIO direct read not reading all rows when set --setEnableBundling=true
https://github.com/apache/beam/issues/26343 [Bug]: apache_beam.io.gcp.bigquery_read_it_test.ReadAllBQTests.test_read_queries is flaky
https://github.com/apache/beam/issues/26329 [Bug]: BigQuerySourceBase does not propagate a Coder to AvroSource
https://github.com/apache/beam/issues/26041 [Bug]: Unable to create exactly-once Flink pipeline with stream source and file sink
https://github.com/apache/beam/issues/25975 [Bug]: Reducing parallelism in FlinkRunner leads to a data loss
https://github.com/apache/beam/issues/24776 [Bug]: Race condition in Python SDK Harness ProcessBundleProgress
https://github.com/apache/beam/issues/24389 [Failing Test]: HadoopFormatIOElasticTest.classMethod ExceptionInInitializerError ContainerFetchException
https://github.com/apache/beam/issues/24313 [Flaky]: apache_beam/runners/portability/portable_runner_test.py::PortableRunnerTestWithSubprocesses::test_pardo_state_with_custom_key_coder
https://github.com/apache/beam/issues/23944 beam_PreCommit_Python_Cron regularily failing - test_pardo_large_input flaky
https://github.com/apache/beam/issues/23709 [Flake]: Spark batch flakes in ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElement and ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundle
https://github.com/apache/beam/issues/23525 [Bug]: Default PubsubMessage coder will drop message id and orderingKey
https://github.com/apache/beam/issues/22913 [Bug]: beam_PostCommit_Java_ValidatesRunner_Flink is flakes in org.apache.beam.sdk.transforms.GroupByKeyTest$BasicTests.testAfterProcessingTimeContinuationTriggerUsingState
https://github.com/apache/beam/issues/22605 [Bug]: Beam Python failure for dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it
https://github.com/apache/beam/issues/21714 PulsarIOTest.testReadFromSimpleTopic is very flaky
https://github.com/apache/beam/issues/21708 beam_PostCommit_Java_DataflowV2, testBigQueryStorageWrite30MProto failing consistently
https://github.com/apache/beam/issues/21706 Flaky timeout in github Python unit test action StatefulDoFnOnDirectRunnerTest.test_dynamic_timer_clear_then_set_timer
https://github.com/apache/beam/issues/21643 FnRunnerTest with non-trivial (order 1000 elements) numpy input flakes in non-cython environment
https://github.com/apache/beam/issues/21476 WriteToBigQuery Dynamic table destinations returns wrong tableId
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21424 Java VR (Dataflow, V2, Streaming) failing: ParDoTest$TimestampTests/OnWindowExpirationTests
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21121