Easy Multi-language via a SchemaTransform-aware Expansion Service

2022-08-04 Thread Chamikara Jayalath via dev
Hi All,

I believe we can make the multi-language pipelines offering [1] much easier
to use by updating the expansion service to be fully aware of
SchemaTransforms. Additionally, this will make it easy to
register/discover/use transforms defined in one SDK from all other SDKs.
Specifically, we could add the following features.

   - The expansion service can be used to easily initialize and expand
   transforms without the need for additional code.
   - The expansion service can be used to easily discover already registered
   transforms.
   - Pipeline SDKs can generate user-friendly stub-APIs based on transforms
   registered with an expansion service, eliminating the need to develop
   language-specific wrappers.

Please see here for my proposal: https://s.apache.org/easy-multi-language

Lemme know if you have any comments/questions/suggestions :)

Thanks,
Cham

[1]
https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines


Vendored gRPC update

2022-08-04 Thread Luke Cwik via dev
I was looking to update the gRPC version that we use to the latest (1.48.1) to
move off of a vulnerable version of Netty that a user pointed out in
BEAM-14118. This would supersede the work done in
https://github.com/apache/beam/pull/17206 as that PR has stalled.

If there aren't any concerns I'll test an updated artifact and start a
voting thread once the tests complete.


Re: [RFC] State & Timers API Design for Go SDK

2022-08-04 Thread Robert Burke
Fantastic document! I have a much stronger idea of how state and timers are
supposed to work generally after reading it.

And the approach is very good, leveraging the new Go Generics in a
reasonable way, without breaking the feel of the SDK.

I can also see us adopting a similar technique more generally for
SideInputs and Emitters, to help reduce the verbosity of the positional API
the SDK has, though that is highly future looking of me.
Maybe if we ever decide to do a V3.

Thanks!

On Thu, Jul 28, 2022, 1:17 PM Austin Bennett wrote:

> Looks great!
>
> On Thu, Jul 28, 2022 at 10:54 AM Jack McCluskey via dev <
> dev@beam.apache.org> wrote:
>
>> Great write-up on state and timers! The solution you chose feels very
>> in line with how the Go SDK works. Make sure the design doc makes it onto
>> the wiki once you've addressed any feedback!
>>
>> On Thu, Jul 28, 2022 at 1:49 PM Kerry Donny-Clark via dev <
>> dev@beam.apache.org> wrote:
>>
>>> I think this is a perfect example of a clear design doc. Great, deeply
>>> detailed alternatives considered and why they were rejected. This makes
>>> review easy, and lets us follow your thought process.
>>> I think this is a good implementation, and I support the chosen approach.
>>> Kerry
>>>
>>> On Thu, Jul 28, 2022 at 1:41 PM Kenneth Knowles wrote:
>>>
>>>> Really thorough. Love it!
>>>>
>>>> On Thu, Jul 28, 2022 at 9:02 AM Ritesh Ghorse via dev <
>>>> dev@beam.apache.org> wrote:
>>>>
>>>>> Hey everyone,
>>>>>
>>>>> Danny and I have been working on
>>>>> designing the state and timers for Go SDK. We wrote a design doc with
>>>>> user-facing API, execution details, and different alternatives considered.
>>>>> It would be really helpful if we could get your
>>>>> suggestions/feedback/comments on the design.
>>>>>
>>>>> Design Doc:
>>>>> https://docs.google.com/document/d/1rcKa1Z6orDDFr1l8t6NA1eLl6zanQbYAEiAqk39NQUU/edit?usp=sharing
>>>>>
>>>>> Thanks!
>>>>> Ritesh Ghorse
>>>>>



Re: [Release] 2.41.0 release update

2022-08-04 Thread Kiley Sok via dev
The last remaining issue was cherry-picked. There may be one last issue with
gRPC that's being investigated.

https://github.com/apache/beam/issues/22283

On Thu, Jul 28, 2022 at 2:20 PM Kiley Sok wrote:

> Quick update for today:
>
> I'm still working through the validation tests, but we currently have 2
> open issues:
> https://github.com/apache/beam/issues/22454
> https://github.com/apache/beam/issues/22188
>
>
>
> On Wed, Jul 27, 2022 at 5:03 PM Kiley Sok wrote:
>
>> Hi all,
>>
>> I've cut the release branch:
>> https://github.com/apache/beam/tree/release-2.41.0
>>
>> There's one known issue that needs to be cherry-picked. Please let me
>> know if you have a change that needs to go in.
>>
>> Thanks,
>> Kiley
>>
>>


Beam Dependency Check Report (2022-08-04)

2022-08-04 Thread Apache Jenkins Server


Beam High Priority Issue Report

2022-08-04 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need 
attention.

See https://beam.apache.org/contribute/issue-priorities for the meaning and 
expectations around issue priorities.

Unassigned P1 Issues:

https://github.com/apache/beam/issues/22543 [Bug]: ClassCastException when using custom DynamicDestination in BigQueryIO.Write
https://github.com/apache/beam/issues/22440 [Bug]: Python Batch Dataflow SideInput LoadTests failing
https://github.com/apache/beam/issues/22401 [Bug]: BigQueryIO getFailedInserts fails when using Storage APIs
https://github.com/apache/beam/issues/22321 PortableRunnerTestWithExternalEnv.test_pardo_large_input is regularly failing on jenkins
https://github.com/apache/beam/issues/22303 [Task]: Add tests to Kafka SDF and fix known and discovered issues
https://github.com/apache/beam/issues/22299 [Bug]: JDBCIO Write freeze at getConnection() in WriteFn
https://github.com/apache/beam/issues/22188 BigQuery Storage API sink sometimes gets stuck outputting to an invalid timestamp
https://github.com/apache/beam/issues/21794 Dataflow runner creates a new timer whenever the output timestamp is change
https://github.com/apache/beam/issues/21713 404s in BigQueryIO don't get output to Failed Inserts PCollection
https://github.com/apache/beam/issues/21704 beam_PostCommit_Java_DataflowV2 failures parent bug
https://github.com/apache/beam/issues/21703 pubsublite.ReadWriteIT failing in beam_PostCommit_Java_DataflowV1 and V2
https://github.com/apache/beam/issues/21702 SpannerWriteIT failing in beam PostCommit Java V1
https://github.com/apache/beam/issues/21701 beam_PostCommit_Java_DataflowV1 failing with a variety of flakes and errors
https://github.com/apache/beam/issues/21700 --dataflowServiceOptions=use_runner_v2 is broken
https://github.com/apache/beam/issues/21696 Flink Tests failure : java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.runners.core.construction.SerializablePipelineOptions
https://github.com/apache/beam/issues/21695 DataflowPipelineResult does not raise exception for unsuccessful states.
https://github.com/apache/beam/issues/21694 BigQuery Storage API insert with writeResult retry and write to error table
https://github.com/apache/beam/issues/21480 flake: FlinkRunnerTest.testEnsureStdoutStdErrIsRestored
https://github.com/apache/beam/issues/21472 Dataflow streaming tests failing new AfterSynchronizedProcessingTime test
https://github.com/apache/beam/issues/21471 Flakes: Failed to load cache entry
https://github.com/apache/beam/issues/21470 Test flake: test_split_half_sdf
https://github.com/apache/beam/issues/21469 beam_PostCommit_XVR_Flink flaky: Connection refused
https://github.com/apache/beam/issues/21468 beam_PostCommit_Python_Examples_Dataflow failing
https://github.com/apache/beam/issues/21467 GBK and CoGBK streaming Java load tests failing
https://github.com/apache/beam/issues/21465 Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
https://github.com/apache/beam/issues/21463 NPE in Flink Portable ValidatesRunner streaming suite
https://github.com/apache/beam/issues/21462 Flake in org.apache.beam.sdk.io.mqtt.MqttIOTest.testReadObject: Address already in use
https://github.com/apache/beam/issues/21271 pubsublite.ReadWriteIT flaky in beam_PostCommit_Java_DataflowV2
https://github.com/apache/beam/issues/21270 org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testWindowedCombineGloballyAsSingletonView flaky on Dataflow Runner V2
https://github.com/apache/beam/issues/21268 Race between member variable being accessed due to leaking uninitialized state via OutboundObserverFactory
https://github.com/apache/beam/issues/21267 WriteToBigQuery submits a duplicate BQ load job if a 503 error code is returned from googleapi
https://github.com/apache/beam/issues/21266 org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInProcessElementStateful is flaky in Java ValidatesRunner Flink suite.
https://github.com/apache/beam/issues/21265 apache_beam.runners.portability.fn_api_runner.translations_test.TranslationsTest.test_run_packable_combine_globally 'apache_beam.coders.coder_impl._AbstractIterable' object is not reversible
https://github.com/apache/beam/issues/21263 (Broken Pipe induced) Bricked Dataflow Pipeline
https://github.com/apache/beam/issues/21262 Python AfterAny, AfterAll do not follow spec
https://github.com/apache/beam/issues/21261 org.apache.beam.runners.dataflow.worker.fn.logging.BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer is flaky
https://github.com/apache/beam/issues/21260 Python DirectRunner does not emit data at GC time
https://github.com/apache/beam/issues/21257 Either Create or DirectRunner fails to produce all elements to the following transform
https://github.com/apache/beam/issues/21123 Multiple jobs running on Flink session cluster reuse the persistent

Community event cooperation

2022-08-04 Thread 曾辉
Hey, developers in the Apache Beam community, how's your day?

I'm the Apache DolphinScheduler Community Manager. You can call me Niko;
nice to meet you all.

Apache DolphinScheduler is a world-renowned data orchestration tool that
has largely taken the scheduler market in China. Over 1000 companies,
including IBM, Tencent, iFlytek, Meituan, 360, China Unicom, Shein, and SF
Express, are relying on its decentralized infrastructure and no-code DAG
interface. Apache DolphinScheduler also owns the largest developer
community in China and each meetup gathers over 3K attendees.

We would love to find partners like Apache Beam to co-host events in the
Bay Area, to share our resources with fellow Apache teams.

Click the link for an introduction to our community programs. If you are
interested in becoming our partner and holding a meetup together, please
contact me by email, or schedule a Zoom call to discuss the details sometime
next week. Looking forward to your reply.

I believe that combining the influence of our two sides can let more people
know about the Apache Beam open source project.

Best,
Niko