Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Wesley Peng
Rion Williams wrote: The three authors of Streaming Systems are folks that work on Google’s Dataflow Project, which for all intents and purposes is essentially an implementation of the Beam Model. Two of them are members of the Beam PMC (essentially a steering committee for the project)

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Wesley Peng
Hi Luke Luke Cwik wrote: The author for Apache Beam A Complete Guide does not have good reviews on Amazon for their other books and as you mentioned no reviews for this one. I would second the Streaming Systems book as the authors directly worked on Apache Beam. So, can Apache beam team

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
It just occurred to me that BEAM-10265 [1] could be the cause of the stack overflow. Does ArticleEnvelope refer to itself recursively? Beam schemas are not allowed to be recursive, and it looks like we don't fail gracefully for recursive proto definitions. Brian [1]

Re: DoFn with SideInput

2020-06-29 Thread Praveen K Viswanathan
Thank you Luke. I changed DefaultTrigger.of() to AfterProcessingTime. pastFirstElementInPane() and it worked. On Mon, Jun 29, 2020 at 9:09 AM Luke Cwik wrote: > The UpdateFn won't be invoked till the side input is ready which requires > either the watermark to pass the end of the global window

Re: [Java - Beam Schema] Manually Generating a Beam Schema for a POJO class

2020-06-29 Thread Brian Hulette
Hm it looks like the error is from trying to call the zero-arg constructor for the ArticleEnvelope proto class. Do you have a schema registered for ArticleEnvelope? I think maybe what's happening is Beam finds there's no schema registered for ArticleEnvelope, so it just recursively applies

Re: Concurrency issue with KafkaIO

2020-06-29 Thread wang Wu
Hi, We are using version 2.16.0. More about our dependencies: +- org.apache.beam:beam-sdks-java-core:jar:2.16.0:compile [INFO] | +- org.apache.beam:beam-model-job-management:jar:2.16.0:compile [INFO] | +- org.apache.beam:beam-vendor-bytebuddy-1_9_3:jar:0.1:compile [INFO] | \-

Re: KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
Thanks Alexey for sharing tickets and code. I found one workaround to use the update function. If I generate different KafkaIO step name for each submission and provide --transformNameMapping="{"Kafka_IO_3242/Read(KafkaUnboundedSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds":""}" My

Re: Concurrency issue with KafkaIO

2020-06-29 Thread Alexey Romanenko
I don’t think it’s a known issue. Could you tell with version of Beam you use? > On 28 Jun 2020, at 14:43, wang Wu wrote: > > Hi, > We run Beam pipeline on Spark in the streaming mode. We subscribe to multiple > Kafka topics. Our job run fine until it is under heavy load: millions of > Kafka

Re: KafkaIO does not support add or remove topics

2020-06-29 Thread Alexey Romanenko
Yes, it’s a known limitation [1] mostly due to the fact that KafkaIO.Read is based on UnboundedSource API and it fetches all information about topic and partitions only once during a “split" phase [2]. There is on-going work to make KafkaIO.Read based on Splittable DoFn [3] which should allow

Re: Can SpannerIO read data from different GCP project?

2020-06-29 Thread Luke Cwik
The intent is that you grant permissions to the account that is running the Dataflow job to the resources you want it to access in project B before you start the pipeline. This allows for much finer grain access control and the ability to revoke permissions without having to disable an entire

KafkaIO does not support add or remove topics

2020-06-29 Thread Talat Uyarer
Hi, I am using Dataflow. When I update my DF job with a new topic or update partition count on the same topic. I can not use DF's update function. Could you help me to understand why I can not use the update function for that ? I checked the Beam source code but I could not find the right place

Re: DoFn with SideInput

2020-06-29 Thread Luke Cwik
The UpdateFn won't be invoked till the side input is ready which requires either the watermark to pass the end of the global window + allowed lateness (to show that the side input is empty) or at least one firing to populate it with data. See this general section on side inputs[1] and some useful

Re: Apache Beam a Complete Guide - Review?

2020-06-29 Thread Luke Cwik
The author for Apache Beam A Complete Guide does not have good reviews on Amazon for their other books and as you mentioned no reviews for this one. I would second the Streaming Systems book as the authors directly worked on Apache Beam. On Sun, Jun 28, 2020 at 6:46 PM Wesley Peng wrote: > Hi