Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the questions below on performance tuning Structured Streaming? @Jacek? On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote: > Hi Alex, I read the book; it is a good one, but I don't see the things I most want to understand. You are right about the partitions and tasks.

Re: Spark structured streaming - performance tuning

2020-05-02 Thread Srinivas V
Hi Alex, I read the book; it is a good one, but I don't see the things I most want to understand. You are right about the partitions and tasks. 1. How do I use coalesce with Spark Structured Streaming? I also want to ask a few more questions: 2. How do I restrict the number of executors on structured

Re: Spark structured streaming - performance tuning

2020-04-18 Thread Alex Ott
Just to clarify (I didn't write this explicitly in my answer): when you're working with Kafka, every Kafka partition is mapped to a Spark partition, and in Spark every partition is mapped to a task. But you can use `coalesce` to decrease the number of Spark partitions, so you'll have less
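As a rough illustration of the partition-to-task arithmetic, the toy model below (plain Python, not Spark code; the function name `coalesce_partitions` is hypothetical) shows how merging 500 Kafka-backed partitions down to 50 would cut the per-micro-batch task count from 500 to 50. In a real job the call would be the Dataset method `coalesce(n)` on the streaming DataFrame. Spark's actual grouping may also take data locality into account, so the contiguous split here is only an approximation.

```python
from __future__ import annotations


def coalesce_partitions(num_partitions: int, target: int) -> list[list[int]]:
    """Toy model (hypothetical helper): merge partition ids 0..num_partitions-1
    into `target` groups of contiguous ids, roughly as a no-shuffle coalesce
    would. Each resulting group corresponds to one task per micro-batch."""
    if target <= 0:
        raise ValueError("target must be positive")
    # coalesce can only reduce the partition count, never increase it
    target = min(target, num_partitions)
    groups = []
    start = 0
    for i in range(target):
        # spread the remainder over the first groups so sizes differ by at most 1
        size = num_partitions // target + (1 if i < num_partitions % target else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


# 5 topics x 100 partitions = 500 input partitions, coalesced to 50 tasks
groups = coalesce_partitions(500, 50)
print(len(groups), len(groups[0]))
```

The trade-off: fewer tasks means less scheduling overhead, but each task then reads several Kafka partitions, so per-batch parallelism drops accordingly.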

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
Thank you, Alex. I will check it out and let you know if I have any questions. On Fri, Apr 17, 2020 at 11:36 PM Alex Ott wrote: > http://shop.oreilly.com/product/0636920047568.do has quite good information on it. For Kafka, you need to start with the approximation that processing of each

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Alex Ott
http://shop.oreilly.com/product/0636920047568.do has quite good information on it. For Kafka, you need to start with the approximation that processing each partition is a separate task that needs to be executed, so you need to plan the number of cores accordingly. Srinivas V at "Thu, 16 Apr 2020
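A minimal sketch of that core-planning approximation, using the 5 topics x 100 partitions from the original question. It assumes one task per Kafka partition, one core per task (`spark.task.cpus` left at its default of 1), and roughly equal work per partition; the function name `task_waves` and the cluster sizes in the example are hypothetical. A "wave" is one round of tasks running in parallel on all available cores.

```python
def task_waves(topics: int, partitions_per_topic: int,
               executors: int, cores_per_executor: int) -> int:
    """Hypothetical helper: number of scheduling waves needed for one
    micro-batch, assuming one task per Kafka partition and one task per
    core at a time."""
    tasks = topics * partitions_per_topic
    slots = executors * cores_per_executor
    if slots <= 0:
        raise ValueError("need at least one core")
    # integer ceiling division: a partially filled wave still costs a full wave
    return (tasks + slots - 1) // slots


# 500 tasks on 10 executors x 5 cores = 50 slots -> 10 waves per micro-batch
print(task_waves(5, 100, 10, 5))
```

Under these assumptions, giving the job 500 cores lets every partition run in a single wave; fewer cores stretch each micro-batch over more waves, which is the trade-off between cluster size and batch latency.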

Spark structured streaming - performance tuning

2020-04-16 Thread Srinivas V
Hello, can someone point me to a good video or document that covers performance tuning for a Structured Streaming app? I am especially interested in consuming Kafka topics, say 5 topics with 100 partitions each. I'm trying to figure out the best cluster size and the number of executors and cores required.