Re: Spark structured streaming - performance tuning
Can anyone else answer the questions below on performance tuning for Structured Streaming? @Jacek?

On Sun, May 3, 2020 at 12:07 AM Srinivas V wrote:
> Hi Alex, I read the book. It is a good one, but I don't see the things
> that I most want to understand.
> You are right about the partitions and tasks.
> 1. How do I use coalesce with Spark Structured Streaming?
>
> I also want to ask a few more questions:
> 2. How do I restrict the number of executors for Structured Streaming?
>    Is --num-executors the minimum?
>    To cap the maximum, can I use spark.dynamicAllocation.maxExecutors?
>
> 3. Do the other streaming properties hold for Structured Streaming,
>    such as spark.streaming.dynamicAllocation.enabled?
>    If not, which ones does it take into consideration?
>
> 4. Does Structured Streaming 2.4.5 allow dynamic allocation of
>    executors/cores? With a Kafka consumer, when the cluster has to scale
>    down, does it reconfigure the mapping of executor cores to Kafka
>    partitions?
>
> 5. Why is the Spark Structured Streaming web UI (SQL tab) not as
>    informative as the Streaming tab of Spark Streaming?
>
> It would be great if these questions were answered; otherwise the only
> option left is to go through the Spark code and figure it out.
Re: Spark structured streaming - performance tuning
Hi Alex, I read the book. It is a good one, but I don't see the things that I most want to understand.
You are right about the partitions and tasks.
1. How do I use coalesce with Spark Structured Streaming?

I also want to ask a few more questions:
2. How do I restrict the number of executors for Structured Streaming?
   Is --num-executors the minimum?
   To cap the maximum, can I use spark.dynamicAllocation.maxExecutors?

3. Do the other streaming properties hold for Structured Streaming, such as spark.streaming.dynamicAllocation.enabled?
   If not, which ones does it take into consideration?

4. Does Structured Streaming 2.4.5 allow dynamic allocation of executors/cores?
   With a Kafka consumer, when the cluster has to scale down, does it reconfigure the mapping of executor cores to Kafka partitions?

5. Why is the Spark Structured Streaming web UI (SQL tab) not as informative as the Streaming tab of Spark Streaming?

It would be great if these questions were answered; otherwise the only option left is to go through the Spark code and figure it out.

On Sat, Apr 18, 2020 at 1:09 PM Alex Ott wrote:
> Just to clarify - I didn't write this explicitly in my answer. When you're
> working with Kafka, every partition in Kafka is mapped into a Spark
> partition. And in Spark, every partition is mapped into a task. But you can
> use `coalesce` to decrease the number of Spark partitions, so you'll have
> fewer tasks...
Re: Spark structured streaming - performance tuning
Just to clarify - I didn't write this explicitly in my answer. When you're working with Kafka, every partition in Kafka is mapped into a Spark partition. And in Spark, every partition is mapped into a task. But you can use `coalesce` to decrease the number of Spark partitions, so you'll have fewer tasks...

Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
 SV> Thank you Alex. I will check it out and let you know if I have any questions.

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
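The `coalesce` suggestion above could look roughly like the following PySpark sketch. It is an illustration under assumptions, not something taken from the thread: the broker address, topic names, target of 50 partitions, console sink, and both helper function names are hypothetical, and the job needs the spark-sql-kafka connector on the classpath. Only `readStream`, the Kafka source options `kafka.bootstrap.servers` and `subscribe`, `coalesce`, and `writeStream` are standard Spark API.

```python
def kafka_source_options(bootstrap_servers, topics):
    """Option map for Spark's Kafka source; the argument values are placeholders."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": ",".join(topics),  # comma-separated list of topic names
    }


def start_coalesced_query(spark, options, num_partitions):
    """Read from Kafka, merge the input into num_partitions Spark partitions,
    and print each micro-batch to the console (a stand-in sink for illustration)."""
    stream = spark.readStream.format("kafka").options(**options).load()
    # coalesce() merges partitions without a shuffle, so one task may end up
    # reading several Kafka partitions one after another.
    coalesced = stream.coalesce(num_partitions)
    return coalesced.writeStream.format("console").start()


# Usage (requires pyspark, the Kafka connector and a reachable broker):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("coalesce-sketch").getOrCreate()
#   opts = kafka_source_options("broker:9092", ["topic1", "topic2"])
#   start_coalesced_query(spark, opts, 50).awaitTermination()
```

Fewer partitions means fewer tasks per micro-batch, but each task then does more work, so the target number is something to measure rather than guess.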
Re: Spark structured streaming - performance tuning
Thank you Alex. I will check it out and let you know if I have any questions.

On Fri, Apr 17, 2020 at 11:36 PM Alex Ott wrote:
> http://shop.oreilly.com/product/0636920047568.do has quite good
> information on it. For Kafka, you need to start with the approximation
> that processing of each partition is a separate task that needs to be
> executed, so you need to plan the number of cores correspondingly.
Re: Spark structured streaming - performance tuning
http://shop.oreilly.com/product/0636920047568.do has quite good information on it. For Kafka, you need to start with the approximation that processing of each partition is a separate task that needs to be executed, so you need to plan the number of cores correspondingly.

Srinivas V at "Thu, 16 Apr 2020 22:49:15 +0530" wrote:
 SV> Hello,
 SV> Can someone point me to a good video or document that talks about performance tuning for a Structured Streaming app?
 SV> I am looking especially at listening to Kafka topics, say 5 topics each with 100 partitions.
 SV> Trying to figure out the best cluster size and the number of executors and cores required.

--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
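To make the "one task per Kafka partition" approximation concrete, here is a small back-of-the-envelope sketch in plain Python. The 5 topics with 100 partitions each come from the original question; the cluster shape (25 executors with 4 cores each), the helper names, and the assumption that each task occupies exactly one core are illustrative only and not taken from the thread.

```python
import math


def tasks_per_batch(num_topics, partitions_per_topic):
    """One task per Kafka partition, following the approximation above."""
    return num_topics * partitions_per_topic


def scheduling_waves(num_tasks, num_executors, cores_per_executor):
    """How many rounds are needed to run all tasks, assuming one core per task."""
    slots = num_executors * cores_per_executor
    return math.ceil(num_tasks / slots)


tasks = tasks_per_batch(5, 100)          # 5 topics x 100 partitions
waves = scheduling_waves(tasks, 25, 4)   # hypothetical cluster: 25 executors x 4 cores
print(tasks, waves)                      # prints "500 5"
```

With 500 task slots every partition can be processed at once; with fewer slots the tasks of each micro-batch run in several waves, which is the trade-off to weigh when choosing the number of executors and cores.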
Spark structured streaming - performance tuning
Hello,
Can someone point me to a good video or document that talks about performance tuning for a Structured Streaming app?
I am looking especially at listening to Kafka topics, say 5 topics each with 100 partitions.
Trying to figure out the best cluster size and the number of executors and cores required.

Regards
Srini