Re: Spark structured streaming - performance tuning

2020-05-08 Thread Srinivas V
Can anyone else answer the questions below on performance tuning for
Structured Streaming?
@Jacek?


Re: Spark structured streaming - performance tuning

2020-05-02 Thread Srinivas V
Hi Alex, I read the book. It is a good one, but I don’t see the things I
most want to understand.
You are right about the partitions and tasks.
1. How do I use coalesce with Spark Structured Streaming?

I also want to ask a few more questions:
2. How do I restrict the number of executors for Structured Streaming?
Is --num-executors the minimum?
To cap the maximum, can I use spark.dynamicAllocation.maxExecutors?

3. Do the other streaming properties hold for Structured Streaming,
such as spark.streaming.dynamicAllocation.enabled?
If not, which ones does it take into consideration?

4. Does Structured Streaming 2.4.5 allow dynamic allocation of executors/
cores? For a Kafka consumer, when the cluster has to scale down, does
it reconfigure the mapping of executor cores to Kafka partitions?

5. Why is the Spark Structured Streaming web UI (SQL tab) not as informative
as the Streaming tab of Spark Streaming?

It would be great if these questions were answered; otherwise the only
option left is to go through the Spark code and figure it out.



Re: Spark structured streaming - performance tuning

2020-04-18 Thread Alex Ott
Just to clarify, since I didn't state this explicitly in my answer: when
you're working with Kafka, every Kafka partition is mapped to a Spark
partition, and in Spark every partition is mapped to a task. But you can
use `coalesce` to decrease the number of Spark partitions, so you'll have
fewer tasks.
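
A minimal PySpark sketch of how `coalesce` could be applied to a Kafka
stream. The function name, broker address, topic names, and the target of
50 partitions are placeholders for illustration, not values from this
thread; `spark` is assumed to be an existing SparkSession with the Kafka
connector available.

```python
def kafka_stream_with_fewer_tasks(spark, bootstrap_servers, topics, max_partitions):
    """Read from Kafka and coalesce so each micro-batch runs fewer tasks.

    `spark` is an existing SparkSession; the other arguments are
    caller-supplied placeholders.
    """
    df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", ",".join(topics))
        .load()
    )
    # By default the source yields one Spark partition (and so one task)
    # per Kafka partition; coalesce merges them without a shuffle.
    return df.coalesce(max_partitions)


# Hypothetical usage (needs a reachable Kafka cluster):
# stream = kafka_stream_with_fewer_tasks(
#     spark, "broker1:9092", ["topic1", "topic2"], max_partitions=50)
```

Fewer tasks means less scheduling overhead, but each task then reads
several Kafka partitions, so the right target depends on the workload.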

-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
Thank you, Alex. I will check it out and let you know if I have any questions.



Re: Spark structured streaming - performance tuning

2020-04-17 Thread Alex Ott
http://shop.oreilly.com/product/0636920047568.do has quite good information
on it.  For Kafka, start with the approximation that processing each
partition is a separate task that needs to be executed, so plan the number
of cores accordingly.
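
As a rough illustration of that approximation, the sketch below works
through the numbers from the original question (5 topics with 100
partitions each). The helper name and the executor and core counts are
made up for illustration and are not recommendations; the model assumes
one task per Kafka partition and one core per task, and ignores
`coalesce`, data skew, and scheduling overhead.

```python
import math


def plan_cores(topics, partitions_per_topic, executors, cores_per_executor):
    """Estimate tasks per micro-batch and how many scheduling waves they need.

    Assumes one Spark task per Kafka partition and one core per task
    (Spark's default spark.task.cpus=1).
    """
    tasks = topics * partitions_per_topic
    total_cores = executors * cores_per_executor
    # Tasks beyond the available cores wait for a free slot, so a batch
    # runs in ceil(tasks / total_cores) waves.
    waves = math.ceil(tasks / total_cores)
    return tasks, total_cores, waves


# Numbers from the question: 5 topics, 100 partitions each.
# 25 executors x 4 cores is a hypothetical cluster, chosen only for illustration.
tasks, total_cores, waves = plan_cores(5, 100, executors=25, cores_per_executor=4)
print(f"{tasks} tasks on {total_cores} cores -> {waves} waves per micro-batch")
```

With these example numbers, 500 tasks share 100 cores, so each micro-batch
runs in 5 waves; more cores or fewer partitions reduce that.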

-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)




Spark structured streaming - performance tuning

2020-04-16 Thread Srinivas V
Hello,
Can someone point me to a good video or document that talks about
performance tuning for a Structured Streaming app?
I am especially interested in consuming from Kafka topics, say 5 topics
with 100 partitions each.
I am trying to figure out the best cluster size and the number of executors
and cores required.

Regards
Srini