Re: Dynamic allocation does not deallocate executors

2023-08-08 Thread Holden Karau
If you disable shuffle tracking but enable shuffle block decommissioning, it
should work, from memory.
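Spelled out in the same key=value style used earlier in this thread, that combination would be something like this (untested, also from memory):

```
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.decommission.enabled=true
spark.storage.decommission.enabled=true
spark.storage.decommission.shuffleBlocks.enabled=true
```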

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: Dynamic allocation does not deallocate executors

2023-08-08 Thread Mich Talebzadeh
Hm. I don't think that will work.

--conf spark.dynamicAllocation.shuffleTracking.enabled=false

In Spark 3.4.1, running Spark on k8s, you get:

org.apache.spark.SparkException: Dynamic allocation of executors requires
the external shuffle service. You may enable this through
spark.shuffle.service.enabled.
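For what it's worth, my rough mental model of the guard that raises this (a hypothetical sketch, not Spark's actual source) is that dynamic allocation must be backed by at least one of: the external shuffle service, shuffle tracking, or shuffle-block decommissioning:

```python
# Hypothetical model of the validation behind the exception above -- NOT
# Spark's actual code. Dynamic allocation needs some way to keep shuffle
# data reachable when executors go away.
def validate_dynamic_allocation(conf: dict) -> None:
    if conf.get("spark.dynamicAllocation.enabled") != "true":
        return
    backed = (
        conf.get("spark.shuffle.service.enabled") == "true"
        or conf.get("spark.dynamicAllocation.shuffleTracking.enabled") == "true"
        or (conf.get("spark.decommission.enabled") == "true"
            and conf.get("spark.storage.decommission.shuffleBlocks.enabled") == "true")
    )
    if not backed:
        raise RuntimeError(
            "Dynamic allocation of executors requires the external shuffle "
            "service. You may enable this through spark.shuffle.service.enabled."
        )
```

Under this model, setting shuffleTracking to false on its own (without also enabling decommissioning plus shuffle-block migration) would trip the check, which may be what happened in my test.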

HTH

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






Re: Dynamic allocation does not deallocate executors

2023-08-07 Thread Holden Karau
I think you need to set spark.dynamicAllocation.shuffleTracking.enabled
to false.



Re: Dynamic allocation does not deallocate executors

2023-08-07 Thread Mich Talebzadeh
Yes, I have seen cases where the driver is gone but a couple of executors
hang on. Sounds like a code issue.

HTH



Dynamic allocation does not deallocate executors

2023-07-27 Thread Sergei Zhgirovski
Hi everyone

I'm trying to use PySpark 3.3.2.
I have these relevant options set:


spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.shuffleTracking.timeout=20s
spark.dynamicAllocation.executorIdleTimeout=30s
spark.dynamicAllocation.cachedExecutorIdleTimeout=40s
spark.executor.instances=0
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=20
spark.master=k8s://https://k8s-api.<>:6443


So I'm using Kubernetes to deploy up to 20 executors.

Then I run this piece of code:

import time

# 'spark' is the active SparkSession; the bucket path is redacted
df = spark.read.parquet("s3a://")
print(df.count())
time.sleep(999)


This works fine and as expected: during execution, ~1600 tasks are
completed, 20 executors get deployed, and they are quickly removed after the
calculation is complete.

Next, I add these to the config:

spark.decommission.enabled=true
spark.storage.decommission.shuffleBlocks.enabled=true
spark.storage.decommission.enabled=true
spark.storage.decommission.rddBlocks.enabled=true


I repeat the experiment on an empty Kubernetes cluster, so that no actual
pod eviction is occurring.

This time executor deallocation does not work as expected: depending on the
run, after the job is complete, 0-3 of the 20 executors remain present
forever and never seem to get removed.

I tried to debug the code and found that, inside the
'ExecutorMonitor.timedOutExecutors' function, the executors that never get
removed do not make it into the 'timedOutExecs' variable, because their
'hasActiveShuffle' property remains 'true'.
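To make that concrete, here is a toy model of the filtering I'm describing (my own sketch, not Spark's actual ExecutorMonitor): an idle executor never makes it into the timed-out list while hasActiveShuffle stays true.

```python
# Toy model of the idle-timeout filtering (not Spark's actual code):
# an executor counts as timed out only if it has been idle long enough
# AND its hasActiveShuffle flag has been cleared.
def timed_out_executors(executors, now, idle_timeout_s):
    """executors maps id -> (idle_since_seconds, has_active_shuffle)."""
    return sorted(
        eid
        for eid, (idle_since, has_active_shuffle) in executors.items()
        if now - idle_since >= idle_timeout_s and not has_active_shuffle
    )

executors = {
    "exec-1": (100.0, False),  # idle 100s, shuffle state cleared -> removable
    "exec-2": (100.0, True),   # idle 100s, but hasActiveShuffle stuck at true
}
print(timed_out_executors(executors, now=200.0, idle_timeout_s=30))  # ['exec-1']
```

In my runs, the 0-3 leftover executors behave like "exec-2" here: they are idle well past every configured timeout, but the flag never clears.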

I'm a little stuck here trying to understand how pod management, shuffle
tracking, and decommissioning are supposed to work together, how to debug
this, and whether this is expected behavior at all (to me it is not).

Thank you for any hints!