OK, basically: is there a scenario where Spark, or for that matter any
cluster manager, can deploy a new node (after the loss of an existing node)
with a view to running the failed tasks on the new executor(s) deployed
on that newly spun-up node?
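
For reference, with dynamic allocation the driver keeps requesting executors
for the pending tasks, so once the cluster manager (YARN, Kubernetes, Dataproc
autoscaling, etc.) has provisioned a replacement node, executors can land there
and the failed tasks are retried on whatever executors are alive, subject to
the task and stage retry limits. A minimal sketch of the relevant settings
(Spark 3.1.x; the values are only illustrative, and provisioning the new node
itself is still the cluster manager's job):

    # Sketch only (illustrative values): dynamic allocation so replacement
    # executors can be requested, plus the limits on task retries and stage
    # resubmission.
    spark-submit \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=2 \
      --conf spark.dynamicAllocation.maxExecutors=10 \
      --conf spark.task.maxFailures=4 \
      --conf spark.stage.maxConsecutiveAttempts=4 \
      .......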



On Sat, 5 Feb 2022 at 00:00, Holden Karau <hol...@pigscanfly.ca> wrote:

> We don’t block scaling up after node failure in classic Spark if that’s
> the question.
>
> On Fri, Feb 4, 2022 at 6:30 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> From what I can see in the autoscaling setup, you will always need a minimum
>> of two primary worker nodes. The documentation also states, and I quote, "Scaling
>> primary workers is not recommended due to HDFS limitations which result in
>> instability while scaling. These limitations do not exist for secondary
>> workers". So the scaling comes from the secondary workers, by specifying the
>> min and max number of instances. The autoscaling cooldown duration also
>> defaults to 2 minutes, hence the delay observed. I presume task allocation
>> to the new executors is FIFO for new tasks. This link
>> <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.&text=By%20default%2C%20Spark%20uses%20a%20static%20allocation%20of%20resources.>
>> gives some explanation of autoscaling.
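>>
>> For illustration only, a Dataproc autoscaling policy along those lines might
>> look roughly like this (the policy name, region and min/max counts are made
>> up; only the secondary workers are allowed to scale, and cooldownPeriod is
>> the 2-minute default mentioned above):
>>
>> cat > autoscale-policy.yaml <<'EOF'
>> # Primary workers stay fixed (scaling them is discouraged because of HDFS);
>> # only secondary workers scale between min and max.
>> workerConfig:
>>   minInstances: 2
>>   maxInstances: 2
>> secondaryWorkerConfig:
>>   minInstances: 0
>>   maxInstances: 10
>> basicAlgorithm:
>>   cooldownPeriod: 2m
>>   yarnConfig:
>>     scaleUpFactor: 1.0
>>     scaleDownFactor: 1.0
>>     gracefulDecommissionTimeout: 1h
>> EOF
>> gcloud dataproc autoscaling-policies import my-autoscale-policy \
>>     --source=autoscale-policy.yaml --region=<region>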
>>
>> Handling Spot Node Loss and Spot Blocks in Spark Clusters
>> "When the Spark AM receives the spot loss (Spot Node Loss or Spot
>> Blocks) notification from the RM, it notifies the Spark driver. The driver
>> then performs the following actions:
>>
>>    1. Identifies all the executors affected by the upcoming node loss.
>>    2. Moves all of the affected executors to a decommissioning state,
>>    and no new tasks are scheduled on these executors.
>>    3. Kills all the executors after reaching 50% of the termination time.
>>    4. *Starts the failed tasks (if any) on other executors.*
>>    5. For these nodes, it removes all the entries of the shuffle data
>>    from the map output tracker on the driver after reaching 90% of the
>>    termination time. This helps in preventing shuffle-fetch failures due
>>    to spot loss.
>>    6. Recomputes the shuffle data from the lost node by stage
>>    resubmission, and at that time shuffles data of the spot node if required."
>>
>> So basically, when a node fails, classic Spark comes into play and no new
>> nodes are added (no rescaling), and the failed tasks are redistributed among
>> the existing executors, as I read it? (See the sketch below for the
>> equivalent open-source Spark settings.)
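>>
>> For comparison, open-source Spark 3.1+ has a similar decommissioning
>> mechanism built in; a minimal sketch of the flags (illustrative only --
>> whether a replacement node appears is still up to the cluster manager or
>> autoscaler):
>>
>> # Sketch: graceful executor decommissioning, which stops scheduling on the
>> # doomed executors and migrates their shuffle and cached RDD blocks so that
>> # stages are less likely to need full recomputation.
>> spark-submit \
>>   --conf spark.decommission.enabled=true \
>>   --conf spark.storage.decommission.enabled=true \
>>   --conf spark.storage.decommission.shuffleBlocks.enabled=true \
>>   --conf spark.storage.decommission.rddBlocks.enabled=true \
>>   .......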
>>
>>
>>
>>
>>
>> On Fri, 4 Feb 2022 at 13:55, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I have not seen stack traces under autoscaling, so not even sure what
>>> the error in question is.
>>> There is always delay in acquiring a whole new executor in the cloud as
>>> it usually means a new VM is provisioned.
>>> Spark treats the new executor like any other, available for executing
>>> tasks.
>>>
>>> On Fri, Feb 4, 2022 at 4:28 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Thanks for the info.
>>>>
>>>> My concern has always been about how Spark handles autoscaling (adding new
>>>> executors) when the load pattern changes. I have tried to test this by
>>>> setting the following parameters (Spark 3.1.2 on GCP):
>>>>
>>>>         spark-submit --verbose \
>>>>           ....... \
>>>>           --conf spark.dynamicAllocation.enabled="true" \
>>>>           --conf spark.shuffle.service.enabled="true" \
>>>>           --conf spark.dynamicAllocation.minExecutors=2 \
>>>>           --conf spark.dynamicAllocation.maxExecutors=10 \
>>>>           --conf spark.dynamicAllocation.initialExecutors=4 \
>>>>
>>>> It is not very clear to me how Spark distributes tasks to the added
>>>> executors, or what the source of the delay is. As you have observed, there
>>>> is a delay in adding new resources and allocating tasks. Is that process
>>>> efficient?
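>>>>
>>>> As far as I understand it, the delay has two parts: the dynamic allocation
>>>> backlog timeouts below control how quickly Spark *asks* for more executors
>>>> once tasks queue up, and the rest is the cloud provisioning time Sean
>>>> mentioned. Once a new executor registers, it receives pending tasks like
>>>> any other. A sketch of the timing knobs (the values shown are, I believe,
>>>> the defaults):
>>>>
>>>> # Sketch: request extra executors after tasks have been backlogged for 1s,
>>>> # keep ramping up while the backlog persists, and release executors that
>>>> # have been idle for 60s.
>>>> spark-submit \
>>>>   --conf spark.dynamicAllocation.enabled=true \
>>>>   --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
>>>>   --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s \
>>>>   --conf spark.dynamicAllocation.executorAllocationRatio=1.0 \
>>>>   --conf spark.dynamicAllocation.executorIdleTimeout=60s \
>>>>   .......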
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, 4 Feb 2022 at 03:04, Maksim Grinman <m...@resolute.ai> wrote:
>>>>
>>>>> It's actually on AWS EMR. The job bootstraps and runs fine -- the
>>>>> autoscaling group is there to bring up a service that Spark will be
>>>>> calling. Some code waits for the autoscaling group to come up before
>>>>> continuing processing in Spark, since the Spark cluster will need to make
>>>>> requests to the service in the autoscaling group. It takes several minutes
>>>>> for the service to come up, and during the wait Spark starts to show these
>>>>> thread dumps, presumably because it thinks something is wrong since the
>>>>> executor is busy waiting and not doing anything. The previous version of
>>>>> Spark (2.4.4) did not do this.
>>>>>
>>>>> On Thu, Feb 3, 2022 at 6:59 PM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Sounds like you are running this on a Google Dataproc cluster (Spark
>>>>>> 3.1.2) with an autoscaling policy?
>>>>>>
>>>>>> Can you describe whether this happens before Spark starts a new job on
>>>>>> the cluster, or somewhere halfway through processing an existing job?
>>>>>>
>>>>>> Also is the job involved doing Spark Structured Streaming?
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, 3 Feb 2022 at 21:29, Maksim Grinman <m...@resolute.ai> wrote:
>>>>>>
>>>>>>> We've got a Spark task that, after some processing, starts an
>>>>>>> autoscaling group and waits for it to be up before continuing
>>>>>>> processing. While waiting for the autoscaling group, Spark starts
>>>>>>> throwing full thread dumps, presumably at the
>>>>>>> spark.executor.heartbeatInterval. Is there a way to prevent the
>>>>>>> thread dumps?
>>>>>>>
>>>>>>> --
>>>>>>> Maksim Grinman
>>>>>>> VP Engineering
>>>>>>> Resolute AI
>>>>>>>
>>>>>>
>>>>>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
