We don’t block scaling up after node failure in classic Spark, if that’s the
question.

On Fri, Feb 4, 2022 at 6:30 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> From what I can see in the autoscaling setup, you will always need a
> minimum of two primary worker nodes. The documentation also states, and I
> quote, "Scaling primary workers is not recommended due to HDFS limitations
> which result in instability while scaling. These limitations do not exist
> for secondary workers". So the scaling is done through the secondary
> workers, by specifying their min and max instance counts. The autoscaling
> cooldown duration also defaults to 2 minutes, hence the delay observed. I
> presume task allocation to the new executors is FIFO for new tasks. This link
> <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.&text=By%20default%2C%20Spark%20uses%20a%20static%20allocation%20of%20resources.>
> gives some explanation of autoscaling.
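>
> For illustration, a minimal Dataproc autoscaling policy along these lines
> might look roughly as follows. This is only a sketch: the values are
> illustrative, not recommendations, and the exact schema should be checked
> against the current Dataproc documentation.
>
>     # policy.yaml -- keep primary workers fixed, scale only secondary workers
>     workerConfig:
>       minInstances: 2
>       maxInstances: 2
>     secondaryWorkerConfig:
>       minInstances: 0
>       maxInstances: 10
>     basicAlgorithm:
>       cooldownPeriod: 2m           # the 2-minute default mentioned above
>       yarnConfig:
>         scaleUpFactor: 0.5
>         scaleDownFactor: 1.0
>         gracefulDecommissionTimeout: 1h
>
>     # imported with something like (policy name and region are placeholders):
>     # gcloud dataproc autoscaling-policies import my-policy \
>     #   --source=policy.yaml --region=us-central1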
>
> Handling Spot Node Loss and Spot Blocks in Spark Clusters
> "When the Spark AM receives the spot loss (Spot Node Loss or Spot Blocks)
> notification from the RM, it notifies the Spark driver. The driver then
> performs the following actions:
>
>    1. Identifies all the executors affected by the upcoming node loss.
>    2. Moves all of the affected executors to a decommissioning state, and
>    no new tasks are scheduled on these executors.
>    3. Kills all the executors after reaching 50% of the termination time.
>    4. *Starts the failed tasks (if any) on other executors.*
>    5. For these nodes, it removes all the entries of the shuffle data
>    from the map output tracker on driver after reaching 90% of the termination
>    time. This helps in preventing the shuffle-fetch failures due to spot loss.
>    6. Recomputes the shuffle data from the lost node by stage
>    resubmission and at the time shuffles data of spot node if required."
>
> So basically, as I read it, when a node fails classic Spark comes into
> play: no new nodes are added (no rescaling) and the tasks are redistributed
> among the existing executors?
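>
> As a side note, open-source Spark 3.1+ ships its own graceful
> decommissioning mechanism that behaves along similar lines (migrating or
> recomputing blocks from executors that are about to be lost). A rough
> sketch of enabling it, to be verified against your Spark version, and not
> to be confused with the Qubole-specific behaviour quoted above:
>
>     spark-submit \
>       --conf spark.decommission.enabled=true \
>       --conf spark.storage.decommission.enabled=true \
>       --conf spark.storage.decommission.rddBlocks.enabled=true \
>       --conf spark.storage.decommission.shuffleBlocks.enabled=true \
>       .......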
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 4 Feb 2022 at 13:55, Sean Owen <sro...@gmail.com> wrote:
>
>> I have not seen stack traces under autoscaling, so I'm not even sure what
>> the error in question is.
>> There is always delay in acquiring a whole new executor in the cloud as
>> it usually means a new VM is provisioned.
>> Spark treats the new executor like any other, available for executing
>> tasks.
>>
>> On Fri, Feb 4, 2022 at 4:28 AM Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Thanks for the info.
>>>
>>> My concern has always been with how Spark handles autoscaling (adding new
>>> executors) when the load pattern changes. I have tried to test this by
>>> setting the following parameters (Spark 3.1.2 on GCP):
>>>
>>>         spark-submit --verbose \
>>>           .......
>>>           --conf spark.dynamicAllocation.enabled="true" \
>>>           --conf spark.shuffle.service.enabled="true" \
>>>           --conf spark.dynamicAllocation.minExecutors=2 \
>>>           --conf spark.dynamicAllocation.maxExecutors=10 \
>>>           --conf spark.dynamicAllocation.initialExecutors=4 \
>>>
>>> It is not very clear to me how Spark distributes tasks to the added
>>> executors, or what the source of the delay is. As you have observed, there
>>> is a delay in adding new resources and allocating tasks. Is that process
>>> efficient?
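>>>
>>> For what it's worth, the pace at which executors are requested and
>>> released under dynamic allocation is governed by a few further settings;
>>> a rough sketch with illustrative values (the defaults should be checked
>>> against the Spark 3.1 documentation):
>>>
>>>         spark-submit \
>>>           --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
>>>           --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s \
>>>           --conf spark.dynamicAllocation.executorIdleTimeout=60s \
>>>           --conf spark.dynamicAllocation.executorAllocationRatio=1.0 \
>>>           .......
>>>
>>> The backlog timeouts control how quickly additional executors are
>>> requested once tasks queue up, and the idle timeout controls how quickly
>>> unused executors are released; the bulk of the observed delay is usually
>>> the cloud VM provisioning rather than the task assignment itself.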
>>>
>>> Thanks
>>>
>>> On Fri, 4 Feb 2022 at 03:04, Maksim Grinman <m...@resolute.ai> wrote:
>>>
>>>> It's actually on AWS EMR. The job bootstraps and runs fine -- the
>>>> autoscaling group is there to bring up a service that Spark will be
>>>> calling. Some code waits for the autoscaling group to come up before
>>>> continuing processing in Spark, since the Spark cluster will need to make
>>>> requests to the service in the autoscaling group. It takes several minutes
>>>> for the service to come up, and during the wait Spark starts to show these
>>>> thread dumps, presumably because it thinks something is wrong since the
>>>> executor is busy waiting and not doing anything. The previous version of
>>>> Spark we used (2.4.4) did not do this.
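>>>>
>>>> For context, the wait itself is nothing Spark-specific; a hypothetical
>>>> sketch of that kind of readiness poll (the endpoint and timings here are
>>>> made up) might be as simple as:
>>>>
>>>>     # poll the service behind the autoscaling group until it is healthy
>>>>     until curl -sf http://internal-service.example.com/health > /dev/null; do
>>>>         echo "waiting for the autoscaled service to come up..."
>>>>         sleep 30
>>>>     done
>>>>
>>>> If a loop like this runs inside a Spark task, that task simply looks busy
>>>> for several minutes, which is consistent with the thread dumps appearing
>>>> during the wait.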
>>>>
>>>> On Thu, Feb 3, 2022 at 6:59 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Sounds like you are running this on a Google Dataproc cluster (Spark
>>>>> 3.1.2) with an autoscaling policy?
>>>>>
>>>>> Can you describe whether this happens before Spark starts a new job on
>>>>> the cluster, or somehow halfway through processing an existing job?
>>>>>
>>>>> Also, is the job involved doing Spark Structured Streaming?
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 3 Feb 2022 at 21:29, Maksim Grinman <m...@resolute.ai> wrote:
>>>>>
>>>>>> We've got a Spark task that, after some processing, starts an
>>>>>> autoscaling group and waits for it to be up before continuing
>>>>>> processing. While waiting for the autoscaling group, Spark starts
>>>>>> throwing full thread dumps, presumably at the
>>>>>> spark.executor.heartbeatInterval. Is there a way to prevent the thread
>>>>>> dumps?
>>>>>>
>>>>>> --
>>>>>> Maksim Grinman
>>>>>> VP Engineering
>>>>>> Resolute AI
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Maksim Grinman
>>>> VP Engineering
>>>> Resolute AI
>>>>
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
