We don’t block scaling up after node failure in classic Spark if that’s the question.
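For the open-source analogue of the spot-loss handling quoted below: Spark 3.1+ ships graceful executor decommissioning behind feature flags. A minimal sketch of those settings, assuming Spark >= 3.1 on a cluster manager that supports decommissioning (values illustrative; the Qubole behavior quoted below is that vendor's implementation, not necessarily stock Spark):

    import org.apache.spark.sql.SparkSession

    // Sketch only: stock Spark 3.1+ graceful decommissioning flags, the
    // closest open-source equivalent of the vendor behavior quoted below.
    val spark = SparkSession.builder()
      .appName("decommission-sketch")
      // let executors wind down instead of dying abruptly on node loss
      .config("spark.decommission.enabled", "true")
      // migrate blocks off a decommissioning executor before it goes away
      .config("spark.storage.decommission.enabled", "true")
      .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
      .config("spark.storage.decommission.rddBlocks.enabled", "true")
      .getOrCreate()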
On Fri, Feb 4, 2022 at 6:30 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> From what I can see in the autoscaling setup, you will always need a minimum of two worker nodes as primary. It also states, and I quote: "Scaling primary workers is not recommended due to HDFS limitations which result in instability while scaling. These limitations do not exist for secondary workers". So scaling is done with the secondary workers, by specifying the min and max instance counts. It also defaults to 2 minutes for the so-called autoscaling cooldown duration, hence the delay observed. I presume task allocation to the new executors is FIFO for new tasks. This link
> <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.&text=By%20default%2C%20Spark%20uses%20a%20static%20allocation%20of%20resources.>
> gives some explanation of autoscaling.
>
> Handling Spot Node Loss and Spot Blocks in Spark Clusters:
> "When the Spark AM receives the spot loss (Spot Node Loss or Spot Blocks) notification from the RM, it notifies the Spark driver. The driver then performs the following actions:
>
>    1. Identifies all the executors affected by the upcoming node loss.
>    2. Moves all of the affected executors to a decommissioning state, and no new tasks are scheduled on these executors.
>    3. Kills all the executors after reaching 50% of the termination time.
>    4. *Starts the failed tasks (if any) on other executors.*
>    5. For these nodes, it removes all the entries of the shuffle data from the map output tracker on the driver after reaching 90% of the termination time. This helps in preventing shuffle-fetch failures due to spot loss.
>    6. Recomputes the shuffle data from the lost node by stage resubmission, and at that time shuffles data of the spot node if required."
>
> So basically, when a node fails, classic Spark comes into play: no new nodes are added (no rescaling) and tasks are redistributed among the existing executors, as I read it?
>
> On Fri, 4 Feb 2022 at 13:55, Sean Owen <sro...@gmail.com> wrote:
>
>> I have not seen stack traces under autoscaling, so I am not even sure what the error in question is.
>> There is always a delay in acquiring a whole new executor in the cloud, as it usually means a new VM is provisioned.
>> Spark treats the new executor like any other, available for executing tasks.
>>
>> On Fri, Feb 4, 2022 at 4:28 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Thanks for the info.
>>>
>>> My concern has always been how Spark handles autoscaling (adding new executors) when the load pattern changes. I have tried to test this by setting the following parameters (Spark 3.1.2 on GCP):
>>>
>>> spark-submit --verbose \
>>>   ....... \
>>>   --conf spark.dynamicAllocation.enabled="true" \
>>>   --conf spark.shuffle.service.enabled="true" \
>>>   --conf spark.dynamicAllocation.minExecutors=2 \
>>>   --conf spark.dynamicAllocation.maxExecutors=10 \
>>>   --conf spark.dynamicAllocation.initialExecutors=4 \
>>>
>>> It is not very clear to me how Spark distributes tasks to the added executors, nor what the source of the delay is. As you have observed, there is a delay in adding new resources and allocating tasks to them. Is that process efficient?
>>>
>>> Thanks
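A minimal sketch of the allocation knobs behind the ramp-up delay asked about above, assuming Spark 3.x (values illustrative, not recommendations): Spark requests new executors only after tasks have been backlogged for schedulerBacklogTimeout, then grows the request each sustainedSchedulerBacklogTimeout round; cloud-side VM provisioning comes on top of that, and once an executor registers the scheduler treats it like any other.

    import org.apache.spark.sql.SparkSession

    // Sketch: ramp-up and ramp-down knobs for dynamic allocation.
    val spark = SparkSession.builder()
      .appName("dynamic-allocation-sketch")
      .config("spark.dynamicAllocation.enabled", "true")
      // external shuffle service, required for classic dynamic allocation
      .config("spark.shuffle.service.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "10")
      // wait this long with a task backlog before requesting executors
      .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
      // interval between subsequent, exponentially growing requests
      .config("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "1s")
      // release executors that have been idle this long
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .getOrCreate()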
>>> On Fri, 4 Feb 2022 at 03:04, Maksim Grinman <m...@resolute.ai> wrote:
>>>
>>>> It's actually on AWS EMR. The job bootstraps and runs fine -- the autoscaling group is there to bring up a service that Spark will be calling. Some code waits for the autoscaling group to come up before continuing processing in Spark, since the Spark cluster will need to make requests to the service in the autoscaling group. It takes several minutes for the service to come up, and during the wait Spark starts to show these thread dumps, presumably because it thinks something is wrong, since the executor is busy-waiting and not doing anything. The previous version of Spark (2.4.4) did not do this.
>>>>
>>>> On Thu, Feb 3, 2022 at 6:59 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Sounds like you are running this on a Google Dataproc cluster (Spark 3.1.2) with an autoscaling policy?
>>>>>
>>>>> Can you describe whether this happens before Spark starts a new job on the cluster, or somehow halfway through processing an existing job?
>>>>>
>>>>> Also, is the job doing Spark Structured Streaming?
>>>>>
>>>>> HTH
>>>>>
>>>>> On Thu, 3 Feb 2022 at 21:29, Maksim Grinman <m...@resolute.ai> wrote:
>>>>>
>>>>>> We've got a Spark task that, after some processing, starts an autoscaling group and waits for it to be up before continuing processing. While waiting for the autoscaling group, Spark starts throwing full thread dumps, presumably at the spark.executor.heartbeatInterval. Is there a way to prevent the thread dumps?
>>>>>>
>>>>>> --
>>>>>> Maksim Grinman
>>>>>> VP Engineering
>>>>>> Resolute AI
>>>>
>>>> --
>>>> Maksim Grinman
>>>> VP Engineering
>>>> Resolute AI

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
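On Maksim's thread-dump question: one way to avoid an executor sitting in a busy-wait between heartbeats is to do the waiting on the driver, before any Spark action touches the service. A sketch under that assumption; the helper name, health URL, and timings are all hypothetical:

    import java.net.{HttpURLConnection, URL}

    // Poll a (hypothetical) health endpoint from the driver until the
    // autoscaled service is up, so no executor blocks inside a task.
    def waitForService(healthUrl: String, timeoutMs: Long = 10 * 60 * 1000L): Unit = {
      val deadline = System.currentTimeMillis() + timeoutMs
      while (System.currentTimeMillis() < deadline) {
        try {
          val conn = new URL(healthUrl).openConnection().asInstanceOf[HttpURLConnection]
          conn.setConnectTimeout(2000)
          conn.setReadTimeout(2000)
          if (conn.getResponseCode == 200) return // service is up; proceed
        } catch {
          case _: java.io.IOException => // not reachable yet; keep polling
        }
        Thread.sleep(5000)
      }
      sys.error(s"Service at $healthUrl did not come up within $timeoutMs ms")
    }

    // usage (hypothetical URL): block on the driver, then run the Spark stage
    // waitForService("http://internal-service/health")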