Re: [EXTERNAL] Re: Stage level scheduling - lower the number of executors when using GPUs
Thanks Artemis. We are not using Rapids, but rather using GPUs through the Stage Level Scheduling feature with ResourceProfile. On Kubernetes you have to turn on shuffle tracking for dynamic allocation anyhow. The question is how we can limit the number of executors when building a new ResourceProfile, either directly (API) or indirectly (some advanced workaround).

Thanks,
Shay

From: Artemis User
Sent: Thursday, November 3, 2022 1:16 AM
To: user@spark.apache.org
Subject: [EXTERNAL] Re: Stage level scheduling - lower the number of executors when using GPUs
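For what it's worth, at the time of this thread `ResourceProfile` has no executor-count knob of its own; `maxExecutors` is application-wide. The usual indirect lever is the task-level resource amounts: if one task requires the whole GPU (and all executor cores), each executor exposes a single task slot, so dynamic allocation's demand drops with the stage's partition count. Below is a hedged PySpark sketch of that workaround — the `pyspark.resource` class names are the real API, but the cores/memory numbers, the discovery-script path, and the 16-partition figure are made-up illustrations, and this is my sketch, not an official Spark feature:

```python
# Indirect workaround sketch: shape the ResourceProfile so each executor
# exposes one task slot, then keep the GPU stage's partition count low.
# Import is guarded so the sketch reads cleanly without a Spark install.
try:
    from pyspark.resource import (ExecutorResourceRequests,
                                  ResourceProfileBuilder,
                                  TaskResourceRequests)
    HAVE_PYSPARK = True
except ImportError:
    HAVE_PYSPARK = False

def one_slot_gpu_profile():
    """Build a stage-level ResourceProfile in which a single task consumes
    the executor's whole GPU and all of its cores, so each executor runs
    exactly one task at a time during the GPU stage."""
    execs = (ExecutorResourceRequests()
             .cores(4).memory("8g")
             .resource("gpu", 1, discoveryScript="/opt/getGpus.sh"))  # illustrative values
    tasks = TaskResourceRequests().cpus(4).resource("gpu", 1)
    # Note: in PySpark, .build is a property, not a method call.
    return ResourceProfileBuilder().require(execs).require(tasks).build

# Usage sketch (requires a live SparkSession; RDD API only, per this thread):
#   gpu_rdd = df.rdd.repartition(16).withResources(one_slot_gpu_profile())
# With one slot per executor and 16 partitions, dynamic allocation should
# request at most ~16 executors for this stage, whatever maxExecutors is.
```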
Re: Stage level scheduling - lower the number of executors when using GPUs
Are you using Rapids for GPU support in Spark? A couple of options you may want to try:

1. In addition to having dynamic allocation turned on, you may also need to turn on the external shuffle service.
2. Sounds like you are using Kubernetes. In that case, you may also need to turn on shuffle tracking.
3. The "stages" are controlled by the APIs. The APIs for dynamic resource requests (change of stage) do exist, but only for RDDs (e.g. TaskResourceRequest and ExecutorResourceRequest).

On 11/2/22 11:30 AM, Shay Elbaz wrote:
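For reference, options 1 and 2 map to standard configuration properties (the `spark.dynamicAllocation.*` names are from the Spark configuration docs; the numeric values and app name here are purely illustrative):

```shell
# Kubernetes has no external shuffle service (option 1 applies to
# YARN/standalone via spark.shuffle.service.enabled=true), so on K8s
# dynamic allocation relies on shuffle tracking (option 2):
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  my_app.py
```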
Stage level scheduling - lower the number of executors when using GPUs
Hi,

Our typical applications need fewer executors for a GPU stage than for a CPU stage. We are using dynamic allocation with stage level scheduling, and Spark tries to maximize the number of executors during the GPU stage as well, causing a bit of resource chaos in the cluster. This forces us to use a lower value for 'maxExecutors' in the first place, at the cost of CPU stage performance. The alternative is to solve this at the Kubernetes scheduler level, which is not straightforward and doesn't feel like the right way to go.

Is there a way to effectively use fewer executors in Stage Level Scheduling? The API does not seem to include such an option, but maybe there is some more advanced workaround?

Thanks,
Shay
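A rough mental model of why this happens (my own back-of-envelope sketch of the dynamic-allocation sizing logic, not Spark source code): the target executor count is driven by pending tasks divided by task slots per executor, capped only by the global `maxExecutors`, so a GPU stage with many partitions gets the same aggressive scale-out as a CPU stage. The sketch also shows the two indirect levers — per-task resource amounts and partition count:

```python
import math

def executors_requested(pending_tasks, exec_cores, task_cpus,
                        exec_gpus=0, task_gpus=0, max_executors=100):
    """Rough model of dynamic allocation's target executor count:
    ceil(pending_tasks / slots_per_executor), capped at maxExecutors.
    Slots per executor are limited by the scarcest resource."""
    slots = exec_cores // task_cpus
    if task_gpus:
        slots = min(slots, int(exec_gpus // task_gpus))
    return min(max_executors, math.ceil(pending_tasks / slots))

# CPU stage: 1000 tasks, 4-core executors, 1 CPU per task -> hits the cap.
print(executors_requested(1000, 4, 1))                           # -> 100

# GPU stage with a fractional GPU per task: still 4 slots/executor -> still 100.
print(executors_requested(1000, 4, 1, exec_gpus=1, task_gpus=0.25))  # -> 100

# Indirect workaround: 1 whole GPU per task (1 slot/executor) and a
# repartition down to 32 tasks -> demand itself drops below the cap.
print(executors_requested(32, 4, 1, exec_gpus=1, task_gpus=1))   # -> 32
```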
[*IMPORTANT*] Update Streaming Query Statistics URL
Hello Team,

I am using the Spark 3.3.0 release. In the Spark monitoring UI, when I open the Streaming Query Statistics tab (which lists all the running IDs) and click one of the running IDs, the page redirects me to this URL:

/prspark-4-wsmsetsibatchpm-1666174298886-driver/StreamingQuery/statistics?id=ef071381-7586-45fe-a8ba-53ac5838d835

and this URL gives me a 500 response. But after adding a trailing "/" to the path I get a result, so please update the URL. This issue is also present in the Spark 3.0.1 and 3.2.0 releases.

The correct URL is:

prspark-4-wsmsetsibatchpm-1666174298886-driver/StreamingQuery/statistics/?id=ef071381-7586-45fe-a8ba-53ac5838d835

Thanks & Regards,
Priyanshi Sahu
should one ever make a spark streaming job in pyspark
Dear community,

I had a general question about the use of Scala vs. PySpark for Spark streaming. I believe Spark streaming works most efficiently when written in Scala, but that things can also be implemented in PySpark. My questions:

1) Is it completely dumb to make a streaming job in PySpark?
2) What are the technical reasons that it is done best in Scala (is it easy to understand why)?
3) Are there any good links with numbers on the performance difference, under what circumstances, plus an explanation?
4) Are there certain scenarios where the use of PySpark can be motivated (maybe when someone doesn't feel comfortable writing a job in Scala and the number of messages per minute isn't gigantic, so performance isn't that crucial)?

Thanks for any input!
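On question 2, the explanation usually given is this: classic DStream jobs and Python UDFs push every record across the JVM-to-Python boundary and through a per-record Python function call, whereas Scala (or PySpark using only built-in DataFrame/SQL expressions) keeps processing inside the JVM. The toy timing below is plain Python, not Spark — just an illustration of how a per-record Python callback compares to doing the same arithmetic inline, which is one ingredient of the gap:

```python
import time

N = 1_000_000
data = list(range(N))

def py_udf(x):
    # Stand-in for a per-record Python UDF.
    return x * 2 + 1

t0 = time.perf_counter()
out_udf = [py_udf(x) for x in data]       # one Python function call per record
t1 = time.perf_counter()
out_inline = [x * 2 + 1 for x in data]    # same work without the per-record hop
t2 = time.perf_counter()

assert out_udf == out_inline
print(f"per-record callback: {t1 - t0:.3f}s, inline: {t2 - t1:.3f}s")
```

In real PySpark the callback path additionally pays for serializing rows between the JVM and Python workers, which is why sticking to built-in expressions narrows the Scala/PySpark gap considerably.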