Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-05 Thread Sean Owen
Doesn't a persist break stages? On Thu, Aug 5, 2021, 11:40 AM Tom Graves wrote: > As Sean mentioned, it's only available at the stage level, but you said you don't > want to shuffle, so splitting into stages doesn't help you. Without more > details, it seems like you could "hack" this by just …

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-05 Thread Tom Graves
As Sean mentioned, it's only available at the stage level, but you said you don't want to shuffle, so splitting into stages doesn't help you. Without more details, it seems like you could "hack" this by just requesting an executor with 1 GPU (allowing 2 tasks per GPU) and 2 CPUs, and the one task would …
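A minimal PySpark sketch of the hack Tom describes, assuming Spark 3.1+ with dynamic allocation enabled; the discovery-script path and the rdd, gpu_map, and cpu_map names are placeholders, not anything from the thread:

    from pyspark.resource import (ExecutorResourceRequests,
                                  TaskResourceRequests,
                                  ResourceProfileBuilder)

    # Executor: 2 CPU cores and 1 GPU. Task: 1 CPU and half a GPU,
    # so up to two tasks can share the executor's single GPU.
    ereqs = (ExecutorResourceRequests()
             .cores(2)
             .resource("gpu", 1, discoveryScript="/opt/spark/getGpus.sh"))
    treqs = TaskResourceRequests().cpus(1).resource("gpu", 0.5)
    profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

    # Both maps still fuse into one stage, but with 2 tasks per GPU a
    # second task can use the GPU while the first is in its CPU phase.
    result = rdd.withResources(profile).map(gpu_map).map(cpu_map)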

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-01 Thread Sean Owen
Oh I see, I missed that. You can specify at the stage level, nice. I think what you're really looking for is to break these operations into two stages. You can do that with a persist or something - which has a cost but may work fine. Does it actually help much with GPU utilization? In theory yes, but …
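A hedged sketch of the persist idea, with rdd, gpu_profile, gpu_map, and cpu_map as placeholder names: materialize the GPU stage's output in the cache, then run the CPU map as a follow-up job that reads the cached partitions instead of recomputing the GPU work. Whether this actually frees the GPU executor is exactly the question raised above.

    from pyspark import StorageLevel

    cached = (rdd.withResources(gpu_profile)
                 .map(gpu_map)
                 .persist(StorageLevel.MEMORY_AND_DISK))
    cached.count()  # action that materializes the GPU stage's output

    # The follow-up job reads from the cache, so the GPU map is not
    # recomputed when the CPU map runs.
    final = cached.map(cpu_map).collect()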

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-01 Thread Gourav Sengupta
Hi Andreas, I know that the NVIDIA team is a wonderful team to reach out to; they respond quite quickly and help you along the way. I am not quite sure whether the SPARK community leaders will be willing to allow the overall SPARK community to build native integrations with Deep Learning systems. ray.io …

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-01 Thread Andreas Kunft
Hi, @Sean: Since Spark 3.x, stage-level resource scheduling is available: https://databricks.com/session_na21/stage-level-scheduling-improving-big-data-and-ai-integration @Gourav: I'm using the latest version of Spark, 3.1.2. I want to split the two maps across different executors, as both the GPU …
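To make the constraint concrete: a stage-level profile only changes at a stage boundary, and without a shuffle the two maps fuse into one stage. A hedged sketch, with rdd, gpu_map, and cpu_map as placeholders, of what the split would look like via an explicit shuffle:

    from pyspark.resource import (ExecutorResourceRequests,
                                  TaskResourceRequests,
                                  ResourceProfileBuilder)

    def build_profile(cores, gpus_per_task):
        ereqs = ExecutorResourceRequests().cores(cores)
        treqs = TaskResourceRequests().cpus(1)
        if gpus_per_task:
            ereqs.resource("gpu", 1)
            treqs.resource("gpu", gpus_per_task)
        return ResourceProfileBuilder().require(ereqs).require(treqs).build

    gpu_profile = build_profile(cores=2, gpus_per_task=1)
    cpu_profile = build_profile(cores=8, gpus_per_task=0)

    # The repartition introduces the stage boundary that lets the second
    # profile take effect -- which is the shuffle cost Andreas wants to avoid.
    out = (rdd.withResources(gpu_profile).map(gpu_map)
              .repartition(rdd.getNumPartitions())
              .withResources(cpu_profile).map(cpu_map))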

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-08-01 Thread Gourav Sengupta
Hi Andreas, just to understand the question first: what is it you want to achieve by splitting the map operations across the GPU and CPU? Also, it would be wonderful to understand the version of SPARK you are using, and your GPU details, a bit more. Regards, Gourav On Sat, Jul 31, 2021 at 9:57 AM …

Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-07-31 Thread Sean Owen
No, unless I'm crazy, you can't even change resource requirements at the job level, let alone the stage level. Does it help you, though? Is something else even able to use the GPU otherwise? On Sat, Jul 31, 2021, 3:56 AM Andreas Kunft wrote: > I have a setup with two work-intensive tasks, one map using the GPU …

[Spark Core, PySpark] Separate stage level scheduling for consecutive map functions

2021-07-31 Thread Andreas Kunft
I have a setup with two work-intensive tasks: one map using the GPU followed by a map using only the CPU. Using stage-level resource scheduling, I request a GPU node, but I would also like to execute the consecutive CPU map on a different executor so that the GPU node is not blocked. However, Spark will …
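For concreteness, a minimal sketch of the setup as described; rdd, gpu_model, and post_process are placeholder names. Because chained maps are narrow transformations, Spark pipelines both into a single stage, so they run inside the same task on the same executor:

    def gpu_map(batch):       # work-intensive model inference on the GPU
        return gpu_model.predict(batch)

    def cpu_map(result):      # work-intensive, CPU-only post-processing
        return post_process(result)

    # Narrow transformations fuse: both maps execute in one stage, inside
    # one task, on the executor chosen by that stage's resource profile.
    out = rdd.map(gpu_map).map(cpu_map)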