We have a Spark job whose pipeline (call it Proc-1) produces a result
data frame, say DF-1. From DF-1 we need to create two or more data
frames, say DF-2 and DF-3, via additional SQL or ML processes, i.e.
Proc-2 and Proc-3. Ideally, we would like to run Proc-2 and Proc-3 in
parallel: they can be executed independently, since DF-1 is immutable
once produced and DF-2 and DF-3 do not depend on each other.
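
For concreteness, here is a stripped-down sketch of the shape of our
pipeline. The toy data and the proc2/proc3 functions are made-up
stand-ins for our real SQL/ML steps:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().appName("df-fanout").getOrCreate()
    import spark.implicits._

    // Proc-1 stand-in: our real pipeline ends here with DF-1.
    val df1: DataFrame = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "tag")
    df1.cache()  // materialize DF-1 once so both downstream procs reuse it

    // Proc-2 / Proc-3 stand-ins for the real SQL / ML steps.
    def proc2(df: DataFrame): DataFrame = df.groupBy("tag").count()
    def proc3(df: DataFrame): DataFrame = df.filter($"id" > 1)

    val df2 = proc2(df1)
    val df3 = proc3(df1)
    df2.show()
    df3.show()  // as written, these two actions run one after the other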
Does Spark have any built-in APIs for spawning such sub-jobs within a
single session? If multi-threading is needed, what are the common best
practices? For example, is something along the lines of the sketch
below considered reasonable?
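
Continuing the sketch above, this is the kind of thing we were
considering: wrapping each downstream action in a Scala Future so the
driver submits both jobs concurrently. (The parquet output paths and
the use of the global ExecutionContext are placeholders for
illustration, not a claim about best practice.)

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    // Each Future submits an independent action from its own driver-side
    // thread; Spark's scheduler is thread-safe for concurrent job
    // submission, so the two jobs can overlap if the cluster has capacity.
    val f2 = Future { proc2(df1).write.mode("overwrite").parquet("/tmp/df2") }
    val f3 = Future { proc3(df1).write.mode("overwrite").parquet("/tmp/df3") }

    // Block until both downstream jobs finish.
    Await.result(Future.sequence(Seq(f2, f3)), Duration.Inf)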
Thanks in advance for your help!
-- ND