We have a Spark pipeline (Proc-1) that produces a result DataFrame, say DF-1. From DF-1, we need to create two or more DataFrames, say DF-2 and DF-3, via additional SQL or ML processes (Proc-2 and Proc-3). Ideally, we would like to run Proc-2 and Proc-3 in parallel, since they are independent of each other: DF-1 is treated as immutable, and DF-2 and DF-3 are mutually exclusive. A rough sketch of the sequential version is below.
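Just to make the setup concrete, here is a minimal sketch of what we have today; the names (df1, proc2, proc3, the input path, and the placeholder transformations) are hypothetical stand-ins, not our actual code:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("pipeline-sketch").getOrCreate()

    // Proc-1 produces DF-1; cache it because both downstream procs read it.
    val df1: DataFrame = spark.read.parquet("/path/to/input").cache()  // stand-in for Proc-1

    // Proc-2 and Proc-3 are independent derivations of DF-1.
    def proc2(df: DataFrame): DataFrame = df.filter("label = 1")       // placeholder SQL step
    def proc3(df: DataFrame): DataFrame = df.groupBy("key").count()    // placeholder aggregation

    // Today these run one after the other.
    val df2 = proc2(df1)
    val df3 = proc3(df1)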

Does Spark have any built-in APIs for spawning sub-jobs within a single session? If multi-threading is needed, what are the common best practices in this case? The sketch below shows the kind of thread-based approach we have in mind.
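For reference, this is roughly what we are considering: since actions submitted from separate threads in the same SparkSession run as concurrent jobs, one could drive Proc-2 and Proc-3 from a small thread pool. This is only a sketch using the hypothetical df1/proc2/proc3 names from above and made-up output paths:

    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import java.util.concurrent.Executors

    // A two-thread pool, one thread per branch.
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))

    // Materialize DF-1 once so both branches reuse the cached result
    // instead of recomputing the upstream pipeline.
    df1.count()

    // Kick off both branches; each write is an action and runs as its own Spark job.
    val f2 = Future { proc2(df1).write.mode("overwrite").parquet("/out/df2") }
    val f3 = Future { proc3(df1).write.mode("overwrite").parquet("/out/df3") }

    Await.result(Future.sequence(Seq(f2, f3)), Duration.Inf)

Is this the recommended pattern, or is there a more idiomatic way to do it?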

Thanks in advance for your help!

-- ND

