Hi Spark committers,

I would like to discuss changing the signatures of SparkContext's
setJobGroup and clearJobGroup so that they return a replica of the
SparkContext with the job group set or unset, instead of mutating the
original context.

I am building a Spark job server. It assigns a job group before passing
control to user-provided logic, which uses the SparkContext to define
and execute a job (very much like job-server). The issue is that I
can't reliably know when to clear the job group, because user-defined
code can use futures to submit multiple tasks in parallel. In fact, I
even allow users to return a future from their function, on which the
server can register callbacks to learn when the user-defined job is
complete.

Now, if I set the job group before passing control to the user function
and wait for the future to complete before clearing the job group, I
can't use that SparkContext for any other job in the meantime. This
means I would have to lock on the SparkContext, which seems like a bad
idea.

Therefore, my proposal is to return a new instance of SparkContext (a
replica with just the job group set or unset) that can be used safely
in a concurrent environment. To avoid breaking backward compatibility,
I am also happy to keep mutating the original SparkContext, as long as
the returned SparkContext is not affected by later set/unset calls on
the original.
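
To make the idea concrete, here is a minimal sketch of the proposed
semantics using a toy model rather than Spark itself. ToyContext,
with_job_group, without_job_group, run_job and user_logic are all
hypothetical names invented for illustration; none of them is part of
the Spark API. The sketch only shows that a replica carrying its own job
group can be handed to user code that fans out over futures, while the
shared context stays untouched and usable by other jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyContext:
    """Hypothetical stand-in for SparkContext; NOT the real Spark API."""
    app_name: str
    job_group: Optional[str] = None

    def with_job_group(self, group_id: str) -> "ToyContext":
        # Return a replica with the group set; the original is not mutated.
        return replace(self, job_group=group_id)

    def without_job_group(self) -> "ToyContext":
        # Return a replica with the group cleared.
        return replace(self, job_group=None)

    def run_job(self, name: str) -> str:
        # Pretend to run a job and report the group it was tagged with.
        return f"{name}@{self.job_group}"


def user_logic(ctx: ToyContext, pool: ThreadPoolExecutor):
    # User code may submit several tasks in parallel and return futures.
    return [pool.submit(ctx.run_job, f"task{i}") for i in range(3)]


shared = ToyContext("job-server")
with ThreadPoolExecutor(max_workers=4) as pool:
    # Each user job gets its own replica; no lock on `shared` is needed.
    futures_a = user_logic(shared.with_job_group("group-a"), pool)
    futures_b = user_logic(shared.with_job_group("group-b"), pool)
    results_a = [f.result() for f in futures_a]
    results_b = [f.result() for f in futures_b]
```

Because each replica is immutable in this sketch, the server never has
to decide when to clear the job group: the group simply goes away with
the replica once the user's futures complete.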

Thoughts please?

Thanks,
Aniket
