[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182511#comment-14182511 ]
Patrick Wendell commented on SPARK-3561:
----------------------------------------

Hey [~ozhurakousky] - adding an @Experimental interface is our way of previewing future public APIs to the community. What I'm saying is that Spark's internal execution (and in particular runJob) is not now, and IMO should never be, a public API we want others to extend. Pluggable execution engines just aren't part of Spark's design, and that is at the core of this proposal. We have many other extension points for Spark-YARN integration, such as our resource management layer, which is specifically designed for this. For things like unit testing, we can just refactor using internal/private interfaces rather than public ones, so I think the testing discussion is orthogonal to this patch.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as
> Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the
> current architecture of Spark-on-YARN can be enhanced to provide
> significantly better utilization of cluster resources for large-scale, batch
> and/or ETL applications when run alongside other applications (Spark and
> others) and services in YARN.
>
> Proposal:
> The proposed approach would introduce a pluggable JobExecutionContext (trait)
> - a gateway and a delegate to the Hadoop execution environment - as a
> non-public API (@Experimental) not exposed to end users of Spark.
> The trait will define 6 operations:
> * hadoopFile
> * newAPIHadoopFile
> * broadcast
> * runJob
> * persist
> * unpersist
> Each method maps directly to the corresponding method in the current version
> of SparkContext. The JobExecutionContext implementation will be selected by
> SparkContext via a master URL of the form
> "execution-context:foo.bar.MyJobExecutionContext", with the default
> implementation containing the existing code from SparkContext, thus allowing
> the current (corresponding) methods of SparkContext to delegate to that
> implementation.
> An integrator will then have the option to provide a custom implementation by
> either writing one from scratch or extending from DefaultExecutionContext.
> Please see the attached design doc for more details.
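For context, a minimal Scala sketch of what the proposed JobExecutionContext trait could look like, assuming each of the six operations mirrors the signature of the corresponding SparkContext method and receives the calling SparkContext explicitly. The parameter lists and the explicit SparkContext argument are illustrative assumptions based only on the description above, not the definitive interface from the attached design doc.

{code:scala}
import scala.reflect.ClassTag

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.InputFormat
import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}

import org.apache.spark.{SparkContext, TaskContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Illustrative sketch only: one method per proposed extension point, each
// mirroring the SparkContext method it would back.
trait JobExecutionContext {

  // Old Hadoop API (org.apache.hadoop.mapred)
  def hadoopFile[K, V](
      sc: SparkContext,
      path: String,
      inputFormatClass: Class[_ <: InputFormat[K, V]],
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int): RDD[(K, V)]

  // New Hadoop API (org.apache.hadoop.mapreduce)
  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
      sc: SparkContext,
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration): RDD[(K, V)]

  def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

  // The central hook: how a job is actually executed.
  def runJob[T, U: ClassTag](
      sc: SparkContext,
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit): Unit

  def persist[T](sc: SparkContext, rdd: RDD[T], newLevel: StorageLevel): RDD[T]

  def unpersist[T](sc: SparkContext, rdd: RDD[T], blocking: Boolean): RDD[T]
}
{code}

Under the proposal, SparkContext would instantiate the class named in a master URL such as "execution-context:foo.bar.MyJobExecutionContext" and delegate the corresponding calls to it, while DefaultExecutionContext would preserve today's behavior by carrying the existing SparkContext code.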