[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514062#comment-14514062 ]
Oleg Zhurakousky commented on SPARK-3561:
-----------------------------------------

Here is an interesting read that provides an even stronger case for separating flow construction from execution context:

http://blog.acolyer.org/2015/04/27/musketeer-part-i-whats-the-best-data-processing-system/
http://www.cl.cam.ac.uk/research/srg/netos/camsas/pubs/eurosys15-musketeer.pdf

The key points are:

_It thus makes little sense to force the user to target a single system at workflow implementation time. Instead, we argue that users should, in principle, be able to execute their high-level workflow on any data processing system (§3). Being able to do this has three main benefits:_

_1. Users write their workflow once, in a way they choose, but can easily execute it on alternative systems;_
_2. Multiple sub-components of a workflow can be executed on different back-end systems; and_
_3. Existing workflows can easily be ported to new systems._

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as
> Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the
> current architecture of Spark-on-YARN can be enhanced to provide
> significantly better utilization of cluster resources for large-scale batch
> and/or ETL applications when run alongside other applications (Spark and
> others) and services in YARN.
> Proposal:
> The proposed approach would introduce a pluggable JobExecutionContext (trait)
> - a gateway and a delegate to the Hadoop execution environment - as a non-public
> API (@Experimental) not exposed to end users of Spark.
> The trait will define six operations:
> * hadoopFile
> * newAPIHadoopFile
> * broadcast
> * runJob
> * persist
> * unpersist
> Each method maps directly to the corresponding method in the current version of
> SparkContext. A JobExecutionContext implementation will be selected by
> SparkContext via a master URL of the form
> "execution-context:foo.bar.MyJobExecutionContext", with the default
> implementation containing the existing code from SparkContext, thus allowing
> the current (corresponding) methods of SparkContext to delegate to that
> implementation.
> An integrator will then have the option to provide a custom implementation of
> DefaultExecutionContext, either by implementing the trait from scratch or by
> extending from DefaultExecutionContext.
> Please see the attached design doc for more details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
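[Editorial note: the pluggable-context design quoted above could be sketched in Scala roughly as follows. This is a minimal illustration, not Spark's real API: the method signatures are simplified placeholders for the actual SparkContext signatures, and ExecutionContextResolver is a hypothetical helper invented here to show how the "execution-context:" master URL scheme might select an implementation by reflection.]

```scala
// Sketch of the proposed JobExecutionContext trait (SPARK-3561).
// Signatures are simplified stand-ins; the real trait would mirror
// SparkContext's hadoopFile, newAPIHadoopFile, broadcast, runJob,
// persist, and unpersist signatures.
trait JobExecutionContext {
  def hadoopFile(path: String): Unit
  def newAPIHadoopFile(path: String): Unit
  def broadcast[T](value: T): T
  def runJob[T](data: Seq[T]): Seq[T]
  def persist(id: String): Unit
  def unpersist(id: String): Unit
}

// Default implementation: per the proposal, this would hold the code
// that currently lives in SparkContext itself. Stubbed out here.
class DefaultExecutionContext extends JobExecutionContext {
  def hadoopFile(path: String): Unit = ()
  def newAPIHadoopFile(path: String): Unit = ()
  def broadcast[T](value: T): T = value
  def runJob[T](data: Seq[T]): Seq[T] = data
  def persist(id: String): Unit = ()
  def unpersist(id: String): Unit = ()
}

// Hypothetical resolver showing how SparkContext might pick an
// implementation from a master URL such as
// "execution-context:foo.bar.MyJobExecutionContext".
object ExecutionContextResolver {
  private val Scheme = "execution-context:"

  def resolve(master: String): JobExecutionContext =
    if (master.startsWith(Scheme)) {
      val className = master.stripPrefix(Scheme)
      // Load the integrator-supplied class via its no-arg constructor.
      Class.forName(className)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[JobExecutionContext]
    } else {
      // Any ordinary master URL falls back to the default context.
      new DefaultExecutionContext
    }
}
```

The point of the resolver is that existing SparkContext methods would delegate to whichever JobExecutionContext was resolved, so user code is unchanged regardless of the back end.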