[jira] [Updated] (SPARK-3561) Decouple Spark's API from its execution engine

Sandy Ryza (JIRA) Fri, 03 Oct 2014 15:57:07 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sandy Ryza updated SPARK-3561:
------------------------------
    Description: 
Currently Spark's API is tightly coupled with its backend execution engine.   
It could be useful to provide a point of pluggability between the two to allow 
Spark to run on other DAG execution engines with similar distributed memory 
abstractions.

Proposal:
The proposed approach would introduce a pluggable JobExecutionContext (trait) - 
a gateway and a delegate to the Hadoop execution environment - as a non-public 
API (@DeveloperApi) not exposed to end users of Spark.
The trait will define only 4 operations:
* hadoopFile
* newAPIHadoopFile
* broadcast
* runJob

Each method maps directly to the corresponding method in the current version of 
SparkContext. The JobExecutionContext implementation will be selected by 
SparkContext via the master URL, as in 
"execution-context:foo.bar.MyJobExecutionContext". The default implementation 
will contain the existing code from SparkContext, so the current (corresponding) 
methods of SparkContext can delegate to it. An integrator will then have the 
option of providing a custom implementation of JobExecutionContext, either by 
implementing it from scratch or by extending DefaultExecutionContext.
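
A rough illustration of the delegation scheme described above, written in 
Python for brevity (the actual proposal is a Scala trait). Only the four 
operation names, DefaultExecutionContext, and the "execution-context:" URL form 
come from the proposal. The method parameters, the SparkContextSketch class, 
and the load_execution_context helper are simplified assumptions made for this 
sketch, not the signatures in the design doc or pull request.

```python
from abc import ABC, abstractmethod
import importlib

PREFIX = "execution-context:"  # form taken from the proposal's example master URL


class JobExecutionContext(ABC):
    """The four operations named in the proposal (parameters are assumed)."""

    @abstractmethod
    def hadoopFile(self, path): ...

    @abstractmethod
    def newAPIHadoopFile(self, path): ...

    @abstractmethod
    def broadcast(self, value): ...

    @abstractmethod
    def runJob(self, partitions, func): ...


class DefaultExecutionContext(JobExecutionContext):
    """Stand-in for the default that would hold the existing SparkContext code."""

    def hadoopFile(self, path):
        return ("hadoopFile", path)

    def newAPIHadoopFile(self, path):
        return ("newAPIHadoopFile", path)

    def broadcast(self, value):
        return value

    def runJob(self, partitions, func):
        # Toy local execution: apply func to each partition in turn.
        return [func(p) for p in partitions]


def load_execution_context(master):
    """Hypothetical helper: pick the context named in the master URL, else the default."""
    if not master.startswith(PREFIX):
        return DefaultExecutionContext()
    module_name, _, class_name = master[len(PREFIX):].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    ctx = cls()
    if not isinstance(ctx, JobExecutionContext):
        raise TypeError(f"{class_name} is not a JobExecutionContext")
    return ctx


class SparkContextSketch:
    """Hypothetical facade whose public methods delegate to the pluggable context."""

    def __init__(self, master):
        self.execution_context = load_execution_context(master)

    def broadcast(self, value):
        return self.execution_context.broadcast(value)

    def runJob(self, partitions, func):
        return self.execution_context.runJob(partitions, func)
```

In this sketch an integrator would either subclass DefaultExecutionContext and 
override selected operations, or implement JobExecutionContext from scratch, and 
then name the class in the master URL.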

Please see the attached design doc for more details.
A pull request will be posted shortly as well.

  was:
Currently Spark provides integration with external resource-managers such as 
Apache Hadoop YARN, Mesos etc. Specifically in the context of YARN, the current 
architecture of Spark-on-YARN can be enhanced to provide significantly better 
utilization of cluster resources for large scale, batch and/or ETL applications 
when run alongside other applications (Spark and others) and services in YARN. 

Proposal:
The proposed approach would introduce a pluggable JobExecutionContext (trait) - 
a gateway and a delegate to the Hadoop execution environment - as a non-public 
API (@DeveloperApi) not exposed to end users of Spark.
The trait will define only 4 operations:
* hadoopFile
* newAPIHadoopFile
* broadcast
* runJob

Each method maps directly to the corresponding method in the current version of 
SparkContext. The JobExecutionContext implementation will be selected by 
SparkContext via the master URL, as in 
"execution-context:foo.bar.MyJobExecutionContext". The default implementation 
will contain the existing code from SparkContext, so the current (corresponding) 
methods of SparkContext can delegate to it. An integrator will then have the 
option of providing a custom implementation of JobExecutionContext, either by 
implementing it from scratch or by extending DefaultExecutionContext.

Please see the attached design doc for more details.
A pull request will be posted shortly as well.


> Decouple Spark's API from its execution engine
> ----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark's API is tightly coupled with its backend execution engine.   
> It could be useful to provide a point of pluggability between the two to 
> allow Spark to run on other DAG execution engines with similar distributed 
> memory abstractions.
> Proposal:
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to the Hadoop execution environment - as a 
> non-public API (@DeveloperApi) not exposed to end users of Spark.
> The trait will define only 4 operations:
> * hadoopFile
> * newAPIHadoopFile
> * broadcast
> * runJob
> Each method maps directly to the corresponding method in the current version 
> of SparkContext. The JobExecutionContext implementation will be selected by 
> SparkContext via the master URL, as in 
> "execution-context:foo.bar.MyJobExecutionContext". The default implementation 
> will contain the existing code from SparkContext, so the current 
> (corresponding) methods of SparkContext can delegate to it. An integrator 
> will then have the option of providing a custom implementation of 
> JobExecutionContext, either by implementing it from scratch or by extending 
> DefaultExecutionContext.
> Please see the attached design doc for more details.
> A pull request will be posted shortly as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
