[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158571#comment-14158571 ]

Sandy Ryza edited comment on SPARK-3561 at 10/3/14 11:00 PM:
-------------------------------------------------------------

I think there may be a misunderstanding about the relationship between Spark 
and YARN.  YARN is not an "execution environment", but a cluster resource 
manager that can start processes on behalf of execution engines like Spark.  
Spark already supports YARN as a cluster resource manager, but YARN doesn't 
provide its own execution engine, nor does it provide a stateless shuffle 
(although execution engines built atop it, like MR and Tez, do).

If I understand correctly, the broader intent is to decouple the Spark API 
from the execution engine it runs on top of.  Changing the title to reflect 
this.  That said, the Spark API is currently very tightly integrated with its 
execution engine, and frankly, decoupling the two so that Spark could run on 
top of execution engines with similar properties seems more trouble than it's 
worth.


> Decouple Spark's API from its execution engine
> ----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently, Spark's user-facing API is tightly coupled to its backend 
> execution engine.  It could be useful to provide a point of pluggability 
> between the two to allow Spark to run on other DAG execution engines with 
> similar distributed memory abstractions.
> Proposal:
> The proposed approach would introduce a pluggable JobExecutionContext trait 
> as a non-public API (@DeveloperApi) not exposed to end users of Spark.
> The trait will define only 4 operations:
> * hadoopFile
> * newAPIHadoopFile
> * broadcast
> * runJob
> Each method maps directly to the corresponding method in the current 
> version of SparkContext.  The JobExecutionContext implementation will be 
> selected by SparkContext via a master URL of the form 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus 
> allowing the current (corresponding) methods of SparkContext to delegate 
> to it.  An integrator will then have the option to provide a custom 
> implementation, either by writing one from scratch or by extending 
> DefaultExecutionContext.
> Please see the attached design doc for more details.
> A pull request will be posted shortly as well.
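
For illustration, here is a rough Scala sketch of what such a trait might 
look like.  The method shapes below are guesses modeled on the corresponding 
SparkContext methods in Spark 1.x; the actual definitions are in the attached 
design doc and the forthcoming pull request.

    import scala.reflect.ClassTag

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.InputFormat
    import org.apache.spark.{SparkConf, SparkContext, TaskContext}
    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.rdd.RDD

    // Hypothetical trait: each method mirrors the SparkContext method of
    // the same name, with the owning SparkContext passed in explicitly.
    trait JobExecutionContext {

      def hadoopFile[K, V](
          sc: SparkContext,
          path: String,
          inputFormatClass: Class[_ <: InputFormat[K, V]],
          keyClass: Class[K],
          valueClass: Class[V],
          minPartitions: Int): RDD[(K, V)]

      def newAPIHadoopFile[K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K, V]](
          sc: SparkContext,
          path: String,
          fClass: Class[F],
          kClass: Class[K],
          vClass: Class[V],
          conf: Configuration): RDD[(K, V)]

      def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

      def runJob[T, U: ClassTag](
          sc: SparkContext,
          rdd: RDD[T],
          func: (TaskContext, Iterator[T]) => U,
          partitions: Seq[Int]): Array[U]
    }

    // Usage (e.g. in the shell): selecting a custom implementation through
    // the proposed master URL scheme ("execution-context:" is part of this
    // proposal, not current Spark).
    val conf = new SparkConf()
      .setMaster("execution-context:foo.bar.MyJobExecutionContext")
      .setAppName("pluggable-execution-example")
    val sc = new SparkContext(conf)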


