[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163351#comment-14163351 ]

Mridul Muralidharan commented on SPARK-3561:
--------------------------------------------

[~pwendell] If I understand the proposal and the initial PR correctly, the 
intent of this JIRA, as initially proposed by [~ozhurakousky], is fairly 
different from the other efforts referenced.
The focus of this change seems to be to bypass the Spark execution engine 
entirely and substitute an alternative: only the current API (and hence the 
DAG creation from the Spark program) and user-facing interfaces in Spark 
remain, while block management, the execution engine, execution state 
management, etc. would all be replaced under the covers by what Tez (or 
something else in the future) provides.

If I am not wrong, the changes would be:
a) Applies only to YARN mode, where the specified execution environment can be run.
b) The current Spark AM would no longer request any executors.
c) The Spark block manager would no longer be required (other than possibly for 
hosting broadcast via HTTP, I guess?).
d) The actual DAG execution would be taken up by the substituted execution 
engine; Spark's task scheduler and DAG scheduler become noops (see the sketch 
below).
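
As a rough sketch of what (d) implies, SparkContext.runJob could delegate to 
the plugged-in engine along these lines; the executionContext field name here 
is my own shorthand for illustration, not something from the PR:

    // Illustrative only: a delegation point inside SparkContext.
    // "executionContext" is an assumed field holding the plugged-in engine.
    def runJob[T, U: ClassTag](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        resultHandler: (Int, U) => Unit): Unit = {
      // With a Tez-backed context plugged in, Spark's DAGScheduler and
      // TaskScheduler never see the job; the external engine executes it.
      executionContext.runJob(this, rdd, func, partitions, resultHandler)
    }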

I might be missing things, which Oleg can elaborate on.


This functionality, IMO, is fundamentally different from what is being explored 
in the other JIRAs, and so has value to be pursued independently of the other 
efforts.
Obviously this does not cover all the use cases Spark runs on, but it handles a 
subset of use cases where other execution engines might do much better than 
Spark currently does, simply because of greater code maturity and the 
specialized use cases they target.


> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as 
> Apache Hadoop YARN, Mesos, etc. Specifically, in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large-scale batch 
> and/or ETL applications when run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to the Hadoop execution environment - as a 
> non-public API (@DeveloperApi) not exposed to end users of Spark. 
> The trait will define only 4 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> Each method directly maps to the corresponding method in the current version 
> of SparkContext. The JobExecutionContext implementation will be accessed by 
> SparkContext via a master URL such as 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus allowing 
> the current (corresponding) methods of SparkContext to delegate to that 
> implementation. 
> An integrator will then have the option to provide a custom implementation of 
> JobExecutionContext, either by writing one from scratch or by extending 
> DefaultExecutionContext. 
> Please see the attached design doc for more details. 
> A pull request will be posted shortly as well.
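
For reference, a minimal Scala sketch of what such a trait could look like, 
assuming signatures that simply mirror the corresponding SparkContext methods 
(the exact submitted API may differ):

    import scala.reflect.ClassTag

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapred.InputFormat
    import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}
    import org.apache.spark.{SparkContext, TaskContext}
    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.rdd.RDD

    // Sketch only: each operation takes the owning SparkContext plus the
    // parameters of the SparkContext method it backs.
    trait JobExecutionContext {
      def hadoopFile[K, V](
          sc: SparkContext,
          path: String,
          inputFormatClass: Class[_ <: InputFormat[K, V]],
          keyClass: Class[K],
          valueClass: Class[V],
          minPartitions: Int): RDD[(K, V)]

      def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
          sc: SparkContext,
          path: String,
          fClass: Class[F],
          kClass: Class[K],
          vClass: Class[V],
          conf: Configuration): RDD[(K, V)]

      def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

      def runJob[T, U: ClassTag](
          sc: SparkContext,
          rdd: RDD[T],
          func: (TaskContext, Iterator[T]) => U,
          partitions: Seq[Int],
          resultHandler: (Int, U) => Unit): Unit
    }

A custom implementation would then be selected through the master URL, e.g.

    sparkConf.setMaster("execution-context:foo.bar.MyJobExecutionContext")

with DefaultExecutionContext (wrapping the existing SparkContext code) used 
otherwise.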



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
