[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163376#comment-14163376 ]

Mridul Muralidharan commented on SPARK-3561:
--------------------------------------------

[~ozhurakousky] I think the disconnect here is that the proposed interfaces, 
by themselves, do not show much of the value of the functionality that is 
supposed to be exposed to users. A follow-up PR showing how these interfaces 
are used in the context of Tez would show why this change is relevant in the 
context of Spark.

The concern, if I am not wrong, is that we do not want to expose SPIs which we 
would then need to maintain in Spark core - while unknown implementations 
extend them in non-standard ways, causing issues for our end users.


For example, even though TaskScheduler is an SPI and can in theory be extended 
in arbitrary ways, all of the SPI implementations currently 'live' within 
Spark and are in harmony with the rest of the code, including changes which 
occur within Spark core (when functionality is added or extended).
This allows us to decouple the actual TaskScheduler implementation from Spark 
core while still keeping them in sync and maintainable, and to add 
functionality independently of other pieces. Case in point: YARN support has 
significantly evolved from when I initially added it - to the point where it 
probably does not share even a single line of code I initially wrote :) - and 
yet this has been done pretty much independently of changes to core, while at 
the same time ensuring that it is compatible with changes in Spark core and 
vice versa.
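
To make that pattern concrete, here is a minimal Scala sketch of an in-tree 
SPI of this shape, with the implementation chosen internally from the master 
URL. This is illustrative only - simplified names, not Spark's actual 
scheduler code:

    // Illustrative SPI: the trait is the contract owned by core.
    trait Scheduler {
      def start(): Unit
      def stop(): Unit
    }

    // Implementations live in the same source tree as core, so a change
    // to the contract surfaces immediately at compile time, not in some
    // external integrator's build.
    class LocalScheduler extends Scheduler {
      def start(): Unit = println("local scheduler started")
      def stop(): Unit = println("local scheduler stopped")
    }

    class YarnScheduler extends Scheduler {
      def start(): Unit = println("yarn scheduler started")
      def stop(): Unit = println("yarn scheduler stopped")
    }

    object Scheduler {
      // Core picks the implementation itself, e.g. from the master URL,
      // rather than class-loading arbitrary user-supplied extensions.
      def create(master: String): Scheduler = master match {
        case "local" => new LocalScheduler
        case m if m.startsWith("yarn") => new YarnScheduler
        case other => throw new IllegalArgumentException(s"Unknown master: $other")
      }
    }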


The next step, imo, would be a PR which shows how these interfaces are used 
for a non-trivial use case: Tez, in this case.
The default implementation provided in the PR can be removed (since it should 
not be used by or exposed to users).

Once that is done, we can evaluate the proposed interface in the context of 
the functionality exposed, and see how it fits with the rest of Spark.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource managers such as 
> Apache Hadoop YARN, Mesos, etc. Specifically, in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large-scale, batch, 
> and/or ETL applications when run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to the Hadoop execution environment - as a 
> non-public API (@DeveloperApi) not exposed to end users of Spark. 
> The trait will define only 4 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> Each method directly maps to the corresponding method in the current version 
> of SparkContext. The JobExecutionContext implementation will be accessed by 
> SparkContext via the master URL, as 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus allowing 
> the current (corresponding) methods of SparkContext to delegate to such an 
> implementation. 
> An integrator will now have the option to provide a custom implementation by 
> either implementing the trait from scratch or extending from 
> DefaultExecutionContext. 
> Please see the attached design doc for more details. 
> A pull request will be posted shortly as well.
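
For reference, a minimal Scala sketch of what the trait described above and 
the master-URL resolution might look like. The signatures here are 
assumptions loosely modeled on the corresponding SparkContext methods - not 
the actual code from the PR or the attached design doc:

    import scala.reflect.ClassTag

    import org.apache.hadoop.mapred.InputFormat
    import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}
    import org.apache.spark.{SparkContext, TaskContext}
    import org.apache.spark.broadcast.Broadcast
    import org.apache.spark.rdd.RDD

    // Hypothetical sketch of the proposed trait; each method mirrors the
    // SparkContext method it would delegate from.
    trait JobExecutionContext {
      def hadoopFile[K, V](
          sc: SparkContext,
          path: String,
          inputFormatClass: Class[_ <: InputFormat[K, V]],
          keyClass: Class[K],
          valueClass: Class[V],
          minPartitions: Int): RDD[(K, V)]

      def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
          sc: SparkContext,
          path: String,
          fClass: Class[F],
          kClass: Class[K],
          vClass: Class[V]): RDD[(K, V)]

      def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

      def runJob[T, U: ClassTag](
          sc: SparkContext,
          rdd: RDD[T],
          func: (TaskContext, Iterator[T]) => U,
          partitions: Seq[Int],
          resultHandler: (Int, U) => Unit): Unit
    }

    // Hypothetical resolution from the master URL, e.g.
    // "execution-context:foo.bar.MyJobExecutionContext".
    object JobExecutionContext {
      private val Prefix = "execution-context:"

      def fromMaster(master: String): Option[JobExecutionContext] =
        if (master.startsWith(Prefix)) {
          val className = master.stripPrefix(Prefix)
          Some(Class.forName(className).newInstance()
            .asInstanceOf[JobExecutionContext])
        } else {
          None // fall back to the in-tree default implementation
        }
    }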



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
