[ 
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273609#comment-14273609
 ] 

Oleg Zhurakousky commented on SPARK-3561:
-----------------------------------------

Thanks Patrick

I 100% agree that Spark is _NOT just an API_ and in fact in our current efforts 
we are using much more of Spark then its user facing API but here is the thing; 
The reasons for extending execution environment could be many and indeed _RDD_ 
is a great extension point, just like _SparkContext_ is to accomplish that. 
However, both are less then ideal since they would require constant code 
modification forcing _re-compilation and re-packaging_ of an application every 
time one wants to delegate to an alternative execution environment (regardless 
of the reasons).
But since we all seem to agree (based on previous comments) that _SparkContext_ 
is the right API-based extension point to address such extension requirements, 
then why not allow it to be extended via configuration as well? Merely a 
convenience without any harm. . . . no different then a configuration based 
“driver” model (e.g., JDBC).




> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource-managers such as 
> Apache Hadoop YARN, Mesos etc. Specifically in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large scale, batch 
> and/or ETL applications when run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to Hadoop execution environment - as a non-public 
> api (@Experimental) not exposed to end users of Spark. 
> The trait will define 6 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> * persist
> * unpersist
> Each method directly maps to the corresponding methods in current version of 
> SparkContext. JobExecutionContext implementation will be accessed by 
> SparkContext via master URL as 
> "execution-context:foo.bar.MyJobExecutionContext" with default implementation 
> containing the existing code from SparkContext, thus allowing current 
> (corresponding) methods of SparkContext to delegate to such implementation. 
> An integrator will now have an option to provide custom implementation of 
> DefaultExecutionContext by either implementing it from scratch or extending 
> form DefaultExecutionContext. 
> Please see the attached design doc for more details. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to