[jira] [Commented] (SPARK-3561) Allow for pluggable execution contexts in Spark

Oleg Zhurakousky (JIRA) Tue, 07 Oct 2014 08:43:00 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162013#comment-14162013
 ]


Oleg Zhurakousky commented on SPARK-3561:
-----------------------------------------

Patrick, your point about confusion with other JIRAs makes sense. Thanks.

With regard to detailed design, can you please let me know what you are looking 
for and what would be useful? The only change I'm proposing is the addition of 
an @Experimental interface with the 4 methods for the reasons stated in the 
design doc. 

For example: 
* Would it be useful if I sent another PR with the implementation of the 
interface? 
* Would it be useful if I shared benchmarks which showcase some of the benefits 
of alternative execution for Batch/ETL scenarios?

Since this is my first involvement in the Spark community, I appreciate your 
guidance and I'm happy to provide any details you might find useful. Thanks!

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource-managers such as 
> Apache Hadoop YARN, Mesos etc. Specifically in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large scale, batch 
> and/or ETL applications when run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to the Hadoop execution environment - as a 
> non-public API (@DeveloperAPI) not exposed to end users of Spark. 
> The trait will define only 4 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> Each method maps directly to the corresponding method in the current version 
> of SparkContext. The JobExecutionContext implementation will be accessed by 
> SparkContext via a master URL of the form 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus allowing 
> the current (corresponding) methods of SparkContext to delegate to such an 
> implementation. 
> An integrator will now have the option to provide a custom implementation of 
> JobExecutionContext by either implementing it from scratch or extending 
> from DefaultExecutionContext. 
> Please see the attached design doc for more details. 
> A Pull Request will be posted shortly as well.
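
For readers skimming the description above, here is a rough sketch of the shape 
being proposed. It is an illustration only, not the code from the design doc or 
the pull request: the method signatures are simplified stand-ins (Seq[T] takes 
the place of RDD[T] so the sketch runs without Spark), and the 
ExecutionContexts helper and the jobsRun counter are hypothetical names 
introduced for this example. Only the trait name, the four operation names, 
DefaultExecutionContext and the "execution-context:" URL form come from the 
description.

```scala
// Hypothetical, simplified sketch: none of these signatures are Spark's.
// Seq[T] stands in for RDD[T] so the example runs without Spark on the classpath.
trait JobExecutionContext {
  def hadoopFile(path: String): Seq[String]
  def newAPIHadoopFile(path: String): Seq[String]
  def broadcast[T](value: T): T
  def runJob[T, U](data: Seq[T], func: T => U): Seq[U]
}

// Stand-in for the default implementation, which per the proposal would hold
// the code that currently lives in SparkContext.
class DefaultExecutionContext extends JobExecutionContext {
  private def readLines(path: String): Seq[String] = {
    val src = scala.io.Source.fromFile(path)
    try src.getLines().toList finally src.close()
  }
  def hadoopFile(path: String): Seq[String] = readLines(path)
  def newAPIHadoopFile(path: String): Seq[String] = readLines(path)
  def broadcast[T](value: T): T = value
  def runJob[T, U](data: Seq[T], func: T => U): Seq[U] = data.map(func)
}

// An integrator can extend the default and override only what differs.
class MyJobExecutionContext extends DefaultExecutionContext {
  var jobsRun = 0 // illustrative counter, not part of the proposal
  override def runJob[T, U](data: Seq[T], func: T => U): Seq[U] = {
    jobsRun += 1
    super.runJob(data, func)
  }
}

// Hypothetical helper showing how a master URL could select an implementation.
object ExecutionContexts {
  val Prefix = "execution-context:"

  // Extracts the class name from "execution-context:<class>", if present.
  def className(master: String): Option[String] =
    if (master.startsWith(Prefix)) Some(master.stripPrefix(Prefix)) else None

  // Loads the named class reflectively, or falls back to the default.
  def resolve(master: String): JobExecutionContext = className(master) match {
    case Some(name) =>
      Class.forName(name).getDeclaredConstructor().newInstance()
        .asInstanceOf[JobExecutionContext]
    case None => new DefaultExecutionContext
  }
}
```

With this shape, a master URL without the prefix keeps today's behaviour via 
the default implementation, while a prefixed one swaps in the integrator's 
class; the reflective lookup assumes that class has a public no-argument 
constructor.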



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
