[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163618#comment-14163618 ]

Nicholas Chammas commented on SPARK-3561:
-----------------------------------------

{quote}
Obviously this does not work in all use cases where Spark is run, but it 
handles a subset of use cases where other execution engines might do much 
better than Spark currently does, simply because of better code maturity and 
the specialized use cases they target.
{quote}

As a side note, if we know which use cases other engines handle much better 
than Spark, then we should create JIRA issues for them and tackle them where 
possible, or at least document why Spark cannot currently address those use 
cases as well as people would like.

If the main driver behind this proposal is Spark's immaturity in some areas, 
then I'd hope that driver would have a short lifespan. If it's the specialized 
use cases that we want to address, then I wonder how plugging in one 
general-purpose engine (e.g. Tez) in place of another would help. 

Granted, I'm just commenting at a conceptual level; as others have already 
mentioned, specific use cases would help clarify what the real need is. For 
example, [~seanmcn], [~kawaa], and [~mayank_bansal] mentioned [earlier in this 
thread|https://issues.apache.org/jira/browse/SPARK-3561?focusedCommentId=14138130&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14138130]
 some workloads they were having trouble with that they thought might be 
addressed by this proposal. It would be good to understand more specifically 
what issues they were running into.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
>                 Key: SPARK-3561
>                 URL: https://issues.apache.org/jira/browse/SPARK-3561
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Oleg Zhurakousky
>              Labels: features
>             Fix For: 1.2.0
>
>         Attachments: SPARK-3561.pdf
>
>
> Currently, Spark provides integration with external resource managers such 
> as Apache Hadoop YARN and Mesos. Specifically, in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large-scale, 
> batch, and/or ETL applications when they run alongside other applications 
> (Spark and others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext 
> (trait) - a gateway and delegate to the Hadoop execution environment - as a 
> non-public API (@DeveloperApi) not exposed to end users of Spark. 
> The trait will define only 4 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> Each method maps directly to the corresponding method in the current version 
> of SparkContext. The JobExecutionContext implementation will be selected by 
> SparkContext via the master URL, as in 
> "execution-context:foo.bar.MyJobExecutionContext", with the default 
> implementation containing the existing code from SparkContext, thus allowing 
> the current (corresponding) methods of SparkContext to delegate to that 
> implementation. 
> An integrator will now have the option to provide a custom 
> JobExecutionContext by either implementing it from scratch or extending from 
> DefaultExecutionContext. 
> Please see the attached design doc for more details. 
> A pull request will be posted shortly as well.
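
For readers skimming the thread, here is a minimal sketch of what the 
proposed trait might look like, inferred only from the four operations listed 
in the description above. All signatures here are assumptions that loosely 
mirror the corresponding SparkContext methods in 1.1; the actual definitions 
are in the attached design doc and the forthcoming pull request.

{code:scala}
// Hypothetical sketch only; not the actual patch.
import scala.reflect.ClassTag

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.InputFormat
import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat}

import org.apache.spark.{SparkContext, TaskContext}
import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD

@DeveloperApi
trait JobExecutionContext {

  // Old (mapred) Hadoop input API; mirrors SparkContext.hadoopFile.
  def hadoopFile[K, V](
      sc: SparkContext,
      path: String,
      inputFormatClass: Class[_ <: InputFormat[K, V]],
      keyClass: Class[K],
      valueClass: Class[V],
      minPartitions: Int): RDD[(K, V)]

  // New (mapreduce) Hadoop input API; mirrors SparkContext.newAPIHadoopFile.
  def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
      sc: SparkContext,
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration): RDD[(K, V)]

  // Mirrors SparkContext.broadcast.
  def broadcast[T: ClassTag](sc: SparkContext, value: T): Broadcast[T]

  // Mirrors SparkContext.runJob; this is the hook where an alternate
  // engine (e.g. Tez) would take over physical execution.
  def runJob[T, U: ClassTag](
      sc: SparkContext,
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      allowLocal: Boolean,
      resultHandler: (Int, U) => Unit): Unit
}
{code}

Per the description, an implementation would then be selected through the 
master URL (the implementation class name below is the placeholder from the 
description, not a real class):

{code:scala}
val sc = new SparkContext("execution-context:foo.bar.MyJobExecutionContext", "my-app")
{code}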



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
