[jira] [Commented] (SPARK-9850) Adaptive execution in Spark

Assaf Mendelson (JIRA) Tue, 15 Nov 2016 02:43:26 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15666806#comment-15666806
 ]


Assaf Mendelson commented on SPARK-9850:
----------------------------------------

I like the overall idea.
What I am trying to figure out is the part where the map stage runs first and 
the reducers run afterwards.
If DAGScheduler.submitMapStage() waits for all map processing to finish and 
only then starts the reducers, this can really slow things down, because the 
reducers will begin only when the last map task finishes.

Wouldn't it be better to start the reducers once the first few mappers have 
finished (or at least when there are idle executors)? Assuming the first few 
mappers are representative of all the map tasks, this shouldn't affect the 
statistics too much.
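
To make that assumption concrete, here is a minimal sketch (plain Python, not 
Spark code) that extrapolates the total map output size from the first few 
completed map tasks. The function names and the per-task sizes are 
hypothetical and only illustrate the idea; nothing here is taken from the 
design doc or from the DAGScheduler API.

```python
import random


def estimate_total_bytes(completed_sizes, num_map_tasks):
    """Extrapolate total map output size from the tasks finished so far."""
    if not completed_sizes:
        raise ValueError("need at least one completed map task")
    mean_size = sum(completed_sizes) / len(completed_sizes)
    return mean_size * num_map_tasks


def relative_error(estimate, actual):
    """Relative difference between the early estimate and the true total."""
    return abs(estimate - actual) / actual


# Hypothetical per-task output sizes in bytes (made up, not measured).
rng = random.Random(0)
map_output_sizes = [rng.randint(90, 110) * 1_000_000 for _ in range(200)]
actual_total = sum(map_output_sizes)

# Estimate after only the first 20 of 200 map tasks have finished.
early_estimate = estimate_total_bytes(map_output_sizes[:20], len(map_output_sizes))
error = relative_error(early_estimate, actual_total)
print(f"relative error of early estimate: {error:.3f}")
```

With roughly uniform map outputs like these, the early estimate stays close 
to the true total. If the outputs are skewed, or the first mappers to finish 
are systematically the small ones, the estimate could be far off, which is 
exactly the case where the assumption above would not hold.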


> Adaptive execution in Spark
> ---------------------------
>
>                 Key: SPARK-9850
>                 URL: https://issues.apache.org/jira/browse/SPARK-9850
>             Project: Spark
>          Issue Type: Epic
>          Components: Spark Core, SQL
>            Reporter: Matei Zaharia
>            Assignee: Yin Huai
>         Attachments: AdaptiveExecutionInSpark.pdf
>
>
> Query planning is one of the main factors in high performance, but the 
> current Spark engine requires the execution DAG for a job to be set in 
> advance. Even with cost-based optimization, it is hard to know the behavior 
> of data and user-defined functions well enough to always get great execution 
> plans. This JIRA proposes to add adaptive query execution, so that the engine 
> can change the plan for each query as it sees what data earlier stages 
> produced.
> We propose adding this to Spark SQL / DataFrames first, using a new API in 
> the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, 
> the functionality could be extended to other libraries or the RDD API, but 
> that is more difficult than adding it in SQL.
> I've attached a design doc by Yin Huai and myself explaining how it would 
> work in more detail.



