[ 
https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237056#comment-15237056
 ] 

Justin Uang commented on SPARK-9850:
------------------------------------

I like this idea a lot. One thing we encounter in our use cases is that we end 
up accidentally joining on a field that is 50% nulls, or on a string that 
represents null, like "N/A". It then becomes quite cumbersome to constantly 
have a Spark expert dig in and find out why there is one task that will never 
finish. Would it be possible to add a threshold such that if a join key ever 
gets too big, the job just fails with an error message?

> Adaptive execution in Spark
> ---------------------------
>
>                 Key: SPARK-9850
>                 URL: https://issues.apache.org/jira/browse/SPARK-9850
>             Project: Spark
>          Issue Type: Epic
>          Components: Spark Core, SQL
>            Reporter: Matei Zaharia
>            Assignee: Yin Huai
>         Attachments: AdaptiveExecutionInSpark.pdf
>
>
> Query planning is one of the main factors in high performance, but the 
> current Spark engine requires the execution DAG for a job to be set in 
> advance. Even with cost-based optimization, it is hard to know the behavior 
> of data and user-defined functions well enough to always get great execution 
> plans. This JIRA proposes to add adaptive query execution, so that the engine 
> can change the plan for each query as it sees what data earlier stages 
> produced.
>
> We propose adding this to Spark SQL / DataFrames first, using a new API in 
> the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, 
> the functionality could be extended to other libraries or the RDD API, but 
> that is more difficult than adding it in SQL.
>
> I've attached a design doc by Yin Huai and myself explaining how it would 
> work in more detail.
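
For readers skimming the thread, a minimal sketch of what opting in could look like once this lands, assuming the feature ends up gated behind a SQL conf (the "spark.sql.adaptive.enabled" key and the SparkSession API below are assumptions for illustration, not settled API from the design doc):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed opt-in flag; the exact conf key is illustrative only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("adaptive-exec-sketch")
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()

// With adaptive execution on, decisions such as the number of post-shuffle
// partitions could be made from the observed sizes of earlier stages' map
// outputs instead of being fixed when the plan is first built.
spark.range(0, 1000000)
  .groupBy((col("id") % 100).as("k"))
  .count()
  .collect()
{code}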


