[jira] [Commented] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

Thomas Graves (JIRA) Tue, 24 Jul 2018 07:32:47 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554306#comment-16554306
 ]


Thomas Graves commented on SPARK-23128:
---------------------------------------

We also did some initial evaluation with it, and it was looking good, which 
is why I pinged here.  We should work to get it in rather than everyone 
having their own version.

It might be nice to post on https://issues.apache.org/jira/browse/SPARK-9850, 
since it has the initial version, but I don't see any activity there.

Ideally we would have a committer who is more familiar with the SQL code 
shepherd it in.  I'm still learning the SQL side of the code, so I don't 
consider myself an expert there.  Have you posted this to the dev list at all? 
We should probably have a SPIP for this, which your doc above should pretty 
much cover, although you may want to make sure it's up to date.  So I think the 
first step would be to post to the dev list to get feedback and see if 
someone else is willing to volunteer to review.  Could you post a DISCUSS 
thread about it?  If no one else will review, I will; it may just take me longer.

 

> A new approach to do adaptive execution in Spark SQL
> ----------------------------------------------------
>
>                 Key: SPARK-23128
>                 URL: https://issues.apache.org/jira/browse/SPARK-23128
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Carson Wang
>            Priority: Major
>         Attachments: AdaptiveExecutioninBaidu.pdf
>
>
> SPARK-9850 proposed the basic idea of adaptive execution in Spark. In 
> DAGScheduler, a new API is added to support submitting a single map stage.  
> The current implementation of adaptive execution in Spark SQL supports 
> changing the reducer number at runtime. An Exchange coordinator is used to 
> determine the number of post-shuffle partitions for a stage that needs to 
> fetch shuffle data from one or multiple stages. The current implementation 
> adds ExchangeCoordinator while we are adding Exchanges. However, there are 
> some limitations. First, it may cause additional shuffles that may decrease 
> performance. We can see this in the EnsureRequirements rule when it adds 
> ExchangeCoordinator.  Secondly, it is not a good idea to add 
> ExchangeCoordinators while we are adding Exchanges because we don’t have a 
> global picture of all shuffle dependencies of a post-shuffle stage. For 
> example, for a three-table join in a single stage, the same ExchangeCoordinator 
> should be used in all three Exchanges, but currently two separate 
> ExchangeCoordinators are added. Thirdly, with the current framework it is not 
> easy to flexibly implement other adaptive execution features, such as 
> changing the execution plan and handling skewed joins at runtime.
> We'd like to introduce a new way to do adaptive execution in Spark SQL and 
> address the limitations. The idea is described at 
> [https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing]
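
To illustrate the description's point about sharing one coordinator across all
shuffle inputs of a stage, here is a minimal Python sketch of size-based
partition coalescing. It assumes the coordinator sums, per partition index, the
bytes produced by every upstream map stage and then packs contiguous partitions
until a target size would be exceeded. The function name `coalesce_partitions`,
its parameters, and the sizes used are hypothetical; they are not taken from
the Spark code base or the linked design doc.

```python
from typing import List, Sequence


def coalesce_partitions(stage_sizes: Sequence[Sequence[int]],
                        target_bytes: int) -> List[int]:
    """Return the start index of each post-shuffle partition.

    stage_sizes[s][i] is the number of bytes that upstream stage s wrote
    for pre-shuffle partition i. All stages must use the same number of
    pre-shuffle partitions (hypothetical model, not Spark's actual API).
    """
    if not stage_sizes:
        return []
    num_pre = len(stage_sizes[0])
    if any(len(sizes) != num_pre for sizes in stage_sizes):
        raise ValueError("all stages must have the same partition count")
    if num_pre == 0:
        return []

    starts = [0]
    current = 0  # bytes accumulated in the post-shuffle partition being built
    for i in range(num_pre):
        # Combined input for partition i across every upstream stage.
        size_i = sum(sizes[i] for sizes in stage_sizes)
        if i > 0 and current + size_i > target_bytes:
            # Adding partition i would overshoot the target: start a new one.
            starts.append(i)
            current = size_i
        else:
            current += size_i
    return starts
```

With three upstream stages feeding one join, passing all three size lists in a
single call models the "one coordinator for three Exchanges" case. Coordinating
only two of them would ignore the third input's bytes when choosing partition
boundaries.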



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

