[ 
https://issues.apache.org/jira/browse/DRILL-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356597#comment-16356597
 ] 

ASF GitHub Bot commented on DRILL-6089:
---------------------------------------

GitHub user ilooner opened a pull request:

    https://github.com/apache/drill/pull/1117

    DRILL-6089 Removed ordering trait from HashJoin in planner

    In most databases, HashJoin does not preserve ordering. Drill's HashJoin 
operator has technically preserved ordering up to this point, but it will no 
longer do so once spilling is implemented. This change makes sure the planner 
knows that HashJoin does not preserve ordering.
    
    All unit and functional tests are passing. @amansinha100 @Ben-Zvi, please 
review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ilooner/drill DRILL-6089

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1117.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1117
    
----
commit 40521a6edb77d018902903892ba356033e7bf7ca
Author: Timothy Farkas <timothyfarkas@...>
Date:   2018-01-26T21:46:22Z

    DRILL-6089: Removed ordering trait from HashJoin in planner and verified 
the planner does not assume HashJoin preserves ordering.

commit 726bf663186a21e3b54a58d2ed79eaca8d746bbf
Author: Timothy Farkas <timothyfarkas@...>
Date:   2018-02-08T07:26:36Z

    DRILL-6144: Tune direct memory to prevent unit test hangs on Travis and on 
Jenkins

----


> Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, 
> MaprDB, or Hive
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-6089
>                 URL: https://issues.apache.org/jira/browse/DRILL-6089
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Explanation provided by Boaz:
> (As explained in the design document) The new "automatic spill" feature of 
> the Hash-Join operator may cause (if spilling occurs) the rows from the 
> left/probe side to be returned in a different order than their incoming order 
> (due to splitting the rows into partitions).
> Currently the Drill planner assumes that left-order is preserved by the 
> Hash-Join operator; therefore, if this is not changed, a query relying on 
> that order may return wrong results (when the Hash-Join spills).
> A fix is needed. Here are a few options (ordered from the simplest to the 
> most complex):
>  # Change the order rule in the planner. Thus whenever an order is needed 
> above (downstream) the Hash-Join, the planner would add a sort operator. That 
> would be a big execution time waste.
>  # When the planner needs the left-order above the Hash-Join, it may assess 
> the size of the right/build side (need statistics). If the right side is 
> small enough, the planner would set an option for the runtime to avoid 
> spilling, hence preserving the left-side order. In case spilling becomes 
> necessary, the code would return an error (possibly with a message suggesting 
> setting some special option and retrying; the special option would add a sort 
> operator and allow the hash-join to spill).
>  # When generating the code for the fragment above the Hash-Join (where 
> left-order should be maintained) - at code-gen time check if the hash-join 
> below spilled, and if so, add a sort operator. (Nothing like that exists in 
> Drill now, so it may be complicated).
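
To make the reordering described above concrete, here is a minimal sketch in 
Python. It is not Drill code: the function names, the row layout 
(sort_col, join_key), and the modulo partitioning are all assumptions chosen 
for illustration. It shows a probe side that arrives sorted, comes out in a 
different order once its rows are split into partitions, and is put back in 
order only by an explicit sort (the first option in the list).

```python
from collections import defaultdict


def probe_in_memory(probe_rows, build_keys):
    """Stream the probe side in arrival order; matches keep that order."""
    return [row for row in probe_rows if row[1] in build_keys]


def probe_with_spill(probe_rows, build_keys, num_partitions=2):
    """Hypothetical spilling probe: split rows into partitions by join key,
    then emit matches one partition at a time."""
    partitions = defaultdict(list)
    for row in probe_rows:
        # Modulo on the integer key stands in for a real hash function.
        partitions[row[1] % num_partitions].append(row)

    out = []
    for p in range(num_partitions):
        out.extend(row for row in partitions[p] if row[1] in build_keys)
    return out


# Probe rows are (sort_col, join_key) and arrive sorted on sort_col.
probe = [(1, 11), (2, 10), (3, 13), (4, 12)]
build = {10, 11, 12, 13}

in_memory = probe_in_memory(probe, build)   # incoming order is kept
spilled = probe_with_spill(probe, build)    # rows grouped by partition
resorted = sorted(spilled, key=lambda r: r[0])  # explicit sort above the join

print(in_memory)
print(spilled)
print(resorted)
```

In this toy run the partitioned output groups the even keys ahead of the odd 
keys, so a consumer that assumed the incoming order would see rows out of 
sequence. The extra sort restores the order at the cost of another pass over 
the join output.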



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
