[jira] [Updated] (DRILL-2083) order by on large dataset returns wrong results

Venki Korukanti (JIRA) Thu, 23 Apr 2015 11:45:54 -0700

     [ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Venki Korukanti updated DRILL-2083:
-----------------------------------
    Fix Version/s:     (was: 1.0.0)
                   0.9.0

> order by on large dataset returns wrong results
> -----------------------------------------------
>
>                 Key: DRILL-2083
>                 URL: https://issues.apache.org/jira/browse/DRILL-2083
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Venki Korukanti
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2083.patch
>
>
> #Mon Jan 26 14:10:51 PST 2015
> git.commit.id.abbrev=3c6d0ef
> Test data has 1 million rows and can be accessed at 
> http://apache-drill.s3.amazonaws.com/files/complex.json.gz
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count (t.id) from `complex.json` t;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1000000    |
> +------------+
> {code}
> But an order by on the same column returned 1,000,030 rows, 30 more than the count.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
> ....
> | 999997     |
> | 999998     |
> | 999999     |
> | 1000000    |
> +------------+
> 1,000,030 rows selected (19.449 seconds)
> {code}
> Physical plan for the order by query:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      SingleMergeExchange(sort0=[0 ASC])
> 01-01        SelectionVectorRemover
> 01-02          Sort(sort0=[$0], dir0=[ASC])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
> {code}
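
One way to narrow down a mismatch like the one described above is to check the ordered output for ids that appear more than once. The sketch below is not part of the issue or the attached patch. It assumes the ids returned by the order by query have already been exported into a Python list, and it assumes the extra rows are duplicated ids, which the report does not confirm. The function name `surplus_rows` is hypothetical.

```python
from collections import Counter


def surplus_rows(ids, expected_count):
    """Compare a query result against the expected row count.

    ids            -- list of id values returned by the ordered query (assumed input)
    expected_count -- row count reported by the count query
    Returns (surplus, duplicated), where surplus is the number of extra rows
    and duplicated maps each id seen more than once to its occurrence count.
    """
    counts = Counter(ids)
    duplicated = {key: n for key, n in counts.items() if n > 1}
    surplus = len(ids) - expected_count
    return surplus, duplicated


if __name__ == "__main__":
    # Small synthetic sample, not data from the issue.
    sample = [1, 2, 2, 3, 4, 4, 4, 5]
    surplus, duplicated = surplus_rows(sample, expected_count=5)
    print("surplus rows:", surplus)
    print("duplicated ids:", duplicated)
```

If the surplus is nonzero but no id is duplicated, the extra rows would have to come from somewhere else (for example, values that are absent from the source file), and the duplicate assumption should be dropped.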



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
