[ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venki Korukanti updated DRILL-2083:
-----------------------------------
    Fix Version/s:     (was: 1.0.0)
                   0.9.0

> order by on large dataset returns wrong results
> -----------------------------------------------
>
>                 Key: DRILL-2083
>                 URL: https://issues.apache.org/jira/browse/DRILL-2083
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Venki Korukanti
>            Priority: Critical
>             Fix For: 0.9.0
>
>         Attachments: DRILL-2083.patch
>
>
> #Mon Jan 26 14:10:51 PST 2015
> git.commit.id.abbrev=3c6d0ef
> Test data has 1 million rows and can be accessed at http://apache-drill.s3.amazonaws.com/files/complex.json.gz
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count (t.id) from `complex.json` t;
> +------------+
> | EXPR$0     |
> +------------+
> | 1000000    |
> +------------+
> {code}
> But order by returned 30 more rows.
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
> ....
> | 999997     |
> | 999998     |
> | 999999     |
> | 1000000    |
> +------------+
> 1,000,030 rows selected (19.449 seconds)
> {code}
> physical plan
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
> +------------+------------+
> | text       | json       |
> +------------+------------+
> | 00-00    Screen
> 00-01      SingleMergeExchange(sort0=[0 ASC])
> 01-01        SelectionVectorRemover
> 01-02          Sort(sort0=[$0], dir0=[ASC])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)