[ https://issues.apache.org/jira/browse/DRILL-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527382#comment-14527382 ]
Jason Altekruse commented on DRILL-2953: ---------------------------------------- All data read out of csv will be read as a list of varchars. If you look at the results they are ordered according to an ASCII string comparison. If you add a cast to integer in the order by statement you should see the order you are expecting. As this is almost certainly the cause of the issue I'm going to close this as invalid, please re-open if the cast does not produce the result you are expecting. > Group By + Order By query results are not ordered. > -------------------------------------------------- > > Key: DRILL-2953 > URL: https://issues.apache.org/jira/browse/DRILL-2953 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.9.0 > Environment: 10833d2cae9f5312cf0e31f8c9f3f8a9dcdc0c45 | Commit 0.9.0 > release version. | 03.05.2015 @ 14:56:56 EDT > Reporter: Khurram Faraaz > Assignee: Jinfeng Ni > Priority: Critical > > Group by + order by query does not return results in correct order. Sort is > performed before the aggregation is done, which should not be the case. > Test was performed on 4 node cluster on CentOS. > {code} > 0: jdbc:drill:> select cast(columns[0] as int) c1 from `testWindow.csv` t2 > where t2.columns[0] is not null group by columns[0] order by columns[0]; > +------------+ > | c1 | > +------------+ > | 10 | > | 100 | > | 113 | > | 119 | > | 2 | > | 50 | > | 55 | > | 57 | > | 61 | > | 67 | > | 89 | > +------------+ > 11 rows selected (0.218 seconds) > {code} > Explain plan for that query that returns wrong results. > {code} > 0: jdbc:drill:> explain plan for select cast(columns[0] as int) c1 from > `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order > by columns[0]; > +------------+------------+ > | text | json | > +------------+------------+ > | 00-00 Screen > 00-01 Project(c1=[$0]) > 00-02 Project(c1=[CAST($0):INTEGER], EXPR$1=[$0]) > 00-03 StreamAgg(group=[{0}]) > 00-04 Sort(sort0=[$0], dir0=[ASC]) > 00-05 Filter(condition=[IS NOT NULL($0)]) > 00-06 Project(ITEM=[ITEM($0, 0)]) > 00-07 Scan(groupscan=[EasyGroupScan > [selectionRoot=/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], > files=[maprfs:/tmp/testWindow.csv]]]) > {code} > Incorrect results , not in order. > {code} > 0: jdbc:drill:> select cast(columns[0] as int) from `testWindow.csv` t2 where > t2.columns[0] is not null group by columns[0] order by columns[0]; > +------------+ > | EXPR$0 | > +------------+ > | 10 | > | 100 | > | 113 | > | 119 | > | 2 | > | 50 | > | 55 | > | 57 | > | 61 | > | 67 | > | 89 | > +------------+ > 11 rows selected (0.214 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)