[ https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384748#comment-14384748 ]
Kristine Hahn commented on DRILL-1499: -------------------------------------- [~vicky] re: [your comment|https://issues.apache.org/jira/browse/DRILL-1499?focusedCommentId=14384534&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384534]: {quote} If we are not planning to fix it ever, we need to make sure it is documented in our SQL section on differences between us and SQL standard along with NULL behavior. {quote} I added the following bullet in tips for querying data with Drill: {color: blue} When querying schema-less data using SELECT *, the column order in the resultset might differ from the stored order and vary from query to query. {color} This bullet will appear on the ["Query Data"|http://drill.apache.org/docs/query-data/ page]. I'll update this comment when the doc is checked in and ready for review. Seems like the SQL Reference page for SELECT would be also be a good place to add the column order info. > Different column order could appear in the result set for a schema-less > select * query, even there are no changing schemas. > --------------------------------------------------------------------------------------------------------------------------- > > Key: DRILL-1499 > URL: https://issues.apache.org/jira/browse/DRILL-1499 > Project: Apache Drill > Issue Type: Bug > Reporter: Jinfeng Ni > Assignee: Steven Phillips > Fix For: Future > > > For a select * query referring to a schema-less table, Drill could return > different column, depending on the physical operators the query involves: > Q1: > {code} > select * from cp.`employee.json` limit 3; > +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ > | employee_id | full_name | first_name | last_name | position_id | > position_title | store_id | department_id | birth_date | hire_date | > salary | supervisor_id | education_level | marital_status | gender | > management_role | > +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ > {code} > Q2: > {code} > select * from cp.`employee.json` order by last_name limit 3; > +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+ > | birth_date | department_id | education_level | employee_id | first_name | > full_name | gender | hire_date | last_name | management_role | > marital_status | position_id | position_title | salary | store_id | > supervisor_id | > +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+ > {code} > The difference between Q1 and Q2 is the order by clause. With order by > clause in Q2, Drill will sort the column names alphabetically, while for Q1, > the column names are in the same order as in the data source. > The underlying cause for such difference is that the sort or sort-based > merger operator would require canonicalization, since the incoming batches > could contain different schemas. > However, it would be better that such canonicalization is used only when the > incoming batches have changing schemas. If all the incoming batches have > identical schemas, no need to sort the column orders. With this fix, Drill > will present the same column order in the result set, for a schema-less > select * query, if there is no changing schemas from incoming data sources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)