[jira] [Commented] (DRILL-1499) Different column order could appear in the result set for a schema-less select * query, even there are no changing schemas.

Victoria Markman (JIRA) Fri, 27 Mar 2015 13:17:22 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384534#comment-14384534
 ]


Victoria Markman commented on DRILL-1499:
-----------------------------------------

I just ran into this again, in completely different context and it only 
happened when I had Exchange in my plan.

{code:sql}
select     *
from    test  left outer join  j2
on (COALESCE(test.c_integer, 10000) = COALESCE(j2.c_integer, 10000))
where
        test.c_integer IS NULL 
        and j2.c_integer IS NULL
{code}

If we are not planning to fix it ever, we need to make sure it is documented in 
our SQL section on differences between us and SQL standard along with NULL 
behavior.

> Different column order could appear in the result set for a schema-less 
> select * query, even there are no changing schemas.
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-1499
>                 URL: https://issues.apache.org/jira/browse/DRILL-1499
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Steven Phillips
>             Fix For: Future
>
>
> For a select * query referring to a schema-less table, Drill could return 
> different column, depending on the physical operators the query involves:
> Q1:
> {code}
> select * from cp.`employee.json` limit 3;
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | employee_id | full_name  | first_name | last_name  | position_id | 
> position_title |  store_id  | department_id | birth_date | hire_date  |   
> salary   | supervisor_id | education_level | marital_status |   gender   | 
> management_role |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> {code}
> Q2:
> {code}
> select * from cp.`employee.json` order by last_name limit 3;
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> | birth_date | department_id | education_level | employee_id | first_name | 
> full_name  |   gender   | hire_date  | last_name  | management_role | 
> marital_status | position_id | position_title |   salary   |  store_id  | 
> supervisor_id |
> +------------+---------------+-----------------+-------------+------------+------------+------------+------------+------------+-----------------+----------------+-------------+----------------+------------+------------+---------------+
> {code}
> The difference between Q1 and Q2 is the order by clause.  With order by 
> clause in Q2, Drill will sort the column names alphabetically, while for Q1, 
> the column names are in the same order as in the data source. 
> The underlying cause for such difference is that the sort or sort-based 
> merger operator would require canonicalization, since the incoming batches 
> could contain different schemas. 
>  However, it would be better that such canonicalization is used only when the 
> incoming batches have changing schemas. If all the incoming batches have 
> identical schemas, no need to sort the column orders.  With this fix, Drill 
> will present the same column order in the result set, for a schema-less 
> select * query,  if there is no changing schemas from incoming data sources. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-1499) Different column order could appear in the result set for a schema-less select * query, even there are no changing schemas.

Reply via email to