Tobias created DRILL-4505:
-----------------------------

             Summary: Can't group by or sort across files with different schema
                 Key: DRILL-4505
                 URL: https://issues.apache.org/jira/browse/DRILL-4505
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.5.0
         Environment: Java 1.8
            Reporter: Tobias


We are currently trying out the support for querying across Parquet files with 
different schemas.
Simple selects work well, but when we want to sort or group by, Drill returns 
"UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with 
changing schemas Fragment 0:0 [Error Id: ff490670-64c1-4fb8-990e-a02aa44ac010 
on zookeeper-1:31010]"

This happens even though the query does not include the new columns.
The expected result would be to treat columns that do not exist in certain 
files as either null or a default value, and to allow them to be grouped and 
sorted.
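
To make the expected behaviour concrete, here is a minimal Python sketch of 
the semantics we have in mind. It is not Drill code and says nothing about how 
Drill works internally. The helper names (unify_schema, max_by_group), the 
NEW_COL column and the sample rows are all made up for illustration.

```python
def unify_schema(batches):
    """Pad every record with None for columns its own file does not have."""
    columns = []
    for batch in batches:
        for record in batch:
            for name in record:
                if name not in columns:
                    columns.append(name)
    return [
        {name: record.get(name) for name in columns}
        for batch in batches
        for record in batch
    ]


def max_by_group(records, key, value):
    """Rough equivalent of SELECT max(value), key ... GROUP BY key.

    Nulls are skipped; a group with only nulls yields None.
    """
    result = {}
    for record in records:
        group = record[key]
        candidate = record[value]
        current = result.setdefault(group, None)
        if candidate is not None and (current is None or candidate > current):
            result[group] = candidate
    return result


# Hypothetical data: the newer file carries an extra column (NEW_COL).
old_file = [
    {"APPLICATION_ID": 7, "dir0": "2016"},
    {"APPLICATION_ID": 3, "dir0": "2016"},
]
new_file = [{"APPLICATION_ID": 9, "dir0": "2016", "NEW_COL": "x"}]

rows = unify_schema([old_file, new_file])
print(max_by_group(rows, "dir0", "APPLICATION_ID"))
```

With the missing column padded as null, every row has the same shape, so the 
grouping step never sees a schema change.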

Example:
SELECT APPLICATION_ID, dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 
>= '2016-01-01' AND dir2 < '2016-04-02'
works with a changing schema,

but
SELECT max(APPLICATION_ID), dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 
>= '2016-01-01' AND dir2 < '2016-04-02' GROUP BY dir0
does not work.

For us this hampers any possibility of having an evolving schema with 
moderately complex queries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
