benj created DRILL-7090:
---------------------------

             Summary: Improve management of Optional (Nullable) / Required (Not nullable) types, at least for Parquet storage
                 Key: DRILL-7090
                 URL: https://issues.apache.org/jira/browse/DRILL-7090
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Parquet
    Affects Versions: 1.15.0
            Reporter: benj


It would be useful to be able to specify, define, or cast the "mode" (REQUIRED or OPTIONAL) of columns in Parquet storage.

Example of the problem this causes: several files are created by different methods/processes, and all of them have the same columns. When querying all the files and grouping on a column:
{code:java}
SELECT source, count(*) FROM ....`ALL` GROUP BY source;
=>
java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema change
Prior schema : BatchSchema [fields=[[`source` (VARCHAR:REQUIRED)]], selectionVector=NONE]
New schema : BatchSchema [fields=[[`source` (VARCHAR:OPTIONAL)]], selectionVector=NONE]
{code}
This happens because the `source` column is generated in different ways (for example, from a constant or from dir0 (*)).
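To make the failure concrete, here is a minimal Python model of a batch schema in which each field carries a type and a mode. The classes and the `check_same_schema` function are hypothetical stand-ins, not Drill's actual code. Because the mode is part of the field, two batches whose `source` column differs only in REQUIRED vs OPTIONAL compare as different schemas, which is the condition the hash aggregate rejects.

{code:python}
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    name: str
    minor_type: str  # e.g. "VARCHAR"
    mode: str        # "REQUIRED" or "OPTIONAL"


@dataclass(frozen=True)
class BatchSchema:
    fields: Tuple[Field, ...]


class SchemaChangeError(Exception):
    pass


def check_same_schema(prior: BatchSchema, new: BatchSchema) -> None:
    """Hypothetical model of an operator that cannot handle schema changes."""
    # Dataclass equality compares name, type AND mode,
    # so a difference in mode alone is treated as a schema change.
    if prior != new:
        raise SchemaChangeError(
            "Hash aggregate does not support schema change\n"
            f"Prior schema: {prior}\nNew schema: {new}"
        )


# Same column, same VARCHAR type, different mode depending on how the file was produced.
from_constant = BatchSchema((Field("source", "VARCHAR", "REQUIRED"),))
from_dir0 = BatchSchema((Field("source", "VARCHAR", "OPTIONAL"),))
{code}

Calling `check_same_schema(from_constant, from_dir0)` raises `SchemaChangeError` even though both batches hold VARCHAR values.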

It would be nice for users to be able to define the nullability attribute (REQUIRED/OPTIONAL) themselves, or at least to cast the mode/type of a field on read. This would make the files more homogeneous and avoid failures on simple operations such as aggregation.
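One possible "cast on read" policy, sketched in Python under stated assumptions: `unify_modes` and its `overrides` parameter are hypothetical names, not an existing Drill option. A column that is OPTIONAL in any file is read as OPTIONAL from every file, since widening REQUIRED to OPTIONAL never loses data; a user-supplied override takes precedence, corresponding to defining the mode oneself.

{code:python}
from typing import Dict, List, Optional

Mode = str  # "REQUIRED" or "OPTIONAL"


def unify_modes(file_schemas: List[Dict[str, Mode]],
                overrides: Optional[Dict[str, Mode]] = None) -> Dict[str, Mode]:
    """Hypothetical policy: choose a single mode per column across all files."""
    unified: Dict[str, Mode] = {}
    for schema in file_schemas:
        for column, mode in schema.items():
            # Once any file has the column as OPTIONAL, it stays OPTIONAL.
            if unified.get(column) == "OPTIONAL" or mode == "OPTIONAL":
                unified[column] = "OPTIONAL"
            else:
                unified[column] = "REQUIRED"
    # User overrides win. Forcing REQUIRED is only safe if the column really
    # contains no NULLs; this sketch does not check that.
    for column, mode in (overrides or {}).items():
        unified[column] = mode
    return unified


files = [{"source": "REQUIRED"}, {"source": "OPTIONAL"}]
print(unify_modes(files))                                  # {'source': 'OPTIONAL'}
print(unify_modes(files, overrides={"source": "REQUIRED"}))  # {'source': 'REQUIRED'}
{code}

Promoting to OPTIONAL is the conservative default because it can always be applied; forcing REQUIRED would additionally require rejecting or replacing NULL values at read time.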

 

(*) Surprisingly, the inferred modes are:
 * dir0 => varchar<NULLABLE>
 * '' => varchar<NOT NULL>
 * coalesce(dir0, '') => varchar<NULLABLE>  *???*
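The surprise in the last line follows from COALESCE semantics: it returns its first non-NULL argument, so if any argument is NOT NULL the result can never be NULL. Since '' is NOT NULL, coalesce(dir0, '') could be typed as varchar<NOT NULL>. A minimal sketch of that inference rule (`coalesce_is_nullable` is a hypothetical helper, not Drill's type-inference code):

{code:python}
from typing import Sequence


def coalesce_is_nullable(arg_nullable: Sequence[bool]) -> bool:
    """COALESCE(a1, ..., an) can return NULL only if every argument can be NULL."""
    # If any argument is NOT NULL, evaluation stops at or before it,
    # so the result always has a non-NULL value.
    return all(arg_nullable)


# dir0 is NULLABLE (True), '' is NOT NULL (False)
print(coalesce_is_nullable([True, False]))  # False, i.e. varchar<NOT NULL> would be expected
{code}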

Users should be able to override the system's choice of whether a column's mode is REQUIRED or OPTIONAL.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
