[ https://issues.apache.org/jira/browse/DRILL-7104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801567#comment-16801567 ]
benj commented on DRILL-7104: ----------------------------- A little trick to get around this problem until someone can fix it. Use a specific string value (example : 'MYSPECIALVALUE' or '' or anything else appropriate in your case) INSTEAD of NULL in a inner request, then use a NULLIF function to change the value of the column in a outer SELECT. {code:java} /* Trick to avoid the bug that change the NULL(VARCHAR) into NULL(INT) */ CREATE TABLE ....`trickavoidbug` AS (SELECT NULLIF(demo,'TRICK'), jam FROM ((SELECT 'TRICK' AS demo ,md5(CAST(rand() AS VARCHAR)) AS jam FROM ....`onebigfile` LIMIT 1000000) UNION (SELECT 'TRICK' AS demo ,md5(CAST(rand() AS VARCHAR)) AS jam FROM ....`onebigfile` LIMIT 1000000)) ); {code} Hopefully this will help someone who is facing this problem while waiting for his resolution. > Change of data type when parquet with multiple fragment > ------------------------------------------------------- > > Key: DRILL-7104 > URL: https://issues.apache.org/jira/browse/DRILL-7104 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet > Affects Versions: 1.15.0 > Reporter: benj > Priority: Major > Attachments: DRILL-7104_ErrorNumberFormatException_20190322.log > > > When creating a Parquet with a column filled only with "CAST(NULL AS > VARCHAR)", if the parquet has several fragment, the type is read like INT > instead of VARCHAR. > First, create +Parquet with only one fragment+ - all is fine (the type of > "demo" is correct). > {code:java} > CREATE TABLE ....`nobug` AS > (SELECT CAST(NULL AS VARCHAR) AS demo > , md5(cast(rand() AS VARCHAR) AS jam > FROM ....`onebigfile` LIMIT 1000000)); > +-----------+----------------------------+ > | Fragment | Number of records written | > +-----------+----------------------------+ > | 0_0 | 10000000 | > SELECT drilltypeof(demo) AS goodtype FROM ....`bug` LIMIT 1; > +--------------------+ > | goodtype | > +--------------------+ > | VARCHAR | > {code} > Second, create +Parquet with at least 2 fragments+ - the type of "demo" > change to INT > {code:java} > CREATE TABLE ....`bug` AS > ((SELECT CAST(NULL AS VARCHAR) AS demo > ,md5(CAST(rand() AS VARCHAR)) AS jam > FROM ....`onebigfile` LIMIT 1000000) > UNION > (SELECT CAST(NULL AS VARCHAR) AS demo > ,md5(CAST(rand() AS VARCHAR)) AS jam > FROM ....`onebigfile` LIMIT 1000000)); > +-----------+----------------------------+ > | Fragment | Number of records written | > +-----------+----------------------------+ > | 1_1 | 1000276 | > | 1_0 | 999724 | > SELECT drilltypeof(demo) AS badtype FROM ....`bug` LIMIT 1; > +--------------------+ > | badtype | > +--------------------+ > | INT |{code} > The change of type is really terrible... > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)