[ https://issues.apache.org/jira/browse/DRILL-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Turton closed DRILL-7569.
-------------------------------
    Resolution: Workaround

> dir0 problem reader - when path with wildcard and column named dir0
> -------------------------------------------------------------------
>
>                 Key: DRILL-7569
>                 URL: https://issues.apache.org/jira/browse/DRILL-7569
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: benj
>            Priority: Major
>
> If a file with named columns (such as csvh, parquet or json) contains a column 
> named *dir0* (or any dir[0-9]+), queries that use a wildcard in the path can 
> fail.
> {code:sql}
> apache drill> SELECT * FROM dfs.tmp.`REP/exa.csvh`;
> +---------+------+
> |  dir0   |  a   |
> +---------+------+
> | coldir0 | cola |
> +---------+------+
> apache drill> SELECT * FROM dfs.tmp.`R*/exa.csvh`;
> Error: INTERNAL_ERROR ERROR: Failure while setting up text reader for file 
> file:/tmp/REP/exa.csvh
> {code}
> The error message differs depending on the input file type:
> {noformat}
> CSVH => Error: INTERNAL_ERROR ERROR: Failure while setting up text reader for 
> file file:...
> PARQUET => Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata:...
> JSON => Error: INTERNAL_ERROR ERROR: 
> org.apache.drill.exec.exception.SchemaChangeException: It's not allowed to 
> have regular field and implicit field share common name dir0. Either change 
> regular field name in datasource, or change the default implicit field names.
> {noformat}
> Note that the JSON error message is more informative and makes the problem 
> faster to identify (even though, to my knowledge, the default implicit field 
> name dir* cannot be changed).
> I know that dir0 should be avoided as a column name. But when creating a 
> table, it is "easy" to use a "SELECT *", which will include dir0 (and the other 
> dir* columns) if the path contains a wildcard.
> I have no good idea how to solve this problem, but it would be worth finding a 
> way to avoid falling into this trap.
> Maybe *dir** columns should not appear automatically with _SELECT *_ but 
> should require an explicit call like _SELECT dir0, dir1, *_ (perhaps controlled 
> by an option).
> Maybe the error messages should be improved.
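A minimal sketch of two ways to stay out of this trap, under stated assumptions. The source path `R*/data.csvh` (files with a single column `a`) and the table name `copy_safe` are hypothetical. The option `drill.exec.storage.file.partition.column.label` (default `dir`) is taken from Drill's configuration options reference; it is assumed here to rename the implicit directory columns for these readers, which has not been verified against 1.17.0.

{code:sql}
-- Hypothetical source: files matching R*/data.csvh, each with one column `a`.
-- 1) When creating a table, list the columns explicitly instead of SELECT *,
--    and alias the implicit directory column so that no stored column
--    is named dir[0-9]+.
CREATE TABLE dfs.tmp.`copy_safe` AS
SELECT dir0 AS src_dir, a
FROM dfs.tmp.`R*/data.csvh`;

-- 2) When a file already contains a column named dir0, change the prefix
--    used for implicit directory columns for this session only
--    (assumed: dir0, dir1, ... become partdir0, partdir1, ...).
ALTER SESSION SET `drill.exec.storage.file.partition.column.label` = 'partdir';
SELECT * FROM dfs.tmp.`R*/exa.csvh`;
ALTER SESSION RESET `drill.exec.storage.file.partition.column.label`;
{code}

The first approach avoids creating the conflicting column at all; the second only works around an existing conflict and is reset afterwards so other queries keep the default dir* names.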



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
