[ https://issues.apache.org/jira/browse/DRILL-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901691#comment-14901691 ]
Ted Dunning commented on DRILL-3815:
------------------------------------

Daniel,

Did you trace this a bit to see where the extensions are being matched? Could it be a naively constructed regex? Kinda smells like that.

> unknown suffixes .not_json and .json_not treated differently (multi-file case)
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3815
>                 URL: https://issues.apache.org/jira/browse/DRILL-3815
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Jacques Nadeau
>
> In scanning a directory subtree used as a table, unknown filename extensions
> seem to be treated differently depending on whether they're similar to known
> file extensions. The behavior suggests that Drill checks whether a file name
> _contains_ an extension's string rather than _ending_ with it.
> For example, given these subtrees with almost identical leaf file names:
> {noformat}
> $ find /tmp/testext_xx_json/
> /tmp/testext_xx_json/
> /tmp/testext_xx_json/voter2.not_json
> /tmp/testext_xx_json/voter1.json
> $ find /tmp/testext_json_xx/
> /tmp/testext_json_xx/
> /tmp/testext_json_xx/voter1.json
> /tmp/testext_json_xx/voter2.json_not
> $
> {noformat}
> the results of trying to use them as tables differ:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT * FROM `dfs.tmp`.`testext_xx_json`;
> Sep 21, 2015 11:41:50 AM
> org.apache.calcite.sql.validate.SqlValidatorException <init>
> ...
> Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table
> 'dfs.tmp.testext_xx_json' not found
> [Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010]
> (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT * FROM `dfs.tmp`.`testext_json_xx`;
> +-----------------------+
> |         onecf         |
> +-----------------------+
> | {"name":"someName1"}  |
> | {"name":"someName2"}  |
> +-----------------------+
> 2 rows selected (0.149 seconds)
> {noformat}
> (Other probing seems to indicate that there is also some sensitivity to
> whether the extension contains an underscore character.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
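To make the "contains vs. ends with" hypothesis concrete, here is a hypothetical Java sketch. It is not Drill's actual format-matching code; the class and method names (`ExtensionMatchSketch`, `containsMatch`, `unanchoredRegexMatch`, `suffixMatch`, `anchoredRegexMatch`) are invented for illustration, and it assumes the matcher only compares the file name against the known extension `json`. It contrasts a substring check and an unanchored regex `find()` (the "naively constructed regex" case) with an `endsWith` check and an anchored `matches()`, using the three file names from the report.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Drill's code): four ways a format matcher
 * might test a file name against the known extension "json".
 */
final class ExtensionMatchSketch {

    private static final String EXT = "json";

    // Unanchored regex: find() succeeds if ".json" appears anywhere in the name.
    private static final Pattern UNANCHORED =
            Pattern.compile("\\." + Pattern.quote(EXT));

    // Anchored regex: used with matches(), the whole name must end in ".json".
    private static final Pattern ANCHORED =
            Pattern.compile(".*\\." + Pattern.quote(EXT));

    /** Substring test: true if the name merely contains ".json". */
    static boolean containsMatch(String name) {
        return name.contains("." + EXT);
    }

    /** Unanchored regex test; equivalent to containsMatch for this literal pattern. */
    static boolean unanchoredRegexMatch(String name) {
        return UNANCHORED.matcher(name).find();
    }

    /** Suffix test: true only if the name ends with ".json". */
    static boolean suffixMatch(String name) {
        return name.endsWith("." + EXT);
    }

    /** Anchored regex test: matches() requires the entire name to match. */
    static boolean anchoredRegexMatch(String name) {
        return ANCHORED.matcher(name).matches();
    }

    public static void main(String[] args) {
        List<String> names = List.of("voter1.json", "voter2.not_json", "voter2.json_not");
        for (String name : names) {
            // Print one row per file name with the result of each check.
            System.out.printf("%-16s contains=%-5b unanchored=%-5b suffix=%-5b anchored=%b%n",
                    name, containsMatch(name), unanchoredRegexMatch(name),
                    suffixMatch(name), anchoredRegexMatch(name));
        }
    }
}
```

With these definitions, `voter2.json_not` passes the substring and unanchored-regex checks but fails the suffix and anchored checks, while `voter2.not_json` fails all four. That asymmetry lines up with the reported behavior if Drill's matcher acts like one of the first two checks, but the sketch does not account for the underscore sensitivity and does not confirm where Drill actually does the matching.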