[jira] [Commented] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)

2015-09-23 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905243#comment-14905243
 ] 

Daniel Barclay (Drill) commented on DRILL-3815:
---

I haven't changed any storage plug-in configuration from the defaults.

> unknown suffixes .not_json and .json_not treated differently (multi-file case)
> --
>
> Key: DRILL-3815
> URL: https://issues.apache.org/jira/browse/DRILL-3815
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Daniel Barclay (Drill)
>
> In scanning a directory subtree used as a table, unknown filename extensions 
> seem to be treated differently depending on whether they're similar to known 
> file extensions.  The behavior suggests that Drill checks whether a file name 
> _contains_ an extension's string rather than _ending_ with it. 
> For example, given these subtrees with almost identical leaf file names:
> {noformat}
> $ find /tmp/testext_xx_json/
> /tmp/testext_xx_json/
> /tmp/testext_xx_json/voter2.not_json
> /tmp/testext_xx_json/voter1.json
> $ find /tmp/testext_json_xx/
> /tmp/testext_json_xx/
> /tmp/testext_json_xx/voter1.json
> /tmp/testext_json_xx/voter2.json_not
> $ 
> {noformat}
> the results of trying to use them as tables differs:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_xx_json`;
> Sep 21, 2015 11:41:50 AM 
> org.apache.calcite.sql.validate.SqlValidatorException 
> ...
> Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 
> 'dfs.tmp.testext_xx_json' not found
> [Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_json_xx`;
> +---+
> | onecf |
> +---+
> | {"name":"someName1"}  |
> | {"name":"someName2"}  |
> +---+
> 2 rows selected (0.149 seconds)
> {noformat}
> (Other probing seems to indicate that there is also some sensitivity to 
> whether the extension contains an underscore character.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)

2015-09-23 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904914#comment-14904914
 ] 

Jacques Nadeau commented on DRILL-3815:
---

Can you also confirm that you have no default input format set?

> unknown suffixes .not_json and .json_not treated differently (multi-file case)
> --
>
> Key: DRILL-3815
> URL: https://issues.apache.org/jira/browse/DRILL-3815
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Daniel Barclay (Drill)
>Assignee: Jacques Nadeau
>
> In scanning a directory subtree used as a table, unknown filename extensions 
> seem to be treated differently depending on whether they're similar to known 
> file extensions.  The behavior suggests that Drill checks whether a file name 
> _contains_ an extension's string rather than _ending_ with it. 
> For example, given these subtrees with almost identical leaf file names:
> {noformat}
> $ find /tmp/testext_xx_json/
> /tmp/testext_xx_json/
> /tmp/testext_xx_json/voter2.not_json
> /tmp/testext_xx_json/voter1.json
> $ find /tmp/testext_json_xx/
> /tmp/testext_json_xx/
> /tmp/testext_json_xx/voter1.json
> /tmp/testext_json_xx/voter2.json_not
> $ 
> {noformat}
> the results of trying to use them as tables differs:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_xx_json`;
> Sep 21, 2015 11:41:50 AM 
> org.apache.calcite.sql.validate.SqlValidatorException 
> ...
> Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 
> 'dfs.tmp.testext_xx_json' not found
> [Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_json_xx`;
> +---+
> | onecf |
> +---+
> | {"name":"someName1"}  |
> | {"name":"someName2"}  |
> +---+
> 2 rows selected (0.149 seconds)
> {noformat}
> (Other probing seems to indicate that there is also some sensitivity to 
> whether the extension contains an underscore character.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3815) unknown suffixes .not_json and .json_not treated differently (multi-file case)

2015-09-21 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901691#comment-14901691
 ] 

Ted Dunning commented on DRILL-3815:


Daniel,

Did you trace this a bit to see where the extensions are being matched?

Could it be a naively constructed regex?  Kinda smells like that.



> unknown suffixes .not_json and .json_not treated differently (multi-file case)
> --
>
> Key: DRILL-3815
> URL: https://issues.apache.org/jira/browse/DRILL-3815
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Daniel Barclay (Drill)
>Assignee: Jacques Nadeau
>
> In scanning a directory subtree used as a table, unknown filename extensions 
> seem to be treated differently depending on whether they're similar to known 
> file extensions.  The behavior suggests that Drill checks whether a file name 
> _contains_ an extension's string rather than _ending_ with it. 
> For example, given these subtrees with almost identical leaf file names:
> {noformat}
> $ find /tmp/testext_xx_json/
> /tmp/testext_xx_json/
> /tmp/testext_xx_json/voter2.not_json
> /tmp/testext_xx_json/voter1.json
> $ find /tmp/testext_json_xx/
> /tmp/testext_json_xx/
> /tmp/testext_json_xx/voter1.json
> /tmp/testext_json_xx/voter2.json_not
> $ 
> {noformat}
> the results of trying to use them as tables differs:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_xx_json`;
> Sep 21, 2015 11:41:50 AM 
> org.apache.calcite.sql.validate.SqlValidatorException 
> ...
> Error: VALIDATION ERROR: From line 1, column 17 to line 1, column 25: Table 
> 'dfs.tmp.testext_xx_json' not found
> [Error Id: 6fe41deb-0e39-43f6-beca-de27b39d276b on dev-linux2:31010] 
> (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT *   FROM `dfs.tmp`.`testext_json_xx`;
> +---+
> | onecf |
> +---+
> | {"name":"someName1"}  |
> | {"name":"someName2"}  |
> +---+
> 2 rows selected (0.149 seconds)
> {noformat}
> (Other probing seems to indicate that there is also some sensitivity to 
> whether the extension contains an underscore character.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)