[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function
[ https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277812#comment-15277812 ]

Roger Dielrton commented on DRILL-4658:
---

Ok, Arina, thanks; I'll follow DRILL-4660.

> cannot specify tab as a fieldDelimiter in table function
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
> Issue Type: Bug
> Components: SQL Parser
> Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
> Reporter: Vince Gonzalez
> Assignee: Arina Ielchiieva
>
> I can't specify a tab delimiter in the table function, perhaps because it counts
> the characters rather than interpreting the string as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter =>
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010]
> (state=,code=0)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277810#comment-15277810 ]

Roger Dielrton commented on DRILL-3149:
---

I vote for a {{fieldDelimiter}} and a {{lineDelimiter}} of any length. See [https://issues.apache.org/jira/browse/DRILL-4658#comment-15277759].

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.0.0, 1.1.0
> Reporter: Jim Scott
> Assignee: Arina Ielchiieva
> Priority: Minor
> Fix For: Future
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record
> delimiters.
[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277783#comment-15277783 ]

Roger Dielrton commented on DRILL-4659:
---

Thank you, Jason, for the information, and sorry for not realizing that Drill can already do what I needed. I agree with adding some examples of this feature to the "Querying Data" section; it would be very useful.

However, I still have a problem related to "query parametrization enrichment", which I explain below.

The source data file (JSON) contains the following (partial output of {{$ less -N /tmp/foojson1}}):
{noformat}
...
5132 { "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, "state" : "PA", "_id" : "19095" }
5133 { "city" : "WYNNEWOOD", "loc" : [ -75.275983, 40 ], "pop" : 8285, "state" : "PA", "_id" : "19096" }
5134 { "city" : "PHILADELPHIA", "loc" : [ -75.1661090001, 39.948908 ], "pop" : 3623, "state" : "PA", "_id" : "19102" }
...
{noformat}

The query:
{code:sql}
select columns from table(dfs.`/tmp/foojson1`(type => 'json'))
{code}

The result (error):
{noformat}
UNSUPPORTED_OPERATION ERROR: In a list of type FLOAT8, encountered a value of type BIGINT. Drill does not support lists of different types.
File /tmp/foojson1
Record 5133
Line 5133
Column 58
Field loc
Fragment 0:0
{noformat}

I know this problem can be avoided by executing {{alter session set `store.json.all_text_mode` = true;}} before issuing the query, but it would be useful to be able to do something like this:
{code:sql}
select columns from table(dfs.`/tmp/foojson1`(type => 'json', 'store.json.all_text_mode' => true))
{code}

That is, extend the table function parameters to cover any setting that is useful for the issued query, such as, in this case, the {{store.json.all_text_mode}} option.

> Specify, as part of the query, table information: data format (CSV, parquet,
> JSON. etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, SQL Parser
> Reporter: Roger Dielrton
> Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a
> nonstandard character ==> Drill is unable to parse it (without modifying the
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is very large ==> It would be expensive to make a copy of it.
> It would be nice if you could specify, as part of the "select" query, as
> metadata, relevant table information such as:
> * Data format (CSV, parquet, JSON. etc.)
> * Field delimiter.
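For readers following the DRILL-4659 comment above: the error on record 5133 comes from the {{loc}} array mixing a floating-point value with an integer ({{40}}). The sketch below reproduces that situation outside Drill, in plain Python, and shows the idea behind reading everything as text. It is only an illustration; the function names {{list_value_types}} and {{to_all_text}} are hypothetical and this is not how Drill's JSON reader is implemented.

```python
import json


def list_value_types(values):
    """Return the set of Python type names found in a JSON array."""
    return {type(v).__name__ for v in values}


def to_all_text(value):
    """Render every scalar as a string, loosely mimicking an all-text read."""
    if isinstance(value, dict):
        return {k: to_all_text(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_all_text(v) for v in value]
    # Leave nulls alone; turn every other scalar into its string form.
    return value if value is None else str(value)


# Record 5133 from the comment above.
line = ('{ "city" : "WYNNEWOOD", "loc" : [ -75.275983, 40 ], '
        '"pop" : 8285, "state" : "PA", "_id" : "19096" }')
record = json.loads(line)

# The array holds one float and one int, which is the mix Drill rejects.
mixed_types = list_value_types(record["loc"])

# After conversion every element of the array has the same type (str).
as_text = to_all_text(record)
text_types = list_value_types(as_text["loc"])
```

Here {{mixed_types}} contains both {{float}} and {{int}}, while {{text_types}} contains only {{str}}, which is why enabling all-text mode sidesteps the error at the cost of having to cast values in the query.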
[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function
[ https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277759#comment-15277759 ]

Roger Dielrton commented on DRILL-4658:
---

Excuse me for jumping in, but I have a related problem with {{fieldDelimiter}}.

Contents of the data file {{/tmp/foo.txt}}:
{noformat}
0::2::3
0::3::1
0::5::2
0::9::4
0::11::1
0::12::2
0::15::1
{noformat}

Query:
{code:sql}
select columns from table(dfs.`/tmp/foo.txt`(type => 'text', fieldDelimiter => '::'))
{code}

This results in an error message:
{noformat}
PARSE ERROR: Expected single character but was String: ::
table /tmp/foo.txt
parameter fieldDelimiter
SQL Query null
{noformat}

It would be nice if {{fieldDelimiter}} accepted text of any length.

> cannot specify tab as a fieldDelimiter in table function
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
> Issue Type: Bug
> Components: SQL Parser
> Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
> Reporter: Vince Gonzalez
>
> I can't specify a tab delimiter in the table function, perhaps because it counts
> the characters rather than interpreting the string as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter =>
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010]
> (state=,code=0)
> {code}
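To make the two requests in this thread concrete (interpreting {{\t}} as an escape code, and accepting a delimiter such as {{::}} of any length), here is a minimal Python sketch of the behaviour being asked for. It is an assumption about what a fix could look like, not Drill's parser; {{resolve_delimiter}} and {{split_record}} are hypothetical names, and only the escapes {{\t}}, {{\n}}, {{\r}} and {{\\}} are handled.

```python
from typing import List

# Backslash escapes this sketch understands (an illustrative subset).
_SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}


def resolve_delimiter(raw: str) -> str:
    """Turn a delimiter as typed in a query (e.g. a backslash followed by
    't') into the characters it stands for."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in _SIMPLE_ESCAPES:
            # Consume the backslash and the escape letter together.
            out.append(_SIMPLE_ESCAPES[raw[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_record(line: str, raw_delimiter: str) -> List[str]:
    """Split one record on a delimiter of any length."""
    delimiter = resolve_delimiter(raw_delimiter)
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    return line.split(delimiter)


# The '::' case from the comment above, and the tab case from the issue.
double_colon_fields = split_record("0::2::3", "::")
tab_fields = split_record("a\tb", r"\t")
```

With this approach the two-character string typed as {{\t}} resolves to a single tab before any length check, and {{::}} is simply used as a two-character separator.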
[jira] [Updated] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
[ https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roger Dielrton updated DRILL-4659:
---

Description:
I have a file that I would like to use in a query, and it can have one or more of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but the field separator is a nonstandard character ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can't rename it.
* Is very large ==> It would be expensive to make a copy of it.
It would be nice if you could specify, as part of the "select" query, as metadata, relevant table information such as:
* Data format (CSV, parquet, JSON. etc.)
* Field delimiter.

was:
I have a file that I would like to use in a query, and it can have one or more of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a nonstandard character as field separator ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is very large ==> It would be expensive to make a copy of it.
It would be nice if you could specify, as part of the "select" query, as metadata, relevant table information such as:
* Data format (CSV, parquet, JSON. etc.)
* Field delimiter.

> Specify, as part of the query, table information: data format (CSV, parquet,
> JSON. etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
> Issue Type: Improvement
> Components: Query Planning & Optimization, SQL Parser
> Reporter: Roger Dielrton
> Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a
> nonstandard character ==> Drill is unable to parse it (without modifying the
> storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is very large ==> It would be expensive to make a copy of it.
> It would be nice if you could specify, as part of the "select" query, as
> metadata, relevant table information such as:
> * Data format (CSV, parquet, JSON. etc.)
> * Field delimiter.
[jira] [Created] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
Roger Dielrton created DRILL-4659:
---

Summary: Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.
Key: DRILL-4659
URL: https://issues.apache.org/jira/browse/DRILL-4659
Project: Apache Drill
Issue Type: Improvement
Components: Query Planning & Optimization, SQL Parser
Reporter: Roger Dielrton
Priority: Minor

I have a file that I would like to use in a query, and it can have one or more of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a nonstandard character as field separator ==> Drill is unable to parse it (without modifying the storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Is very large ==> It would be expensive to make a copy of it.
It would be nice if you could specify, as part of the "select" query, as metadata, relevant table information such as:
* Data format (CSV, parquet, JSON. etc.)
* Field delimiter.