[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-05-10 Thread Roger Dielrton (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277812#comment-15277812
 ] 

Roger Dielrton commented on DRILL-4658:
---

Ok, Arina, thanks; I'll follow DRILL-4660.

> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>Assignee: Arina Ielchiieva
>
> I can't specify a tab delimiter in the table function; maybe it counts the 
> characters rather than interpreting the value as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}
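The error message shows that the length check runs on the raw option text. Inside the SQL string literal, '\t' is the two characters backslash and t, not a TAB. Below is a minimal Python sketch of one possible fix. It assumes a hypothetical helper, {{parse_single_char_delimiter}}, with a small hypothetical escape table that maps a few backslash escapes to real characters before the single-character check. This is an illustration, not Drill's actual code:

```python
# Hypothetical escape table: the option value arrives as literal text, e.g.
# the two characters '\' and 't', so a few common escapes are mapped here.
ESCAPES = {"\\t": "\t", "\\n": "\n", "\\r": "\r", "\\\\": "\\"}


def parse_single_char_delimiter(raw: str) -> str:
    """Decode a known escape sequence, then require exactly one character."""
    decoded = ESCAPES.get(raw, raw)
    if len(decoded) != 1:
        # Same wording as the PARSE ERROR quoted above.
        raise ValueError(f"Expected single character but was String: {raw}")
    return decoded


raw_option = "\\t"  # the option text as typed in the query: backslash + 't'
print(len(raw_option))  # 2, so a plain length check rejects it
print(repr(parse_single_char_delimiter(raw_option)))  # '\t'
```

With this ordering, '|' or a decoded '\t' passes. A genuinely multi-character value such as '::' is still rejected.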



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters

2016-05-10 Thread Roger Dielrton (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277810#comment-15277810
 ] 

Roger Dielrton commented on DRILL-3149:
---

I vote for a {{fieldDelimiter}} and a {{lineDelimiter}} of any length. See 
[https://issues.apache.org/jira/browse/DRILL-4658#comment-15277759].

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.
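The following minimal Python sketch illustrates the request. It splits text on a two-character line delimiter such as \r\n and compares the result with splitting on \n alone. {{split_records}} is a hypothetical helper, not part of {{TextFormatConfig}} or any Drill API:

```python
from typing import List


def split_records(data: str, line_delimiter: str = "\r\n") -> List[str]:
    """Split text into records on a line delimiter of any length."""
    if not line_delimiter:
        raise ValueError("line delimiter must not be empty")
    records = data.split(line_delimiter)
    # A trailing delimiter produces an empty last element; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


data = "a,b\r\nc,d\r\n"
print(split_records(data))        # ['a,b', 'c,d']
print(split_records(data, "\n"))  # ['a,b\r', 'c,d\r']
```

With a single-character \n delimiter, every record keeps a trailing \r, which then ends up in the last field.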





[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-10 Thread Roger Dielrton (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277783#comment-15277783
 ] 

Roger Dielrton commented on DRILL-4659:
---

Thank you, Jason, for the information, and sorry for not realizing that Drill 
can do what I needed.
I agree with adding some examples of this feature to the "Querying Data" 
section; it would be very useful.

However, I still have a problem related to "query parameterization 
enrichment", which I explain below.

The source data file (JSON) contains the following (partial output of 
{{$ less -N /tmp/foojson1}}):
{noformat}
...
5132 { "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, "state" : "PA", "_id" : "19095" }
5133 { "city" : "WYNNEWOOD", "loc" : [ -75.275983, 40 ], "pop" : 8285, "state" : "PA", "_id" : "19096" }
5134 { "city" : "PHILADELPHIA", "loc" : [ -75.1661090001, 39.948908 ], "pop" : 3623, "state" : "PA", "_id" : "19102" }
...
{noformat}

The query:
{code:sql}
select
columns
from
table(dfs.`/tmp/foojson1`(type => 'json'))
{code}


The result (error):
{noformat}
UNSUPPORTED_OPERATION ERROR:
In a list of type FLOAT8, encountered a value of type BIGINT.
Drill does not support lists of different types.
File /tmp/foojson1
Record 5133
Line 5133
Column 58
Field loc
Fragment 0:0
{noformat}
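The error message indicates that the {{loc}} list had already been typed FLOAT8 when the reader met 40. That value has no decimal point and is read as BIGINT. The following minimal Python sketch mirrors this with the standard {{json}} module. It rests on two simplifying assumptions, neither of which is Drill's implementation:
* Each JSON number is typed as int or float.
* {{store.json.all_text_mode}} is approximated by keeping every scalar as text.

```python
import json
from typing import Any, List

# Two records from /tmp/foojson1 as shown above (less -N prefixes removed).
RECORDS = [
    '{ "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, '
    '"state" : "PA", "_id" : "19095" }',
    '{ "city" : "WYNNEWOOD", "loc" : [ -75.275983, 40 ], "pop" : 8285, '
    '"state" : "PA", "_id" : "19096" }',
]


def element_types(values: List[Any]) -> List[str]:
    """Sorted names of the Python types found in a JSON array."""
    return sorted({type(v).__name__ for v in values})


def as_text(values: List[Any]) -> List[str]:
    """Rough stand-in for all_text_mode: every scalar becomes a string."""
    return [json.dumps(v) for v in values]


for record in RECORDS:
    loc = json.loads(record)["loc"]
    print(element_types(loc), as_text(loc))
# ['float'] ['-75.152417', '40.086673']
# ['float', 'int'] ['-75.275983', '40']
```

Only record 5133 mixes number types in {{loc}}. Once every value is text, the list has a single type again.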

I know this problem can be avoided by executing {{alter session set 
`store.json.all_text_mode` = true;}} before issuing the query, but it would 
be useful to be able to do something like this:
{code:sql}
select
columns
from
table(dfs.`/tmp/foojson1`(type => 'json', 'store.json.all_text_mode' => 
true))
{code}

That is: extend the table function parameters to accept any option that is 
useful for the issued query, such as, in this case, {{store.json.all_text_mode}}.

> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON. etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Roger Dielrton
>Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or 
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a 
> non-standard character ==> Drill is unable to parse it (without modifying 
> the storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Has a big size ==> It would be expensive to make a copy of it. 
>
> It would be nice to be able to specify, as part of the "select" query, 
> relevant table information as metadata, such as:
> * Data format (CSV, Parquet, JSON, etc.)
> * Field delimiter.





[jira] [Commented] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-05-10 Thread Roger Dielrton (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15277759#comment-15277759
 ] 

Roger Dielrton commented on DRILL-4658:
---

Excuse me for the interruption, but I have a related problem with 
{{fieldDelimiter}}:

Data file {{/tmp/foo.txt}} contents:
{noformat}
0::2::3
0::3::1
0::5::2
0::9::4
0::11::1
0::12::2
0::15::1
{noformat}

Query:
{code:sql}
select
columns
from
table(dfs.`/tmp/foo.txt`(type => 'text', fieldDelimiter => '::'))
{code}

This results in an error message:
{noformat}
PARSE ERROR:
Expected single character but was String: ::
table /tmp/foo.txt
parameter fieldDelimiter SQL Query null
{noformat}

It would be nice if {{fieldDelimiter}} accepted text of any length.
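For comparison, here is a minimal Python sketch of what a multi-character {{fieldDelimiter}} would do with the sample file. It uses a hypothetical {{split_fields}} helper (not a Drill API) and produces one list per line, similar in shape to Drill's {{columns}} array:

```python
from typing import List

# Contents of /tmp/foo.txt as shown above.
DATA = "0::2::3\n0::3::1\n0::5::2\n0::9::4\n0::11::1\n0::12::2\n0::15::1\n"


def split_fields(line: str, field_delimiter: str = "::") -> List[str]:
    """Split one record on a field delimiter of any length."""
    if not field_delimiter:
        raise ValueError("field delimiter must not be empty")
    return line.split(field_delimiter)


# One list of fields per line, like the `columns` array for text files.
rows = [split_fields(line) for line in DATA.splitlines()]
print(rows[4])  # ['0', '11', '1']
```

Splitting on a single ':' would instead give ['0', '', '2', '', '3'] for the first line, with an empty column between each pair of colons. So a one-character delimiter is not a clean workaround here.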

> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>
> I can't specify a tab delimiter in the table function; maybe it counts the 
> characters rather than interpreting the value as a character escape code?
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}





[jira] [Updated] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-09 Thread Roger Dielrton (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Dielrton updated DRILL-4659:
--
Description: 
I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but the field separator is a 
non-standard character ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can't rename it.
* Has a big size ==> It would be expensive to make a copy of it. 

It would be nice to be able to specify, as part of the "select" query, 
relevant table information as metadata, such as:
* Data format (CSV, Parquet, JSON, etc.)
* Field delimiter.

  was:
I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as 
the field separator ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Has a big size ==> It would be expensive to make a copy of it. 

It would be nice to be able to specify, as part of the "select" query, 
relevant table information as metadata, such as:
* Data format (CSV, Parquet, JSON, etc.)
* Field delimiter.







[jira] [Created] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-09 Thread Roger Dielrton (JIRA)
Roger Dielrton created DRILL-4659:
-

 Summary: Specify, as part of the query, table information: data 
format (CSV, parquet, JSON. etc.), field delimiter, etc.
 Key: DRILL-4659
 URL: https://issues.apache.org/jira/browse/DRILL-4659
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization, SQL Parser
Reporter: Roger Dielrton
Priority: Minor


I have a file that I would like to use in a query, and it can have one or more 
of the following properties:
* Has no extension ==> Drill is unable to handle it.
* I know it contains data in CSV format, but with a non-standard character as 
the field separator ==> Drill is unable to parse it (without modifying the 
storage plugin configuration).
* Is located in an Amazon S3 bucket ==> I can rename it.
* Has a big size ==> It would be expensive to make a copy of it. 

It would be nice to be able to specify, as part of the "select" query, 
relevant table information as metadata, such as:
* Data format (CSV, Parquet, JSON, etc.)
* Field delimiter.


