[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

Roger Dielrton (JIRA) Tue, 10 May 2016 01:16:01 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277783#comment-15277783
 ]


Roger Dielrton commented on DRILL-4659:
---------------------------------------

Thank you, Jason, for the information, and sorry for not realizing that Drill can 
do what I needed.
I agree that adding some examples of this feature to the "Querying Data" 
section would be very useful.

However, I still have a problem related to "query parametrization 
enrichment", which I explain below.

The source data file contains JSON. Here is partial output of 
{{$ less -N /tmp/foojson1}}:
{noformat}
...
5132 { "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, 
"state" : "PA", "_id" : "19095" }
5133 { "city" : "WYNNEWOOD", "loc" : [ -75.27598399999999, 40 ], "pop" : 8285, 
"state" : "PA", "_id" : "19096" }
5134 { "city" : "PHILADELPHIA", "loc" : [ -75.16610900000001, 39.948908 ], 
"pop" : 3623, "state" : "PA", "_id" : "19102" }
...
{noformat}

The query:
{code:sql}
select
        columns
from
        table(dfs.`/tmp/foojson1`(type => 'json'))
{code}


The result (error):
{noformat}
UNSUPPORTED_OPERATION ERROR:
In a list of type FLOAT8, encountered a value of type BIGINT.
Drill does not support lists of different types.
File /tmp/foojson1
Record 5133
Line 5133
Column 58
Field loc
Fragment 0:0
{noformat}
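
To illustrate why record 5133 trips the reader, here is a minimal Python sketch (not Drill's implementation). A standard JSON parser reads {{40}} as an integer and {{-75.27598399999999}} as a float, so the {{loc}} list ends up mixing two numeric types. The helper name {{find_mixed_lists}} is hypothetical, and the sketch assumes one JSON object per line, as in the excerpt above.

```python
import json

def find_mixed_lists(lines):
    """Yield (line_number, field, type_names) for list fields whose elements differ in type."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for field, value in record.items():
            if isinstance(value, list):
                # Collect the distinct Python type names found in the list.
                kinds = sorted({type(v).__name__ for v in value})
                if len(kinds) > 1:
                    yield lineno, field, kinds

# Two records copied from the excerpt above (without the less -N line numbers).
sample = [
    '{ "city" : "WYNCOTE", "loc" : [ -75.152417, 40.086673 ], "pop" : 6164, "state" : "PA", "_id" : "19095" }',
    '{ "city" : "WYNNEWOOD", "loc" : [ -75.27598399999999, 40 ], "pop" : 8285, "state" : "PA", "_id" : "19096" }',
]

for hit in find_mixed_lists(sample):
    print(hit)  # (2, 'loc', ['float', 'int'])
```

Running the same function over the lines of the real file would list every record with this kind of mixed list, which may help estimate how widespread the problem is.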

I know this problem can be avoided by executing 
{{alter session set `store.json.all_text_mode` = true;}} before issuing the 
query, but it would be useful to be able to do something like this:
{code:sql}
select
        columns
from
        table(dfs.`/tmp/foojson1`(type => 'json', 'store.json.all_text_mode' => true))
{code}

That is, extend the table function parameters to cover any parametrization that 
is useful for the issued query, such as, in this case, the 
{{store.json.all_text_mode}} option.
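
As a rough analogy for what an all-text read does to the problematic record, the following Python sketch keeps every JSON number as its original text, so the {{loc}} list becomes uniformly typed. This is an illustration only, not how Drill implements {{store.json.all_text_mode}}, and the helper name {{load_all_text}} is hypothetical.

```python
import json

def load_all_text(line):
    """Parse one JSON record, keeping every number as its original text."""
    # parse_int and parse_float receive the raw numeric token as a string;
    # passing str keeps that token unchanged instead of converting it.
    return json.loads(line, parse_int=str, parse_float=str)

record = load_all_text('{ "loc" : [ -75.27598399999999, 40 ], "pop" : 8285 }')
print(record["loc"])  # ['-75.27598399999999', '40']
print(record["pop"])  # 8285, now held as a string
```

With every element held as text, the list no longer mixes types. The trade-off is that numeric values have to be cast explicitly wherever the query needs them as numbers.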

> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON. etc.), field delimiter, etc.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4659
>                 URL: https://issues.apache.org/jira/browse/DRILL-4659
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization, SQL Parser
>            Reporter: Roger Dielrton
>            Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or 
> more of the following properties:
> * It has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a 
> non-standard character ==> Drill is unable to parse it (without modifying the 
> storage plugin configuration).
> * It is located in an Amazon S3 bucket ==> I can't rename it.
> * It is large ==> It would be expensive to make a copy of it.
> It would be nice to be able to specify, as part of the "select" query, 
> relevant table information as metadata, such as:
> * Data format (CSV, Parquet, JSON, etc.)
> * Field delimiter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
