[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

Charles Givre (JIRA) Tue, 25 Jun 2019 05:50:12 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872321#comment-16872321
 ]


Charles Givre commented on DRILL-7308:
--------------------------------------

Hi [~Paul.Rogers]
I'm fine with Drill returning the width and precision, however it doesn't seem 
to be doing so consistently and it was breaking my SQLAlchemy driver, which was 
how I found this in the first place.  For example, if you query a non-CSV file, 
you just get VARCHAR as the data type with no width and precision.  Also, as 
noted, the width and precision seem to be wrong.  When I submitted 
https://issues.apache.org/jira/browse/DRILL-6847, I did test VARCHAR fields 
with specified width and it did work at the time, so I suspect something has 
changed along the way.  

Seperately, and I think this is related, I've been developing a series of UDFs 
that accept a VARCHAR as input and return complex fields.  When I use the UDF 
in a query with data from either a CSV or from VALUES() I get errors, but if 
the data is from another source, it works.  See examples below:

{{apache drill> SELECT whois('gtkcyber.com') FROM (VALUES(1));
Error: FUNCTION ERROR: WHOIS does not support operand types (CHAR)


[Error Id: af4705e6-e4ef-461d-8866-f7ce3b9b5e09 ] (state=,code=0)
}}

This example is from an HTTPD web server log and the function works as intended.

{{apache drill> SELECT whois(connection_client_host )
. .semicolon> FROM dfs.test.`hackers-access.httpd` LIMIT 1;
No match for "195.154.46.135".

+----------------------------------------------------------------------------------+
|                                      EXPR$0                                   
   |
+----------------------------------------------------------------------------------+
| {"_last_update_of_whois_database":" 2019-06-25T12:46:22Z <<<","notice":" The 
expiration date displayed in this record is the date the","terms_of_use":" You 
are not authorized to access or query our 
Whois","by_the_following_terms_of_use":" You agree that you may use this Data 
only","to":" (1) allow, enable, or otherwise support the transmission of mass"} 
|
+----------------------------------------------------------------------------------+
1 row selected (1.259 seconds)}}

Here is the same thing when querying the domain.csvh file and you can see that 
it doesn't work. 

{{apache drill> SELECT whois(domain) as domain_info from 
dfs.test.`domains.csvh`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: 
[whois(VARCHAR-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

Please, refer to logs for more information.

[Error Id: bd226bdd-9cef-439d-8c3e-bcdd972f52b1 on 192.168.1.33:31010] 
(state=,code=0)}}




> Incorrect Metadata from text file queries
> -----------------------------------------
>
>                 Key: DRILL-7308
>                 URL: https://issues.apache.org/jira/browse/DRILL-7308
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Priority: Major
>         Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh
>
>
> I'm noticing some strange behavior with the newest version of Drill.  If you 
> query a CSV file, you get the following metadata:
> {code:sql}
> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
> {code}
> {code:json}
> {
>   "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>   "columns": [
>     "domain"
>   ],
>   "rows": [}
>    {       "domain": "thedataist.com"     }  ],
>   "metadata": [
>     "VARCHAR(0, 0)",
>     "VARCHAR(0, 0)"
>   ],
>   "queryState": "COMPLETED",
>   "attemptedAutoLimit": 0
> }
> {code}
> There are two issues here:
> 1.  VARCHAR now has precision
> 2.  There are twice as many columns as there should be.
> Additionally, if you query a regular CSV, without the columns extracted, you 
> get the following:
> {code:json}
>  "rows": [
>  { 
>       "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"     }
>   ],
>    "metadata": [
>      "VARCHAR(0, 0)",
>      "VARCHAR(0, 0)"
>    ],
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7308) Incorrect Metadata from text file queries

Reply via email to