[ https://issues.apache.org/jira/browse/DRILL-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872321#comment-16872321 ]
Charles Givre commented on DRILL-7308: -------------------------------------- Hi [~Paul.Rogers] I'm fine with Drill returning the width and precision, however it doesn't seem to be doing so consistently and it was breaking my SQLAlchemy driver, which was how I found this in the first place. For example, if you query a non-CSV file, you just get VARCHAR as the data type with no width and precision. Also, as noted, the width and precision seem to be wrong. When I submitted https://issues.apache.org/jira/browse/DRILL-6847, I did test VARCHAR fields with specified width and it did work at the time, so I suspect something has changed along the way. Seperately, and I think this is related, I've been developing a series of UDFs that accept a VARCHAR as input and return complex fields. When I use the UDF in a query with data from either a CSV or from VALUES() I get errors, but if the data is from another source, it works. See examples below: {{apache drill> SELECT whois('gtkcyber.com') FROM (VALUES(1)); Error: FUNCTION ERROR: WHOIS does not support operand types (CHAR) [Error Id: af4705e6-e4ef-461d-8866-f7ce3b9b5e09 ] (state=,code=0) }} This example is from an HTTPD web server log and the function works as intended. {{apache drill> SELECT whois(connection_client_host ) . .semicolon> FROM dfs.test.`hackers-access.httpd` LIMIT 1; No match for "195.154.46.135". +----------------------------------------------------------------------------------+ | EXPR$0 | +----------------------------------------------------------------------------------+ | {"_last_update_of_whois_database":" 2019-06-25T12:46:22Z <<<","notice":" The expiration date displayed in this record is the date the","terms_of_use":" You are not authorized to access or query our Whois","by_the_following_terms_of_use":" You agree that you may use this Data only","to":" (1) allow, enable, or otherwise support the transmission of mass"} | +----------------------------------------------------------------------------------+ 1 row selected (1.259 seconds)}} Here is the same thing when querying the domain.csvh file and you can see that it doesn't work. {{apache drill> SELECT whois(domain) as domain_info from dfs.test.`domains.csvh`; Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [whois(VARCHAR-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--.. Fragment 0:0 Please, refer to logs for more information. [Error Id: bd226bdd-9cef-439d-8c3e-bcdd972f52b1 on 192.168.1.33:31010] (state=,code=0)}} > Incorrect Metadata from text file queries > ----------------------------------------- > > Key: DRILL-7308 > URL: https://issues.apache.org/jira/browse/DRILL-7308 > Project: Apache Drill > Issue Type: Bug > Components: Metadata > Affects Versions: 1.17.0 > Reporter: Charles Givre > Priority: Major > Attachments: Screen Shot 2019-06-24 at 3.16.40 PM.png, domains.csvh > > > I'm noticing some strange behavior with the newest version of Drill. If you > query a CSV file, you get the following metadata: > {code:sql} > SELECT * FROM dfs.test.`domains.csvh` LIMIT 1 > {code} > {code:json} > { > "queryId": "22eee85f-c02c-5878-9735-091d18788061", > "columns": [ > "domain" > ], > "rows": [} > { "domain": "thedataist.com" } ], > "metadata": [ > "VARCHAR(0, 0)", > "VARCHAR(0, 0)" > ], > "queryState": "COMPLETED", > "attemptedAutoLimit": 0 > } > {code} > There are two issues here: > 1. VARCHAR now has precision > 2. There are twice as many columns as there should be. > Additionally, if you query a regular CSV, without the columns extracted, you > get the following: > {code:json} > "rows": [ > { > "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]" } > ], > "metadata": [ > "VARCHAR(0, 0)", > "VARCHAR(0, 0)" > ], > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)