Re: Strange metadata from Text Reader

Paul Rogers Mon, 24 Jun 2019 22:54:00 -0700

Hi All,

To close the loop on this, see the detailed comments in DRILL-7308, which
Charles kindly filed. There is a bug in the REST metadata feature itself: it
repeats the schema for every returned record batch, and it displays precision
and scale for VARCHAR columns. It should be easy to fix.
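
For anyone who wants to check a build for these two symptoms, here is a minimal sketch in Python. It is not Drill code: the function name check_metadata is made up for illustration, and the sample response is the one from Charles's original report quoted below. It only assumes the response has "columns" and "metadata" lists, as in that report.

```python
import json
import re

# Response body copied from the original report in this thread.
SAMPLE = """
{
  "queryId": "22eee85f-c02c-5878-9735-091d18788061",
  "columns": ["domain"],
  "rows": [{"domain": "thedataist.com"}],
  "metadata": ["VARCHAR(0, 0)", "VARCHAR(0, 0)"],
  "queryState": "COMPLETED",
  "attemptedAutoLimit": 0
}
"""

# Matches a VARCHAR type string that carries both precision and scale.
VARCHAR_WITH_SCALE = re.compile(r"VARCHAR\(\s*\d+\s*,\s*\d+\s*\)")


def check_metadata(response):
    """Hypothetical helper: list the symptoms described in this thread."""
    problems = []
    columns = response.get("columns", [])
    metadata = response.get("metadata", [])

    # Symptom 1: the schema is repeated, so metadata outnumbers columns.
    if len(metadata) != len(columns):
        problems.append(
            f"{len(metadata)} metadata entries for {len(columns)} columns"
        )

    # Symptom 2: VARCHAR is reported with precision and scale.
    for entry in metadata:
        if VARCHAR_WITH_SCALE.fullmatch(entry):
            problems.append(f"VARCHAR reported with precision and scale: {entry}")

    return problems


if __name__ == "__main__":
    for problem in check_metadata(json.loads(SAMPLE)):
        print(problem)
```

A response with one metadata entry per column and a plain VARCHAR type yields an empty list.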


Thanks,
- Paul

 

    On Monday, June 24, 2019, 12:21:10 PM PDT, Arina Yelchiyeva 
<arina.yelchiy...@gmail.com> wrote:  
 
It would help to identify the commit that actually caused the bug.
Personally, I don’t recall anything that might have broken this functionality.

Kind regards,
Arina

> On Jun 24, 2019, at 10:19 PM, Charles Givre <cgi...@gmail.com> wrote:
> 
> I don't have that version of Drill anymore but this feature worked correctly 
> until recently.  I'm using the latest build of Drill. 
> 
>> On Jun 24, 2019, at 3:18 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> 
>> wrote:
>> 
>> Just to confirm, in Drill 1.15 it works correctly?
>> 
>> Kind regards,
>> Arina
>> 
>>> On Jun 24, 2019, at 10:15 PM, Charles Givre <cgi...@gmail.com> wrote:
>>> 
>>> Hi Arina, 
>>> It doesn't seem to make a difference unfortunately. :-(
>>> --C 
>>> 
>>>> On Jun 24, 2019, at 3:09 PM, Arina Yelchiyeva <arina.yelchiy...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Hi Charles,
>>>> 
>>>> Please try with v3 reader enabled: set 
>>>> `exec.storage.enable_v3_text_reader` = true.
>>>> Does it behave the same?
>>>> 
>>>> Kind regards,
>>>> Arina
>>>> 
>>>>> On Jun 24, 2019, at 9:38 PM, Charles Givre <cgi...@gmail.com> wrote:
>>>>> 
>>>>> Hello Drill Devs,
>>>>> I'm noticing some strange behavior with the newest version of Drill.  If 
>>>>> you query a CSV file, you get the following metadata:
>>>>> 
>>>>> SELECT * FROM dfs.test.`domains.csvh` LIMIT 1
>>>>> 
>>>>> {
>>>>> "queryId": "22eee85f-c02c-5878-9735-091d18788061",
>>>>> "columns": [
>>>>> "domain"
>>>>> ],
>>>>> "rows": [
>>>>> {
>>>>>  "domain": "thedataist.com"
>>>>> }
>>>>> ],
>>>>> "metadata": [
>>>>> "VARCHAR(0, 0)",
>>>>> "VARCHAR(0, 0)"
>>>>> ],
>>>>> "queryState": "COMPLETED",
>>>>> "attemptedAutoLimit": 0
>>>>> }
>>>>> 
>>>>> 
>>>>> There are two issues here:
>>>>> 1.  VARCHAR now has precision 
>>>>> 2.  There are twice as many columns as there should be.
>>>>> 
>>>>> Additionally, if you query a regular CSV, without the columns extracted, 
>>>>> you get the following:
>>>>> 
>>>>> "rows": [
>>>>> {
>>>>>  "columns": "[\"ACCT_NUM\",\"PRODUCT\",\"MONTH\",\"REVENUE\"]"
>>>>> }
>>>>> ],
>>>>> "metadata": [
>>>>> "VARCHAR(0, 0)",
>>>>> "VARCHAR(0, 0)"
>>>>> ],
>>>>> 
>>>>> This is bizarre in that the data type is not being reported correctly (it 
>>>>> should be LIST or something like that), AND we're getting too many columns 
>>>>> in the metadata.  I'll submit a JIRA as well, but could someone please 
>>>>> take a look?
>>>>> Thanks,
>>>>> -- C
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
