[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.

Paul Rogers (JIRA) Sat, 07 Apr 2018 23:02:24 -0700

    [ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429636#comment-16429636 ]


Paul Rogers commented on DRILL-6312:
------------------------------------

Drill supports maps and arrays (including arrays of maps). Drill's current 
SELECT syntax does not support these constructs well. Suppose my pesky field is 
nested inside map m:

{noformat}
{m: {a: null}}
{noformat}

I need to express the type of a. The following will not work in Drill today:

{noformat}
SELECT CAST(m.a AS VARCHAR) FROM ...
{noformat}

This sets the type of {{m.a}}, but it also puts {{m.a}} into the projection 
list as a top-level column. That is, it destroys the map structure. One would 
not even be able to do this if {{m}} were an array of maps.

This is where the cast idea really fails to be general: SQL syntax does not 
let us reach inside a map and set the type of one member while leaving the 
map itself intact.

But a separate hint does not have this problem. Using the made-up syntax from 
above:

{noformat}
SELECT m FROM myFile WITH HINTS (m.a AS VARCHAR)
{noformat}

Likewise, a separate metadata hint file can be designed to handle any kind of 
structure: maps, arrays, arrays of maps, and so on.
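To make the hint idea concrete, here is a minimal Python sketch under stated assumptions. It is not Drill's API: the type representation and the names {{infer_type}} and {{apply_hint}} are hypothetical. The reader infers a type tree from a record, where a lone null tells it nothing. A hint keyed by a dotted path such as {{m.a}} then overrides the type at that path, descending through maps and through arrays of maps.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical type representation for this sketch only (not Drill's API):
#   scalar -> "INT", "FLOAT", "VARCHAR", "BOOLEAN", or "NULL" (type unknown)
#   map    -> dict of field name -> type
#   array  -> ("ARRAY", element type)
Type = Union[str, Dict[str, Any], Tuple[str, Any]]

def infer_type(value: Any) -> Type:
    """Infer a type tree from one record, as a schema-less reader would."""
    if value is None:
        return "NULL"  # a null alone carries no type information
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        return {k: infer_type(v) for k, v in value.items()}
    if isinstance(value, list):
        # Simplification: take the element type from the first element only.
        return ("ARRAY", infer_type(value[0]) if value else "NULL")
    raise TypeError(f"unsupported value: {value!r}")

def apply_hint(t: Type, path: List[str], hinted: str) -> Type:
    """Replace the type at a dotted path, keeping every enclosing map and array."""
    if isinstance(t, tuple):
        # Array: the hint describes its elements, so apply it to the element type.
        return ("ARRAY", apply_hint(t[1], path, hinted))
    if not path:
        return hinted
    if isinstance(t, dict) and path[0] in t:
        return {**t, path[0]: apply_hint(t[path[0]], path[1:], hinted)}
    return t  # path not present in this record: leave the type unchanged

schema = infer_type({"m": {"a": None}})
print(schema)                                     # {'m': {'a': 'NULL'}}
print(apply_hint(schema, ["m", "a"], "VARCHAR"))  # {'m': {'a': 'VARCHAR'}}
# The same hint works when m is an array of maps.
print(apply_hint(infer_type({"m": [{"a": None}]}), ["m", "a"], "VARCHAR"))
# {'m': ('ARRAY', {'a': 'VARCHAR'})}
```

The hint changes only the type at {{m.a}}. {{m}} stays a map, or an array of maps, which a projected CAST cannot preserve.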

Conclusion: the cast mechanism is good and should be added, but a hint or 
metadata mechanism is still required in the general case.

> Enable pushing of cast expressions to the scanner for better schema discovery.
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-6312
>                 URL: https://issues.apache.org/jira/browse/DRILL-6312
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators, Query Planning & Optimization
>    Affects Versions: 1.13.0
>            Reporter: Hanumath Rao Maduri
>            Priority: Major
>
> Drill is a schemaless engine that tries to infer the schema from disparate 
> sources at read time. Currently the scanners infer the schema for each 
> batch from the data for that column in the corresponding batch. This 
> handles many use cases but can error out when the data differs too much 
> between batches, for example int in one batch and array[int] in another 
> (there are other cases as well).
> There is also a mechanism to create a view that casts the columns to the 
> appropriate types. This solves the problem in some cases but fails in many 
> others, because the cast expression is not pushed down to the scanner; it 
> stays in the Project, Filter, or other operators further up the query plan.
> This JIRA is to fix this by propagating the type information embedded in the 
> cast function to the scanners so that they can cast the incoming data 
> appropriately.
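The per-batch failure described above, and how a cast pushed into the scanner avoids it, can be sketched as a toy Python model. This is not Drill's reader code. {{infer_batch_type}}, {{scan}} and {{downstream}} are hypothetical names, and {{downstream}} stands in for an operator that needs one schema across all batches. For brevity, the sketch swaps the issue's int vs. array[int] example for int vs. string values, which keeps the conversion trivial.

```python
from typing import Any, Callable, List, Optional, Tuple

# Toy model (hypothetical names, not Drill's API): a column arrives in batches,
# and each batch's type is inferred only from the values in that batch.
Batch = List[Any]
Cast = Tuple[str, Callable[[Any], Any]]  # (target type name, converter)

def infer_batch_type(batch: Batch) -> str:
    """Infer a type from one batch alone; nulls carry no type information."""
    kinds = {type(v).__name__ for v in batch if v is not None}
    if len(kinds) > 1:
        raise ValueError(f"mixed types within one batch: {sorted(kinds)}")
    return kinds.pop() if kinds else "NULL"

def scan(batches: List[Batch], pushed_cast: Optional[Cast] = None) -> List[Tuple[str, Batch]]:
    """Return (type, values) per batch. With a pushed-down cast, the scanner
    converts every non-null value and reports the cast's type instead of guessing."""
    out = []
    for batch in batches:
        if pushed_cast is None:
            out.append((infer_batch_type(batch), batch))
        else:
            type_name, convert = pushed_cast
            out.append((type_name, [None if v is None else convert(v) for v in batch]))
    return out

def downstream(scanned: List[Tuple[str, Batch]]) -> Batch:
    """Stand-in for an operator that requires one schema across all batches."""
    types = {t for t, _ in scanned}
    if len(types) != 1:
        raise RuntimeError(f"schema change between batches: {sorted(types)}")
    return [v for _, batch in scanned for v in batch]

batches = [[1, 2], ["3", None]]
# Without pushdown, the two batches disagree (int vs. str) before any CAST
# in a Project above the scan gets a chance to run.
try:
    downstream(scan(batches))
except RuntimeError as e:
    print(e)  # schema change between batches: ['int', 'str']
# With the cast pushed into the scan, every batch arrives as VARCHAR.
print(downstream(scan(batches, pushed_cast=("VARCHAR", str))))  # ['1', '2', '3', None]
```

Because the conversion happens where the data is read, the downstream operator sees a single type no matter how the raw values vary from batch to batch.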



