[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

ASF GitHub Bot (JIRA) Mon, 11 Jun 2018 21:12:40 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509147#comment-16509147
 ]


ASF GitHub Bot commented on DRILL-6373:
---------------------------------------

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396460011
 
 
   @vrozov, this is what I meant by the code only working well for a flat 
vector. Maps have a `MaterializedField` for their own metadata. That metadata 
contains a list of the child metadata. So:
   
   ```
   m:Map, {a:Int, b:Varchar}
   . a:INT
   . b:Varchar
   ```
   
   This means that as we add `a`, then `b`, to map `m`, we must mutate the 
`MaterializedField` for `m` to add the child fields. This completely breaks the 
idea that a `MaterializedField` is always immutable. This is a place where a 
design goal (immutable `MaterializedField`) collides with implementation 
(readers add to maps as they find new fields).
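   
   A minimal sketch of that mutation, assuming a hypothetical `Field` class 
that stands in for `MaterializedField` (the class, its methods, and the 
`describe()` output format are illustrative assumptions, not Drill's API):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.stream.Collectors;

   public class MapFieldSketch {
       /** Hypothetical metadata node; a stand-in, not Drill's MaterializedField. */
       static final class Field {
           final String name;
           final String type;
           final List<Field> children = new ArrayList<>();

           Field(String name, String type) {
               this.name = name;
               this.type = type;
           }

           /** Changes this field's metadata in place, so the field is not immutable. */
           void addChild(Field child) {
               children.add(child);
           }

           /** Renders the field in the same style as the schema listings in this comment. */
           String describe() {
               if (!type.equals("Map")) {
                   return name + ":" + type;
               }
               String inner = children.stream()
                       .map(Field::describe)
                       .collect(Collectors.joining(", "));
               return name + ":Map {" + inner + "}";
           }
       }

       /** Returns the description of map m before and after a reader discovers a and b. */
       static String[] demo() {
           Field m = new Field("m", "Map");
           String before = m.describe();
           m.addChild(new Field("a", "Int"));
           m.addChild(new Field("b", "Varchar"));
           return new String[] { before, m.describe() };
       }

       public static void main(String[] args) {
           for (String line : demo()) {
               System.out.println(line);
           }
       }
   }
   ```
   
   The same `m` object reports different metadata before and after the reader 
adds the columns, which is exactly what an immutable field would forbid.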
   
   So, we could make a clone on each modification and the problem is solved, 
right? As in many places in Drill, the solution is not so simple. The above is 
the easy case. What about a nested map:
   
   ```
   m1:Map, {m2:Map {a:Int, b:Varchar}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Now when we add `a`, we have to update both the `m1` and `m2` 
`MaterializedField`s. If we clone the `MaterializedField` for `m2`, then `m1` 
still holds the old version, which is now out of sync. The result will be:
   
   ```
   m1:Map, {m2:Map {}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Code that depends on accurate schema information then breaks.
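   
   A minimal sketch of how clone-on-modify produces that stale parent, 
assuming a hypothetical immutable `Field` class with a `withChild()` copy 
method (both are illustrative assumptions, not Drill's `MaterializedField` API):
   
   ```java
   import java.util.ArrayList;
   import java.util.Collections;
   import java.util.List;
   import java.util.stream.Collectors;

   public class CloneOnWriteSketch {
       /** Hypothetical immutable metadata node; a stand-in, not Drill's MaterializedField. */
       static final class Field {
           final String name;
           final String type;
           final List<Field> children;

           Field(String name, String type, List<Field> children) {
               this.name = name;
               this.type = type;
               this.children = Collections.unmodifiableList(new ArrayList<>(children));
           }

           /** Returns a copy with one more child; this instance is left untouched. */
           Field withChild(Field child) {
               List<Field> copy = new ArrayList<>(children);
               copy.add(child);
               return new Field(name, type, copy);
           }

           String describe() {
               if (!type.equals("Map")) {
                   return name + ":" + type;
               }
               String inner = children.stream()
                       .map(Field::describe)
                       .collect(Collectors.joining(", "));
               return name + ":Map {" + inner + "}";
           }
       }

       /** Returns the parent's view of the schema and the cloned child's view. */
       static String[] demo() {
           Field m2 = new Field("m2", "Map", Collections.emptyList());
           Field m1 = new Field("m1", "Map", Collections.singletonList(m2));

           // The reader discovers a and b; each addition clones m2.
           Field m2Updated = m2
                   .withChild(new Field("a", "Int", Collections.emptyList()))
                   .withChild(new Field("b", "Varchar", Collections.emptyList()));

           // m1 still references the original, empty m2.
           return new String[] { m1.describe(), m2Updated.describe() };
       }

       public static void main(String[] args) {
           for (String line : demo()) {
               System.out.println(line);
           }
       }
   }
   ```
   
   Running `main` prints `m1:Map {m2:Map {}}` followed by 
`m2:Map {a:Int, b:Varchar}`. Keeping `m1` accurate under this scheme would 
require re-cloning every ancestor on each addition.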
   
   So, we're left with a choice: clone and have a corrupt schema, or make 
`MaterializedField` mutable.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -----------------------------------------------------------------
>
>                 Key: DRILL-6373
>                 URL: https://issues.apache.org/jira/browse/DRILL-6373
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
