[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509163#comment-16509163
 ] 

ASF GitHub Bot commented on DRILL-6373:
---------------------------------------

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396462584
 
 
   @vrozov, your proposed solution works for the single-level map case. It does 
*not* work for the two-level map case as the `MaterializedField` used inside 
the inner map is not the one passed from the outer map. Hacky code would be 
needed to pass in the `MaterializedField`, then fetch it to add it to the outer 
map's `MaterializedField`.
   
   And, even the above hack does not work for three-level maps.
   
   So, what else can we do? We can note one additional problem with maps. If we 
simply take a map field `m` from batch B1 and use it to create a map in batch 
B2, we have a dilemma. The `m` `MaterializedField` already contains "child" 
entries for `a` and `b`, say.
   
   A similar issue exists for any vector with structure: VarChar (offsets and 
data), Nullable (bits and data), etc.
   
   So, what to do? The new code, of which this PR is a part, faced similar 
difficulties with the additional complexity of the `ResultSetLoader` which must 
clone a vector to produce the "overflow" vector. (This is the reason for fixing 
the Nullable vectors so that the data vector has its type changed to Required, 
so that the vector clone works correctly.)
   
   (If all this seems overly complex, I agree that it is. I spent many hours 
working out all these twists and turns. The original design is broken and was 
never cleanly revised once maps were added.)
   
   So, we get our solution. Clone the `MaterializedField` in the 
`PartitionSender` *before* calling `BasicTypeHelper.getNewVector`.
   
   We might thing that `getNewVector()` can do the clone. But, it is used by 
`ResultSetLoader` assuming that the new vector will use the `MaterializedField` 
provided. And, in the vast number of cases, we don't need a clone.
   
   It is only when using one vector to create another that we need the clone, 
so it must be done in the caller. In this case, that is `PartitionSender`.
   
   Does this make sense? Maybe the easiest thing is to just make the change and 
do a test run.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -----------------------------------------------------------------
>
>                 Key: DRILL-6373
>                 URL: https://issues.apache.org/jira/browse/DRILL-6373
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to