[ 
https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543904#comment-16543904
 ] 

ASF GitHub Bot commented on DRILL-6585:
---------------------------------------

paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones 
vectors, but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404987283
 
 
   @sohami, thanks for your comments and questions. Unfortunately, I cannot 
debug the use case and so you may have a deeper understanding than I do. I'm 
working from experience gained some six months ago when working with the result 
set loader, and that knowledge is getting rusty.
   
   > In original PR there is a change for NullableValueVectors to add the 
values and bits vector materialized field as child field of parent vector 
field. ... From your comment it looks like because the internal values 
ValueVector mode needs to be required so you are creating another Materialized 
Field with that mode for internal values vector and adding it as child of 
parent vector field.
   
   The reason for that change is that the result set loader code that clones a 
vector needs to know the actual type. That code walks the vector tree, using 
the `MaterializedField` to get the type. If a `values` vector (which has no 
`bits` vector) reports its type as `Nullable`, then the clone will create a 
`bits` vector, which causes havoc.
   
   I'm thinking that I should change the cloning code. Rather than believing 
the `MaterializedField`, I can use the vector class type itself. That will be 
more clunky and slow, but it will eliminate the need to change the existing 
vector code.
   
   Given how long this discussion has gone on, that I can't do the required 
tests, and that we can't we discuss this in person, I'm thinking that the 
alternative approach may be more expedient.
   
   I suppose a larger question is whether the final bits of the result set 
loader are even still useful. Much work has been done on batch sizing since 
this work started. Is it still worth while finishing up this code so we can 
control the batch size for readers? Parquet has its own solution. Is it worth 
worrying about the others?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> PartitionSender clones vectors, but shares field metdata
> --------------------------------------------------------
>
>                 Key: DRILL-6585
>                 URL: https://issues.apache.org/jira/browse/DRILL-6585
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> See the discussion forĀ [PR #1244 for 
> DRILL-6373|https://github.com/apache/drill/pull/1244].
> The PartitionSender clones vectors. But, it does so by reusing the 
> {{MaterializedField}} from the original vector. Though the original authors 
> of {{MaterializedField}} apparently meant it to be immutable, later changes 
> for maps and unions ended up changing it to add members.
> When cloning a map, we get the original map materialized field, then start 
> doctoring it up as we add the cloned map members. This screws up the original 
> map vector's metadata.
> The solution is to clone an empty version of the materialized field when 
> creating a new vector.
> But, since much code creates vectors by giving a perfectly valid, unique 
> materialized field, we want to add a new method for use by the ill-behaved 
> uses, such as PartitionSender, that ask to create a new vector without 
> cloning the materialized field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to