[GitHub] sohami commented on issue #1367: DRILL-6585: PartitionSender clones vectors, but shares field metdata

GitBox Thu, 12 Jul 2018 20:15:02 -0700

sohami commented on issue #1367: DRILL-6585: PartitionSender clones vectors, 
but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404713253
 
 
   @paul-rogers - I have couple of questions related to this change and your 
last comment:
   1) I don't understand what you mean by `clone` method will clear out the 
child field. Isn't `MaterializedField::clone` is taking care of [cloning the 
children 
fields](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java#L145)
 as well ? Instead `cloneEmpty` is throwing away the children fields.
   
   2) The main issue with the original PR and the logic of building outgoing 
batch from incoming batch of Partition Sender is something like below:
   
   - In original PR there is a change for NullableValueVectors to add the 
`values` and `bits` vector materialized field as child field of parent vector 
field. See 
[here](https://github.com/apache/drill/pull/1244/files#diff-646d5ac291e14369d72634752edaae84R111).
 From your comment it looks like because the internal `values` ValueVector mode 
needs to be required so you are creating another Materialized Field with that 
mode for internal `values` vector and adding it as child of parent vector 
field. But AFAIK based on the usage seen, the children of a Materialized Field 
is only used in case of container Type vector like Map not in case of composite 
vector like NullableValueVectors. For those the internal `values` and `bits` 
vector holds on to the parent Materialized Field. Now with change even for 
Nullable ValueVectors the children component of parent field will be populated. 
This will probably affect the serde component and compatibility too ?
   
   - Now coming to the main issue. Earlier for each incoming of the partition 
sender, there are multiple threads which are trying to create outgoing batches 
but using the same Materialized from incoming batch vectors. Since outgoing 
batch will have same schema as incoming batch.
   
   - When a ValueVector in incoming batch is of Map type then while creating a 
new ValueVector of Map type in outgoing batch it passes the Materialized Field 
from input Map Vector. In the constructor of 
[AbstractMapVector](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java#L49),
 it clones the input field and then iterate over the children of **not** the 
cloned field but the original input field (which is from same incoming vector). 
This looks to be a bug where it should [iterate on cloned field children rather 
than input field 
children](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java#L51).
   
   - With above if there is a child of the Map vector which is of Nullable 
type, then that child vector will get reference to the children field from 
actual input field to parent MapVector (not the reference to cloned children 
field).
   
   - With [new 
change](https://github.com/apache/drill/pull/1244/files#diff-646d5ac291e14369d72634752edaae84R111)
 in NullableValueVector it will try to modify it's input Materialized Field to 
add children for `bit` and `values` vector field.
   
   - So combination of above bug in Map vector and the new change will cause 
the concurrent modification exception as NullableValueVector child of MapVector 
will modify the Materialized field of incoming ValueVector not the new clone 
Materialized Field created for the outgoing vector.
   
   With your current fix I think you should still use `clone` instead of 
`cloneEmpty` since we need to preserve the children information too in outgoing 
batches of PartitionSender.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] sohami commented on issue #1367: DRILL-6585: PartitionSender clones vectors, but shares field metdata

Reply via email to