[
https://issues.apache.org/jira/browse/DRILL-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542452#comment-16542452
]
ASF GitHub Bot commented on DRILL-6585:
---------------------------------------
sohami commented on issue #1367: DRILL-6585: PartitionSender clones vectors,
but shares field metdata
URL: https://github.com/apache/drill/pull/1367#issuecomment-404713253
@paul-rogers - I have couple of questions related to this change and your
last comment:
1) I don't understand what you mean by `clone` method will clear out the
child field. Isn't `MaterializedField::clone` is taking care of [cloning the
children
fields](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java#L145)
as well ? Instead `cloneEmpty` is throwing away the children fields.
2) The main issue with the original PR and the logic of building outgoing
batch from incoming batch of Partition Sender is something like below:
- In original PR there is a change for NullableValueVectors to add the
`values` and `bits` vector materialized field as child field of parent vector
field. See
[here](https://github.com/apache/drill/pull/1244/files#diff-646d5ac291e14369d72634752edaae84R111).
From your comment it looks like because the internal `values` ValueVector mode
needs to be required so you are creating another Materialized Field with that
mode for internal `values` vector and adding it as child of parent vector
field. But AFAIK based on the usage seen, the children of a Materialized Field
is only used in case of container Type vector like Map not in case of composite
vector like NullableValueVectors. For those the internal `values` and `bits`
vector holds on to the parent Materialized Field. Now with change even for
Nullable ValueVectors the children component of parent field will be populated.
This will probably affect the serde component and compatibility too ?
- Now coming to the main issue. Earlier for each incoming of the partition
sender, there are multiple threads which are trying to create outgoing batches
but using the same Materialized from incoming batch vectors. Since outgoing
batch will have same schema as incoming batch.
- When a ValueVector in incoming batch is of Map type then while creating a
new ValueVector of Map type in outgoing batch it passes the Materialized Field
from input Map Vector. In the constructor of
[AbstractMapVector](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java#L49),
it clones the input field and then iterate over the children of **not** the
cloned field but the original input field (which is from same incoming vector).
This looks to be a bug where it should [iterate on cloned field children rather
than input field
children](https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java#L51).
- With above if there is a child of the Map vector which is of Nullable
type, then that child vector will get reference to the children field from
actual input field to parent MapVector (not the reference to cloned children
field).
- With [new
change](https://github.com/apache/drill/pull/1244/files#diff-646d5ac291e14369d72634752edaae84R111)
in NullableValueVector it will try to modify it's input Materialized Field to
add children for `bit` and `values` vector field.
- So combination of above bug in Map vector and the new change will cause
the concurrent modification exception as NullableValueVector child of MapVector
will modify the Materialized field of incoming ValueVector not the new clone
Materialized Field created for the outgoing vector.
With your current fix I think you should still use `clone` instead of
`cloneEmpty` since we need to preserve the children information too in outgoing
batches of PartitionSender.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> PartitionSender clones vectors, but shares field metdata
> --------------------------------------------------------
>
> Key: DRILL-6585
> URL: https://issues.apache.org/jira/browse/DRILL-6585
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.13.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
>
> See the discussion forĀ [PR #1244 for
> DRILL-6373|https://github.com/apache/drill/pull/1244].
> The PartitionSender clones vectors. But, it does so by reusing the
> {{MaterializedField}} from the original vector. Though the original authors
> of {{MaterializedField}} apparently meant it to be immutable, later changes
> for maps and unions ended up changing it to add members.
> When cloning a map, we get the original map materialized field, then start
> doctoring it up as we add the cloned map members. This screws up the original
> map vector's metadata.
> The solution is to clone an empty version of the materialized field when
> creating a new vector.
> But, since much code creates vectors by giving a perfectly valid, unique
> materialized field, we want to add a new method for use by the ill-behaved
> uses, such as PartitionSender, that ask to create a new vector without
> cloning the materialized field.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)