paul-rogers commented on issue #1367: DRILL-6585: PartitionSender clones vectors, but shares field metdata URL: https://github.com/apache/drill/pull/1367#issuecomment-404387515 @vrozov, as usual, you zeroed in on two issues. First, it turns out that we have both a `clone()` and `cloneEmpty()`. I should have used `cloneEmpty()`. This is a problem with not being able to run the pre-commit tests, I can't catch such bone-headed errors... Second, the `clone()` method does not even make sense. It will clear out child fields for the `MaterializedField`, but *not* the subtypes for the major type. This is very likely a bug. Why have we never caught it? Turns out that `clone()` is not called. The same bug exists for all of the `withPathAndType()` variations: we preserve subtypes. But, the semantics of the (mostly broken) union vectors appear to want a new vector to be given a materialized field without subtypes; subtypes are added later as the types are needed. Is that behavior itself a bug or a feature? We can't be sure because union vectors are not supported my much of Drill. This makes this a hard case. Since the behavior is murky, I'd suggest leaving well enough alone, and just fixing the one problem we know about: passing a fully-populated materialized field to a new vector. Might be worthwhile filing a JIRA to track this issue. (Though, we never seem to fix such tickets...) Running tests with the other PR (with the above fix) should tell us if the fix works, since that PR found the issue. To your fundamental question, please see the discussion in the original PR and in the description for details. The story is, in essence: * There is a fully-built vector coming into the partitioner. * When that incoming vector was built, the code for a map added children to the `MaterializedField`, and for unions, added a subtype to the `MajorType`. * For whatever reason, the partition wants to make a new copy of that vector using the same structure. * If we reuse the same vector, then the new code will modify it as we build up the maps or unions, causing concurrent modifications to the source vector's field while building the destination. * If we do a simple copy, we'll carry over child fields and subtypes, which will likely cause problems as we build up the new vector's structure. The original code would only work for non-nullable, fixed-width types (with no internal structure) I have no idea why it started to fail with my result set loader changes. Probably because, to get things to work, I had to fix bugs in the vector code in which we were not properly maintaining the material field structure, which caused problems with the new code tried to copy a vector structure to create the "overflow vector."
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
