[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251826#comment-16251826 ]
ASF GitHub Bot commented on DRILL-5657:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r150755238

--- Diff: exec/vector/src/main/java/org/apache/drill/exec/record/MaterializedField.java ---
@@ -168,6 +174,58 @@ public boolean equals(Object obj) {
             Objects.equals(this.type, other.type);
   }
 
+  public boolean isEquivalent(MaterializedField other) {
+    if (! name.equalsIgnoreCase(other.name)) {
+      return false;
+    }
+
+    // Requires full type equality, including fields such as precision and scale.
+    // But, unset fields are equivalent to 0. Can't use the protobuf-provided
+    // isEquals(), that treats set and unset fields as different.
+
+    if (type.getMinorType() != other.type.getMinorType()) {
+      return false;
+    }
+    if (type.getMode() != other.type.getMode()) {
+      return false;
+    }
+    if (type.getScale() != other.type.getScale()) {
+      return false;
+    }
+    if (type.getPrecision() != other.type.getPrecision()) {
+      return false;
+    }
+
+    // Compare children -- but only for maps, not the internal children
+    // for Varchar, repeated or nullable types.
+
+    if (type.getMinorType() != MinorType.MAP) {
+      return true;
+    }
+
+    if (children == null || other.children == null) {
+      return children == other.children;
+    }
+    if (children.size() != other.children.size()) {
+      return false;
+    }
+
+    // Maps are name-based, not position. But, for our
+    // purposes, we insist on identical ordering.
+
+    Iterator<MaterializedField> thisIter = children.iterator();
+    Iterator<MaterializedField> otherIter = other.children.iterator();
+    while (thisIter.hasNext()) {
--- End diff --

The row set & writer abstractions require identical ordering so that column indexes are well-defined.

Here we are facing the age-old philosophical question of "sameness." Sameness is instrumental: sameness-for-a-purpose. Here, we want to know if two schemas are equivalent for the purposes of referencing columns by index.
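
To make the two notions of "sameness" concrete, here is a minimal sketch contrasting an order-sensitive check (needed when columns are referenced by index) with an order-insensitive one (same columns, any order). It is not Drill code: `Col`, `SchemaCompare`, `isEquivalentOrdered()` and `isEquivalentUnordered()` are hypothetical names invented for illustration, the type is reduced to a plain string, and the unordered check assumes column names are unique within a schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a column definition; not Drill's MaterializedField.
final class Col {
  final String name;
  final String type;

  Col(String name, String type) {
    this.name = name;
    this.type = type;
  }

  // Names compare case-insensitively, types exactly.
  boolean sameAs(Col other) {
    return name.equalsIgnoreCase(other.name) && type.equals(other.type);
  }
}

final class SchemaCompare {

  // Strict: the same columns at the same positions, so that
  // index i refers to the same column in both schemas.
  static boolean isEquivalentOrdered(List<Col> a, List<Col> b) {
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      if (!a.get(i).sameAs(b.get(i))) {
        return false;
      }
    }
    return true;
  }

  // Loose: the same columns in any order.
  // Assumes names are unique (case-insensitively) within each schema.
  static boolean isEquivalentUnordered(List<Col> a, List<Col> b) {
    if (a.size() != b.size()) {
      return false;
    }
    Map<String, Col> byName = new HashMap<>();
    for (Col c : a) {
      byName.put(c.name.toLowerCase(Locale.ROOT), c);
    }
    for (Col c : b) {
      Col match = byName.get(c.name.toLowerCase(Locale.ROOT));
      if (match == null || !match.sameAs(c)) {
        return false;
      }
    }
    return true;
  }
}
```

Under these assumptions, schemas (a, b) and (b, a) pass the unordered check but fail the ordered one, which is the distinction that matters when a writer addresses columns by position.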
We recently did a fix elsewhere where we do use the looser definition: that A and B contain the same columns, but in possibly different orderings. Added a comment to explain this.


> Implement size-aware result set loader
> --------------------------------------
>
>                 Key: DRILL-5657
>                 URL: https://issues.apache.org/jira/browse/DRILL-5657
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: Future
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set"
> abstraction to allow us to create, and verify, record batches with very few
> lines of code. Part of this work involved creating a set of "column
> accessors" in the vector subsystem. Column readers provide a uniform API to
> obtain data from columns (vectors), while column writers provide a uniform
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size
> (to avoid memory fragmentation due to Drill's two memory allocators.) The
> column accessors have proven to be so useful that they will be the basis for
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware
> vector writing, including the case in which a vector fills in the middle of a
> row.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)