[ https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037580#comment-16037580 ]
ASF GitHub Bot commented on DRILL-5514:
---------------------------------------

Github user bitblender commented on a diff in the pull request:

    https://github.com/apache/drill/pull/837#discussion_r118797793

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
    @@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
         return true;
       }

    +  /**
    +   * Merge two schema to produce a new, merged schema. The caller is responsible
    +   * for ensuring that column names are unique. The order of the fields in the
    +   * new schema is the same as that of this schema, with the other schema's fields
    +   * appended in the order defined in the other schema. The resulting selection
    +   * vector mode is the same as this schema. (That is, this schema is assumed to
    +   * be the main part of the batch, possibly with a selection vector, with the
    +   * other schema representing additional, new columns.)
    +   * @param otherSchema the schema to merge with this one
    +   * @return the new, merged, schema
    +   */
    +
    +  public BatchSchema merge(BatchSchema otherSchema) {
    +    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
    +        selectionVectorMode != otherSchema.selectionVectorMode) {
    +      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
    --- End diff --

    "Left schema must carry the same selection vector mode" + "as the right schema"?


> Enhance VectorContainer to merge two row sets
> ---------------------------------------------
>
>                 Key: DRILL-5514
>                 URL: https://issues.apache.org/jira/browse/DRILL-5514
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> ----------------
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 |  | b1 |  | c1 |
> | a2 |  | b2 |  | c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two
> different vector bundles:
> {code}
> -- bundle 1 --    -- bundle 2 --
> | a1 |  | b1 |        | c1 |
> | a2 |  | b2 |        | c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an
> implicit vector (the file name, say.) The merged set of vectors comprises the
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set"
> tools also.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
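
The merge rules described in the issue above (left columns first, right columns appended, equal row counts, distinct column names, the left side's selection vector mode carried over) can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `MergeSketch`, `Schema`, `Bundle`, and `SvMode` are simplified stand-ins invented for illustration, not Drill's `BatchSchema`, `VectorContainer`, or `SelectionVectorMode`. Note one deliberate difference: the Javadoc in the diff leaves column-name uniqueness to the caller, whereas this sketch checks it so the precondition is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; not Drill classes.
class MergeSketch {

  enum SvMode { NONE, TWO_BYTE, FOUR_BYTE }

  /** Ordered column names plus a selection vector mode. */
  static final class Schema {
    final SvMode svMode;
    final List<String> columns;

    Schema(SvMode svMode, String... columns) {
      this.svMode = svMode;
      this.columns = Arrays.asList(columns);
    }
  }

  /** A "bundle" of column vectors; each array holds one column's values. */
  static final class Bundle {
    final Schema schema;
    final List<String[]> vectors;
    final int rowCount;

    Bundle(Schema schema, List<String[]> vectors) {
      this.schema = schema;
      this.vectors = vectors;
      this.rowCount = vectors.isEmpty() ? 0 : vectors.get(0).length;
    }
  }

  /** Left columns first, right columns appended; the result keeps the left mode. */
  static Schema mergeSchemas(Schema left, Schema right) {
    // Same condition as the check in the diff under review.
    if (right.svMode != SvMode.NONE && left.svMode != right.svMode) {
      throw new IllegalArgumentException(
          "Left schema must carry the same selection vector mode as the right schema");
    }
    Set<String> seen = new HashSet<>(left.columns);
    List<String> merged = new ArrayList<>(left.columns);
    for (String col : right.columns) {
      // Assumption: this sketch enforces uniqueness instead of trusting the caller.
      if (!seen.add(col)) {
        throw new IllegalArgumentException("Duplicate column: " + col);
      }
      merged.add(col);
    }
    return new Schema(left.svMode, merged.toArray(new String[0]));
  }

  /** Merges two bundles; only meaningful when the row counts match. */
  static Bundle mergeBundles(Bundle left, Bundle right) {
    if (left.rowCount != right.rowCount) {
      throw new IllegalArgumentException("Row counts differ: "
          + left.rowCount + " vs " + right.rowCount);
    }
    List<String[]> vectors = new ArrayList<>(left.vectors);
    vectors.addAll(right.vectors);
    return new Bundle(mergeSchemas(left.schema, right.schema), vectors);
  }

  public static void main(String[] args) {
    // Bundle 1: columns (a, b), as a reader might produce them.
    Bundle b1 = new Bundle(new Schema(SvMode.NONE, "a", "b"),
        Arrays.asList(new String[] {"a1", "a2"}, new String[] {"b1", "b2"}));
    // Bundle 2: implicit column (c), such as a file name.
    Bundle b2 = new Bundle(new Schema(SvMode.NONE, "c"),
        Arrays.<String[]>asList(new String[] {"c1", "c2"}));

    Bundle merged = mergeBundles(b1, b2);
    System.out.println(merged.schema.columns); // prints [a, b, c]
  }
}
```

The sketch shares the input column arrays rather than copying them, which is one plausible reading of "holds the merger of the vectors from the first two"; whether the real implementation shares or transfers vectors is not stated in this message.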