[ https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050956#comment-16050956 ]
ASF GitHub Bot commented on DRILL-5514:
---------------------------------------

Github user bitblender commented on a diff in the pull request:

    https://github.com/apache/drill/pull/837#discussion_r122287615

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
    @@ -162,20 +162,22 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
        * Merge two schema to produce a new, merged schema. The caller is responsible
        * for ensuring that column names are unique. The order of the fields in the
        * new schema is the same as that of this schema, with the other schema's fields
    -   * appended in the order defined in the other schema. The resulting selection
    -   * vector mode is the same as this schema. (That is, this schema is assumed to
    -   * be the main part of the batch, possibly with a selection vector, with the
    -   * other schema representing additional, new columns.)
    +   * appended in the order defined in the other schema.
    +   * <p>
    +   * Merging data with selection vectors is unlikely to be useful, or work well.
    --- End diff --

    Can you please leave a comment about why this is unlikely to be useful, or work well?


> Enhance VectorContainer to merge two row sets
> ---------------------------------------------
>
>                 Key: DRILL-5514
>                 URL: https://issues.apache.org/jira/browse/DRILL-5514
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> ----------------
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 |  | b1 |  | c1 |
> | a2 |  | b2 |  | c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two
> different vector bundles:
> {code}
> -- bundle 1 --  -- bundle 2 --
> | a1 |  | b1 |      | c1 |
> | a2 |  | b2 |      | c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an
> implicit vector (the file name, say.) The merged set of vectors comprises the
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set"
> tools also.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
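
The merge the ticket describes can be illustrated with a minimal, self-contained Java sketch. It does not use Drill's `VectorContainer` or `BatchSchema`. The `Bundle` class and its `add`, `rowCount`, `columnNames`, `column` and `merge` methods are hypothetical names invented here, and plain Java lists stand in for value vectors. The sketch only shows the two preconditions from the ticket (equal row counts, distinct column names) and the resulting column order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a vector bundle; this is not Drill's VectorContainer. */
final class Bundle {
  // Column name -> column values. LinkedHashMap preserves insertion order,
  // which plays the role of field order in a schema.
  private final Map<String, List<Object>> columns = new LinkedHashMap<>();

  /** Adds a column built from the given values and returns this bundle. */
  Bundle add(String name, Object... values) {
    return addColumn(name, new ArrayList<>(Arrays.asList(values)));
  }

  private Bundle addColumn(String name, List<Object> values) {
    if (columns.containsKey(name)) {
      throw new IllegalArgumentException("Duplicate column: " + name);
    }
    if (!columns.isEmpty() && values.size() != rowCount()) {
      throw new IllegalArgumentException("Row count mismatch in column: " + name);
    }
    columns.put(name, values);
    return this;
  }

  /** Number of rows; an empty bundle is treated as having zero rows. */
  int rowCount() {
    return columns.isEmpty() ? 0 : columns.values().iterator().next().size();
  }

  List<String> columnNames() {
    return new ArrayList<>(columns.keySet());
  }

  List<Object> column(String name) {
    return columns.get(name);
  }

  /**
   * Returns a new bundle holding this bundle's columns followed by the
   * other bundle's columns. The column lists are shared, not copied
   * (an assumption made for this sketch).
   */
  Bundle merge(Bundle other) {
    if (rowCount() != other.rowCount()) {
      throw new IllegalArgumentException(
          "Row counts differ: " + rowCount() + " vs " + other.rowCount());
    }
    Bundle merged = new Bundle();
    for (Map.Entry<String, List<Object>> e : columns.entrySet()) {
      merged.addColumn(e.getKey(), e.getValue());
    }
    for (Map.Entry<String, List<Object>> e : other.columns.entrySet()) {
      // addColumn rejects a name already taken from this bundle.
      merged.addColumn(e.getKey(), e.getValue());
    }
    return merged;
  }
}
```

In this sketch, merging a bundle with columns (a, b) and a bundle with column (c) gives a bundle whose column order is (a, b, c). A row-count mismatch or a repeated column name is rejected with an `IllegalArgumentException`.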