[jira] [Commented] (DRILL-5514) Enhance VectorContainer to merge two row sets

ASF GitHub Bot (JIRA) Mon, 05 Jun 2017 13:58:28 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037580#comment-16037580
 ]


ASF GitHub Bot commented on DRILL-5514:
---------------------------------------

Github user bitblender commented on a diff in the pull request:

    https://github.com/apache/drill/pull/837#discussion_r118797793
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
    @@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
         return true;
       }
     
    +  /**
    +   * Merge two schema to produce a new, merged schema. The caller is responsible
    +   * for ensuring that column names are unique. The order of the fields in the
    +   * new schema is the same as that of this schema, with the other schema's fields
    +   * appended in the order defined in the other schema. The resulting selection
    +   * vector mode is the same as this schema. (That is, this schema is assumed to
    +   * be the main part of the batch, possibly with a selection vector, with the
    +   * other schema representing additional, new columns.)
    +   * @param otherSchema the schema to merge with this one
    +   * @return the new, merged, schema
    +   */
    +
    +  public BatchSchema merge(BatchSchema otherSchema) {
    +    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
    +        selectionVectorMode != otherSchema.selectionVectorMode) {
    +      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
    --- End diff --
    
    "Left schema must carry the same selection vector mode" + " as the right schema"?
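
The check under discussion can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: `SchemaSketch` and `SvMode` are hypothetical stand-ins for `BatchSchema` and `SelectionVectorMode`, fields are reduced to plain names, and the exception text is the wording suggested in the comment above. The diff is cut off after the `throw`, so the rest of the method here only follows what the Javadoc describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for BatchSchema; not Drill's actual class.
final class SchemaSketch {
  // Assumed to mirror the selection vector modes referred to in the diff.
  enum SvMode { NONE, TWO_BYTE, FOUR_BYTE }

  final SvMode svMode;
  final List<String> fields;

  SchemaSketch(SvMode svMode, List<String> fields) {
    this.svMode = svMode;
    this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
  }

  // This ("left") schema keeps its field order and selection vector mode;
  // the other ("right") schema's fields are appended in their own order.
  // As in the Javadoc, the caller must ensure that column names are unique.
  SchemaSketch merge(SchemaSketch other) {
    if (other.svMode != SvMode.NONE && svMode != other.svMode) {
      throw new IllegalArgumentException(
          "Left schema must carry the same selection vector mode as the right schema");
    }
    List<String> merged = new ArrayList<>(fields);
    merged.addAll(other.fields);
    return new SchemaSketch(svMode, merged);
  }
}
```

In this sketch a right-hand schema with mode NONE merges into any left-hand schema, while a right-hand schema that carries a selection vector is accepted only when the left-hand schema has the same mode.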


> Enhance VectorContainer to merge two row sets
> ---------------------------------------------
>
>                 Key: DRILL-5514
>                 URL: https://issues.apache.org/jira/browse/DRILL-5514
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.11.0
>
>
> Consider the concept of a "record batch" in Drill. On the one hand, one can 
> envision a record batch as a stack of records:
> {code}
> | a1 | b1 | c1 |
> ----------------
> | a2 | b2 | c2 |
> {code}
> But, Drill is columnar. So a record batch is really a "bundle" of vectors:
> {code}
> | a1 |    | b1 |    | c1 |
> | a2 |    | b2 |    | c2 |
> {code}
> There are times when it is handy to build up a record batch as a merge of two 
> different vector bundles:
> {code}
> -- bundle 1 --    -- bundle 2 --
> | a1 |    | b1 |        | c1 |
> | a2 |    | b2 |        | c2 |
> {code}
> For example, consider a reader. The reader implementation might read columns 
> (a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an 
> implicit vector (the file name, say.) The merged set of vectors comprises the 
> final schema: (a, b, c).
> This ticket asks for the code to do the merge:
> * Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
> * Merge two vector containers C1 and C2 to create a new container, C3, that 
> holds the merger of the vectors from the first two.
> Clearly, the merge only makes sense if:
> * The two input containers have the same row count, and
> * The columns in each input container are distinct.
> Because this feature is also useful for tests, add the merge to the "row set" 
> tools as well.
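
The container merge described in the ticket, together with its two preconditions, might look roughly like the following sketch. It does not use Drill's `VectorContainer` API: `ContainerSketch` is a hypothetical stand-in in which each "vector" is simply an `Object[]` keyed by column name, and the merged container is assumed to share the input arrays and not copy them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a vector container: column name -> column values.
final class ContainerSketch {
  // LinkedHashMap preserves the order in which columns were added.
  final Map<String, Object[]> columns = new LinkedHashMap<>();
  final int rowCount;

  ContainerSketch(int rowCount) {
    this.rowCount = rowCount;
  }

  // Adds one column; every column must hold exactly rowCount values.
  ContainerSketch add(String name, Object... values) {
    if (values.length != rowCount) {
      throw new IllegalArgumentException("Column " + name + " has the wrong row count");
    }
    if (columns.containsKey(name)) {
      throw new IllegalArgumentException("Duplicate column: " + name);
    }
    columns.put(name, values);
    return this;
  }

  // Builds a new container holding this container's columns followed by the
  // other container's columns. The inputs are left unchanged.
  ContainerSketch merge(ContainerSketch other) {
    if (rowCount != other.rowCount) {
      throw new IllegalArgumentException("Row counts differ");
    }
    ContainerSketch merged = new ContainerSketch(rowCount);
    for (Map.Entry<String, Object[]> e : columns.entrySet()) {
      merged.add(e.getKey(), e.getValue());
    }
    for (Map.Entry<String, Object[]> e : other.columns.entrySet()) {
      merged.add(e.getKey(), e.getValue()); // rejects a column name seen before
    }
    return merged;
  }
}
```

With C1 holding (a, b) and C2 holding (c), both with two rows, `c1.merge(c2)` in this sketch yields a container whose columns are a, b, c in that order; mismatched row counts or a repeated column name cause an `IllegalArgumentException`.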



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
