[jira] [Updated] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-05-15 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5514:
---
Description: 
Consider the concept of a "record batch" in Drill. On the one hand, one can 
envision a record batch as a stack of records:

{code}
| a1 | b1 | c1 |

| a2 | b2 | c2 |
{code}

But Drill is columnar, so a record batch is really a "bundle" of vectors:

{code}
| a1 || b1 || c1 |
| a2 || b2 || c2 |
{code}

There are times when it is handy to build up a record batch as a merge of two 
different vector bundles:

{code}
-- bundle 1 --   -- bundle 2 --
| a1 || b1 |     | c1 |
| a2 || b2 |     | c2 |
{code}

For example, consider a reader. The reader implementation might read columns 
(a, b) from a file. The {{ScanBatch}} might then add (c) as an implicit 
vector (the file name, say). The merged set of vectors comprises the final 
schema: (a, b, c).

This ticket asks for the code to do the merge:

* Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
* Merge two vector containers C1 and C2 to create a new container, C3, that 
holds the merger of the vectors from the first two.

Clearly, the merge only makes sense if:

* The two input containers have the same row count, and
* The columns in each input container are distinct.
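
The merge and its two preconditions can be sketched in plain Java. This is 
only an illustration of the idea, not Drill's {{VectorContainer}} API: the 
{{Bundle}} class, its methods, and the use of lists in place of value vectors 
are all hypothetical names and simplifications introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; Bundle is a hypothetical stand-in, not a Drill class. */
public class BundleMergeSketch {

  /** A "bundle" of named columns, each holding one value per row. */
  static final class Bundle {
    // LinkedHashMap keeps columns in insertion order, so the schema is ordered.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private final int rowCount;

    Bundle(int rowCount) {
      this.rowCount = rowCount;
    }

    /** Adds a column; every column must supply exactly rowCount values. */
    Bundle add(String name, Object... values) {
      if (values.length != rowCount) {
        throw new IllegalArgumentException("Column " + name + " has wrong row count");
      }
      if (columns.containsKey(name)) {
        throw new IllegalArgumentException("Duplicate column: " + name);
      }
      columns.put(name, Arrays.asList(values));
      return this;
    }

    List<String> schema() {
      return new ArrayList<>(columns.keySet());
    }

    int rowCount() {
      return rowCount;
    }

    List<Object> column(String name) {
      return columns.get(name);
    }
  }

  /**
   * Merges two bundles into a new one that holds the columns of both.
   * The column data is shared with the inputs, not copied (an assumption
   * made for this sketch).
   */
  static Bundle merge(Bundle first, Bundle second) {
    // Precondition 1: both inputs have the same row count.
    if (first.rowCount != second.rowCount) {
      throw new IllegalArgumentException("Row counts differ: "
          + first.rowCount + " vs " + second.rowCount);
    }
    Bundle merged = new Bundle(first.rowCount);
    merged.columns.putAll(first.columns);
    for (Map.Entry<String, List<Object>> entry : second.columns.entrySet()) {
      // Precondition 2: the columns of the two inputs are distinct.
      if (merged.columns.containsKey(entry.getKey())) {
        throw new IllegalArgumentException("Column in both inputs: " + entry.getKey());
      }
      merged.columns.put(entry.getKey(), entry.getValue());
    }
    return merged;
  }

  public static void main(String[] args) {
    // Columns (a, b) as read from a file, and (c) as an implicit column.
    Bundle readerColumns = new Bundle(2).add("a", "a1", "a2").add("b", "b1", "b2");
    Bundle implicitColumns = new Bundle(2).add("c", "c1", "c2");
    Bundle merged = merge(readerColumns, implicitColumns);
    System.out.println(merged.schema()); // prints [a, b, c]
  }
}
```

Here the schema merge falls out of the column merge: the merged schema is the 
first input's columns followed by the second's. Both preconditions are checked 
up front so that a bad merge fails loudly rather than producing a ragged batch.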

Because this feature is also useful for tests, add the merge to the "row set" 
tools as well.

  was:
Consider the concept of a "record batch" in Drill. On the one hand, one can 
envision a record batch as a stack of records:

{code}
| a1 | b1 | c1 |

| a2 | b2 | c2 |
{code}

But, Drill is columnar. So a record batch is really a "bundle" of vectors:

{code}
| a1 || b1 || c1 |
| a2 || b2 || c2 |
{code}

There are times when it is handy to build up a record batch as a merge of two 
different vector bundles:

{code}
-- bundle 1 ---- bundle 2 --
| a1 || b1 || c1 |
| a2 || b2 || c2 |
{code}

For example, consider a reader. The reader implementation might read columns 
(a, b) from a file, say. Then, the "{{ScanBatch}}" might add (c) as an implicit 
vector (the file name, say.) The merged set of vectors comprises the final 
schema: (a, b, c).

This ticket asks for the code to do the merge:

* Merge two schemas A = (a, b), B = (c) to create schema C = (a, b, c).
* Merge two vector containers C1 and C2 to create a new container, C3, that 
holds the merger of the vectors from the first two.


> Enhance VectorContainer to merge two row sets
> -
>
> Key: DRILL-5514
> URL: https://issues.apache.org/jira/browse/DRILL-5514
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Fix For: 1.11.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-05-17 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5514:
---
Reviewer: Sorabh Hamirwasia



[jira] [Updated] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-05 Thread Karthikeyan Manivannan (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthikeyan Manivannan updated DRILL-5514:
--
Reviewer: Karthikeyan Manivannan  (was: Sorabh Hamirwasia)



[jira] [Updated] (DRILL-5514) Enhance VectorContainer to merge two row sets

2017-06-15 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5514:
---
Labels: ready-to-commit  (was: )

> Enhance VectorContainer to merge two row sets
> -
>
> Key: DRILL-5514
> URL: https://issues.apache.org/jira/browse/DRILL-5514
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
> Labels: ready-to-commit
> Fix For: 1.11.0
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)