[ https://issues.apache.org/jira/browse/ARROW-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rok Mihevc updated ARROW-3164:
------------------------------
    External issue URL: https://github.com/apache/arrow/issues/19512

> [Java] Port Row Set abstraction from Drill to Arrow
> ---------------------------------------------------
>
>                 Key: ARROW-3164
>                 URL: https://issues.apache.org/jira/browse/ARROW-3164
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Paul Rogers
>            Priority: Major
>
> Arrow is a great way to exchange data between systems. Somewhere in the 
> process, however, data must be loaded into, and read out of, the Arrow vectors.
> Arrow's vector code originated in similar code in Apache Drill. The Drill 
> project created a "Row Set" abstraction that:
>  * Provides a simple way to define the schema for a set of batches.
>  * Loads data into vectors from row-oriented inputs.
>  * Reads data out of vectors in row-oriented output.
>  * Controls memory consumed by the record batch when loading data into 
> vectors.
>  * Ensures maximum usage of the allocated vector space when loading data into 
> vectors.
>  * Optionally handles projection when reading data from an input file into a 
> set of vectors.
>  * Optionally handles data conversion from input to vector formats.
> This mechanism is handy for any Java developer who produces or consumes Arrow 
> vectors.
> Detailed information is available in [this 
> wiki|https://github.com/paul-rogers/arrow/wiki], including a more detailed 
> description of the motivation for this project, and an analysis of the work 
> required to do the Drill-to-Arrow port.
> The code is in Java simply because Drill is written in Java. The same 
> mechanisms can be ported to other languages if useful. Those ports would be 
> separate future projects.
> The code will be placed in a new Java module which can be imported by 
> projects that wish to use the code. Changes may be needed to expose items 
> from the {{vector}} module; we'll tackle those issues if/when they occur.
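As a rough illustration of the pattern the list above describes (define a schema, write rows into column-oriented storage, stop when the batch reaches a size limit, then read the columns back as rows), here is a minimal, dependency-free Java sketch. It is not the Drill code and not an Arrow API: every name in it (`RowSetSketch`, `Batch`, `RowWriter`, `RowReader`, `writeRow`, `maxRows`) is hypothetical, plain `ArrayList`s stand in for Arrow value vectors, and a row-count cap stands in for real memory accounting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a row set, NOT the Drill or Arrow API. Plain Java lists
// stand in for Arrow value vectors so the example runs without dependencies.
public class RowSetSketch {

    enum ColumnType { INT, STRING }
    /** One schema entry: a column name and its type. */
    record Column(String name, ColumnType type) {}

    /** Columnar batch: one value list per column, each holding rowCount values. */
    static final class Batch {
        final List<Column> schema;
        final List<List<Object>> columns = new ArrayList<>();
        int rowCount = 0;
        Batch(List<Column> schema) {
            this.schema = List.copyOf(schema);
            for (int i = 0; i < schema.size(); i++) {
                columns.add(new ArrayList<>());
            }
        }
    }

    /** Row-oriented writer; maxRows is a crude stand-in for a memory budget. */
    static final class RowWriter {
        private final Batch batch;
        private final int maxRows;
        RowWriter(Batch batch, int maxRows) {
            this.batch = batch;
            this.maxRows = maxRows;
        }

        /** Appends one row across all columns; returns false once the batch is full. */
        boolean writeRow(Object... values) {
            if (batch.rowCount >= maxRows) return false;
            if (values.length != batch.schema.size()) {
                throw new IllegalArgumentException("expected " + batch.schema.size() + " values");
            }
            // Validate the whole row first so a bad row leaves the batch unchanged.
            for (int i = 0; i < values.length; i++) {
                Column col = batch.schema.get(i);
                Object v = values[i];
                boolean ok = v == null
                        || (col.type() == ColumnType.INT ? v instanceof Integer : v instanceof String);
                if (!ok) {
                    throw new IllegalArgumentException("type mismatch in column " + col.name());
                }
            }
            for (int i = 0; i < values.length; i++) {
                batch.columns.get(i).add(values[i]);
            }
            batch.rowCount++;
            return true;
        }
    }

    /** Row-oriented reader: steps through rows and fetches values by column name. */
    static final class RowReader {
        private final Batch batch;
        private int row = -1;
        RowReader(Batch batch) {
            this.batch = batch;
        }

        boolean next() {
            if (row + 1 >= batch.rowCount) return false;
            row++;
            return true;
        }

        Object get(String name) {
            for (int i = 0; i < batch.schema.size(); i++) {
                if (batch.schema.get(i).name().equals(name)) {
                    return batch.columns.get(i).get(row);
                }
            }
            throw new IllegalArgumentException("unknown column: " + name);
        }
    }

    public static void main(String[] args) {
        Batch batch = new Batch(List.of(
                new Column("id", ColumnType.INT), new Column("name", ColumnType.STRING)));
        RowWriter writer = new RowWriter(batch, 2);
        writer.writeRow(1, "alice");
        writer.writeRow(2, "bob");
        boolean accepted = writer.writeRow(3, "carol"); // rejected: batch is full
        RowReader reader = new RowReader(batch);
        while (reader.next()) {
            System.out.println(reader.get("id") + " " + reader.get("name"));
        }
        System.out.println("third row accepted: " + accepted);
    }
}
```

Running `main` prints `1 alice`, `2 bob`, then `third row accepted: false`. Validating the whole row before appending keeps every column the same length, which any real vector-based row writer must also guarantee.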



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
