[ 
https://issues.apache.org/jira/browse/KUDU-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2077:
------------------------------
    Labels: kudu-roadmap  (was: )

> Return data in Apache Arrow format
> ----------------------------------
>
>                 Key: KUDU-2077
>                 URL: https://issues.apache.org/jira/browse/KUDU-2077
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, server
>            Reporter: Andrew Wong
>            Priority: Major
>              Labels: kudu-roadmap
>
> Dan and I spent the hackathon tinkering with the Apache Arrow format. Arrow 
> is an in-memory columnar format designed to be the common data format for a 
> large number of projects, see [here|https://arrow.apache.org]. One place we 
> thought adding this would be particularly fitting is when sending results 
> back to the client, since this currently returns row-wise data. By returning 
> Arrow, this could open the door to simpler and faster integration with other 
> projects.
> The server-side changes can be localized to the tablet service and wire 
> protocol. We considered using Arrow more exhaustively throughout the server 
> codebase, but found that because Arrow and Kudu's own in-memory format (i.e. 
> that in kudu::ColumnBlock) are so similar, a simpler approach is to copy the 
> buffers from ColumnBlock to the scan response and build arrow::Arrays 
> client-side. A POC of the server-side changes can be found here: 
> https://github.com/danburkert/kudu/tree/arrow
> At the time of writing this, the arrow::Array type has a varying number of 
> arrow::Buffers, depending on the data type (e.g. one for null bitmaps, one 
> for data, etc). The ColumnBlock "buffers" (i.e. data, null_bitmap) should be 
> compatible with these Buffers with a couple of modifications:
> * The null-bitmaps in arrow are the complement of those used by Kudu
> * The RowBlock that owns the ColumnBlocks has a selection vector needs to be 
> accounted for
> If the buffers are transferred over the wire (via sidecars or protobuf), they 
> should be able to be converted to Arrays via arrow::ArrayData or directly via 
> the Array constructors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to