[jira] [Comment Edited] (FLINK-10929) Add support for Apache Arrow

Fabian Hueske (JIRA) Wed, 10 Apr 2019 01:15:24 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699093#comment-16699093
 ]


Fabian Hueske edited comment on FLINK-10929 at 4/10/19 8:14 AM:
----------------------------------------------------------------

Arrow is a columnar in-memory data storage / exchange format. This means it was 
not designed with point updates / queries in mind which is the access pattern 
for a state backend in Flink. I'm not saying that there is no use case for 
Arrow in streaming, but IMO there is no obvious one. 

For batch SQL, yes, it might make sense but it would be a huge change 
(basically completely rewriting the data processing engine). 
One could also think of adding interfaces to consume or export Arrow data. 
There are probably more scenarios in which Arrow could be used.

I think we need more detail to decide whether this proposal is something that 
is useful (and achievable) for Flink or not.
[~pcless], what was the use case you had in mind when opening this Jira?





was (Author: fhueske):
Arrow is a columnar in-memory data storage / exchange format. This means it was 
not designed with point updates / queries in mind which is the access pattern 
for a state backend in Flink. I'm not saying that there is no use case for 
Arrow in streaming, but IMO there is obvious one. 

For batch SQL, yes, it might make sense but it would be a huge change 
(basically completely rewriting the data processing engine). 
One could also think of adding interfaces to consume or export Arrow data. 
There are probably more scenarios in which Arrow could be used.

I think we need more detail to decide whether this proposal is something that 
is useful (and achievable) for Flink or not.
[~pcless], what was the use case you had in mind when opening this Jira?




> Add support for Apache Arrow
> ----------------------------
>
>                 Key: FLINK-10929
>                 URL: https://issues.apache.org/jira/browse/FLINK-10929
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / State Backends
>            Reporter: Pedro Cardoso Silva
>            Priority: Minor
>         Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (FLINK-10929) Add support for Apache Arrow

Reply via email to