[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

Pedro Cardoso Silva (JIRA) Mon, 26 Nov 2018 07:48:09 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699173#comment-16699173
 ]


Pedro Cardoso Silva commented on FLINK-10929:
---------------------------------------------

{noformat}
 Pedro Cardoso Silva, what was the use case you had in mind when opening this 
Jira?
{noformat} 
Essentially, how to load and analyse large datasets (hundreds of millions of 
records). 
My current stack is Spark, but my company is considering Flink. We have memory 
issues when loading data for analysis, since Spark holds all records in 
memory. That forces us to provision large amounts of RAM just to keep 
everything in memory, and the querying operations are still unoptimized. 
I found Arrow and it seemed to me a good match for what we are going to use, 
hence this Jira ticket.



> Add support for Apache Arrow
> ----------------------------
>
>                 Key: FLINK-10929
>                 URL: https://issues.apache.org/jira/browse/FLINK-10929
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Pedro Cardoso Silva
>            Assignee: vinoyang
>            Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized, columnar in-memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its stated objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
