[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619217#comment-17619217 ] Aitozi commented on FLINK-10929: Nice to see a good discussion about Arrow support here. I have one question that can we make a first step to support the arrow format to make it as a standard format in Flink like Protobuf / JSON / CSV, Since it has become a widely used in different system, it will make it easy to use if we supported naturally. Besides, in the current code base of 1.16. There already some module work with the arrow eg: flink-python. Do you guys think it's worth to support it in a more wider way ? cc [~dianfu] [~Yellow] [~martijnvisser] > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: New Feature > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831530#comment-16831530 ] Stephan Ewen commented on FLINK-10929: -- [~fan_li_ya] A lot of this seems applicable for the early stages of pulling data in and filtering / projecting the data. So in combination with sources, I can see a benefit, but that is different from a deep integration into the query processor. For a deep integration into the query processor, the answer is not clear. Like Kurt pointed out, specifically - Batch / Streaming unification - High fan-out shuffles (work better row-wise) - Row materialization strategies for join cascades are non trivial Not only would this be a huge engineering effort until this has full feature coverage. I also don't see how with all these open questions we can possibly start working on such an integration into the runtime. It would mean breaking everything without knowing whether it would work out in the end. I see only one way forward for Arrow in the query processor: - First, we finish the current query processor work, based on the Blink merge, and all the related work around batch failover. - Second, someone make a PoC by forking the Blink query processor and modify it to work with Arrow. That PoC would need to show that it will be possible to get the same feature coverage (there is no fundamental design issue / blocker) and that there are relevant speedups in many cases, and no bad regressions (in performance, stability, resource footprint) in too many other cases. That means having a solid design or PoC implementation for all complex cases like (inner-/outer-/semi-/anti-) joins, time-versioned joins (in memory and spilling), aggregations, high fan-out shuffles, etc. - We could then make this additional query processor available through the pluggable Query Planner mechanism, in the same way as the current Flink SQL engine and the Blink SQL engine exist side by side for now. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826677#comment-16826677 ] Liya Fan commented on FLINK-10929: -- Hi [Stephan Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] and [Kurt Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], thanks a lot for your valuable comments. It seems there is not much debate for point 1). For point 2), I want to list a few advantages of using Arrow as internal data structures for Flink. These conclusions are based on our investigations of research papers, as well as our engineering experiences in secondary development on Blink. # Arrow adopts columnar data format, which makes it possible to apply SIMD ([link|[http://prestodb.rocks/code/simd/])|http://prestodb.rocks/code/simd/%5d).]. SIMD means higher performance. Java starts to support SIMD in Java 8, and it is expected that high versions of Java will provide better support for SIMD. # Arrow provides more compact memory layout. To make it simple, encoding a batch of rows as Arrow vectors will require much less memory space, compared with encoding by BinaryRow. This, in turn, leads to two results: ## With the same amount of memory space, more data can be loaded into memory, and fewer spills are required. ## The cost of shuffling can be reduced significantly. For example, the amount of shuffled data for TPC-H Q4 reduces to almost the half. # Columnar format makes it much easier for data compression (see [paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). Our initial experiments have shown some positive results. There are more advantages for Arrow, but I just list a few, due to space constraints. The TPC-H performance in the attachment speaks for itself. We understand and respect that Blink is currently being merged, and some changes should be postponed. We would like to provide our devoted help in this process. However, too much change is not a good reason to refuse Arrow, if we want to make Flink the world-top-level computing engine. All these difficulties can be overcome. We have a plan to carry out the changes incrementally. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816345#comment-16816345 ] Kurt Young commented on FLINK-10929: I'm not sure everyone who have already involved to this discussion have a clean and common goal about introducing Apache Arrow to Flink. As Stephan said, there are two scenarios which can be considered. Regarding (2): I think making Arrow as a vectorized execution data format will involves lots of changes, from runtime to operator and query optimizer. We should at first have consensus about the final goal and status of this. Whether streaming can benefits from vectorized execution? Will this break the unification of streaming and batch? How many benefits we can gain from it... There are lots of unanswered questions. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816192#comment-16816192 ] Stephan Ewen commented on FLINK-10929: -- There are two different considerations: (1) Arrow for interoperability with other systems and languages (source / sink / inter-process-communication) (2) Arrow as a format for internal data processing in SQL / Table API. For (1), I see it would make sense, but then we need to look more concretely at what we want to integrate. Arrow is not a magic integration, it is a data format. For (2), the original Flink query processor is not getting much committer attention, because we plan to replace it with the Blink processor in the long run. The Blink processor is in the process or merging and that needs to be finished before we can start making more changes. [~ykt836] could probably provide some input into when would be a good time to follow up there. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815264#comment-16815264 ] Liya Fan commented on FLINK-10929: -- Hi [~fhueske], thank you for your kind reply. I agree with most points. The changes should be incremental so as not to break other components. So I think the first step is to provide a flag which is disabled by default, and let the MemoryManager depend on the Arrow Buffer Allocator. With this change, all the MemorySegment will be Arrow buffers, but this is transparent to other components, and will break them. I guess I will initiate a discussion on the dev mailing list. BTW, my mail address has been kicked out from the mailing list for a couple of days, because the dev-help claims that it receives some bouncing message. Would you please add another email address of mine (fan_li...@aliyun.com) to the mailing list? Thank you in advance. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815207#comment-16815207 ] Fabian Hueske commented on FLINK-10929: --- Hi [~fan_li_ya], I think Arrow is a good building block for vectorized processing of batch queries, which is what it was designed for and what your implementation is doing. With all the changes that are currently done on the Table API / SQL (API, planner, runtime), adding such a feature needs to be well coordinated. I'm not involved in those efforts, so it would be best to start a discussion thread on the dev mailing list to gather some input there. Best, Fabian > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815159#comment-16815159 ] vinoyang commented on FLINK-10929: -- [~fan_li_ya] Glad to see the design document. I will have a look at it later. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815155#comment-16815155 ] Liya Fan commented on FLINK-10929: -- Hi [~yanghua], thank you so much for your feedback. Below is an initial draft of our design document: https://docs.google.com/document/d/1LstaiGzlzTdGUmyG_-9GSWleKpLRid8SJWXQCTqcQB4/edit?usp=sharing > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815073#comment-16815073 ] vinoyang commented on FLINK-10929: -- Hi [~fan_li_ya] If there is any design documentation, it would be better for discussing. And it may be a good start. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815009#comment-16815009 ] Liya Fan commented on FLINK-10929: -- We have imported Arrow in our efforts to vectorize Blink batch jobs. It is an incremental change, which can be easily turned off with a single flag, and it does not affect other parts of the code base. [~fhueske], do you think it is a good time to make some initial attempts to incorporate such changes now ? :) > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish > Components: Runtime / State Backends >Reporter: Pedro Cardoso Silva >Priority: Minor > Attachments: image-2019-04-10-13-43-08-107.png > > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700091#comment-16700091 ] Fabian Hueske commented on FLINK-10929: --- Arrow is a data format to store data in-memory, process and transfer it. Columnar formats like Arrow are known for good compression and to be efficient bulk operations like aggregations. >From that perspective, building on Arrow for a processing engine might make >sense. However, we are not starting from a green field. Flink has it's own in-memory representation, memory management, and network transfer. Replacing this by Arrow would be a huge effort and result in rewriting large parts of the internals. So this is nothing that is quickly done. TBH,I think there are more important issues on the roadmap at the moment. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699173#comment-16699173 ] Pedro Cardoso Silva commented on FLINK-10929: - {noformat} Pedro Cardoso Silva, what was the use case you had in mind when opening this Jira? {noformat} Essentially how to load and analyse over large datasets (100s of millions of records). My current stack is Spark but my company is considering Flink and we have some memory issues when loading data for analysis since spark holds on top all records in memory forcing us to have large amounts of RAM just to have everything in memory with unoptimized querying operations. I found Arrow and it seemed to me a good match, considering what we are going to use, hence this Jira ticket. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699093#comment-16699093 ] Fabian Hueske commented on FLINK-10929: --- Arrow is a columnar in-memory data storage / exchange format. This means it was not designed with point updates / queries in mind which is the access pattern for a state backend in Flink. I'm not saying that there is no use case for Arrow in streaming, but IMO there is obvious one. For batch SQL, yes, it might make sense but it would be a huge change (basically completely rewriting the data processing engine). One could also think of adding interfaces to consume or export Arrow data. There are probably more scenarios in which Arrow could be used. I think we need more detail to decide whether this proposal is something that is useful (and achievable) for Flink or not. [~pcless], what was the use case you had in mind when opening this Jira? > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696351#comment-16696351 ] vinoyang commented on FLINK-10929: -- Hi [~fhueske] I believe that Flink Streaming will also benefit from Apache Arrow. If there is a HeapKeyedStateBackend based on the Arrow data format implementation, then those real-time state information can be shared more efficiently with other systems, which I think makes sense. What do you think? > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693199#comment-16693199 ] Fabian Hueske commented on FLINK-10929: --- It depends on a couple of things. Who is willing to work on it, i.e., create a design doc, discuss the document, implement the feature, review the PR and merge it. So there are many people involved in the process. Esp. for features or improvements that require larger changes, it might be hard to get enough people behind the effort. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693193#comment-16693193 ] Pedro Cardoso Silva commented on FLINK-10929: - Thank you for the reply [~fhueske] Is there usually a time-table for this kind of feature? Even if only to know whether supporting it makes sense. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10929) Add support for Apache Arrow
[ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692953#comment-16692953 ] Fabian Hueske commented on FLINK-10929: --- Thanks for opening this Jira [~pcless]. Apache Arrow looks indeed quite interesting. I can see that it would make sense to adopt it for batch work loads. For streaming use cases, I am not so sure and we would have to investigate the benefits first. In any case, an integration would be a major effort and require a detailed design document and discussion. > Add support for Apache Arrow > > > Key: FLINK-10929 > URL: https://issues.apache.org/jira/browse/FLINK-10929 > Project: Flink > Issue Type: Wish >Reporter: Pedro Cardoso Silva >Assignee: vinoyang >Priority: Minor > > Investigate the possibility of adding support for Apache Arrow as a > standardized columnar, memory format for data. > Given the activity that [https://github.com/apache/arrow] is currently > getting and its claims objective of providing a zero-copy, standardized data > format across platforms, I think it makes sense for Flink to look into > supporting it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)