[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2022-10-17 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619217#comment-17619217
 ] 

Aitozi commented on FLINK-10929:


Nice to see a good discussion about Arrow support here. I have one question 
that can we make a first step to support the arrow format to make it as a 
standard format in Flink like Protobuf / JSON / CSV, Since it has become a 
widely used in different system, it will make it easy to use if we supported 
naturally. 

Besides, in the current code base of 1.16. There already some module work with 
the arrow eg: flink-python. Do you guys think it's worth to support it in a 
more wider way ?

cc [~dianfu]  [~Yellow]  [~martijnvisser] 

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-05-02 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831530#comment-16831530
 ] 

Stephan Ewen commented on FLINK-10929:
--

[~fan_li_ya]

A lot of this seems applicable for the early stages of pulling data in and 
filtering / projecting the data. So in combination with sources, I can see a 
benefit, but that is different from a deep integration into the query processor.

For a deep integration into the query processor, the answer is not clear. Like 
Kurt pointed out, specifically
 - Batch / Streaming unification
 - High fan-out shuffles (work better row-wise)
 - Row materialization strategies for join cascades are non trivial

Not only would this be a huge engineering effort until this has full feature 
coverage. I also don't see how with all these open questions we can possibly 
start working on such an integration into the runtime. It would mean breaking 
everything without knowing whether it would work out in the end.

I see only one way forward for Arrow in the query processor:
 - First, we finish the current query processor work, based on the Blink merge, 
and all the related work around batch failover.
 - Second, someone make a PoC by forking the Blink query processor and modify 
it to work with Arrow. That PoC would need to show that it will be possible to 
get the same feature coverage (there is no fundamental design issue / blocker) 
and that there are relevant speedups in many cases, and no bad regressions (in 
performance, stability, resource footprint) in too many other cases. That means 
having a solid design or PoC implementation for all complex cases like 
(inner-/outer-/semi-/anti-) joins, time-versioned joins (in memory and 
spilling), aggregations, high fan-out shuffles, etc.
 - We could then make this additional query processor available through the 
pluggable Query Planner mechanism, in the same way as the current Flink SQL 
engine and the Blink SQL engine exist side by side for now.

 

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-25 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826677#comment-16826677
 ] 

Liya Fan commented on FLINK-10929:
--

Hi [Stephan 
Ewen|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=StephanEwen] 
and [Kurt 
Young|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=ykt836], 
thanks a lot for your valuable comments. 

It seems there is not much debate for point 1). 

For point 2), I want to list a few advantages of using Arrow as internal data 
structures for Flink. These conclusions are based on our investigations of 
research papers, as well as our engineering experiences in secondary 
development on Blink.
 # Arrow adopts columnar data format, which makes it possible to apply SIMD 
([link|[http://prestodb.rocks/code/simd/])|http://prestodb.rocks/code/simd/%5d).].
 SIMD means higher performance. Java starts to support SIMD in Java 8, and it 
is expected that high versions of Java will provide better support for SIMD. 
 # Arrow provides more compact memory layout. To make it simple, encoding a 
batch of rows as Arrow vectors will require much less memory space, compared 
with encoding by BinaryRow. This, in turn, leads to two results:
 ## With the same amount of memory space, more data can be loaded into memory, 
and fewer spills are required.
 ## The cost of shuffling can be reduced significantly. For example, the amount 
of shuffled data for TPC-H Q4 reduces to almost the half. 
 # Columnar format makes it much easier for data compression (see 
[paper|http://db.lcs.mit.edu/projects/cstore/abadisigmod06.pdf], for example). 
Our initial experiments have shown some positive results.

There are more advantages for Arrow, but I just list a few, due to space 
constraints. The TPC-H performance in the attachment speaks for itself.

We understand and respect that Blink is currently being merged, and some 
changes should be postponed. We would like to provide our devoted help in this 
process.

However, too much change is not a good reason to refuse Arrow, if we want to 
make Flink the world-top-level computing engine. All these difficulties can be 
overcome. We have a plan to carry out the changes incrementally.

 

 

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-12 Thread Kurt Young (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816345#comment-16816345
 ] 

Kurt Young commented on FLINK-10929:


I'm not sure everyone who have already involved to this discussion have a clean 
and common goal about introducing Apache Arrow to Flink. As Stephan said, there 
are two scenarios which can be considered. 

Regarding (2): I think making Arrow as a vectorized execution data format will 
involves lots of changes, from runtime to operator and query optimizer. We 
should at first have consensus about the final goal and status of this. Whether 
streaming can benefits from vectorized execution? Will this break the 
unification of streaming and batch? How many benefits we can gain from it... 
There are lots of unanswered questions. 

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-12 Thread Stephan Ewen (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816192#comment-16816192
 ] 

Stephan Ewen commented on FLINK-10929:
--

There are two different considerations:

(1) Arrow for interoperability with other systems and languages (source / sink 
/ inter-process-communication)
(2) Arrow as a format for internal data processing in SQL / Table API.

For (1), I see it would make sense, but then we need to look more concretely at 
what we want to integrate. Arrow is not a magic integration, it is a data 
format.

For (2), the original Flink query processor is not getting much committer 
attention, because we plan to replace it with the Blink processor in the long 
run. The Blink processor is in the process or merging and that needs to be 
finished before we can start making more changes. [~ykt836] could probably 
provide some input into when would be a good time to follow up there.


> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-11 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815264#comment-16815264
 ] 

Liya Fan commented on FLINK-10929:
--

Hi [~fhueske], thank you for your kind reply. I agree with most points. The 
changes should be incremental so as not to break other components. So I think 
the first step is to provide a flag which is disabled by default, and let the 
MemoryManager depend on the Arrow Buffer Allocator. With this change, all the 
MemorySegment will be Arrow buffers, but this is transparent to other 
components, and will break them.

I guess I will initiate a discussion on the dev mailing list.

BTW, my mail address has been kicked out from the mailing list for a couple of 
days, because the dev-help claims that it receives some bouncing message. Would 
you please add another email address of mine (fan_li...@aliyun.com) to the 
mailing list?  Thank you in advance.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-11 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815207#comment-16815207
 ] 

Fabian Hueske commented on FLINK-10929:
---

Hi [~fan_li_ya], 

I think Arrow is a good building block for vectorized processing of batch 
queries, which is what it was designed for and what your implementation is 
doing.
With all the changes that are currently done on the Table API / SQL (API, 
planner, runtime), adding such a feature needs to be well coordinated.
I'm not involved in those efforts, so it would be best to start a discussion 
thread on the dev mailing list to gather some input there.

Best, Fabian


> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-11 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815159#comment-16815159
 ] 

vinoyang commented on FLINK-10929:
--

[~fan_li_ya] Glad to see the design document. I will have a look at it later.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-11 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815155#comment-16815155
 ] 

Liya Fan commented on FLINK-10929:
--

Hi [~yanghua], thank you so much for your feedback. Below is an initial draft 
of our design document:

https://docs.google.com/document/d/1LstaiGzlzTdGUmyG_-9GSWleKpLRid8SJWXQCTqcQB4/edit?usp=sharing
 

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-10 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815073#comment-16815073
 ] 

vinoyang commented on FLINK-10929:
--

Hi [~fan_li_ya] If there is any design documentation, it would be better for 
discussing. And it may be a good start.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2019-04-10 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815009#comment-16815009
 ] 

Liya Fan commented on FLINK-10929:
--

We have imported Arrow in our efforts to vectorize Blink batch jobs. It is an 
incremental change, which can be easily turned off with a single flag, and it 
does not affect other parts of the code base.

[~fhueske], do you think it is a good time to make some initial attempts to 
incorporate such changes now ? :)

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>  Components: Runtime / State Backends
>Reporter: Pedro Cardoso Silva
>Priority: Minor
> Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-27 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700091#comment-16700091
 ] 

Fabian Hueske commented on FLINK-10929:
---

Arrow is a data format to store data in-memory, process and transfer it.
Columnar formats like Arrow are known for good compression and to be efficient 
bulk operations like aggregations. 
>From that perspective, building on Arrow for a processing engine might make 
>sense.

However, we are not starting from a green field. 
Flink has it's own in-memory representation, memory management, and network 
transfer.
Replacing this by Arrow would be a huge effort and result in rewriting large 
parts of the internals.

So this is nothing that is quickly done. 
TBH,I think there are more important issues on the roadmap at the moment.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-26 Thread Pedro Cardoso Silva (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699173#comment-16699173
 ] 

Pedro Cardoso Silva commented on FLINK-10929:
-

{noformat}
 Pedro Cardoso Silva, what was the use case you had in mind when opening this 
Jira?
{noformat} 
Essentially how to load and analyse over large datasets (100s of millions of 
records). 
My current stack is Spark but my company is considering Flink and we have some 
memory issues when loading data for analysis since spark holds on top all 
records in memory forcing us to have large amounts of RAM just to have 
everything in memory with unoptimized querying operations. 
I found Arrow and it seemed to me a good match, considering what we are going 
to use, hence this Jira ticket.



> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-26 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699093#comment-16699093
 ] 

Fabian Hueske commented on FLINK-10929:
---

Arrow is a columnar in-memory data storage / exchange format. This means it was 
not designed with point updates / queries in mind which is the access pattern 
for a state backend in Flink. I'm not saying that there is no use case for 
Arrow in streaming, but IMO there is obvious one. 

For batch SQL, yes, it might make sense but it would be a huge change 
(basically completely rewriting the data processing engine). 
One could also think of adding interfaces to consume or export Arrow data. 
There are probably more scenarios in which Arrow could be used.

I think we need more detail to decide whether this proposal is something that 
is useful (and achievable) for Flink or not.
[~pcless], what was the use case you had in mind when opening this Jira?




> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-22 Thread vinoyang (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696351#comment-16696351
 ] 

vinoyang commented on FLINK-10929:
--

Hi [~fhueske]  I believe that Flink Streaming will also benefit from Apache 
Arrow. If there is a HeapKeyedStateBackend based on the Arrow data format 
implementation, then those real-time state information can be shared more 
efficiently with other systems, which I think makes sense. What do you think?

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-20 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693199#comment-16693199
 ] 

Fabian Hueske commented on FLINK-10929:
---

It depends on a couple of things. Who is willing to work on it, i.e., create a 
design doc, discuss the document, implement the feature, review the PR and 
merge it. So there are many people involved in the process. Esp. for features 
or improvements that require larger changes, it might be hard to get enough 
people behind the effort.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-20 Thread Pedro Cardoso Silva (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693193#comment-16693193
 ] 

Pedro Cardoso Silva commented on FLINK-10929:
-

Thank you for the reply [~fhueske]

Is there usually a time-table for this kind of feature? Even if only to know 
whether supporting it makes sense.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10929) Add support for Apache Arrow

2018-11-20 Thread Fabian Hueske (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16692953#comment-16692953
 ] 

Fabian Hueske commented on FLINK-10929:
---

Thanks for opening this Jira [~pcless]. Apache Arrow looks indeed quite 
interesting.

I can see that it would make sense to adopt it for batch work loads. For 
streaming use cases, I am not so sure and we would have to investigate the 
benefits first.

In any case, an integration would be a major effort and require a detailed 
design document and discussion.

> Add support for Apache Arrow
> 
>
> Key: FLINK-10929
> URL: https://issues.apache.org/jira/browse/FLINK-10929
> Project: Flink
>  Issue Type: Wish
>Reporter: Pedro Cardoso Silva
>Assignee: vinoyang
>Priority: Minor
>
> Investigate the possibility of adding support for Apache Arrow as a 
> standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently 
> getting and its claims objective of providing a zero-copy, standardized data 
> format across platforms, I think it makes sense for Flink to look into 
> supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)