[ https://issues.apache.org/jira/browse/ARROW-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662223#comment-17662223 ]

Rok Mihevc commented on ARROW-5200:
-----------------------------------

This issue has been migrated to [issue 
#21675|https://github.com/apache/arrow/issues/21675] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Java] Provide light-weight arrow APIs
> --------------------------------------
>
>                 Key: ARROW-5200
>                 URL: https://issues.apache.org/jira/browse/ARROW-5200
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2019-04-23-15-19-34-187.png, safe_nocheck.jpg, 
> unsafe.jpg
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We are trying to incorporate Apache Arrow into the Apache Flink runtime. We
> find Arrow an amazing library, which greatly simplifies support for the
> columnar data format.
> However, in many scenarios we find the performance unacceptable. Our
> investigation shows that the reason is too many redundant checks and
> computations in the Arrow API.
> For example, the following figure shows that a single call to the
> Float8Vector.get(int) method (one of the most frequently used APIs in Flink
> computation) involves 20+ method invocations.
> !image-2019-04-23-15-19-34-187.png!
>  
> There are many other APIs with similar problems. We believe that these checks
> help ensure the integrity of the program. However, they also severely impact
> performance. In our evaluation, performance may degrade by two or three
> orders of magnitude compared to accessing on-heap data.
> We think that, at least for some scenarios, the responsibility for integrity
> checks can be given to application owners. If they can be sure that all the
> checks have passed, we can provide them with light-weight APIs and the high
> performance that comes with them.
> In the light-weight APIs, we provide only minimal checks, or avoid checks
> altogether. Application owners can still develop and debug their code using
> the original heavy-weight APIs. Once all bugs have been fixed, they can switch
> to the light-weight APIs in their products and enjoy the resulting high
> performance.
>  
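The checked vs. light-weight split proposed above can be sketched roughly as follows. This is a hypothetical illustration, not Arrow's API: the class Float8Column, its methods get/getNoCheck/read, and the system property "sketch.enable_checks" are all invented for the example. It stores float8 values in an off-heap ByteBuffer with a validity bitmap, offers a "heavy-weight" get that does explicit bounds and null checks, a "light-weight" getNoCheck that trusts the caller, and a static final switch so an application can debug with checks on and then turn them off.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch (NOT Arrow's API) of checked vs. light-weight accessors. */
public final class LightweightAccessSketch {

    // Hypothetical switch, read once per JVM. As a static final field, the JIT
    // can typically treat it as a constant and drop the unused branch in read().
    static final boolean CHECKS_ENABLED =
        Boolean.parseBoolean(System.getProperty("sketch.enable_checks", "true"));

    static final class Float8Column {
        private final ByteBuffer values; // 8 bytes per slot, off-heap, little-endian
        private final byte[] validity;   // 1 bit per slot, 1 = non-null
        private final int valueCount;

        Float8Column(int valueCount) {
            this.valueCount = valueCount;
            this.values = ByteBuffer.allocateDirect(valueCount * 8).order(ByteOrder.LITTLE_ENDIAN);
            this.validity = new byte[(valueCount + 7) / 8];
        }

        void set(int index, double value) {
            values.putDouble(index * 8, value);
            validity[index >> 3] |= (byte) (1 << (index & 7)); // mark slot non-null
        }

        boolean isNull(int index) {
            return (validity[index >> 3] & (1 << (index & 7))) == 0;
        }

        /** "Heavy-weight" accessor: explicit bounds and null checks on every call. */
        double get(int index) {
            if (index < 0 || index >= valueCount) {
                throw new IndexOutOfBoundsException("index " + index + ", valueCount " + valueCount);
            }
            if (isNull(index)) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
            return values.getDouble(index * 8);
        }

        /** "Light-weight" accessor: the caller guarantees the slot is in range and non-null. */
        double getNoCheck(int index) {
            // Only the application-level checks are skipped here; ByteBuffer.getDouble
            // still bounds-checks. A truly unchecked path would read raw memory directly.
            return values.getDouble(index * 8);
        }

        /** Chooses the checked or unchecked path based on the global switch. */
        double read(int index) {
            return CHECKS_ENABLED ? get(index) : getNoCheck(index);
        }
    }

    public static void main(String[] args) {
        Float8Column col = new Float8Column(4);
        for (int i = 0; i < 3; i++) {
            col.set(i, i * 1.5); // slot 3 stays null
        }
        double sum = 0;
        for (int i = 0; i < 3; i++) {
            sum += col.read(i);
        }
        System.out.println("sum = " + sum); // prints "sum = 4.5"
    }
}
```

Running with -Dsketch.enable_checks=false would route read() through getNoCheck, mirroring the "debug with heavy-weight APIs, then switch" workflow described in the issue; a null or out-of-range slot is then the application's responsibility.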



--
This message was sent by Atlassian Jira
(v8.20.10#820010)