[ https://issues.apache.org/jira/browse/ARROW-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662223#comment-17662223 ]
Rok Mihevc commented on ARROW-5200:
-----------------------------------

This issue has been migrated to [issue #21675|https://github.com/apache/arrow/issues/21675] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Java] Provide light-weight arrow APIs
> --------------------------------------
>
> Key: ARROW-5200
> URL: https://issues.apache.org/jira/browse/ARROW-5200
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2019-04-23-15-19-34-187.png, safe_nocheck.jpg, unsafe.jpg
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We are trying to incorporate Apache Arrow into the Apache Flink runtime. We
> find Arrow an amazing library, which greatly simplifies support for columnar
> data formats.
> However, in many scenarios we find the performance unacceptable. Our
> investigation shows that the reason is that there are too many redundant
> checks and computations in the Arrow API.
> For example, the following figure shows that a single call to the
> Float8Vector.get(int) method (one of the most frequently used APIs in
> Flink computation) involves 20+ method invocations.
> !image-2019-04-23-15-19-34-187.png!
>
> Many other APIs have similar problems. We understand that these checks
> ensure the integrity of the program. However, they also impact performance
> severely. In our evaluation, performance may degrade by two or three orders
> of magnitude compared to accessing data in heap memory.
> We think that, at least for some scenarios, we can hand the responsibility
> for integrity checks to application owners. If they can be sure that all the
> checks would pass, we can provide them with light-weight APIs and the
> inherent high performance.
> In the light-weight APIs, we provide only minimal checks, or avoid checks
> altogether.
> The application owner can still develop and debug their code using the
> original heavy-weight APIs. Once all bugs have been fixed, they can switch to
> light-weight APIs in their products and enjoy the consequent high performance.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)