[ https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662247#comment-17662247 ]
Rok Mihevc commented on ARROW-5224:
-----------------------------------

This issue has been migrated to [issue #16718|https://github.com/apache/arrow/issues/16718] on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] for further details.

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
>                 Key: ARROW-5224
>                 URL: https://issues.apache.org/jira/browse/ARROW-5224
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java
>            Reporter: Ji Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> There is no API to serialize or deserialize a ValueVector directly. The only
> way to do this today is to put a single FieldVector in a VectorSchemaRoot and
> convert it to an ArrowRecordBatch; deserialization requires the same round
> trip in reverse.
> Providing a utility class for this may be better. I know all serialization
> should follow the IPC format so that data can be shared between different
> Arrow implementations, but for users who only use the Java API and want to do
> some further optimization this should not be a problem, and we could give
> them one more option.
> This may benefit Java users who only use ValueVector rather than the IPC
> classes such as ArrowRecordBatch:
> * We could apply shuffle optimizations, such as compression and encoding
> algorithms for numerical types, which could greatly improve performance.
> * Serialize/deserialize using the actual size of the data in the vector,
> since the allocated buffer size is a power of 2 and is often larger than
> needed.
> * Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make
> the API more user-friendly.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
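
The second bullet in the quoted description (writing only the bytes that hold values instead of the whole power-of-2 buffer) can be illustrated with a minimal sketch. It uses only java.nio and is not the Arrow API or the implementation proposed in the issue: the class TrimmedBufferSketch, its methods, and the wire layout (a 4-byte value count followed by the values) are hypothetical, and a plain ByteBuffer stands in for the data buffer of a fixed-width int vector.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration only: a plain ByteBuffer stands in for a vector's data buffer.
final class TrimmedBufferSketch {

    // Round a requested size up to the next power of two, mimicking the
    // allocation behaviour the issue describes.
    static int nextPowerOfTwo(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // Write a 4-byte value count followed by only the bytes that hold values,
    // ignoring the unused tail of the buffer.
    static byte[] serialize(ByteBuffer data, int valueCount) {
        int usedBytes = valueCount * Integer.BYTES;
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + usedBytes);
        out.putInt(valueCount);
        ByteBuffer used = data.duplicate(); // independent position/limit, shared content
        used.position(0);
        used.limit(usedBytes);
        out.put(used);
        return out.array();
    }

    // Read the count, then exactly that many int values.
    static int[] deserialize(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int valueCount = in.getInt();
        int[] values = new int[valueCount];
        for (int i = 0; i < valueCount; i++) {
            values[i] = in.getInt();
        }
        return values;
    }

    public static void main(String[] args) {
        int valueCount = 5;
        int capacity = nextPowerOfTwo(valueCount * Integer.BYTES); // 20 bytes needed, 32 allocated
        ByteBuffer data = ByteBuffer.allocate(capacity);
        for (int i = 0; i < valueCount; i++) {
            data.putInt(i * Integer.BYTES, i * 10); // absolute put, position stays at 0
        }

        byte[] wire = serialize(data, valueCount);
        System.out.println(capacity + " bytes allocated, " + wire.length + " bytes serialized");
        System.out.println(Arrays.toString(deserialize(wire)));
    }
}
```

In this sketch five int values occupy 20 bytes of a 32-byte buffer, and the serialized form is 24 bytes (4 for the count plus 20 for the values). A real ValueVector would also need its validity buffer and, for variable-width types, its offset buffer handled the same way.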