[ https://issues.apache.org/jira/browse/ARROW-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838295#comment-16838295 ]
Micah Kornfield commented on ARROW-5224:
----------------------------------------

For #1, can you provide more details on the encoding you have in mind?

For #2, I believe that only the used capacity [1] is written, not what is allocated (which is the power of 2?). If this isn't the case, could you provide a unit test demonstrating the wasted space?

[1] https://github.com/apache/arrow/blob/87feee3d941ee41fb39b25411e108bef40a55995/java/vector/src/main/java/org/apache/arrow/vector/ipc/WriteChannel.java#L93

> [Java] Add APIs for supporting directly serialize/deserialize ValueVector
> -------------------------------------------------------------------------
>
>                 Key: ARROW-5224
>                 URL: https://issues.apache.org/jira/browse/ARROW-5224
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Ji Liu
>            Assignee: Ji Liu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is no API to directly serialize/deserialize a ValueVector. The only way
> to do this today is to put a single FieldVector in a VectorSchemaRoot and
> convert it to an ArrowRecordBatch; deserialization requires the same steps in
> reverse.
> Providing a utility class for this may be better. I know all serialization
> should follow the IPC format so that data can be shared between different
> Arrow implementations. But for users who only use the Java API and want to do
> some further optimization, this seems to be no problem, and we could offer
> them one more option.
> This may benefit Java users who only use ValueVector rather than the IPC
> classes such as ArrowRecordBatch:
> * We could do some shuffle optimization, such as compression and encoding
> algorithms for numerical types, which could greatly improve performance.
> * Serialize/deserialize with the actual buffer size within the vector, since
> the allocated buffer size is a power of 2, which is bigger than what is
> really needed.
> * Reduce data conversion (VectorSchemaRoot, ArrowRecordBatch, etc.) to make
> it more user-friendly.
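
To make the question in #2 concrete, here is a minimal plain-JDK sketch of the difference between allocated and written bytes. It is not Arrow code: the class and method names (`UsedVsAllocated`, `nextPowerOfTwo`, `padTo8`, `writeUsed`) are hypothetical, and the 8-byte padding is an assumption about how a writer might align each buffer. It only illustrates that a writer which copies the used bytes does not need to emit the power-of-2 allocation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class UsedVsAllocated {

    /** Smallest power of two that is >= n, for n >= 1 (mimics a power-of-2 allocator). */
    public static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    /** Rounds n up to the next multiple of 8 (assumed alignment, for illustration only). */
    public static int padTo8(int n) {
        return (n + 7) & ~7;
    }

    /** Copies only the first usedBytes of the buffer, then zero-pads to a multiple of 8. */
    public static byte[] writeUsed(ByteBuffer allocated, int usedBytes) {
        ByteBuffer view = allocated.duplicate();
        view.clear();              // position = 0, limit = capacity
        view.limit(usedBytes);     // expose only the bytes that hold values
        byte[] used = new byte[usedBytes];
        view.get(used);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(used, 0, used.length);
        byte[] padding = new byte[padTo8(usedBytes) - usedBytes];
        out.write(padding, 0, padding.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int valueCount = 100;
        int usedBytes = valueCount * Integer.BYTES;      // bytes that actually hold values
        int allocatedBytes = nextPowerOfTwo(usedBytes);  // what the allocator hands out

        ByteBuffer buffer = ByteBuffer.allocate(allocatedBytes);
        for (int i = 0; i < valueCount; i++) {
            buffer.putInt(i);
        }

        byte[] serialized = writeUsed(buffer, usedBytes);
        System.out.println("allocated=" + allocatedBytes
                + " used=" + usedBytes
                + " written=" + serialized.length);
    }
}
```

With 100 four-byte values, the sketch allocates 512 bytes but writes 400. A unit test along these lines against the real writer, comparing the serialized length with the vector's used size, would show whether any allocated-but-unused space ends up on the wire.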