[ 
https://issues.apache.org/jira/browse/ARROW-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108163#comment-17108163
 ] 

Liya Fan commented on ARROW-8803:
---------------------------------

As you have indicated, {{root.setRowCount}} calls {{setValueCount}} methods for 
the underlying vectors, and the {{setValueCount}} methods may involve 
allocation for the underlying vectors. 

If we move the {{root.setRowCount}} call ahead of the buffer loading, it will 
lead to unnecessary vector allocations, because the underlying buffers will be 
populated shortly afterwards, when the record batch is loaded.
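
To make the ordering concern concrete, here is a minimal, self-contained sketch. It does not use the Arrow classes: {{ToyVector}}, {{loadBuffer}} and the helper methods are hypothetical stand-ins, and the only assumption is that setting a value count allocates when the current buffer is too small. It counts allocations for both call orders, and for a loaded buffer that is smaller than the row count requires.

```java
import java.util.List;

public class RowCountOrderSketch {

    /** Hypothetical stand-in for a fixed-width vector; this is not the Arrow API. */
    static final class ToyVector {
        private final int typeWidth;          // bytes per value
        private byte[] buffer = new byte[0];
        private int valueCount;
        private int allocations;              // times setValueCount had to allocate

        ToyVector(int typeWidth) {
            this.typeWidth = typeWidth;
        }

        /** Grows the buffer when it is too small for the requested value count. */
        void setValueCount(int count) {
            int needed = count * typeWidth;
            if (buffer.length < needed) {
                byte[] grown = new byte[needed];
                System.arraycopy(buffer, 0, grown, 0, buffer.length);
                buffer = grown;
                allocations++;
            }
            valueCount = count;
        }

        /** Models loading a buffer from a record batch: the old buffer is dropped. */
        void loadBuffer(byte[] loaded) {
            buffer = loaded;
        }

        int allocations() {
            return allocations;
        }
    }

    /** Hypothetical stand-in for the root: forwards the row count to every vector. */
    static void setRowCount(List<ToyVector> vectors, int rows) {
        for (ToyVector v : vectors) {
            v.setValueCount(rows);
        }
    }

    /** Row count is set first, then the buffer is loaded. */
    static int allocationsRowCountFirst(int rows, int loadedBytes) {
        ToyVector v = new ToyVector(4);
        setRowCount(List.of(v), rows);
        v.loadBuffer(new byte[loadedBytes]);
        return v.allocations();
    }

    /** Buffer is loaded first, then the row count is set. */
    static int allocationsRowCountLast(int rows, int loadedBytes) {
        ToyVector v = new ToyVector(4);
        v.loadBuffer(new byte[loadedBytes]);
        setRowCount(List.of(v), rows);
        return v.allocations();
    }

    public static void main(String[] args) {
        System.out.println(allocationsRowCountFirst(128, 512)); // prints 1
        System.out.println(allocationsRowCountLast(128, 512));  // prints 0
        System.out.println(allocationsRowCountLast(128, 504));  // prints 1
    }
}
```

In this toy model, setting the row count first always pays for one allocation that the load then throws away. Setting it last costs nothing when the loaded buffer is already large enough, and only allocates when the loaded buffer is smaller than the row count requires, which is roughly the situation described for compressed buffers.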

In fact, we are working on support for data compression in IPC scenarios 
(ARROW-8672). We hope it will solve your problem. 

> [Java] Row count should be set before loading buffers in VectorLoader
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8803
>                 URL: https://issues.apache.org/jira/browse/ARROW-8803
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Rong Ma
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Hi guys! I'm new to the community, and I've been using Arrow for some time. 
> In my use case, I need to read RecordBatch with *compressed* underlying 
> buffers using Java's IPC API, and I'm finally blocked by the VectorLoader's 
> "load" method. In this method,
> {quote}{{root.setRowCount(recordBatch.getLength());}}
> {quote}
> It not only sets the rowCount for the root, but also sets the valueCount for 
> the vectors the root holds, *which has already been set once when the buffers 
> were loaded.*
> It's not a bug... I know. But if I try to load some compressed buffers, I 
> will get the following exceptions:
> {quote}java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
>  at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
>  at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
>  at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
>  at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
>  at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
>  at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
>  at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
>  at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
> {quote}
> I'm starting to think it would make more sense to call root.setRowCount 
> before loading the buffers.
> root.setRowCount also calls each vector's setValueCount, which I think is 
> unnecessary here since the vectors are already fully formed once their 
> buffers have been loaded.
> Another existing piece of code upstream is similar to this change. 
> [link|https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
