[ 
https://issues.apache.org/jira/browse/ARROW-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rong Ma updated ARROW-8803:
---------------------------
    Summary: [Java] Row count should be set before loading buffers in 
VectorLoader  (was: [Java] Row count should be set before loading buffers In 
VectorLoader)

> [Java] Row count should be set before loading buffers in VectorLoader
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8803
>                 URL: https://issues.apache.org/jira/browse/ARROW-8803
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Rong Ma
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Hi guys! I'm new to the community, and I've been using Arrow for some time. 
> In my use case, I need to read RecordBatches with *compressed* underlying 
> buffers using Java's IPC API, and I'm currently blocked by VectorLoader's 
> "load" method. That method contains this call:
> {quote}{{root.setRowCount(recordBatch.getLength());}}
> {quote}
> This call not only sets the rowCount for the root, but also sets the 
> valueCount for each vector the root holds, *which has already been set once 
> while loading the buffers.*
> It's not a bug, I know. But if I try to load compressed buffers, I get the 
> following exception:
> {quote}java.lang.IndexOutOfBoundsException: index: 0, length: 512 (expected: range(0, 504))
>  at io.netty.buffer.ArrowBuf.checkIndex(ArrowBuf.java:718)
>  at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:965)
>  at org.apache.arrow.vector.BaseFixedWidthVector.reAlloc(BaseFixedWidthVector.java:439)
>  at org.apache.arrow.vector.BaseFixedWidthVector.setValueCount(BaseFixedWidthVector.java:708)
>  at org.apache.arrow.vector.VectorSchemaRoot.setRowCount(VectorSchemaRoot.java:226)
>  at org.apache.arrow.vector.VectorLoader.load(VectorLoader.java:61)
>  at org.apache.arrow.vector.ipc.ArrowReader.loadRecordBatch(ArrowReader.java:205)
>  at org.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:122)
> {quote}
> So I started to wonder whether it would make more sense to call 
> root.setRowCount before loadBuffers.
> root.setRowCount also calls each vector's setValueCount, which I think is 
> unnecessary here because the vectors are already fully formed once 
> loadBuffers has run.
> Existing upstream code in JsonFileReader already does something similar to 
> this change: 
> [link|https://github.com/apache/arrow/blob/ed1f771dccdde623ce85e212eccb2b573185c461/java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java#L170-L178]
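
The ordering argument in the report can be illustrated with a small toy model. This is only a sketch and not Arrow code: ToyVector, loadBuffer, and the two helper methods are hypothetical names, and the assumption of 64 rows of 8-byte values is invented to match the 512 and 504 byte figures in the exception message. It does not reproduce the exception itself; it only shows that setting the count after loading can make a vector replace a loaded buffer that is shorter than valueCount * width, while setting the count first leaves the loaded buffer as it was.

```java
import java.util.Arrays;

class RowCountOrderSketch {

    /** Hypothetical stand-in for a fixed-width vector of 8-byte values; not an Arrow class. */
    static final class ToyVector {
        static final int TYPE_WIDTH = 8;
        byte[] buffer = new byte[0];
        int valueCount;

        /** Adopts the given buffer unchanged, as a loader would when reading a batch. */
        void loadBuffer(byte[] loaded, int count) {
            buffer = loaded;
            valueCount = count;
        }

        /** Sets the count, replacing the buffer with a larger copy if it is too small. */
        void setValueCount(int count) {
            int required = count * TYPE_WIDTH;
            if (buffer.length < required) {
                buffer = Arrays.copyOf(buffer, required);
            }
            valueCount = count;
        }
    }

    /** Order described in the report: load the buffer first, then set the count. */
    static boolean loadThenSetCount(byte[] loaded, int rows) {
        ToyVector v = new ToyVector();
        v.loadBuffer(loaded, rows);
        v.setValueCount(rows);
        return v.buffer == loaded; // true only if the loaded buffer was left in place
    }

    /** Proposed order: set the count first, then load the buffer. */
    static boolean setCountThenLoad(byte[] loaded, int rows) {
        ToyVector v = new ToyVector();
        v.setValueCount(rows);
        v.loadBuffer(loaded, rows);
        return v.buffer == loaded; // true only if the loaded buffer was left in place
    }

    public static void main(String[] args) {
        // Assumed sizes: 64 rows * 8 bytes = 512 expected, but only 504 bytes loaded.
        byte[] shortBuffer = new byte[504];
        System.out.println("load, then set count keeps loaded buffer: "
                + loadThenSetCount(shortBuffer, 64));
        System.out.println("set count, then load keeps loaded buffer: "
                + setCountThenLoad(shortBuffer, 64));
    }
}
```

In this toy, the first order reports false (the loaded buffer is swapped for a reallocated copy) and the second reports true. Whether the real VectorLoader behaves the same way depends on Arrow internals that the sketch does not model.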



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
