[ https://issues.apache.org/jira/browse/ARROW-17107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568051#comment-17568051 ]

David Li commented on ARROW-17107:
----------------------------------

All vectors that use offsets must have at least one offset (more specifically,
the number of offsets is always the number of values + 1; see
[the spec|https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout]),
so the check should account for large/regular binary, utf8, and list vectors.
It looks like it originally accounted only for regular binary/utf8; as you
found, we need to cover lists, and we should cover large binary/utf8/list as
well. (That said, I wonder why the check is needed at all, given that the
vectors should already follow the spec. Perhaps empty vectors don't allocate
any memory, as a micro-optimization?)
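
The offset rule above can be illustrated with a small, self-contained Java sketch. It does not use the Arrow library: `OffsetInvariant`, `buildOffsets`, `expectedOffsetCount`, and the `Kind` enum are hypothetical names (the enum constants only echo Arrow's minor-type names). It shows that a vector of n values carries n + 1 offsets, so even an empty vector has the single offset 0, and which layouts a writer's special case would need to cover. Offset width is ignored here; the large variants use 64-bit offsets.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical illustration of the "offsets = values + 1" rule; not Arrow code. */
class OffsetInvariant {

    /** Hypothetical stand-ins for vector layouts; names echo Arrow's minor types. */
    enum Kind { VARBINARY, VARCHAR, LARGEVARBINARY, LARGEVARCHAR, LIST, LARGELIST, DENSEUNION, INT }

    /** Layouts whose offset buffer holds valueCount + 1 entries. */
    static final Set<Kind> OFFSETS_PLUS_ONE = EnumSet.of(
            Kind.VARBINARY, Kind.VARCHAR,
            Kind.LARGEVARBINARY, Kind.LARGEVARCHAR,
            Kind.LIST, Kind.LARGELIST);

    /** Builds offsets from per-value lengths: offsets[0] = 0, offsets[i + 1] = offsets[i] + lengths[i]. */
    static int[] buildOffsets(int[] lengths) {
        int[] offsets = new int[lengths.length + 1]; // at least one entry, even when empty
        for (int i = 0; i < lengths.length; i++) {
            offsets[i + 1] = offsets[i] + lengths[i];
        }
        return offsets;
    }

    /** Number of offsets a writer should emit for a vector of the given kind and size. */
    static int expectedOffsetCount(Kind kind, int valueCount) {
        if (OFFSETS_PLUS_ONE.contains(kind)) {
            return valueCount + 1;
        }
        // Dense unions store one offset per value; INT has no offset buffer in this sketch.
        return kind == Kind.DENSEUNION ? valueCount : 0;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildOffsets(new int[] {2, 0, 3}))); // [0, 2, 2, 5]
        System.out.println(Arrays.toString(buildOffsets(new int[0])));          // [0]
        System.out.println(expectedOffsetCount(Kind.LARGELIST, 0));             // 1
    }
}
```

Keeping the covered layouts in one `EnumSet` makes it hard to forget a variant, which is the gap the comment points out (lists and the large types).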

> [Java] JSONFileWriter throws IOOBE writing an empty list
> --------------------------------------------------------
>
>                 Key: ARROW-17107
>                 URL: https://issues.apache.org/jira/browse/ARROW-17107
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 8.0.0
>            Reporter: James Henderson
>            Priority: Minor
>
> Hey folks,
> I'm trying to write an empty ListVector out through the `JsonFileWriter` and
> am getting an IOOBE (IndexOutOfBoundsException). The stack trace is as follows:
>  
> ```
> java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
>  at org.apache.arrow.memory.ArrowBuf.checkIndexD (ArrowBuf.java:318)
>     org.apache.arrow.memory.ArrowBuf.chk (ArrowBuf.java:305)
>     org.apache.arrow.memory.ArrowBuf.getInt (ArrowBuf.java:424)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeValueToGenerator (JsonFileWriter.java:270)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson (JsonFileWriter.java:237)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeFromVectorIntoJson (JsonFileWriter.java:253)
>     org.apache.arrow.vector.ipc.JsonFileWriter.writeBatch (JsonFileWriter.java:200)
>     org.apache.arrow.vector.ipc.JsonFileWriter.write (JsonFileWriter.java:190)
> ```
> It's trying to write the offset buffer of the list, which is empty. L224 of
> JsonFileWriter.java sets `bufferValueCount` to 1 (because we're not a dense
> union vector, DUV), so we enter the `for` loop. We don't hit the
> `valueCount=0` condition at L230 (because we're not a varbinary or varchar
> vector), so we fall into the `else`, which tries to read the 0th element of
> the offset buffer and throws the IOOBE.
> Could we include 'list' in either the L224 or the L230 checks?
> Admittedly, I'm not aware of the history of this section, but it seems that
> by the time we hit L230 (i.e., excluding DUV), any empty vector should yield
> a single 0?
> Let me know if there's any more info I can provide!
> Cheers,
> James
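
The control flow walked through in the report, and the rule proposed at its end, can be modeled in plain Java. This is a hypothetical sketch, not the actual `JsonFileWriter` code: `EmptyOffsetModel`, its `Kind` enum, and both methods are illustrative names. A zero-capacity `java.nio.ByteBuffer` stands in for an offset buffer that was never allocated, only 32-bit offsets are modeled, and "L224"/"L230" in the comments refer to the lines cited in the report.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the offset-writing loop described in the report; not the real
 * JsonFileWriter. A zero-capacity ByteBuffer stands in for an unallocated offset buffer.
 */
class EmptyOffsetModel {

    enum Kind { VARBINARY, VARCHAR, LIST, DENSEUNION }

    /** Reported behavior: only varbinary/varchar get the empty-vector special case. */
    static List<Integer> writeOffsetsAsReported(Kind kind, ByteBuffer offsets, int valueCount) {
        // Analogue of "L224": every kind except dense union writes valueCount + 1 offsets.
        int bufferValueCount = kind == Kind.DENSEUNION ? valueCount : valueCount + 1;
        List<Integer> written = new ArrayList<>();
        for (int i = 0; i < bufferValueCount; i++) {
            // Analogue of "L230".
            if (valueCount == 0 && (kind == Kind.VARBINARY || kind == Kind.VARCHAR)) {
                written.add(0);
            } else {
                // An empty list lands here and reads offset 0 from an empty buffer.
                written.add(offsets.getInt(i * Integer.BYTES));
            }
        }
        return written;
    }

    /** Proposed behavior: any empty non-dense-union vector yields a single 0. */
    static List<Integer> writeOffsetsProposed(Kind kind, ByteBuffer offsets, int valueCount) {
        int bufferValueCount = kind == Kind.DENSEUNION ? valueCount : valueCount + 1;
        List<Integer> written = new ArrayList<>();
        for (int i = 0; i < bufferValueCount; i++) {
            // For an empty dense union bufferValueCount is 0, so the loop never runs for it.
            written.add(valueCount == 0 ? 0 : offsets.getInt(i * Integer.BYTES));
        }
        return written;
    }

    public static void main(String[] args) {
        ByteBuffer empty = ByteBuffer.allocate(0);
        System.out.println(writeOffsetsAsReported(Kind.VARCHAR, empty, 0)); // [0]
        try {
            writeOffsetsAsReported(Kind.LIST, empty, 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException for an empty list");
        }
        System.out.println(writeOffsetsProposed(Kind.LIST, empty, 0)); // [0]

        // Non-empty list with offsets [0, 2, 5] (Arrow buffers are little-endian).
        ByteBuffer two = ByteBuffer.allocate(3 * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        two.putInt(0, 0).putInt(4, 2).putInt(8, 5);
        System.out.println(writeOffsetsProposed(Kind.LIST, two, 2)); // [0, 2, 5]
    }
}
```

Under these assumptions, the reported variant returns `[0]` for an empty varchar vector but throws `IndexOutOfBoundsException` for an empty list, while the proposed variant emits a single `0` for both without touching the buffer. The large binary/utf8/list variants would read 64-bit offsets (`getLong`) in the non-empty case.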



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
