[ 
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23034:
---------------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

[^HIVE-23034.01.patch] committed to master. Thanks [~ShubhamChaurasia] for 
fixing it and [~thejas] for review.

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-23034
>                 URL: https://issues.apache.org/jira/browse/HIVE-23034
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-23034.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, part of the writeList() method in the Arrow serializer is 
> implemented as follows - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference with {{final ArrowBuf offsetBuffer = 
> arrowVector.getOffsetBuffer();}} and then keep updating both the arrow vector 
> and the offset buffer through that reference. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and, once 
> a threshold is crossed, reallocates the offset and validity buffers, updates 
> its internal references, and releases the old buffers (which decrements their 
> reference count). The reference obtained in 1) is now stale. If we then try 
> to read or write the old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>       at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>       at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>       at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>       at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by fetching the buffer again ( 
> {{arrowVector.getOffsetBuffer()}} ) each time we want to update it, instead 
> of holding on to a reference obtained earlier. 
> In our internal tests, this failure is seen very frequently on Arrow 0.8.0 
> but not on 0.10.0. It should still be handled the same way on 0.10.0, since 
> that version does the same thing.
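
The failure mode and the proposed fix can be illustrated with a small self-contained sketch. It does not use Arrow or Netty: ToyBuf, ToyListVector and OffsetWriter are hypothetical stand-ins invented for this example, and they only model the behaviour described above (a vector that swaps in a larger offset buffer and releases the old one when an index crosses its capacity). The committed patch may differ in detail.

```java
/** Toy stand-in for a reference-counted buffer. NOT the real ArrowBuf. */
final class ToyBuf {
    private final int[] slots;
    private int refCnt = 1;

    ToyBuf(int capacity) { slots = new int[capacity]; }

    int capacity() { return slots.length; }

    void setInt(int index, int value) { ensureAccessible(); slots[index] = value; }

    int getInt(int index) { ensureAccessible(); return slots[index]; }

    void release() { refCnt--; }

    // Mimics the "refCnt: 0" failure seen when a released buffer is touched.
    private void ensureAccessible() {
        if (refCnt <= 0) {
            throw new IllegalStateException("refCnt: " + refCnt);
        }
    }
}

/** Toy stand-in for a list vector whose offset buffer is replaced as it grows. */
final class ToyListVector {
    private ToyBuf offsetBuffer = new ToyBuf(2);

    ToyBuf getOffsetBuffer() { return offsetBuffer; }

    /** Ensures slot index + 1 exists; on growth the old buffer is released. */
    void setNotNull(int index) {
        while (index + 1 >= offsetBuffer.capacity()) {
            ToyBuf bigger = new ToyBuf(offsetBuffer.capacity() * 2);
            for (int i = 0; i < offsetBuffer.capacity(); i++) {
                bigger.setInt(i, offsetBuffer.getInt(i));
            }
            offsetBuffer.release(); // references handed out earlier are now stale
            offsetBuffer = bigger;
        }
    }
}

final class OffsetWriter {
    /** Problem pattern: the buffer reference is obtained once and reused. */
    static void writeWithCachedBuffer(ToyListVector vector, int[] lengths) {
        final ToyBuf offsetBuffer = vector.getOffsetBuffer();
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            offsetBuffer.setInt(row, nextOffset); // throws once the vector has grown
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        offsetBuffer.setInt(lengths.length, nextOffset);
    }

    /** Proposed pattern: the current buffer is fetched right before each write. */
    static void writeWithFreshBuffer(ToyListVector vector, int[] lengths) {
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            vector.getOffsetBuffer().setInt(row, nextOffset);
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        vector.getOffsetBuffer().setInt(lengths.length, nextOffset);
    }
}
```

In this toy model, writeWithCachedBuffer fails on any input with three or more rows, because the second setNotNull call replaces the two-slot buffer, while writeWithFreshBuffer always writes to whichever buffer the vector currently owns.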



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
