[ 
https://issues.apache.org/jira/browse/HIVE-23034?focusedWorklogId=405235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-405235
 ]

ASF GitHub Bot logged work on HIVE-23034:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Mar/20 05:45
            Start Date: 18/Mar/20 05:45
    Worklog Time Spent: 10m 
      Work Description: ShubhamChaurasia commented on pull request #957: 
HIVE-23034: Arrow serializer should not keep the reference of arrow offset and 
validity buffers
URL: https://github.com/apache/hive/pull/957
 
 
   …ffset and validity buffers
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 405235)
    Remaining Estimate: 0h
            Time Spent: 10m

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-23034
>                 URL: https://issues.apache.org/jira/browse/HIVE-23034
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Serializers/Deserializers
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, part of the writeList() method in the Arrow serializer is 
> implemented as follows - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
>     selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
>     offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>     nextOffset += (int) hiveVector.lengths[selectedIndex];
>     arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference once, before the loop, with 
> {{final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();}} and then 
> keep updating both the arrow vector and the offset buffer. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} checks the index on every call and 
> reallocates the offset and validity buffers when a threshold is crossed. It 
> updates the references internally and also releases the old buffers (which 
> decrements their reference count). The reference obtained in 1) then becomes 
> stale, and if we try to read or write the old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>       at io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>       at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>       at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>       at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>       at org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by fetching the buffer again 
> ( {{arrowVector.getOffsetBuffer()}} ) each time we want to update it, instead 
> of keeping a reference to it. 
> In our internal tests, the failure is seen very frequently on arrow 0.8.0 and 
> not on 0.10.0, but 0.10.0 should be handled the same way since it does the 
> same thing.
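
The difference between the two access patterns can be illustrated with a small, self-contained simulation. This is only a sketch and does not use the Arrow library: RefCountedBuf, SimListVector and OffsetWriterSketch are hypothetical stand-ins invented for this example, offsets are indexed by element rather than by byte (so there is no OFFSET_WIDTH), null rows and the validity buffer are left out, and a plain IllegalStateException takes the place of the netty exception in the stack trace above.

```java
/** Hypothetical stand-in for a reference-counted buffer (not the real ArrowBuf). */
final class RefCountedBuf {
    private final int[] data;
    private int refCnt = 1;

    RefCountedBuf(int capacity) { data = new int[capacity]; }

    int capacity() { return data.length; }

    void setInt(int index, int value) { ensureAccessible(); data[index] = value; }

    int getInt(int index) { ensureAccessible(); return data[index]; }

    void release() { refCnt--; }

    // Stand-in for the reference count check that fails in the stack trace above.
    private void ensureAccessible() {
        if (refCnt <= 0) {
            throw new IllegalStateException("refCnt: " + refCnt);
        }
    }
}

/** Hypothetical stand-in for a list vector that owns its offset buffer. */
final class SimListVector {
    private RefCountedBuf offsets = new RefCountedBuf(4);

    RefCountedBuf getOffsetBuffer() { return offsets; }

    /** Assumed behaviour: grows the buffer when needed and releases the old one. */
    void setNotNull(int index) {
        while (index + 1 >= offsets.capacity()) {
            RefCountedBuf bigger = new RefCountedBuf(offsets.capacity() * 2);
            for (int i = 0; i < offsets.capacity(); i++) {
                bigger.setInt(i, offsets.getInt(i));
            }
            offsets.release();   // any reference cached by a caller is now stale
            offsets = bigger;
        }
    }
}

final class OffsetWriterSketch {
    /** Pattern from the issue: the buffer reference is fetched once and reused. */
    static void writeWithCachedBuffer(SimListVector vector, int[] lengths) {
        RefCountedBuf offsetBuffer = vector.getOffsetBuffer();
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            offsetBuffer.setInt(row, nextOffset);   // throws once the buffer was released
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        offsetBuffer.setInt(lengths.length, nextOffset);
    }

    /** Proposed pattern: ask the vector for its current buffer before every write. */
    static void writeWithFreshBuffer(SimListVector vector, int[] lengths) {
        int nextOffset = 0;
        for (int row = 0; row < lengths.length; row++) {
            vector.getOffsetBuffer().setInt(row, nextOffset);
            nextOffset += lengths[row];
            vector.setNotNull(row);
        }
        vector.getOffsetBuffer().setInt(lengths.length, nextOffset);
    }

    /** Reads back the first count offsets from the vector's current buffer. */
    static int[] readOffsets(SimListVector vector, int count) {
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = vector.getOffsetBuffer().getInt(i);
        }
        return out;
    }
}
```

With six list lengths and an initial capacity of four, the simulated vector reallocates partway through the loop. writeWithCachedBuffer then fails on its next write to the released buffer, while writeWithFreshBuffer completes because every write goes to whichever buffer the vector currently holds.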



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
