[ 
https://issues.apache.org/jira/browse/HIVE-22856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-22856:
---------------------------------------
    Description: 
LlapArrowBatchRecordReader returns false when ArrowStreamReader's 
loadNextBatch returns a column vector of length 0. Instead, it should keep 
reading data until loadNextBatch itself returns false. Some batches may 
contain column vectors of length 0; the reader should ignore these and wait 
for the next batch.

A batch size of 0 is possible when a split read by the ORC reader contains 
only deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader 
sends a batch of size 0. Given a batch of size 0, VectorFileSinkArrowOperator 
creates a batch containing only metadata and sets the value count to 0. The 
client should ignore such a batch and wait for the next one.

  was:LlapArrowBatchRecordReader returns false when ArrowStreamReader's 
loadNextBatch returns a column vector of length 0. Instead, it should keep 
reading data until loadNextBatch itself returns false. Some batches may 
contain column vectors of length 0; the reader should ignore these and wait 
for the next batch.


> Hive LLAP LlapArrowBatchRecordReader skipping remaining batches when 
> ArrowStreamReader returns a 0 length batch.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22856
>                 URL: https://issues.apache.org/jira/browse/HIVE-22856
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-22856.01.patch, HIVE-22856.02.patch
>
>
> LlapArrowBatchRecordReader returns false when ArrowStreamReader's 
> loadNextBatch returns a column vector of length 0. Instead, it should keep 
> reading data until loadNextBatch itself returns false. Some batches may 
> contain column vectors of length 0; the reader should ignore these and wait 
> for the next batch.
> A batch size of 0 is possible when a split read by the ORC reader contains 
> only deleted or aborted data. In that case VectorizedOrcAcidRowBatchReader 
> sends a batch of size 0. Given a batch of size 0, VectorFileSinkArrowOperator 
> creates a batch containing only metadata and sets the value count to 0. The 
> client should ignore such a batch and wait for the next one.
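
The intended reader behavior described above can be sketched as follows. This 
is only an illustration of the skip-empty-batches loop, not the attached 
patch: BatchSource, FixedBatchSource, SkippingReader and currentRowCount() are 
hypothetical names invented for this example, and BatchSource merely stands in 
for ArrowStreamReader rather than reproducing the Arrow API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical stand-in for ArrowStreamReader; this is NOT the Arrow API. */
interface BatchSource {
    /** Returns false only when the stream has no more batches. */
    boolean loadNextBatch();

    /** Row count of the most recently loaded batch; may legitimately be 0. */
    int currentRowCount();
}

/** Hypothetical test double that replays a fixed sequence of batch sizes. */
class FixedBatchSource implements BatchSource {
    private final Deque<Integer> sizes = new ArrayDeque<>();
    private int current;

    FixedBatchSource(Integer... batchSizes) {
        sizes.addAll(Arrays.asList(batchSizes));
    }

    @Override
    public boolean loadNextBatch() {
        Integer next = sizes.poll();
        if (next == null) {
            return false; // end of stream
        }
        current = next;
        return true;
    }

    @Override
    public int currentRowCount() {
        return current;
    }
}

/** Sketch of a record reader that skips metadata-only (0-row) batches. */
class SkippingReader {
    private final BatchSource source;

    SkippingReader(BatchSource source) {
        this.source = source;
    }

    /** Advances to the next non-empty batch; false only at end of stream. */
    boolean next() {
        while (source.loadNextBatch()) {
            if (source.currentRowCount() > 0) {
                return true;
            }
            // Empty batch (e.g. a split with only deleted or aborted rows):
            // ignore it and keep reading instead of reporting end of data.
        }
        return false;
    }

    int rowCount() {
        return source.currentRowCount();
    }
}
```

With batch sizes 3, 0, 0, 5, 0, calling next() in a loop yields two batches 
totalling 8 rows. A reader that stopped at the first empty batch would return 
only the first 3 rows, which is the behavior this issue reports.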



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
