[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-13 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-19479:
---
Fix Version/s: 3.0.0

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Fix For: 3.0.0, 3.1.0
>
> Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>
> The PositionProvider offset is not updated correctly and an error like this 
> may happen:
> {noformat}
> Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
>   at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
>   at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
>   at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
>   at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
>   at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
>   at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
>   at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
>   at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
> {noformat}
> We found that this happens when ORC writes an unusual stream combination: the 
> data stream for a row group (RG) has no values (all of its rows are null), but 
> the length stream still contains values (zeros) for the same rows. That is 
> technically a valid ORC file, although writing the zeros is completely useless. 
> This may be fixed separately in ORC, but since these files now exist in the 
> wild, we should handle them correctly.
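
The failure mode described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Hive or ORC code and not the attached patch: the classes Positions and Stream, the method reposition, and the numbers (one index position for DATA, two for LENGTH, a 16-byte LENGTH stream) are all hypothetical. It shows one way a shared position cursor can go out of step when an empty stream is skipped without consuming its index positions, so that the next stream is handed an offset that belongs to its neighbour.

```java
/** Simplified model of index-position bookkeeping; not Hive or ORC code. */
class SeekSketch {

    /** Hypothetical cursor over the index positions recorded for one row group. */
    static final class Positions {
        private final long[] values;
        private int next = 0;

        Positions(long... values) { this.values = values; }

        long getNext() { return values[next++]; }
    }

    /** Hypothetical stream that owns a fixed number of index positions. */
    static final class Stream {
        final String name;
        final long length;       // bytes available for this row group
        final int positionCount; // index positions this stream consumes per seek

        Stream(String name, long length, int positionCount) {
            this.name = name;
            this.length = length;
            this.positionCount = positionCount;
        }

        /** Reads this stream's positions and validates the byte offset. */
        long seek(Positions index) {
            long offset = index.getNext();
            for (int i = 1; i < positionCount; i++) {
                index.getNext(); // remaining positions, e.g. run-length state
            }
            if (offset > length) {
                throw new IllegalArgumentException(
                    "Seek in " + name + " to " + offset + " is outside of the data");
            }
            return offset;
        }

        /** Advances the cursor past this stream's positions without seeking. */
        void skip(Positions index) {
            for (int i = 0; i < positionCount; i++) {
                index.getNext();
            }
        }
    }

    /** Repositions DATA, then LENGTH; returns the offset applied to LENGTH. */
    static long reposition(Stream data, Stream length, Positions index, boolean skipEmpty) {
        if (data.length > 0) {
            data.seek(index);
        } else if (skipEmpty) {
            data.skip(index); // keep the cursor aligned even though nothing is read
        }
        // If the empty stream was not skipped, LENGTH reads DATA's index entry here.
        return length.seek(index);
    }

    public static void main(String[] args) {
        Stream data = new Stream("DATA", 0, 1);      // all rows null: no bytes
        Stream length = new Stream("LENGTH", 16, 2); // still holds zeros
        System.out.println(reposition(data, length, new Positions(541, 12, 0), true));
        try {
            reposition(data, length, new Positions(541, 12, 0), false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the aligned call applies offset 12 to LENGTH, while the misaligned call fails with the same kind of message as in the trace. Whether the real bug follows exactly this path is an assumption; the model only demonstrates why every stream must consume its own positions even when it has nothing to read.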



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

   Resolution: Fixed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

Committed to master for now

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Fix For: 3.1.0
>
> Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>





[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Description: 
The PositionProvider offset is not updated correctly and an error like this may 
happen:
{noformat}
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
{noformat}

We found that this happens when ORC writes an unusual stream combination: the 
data stream for a row group (RG) has no values (all of its rows are null), but 
the length stream still contains values (zeros) for the same rows. That is 
technically a valid ORC file, although writing the zeros is completely useless. 
This may be fixed separately in ORC, but since these files now exist in the 
wild, we should handle them correctly.

  was:
The PositionProvider offset is not updated correctly and an error like this may 
happen:
{noformat}
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
{noformat}

We found that this happens when ORC writes an unusual stream combination: the 
data stream for a row group (RG) has no values (all of its rows are null), but 
the length stream still contains values (zeros) for the same rows. That is 
technically a valid ORC file, although writing the zeros is completely useless. 


> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>

[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Description: 
The PositionProvider offset is not updated correctly and an error like this may 
happen:
{noformat}
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
{noformat}

We found that this happens when ORC writes an unusual stream combination: the 
data stream for a row group (RG) has no values (all of its rows are null), but 
the length stream still contains values (zeros) for the same rows. That is 
technically a valid ORC file, although writing the zeros is completely useless. 

  was:
The PositionProvider offset is not updated correctly and an error like this may 
happen:
{noformat}
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
{noformat}


> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>





[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Attachment: HIVE-19479.01.patch

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.01.patch, HIVE-19479.patch
>
>





[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Description: 
The PositionProvider offset is not updated correctly and an error like this may 
happen:
{noformat}
Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
    at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
    at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
    at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
    at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
{noformat}

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.patch
>
>





[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Attachment: HIVE-19479.patch

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.patch
>
>






[jira] [Updated] (HIVE-19479) encoded stream seek is incorrect for 0-length RGs in LLAP IO

2018-05-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-19479:

Status: Patch Available  (was: Open)

> encoded stream seek is incorrect for 0-length RGs in LLAP IO
> 
>
> Key: HIVE-19479
> URL: https://issues.apache.org/jira/browse/HIVE-19479
> Project: Hive
>  Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: HIVE-19479.patch
>
>



