[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906887#comment-13906887 ]

Puneet Gupta commented on HIVE-5994:

Hi Prasanth,

I also tested with the patch mentioned in https://reviews.apache.org/r/16148/diff/ by merging the code into 0.12.0. It solves the issue :-). Thanks for the help.

ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits)

Key: HIVE-5994
URL: https://issues.apache.org/jira/browse/HIVE-5994
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Prasanth J
Assignee: Prasanth J
Labels: orcfile
Fix For: 0.13.0
Attachments: HIVE-5994.1.patch

For large negative BIGINTs, zigzag encoding yields a large 64-bit value with the MSB set to 1. The SerializationUtils.findClosestNumBits(long value) function interprets this value as negative. As a result, the total number of bits required is computed incorrectly, which leads to wrong encoding and decoding of values.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
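The issue description above can be illustrated with a minimal sketch. This is not the Hive source: the class and method names are hypothetical, and the "signed check" loop is only an assumption about how a bit counter could go wrong when the sign bit of a zigzag-encoded value is set.

```java
public class ZigzagBitWidthSketch {

    // Standard zigzag mapping: 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ...
    public static long zigzagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    // Hypothetical buggy counter: the signed comparison exits at once
    // when bit 63 is set, so it reports 0 bits for such values.
    public static int bitsSignedCheck(long v) {
        int count = 0;
        while (v > 0) {
            count++;
            v >>>= 1;
        }
        return count;
    }

    // Counter that treats the 64-bit pattern as unsigned.
    public static int bitsUnsignedCheck(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    public static void main(String[] args) {
        // Long.MIN_VALUE zigzag encodes to a pattern with all 64 bits set.
        long encoded = zigzagEncode(Long.MIN_VALUE);
        System.out.println(Long.toHexString(encoded));
        System.out.println("signed check:   " + bitsSignedCheck(encoded));
        System.out.println("unsigned check: " + bitsUnsignedCheck(encoded));
    }
}
```

In this sketch the signed check reports 0 bits for the encoded value while the unsigned check reports 64, which is the kind of mismatch the description attributes to findClosestNumBits.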
[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905871#comment-13905871 ]

Prasanth J commented on HIVE-5994:

Puneet,

This issue can happen with large positive values as well. The reason is that when the number of repetitions of a large number is >= 3 and <= 10, SHORT_REPEAT encoding is used:
https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java#L35
This encoding zigzag encodes the repeating value. So in your case, when 470327563395383L is zigzag encoded, the MSB (64th bit) is set, and because of this bug the value is treated as negative. I tested your test case with trunk and it works fine. Applying the patch attached to this JIRA should also work.
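The run-length condition in the comment above explains why 10 identical rows fail while 1 or 100 rows read back correctly. A rough sketch of that condition follows; the class, constants, and method are hypothetical names for illustration, and the real writer considers more than the run length when choosing an encoding.

```java
public class ShortRepeatSketch {

    // Bounds taken from the comment above: runs of 3 to 10 repeated values.
    static final int MIN_REPEAT = 3;
    static final int MAX_SHORT_REPEAT = 10;

    // True when a run of identical values falls in the SHORT_REPEAT range.
    public static boolean usesShortRepeat(int runLength) {
        return runLength >= MIN_REPEAT && runLength <= MAX_SHORT_REPEAT;
    }

    public static void main(String[] args) {
        // The row counts tried in this thread.
        for (int rows : new int[] {1, 10, 100}) {
            System.out.println(rows + " rows: short repeat = " + usesShortRepeat(rows));
        }
    }
}
```

Only the 10-row case lands in the SHORT_REPEAT range, so only that case goes through the affected code path.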
[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904141#comment-13904141 ]

Puneet Gupta commented on HIVE-5994:

Will it affect positive values also? I am trying to write the long value 470327563395383L and I see some issues while reading it back. When I write 10 rows of the same long value and read them back, I get 112 instead. When I write 1 or 100 rows of the same long value and read them back, I get the correct value! Not sure why.
[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904535#comment-13904535 ]

Prasanth J commented on HIVE-5994:

Hi Puneet,

I can't seem to reproduce your issue. Can you post the exact 10 rows that you are writing?
[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )
[ https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905058#comment-13905058 ]

Puneet Gupta commented on HIVE-5994:

Hi Prasanth,

This is the code I used to reproduce the issue.

1. I am using the Hive binary from hive-0.12.0.tar.gz.
2. I am using an old Hadoop version, hadoop-core-1.0.0.jar --- http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core
3. In the code below, if ROWS_TO_TEST is set to 1 or 100, the problem does not occur.

---
package hive;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterOptions;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class TestLong {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        // 10 rows triggers the problem; 1 or 100 rows does not.
        int ROWS_TO_TEST = 10;

        Path path = new Path("E:/Test/file.orc");
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        // Start from a clean file on every run.
        if (fs.exists(path))
            fs.delete(path, true);

        ObjectInspector inspector = ObjectInspectorFactory
                .getReflectionObjectInspector(MyData.class,
                        ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
        WriterOptions options = OrcFile.writerOptions(conf)
                .inspector(inspector).compress(CompressionKind.SNAPPY);

        // Write the same long value ROWS_TO_TEST times.
        Writer writer = OrcFile.createWriter(path, options);
        for (int i = 0; i < ROWS_TO_TEST; i++) {
            writer.addRow(new MyData());
        }
        writer.close();

        // Read every row back and print it.
        Reader reader = OrcFile.createReader(fs, path);
        RecordReader rows = reader.rows(null);
        Object row = null;
        while (rows.hasNext()) {
            row = rows.next(row);
            System.out.println(row);
        }
    }

    private static class MyData {
        long data = 470327563395383L;
    }
}
---

OUTPUT
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}