[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2014-02-20 Thread Puneet Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906887#comment-13906887
 ] 

Puneet Gupta commented on HIVE-5994:


Hi Prasanth,
I also tested with the patch mentioned in 
https://reviews.apache.org/r/16148/diff/ by merging the code into 0.12.0. It 
solves the issue :-).

Thanks for the help.



 ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits)

              Key: HIVE-5994
              URL: https://issues.apache.org/jira/browse/HIVE-5994
          Project: Hive
       Issue Type: Bug
 Affects Versions: 0.13.0
         Reporter: Prasanth J
         Assignee: Prasanth J
           Labels: orcfile
          Fix For: 0.13.0

      Attachments: HIVE-5994.1.patch


 For large negative BIGINTs, zigzag encoding yields a large 64-bit value 
 with the MSB set to 1. This value is interpreted as a negative value in the 
 SerializationUtils.findClosestNumBits(long value) function. The result is a 
 wrong computation of the total number of bits required, which in turn causes 
 wrong encoding/decoding of values.
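
The effect described above can be illustrated with a minimal sketch. It assumes the bit-width helper counts bits with a signed "greater than zero" loop; that is one plausible way such a bug can arise, not a copy of the Hive source. The class and method names below are hypothetical.

```java
// Illustrative sketch only: names are hypothetical and this is not Hive's code.
class ZigZagBitsSketch {

    /** Standard 64-bit zigzag encoding: small magnitudes map to small codes. */
    static long zigzagEncode(long value) {
        return (value << 1) ^ (value >> 63);
    }

    /**
     * Bit count that treats the code as a signed long (assumed buggy variant).
     * A code with the MSB set is negative, so the loop body never runs.
     */
    static int bitsSignedView(long code) {
        int count = 0;
        while (code > 0) {
            count++;
            code >>>= 1;
        }
        return count;
    }

    /** Bit count that treats the code as an unsigned 64-bit quantity. */
    static int bitsUnsignedView(long code) {
        return 64 - Long.numberOfLeadingZeros(code);
    }

    public static void main(String[] args) {
        long code = zigzagEncode(Long.MIN_VALUE); // all 64 bits set
        System.out.println("signed view:   " + bitsSignedView(code));
        System.out.println("unsigned view: " + bitsUnsignedView(code));
    }
}
```

For codes whose MSB is clear, both views agree; they diverge only once the zigzag code needs all 64 bits, which is why the problem shows up only for values of very large magnitude.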



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2014-02-19 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905871#comment-13905871
 ] 

Prasanth J commented on HIVE-5994:
--

Puneet,

This issue can happen with large positive values as well. The reason is that 
when the number of repetitions of a large number is >= 3 and <= 10, 
SHORT_REPEAT encoding is used. 
https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java#L35

This encoding zigzag-encodes the repeating value. So in your case, when 
470327563395383L is zigzag encoded, the MSB (64th bit) is set, and because of 
this bug the value is treated as negative. 

I tested your test case with trunk and it works fine. Applying the patch 
attached to this JIRA should also work.
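
As a rough sketch of why 10 identical rows behave differently from 1 or 100, the run-length rule from this comment can be written as a small predicate. The thresholds 3 and 10 come from the comment; the class, method, and constant names are assumptions made for illustration, and the real writer takes more factors into account than run length.

```java
// Illustrative sketch only: a simplified view of when SHORT_REPEAT applies.
class RunLengthChoiceSketch {

    // Thresholds taken from the comment above (>= 3 and <= 10 repetitions).
    static final int MIN_REPEAT = 3;
    static final int MAX_SHORT_REPEAT_LENGTH = 10;

    /** Returns true when a run of identical values falls in the SHORT_REPEAT range. */
    static boolean usesShortRepeat(int runLength) {
        return runLength >= MIN_REPEAT && runLength <= MAX_SHORT_REPEAT_LENGTH;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {1, 10, 100}) {
            System.out.println(rows + " rows -> SHORT_REPEAT: " + usesShortRepeat(rows));
        }
    }
}
```

Under this simplified rule, only the 10-row case goes through the SHORT_REPEAT path, which matches the reported behaviour for 1, 10, and 100 rows.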



[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2014-02-18 Thread Puneet Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904141#comment-13904141
 ] 

Puneet Gupta commented on HIVE-5994:


Will it affect positive values also? I am trying to write the long value 
470327563395383L and I see some issues while reading it back.
When I write 10 rows of the same long value and read them back, I get the 
value 112 instead.
When I write 1 or 100 rows of the same long value and read them back, I get 
the correct value! Not sure why.





[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2014-02-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904535#comment-13904535
 ] 

Prasanth J commented on HIVE-5994:
--

Hi Puneet,

I can't seem to reproduce your issue. Can you post the exact 10 rows that you 
are writing?



[jira] [Commented] (HIVE-5994) ORC RLEv2 encodes wrongly for large negative BIGINTs (64 bits )

2014-02-18 Thread Puneet Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905058#comment-13905058
 ] 

Puneet Gupta commented on HIVE-5994:


Hi Prasanth,

This is the code I used to reproduce the issue. 
1. I am using the Hive binary from hive-0.12.0.tar.gz. 
2. I am using an old Hadoop version, hadoop-core-1.0.0.jar: 
http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-core
3. In the code below, if ROWS_TO_TEST is set to 1 or 100, the problem does 
not occur.

---
package hive;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.orc.CompressionKind;
import org.apache.hadoop.hive.ql.io.orc.OrcFile;
import org.apache.hadoop.hive.ql.io.orc.OrcFile.WriterOptions;
import org.apache.hadoop.hive.ql.io.orc.Reader;
import org.apache.hadoop.hive.ql.io.orc.RecordReader;
import org.apache.hadoop.hive.ql.io.orc.Writer;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;

public class TestLong {

    /**
     * Writes ROWS_TO_TEST rows holding the same long value to an ORC file,
     * then reads them back and prints each row.
     *
     * @param args unused
     * @throws IOException if the file cannot be written or read
     */
    public static void main(String[] args) throws IOException {
        // 10 rows reproduces the problem; 1 or 100 rows does not.
        int ROWS_TO_TEST = 10;
        Path path = new Path("E:/Test/file.orc");
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf);

        // Start from a clean file on every run.
        if (fs.exists(path)) {
            fs.delete(path, true);
        }

        ObjectInspector inspector = ObjectInspectorFactory
                .getReflectionObjectInspector(MyData.class,
                        ObjectInspectorFactory.ObjectInspectorOptions.JAVA);

        WriterOptions options = OrcFile.writerOptions(conf)
                .inspector(inspector)
                .compress(CompressionKind.SNAPPY);

        // Write the same value ROWS_TO_TEST times.
        Writer writer = OrcFile.createWriter(path, options);
        for (int i = 0; i < ROWS_TO_TEST; i++) {
            writer.addRow(new MyData());
        }
        writer.close();

        // Read every row back and print it.
        Reader reader = OrcFile.createReader(fs, path);
        RecordReader rows = reader.rows(null);
        Object row = null;
        while (rows.hasNext()) {
            row = rows.next(row);
            System.out.println(row);
        }
    }

    private static class MyData {
        long data = 470327563395383L;
    }
}
---
OUTPUT
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}
{112}

