-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50816/
-----------------------------------------------------------

Review request for hive, Ashutosh Chauhan and Gopal V.


Repository: hive-git


Description
-------

HIVE-7239 Fix bug in HiveIndexedInputFormat implementation that causes
incorrect query results when the input is backed by Sequence/RC files

For sequence files, it is crucial that splits are calculated around the
boundaries enforced by the input sequence file. By default, however, Hadoop
creates input splits based on configuration parameters, and these splits may
not match the boundaries of the input sequence file. Hive's
HiveIndexedInputFormat adds extra logic that recalculates the boundaries of
each split according to the sequence file's boundaries.
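
As a rough illustration of what recalculating split boundaries means (a
minimal sketch, not the code in this patch and not Hive's or Hadoop's actual
implementation), the example below snaps each edge of a byte-range split
forward to the next known block boundary. The class SplitAligner, the method
align, and the boundary offsets are hypothetical names and values chosen for
the example.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Hive: snaps a byte-range split to block boundaries. */
final class SplitAligner {

    /**
     * Returns {alignedStart, alignedEnd} for the split [start, start + length).
     * Each edge moves forward to the first boundary at or after it; an edge
     * beyond the last boundary moves to fileLength.
     *
     * @param boundaries sorted ascending offsets at which a block of records begins
     */
    static long[] align(long start, long length, long[] boundaries, long fileLength) {
        long alignedStart = firstBoundaryAtOrAfter(start, boundaries, fileLength);
        long alignedEnd = firstBoundaryAtOrAfter(start + length, boundaries, fileLength);
        return new long[] {alignedStart, alignedEnd};
    }

    private static long firstBoundaryAtOrAfter(long offset, long[] boundaries, long fileLength) {
        int i = Arrays.binarySearch(boundaries, offset);
        if (i < 0) {
            i = -i - 1; // insertion point: index of the first element greater than offset
        }
        return i < boundaries.length ? boundaries[i] : fileLength;
    }

    public static void main(String[] args) {
        long[] boundaries = {0, 80, 160}; // assumed boundary offsets, for illustration only
        long fileLength = 200;
        // Two adjacent splits, as a purely size-based splitter might produce them.
        System.out.println(Arrays.toString(align(0, 100, boundaries, fileLength)));   // prints [0, 160]
        System.out.println(Arrays.toString(align(100, 100, boundaries, fileLength))); // prints [160, 200]
    }
}
```

Because both edges are moved by the same rule, adjacent input splits remain
adjacent after alignment, so no block is read by two splits and none is
skipped.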

However, we noticed "over"-reporting in query results for data backed by
sequence files. We reproduced and fixed the bug using a sample data set, and
verified the fix by comparing the query output for input in sequence file
format, RC file format, and regular format.

https://issues.apache.org/jira/browse/HIVE-7239


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexResult.java 33cc5c3 
  ql/src/java/org/apache/hadoop/hive/ql/index/HiveIndexedInputFormat.java 5247ece
  ql/src/java/org/apache/hadoop/hive/ql/index/IndexResult.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/index/SplitFilter.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/MockHiveInputSplits.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/index/MockIndexResult.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/MockInputFile.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/index/SplitFilterTestCase.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/index/TestHiveInputSplitComparator.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/index/TestSplitFilter.java PRE-CREATION 

Diff: https://reviews.apache.org/r/50816/diff/


Testing
-------

Manually tested on a cluster.

HiveQA:

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/674/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/674/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-674/


Thanks,

Illya Yalovyy
