[ https://issues.apache.org/jira/browse/HIVE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4221:
------------------------------

Attachment: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch

sxyuan requested code review of "HIVE-4221 [jira] Stripe-level merge for ORC files".
Reviewers: kevinwilfong, omalley

As with RC files, we would like to be able to merge ORC files efficiently by reading/writing stripes without deserializing each row. Most of the logic is shared with the existing RC file merge, so the original code has been refactored for reuse.

TEST PLAN
Copied the RC file merge tests and modified them to use the ORC file format. Added a test case to TestOrcFile to make sure file-level column statistics are merged properly.

REVISION DETAIL
https://reviews.facebook.net/D9759

AFFECTED FILES
data/files/smbbucket_1.orc
data/files/smbbucket_3.orc
data/files/smbbucket_2.orc
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
ql/src/test/results/clientpositive/orc_createas1.q.out
ql/src/test/results/clientpositive/orcfile_merge3.q.out
ql/src/test/results/clientpositive/orcfile_merge2.q.out
ql/src/test/results/clientpositive/alter_merge_orc2.q.out
ql/src/test/results/clientpositive/alter_merge_orc.q.out
ql/src/test/results/clientpositive/orcfile_merge1.q.out
ql/src/test/results/clientpositive/orcfile_merge4.q.out
ql/src/test/results/clientpositive/alter_merge_orc_stats.q.out
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
ql/src/test/queries/clientpositive/orcfile_merge2.q
ql/src/test/queries/clientpositive/orcfile_merge3.q
ql/src/test/queries/clientpositive/alter_merge_orc.q
ql/src/test/queries/clientpositive/orcfile_merge4.q
ql/src/test/queries/clientpositive/alter_merge_orc_stats.q
ql/src/test/queries/clientpositive/orcfile_merge1.q
ql/src/test/queries/clientpositive/alter_merge_orc2.q
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java
ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java
ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
ql/src/java/org/apache/hadoop/hive/ql/parse/AlterTablePartMergeFilesDesc.java
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/MergeWork.java
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeOutputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileMergeMapper.java
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/BlockMergeTask.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeRecordReader.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcBlockMergeInputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcMergeMapper.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/StripeReader.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
ql/src/java/org/apache/hadoop/hive/ql/io/merge
ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeWork.java
ql/src/java/org/apache/hadoop/hive/ql/io/merge/MergeMapper.java
ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeOutputFormat.java
ql/src/java/org/apache/hadoop/hive/ql/io/merge/BlockMergeTask.java

To: kevinwilfong, omalley, sxyuan
Cc: JIRA

> Stripe-level merge for ORC files
> --------------------------------
>
>                 Key: HIVE-4221
>                 URL: https://issues.apache.org/jira/browse/HIVE-4221
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4221.HIVE-4221.HIVE-4221.HIVE-4221.D9759.1.patch
>
>
> As with RC files, we would like to be able to merge ORC files efficiently by
> reading/writing stripes without decompressing/recompressing them. This will
> be similar to the RC file merge, except that footers will have to be updated
> with the stripe positions in the new file.
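The footer update described above can be illustrated with a minimal sketch. It assumes a simplified model in which each input file is just a list of stripe entries (offset, length, row count) plus file-level statistics for a single integer column. StripeMergeSketch, StripeEntry, IntColumnStats, InputFile, MergedFooter and merge are hypothetical names, not part of Hive's ORC reader/writer API. Because stripe bytes are assumed to be copied unchanged, only each stripe's offset is rewritten. File-level statistics are combined by summing counts and sums and taking the overall min and max. The sketch also assumes the output begins with the 3-byte "ORC" magic, so the first stripe starts at offset 3.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of stripe-level merge bookkeeping: stripes from several
 * files are appended byte-for-byte, so the merged footer only needs new
 * stripe offsets and combined statistics. Illustrative types only; this is
 * NOT the org.apache.hadoop.hive.ql.io.orc API.
 */
public class StripeMergeSketch {

    /** Hypothetical footer entry: where a stripe starts, its size, its row count. */
    record StripeEntry(long offset, long length, long numberOfRows) {}

    /** Hypothetical file-level statistics for one integer column. */
    record IntColumnStats(long count, long min, long max, long sum) {
        /** Combines two files' statistics without reading any rows. */
        IntColumnStats merge(IntColumnStats other) {
            return new IntColumnStats(
                count + other.count,
                Math.min(min, other.min),
                Math.max(max, other.max),
                sum + other.sum);
        }
    }

    /** Hypothetical view of one input file's footer. */
    record InputFile(List<StripeEntry> stripes, IntColumnStats stats) {}

    /** Hypothetical footer of the merged output file. */
    record MergedFooter(List<StripeEntry> stripes, IntColumnStats stats, long totalRows) {}

    /**
     * Appends every input's stripes in order, starting right after a header of
     * headerLength bytes. Lengths and row counts are kept as-is; only offsets
     * change. Returns null stats if the input list is empty.
     */
    static MergedFooter merge(List<InputFile> inputs, long headerLength) {
        List<StripeEntry> out = new ArrayList<>();
        long nextOffset = headerLength;
        long totalRows = 0;
        IntColumnStats stats = null;
        for (InputFile in : inputs) {
            for (StripeEntry s : in.stripes()) {
                // Rebase the stripe to its position in the new file.
                out.add(new StripeEntry(nextOffset, s.length(), s.numberOfRows()));
                nextOffset += s.length();
                totalRows += s.numberOfRows();
            }
            stats = (stats == null) ? in.stats() : stats.merge(in.stats());
        }
        return new MergedFooter(out, stats, totalRows);
    }

    public static void main(String[] args) {
        InputFile a = new InputFile(
            List.of(new StripeEntry(3, 1000, 50), new StripeEntry(1003, 800, 40)),
            new IntColumnStats(90, -5, 70, 1200));
        InputFile b = new InputFile(
            List.of(new StripeEntry(3, 500, 30)),
            new IntColumnStats(30, 2, 99, 900));
        MergedFooter m = merge(List.of(a, b), 3);
        System.out.println(m.stripes());
        System.out.println(m.stats());
        System.out.println(m.totalRows());
    }
}
```

With these two inputs, the single stripe from the second file is rebased from offset 3 to offset 1803, directly after the 1000- and 800-byte stripes of the first file. The sketch does not check that the inputs share a schema or compression codec, which copying stripes unchanged would presumably require.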