[ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151145#comment-14151145 ]
Zhichun Wu commented on HIVE-8151:
----------------------------------

We have some ORC tables generated by dynamic partitioning with GROUP BY. After we upgraded Hive from 0.11 to 0.13 and used Hive 0.13 to produce new data, we found that the data in these tables could not be read. Below is the exception we get when we try to read the data:
{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339)
{code}
After some troubleshooting we found this issue. We had to turn off this optimization and recreate the tables, which finally fixed the problem. It seems that this feature doesn't work well with GROUP BY (see also HIVE-6883).

After turning off this feature, some dynamic partition ETL jobs that generate ORC tables started to run into OOM errors. We had to increase the reducer memory to get them to pass.

Hope this feature will become mature soon :)

> Dynamic partition sort optimization inserts record wrongly to partition when
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Blocker
>         Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch
>
>
> HIVE-6455 added dynamic partition sort optimization. It added startGroup()
> method to FileSink operator to look for changes in reduce key for creating
> partition directories. This method however is not reliable as the key called
> with startGroup() is different from the key called with processOp().
> startGroup() is called with newly changed key whereas processOp() is called
> with previously aggregated key. This will result in processOp() writing the
> last row of previous group as the first row of next group. This happens only
> when used with group by operator.
> The fix is to not rely on startGroup() and do the partition directory
> creation in processOp() itself.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
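
The startGroup()/processOp() ordering problem described in the quoted issue can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions: ToySink, run(), and the "key=count" row format are hypothetical names invented for illustration, not Hive's actual FileSinkOperator code, and the GROUP BY is reduced to a simple row count per key. The only behavior taken from the issue text is that startGroup() sees the newly changed key before processOp() delivers the aggregated row of the previous key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the ordering described in HIVE-8151. All names are hypothetical.
class StartGroupSketch {

    /** Toy sink: maps a partition directory name to the rows written into it. */
    static class ToySink {
        final boolean useStartGroup;
        final Map<String, List<String>> dirs = new LinkedHashMap<>();
        String currentDir;

        ToySink(boolean useStartGroup) {
            this.useStartGroup = useStartGroup;
        }

        /** Called when the reduce key changes, with the NEW key. */
        void startGroup(String newKey) {
            if (useStartGroup) {
                // Pre-fix behavior in this model: switch directory here.
                currentDir = newKey;
                dirs.computeIfAbsent(currentDir, k -> new ArrayList<>());
            }
        }

        /** Called with the aggregated row, which carries its own partition key. */
        void processOp(String partitionKey, long count) {
            if (!useStartGroup) {
                // Post-fix behavior in this model: pick the directory from the row itself.
                currentDir = partitionKey;
                dirs.computeIfAbsent(currentDir, k -> new ArrayList<>());
            }
            dirs.get(currentDir).add(partitionKey + "=" + count);
        }
    }

    /** Feeds keys (already sorted) through a toy count-per-key aggregation into the sink. */
    static Map<String, List<String>> run(boolean useStartGroup, String... sortedKeys) {
        ToySink sink = new ToySink(useStartGroup);
        String prevKey = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(prevKey)) {
                sink.startGroup(key);               // announced with the new key first
                if (prevKey != null) {
                    sink.processOp(prevKey, count); // previous group's aggregate arrives after
                }
                prevKey = key;
                count = 0;
            }
            count++;
        }
        if (prevKey != null) {
            sink.processOp(prevKey, count);         // flush the final group
        }
        return sink.dirs;
    }

    public static void main(String[] args) {
        System.out.println("startGroup-based: " + run(true, "a", "a", "b", "c", "c"));
        System.out.println("processOp-based:  " + run(false, "a", "a", "b", "c", "c"));
    }
}
```

In this model the startGroup-based variant leaves partition "a" empty and writes each aggregated row into the directory of the following key, while the processOp-based variant puts every row into the directory matching its own key, which mirrors the direction of the fix stated in the issue.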