[jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy

Zhichun Wu (JIRA) Sun, 28 Sep 2014 10:23:58 -0700

    [ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151145#comment-14151145 ]


Zhichun Wu commented on HIVE-8151:
----------------------------------

We have some ORC tables generated by dynamic partitioning with GROUP BY. After
we upgraded Hive from 0.11 to 0.13 and used Hive 0.13 to produce new data, we
found that the data in these tables could not be read. Below is the exception
we get when we try to read the data:
{code}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:717)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1788)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2997)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:153)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:127)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339
{code}
After some troubleshooting we found this issue. We had to turn off this
optimization and recreate the tables, which finally fixed the problem. It seems
that this feature doesn't work well with GROUP BY (see also HIVE-6883).

After turning off this feature, some dynamic partition ETL jobs that generate
ORC tables started to run into OOM errors. We had to increase the reducer
memory to get them to pass. Hope this feature will become mature soon :)

> Dynamic partition sort optimization inserts record wrongly to partition when 
> used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Blocker
>         Attachments: HIVE-8151.1.patch, HIVE-8151.2.patch, HIVE-8151.3.patch
>
>
> HIVE-6455 added the dynamic partition sort optimization. It added a
> startGroup() method to the FileSink operator that looks for changes in the
> reduce key in order to create partition directories. This method, however, is
> not reliable, because the key passed to startGroup() is different from the
> key passed to processOp(). startGroup() is called with the newly changed key,
> whereas processOp() is called with the previously aggregated key. As a
> result, processOp() writes the last row of the previous group as the first
> row of the next group. This happens only when used with the group by operator.
> The fix is to not rely on startGroup() and do the partition directory
> creation in processOp() itself.
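
The key mismatch described in the quoted issue text can be sketched as a
small standalone simulation. This is not Hive code: the Sink class, the run()
helper, and the one-aggregate-row-per-key data model are hypothetical
stand-ins, and the real FileSink and GroupBy operators are considerably more
involved. The sketch only assumes what the description states, namely that
startGroup() sees the new key before processOp() receives the aggregated row
of the previous key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StartGroupMismatchDemo {

    // Hypothetical stand-in for a file sink; NOT Hive's FileSinkOperator.
    static class Sink {
        final boolean switchOnStartGroup;
        final Map<String, List<String>> partitions = new LinkedHashMap<>();
        String currentPartition;

        Sink(boolean switchOnStartGroup) {
            this.switchOnStartGroup = switchOnStartGroup;
        }

        // Called with the NEW key as soon as the reduce key changes.
        void startGroup(String newKey) {
            if (switchOnStartGroup) {
                currentPartition = newKey;
            }
        }

        // Called with the aggregated row of the PREVIOUS key.
        void processOp(String key, long count) {
            if (!switchOnStartGroup) {
                // Derive the partition from the row being written.
                currentPartition = key;
            }
            partitions.computeIfAbsent(currentPartition, k -> new ArrayList<>())
                      .add(key + "=" + count);
        }
    }

    // Simulates a group-by (count per key) feeding the sink over sorted keys.
    static Map<String, List<String>> run(boolean switchOnStartGroup,
                                         List<String> sortedKeys) {
        Sink sink = new Sink(switchOnStartGroup);
        String pending = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(pending)) {
                // The key-change signal reaches the sink first...
                sink.startGroup(key);
                // ...and only then is the previous aggregate flushed.
                if (pending != null) {
                    sink.processOp(pending, count);
                }
                pending = key;
                count = 0;
            }
            count++;
        }
        if (pending != null) {
            sink.processOp(pending, count);
        }
        return sink.partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "b", "b", "c");
        // prints: switch in startGroup: {b=[a=2], c=[b=3, c=1]}
        System.out.println("switch in startGroup: " + run(true, keys));
        // prints: switch in processOp:  {a=[a=2], b=[b=3], c=[c=1]}
        System.out.println("switch in processOp:  " + run(false, keys));
    }
}
```

In the first run the aggregated row for each key lands in the partition of
the following key, which mirrors the misplaced rows described above. In the
second run the partition is chosen from the row being written, in the spirit
of the fix that moves directory creation into processOp().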



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
