[ 
https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910810#comment-13910810
 ] 

Szehon Ho commented on HIVE-6414:
---------------------------------

Hi Justin, I think this version of the fix looks more complete than mine; let's 
go with this one if it works.  

Just a couple of comments, though.  Which revision did you create your branch 
from?  Some recent changes in HIVE-5958 added new output to all of the q.out 
files, so the q.out in this patch may need to be regenerated on top of that.  

Also, does the query need a "sort by" after the "group by" to guarantee a 
deterministic result in the q.out file?  Something like the sketch below is 
what I have in mind.  Thanks.
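
Illustrative only, based on the alltypes_parquet repro query from the 
description (adapt it to whatever the committed .q test actually selects):

{noformat}
-- sketch: sort on the grouping key so the row order in the q.out file is stable
select ctinyint,
  max(cint),
  min(csmallint),
  count(cstring1),
  avg(cfloat),
  stddev_pop(cdouble)
from alltypes_parquet
group by ctinyint
sort by ctinyint;
{noformat}

An "order by ctinyint" would also do the job if a total ordering is preferred.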

> ParquetInputFormat provides data values that do not match the object 
> inspectors
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-6414
>                 URL: https://issues.apache.org/jira/browse/HIVE-6414
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.13.0
>            Reporter: Remus Rusanu
>            Assignee: Justin Coffey
>              Labels: Parquet
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6414.patch
>
>
> While working on HIVE-5998 I noticed that the ParquetRecordReader returns 
> IntWritable for all 'int like' types, which does not match the row object 
> inspectors. I thought that was fine and worked my way around it, but I now 
> see that the issue triggers failures in other places, e.g. in aggregates:
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"}
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>         ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790)
>         at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
>         ... 9 more
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to java.lang.Short
>         at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96)
>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
>         at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
>         ... 15 more
> {noformat}
> My test is as follows (I'm writing a test .q for HIVE-5998, but the repro 
> does not involve vectorization):
> {noformat}
> create table if not exists alltypes_parquet (
>   cint int,
>   ctinyint tinyint,
>   csmallint smallint,
>   cfloat float,
>   cdouble double,
>   cstring1 string) stored as parquet;
> insert overwrite table alltypes_parquet
>   select cint,
>     ctinyint,
>     csmallint,
>     cfloat,
>     cdouble,
>     cstring1
>   from alltypesorc;
> explain select * from alltypes_parquet limit 10;
> select * from alltypes_parquet limit 10;
> explain select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
> select ctinyint,
>   max(cint),
>   min(csmallint),
>   count(cstring1),
>   avg(cfloat),
>   stddev_pop(cdouble)
>   from alltypes_parquet
>   group by ctinyint;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
