[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-06 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676826#comment-13676826
 ] 

Jitendra Nath Pandey commented on HIVE-4665:


We should use Writables from org.apache.hadoop.hive.serde2.io.* as much as 
possible. Writables from org.apache.hadoop.io should be used only when Hive 
does not provide its own implementation.

Also, string values should use Text instead of BytesWritable.
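
To make the string advice concrete, here is a minimal sketch of why handing a BytesWritable to code that expects Text fails with the ClassCastException quoted in the issue description. It is illustrative only: TextStandIn, BytesWritableStandIn, and inspectorAccepts are hypothetical stand-ins invented for this sketch so that it runs without Hadoop on the classpath; they are not the real Hadoop or Hive classes, and the method only mimics an inspector that casts its argument.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: the nested types are hypothetical stand-ins, not the real Hadoop classes. */
final class StringWritableSketch {

    /** Stand-in for org.apache.hadoop.io.Text (assumption: holds UTF-8 bytes). */
    static final class TextStandIn {
        private final byte[] utf8;

        TextStandIn(String s) {
            this.utf8 = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(utf8, StandardCharsets.UTF_8);
        }
    }

    /** Stand-in for org.apache.hadoop.io.BytesWritable (assumption: holds raw bytes). */
    static final class BytesWritableStandIn {
        final byte[] bytes;

        BytesWritableStandIn(byte[] bytes) {
            this.bytes = bytes.clone();
        }
    }

    /** Mimics an inspector that assumes string values always arrive as Text. */
    static TextStandIn getPrimitiveWritableObject(Object o) {
        return (TextStandIn) o; // throws ClassCastException for any other non-null type
    }

    /** Returns true when the inspector accepts the value, false on a cast failure. */
    static boolean inspectorAccepts(Object value) {
        try {
            getPrimitiveWritableObject(value);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] raw = "2013-06-05".getBytes(StandardCharsets.UTF_8);
        // The producer emits raw bytes: the downstream cast is rejected.
        System.out.println(inspectorAccepts(new BytesWritableStandIn(raw)));
        // The producer emits the type the inspector expects: the cast is accepted.
        System.out.println(inspectorAccepts(new TextStandIn("2013-06-05")));
    }
}
```

The point of the sketch is only that the producer and the inspector must agree on the concrete Writable type; it says nothing about how the actual patch implements that.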

 error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
 --

 Key: HIVE-4665
 URL: https://issues.apache.org/jira/browse/HIVE-4665
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Jitendra Nath Pandey

 CREATE EXTERNAL TABLE FactSqlEngineAM4712(
   dAppVersionBuild int, dAppVersionBuildUNMAPPED32449 int,
   dAppVersionMajor int, dAppVersionMinor32447 int,
   dAverageCols23083 int, dDatabaseSize23090 int,
   dDate string, dIsInternalMSFT16431 int,
   dLockEscalationDisabled23323 int, dLockEscalationEnabled23324 int,
   dMachineID int, dNumberTables23008 int,
   dNumCompressionPagePartitions23088 int, dNumCompressionRowPartitions23089 int,
   dNumIndexFragmentation23084 int, dNumPartitionedTables23098 int,
   dNumPartitions23099 int, dNumTablesClusterIndex23010 int,
   dNumTablesHeap23100 int, dSessionType5618 int,
   dSqlEdition8213 int, dTempDbSize23103 int,
   mNumColumnStoreIndexesVar48171 bigint, mOccurrences int, mRowFlag int)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
 LOCATION '/user/ehans/SQM';

 create table FactSqlEngineAM_vec_ORC
 ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
 stored as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.CommonOrcInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 AS select * from FactSqlEngineAM4712;

 hive> select ddate, max(dnumbertables23008) from factsqlengineam_vec_orc group by ddate;
 Total MapReduce jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 3
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapred.reduce.tasks=<number>
 Validating if vectorized execution is applicable
 Going down the vectorization path
 java.lang.InstantiationException: org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector
 Continuing ...
 java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
 Continuing ...
 java.lang.InstantiationException: org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator
 Continuing ...
 java.lang.Exception: XMLEncoder: discarding statement ArrayList.add(VectorGroupByOperator);
 Continuing ...
 Starting Job = job_201306041757_0016, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016
 Kill Command = c:\Hadoop\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job  -kill job_201306041757_0016
 Hadoop job information for Stage-1: number of mappers: 8; number of reducers: 3
 2013-06-05 10:03:06,022 Stage-1 map = 0%,  reduce = 0%
 2013-06-05 10:03:51,142 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_201306041757_0016 with errors
 Error during job, obtaining debugging information...
 Job Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201306041757_0016
 Examining task ID: task_201306041757_0016_m_09 (and more) from job job_201306041757_0016
 Task with the most failures(4):
 -
 Task ID:
   task_201306041757_0016_m_00
 URL:
   http://localhost:50030/taskdetails.jsp?jobid=job_201306041757_0016&tipid=task_201306041757_0016_m_00
 -
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: Hive Runtime Error while closing operators
     at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229)
     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
     at org.apache.hadoop.mapred.Child.main(Child.java:265)
 Caused by: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.io.Text
     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveWritableObject(WritableStringObjectInspector.java:40)

[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-06 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676851#comment-13676851
 ] 

Jitendra Nath Pandey commented on HIVE-4665:


Patch uploaded.
Review board: https://reviews.apache.org/r/11666/

 error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
 --

 Key: HIVE-4665
 URL: https://issues.apache.org/jira/browse/HIVE-4665
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-4665.1.patch



[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677331#comment-13677331
 ] 

Eric Hanson commented on HIVE-4665:
---

Looks good!


[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677364#comment-13677364
 ] 

Eric Hanson commented on HIVE-4665:
---

I applied this patch to my branch and ran both the queries above, and they both 
run successfully now.


[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-05 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676452#comment-13676452
 ] 

Eric Hanson commented on HIVE-4665:
---

Similar error occurs for this query:

select avg(disinternalmsft16431) from factsqlengineam_vec_orc;

Error:
Diagnostic Messages for this Task:
java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:229)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:271)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.Child.main(Child.java:265)
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.DoubleWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DoubleWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:35)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:340)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:534)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serializeStruct(LazyBinarySerDe.java:257)
    at org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe.serialize(LazyBinarySerDe.java:204)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:245)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:502)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:832)
    at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:253)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
    at org.apache.hadoop.hive.ql.exec.vector.VectorExecMapper.close(VectorExecMapper.java:196)
    ... 8 more


[jira] [Commented] (HIVE-4665) error at VectorExecMapper.close in group-by-agg query over ORC, vectorized

2013-06-05 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676553#comment-13676553
 ] 

Eric Hanson commented on HIVE-4665:
---

I started working on this and was able to get

select avg(disinternalmsft16431) from factsqlengineam_vec_orc;

to run by importing DoubleWritable in VectorUDAFAvg.txt like so:

import org.apache.hadoop.hive.serde2.io.DoubleWritable;

instead of importing org.apache.hadoop.io.DoubleWritable.
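
The import matters because two classes that share a simple name but live in different packages are different types to the JVM. Below is a minimal sketch of that effect, not Hive code: HadoopIo and HiveSerde2Io are hypothetical holder classes standing in for the two packages, and tryGet is a helper invented for the illustration. The sketch models the two DoubleWritable types as unrelated, which is an assumption; the thread itself only shows that the cast fails.

```java
/** Illustrative only: holder classes stand in for packages so the sketch runs without Hadoop or Hive. */
final class SameNameDifferentPackageSketch {

    /** Hypothetical stand-in for the org.apache.hadoop.io package. */
    static final class HadoopIo {
        static final class DoubleWritable {
            private final double value;

            DoubleWritable(double value) {
                this.value = value;
            }

            double get() {
                return value;
            }
        }
    }

    /** Hypothetical stand-in for the org.apache.hadoop.hive.serde2.io package. */
    static final class HiveSerde2Io {
        static final class DoubleWritable {
            private final double value;

            DoubleWritable(double value) {
                this.value = value;
            }

            double get() {
                return value;
            }
        }
    }

    /** Mimics an inspector that casts its argument to the Hive-side type. */
    static double get(Object o) {
        return ((HiveSerde2Io.DoubleWritable) o).get();
    }

    /** Reports whether the inspector could read the value. */
    static String tryGet(Object o) {
        try {
            return "ok: " + get(o);
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        // Same simple name, wrong "package": the cast fails at run time.
        System.out.println(tryGet(new HadoopIo.DoubleWritable(0.5)));
        // The type the inspector expects: the value is read normally.
        System.out.println(tryGet(new HiveSerde2Io.DoubleWritable(0.5)));
    }
}
```

Because both names compile equally well wherever the value is only passed around as an Object, a mismatch of this kind typically surfaces only at run time, at the point where the consumer casts.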


 error at VectorExecMapper.close in group-by-agg query over ORC, vectorized
 --

 Key: HIVE-4665
 URL: https://issues.apache.org/jira/browse/HIVE-4665
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: vectorization-branch
Reporter: Eric Hanson
Assignee: Eric Hanson
