[jira] Commented: (HIVE-106) Join operation fails for some queries

2009-07-27 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735675#action_12735675
 ] 

Namit Jain commented on HIVE-106:
-

Josh, this problem should have been fixed. Can you provide a test case? 
Otherwise, I will close this issue.

 Join operation fails for some queries
 -

 Key: HIVE-106
 URL: https://issues.apache.org/jira/browse/HIVE-106
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Josh Ferguson
Assignee: Namit Jain
Priority: Critical

 The Tables Are
 CREATE TABLE activities 
 (actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
  FieldSchema(name:actee_id,type:string,comment:null), 
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id,
  
 actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
 CREATE TABLE users 
 (id STRING, properties MAP<STRING, STRING>) 
 PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
 CLUSTERED BY (id) INTO 32 BUCKETS 
 ROW FORMAT DELIMITED 
 COLLECTION ITEMS TERMINATED BY '44'
 MAP KEYS TERMINATED BY '58'
 STORED AS TEXTFILE;
 Detailed Table Information:
 Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
  
 FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
  FieldSchema(name:application,type:string,comment:null), 
 FieldSchema(name:dataset,type:string,comment:null), 
 FieldSchema(name:hour,type:int,comment:null)],parameters:{})
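
 For anyone putting together a test case, the delimiters in the two table definitions above (44 and 58) are the ASCII codes for ',' and ':'. The sketch below shows how a data line for each table could be built in Python. It is only an illustration: the field delimiter is assumed to be Hive's default Ctrl-A, since the DDL does not set one, and the function names and sample values are hypothetical, not taken from the reporter's data.

```python
# Sketch: build sample text rows matching the table definitions quoted above.
FIELD_DELIM = "\x01"   # assumption: Hive's default field delimiter (Ctrl-A)
ITEM_DELIM = chr(44)   # ',' per COLLECTION ITEMS TERMINATED BY '44'
KEY_DELIM = chr(58)    # ':' per MAP KEYS TERMINATED BY '58'


def encode_map(props):
    """Serialize a dict as key:value pairs joined by the collection delimiter."""
    return ITEM_DELIM.join(f"{k}{KEY_DELIM}{v}" for k, v in props.items())


def activity_row(actor_id, actee_id, props):
    """One line for the activities table (partition columns are not stored in the file)."""
    return FIELD_DELIM.join([actor_id, actee_id, encode_map(props)])


def user_row(user_id, props):
    """One line for the users table."""
    return FIELD_DELIM.join([user_id, encode_map(props)])


if __name__ == "__main__":
    # Hypothetical sample values, chosen so the join key and the 'verb' filter both match.
    print(repr(activity_row("u1", "u2", {"verb": "Dance"})))
    print(repr(user_row("u1", {"name": "Josh"})))
```

 Writing such lines to files and loading them into matching partitions of both tables would give a small input for the two queries that follow.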
 A working query is
 SELECT activities.* FROM activities WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 A non working query is
 SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
 activities.actor_id = users.id WHERE activities.dataset='poke' AND 
 activities.properties['verb'] = 'Dance';
 The Exception Is
 java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string
   at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
   at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
   at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
   at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
   at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
   at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
   

[jira] Commented: (HIVE-106) Join operation fails for some queries

2009-01-29 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668711#action_12668711
 ] 

Namit Jain commented on HIVE-106:
-

Josh, can you provide the data files for the tables activities and users for 
which the query was failing?
