[jira] Commented: (HIVE-106) Join operation fails for some queries
[ https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735675#action_12735675 ] Namit Jain commented on HIVE-106: - Josh, this problem should have been fixed - can you provide a testcase ? otherwise, i will close this issue Join operation fails for some queries - Key: HIVE-106 URL: https://issues.apache.org/jira/browse/HIVE-106 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Josh Ferguson Assignee: Namit Jain Priority: Critical The Tables Are CREATE TABLE activities (actor_id STRING, actee_id STRING, properties MAPSTRING, STRING) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null), FieldSchema(name:actee_id,type:string,comment:null), FieldSchema(name:properties,type:mapstring,string,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) CREATE TABLE users (id STRING, properties MAPSTRING, STRING) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null), FieldSchema(name:properties,type:mapstring,string,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) A working query is SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; A non working query is SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; The Exception Is java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
[jira] Commented: (HIVE-106) Join operation fails for some queries
[ https://issues.apache.org/jira/browse/HIVE-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668711#action_12668711 ] Namit Jain commented on HIVE-106: - Josh, can you provide the data files for the tables activities and users which was failing Join operation fails for some queries - Key: HIVE-106 URL: https://issues.apache.org/jira/browse/HIVE-106 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Josh Ferguson Assignee: Namit Jain Priority: Critical The Tables Are CREATE TABLE activities (actor_id STRING, actee_id STRING, properties MAPSTRING, STRING) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null), FieldSchema(name:actee_id,type:string,comment:null), FieldSchema(name:properties,type:mapstring,string,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) CREATE TABLE users (id STRING, properties MAPSTRING, STRING) PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) CLUSTERED BY (id) INTO 32 BUCKETS ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '44' MAP KEYS TERMINATED BY '58' STORED AS TEXTFILE; Detailed Table Information: Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null), FieldSchema(name:properties,type:mapstring,string,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null), FieldSchema(name:application,type:string,comment:null), FieldSchema(name:dataset,type:string,comment:null), FieldSchema(name:hour,type:int,comment:null)],parameters:{}) A working query is SELECT activities.* FROM activities WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; A non working query is SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON activities.actor_id = users.id WHERE activities.dataset='poke' AND activities.properties['verb'] = 'Dance'; The Exception Is java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72) at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262) at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467) at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140) at