Join operation fails for some queries
-------------------------------------

                 Key: HIVE-106
                 URL: https://issues.apache.org/jira/browse/HIVE-106
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.19.0
            Reporter: Josh Ferguson


The tables are:

CREATE TABLE activities 
(actor_id STRING, actee_id STRING, properties MAP<STRING, STRING>) 
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
CLUSTERED BY (actor_id, actee_id) INTO 32 BUCKETS 
ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:activities,dbName:default,owner:Josh,createTime:1228208598,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:actor_id,type:string,comment:null),
FieldSchema(name:actee_id,type:string,comment:null),
FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/activities,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[actor_id, actee_id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})


CREATE TABLE users 
(id STRING, properties MAP<STRING, STRING>) 
PARTITIONED BY (account STRING, application STRING, dataset STRING, hour INT) 
CLUSTERED BY (id) INTO 32 BUCKETS 
ROW FORMAT DELIMITED 
COLLECTION ITEMS TERMINATED BY '44'
MAP KEYS TERMINATED BY '58'
STORED AS TEXTFILE;

Detailed Table Information:
Table(tableName:users,dbName:default,owner:Josh,createTime:1228208633,lastAccessTime:0,retention:0,sd:StorageDescriptor(cols:[FieldSchema(name:id,type:string,comment:null),
FieldSchema(name:properties,type:map<string,string>,comment:null)],location:/user/hive/warehouse/users,inputFormat:org.apache.hadoop.mapred.TextInputFormat,outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,compressed:false,numBuckets:32,serdeInfo:SerDeInfo(name:null,serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,parameters:{colelction.delim=44,mapkey.delim=58,serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),bucketCols:[id],sortCols:[],parameters:{}),partitionKeys:[FieldSchema(name:account,type:string,comment:null),
FieldSchema(name:application,type:string,comment:null),
FieldSchema(name:dataset,type:string,comment:null),
FieldSchema(name:hour,type:int,comment:null)],parameters:{})
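Both tables store the "properties" column as delimited text, using COLLECTION ITEMS TERMINATED BY '44' and MAP KEYS TERMINATED BY '58' (recorded as colelction.delim=44 and mapkey.delim=58). The sketch below is a hypothetical helper, not Hive's DynamicSerDe code. It assumes the numeric strings are read as decimal byte values (44 is ',' and 58 is ':') and shows how an invented sample cell such as verb:Dance,target:bob would decode so that properties['verb'] yields 'Dance'.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not Hive code: decodes one text-encoded
// MAP<STRING, STRING> cell using the delimiters from the DDL.
// Assumption: '44' and '58' are decimal byte values (',' and ':').
public class MapColumnParser {
    // COLLECTION ITEMS TERMINATED BY '44' -> byte 44 -> ','
    static final char COLLECTION_DELIM = (char) Byte.parseByte("44");
    // MAP KEYS TERMINATED BY '58' -> byte 58 -> ':'
    static final char MAPKEY_DELIM = (char) Byte.parseByte("58");

    static Map<String, String> parse(String cell) {
        Map<String, String> result = new LinkedHashMap<>();
        if (cell.isEmpty()) {
            return result;
        }
        // Split the cell into key/value entries on the collection delimiter.
        String[] entries = cell.split(Pattern.quote(String.valueOf(COLLECTION_DELIM)));
        for (String entry : entries) {
            int sep = entry.indexOf(MAPKEY_DELIM);
            if (sep < 0) {
                // This sketch's choice: an entry without ':' maps its key to null.
                result.put(entry, null);
            } else {
                result.put(entry.substring(0, sep), entry.substring(sep + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> props = parse("verb:Dance,target:bob");
        // Mirrors the predicate activities.properties['verb'] = 'Dance'.
        System.out.println("Dance".equals(props.get("verb"))); // prints true
    }
}
```

How the real serde treats malformed entries is not covered by this sketch.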

A working query is:

SELECT activities.* FROM activities WHERE activities.dataset='poke' AND 
activities.properties['verb'] = 'Dance';

A non-working query is:

SELECT activities.*, users.* FROM activities LEFT OUTER JOIN users ON 
activities.actor_id = users.id WHERE activities.dataset='poke' AND 
activities.properties['verb'] = 'Dance';

The exception is:

java.lang.RuntimeException: Hive 2 Internal error: cannot evaluate index expression on string
        at org.apache.hadoop.hive.ql.exec.ExprNodeIndexEvaluator.evaluate(ExprNodeIndexEvaluator.java:64)
        at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
        at org.apache.hadoop.hive.ql.exec.ExprNodeFuncEvaluator.evaluate(ExprNodeFuncEvaluator.java:72)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:67)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:262)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:257)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:477)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:467)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:507)
        at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:489)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:140)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:430)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

This is thrown every time in the first phase of reduction.

-- 
This message is automatically generated by JIRA.