java.lang.ClassCastException while using double value from result of a group
----------------------------------------------------------------------------

                 Key: PIG-1948
                 URL: https://issues.apache.org/jira/browse/PIG-1948
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.0, 0.7.0, 0.9.0
            Reporter: Vivek Padmanabhan


I have a fairly simple script (though with many columns) that fails with a
ClassCastException.


{code}
register myudf.jar;
A = load 'newinput' as (datestamp: chararray, vtestid: chararray, src_kt1: chararray,
    f1: chararray, f2: chararray, f3: chararray, f4: chararray, f5: chararray,
    f6: int, ipc: chararray, woeid: long, woeid_place: chararray, f7: chararray,
    f8: double, woeid_latitude: double, f9: chararray, woeid_town: chararray,
    woeid_county: chararray, a1: chararray, a2: chararray, woeid_country: chararray,
    a3: chararray, connection_speed: chararray, isp_name: chararray,
    isp_domain: chararray, ecnt: int, vcnt: int, ccnt: int, startts: int,
    duration: int, endts: int, stqust: chararray, startqc: chararray,
    starts_con: chararray, starts_lng: chararray, startv_pk1: int, startv_pk2: int,
    startv_pk3: int, startv_pk4: int, startv_pk5: int, lastquerystring: chararray,
    lastqc: chararray, lasts_con: chararray, lasts_lng: chararray, lastv_pk1: int,
    lastv_pk2: int, lastv_pk3: int, lastv_pk4: int, lastv_pk5: int, b1: chararray,
    lastsection: chararray, lastseclink: chararray, lasturl: chararray, path: chararray,
    pathtype: chararray, firstlastquerymatch: int, log_duration: double,
    log_duration_sq: double, duration_sq: double);

B = foreach A generate datestamp, src_kt1, vtestid, stqust, ecnt, vcnt, ccnt, log_duration, duration;
C = group B by ( datestamp, src_kt1, vtestid, stqust ) parallel 4;
D = foreach C generate COUNT( B ) as total, MyEval( B.log_duration ) as log_duration_summary;
store D into 'output';

{code}

The script above fails with the following ClassCastException:

{code}
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String
        at org.apache.pig.data.BinInterSedes.readMap(BinInterSedes.java:193)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:280)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
        at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:111)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
        at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
        at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
        at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
        at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
        at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
        at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1376)
        .
        .
{code}

The problem occurs at MyEval( B.log_duration ). Although log_duration is
declared as a double field, BinInterSedes reads it as a map value (TINYMAP, to
be exact). It then tries to cast the double value to the map key type, i.e. a
String. This bug also exists in 0.9.
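The following minimal, hypothetical Java sketch shows how a type-tagged reader produces this kind of exception. It is NOT Pig's BinInterSedes. The class name TaggedSedesSketch, the tag values and the wire layout are all assumptions made for illustration. The sketch hand-builds a stream in which a TINYMAP header precedes double datums, which simulates a reader that misinterprets the type byte. A map reader that assumes String keys then fails at the cast, much like BinInterSedes.readMap in the trace above. This does not establish why Pig reads the wrong tag.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical type-tagged serializer; NOT Pig's BinInterSedes.
// Tag values and wire layout are assumptions made for illustration.
final class TaggedSedesSketch {
    static final byte DOUBLE = 1;
    static final byte STRING = 2;
    static final byte TINYMAP = 3;

    // Writes a one-byte type tag followed by the value.
    static void writeDatum(DataOutputStream out, Object o) throws IOException {
        if (o instanceof Double) {
            out.writeByte(DOUBLE);
            out.writeDouble((Double) o);
        } else if (o instanceof String) {
            out.writeByte(STRING);
            out.writeUTF((String) o);
        } else {
            throw new IllegalArgumentException("unsupported type: " + o.getClass());
        }
    }

    // Dispatches on the type tag; the reader trusts the tag completely.
    static Object readDatum(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case DOUBLE:  return in.readDouble();
            case STRING:  return in.readUTF();
            case TINYMAP: return readMap(in);
            default: throw new IOException("unknown type tag " + tag);
        }
    }

    // A TINYMAP is a one-byte entry count followed by key/value datums.
    // Keys are assumed to be Strings, so a non-String key fails at the cast.
    static Map<String, Object> readMap(DataInputStream in) throws IOException {
        int size = in.readUnsignedByte();
        Map<String, Object> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            String key = (String) readDatum(in); // ClassCastException if the key is a Double
            map.put(key, readDatum(in));
        }
        return map;
    }

    // Normal case: write a datum and read it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            writeDatum(new DataOutputStream(buf), value);
            return readDatum(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Failure case: a TINYMAP header followed by double datums, simulating
    // a reader that has misinterpreted the type byte.
    static Object readMisalignedMap(double value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(TINYMAP);
            out.writeByte(1);       // one entry
            writeDatum(out, value); // read back as the "key"
            writeDatum(out, value); // read back as the value
            return readDatum(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, roundTrip(1.5) returns 1.5. readMisalignedMap(2.0) throws a ClassCastException from the (String) cast in readMap, because the reader trusts the tag byte and has no schema to check it against.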

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
