java.lang.ClassCastException while using double value from result of a group
----------------------------------------------------------------------------
Key: PIG-1948
URL: https://issues.apache.org/jira/browse/PIG-1948
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.8.0, 0.7.0, 0.9.0
Reporter: Vivek Padmanabhan
I have a fairly simple script (though with a large number of columns) that fails
with a ClassCastException.
{code}
register myudf.jar;
A = load 'newinput' as (datestamp: chararray,vtestid: chararray,src_kt1:
chararray,f1: chararray,f2: chararray,f3: chararray,f4: chararray,f5:
chararray,f6: int,ipc: chararray,woeid: long,woeid_place: chararray,f7:
chararray,f8: double,woeid_latitude: double,f9: chararray,woeid_town:
chararray,woeid_county: chararray,a1: chararray,a2: chararray,woeid_country:
chararray,a3: chararray,connection_speed: chararray,isp_name:
chararray,isp_domain: chararray,ecnt: int,vcnt: int,ccnt: int,startts:
int,duration: int,endts: int,stqust: chararray,startqc: chararray,starts_con:
chararray,starts_lng: chararray,startv_pk1: int,startv_pk2: int,startv_pk3:
int,startv_pk4: int,startv_pk5: int,lastquerystring: chararray,lastqc:
chararray,lasts_con: chararray,lasts_lng: chararray,lastv_pk1: int,lastv_pk2:
int,lastv_pk3: int,lastv_pk4: int,lastv_pk5: int,b1: chararray,lastsection:
chararray,lastseclink: chararray,lasturl: chararray,path: chararray,pathtype:
chararray,firstlastquerymatch: int,log_duration: double,log_duration_sq:
double,duration_sq: double);
B = foreach A generate
datestamp,src_kt1,vtestid,stqust,ecnt,vcnt,ccnt,log_duration,duration;
C = group B by ( datestamp, src_kt1,vtestid, stqust ) parallel 4;
D = foreach C generate COUNT( B ) as total, MyEval( B.log_duration ) as
log_duration_summary;
store D into 'output';
{code}
The above script fails with the following ClassCastException:
{code}
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String
at org.apache.pig.data.BinInterSedes.readMap(BinInterSedes.java:193)
at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:280)
at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:111)
at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1376)
.
.
{code}
The problem occurs in the expression MyEval( B.log_duration ). Even though
log_duration is declared as a double field, BinInterSedes treats the value as a
map (TINYMAP, to be exact) when reading it back. It therefore tries to cast the
double value to the map key type, i.e. a String, which raises the exception
above. This bug exists in 0.9 also.
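To illustrate the symptom only, here is a minimal self-contained Java sketch of a
tag-dispatching reader. It is not Pig's code: the class name, the tag constants
and the byte layout are all hypothetical, and it does not explain why the wrong
tag is seen in the first place. It only shows that when a reader believes it is
inside a map and the next datum is a double, casting the "key" to String fails
in the way the stack trace reports.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class TagDispatchSketch {
    // Hypothetical type tags for this sketch; NOT Pig's actual constants.
    static final byte TAG_DOUBLE = 1;
    static final byte TAG_STRING = 2;
    static final byte TAG_TINYMAP = 3;

    // Reads one datum by dispatching on a leading type tag.
    static Object readDatum(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_DOUBLE:  return in.readDouble();
            case TAG_STRING:  return in.readUTF();
            case TAG_TINYMAP: return readMap(in);
            default: throw new IOException("unknown tag " + tag);
        }
    }

    // Reads a small map: one size byte, then (key, value) datum pairs.
    static Map<String, Object> readMap(DataInputStream in) throws IOException {
        int size = in.readUnsignedByte();
        Map<String, Object> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            // The key is assumed to be a String; a Double here throws
            // ClassCastException.
            String key = (String) readDatum(in);
            map.put(key, readDatum(in));
        }
        return map;
    }

    // Builds a stream in which a double is (wrongly) presented as a map key.
    static byte[] misTaggedStream() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(TAG_TINYMAP); // reader is told a map follows
        out.writeByte(1);           // claimed map size
        out.writeByte(TAG_DOUBLE);  // but the first "key" is a double
        out.writeDouble(3.5);
        return bytes.toByteArray();
    }

    // Returns the simple name of the failure raised while reading the stream.
    static String describeFailure() {
        try {
            readDatum(new DataInputStream(
                    new ByteArrayInputStream(misTaggedStream())));
            return "no failure";
        } catch (ClassCastException e) {
            return "ClassCastException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

In this sketch the failure comes entirely from the reader's belief that a map
follows; the double itself is written and read correctly.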