ClassCastException when using IsEmpty(DIFF())
----------------------------------------------
Key: PIG-2144
URL: https://issues.apache.org/jira/browse/PIG-2144
Project: Pig
Issue Type: Bug
Reporter: Mitesh Singh Jat

I have the following input in <name>:<nickname> format, from which I want to find the records where the name differs from the nickname.

{code:title=input/name_nickname.txt}
Bharat:Bharat
Amita:Amita
Mitesh:Mitesh
Reenu:Anshu
Shikha:Shikhu
Shilpa:Shilpi
{code}

I use the following script to find those records.

{code:title=isEmpty_diff.pig}
A = LOAD 'input/name_nickname.txt' using PigStorage(':');
B = FILTER A BY NOT IsEmpty(DIFF($0, $1));
DUMP B;
{code}

The script works with older Pig versions (e.g. 0.8.0 (r1043805)) and gives the following output:

{code:title=output of isEmpty_diff.pig}
(Reenu,Anshu)
(Shikha,Shikhu)
(Shilpa,Shilpi)
{code}

However, the same script (isEmpty_diff.pig) fails with a ClassCastException on Pig 0.9 (e.g. 0.9.0.1105251322 (r1127671)) and on newer builds of Pig 0.8 (e.g. 0.8.0.1105131316 (r1102885)):

{code:title=ClassCastException}
java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to java.lang.Boolean
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:75)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:269)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:676)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
{code}

As a workaround, I used the following Pig script, which computes DIFF in a separate FOREACH before filtering on it.

{code:title=isEmpty_diff2.pig}
A = LOAD 'input/name_nickname.txt' using PigStorage(':');
--B = FILTER A BY NOT IsEmpty(DIFF($0, $1));
B1 = FOREACH A GENERATE $0, $1, DIFF($0, $1);
B2 = FILTER B1 BY NOT IsEmpty($2);
B = FOREACH B2 GENERATE $0, $1;
DUMP B;
{code}
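
For reference, the result expected from the filter can be modelled outside Pig. The following is only a rough Python sketch, not Pig code. It assumes that DIFF on two scalar fields yields an empty bag when the values match and a bag holding both values otherwise, and that IsEmpty is true for an empty bag. The names diff, is_empty, load and names_differing_from_nicknames are hypothetical stand-ins introduced for illustration, not Pig APIs.

```python
# Illustrative model only: diff(), is_empty() and load() are hypothetical
# Python stand-ins for Pig's DIFF, IsEmpty and LOAD ... PigStorage(':').
# They are assumptions made for this sketch, not Pig APIs.

def diff(left, right):
    """Model DIFF on two scalar fields: an empty bag (list) when the values
    match, otherwise a bag holding one tuple per mismatched value."""
    return [] if left == right else [(left,), (right,)]


def is_empty(bag):
    """Model IsEmpty: True when the bag contains no tuples."""
    return len(bag) == 0


def load(lines, sep=":"):
    """Model LOAD with PigStorage(':'): split each line into a tuple."""
    return [tuple(line.split(sep)) for line in lines]


# Same records as input/name_nickname.txt in the report.
INPUT = [
    "Bharat:Bharat",
    "Amita:Amita",
    "Mitesh:Mitesh",
    "Reenu:Anshu",
    "Shikha:Shikhu",
    "Shilpa:Shilpi",
]


def names_differing_from_nicknames(lines):
    """Model: B = FILTER A BY NOT IsEmpty(DIFF($0, $1));"""
    a = load(lines)
    return [t for t in a if not is_empty(diff(t[0], t[1]))]


if __name__ == "__main__":
    for row in names_differing_from_nicknames(INPUT):
        print(row)
```

Under these assumptions the model keeps only the Reenu, Shikha and Shilpa records, which matches the output reported for the older Pig version and for the workaround script.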