[ https://issues.apache.org/jira/browse/PIG-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-4930:
------------------------------------
    Attachment: PIG-4930-2.patch

> Skewed Join Breaks On Empty Sampled Input When Key is From Map
> --------------------------------------------------------------
>
>                 Key: PIG-4930
>                 URL: https://issues.apache.org/jira/browse/PIG-4930
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2, 0.16.0
>            Reporter: William Butler
>            Assignee: Nandor Kollar
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-4930-2.patch, PIG-4930.patch, empty_skew.diff
>
>
> When using a skewed join, if the left relation gets its key from a map and
> said relation is empty, then the skewed join fails during the sampling phase
> with:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception
> while executing (Name: Local Rearrange[tuple]{tuple}(false) - scope-27
> Operator Key: scope-27):
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception
> while executing [POMapLookUp (Name: POMapLookUp[bytearray] - scope-14
> Operator Key: scope-14) children: null at [null[3,17]]]:
> java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> I think the problem is more fundamental to Pig's skewed join implementation
> than maps, but it is easily demonstrable with them. I have written an
> additional test in TestSkewedJoin that demonstrates the problem. The join
> works correctly if we remove "using 'skewed'".

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)