[https://issues.apache.org/jira/browse/PIG-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804038#comment-15804038]
Nandor Kollar edited comment on PIG-4930 at 1/6/17 1:20 PM:
------------------------------------------------------------
Sure, I can. Could you please assign the Jira to me?
was (Author: nkollar):
Sure, I can, could you please assign the Jira to?
> Skewed Join Breaks On Empty Sampled Input When Key is From Map
> --------------------------------------------------------------
>
> Key: PIG-4930
> URL: https://issues.apache.org/jira/browse/PIG-4930
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.9.2, 0.16.0
> Reporter: William Butler
> Assignee: Nandor Kollar
> Fix For: 0.17.0
>
> Attachments: PIG-4930.patch, empty_skew.diff
>
>
> When using a skewed join, if the left relation gets its key from a map and
> that relation is empty, the skewed join fails during the sampling phase
> with:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{tuple}(false) - scope-27 Operator Key: scope-27):
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POMapLookUp (Name: POMapLookUp[bytearray] - scope-14 Operator Key: scope-14) children: null at [null[3,17]]]:
> java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:314)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:280)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:275)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> I think the problem is more fundamental to Pig's skewed join implementation
> than to maps, but it is easily demonstrated with them. I have written an
> additional test in TestSkewedJoin that demonstrates the problem. The join
> works correctly if "using 'skewed'" is removed.
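The innermost cause in the trace above is a ClassCastException raised while POMapLookUp treats a field as a java.util.Map. The Java sketch below models only that failure mode: a map-key lookup that casts its input to Map unconditionally succeeds on a real map and throws when handed a String. The class MapLookupSketch and its lookup method are hypothetical and are not Pig's implementation; the sketch does not attempt to explain why the sampling job on an empty input passes a non-map value to the lookup.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of a map-key lookup step (not Pig's POMapLookUp code).
 * It reproduces only the failure mode seen in the trace: the step assumes
 * its input field is a java.util.Map and casts it unconditionally.
 */
public class MapLookupSketch {

    /** Casts the field to Map and returns the value stored under key. */
    public static Object lookup(Object field, String key) {
        // Unconditional cast: throws ClassCastException when field is not a Map.
        Map<?, ?> map = (Map<?, ?>) field;
        return map.get(key);
    }

    public static void main(String[] args) {
        // Case 1: the record's field really is a map, so the lookup succeeds.
        Map<String, Object> mapField = new HashMap<>();
        mapField.put("key", "value");
        System.out.println(lookup(mapField, "key")); // prints "value"

        // Case 2: the field is a String, standing in for the non-map value
        // that reaches the lookup in the reported failure.
        try {
            lookup("not-a-map", "key");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: String is not a Map");
        }
    }
}
```

Running main prints "value" for the map-backed field and then reports the ClassCastException for the String one. The sketch deliberately stops at the cast; it does not model the sampler or the attached patches.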
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)