[ https://issues.apache.org/jira/browse/DATAFU-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151117#comment-14151117 ]
Jarek Jarcec Cecho commented on DATAFU-68:
------------------------------------------
Thank you for the review [~matterhayes]!
> SampleByKey can throw NullPointerException
> ------------------------------------------
>
> Key: DATAFU-68
> URL: https://issues.apache.org/jira/browse/DATAFU-68
> Project: DataFu
> Issue Type: Bug
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.3.0
>
> Attachments: DATAFU-68.patch, DATAFU-68.patch
>
>
> I've noticed that {{SampleByKey}} can throw {{NullPointerException}}:
> {code}
> Caused by: java.lang.NullPointerException
>     at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
>     at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
>     at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
>     at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
>     at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
>     at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
>     at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
>     at org.apache.pig.PigServer.storeEx(PigServer.java:978)
>     at org.apache.pig.PigServer.store(PigServer.java:942)
>     at org.apache.pig.Pig
> {code}
> I've reproduced the behaviour on the old 1.1.0 release, but the UDF in
> question has not changed much since then, so I'm assuming that trunk is
> affected in the same way. The script that reproduces the issue is simple:
> {code}
> grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5');
> grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
> grunt> out = FILTER data BY SampleByKey(A_id);
> grunt> DUMP out;
> {code}
> The problem seems to be that the {{setUDFContextSignature}} method can be
> called with a {{null}} argument, which breaks our code. The documentation for
> this method does not specify whether {{null}} is allowed. I've looked at
> other UDFs in Pig, and they appear to handle the case where the signature is
> {{null}}, so I've decided to fix {{SampleByKey}} as well.
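For illustration only, here is a minimal sketch of the kind of null guard described above. It is not the attached patch and not the real {{SampleByKey}} code: the class {{NullSafeSignatureSketch}}, the {{salt}} field, and the hashing in {{accept}} are hypothetical stand-ins, and the class deliberately does not extend any Pig type so that it compiles on its own. The only thing taken from the report is the callback name {{setUDFContextSignature}} and the fact that it may receive {{null}}.

```java
// Hypothetical stand-in for a sampling UDF; it does not extend any Pig class.
class NullSafeSignatureSketch {
    // Salt mixed into the per-key hash; stays empty until a signature arrives.
    private String salt = "";

    // Same name as the callback in the stack trace.
    // Assumption: the caller may pass null, as observed in the report.
    public void setUDFContextSignature(String signature) {
        if (signature == null) {
            return; // keep the default salt instead of dereferencing null
        }
        salt = signature;
    }

    public String getSalt() {
        return salt;
    }

    // Deterministic per-key decision: the same key and salt always agree.
    // The hashing here is illustrative, not what SampleByKey actually does.
    public boolean accept(String key, double probability) {
        int h = (salt + key).hashCode() & 0x7fffffff; // force non-negative
        double bucket = h / (double) Integer.MAX_VALUE; // in [0, 1]
        return bucket <= probability;
    }

    public static void main(String[] args) {
        NullSafeSignatureSketch udf = new NullSafeSignatureSketch();
        udf.setUDFContextSignature(null); // no NullPointerException
        System.out.println("accepted: " + udf.accept("some-key", 0.5));
    }
}
```

The guard simply keeps a default value when no signature is supplied, so any later code that reads the salt never sees {{null}}.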
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)