[ https://issues.apache.org/jira/browse/DATAFU-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14151110#comment-14151110 ]
Matthew Hayes commented on DATAFU-68:
-------------------------------------

Looks good to me :) Pushed the commit.

> SampleByKey can throw NullPointerException
> ------------------------------------------
>
> Key: DATAFU-68
> URL: https://issues.apache.org/jira/browse/DATAFU-68
> Project: DataFu
> Issue Type: Bug
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.3.0
>
> Attachments: DATAFU-68.patch, DATAFU-68.patch
>
>
> I've noticed that {{SampleByKey}} can throw {{NullPointerException}}:
> {code}
> Caused by: java.lang.NullPointerException
>   at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
>   at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
>   at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
>   at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
>   at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
>   at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
>   at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
>   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
>   at org.apache.pig.PigServer.storeEx(PigServer.java:978)
>   at org.apache.pig.PigServer.store(PigServer.java:942)
>   at org.apache.pig.Pig
> {code}
> I've reproduced the behaviour on old 1.1.0 version, but the UDF in question
> did not change much since then and hence I'm assuming that trunk will be
> affected the same way. Script that reproduces the issue is simple:
> {code}
> grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5');
> grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
> grunt> out = FILTER data BY SampleByKey(A_id);
> grunt> DUMP out;
> {code}
> The problem seems to be that method {{setUDFContextSignature}} can be called
> with {{null}} argument that breaks our code. The documentation for this
> method is not specific whether {{null}} is or isn't allowed. I've looked into
> other UDFs in Pig and it seems that they are handling the case when signature
> is {{null}} and hence I've decided to fix {{SampleByKey}} as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
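
For readers following the issue, the null guard described above might look roughly like the sketch below. This is not the contents of DATAFU-68.patch. It is a standalone illustration that does not depend on Pig, and it assumes the method derives some state from the signature. The class name `NullSafeSignatureSketch`, the `seed` field, `DEFAULT_SEED`, and `getSeed()` are all hypothetical. Only the method name `setUDFContextSignature(String)` comes from the issue.

```java
/**
 * Standalone sketch of a null-tolerant setUDFContextSignature.
 * It does NOT extend Pig's EvalFunc and is NOT the actual DataFu fix;
 * the seed logic is a hypothetical stand-in for whatever state the
 * real UDF derives from its signature.
 */
public class NullSafeSignatureSketch {

    // Hypothetical fallback used when no signature is supplied.
    static final int DEFAULT_SEED = 0;

    private int seed = DEFAULT_SEED;

    // Same name and parameter type as the method in the stack trace.
    public void setUDFContextSignature(String signature) {
        if (signature == null) {
            // The trace in this issue shows the method being reached with
            // a null signature, so fall back instead of dereferencing it.
            seed = DEFAULT_SEED;
            return;
        }
        // Assumption: state is derived from the signature's hash code.
        seed = signature.hashCode();
    }

    public int getSeed() {
        return seed;
    }
}
```

In a real UDF the guarded method would also need to keep whatever behaviour the superclass expects. That part is omitted here because the sketch does not depend on Pig.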