Jarek Jarcec Cecho created DATAFU-68:
----------------------------------------

             Summary: SampleByKey can throw NullPointerException
                 Key: DATAFU-68
                 URL: https://issues.apache.org/jira/browse/DATAFU-68
             Project: DataFu
          Issue Type: Bug
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho
             Fix For: 1.3.0


I've noticed that {{SampleByKey}} can throw {{NullPointerException}}:

{code}
Caused by: java.lang.NullPointerException
        at datafu.pig.sampling.SampleByKey.setUDFContextSignature(SampleByKey.java:86)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.setSignature(POUserFunc.java:604)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:127)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.<init>(POUserFunc.java:122)
        at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:505)
        at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:112)
        at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
        at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:220)
        at org.apache.pig.newplan.logical.relational.LOFilter.accept(LOFilter.java:79)
        at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
        at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:310)
        at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1305)
        at org.apache.pig.PigServer.storeEx(PigServer.java:978)
        at org.apache.pig.PigServer.store(PigServer.java:942)
        at org.apache.pig.Pig
{code}

I reproduced this behaviour on the older 1.1.0 release, but the UDF in question
has not changed much since then, so I assume trunk is affected in the same way.
A simple script reproduces the issue:

{code}
grunt> DEFINE SampleByKey datafu.pig.sampling.SampleByKey('0.5'); 
grunt> data = LOAD 'datafu/input_datafu' AS (A_id:chararray, B_id:chararray, C:int);
grunt> out = FILTER data BY SampleByKey(A_id); 
grunt> DUMP out;
{code}

The problem seems to be that {{setUDFContextSignature}} can be called with a
{{null}} argument, which our code does not handle. The documentation for this
method does not specify whether {{null}} is allowed. I've looked at other UDFs
in Pig, and they appear to handle a {{null}} signature, so I've decided to fix
{{SampleByKey}} as well.
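
The kind of guard described above could look roughly like the following. This is only a minimal, self-contained sketch and not the actual {{SampleByKey}} code or the committed fix: the class {{NullSafeSignatureSketch}}, the {{salt}} field and the way the salt is derived are all hypothetical stand-ins. Only the method name and its single {{String}} parameter are taken from the stack trace.

```java
/**
 * Hypothetical stand-in for a UDF that derives per-instance state from the
 * signature it is given. It does not extend Pig's EvalFunc so that it can be
 * compiled and run without Pig on the classpath.
 */
final class NullSafeSignatureSketch {

    // Assumed default used until a non-null signature arrives.
    private String salt = "default-salt";

    /** Same name and parameter as the method in the stack trace. */
    public void setUDFContextSignature(String signature) {
        if (signature == null) {
            // Without this check, signature.hashCode() below would throw
            // a NullPointerException when the caller passes null.
            return;
        }
        this.salt = "salt-" + signature.hashCode();
    }

    public String getSalt() {
        return salt;
    }

    public static void main(String[] args) {
        NullSafeSignatureSketch udf = new NullSafeSignatureSketch();

        udf.setUDFContextSignature(null);    // ignored, default is kept
        System.out.println(udf.getSalt());

        udf.setUDFContextSignature("sig-1"); // state now derived from the signature
        System.out.println(udf.getSalt());
    }
}
```

Whether a {{null}} signature should be ignored, as in this sketch, or replaced with some default value depends on how the UDF uses the signature later on.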



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
