[ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717037#comment-14717037 ]

li xiang commented on PIG-3294:
-------------------------------

Hi Daniel,

Sorry for not responding to you sooner. I am trying to debug/fix a Parquet unit test failure that, as I found, is related to the change this JIRA made to ExpToPhyTranslationVisitor.java.

The test case is testPigScript() of https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/test/java/org/apache/parquet/pig/summary/TestSummary.java. It fails with a NullPointerException (please see the first comment in PARQUET-334).

Class Summary (https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/summary/Summary.java) extends Pig's EvalFunc. EvalFunc has a private field, inputSchemaInternal, and provides both setInputSchema() and getInputSchema() to set and return it. Summary, however, keeps the schema in its own field, inputSchema (rather than inputSchemaInternal), and provides only the setter setInputSchema(), with no getter. I think that may not be reasonable, so I opened PARQUET-365, which adds a getter returning inputSchema as the fix.
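To make the getter/setter mismatch concrete, here is a minimal, self-contained Java model. BaseFunc, SetterOnlyFunc and SetterAndGetterFunc are hypothetical stand-ins for EvalFunc and Summary (not the real classes), schemas are plain Strings, and I am assuming Summary's setter does not call super.setInputSchema(). Under that assumption, the inherited getInputSchema() keeps returning null until a getter for the subclass field is added.

```java
// Hypothetical, simplified stand-ins for Pig's EvalFunc and Parquet's Summary.
// Schemas are modeled as plain Strings instead of Pig's Schema class.
public class SchemaFieldShadowingDemo {

    /** Stand-in for EvalFunc: a private field with a public setter and getter. */
    static class BaseFunc {
        private String inputSchemaInternal;

        public void setInputSchema(String input) {
            this.inputSchemaInternal = input;
        }

        public String getInputSchema() {
            return this.inputSchemaInternal;
        }
    }

    /** Stand-in for Summary before PARQUET-365: its own field, setter override only. */
    static class SetterOnlyFunc extends BaseFunc {
        private String inputSchema;

        @Override
        public void setInputSchema(String input) {
            // Assumption: super.setInputSchema() is not called, so the base field stays null.
            this.inputSchema = input;
        }
    }

    /** Stand-in for Summary with the getter proposed in PARQUET-365. */
    static class SetterAndGetterFunc extends BaseFunc {
        private String inputSchema;

        @Override
        public void setInputSchema(String input) {
            this.inputSchema = input;
        }

        @Override
        public String getInputSchema() {
            return this.inputSchema;
        }
    }

    public static void main(String[] args) {
        BaseFunc setterOnly = new SetterOnlyFunc();
        setterOnly.setInputSchema("{(a: chararray)}");
        // The inherited getter reads the untouched base field.
        System.out.println(setterOnly.getInputSchema()); // prints "null"

        BaseFunc withGetter = new SetterAndGetterFunc();
        withGetter.setInputSchema("{(a: chararray)}");
        System.out.println(withGetter.getInputSchema()); // prints "{(a: chararray)}"
    }
}
```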

In Summary's setInputSchema(), do you think it is reasonable to get the tuple schema as follows?
{code}
this.inputSchema = input.getField(0).schema.getField(0).schema;
{code}

Furthermore, the added line "((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema())" (shown below) causes Summary's setInputSchema() to be called twice. In ExpToPhyTranslationVisitor:
{code}
// ExpToPhyTranslationVisitor, lines 510-513
if (((POUserFunc)p).getFunc().getInputSchema() == null) {
    ((POUserFunc)p).setFuncInputSchema(op.getSignature());  // <-- calls setInputSchema()
    ((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema());  // <-- added line, calls setInputSchema() again
}
{code}
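A small model of the double call, assuming (for illustration only) that `f` and `((POUserFunc)p).getFunc()` refer to the same function object. CountingFunc and translate() are hypothetical, and the `schemaFromSignature` argument stands in for whatever setFuncInputSchema(op.getSignature()) looks up for the signature.

```java
// Hypothetical model of the visitor snippet above; not Pig code.
public class DoubleSetInputSchemaDemo {

    /** Hypothetical function object that counts setInputSchema() calls. */
    static class CountingFunc {
        int setCalls = 0;
        private String inputSchema;

        void setInputSchema(String input) {
            setCalls++;
            // Stores the argument unchanged (Summary's real setter transforms it first).
            inputSchema = input;
        }

        String getInputSchema() {
            return inputSchema;
        }
    }

    /** Mirrors the unguarded snippet, with one object playing both f and p.getFunc(). */
    static void translate(CountingFunc func, String schemaFromSignature) {
        if (func.getInputSchema() == null) {
            // Stand-in for ((POUserFunc)p).setFuncInputSchema(op.getSignature());
            // assumed to call the setter only when a schema is stored for the signature.
            if (schemaFromSignature != null) {
                func.setInputSchema(schemaFromSignature);
            }
            // Stand-in for the added line: the getter's result goes back into the setter.
            func.setInputSchema(func.getInputSchema());
        }
    }

    public static void main(String[] args) {
        CountingFunc func = new CountingFunc();
        translate(func, "{A: {(a: chararray)}}");
        System.out.println(func.setCalls); // prints 2
    }
}
```

In this model the setter stores its argument unchanged, so the second call is harmless. Summary's setter stores a transformed schema instead, so its getter hands a different schema back into the second call, which is what the logs show.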

I printed the result of each step of "this.inputSchema = input.getField(0).schema.getField(0).schema". Here is the first call of setInputSchema(), made by POUserFunc's setFuncInputSchema():
======================
In Summary - SetInputSchema() - input = {A: {(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})}}
In Summary - SetInputSchema() - input.getField(0) = A: bag({(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})})
In Summary - SetInputSchema() - input.getField(0).schema = {(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})}
In Summary - SetInputSchema() - input.getField(0).schema.getField(0) = tuple({a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}})
In Summary - SetInputSchema() - input.getField(0).schema.getField(0).schema = {a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}}
======================

Here is the second call of setInputSchema(), made by
{code}
((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema())
{code}
======================
In Summary - SetInputSchema() - input = {a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}}
In Summary - SetInputSchema() - input.getField(0) = a: chararray
In Summary - SetInputSchema() - input.getField(0).schema = null  <--- So the null pointer exception is here.
======================
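The two logs can be reproduced with a heavily simplified schema model. Schema and FieldSchema below are my own hypothetical stand-ins, not Pig's classes, and the schema is trimmed to fields a and b. Applying the same double unwrap to the bag-level schema reaches the tuple's fields, but applying it again to that result stops at "a: chararray", whose inner schema is null.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a nested Pig-style schema; NOT Pig's Schema class.
// A Schema is a list of FieldSchemas; a FieldSchema has an inner schema only for bags/tuples.
public class NestedSchemaUnwrapDemo {

    static final class FieldSchema {
        final String alias;
        final String type;
        final Schema schema; // inner schema; null for scalar types such as chararray

        FieldSchema(String alias, String type, Schema schema) {
            this.alias = alias;
            this.type = type;
            this.schema = schema;
        }
    }

    static final class Schema {
        final List<FieldSchema> fields;

        Schema(FieldSchema... fields) {
            this.fields = Arrays.asList(fields);
        }

        FieldSchema getField(int i) {
            return fields.get(i);
        }
    }

    /** Mirrors input.getField(0).schema.getField(0).schema from Summary's setter. */
    static Schema unwrapTwice(Schema input) {
        return input.getField(0).schema.getField(0).schema;
    }

    public static void main(String[] args) {
        // Tuple-level fields: {a: chararray, b: int}
        Schema tupleFields = new Schema(
                new FieldSchema("a", "chararray", null),
                new FieldSchema("b", "int", null));
        // Bag-level input: {A: {(a: chararray, b: int)}}
        Schema bagInput = new Schema(
                new FieldSchema("A", "bag",
                        new Schema(new FieldSchema(null, "tuple", tupleFields))));

        // First call: the bag-level schema unwraps to the tuple's fields.
        Schema first = unwrapTwice(bagInput);
        System.out.println(first == tupleFields); // prints true

        // Second call: field 0 is "a: chararray" with a null inner schema,
        // so calling getField(0) on it throws.
        try {
            unwrapTwice(first);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on second call");
        }
    }
}
```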

So, to fix this error:
(1) Do you think it is unreasonable for class Summary to get the tuple schema like this?
{code}
this.inputSchema = input.getField(0).schema.getField(0).schema;
{code}
(2) Or, on the Pig side, does it make sense to check whether the schema has already been set before calling setInputSchema() again, for example with the following change to ExpToPhyTranslationVisitor?
{code}
if (((POUserFunc)p).getFunc().getInputSchema() == null) {
    ((POUserFunc)p).setFuncInputSchema(op.getSignature());
    // Check before calling setInputSchema() again
    if (((POUserFunc)p).getFunc().getInputSchema() == null) {
        ((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema());
    }
}
{code}
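For option (2), a sketch under the same illustrative assumptions as above (one function object plays both `f` and `p.getFunc()`; CountingFunc, translateWithGuard() and `schemaFromSignature` are hypothetical stand-ins) shows that with the extra null check the setter runs only once when a schema is found for the signature.

```java
// Hypothetical model of the guarded variant proposed above; not Pig code.
public class GuardedSetInputSchemaDemo {

    /** Hypothetical function object that counts setInputSchema() calls. */
    static class CountingFunc {
        int setCalls = 0;
        private String inputSchema;

        void setInputSchema(String input) {
            setCalls++;
            inputSchema = input;
        }

        String getInputSchema() {
            return inputSchema;
        }
    }

    static void translateWithGuard(CountingFunc func, String schemaFromSignature) {
        if (func.getInputSchema() == null) {
            // Stand-in for setFuncInputSchema(op.getSignature()); assumed to call
            // the setter only when a schema is stored for the signature.
            if (schemaFromSignature != null) {
                func.setInputSchema(schemaFromSignature);
            }
            // The proposed check: call setInputSchema() again only if nothing was set.
            if (func.getInputSchema() == null) {
                func.setInputSchema(func.getInputSchema());
            }
        }
    }

    public static void main(String[] args) {
        CountingFunc func = new CountingFunc();
        translateWithGuard(func, "{A: {(a: chararray)}}");
        System.out.println(func.setCalls); // prints 1
    }
}
```

Note that when no schema is found, the inner branch still runs and calls the setter with null.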

Thanks for your time!


> Allow Pig use Hive UDFs
> -----------------------
>
>                 Key: PIG-3294
>                 URL: https://issues.apache.org/jira/browse/PIG-3294
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>              Labels: gsoc2013, java
>             Fix For: 0.15.0
>
>         Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, PIG-3294-4.patch, PIG-3294-5.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provided some interoperability with Hive. We can wrap Hive UDFs in Pig so that they can be used from Pig.
> This is a candidate project for Google Summer of Code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
