[ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717037#comment-14717037 ]
li xiang commented on PIG-3294:
-------------------------------

Hi Daniel,

Sorry for the slow response. I am debugging a Parquet unit test failure that turns out to be related to the change this JIRA made to ExpToPhyTranslationVisitor.java. The test case is testPigScript() in https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/test/java/org/apache/parquet/pig/summary/TestSummary.java. It fails with a NullPointerException (please see the first comment in PARQUET-334).

Class Summary (https://github.com/apache/parquet-mr/blob/master/parquet-pig/src/main/java/org/apache/parquet/pig/summary/Summary.java) extends Pig's EvalFunc. EvalFunc has a private field inputSchemaInternal and provides both setInputSchema() and getInputSchema() to set and return it. Summary, however, keeps its own field called inputSchema (vs. inputSchemaInternal) and provides only the setter setInputSchema(), with no getter. That did not seem reasonable to me, so I opened PARQUET-365 and added a getter that returns inputSchema as the fix.

In setInputSchema() of Summary, do you think it is reasonable to get the schema of the tuple as follows?

{code}
this.inputSchema = input.getField(0).schema.getField(0).schema;
{code}

Further, the added line "((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema())" causes setInputSchema() of Summary to be called twice.
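
To make the double call concrete, here is a minimal standalone sketch of the situation described above. It does not use Pig at all: Schema, FieldSchema, EvalFuncLike and SummaryLike are hypothetical stand-ins that model only the fields and methods mentioned in this comment, and it assumes the getter from PARQUET-365 is already in place.

```java
import java.util.Arrays;
import java.util.List;

public class DoubleSetInputSchemaSketch {

    // Hypothetical stand-in for a Pig field schema: an alias plus an optional nested schema.
    static final class FieldSchema {
        final String alias;
        final Schema schema; // null for simple types such as chararray or int

        FieldSchema(String alias, Schema schema) {
            this.alias = alias;
            this.schema = schema;
        }
    }

    // Hypothetical stand-in for a Pig schema: an ordered list of fields.
    static final class Schema {
        final List<FieldSchema> fields;

        Schema(FieldSchema... fields) {
            this.fields = Arrays.asList(fields);
        }

        FieldSchema getField(int i) {
            return fields.get(i);
        }
    }

    // Stand-in for EvalFunc: stores the schema in a private field with a setter and a getter.
    static class EvalFuncLike {
        private Schema inputSchemaInternal;

        public void setInputSchema(Schema input) {
            this.inputSchemaInternal = input;
        }

        public Schema getInputSchema() {
            return inputSchemaInternal;
        }
    }

    // Stand-in for Summary: keeps its own field and unwraps bag -> tuple in the setter.
    static class SummaryLike extends EvalFuncLike {
        private Schema inputSchema;

        @Override
        public void setInputSchema(Schema input) {
            // Same navigation as in the comment: assumes input is {bag: {tuple: (...)}}.
            this.inputSchema = input.getField(0).schema.getField(0).schema;
        }

        @Override
        public Schema getInputSchema() {
            return inputSchema; // the getter assumed to be added by PARQUET-365
        }
    }

    // Builds {A: {(a, b)}}: one bag field holding one tuple with two simple fields.
    static Schema bagOfTuples() {
        Schema tuple = new Schema(new FieldSchema("a", null), new FieldSchema("b", null));
        Schema bag = new Schema(new FieldSchema(null, tuple));
        return new Schema(new FieldSchema("A", bag));
    }

    // Returns true if feeding the already-unwrapped schema back into the setter throws.
    static boolean secondCallThrows() {
        SummaryLike f = new SummaryLike();
        f.setInputSchema(bagOfTuples()); // first call: receives the full bag schema
        try {
            f.setInputSchema(f.getInputSchema()); // second call: receives the tuple schema
            return false;
        } catch (NullPointerException e) {
            return true; // getField(0).schema is null for a simple field
        }
    }

    public static void main(String[] args) {
        System.out.println("second call throws NPE: " + secondCallThrows());
    }
}
```

In this sketch the second call fails because the setter is not idempotent: it expects the outer bag schema each time, but the getter hands back the schema it has already unwrapped.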
In ExpToPhyTranslationVisitor (lines 510-513):

{code}
if (((POUserFunc)p).getFunc().getInputSchema() == null) {
    ((POUserFunc)p).setFuncInputSchema(op.getSignature());  // <-- calls setInputSchema()
    ((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema());  // <-- added line, calls setInputSchema() again
}
{code}

I printed the result of each step of "this.inputSchema = input.getField(0).schema.getField(0).schema".

Here is the first call of setInputSchema(), made by setFuncInputSchema() of POUserFunc:
======================
In Summary - SetInputSchema() - input = {A: {(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})}}
In Summary - SetInputSchema() - input.getField(0) = A: bag({(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})})
In Summary - SetInputSchema() - input.getField(0).schema = {(a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])})}
In Summary - SetInputSchema() - input.getField(0).schema.getField(0) = tuple({a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}})
In Summary - SetInputSchema() - input.getField(0).schema.getField(0).schema = {a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}}
======================

Here is the second call of setInputSchema(), made by

{code}
((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema())
{code}

======================
In Summary - SetInputSchema() - input = {a: chararray,a1: chararray,b: int,c: {t: (a2: chararray,b2: map[])}}
In Summary - SetInputSchema() - input.getField(0) = a: chararray
In Summary - SetInputSchema() - input.getField(0).schema = null  <--- So the null pointer exception is here.
======================

So, to fix this error:

(1) Do you think it is unreasonable for class Summary to get the schema of the tuple like this?

{code}
this.inputSchema = input.getField(0).schema.getField(0).schema;
{code}

(2) Or, on the Pig side, does it make sense to check whether the schema has been set before calling setInputSchema() again, perhaps with the following change to ExpToPhyTranslationVisitor?

{code}
if (((POUserFunc)p).getFunc().getInputSchema() == null) {
    System.out.println("In visit, if == null");
    ((POUserFunc)p).setFuncInputSchema(op.getSignature());
    if (((POUserFunc)p).getFunc().getInputSchema() == null) { // Check before calling again
        ((EvalFunc) f).setInputSchema(((POUserFunc)p).getFunc().getInputSchema());
    }
}
{code}

Thanks for your time!

> Allow Pig use Hive UDFs
> -----------------------
>
>                 Key: PIG-3294
>                 URL: https://issues.apache.org/jira/browse/PIG-3294
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>              Labels: gsoc2013, java
>             Fix For: 0.15.0
>
>         Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch,
> PIG-3294-4.patch, PIG-3294-5.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provide some interoperability with Hive. We can wrap
> Hive UDF in Pig so we can use Hive UDF in Pig.
> This is a candidate project for Google summer of code 2013. More information
> about the program can be found at
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013