[jira] [Updated] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Woody Anderson updated PIG-1824:
--------------------------------

    Attachment: 1824_final.patch

OK, my bad! testcase=full.package.path doesn't even run the test, so although I claimed that the tests were passing, it was in fact simply that junit could run.
Here's a new patch: there was an extra line that I mistakenly didn't delete when creating the re-trunked code. This patch will pass the tests.

> Support import modules in Jython UDF
> ------------------------------------
>
>                 Key: PIG-1824
>                 URL: https://issues.apache.org/jira/browse/PIG-1824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Woody Anderson
>             Fix For: 0.10
>
>         Attachments: 1824.patch, 1824_final.patch, 1824a.patch, 1824b.patch,
>                      1824c.patch, 1824d.patch, 1824x.patch,
>                      TEST-org.apache.pig.test.TestGrunt.txt,
>                      TEST-org.apache.pig.test.TestScriptLanguage.txt,
>                      TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, a Jython UDF script doesn't support the Jython import statement, as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let the user explicitly specify the
> module to ship?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
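For readers who want to try the UDF from the issue description outside Pig, here is a minimal sketch. In a real Pig run the `outputSchema` decorator is supplied by Pig's Jython script engine; the stand-in defined below is an assumption, written only so the snippet runs under a plain Python interpreter, and it is not Pig's implementation.

```python
import re


def outputSchema(schema):
    """Hypothetical stand-in for the decorator Pig provides to Jython UDF scripts.

    It only records the schema string on the function object.
    """
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Same body as the UDF in the issue description: split on a regex, pick one piece.
    return re.compile(regex).split(content)[index]
```

With this stand-in, `resplit("a,b,c", ",", 1)` returns `"b"`. Whether `import re` resolves on the backend is exactly what the issue is about, so this sketch says nothing about how Pig ships modules.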
[jira] [Updated] (PIG-2077) Project UDF output inside a non-foreach statement fail on 0.8
[ https://issues.apache.org/jira/browse/PIG-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2077:
----------------------------

    Attachment: PIG-2077-1.patch

PIG-2077-1.patch is only for Pig 0.8. However, the test case should be committed to 0.8, 0.9 and trunk.

> Project UDF output inside a non-foreach statement fail on 0.8
> -------------------------------------------------------------
>
>                 Key: PIG-2077
>                 URL: https://issues.apache.org/jira/browse/PIG-2077
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.8.1
>
>         Attachments: PIG-2077-1.patch
>
>
> The following script fails on 0.8:
> {code}
> A = load '1.txt' as (tracking_id, day:chararray);
> B = load '2.txt' as (tracking_id, timestamp:chararray);
> C = JOIN A by (tracking_id, day) LEFT OUTER, B by (tracking_id, STRSPLIT(timestamp, ' ').$0);
> explain C;
> {code}
> Error stack:
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.ArrayList.get(ArrayList.java:324)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:207)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:121)
>     at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:193)
>     at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
>     at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:75)
>     at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:83)
>     at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:149)
>     at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:262)
> This is not a problem on 0.9 or trunk, since LogicalExpPlanMigrationVistor is
> dropped in 0.9.
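To make the join key in the failing script concrete: the second key on the `B` side is the first piece of `timestamp` after splitting on a space. The sketch below mimics only that expression in Python. It does not reproduce the planner failure, the function name is made up, and it assumes a literal space delimiter (Pig's STRSPLIT takes a regular expression, which plain `str.split` does not).

```python
def first_token(timestamp, delimiter=" "):
    """Rough analogue of STRSPLIT(timestamp, ' ').$0 for a plain string."""
    if timestamp is None:
        # Treat a missing field as a missing key (an assumption for this sketch).
        return None
    # Split on the literal delimiter and keep field 0.
    return timestamp.split(delimiter)[0]
```

For a value such as `"2011-05-17 10:00:00"` (an invented sample), the key would be `"2011-05-17"`, which is what gets matched against `day` on the `A` side.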
[jira] [Created] (PIG-2077) Project UDF output inside a non-foreach statement fail on 0.8
Project UDF output inside a non-foreach statement fail on 0.8
-------------------------------------------------------------

                 Key: PIG-2077
                 URL: https://issues.apache.org/jira/browse/PIG-2077
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.1
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.8.1

The following script fails on 0.8:
{code}
A = load '1.txt' as (tracking_id, day:chararray);
B = load '2.txt' as (tracking_id, timestamp:chararray);
C = JOIN A by (tracking_id, day) LEFT OUTER, B by (tracking_id, STRSPLIT(timestamp, ' ').$0);
explain C;
{code}
Error stack:
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.ArrayList.get(ArrayList.java:324)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.findReferent(ProjectExpression.java:207)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.getFieldSchema(ProjectExpression.java:121)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:193)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:53)
    at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:75)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:83)
    at org.apache.pig.newplan.logical.relational.LOJoin.accept(LOJoin.java:149)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:262)
This is not a problem on 0.9 or trunk, since LogicalExpPlanMigrationVistor is dropped in 0.9.
[jira] [Updated] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-2029:
------------------------------

    Attachment: PIG-2029.patch

> Inconsistency in Pig Stats reports
> ----------------------------------
>
>                 Key: PIG-2029
>                 URL: https://issues.apache.org/jira/browse/PIG-2029
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.10
>
>         Attachments: PIG-2029.patch
>
>
> I have a Pig script which reports varying stats for the same M/R job (same
> inputs). Sometimes PigStats reports all the stats (such as Maps, Reduces,
> MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime and
> AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1,
> job_201103091134_556600 has 0 in all the columns, whereas in Run 2, Hadoop
> job job_201104272229_75693 has some valid values.
> The actual Job Tracker link shows that they are non-empty. This points to a
> bug in the interaction of the PigStats module with the Jobtracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
> job_201103091134_556458  160  100  552  191  368  1257  371  392  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
> job_201103091134_556600  0  0  0  0  0  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,,
> job_201103091134_556601  7  100  17  8  14  200  15  27  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
> job_201103091134_556602  0  0  0  0  0  0  0  0  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
> job_201103091134_556603  0  0  0  0  0  0  0  0  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
> job_201103091134_556604  2  100  13  7  10  34  13  31  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
> job_201103091134_556644  0  0  0  0  0  0  0  0  ONJOIN15  SAMPLER
> job_201103091134_556645  0  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
> job_201103091134_556646  0  0  0  0  0  0  0  0  ONJOIN3  SAMPLER
> job_201103091134_556654  0  0  0  0  0  0  0  0  ONJOIN19  SAMPLER
> job_201103091134_556662  0  0  0  0  0  0  0  0  ONJOIN19  ORDER_BY,COMBINER
> ..
> {quote}
> Run 2:
> {quote}
> Job Stats (time in seconds):
> JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
> job_201104272229_75503  159  100  484  192  353  396  308  321  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
> job_201104272229_75693  18  0  31  14  24  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,
> job_201104272229_75694  7  100  34  13  22  46  20  25  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
> job_201104272229_75695  125  100  19  11  15  32  18  26  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
> job_201104272229_75698  1  100  12  12  12  13  9  11  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
> job_201104272229_75702  2  100  21  5  13  35  22  26  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
> job_201104272229_75724  1  1  4  4  4  11  11  11  ONJOIN15  SAMPLER
> job_201104272229_75725  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
> job_201104272229_75726  6  1  8  6  8  24  24  24  ONJOIN3  SAMPLER
> job_201104272229_75729  0  0  0  0  0  0  0  ONJOIN19  SAMPLER
> job_2011
[jira] [Commented] (PIG-2029) Inconsistency in Pig Stats reports
[ https://issues.apache.org/jira/browse/PIG-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035010#comment-13035010 ]

Richard Ding commented on PIG-2029:
-----------------------------------

Currently Pig prints zero (0) if the max/min/avg map/reduce time isn't available when querying Hadoop through the Hadoop client API. This is misleading. I propose that we change those values to 'n/a', as follows:

{code}
Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
job_201104272229_434232  2  10  354  220  287  168  149  163  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
job_201104272229_434319  2  0  9  3  6  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/rding/verifypigstats2-UNION5,
job_201104272229_434320  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
job_201104272229_434321  1  10  5  5  5  23  9  17  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
job_201104272229_434322  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
job_201104272229_434323  2  10  n/a  n/a  n/a  n/a  n/a  n/a  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
job_201104272229_434331  2  1  n/a  n/a  n/a  n/a  n/a  n/a  ONJOIN15  SAMPLER
job_201104272229_434332  2  1  n/a  n/a  n/a  n/a  n/a  n/a  ONJOIN3  SAMPLER
job_201104272229_434333  1  1  2  2  2  13  13  13  ONJOIN25  SAMPLER
job_201104272229_434334  1  1  1  1  1  12  12  12  ONJOIN19  SAMPLER
job_201104272229_434342  1  10  2  2  2  16  8  11  ONJOIN25  ORDER_BY,COMBINER
{code}

> Inconsistency in Pig Stats reports
> ----------------------------------
>
>                 Key: PIG-2029
>                 URL: https://issues.apache.org/jira/browse/PIG-2029
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.1, 0.9.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.10
>
>
> I have a Pig script which reports varying stats for the same M/R job (same
> inputs). Sometimes PigStats reports all the stats (such as Maps, Reduces,
> MaxMapTime, MinMapTime, AvgMapTime, MaxReduceTime, MinReduceTime and
> AvgReduceTime) for the M/R job as 0. Sometimes it reports them correctly.
> Enclosed are the stderr logs for 2 runs. Notice that in Run 1,
> job_201103091134_556600 has 0 in all the columns, whereas in Run 2, Hadoop
> job job_201104272229_75693 has some valid values.
> The actual Job Tracker link shows that they are non-empty. This points to a
> bug in the interaction of the PigStats module with the Jobtracker.
> Run 1:
> {quote}
> Job Stats (time in seconds):
> JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
> job_201103091134_556458  160  100  552  191  368  1257  371  392  IN,SP10P,SP11P,SP12P,SP13P,SP16P,SP17P,SP18P,SP20P,SP21P,SP22P,SP23P,SP24P,SP26P,SP27P,SP28P,SP29P,SP30P,SP31P,SP32P,SP33P,SP34P,SP4P,SP6P,SP7P,SP8P,SP9P  DISTINCT,MULTI_QUERY
> job_201103091134_556600  0  0  0  0  0  0  0  0  UNION5  MULTI_QUERY,MAP_ONLY  /user/viraj/dir,,
> job_201103091134_556601  7  100  17  8  14  200  15  27  CNJOIN25,GNJOIN25,sampleNJOIN25  GROUP_BY,COMBINER
> job_201103091134_556602  0  0  0  0  0  0  0  0  CNJOIN3,GNJOIN3,sampleNJOIN3  GROUP_BY,COMBINER
> job_201103091134_556603  0  0  0  0  0  0  0  0  CNJOIN15,GNJOIN15,sampleNJOIN15  GROUP_BY,COMBINER
> job_201103091134_556604  2  100  13  7  10  34  13  31  CNJOIN19,GNJOIN19,sampleNJOIN19  GROUP_BY,COMBINER
> job_201103091134_556644  0  0  0  0  0  0  0  0  ONJOIN15  SAMPLER
> job_201103091134_556645  0  0  0  0  0  0  0  0  ONJOIN25  SAMPLER
> job_201103091134_556646  0  0  0  0  0  0  0  0  ONJOIN3  SAMPLER
> job_201103091134_556654  0  0  0  0
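The proposal in this comment, printing 'n/a' instead of 0 when a timing could not be fetched, could be sketched roughly as follows. This is an illustration in Python rather than Pig's Java code; the function names and the use of `None` to mean "not available" are assumptions made for the sketch.

```python
def format_stat(value):
    """Render one timing cell; None stands for 'Hadoop returned no data'."""
    return "n/a" if value is None else str(value)


def format_row(job_id, maps, reduces, timings):
    """Build one tab-separated stats row.

    timings holds six values (max/min/avg map time, then max/min/avg reduce
    time); any of them may be None.
    """
    cells = [job_id, str(maps), str(reduces)] + [format_stat(t) for t in timings]
    return "\t".join(cells)
```

The point of the design is that a genuine zero (for example, reduce times of a map-only job) still prints as `0`, while a value that was never retrieved prints as `n/a`, so the two cases are no longer indistinguishable in the report.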
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034932#comment-13034932 ]

Woody Anderson commented on PIG-1824:
-------------------------------------

Hmm.. I ran each of those tests via:
ant -noclasspath test -Dtestcase=org.apache.pig.test.TestScriptUDF
etc., and they all passed.
Is your environment clean?
% printenv | grep YTHON
(should be empty)
Is there anything else I should be doing to try to mirror your test framework (while not having to run all tests for the 18 hours that that requires)?

> Support import modules in Jython UDF
> ------------------------------------
>
>                 Key: PIG-1824
>                 URL: https://issues.apache.org/jira/browse/PIG-1824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Woody Anderson
>             Fix For: 0.10
>
>         Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch,
>                      1824d.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt,
>                      TEST-org.apache.pig.test.TestScriptLanguage.txt,
>                      TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, a Jython UDF script doesn't support the Jython import statement, as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let the user explicitly specify the
> module to ship?
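The environment check suggested in the comment (`printenv | grep YTHON`) can also be run from Python, which may be convenient inside a test harness. A small sketch; the function name is made up for illustration, and matching on either the variable name or its value is meant to approximate what grep does on `NAME=value` lines.

```python
import os


def python_related_env(environ=None):
    """Return environment entries whose name or value contains 'YTHON'."""
    env = os.environ if environ is None else environ
    return {name: value for name, value in env.items()
            if "YTHON" in name or "YTHON" in value}
```

An empty dictionary corresponds to the "should be empty" result asked for in the comment; anything returned (for instance a `PYTHONPATH` or `JYTHON_HOME` setting) is a candidate explanation for tests behaving differently between two machines.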
[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage
[ https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034925#comment-13034925 ]

Daniel Dai commented on PIG-1890:
---------------------------------

It seems it should call POProject.getNext(DataBag) instead. Projecting a single item assumes that the item already has the correct type and needs no conversion. The issue is probably caused by plan generation, which produces a wrong result type for POProject.

> Fix piggybank unit test TestAvroStorage
> ---------------------------------------
>
>                 Key: PIG-1890
>                 URL: https://issues.apache.org/jira/browse/PIG-1890
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Daniel Dai
>            Assignee: Jakob Homan
>             Fix For: 0.9.0
>
>         Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD:
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This
> issue was hidden until PIG-1188 was checked in.
[jira] [Updated] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1824:
----------------------------

    Attachment: TEST-org.apache.pig.test.TestScriptUDF.txt
                TEST-org.apache.pig.test.TestScriptLanguage.txt
                TEST-org.apache.pig.test.TestGrunt.txt

> Support import modules in Jython UDF
> ------------------------------------
>
>                 Key: PIG-1824
>                 URL: https://issues.apache.org/jira/browse/PIG-1824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Woody Anderson
>             Fix For: 0.10
>
>         Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch,
>                      1824d.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt,
>                      TEST-org.apache.pig.test.TestScriptLanguage.txt,
>                      TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, a Jython UDF script doesn't support the Jython import statement, as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let the user explicitly specify the
> module to ship?
[jira] [Updated] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1824:
----------------------------

    Status: Open  (was: Patch Available)

I ran the unit tests and saw issues with most of the Python-oriented tests. I'll attach the logs from the failing tests.

> Support import modules in Jython UDF
> ------------------------------------
>
>                 Key: PIG-1824
>                 URL: https://issues.apache.org/jira/browse/PIG-1824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Richard Ding
>            Assignee: Woody Anderson
>             Fix For: 0.10
>
>         Attachments: 1824.patch, 1824a.patch, 1824b.patch, 1824c.patch,
>                      1824d.patch, 1824x.patch
>
>
> Currently, a Jython UDF script doesn't support the Jython import statement, as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let the user explicitly specify the
> module to ship?