[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649924#comment-13649924 ]

Rohini Palaniswamy commented on PIG-1824:

to use a jython install, the Lib dir must be in the jython search path
* via env variable JYTHON_HOME=jy_home or JYTHON_PATH=jy_home/Lib:... or
* jython-standalone.jar should be in the classpath

> Support import modules in Jython UDF
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.10.0
>
> Attachments: 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch, 1824_final.patch, 1824.patch, 1824x.patch, TEST-org.apache.pig.test.TestGrunt.txt, TEST-org.apache.pig.test.TestScriptLanguage.txt, TEST-org.apache.pig.test.TestScriptUDF.txt
>
> Currently, Jython UDF script doesn't support Jython import statement as in the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the backend? Or should we add a ship clause to let user explicitly specify the module to ship?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
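A minimal sketch of the environment-variable option described in this comment, assuming the variables behave as stated there: JYTHON_HOME names an install whose Lib subdirectory holds the standard library, and JYTHON_PATH is a path-separator-delimited list of directories. The helper names `candidate_lib_dirs` and `missing_lib_dirs` are hypothetical and are not part of Pig or Jython; they only list where modules such as `re` would be expected, which may help when checking a deployment by hand.

```python
import os


def candidate_lib_dirs(env):
    """List directories that should hold Jython's standard library.

    Assumption (taken from the comment, not from Pig's source):
    JYTHON_HOME names an install containing a Lib directory, and
    JYTHON_PATH is an os.pathsep-separated list of directories.
    """
    dirs = []
    home = env.get("JYTHON_HOME")
    if home:
        dirs.append(os.path.join(home, "Lib"))
    for entry in env.get("JYTHON_PATH", "").split(os.pathsep):
        # Skip empty segments and entries already collected.
        if entry and entry not in dirs:
            dirs.append(entry)
    return dirs


def missing_lib_dirs(env):
    """Return the candidate directories that do not exist on this machine."""
    return [d for d in candidate_lib_dirs(env) if not os.path.isdir(d)]
```

Calling `missing_lib_dirs(os.environ)` on a task node would then show which of the expected directories are absent there.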
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649920#comment-13649920 ]

Rohini Palaniswamy commented on PIG-1824:

You need to have the jython/Lib directory in the classpath. We bundle it with our deployment. Otherwise you need to have jython-standalone.jar instead of jython.jar, as in Pig 0.11.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648793#comment-13648793 ]

Martin Gerlach commented on PIG-1824:

Doesn't work for me, either (with codecs module). Pig version is 0.10.0-cdh4.1.2
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565541#comment-13565541 ]

Russell Jurney commented on PIG-1824:

This does not actually work for me, in either Pig 0.10 or Pig 0.10.1. I can't include the 're' module via 'import re'; when I try, I get this error:

Caused by: Traceback (most recent call last):
  File "udfs.py", line 20, in <module>
    import re
ImportError: No module named re
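One way to narrow down an ImportError like this is to check which search-path entries actually contain the module. The sketch below is a diagnostic only, under the assumption that the standard library is deployed as plain directories; the function name `find_module_dirs` is hypothetical, and entries that live inside a jar or zip file are not inspected.

```python
import os
import sys


def find_module_dirs(name, search_path=None):
    """Return the search-path entries that contain module `name`.

    Only plain directories are checked: either name.py or a package
    directory holding __init__.py. Jar and zip entries are skipped.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        as_file = os.path.join(entry, name + ".py")
        as_package = os.path.join(entry, name, "__init__.py")
        if os.path.isfile(as_file) or os.path.isfile(as_package):
            hits.append(entry)
    return hits
```

An empty result for `find_module_dirs("re")`, run inside the UDF script before the failing import, would suggest that no Lib directory is on the search path at all.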
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037043#comment-13037043 ]

Olga Natkovich commented on PIG-1824:

Let's get it committed! Thanks, Woody, for contributing!
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037039#comment-13037039 ]

Richard Ding commented on PIG-1824:

Patch passed e2e python tests.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036514#comment-13036514 ]

Olga Natkovich commented on PIG-1824:

I believe that Richard is running some additional tests. Once he is done, he is planning to commit the patch
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036496#comment-13036496 ]

Woody Anderson commented on PIG-1824:

cool. can we get this into trunk so i don't have to keep fixing the patches?
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035542#comment-13035542 ]

Richard Ding commented on PIG-1824:

The new patch fixed the unit test errors reported earlier. I have one (different) failed test in TestGrunt, not sure if it's related to the patch.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034932#comment-13034932 ]

Woody Anderson commented on PIG-1824:

hmm.. i ran each of those tests via:
ant -noclasspath test -Dtestcase=org.apache.pig.test.TestScriptUDF
etc. and they all passed.

is your environment clean?
% printenv | grep YTHON
(should be empty)

is there anything else i should be doing to try to mirror your test framework (while not having to run all tests for the 18 hours that that requires)?
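The environment check in this comment can also be done from a script, which may be handy when the test run is launched by a wrapper rather than an interactive shell. This is a rough equivalent of `printenv | grep YTHON`; the function name `python_related_vars` is made up for illustration.

```python
import os


def python_related_vars(env=None):
    """Return sorted KEY=VALUE strings that contain the text 'YTHON'.

    Mirrors `printenv | grep YTHON`: a match in either the variable
    name or its value counts.
    """
    if env is None:
        env = os.environ
    lines = ["%s=%s" % (key, value) for key, value in env.items()]
    return sorted(line for line in lines if "YTHON" in line)
```

An empty list corresponds to the "should be empty" expectation in the comment.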
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032594#comment-13032594 ]

Alan Gates commented on PIG-1824:

Woody, This patch now conflicts with the changes that were checked in as part of PIG-2056. I don't understand how to resolve the conflicts. You could upload a new patch or just tell me how to do the resolution so I can continue testing.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031274#comment-13031274 ]

Alan Gates commented on PIG-1824:

I'll start running the tests and such. I also want to add some end to end tests.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030597#comment-13030597 ]

Julien Le Dem commented on PIG-1824:

+1 for inclusion for me. Thanks for including the comments Woody.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030116#comment-13030116 ]

Woody Anderson commented on PIG-1824:

i'm not sure what's really left to keep this out of the next release, given we've been going back and forth over issues that don't even affect functionality. but, there are other jython related bugs in the pipe for 0.10 anyway, so perhaps having them all in the same release is a good idea from a feature grouping perspective.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028366#comment-13028366 ]

Woody Anderson commented on PIG-1824:

understood. adding that null check/throw etc. is just a change that is unrelated to this bug. I can bundle it up as all the related lines of code are being changed by this bug anyway, but that's why i didn't do it originally. I'll add a throw similar to current impl of getScriptAsStream
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028362#comment-13028362 ]

Julien Le Dem commented on PIG-1824:

Hi Woody,

I had misread the code about automatic deletion. You're right, it deletes only if it was created by Pig.

I understand the null check being superfluous and the warning being somewhat incorrect. To me there should be either no null check in that case, or an exception thrown if the stream is null. This is about debuggability of the code. If someone changes the behavior of getScriptAsStream(), there should be an exception in your code at that point, not somewhere else. It also helps with understanding the code, so that the reader does not wonder why it does nothing when the stream is null (because it's never null; but then why do we check? etc.).

Otherwise it looks good. Thanks!
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025007#comment-13025007 ]

Woody Anderson commented on PIG-1824:

agree:
inre: PYTHON_CACHEDIR: the code behaves as you wish, in that it only deletes the dir if it (pig) created it. sorry for not being clear in comments about that, but if you read the code you'll see it.

if we can't write, i (pig) was creating an alternate directory. It may be possible to pre-populate this, and i understand (and had) the desire to have an error instead of a new directory, but I was initially experiencing this error:
{code}
*sys-package-mgr*: can't create package cache dir, '/grid/0/Releases/pig-0.8.0..1103222002-20110401-000/share/pig-0.8.0..1103222002/lib/cachedir/packages'
{code}
which is why i added the 'is writable' check, but after reviewing (per your comment), it seems that cachedir is not set on the grid (at least at the point when the static block runs). If left as null, it seems to default to some grid location that is not writable (and thus doesn't work), but if i set it to a writable tmp first, it works.
so.. i can safely agree that an error if the dir isn't writable is both desirable and works.

as for the getScriptAsStream():
i followed the existing code convention on that one, though i didn't like it either. again, if you read down a bit you'll see that the impl of getScriptAsStream() is:
{code}
..
if (is == null) {
    throw new IllegalStateException(
        "Could not initialize interpreter (from file system or classpath) with " + scriptPath);
}
return is;
{code}
so, the null check is superfluous but does quiet the "not null check" warnings. i didn't add an additional throw statement in this case b/c essentially, my code wouldn't add any _new_ errors that the existing code didn't already exhibit if somehow the impl of getScriptAsStream changed and could return null.

anyway, i'll upload a new patch to address the writable issue. if you think it's a big deal we can add an 'else throw' statement around getScriptAsStream
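The cache-directory policy agreed in this exchange can be summarized as: a directory supplied by the user must be writable (otherwise fail) and is never deleted, while a directory created on the user's behalf is removed on exit. The sketch below illustrates that policy only; it is not the Java code from the patch, and the name `resolve_cachedir` and the `pig_jython_` prefix are assumptions made for the example.

```python
import atexit
import os
import shutil
import tempfile


def resolve_cachedir(configured=None):
    """Pick the package cache directory.

    Returns (path, created_here). A directory supplied by the caller
    must be writable and is never deleted; a directory created here
    is removed when the interpreter exits.
    """
    if configured is not None:
        if not os.path.isdir(configured) or not os.access(configured, os.W_OK):
            # Fail loudly instead of silently switching to another directory.
            raise IOError("cache dir is not a writable directory: %s" % configured)
        return configured, False
    created = tempfile.mkdtemp(prefix="pig_jython_")
    # Only directories created here are scheduled for cleanup.
    atexit.register(shutil.rmtree, created, ignore_errors=True)
    return created, True
```

Returning the `created_here` flag keeps the ownership decision visible to the caller, which is the point raised in the review: nothing the user provided should ever be deleted.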
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024960#comment-13024960 ]

Julien Le Dem commented on PIG-1824:

Hi Woody,
This is a great feature.
I agree with the static block comments, but I don't see how you could do it differently without a major refactoring of the existing code.
Here are comments/questions about some details of the implementation.

in JythonScriptEngine.Interpreter static block:
* If _PYTHON_CACHEDIR_ is provided, we will delete it on exit. Shouldn't we delete it only if it has been created by Pig? it is dangerous to delete something that we have not created. The user could shoot himself in the foot by providing something he cares about as the _PYTHON_CACHEDIR_.
* Also, if we can't write to the provided _PYTHON_CACHEDIR_ we create another one. Can the user pre-populate the cache dir? If yes we should throw an exception here.

in JythonScriptEngine.Interpreter.init():
* Something should fail if _is_ is null.
{code}
InputStream is = getScriptAsStream(path);
if (is != null) {
{code}
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017789#comment-13017789 ]

Woody Anderson commented on PIG-1824:

ok. i understand your thoughts on static, and mostly i have them too, but the PythonInterpreter is a static member of the Interpreter class, and the code i wrote must run BEFORE that interpreter is constructed. Interpreter is a private inner class, so it cannot be caused to load before normal use patterns. So, moving the static block into the static block for Interpreter addresses your concerns.

import will not cause the static block to be executed btw, it's the first executed reference to the class. However, i take the point that some code could have been:
{code}
Class<?> c = JythonScriptEngine.class;
{code}
or something like that to cause the class to be loaded. Still, as i said: Interpreter static block addresses this, and the ctor is out b/c of the static nature of Interpreter.interpreter.

on second point: i don't see the point of an includeResources() method. if it can be done, it can be done in init(); if not, it won't be done. Why add a new method?
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017762#comment-13017762 ] Alan Gates commented on PIG-1824: - On the issue of the static block, I dislike static initialization blocks because you're never sure when they are going to be called. Someone adding "import o.a.p.s.j.JythonScriptingEngine" somewhere in the code will result in changing when this is executed, including possibly when it does not need to be executed. Just moving it into the Interpreter class as a static block won't change that I don't think. It can't be in Interpreter's constructor? On the second point, what I meant was, should there be a separate method ScriptEngine.includeResources()? This would make clear to developers of future scripting engines that this is something they need to do. The contract would then be that before Pig called ScriptingEngine.registerFunction it would call includeResources(). I agree with you that, when possible, all scripting engine implementations should include their resources. I was not suggesting a supportsFeature() method. For situations where it cannot be supported includeResources would be a NOP. > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.8.0, 0.9.0 >Reporter: Richard Ding >Assignee: Woody Anderson > Fix For: 0.9.0 > > Attachments: 1824.patch, 1824a.patch > > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("word:chararray") > def resplit(content, regex, index): > return re.compile(regex).split(content)[index] > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? 
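The contract proposed in the comment above (call includeResources() before registering functions, with a no-op default for engines that cannot support it) can be sketched as follows. This is an illustration in Python only; Pig's scripting engine classes are Java, and every name below (`include_resources`, `register_function`, `JythonLikeEngine`, `register_udf`) is hypothetical rather than taken from any patch on this issue.

```python
class ScriptEngine(object):
    """Hypothetical base class mirroring the proposed contract."""

    def include_resources(self, script_path):
        # Default is a no-op, for engines that cannot ship their resources.
        return []

    def register_function(self, script_path, name):
        raise NotImplementedError


class JythonLikeEngine(ScriptEngine):
    """Hypothetical engine that does support resource inclusion."""

    def __init__(self):
        self.registered = []

    def include_resources(self, script_path):
        # Placeholder: a real engine would resolve the script's imports here.
        return [script_path]

    def register_function(self, script_path, name):
        self.registered.append((script_path, name))


def register_udf(engine, script_path, name):
    """Caller side of the contract: resources are included before registration."""
    shipped = engine.include_resources(script_path)
    engine.register_function(script_path, name)
    return shipped
```

Making the default a no-op rather than an abstract method means an engine that cannot support the feature needs no extra code, which is the point made about not requiring supportsFeature().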
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017549#comment-13017549 ]

Woody Anderson commented on PIG-1824:

1. i could re-work the initialization into the static block of the inner class "Interpreter", it simply needs to be done before the interpreter is allocated. I'm not sure what you mean by not wanting a cache dir when using python udfs or control flow? can you clarify?

2. separate the logic out of init into what? I think it should, in general, be the contract of any script environment to handle resource inclusion (if possible). Are you imagining some scenario where init(file,..) would not actually parse/internalize the code inside init()? I don't much care where the code is parsed and added to a ScriptEngine, but when it is, it should handle all other evaluated resources that are necessary to succeed. In the current API, a user provided script file is given to init(), so that's where it must do this. There is really no other place to evaluate resource inclusions, and i think i might not be understanding your suggestion.

As for other ScriptEngines that may not be able to support this concept, are you suggesting a "supportsFeature()" method that we use to test various SE's to determine if they can support this (or other) features? I'm not sure what we'd do with this knowledge.
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016475#comment-13016475 ]

Alan Gates commented on PIG-1824:

A couple of questions:

# Based on my analysis the static init block that this patch adds to JythonScriptingEngine will only get invoked once we know we have Jython in the mix. Is that correct? We don't want to be invoking this when Python UDFs or a Python control flow.
# Right now the code to do this is part of the init of the JythonScriptingEngine. Should we make this a separate method in ScriptEngine so that other languages can also add this kind of functionality? I would not make it abstract, since some languages may not be able to do this. But it seems like it makes for a cleaner interface.

> Support import modules in Jython UDF
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.8.0, 0.9.0, 0.10
> Attachments: 1824.patch, 1824a.patch
>
> Currently, Jython UDF script doesn't support Jython import statement as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let user explicitly specify the
> module to ship?
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014086#comment-13014086 ]

Woody Anderson commented on PIG-1824:

The following may not be immediately self evident to all developers: import statements that execute from within runtime function calls will not work (unless the dependency has already been satisfied statically), eg:
{code}
def resplit(content, regex, index):
    import re
    return re.compile(regex).split(content)[index]
{code}
will not work b/c the import is not attempted until after the job has been defined, built, and deployed.

This import practice is frowned upon and is used very rarely. If you happen to be doing it (i'll assume you have a good reason), then you probably know how to fix it. If you're using someone else's code that is written like this, you can satisfy the dependency by explicitly importing the module up front, this will cause it to be added to the jar, and subsequent uses will succeed.

> Support import modules in Jython UDF
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.8.0, 0.9.0, 0.10
> Attachments: 1824.patch
>
> Currently, Jython UDF script doesn't support Jython import statement as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let user explicitly specify the
> module to ship?
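The workaround described in the comment above, importing the module up front so that the dependency is resolved when the script is first loaded, might look like the sketch below. The `outputSchema` function defined here is a hypothetical stand-in for the decorator Pig provides, included only so that the snippet runs in a plain Python interpreter.

```python
import re  # top-level import: resolved when the script is loaded, before the job is built


def outputSchema(schema):
    """Hypothetical stand-in for Pig's decorator; it only records the schema string."""
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("word:chararray")
def resplit(content, regex, index):
    # A function-level "import re" here would also succeed, because the
    # dependency has already been satisfied by the top-level import.
    return re.compile(regex).split(content)[index]
```

The same top-level import can be added to a script that merely calls third-party code containing function-level imports, without modifying that code.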
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012608#comment-13012608 ]

Woody Anderson commented on PIG-1824:

this code, as originally written, cannot work:
{code}
import re
@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content,regex):
    return re.compile(regex).split(content)
{code}
the reason is that split returns a list of strings, not a list of tuples, and jythonfunction casting will fail.

i've created a ticket for these kinds of 'obvious' type coercions: https://issues.apache.org/jira/browse/PIG-1942

and, as such, i am going to change the code for this ticket to something that will work when 'import re' works.

> Support import modules in Jython UDF
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.10
>
> Currently, Jython UDF script doesn't support Jython import statement as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("y:bag{t:tuple(word:chararray)}")
> def strsplittobag(content,regex):
>     return re.compile(regex).split(content)
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let user explicitly specify the
> module to ship?
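The type mismatch described above can be seen outside Pig. A minimal sketch, assuming the bag schema expects a list of one-field tuples: wrapping each word in a tuple yields that shape. The `outputSchema` function is again a hypothetical stand-in for Pig's decorator, and this is not necessarily the change made for this ticket or for PIG-1942.

```python
import re


def outputSchema(schema):
    """Hypothetical stand-in for Pig's decorator; it only records the schema string."""
    def wrap(func):
        func.outputSchema = schema
        return func
    return wrap


@outputSchema("y:bag{t:tuple(word:chararray)}")
def strsplittobag(content, regex):
    # split() returns a list of strings; wrap each one in a one-element
    # tuple so the result is a list of tuples, as the schema describes.
    return [(word,) for word in re.compile(regex).split(content)]
```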
[jira] Commented: (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000982#comment-13000982 ] David Ciemiewicz commented on PIG-1824: --- I don't think it is appropriate to just leave this up to the end user to figure this stuff out. Especially when the errors won't be discovered until the user attempts to run the code on the grid then must decipher the errors then must track down the individual dependency files then must try to figure out how to ship the necessary files then must try to track down why it still doesn't work because the import files contained dependencies on imported files then must track down the subsequent dependencies then ... If jython itself does not provide hooks to enumerate all dependencies after parsing, would it be possible to build a tool which recurses the imports and then provides information to the end user on how to package all the dependencies for ship (or better just does it). Couldn't this be a requirement for all language bindings to provide a method or script for enumerating all dependent files, even if the interpreter implementation in Java doesn't provide this functionality natively? > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Reporter: Richard Ding >Assignee: Richard Ding > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("y:bag{t:tuple(word:chararray)}") > def strsplittobag(content,regex): > return re.compile(regex).split(content) > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? -- This message is automatically generated by JIRA. - For more information on JIRA, see: http://www.atlassian.com/software/jira
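The kind of tool suggested above, one that recurses through the imports and reports every dependent file, can be sketched with the `modulefinder` module from the CPython standard library. This illustrates the idea under CPython only; whether the same approach works under Jython is an assumption that has not been checked here, and it is not the approach taken by the patches on this issue. The function name `enumerate_dependencies` is hypothetical.

```python
import os
import sys
from modulefinder import ModuleFinder


def enumerate_dependencies(script_path, search_path=None):
    """Return {module_name: file_path} for modules the script imports, transitively."""
    if search_path is None:
        # By default, look next to the script first, then on the normal path.
        script_dir = os.path.dirname(os.path.abspath(script_path))
        search_path = [script_dir] + sys.path
    finder = ModuleFinder(path=search_path)
    finder.run_script(script_path)
    deps = {}
    for name, module in finder.modules.items():
        # Skip the script itself and built-in modules that have no source file.
        if name != "__main__" and module.__file__:
            deps[name] = module.__file__
    return deps
```

The returned paths are the files a user (or Pig itself) would need to ship alongside the UDF script.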
[jira] Commented: (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987143#action_12987143 ] Alan Gates commented on PIG-1824: - +1 to Ashutosh's comment. Also, this won't port well as we add UDFs in new languages. > Support import modules in Jython UDF > > > Key: PIG-1824 > URL: https://issues.apache.org/jira/browse/PIG-1824 > Project: Pig > Issue Type: Improvement >Reporter: Richard Ding > > Currently, Jython UDF script doesn't support Jython import statement as in > the following example: > {code} > #!/usr/bin/python > import re > @outputSchema("y:bag{t:tuple(word:chararray)}") > def strsplittobag(content,regex): > return re.compile(regex).split(content) > {code} > Can Pig automatically locate the Jython module file and ship it to the > backend? Or should we add a ship clause to let user explicitly specify the > module to ship? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987105#action_12987105 ]

Ashutosh Chauhan commented on PIG-1824:

Unless there is a Java API provided by the Jython interpreter which lists all the dependencies of a Jython script, trying to figure out all the module dependencies yourself will be close to writing a linker, isn't it? I think it will be easier to let the user specify and ship his modules in the meantime.

> Support import modules in Jython UDF
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Reporter: Richard Ding
>
> Currently, Jython UDF script doesn't support Jython import statement as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("y:bag{t:tuple(word:chararray)}")
> def strsplittobag(content,regex):
>     return re.compile(regex).split(content)
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let user explicitly specify the
> module to ship?