[
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871025#action_12871025
]
Arnab Nandi commented on PIG-928:
---------------------------------
Thanks Dmitriy! Lazy objects are a great idea. Note that I'm not saying that
pythontoPig is slow per se -- it's just the biggest part of the profiler trace,
and would be a great place for optimization. I ran some numbers on the patch,
and it looks like, aside from the Jython runtime instantiation, the current code
carries a fairly small performance penalty (about 1.2x slower on the largest input).
WordCount example from Alan's package.zip:
||Data size (lines)||Native||Jython||Factor (Jython/Native)||
|10K|9s|18s|2|
|50K|14s|19s|1.35|
|500K|54s|64s|1.19|
(Full data: 8 copies of "War & Peace" from Project Gutenberg, 500K lines, 24MB)
(TOKENIZE was modified to split on spaces only, so both implementations produce
identical output)
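The factor column is the Jython time divided by the native time. A minimal sketch of that arithmetic, using only the whole-second timings from the table (the variable names are illustrative, and small differences from the listed factors may come from the timings being rounded to whole seconds):

```python
# Timings in seconds copied from the table: size -> (native, jython).
timings = {
    "10K": (9, 18),
    "50K": (14, 19),
    "500K": (54, 64),
}

# Slowdown factor = Jython time / native time.
factors = {}
for size, (native, jython) in timings.items():
    factors[size] = jython / float(native)

for size in ("10K", "50K", "500K"):
    print("%s: %.2fx" % (size, factors[size]))
```

The factor shrinking as the input grows is consistent with a fixed startup cost that gets amortized over larger runs.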
Python code:
{noformat}
@outputSchema("s:{d:(word:chararray)}")
def tokenize(word):
    if word is not None:
        return word.split(' ')
{noformat}
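To try the UDF outside Pig, something like the sketch below should work in plain Python. The `outputSchema` decorator defined here is a hypothetical stand-in that only records the schema string; it is an assumption for local testing, not the decorator that the patch supplies at runtime.

```python
# Hypothetical stand-in for the outputSchema decorator, so the UDF can be
# exercised without Pig. It only attaches the schema string to the function.
def outputSchema(schema):
    def decorate(func):
        func.outputSchema = schema
        return func
    return decorate


@outputSchema("s:{d:(word:chararray)}")
def tokenize(word):
    # Returns None implicitly for a None input, matching the original UDF.
    if word is not None:
        return word.split(' ')


result = tokenize("war and peace")
none_result = tokenize(None)
# Splitting on a single space keeps empty strings between repeated spaces.
double_space = tokenize("a  b")
```

Note that `split(' ')` keeps empty tokens for consecutive spaces, unlike `split()` with no argument, so the choice of separator affects the word counts.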
> UDFs in scripting languages
> ---------------------------
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Fix For: 0.8.0
>
> Attachments: calltrace.png, package.zip, pig-greek.tgz,
> pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip
>
>
> It should be possible to write UDFs in scripting languages such as python,
> ruby, etc. This frees users from needing to compile Java, generate a jar,
> etc. It also opens Pig to programmers who prefer scripting languages over
> Java.