[
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847986#action_12847986
]
Julien Le Dem commented on PIG-928:
-----------------------------------
@Woody
The main advantage of embedding pig calls in the scripting language is that it
enables iterative algorithms, which Pig is not very good at currently. Why would
we limit users to UDFs when they can have their whole program in their
scripting language of choice?
4. Python is a very interesting language to integrate with Pig because it has
all the same native data structures (tuple:tuple, list:bag, dictionary:map)
which makes the UDFs compact and easy to code. That said, in scripting
languages that don't match as well as Python to the Pig types, using the schema
to disambiguate will be a must-have.
When do we need to convert sequences and iterators? Pig has only tuple, bag
and map as complex types AFAIK.
5. Agreed, it should be cached or initialised at the beginning.
3. and 6. I'll investigate passing the main script through the classpath when I
have time. One interpreter would be nice to save memory and initialization
time. I'm not sure the shared state is such an advantage as UDFs should not
rely on being run in the same process. Maybe I'm just missing something.
About the multi language: I'm not against it, but there's not that much code to
share.
The scripting<->pig type conversion is specific to each language as you
mentioned. Calling functions, getting a list of functions, and defining output
schemas will also be specific.
How I see the multilanguage:
pig local|mapred -script {language} {scriptfile}
main program:
- generic: loads the script file
- generic: makes the script available in the classpath of the tasks (through a
jar generated on the fly?)
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the main (in my case:
decorators, pig server instance)
- generic: loads the script in the interpreter
- specific: figures out the list of functions and registers them automatically
as UDFs in Pig using a dedicated UDF wrapper class
- specific: runs the main
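The "figures out the list of functions" step above could look roughly like this
in plain Python. It is only a sketch: discover_udfs is a hypothetical name, the
script is passed in as a string, and the actual registration with Pig through
the wrapper class is left out.

```python
import types


def discover_udfs(script_source):
    """Run the script in a fresh namespace and return its top-level functions."""
    namespace = {}
    exec(script_source, namespace)
    # Keep plain functions only; names starting with "_" are treated as private
    # (that naming rule is an assumption of this sketch).
    return {
        name: obj
        for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType) and not name.startswith("_")
    }


SCRIPT = '''
def square(x):
    return x * x

def _helper(x):
    return x

LIMIT = 10
'''

udfs = discover_udfs(SCRIPT)
print(sorted(udfs))
```

Each entry of the returned dictionary is a candidate for automatic registration
as a UDF.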
Pig execute call from the script:
- generic: parse the Pig string to replace ${expression} by the value of the
expression as evaluated by the interpreter in the local scope.
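A minimal sketch of that substitution step, assuming the interpreter is Python
and the caller hands over its local scope explicitly. The names substitute and
example are hypothetical, and eval is used only for illustration.

```python
import re

# Matches ${...} and captures the expression between the braces.
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def substitute(pig_text, scope):
    """Replace each ${expression} with str() of its value evaluated in scope."""
    def evaluate(match):
        return str(eval(match.group(1), {}, scope))
    return _PLACEHOLDER.sub(evaluate, pig_text)


def example():
    i = 3
    path = "data/iter"
    # locals() stands in for "the local scope" of the calling script.
    return substitute("A = LOAD '${path}_${i - 1}';", locals())


print(example())
```

Since this part only deals with the Pig string, it can stay generic; only the
expression evaluation has to be delegated to each language's interpreter.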
UDF init:
- generic: loads the script from the classpath
- specific: initializes the interpreter for the scripting language
- specific: adds the global variables defined by pig for the UDFs (in my case:
decorators)
- generic: loads the script in the interpreter
- specific: figures out how the outputSchema is obtained at runtime: function
call or static schema (parsing of the schema is generic)
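To illustrate the "function call or static schema" distinction, here is a
sketch of a decorator that accepts either form. The names output_schema and
resolve_schema are hypothetical, not the ones from the attached patch, and the
schema strings are just examples.

```python
def output_schema(schema):
    """Attach either a static schema string or a schema-computing callable."""
    def decorate(func):
        func.output_schema = schema
        return func
    return decorate


def resolve_schema(func, input_schema):
    """Return the output schema string of func for the given input schema."""
    schema = getattr(func, "output_schema", None)
    if callable(schema):
        # Dynamic case: the schema is computed from the input schema.
        return schema(input_schema)
    # Static case (or None when the function was not decorated).
    return schema


@output_schema("word:chararray")
def upper(word):
    return word.upper()


@output_schema(lambda input_schema: "pair:(" + input_schema + ")")
def pair(a, b):
    return (a, b)


print(resolve_schema(upper, "w:chararray"))
print(resolve_schema(pair, "a:int, b:int"))
```

Parsing the resulting string into a Pig schema would then be the generic part.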
UDF call:
- specific: converts a pig tuple to a parameter list in the scripting language
types
- specific: calls the function with the parameters
- specific: converts the result to Pig types
- generic: return the result
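The four steps of the UDF call can be sketched as follows. Plain Python tuples,
lists and dicts stand in for Pig's tuple, bag and map here, so this is not the
real wrapper class; to_pig and call_udf are hypothetical names.

```python
SCALARS = (int, float, str, bytes, bool, type(None))


def to_pig(value):
    """Check and copy a Python result into the tuple/bag/map stand-ins."""
    if isinstance(value, tuple):
        return tuple(to_pig(v) for v in value)
    if isinstance(value, list):
        return [to_pig(v) for v in value]
    if isinstance(value, dict):
        # Pig map keys are chararray, so keys are converted to strings.
        return {str(k): to_pig(v) for k, v in value.items()}
    if isinstance(value, SCALARS):
        return value
    raise TypeError("no Pig equivalent for %r" % type(value).__name__)


def call_udf(func, pig_tuple):
    args = list(pig_tuple)   # Pig tuple -> positional parameter list
    result = func(*args)     # call the function with the parameters
    return to_pig(result)    # convert the result and return it


def word_lengths(words):
    # words is a bag (list) of single-field tuples.
    return {w: len(w) for (w,) in words}


print(call_udf(word_lengths, ([("pig",), ("latin",)],)))
```

For a language whose native types match Pig less closely, to_pig is where the
schema would be needed to disambiguate.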
> UDFs in scripting languages
> ---------------------------
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Attachments: package.zip, pyg.tgz, scripting.tgz, scripting.tgz
>
>
> It should be possible to write UDFs in scripting languages such as python,
> ruby, etc. This frees users from needing to compile Java, generate a jar,
> etc. It also opens Pig to programmers who prefer scripting languages over
> Java.