Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "UDFsUsingScriptingLanguages" page has been changed by Aniket Mokashi.
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages?action=diff&rev1=2&rev2=3

--------------------------------------------------

  {{{
  Register 'test.py' using jython as myfuncs;
  }}}
- This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the python script. Users can use custom script engines to support multiple languages and ways to interpret them. Currently, pig identifies jython as a keyword and ships the required scriptengine (jython) to interpret it.
+ This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the python script. Users can develop and use custom script engines to support multiple programming languages and ways of interpreting them. Currently, pig recognizes jython as a keyword and ships the required script engine (jython) to interpret the script.
  The following syntax is also supported:
  {{{
@@ -52, +52 @@
  }}}
  Registering test.py with pig under the myfuncs namespace makes the functions myfuncs.helloworld(), myfuncs.complex(2) and myfuncs.square(2.0) available as UDFs. These UDFs can then be used as follows:
  {{{
- b = foreach a generate myfuncs.helloworld, myfuncs.square(3);
+ b = foreach a generate myfuncs.helloworld(), myfuncs.square(3);
  }}}
  === Decorators and Schemas ===
- For annotating python script so that pig can identify their return types, we use decorators to define output schema for a script UDF.
+ To annotate a python script so that pig can identify the return types of its functions, we use python decorators to define the output schema of each script UDF.
  '''outputSchema''' defines the schema of a script UDF in a format that pig understands and can parse.
  '''outputSchemaFunction''' names a delegate script function that defines the schema of this function depending upon the input type. This is needed for functions that accept generic types and perform generic operations on them. A simple example is ''square'', which can accept multiple types.
  The schema function for ''square'' is a simple identity function (its output schema is the same as its input schema).
  '''schemaFunction''' declares such a delegate function; the delegate itself is not registered with pig as a UDF.
-
- When no decorator is specified, pig assumes the output datatype as bytearray and converts the output generated by script function to bytearray. This is consistent with pig's behavior in other cases.
+ When no decorator is specified, pig assumes that the output datatype is bytearray and converts the output of the script function to bytearray. This is consistent with pig's behavior for Java UDFs.
-
- ''Sample Schema String'' - y:{t:(word:chararray,num:long)}, variable names are not used anywhere they are just to make syntax consistent.
+ ''Sample Schema String'' - y:{t:(word:chararray,num:long)}. The names inside the schema string (y, t, word, num) are not used anywhere; they are there only so that the parser can recognize the syntax.
  == Inline Scripts ==
+ As of today, Pig does not support UDFs defined in inline scripts. This feature is being tracked at [[#ref4|PIG-1471]].
+
+ == Sample Script UDFs ==
+ Simple tasks like string manipulation, mathematical computations and reorganizing data types can easily be done with python scripts, without having to develop long and complex UDFs in Java. The overall overhead of using a scripting language is much lower, and the development cost is almost negligible. The following are a few examples of UDFs developed in python that can be used with Pig.
+ {{{
+ mySampleLib.py
+ ---------------------
+ #!/usr/bin/python
+
+ ##################
+ # Math functions #
+ ##################
+ #Square - Square of a number of any data type
+ @outputSchemaFunction("squareSchema")
+ def square(num):
+     return ((num)*(num))
+ @schemaFunction("squareSchema")
+ def squareSchema(input):
+     return input
+
+ #Percent - Percentage
+ @outputSchema("percent:double")
+ def percent(num, total):
+     # 100.0 forces floating-point division, matching the declared double type
+     return num * 100.0 / total
+
+ #CommaFormat - format a number with commas, e.g. 12345 -> 12,345
+ # note: the '{:,}' format specifier needs python 2.7 semantics (e.g. Jython 2.7)
+ @outputSchema("numformat:chararray")
+ def commaFormat(num):
+     return '{:,}'.format(num)
+
+ ####################
+ # String Functions #
+ ####################
+
+
+ #######################
+ # Data Type Functions #
+ #######################
+
+
+ }}}
  == Performance ==
  === Jython ===
@@ -78, +117 @@
  1. <<Anchor(ref1)>> PIG-928, "UDFs in scripting languages", https://issues.apache.org/jira/browse/PIG-928
  2. <<Anchor(ref2)>> Jython, "The jython project", http://www.jython.org/
  3. <<Anchor(ref3)>> Jruby, "100% pure-java implementation of ruby programming language", http://jruby.org/
+ 4. <<Anchor(ref4)>> PIG-1471, "inline UDFs in scripting languages", https://issues.apache.org/jira/browse/PIG-1471
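The decorators used in mySampleLib.py are not defined inside the script itself; pig's jython engine makes outputSchema, outputSchemaFunction and schemaFunction available when the script is registered. To call such functions from plain python (for example, in quick unit tests outside pig), one possible approach is to define stand-in decorators. The stand-ins below are hypothetical, not pig's implementation: they only record the schema metadata on each function and leave its behavior unchanged, and the functions shown are a few sample UDFs in the style of mySampleLib.py.

```python
# Hypothetical stand-ins for the decorators that pig's jython engine provides
# at registration time. They attach the schema metadata to the function and
# return it unchanged, so the UDF logic can be called from plain python.

def outputSchema(schema_str):
    def decorate(func):
        func.outputSchema = schema_str  # remember the declared output schema
        return func
    return decorate

def outputSchemaFunction(schema_func_name):
    def decorate(func):
        func.outputSchemaFunction = schema_func_name  # name of the delegate
        return func
    return decorate

def schemaFunction(schema_func_name):
    def decorate(func):
        func.schemaFunction = schema_func_name  # marks the delegate itself
        return func
    return decorate

# Sample UDFs in the style of mySampleLib.py
@outputSchemaFunction("squareSchema")
def square(num):
    return num * num

@schemaFunction("squareSchema")
def squareSchema(input):
    # identity: the output schema is the same as the input schema
    return input

@outputSchema("percent:double")
def percent(num, total):
    return num * 100.0 / total

@outputSchema("numformat:chararray")
def commaFormat(num):
    return '{:,}'.format(num)

if __name__ == "__main__":
    # Quick checks of the python logic only; pig's schema handling is not involved.
    assert square(3) == 9
    assert square(2.5) == 6.25
    assert percent(1, 4) == 25.0
    assert commaFormat(1234567) == "1,234,567"
    assert percent.outputSchema == "percent:double"
    print("all checks passed")
```

Because each stand-in returns the original function, these checks exercise only the python logic (such as the float division in percent), not how pig parses the schema strings.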