Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "UDFsUsingScriptingLanguages" page has been changed by Aniket Mokashi.
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages?action=diff&rev1=2&rev2=3

--------------------------------------------------

  {{{
  Register 'test.py' using jython as myfuncs;
  }}}
- This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the 
python script. Users can use custom script engines to support multiple 
languages and ways to interpret them. Currently, pig identifies jython as a 
keyword and ships the required scriptengine (jython) to interpret it.
+ This uses org.apache.pig.scripting.jython.JythonScriptEngine to interpret the 
Python script. Users can develop and use custom script engines to support 
multiple programming languages and ways of interpreting them. Currently, Pig 
identifies jython as a keyword and ships the required script engine (Jython) 
to interpret it.
  
  The following syntax is also supported:
  {{{
@@ -52, +52 @@

  }}}
  Registering test.py with Pig under the myfuncs namespace makes the functions
- myfuncs.helloworld(), myfuncs.complex(2), myfuncs.square(2.0) available as 
UDFs. These UDFs can be used with
  {{{
- b = foreach a generate myfuncs.helloworld, myfuncs.square(3);
+ b = foreach a generate myfuncs.helloworld(), myfuncs.square(3);
  }}}
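Conceptually, registering a script under a namespace exposes each of its top-level functions under a prefixed name such as `myfuncs.square`. The plain-Python sketch below models only that lookup. `register_namespace`, the `is_schema_function` marker attribute and the function bodies are hypothetical, and this is not Pig's actual registration code. It also skips schema delegates, which the page notes are not registered with Pig.

```python
def helloworld():
    return "Hello World"  # hypothetical body; test.py itself is not shown here


def square(num):
    return num * num


def square_schema(input_schema):
    # A schema delegate: in Pig it would carry @schemaFunction and stay unregistered.
    return input_schema


square_schema.is_schema_function = True  # hypothetical marker attribute


def register_namespace(namespace, functions):
    """Expose each function as '<namespace>.<name>', skipping schema delegates."""
    udfs = {}
    for func in functions:
        if getattr(func, "is_schema_function", False):
            continue
        udfs[namespace + "." + func.__name__] = func
    return udfs


udfs = register_namespace("myfuncs", [helloworld, square, square_schema])
print(sorted(udfs))               # ['myfuncs.helloworld', 'myfuncs.square']
print(udfs["myfuncs.square"](3))  # 9
```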
  
  === Decorators and Schemas ===
- For annotating python script so that pig can identify their return types, we 
use decorators to define output schema for a script UDF. 
+ To annotate a Python script so that Pig can identify the return types of its 
functions, we use Python decorators to define the output schema of a script UDF.
   '''outputSchema''' defines the schema for a script UDF in a format that Pig 
understands and can parse. 
   
   '''outputSchemaFunction''' names a script delegate function that computes the 
schema for this function from the input type. This is needed for functions that 
accept generic types and perform generic operations on them. A simple example 
is ''square'', which can accept multiple numeric types; its schema function is a 
simple identity function (the output schema is the same as the input schema).
   
   '''schemaFunction''' marks the delegate function itself; such a function is not registered with Pig as a UDF.
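The three decorators can be illustrated with plain-Python stand-ins that simply record the declared information on each function, which also lets a script be imported and exercised outside Pig. This is a hypothetical sketch, not Pig's implementation: the attribute names (`output_schema`, `output_schema_function`, `schema_function_name`), the `upper` example UDF and `resolve_output_schema` are all assumptions. It shows how a fixed schema is returned directly while `square` delegates to its identity schema function.

```python
# Hypothetical stand-ins for the decorators that Pig makes available to Jython
# scripts. They only attach the declared information to the function object.
def outputSchema(schema):
    def decorate(func):
        func.output_schema = schema
        return func
    return decorate


def outputSchemaFunction(schema_function_name):
    def decorate(func):
        func.output_schema_function = schema_function_name
        return func
    return decorate


def schemaFunction(name):
    def decorate(func):
        func.schema_function_name = name
        return func
    return decorate


@outputSchema("word:chararray")
def upper(word):  # hypothetical example UDF with a fixed output schema
    return word.upper()


@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input):
    # Identity: the output schema of square is the same as its input schema.
    return input


def resolve_output_schema(func, input_schema, delegates):
    """Return a UDF's output schema, asking its delegate when one is named."""
    if hasattr(func, "output_schema"):
        return func.output_schema
    if hasattr(func, "output_schema_function"):
        return delegates[func.output_schema_function](input_schema)
    return None  # undecorated: the script declares no schema


delegates = {squareSchema.schema_function_name: squareSchema}
print(resolve_output_schema(upper, None, delegates))           # word:chararray
print(resolve_output_schema(square, "num:double", delegates))  # num:double
```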
- 
   
- When no decorator is specified, pig assumes the output datatype as bytearray 
and converts the output generated by script function to bytearray. This is 
consistent with pig's behavior in other cases. 
+ When no decorator is specified, Pig assumes the output data type is bytearray 
and converts the output generated by the script function to bytearray. This is 
consistent with Pig's behavior for Java UDFs.
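A small sketch of the default rule: a decorated UDF keeps its declared type, an undecorated one is reported as bytearray. The `outputSchema` stand-in, the `add`/`concat` UDFs and `call_udf` are hypothetical, and modelling the bytearray conversion as the UTF-8 bytes of `str(result)` is an assumption made for illustration, not necessarily what Pig does internally.

```python
def outputSchema(schema):
    # Hypothetical stand-in for Pig's decorator: records the declared schema.
    def decorate(func):
        func.output_schema = schema
        return func
    return decorate


@outputSchema("total:long")
def add(a, b):
    return a + b


def concat(a, b):  # no decorator, so no declared output type
    return "%s%s" % (a, b)


def call_udf(func, *args):
    """Return (declared type, value); undecorated output becomes bytearray.

    Assumption: bytearray conversion is modelled as UTF-8 bytes of str(result).
    """
    result = func(*args)
    schema = getattr(func, "output_schema", None)
    if schema is None:
        return "bytearray", str(result).encode("utf-8")
    return schema, result


print(call_udf(add, 2, 3))        # ('total:long', 5)
print(call_udf(concat, "ab", 1))  # ('bytearray', b'ab1')
```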
- 
- ''Sample Schema String'' - y:{t:(word:chararray,num:long)}, variable names 
are not used anywhere they are just to make syntax consistent.
+ ''Sample Schema String'' - y:{t:(word:chararray,num:long)}. The variable names 
inside the schema string are not used anywhere; they are there only to make the 
syntax identifiable to the parser.
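To see why the names do not matter, here is a toy Python 3 parser for the sample schema string. It is hypothetical, covers only the bag, tuple and atomic forms used on this page (no whitespace), and is not Pig's own schema parser: it reads each `name:` prefix and then throws the name away, keeping only the types.

```python
def parse_schema(text):
    """Toy parser for simple Pig-style schema strings (assumed grammar):

        field := name ':' type
        type  := atom | '(' field (',' field)* ')' | '{' field '}'
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError("expected %r at position %d" % (ch, pos))
        pos += 1

    def identifier():
        nonlocal pos
        start = pos
        while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
            pos += 1
        if start == pos:
            raise ValueError("expected a name at position %d" % pos)
        return text[start:pos]

    def field():
        identifier()  # the field name is parsed but not kept
        expect(":")
        return type_()

    def type_():
        nonlocal pos
        if peek() == "(":  # tuple: one or more comma-separated fields
            pos += 1
            fields = [field()]
            while peek() == ",":
                pos += 1
                fields.append(field())
            expect(")")
            return ("tuple", fields)
        if peek() == "{":  # bag: one inner field, usually a tuple
            pos += 1
            inner = field()
            expect("}")
            return ("bag", inner)
        return identifier()  # atomic type such as chararray or long

    result = field()
    if pos != len(text):
        raise ValueError("unexpected trailing input at position %d" % pos)
    return result


print(parse_schema("y:{t:(word:chararray,num:long)}"))
# ('bag', ('tuple', ['chararray', 'long']))
```

Renaming every field (for example `a:{b:(c:chararray,d:long)}`) yields the same result, which is the point of the note above.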
  
  == Inline Scripts ==
+ As of today, Pig doesn't support UDFs using inline scripts. This feature is 
being tracked at [[#ref4|PIG-1471]].
+ 
+ == Sample Script UDFs ==
+ Simple tasks like string manipulation, mathematical computations, and 
reorganizing data types can be done easily with Python scripts, without having 
to develop long and complex UDFs in Java. The development overhead of using a 
scripting language is much lower, and the development cost is almost negligible. 
The following are a few examples of UDFs written in Python that can be used with 
Pig.
+ {{{
+ mySampleLib.py
+ ---------------------
+ #!/usr/bin/python
+ 
+ ##################
+ # Math functions #
+ ##################
+ #Square - Square of a number of any numeric data type
+ @outputSchemaFunction("squareSchema")
+ def square(num):
+   return ((num)*(num))
+ @schemaFunction("squareSchema")
+ def squareSchema(input):
+   return input
+ 
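+ #Percent - percentage that num represents of total
+ @outputSchema("percent:double")
+ def percent(num, total):
+   # 100.0 forces floating-point division under Python 2 / Jython 2
+   return num * 100.0 / total
+ 
+ #CommaFormat - number with thousands separators, e.g. 1234567 -> 1,234,567
+ @outputSchema("numformat:chararray")
+ def commaFormat(num):
+   # the ',' format option requires Python 2.7 or later
+   return '{:,}'.format(num)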
+ 
+ ####################
+ # String Functions #
+ ####################
+ 
+ 
+ #######################
+ # Data Type Functions #
+ #######################
+ 
+ 
+ }}}
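Because the decorators only exist inside Pig, one way to check the sample math functions before registering them is a local harness that defines no-op stand-ins and then calls the functions directly. This is a hypothetical testing sketch under that assumption: the stand-ins ignore the schema strings entirely, and the function bodies are copies of the sample functions (with `100.0` so `percent` returns a float under Python 2 as well).

```python
# Hypothetical no-op stand-ins so the decorated functions can be defined outside Pig.
def outputSchema(schema):
    return lambda func: func


def outputSchemaFunction(name):
    return lambda func: func


def schemaFunction(name):
    return lambda func: func


@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


@schemaFunction("squareSchema")
def squareSchema(input):
    return input


@outputSchema("percent:double")
def percent(num, total):
    return num * 100.0 / total


@outputSchema("numformat:chararray")
def commaFormat(num):
    return '{:,}'.format(num)


# Plain Python checks of the UDF logic itself.
assert square(3) == 9
assert square(2.5) == 6.25
assert percent(1, 4) == 25.0
assert commaFormat(1234567) == '1,234,567'
print("all sample UDF checks passed")
```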
  
  == Performance ==
  === Jython ===
@@ -78, +117 @@

   1. <<Anchor(ref1)>> PIG-928, "UDFs in scripting languages", 
https://issues.apache.org/jira/browse/PIG-928
   2. <<Anchor(ref2)>> Jython, "The jython project", http://www.jython.org/
   3. <<Anchor(ref3)>> JRuby, "100% pure-java implementation of ruby 
programming language", http://jruby.org/
+  4. <<Anchor(ref4)>> PIG-1471, "inline UDFs in scripting languages", 
https://issues.apache.org/jira/browse/PIG-1471
  
