[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3294:
----------------------------
    Attachment: PIG-3294-before-refactory.patch
                PIG-3294-1.patch

To use it, define HiveUDF/HiveUDTF/HiveUDAF in Pig:
define sin HiveUDF('sin');  -- by alias registered in Hive's FunctionRegistry
define sin HiveUDF('org.apache.hadoop.hive.ql.udf.UDFSin');  -- or by full class name
define explode HiveUDTF('explode');  -- a UDTF maps to a Pig UDF that returns a bag
define avg HiveUDAF('avg');  -- a UDAF maps to a Pig Algebraic UDF
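
Once defined, the aliases should be usable like any other Pig UDF. The
script below is only a sketch of that: the input path 'mydata' and its schema
are made up for illustration, and it assumes the patch is applied and the Hive
classes are on the classpath.

```pig
-- Hypothetical input: the path 'mydata' and the schema are assumptions.
define sin HiveUDF('sin');
define explode HiveUDTF('explode');
define avg HiveUDAF('avg');

A = load 'mydata' as (name:chararray, x:double, items:{(item:chararray)});

-- Scalar UDF: one output value per input row.
B = foreach A generate name, sin(x);

-- UDTF: the result is a bag, so flatten it to get one row per element.
C = foreach A generate name, flatten(explode(items));

-- UDAF: used like a Pig aggregate over the grouped bag.
D = group A by name;
E = foreach D generate group, avg(A.x);
```

Because a UDAF is mapped to an Algebraic UDF, the last statement is the kind
of expression the combiner can be applied to.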

Some Hive UDFs require constant parameters. Hive uses ObjectInspector to 
communicate the schema to a UDF, and ObjectInspector is richer than Pig's 
Schema in that it can express whether a field is a constant. To support this, 
HiveUDF takes an optional constant tuple. A null item in the tuple means the 
corresponding parameter is not a constant:

define in_file HiveUDF('in_file', '(null, "names.txt")');
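
As a sketch of how such a definition might be used: the relation name, input
path 'student' and schema below are assumptions for illustration. The constant
is still passed at the call site; the tuple in the define statement only tells
the Hive UDF which arguments are constants.

```pig
-- First parameter varies per row (null), second is the constant "names.txt".
define in_file HiveUDF('in_file', '(null, "names.txt")');

-- Hypothetical input: the path 'student' and the schema are assumptions.
A = load 'student' as (name:chararray, age:long, gpa:double);

-- Check whether each name appears as a line in names.txt.
B = foreach A generate name, in_file(name, 'names.txt');
```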

The patch contains the following changes:
1. Allow a UDF to produce a last record in close(). HiveUDTF uses this to 
consume all the records as input and produce its output in close().
2. Add the input schema to the Initial, Intermed, and Final functions of 
Algebraic. This is the original input schema of the UDF. The actual input 
schema of each stage is internal to the Algebraic UDF, and Pig does not know it.
3. Several minor fixes in the combiner:
 * the Tez combiner conf does not have UDFContext
 * parentPlan is not set for combiner plan operators
 * resultType of FINAL is not set properly
4. Refactor OrcUtils -> HiveUtils (the patch from before the refactoring is 
also attached to ease review)

> Allow Pig use Hive UDFs
> -----------------------
>
>                 Key: PIG-3294
>                 URL: https://issues.apache.org/jira/browse/PIG-3294
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Daniel Dai
>              Labels: gsoc2013, java
>         Attachments: PIG-3294-1.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provided some interoperability with Hive. We can wrap 
> Hive UDFs in Pig so that they can be used from Pig.
> This is a candidate project for Google Summer of Code 2013. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
