[jira] [Created] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

JunHo Cho (JIRA) Mon, 08 Aug 2011 18:26:53 -0700

 Add some UDFs which help to migrate Oracle to Hive
---------------------------------------------------


                 Key: HIVE-2361
                 URL: https://issues.apache.org/jira/browse/HIVE-2361
             Project: Hive
          Issue Type: New Feature
          Components: UDF
    Affects Versions: 0.8.0
            Reporter: JunHo Cho
            Priority: Minor
         Attachments: nexr-udf.tar

Here some UDFs which can be matched to oracle functions:
There are two kinds of oracle functions. one is scalar function and another is 
analytic function.
Most scalar functions in Oracle can be converted to hive's udf directly.  

Oracle Scalar Function

GenericUDFDecode : Compares first argument to each other value one by one. 
e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
GenericUDFGreatest : Return the greatest of the list of one or more 
expressions. e.g., GREATEST(2,5,12,3) will return 12
GenericUDFInstr : Return the location of a substring in a string. e.g., 
INSTR('next', 'e') will return 2
GenericUDFLnnvl : Evaluate a condition when one or both operands of the 
condition may be null. e.g., LNNVL(2 > 4) will return true
GenericUDFNVL : Replace null with a string in the results of a query. e.g., 
NVL(null,'hive') will return hive
GenericUDFNVL2 : Determine the value returned by a query based on whether a 
specified expression is null or not null. e.g., NVL2(null,'not null','null 
value') will return 'null value'
GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') 
will return 112
GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., 
TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'


Oracle Analytic Function

Most analytic functions in Oracle can't be converted to hive's query and udf 
directly.
Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to 
support analytic functions 

e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED 
BY hash(col1) SORT BY col1, col2 ...

GenericUDFSum : Calculate a cumulative sum.
GenericUDFRank : Assign a sequential order, or rank within some group based on 
key.
GenericUDFDenseRank : Act like RANK function except that it assigns consecutive 
ranks.
GenericUDFRowNumber : Return sequence integer value within some group based on 
key.
GenericUDFMax : Determine the highest value within some group based on key.
GenericUDFMin : Determine the lowest value within some group based on key.
GenericUDFLag : Access data from a previous row.

This udfs was developed with hive-pdk


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

Reply via email to