[jira] [Commented] (HIVE-1545) Add a bunch of UDFs and UDAFs

Edward Capriolo (JIRA) Thu, 08 Aug 2013 18:17:07 -0700

    [ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734306#comment-13734306 ]


Edward Capriolo commented on HIVE-1545:
---------------------------------------

The annotations and other things you are seeing are part of an internal testing 
framework at FB that was never open sourced. The Hive plugin developer kit had 
similar annotations, but they were removed. So the UDFs will likely compile 
fine, but the test cases will not.
                
> Add a bunch of UDFs and UDAFs
> -----------------------------
>
>                 Key: HIVE-1545
>                 URL: https://issues.apache.org/jira/browse/HIVE-1545
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Jonathan Chang
>            Assignee: Jonathan Chang
>            Priority: Minor
>         Attachments: core.tar.gz, ext.tar.gz, UDFEndsWith.java, 
> UDFFindInString.java, UDFLtrim.java, UDFRtrim.java, udfs.tar.gz, udfs.tar.gz, 
> UDFStartsWith.java, UDFTrim.java
>
>
> Here are some UD(A)Fs which can be incorporated into the Hive distribution:
> UDFArgMax - Find the 0-indexed index of the largest argument. e.g., ARGMAX(4, 
> 5, 3) returns 1.
> UDFBucket - Find the bucket in which the first argument belongs. e.g., 
> BUCKET(x, b_1, b_2, b_3, ...), will return the smallest i such that x > b_{i} 
> but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-based index of the first argument within the 
> array given as the second argument. Returns 0 if not found. Returns NULL if 
> either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) will return 3. 
> FIND_IN_ARRAY(5, array(1,2,3)) will return 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two 
> lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Number successive rows starting from 1. Counter resets to 1 
> whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns. e.g., PMAX(4, 5, 3) returns 
> 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches 
> in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, return the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts it into 
> a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Compute the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two 
> columns.
> UDAFTop - TOP(KEY, VAL) - returns the KEY associated with the largest value 
> of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except returns a list of the keys associated 
> with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram
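
For readers checking the examples in the description above, here is a rough 
plain-Python sketch of the semantics stated for ARGMAX, PMAX, FIND_IN_ARRAY, 
WHICH and BUCKET. It is not the attached Java code and does not use any Hive 
API. The function names are hypothetical, and several details are assumptions 
because the description does not state them: ties in ARGMAX resolve to the 
first argument, WHICH uses 0-based indices, and BUCKET expects ascending 
boundaries and returns the number of boundaries when x exceeds all of them.

```python
from bisect import bisect_left


def argmax(*args):
    """0-based index of the largest argument (first one wins on ties: assumption)."""
    return max(range(len(args)), key=lambda i: args[i])


def pmax(*args):
    """Largest of the given values."""
    return max(args)


def find_in_array(needle, haystack):
    """1-based position of needle in haystack, 0 if absent, None if either is None."""
    if needle is None or haystack is None:
        return None
    for position, item in enumerate(haystack, start=1):
        if item == needle:
            return position
    return 0


def which(flags):
    """Indices of the TRUE entries (0-based indexing is an assumption)."""
    return [i for i, flag in enumerate(flags) if flag is True]


def bucket(x, *bounds):
    """Smallest i with bounds[i-1] < x <= bounds[i] (1-based i), 0 if x <= bounds[0].

    Assumes bounds are sorted ascending, so the result equals the number of
    boundaries strictly smaller than x.
    """
    return bisect_left(bounds, x)


if __name__ == "__main__":
    print(argmax(4, 5, 3))               # example from the description
    print(pmax(4, 5, 3))                 # example from the description
    print(find_in_array(5, [1, 2, 5]))   # example from the description
    print(find_in_array(5, [1, 2, 3]))   # example from the description
```

The BUCKET sketch relies on bisect_left, which counts the boundaries strictly 
below x; that matches "x > b_{i} but <= b_{i+1}" only when the boundaries are 
sorted.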

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
