[jira] Updated: (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Chang updated HIVE-1545:
---------------------------------

    Attachment: udfs.tar.gz

> Add a bunch of UDFs and UDAFs
> -----------------------------
>
>                 Key: HIVE-1545
>                 URL: https://issues.apache.org/jira/browse/HIVE-1545
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Jonathan Chang
>            Assignee: Jonathan Chang
>            Priority: Minor
>         Attachments: udfs.tar.gz, udfs.tar.gz
>
>
> Here are some UD(A)Fs which can be incorporated into the Hive distribution:
>
> UDFArgMax - Finds the 0-based index of the largest argument, e.g.,
> ARGMAX(4, 5, 3) returns 1.
> UDFBucket - Finds the bucket to which the first argument belongs, e.g.,
> BUCKET(x, b_1, b_2, b_3, ...) returns the smallest i such that x > b_{i}
> but <= b_{i+1}. Returns 0 if x is smaller than all the buckets.
> UDFFindInArray - Finds the 1-based index of the first occurrence of the
> first argument in the array given as the second argument. Returns 0 if not
> found. Returns NULL if either argument is NULL. E.g.,
> FIND_IN_ARRAY(5, array(1,2,5)) returns 3 and FIND_IN_ARRAY(5, array(1,2,3))
> returns 0.
> UDFGreatCircleDist - Finds the great circle distance (in km) between two
> lat/long coordinates (in degrees).
> UDFLDA - Performs LDA inference on a vector given fixed topics.
> UDFNumberRows - Numbers successive rows starting from 1. The counter resets
> to 1 whenever any of its parameters changes.
> UDFPmax - Finds the maximum of a set of columns, e.g., PMAX(4, 5, 3)
> returns 5.
> UDFRegexpExtractAll - Like REGEXP_EXTRACT except that it returns all matches
> in an array.
> UDFUnescape - Returns the string unescaped (using C/Java style unescaping).
> UDFWhich - Given a boolean array, returns the indices which are TRUE.
> UDFJaccard
> UDAFCollect - Takes all the values associated with a row and converts them
> into a list. Make sure to have: set hive.map.aggr = false;
> UDAFCollectMap - Like collect except that it takes tuples and generates a map.
> UDAFEntropy - Computes the entropy of a column.
> UDAFPearson (BROKEN!!!) - Computes the Pearson correlation between two
> columns.
> UDAFTop - TOP(KEY, VAL) - Returns the KEY associated with the largest value
> of VAL.
> UDAFTopN (BROKEN!!!) - Like TOP except that it returns a list of the keys
> associated with the N (passed as the third parameter) largest values of VAL.
> UDAFHistogram

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
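
The HIVE-1545 description gives concrete input/output pairs for ARGMAX, PMAX and FIND_IN_ARRAY. The following is a minimal plain-Python sketch of those stated semantics, not the Java code from udfs.tar.gz. The function names are hypothetical, None stands in for SQL NULL, and returning the first position on ties in argmax is an assumption, since the description does not cover ties.

```python
from typing import Any, Optional, Sequence


def argmax(*args: Any) -> int:
    """0-based position of the largest argument (first one on ties: an assumption)."""
    if not args:
        raise ValueError("argmax needs at least one argument")
    return max(range(len(args)), key=lambda i: args[i])


def pmax(*args: Any) -> Any:
    """Largest of the given arguments."""
    if not args:
        raise ValueError("pmax needs at least one argument")
    return max(args)


def find_in_array(needle: Any, haystack: Optional[Sequence[Any]]) -> Optional[int]:
    """1-based position of the first match, 0 if absent, None if either input is None."""
    if needle is None or haystack is None:
        return None
    for position, item in enumerate(haystack, start=1):
        if item == needle:
            return position
    return 0
```

The calls below reproduce the examples quoted in the description: argmax(4, 5, 3) gives 1, pmax(4, 5, 3) gives 5, find_in_array(5, [1, 2, 5]) gives 3, and find_in_array(5, [1, 2, 3]) gives 0.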
[jira] Updated: (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1545:
---------------------------------

    Component/s: UDF

> Add a bunch of UDFs and UDAFs
> -----------------------------
>
>                 Key: HIVE-1545
>                 URL: https://issues.apache.org/jira/browse/HIVE-1545
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Jonathan Chang
>            Assignee: Jonathan Chang
>            Priority: Minor
>         Attachments: udfs.tar.gz
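
The HIVE-1545 description defines BUCKET(x, b_1, b_2, ...) as the smallest i with b_{i} < x <= b_{i+1}, and NUMBER_ROWS as a counter that restarts at 1 whenever its parameters change. A rough Python sketch of both rules follows; it is not the attached Java code. The names bucket and NumberRows are hypothetical. The sketch assumes the boundaries are sorted in ascending order, that x equal to b_1 falls into bucket 0, and that x above the last boundary returns the number of boundaries; the description is silent on those last two cases.

```python
from bisect import bisect_left
from typing import Any, Optional, Tuple


def bucket(x: float, *bounds: float) -> int:
    """Count the boundaries strictly below x, i.e. the i with b_i < x <= b_{i+1}.

    Assumes bounds are sorted ascending. Returns 0 when x <= b_1.
    """
    return bisect_left(bounds, x)


class NumberRows:
    """Numbers successive rows from 1, restarting when the parameters change."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[Any, ...]] = None
        self._count = 0

    def __call__(self, *params: Any) -> int:
        # Restart on the very first row or whenever any parameter differs
        # from the previous row's parameters.
        if self._count == 0 or params != self._last:
            self._count = 1
            self._last = params
        else:
            self._count += 1
        return self._count
```

For example, with boundaries 0, 10, 20 the value 5 lands in bucket 1 and the value -1 in bucket 0. Feeding the keys a, a, b, b, b, a through one NumberRows instance yields 1, 2, 1, 2, 3, 1, which only gives per-group numbering if the rows arrive already grouped by key.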
[jira] Updated: (HIVE-1545) Add a bunch of UDFs and UDAFs
[ https://issues.apache.org/jira/browse/HIVE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Chang updated HIVE-1545:
---------------------------------

    Attachment: udfs.tar.gz

Here is a tarball of the poorly documented/tested udfs.

> Add a bunch of UDFs and UDAFs
> -----------------------------
>
>                 Key: HIVE-1545
>                 URL: https://issues.apache.org/jira/browse/HIVE-1545
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Jonathan Chang
>            Assignee: Jonathan Chang
>            Priority: Minor
>         Attachments: udfs.tar.gz
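
Two of the aggregates in the HIVE-1545 description, UDAFEntropy and UDAFTop, are easy to illustrate outside Hive. The sketch below is plain Python over in-memory lists, not the attached UDAF code, and the function names are hypothetical. It assumes entropy means the Shannon entropy of the column's empirical value distribution measured in bits (the description does not state the logarithm base), and that TOP returns the first key seen when several rows tie for the largest VAL.

```python
import math
from collections import Counter
from typing import Any, Hashable, Iterable, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2, an assumption) of the observed value frequencies."""
    if not values:
        raise ValueError("entropy needs at least one value")
    total = len(values)
    counts = Counter(values)
    # sum of p * log2(1/p) over the distinct values, with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def top(rows: Iterable[Tuple[Any, float]]) -> Any:
    """Key of the (key, val) pair with the largest val (first one on ties)."""
    rows = list(rows)
    if not rows:
        raise ValueError("top needs at least one row")
    best_key, _ = max(rows, key=lambda kv: kv[1])
    return best_key
```

A column holding two values in equal proportion has an entropy of one bit under this definition, and a constant column has zero. For the rows ("a", 1), ("b", 3), ("c", 2), top returns "b".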