[ https://issues.apache.org/jira/browse/SPARK-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-7285:
-------------------------------
    Description:
Create a list of functions that are on this page but not in SQL/DataFrame.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Here's the list of missing stuff:

*basic*
-between-
bitwise operation
bitwiseAND
bitwiseOR
bitwiseXOR
bitwiseNOT

*math*
round(DOUBLE a)
round(DOUBLE a, INT d) Returns a rounded to d decimal places.
log2
sqrt(string column name)
bin
hex(long), hex(string), hex(binary)
unhex(string) -> binary
conv
pmod
factorial
-toDeg -> toDegrees-
-toRad -> toRadians-
e()
pi()
shiftleft(int or long)
shiftright(int or long)
shiftrightunsigned(int or long)

*collection functions*
sort_array(array)
size(map, array)
map_values(map<k,v>): array<v>
map_keys(map<k,v>): array<k>
array_contains(array<t>, value): boolean

*date functions*
from_unixtime(long, string): string
unix_timestamp(): long
unix_timestamp(date): long
year(date): int
month(date): int
day(date): int
dayofmonth(date): int
hour(timestamp): int
minute(timestamp): int
second(timestamp): int
weekofyear(date): int
date_add(date, int)
date_sub(date, int)
from_utc_timestamp(timestamp, string timezone): timestamp
current_date(): date
current_timestamp(): timestamp
add_months(string start_date, int num_months): string
last_day(string date): string
next_day(string start_date, string day_of_week): string
trunc(string date[, string format]): string
months_between(date1, date2): double
date_format(date/timestamp/string ts, string fmt): string

*conditional functions*
if(boolean testCondition, T valueTrue, T valueFalseOrNull): T
nvl(T value, T default_value): T
greatest(T v1, T v2, …): T
least(T v1, T v2, …): T

*string functions*
ascii(string str): int
base64(binary): string
concat(string|binary A, string|binary B…): string | binary
concat_ws(string SEP, string A, string B…): string
concat_ws(string SEP, array<string>): string
decode(binary bin, string charset): string
encode(string src, string charset): binary
find_in_set(string str, string strList): int
format_number(number x, int d): string
length(string): int
instr(string str, string substr): int
locate(string substr, string str[, int pos]): int
lower(string), lcase(string)
lpad(string str, int len, string pad): string
ltrim(string): string
parse_url(string urlString, string partToExtract [, string keyToExtract]): string
printf(String format, Obj... args): string
regexp_extract(string subject, string pattern, int index): string
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): string
repeat(string str, int n): string
reverse(string A): string
rpad(string str, int len, string pad): string
space(int n): string
split(string str, string pat): array
str_to_map(text[, delimiter1, delimiter2]): map<string, string>
trim(string A): string
unbase64(string str): binary
upper(string A), ucase(string A): string
levenshtein(string A, string B): int
soundex(string A): string

*Misc*
hash(a1[, a2…]): int

*text*
context_ngrams(array<array<string>>, array<string>, int K, int pf): array<struct<string,double>>
ngrams(array<array<string>>, int N, int K, int pf): array<struct<string,double>>
sentences(string str, string lang, string locale): array<array<string>>

*UDAF*
var_samp
stddev_pop
stddev_samp
covar_pop
covar_samp
corr
percentile: array<double>
percentile_approx: array<double>
histogram_numeric: array<struct {'x','y'}>
collect_set <-- we have hashset
collect_list
ntile

  was:
Create a list of functions that are on this page but not in SQL/DataFrame.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Here's the list of missing stuff:

*basic*
-between-
bitwise operation
bitwiseAND
bitwiseOR
bitwiseXOR
bitwiseNOT

*math*
round(DOUBLE a)
round(DOUBLE a, INT d) Returns a rounded to d decimal places.
log2
sqrt(string column name)
bin
hex(long), hex(string), hex(binary)
unhex(string) -> binary
conv
pmod
factorial
toDeg -> toDegrees
toRad -> toRadians
e()
pi()
shiftleft(int or long)
shiftright(int or long)
shiftrightunsigned(int or long)

*collection functions*
sort_array(array)
size(map, array)
map_values(map<k,v>): array<v>
map_keys(map<k,v>): array<k>
array_contains(array<t>, value): boolean

*date functions*
from_unixtime(long, string): string
unix_timestamp(): long
unix_timestamp(date): long
year(date): int
month(date): int
day(date): int
dayofmonth(date): int
hour(timestamp): int
minute(timestamp): int
second(timestamp): int
weekofyear(date): int
date_add(date, int)
date_sub(date, int)
from_utc_timestamp(timestamp, string timezone): timestamp
current_date(): date
current_timestamp(): timestamp
add_months(string start_date, int num_months): string
last_day(string date): string
next_day(string start_date, string day_of_week): string
trunc(string date[, string format]): string
months_between(date1, date2): double
date_format(date/timestamp/string ts, string fmt): string

*conditional functions*
if(boolean testCondition, T valueTrue, T valueFalseOrNull): T
nvl(T value, T default_value): T
greatest(T v1, T v2, …): T
least(T v1, T v2, …): T

*string functions*
ascii(string str): int
base64(binary): string
concat(string|binary A, string|binary B…): string | binary
concat_ws(string SEP, string A, string B…): string
concat_ws(string SEP, array<string>): string
decode(binary bin, string charset): string
encode(string src, string charset): binary
find_in_set(string str, string strList): int
format_number(number x, int d): string
length(string): int
instr(string str, string substr): int
locate(string substr, string str[, int pos]): int
lower(string), lcase(string)
lpad(string str, int len, string pad): string
ltrim(string): string
parse_url(string urlString, string partToExtract [, string keyToExtract]): string
printf(String format, Obj... args): string
regexp_extract(string subject, string pattern, int index): string
regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): string
repeat(string str, int n): string
reverse(string A): string
rpad(string str, int len, string pad): string
space(int n): string
split(string str, string pat): array
str_to_map(text[, delimiter1, delimiter2]): map<string, string>
trim(string A): string
unbase64(string str): binary
upper(string A), ucase(string A): string
levenshtein(string A, string B): int
soundex(string A): string

*Misc*
hash(a1[, a2…]): int

*text*
context_ngrams(array<array<string>>, array<string>, int K, int pf): array<struct<string,double>>
ngrams(array<array<string>>, int N, int K, int pf): array<struct<string,double>>
sentences(string str, string lang, string locale): array<array<string>>

*UDAF*
var_samp
stddev_pop
stddev_samp
covar_pop
covar_samp
corr
percentile: array<double>
percentile_approx: array<double>
histogram_numeric: array<struct {'x','y'}>
collect_set <-- we have hashset
collect_list
ntile


> Audit missing Hive functions
> ----------------------------
>
>                 Key: SPARK-7285
>                 URL: https://issues.apache.org/jira/browse/SPARK-7285
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org