This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 1646bf6aae docs: improve expressions.md (#5978)
1646bf6aae is described below
commit 1646bf6aae1e568e38c5a43e9a44c34e26222341
Author: Igor Izvekov <[email protected]>
AuthorDate: Thu Apr 13 21:07:04 2023 +0300
docs: improve expressions.md (#5978)
* docs: improve expressions.md
* feat: add hyperbolic functions
* fix: some typos
---
datafusion/expr/src/expr_fn.rs | 4 +-
docs/source/user-guide/expressions.md | 176 ++++++++++++++++++----------------
2 files changed, 96 insertions(+), 84 deletions(-)
diff --git a/datafusion/expr/src/expr_fn.rs b/datafusion/expr/src/expr_fn.rs
index d2305462fa..9cfc565b59 100644
--- a/datafusion/expr/src/expr_fn.rs
+++ b/datafusion/expr/src/expr_fn.rs
@@ -550,9 +550,9 @@ scalar_expr!(SHA224, sha224, string, "SHA-224 hash");
scalar_expr!(SHA256, sha256, string, "SHA-256 hash");
scalar_expr!(SHA384, sha384, string, "SHA-384 hash");
scalar_expr!(SHA512, sha512, string, "SHA-512 hash");
-scalar_expr!(SplitPart, split_part, string delimiter index, "splits a string
based on a delimiter and picks out the desired field based on the index. ");
+scalar_expr!(SplitPart, split_part, string delimiter index, "splits a string
based on a delimiter and picks out the desired field based on the index.");
scalar_expr!(StartsWith, starts_with, string prefix, "whether the `string`
starts with the `prefix`");
-scalar_expr!(Strpos, strpos, string substring, "finds the position from where
the `substring` matchs the `string`");
+scalar_expr!(Strpos, strpos, string substring, "finds the position from where
the `substring` matches the `string`");
scalar_expr!(Substr, substr, string position, "substring from the `position`
to the end");
scalar_expr!(Substr, substring, string position length, "substring from the
`position` with `length` characters");
scalar_expr!(Translate, translate, string from to, "replaces the characters in
`from` with the counterpart in `to`");
diff --git a/docs/source/user-guide/expressions.md
b/docs/source/user-guide/expressions.md
index dbca3d01ca..cbbc328df1 100644
--- a/docs/source/user-guide/expressions.md
+++ b/docs/source/user-guide/expressions.md
@@ -49,6 +49,16 @@ col("a").gt(lit(6)).and(col("b").lt(lit(7)))
| or | `or(expr1, expr2)` or `expr1.or(expr2)` |
| not | `not(expr)` or `expr.not()` |
+## Bitwise expressions
+
+| Function | Notes
|
+| ------------------- |
------------------------------------------------------------------------- |
+| bitwise_and | `bitwise_and(expr1, expr2)` or
`expr1.bitwise_and(expr2)` |
+| bitwise_or | `bitwise_or(expr1, expr2)` or
`expr1.bitwise_or(expr2)` |
+| bitwise_xor | `bitwise_xor(expr1, expr2)` or
`expr1.bitwise_xor(expr2)` |
+| bitwise_shift_right | `bitwise_shift_right(expr1, expr2)` or
`expr1.bitwise_shift_right(expr2)` |
+| bitwise_shift_left | `bitwise_shift_left(expr1, expr2)` or
`expr1.bitwise_shift_left(expr2)` |
+
## Comparison Expressions
| Function | Notes |
@@ -78,19 +88,22 @@ expressions such as `col("a") + col("b")` to be used.
| cbrt(x) | cube root |
| ceil(x) | nearest integer greater than or equal to argument |
| cos(x) | cosine |
+| cosh(x) | hyperbolic cosine |
| exp(x) | exponential |
| floor(x) | nearest integer less than or equal to argument |
| ln(x) | natural logarithm |
| log(base, x) | logarithm of x for a particular base |
| log10(x) | base 10 logarithm |
| log2(x) | base 2 logarithm |
-| pi(base, exponent) | approximate value of π |
+| pi() | approximate value of π |
| power(base, exponent) | base raised to the power of exponent |
| round(x) | round to nearest integer |
| signum(x) | sign of the argument (-1, 0, +1) |
| sin(x) | sine |
+| sinh(x) | hyperbolic sine |
| sqrt(x) | square root |
| tan(x) | tangent |
+| tanh(x) | hyperbolic tangent |
| trunc(x) | truncate toward zero |
### Math functions usage notes:
@@ -126,105 +139,104 @@ Unlike to some databases the math functions in
Datafusion works the same way as
## String Expressions
-| Function | Notes |
-| ---------------- | ----- |
-| ascii | |
-| bit_length | |
-| btrim | |
-| char_length | |
-| character_length | |
-| concat | |
-| concat_ws | |
-| chr | |
-| initcap | |
-| left | |
-| length | |
-| lower | |
-| lpad | |
-| ltrim | |
-| md5 | |
-| octet_length | |
-| repeat | |
-| replace | |
-| reverse | |
-| right | |
-| rpad | |
-| rtrim | |
-| digest | |
-| split_part | |
-| starts_with | |
-| strpos | |
-| substr | |
-| translate | |
-| trim | |
-| upper | |
+| Function | Notes
|
+| ---------------------------------------------- |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|
+| ascii(character) | Returns a numeric
representation of the character (`character`). Example: `ascii('a') -> 97`
|
+| bit_length(text) | Returns the length of the
string (`text`) in bits. Example: `bit_length('spider') -> 48`
|
+| btrim(text, characters) | Removes all specified
characters (`characters`) from both the beginning and the end of the string
(`text`). Example: `btrim('aabchelloccb', 'abc') -> hello`
|
+| char_length(text) | Returns number of
characters in the string (`text`). The same as `character_length` and `length`.
Example: `character_length('lion') -> 4`
|
+| character_length(text) | Returns number of
characters in the string (`text`). The same as `char_length` and `length`.
Example: `char_length('lion') -> 4`
|
+| concat(value1, [value2 [, ...]]) | Concatenates the text
representations (`value1, [value2 [, ...]]`) of all the arguments. NULL
arguments are ignored. Example: `concat('aaa', 'bbc', NULL, 321) -> aaabbc321`
|
+| concat_ws(separator, value1, [value2 [, ...]]) | Concatenates the text
representations (`value1, [value2 [, ...]]`) of all the arguments with the
separator (`separator`). NULL arguments are ignored. `concat_ws('/', 'path',
'to', NULL, 'my', 'folder', 123) -> path/to/my/folder/123` |
+| chr(integer) | Returns a character by its
numeric representation (`integer`). Example: `chr(90) -> 8`
|
+| initcap | Converts the first letter
of each word to upper case and the rest to lower case. Example: `initcap('hi
TOM') -> Hi Tom`
|
+| left(text, number) | Returns a certain number
(`number`) of first characters (`text`). Example: `left('like', 2) -> li`
|
+| length(text) | Returns number of
characters in the string (`text`). The same as `character_length` and
`char_length`. Example: `length('lion') -> 4`
|
+| lower(text) | Converts all characters in
the string (`text`) into lower case. Example: `lower('HELLO') -> hello`
|
+| lpad(text, length, [, fill]) | Extends the string to
length (`length`) by prepending the characters (`fill`) (a space by default).
Example: `lpad('bb', 5, 'a') → aaabb`
|
+| ltrim(text, text) | Removes all specified
characters (`characters`) from the beginning of the string (`text`). Example:
`ltrim('aabchelloccb', 'abc') -> helloccb`
|
+| md5(text) | Computes the MD5 hash of
the argument (`text`).
|
+| octet_length(text) | Returns number of bytes in
the string (`text`).
|
+| repeat(text, number) | Repeats the string the
specified number of times. Example: `repeat('1', 4) -> 1111`
|
+| replace(string, from, to) | Replaces a specified string
(`from`) with another specified string (`to`) in the string (`string`).
Example: `replace('Hello', 'replace', 'el') -> Hola`
|
+| reverse(text) | Reverses the order of the
characters in the string (`text`). Example: `reverse('hello') -> olleh`
|
+| right(text, number) | Returns a certain number
(`number`) of last characters (`text`). Example: `right('like', 2) -> ke`
|
+| rpad(text, length, [, fill]) | Extends the string to
length (`length`) by prepending the characters (`fill`) (a space by default).
Example: `rpad('bb', 5, 'a') → bbaaa`
|
+| rtrim | Removes all specified
characters (`characters`) from the end of the string (`text`). Example:
`rtrim('aabchelloccb', 'abc') -> aabchello`
|
+| digest(input, algorithm) | Computes the binary hash of
`input`, using the `algorithm`.
|
+| split_part(string, delimiter, index) | Splits the string
(`string`) based on a delimiter (`delimiter`) and picks out the desired field
based on the index (`index`).
|
+| starts_with(string, prefix) | Returns `true` if the
string (`string`) starts with the specified prefix (`prefix`). If not, it
returns `false`. Example: `starts_with('Hi Tom', 'Hi') -> true`
|
+| strpos | Finds the position from
where the `substring` matches the `string`
|
+| substr(string, position, [, length]) | Returns substring from the
position (`position`) with length (`length`) characters in the string
(`string`).
|
+| translate(string, from, to) | Replaces the characters in
`from` with the counterpart in `to`. Example: `translate('abcde', 'acd', '15')
-> 1b5e`
|
+| trim(string) | Removes all characters,
space by default from the string (`string`)
|
+| upper | Converts all characters in
the string into upper case. Example: `upper('hello') -> HELLO`
|
## Regular Expressions
-| Function | Notes |
-| -------------- | ----- |
-| regexp_match | |
-| regexp_replace | |
+| Function | Notes
|
+| -------------- |
----------------------------------------------------------------------------- |
+| regexp_match | Matches a regular expression against a string and returns
matched substrings. |
+| regexp_replace | Replaces strings that match a regular expression
|
## Temporal Expressions
-| Function | Notes |
-| -------------------- | ------------ |
-| date_part | |
-| date_trunc | |
-| from_unixtime | |
-| to_timestamp | |
-| to_timestamp_millis | |
-| to_timestamp_micros | |
-| to_timestamp_seconds | |
-| now() | current time |
+| Function | Notes
|
+| -------------------- |
------------------------------------------------------ |
+| date_part | Extracts a subfield from the date.
|
+| date_trunc | Truncates the date to a specified level of precision.
|
+| from_unixtime | Returns the unix time in format.
|
+| to_timestamp | Converts a string to a `Timestamp(_, _)`
|
+| to_timestamp_millis | Converts a string to a `Timestamp(Milliseconds,
None)` |
+| to_timestamp_micros | Converts a string to a `Timestamp(Microseconds,
None)` |
+| to_timestamp_seconds | Converts a string to a `Timestamp(Seconds, None)`
|
+| now() | Returns current time.
|
## Other Expressions
-| Function | Notes |
-| -------- | ----- |
-| array | |
-| in_list | |
-| random | |
-| sha224 | |
-| sha256 | |
-| sha384 | |
-| sha512 | |
-| struct | |
-| to_hex | |
+| Function | Notes
|
+| ---------------------------- |
----------------------------------------------------------------------------------------------------------
|
+| array([value1, ...]) | Returns an array of fixed size with each
argument (`[value1, ...]`) on it. |
+| in_list(expr, list, negated) | Returns `true` if (`expr`) belongs or not
belongs (`negated`) to a list (`list`), otherwise returns false. |
+| random() | Returns a random value from 0 (inclusive) to
1 (exclusive). |
+| sha224(text) | Computes the SHA224 hash of the argument
(`text`). |
+| sha256(text) | Computes the SHA256 hash of the argument
(`text`). |
+| sha384(text) | Computes the SHA384 hash of the argument
(`text`). |
+| sha512(text) | Computes the SHA512 hash of the argument
(`text`). |
+| to_hex(integer) | Converts the integer (`integer`) to the
corresponding hexadecimal string. |
## Aggregate Functions
-| Function | Notes |
-| ---------------------------------- | ----- |
-| avg | |
-| approx_distinct | |
-| approx_median | |
-| approx_percentile_cont | |
-| approx_percentile_cont_with_weight | |
-| count | |
-| count_distinct | |
-| cube | |
-| grouping_set | |
-| max | |
-| median | |
-| min | |
-| rollup | |
-| sum | |
+| Function | Notes
|
+| ----------------------------------------------------------------- |
---------------------------------------------------------------------------------------
|
+| avg(expr) |
Сalculates the average value for `expr`.
|
+| approx_distinct(expr) |
Calculates an approximate count of the number of distinct values for `expr`.
|
+| approx_median(expr) |
Calculates an approximation of the median for `expr`.
|
+| approx_percentile_cont(expr, percentile) |
Calculates an approximation of the specified `percentile` for `expr`.
|
+| approx_percentile_cont_with_weight(expr, weight_expr, percentile) |
Calculates an approximation of the specified `percentile` for `expr` and
`weight_expr`. |
+| count(expr) | Returns
the number of rows for `expr`.
|
+| count_distinct | Creates
an expression to represent the count(distinct) aggregate function
|
+| cube(exprs) | Creates
a grouping set for all combination of `exprs`
|
+| grouping_set(exprs) | Create a
grouping set. |
+| max(expr) | Finds
the maximum value of `expr`.
|
+| median(expr) |
Сalculates the median of `expr`.
|
+| min(expr) | Finds
the minimum value of `expr`.
|
+| rollup(exprs) | Creates
a grouping set for rollup sets.
|
+| sum(expr) |
Сalculates the sum of `expr`.
|
## Subquery Expressions
| Function | Notes
|
| --------------- |
---------------------------------------------------------------------------------------------
|
-| exists |
|
+| exists | Creates an `EXISTS` subquery expression
|
| in_subquery | `df1.filter(in_subquery(col("foo"), df2))?` is the
equivalent of the SQL `WHERE foo IN <df2>` |
-| not_exists |
|
-| not_in_subquery |
|
-| scalar_subquery |
|
+| not_exists | Creates a `NOT EXISTS` subquery expression
|
+| not_in_subquery | Creates a `NOT IN` subquery expression
|
+| scalar_subquery | Creates a scalar subquery expression
|
## User-Defined Function Expressions
-| Function | Notes |
-| ----------- | ----- |
-| create_udf | |
-| create_udaf | |
+| Function | Notes
|
+| ----------- |
------------------------------------------------------------------------- |
+| create_udf | Creates a new UDF with a specific signature and specific
return type. |
+| create_udaf | Creates a new UDAF with a specific signature, state type and
return type. |