GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15677

    [SPARK-17963][SQL][Documentation] Add examples (extend) in each expression and improve documentation with arguments

    ## What changes were proposed in this pull request?

    This PR proposes to change the documentation for functions. Please refer to the discussion in https://github.com/apache/spark/pull/15513

    The changes include

    - Re-indent the documentation
    - Add examples/arguments in `extended` where there are multiple arguments or the arguments have a specific format (e.g. xml/json).

    For example, the documentation was updated as below:

    ### Functions with single line usage

    **Before**

    - `pow`

      ```sql
      Usage: pow(x1, x2) - Raise x1 to the power of x2.
      Extended Usage:
      > SELECT pow(2, 3);
       8.0
      ```

    - `current_timestamp`

      ```sql
      Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
      Extended Usage:
      No example for current_timestamp.
      ```

    **After**

    - `pow`

      ```sql
      Usage: pow(expr1, expr2) - Raise expr1 to the power of expr2.
      Extended Usage:
          Arguments:
            expr1 - a numeric expression.
            expr2 - a numeric expression.

          Examples:
            > SELECT pow(2, 3);
             8.0
      ```

    - `current_timestamp`

      ```sql
      Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
      Extended Usage:
          No example/argument for current_timestamp.
      ```

    ### Functions with (already) multiple line usage

    **Before**

    - `approx_count_distinct`

      ```sql
      Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
          approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
            with relativeSD, the maximum estimation error allowed.

      Extended Usage:
      No example for approx_count_distinct.
      ```

    - `percentile_approx`

      ```sql
      Usage:
            percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
            column `col` at the given percentage. The value of percentage must be between 0.0
            and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
            controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
            better accuracy, `1.0/accuracy` is the relative error of the approximation.

            percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
            percentile array of column `col` at the given percentage array. Each value of the
            percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
            a positive integer literal which controls approximation accuracy at the cost of memory.
            Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
            the approximation.

      Extended Usage:
      No example for percentile_approx.
      ```

    **After**

    - `approx_count_distinct`

      ```sql
      Usage:
          approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++.
            relativeSD defines the maximum estimation error allowed.

      Extended Usage:
          Arguments:
            expr - an expression of any type that represents data to count.
            relativeSD - a numeric literal that defines the maximum estimation error allowed.
      ```

    - `percentile_approx`

      ```sql
      Usage:
          percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
            column `col` at the given percentage. The value of `percentage` must be between 0.0
            and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
            controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
            better accuracy, `1.0/accuracy` is the relative error of the approximation.
            When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.

      Extended Usage:
          Arguments:
            col - a numeric expression.
            percentage - a numeric literal or an array literal of numeric type that defines the
              percentile. For example, 0.5 means 50-percentile.
            accuracy - a numeric literal.

          Examples:
            > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
             [10.0,10.0,10.0]
            > SELECT percentile_approx(10.0, 0.5, 100);
             10.0
      ```

    ## How was this patch tested?

    Manually tested

    **When examples are multiple**

    ```sql
    spark-sql> describe function extended reflect;
    Function: reflect
    Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
    Usage: reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
    Extended Usage:
        Arguments:
          class - a string literal that represents a fully-qualified class name.
          method - a string literal that represents a method name.
          arg - a boolean, string or numeric expression except decimal that represents an argument for
            the method.

        Examples:
          > SELECT reflect('java.util.UUID', 'randomUUID');
           c33fb387-8500-4bfa-81d2-6e0e3e930df2
          > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
           a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
    ```

    **When `Usage` is in single line**

    ```sql
    spark-sql> describe function extended min;
    Function: min
    Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
    Usage: min(expr) - Returns the minimum value of `expr`.
    Extended Usage:
        Arguments:
          expr - an expression of any type.
    ```

    **When `Usage` is already in multiple lines**

    ```sql
    spark-sql> describe function extended percentile_approx;
    Function: percentile_approx
    Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
    Usage:
        percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
          column `col` at the given percentage. The value of `percentage` must be between 0.0
          and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
          controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
          better accuracy, `1.0/accuracy` is the relative error of the approximation.
          When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.

    Extended Usage:
        Arguments:
          col - a numeric expression.
          percentage - a numeric literal or an array literal of numeric type that defines the
            percentile. For example, 0.5 means 50-percentile.
          accuracy - a numeric literal.

        Examples:
          > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
           [10.0,10.0,10.0]
          > SELECT percentile_approx(10.0, 0.5, 100);
           10.0
    ```

    **When example/argument is missing**

    ```sql
    spark-sql> describe function extended rank;
    Function: rank
    Class: org.apache.spark.sql.catalyst.expressions.Rank
    Usage:
        rank() - Computes the rank of a value in a group of values. The result is one plus the number
          of rows preceding or equal to the current row in the ordering of the partition. The values
          will produce gaps in the sequence.

    Extended Usage:
        No example/argument for rank.
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17963-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15677

----
commit 5f70b1ddc26c51d0c2aef34b58fe98f8220ffc0a
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-17T12:07:52Z

    Add examples (extend) in each function and improve documentation with arguments

commit 77abafa5741c5c4c706f4e310f5a63c39811471b
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T13:02:49Z

    aggregate OK

commit 8cd6d80e2c5232253444c59c363010c7a2c4aa69
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T13:24:36Z

    xml OK

commit 2614572e0b04130d662f34b4926e6acaf866c704
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T14:14:54Z

    arithmetic OK

commit 29d0262ef0b1c3d828b2301d50f2d3d1f37ac961
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T14:59:06Z

    bitwiseExpressions OK and double-check others

commit 710c68eda59c0fdb299feccd5125f75d38063176
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T15:09:46Z

    CallMethodViaReflection OK

commit 1d44ec38d5c68ea2853d8bda9713f633c984cdd0
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T15:15:01Z

    Cast OK

commit 78b1fc75e9c9d931f5ff130109cf2f0d9ce2547b
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-18T16:20:07Z

    collectionOperations OK

commit e2062afdd9579fbec08ac073a6021736078d7ad2
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T10:15:18Z

    complexTypeCreator OK and double check others

commit e9672db417cbf800a92665942a8c096221724cbe
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T12:59:30Z

    conditionalExpressions OK

commit 9baa847729cd6f7686c1aa476cc1de0548f9ac2f
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T14:09:38Z

    datetimeExpressions OK

commit fff85f6bd645c17d91ce96ac73dfd99d957909d3
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T14:27:42Z

    generators OK

commit 24627750226aef3a0e34886b3516496b4e7bb456
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T14:29:32Z

    InputFileName OK

commit 10892fb676b6e70e4eb5039a8581f29edae59226
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T14:41:16Z

    jsonExpressions OK

commit 45e7f99a9252d66a3a87d17136206e098cff6eea
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-19T16:01:14Z

    mathExpressions OK, double check others and fix scala style

commit 1d69e40a89c1242c283e820ece0e9fdf4b52c7cb
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T14:14:49Z

    misc OK, double-check others and ignore a test in SQLQuerySuite for now

commit 9b24e7879d6d75e913dd71ee3e32292483da5fb5
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T14:18:47Z

    MonotonicallyIncreasingID OK

commit ed1c83936bdf78cbab35c50307c2a8afa6c586a3
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T14:43:17Z

    nullExpressions OK

commit 15351863fecf7765ac97289b40c2b578ae4db7e1
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T14:59:35Z

    predicates OK

commit 8efee7e07ca4bcce654fc947c0fb513b7f361555
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T15:11:47Z

    randomExpressions OK

commit a111d2a63965eda60c6ba6f297a84c9e0a8f85f1
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T15:24:48Z

    regexpExpressions OK

commit a8ddcc2010b06c19fa76a5bfcc64e490ad58f5b3
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T15:26:18Z

    SparkPartitionID OK

commit 99b565879c86b1e8a89da8224a7f4183f0b10b1d
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T16:38:39Z

    tringExpressions OK

commit a29472eeb0fbbdccdd8affd82d7e5706623114c0
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T16:51:30Z

    windowExpressions OK

commit 73ccda0d89e71249adc25d4b04f70501849f1fd9
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T17:17:04Z

    conditionalExpressions OK, double-check others and fix tests

commit d927bff266b978c26db98fd64ff346ca248f7ec4
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-20T17:32:31Z

    double-check

commit 91d2ab5174273623819a6b18f0d2d557c54603f7
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-21T01:11:15Z

    Fix tests in SQLQuerySuite and DDLSuite first

commit 7841860bf50fba8b31da774574ad9818ee678d85
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-21T01:13:35Z

    Take out a space after `Extended Usage:`.

commit 01eecfe44c5edc3db0d25f43a7ae8f80ca07ac61
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-21T01:18:46Z

    Consistent spacing in Examples

commit 1979d920afcac5794131b7605b8a170f837b461e
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-10-21T09:20:30Z

    Remove repeated _FUNC_, consolidate usages and simplify the arguments

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org