This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git

commit e3968908c335705a8ea876d3876223fdcd3a34ce
Author: James Turton <[email protected]>
AuthorDate: Fri Mar 3 13:15:54 2023 +0200

    Document new statistical functions.
---
 .../sql-reference/sql-commands/084-limit-clause.md |   9 ++
 .../sql-commands/087-set-operators.md              |  23 +++--
 .../050-aggregate-and-aggregate-statistical.md     | 111 +++++++++++++++++----
 3 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/_docs/en/sql-reference/sql-commands/084-limit-clause.md 
b/_docs/en/sql-reference/sql-commands/084-limit-clause.md
index 160128628..e2e5be611 100644
--- a/_docs/en/sql-reference/sql-commands/084-limit-clause.md
+++ b/_docs/en/sql-reference/sql-commands/084-limit-clause.md
@@ -11,6 +11,7 @@ The LIMIT clause limits the result set to the specified 
number of rows. You can
 You can also configure an automatic limit on the number of rows returned from 
a result set. See [Setting an Auto Limit on the Number of Rows Returned for 
Result 
Sets]({{site.baseurl}}/docs/planning-and-execution-options/#setting-an-auto-limit-on-the-number-of-rows-returned-for-result-sets).
 
 ## Syntax
+
 The LIMIT clause supports the following syntaxes:
 ```
 LIMIT { count | ALL }
@@ -19,6 +20,7 @@ FETCH NEXT count { ROW | ROWS } ONLY
 ```
 
 ## Parameters
+
 *count*
 Specifies the maximum number of rows to return.
 If the count expression evaluates to NULL, Drill treats it as LIMIT ALL.
@@ -42,6 +44,7 @@ LIMIT 0 quickly returns an empty set. Use LIMIT 0 to test the 
validity of the SQ
 Drill optimizes LIMIT 0 queries through the following options, which are 
enabled by default:
 
 **planner.enable\_limit0_optimization**
+
 Enables the query planner to determine data types returned by a query during 
the planning phase (before scanning data). Since Drill is a schema-free engine, 
the query planner does not know the column types of the result set before 
reading records. However, the query planner can infer the column types of the 
result set during query validation if you provide enough type information to 
Drill about the input column types. If the planner can infer all column types, 
a short execution path is p [...]
 
 To provide column type information, you can:
@@ -52,12 +55,14 @@ To provide column type information, you can:
 - Issue queries on views with casts on table columns.
 
 **planner.enable\_limit0\_on_scan**
+
 Supported in Drill 1.14 and later. Enables Drill to determine data types as 
Drill scans data. This optimization is used when the query planner cannot infer 
types of columns during validation (prior to scanning). Drill exits and 
terminates the query immediately after resolving the types, and the query 
execution is not parallelized. When this optimization is applied, the query 
plan contains a LIMIT (0) above every SCAN, with an optional PROJECT in between.
 
 ### LIMIT 0 Limitations
 The following sections list the types, operators, and functions that LIMIT 0 
optimizations do not support.
 
 **Unsupported Types**
+
 LIMIT 0 optimizations do not apply to the following types:
 
 REAL, BINARY, VARBINARY, NULL, ANY, SYMBOL, MULTISET, ARRAY, MAP, DISTINCT, 
STRUCTURED, ROW, OTHER, CURSOR, COLUMN_LIST
@@ -65,6 +70,7 @@ REAL, BINARY, VARBINARY, NULL, ANY, SYMBOL, MULTISET, ARRAY, 
MAP, DISTINCT, STRU
 **NOTE:** The query planner in Drill has other types that are not optimized 
because they are undefined in Drill’s type system.
 
 **Unsupported Hive UDF Types**
+
 Hive has a list of functions with return type NVARCHAR, which the query 
planner in Drill does not support; therefore, type inference does not work for 
the following functions:
 
 
@@ -76,6 +82,7 @@ Hive has a list of functions with return type NVARCHAR, which 
the query planner
 - XPATH_STRING
 
 **Unsupported Operators and Functions**
+
 LIMIT 0 optimizations do not work for queries with the UNION [ALL] set 
operator or the following complex functions:
 
 - KVGEN or MAPPIFY
@@ -87,12 +94,14 @@ LIMIT 0 optimizations do not work for queries with the 
UNION [ALL] set operator
 - AVG (window function)
 
 **Recursive file listing in Drill 1.21+**
+
 Since version 1.21, Drill will exit early from recursive file listing during 
the planning of a query against filesystem storage if has detected a LIMIT 0 in 
the outermost SELECT. This optimization is aimed at queries of the following 
form.
 ```
 SELECT * FROM dfs.`huge_directory` LIMIT 0;
 ```
 
 ## Examples
+
 The following example query includes the ORDER BY and LIMIT clauses and 
returns the top 20 sales totals by month and state:
 
        0: jdbc:drill:> SELECT `month`, state, SUM(order_total)
diff --git a/_docs/en/sql-reference/sql-commands/087-set-operators.md 
b/_docs/en/sql-reference/sql-commands/087-set-operators.md
index 3dd251dca..8d2f30e83 100644
--- a/_docs/en/sql-reference/sql-commands/087-set-operators.md
+++ b/_docs/en/sql-reference/sql-commands/087-set-operators.md
@@ -6,23 +6,23 @@ parent: "SQL Commands"
 
 The UNION, INTERSECT and EXCEPT set operators combine the result sets of two 
separate query expressions. The result set of each query must have the same 
number of columns and compatible data types.
 
-# UNION
+## UNION
 
 The UNION operator computes the set union of its two arguments, automatically 
removing duplicate records from the result set. UNION ALL returns all duplicate 
records.
 
-## Syntax
+### Syntax
 The UNION operator supports the following syntax:
 
        query
        { UNION [ ALL ] }
        another_query
 
-## Parameters
+### Parameters
 *query*, *another_query*
 
 Any SELECT query that Drill supports. See 
[SELECT]({{site.baseurl}}/docs/select/).
 
-## Examples
+### Examples
 The following example uses the UNION ALL set operator to combine click 
activity data before and after a marketing campaign. The data in the example 
exists in the `dfs.clicks workspace`.
 
        0: jdbc:drill:> SELECT t.trans_id transaction, t.user_info.cust_id 
customer
@@ -47,12 +47,13 @@ If a query on either side of the UNION operator queries an 
empty directory, as s
 
 Drill treats the empty directory as a schemaless table and returns results as 
if the UNION operator is not included in the query.
 
-# INTERSECT and EXCEPT
+## INTERSECT and EXCEPT
 
 **Introduced in release: 1.21**
+
 The INTERSECT and EXCEPT operators respectively compute the set intersection 
and the set difference of their two arguments. As with the UNION operator, 
duplicate records are automatically removed from the result set.
 
-## Syntax
+### Syntax
 
 These set operators support the following syntax:
 
@@ -62,14 +63,14 @@ query
 another_query
 ```
 
-## Parameters
+### Parameters
 *query*, *another_query*
 
 Any SELECT query that Drill supports. See 
[SELECT]({{site.baseurl}}/docs/select/).
 
-## Examples
+### Examples
 
-### Compute { 1, 2, 4 } ∩ { 4, 5, 6 }.
+#### Compute { 1, 2, 4 } ∩ { 4, 5, 6 }.
 ```
 apache drill> WITH
        X AS (SELECT employee_id AS col FROM cp.`employee.json` LIMIT 3 OFFSET 
0),
@@ -83,7 +84,7 @@ col  4
 1 row selected (0.449 seconds)
 ```
 
-### Compute { 1, 2, 4 } ∖ { 4, 5, 6 }
+#### Compute { 1, 2, 4 } ∖ { 4, 5, 6 }
 
 ```
 apache drill> WITH
@@ -100,7 +101,7 @@ col  2
 2 rows selected (0.41 seconds)
 ```
 
-# Set operator usage notes
+## Set operator usage notes
    * The two SELECT query expressions that represent the direct operands of 
the set operator must produce the same number of columns. Corresponding columns 
must contain compatible data types. See [Supported Data 
Types]({{site.baseurl}}/docs/supported-data-types/).
    * Multiple set operators in the same SELECT statement are evaluated left to 
right, unless otherwise indicated by parentheses.
    * You can only use * on either side of a set operator when the data source 
has a defined schema, such as data in Hive or views.
diff --git 
a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
 
b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
index 49bcde22b..cb58d1ecd 100644
--- 
a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
+++ 
b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
@@ -10,21 +10,24 @@ The following table lists the aggregate functions that you 
can use in Drill
 queries.
 
 
-| **Function**                                                 | **Argument 
Type**                                                                          
                                               | **Return Type**                
                                                                                
                      |
-| ------------------------------------------------------------ | 
-----------------------------------------------------------------------------------------------------------------------------------------
 | 
------------------------------------------------------------------------------------------------------------------------------------
 |
-| ANY_VALUE(expression)                                        | BIT, INT, 
BIGINT, FLOAT4, FLOAT8, DATE, TIMESTAMP, TIME, VARCHAR, VARBINARY, LIST, MAP, 
INTERVAL, INTERVALDAY, INTERVALYEAR, VARDECIMAL   | Same as argument type       
                                                                                
                         |
-| AVG(expression)                                              | SMALLINT,   
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                              | DECIMAL for DECIMAL argument,   
DOUBLE for all other arguments                                                  
                     |
-| BOOL_AND(expression), BOOL_OR(expression)                    | BIT           
                                                                                
                                            | BIT                               
                                                                                
                   |
-| BIT_AND(expression), BIT_OR(expression), BIT_XOR(expression) | INT, BIGINT   
                                                                                
                                            | Same as argument type             
                                                                                
                   |
-| CORR(x,y)                                                    | Numeric       
                                                        | Double                
                                                                                
                               |
-| COVAR_POP(x,y)                                               | Numeric       
                                                        | Double
-| COVAR_SAMP(x,y)                                              | Numeric       
                                                        | Double
-| COUNT(\*)                                                    | -             
                                                                                
                                            | BIGINT                            
                                                                                
                   |
-| COUNT(\[DISTINCT\] expression)                               | any           
                                                                                
                                            | BIGINT                            
                                                                                
                   |
-| MAX(expression), MIN(expression)                             | BINARY, 
DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP                                      
                                                  | Same   as argument type     
                                                                                
                         |
-| STDDEV, STDDEV_POP, STDDEV_SAMP                              | Numeric       
                                                        | Double
-| SUM(expression)                                              | SMALLINT, 
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                                | DECIMAL for DECIMAL   
argument,     BIGINT for any integer-type argument (including BIGINT), DOUBLE 
for   floating-point arguments   |
-| VARIANCE, VAR_POP, VAR_SAMP                                  | Numeric       
                                                        | Numeric 
+| **Function**                                                 | **Argument 
Type**                                                                          
                                             | **Return Type**                  
                                                                                
                  |
+| ------------------------------------------------------------ | 
---------------------------------------------------------------------------------------------------------------------------------------
 | 
----------------------------------------------------------------------------------------------------------------------------------
 |
+| ANY_VALUE(expression)                                        | BIT, INT, 
BIGINT, FLOAT4, FLOAT8, DATE, TIMESTAMP, TIME, VARCHAR, VARBINARY, LIST, MAP, 
INTERVAL, INTERVALDAY, INTERVALYEAR, VARDECIMAL | Same as argument type         
                                                                                
                     |
+| AVG(expression)                                              | SMALLINT,   
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                            | DECIMAL for DECIMAL argument,   
DOUBLE for all other arguments                                                  
                   |
+| BOOL_AND(expression), BOOL_OR(expression)                    | BIT           
                                                                                
                                          | BIT                                 
                                                                                
               |
+| BIT_AND(expression), BIT_OR(expression), BIT_XOR(expression) | INT, BIGINT   
                                                                                
                                          | Same as argument type               
                                                                                
               |
+| CORR(x,y)                                                    | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| COVAR_POP(x,y)                                               | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| COVAR_SAMP(x,y)                                              | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| COUNT(\*)                                                    | -             
                                                                                
                                          | BIGINT                              
                                                                                
               |
+| COUNT(\[DISTINCT\] expression)                               | any           
                                                                                
                                          | BIGINT                              
                                                                                
               |
+| KENDALL_CORRELATION                                          | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| MAX(expression), MIN(expression)                             | BINARY, 
DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP                                      
                                                | Same   as argument type       
                                                                                
                     |
+| REGR_INTERCEPT                                               | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| REGR_SLOPE                                                   | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| STDDEV, STDDEV_POP, STDDEV_SAMP                              | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
+| SUM(expression)                                              | SMALLINT, 
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                              | DECIMAL for DECIMAL   argument, 
    BIGINT for any integer-type argument (including BIGINT), DOUBLE for   
floating-point arguments |
+| VARIANCE, VAR_POP, VAR_SAMP                                  | Numeric       
                                                                                
                                          | Numeric                             
                                                                                
               |
 
 - Drill 1.14 and later supports the ANY_VALUE function.
 - Starting in Drill 1.14, the DECIMAL data type is enabled by default.
@@ -40,8 +43,9 @@ queries.
 
 ## ANY_VALUE
 
-Supported in Drill 1.14 and later. Returns one of the values of value across 
all
-input values. This function is NOT specified in the SQL standard.
+**Introdcued in release: 1.14**
+
+Returns one of the values of value across all input values. This function is 
NOT specified in the SQL standard.
 
 ### ANY_VALUE Syntax
 
@@ -99,6 +103,7 @@ SELECT ANY_VALUE(employee_id) as anyemp FROM 
cp.`employee.json` GROUP BY salary
 ```
 
 ## Filtered Aggregates
+
 **Introduced in release: 1.21**
 
 Starting in Drill 1.21 it is possible to follow an aggregate function 
invocation with a boolean expression that will filter the values procesed by 
the aggregate using the following syntax.
@@ -363,6 +368,30 @@ SELECT COUNT(*) FROM cp.`employee.json`;
 1 row selected (0.174 seconds)
 ```
 
+## KENDALL_CORRELATION
+
+Returns the Kendall correlation coefficient.
+
+### KENDALL_CORRELATION Syntax
+
+```
+KENDALL_CORRELATION( expression1, expression2 )
+```
+
+### KENDALL_CORRELATION Examples
+
+```sql
+with seq as (
+  select row_number() over (order by 1) x from cp.`employee.json` limit 10
+)
+select kendall_correlation(x+random(), x+random()+5) from seq;
+```
+```
+EXPR$0  0.2
+
+1 row selected (0.213 seconds)
+```
+
 ## MIN and MAX
 
 These functions return the smallest and largest values of the selected
@@ -521,6 +550,54 @@ computation of table statistics accordingly. {% include 
endnote.html %}
 - exec.statistics.ndv_extrapolation_bf_elements
 - exec.statistics.ndv_extrapolation_bf_fpprobability
 
+## REGR_INTERCEPT
+
+Returns the intercept of the least squares linear regression fit.
+
+### REGR_INTERCEPT Syntax
+
+```
+REGR_INTERCEPT( expression1, expression2 )
+```
+
+### REGR_INTERCEPT Examples
+
+```sql
+with seq as (
+  select row_number() over (order by 1) x from cp.`employee.json` limit 10
+)
+select regr_intercept(x+random(), x+random()+5) from seq;
+```
+```
+EXPR$0  4.9715020855109255
+
+1 row selected (0.221 seconds)
+```
+
+## REGR_SLOPE
+
+Returns the slope of the least squares linear regression fit.
+
+### REGR_INTERCEPT Syntax
+
+```
+REGR_INTERCEPT( expression1, expression2 )
+```
+
+### REGR_INTERCEPT Examples
+
+```sql
+with seq as (
+  select row_number() over (order by 1) x from cp.`employee.json` limit 10
+)
+select regr_slope(x+random(), x+random()+5) from seq;
+```
+```
+EXPR$0  0.9719696129009783
+
+1 row selected (0.283 seconds)
+```
+
 ## STDDEV
 
 Returns the sample standard deviation.

Reply via email to