[drill-site] branch master updated: Re-add new blog post for Drill 1.21.

dzamo Fri, 03 Mar 2023 03:54:57 -0800

This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git



The following commit(s) were added to refs/heads/master by this push:
     new 114dacb7e Re-add new blog post for Drill 1.21.
114dacb7e is described below

commit 114dacb7e9eff30014380b121dffc44f59e75e1b
Author: James Turton <[email protected]>
AuthorDate: Fri Mar 3 13:54:33 2023 +0200

    Re-add new blog post for Drill 1.21.
---
 .../sql-reference/sql-functions/015-statistical.md |  9 ++-
 .../050-aggregate-and-aggregate-statistical.md     | 55 +++++++++-------
 blog/_posts/en/2023-03-02-drill-1.21.0-released.md | 73 ++++++++++++++++++++++
 3 files changed, 109 insertions(+), 28 deletions(-)

diff --git a/_docs/en/sql-reference/sql-functions/015-statistical.md 
b/_docs/en/sql-reference/sql-functions/015-statistical.md
index 9e9497f76..1cadec553 100644
--- a/_docs/en/sql-reference/sql-functions/015-statistical.md
+++ b/_docs/en/sql-reference/sql-functions/015-statistical.md
@@ -3,18 +3,17 @@ title: "Statistical"
 slug: "Statistical"
 parent: "SQL Functions"
 ---
-
 Drill supports the scalar statistical functions shown in the following table.
 
 ## Table of statistical functions
 
-| Function     | Return Type | Description                                     
                                                                               |
-| ------------ | ----------- | 
------------------------------------------------------------------------------------------------------------------------------
 |
-| width_bucket | INT         | Returns the 1-based bucket index of the value 
after dividing the interval between min and max into the given number of 
buckets |
+| Function                               | Return Type | Description           
                                                                                
                         |
+| -------------------------------------- | ----------- | 
------------------------------------------------------------------------------------------------------------------------------
 |
+| width_bucket(value, min, max, buckets) | INT         | Returns the 1-based 
bucket index of the value after dividing the interval between min and max into 
the given number of buckets |
 
 ## WIDTH_BUCKET
 
-Returns the 1-based bucket index of _value_ after dividing the interval 
between _min_ and_ max_ into the given number of buckets. A _value_ that falls 
outside the given range are given an index of 0 (_value_ < _min_) or _buckets_ 
+ 1 (_value_ > _max_).
+Returns the 1-based bucket index of _value_ after dividing the interval 
between _min_ and _max_ into the given number of buckets. A _value_ that falls 
outside the given range are given an index of 0 (_value_ \< _min_) or _buckets_ 
+ 1 (_value_ > _max_).
 
 ### WIDTH_BUCKET Syntax
 
diff --git 
a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
 
b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
index cb58d1ecd..7f6704d46 100644
--- 
a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
+++ 
b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
@@ -9,25 +9,24 @@ parent: "SQL Functions"
 The following table lists the aggregate functions that you can use in Drill
 queries.
 
-
-| **Function**                                                 | **Argument 
Type**                                                                          
                                             | **Return Type**                  
                                                                                
                  |
-| ------------------------------------------------------------ | 
---------------------------------------------------------------------------------------------------------------------------------------
 | 
----------------------------------------------------------------------------------------------------------------------------------
 |
-| ANY_VALUE(expression)                                        | BIT, INT, 
BIGINT, FLOAT4, FLOAT8, DATE, TIMESTAMP, TIME, VARCHAR, VARBINARY, LIST, MAP, 
INTERVAL, INTERVALDAY, INTERVALYEAR, VARDECIMAL | Same as argument type         
                                                                                
                     |
-| AVG(expression)                                              | SMALLINT,   
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                            | DECIMAL for DECIMAL argument,   
DOUBLE for all other arguments                                                  
                   |
-| BOOL_AND(expression), BOOL_OR(expression)                    | BIT           
                                                                                
                                          | BIT                                 
                                                                                
               |
-| BIT_AND(expression), BIT_OR(expression), BIT_XOR(expression) | INT, BIGINT   
                                                                                
                                          | Same as argument type               
                                                                                
               |
-| CORR(x,y)                                                    | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| COVAR_POP(x,y)                                               | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| COVAR_SAMP(x,y)                                              | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| COUNT(\*)                                                    | -             
                                                                                
                                          | BIGINT                              
                                                                                
               |
-| COUNT(\[DISTINCT\] expression)                               | any           
                                                                                
                                          | BIGINT                              
                                                                                
               |
-| KENDALL_CORRELATION                                          | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| MAX(expression), MIN(expression)                             | BINARY, 
DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP                                      
                                                | Same   as argument type       
                                                                                
                     |
-| REGR_INTERCEPT                                               | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| REGR_SLOPE                                                   | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| STDDEV, STDDEV_POP, STDDEV_SAMP                              | Numeric       
                                                                                
                                          | Double                              
                                                                                
               |
-| SUM(expression)                                              | SMALLINT, 
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                              | DECIMAL for DECIMAL   argument, 
    BIGINT for any integer-type argument (including BIGINT), DOUBLE for   
floating-point arguments |
-| VARIANCE, VAR_POP, VAR_SAMP                                  | Numeric       
                                                                                
                                          | Numeric                             
                                                                                
               |
+| **Function**                      | **Argument Type**                        
                                                                                
               | **Return Type**                                                
                                                                    |
+| --------------------------------- | 
---------------------------------------------------------------------------------------------------------------------------------------
 | 
----------------------------------------------------------------------------------------------------------------------------------
 |
+| ANY_VALUE(x)                      | BIT, INT, BIGINT, FLOAT4, FLOAT8, DATE, 
TIMESTAMP, TIME, VARCHAR, VARBINARY, LIST, MAP, INTERVAL, INTERVALDAY, 
INTERVALYEAR, VARDECIMAL | Same as argument type                                
                                                                              |
+| AVG(x)                            | SMALLINT,   INTEGER, BIGINT, FLOAT, 
DOUBLE, DECIMAL, INTERVAL                                                       
                    | DECIMAL for DECIMAL argument,   DOUBLE for all other 
arguments                                                                     |
+| BOOL_AND(x), BOOL_OR(x)           | BIT                                      
                                                                                
               | BIT                                                            
                                                                    |
+| BIT_AND(x), BIT_OR(x), BIT_XOR(x) | INT, BIGINT                              
                                                                                
               | Same as argument type                                          
                                                                    |
+| CORR(x,y)                         | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| COVAR_POP(x,y)                    | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| COVAR_SAMP(x,y)                   | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| COUNT(\*)                         | -                                        
                                                                                
               | BIGINT                                                         
                                                                    |
+| COUNT(\[DISTINCT\] expression)    | any                                      
                                                                                
               | BIGINT                                                         
                                                                    |
+| KENDALL_CORRELATION(x, y)         | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| MAX(expression), MIN(expression)  | BINARY, DECIMAL, VARCHAR, DATE, TIME, or 
TIMESTAMP                                                                       
               | Same   as argument type                                        
                                                                    |
+| REGR_INTERCEPT(x, y)              | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| REGR_SLOPE(x, y)                  | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| STDDEV, STDDEV_POP, STDDEV_SAMP   | Numeric                                  
                                                                                
               | DOUBLE                                                         
                                                                    |
+| SUM(expression)                   | SMALLINT, INTEGER, BIGINT, FLOAT, 
DOUBLE, DECIMAL, INTERVAL                                                       
                      | DECIMAL for DECIMAL   argument,     BIGINT for any 
integer-type argument (including BIGINT), DOUBLE for   floating-point arguments 
|
+| VARIANCE, VAR_POP, VAR_SAMP       | Numeric                                  
                                                                                
               | Numeric                                                        
                                                                    |
 
 - Drill 1.14 and later supports the ANY_VALUE function.
 - Starting in Drill 1.14, the DECIMAL data type is enabled by default.
@@ -107,12 +106,14 @@ SELECT ANY_VALUE(employee_id) as anyemp FROM 
cp.`employee.json` GROUP BY salary
 **Introduced in release: 1.21**
 
 Starting in Drill 1.21 it is possible to follow an aggregate function 
invocation with a boolean expression that will filter the values procesed by 
the aggregate using the following syntax.
+
 ```
 agg_func( column ) FILTER(WHERE boolean_expression)
 ```
 
 For example
-``` sql
+
+```sql
 SELECT
   count(n_name) FILTER(WHERE n_regionkey = 1) AS nations_count_in_1_region,
   count(n_name) FILTER(WHERE n_regionkey = 2) AS nations_count_in_2_region,
@@ -121,7 +122,9 @@ SELECT
   count(n_name) FILTER(WHERE n_regionkey = 0) AS nations_count_in_0_region
 FROM cp.`tpch/nation.parquet`
 ```
+
 will return
+
 ```
 
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
 | nations_count_in_1_region | nations_count_in_2_region | 
nations_count_in_3_region | nations_count_in_4_region | 
nations_count_in_0_region |
@@ -130,7 +133,7 @@ will return
 
+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+
 ```
 
-**N.B.** Some versions of Drill prior to 1.21 do not fail if FILTER 
expressions are included with aggregate function calls, but silently do no 
filtering yielding incorrect results. Filtered aggregates are only supported 
from version 1.21 onward.
+{% include startnote.html %}Some versions of Drill prior to 1.21 do not fail 
if FILTER expressions are included with aggregate function calls, but silently 
do no filtering yielding incorrect results. Filtered aggregates are only 
supported from version 1.21 onward.{% include endnote.html %}
 
 ## AVG
 
@@ -277,7 +280,8 @@ SELECT BIT_XOR(position_id) FROM cp.`employee.json`;
 ```
 
 ## CORR
-Returns the [Pearson Correlation 
Coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) for 
a given x, y. 
+
+Returns the [Pearson Correlation 
Coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) for 
a given x, y.
 
 ```
 SELECT CORR (department_id, salary) as correlation FROM cp.`employee.json`;
@@ -289,6 +293,7 @@ SELECT CORR (department_id, salary) as correlation FROM 
cp.`employee.json`;
 ```
 
 ## COVAR_POP
+
 Returns the population covariance for a data set.
 
 ```
@@ -301,6 +306,7 @@ SELECT covar_pop (department_id, salary) AS covariance FROM 
cp.`employee.json`;
 ```
 
 ## COVAR_SAMP
+
 Returns the sample covariance for a data set.
 
 ```
@@ -330,7 +336,7 @@ SELECT COUNT(*) FROM . . .
 - ALL expression\
   Returns the number of values of the specified expression.
 - \* (asterisk)\
--  Returns the number of records in the table.
+  Returns the number of records in the table.
 
 ### COUNT Examples
 
@@ -386,6 +392,7 @@ with seq as (
 )
 select kendall_correlation(x+random(), x+random()+5) from seq;
 ```
+
 ```
 EXPR$0  0.2
 
@@ -568,6 +575,7 @@ with seq as (
 )
 select regr_intercept(x+random(), x+random()+5) from seq;
 ```
+
 ```
 EXPR$0  4.9715020855109255
 
@@ -592,6 +600,7 @@ with seq as (
 )
 select regr_slope(x+random(), x+random()+5) from seq;
 ```
+
 ```
 EXPR$0  0.9719696129009783
 
diff --git a/blog/_posts/en/2023-03-02-drill-1.21.0-released.md 
b/blog/_posts/en/2023-03-02-drill-1.21.0-released.md
new file mode 100644
index 000000000..f610b3342
--- /dev/null
+++ b/blog/_posts/en/2023-03-02-drill-1.21.0-released.md
@@ -0,0 +1,73 @@
+---
+layout: post
+title: "Announcing Drill 1.21!"
+code: drill-1.21-announcement
+excerpt: "Announcing Drill 1.21: New Connectors, Functions and Much Better 
Stability."
+
+authors: ["cgivre"]
+---
+
+
+# Announcing Drill 1.21: New Connectors, Functions and Much Better Stability
+The Apache Drill PMC is pleased to announce a milestone release of Apache 
Drill. Since the last release of Drill the team has been hard at work quashing 
bugs and making overall functionality improvements. The TL;DR includes the 
following:
+
+* New connectors including Apache Iceberg, Delta Lake, Microsoft Access, 
GoogleSheets, and Box
+* Efficient cross-cloud query capability
+* Greatly improved access controls to include user translation support for all 
storage plugins
+* Greatly improved query planning and implicit casting.
+* New BI-focused SQL operators including `PIVOT`, `UNPIVOT`, `EXCEPT` and 
`INTERSECT`
+* New functions for computing regression lines and trends.
+* New and updated date manipulation functions.
+
+Overall, Drill 1.21 is much more capable and stable than previous versions. 
+
+## Calcite, We’re Back!
+Drill relies on another open source project, Apache Calcite for its query 
planning. The query planning process is a huge part of the overall 
functionality of Drill. Unfortunately, about three years ago, there were some 
issues in Calcite which forced Drill to fork it and rely on that fork. As a 
result, Drill was essentially stuck with a three year old query planner, but 
more importantly, bugs that were fixed in Calcite, as well as new capabilities 
were not finding their way into Drill. 
+
+That is no longer the case. Drill 1.21 is now running on the latest stable 
version of Calcite, version 1.33. As a result, we’ve been able to close 
countless JIRA tickets of various queries failing and other random bugs that 
were the result of query planning bugs.
+
+What this means for you as a user is that you’ll see much fewer queries 
failing and better overall performance in terms of speed and stability. You’ll 
see better optimizations being pushed down to JDBC data sources as well as 
support for BigQuery, Athena and other JDBC data sources. We hope to keep Drill 
away from Calcite forks so I hope that we will work with the Calcite community 
to keep our tools in sync.
+
+## Improved Implicit Casting Rules Reduce Schema Change Failures
+From this author’s perspective, one of the biggest improvements in Drill is 
one of the least noticeable and that is the result of improved implicit 
casting. One of Drill’s unique features is its ability to infer the structure, 
or schema of your data. However, this can be problematic when the schema 
changes. When I used to teach Drill, I used to have spend a considerable amount 
of time teaching students how to cast data from one data type to another to 
ensure that the queries would succeed.
+
+When using latest version of Drill, you’ll find that queries will work without 
the need for much if any casting. In short, they’ll do what you expect them to 
do. It’s really a high on magic functionality. 
+
+## Integrations with the Modern and Not-so-Modern Data Stack
+The new version of Drill features several new connectors and readers that will 
enable users to connect to the “modern data stack”, specifically support for 
Apache Iceberg and Delta Lake. 
+
+### Breaking the Iceberg
+Iceberg is a high-performance format for huge analytic tables. Iceberg brings 
the reliability and simplicity of SQL tables to big data, while making it 
possible for engines like Drill to safely work with the same tables, at the 
same time. In addition to being able to query data directly from Iceberg 
tables, Drill also allows users to query the Iceberg table metadata as well as 
snapshots.  [Complete documentation is available 
here](https://drill.apache.org/docs/iceberg-format-plugin/).
+
+### Querying Delta Lake
+Lest we offend someone, we’re not going to get into the debate between Iceberg 
and Delta lake (after all, let’s not argue about who killed whom), but Delta 
Lake, if you aren’t familiar with it, is another modern table format which 
allows ACID transactions, versioning etc. In version 1.21, Drill adds support 
for Delta Lake tables, so users can query Delta Lake tables as well as 
associated metadata. You can also query specific versions of files in delta 
lake.  [Complete documentation is av [...]
+
+### Accessing Access
+A surprising number of people use Microsoft Access as a database for their 
business data. With version 1.21, Apache Drill can now natively query Microsoft 
Access database files using Drill. This can be a major benefit for those 
looking to migrate data from Access into more modern formats such as parquet or 
even other relational databases. Drill will support Access files from version 
1997 and up. 
+
+### Oh Sheets!
+In addition to all of the above, Drill can now query data directly from 
GoogleSheets. In addition to being able to query this data source, Drill can 
read, write, delete and append to GoogleSheets. Google doesn’t make it easy, so 
if this is a feature you are interested in, you’ll definitely want to [read the 
documentation 
here](https://drill.apache.org/docs/google-sheets-storage-plugin/).
+
+### Remote Data
+As you can see, Drill has significantly expanded the number of data sources 
and types that it can query. A part of this work has also been to improve the 
implementation behind filesystems. As a result, Drill can now query data stored 
on Dropbox, and Box. We added support for filesystems which use OAuth 2.0 for 
authorization so this means that more extended file systems are likely coming 
your way for the next release.
+
+## Greatly Improved Access Controls
+Managing access controls and credentials on a federated query engine is a 
complicated task. Drill has supported a concept called user impersonation which 
basically means that Drill can execute queries using the credentials of the 
logged in user. This concept works well for querying file systems such as 
Hadoop, and other data sources that have the same concepts, however it does not 
work at all with data sources that have different concepts of users, or in the 
case of OAuth enabled plugins [...]
+
+To answer this challenge, Drill 1.21 introduces the concept of user 
translation. The idea of user translation is that, when enabled, every user 
will have their own unique credentials for specific data sources. Thus, when 
that user queries a specific data source, that user’s credentials are used to 
execute the query. This is configurable on an individual data source basis. 
Ultimately, what this means is that you no longer have to create service 
accounts to access data via Drill. 
+
+## Drilling Across the Clouds
+While we’re on the subject of clouds, as you may be aware, Drill can query 
data stored in cloud-based file systems such as S3, Azure, GCP etc. One of the 
challenges however, is that if you have data stored in multiple clouds, it can 
become very inefficient to query this data, especially from the perspective of 
network IO. As of Drill 1.21, Drill adds a storage plugin which we are calling 
Drill on Drill.
+
+Let’s say that you had a Drill cluster in S3, but you had data in both S3 and 
Azure. With the new Drill on Drill capability, you could install an additional 
Drill cluster in Azure, then query both from either Drill cluster. The 
advantage is that the queries would be pushed down to the Drill cluster where 
the data resides. So if you query Azure from S3, you aren’t sending tons of 
data back and forth. 
+
+## Drill Now Supports More BI Operators
+While Drill held more or less to the SQL standard, it was missing some BI 
operators that had become commonplace among SQL platforms. Drill 1.21 
introduces the `PIVOT`, and `UNPIVOT` operators which covert rows to columns or 
vice versa, much in the same way a pivot table works in Excel. Additionally, we 
added set operators `INTERSECT` and `EXCEPT` which have become part of the SQL 
standard.
+
+## New Statistical Functions
+Drill 1.21 adds new SQL functions for statistical summaries including 
`kendall_correlation` for calculating correlation coefficients, `width_bucket` 
which is a SQL function for computing histograms and distributions, and two 
other functions for computing regression lines. 
+
+Lastly, we’ve also added additional date/time manipulation functions which 
will make working with dates significantly easier. 
+
+## What’s Next?
+The big question is where do we go from here? We’ve already started working on 
adding support for additional BI operators such as `CUBE`, `GROUPING SETS` and 
`ROLLUP`, as well as `REGEXP_EXTRACT`. Since the new version of Calcite has 
support for numerous optimizations around materialized views this is also 
something which is being discussed. If you like what you are seeing, please 
download Drill and try it out. Feedback is always welcome on the Drill slack 
channel or on our mailing lists [...]

[drill-site] branch master updated: Re-add new blog post for Drill 1.21.

Reply via email to