This is an automated email from the ASF dual-hosted git repository.

brile pushed a commit to branch 30.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/30.0.0 by this push:
     new c5235165693 [docs] Backport migration guide for ANSI sql nulls (#16759)
c5235165693 is described below

commit c5235165693d61174842bb2e2e9d8200e769b0c7
Author: Charles Smith <techdocsm...@gmail.com>
AuthorDate: Fri Jul 19 10:39:58 2024 -0700

    [docs] Backport migration guide for ANSI sql nulls (#16759)
    
    Co-authored-by: 317brian <53799971+317br...@users.noreply.github.com>
---
 docs/release-info/migr-ansi-sql-null.md | 386 ++++++++++++++++++++++++++++++++
 1 file changed, 386 insertions(+)

diff --git a/docs/release-info/migr-ansi-sql-null.md 
b/docs/release-info/migr-ansi-sql-null.md
new file mode 100644
index 00000000000..322958f4e8e
--- /dev/null
+++ b/docs/release-info/migr-ansi-sql-null.md
@@ -0,0 +1,386 @@
+---
+id: migr-ansi-sql-null
+title: "Migration guide: SQL compliant mode"
+sidebar_label: SQL compliant mode
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+-->
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+In Apache Druid 28.0.0, the default [null 
handling](../querying/sql-data-types.md#null-values) mode changed to be 
compliant with the ANSI SQL standard.
+This guide provides strategies for Druid operators who rely on legacy Druid 
null handling behavior in their applications to transition to SQL compliant 
mode.
+Legacy mode is planned for removal from Druid. 
+
+## SQL compliant null handling
+
+As of Druid 28.0.0, Druid writes segments in an ANSI SQL compatible null 
handling mode by default.
+This means that Druid stores null values distinctly from empty strings for 
string dimensions and distinctly from 0 for numeric dimensions.
+
+This can impact your application behavior because the ANSI SQL standard 
defines any comparison to null to be unknown.
+According to this three-valued logic, `x <> 'some value'` only returns 
non-null values.
+
+The default Druid configurations for 28.0.0 and later that enable ANSI SQL 
compatible null handling mode are the following:
+
+* `druid.generic.useDefaultValueForNull=false`
+* `druid.expressions.useStrictBooleans=true`
+* `druid.generic.useThreeValueLogicForNativeFilters=true` 
+
+Follow the [Null handling tutorial](../tutorials/tutorial-sql-null.md) to 
learn how the default null handling works in Druid.
+
+## Legacy null handling and two-value logic
+
+Prior to Druid 28.0.0, Druid defaulted to a legacy mode which stored default 
values instead of nulls.
+In legacy mode, Druid created segments with the following characteristics at 
ingestion time:
+
+- String columns couldn't distinguish an empty string, '', from null.
+    Therefore, Druid treated them both as interchangeable values.
+- Numeric columns couldn't represent null valued rows.
+    Therefore Druid stored 0 instead of null. 
+
+The Druid configurations for the deprecated legacy mode were the following:
+
+* `druid.generic.useDefaultValueForNull=true`
+* `druid.expressions.useStrictBooleans=false`
+* `druid.generic.useThreeValueLogicForNativeFilters=true`
+
+These configurations are deprecated and scheduled for removal.
+After the configurations are removed, Druid will ignore them if they exist in 
your configuration files and use the default SQL compliant mode.
+
+## Migrate to SQL compliant mode
+
+If your business logic relies on the behavior of legacy mode, you have the 
following options to operate Druid in an ANSI SQL compatible null handling mode:
+
+- Modify incoming data to either [avoid 
nulls](#replace-null-values-at-ingestion-time) or [avoid empty 
strings](#coerce-empty-strings-to-null-at-ingestion-time) to achieve the same 
query behavior as legacy mode. This means modifying your ingestion SQL queries 
and ingestion specs to handle nulls or empty strings.
+    For example, replacing a null for a string column with an empty string or 
a 0 for a numeric column.
+    However, it means that your existing queries should operate as if Druid 
were in legacy mode.
+    If you do not care about preserving null values, this is a good option for 
you.
+
+- Preserve null values and [update all of your SQL queries to be ANSI SQL 
compliant](#rewrite-your-queries-to-be-sql-compliant).
+    This means you can preserve the incoming data with nulls intact.
+    However, you must rewrite any affected client-side queries to be ANSI SQL 
compliant.
+    If you have a requirement to preserve null values, choose this option.
+
+### Replace null values at ingestion time
+
+If you don't need to preserve null values within Druid, you can use a 
transform at ingestion time to replace nulls with other values.
+
+Consider the following input data:
+
+```json
+{"time":"2024-01-01T00:00:00.000Z","string_example":"my_string","number_example":99}
+{"time":"2024-01-02T00:00:00.000Z","string_example":"","number_example":0}
+{"time":"2024-01-03T00:00:00.000Z","string_example":null,"number_example":null}
+```
+ 
+The following example illustrates how to use COALESCE and NVL at ingestion 
time to avoid null values in Druid:
+
+<Tabs>
+
+<TabItem value="0" label="SQL-based batch">
+
+```sql
+REPLACE INTO "no_nulls_example" OVERWRITE ALL
+WITH "ext" AS (
+  SELECT *
+  FROM TABLE(
+    EXTERN(
+      
'{"type":"inline","data":"{\"time\":\"2024-01-01T00:00:00.000Z\",\"string_example\":\"my_string\",\"number_example\":99}\n{\"time\":\"2024-01-02T00:00:00.000Z\",\"string_example\":\"\",\"number_example\":0}\n{\"time\":\"2024-01-03T00:00:00.000Z\",\"string_example\":null,\"number_example\":null}"}',
+      '{"type":"json"}'
+    )
+  ) EXTEND ("time" VARCHAR, "string_example" VARCHAR, "number_example" BIGINT)
+)
+SELECT
+  TIME_PARSE("time") AS "__time",
+  -- Replace any null string values with an empty string
+  COALESCE("string_example",'') AS string_example,
+  -- Replace any null numeric values with 0
+  NVL("number_example",0) AS number_example
+FROM "ext"
+PARTITIONED BY MONTH
+```
+</TabItem>
+
+<TabItem value="1" label="JSON-based batch">
+
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "inline",
+        "data": 
"{\"time\":\"2024-01-01T00:00:00.000Z\",\"string_example\":\"my_string\",\"number_example\":99}\n{\"time\":\"2024-01-02T00:00:00.000Z\",\"string_example\":\"\",\"number_example\":0}\n{\"time\":\"2024-01-03T00:00:00.000Z\",\"string_example\":null,\"number_example\":null}"
+      },
+      "inputFormat": {
+        "type": "json"
+      }
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": {
+        "type": "dynamic"
+      }
+    },
+    "dataSchema": {
+      "dataSource": "inline_data_native",
+      "timestampSpec": {
+        "column": "time",
+        "format": "iso"
+      },
+      "dimensionsSpec": {
+        "dimensions": [
+          "string_example",
+          {
+            "type": "long",
+            "name": "number_example"
+          }
+        ]
+      },
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "MONTH"
+      },
+      "transformSpec": {
+        "transforms": [
+          {
+            "type": "expression",
+            "name": "string_example",
+            "expression": "COALESCE(\"string_example\",'')"
+          },
+          {
+            "type": "expression",
+            "name": "number_example",
+            "expression": "NVL(\"number_example\",0)"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+</TabItem>
+</Tabs>
+
+Druid ingests the data with no null values as follows:
+
+| `__time` | `string_examle` | `number_example`|
+| -- | -- | -- |
+| `2024-01-01T00:00:00.000Z`| `my_string`| 99 |
+| `2024-01-02T00:00:00.000Z`| `empty`| 0 |
+| `2024-01-03T00:00:00.000Z`| `empty`| 0 |
+
+### Coerce empty strings to null at ingestion time
+
+In legacy mode, Druid recognized empty strings as nulls for equality 
comparison.
+If your queries rely on empty strings to represent nulls, you can coerce empty 
strings to null at ingestion time using NULLIF.
+
+For example, consider the following sample input data:
+
+```json
+{"time":"2024-01-01T00:00:00.000Z","string_example":"my_string"}
+{"time":"2024-01-02T00:00:00.000Z","string_example":""}
+{"time":"2024-01-03T00:00:00.000Z","string_example":null}
+```
+
+In legacy mode, Druid wrote an empty string for the third record.
+Therefore the following query returned 2:
+
+```sql
+SELECT count(*)
+FROM "null_string"
+WHERE "string_example" IS NULL
+```
+
+In SQL compliant mode, Druid differentiates between empty strings and nulls, 
so the same query would return 1.
+The following example shows how to coerce empty strings into null to 
accommodate IS NULL comparisons:
+
+<Tabs>
+
+<TabItem value="0" label="SQL-based batcn">
+
+```sql
+REPLACE INTO "null_string" OVERWRITE ALL
+WITH "ext" AS (
+  SELECT *
+  FROM TABLE(
+    EXTERN(
+      
'{"type":"inline","data":"{\"time\":\"2024-01-01T00:00:00.000Z\",\"string_example\":\"my_string\"}\n{\"time\":\"2024-01-02T00:00:00.000Z\",\"string_example\":\"\"}\n{\"time\":\"2024-01-03T00:00:00.000Z\",\"string_example\":null}"}',
+      '{"type":"json"}'
+    )
+  ) EXTEND ("time" VARCHAR, "string_example" VARCHAR)
+)
+SELECT
+  TIME_PARSE("time") AS "__time",
+  NULLIF("string_example",'') AS "string_example"
+FROM "ext"
+PARTITIONED BY MONTH
+```
+
+</TabItem>
+
+<TabItem value="1" label="JSON-based batch">
+
+```json
+{
+  "type": "index_parallel",
+  "spec": {
+    "ioConfig": {
+      "type": "index_parallel",
+      "inputSource": {
+        "type": "inline",
+        "data": 
"{\"time\":\"2024-01-01T00:00:00.000Z\",\"string_example\":\"my_string\"}\n{\"time\":\"2024-01-02T00:00:00.000Z\",\"string_example\":\"\"}\n{\"time\":\"2024-01-03T00:00:00.000Z\",\"string_example\":null}"
+      },
+      "inputFormat": {
+        "type": "json"
+      }
+    },
+    "tuningConfig": {
+      "type": "index_parallel",
+      "partitionsSpec": {
+        "type": "dynamic"
+      }
+    },
+    "dataSchema": {
+      "dataSource": "null_string",
+      "timestampSpec": {
+        "column": "time",
+        "format": "iso"
+      },
+      "transformSpec": {
+        "transforms": [
+          {
+            "type": "expression",
+            "expression": "case_searched((\"string_example\" == 
''),null,\"string_example\")",
+            "name": "string_example"
+          }
+        ]
+      },
+      "dimensionsSpec": {
+        "dimensions": [
+          "string_example"
+        ]
+      },
+      "granularitySpec": {
+        "queryGranularity": "none",
+        "rollup": false,
+        "segmentGranularity": "month"
+      }
+    }
+  }
+}
+```
+
+</TabItem>
+</Tabs>
+
+Druid ingests the data with no empty strings as follows:
+
+| `__time` | `string_examle` |
+| -- | -- | -- |
+| `2024-01-01T00:00:00.000Z`| `my_string`|
+| `2024-01-02T00:00:00.000Z`| `null`|
+| `2024-01-03T00:00:00.000Z`| `null`|
+
+Therefore `SELECT count(*) FROM "null_string" WHERE "string_example" IS NULL` 
returns 2.
+
+### Rewrite your queries to be SQL compliant
+
+If you want to maintain null values in your data within Druid, you can use the 
following ANSI SQL compliant querying strategies to achieve the same results as 
legacy null handling:
+
+- Modify inequality queries to include null values.
+  For example, `x <> 'some value'` becomes `(x <> 'some value' OR x IS NULL)`.
+- Use COALESCE or NVL to replace nulls with a value.
+  For example, `x + 1` becomes `NVL(numeric_value, 0)+1`
+
+Consider the following Druid datasource `null_example`:
+
+| `__time` | `string_examle` | `number_example`|
+| -- | -- | -- |
+| `2024-01-01T00:00:00.000Z`| `my_string`| 99 |
+| `2024-01-02T00:00:00.000Z`| `empty`| 0 |
+| `2024-01-03T00:00:00.000Z`| `null`| null |
+
+Druid excludes null strings from equality comparisons. For example:
+
+```sql
+SELECT COUNT(*) AS count_example
+FROM "null_example"
+WHERE "string_example"<> 'my_string'
+```
+
+Druid returns 1 because null is considered unknown: neither equal nor unequal 
to the value.
+
+To count null values in the result, you can use an OR operator:
+
+```sql
+SELECT COUNT(*) AS count_example
+FROM "null_example"
+WHERE ("string_example"<> 'my_string') OR "string_example" IS NULL
+```
+
+Druid returns 2.
+To achieve the same result, you can use IS DISTINCT FROM for null-safe 
comparison:
+
+```sql
+SELECT COUNT(*) as count_example
+FROM "null_example"
+WHERE "string_example" IS DISTINCT FROM 'my_string'
+```
+
+Similarly, arithmetic operators on null return null. For example:
+
+```sql
+SELECT "number_example" + 1 AS additon_example
+FROM "null_example"
+```
+
+Druid returns the following because null + any value is null for the ANSI SQL 
standard:
+
+| `addition_example`|
+| -- |
+| 100 |
+| 1 |
+| null |
+
+Use NVL to avoid nulls with arithmetic. For example:
+
+```sql
+SELECT NVL("number_example",0) + 1 AS additon_example
+FROM "null_example"
+```
+
+Druid returns the following:
+
+| `addition_example` |
+| -- |
+| 100 |
+| 1 |
+| null |
+
+## Learn more
+
+See the following topics for more information:
+ - [Null handling tutorial](../tutorials/tutorial-sql-null.md) to learn how 
the default null handling works in Druid.
+ - [Null values](../querying/sql-data-types.md#null-values) for a description 
of Druid's behavior with null values.
+ - [Handling null values](../design/segments.md#handling-null-values) for 
details about how Druid stores null values.
\ No newline at end of file


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to