[hudi] branch asf-site updated: Fixing schema evolution docs (#9729)

bhavanisudha Fri, 15 Sep 2023 18:32:12 -0700

This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 8f3b215e1e5 Fixing schema evolution docs (#9729)
8f3b215e1e5 is described below

commit 8f3b215e1e5c384b1d04f554569e0d2da1be45b5
Author: Sivabalan Narayanan <n.siv...@gmail.com>
AuthorDate: Fri Sep 15 21:31:12 2023 -0400

    Fixing schema evolution docs (#9729)
---
 website/docs/schema_evolution.md | 359 ++++++++++++++++++++-------------------
 1 file changed, 181 insertions(+), 178 deletions(-)

diff --git a/website/docs/schema_evolution.md b/website/docs/schema_evolution.md
index 51dd67a4205..e454cb249f5 100755
--- a/website/docs/schema_evolution.md
+++ b/website/docs/schema_evolution.md
@@ -6,186 +6,10 @@ toc: true
 last_modified_at: 2022-04-27T15:59:57-04:00
 ---
 
-Schema evolution allows users to easily change the current schema of a Hudi table to adapt to the data that is changing over time.
-As of 0.11.0 release, Spark SQL (Spark 3.1.x, 3.2.1 and above) DDL support for Schema evolution has been added and is experimental.
-
-### Scenarios
-
-1. Columns (including nested columns) can be added, deleted, modified, and moved.
-2. Partition columns cannot be evolved.
-3. You cannot add, delete, or perform operations on nested columns of the Array type.
-
-## SparkSQL Schema Evolution and Syntax Description
-
-Before using schema evolution, set `spark.sql.extensions`. For Spark 3.2.1 and above,
-`spark.sql.catalog.spark_catalog` also needs to be set.
-```shell
-# Spark SQL for spark 3.1.x
-spark-sql --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.11.1 \
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
---conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
-
-# Spark SQL for spark 3.2.1 and above
-spark-sql --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 \
---conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
---conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
---conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
-
-```
-After starting the Spark application, run `set hoodie.schema.on.read.enable=true` to enable schema evolution.
-
-:::note
-Currently, schema evolution cannot be disabled once it has been enabled.
-:::
-
-:::tip
-When using the Hive metastore, you may encounter this error: `org.apache.hadoop.hive.ql.metadata.HiveException`: Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions.
-
-To avoid it, disable `hive.metastore.disallow.incompatible.col.type.changes` on the Hive side.
-:::
-
-### Adding Columns
-**Syntax**
-```sql
--- add columns
-ALTER TABLE tableName ADD COLUMNS(col_spec[, col_spec ...])
-```
-**Parameter Description**
-
-| Parameter        | Description                                                                                                          |
-|:-----------------|:---------------------------------------------------------------------------------------------------------------------|
-| tableName        | Table name                                                                                                           |
-| col_spec         | Column specifications, consisting of five fields, *col_name*, *col_type*, *nullable*, *comment*, and *col_position*. |
-
-**col_name** : name of the new column. It is mandatory. To add a sub-column to a nested column, specify the full name of the sub-column in this field.
-
-For example:
-
-1. To add sub-column col1 to a nested struct type column users struct<name: string, age: int>, set this field to users.col1.
-
-2. To add sub-column col1 to a nested map type column member map<string, struct<n: string, a: int>>, set this field to member.value.col1.
-
-**col_type** : type of the new column.
-
-**nullable** : whether the new column can be null. The value can be left empty. Currently, this field is not used in Hudi.
-
-**comment** : comment of the new column. The value can be left empty.
-
-**col_position** : position where the new column is added. The value can be *FIRST* or *AFTER* origin_col.
-
-1. If it is set to *FIRST*, the new column will be added as the first column of the table.
-
-2. If it is set to *AFTER* origin_col, the new column will be added after the existing column origin_col.
-
-3. The value can be left empty. *FIRST* can be used only when new sub-columns are added to nested columns. Do not use *FIRST* in top-level columns. There are no restrictions on the usage of *AFTER*.
-
-**Examples**
-
-```sql
-ALTER TABLE h0 ADD COLUMNS(ext0 string);
-ALTER TABLE h0 ADD COLUMNS(new_col int not null comment 'add new column' AFTER col1);
-ALTER TABLE complex_table ADD COLUMNS(col_struct.col_name string comment 'add new column to a struct col' AFTER col_from_col_struct);
-```
-
-### Altering Columns
-**Syntax**
-```sql
--- alter table ... alter column
-ALTER TABLE tableName ALTER [COLUMN] col_old_name TYPE column_type [COMMENT] col_comment [FIRST|AFTER] column_name
-```
-
-**Parameter Description**
-
-| Parameter        | Description                                                                                                                                     |
-|:-----------------|:------------------------------------------------------------------------------------------------------------------------------------------------|
-| tableName        | Table name.                                                                                                                                     |
-| col_old_name     | Name of the column to be altered.                                                                                                               |
-| column_type      | Type of the target column.                                                                                                                      |
-| col_comment      | Comment of the target column.                                                                                                                   |
-| column_name      | New position to place the target column. For example, *AFTER* **column_name** indicates that the target column is placed after **column_name**. |
-
-
-**Examples**
-
-```sql
---- Changing the column type
-ALTER TABLE table1 ALTER COLUMN a.b.c TYPE bigint
-
---- Altering other attributes
-ALTER TABLE table1 ALTER COLUMN a.b.c COMMENT 'new comment'
-ALTER TABLE table1 ALTER COLUMN a.b.c FIRST
-ALTER TABLE table1 ALTER COLUMN a.b.c AFTER x
-ALTER TABLE table1 ALTER COLUMN a.b.c DROP NOT NULL
-```
-
-**column type change**
-
-| Source\Target      | long  | float | double | string | decimal | date | int |
-|--------------------|-------|-------|--------|--------|---------|------|-----|
-| int                |   Y   |   Y   |    Y   |    Y   |    Y    |   N  |  Y  |
-| long               |   Y   |   N   |    Y   |    Y   |    Y    |   N  |  N  |
-| float              |   N   |   Y   |    Y   |    Y   |    Y    |   N  |  N  |
-| double             |   N   |   N   |    Y   |    Y   |    Y    |   N  |  N  |
-| decimal            |   N   |   N   |    N   |    Y   |    Y    |   N  |  N  |
-| string             |   N   |   N   |    N   |    Y   |    Y    |   Y  |  N  |
-| date               |   N   |   N   |    N   |    Y   |    N    |   Y  |  N  |
-
-### Deleting Columns
-**Syntax**
-```sql
--- alter table ... drop columns
-ALTER TABLE tableName DROP COLUMN|COLUMNS cols
-```
-
-**Examples**
-
-```sql
-ALTER TABLE table1 DROP COLUMN a.b.c
-ALTER TABLE table1 DROP COLUMNS a.b.c, x, y
-```
-
-### Changing Column Name
-**Syntax**
-```sql
--- alter table ... rename column
-ALTER TABLE tableName RENAME COLUMN old_columnName TO new_columnName
-```
-
-**Examples**
-
-```sql
-ALTER TABLE table1 RENAME COLUMN a.b.c TO x
-```
-
-### Modifying Table Properties
-**Syntax**
-```sql
--- alter table ... set|unset
-ALTER TABLE tableName SET|UNSET tblproperties
-```
-
-**Examples**
-
-```sql
-ALTER TABLE table SET TBLPROPERTIES ('table_property' = 'property_value')
-ALTER TABLE table UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key')
-```
-
-### Changing a Table Name
-**Syntax**
-```sql
--- alter table ... rename
-ALTER TABLE tableName RENAME TO newTableName
-```
-
-**Examples**
-
-```sql
-ALTER TABLE table1 RENAME TO table2
-```
+Schema evolution is an important aspect of data management. Hudi supports some schema changes out of the box,
+while others need additional configs.
 
 ## Out-of-the-box Schema Evolution
-Schema evolution is a very important aspect of data management.
 Hudi supports common schema evolution scenarios, such as adding a nullable field or promoting a datatype of a field, out-of-the-box.
 Furthermore, the evolved schema is queryable across engines, such as Presto, Hive and Spark SQL.
 The following table presents a summary of the types of schema changes compatible with different Hudi table types.
@@ -208,6 +32,8 @@ The following table presents a summary of the types of schema changes compatible
 Let us walk through an example to demonstrate the schema evolution support in Hudi.
 In the below example, we are going to add a new string field and change the datatype of a field from int to long.
 
+### Sample runbook
+
 ```java
 Welcome to
     ____              __
@@ -370,3 +196,180 @@ scala> spark.sql("select rowId, partitionId, preComb, name, versionId, intToLong
     +-----+-----------+-------+-------+---------+---------+----------+
 
 ```
+
+## Comprehensive Schema evolution (SparkSQL)
+Based on community needs, Hudi also supports comprehensive schema evolution. As of the 0.11.0 release, Spark SQL
+(Spark 3.1.x, 3.2.1 and above) DDL support for comprehensive schema evolution has been added and is experimental.
+
+### Scenarios
+
+1. Columns (including nested columns) can be added, deleted, modified, and moved.
+2. Partition columns cannot be evolved.
+3. You cannot add, delete, or perform operations on nested columns of the Array type.
+
+Before using schema evolution, set `spark.sql.extensions`. For Spark 3.2.1 and above,
+`spark.sql.catalog.spark_catalog` also needs to be set.
+```shell
+# Spark SQL for spark 3.1.x
+spark-sql --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.11.1 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+
+# Spark SQL for spark 3.2.1 and above
+spark-sql --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
+--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+
+```
+After starting the Spark application, run `set hoodie.schema.on.read.enable=true` to enable schema evolution.
+
+:::note
+Currently, schema evolution cannot be disabled once it has been enabled.
+:::
+
+:::tip
+When using the Hive metastore, you may encounter this error: `org.apache.hadoop.hive.ql.metadata.HiveException`: Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions.
+
+To avoid it, disable `hive.metastore.disallow.incompatible.col.type.changes` on the Hive side.
+:::
+
+### Adding Columns
+**Syntax**
+```sql
+-- add columns
+ALTER TABLE tableName ADD COLUMNS(col_spec[, col_spec ...])
+```
+**Parameter Description**
+
+| Parameter        | Description                                                                                                          |
+|:-----------------|:---------------------------------------------------------------------------------------------------------------------|
+| tableName        | Table name                                                                                                           |
+| col_spec         | Column specifications, consisting of five fields, *col_name*, *col_type*, *nullable*, *comment*, and *col_position*. |
+
+**col_name** : name of the new column. It is mandatory. To add a sub-column to a nested column, specify the full name of the sub-column in this field.
+
+For example:
+
+1. To add sub-column col1 to a nested struct type column users struct<name: string, age: int>, set this field to users.col1.
+
+2. To add sub-column col1 to a nested map type column member map<string, struct<n: string, a: int>>, set this field to member.value.col1.
+
+**col_type** : type of the new column.
+
+**nullable** : whether the new column can be null. The value can be left empty. Currently, this field is not used in Hudi.
+
+**comment** : comment of the new column. The value can be left empty.
+
+**col_position** : position where the new column is added. The value can be *FIRST* or *AFTER* origin_col.
+
+1. If it is set to *FIRST*, the new column will be added as the first column of the table.
+
+2. If it is set to *AFTER* origin_col, the new column will be added after the existing column origin_col.
+
+3. The value can be left empty. *FIRST* can be used only when new sub-columns are added to nested columns. Do not use *FIRST* in top-level columns. There are no restrictions on the usage of *AFTER*.
+
+**Examples**
+
+```sql
+ALTER TABLE h0 ADD COLUMNS(ext0 string);
+ALTER TABLE h0 ADD COLUMNS(new_col int not null comment 'add new column' AFTER col1);
+ALTER TABLE complex_table ADD COLUMNS(col_struct.col_name string comment 'add new column to a struct col' AFTER col_from_col_struct);
+```
+
+### Altering Columns
+**Syntax**
+```sql
+-- alter table ... alter column
+ALTER TABLE tableName ALTER [COLUMN] col_old_name TYPE column_type [COMMENT] col_comment [FIRST|AFTER] column_name
+```
+
+**Parameter Description**
+
+| Parameter        | Description                                                                                                                                     |
+|:-----------------|:------------------------------------------------------------------------------------------------------------------------------------------------|
+| tableName        | Table name.                                                                                                                                     |
+| col_old_name     | Name of the column to be altered.                                                                                                               |
+| column_type      | Type of the target column.                                                                                                                      |
+| col_comment      | Comment of the target column.                                                                                                                   |
+| column_name      | New position to place the target column. For example, *AFTER* **column_name** indicates that the target column is placed after **column_name**. |
+
+
+**Examples**
+
+```sql
+--- Changing the column type
+ALTER TABLE table1 ALTER COLUMN a.b.c TYPE bigint
+
+--- Altering other attributes
+ALTER TABLE table1 ALTER COLUMN a.b.c COMMENT 'new comment'
+ALTER TABLE table1 ALTER COLUMN a.b.c FIRST
+ALTER TABLE table1 ALTER COLUMN a.b.c AFTER x
+ALTER TABLE table1 ALTER COLUMN a.b.c DROP NOT NULL
+```
+
+**column type change**
+
+| Source\Target      | long  | float | double | string | decimal | date | int |
+|--------------------|-------|-------|--------|--------|---------|------|-----|
+| int                |   Y   |   Y   |    Y   |    Y   |    Y    |   N  |  Y  |
+| long               |   Y   |   N   |    Y   |    Y   |    Y    |   N  |  N  |
+| float              |   N   |   Y   |    Y   |    Y   |    Y    |   N  |  N  |
+| double             |   N   |   N   |    Y   |    Y   |    Y    |   N  |  N  |
+| decimal            |   N   |   N   |    N   |    Y   |    Y    |   N  |  N  |
+| string             |   N   |   N   |    N   |    Y   |    Y    |   Y  |  N  |
+| date               |   N   |   N   |    N   |    Y   |    N    |   Y  |  N  |
+
+### Deleting Columns
+**Syntax**
+```sql
+-- alter table ... drop columns
+ALTER TABLE tableName DROP COLUMN|COLUMNS cols
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 DROP COLUMN a.b.c
+ALTER TABLE table1 DROP COLUMNS a.b.c, x, y
+```
+
+### Changing Column Name
+**Syntax**
+```sql
+-- alter table ... rename column
+ALTER TABLE tableName RENAME COLUMN old_columnName TO new_columnName
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 RENAME COLUMN a.b.c TO x
+```
+
+### Modifying Table Properties
+**Syntax**
+```sql
+-- alter table ... set|unset
+ALTER TABLE tableName SET|UNSET tblproperties
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table SET TBLPROPERTIES ('table_property' = 'property_value')
+ALTER TABLE table UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key')
+```
+
+### Changing a Table Name
+**Syntax**
+```sql
+-- alter table ... rename
+ALTER TABLE tableName RENAME TO newTableName
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 RENAME TO table2
+```
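
The "column type change" table in the diff above can be read as a lookup from a source type to the set of target types marked `Y`. The following is a minimal sketch of that reading, transcribed from the table as it appears in this commit. `ALLOWED_TYPE_CHANGES` and `can_change_type` are hypothetical names used only for illustration; they are not part of Hudi, and Hudi's own validation may differ.

```python
# Allowed column type changes, transcribed row by row from the
# "column type change" table in the docs diff above.
# NOTE: ALLOWED_TYPE_CHANGES and can_change_type are hypothetical names
# for illustration only; they are not Hudi APIs.
ALLOWED_TYPE_CHANGES = {
    "int":     {"long", "float", "double", "string", "decimal", "int"},
    "long":    {"long", "double", "string", "decimal"},
    "float":   {"float", "double", "string", "decimal"},
    "double":  {"double", "string", "decimal"},
    "decimal": {"string", "decimal"},
    "string":  {"string", "decimal", "date"},
    "date":    {"string", "date"},
}


def can_change_type(source: str, target: str) -> bool:
    """Return True if the table marks the source -> target change as 'Y'."""
    return target in ALLOWED_TYPE_CHANGES.get(source, set())


if __name__ == "__main__":
    # Check a planned ALTER COLUMN ... TYPE change against the table.
    for src, dst in [("int", "long"), ("long", "int"), ("string", "date")]:
        print(f"{src} -> {dst}: {can_change_type(src, dst)}")
```

A check like this could run before issuing `ALTER TABLE ... ALTER COLUMN ... TYPE ...`, so that a change the table marks `N` is caught before it reaches Spark.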
