This is an automated email from the ASF dual-hosted git repository.
fokko pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 982242ba8e Docs: Fix missing semicolons in SQL snippets. (#8748)
982242ba8e is described below
commit 982242ba8e9bbd34686e429ba893f7bad379799e
Author: Priyansh Agrawal <[email protected]>
AuthorDate: Tue Oct 10 16:00:52 2023 +0100
Docs: Fix missing semicolons in SQL snippets. (#8748)
* Update spark-getting-started.md
Add a missing semicolon to the "CREATE TABLE ..." statement.
* Fix all missing semicolons in spark-getting-started.md
- And a couple of minor typo/brevity fixes.
* Fix missing semicolons in all of docs/.
---
docs/branching-and-tagging.md | 16 ++++-----
docs/dell.md | 2 +-
docs/flink-ddl.md | 2 +-
docs/flink-queries.md | 2 +-
docs/partitioning.md | 6 ++--
docs/spark-configuration.md | 4 +--
docs/spark-ddl.md | 74 +++++++++++++++++++--------------------
docs/spark-getting-started.md | 14 ++++----
docs/spark-procedures.md | 80 +++++++++++++++++++++----------------------
docs/spark-queries.md | 2 +-
10 files changed, 101 insertions(+), 101 deletions(-)
diff --git a/docs/branching-and-tagging.md b/docs/branching-and-tagging.md
index 2bff0a8846..957675c81c 100644
--- a/docs/branching-and-tagging.md
+++ b/docs/branching-and-tagging.md
@@ -61,25 +61,25 @@ via Spark SQL.
snapshots will be kept, and the branch reference itself will be retained for 1 week.
```sql
-- Create a tag for the first end of week snapshot. Retain the snapshot for a week
-ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS
+ALTER TABLE prod.db.table CREATE TAG `EOW-01` AS OF VERSION 7 RETAIN 7 DAYS;
```
2. Retain 1 snapshot per month for 6 months. This can be achieved by tagging the monthly snapshot and setting the tag retention to be 6 months.
```sql
-- Create a tag for the first end of month snapshot. Retain the snapshot for 6 months
-ALTER TABLE prod.db.table CREATE TAG `EOM-01` AS OF VERSION 30 RETAIN 180 DAYS
+ALTER TABLE prod.db.table CREATE TAG `EOM-01` AS OF VERSION 30 RETAIN 180 DAYS;
```
3. Retain 1 snapshot per year forever. This can be achieved by tagging the annual snapshot. The default retention for branches and tags is forever.
```sql
-- Create a tag for the end of the year and retain it forever.
-ALTER TABLE prod.db.table CREATE TAG `EOY-2023` AS OF VERSION 365
+ALTER TABLE prod.db.table CREATE TAG `EOY-2023` AS OF VERSION 365;
```
4. Create a temporary "test-branch" which is retained for 7 days and the latest 2 snapshots on the branch are retained.
```sql
-- Create a branch "test-branch" which will be retained for 7 days along with the latest 2 snapshots
-ALTER TABLE prod.db.table CREATE BRANCH `test-branch` RETAIN 7 DAYS WITH SNAPSHOT RETENTION 2 SNAPSHOTS
+ALTER TABLE prod.db.table CREATE BRANCH `test-branch` RETAIN 7 DAYS WITH SNAPSHOT RETENTION 2 SNAPSHOTS;
```
### Audit Branch
@@ -92,22 +92,22 @@ The above diagram shows an example of using an audit branch for validating a wri
```sql
ALTER TABLE db.table SET TBLPROPERTIES (
'write.wap.enabled'='true'
-)
+);
```
2. Create `audit-branch` starting from snapshot 3, which will be written to and retained for 1 week.
```sql
-ALTER TABLE db.table CREATE BRANCH `audit-branch` AS OF VERSION 3 RETAIN 7 DAYS
+ALTER TABLE db.table CREATE BRANCH `audit-branch` AS OF VERSION 3 RETAIN 7 DAYS;
```
3. Writes are performed on a separate `audit-branch` independent from the main table history.
```sql
-- WAP Branch write
SET spark.wap.branch = audit-branch
-INSERT INTO prod.db.table VALUES (3, 'c')
+INSERT INTO prod.db.table VALUES (3, 'c');
```
4. A validation workflow can validate (e.g. data quality) the state of `audit-branch`.
5. After validation, the main branch can be `fastForward` to the head of `audit-branch` to update the main table state.
```sql
-CALL catalog_name.system.fast_forward('prod.db.table', 'main', 'audit-branch')
+CALL catalog_name.system.fast_forward('prod.db.table', 'main', 'audit-branch');
```
6. The branch reference will be removed when `expireSnapshots` is run 1 week later.
diff --git a/docs/dell.md b/docs/dell.md
index af484f9306..401240ab29 100644
--- a/docs/dell.md
+++ b/docs/dell.md
@@ -122,7 +122,7 @@ CREATE CATALOG my_catalog WITH (
'catalog-impl'='org.apache.iceberg.dell.ecs.EcsCatalog',
'ecs.s3.endpoint' = 'http://10.x.x.x:9020',
'ecs.s3.access-key-id' = '<Your-ecs-s3-access-key>',
- 'ecs.s3.secret-access-key' = '<Your-ecs-s3-secret-access-key>')
+ 'ecs.s3.secret-access-key' = '<Your-ecs-s3-secret-access-key>');
```
Then, you can run `USE CATALOG my_catalog`, `SHOW DATABASES`, and `SHOW TABLES` to fetch the namespaces and tables of the catalog.
diff --git a/docs/flink-ddl.md b/docs/flink-ddl.md
index f0a484a1cd..1ab550ec55 100644
--- a/docs/flink-ddl.md
+++ b/docs/flink-ddl.md
@@ -211,7 +211,7 @@ For more details, refer to the [Flink `CREATE TABLE` documentation](https://nigh
Iceberg only support altering table properties:
```sql
-ALTER TABLE `hive_catalog`.`default`.`sample` SET ('write.format.default'='avro')
+ALTER TABLE `hive_catalog`.`default`.`sample` SET ('write.format.default'='avro');
```
### `ALTER TABLE .. RENAME TO`
diff --git a/docs/flink-queries.md b/docs/flink-queries.md
index fa17fdbd79..4cef5468cd 100644
--- a/docs/flink-queries.md
+++ b/docs/flink-queries.md
@@ -377,7 +377,7 @@ select
from prod.db.table$history h
join prod.db.table$snapshots s
on h.snapshot_id = s.snapshot_id
-order by made_current_at
+order by made_current_at;
```
| made_current_at | operation | snapshot_id | is_current_ancestor | summary[flink.job-id] |
diff --git a/docs/partitioning.md b/docs/partitioning.md
index 799fc4fc75..0fddde1ceb 100644
--- a/docs/partitioning.md
+++ b/docs/partitioning.md
@@ -36,7 +36,7 @@ For example, queries for log entries from a `logs` table would usually include a
```sql
SELECT level, message FROM logs
-WHERE event_time BETWEEN '2018-12-01 10:00:00' AND '2018-12-01 12:00:00'
+WHERE event_time BETWEEN '2018-12-01 10:00:00' AND '2018-12-01 12:00:00';
```
Configuring the `logs` table to partition by the date of `event_time` will group log events into files with the same event date. Iceberg keeps track of that date and will use it to skip files for other dates that don't have useful data.
@@ -61,7 +61,7 @@ In Hive, partitions are explicit and appear as a column, so the `logs` table wou
```sql
INSERT INTO logs PARTITION (event_date)
SELECT level, message, event_time, format_time(event_time, 'YYYY-MM-dd')
- FROM unstructured_log_source
+ FROM unstructured_log_source;
```
Similarly, queries that search through the `logs` table must have an `event_date` filter in addition to an `event_time` filter.
@@ -69,7 +69,7 @@ Similarly, queries that search through the `logs` table must have an `event_date
```sql
SELECT level, count(1) as count FROM logs
WHERE event_time BETWEEN '2018-12-01 10:00:00' AND '2018-12-01 12:00:00'
- AND event_date = '2018-12-01'
+ AND event_date = '2018-12-01';
```
If the `event_date` filter were missing, Hive would scan through every file in the table because it doesn't know that the `event_time` column is related to the `event_date` column.
diff --git a/docs/spark-configuration.md b/docs/spark-configuration.md
index f94efdcc58..9470acf027 100644
--- a/docs/spark-configuration.md
+++ b/docs/spark-configuration.md
@@ -94,14 +94,14 @@ Additional properties can be found in common [catalog configuration](../configur
Catalog names are used in SQL queries to identify a table. In the examples above, `hive_prod` and `hadoop_prod` can be used to prefix database and table names that will be loaded from those catalogs.
```sql
-SELECT * FROM hive_prod.db.table -- load db.table from catalog hive_prod
+SELECT * FROM hive_prod.db.table; -- load db.table from catalog hive_prod
```
Spark 3 keeps track of the current catalog and namespace, which can be omitted from table names.
```sql
USE hive_prod.db;
-SELECT * FROM table -- load db.table from catalog hive_prod
+SELECT * FROM table; -- load db.table from catalog hive_prod
```
To see the current catalog and namespace, run `SHOW CURRENT NAMESPACE`.
diff --git a/docs/spark-ddl.md b/docs/spark-ddl.md
index 77684b9717..ab8566c0d3 100644
--- a/docs/spark-ddl.md
+++ b/docs/spark-ddl.md
@@ -38,7 +38,7 @@ Spark 3 can create tables in any Iceberg catalog with the clause `USING iceberg`
CREATE TABLE prod.db.sample (
id bigint COMMENT 'unique id',
data string)
-USING iceberg
+USING iceberg;
```
Iceberg will convert the column type in Spark to corresponding Iceberg type.
Please check the section of [type compatibility on creating table](../spark-writes#spark-type-to-iceberg-type) for details.
@@ -62,7 +62,7 @@ CREATE TABLE prod.db.sample (
data string,
category string)
USING iceberg
-PARTITIONED BY (category)
+PARTITIONED BY (category);
```
The `PARTITIONED BY` clause supports transform expressions to create [hidden partitions](../partitioning).
@@ -74,7 +74,7 @@ CREATE TABLE prod.db.sample (
category string,
ts timestamp)
USING iceberg
-PARTITIONED BY (bucket(16, id), days(ts), category)
+PARTITIONED BY (bucket(16, id), days(ts), category);
```
Supported transformations are:
@@ -151,7 +151,7 @@ In order to delete the table contents `DROP TABLE PURGE` should be used.
To drop the table from the catalog, run:
```sql
-DROP TABLE prod.db.sample
+DROP TABLE prod.db.sample;
```
### `DROP TABLE PURGE`
@@ -159,7 +159,7 @@ DROP TABLE prod.db.sample
To drop the table from the catalog and delete the table's contents, run:
```sql
-DROP TABLE prod.db.sample PURGE
+DROP TABLE prod.db.sample PURGE;
```
## `ALTER TABLE`
@@ -179,7 +179,7 @@ In addition, [SQL extensions](../spark-configuration#sql-extensions) can be used
### `ALTER TABLE ... RENAME TO`
```sql
-ALTER TABLE prod.db.sample RENAME TO prod.db.new_name
+ALTER TABLE prod.db.sample RENAME TO prod.db.new_name;
```
### `ALTER TABLE ... SET TBLPROPERTIES`
@@ -187,7 +187,7 @@ ALTER TABLE prod.db.sample RENAME TO prod.db.new_name
```sql
ALTER TABLE prod.db.sample SET TBLPROPERTIES (
'read.split.target-size'='268435456'
-)
+);
```
Iceberg uses table properties to control table behavior. For a list of available properties, see [Table configuration](../configuration).
@@ -195,7 +195,7 @@ Iceberg uses table properties to control table behavior. For a list of available
`UNSET` is used to remove properties:
```sql
-ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size')
+ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size');
```
`SET TBLPROPERTIES` can also be used to set the table comment (description):
@@ -203,7 +203,7 @@ ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size')
```sql
ALTER TABLE prod.db.sample SET TBLPROPERTIES (
'comment' = 'A table comment.'
-)
+);
```
### `ALTER TABLE ... ADD COLUMN`
@@ -214,7 +214,7 @@ To add a column to Iceberg, use the `ADD COLUMNS` clause with `ALTER TABLE`:
ALTER TABLE prod.db.sample
ADD COLUMNS (
new_column string comment 'new_column docs'
- )
+);
```
Multiple columns can be added at the same time, separated by commas.
@@ -228,7 +228,7 @@ ADD COLUMN point struct<x: double, y: double>;
-- add a field to the struct
ALTER TABLE prod.db.sample
-ADD COLUMN point.z double
+ADD COLUMN point.z double;
```
```sql
@@ -238,7 +238,7 @@ ADD COLUMN points array<struct<x: double, y: double>>;
-- add a field to the struct within an array. Using keyword 'element' to access the array's element column.
ALTER TABLE prod.db.sample
-ADD COLUMN points.element.z double
+ADD COLUMN points.element.z double;
```
```sql
@@ -248,7 +248,7 @@ ADD COLUMN points map<struct<x: int>, struct<a: int>>;
-- add a field to the value struct in a map. Using keyword 'value' to access the map's value column.
ALTER TABLE prod.db.sample
-ADD COLUMN points.value.b int
+ADD COLUMN points.value.b int;
```
Note: Altering a map 'key' column by adding columns is not allowed. Only map values can be updated.
@@ -257,12 +257,12 @@ Add columns in any position by adding `FIRST` or `AFTER` clauses:
```sql
ALTER TABLE prod.db.sample
-ADD COLUMN new_column bigint AFTER other_column
+ADD COLUMN new_column bigint AFTER other_column;
```
```sql
ALTER TABLE prod.db.sample
-ADD COLUMN nested.new_column bigint FIRST
+ADD COLUMN nested.new_column bigint FIRST;
```
### `ALTER TABLE ... RENAME COLUMN`
@@ -270,8 +270,8 @@ ADD COLUMN nested.new_column bigint FIRST
Iceberg allows any field to be renamed. To rename a field, use `RENAME COLUMN`:
```sql
-ALTER TABLE prod.db.sample RENAME COLUMN data TO payload
-ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude
+ALTER TABLE prod.db.sample RENAME COLUMN data TO payload;
+ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude;
```
Note that nested rename commands only rename the leaf field. The above command renames `location.lat` to `location.latitude`
@@ -287,7 +287,7 @@ Iceberg allows updating column types if the update is safe. Safe updates are:
* `decimal(P,S)` to `decimal(P2,S)` when P2 > P (scale cannot change)
```sql
-ALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double
+ALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double;
```
To add or remove columns from a struct, use `ADD COLUMN` or `DROP COLUMN` with a nested column name.
@@ -295,23 +295,23 @@ To add or remove columns from a struct, use `ADD COLUMN` or `DROP COLUMN` with a
Column comments can also be updated using `ALTER COLUMN`:
```sql
-ALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double COMMENT 'unit is bytes per second'
-ALTER TABLE prod.db.sample ALTER COLUMN measurement COMMENT 'unit is kilobytes per second'
+ALTER TABLE prod.db.sample ALTER COLUMN measurement TYPE double COMMENT 'unit is bytes per second';
+ALTER TABLE prod.db.sample ALTER COLUMN measurement COMMENT 'unit is kilobytes per second';
```
Iceberg allows reordering top-level columns or columns in a struct using `FIRST` and `AFTER` clauses:
```sql
-ALTER TABLE prod.db.sample ALTER COLUMN col FIRST
+ALTER TABLE prod.db.sample ALTER COLUMN col FIRST;
```
```sql
-ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col
+ALTER TABLE prod.db.sample ALTER COLUMN nested.col AFTER other_col;
```
Nullability for a non-nullable column can be changed using `DROP NOT NULL`:
```sql
-ALTER TABLE prod.db.sample ALTER COLUMN id DROP NOT NULL
+ALTER TABLE prod.db.sample ALTER COLUMN id DROP NOT NULL;
```
{{< hint info >}}
@@ -329,8 +329,8 @@ It is not possible to change a nullable column to a non-nullable column with `SE
To drop columns, use `ALTER TABLE ... DROP COLUMN`:
```sql
-ALTER TABLE prod.db.sample DROP COLUMN id
-ALTER TABLE prod.db.sample DROP COLUMN point.z
+ALTER TABLE prod.db.sample DROP COLUMN id;
+ALTER TABLE prod.db.sample DROP COLUMN point.z;
```
## `ALTER TABLE` SQL extensions
@@ -342,17 +342,17 @@ These commands are available in Spark 3 when using Iceberg [SQL extensions](../s
Iceberg supports adding new partition fields to a spec using `ADD PARTITION FIELD`:
```sql
-ALTER TABLE prod.db.sample ADD PARTITION FIELD catalog -- identity transform
+ALTER TABLE prod.db.sample ADD PARTITION FIELD catalog; -- identity transform
```
[Partition transforms](#partitioned-by) are also supported:
```sql
-ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id)
-ALTER TABLE prod.db.sample ADD PARTITION FIELD truncate(4, data)
-ALTER TABLE prod.db.sample ADD PARTITION FIELD year(ts)
+ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id);
+ALTER TABLE prod.db.sample ADD PARTITION FIELD truncate(4, data);
+ALTER TABLE prod.db.sample ADD PARTITION FIELD year(ts);
-- use optional AS keyword to specify a custom name for the partition field
-ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id) AS shard
+ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id) AS shard;
```
Adding a partition field is a metadata operation and does not change any of the existing table data. New data will be written with the new partitioning, but existing data will remain in the old partition layout. Old data files will have null values for the new partition fields in metadata tables.
@@ -373,11 +373,11 @@ For example, if you partition by days and move to partitioning by hours, overwri
Partition fields can be removed using `DROP PARTITION FIELD`:
```sql
-ALTER TABLE prod.db.sample DROP PARTITION FIELD catalog
-ALTER TABLE prod.db.sample DROP PARTITION FIELD bucket(16, id)
-ALTER TABLE prod.db.sample DROP PARTITION FIELD truncate(4, data)
-ALTER TABLE prod.db.sample DROP PARTITION FIELD year(ts)
-ALTER TABLE prod.db.sample DROP PARTITION FIELD shard
+ALTER TABLE prod.db.sample DROP PARTITION FIELD catalog;
+ALTER TABLE prod.db.sample DROP PARTITION FIELD bucket(16, id);
+ALTER TABLE prod.db.sample DROP PARTITION FIELD truncate(4, data);
+ALTER TABLE prod.db.sample DROP PARTITION FIELD year(ts);
+ALTER TABLE prod.db.sample DROP PARTITION FIELD shard;
```
Note that although the partition is removed, the column will still exist in the table schema.
@@ -398,9 +398,9 @@ Be careful when dropping a partition field because it will change the schema of
A partition field can be replaced by a new partition field in a single metadata update by using `REPLACE PARTITION FIELD`:
```sql
-ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts)
+ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts);
-- use optional AS keyword to specify a custom name for the new partition field
-ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts) AS day_of_ts
+ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts) AS day_of_ts;
```
### `ALTER TABLE ... WRITE ORDERED BY`
diff --git a/docs/spark-getting-started.md b/docs/spark-getting-started.md
index f72bb7e720..2181712c9f 100644
--- a/docs/spark-getting-started.md
+++ b/docs/spark-getting-started.md
@@ -31,7 +31,7 @@ menu:
The latest version of Iceberg is [{{% icebergVersion %}}](../../../releases).
-Spark is currently the most feature-rich compute engine for Iceberg operations.
+Spark is currently the most feature-rich compute engine for Iceberg operations.
We recommend you to get started with Spark to understand Iceberg concepts and features with examples.
You can also view documentations of using Iceberg with other compute engine under the [Multi-Engine Support](https://iceberg.apache.org/multi-engine-support) page.
@@ -69,7 +69,7 @@ To create your first Iceberg table in Spark, use the `spark-sql` shell or `spark
```sql
-- local is the path-based catalog defined above
-CREATE TABLE local.db.table (id bigint, data string) USING iceberg
+CREATE TABLE local.db.table (id bigint, data string) USING iceberg;
```
Iceberg catalogs support the full range of SQL DDL commands, including:
@@ -93,7 +93,7 @@ Iceberg also adds row-level SQL updates to Spark, [`MERGE INTO`](../spark-writes
```sql
MERGE INTO local.db.target t USING (SELECT * FROM updates) u ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.count = t.count + u.count
-WHEN NOT MATCHED THEN INSERT *
+WHEN NOT MATCHED THEN INSERT *;
```
Iceberg supports writing DataFrames using the new [v2 DataFrame write API](../spark-writes#writing-with-dataframes):
@@ -107,17 +107,17 @@ The old `write` API is supported, but _not_ recommended.
### Reading
-To read with SQL, use the an Iceberg table name in a `SELECT` query:
+To read with SQL, use the Iceberg table's name in a `SELECT` query:
```sql
SELECT count(1) as count, data
FROM local.db.table
-GROUP BY data
+GROUP BY data;
```
-SQL is also the recommended way to [inspect tables](../spark-queries#inspecting-tables). To view all of the snapshots in a table, use the `snapshots` metadata table:
+SQL is also the recommended way to [inspect tables](../spark-queries#inspecting-tables). To view all snapshots in a table, use the `snapshots` metadata table:
```sql
-SELECT * FROM local.db.table.snapshots
+SELECT * FROM local.db.table.snapshots;
```
```
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
diff --git a/docs/spark-procedures.md b/docs/spark-procedures.md
index d7930fb01e..65ef9c7b84 100644
--- a/docs/spark-procedures.md
+++ b/docs/spark-procedures.md
@@ -41,7 +41,7 @@ Procedures can be used from any configured Iceberg catalog with `CALL`. All proc
All procedure arguments are named. When passing arguments by name, arguments can be in any order and any optional argument can be omitted.
```sql
-CALL catalog_name.system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1)
+CALL catalog_name.system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1);
```
### Positional arguments
@@ -49,7 +49,7 @@ CALL catalog_name.system.procedure_name(arg_name_2 => arg_2, arg_name_1 => arg_1
When passing arguments by position, only the ending arguments may be omitted if they are optional.
```sql
-CALL catalog_name.system.procedure_name(arg_1, arg_2, ... arg_n)
+CALL catalog_name.system.procedure_name(arg_1, arg_2, ... arg_n);
```
## Snapshot management
@@ -83,7 +83,7 @@ This procedure invalidates all cached Spark plans that reference the affected ta
Roll back table `db.sample` to snapshot ID `1`:
```sql
-CALL catalog_name.system.rollback_to_snapshot('db.sample', 1)
+CALL catalog_name.system.rollback_to_snapshot('db.sample', 1);
```
### `rollback_to_timestamp`
@@ -112,7 +112,7 @@ This procedure invalidates all cached Spark plans that reference the affected ta
Roll back `db.sample` to a specific day and time.
```sql
-CALL catalog_name.system.rollback_to_timestamp('db.sample', TIMESTAMP '2021-06-30 00:00:00.000')
+CALL catalog_name.system.rollback_to_timestamp('db.sample', TIMESTAMP '2021-06-30 00:00:00.000');
```
### `set_current_snapshot`
@@ -146,7 +146,7 @@ Either `snapshot_id` or `ref` must be provided but not both.
Set the current snapshot for `db.sample` to 1:
```sql
-CALL catalog_name.system.set_current_snapshot('db.sample', 1)
+CALL catalog_name.system.set_current_snapshot('db.sample', 1);
```
Set the current snapshot for `db.sample` to tag `s1`:
@@ -184,12 +184,12 @@ This procedure invalidates all cached Spark plans that reference the affected ta
Cherry-pick snapshot 1
```sql
-CALL catalog_name.system.cherrypick_snapshot('my_table', 1)
+CALL catalog_name.system.cherrypick_snapshot('my_table', 1);
```
Cherry-pick snapshot 1 with named args
```sql
-CALL catalog_name.system.cherrypick_snapshot(snapshot_id => 1, table => 'my_table' )
+CALL catalog_name.system.cherrypick_snapshot(snapshot_id => 1, table => 'my_table' );
```
### `publish_changes`
@@ -222,12 +222,12 @@ This procedure invalidates all cached Spark plans that reference the affected ta
publish_changes with WAP ID 'wap_id_1'
```sql
-CALL catalog_name.system.publish_changes('my_table', 'wap_id_1')
+CALL catalog_name.system.publish_changes('my_table', 'wap_id_1');
```
publish_changes with named args
```sql
-CALL catalog_name.system.publish_changes(wap_id => 'wap_id_2', table => 'my_table')
+CALL catalog_name.system.publish_changes(wap_id => 'wap_id_2', table => 'my_table');
```
### `fast_forward`
@@ -254,7 +254,7 @@ Fast-forward the current snapshot of one branch to the latest snapshot of anothe
Fast-forward the main branch to the head of `audit-branch`
```sql
-CALL catalog_name.system.fast_forward('my_table', 'main', 'audit-branch')
+CALL catalog_name.system.fast_forward('my_table', 'main', 'audit-branch');
```
@@ -301,13 +301,13 @@ Snapshots that are still referenced by branches or tags won't be removed. By def
Remove snapshots older than specific day and time, but retain the last 100 snapshots:
```sql
-CALL hive_prod.system.expire_snapshots('db.sample', TIMESTAMP '2021-06-30 00:00:00.000', 100)
+CALL hive_prod.system.expire_snapshots('db.sample', TIMESTAMP '2021-06-30 00:00:00.000', 100);
```
Remove snapshots with snapshot ID `123` (note that this snapshot ID should not be the current snapshot):
```sql
-CALL hive_prod.system.expire_snapshots(table => 'db.sample', snapshot_ids => ARRAY(123))
+CALL hive_prod.system.expire_snapshots(table => 'db.sample', snapshot_ids => ARRAY(123));
```
### `remove_orphan_files`
@@ -334,12 +334,12 @@ Used to remove files which are not referenced in any metadata files of an Iceber
List all the files that are candidates for removal by performing a dry run of the `remove_orphan_files` command on this table without actually removing them:
```sql
-CALL catalog_name.system.remove_orphan_files(table => 'db.sample', dry_run => true)
+CALL catalog_name.system.remove_orphan_files(table => 'db.sample', dry_run => true);
```
Remove any files in the `tablelocation/data` folder which are not known to the table `db.sample`.
```sql
-CALL catalog_name.system.remove_orphan_files(table => 'db.sample', location => 'tablelocation/data')
+CALL catalog_name.system.remove_orphan_files(table => 'db.sample', location => 'tablelocation/data');
```
### `rewrite_data_files`
@@ -405,29 +405,29 @@ Iceberg can compact data files in parallel using Spark with the `rewriteDataFile
Rewrite the data files in table `db.sample` using the default rewrite algorithm of bin-packing to combine small files
and also split large files according to the default write size of the table.
```sql
-CALL catalog_name.system.rewrite_data_files('db.sample')
+CALL catalog_name.system.rewrite_data_files('db.sample');
```
Rewrite the data files in table `db.sample` by sorting all the data on id and name
using the same defaults as bin-pack to determine which files to rewrite.
```sql
-CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id DESC NULLS LAST,name ASC NULLS FIRST')
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'id DESC NULLS LAST,name ASC NULLS FIRST');
```
Rewrite the data files in table `db.sample` by zOrdering on column c1 and c2.
Using the same defaults as bin-pack to determine which files to rewrite.
```sql
-CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'zorder(c1,c2)')
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', strategy => 'sort', sort_order => 'zorder(c1,c2)');
```
Rewrite the data files in table `db.sample` using bin-pack strategy in any partition where more than 2 or more files need to be rewritten.
```sql
-CALL catalog_name.system.rewrite_data_files(table => 'db.sample', options => map('min-input-files','2'))
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', options => map('min-input-files','2'));
```
Rewrite the data files in table `db.sample` and select the files that may contain data matching the filter (id = 3 and name = "foo") to be rewritten.
```sql
-CALL catalog_name.system.rewrite_data_files(table => 'db.sample', where => 'id = 3 and name = "foo"')
+CALL catalog_name.system.rewrite_data_files(table => 'db.sample', where => 'id = 3 and name = "foo"');
```
### `rewrite_manifests`
@@ -458,12 +458,12 @@ This procedure invalidates all cached Spark plans that reference the affected ta
Rewrite the manifests in table `db.sample` and align manifest files with table partitioning.
```sql
-CALL catalog_name.system.rewrite_manifests('db.sample')
+CALL catalog_name.system.rewrite_manifests('db.sample');
```
Rewrite the manifests in table `db.sample` and disable the use of Spark caching. This could be done to avoid memory issues on executors.
```sql
-CALL catalog_name.system.rewrite_manifests('db.sample', false)
+CALL catalog_name.system.rewrite_manifests('db.sample', false);
```
### `rewrite_position_delete_files`
@@ -510,17 +510,17 @@ Dangling deletes are always filtered out during rewriting.
Rewrite position delete files in table `db.sample`. This selects position delete files that fit default rewrite criteria, and writes new files of target size `target-file-size-bytes`. Dangling deletes are removed from rewritten delete files.
```sql
-CALL catalog_name.system.rewrite_position_delete_files('db.sample')
+CALL catalog_name.system.rewrite_position_delete_files('db.sample');
```
Rewrite all position delete files in table `db.sample`, writing new files `target-file-size-bytes`. Dangling deletes are removed from rewritten delete files.
```sql
-CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('rewrite-all', 'true'))
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('rewrite-all', 'true'));
```
Rewrite position delete files in table `db.sample`. This selects position delete files in partitions where 2 or more position delete files need to be rewritten based on size criteria. Dangling deletes are removed from rewritten delete files.
```sql
-CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('min-input-files','2'))
+CALL catalog_name.system.rewrite_position_delete_files(table => 'db.sample', options => map('min-input-files','2'));
```
## Table migration
@@ -567,13 +567,13 @@ See [`migrate`](#migrate) to replace an existing table with an Iceberg table.
Make an isolated Iceberg table which references table `db.sample` named `db.snap` at the
catalog's default location for `db.snap`.
```sql
-CALL catalog_name.system.snapshot('db.sample', 'db.snap')
+CALL catalog_name.system.snapshot('db.sample', 'db.snap');
```
Migrate an isolated Iceberg table which references table `db.sample` named `db.snap` at
a manually specified location `/tmp/temptable/`.
```sql
-CALL catalog_name.system.snapshot('db.sample', 'db.snap', '/tmp/temptable/')
+CALL catalog_name.system.snapshot('db.sample', 'db.snap', '/tmp/temptable/');
```
### `migrate`
@@ -609,12 +609,12 @@ By default, the original table is retained with the name `table_BACKUP_`.
Migrate the table `db.sample` in Spark's default catalog to an Iceberg table and add a property 'foo' set to 'bar':
```sql
-CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar'))
+CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo', 'bar'));
```
Migrate `db.sample` in the current catalog to an Iceberg table without adding any additional properties:
```sql
-CALL catalog_name.system.migrate('db.sample')
+CALL catalog_name.system.migrate('db.sample');
```
### `add_files`
@@ -663,7 +663,7 @@ CALL spark_catalog.system.add_files(
table => 'db.tbl',
source_table => 'db.src_tbl',
partition_filter => map('part_col_1', 'A')
-)
+);
```
Add files from a `parquet` file based table at location `path/to/table` to the Iceberg table `db.tbl`. Add all
@@ -672,7 +672,7 @@ files regardless of what partition they belong to.
CALL spark_catalog.system.add_files(
table => 'db.tbl',
source_table => '`parquet`.`path/to/table`'
-)
+);
```
### `register_table`
@@ -706,7 +706,7 @@ Register a new table as `db.tbl` to `spark_catalog` pointing to metadata.json fi
CALL spark_catalog.system.register_table(
table => 'db.tbl',
metadata_file => 'path/to/metadata/file.json'
-)
+);
```
## Metadata information
@@ -743,13 +743,13 @@ Report the live snapshot IDs of parents of a specified snapshot
Get all the snapshot ancestors of current snapshots(default)
```sql
-CALL spark_catalog.system.ancestors_of('db.tbl')
+CALL spark_catalog.system.ancestors_of('db.tbl');
```
Get all the snapshot ancestors by a particular snapshot
```sql
-CALL spark_catalog.system.ancestors_of('db.tbl', 1)
-CALL spark_catalog.system.ancestors_of(snapshot_id => 1, table => 'db.tbl')
+CALL spark_catalog.system.ancestors_of('db.tbl', 1);
+CALL spark_catalog.system.ancestors_of(snapshot_id => 1, table => 'db.tbl');
```
## Change Data Capture
@@ -788,7 +788,7 @@ Create a changelog view `tbl_changes` based on the changes that happened between
CALL spark_catalog.system.create_changelog_view(
table => 'db.tbl',
options => map('start-snapshot-id','1','end-snapshot-id', '2')
-)
+);
```
Create a changelog view `my_changelog_view` based on the changes that happened between timestamp `1678335750489` (exclusive) and `1678992105265` (inclusive).
@@ -797,7 +797,7 @@ CALL spark_catalog.system.create_changelog_view(
table => 'db.tbl',
options => map('start-timestamp','1678335750489','end-timestamp', '1678992105265'),
changelog_view => 'my_changelog_view'
-)
+);
```
Create a changelog view that computes updates based on the identifier columns `id` and `name`.
@@ -811,10 +811,10 @@ CALL spark_catalog.system.create_changelog_view(
Once the changelog view is created, you can query the view to see the changes that happened between the snapshots.
```sql
-SELECT * FROM tbl_changes
+SELECT * FROM tbl_changes;
```
```sql
-SELECT * FROM tbl_changes where _change_type = 'INSERT' AND id = 3 ORDER BY _change_ordinal
+SELECT * FROM tbl_changes where _change_type = 'INSERT' AND id = 3 ORDER BY _change_ordinal;
```
Please note that the changelog view includes Change Data Capture(CDC) metadata columns
that provide additional information about the changes being tracked. These columns are:
@@ -837,7 +837,7 @@ CALL spark_catalog.system.create_changelog_view(
table => 'db.tbl',
options => map('end-snapshot-id', '87647489814522183702'),
net_changes => true
-)
+);
```
With the net changes, the above changelog view only contains the following row since Alice was inserted in the first snapshot and deleted in the second snapshot.
@@ -861,7 +861,7 @@ reports this as the following pair of rows, despite it not being an actual chang
To see carry-over rows, query `SparkChangelogTable` as follows:
```sql
-SELECT * FROM spark_catalog.db.tbl.changes
+SELECT * FROM spark_catalog.db.tbl.changes;
```
#### Pre/Post Update Images
diff --git a/docs/spark-queries.md b/docs/spark-queries.md
index 2923638306..188b05249b 100644
--- a/docs/spark-queries.md
+++ b/docs/spark-queries.md
@@ -258,7 +258,7 @@ select
from prod.db.table.history h
join prod.db.table.snapshots s
on h.snapshot_id = s.snapshot_id
-order by made_current_at
+order by made_current_at;
```
| made_current_at | operation | snapshot_id | is_current_ancestor | summary[spark.app.id] |