This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 39348fe Update the site for the 0.9.0 release (#1205)
39348fe is described below
commit 39348fec94d08a9eb4f3da0b41e002adc8b832d9
Author: Ryan Blue <[email protected]>
AuthorDate: Wed Jul 15 11:58:45 2020 -0700
Update the site for the 0.9.0 release (#1205)
---
site/docs/api-quickstart.md | 176 -------------------------------------------
site/docs/configuration.md | 42 ++++++++---
site/docs/css/extra.css | 7 +-
site/docs/getting-started.md | 102 +++++++++++++++----------
site/docs/javadoc/index.html | 4 +-
site/docs/releases.md | 24 ++++--
site/docs/spark.md | 23 +++++-
site/mkdocs.yml | 7 +-
8 files changed, 140 insertions(+), 245 deletions(-)
diff --git a/site/docs/api-quickstart.md b/site/docs/api-quickstart.md
deleted file mode 100644
index 1926f4a..0000000
--- a/site/docs/api-quickstart.md
+++ /dev/null
@@ -1,176 +0,0 @@
-<!--
- - Licensed to the Apache Software Foundation (ASF) under one or more
- - contributor license agreements. See the NOTICE file distributed with
- - this work for additional information regarding copyright ownership.
- - The ASF licenses this file to You under the Apache License, Version 2.0
- - (the "License"); you may not use this file except in compliance with
- - the License. You may obtain a copy of the License at
- -
- - http://www.apache.org/licenses/LICENSE-2.0
- -
- - Unless required by applicable law or agreed to in writing, software
- - distributed under the License is distributed on an "AS IS" BASIS,
- - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- - See the License for the specific language governing permissions and
- - limitations under the License.
- -->
-
-# Spark API Quickstart
-
-## Create a table
-
-Tables are created using either a [`Catalog`](/javadoc/master/index.html?org/apache/iceberg/catalog/Catalog.html) or an implementation of the [`Tables`](/javadoc/master/index.html?org/apache/iceberg/Tables.html) interface.
-
-### Using a Hive catalog
-
-The Hive catalog connects to a Hive MetaStore to keep track of Iceberg tables. This example uses Spark's Hadoop configuration to get a Hive catalog:
-
-```scala
-import org.apache.iceberg.hive.HiveCatalog
-
-val catalog = new HiveCatalog(spark.sessionState.newHadoopConf())
-```
-
-The `Catalog` interface defines methods for working with tables, like `createTable`, `loadTable`, `renameTable`, and `dropTable`.
-
-To create a table, pass an `Identifier` and a `Schema` along with other initial metadata:
-
-```scala
-val name = TableIdentifier.of("logging", "logs")
-val table = catalog.createTable(name, schema, spec)
-
-// write into the new logs table with Spark 2.4
-logsDF.write
- .format("iceberg")
- .mode("append")
- .save("logging.logs")
-```
-
-The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
-
-### Using a Hadoop catalog
-
-A Hadoop catalog doesn't need to connect to a Hive MetaStore, but can only be used with HDFS or similar file systems that support atomic rename. Concurrent writes with a Hadoop catalog are not safe with a local FS or S3. To create a Hadoop catalog:
-
-```scala
-import org.apache.hadoop.conf.Configuration;
-import org.apache.iceberg.hadoop.HadoopCatalog;
-
-val conf = new Configuration();
-val warehousePath = "hdfs://host:8020/warehouse_path";
-val catalog = new HadoopCatalog(conf, warehousePath);
-```
-
-Like the Hive catalog, `HadoopCatalog` implements `Catalog`, so it also has methods for working with tables, like `createTable`, `loadTable`, and `dropTable`.
-
-This example creates a table with the Hadoop catalog:
-
-```scala
-val name = TableIdentifier.of("logging", "logs")
-val table = catalog.createTable(name, schema, spec)
-
-// write into the new logs table with Spark 2.4
-logsDF.write
- .format("iceberg")
- .mode("append")
- .save("hdfs://host:8020/warehouse_path/logging.db/logs")
-```
-
-The logs [schema](#create-a-schema) and [partition spec](#create-a-partition-spec) are created below.
-
-### Using Hadoop tables
-
-Iceberg also supports tables that are stored in a directory in HDFS. Concurrent writes with Hadoop tables are not safe when stored in the local FS or S3. Directory tables don't support all catalog operations, like rename, so they use the `Tables` interface instead of `Catalog`.
-
-To create a table in HDFS, use `HadoopTables`:
-
-```scala
-import org.apache.iceberg.hadoop.HadoopTables
-
-val tables = new HadoopTables(spark.sessionState.newHadoopConf())
-
-val table = tables.create(schema, spec, "hdfs:/tables/logging/logs")
-
-// write into the new logs table with Spark 2.4
-logsDF.write
- .format("iceberg")
- .mode("append")
- .save("hdfs:/tables/logging/logs")
-```
-
-!!! Warning
-    Hadoop tables shouldn't be used with file systems that do not support atomic rename. Iceberg relies on rename to synchronize concurrent commits for directory tables.
-
-### Tables in Spark
-
-Spark uses both `HiveCatalog` and `HadoopTables` to load tables. Hive is used when the identifier passed to `load` or `save` is not a path, otherwise Spark assumes it is a path-based table.
-
-To read and write to tables from Spark see:
-
-* [Reading a table in Spark](../spark#reading-an-iceberg-table)
-* [Appending to a table in Spark](../spark#appending-data)
-* [Overwriting data in a table in Spark](../spark#overwriting-data)
-
-
-## Schemas
-
-### Create a schema
-
-This example creates a schema for a `logs` table:
-
-```scala
-import org.apache.iceberg.Schema
-import org.apache.iceberg.types.Types._
-
-val schema = new Schema(
- NestedField.required(1, "level", StringType.get()),
- NestedField.required(2, "event_time", TimestampType.withZone()),
- NestedField.required(3, "message", StringType.get()),
-    NestedField.optional(4, "call_stack", ListType.ofRequired(5, StringType.get()))
- )
-```
-
-When using the Iceberg API directly, type IDs are required. Conversions from other schema formats, like Spark, Avro, and Parquet, will automatically assign new IDs.
-
-When a table is created, all IDs in the schema are re-assigned to ensure uniqueness.
-
-### Convert a schema from Avro
-
-To create an Iceberg schema from an existing Avro schema, use converters in `AvroSchemaUtil`:
-
-```scala
-import org.apache.avro.Schema.Parser
-import org.apache.iceberg.avro.AvroSchemaUtil
-
-val avroSchema = new Parser().parse("""{"type": "record", ... }""")
-
-val icebergSchema = AvroSchemaUtil.toIceberg(avroSchema)
-```
-
-### Convert a schema from Spark
-
-To create an Iceberg schema from an existing table, use converters in `SparkSchemaUtil`:
-
-```scala
-import org.apache.iceberg.spark.SparkSchemaUtil
-
-val schema = SparkSchemaUtil.convert(spark.table("db.table").schema)
-```
-
-
-## Partitioning
-
-### Create a partition spec
-
-Partition specs describe how Iceberg should group records into data files. Partition specs are created for a table's schema using a builder.
-
-This example creates a partition spec for the `logs` table that partitions records by the hour of the log event's timestamp and by log level:
-
-```scala
-import org.apache.iceberg.PartitionSpec
-
-val spec = PartitionSpec.builderFor(schema)
- .hour("event_time")
- .identity("level")
- .build()
-```
diff --git a/site/docs/configuration.md b/site/docs/configuration.md
index 874b0e3..27dd60b 100644
--- a/site/docs/configuration.md
+++ b/site/docs/configuration.md
@@ -67,14 +67,36 @@ Iceberg tables support table properties to configure table behavior, like the de
| --------------------------------------------- | -------- | ------------------------------------------------------------- |
| compatibility.snapshot-id-inheritance.enabled | false    | Enables committing snapshots without explicit snapshot IDs     |
-## Hadoop options
+## Hadoop configuration
+
+The following properties from the Hadoop configuration are used by the Hive Metastore connector.

| Property                      | Default        | Description                                                  |
| ----------------------------- | -------------- | ------------------------------------------------------------ |
| iceberg.hive.client-pool-size | 5              | The size of the Hive client pool when tracking tables in HMS |
| iceberg.hive.lock-timeout-ms  | 180000 (3 min) | Maximum time in milliseconds to acquire a lock               |
-## Spark options
+## Spark configuration
+
+### Catalogs
+
+[Spark catalogs](../spark#configuring-catalogs) are configured using Spark session properties.
+
+A catalog is created and named by adding a property `spark.sql.catalog.(catalog-name)` with an implementation class for its value.
+
+Iceberg supplies two implementations:
+
+* `org.apache.iceberg.spark.SparkCatalog` supports a Hive Metastore or a Hadoop warehouse as a catalog
+* `org.apache.iceberg.spark.SparkSessionCatalog` adds support for Iceberg tables to Spark's built-in catalog, and delegates to the built-in catalog for non-Iceberg tables
+
+Both catalogs are configured using properties nested under the catalog name:
+
+| Property                                           | Values                        | Description                                                           |
+| -------------------------------------------------- | ----------------------------- | --------------------------------------------------------------------- |
+| spark.sql.catalog._catalog-name_.type              | hive or hadoop                | The underlying Iceberg catalog implementation                         |
+| spark.sql.catalog._catalog-name_.default-namespace | default                       | The default current namespace for the catalog                         |
+| spark.sql.catalog._catalog-name_.uri               | thrift://host:port            | URI for the Hive Metastore; default from `hive-site.xml` (Hive only)  |
+| spark.sql.catalog._catalog-name_.warehouse         | hdfs://nn:8020/warehouse/path | Base path for the warehouse directory (Hadoop only)                   |
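As a sketch, the properties in the table above could be set together in `spark-defaults.conf`; the catalog name `prod` and the metastore host below are illustrative assumptions, not values from this commit:

```
# Hive-backed Iceberg catalog named "prod" (name and URI are examples)
spark.sql.catalog.prod                    org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.prod.type               hive
spark.sql.catalog.prod.uri                thrift://metastore-host:9083
spark.sql.catalog.prod.default-namespace  db
```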
### Read options
@@ -83,9 +105,8 @@ Spark read options are passed when configuring the DataFrameReader, like this:
```scala
// time travel
spark.read
- .format("iceberg")
.option("snapshot-id", 10963874102873L)
- .load("db.table")
+ .table("catalog.db.table")
```
| Spark option | Default | Description |
@@ -103,14 +124,13 @@ Spark write options are passed when configuring the DataFrameWriter, like this:
```scala
// write with Avro instead of Parquet
df.write
- .format("iceberg")
.option("write-format", "avro")
- .save("db.table")
+ .insertInto("catalog.db.table")
```
-| Spark option | Default | Description |
-| ------------ | -------------------------- | ------------------------------------------------------------ |
-| write-format | Table write.format.default | File format to use for this write operation; parquet or avro |
-| target-file-size-bytes | As per table property | Overrides this table's write.target-file-size-bytes |
-| check-nullability | true | Sets the nullable check on fields |
+| Spark option           | Default                    | Description                                                   |
+| ---------------------- | -------------------------- | ------------------------------------------------------------- |
+| write-format           | Table write.format.default | File format to use for this write operation; parquet or avro  |
+| target-file-size-bytes | As per table property      | Overrides this table's write.target-file-size-bytes           |
+| check-nullability      | true                       | Sets the nullable check on fields                             |
diff --git a/site/docs/css/extra.css b/site/docs/css/extra.css
index de09c11..76545b0 100644
--- a/site/docs/css/extra.css
+++ b/site/docs/css/extra.css
@@ -41,6 +41,10 @@
opacity: 0;
}
+h2, h3, h4 {
+ padding-top: 1em;
+}
+
h2:target .headerlink {
color: #008cba;
opacity: 1;
@@ -79,7 +83,8 @@ pre {
.admonition {
margin: 0.5em;
- margin-left: 0em;
+ margin-top: 1.5em;
+ margin-left: 1em;
padding: 0.5em;
padding-left: 1em;
}
diff --git a/site/docs/getting-started.md b/site/docs/getting-started.md
index 0d15b4d..1f08521 100644
--- a/site/docs/getting-started.md
+++ b/site/docs/getting-started.md
@@ -17,82 +17,102 @@
# Getting Started
-## Using Iceberg in Spark
+## Using Iceberg in Spark 3
-The latest version of Iceberg is [0.8.0-incubating](../releases).
+The latest version of Iceberg is [0.9.0](../releases).
To use Iceberg in a Spark shell, use the `--packages` option:
```sh
-spark-shell --packages org.apache.iceberg:iceberg-spark-runtime:0.8.0-incubating
+spark-shell --packages org.apache.iceberg:iceberg-spark3-runtime:0.9.0
```
-You can also build Iceberg locally, and add the jar using `--jars`. This can be helpful to test unreleased features or while developing something new:
+!!! Note
+    If you want to include Iceberg in your Spark installation, add the [`iceberg-spark3-runtime` Jar][spark-runtime-jar] to Spark's `jars` folder.
-```sh
-./gradlew assemble
-spark-shell --jars spark-runtime/build/libs/iceberg-spark-runtime-8c05a2f.jar
-```
+[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.9.0/iceberg-spark3-runtime-0.9.0.jar
-## Installing with Spark
+### Adding catalogs
-If you want to include Iceberg in your Spark installation, add the [`iceberg-spark-runtime` Jar][spark-runtime-jar] to Spark's `jars` folder.
+Iceberg comes with [catalogs](../spark#configuring-catalogs) that enable SQL commands to manage tables and load them by name. Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
-Where you have to replace `8c05a2f` with the git hash that you're using.
+This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog:
-[spark-runtime-jar]: https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.8.0-incubating/iceberg-spark-runtime-0.8.0-incubating.jar
+```sh
+spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.9.0 \
+  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
+ --conf spark.sql.catalog.spark_catalog.type=hive \
+ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.local.type=hadoop \
+  --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
+```
-## Creating a table
+### Creating a table
-Spark 2.4 is limited to reading and writing existing Iceberg tables. Use the [Iceberg API](../api) to create Iceberg tables.
+To create your first Iceberg table in Spark, use the `spark-sql` shell or `spark.sql(...)` to run a [`CREATE TABLE`](../spark#create-table) command:
-Here's how to create your first Iceberg table in Spark, using a source Dataset
+```sql
+-- local is the path-based catalog defined above
+CREATE TABLE local.db.table (id bigint, data string) USING iceberg
+```
-First, import Iceberg classes and create a catalog client:
+Iceberg catalogs support the full range of SQL DDL commands, including:
-```scala
-import org.apache.iceberg.hive.HiveCatalog
-import org.apache.iceberg.catalog.TableIdentifier
-import org.apache.iceberg.spark.SparkSchemaUtil
+* [`CREATE TABLE ... PARTITIONED BY`](../spark#create-table)
+* [`CREATE TABLE ... AS SELECT`](../spark#create-table-as-select)
+* [`ALTER TABLE`](../spark#alter-table)
+* [`DROP TABLE`](../spark#drop-table)
-val catalog = new HiveCatalog(spark.sparkContext.hadoopConfiguration)
-```
+### Writing
-Next, create a dataset to write into your table and get an Iceberg schema for it:
+Once your table is created, insert data using [`INSERT INTO`](../spark#insert-into):
-```scala
-val data = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "data")
-val schema = SparkSchemaUtil.convert(data.schema)
+```sql
+INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c');
+INSERT INTO local.db.table SELECT id, data FROM source WHERE length(data) = 1;
```
-Finally, create a table using the schema:
+Iceberg supports writing DataFrames using the new [v2 DataFrame write API](../spark#writing-with-dataframes):
```scala
-val name = TableIdentifier.of("default", "test_table")
-val table = catalog.createTable(name, schema)
+spark.table("source").select("id", "data")
+ .writeTo("local.db.table").append()
```
-### Reading and writing
+The old `write` API is supported, but _not_ recommended.
-Once your table is created, you can use it in `load` and `save` in Spark 2.4:
+### Reading
-```scala
-// write the dataset to the table
-data.write.format("iceberg").mode("append").save("default.test_table")
+To read with SQL, use the Iceberg table's name in a `SELECT` query:
-// read the table
-spark.read.format("iceberg").load("default.test_table")
+```sql
+SELECT count(1) as count, data
+FROM local.db.table
+GROUP BY data
```
-### Reading with SQL
+SQL is also the recommended way to [inspect tables](../spark#inspecting-tables). To view all of the snapshots in a table, use the `snapshots` metadata table:
+```sql
+SELECT * FROM local.db.table.snapshots
+```
+```
++-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
+| committed_at            | snapshot_id    | parent_id | operation | manifest_list                                      | ... |
++-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
+| 2019-02-08 03:29:51.215 | 57897183625154 | null      | append    | s3://.../table/metadata/snap-57897183625154-1.avro | ... |
+|                         |                |           |           | ...                                                | ... |
+|                         |                |           |           | ...                                                | ... |
+| ...                     | ...            | ...       | ...       | ...                                                | ... |
++-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+
+```
-You can also create a temporary view to use the table in SQL:
+[DataFrame reads](../spark#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`:
```scala
-spark.read.format("iceberg").load("default.test_table").createOrReplaceTempView("test_table")
-spark.sql("""SELECT count(1) FROM test_table""")
+val df = spark.table("local.db.table")
+df.count()
```
### Next steps
-Next, you can learn more about the [Iceberg Table API](../api), or about [Iceberg tables in Spark](../spark)
+Next, you can learn more about [Iceberg tables in Spark](../spark), or about the [Iceberg Table API](../api).
diff --git a/site/docs/javadoc/index.html b/site/docs/javadoc/index.html
index 603c018..d18f655 100644
--- a/site/docs/javadoc/index.html
+++ b/site/docs/javadoc/index.html
@@ -1,9 +1,9 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Iceberg Javadoc Redirect</title>
- <meta http-equiv="refresh" content="0;URL='/javadoc/0.8.0-incubating/'" />
+ <meta http-equiv="refresh" content="0;URL='/javadoc/0.9.0/'" />
</head>
<body>
-  <p>Redirecting to Javadoc for the 0.8.0-incubating release: <a href="/javadoc/0.8.0-incubating/">/javadoc/0.8.0-incubating</a>.</p>
+  <p>Redirecting to Javadoc for the 0.9.0 release: <a href="/javadoc/0.9.0/">/javadoc/0.9.0</a>.</p>
</body>
</html>
diff --git a/site/docs/releases.md b/site/docs/releases.md
index fbae902..8dd169d 100644
--- a/site/docs/releases.md
+++ b/site/docs/releases.md
@@ -1,26 +1,27 @@
## Downloads
-The latest version of Iceberg is [0.8.0-incubating](https://github.com/apache/iceberg/releases/tag/apache-iceberg-0.8.0-incubating).
+The latest version of Iceberg is [0.9.0](https://github.com/apache/iceberg/releases/tag/apache-iceberg-0.9.0).
-* [0.8.0-incubating source tar.gz](https://www.apache.org/dyn/closer.cgi/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz) -- [signature](https://downloads.apache.org/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz.asc) -- [sha512](https://downloads.apache.org/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz.sha512)
-* [0.8.0-incubating Spark 2.4 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.8.0-incubating/iceberg-spark-runtime-0.8.0-incubating.jar)
+* [0.9.0 source tar.gz](https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-0.9.0/apache-iceberg-0.9.0.tar.gz) -- [signature](https://downloads.apache.org/iceberg/apache-iceberg-0.9.0/apache-iceberg-0.9.0.tar.gz.asc) -- [sha512](https://downloads.apache.org/iceberg/apache-iceberg-0.9.0/apache-iceberg-0.9.0.tar.gz.sha512)
+* [0.9.0 Spark 3.0 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.9.0/iceberg-spark3-runtime-0.9.0.jar)
+* [0.9.0 Spark 2.4 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.9.0/iceberg-spark-runtime-0.9.0.jar)
-To use Iceberg in Spark 2.4, download the runtime Jar and add it to the jars folder of your Spark install.
+To use Iceberg in Spark, download the runtime Jar and add it to the jars folder of your Spark install. Use `iceberg-spark3-runtime` for Spark 3, and `iceberg-spark-runtime` for Spark 2.4.
-## Gradle
+### Gradle
To add a dependency on Iceberg in Gradle, add the following to `build.gradle`:
```
dependencies {
- compile 'org.apache.iceberg:iceberg-core:0.8.0-incubating'
+ compile 'org.apache.iceberg:iceberg-core:0.9.0'
}
```
You may also want to include `iceberg-parquet` for Parquet file support.
-## Maven
+### Maven
To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
@@ -30,7 +31,7 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
<dependency>
<groupId>org.apache.iceberg</groupId>
<artifactId>iceberg-core</artifactId>
- <version>0.8.0-incubating</version>
+ <version>0.9.0</version>
</dependency>
...
</dependencies>
@@ -38,6 +39,13 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
## Past releases
+### 0.8.0
+
+* Git tag: [apache-iceberg-0.8.0-incubating](https://github.com/apache/iceberg/releases/tag/apache-iceberg-0.8.0-incubating)
+* [0.8.0-incubating source tar.gz](https://www.apache.org/dyn/closer.cgi/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz) -- [signature](https://downloads.apache.org/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz.asc) -- [sha512](https://downloads.apache.org/incubator/iceberg/apache-iceberg-0.8.0-incubating/apache-iceberg-0.8.0-incubating.tar.gz.sha512)
+* [0.8.0-incubating Spark 2.4 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/0.8.0-incubating/iceberg-spark-runtime-0.8.0-incubating.jar)
+
+
### 0.7.0
* Git tag: [apache-iceberg-0.7.0-incubating](https://github.com/apache/iceberg/releases/tag/apache-iceberg-0.7.0-incubating)
diff --git a/site/docs/spark.md b/site/docs/spark.md
index 5353c1d..120ff0d 100644
--- a/site/docs/spark.md
+++ b/site/docs/spark.md
@@ -37,7 +37,7 @@ Iceberg uses Apache Spark's DataSourceV2 API for data source and catalog implementations
## Configuring catalogs
-Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark properties under `spark.sql.catalog`.
+Spark 3.0 adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting [Spark properties](../configuration#catalogs) under `spark.sql.catalog`.
This creates an Iceberg catalog named `hive_prod` that loads tables from a Hive metastore:
@@ -93,7 +93,7 @@ This configuration can use same Hive Metastore for both Iceberg and non-Iceberg tables
## DDL commands
!!! Note
-    Spark 2.4 can't create Iceberg tables with DDL, instead use the [Iceberg API](../api-quickstart).
+    Spark 2.4 can't create Iceberg tables with DDL, instead use the [Iceberg API](../java-api-quickstart).
### `CREATE TABLE`
@@ -286,6 +286,11 @@ val df = spark.read
.table("prod.db.table")
```
+!!! Warning
+    When reading with DataFrames in Spark 3, use `table` to load a table by name from a catalog.
+    Using `format("iceberg")` loads an isolated table reference that is not refreshed when other queries update the table.
+
+
### Time travel
To select a specific table snapshot or the snapshot at some time, Iceberg supports two Spark read options:
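As a sketch of how those two read options are passed (the snapshot ID, timestamp, and table name below are made-up values, assuming a Spark 3 session with an Iceberg catalog):

```scala
// read the table as of a specific snapshot ID (value is illustrative)
spark.read
    .option("snapshot-id", 10963874102873L)
    .table("prod.db.table")

// read the table as of a point in time, given in milliseconds (value is illustrative)
spark.read
    .option("as-of-timestamp", 1594675200000L)
    .table("prod.db.table")
```

Only one of the two options should be set on a given read; they both resolve to a single snapshot.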
@@ -353,6 +358,13 @@ To replace data in the table with the result of a query, use `INSERT OVERWRITE`.
The partitions that will be replaced by `INSERT OVERWRITE` depends on Spark's partition overwrite mode and the partitioning of a table.
+!!! Warning
+    Spark 3.0.0 has a correctness bug that affects dynamic `INSERT OVERWRITE` with hidden partitioning, [SPARK-32168][spark-32168].
+    For tables with [hidden partitions](../partitioning), wait for Spark 3.0.1.
+
+[spark-32168]: https://issues.apache.org/jira/browse/SPARK-32168
+
+
#### Overwrite behavior
Spark's default overwrite mode is **static**, but **dynamic overwrite mode is recommended when writing to Iceberg tables.** Static overwrite mode determines which partitions to overwrite in a table by converting the `PARTITION` clause to a filter, but the `PARTITION` clause can only reference table columns.
@@ -432,6 +444,13 @@ Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames:
- `df.writeTo(t).append()` is equivalent to `INSERT INTO`
- `df.writeTo(t).overwritePartitions()` is equivalent to dynamic `INSERT OVERWRITE`
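The equivalences above can be sketched in Scala; the table names and source DataFrame are illustrative assumptions, not part of this commit:

```scala
// assumes a Spark 3 session with an Iceberg catalog named "prod"
val df = spark.table("prod.db.source")

// equivalent to: INSERT INTO prod.db.table ...
df.writeTo("prod.db.table").append()

// equivalent to a dynamic INSERT OVERWRITE of the partitions df writes into
df.writeTo("prod.db.table").overwritePartitions()
```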
+The v1 DataFrame `write` API is still supported, but is not recommended.
+
+!!! Warning
+    When writing with the v1 DataFrame API in Spark 3, use `saveAsTable` or `insertInto` to load tables with a catalog.
+    Using `format("iceberg")` loads an isolated table reference that will not automatically refresh tables used by queries.
+
+
### Appending data
To append a dataframe to an Iceberg table, use `append`:
diff --git a/site/mkdocs.yml b/site/mkdocs.yml
index 637c40c..5f1976e 100644
--- a/site/mkdocs.yml
+++ b/site/mkdocs.yml
@@ -48,16 +48,15 @@ nav:
- How to Release: how-to-release.md
- User docs:
- Getting Started: getting-started.md
+ - Spark: spark.md
+ - Presto: presto.md
- Configuration: configuration.md
- Schemas: schemas.md
- Partitioning: partitioning.md
+ - Table evolution: evolution.md
- Performance: performance.md
- Reliability: reliability.md
- - Table evolution: evolution.md
- Time Travel: spark#time-travel
- - Spark Quickstart: api-quickstart.md
- - Spark: spark.md
- - Presto: presto.md
- Java:
- Git Repo: https://github.com/apache/iceberg
- Quickstart: java-api-quickstart.md