Fokko commented on code in PR #11845:
URL: https://github.com/apache/iceberg/pull/11845#discussion_r1913039173
##########
site/docs/spark-quickstart.md:
##########
@@ -267,44 +271,109 @@ To read a table, simply use the Iceberg table's name.
df = spark.table("demo.nyc.taxis")
df.show()
```
-### Adding A Catalog
+### Adding catalogs
-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
-the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+Apache Iceberg provides several catalog implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+You can configure different catalog types, such as JDBC, Hive Metastore, Glue, and REST, to manage Iceberg tables in Spark.
-This configuration creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+This guide covers the configuration of two popular catalog types:
+
+* JDBC Catalog
+* REST Catalog
+
+To learn more, check out the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+
+#### Configuring JDBC Catalog
+
+The JDBC catalog stores Iceberg table metadata in a relational database.
+
+This configuration creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The JDBC catalog uses a file-based SQLite database as the backend.
=== "CLI"
```sh
- spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+ spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
- --conf spark.sql.catalog.local.type=hadoop \
+ --conf spark.sql.catalog.local.type=jdbc \
+ --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
--conf spark.sql.defaultCatalog=local
```
=== "spark-defaults.conf"
```sh
- spark.jars.packages                          org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+ spark.jars.packages                          org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3
  spark.sql.extensions                         org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  spark.sql.catalog.spark_catalog              org.apache.iceberg.spark.SparkSessionCatalog
  spark.sql.catalog.spark_catalog.type         hive
  spark.sql.catalog.local                      org.apache.iceberg.spark.SparkCatalog
- spark.sql.catalog.local.type                 hadoop
- spark.sql.catalog.local.warehouse            $PWD/warehouse
+ spark.sql.catalog.local.type                 jdbc
+ spark.sql.catalog.local.uri                  jdbc:sqlite:iceberg_catalog_db.sqlite
+ spark.sql.catalog.local.warehouse            warehouse
  spark.sql.defaultCatalog                     local
```
!!! note
    If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE local;`
+#### Configuring REST Catalog
Review Comment:
I would be inclined to move this as the first option :)
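
For readers following along outside `spark-sql`, the same JDBC catalog can be wired up from PySpark. The sketch below mirrors the configuration in the hunk above; the app name, the `nyc` namespace, and the sample table are illustrative assumptions, and `{{ icebergVersion }}` is kept as the docs' template placeholder rather than a concrete release.

```python
# Minimal PySpark sketch of the JDBC catalog configuration shown above.
# Assumes pyspark is installed; replace {{ icebergVersion }} with a real
# Iceberg release before running.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-jdbc-quickstart")
    # Resolve the Iceberg Spark runtime and the SQLite JDBC driver at startup.
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},"
        "org.xerial:sqlite-jdbc:3.46.1.3",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "jdbc")
    .config("spark.sql.catalog.local.uri", "jdbc:sqlite:iceberg_catalog_db.sqlite")
    .config("spark.sql.catalog.local.warehouse", "warehouse")
    .config("spark.sql.defaultCatalog", "local")
    .getOrCreate()
)

# The catalog is initialized lazily; the first DDL statement creates the
# SQLite file and its backing tables.
spark.sql("CREATE NAMESPACE IF NOT EXISTS nyc")
spark.sql("CREATE TABLE IF NOT EXISTS nyc.taxis (vendor_id BIGINT) USING iceberg")
```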
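The hunk cuts off before the REST catalog section itself, so as a point of comparison here is a hedged sketch of what a REST catalog configuration typically looks like, built from the documented `type=rest` and `uri` catalog properties. The catalog name `rest` and the endpoint `http://localhost:8181` are assumptions, not taken from the PR text.

```python
# Hypothetical REST catalog configuration; the PR's actual text for this
# section is not visible in the hunk. Assumes a REST catalog service is
# already reachable at the endpoint below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-quickstart")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.rest", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rest.type", "rest")  # REST catalog implementation
    .config("spark.sql.catalog.rest.uri", "http://localhost:8181")  # assumed endpoint
    .config("spark.sql.defaultCatalog", "rest")
    .getOrCreate()
)
```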
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.