Fokko commented on code in PR #11845:
URL: https://github.com/apache/iceberg/pull/11845#discussion_r1913039173
##########
site/docs/spark-quickstart.md:
##########
@@ -267,44 +271,109 @@ To read a table, simply use the Iceberg table's name.
df = spark.table("demo.nyc.taxis")
df.show()
```
-### Adding A Catalog
+### Adding catalogs
-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
-the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+Apache Iceberg provides several catalog implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+You can configure different catalog types, such as JDBC, Hive Metastore, Glue, and REST, to manage Iceberg tables in Spark.
-This configuration creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+This guide covers the configuration of two popular catalog types:
+
+* JDBC Catalog
+* REST Catalog
+
+To learn more, check out the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+
+#### Configuring JDBC Catalog
+
+The JDBC catalog stores Iceberg table metadata in a relational database.
+
+This configuration creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The JDBC catalog uses a file-based SQLite database as the backend.
=== "CLI"
```sh
- spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+ spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.type=hive \
--conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
- --conf spark.sql.catalog.local.type=hadoop \
+ --conf spark.sql.catalog.local.type=jdbc \
+ --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
--conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
--conf spark.sql.defaultCatalog=local
```
=== "spark-defaults.conf"
```sh
- spark.jars.packages                          org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+ spark.jars.packages                          org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3
  spark.sql.extensions                         org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  spark.sql.catalog.spark_catalog              org.apache.iceberg.spark.SparkSessionCatalog
  spark.sql.catalog.spark_catalog.type         hive
  spark.sql.catalog.local                      org.apache.iceberg.spark.SparkCatalog
- spark.sql.catalog.local.type                 hadoop
- spark.sql.catalog.local.warehouse            $PWD/warehouse
+ spark.sql.catalog.local.type                 jdbc
+ spark.sql.catalog.local.uri                  jdbc:sqlite:iceberg_catalog_db.sqlite
+ spark.sql.catalog.local.warehouse            warehouse
  spark.sql.defaultCatalog                     local
```
!!! note
    If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE local;`
+#### Configuring REST Catalog
Review Comment:
I would be inclined to move this as the first option :)
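
For readers following along outside `spark-sql`, the same JDBC catalog can be wired up from PySpark. The sketch below mirrors the configuration in the hunk above; the app name, the `nyc` namespace, and the sample table are illustrative assumptions, and `{{ icebergVersion }}` is kept as the docs' template placeholder rather than a concrete release.

```python
# Minimal PySpark sketch of the JDBC catalog configuration shown above.
# Assumes pyspark is installed; replace {{ icebergVersion }} with a real
# Iceberg release before running.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-jdbc-quickstart")
    # Resolve the Iceberg Spark runtime and the SQLite JDBC driver at startup.
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},"
        "org.xerial:sqlite-jdbc:3.46.1.3",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "jdbc")
    .config("spark.sql.catalog.local.uri", "jdbc:sqlite:iceberg_catalog_db.sqlite")
    .config("spark.sql.catalog.local.warehouse", "warehouse")
    .config("spark.sql.defaultCatalog", "local")
    .getOrCreate()
)

# The catalog is initialized lazily; the first DDL statement creates the
# SQLite file and its backing tables.
spark.sql("CREATE NAMESPACE IF NOT EXISTS nyc")
spark.sql("CREATE TABLE IF NOT EXISTS nyc.taxis (vendor_id BIGINT) USING iceberg")
```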
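The hunk cuts off before the REST catalog section itself, so as a point of comparison here is a hedged sketch of what a REST catalog configuration typically looks like, built from the documented `type=rest` and `uri` catalog properties. The catalog name `rest` and the endpoint `http://localhost:8181` are assumptions, not taken from the PR text.

```python
# Hypothetical REST catalog configuration; the PR's actual text for this
# section is not visible in the hunk. Assumes a REST catalog service is
# already reachable at the endpoint below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rest-quickstart")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}",
    )
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config("spark.sql.catalog.rest", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rest.type", "rest")  # REST catalog implementation
    .config("spark.sql.catalog.rest.uri", "http://localhost:8181")  # assumed endpoint
    .config("spark.sql.defaultCatalog", "rest")
    .getOrCreate()
)
```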
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.