pan3793 commented on code in PR #15736:
URL: https://github.com/apache/iceberg/pull/15736#discussion_r2978827887
##########
docs/docs/spark-queries.md:
##########
@@ -51,6 +51,14 @@ writing filters that match Iceberg partition transforms. These functions are ava
 [Iceberg catalog](spark-configuration.md#catalog-configuration); they are not registered in Spark's built-in catalog.
 
+!!! note
+    In Spark versions before 4.2.0, `SparkSessionCatalog` does not expose Iceberg's `system`
+    namespace (see SPARK-54760). Queries such as `SELECT spark_catalog.system.bucket(16, id)`

Review Comment:
   @kevinjqliu, to clarify, I initially created SPARK-54760 as a bug, but during discussion, it was eventually classified as a missing feature. I updated the Jira ticket to reflect that.

   I think we can use simple words to explain that, e.g.,

   > Spark before 4.2.0 does not support V2Function in the session catalog, see [SPARK-54760](https://issues.apache.org/jira/browse/SPARK-54760) for details.



##########
docs/docs/spark-configuration.md:
##########
@@ -112,6 +112,13 @@ Spark's built-in catalog supports existing v1 and v2 tables tracked in a Hive Me
 This configuration can use same Hive Metastore for both Iceberg and non-Iceberg tables.
 
+`SparkSessionCatalog` is useful when you want `spark_catalog` to work with both Iceberg and non-Iceberg
+tables in the same metastore. It is not a full replacement for a dedicated Iceberg catalog, though.

Review Comment:
   I would not say these words



##########
docs/docs/spark-configuration.md:
##########
@@ -112,6 +112,13 @@ Spark's built-in catalog supports existing v1 and v2 tables tracked in a Hive Me
 This configuration can use same Hive Metastore for both Iceberg and non-Iceberg tables.
 
+`SparkSessionCatalog` is useful when you want `spark_catalog` to work with both Iceberg and non-Iceberg
+tables in the same metastore. It is not a full replacement for a dedicated Iceberg catalog, though.
+In Spark versions before 4.2.0, `SparkSessionCatalog` does not expose Iceberg's `system` namespace
+(see SPARK-54760), so catalog-scoped SQL functions such as `system.bucket`, `system.days`, and
+`system.iceberg_version` are not available through `spark_catalog`. To use those functions, configure a
+separate Iceberg catalog with `org.apache.iceberg.spark.SparkCatalog` and call them through that catalog.

Review Comment:
   Will this introduce any side effects when users configure two catalogs pointing to the same catalog (e.g., two Hive catalogs backed by the same HMS)?

   To be conservative, maybe explicitly say "workaround":

   "To use those functions" > "To work around this limitation"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
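
A minimal sketch of the workaround discussed above: registering a dedicated Iceberg catalog alongside `spark_catalog` so that the `system` functions resolve through it. The catalog name `iceberg_cat`, the `hive` catalog type, and the table `db.tbl` are illustrative assumptions, not taken from the PR.

```sql
-- Hypothetical spark-defaults.conf entries (catalog name `iceberg_cat` is illustrative):
--   spark.sql.catalog.iceberg_cat      = org.apache.iceberg.spark.SparkCatalog
--   spark.sql.catalog.iceberg_cat.type = hive

-- With that catalog registered, the partition-transform functions resolve
-- through it, even when `spark_catalog` (a SparkSessionCatalog) cannot expose them:
SELECT iceberg_cat.system.bucket(16, id) FROM iceberg_cat.db.tbl;
```

Since both catalogs can be backed by the same Hive Metastore, this only changes how the functions are addressed, which is the side-effect question the last comment raises.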
