rahil-c commented on code in PR #1862:
URL: https://github.com/apache/polaris/pull/1862#discussion_r2232267318
##########
plugins/spark/v3.5/spark/src/main/java/org/apache/polaris/spark/utils/PolarisCatalogUtils.java:
##########
@@ -91,6 +109,85 @@ public static Table loadSparkTable(GenericTable
genericTable) {
provider, new CaseInsensitiveStringMap(tableProperties),
scala.Option.empty());
}
+ /**
+ * Extract catalog name from Spark session configuration. Looks for
configuration like:
+ * spark.sql.catalog.<CATALOG_NAME>=org.apache.polaris.spark.SparkCatalog
+ */
+ private static String getCatalogName() {
+ SparkSession spark = SparkSession.active();
+ String catalogPrefix = "spark.sql.catalog.";
+ String polarisSparkCatalog = "org.apache.polaris.spark.SparkCatalog";
+
+ scala.collection.Iterator<scala.Tuple2<String, String>> configIterator =
+ spark.conf().getAll().iterator();
+ while (configIterator.hasNext()) {
+ scala.Tuple2<String, String> config = configIterator.next();
+ String key = config._1();
+ String value = config._2();
+
+ if (key.startsWith(catalogPrefix) && polarisSparkCatalog.equals(value)) {
+ return key.substring(catalogPrefix.length());
+ }
+ }
+
+ throw new IllegalStateException(
+ "Could not obtain Polaris catalog identifier."
+ + "Expected following configuration to be set in session:
spark.sql.catalog.<CATALOG_NAME>=org.apache.polaris.spark.SparkCatalog");
+ }
+
+ public static Table loadV1SparkHudiTable(GenericTable genericTable,
Identifier identifier) {
+ Map<String, String> tableProperties = Maps.newHashMap();
+ tableProperties.putAll(genericTable.getProperties());
+ tableProperties.put(
+ TABLE_PATH_KEY,
genericTable.getProperties().get(TableCatalog.PROP_LOCATION));
+
+ // Need full identifier in order to construct CatalogTable correctly for
Hudi
+ String namespacePath = String.join(".", identifier.namespace());
+ TableIdentifier tableIdentifier =
+ new TableIdentifier(
+ identifier.name(), Option.apply(namespacePath),
Option.apply(getCatalogName()));
Review Comment:
Hudi is expecting the Spark `V1Table` which will returned by Polaris to
contain this catalog Identifier. This is because Hudi is currently using Spark
DSV1 it will invoke the following Spark `Rule`, `FindDataSourceTable` which
expects this catalog identifier `table.identifier.catalog.get`. If the catalog
does not return this catalog identifier, then during `.get` it will hit a NPE
on spark side and error out.
Therefore we will need this in order for the integration to work.
Class is `DataSourceStrategy.scala` in spark code base
<img width="1359" height="251" alt="Screenshot 2025-07-25 at 4 57 12 PM"
src="https://github.com/user-attachments/assets/3d586858-2696-434f-b751-158de506a8af"
/>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]