yihua commented on code in PR #18867:
URL: https://github.com/apache/hudi/pull/18867#discussion_r3315583570


##########
website/docs/syncing_metastore.md:
##########
@@ -297,3 +297,40 @@ While using hive beeline query, you need to enter settings:
 ```bash
 set hive.input.format = org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
 ```
+
+## Spark Catalog Metastore Client
+
+When running Hudi inside a Spark environment that already has Hive support enabled (e.g., Spark SQL with `spark.sql.catalogImplementation=hive`), the standard `IMetaStoreClient` initialization can conflict with Spark's own Hive classloader. Setting
+
+```properties
+hoodie.datasource.hive_sync.use_spark_catalog=true
+```
+
+(default: `false`) makes Hudi use `SparkCatalogMetaStoreClient`, a Spark-native `IMetaStoreClient` implementation, instead of creating its own client. This avoids classloader conflicts in Hive-on-Spark setups. It requires an active `SparkSession` with Hive support enabled.
+
+## HMS 4.x Support via JDBC Fallback
+
+HMS 4.x changed several Thrift API method signatures (e.g., `get_table` → `get_table_req`), which makes the standard Thrift-based HMS client incompatible. Hudi 1.2.0 adds automatic fallback: when the first Thrift call fails with a `TApplicationException` indicating an API mismatch, all subsequent metadata operations are transparently rerouted through the JDBC path.

Review Comment:
   Verified against the release-1.2.0 source (`HoodieHiveSyncClient.java` at the 1.2.0 tag) and addressed both questions in bed135971a4e.
   
   **Detection scope (1):** The `thriftIncompatible` flag is a `volatile boolean` field on `HoodieHiveSyncClient`. It is scoped to a single sync-client instance and only transitions monotonically from `false` to `true` (there is no reset path). In practice, the first Thrift call of a sync run acts as the probe; if it fails with the API mismatch, the rest of that run uses the JDBC fallback. The next sync run starts with a fresh probe.
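   To make the per-instance, one-way behavior concrete, here is a minimal Java sketch of the pattern. It is not Hudi's code: `SyncClientSketch`, `ApiMismatchException` (a hypothetical stand-in for Thrift's `TApplicationException`, so the sketch compiles without libthrift), and the `Supplier`-based call shape are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.thrift.TApplicationException.
class ApiMismatchException extends RuntimeException {
    ApiMismatchException(String message) {
        super(message);
    }
}

// Sketch of a per-instance, one-way "Thrift is incompatible" flag (not Hudi code).
class SyncClientSketch {
    // Every new client starts at false; nothing ever sets it back.
    private volatile boolean thriftIncompatible = false;

    <T> T call(Supplier<T> thriftCall, Supplier<T> jdbcCall) {
        if (thriftIncompatible) {
            return jdbcCall.get(); // mismatch already seen: skip Thrift entirely
        }
        try {
            return thriftCall.get(); // Thrift is tried until a mismatch is seen
        } catch (ApiMismatchException e) {
            thriftIncompatible = true; // false -> true, never reset
            return jdbcCall.get();
        }
    }

    boolean isThriftIncompatible() {
        return thriftIncompatible;
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        int[] thriftAttempts = {0};
        Supplier<String> thrift = () -> {
            thriftAttempts[0]++;
            throw new ApiMismatchException("simulated HMS 4.x API mismatch");
        };
        Supplier<String> jdbc = () -> "via JDBC";

        SyncClientSketch run1 = new SyncClientSketch();
        System.out.println(run1.call(thrift, jdbc)); // probes Thrift, then falls back
        System.out.println(run1.call(thrift, jdbc)); // goes straight to JDBC
        System.out.println("Thrift attempts in run 1: " + thriftAttempts[0]); // 1

        SyncClientSketch run2 = new SyncClientSketch(); // next sync run, fresh flag
        run2.call(thrift, jdbc);
        System.out.println("Thrift attempts after run 2: " + thriftAttempts[0]); // 2
    }
}
```

   Because the flag in this sketch only ever flips from `false` to `true`, two threads racing on the first call can at worst both probe Thrift once; neither can flip it back.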
   
   **Bad JDBC URL/credentials (2):** With `mode=jdbc`, Hudi opens the JDBC connection eagerly in the `HoodieHiveSyncClient` constructor (`new JDBCExecutor(config)` + `jdbcExecutor.getConnection()`), before any Thrift call. A bad URL, missing driver, or wrong credentials therefore fails at startup with `HoodieHiveSyncException: Failed to create HiveMetaStoreClient`, with the underlying JDBC exception as the cause in the stack trace. This is a configuration-error path, not an HMS API mismatch, and it behaves the same as `mode=jdbc` against any HMS version.
   
   Also called out the no-fallback case: with `mode=hms` or `mode=hiveql` against HMS 4.x, Hudi logs `"Thrift API incompatible with HMS but no JDBC fallback available. Consider using mode=jdbc with a valid jdbcUrl."` and surfaces the original exception; no automatic recovery happens.
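   The two paths above (eager JDBC failure at construction, and the no-fallback case) can be sketched the same way. This is a hypothetical Java illustration, not the `HoodieHiveSyncClient` constructor: `ModeAwareClientSketch`, `SyncMode`, and `SyncInitException` (a stand-in for `HoodieHiveSyncException`) are invented names, plain `java.sql.DriverManager` stands in for `JDBCExecutor`, and any runtime failure of the Thrift call is treated as the API mismatch for brevity.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Sync modes mentioned in the review; the enum itself is illustrative.
enum SyncMode { HMS, HIVEQL, JDBC }

// Hypothetical stand-in for HoodieHiveSyncException.
class SyncInitException extends RuntimeException {
    SyncInitException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Sketch only: plain DriverManager replaces Hudi's JDBCExecutor.
class ModeAwareClientSketch {
    private static final Logger LOG = Logger.getLogger(ModeAwareClientSketch.class.getName());

    private final Connection jdbc; // null unless mode == JDBC (closing omitted for brevity)

    ModeAwareClientSketch(SyncMode mode, String jdbcUrl, String user, String password) {
        Connection conn = null;
        if (mode == SyncMode.JDBC) {
            try {
                // Opened eagerly, before any Thrift call: a bad URL, missing
                // driver, or wrong credentials fails construction right here.
                conn = DriverManager.getConnection(jdbcUrl, user, password);
            } catch (SQLException e) {
                throw new SyncInitException("Failed to create HiveMetaStoreClient", e);
            }
        }
        this.jdbc = conn;
    }

    boolean hasJdbcFallback() {
        return jdbc != null;
    }

    // For brevity, any runtime failure of thriftCall is treated as the API mismatch.
    <T> T callThrift(Supplier<T> thriftCall, Supplier<T> jdbcCall) {
        try {
            return thriftCall.get();
        } catch (RuntimeException e) {
            if (!hasJdbcFallback()) {
                // Log level is an assumption of this sketch.
                LOG.warning("Thrift API incompatible with HMS but no JDBC fallback available. "
                        + "Consider using mode=jdbc with a valid jdbcUrl.");
                throw e; // surface the original exception: no automatic recovery
            }
            return jdbcCall.get();
        }
    }
}

class ModeDemo {
    public static void main(String[] args) {
        try {
            new ModeAwareClientSketch(SyncMode.JDBC, "jdbc:no-such-driver://localhost/db", "user", "secret");
        } catch (SyncInitException e) {
            System.out.println(e.getMessage() + " caused by " + e.getCause());
        }
    }
}
```

   With a URL that no registered driver accepts, the sketch's constructor throws immediately with the `SQLException` as the cause; in `HMS` mode, the original Thrift exception propagates unchanged after the warning is logged.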



-- 
This is an automated message from the Apache Git Service.