Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19605#discussion_r147929746

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -89,10 +89,12 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
   }

   /**
-   * Run some code involving `client` in a [[synchronized]] block and wrap certain
-   * exceptions thrown in the process in [[AnalysisException]].
+   * Run some code involving `client` and wrap certain exceptions thrown in the process in
+   * [[AnalysisException]]. Thread-safety is guaranteed here because methods in the `client`
+   * ([[org.apache.spark.sql.hive.client.HiveClientImpl]]) are already synchronized through
+   * `clientLoader` in the `retryLocked` method.
    */
-  private def withClient[T](body: => T): T = synchronized {
+  private def withClient[T](body: => T): T = {
--- End diff --

I went through all the synchronized methods in `HiveClient` (except `addJar`):
- `getState` is used only in tests.
- `setOut`, `setInfo` and `setError` are used only in `SparkSQLEnv.init()`.
- All other methods are called through `HiveExternalCatalog`.

So it seems `addJar` is the only exception. To make `addJar` also go through `HiveExternalCatalog`, we can pass `externalCatalog` instead of `client` at [line 46 in HiveSessionStateBuilder](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L46).

But I don't know why we need to call `newSession()` at [line 45](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L45), which creates a new `HiveClientImpl` instance with the same class loader and Hive client.
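
To make the locking argument in the new doc comment concrete, here is a minimal sketch of the pattern it describes. It is not the Spark code: `ClientLoader`, `SketchClient`, `SketchCatalog` and `AnalysisError` are hypothetical stand-ins, and this `retryLocked` only takes the lock (the retry behaviour its name implies is left out). It assumes that every client method goes through `retryLocked`, and that a client created by `newSession()` shares the same loader object and therefore the same lock.

```scala
// Hypothetical stand-in for `clientLoader`: it only serves as the shared lock object.
final class ClientLoader

// Hypothetical stand-in for AnalysisException.
final class AnalysisError(message: String, cause: Throwable)
  extends Exception(message, cause)

// Hypothetical stand-in for HiveClientImpl.
class SketchClient(loader: ClientLoader) {
  private var counter = 0

  // Every public method funnels through here, so calls are serialized on `loader`.
  // Simplification: no retry logic, only the lock.
  private def retryLocked[A](body: => A): A = loader.synchronized { body }

  def increment(): Int = retryLocked {
    counter += 1
    counter
  }

  def fail(): Nothing = retryLocked {
    throw new IllegalStateException("boom")
  }

  // Assumption: a new session is a new client object that reuses the same loader (same lock).
  def newSession(): SketchClient = new SketchClient(loader)
}

// Hypothetical stand-in for HiveExternalCatalog.
class SketchCatalog(client: SketchClient) {
  // No `synchronized` here: this wrapper only translates exceptions.
  private def withClient[T](body: => T): T =
    try body
    catch {
      case e: IllegalStateException => throw new AnalysisError(e.getMessage, e)
    }

  def bump(): Int = withClient { client.increment() }

  def broken(): Unit = withClient { client.fail() }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val catalog = new SketchCatalog(new SketchClient(new ClientLoader))
    // Four threads, 1000 increments each, all going through the unsynchronized wrapper.
    val threads = (1 to 4).map { _ =>
      new Thread(() => (1 to 1000).foreach(_ => catalog.bump()))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    println(catalog.bump()) // prints 4001: no increment was lost
  }
}
```

In this sketch the count stays exact only because every mutation goes through `retryLocked`. A method that reached the client's state without it would need separate checking, which is the reason for auditing each synchronized `HiveClient` method and its callers.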