Github user kmanamcheri commented on the issue:

https://github.com/apache/spark/pull/22614

> Based on my understanding, the solution of the FB team is to retry the following command multiple times:
>
> ```
> getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
> ```

@gatorsmile hmm, my understanding was different. I thought they were retrying the fetchAllpartitions method. Maybe @tejasapatil can clarify here?

> This really depends on what the actual errors are that fail `getPartitionsByFilterMethod`. When many concurrent users share the same metastore, `exponential backoff with retries` is very reasonable, since most errors are probably caused by timeouts or similar issues.

Doesn't this apply to every other HMS API as well? If so, shouldn't we build a complete solution around this in HiveShim to do `exponential backoff with retries` on every single HMS call?

> If it still fails, I would suggest failing fast or depending on the conf value of `spark.sql.hive.metastorePartitionPruning.fallback.enabled`

Ok, I agree. I think we need clarification from @tejasapatil on which call they retry.
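
For reference, a minimal sketch of what a generic exponential-backoff wrapper around an HMS call could look like. This is illustrative only, not the FB implementation or existing Spark code; the helper name `retryWithBackoff`, the attempt count, and the base delay are all assumptions.

```scala
import scala.util.control.NonFatal

object HmsRetrySketch {
  // Hypothetical helper: retries `call` with exponential backoff (1s, 2s, 4s, ...)
  // and rethrows the last error once maxAttempts is exhausted.
  def retryWithBackoff[T](maxAttempts: Int = 3, baseDelayMs: Long = 1000L)(call: => T): T = {
    var attempt = 1
    while (true) {
      try {
        return call
      } catch {
        case NonFatal(e) if attempt < maxAttempts =>
          Thread.sleep(baseDelayMs * (1L << (attempt - 1)))
          attempt += 1
      }
    }
    throw new IllegalStateException("unreachable")
  }

  // Hypothetical usage around the reflective HMS call quoted above:
  // val partitions = retryWithBackoff() {
  //   getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
  // }
}
```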