Github user kmanamcheri commented on the issue:

    https://github.com/apache/spark/pull/22614
  
    > Based on my understanding, the FB team's solution is to retry the following command multiple times:
    > 
    > ```
    > getPartitionsByFilterMethod.invoke(hive, table, filter)
    >   .asInstanceOf[JArrayList[Partition]]
    > ```
    
    @gatorsmile hmm, my understanding was different. I thought they were retrying the fetchAllPartitions method. Maybe @tejasapatil can clarify here?
    
    > This really depends on what the actual errors are that fail `getPartitionsByFilterMethod`. When many concurrent users share the same metastore, `exponential backoff with retries` is very reasonable, since most errors are likely caused by timeouts or similar issues.
    
    Doesn't this apply to every other HMS API as well? If so, shouldn't we build a complete solution in HiveShim that does `exponential backoff with retries` on every single HMS call, along the lines of the sketch below?
    
    > If it still fails, I would suggest failing fast or depending on the conf value of `spark.sql.hive.metastorePartitionPruning.fallback.enabled`
    
    OK, I agree.
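    
    Concretely, I'd picture the fail-fast/fallback split looking roughly like this. It's a sketch only: it assumes HiveShim's existing reflective handles (`getPartitionsByFilterMethod`, `getAllPartitionsMethod`) and the Hive/Partition types are in scope, and the conf name is the one proposed above:
    
    ```scala
    import java.lang.reflect.InvocationTargetException
    import scala.collection.JavaConverters._
    
    // Sketch only: fallbackEnabled would carry the value of
    // spark.sql.hive.metastorePartitionPruning.fallback.enabled.
    def getPartitionsByFilter(
        hive: Hive,
        table: Table,
        filter: String,
        fallbackEnabled: Boolean): Seq[Partition] = {
      try {
        getPartitionsByFilterMethod.invoke(hive, table, filter)
          .asInstanceOf[JArrayList[Partition]].asScala
      } catch {
        case e: InvocationTargetException if fallbackEnabled =>
          // Fallback: fetch all partitions and let Spark prune client-side.
          getAllPartitionsMethod.invoke(hive, table)
            .asInstanceOf[JSet[Partition]].asScala.toSeq
        case e: InvocationTargetException =>
          // Fallback disabled: fail fast instead of fetching everything.
          throw e
      }
    }
    ```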
    
    I think we need clarification from @tejasapatil on which call they retry.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
