GitHub user kmanamcheri commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22614#discussion_r223498018
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
             getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
           } else {
             logDebug(s"Hive metastore filter is '$filter'.")
    -        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
    -        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
    -        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
    -        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
    -        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
    -          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
             try {
               // Hive may throw an exception when calling this method in some circumstances, such as
    -          // when filtering on a non-string partition column when the hive config key
    -          // hive.metastore.try.direct.sql is false
    +          // when filtering on a non-string partition column.
               getPartitionsByFilterMethod.invoke(hive, table, filter)
                 .asInstanceOf[JArrayList[Partition]]
             } catch {
    -          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
    -              !tryDirectSql =>
    +          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
    --- End diff --
    
    Could you review the newer changes I have made? Basically, yes, I agree that fetching all partitions can be very expensive, so we leave that decision to the user. They can disable the fallback by setting "spark.sql.hive.metastorePartitionPruning.fallback.enabled" to false, in which case we never retry and the query fails. If it is set to "true", we retry by fetching all partitions. As simple as that (see the sketch below).
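
    To make the proposed behavior concrete, here is a minimal, self-contained sketch of the fallback logic. The config name is the one from this PR, but the helper names (listPartitionsByFilter, listAllPartitions) and the simplified MetaException class are illustrative stand-ins for the reflective calls in HiveShim.scala, not the actual patch:

        import java.util.{ArrayList => JArrayList}

        // Illustrative stand-ins for the real metastore calls and exception type.
        class MetaException(msg: String) extends Exception(msg)
        def listPartitionsByFilter(filter: String): JArrayList[String] =
          throw new MetaException(s"filter pushdown failed for '$filter'")
        def listAllPartitions(): JArrayList[String] = new JArrayList[String]()

        // fallbackEnabled models spark.sql.hive.metastorePartitionPruning.fallback.enabled.
        def getPartitions(filter: String, fallbackEnabled: Boolean): JArrayList[String] =
          try {
            listPartitionsByFilter(filter) // fast path: prune in the metastore
          } catch {
            case _: MetaException if fallbackEnabled =>
              // Retry once by fetching everything; Spark prunes client-side.
              listAllPartitions()
            // If fallbackEnabled is false, this case does not match and the
            // MetaException propagates, i.e. we never retry and the query fails.
          }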
    
    I don't completely understand "exponential backoff with retries". Do you do this at the HMS level, or at the query level? If HMS filter pushdown fails once, is it likely to succeed on a later attempt? Maybe that is a future improvement: instead of a boolean fallback-enabled/fallback-disabled switch, we could have multiple levels of fallback with retry timing, etc. Thoughts, @tejasapatil?
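
    For reference, this is roughly what I picture by "exponential backoff with retries": retry the pushdown a few times with growing delays before giving up or falling back. A hypothetical sketch (the helper and its parameter names are mine, not from the PR); note that backoff only helps if the failure is transient, which is exactly my question above:

        // Retry an operation, doubling the delay after each failure.
        def retryWithBackoff[T](maxAttempts: Int, initialDelayMs: Long)(op: => T): T = {
          var attempt = 1
          var delayMs = initialDelayMs
          while (true) {
            try {
              return op
            } catch {
              case _: Exception if attempt < maxAttempts =>
                Thread.sleep(delayMs) // back off before the next attempt
                delayMs *= 2          // d, 2d, 4d, ...
                attempt += 1
            }
          }
          throw new IllegalStateException("unreachable")
        }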

