Github user rezasafi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22614#discussion_r222349989
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
             getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
           } else {
             logDebug(s"Hive metastore filter is '$filter'.")
    -        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
    -        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
    -        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
    -        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
    -        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
    -          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
             try {
               // Hive may throw an exception when calling this method in some circumstances, such as
    -          // when filtering on a non-string partition column when the hive config key
    -          // hive.metastore.try.direct.sql is false
    +          // when filtering on a non-string partition column.
               getPartitionsByFilterMethod.invoke(hive, table, filter)
                 .asInstanceOf[JArrayList[Partition]]
           } catch {
    -          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
    -              !tryDirectSql =>
    +          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
                 logWarning("Caught Hive MetaException attempting to get partition metadata by " +
                   "filter from Hive. Falling back to fetching all partition metadata, which will " +
    -              "degrade performance. Modifying your Hive metastore configuration to set " +
    -              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
    +              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
    +              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
    +              "optimization and does not guarantee improved performance.")
    --- End diff --
    
    Hive has a config, "hive.metastore.limit.partition.request", that can limit the number of partitions that can be requested from the HMS, so I think there is no need for a new config on the Spark side.
    Also, since direct SQL is a best-effort approach, simply failing when direct SQL is enabled is not a good option.
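    For reference, that limit is a metastore-side setting, so it would typically be configured on the Hive metastore service (e.g. in its hive-site.xml) rather than in Spark. A rough sketch, with a purely illustrative value:

        <!-- hive-site.xml on the Hive metastore service; 100000 is only a placeholder -->
        <property>
          <name>hive.metastore.limit.partition.request</name>
          <value>100000</value>
        </property>

    As far as I know, a request that would exceed this limit is rejected by the metastore with a MetaException rather than silently returning everything.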


