[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223172392
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

@gatorsmile : Sorry for the late reply. We had seen issues with this in the past and resorted to doing exponential backoff with retries. Fetching all the partitions is going to be bad in a prod setting even if it makes it through; the underlying problem, if left unnoticed, is bad for system health.
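
For reference, here is a minimal, self-contained sketch of the exponential-backoff-with-retries pattern mentioned above, wrapped around a generic operation; the helper name, attempt counts, and delays are illustrative assumptions, not FB's or Spark's actual implementation.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetryWithBackoff {
  // Retry `op` up to `maxAttempts` times, doubling the sleep between attempts.
  @tailrec
  def retry[T](maxAttempts: Int, baseDelayMs: Long)(op: => T): T = {
    Try(op) match {
      case Success(result) => result
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(baseDelayMs)
        retry(maxAttempts - 1, baseDelayMs * 2)(op)
      case Failure(e) => throw e
    }
  }
}

// Example: retry a flaky metastore lookup three times, starting at 100ms.
// RetryWithBackoff.retry(maxAttempts = 3, baseDelayMs = 100) { fetchPartitionsByFilter() }
```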


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223148534
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -87,6 +90,18 @@ class HiveClientSuite(version: String)
     assert(filteredPartitions.size == testPartitionCount)
   }
 
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK.key -> "false") {
--- End diff --

Done.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223148506
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+        "push down fails. This is applicable only if partition pruning is enabled (see " +
+        s" ${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+        "if there are a large number of partitions." )
+      .booleanConf
+      .createWithDefault(true)
--- End diff --

Done.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223139984
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
--- End diff --

spark.sql.hive.metastorePartitionPruning.fallback.enabled
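
For illustration only, a hedged sketch of how the SQLConf entry might read with the key suggested here (and the false default requested elsewhere in this thread); this is not the merged definition.

```scala
// Hypothetical rendering of the review suggestions; name, wording and default are not final.
val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED =
  buildConf("spark.sql.hive.metastorePartitionPruning.fallback.enabled")
    .doc("When true, fall back to fetching all partitions if Hive metastore partition " +
      s"push down fails. Applies only when ${HIVE_METASTORE_PARTITION_PRUNING.key} is enabled.")
    .booleanConf
    .createWithDefault(false)
```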


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223139874
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -87,6 +90,18 @@ class HiveClientSuite(version: String)
     assert(filteredPartitions.size == testPartitionCount)
   }
 
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK.key -> "false") {
--- End diff --

Ok will do.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223137994
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -87,6 +90,18 @@ class HiveClientSuite(version: String)
     assert(filteredPartitions.size == testPartitionCount)
   }
 
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK.key -> "false") {
--- End diff --

Also need to set `HIVE_METASTORE_PARTITION_PRUNING.key` to `true`
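
A sketch of what the suggested test setup might look like, assuming `withSQLConf` accepts multiple key/value pairs as in the existing suite; the test body is elided because the exact assertion belongs to the PR.

```scala
test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
  withSQLConf(
      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true",
      SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK.key -> "false") {
    // ... issue a getPartitionsByFilter call the metastore rejects and assert
    // that the exception is rethrown rather than triggering the fallback ...
  }
}
```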


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223136013
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+        "push down fails. This is applicable only if partition pruning is enabled (see " +
+        s" ${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+        "if there are a large number of partitions." )
+      .booleanConf
+      .createWithDefault(true)
--- End diff --

If we set that to false, we change the default behavior introduced by SPARK-17992. Is that ok?

So if we set this flag to false by default, then an exception from HMS will always be re-thrown (direct SQL or not). I just want to make sure that we understand that the default behavior will change.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223134564
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+        "push down fails. This is applicable only if partition pruning is enabled (see " +
+        s" ${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+        "if there are a large number of partitions." )
+      .booleanConf
+      .createWithDefault(true)
--- End diff --

By default, this should be false


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223122115
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
--- End diff --

What is the reasoning for marking this as legacy?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223117511
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK =
+    buildConf("spark.sql.hive.metastorePartitionPruningFallback")
--- End diff --

Should we use the `spark.sql.legacy` prefix like [SPARK-19724](https://github.com/apache/spark/pull/22515), @rxin and @cloud-fan?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222381843
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

Yes, if the query tries to get more partitions than `hive.metastore.limit.partition.request` allows, the query will fail. Using that Hive config, the user can decide whether to allow fetching all the partitions or not. I also think that config covers all the concerns.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222373627
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

One option if we want to get all fancy is to add a configurable timeout in 
the fallback case - assuming it's possible to cancel an ongoing call (run in a 
separate thread + interrupt maybe?).

My main concern with the fallback, really, isn't the slowness, but that in 
the case where it would be slow (= too many partitions), the HMS might just run 
itself out of memory trying to serve the request.

Reza mentions the Hive config, which I think is the right thing for the HMS admin to do, since it avoids apps DoS'ing the server. I'm not sure what the behavior there is, but I hope it fails the call if there are too many partitions (instead of returning a subset). IMO that config seems to cover all the concerns here, assuming the call will fail when you have too many partitions, no?
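
A minimal, self-contained sketch of the "separate thread + interrupt" idea, assuming a plain single-thread executor; the helper and the timeout value are illustrative and not part of Spark.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

object CallWithTimeout {
  // Run `body` on a worker thread and give up after `timeoutMs`, interrupting
  // the worker so that a slow fallback call does not run unbounded.
  def apply[T](timeoutMs: Long)(body: => T): T = {
    val executor = Executors.newSingleThreadExecutor()
    try {
      val future = executor.submit(new Callable[T] { override def call(): T = body })
      try {
        future.get(timeoutMs, TimeUnit.MILLISECONDS)
      } catch {
        case e: TimeoutException =>
          future.cancel(true) // interrupts the worker thread
          throw e
      }
    } finally {
      executor.shutdownNow()
    }
  }
}

// Example: CallWithTimeout(timeoutMs = 60000) { fetchAllPartitions() }
```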


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222372452
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

@dongjoon-hyun @mallman I have updated the log messages to be more descriptive and to indicate what the user should try doing. Does that help?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222359233
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

@mallman I haven't tried using that config option. If I am understanding [the documentation for HIVE_MANAGE_FILESOURCE_PARTITIONS](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L547) correctly, setting that value to false means partitions will not be stored in HMS. That sounds like it is addressing a different issue, no?

If that's the suggested way to deal with unsupported partition filters, then this code should always fail when getPartitionsByFilter fails, no? Why even have a fallback (as we do currently)? SPARK-17992 seems to say that Spark should handle certain cases of partition pushdown failure (such as HMS ORM mode). My argument is that this handling should be expanded to cover the case where hive.metastore.try.direct.sql is set to true as well.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222349989
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

Hive has a config, `hive.metastore.limit.partition.request`, that can limit the number of partitions that can be requested from HMS, so I think there is no need for a new config on the Spark side. Also, since direct SQL is a best-effort approach, simply failing when direct SQL is enabled is not good.



---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222348679
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

@mallman The key point to note here is that setting direct SQL on HMS "may" resolve the problem; it is not guaranteed. HMS only optimistically applies this optimization. If direct SQL on HMS fails, it falls back on ORM and then fails again. Spark's behavior should not be inconsistent depending on an HMS config value.

My suggested fix would still call getPartitionsByFilter and, only if that fails, call getAllPartitions. We won't be calling getAllPartitions in all cases; it is a fallback mechanism.

@gatorsmile hmm.. why not? We know it might be slow, hence the warning. Maybe the warning message should say that this could be slow depending on the number of partitions, since partition push-down to HMS failed.
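
For illustration, a hedged fragment of the fallback shape described above, with `fallbackEnabled` standing in for the value read from the proposed SQL conf; this is a sketch of the catch clause in the shim, not the exact patch.

```scala
// Fragment only: `fallbackEnabled`, logWarning, and the reflective method handles
// come from the surrounding HiveShim code; this just contrasts the two behaviors.
} catch {
  case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
    if (fallbackEnabled) {
      logWarning("Partition filter push down to the Hive metastore failed; " +
        "falling back to fetching all partition metadata, which may be slow.", ex)
      getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
    } else {
      // With fallback disabled, surface the metastore error instead of scanning everything.
      throw ex.getCause
    }
}
```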


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222348674
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

> sometimes failing is better than spending several hours to get all partitions. Shall we add a config to switch the behavior?

I think that's what `${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS}` is for. 
@kmanamcheri, what happens if you set this to false?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222345462
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

I think the original warning message is more accurate. Direct sql mode 
isn't just about performance. It's also about enhanced capability, e.g. 
supporting filtering on non-string type columns. As the original comment 
states, setting the direct sql config value to true may resolve a problem 
around metastore-side partition filtering.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222340141
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

sometimes failing is better than spending several hours to get all 
partitions. Shall we add a config to switch the behavior?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222190138
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

To be honest, I do not think we should issue a warning message and call 
getAllPartitions. When the number of partitions is huge, getAllPartitions will 
be super super slow. 


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222140323
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

Good idea. Yes that would be better. I'll add that.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222135430
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --

@kmanamcheri, could you show a different, more accurate warning message here based on the `tryDirectSql` value?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222134301
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

cc @sameeragarwal @tejasapatil. Could you share what FB does for the retry?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222134090
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
--- End diff --

Ping @wangyum, too.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222123856
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

Also, it's not blindly calling that API, right? It was already being called before when direct SQL was disabled; in the other case, it was just throwing an exception. So now, instead of erroring out, it will work, just more slowly than expected.

Unless there's some retry at a higher layer that I'm not aware of.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222123426
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

@gatorsmile From the HMS side, the error is always the same `MetaException`, and there is no way to tell a direct SQL error apart from a "not supported" error (unfortunately!). How do you propose we address this?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222122774
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

ping @srinathshankar @ericl 


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r222122400
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

We should not blindly call getAllPartitions; it will be super slow. We should do some retries, depending on the errors we get.


---
