[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r224639756
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

@kmanamcheri : Let's do this:
- We should prefer `getPartitionsByFilterMethod()`. If it fails, we retry with an increasing delay across retries.
- If retries are exhausted, we could fetch all the partitions of the table. Some people might not want this, so let's control it with a conf flag. For those who don't want it, the query could fail at this point.

What do you think?
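A minimal sketch of this retry-then-fallback flow, with hypothetical names and retry parameters (none of them from the PR itself):

```scala
// Hypothetical sketch of the proposed flow: prefer the filtered fetch, retry it
// with an increasing delay, and once retries are exhausted either fall back to
// fetching all partitions (when the conf flag allows it) or fail the query.
// All names and retry parameters here are illustrative, not from the PR.
def fetchWithRetries[T](maxAttempts: Int, initialDelayMs: Long, fallbackEnabled: Boolean)(
    byFilter: => T)(fetchAll: => T): T = {
  var delayMs = initialDelayMs
  var attempt = 1
  while (true) {
    try {
      return byFilter                 // preferred: getPartitionsByFilter-style call
    } catch {
      case e: Exception =>
        if (attempt < maxAttempts) {
          Thread.sleep(delayMs)       // wait before retrying
          delayMs *= 2                // increasing delay across retries
          attempt += 1
        } else if (fallbackEnabled) {
          return fetchAll             // retries exhausted: fetch all partitions
        } else {
          throw e                     // flag off: let the query fail
        }
    }
  }
  sys.error("unreachable")
}
```

Under this sketch the full-table fetch only ever runs after the retry budget is spent, which keeps the expensive path off the common case.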


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223498018
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
--- End diff --

Could you review the newer changes I have done? Basically, yes, I agree that fetching all partitions is going to be bad, and hence we'll leave it up to the user. They can disable fetching all the partitions by setting "spark.sql.hive.metastorePartitionPruning.fallback.enabled" to false. In that case, we'll never retry. If it is set to "true", then we'll retry. As simple as that.

I don't completely understand "exponential backoff with retries". Do you do this at the HMS level, or at the query level? If HMS filter pushdown fails once, does that mean it will succeed in the future? Maybe this is a future improvement, where instead of a boolean "fallback-enabled" or "fallback-disabled" we can have multiple levels for trying the fallback with timing etc. Thoughts @tejasapatil
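The boolean semantics described here can be sketched minimally (illustrative names, not the actual HiveShim code): on a metastore failure, the fallback path runs only when the flag is on; otherwise the error propagates and the query fails.

```scala
// Illustrative sketch only: `fallbackEnabled` models the proposed
// spark.sql.hive.metastorePartitionPruning.fallback.enabled flag.
def partitionsByFilter[T](fallbackEnabled: Boolean)(byFilter: => T)(allPartitions: => T): T =
  try byFilter catch {
    // fall back to the full partition listing only when the user opted in
    case e: Exception if fallbackEnabled => allPartitions
  }
```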


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223473324
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
+        val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
         val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
         // We should get this config value from the metaStore. otherwise hit SPARK-18681.
         // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
         // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
         val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
-          // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
-            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-            // HiveShim clients are expected to handle a superset of the requested partitions
-            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              tryDirectSql =>
-            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-              "metadata by filter from Hive. You can set the Spark configuration setting " +
-              s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-              "problem, however this will result in degraded performance. Please report a bug: " +
-              "https://issues.apache.org/jira/browse/SPARK", ex)
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+            if (shouldFallback) {
+              if (!tryDirectSql) {
--- End diff --

Good idea. Done.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223469446
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+        SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
--- End diff --

OK, I changed the test description to more accurately reflect what we are testing.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223462348
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+        SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
--- End diff --

Shorter: "getPartitionsByFilter fails if metastore call fails and $partPruningFallbackKey=false"


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223461273
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
+        val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
         val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
         // We should get this config value from the metaStore. otherwise hit SPARK-18681.
         // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
         // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
         val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
-          // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
-            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-            // HiveShim clients are expected to handle a superset of the requested partitions
-            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              tryDirectSql =>
-            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-              "metadata by filter from Hive. You can set the Spark configuration setting " +
-              s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-              "problem, however this will result in degraded performance. Please report a bug: " +
-              "https://issues.apache.org/jira/browse/SPARK", ex)
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+            if (shouldFallback) {
+              if (!tryDirectSql) {
--- End diff --

Not really your fault, but getting the value of this config requires a round-trip to the HMS, so it would be good to only do that here and avoid that extra remote call in the normal case.
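A sketch of the deferral being suggested, with a hypothetical stand-in for the real `hive.getMSC.getConfigValue` round-trip: a `lazy val` runs the lookup at most once, and only if the error path actually references it.

```scala
// Sketch only: `fetchFromMetastore` stands in for the real HMS round-trip
// (hive.getMSC.getConfigValue). A lazy val defers it until first use.
class DirectSqlConf(fetchFromMetastore: () => String) {
  var remoteCalls = 0                  // instrumentation for this sketch
  lazy val tryDirectSql: Boolean = {
    remoteCalls += 1                   // runs at most once, on first access
    fetchFromMetastore().toBoolean
  }
}
```

On the happy path `tryDirectSql` is never touched, so no extra metastore call is made.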


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223459567
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -544,6 +544,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED =
+    buildConf("spark.sql.hive.metastorePartitionPruning.fallback.enabled")
+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+        "push down fails. This is applicable only if partition pruning is enabled (see " +
+        s"${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+        "if there are a large number of partitions.")
--- End diff --

"if there is"


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223441744
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+        SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
--- End diff --

The test name states that `getPartitionsByFilter` should throw an exception if partition pruning fallback is disabled. But that's not right. I think we need an accurate name for this and the previous test. Perhaps it should mention that the underlying call to the metastore throws an exception. How about

```
s"getPartitionsByFilter should throw an exception if the underlying call to the
metastore throws an exception and $partPruningFallbackKey=false"
```

?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223429117
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+        SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
--- End diff --

Hmm, the behavior does not depend on the value of tryDirectSqlKey, though. It depends solely on whether pruning is enabled and whether pruning.fallback is enabled.

The reason we set tryDirectSqlKey to false is to generate a consistent way for pruning to fail. Setting tryDirectSql to false by itself does not throw an exception.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223425011
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+        SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
--- End diff --

Change test name to

```
s"getPartitionsByFilter should throw an exception when $tryDirectSqlKey=false and $partPruningFallbackKey=false"
```

?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223424625
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
--- End diff --

Change test name to

```
s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false and $partPruningFallbackKey=true"
```

?


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223422030
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
+        val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
         val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
         // We should get this config value from the metaStore. otherwise hit SPARK-18681.
         // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
         // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
         val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
-          // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
-            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-            // HiveShim clients are expected to handle a superset of the requested partitions
-            getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              tryDirectSql =>
-            throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-              "metadata by filter from Hive. You can set the Spark configuration setting " +
-              s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-              "problem, however this will result in degraded performance. Please report a bug: " +
-              "https://issues.apache.org/jira/browse/SPARK", ex)
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+            if (shouldFallback) {
+              if (!tryDirectSql) {
+                logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+                  "filter from Hive. Falling back to fetching all partition metadata, which will " +
+                  "degrade performance. Modifying your Hive metastore configuration to set " +
+                  s"${tryDirectSqlConfVar.varname} to true may resolve this problem.")
+              } else {
+                logWarning("Caught Hive MetaException attempting to get partition metadata " +
+                  "by filter from Hive. Hive metastore's direct SQL feature has been enabled, " +
+                  "but it is an optimistic optimization and not guaranteed to work. Falling back " +
+                  "to fetching all partition metadata, which will degrade performance (for the " +
+                  "current query). If you see this error consistently, you can set the Spark " +
+                  s"configuration setting ${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to " +
+                  "false as a work around, however this will result in degraded performance. " +
+                  "Please report a bug to Hive stating that direct SQL is failing consistently " +
+                  "for the specified query: https://issues.apache.org/jira/browse/HIVE")
--- End diff --

I think we should remove the suggestion to file a Hive project bug. Even with the direct SQL configuration setting enabled, there are valid metastore deployments for which it will be ignored. For example, my understanding is that if the metastore uses MongoDB for its underlying storage, the direct SQL configuration setting will be ignored. That means a failure here is not a Hive bug with direct SQL.


---


[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223419868
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
--- End diff --

Done @mallman 


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread kmanamcheri
Github user kmanamcheri commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223418766
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
--- End diff --

> of course any method call may throw an exception in some circumstances.

:+1: Yes, you are right. I agree with removing comments that might mislead future developers. I'll rephrase this.


---




[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request:

https://github.com/apache/spark/pull/22614#discussion_r223415835
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
           tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
--- End diff --

To me your revised comment suggests that if we try to filter on a non-string partition column, we can expect the call to `getPartitionsByFilter` to fail. That's not true.

Although I wrote the original comment, I can see how making an assumption about Hive's behavior when calling this method is rather specious and unwise. I suggest we just remove this comment entirely. To simply state that

```
Hive may throw an exception when calling this method in some circumstances.
```

only states the obvious: of course any method call may throw an exception in some circumstances.


---
