[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r224639756

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
     } else {
       logDebug(s"Hive metastore filter is '$filter'.")
-      val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-      // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-      // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-      // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-      val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-        tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
         // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
+        // when filtering on a non-string partition column.
         getPartitionsByFilterMethod.invoke(hive, table, filter)
           .asInstanceOf[JArrayList[Partition]]
       } catch {
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            !tryDirectSql =>
+        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
```
--- End diff --

@kmanamcheri: Let's do this:

- We should prefer `getPartitionsByFilterMethod()`. If it fails, we retry with an increasing delay between retries.
- If retries are exhausted, we could fetch all the partitions of the table. Some people might not want this, so let's control it with a conf flag. For those who don't want it, the query could fail at this point.

What do you think?
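For concreteness, here is a minimal, hypothetical sketch of the retry-then-fallback strategy described above. The helper names (`fetchByFilter`, `fetchAllPartitions`, `fallbackEnabled`) and the retry parameters are assumptions for illustration only; the PR's actual code works through the reflective method handles shown in the diff.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: prefer the filtered HMS call, retry it with an
// increasing delay, and only fall back to fetching all partitions (or fail)
// once retries are exhausted.
def getPartitionsWithRetry[T](
    fetchByFilter: () => T,       // preferred path: push the filter to the HMS
    fetchAllPartitions: () => T,  // last resort: fetch every partition
    fallbackEnabled: Boolean,     // conf flag proposed in this discussion
    maxRetries: Int = 3,
    initialDelayMs: Long = 1000L): T = {

  @tailrec
  def attempt(retriesLeft: Int, delayMs: Long): T =
    Try(fetchByFilter()) match {
      case Success(partitions) => partitions
      case Failure(_) if retriesLeft > 0 =>
        Thread.sleep(delayMs)                  // wait before the next attempt
        attempt(retriesLeft - 1, delayMs * 2)  // grow the delay across retries
      case Failure(_) if fallbackEnabled =>
        fetchAllPartitions()                   // retries exhausted: fall back
      case Failure(ex) =>
        throw ex                               // fallback disabled: fail the query
    }

  attempt(maxRetries, initialDelayMs)
}
```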
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223498018

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
     } else {
       logDebug(s"Hive metastore filter is '$filter'.")
-      val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-      // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-      // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-      // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-      val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-        tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
         // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
+        // when filtering on a non-string partition column.
         getPartitionsByFilterMethod.invoke(hive, table, filter)
           .asInstanceOf[JArrayList[Partition]]
       } catch {
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            !tryDirectSql =>
+        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
```
--- End diff --

Could you review the newer changes I have made? Basically, yes, I agree that fetching all partitions is going to be bad, so we'll leave it up to the user. They can disable fetching all the partitions by setting "spark.sql.hive.metastorePartitionPruning.fallback.enabled" to false; in that case, we'll never retry. If it is set to "true", then we'll retry. As simple as that.

I don't completely understand "exponential backoff with retries". Do you do this at the HMS level or at the query level? If HMS filter pushdown fails once, does that mean it will succeed in the future? Maybe a future improvement would be to replace the boolean "fallback-enabled"/"fallback-disabled" with multiple levels of fallback, with timing etc. Thoughts @tejasapatil?
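As a usage illustration (assuming the conf key as proposed at this point in the review; the final key name could differ in the merged change), a user would toggle the behavior like this:

```scala
// Sketch: opt out of the fallback so that a failed HMS filter pushdown fails
// the query instead of silently fetching all partition metadata.
spark.conf.set("spark.sql.hive.metastorePartitionPruning.fallback.enabled", "false")

// The permissive setting: a pushdown failure degrades to a full-partition
// fetch (slower, but the query still completes).
spark.conf.set("spark.sql.hive.metastorePartitionPruning.fallback.enabled", "true")
```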
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223473324

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
     } else {
       logDebug(s"Hive metastore filter is '$filter'.")
+      val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
       val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
       // We should get this config value from the metaStore. otherwise hit SPARK-18681.
       // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
       // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
       val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
-        // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
         getPartitionsByFilterMethod.invoke(hive, table, filter)
           .asInstanceOf[JArrayList[Partition]]
       } catch {
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            !tryDirectSql =>
-          logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-            "filter from Hive. Falling back to fetching all partition metadata, which will " +
-            "degrade performance. Modifying your Hive metastore configuration to set " +
-            s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-          // HiveShim clients are expected to handle a superset of the requested partitions
-          getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            tryDirectSql =>
-          throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-            "metadata by filter from Hive. You can set the Spark configuration setting " +
-            s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-            "problem, however this will result in degraded performance. Please report a bug: " +
-            "https://issues.apache.org/jira/browse/SPARK", ex)
+        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+          if (shouldFallback) {
+            if (!tryDirectSql) {
```
--- End diff --

Good idea. Done.
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223469446

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
```
--- End diff --

OK, I changed the test description to describe more accurately what we are testing.
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223462348

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
```
--- End diff --

Shorter: `s"getPartitionsByFilter fails if metastore call fails and $partPruningFallbackKey=false"`
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223461273

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
     } else {
       logDebug(s"Hive metastore filter is '$filter'.")
+      val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
       val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
       // We should get this config value from the metaStore. otherwise hit SPARK-18681.
       // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
       // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
       val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
-        // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
         getPartitionsByFilterMethod.invoke(hive, table, filter)
           .asInstanceOf[JArrayList[Partition]]
       } catch {
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            !tryDirectSql =>
-          logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-            "filter from Hive. Falling back to fetching all partition metadata, which will " +
-            "degrade performance. Modifying your Hive metastore configuration to set " +
-            s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-          // HiveShim clients are expected to handle a superset of the requested partitions
-          getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            tryDirectSql =>
-          throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-            "metadata by filter from Hive. You can set the Spark configuration setting " +
-            s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-            "problem, however this will result in degraded performance. Please report a bug: " +
-            "https://issues.apache.org/jira/browse/SPARK", ex)
+        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+          if (shouldFallback) {
+            if (!tryDirectSql) {
```
--- End diff --

Not really your fault, but getting the value of this config requires a round-trip to the HMS, so it would be good to only do that here and avoid that extra remote call in the normal case.
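A sketch of the restructuring suggested here, reusing the names from the diff above (simplified fragment; the fallback/rethrow branches are elided): the metastore-side conf is read only inside the catch block, so the success path makes no extra remote call.

```scala
try {
  getPartitionsByFilterMethod.invoke(hive, table, filter)
    .asInstanceOf[JArrayList[Partition]]
} catch {
  case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
    // Pay for the HMS round trip only on the failure path.
    val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
    val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
      tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
    // ... decide between falling back and rethrowing using
    // shouldFallback / tryDirectSql, as in the diff above ...
    throw ex
}
```

Declaring the lookup as a `lazy val` near the top of the method would achieve the same deferred evaluation while keeping the declaration in its original place.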
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223459567

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---

```diff
@@ -544,6 +544,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED =
+    buildConf("spark.sql.hive.metastorePartitionPruning.fallback.enabled")
+      .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
+        "push down fails. This is applicable only if partition pruning is enabled (see " +
+        s" ${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
+        "if there are a large number of partitions." )
```
--- End diff --

"if there is"
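For clarity, this is the doc string with the wording fix applied ("a large number" is singular, so "there is"). A sketch only; the trailing builder calls are outside the quoted hunk, and the stray space before the interpolation is also tidied here:

```scala
val HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED =
  buildConf("spark.sql.hive.metastorePartitionPruning.fallback.enabled")
    .doc("When true, enable fallback to fetch all partitions if Hive metastore partition " +
      "push down fails. This is applicable only if partition pruning is enabled (see " +
      s"${HIVE_METASTORE_PARTITION_PRUNING.key}). Enabling this may degrade performance " +
      "if there is a large number of partitions.")
    // .booleanConf and the default value are elided here; see the PR diff.
```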
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223441744

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
```
--- End diff --

The test name states that `getPartitionsByFilter` should throw an exception if partition pruning fallback is disabled. But that's not right. I think we need an accurate name for this and the previous test. Perhaps it should include a mention that the underlying call to the metastore throws an exception. How about

```
s"getPartitionsByFilter should throw an exception if the underlying call to the metastore throws an exception and $partPruningFallbackKey=false"
```

?
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223429117

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
```
--- End diff --

Hmm, the behavior does not depend on the value of tryDirectSqlKey, though. It depends solely on whether pruning is enabled and whether pruning fallback is enabled. The reason we set tryDirectSqlKey to false is to provide a consistent way for pruning to fail; setting tryDirectSql to false by itself does not throw an exception.
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223425011

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
+    withSQLConf(SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ENABLED.key -> "true",
+      SQLConf.HIVE_METASTORE_PARTITION_PRUNING.key -> "true") {
+      val client = init(false)
+      // tryDirectSql = false and a non-string partition filter will always fail. This condition
+      // is used to test if the fallback works
+      val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
+        Seq(attr("ds") === 20170101))
 
-    assert(filteredPartitions.size == testPartitionCount)
+      assert(filteredPartitions.size == testPartitionCount)
+    }
+  }
+
+  test(s"getPartitionsByFilter should throw an exception if $partPruningFallbackKey=false") {
```
--- End diff --

Change test name to

```
s"getPartitionsByFilter should throw an exception when $tryDirectSqlKey=false and $partPruningFallbackKey=false"
```

?
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223424625

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---

```diff
@@ -79,12 +82,30 @@ class HiveClientSuite(version: String)
     client = init(true)
   }
 
-  test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
-    val client = init(false)
-    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
-      Seq(attr("ds") === 20170101))
+  test(s"getPartitionsByFilter returns all partitions when $partPruningFallbackKey=true") {
```
--- End diff --

Change test name to

```
s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false and $partPruningFallbackKey=true"
```

?
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223422030

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
     } else {
       logDebug(s"Hive metastore filter is '$filter'.")
+      val shouldFallback = SQLConf.get.metastorePartitionPruningFallback
       val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
       // We should get this config value from the metaStore. otherwise hit SPARK-18681.
       // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
       // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
       val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
-        // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
        getPartitionsByFilterMethod.invoke(hive, table, filter)
           .asInstanceOf[JArrayList[Partition]]
       } catch {
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            !tryDirectSql =>
-          logWarning("Caught Hive MetaException attempting to get partition metadata by " +
-            "filter from Hive. Falling back to fetching all partition metadata, which will " +
-            "degrade performance. Modifying your Hive metastore configuration to set " +
-            s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
-          // HiveShim clients are expected to handle a superset of the requested partitions
-          getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
-        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-            tryDirectSql =>
-          throw new RuntimeException("Caught Hive MetaException attempting to get partition " +
-            "metadata by filter from Hive. You can set the Spark configuration setting " +
-            s"${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to false to work around this " +
-            "problem, however this will result in degraded performance. Please report a bug: " +
-            "https://issues.apache.org/jira/browse/SPARK", ex)
+        case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
+          if (shouldFallback) {
+            if (!tryDirectSql) {
+              logWarning("Caught Hive MetaException attempting to get partition metadata by " +
+                "filter from Hive. Falling back to fetching all partition metadata, which will " +
+                "degrade performance. Modifying your Hive metastore configuration to set " +
+                s"${tryDirectSqlConfVar.varname} to true may resolve this problem.")
+            } else {
+              logWarning("Caught Hive MetaException attempting to get partition metadata " +
+                "by filter from Hive. Hive metastore's direct SQL feature has been enabled, " +
+                "but it is an optimistic optimization and not guaranteed to work. Falling back " +
+                "to fetching all partition metadata, which will degrade performance (for the " +
+                "current query). If you see this error consistently, you can set the Spark " +
+                s"configuration setting ${SQLConf.HIVE_MANAGE_FILESOURCE_PARTITIONS.key} to " +
+                "false as a work around, however this will result in degraded performance. " +
+                "Please report a bug to Hive stating that direct SQL is failing consistently " +
+                "for the specified query: https://issues.apache.org/jira/browse/HIVE")
```
--- End diff --

I think we should remove the suggestion to file a Hive project bug. Even with the direct SQL configuration setting enabled, there are valid metastore deployments for which it will be ignored. For example, my understanding is that if the metastore uses MongoDB for its underlying storage, the direct SQL configuration setting will be ignored. That means a failure here is not necessarily a bug in Hive's direct SQL support.
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223419868

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
         // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
+        // when filtering on a non-string partition column.
```
--- End diff --

Done @mallman
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user kmanamcheri commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223418766

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
         // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
+        // when filtering on a non-string partition column.
```
--- End diff --

> of course any method call may throw an exception in some circumstances.

:+1: Yes, you are right. I agree with removing comments that might mislead future developers. I'll rephrase this.
[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223415835

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
       try {
         // Hive may throw an exception when calling this method in some circumstances, such as
-        // when filtering on a non-string partition column when the hive config key
-        // hive.metastore.try.direct.sql is false
+        // when filtering on a non-string partition column.
```
--- End diff --

To me your revised comment suggests that if we try to filter on a non-string partition column we can expect the call to `getPartitionsByFilter` to fail. That's not true. Although I wrote the original comment, I can see how making an assumption about Hive's behavior when calling this method is rather specious and unwise. I suggest we just remove this comment entirely. To simply state that

```
Hive may throw an exception when calling this method in some circumstances.
```

only states the obvious: of course any method call may throw an exception in some circumstances.