date:20200108

[GitHub] [spark] SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-08 Thread GitBox

SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting 
partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572437346
 
 
   **[Test build #116362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116362/testReport)**
 for PR 27129 at commit 
[`abeb326`](https://github.com/apache/spark/commit/abeb32672b18690d58fa34c099845c6a777333ca).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong

2020-01-08 Thread GitBox

sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] 
The Executor Computing Time in Time Line of Stage Page is Wrong
URL: https://github.com/apache/spark/pull/27135#discussion_r364597689
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
 ##
 @@ -288,10 +288,10 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 val executorOverhead = serializationTime + deserializationTime
 val executorRunTime = if (taskInfo.duration.isDefined) {
-  totalExecutionTime - executorOverhead - gettingResultTime
+  math.max(totalExecutionTime - executorOverhead - gettingResultTime - 
schedulerDelay, 0)
 
 Review comment:
   It is impossible.
   I write math.max(xxx, 0) here,  just be consistent with the calculation of 
the executorComputingTimeProportion 
   L297:
   ```
val executorComputingTimeProportion =
 math.max(100 - schedulerDelayProportion - 
shuffleReadTimeProportion -
   shuffleWriteTimeProportion - serializationTimeProportion -
   deserializationTimeProportion - gettingResultTimeProportion, 0)
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong

2020-01-08 Thread GitBox

sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] 
The Executor Computing Time in Time Line of Stage Page is Wrong
URL: https://github.com/apache/spark/pull/27135#discussion_r364597689
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
 ##
 @@ -288,10 +288,10 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 
 val executorOverhead = serializationTime + deserializationTime
 val executorRunTime = if (taskInfo.duration.isDefined) {
-  totalExecutionTime - executorOverhead - gettingResultTime
+  math.max(totalExecutionTime - executorOverhead - gettingResultTime - 
schedulerDelay, 0)
 
 Review comment:
   It is impossible.
   I write math.max(xxx, 0) here,  just be consistent with the calculation of 
the executorComputingTimeProportion 
   L297:
val executorComputingTimeProportion =
 math.max(100 - schedulerDelayProportion - 
shuffleReadTimeProportion -
   shuffleWriteTimeProportion - serializationTimeProportion -
   deserializationTimeProportion - gettingResultTimeProportion, 0)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364597264
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
 ##
 @@ -35,21 +35,23 @@ case class DescribeNamespaceExec(
 namespace: Seq[String],
 isExtended: Boolean) extends V2CommandExec {
   private val encoder = 
RowEncoder(StructType.fromAttributes(output)).resolveAndBind()
+  import SupportsNamespaces._
 
   override protected def run(): Seq[InternalRow] = {
 val rows = new ArrayBuffer[InternalRow]()
 val ns = namespace.toArray
 val metadata = catalog.loadNamespaceMetadata(ns)
 
 rows += toCatalystRow("Namespace Name", ns.last)
-rows += toCatalystRow("Description", 
metadata.get(SupportsNamespaces.PROP_COMMENT))
-rows += toCatalystRow("Location", 
metadata.get(SupportsNamespaces.PROP_LOCATION))
+rows += toCatalystRow("Description", metadata.get(PROP_COMMENT))
+rows += toCatalystRow("Location", metadata.get(PROP_LOCATION))
+rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, 
""))
 
 Review comment:
   we should do the same for other fields.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

yaooqinn commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364596755
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 ##
 @@ -884,7 +885,8 @@ class DataSourceV2SQLSuite
 .isEmpty, s"$key is a reserved namespace property and ignored")
   val meta =
 
catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest"))
-  assert(meta.get(key) === null, "reserved properties should not have 
side effects")
+  assert(!Option(meta.get(key)).exists(_.contains("foo")),
 
 Review comment:
   this can be fixed if we revert the change in Create syntax


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572435529
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116361/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364596563
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala
 ##
 @@ -35,21 +35,23 @@ case class DescribeNamespaceExec(
 namespace: Seq[String],
 isExtended: Boolean) extends V2CommandExec {
   private val encoder = 
RowEncoder(StructType.fromAttributes(output)).resolveAndBind()
+  import SupportsNamespaces._
 
   override protected def run(): Seq[InternalRow] = {
 val rows = new ArrayBuffer[InternalRow]()
 val ns = namespace.toArray
 val metadata = catalog.loadNamespaceMetadata(ns)
 
 rows += toCatalystRow("Namespace Name", ns.last)
-rows += toCatalystRow("Description", 
metadata.get(SupportsNamespaces.PROP_COMMENT))
-rows += toCatalystRow("Location", 
metadata.get(SupportsNamespaces.PROP_LOCATION))
+rows += toCatalystRow("Description", metadata.get(PROP_COMMENT))
+rows += toCatalystRow("Location", metadata.get(PROP_LOCATION))
+rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, 
""))
 
 Review comment:
   we should skip printing owner if the properties don't exist


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432308
 
 
   **[Test build #116361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)**
 for PR 26813 at commit 
[`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572435523
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572435529
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116361/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572435523
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364596224
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 ##
 @@ -884,7 +885,8 @@ class DataSourceV2SQLSuite
 .isEmpty, s"$key is a reserved namespace property and ignored")
   val meta =
 
catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest"))
-  assert(meta.get(key) === null, "reserved properties should not have 
side effects")
+  assert(!Option(meta.get(key)).exists(_.contains("foo")),
 
 Review comment:
   what's wrong with `meta.get(key) === null`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed 
unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572435500
 
 
   **[Test build #116361 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)**
 for PR 26813 at commit 
[`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for 
limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572435187
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item 
for limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572435192
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21153/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item 
for limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572435187
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for 
limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#issuecomment-572435192
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21153/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364595504
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala
 ##
 @@ -35,11 +36,14 @@ case class CreateNamespaceExec(
 extends V2CommandExec {
   override protected def run(): Seq[InternalRow] = {
 import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+import org.apache.spark.sql.connector.catalog.SupportsNamespaces._
 
 val ns = namespace.toArray
 if (!catalog.namespaceExists(ns)) {
   try {
-catalog.createNamespace(ns, properties.asJava)
+val ownership =
+  Map(PROP_OWNER_NAME -> Utils.getCurrentUserName(), PROP_OWNER_TYPE 
-> "USER")
 
 Review comment:
   can we do it later? It's unclear to me that what we should do when CREATE 
TABLE without owner specified.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364594708
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -2532,6 +2532,9 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 case (PROP_COMMENT, _) =>
   throw new ParseException(s"$PROP_COMMENT is a reserved namespace 
property, please use" +
 s" the COMMENT clause to specify it.", ctx)
+case (ownership, _) if ownership == PROP_OWNER_NAME || ownership == 
PROP_OWNER_TYPE =>
+  throw new ParseException(s"$ownership is a reserved namespace 
property , please use" +
 
 Review comment:
   is `ownership` guaranteed to be upper case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364594708
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##
 @@ -2532,6 +2532,9 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
 case (PROP_COMMENT, _) =>
   throw new ParseException(s"$PROP_COMMENT is a reserved namespace 
property, please use" +
 s" the COMMENT clause to specify it.", ctx)
+case (ownership, _) if ownership == PROP_OWNER_NAME || ownership == 
PROP_OWNER_TYPE =>
+  throw new ParseException(s"$ownership is a reserved namespace 
property , please use" +
 
 Review comment:
   is `ownership` guaranteed to be upper case?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364594386
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java
 ##
 @@ -64,9 +64,16 @@
   String PROP_OWNER_TYPE = "ownerType";
 
   /**
-   * The list of reserved namespace properties.
+   * The list of immutable namespace properties, which can not be removed or 
changed directly by
+   * the syntax:
+   * {{
+   *   ALTER (DATABASE|SCHEMA|NAMESPACE) SET DBPROPERTIES(...)
 
 Review comment:
   let's use the recommended syntax in examples
   ```
   ALTER NAMESPACE ... SET PROPERTIES ...
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364594207
 
 

 ##
 File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java
 ##
 @@ -64,9 +64,16 @@
   String PROP_OWNER_TYPE = "ownerType";
 
   /**
-   * The list of reserved namespace properties.
+   * The list of immutable namespace properties, which can not be removed or 
changed directly by
 
 Review comment:
   immutable -> reserved


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] 
Support ALTER DATABASE SET OWNER syntax
URL: https://github.com/apache/spark/pull/26775#discussion_r364593906
 
 

 ##
 File path: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -1347,6 +1349,7 @@ nonReserved
 | OVERLAPS
 | OVERLAY
 | OVERWRITE
+| OWNER
 
 Review comment:
   this should also be put in `ansiNonReserved`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432869
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432882
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21152/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432882
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21152/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432869
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR edited a comment on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation

2020-01-08 Thread GitBox

HeartSaVioR edited a comment on issue #26889: [SPARK-30261][SQL] Should not 
change owner of hive table for some commands like 'alter' operation
URL: https://github.com/apache/spark/pull/26889#issuecomment-572432167
 
 
   Thanks for the contribution. @southernriver 
   
   1. Could you resolve the conflict?
   2. Could we add UT for verifying the patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation

2020-01-08 Thread GitBox

HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change 
owner of hive table for some commands like 'alter' operation
URL: https://github.com/apache/spark/pull/26889#issuecomment-572432167
 
 
   Thanks for the contribution. Could we add UT for verifying the patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation

2020-01-08 Thread GitBox

HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change 
owner of hive table for some commands like 'alter' operation
URL: https://github.com/apache/spark/pull/26889#issuecomment-572432353
 
 
   cc. @cloud-fan @gatorsmile @dongjoon-hyun 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed 
unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572432308
 
 
   **[Test build #116361 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)**
 for PR 26813 at commit 
[`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572429704
 
 
   **[Test build #116360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116360/testReport)**
 for PR 27136 at commit 
[`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] gatorsmile commented on issue #26847: [SPARK-30214][SQL] A new framework to resolve v2 commands

2020-01-08 Thread GitBox

gatorsmile commented on issue #26847: [SPARK-30214][SQL]  A new framework to 
resolve v2 commands
URL: https://github.com/apache/spark/pull/26847#issuecomment-572428660
 
 
   @imback82 COMMENT ON  is a good example, but the code changes for 
implementing COMMENT ON are not small. It might make the code review harder. I 
think we can follow what @rdblue and @brkyvz suggested above to separate the 
actual changes from COMMENT ON? 
   
   How about reverting it and submitted two separate PRs?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572427581
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21150/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572427581
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21150/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572427046
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116358/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572427584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21151/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572427575
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572427577
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572427584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21151/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572427575
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572427577
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572427038
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r364587799
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, 
AttributeSet, Expression, PredicateHelper, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.CommandUtils
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make 
hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def extractPartitionPruningFilters(
+  filters: Seq[Expression],
+  relation: HiveTableRelation): Seq[Expression] = {
+val normalizedFilters = filters.map { e =>
+  e transform {
+case a: AttributeReference =>
+  a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+  }
+}
+val partitionSet = AttributeSet(relation.partitionCols)
+normalizedFilters.filter { predicate =>
+  !predicate.references.isEmpty && 
predicate.references.subsetOf(partitionSet)
+}
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table,
+   * and also update the statistics of the table.
+   */
+  private def prunedHiveTableWithStats(
+  relation: HiveTableRelation,
+  partitionFilters: Seq[Expression]): HiveTableRelation = {
+val conf = session.sessionState.conf
+val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter(
+  relation.tableMeta.identifier,
+  partitionFilters)
+val sizeInBytes = try {
+  val sizeOfPartitions = prunedPartitions.map { partition =>
+val rawDataSize = 
partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
+val totalSize = 
partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
+if (rawDataSize.isDefined && rawDataSize.get > 0) {
+  rawDataSize.get
+} else if (totalSize.isDefined && totalSize.get > 0L) {
+  totalSize.get
+} else {
+  CommandUtils.calculateLocationSize(
+session.sessionState, relation.tableMeta.identifier, 
partition.storage.locationUri)
+}
+  }.sum
+  // If size of partitions is zero fall back to the default size.
+  if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions
+} catch {
+  case e: IOException =>
+logWarning("Failed to get table size from HDFS.", e)
+conf.defaultSizeInBytes
+}
+val newTableMeta =
+  if (relation.tableMeta.stats.isDefined) {
+relation.tableMeta.copy(
+  stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = 
BigInt(sizeInBytes
 
 Review comment:
   could be inconsistent, eg. rowCount and sizeInBytes may be inconsistent 
after this rule.
   Restored to creating new CatalogStatistics instance. But by doing so, some 
statistics may be lost which should not impact accuracy.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To

[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r364587799
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, 
AttributeSet, Expression, PredicateHelper, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.CommandUtils
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make 
hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def extractPartitionPruningFilters(
+  filters: Seq[Expression],
+  relation: HiveTableRelation): Seq[Expression] = {
+val normalizedFilters = filters.map { e =>
+  e transform {
+case a: AttributeReference =>
+  a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+  }
+}
+val partitionSet = AttributeSet(relation.partitionCols)
+normalizedFilters.filter { predicate =>
+  !predicate.references.isEmpty && 
predicate.references.subsetOf(partitionSet)
+}
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table,
+   * and also update the statistics of the table.
+   */
+  private def prunedHiveTableWithStats(
+  relation: HiveTableRelation,
+  partitionFilters: Seq[Expression]): HiveTableRelation = {
+val conf = session.sessionState.conf
+val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter(
+  relation.tableMeta.identifier,
+  partitionFilters)
+val sizeInBytes = try {
+  val sizeOfPartitions = prunedPartitions.map { partition =>
+val rawDataSize = 
partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
+val totalSize = 
partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
+if (rawDataSize.isDefined && rawDataSize.get > 0) {
+  rawDataSize.get
+} else if (totalSize.isDefined && totalSize.get > 0L) {
+  totalSize.get
+} else {
+  CommandUtils.calculateLocationSize(
+session.sessionState, relation.tableMeta.identifier, 
partition.storage.locationUri)
+}
+  }.sum
+  // If size of partitions is zero fall back to the default size.
+  if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions
+} catch {
+  case e: IOException =>
+logWarning("Failed to get table size from HDFS.", e)
+conf.defaultSizeInBytes
+}
+val newTableMeta =
+  if (relation.tableMeta.stats.isDefined) {
+relation.tableMeta.copy(
+  stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = 
BigInt(sizeInBytes
 
 Review comment:
   could be inconsistent, eg. rowCount and sizeInBytes may be inconsistent 
after this rule.
   So restored to creating new CatalogStatistics instance. But by doing so, 
some statistics may be lost which should not impact accuracy.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-

[GitHub] [spark] SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424510
 
 
   **[Test build #116358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)**
 for PR 26813 at commit 
[`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572427046
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116358/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed 
unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572427015
 
 
   **[Test build #116358 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)**
 for PR 26813 at commit 
[`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

cloud-fan commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572427078
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r364587799
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, 
AttributeSet, Expression, PredicateHelper, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.CommandUtils
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make 
hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def extractPartitionPruningFilters(
+  filters: Seq[Expression],
+  relation: HiveTableRelation): Seq[Expression] = {
+val normalizedFilters = filters.map { e =>
+  e transform {
+case a: AttributeReference =>
+  a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+  }
+}
+val partitionSet = AttributeSet(relation.partitionCols)
+normalizedFilters.filter { predicate =>
+  !predicate.references.isEmpty && 
predicate.references.subsetOf(partitionSet)
+}
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table,
+   * and also update the statistics of the table.
+   */
+  private def prunedHiveTableWithStats(
+  relation: HiveTableRelation,
+  partitionFilters: Seq[Expression]): HiveTableRelation = {
+val conf = session.sessionState.conf
+val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter(
+  relation.tableMeta.identifier,
+  partitionFilters)
+val sizeInBytes = try {
+  val sizeOfPartitions = prunedPartitions.map { partition =>
+val rawDataSize = 
partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
+val totalSize = 
partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
+if (rawDataSize.isDefined && rawDataSize.get > 0) {
+  rawDataSize.get
+} else if (totalSize.isDefined && totalSize.get > 0L) {
+  totalSize.get
+} else {
+  CommandUtils.calculateLocationSize(
+session.sessionState, relation.tableMeta.identifier, 
partition.storage.locationUri)
+}
+  }.sum
+  // If size of partitions is zero fall back to the default size.
+  if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions
+} catch {
+  case e: IOException =>
+logWarning("Failed to get table size from HDFS.", e)
+conf.defaultSizeInBytes
+}
+val newTableMeta =
+  if (relation.tableMeta.stats.isDefined) {
+relation.tableMeta.copy(
+  stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = 
BigInt(sizeInBytes
 
 Review comment:
   could be inconsistent, eg. rowCount and sizeInBytes is inconsistent.
   Restored to creating new CatalogStatistics instance. But by doing so, some 
statistics may be lost which should not impact accuracy.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail:

[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#discussion_r364588106
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala
 ##
 @@ -107,7 +111,8 @@ class DataFrameCallbackSuite extends QueryTest with 
SharedSparkSession {
 val df = Seq(1 -> "a").toDF("i", "j").groupBy("i").count()
 
 df.collect()
-// Wait for the first `collect` to be caught by our listener. Otherwise 
the next `collect` will
+// Wait for the first `collect` to be caught by our listener.
+// Otherwise the next `collect` will
 
 Review comment:
   unnecessary change


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572427096
 
 
   **[Test build #116359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116359/testReport)**
 for PR 26805 at commit 
[`0218202`](https://github.com/apache/spark/commit/02182020fcfa8d6867d2bedf678b4fdb71ac04ff).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572427038
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should 
not remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#discussion_r364580910
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
 ##
 @@ -308,4 +308,11 @@ class EliminateSortsSuite extends PlanTest {
 val correctAnswer = PushDownOptimizer.execute(noOrderByPlan.analyze)
 comparePlans(optimized, correctAnswer)
   }
+
+  test("should not remove orderBy in sortBy clause") {
+val plan = testRelation.orderBy('a.asc).sortBy('b.desc)
+val optimized = Optimize.execute(plan.analyze)
+val correctAnswer = testRelation.orderBy('a.asc).sortBy('b.desc).analyze
+comparePlans(optimized, correctAnswer)
 
 Review comment:
   New UT added for this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should 
not remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#discussion_r364580898
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##
 @@ -980,15 +980,25 @@ object EliminateSorts extends Rule[LogicalPlan] {
   if (newOrders.isEmpty) child else s.copy(order = newOrders)
 case Sort(orders, true, child) if 
SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   child
-case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
+case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(s, child))
 case j @ Join(originLeft, originRight, _, cond, _) if 
cond.forall(_.deterministic) =>
   j.copy(left = recursiveRemoveSort(originLeft), right = 
recursiveRemoveSort(originRight))
 case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
   g.copy(child = recursiveRemoveSort(originChild))
   }
 
+  private def recursiveRemoveSort(parent: Sort, plan: LogicalPlan): 
LogicalPlan = plan match {
+case s @ Sort(_, _, child) =>
+  if (parent.global == s.global) {
 
 Review comment:
   New UT added for this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL][WIP] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL][WIP] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#discussion_r364587799
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.io.IOException
+
+import org.apache.hadoop.hive.common.StatsSetupConst
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, 
HiveTableRelation}
+import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, 
AttributeSet, Expression, PredicateHelper, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.command.CommandUtils
+
+/**
+ * TODO: merge this with PruneFileSourcePartitions after we completely make 
hive as a data source.
+ */
+private[sql] class PruneHiveTablePartitions(session: SparkSession)
+  extends Rule[LogicalPlan] with PredicateHelper {
+  /**
+   * Extract the partition filters from the filters on the table.
+   */
+  private def extractPartitionPruningFilters(
+  filters: Seq[Expression],
+  relation: HiveTableRelation): Seq[Expression] = {
+val normalizedFilters = filters.map { e =>
+  e transform {
+case a: AttributeReference =>
+  a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+  }
+}
+val partitionSet = AttributeSet(relation.partitionCols)
+normalizedFilters.filter { predicate =>
+  !predicate.references.isEmpty && 
predicate.references.subsetOf(partitionSet)
+}
+  }
+
+  /**
+   * Prune the hive table using filters on the partitions of the table,
+   * and also update the statistics of the table.
+   */
+  private def prunedHiveTableWithStats(
+  relation: HiveTableRelation,
+  partitionFilters: Seq[Expression]): HiveTableRelation = {
+val conf = session.sessionState.conf
+val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter(
+  relation.tableMeta.identifier,
+  partitionFilters)
+val sizeInBytes = try {
+  val sizeOfPartitions = prunedPartitions.map { partition =>
+val rawDataSize = 
partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
+val totalSize = 
partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
+if (rawDataSize.isDefined && rawDataSize.get > 0) {
+  rawDataSize.get
+} else if (totalSize.isDefined && totalSize.get > 0L) {
+  totalSize.get
+} else {
+  CommandUtils.calculateLocationSize(
+session.sessionState, relation.tableMeta.identifier, 
partition.storage.locationUri)
+}
+  }.sum
+  // If size of partitions is zero fall back to the default size.
+  if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions
+} catch {
+  case e: IOException =>
+logWarning("Failed to get table size from HDFS.", e)
+conf.defaultSizeInBytes
+}
+val newTableMeta =
+  if (relation.tableMeta.stats.isDefined) {
+relation.tableMeta.copy(
+  stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = 
BigInt(sizeInBytes
 
 Review comment:
   could be inconsistent, eg. rowCount and sizeInBytes is inconsistent.
   Restored to new CatalogStatistics instance. But by doing so, some statistics 
may be lost which should not impact accuracy.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail:

[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#discussion_r364587953
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
 ##
 @@ -602,9 +602,14 @@ class AdaptiveQueryExecSuite
 
   test("SPARK-30403: AQE should handle InSubquery") {
 withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+  // If the subquery does not contain the shuffle node,
+  // it may get the "SubqueryAdaptiveNotSupportedException"
+  // and the main sql will also not be inserted the 
"AdaptiveSparkPlanExec" node.
+  // Here change the subquery to contain the exchange node.
 
 Review comment:
   we can revert the changes in this file now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fuwhu commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-08 Thread GitBox

fuwhu commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-572426719
 
 
   Did some refining, @cloud-fan please help review again. thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572425310
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116338/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#discussion_r364587258
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala
 ##
 @@ -39,11 +41,27 @@ case class InsertAdaptiveSparkPlan(
 
   private val conf = adaptiveExecutionContext.session.sessionState.conf
 
+  def containShuffle(plan: SparkPlan): Boolean = {
+plan.find {
+  case plan: SparkPlan => !plan.requiredChildDistribution.forall(_ == 
UnspecifiedDistribution)
+  case _: Exchange => true
+  case _ => false
+}.isDefined
+  }
+
+  def containSubQuery(plan: SparkPlan): Boolean = {
+plan.find(_.expressions.exists(_.find {
+  case _: SubqueryExpression => true
+  case _ => false
+}.isDefined)).isDefined
+  }
+
   override def apply(plan: SparkPlan): SparkPlan = applyInternal(plan, false)
 
   private def applyInternal(plan: SparkPlan, isSubquery: Boolean): SparkPlan = 
plan match {
 case _: ExecutedCommandExec => plan
-case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan) =>
+case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan)
+  && (containSubQuery(plan) || containShuffle(plan) || isSubquery) =>
 
 Review comment:
   nit: we should put `isSubquery` at the beginning to save the checking time 
for subqueries.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572425299
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572425310
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116338/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572425299
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424911
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support 
non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572424945
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support 
non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572424953
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21148/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

SparkQA removed a comment on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572364940
 
 
   **[Test build #116338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116338/testReport)**
 for PR 27136 at commit 
[`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424919
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21149/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2

2020-01-08 Thread GitBox

SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix 
ignoreMissingFiles/ignoreCorruptFiles in DSv2
URL: https://github.com/apache/spark/pull/27136#issuecomment-572424746
 
 
   **[Test build #116338 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116338/testReport)**
 for PR 27136 at commit 
[`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424911
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support 
non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572424945
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support 
non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572424953
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21148/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the 
failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424919
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21149/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

SparkQA commented on issue #27110: [SPARK-30439][SQL] Support non-nullable 
column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572424490
 
 
   **[Test build #116357 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116357/testReport)**
 for PR 27110 at commit 
[`0b047b5`](https://github.com/apache/spark/commit/0b047b5ac72a01ca0c120fb3b31321ddcd293a69).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed 
unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#issuecomment-572424510
 
 
   **[Test build #116358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)**
 for PR 26813 at commit 
[`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE

2020-01-08 Thread GitBox

cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] 
Resolve the failed unit tests when enable AQE
URL: https://github.com/apache/spark/pull/26813#discussion_r364586169
 
 

 ##
 File path: 
mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala
 ##
 @@ -41,6 +41,7 @@ import org.apache.spark.scheduler.{SparkListener, 
SparkListenerStageCompleted}
 import org.apache.spark.sql.{DataFrame, Encoder, Row, SparkSession}
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
 import org.apache.spark.sql.functions.{col, lit}
+import org.apache.spark.sql.internal.SQLConf
 
 Review comment:
   unnecessary import


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE

2020-01-08 Thread GitBox

cloud-fan commented on issue #27110: [SPARK-30439][SQL] Support non-nullable 
column in CREATE TABLE, ADD COLUMN and ALTER TABLE
URL: https://github.com/apache/spark/pull/27110#issuecomment-572423525
 
 
   I checked and we don't have any document for ALTER COLUMN. Shall we add it 
in a different PR? It's more than just the new syntax added here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow 
negative scale of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572422536
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative 
scale of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572422536
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative 
scale of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572422540
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21147/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow 
negative scale of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572422540
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21147/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode

2020-01-08 Thread GitBox

SparkQA commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale 
of Decimal under ansi mode
URL: https://github.com/apache/spark/pull/26881#issuecomment-572422106
 
 
   **[Test build #116356 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116356/testReport)**
 for PR 26881 at commit 
[`9d41ea2`](https://github.com/apache/spark/commit/9d41ea22b2ec6eeb296c5e7f456cb95bba3e7818).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support 
filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572420194
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support 
filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572420203
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21146/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters 
pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572420194
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters 
pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572420203
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21146/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-08 Thread GitBox

SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572419843
 
 
   **[Test build #116355 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116355/testReport)**
 for PR 26973 at commit 
[`17e742f`](https://github.com/apache/spark/commit/17e742f59ecccd5ad76da69676c8faea8a0494ad).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not 
remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#issuecomment-572417888
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not 
remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#issuecomment-572417898
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21145/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove 
orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#issuecomment-572417888
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove 
orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#issuecomment-572417898
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21145/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should 
not remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#discussion_r364580910
 
 

 ##
 File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
 ##
 @@ -308,4 +308,11 @@ class EliminateSortsSuite extends PlanTest {
 val correctAnswer = PushDownOptimizer.execute(noOrderByPlan.analyze)
 comparePlans(optimized, correctAnswer)
   }
+
+  test("should not remove orderBy in sortBy clause") {
+val plan = testRelation.orderBy('a.asc).sortBy('b.desc)
+val optimized = Optimize.execute(plan.analyze)
+val correctAnswer = testRelation.orderBy('a.asc).sortBy('b.desc).analyze
+comparePlans(optimized, correctAnswer)
 
 Review comment:
   I add new UT for this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should 
not remove orderBy in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#discussion_r364580898
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##
 @@ -980,15 +980,25 @@ object EliminateSorts extends Rule[LogicalPlan] {
   if (newOrders.isEmpty) child else s.copy(order = newOrders)
 case Sort(orders, true, child) if 
SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   child
-case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
+case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(s, child))
 case j @ Join(originLeft, originRight, _, cond, _) if 
cond.forall(_.deterministic) =>
   j.copy(left = recursiveRemoveSort(originLeft), right = 
recursiveRemoveSort(originRight))
 case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
   g.copy(child = recursiveRemoveSort(originChild))
   }
 
+  private def recursiveRemoveSort(parent: Sort, plan: LogicalPlan): 
LogicalPlan = plan match {
+case s @ Sort(_, _, child) =>
+  if (parent.global == s.global) {
 
 Review comment:
   I add new UT for this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer

2020-01-08 Thread GitBox

SparkQA commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy 
in sortBy clause in Optimizer
URL: https://github.com/apache/spark/pull/27077#issuecomment-572417353
 
 
   **[Test build #116354 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116354/testReport)**
 for PR 27077 at commit 
[`9d8b6ce`](https://github.com/apache/spark/commit/9d8b6cef5b5de527d6a3ffcf9d8de0f8ff99c407).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] nchammas commented on issue #26936: [SPARK-30296][SQL] Add Dataset diffing feature

2020-01-08 Thread GitBox

nchammas commented on issue #26936: [SPARK-30296][SQL] Add Dataset diffing 
feature
URL: https://github.com/apache/spark/pull/26936#issuecomment-572417229
 
 
   Yes, my bad. An equality check can be built on the same components as 
diffing, but they are indeed different. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression

2020-01-08 Thread GitBox

beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r364580011
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
 } else {
-  a
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+val phantomId = i + 1
 
 Review comment:
   The illusionary mechanism may result in multiple distinct aggregations uses 
different column, so we still need to call `rewrite`. (e.g., count(distinct 
phantom1-a) and count(distinct a phantom2-a))


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression

2020-01-08 Thread GitBox

beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r364580011
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
 } else {
-  a
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+val phantomId = i + 1
 
 Review comment:
   The illusionary mechanism may result in multiple distinct aggregations uses 
different column, so we still need to call `rewrite`. (e.g., `count(distinct 
phantom1-a) and count(distinct a phantom2-a)`)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression

2020-01-08 Thread GitBox

beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r364544828
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
 } else {
-  a
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map {
+  case (ae @ AggregateExpression(af, _, _, filter, _), i) =>
+val phantomId = i + 1
 
 Review comment:
   // Why do we need to construct the phantom id ?
   // First, In order to reduce costs, it is better to handle the 
filter clause locally.
   // e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate 
expression
   // If(id > 1) 'a else null first, and use the result as output.
   // Second, If more than one DISTINCT aggregate expression uses 
the same column,
   // We need to construct the phantom attributes so as the output 
not lost.
   // e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 
1) will output
   // attribute 'a and attribute 'phantom1-a instead of two 'a.
   // Note: We just need to illusion the expression with filter 
clause.
   // The illusionary mechanism may result in multiple distinct 
aggregations uses
   // different column, so we still need to call `rewrite`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-572415419
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size

2020-01-08 Thread GitBox

AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize 
skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#issuecomment-572415425
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21144/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1245 matches

Mail list logo