[GitHub] [spark] SparkQA removed a comment on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA removed a comment on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872759448


   **[Test build #140564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140564/testReport)** for PR 33172 at commit [`51546e8`](https://github.com/apache/spark/commit/51546e87c33064c80f902d4eecff1f640222dfb2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872771758


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45074/
   





[GitHub] [spark] cloud-fan commented on pull request #33177: [SPARK-35955][SQL] Check for overflow in Average in ANSI mode

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33177:
URL: https://github.com/apache/spark/pull/33177#issuecomment-872774201


   late LGTM





[GitHub] [spark] cloud-fan commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872782010


   retest this please





[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872784946


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45076/
   





[GitHub] [spark] linhongliu-db commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


linhongliu-db commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662800469



##
File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala
##
@@ -82,6 +82,9 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
 "postgreSQL/case.sql",
 // SPARK-28624
 "date.sql",
+"datetime.sql",
+"datetime-legacy.sql",
+"ansi/datetime.sql",

Review comment:
   Same reason as for "date.sql": the Thrift server can't handle negative years.







[GitHub] [spark] SparkQA commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


SparkQA commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872785323


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45075/
   





[GitHub] [spark] Ngone51 commented on a change in pull request #33034: WIP: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

2021-07-02 Thread GitBox


Ngone51 commented on a change in pull request #33034:
URL: https://github.com/apache/spark/pull/33034#discussion_r662801009



##
File path: 
common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##
@@ -222,7 +223,7 @@ public void sendMergedBlockMetaReq(
 handler.addRpcRequest(requestId, callback);
 RpcChannelListener listener = new RpcChannelListener(requestId, callback);
 channel.writeAndFlush(
-  new MergedBlockMetaRequest(requestId, appId, shuffleId, reduceId)).addListener(listener);
+  new MergedBlockMetaRequest(requestId, appId, shuffleId, shuffleSequenceId, reduceId)).addListener(listener);

Review comment:
   @venkata91 branch-3.2 has already been cut. I think we can just proceed with this PR; the idea sounds good to me. Could you update it accordingly? We have very limited time to merge it into branch-3.2.
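The diff above threads a new `shuffleSequenceId` through `MergedBlockMetaRequest`. The PR's actual mechanism is not described in this thread, so the following is only a hedged sketch of the general idea of tagging requests with a sequence id so that a retried (indeterminate) stage attempt can be told apart from an older one. `SequenceTracker`, `register` and `isStale` are hypothetical names, not Spark APIs.

```scala
import scala.collection.mutable

// Hypothetical sketch, not Spark's implementation: remembers the newest
// sequence id registered for each shuffle id so that requests carrying an
// older sequence id can be recognised as coming from a superseded attempt.
final class SequenceTracker {
  private val latest = mutable.HashMap.empty[Int, Int] // guarded by `this`

  // Records a sequence id for a shuffle and returns the newest one seen so far.
  def register(shuffleId: Int, sequenceId: Int): Int = synchronized {
    val newest = math.max(latest.getOrElse(shuffleId, sequenceId), sequenceId)
    latest.update(shuffleId, newest)
    newest
  }

  // True if a newer sequence id has already been registered for this shuffle.
  def isStale(shuffleId: Int, sequenceId: Int): Boolean = synchronized {
    latest.get(shuffleId).exists(_ > sequenceId)
  }
}
```

Whether a stale request should be rejected, ignored or answered with an error is a design choice this sketch deliberately leaves open.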







[GitHub] [spark] Ngone51 commented on pull request #33078: [SPARK-35546][Shuffle] Enable push-based shuffle when multiple app attempts are enabled and manage concurrent access to the state in a better

2021-07-02 Thread GitBox


Ngone51 commented on pull request #33078:
URL: https://github.com/apache/spark/pull/33078#issuecomment-872786808


   @zhouyejoe Could you address all the comments? Branch-3.2 has already been cut, so we now have very limited time to merge this into it.





[GitHub] [spark] linhongliu-db commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


linhongliu-db commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662803317



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
##
@@ -2684,16 +2682,13 @@ abstract class JsonSuite
   }
 
   test("SPARK-30960, SPARK-31641: parse date/timestamp string with legacy format") {
-val julianDay = -141704 // 1582-01-01 in Julian calendar
 val ds = Seq(
-  s"{'t': '2020-1-12 3:23:34.12', 'd': '2020-1-12 T', 'd2': '12345', 'd3': '$julianDay'}"

Review comment:
   Before this PR, '12345' and '-141704' are treated as epoch days because they're out of the - range.
   This is for backward compatibility with JSON data generated by Spark 1.5.
   But this compatibility is very confusing. For example, before this PR:
   '' will be converted to '-01-01' while '1' will be converted to '1997-05-19'.
   So I suggest just removing this compatibility.
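To make the ambiguity concrete, the sketch below reads the same digit string in two ways with `java.time`: as an epoch-day count (the legacy fallback described above) and as a year. `LegacyReadings`, `asEpochDays` and `asYear` are hypothetical helpers for illustration, not Spark code. Note that `java.time` uses the proleptic Gregorian calendar, not the Julian calendar that the removed `julianDay` comment refers to.

```scala
import java.time.LocalDate

// Hypothetical helpers: two readings of the same string of digits.
object LegacyReadings {
  // Legacy-style reading: the number is a count of days since 1970-01-01.
  def asEpochDays(digits: String): LocalDate = LocalDate.ofEpochDay(digits.toLong)

  // Full-range reading: the number is a year; java.time prints years above 9999 with a '+'.
  def asYear(digits: String): LocalDate = LocalDate.of(digits.toInt, 1, 1)
}
```

For '12345', `asEpochDays` gives 2003-10-20 while `asYear` gives +12345-01-01. For '-141704', `asEpochDays` gives 1582-01-11, the proleptic Gregorian equivalent of the Julian 1582-01-01 mentioned in the diff.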







[GitHub] [spark] cloud-fan commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662805329



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -269,10 +274,6 @@ object DateTimeUtils {
   i += 3
 } else if (i < 2) {
   if (b == '-') {
-if (i == 0 && j != 4) {
-  // year should have exact four digits

Review comment:
   Again, should we check the number of year digits? What if overflow happens, e.g. for `99-12-12`?






[GitHub] [spark] xuanyuanking commented on a change in pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on a change in pull request #32933:
URL: https://github.com/apache/spark/pull/32933#discussion_r662805978



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
##
@@ -207,6 +273,133 @@ class RocksDBSuite extends SparkFunSuite {
 }
   }
 
+  test("disallow concurrent updates to the same RocksDB instance") {
+quietly {
+  withDB(
+Utils.createTempDir().toString,
+conf = RocksDBConf().copy(lockAcquireTimeoutMs = 20)) { db =>
+// DB has been loaded so current thread has already acquired the lock on the RocksDB instance
+
+db.load(0)  // Current thread should be able to load again
+
+// Another thread should not be able to load while current thread is using it
+val ex = intercept[IllegalStateException] {
+  ThreadUtils.runInNewThread("concurrent-test-thread-1") { db.load(0) }
+}
+// Assert that the error message contains the stack trace
+assert(ex.getMessage.contains("Thread holding the lock has trace:"))
+assert(ex.getMessage.contains("runInNewThread"))
+
+// Commit should release the instance allowing other threads to load new version
+db.commit()
+ThreadUtils.runInNewThread("concurrent-test-thread-2") {
+  db.load(1)
+  db.commit()
+}
+
+// Another thread should not be able to load while current thread is using it
+db.load(2)
+intercept[IllegalStateException] {
+  ThreadUtils.runInNewThread("concurrent-test-thread-2") { db.load(2) }
+}
+
+// Rollback should release the instance allowing other threads to load new version
+db.rollback()
+ThreadUtils.runInNewThread("concurrent-test-thread-3") {
+  db.load(1)
+  db.commit()
+}
+  }
+}
+  }
+
+  test("ensure concurrent access lock is released after Spark task completes") {
+val conf = new SparkConf().setAppName("test").setMaster("local")
+val sc = new SparkContext(conf)
+
+try {
+  RocksDBSuite.withSingletonDB {
+// Load a RocksDB instance, that is, get a lock inside a task and then fail
+quietly {
+  intercept[Exception] {
+sc.makeRDD[Int](1 to 1, 1).map { i =>
+  RocksDBSuite.singleton.load(0)
+  throw new Exception("fail this task to test lock release")
+}.count()
+  }
+}
+
+// Test whether you can load again, that is, will it successfully lock again
+RocksDBSuite.singleton.load(0)
+  }
+} finally {
+  sc.stop()
+}
+  }
+
+  test("ensure that concurrent update and cleanup consistent versions") {
+quietly {
+  val numThreads = 20
+  val numUpdatesInEachThread = 20
+  val remoteDir = Utils.createTempDir().toString
+  @volatile var exception: Exception = null
+  val updatingThreads = Array.fill(numThreads) {
+new Thread() {
+  override def run(): Unit = {
+try {
+  for (version <- 0 to numUpdatesInEachThread) {
+withDB(

Review comment:
   Yeah, I think the purpose of this test is to make sure no error is thrown and the result is correct in the end.
   After taking a further look, there's a small issue: `exception` is never used. I'll confirm it separately.
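The issue with `exception` is that a failure recorded on a worker thread only matters if the test thread checks it after joining. A minimal sketch of that pattern, assuming plain `java.lang.Thread` workers as in the quoted test: `WorkerThreads.runAll` is a hypothetical helper, not part of `RocksDBSuite`, and it keeps the first failure in an `AtomicReference` and rethrows it on the calling thread.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical helper, not part of RocksDBSuite: runs `body(i)` on one thread
// per index and rethrows the first failure on the calling (test) thread.
object WorkerThreads {
  def runAll(numThreads: Int)(body: Int => Unit): Unit = {
    // compareAndSet keeps only the first failure, even if several threads fail.
    val firstFailure = new AtomicReference[Throwable](null)
    val threads = (0 until numThreads).map { i =>
      new Thread(() => {
        try body(i)
        catch {
          case t: Throwable =>
            firstFailure.compareAndSet(null, t)
            ()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join() also makes the workers' writes visible here
    val failure = firstFailure.get()
    if (failure != null) {
      throw new AssertionError("a worker thread failed", failure)
    }
  }
}
```

With this shape, a test that forgets to inspect the recorded failure cannot silently pass, because the check lives in the helper rather than in each test body.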







[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872790449


   **[Test build #140557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140557/testReport)** for PR 33172 at commit [`0d6b0c1`](https://github.com/apache/spark/commit/0d6b0c15e43e5e831a4ffd0063c0d6e20b032f03).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872790588


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45074/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA removed a comment on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872689889


   **[Test build #140557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140557/testReport)** for PR 33172 at commit [`0d6b0c1`](https://github.com/apache/spark/commit/0d6b0c15e43e5e831a4ffd0063c0d6e20b032f03).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872766563









[GitHub] [spark] viirya commented on a change in pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


viirya commented on a change in pull request #32933:
URL: https://github.com/apache/spark/pull/32933#discussion_r662807147



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
##
@@ -207,6 +273,133 @@ class RocksDBSuite extends SparkFunSuite {
 }
   }
 
+  test("disallow concurrent updates to the same RocksDB instance") {
+quietly {
+  withDB(
+Utils.createTempDir().toString,
+conf = RocksDBConf().copy(lockAcquireTimeoutMs = 20)) { db =>
+// DB has been loaded so current thread has already acquired the lock on the RocksDB instance
+
+db.load(0)  // Current thread should be able to load again
+
+// Another thread should not be able to load while current thread is using it
+val ex = intercept[IllegalStateException] {
+  ThreadUtils.runInNewThread("concurrent-test-thread-1") { db.load(0) }
+}
+// Assert that the error message contains the stack trace
+assert(ex.getMessage.contains("Thread holding the lock has trace:"))
+assert(ex.getMessage.contains("runInNewThread"))
+
+// Commit should release the instance allowing other threads to load new version
+db.commit()
+ThreadUtils.runInNewThread("concurrent-test-thread-2") {
+  db.load(1)
+  db.commit()
+}
+
+// Another thread should not be able to load while current thread is using it
+db.load(2)
+intercept[IllegalStateException] {
+  ThreadUtils.runInNewThread("concurrent-test-thread-2") { db.load(2) }
+}
+
+// Rollback should release the instance allowing other threads to load new version
+db.rollback()
+ThreadUtils.runInNewThread("concurrent-test-thread-3") {
+  db.load(1)
+  db.commit()
+}
+  }
+}
+  }
+
+  test("ensure concurrent access lock is released after Spark task completes") {
+val conf = new SparkConf().setAppName("test").setMaster("local")
+val sc = new SparkContext(conf)
+
+try {
+  RocksDBSuite.withSingletonDB {
+// Load a RocksDB instance, that is, get a lock inside a task and then fail
+quietly {
+  intercept[Exception] {
+sc.makeRDD[Int](1 to 1, 1).map { i =>
+  RocksDBSuite.singleton.load(0)
+  throw new Exception("fail this task to test lock release")
+}.count()
+  }
+}
+
+// Test whether you can load again, that is, will it successfully lock again
+RocksDBSuite.singleton.load(0)
+  }
+} finally {
+  sc.stop()
+}
+  }
+
+  test("ensure that concurrent update and cleanup consistent versions") {
+quietly {
+  val numThreads = 20
+  val numUpdatesInEachThread = 20
+  val remoteDir = Utils.createTempDir().toString
+  @volatile var exception: Exception = null
+  val updatingThreads = Array.fill(numThreads) {
+new Thread() {
+  override def run(): Unit = {
+try {
+  for (version <- 0 to numUpdatesInEachThread) {
+withDB(

Review comment:
   @xuanyuanking and I discussed this test offline. There seems to be something wrong with how `exception` is used; it doesn't look completely correct. @xuanyuanking will address it in a follow-up, either by fixing it or by deleting the test.
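For reference, the behaviour asserted by the first quoted test, "disallow concurrent updates to the same RocksDB instance", can be modelled with a small single-owner lock: the owning thread may `load` again, another thread gives up after `lockAcquireTimeoutMs` with an `IllegalStateException` that includes the holder's stack trace, and `commit`/`rollback` release it. `InstanceLock` below is a hypothetical sketch of those semantics only, not the locking code of the `RocksDB` class in this PR.

```scala
// Hypothetical sketch of the lock semantics the quoted test asserts; it is
// not the locking code of the RocksDB class in this PR.
final class InstanceLock(timeoutMs: Long) {
  private var owner: Thread = null // guarded by `this`

  // The owning thread may call acquire() again (a no-op); any other thread
  // waits up to timeoutMs and then fails, reporting the owner's stack trace.
  def acquire(): Unit = synchronized {
    val me = Thread.currentThread()
    val deadline = System.currentTimeMillis() + timeoutMs
    while (owner != null && (owner ne me)) {
      val remaining = deadline - System.currentTimeMillis()
      if (remaining <= 0) {
        val trace = owner.getStackTrace.mkString("\n\t")
        throw new IllegalStateException(
          s"Could not acquire lock within $timeoutMs ms. Thread holding the lock has trace:\n\t$trace")
      }
      wait(remaining) // releases the monitor until release() calls notifyAll()
    }
    owner = me
  }

  // Models commit()/rollback(): frees the lock for other threads.
  // This sketch does not check that the caller is the owner.
  def release(): Unit = synchronized {
    owner = null
    notifyAll()
  }
}
```

Because re-acquiring is a no-op rather than a counted re-entry, a single `release()` frees the lock, which matches the test's expectation that one `commit()` or `rollback()` lets another thread load.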







[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872791283


   **[Test build #140565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140565/testReport)** for PR 33172 at commit [`51546e8`](https://github.com/apache/spark/commit/51546e87c33064c80f902d4eecff1f640222dfb2).





[GitHub] [spark] SparkQA commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


SparkQA commented on pull request #32959:
URL: https://github.com/apache/spark/pull/32959#issuecomment-872791394


   **[Test build #140566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140566/testReport)** for PR 32959 at commit [`5b4fe62`](https://github.com/apache/spark/commit/5b4fe6236fd799cf175b869aa5f63cc20ded8f9e).





[GitHub] [spark] cloud-fan commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662807650



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -224,12 +224,12 @@ object DateTimeUtils {
* value. The return type is [[Option]] in order to distinguish between 0L and null. The following
* formats are allowed:
*
-   * ``
-   * `-[m]m`
-   * `-[m]m-[d]d`
-   * `-[m]m-[d]d `
-   * `-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
-   * `-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+   * `[+-]y*`
+   * `[+-]y*-[m]m`
+   * `[+-]y*-[m]m-[d]d`
+   * `[+-]y*-[m]m-[d]d `
+   * `[+-]y*-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+   * `[+-]y*-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
* `[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`

Review comment:
   For `+12:12:12`, do we fail or simply ignore the `+`?
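The question is one of policy: whether a leading sign is accepted on a time-only string. As a hedged sketch of the stricter option only, the regex-based classifier below accepts a sign solely in front of the year of a date-shaped string, so `+12:12:12` is rejected. It is hypothetical (`DateShapes` is not part of `DateTimeUtils`, which parses the input byte by byte) and covers only the date-only and time-only shapes from the doc comment, ignoring zone ids and the combined timestamp forms.

```scala
// Hypothetical classifier for two of the documented shapes; not Spark's parser.
object DateShapes {
  sealed trait Shape
  case object DateOnly extends Shape // [+-]y*, [+-]y*-[m]m, [+-]y*-[m]m-[d]d
  case object TimeOnly extends Shape // [h]h:[m]m:[s]s with an optional fraction
  case object Rejected extends Shape

  // The sign is only allowed before the year; the year has one or more digits.
  private val DatePattern = """([+-]?)(\d+)(?:-(\d{1,2})(?:-(\d{1,2}))?)?""".r
  // No sign allowed, so "+12:12:12" does not match. Zone ids are not handled here.
  private val TimePattern = """(\d{1,2}):(\d{1,2}):(\d{1,2})(?:\.\d{1,6})?""".r

  // Regex extractors in patterns must match the whole string.
  def classify(s: String): Shape = s match {
    case DatePattern(_, _, _, _) => DateOnly
    case TimePattern(_, _, _)    => TimeOnly
    case _                       => Rejected
  }
}
```

The lenient alternative, ignoring the `+`, would only need a `[+-]?` prefix on `TimePattern`; which behaviour is right is exactly what the review question asks.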







[GitHub] [spark] AmplabJenkins commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872791473


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140557/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872441605


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140521/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872791473


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140557/
   





[GitHub] [spark] viirya commented on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


viirya commented on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872793167


   Thanks @xuanyuanking for working on this and @HeartSaVioR for the review! 
Merging to master.





[GitHub] [spark] xuanyuanking commented on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872793590


   Thanks a lot for the help, @viirya @HeartSaVioR!
   I'll update the remaining PRs and submit the last one, for RocksDBStateStoreProvider, today.





[GitHub] [spark] viirya closed pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


viirya closed pull request #32933:
URL: https://github.com/apache/spark/pull/32933


   





[GitHub] [spark] viirya commented on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


viirya commented on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872795036


   Ah, sorry, I forgot that this should go into branch-3.2 too. @xuanyuanking, can you submit a PR for branch-3.2?
   





[GitHub] [spark] SparkQA commented on pull request #33140: [SPARK-35881][SQL] Add support for columnar execution of final query stage in AdaptiveSparkPlanExec

2021-07-02 Thread GitBox


SparkQA commented on pull request #33140:
URL: https://github.com/apache/spark/pull/33140#issuecomment-872795927


   **[Test build #140558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140558/testReport)**
 for PR 33140 at commit 
[`5dcf102`](https://github.com/apache/spark/commit/5dcf102d533da1916e910f91feafed9f626dad46).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872796115


   Sure. Let me do it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-07-02 Thread GitBox


Ngone51 commented on a change in pull request #33101:
URL: https://github.com/apache/spark/pull/33101#discussion_r662813036



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -285,9 +285,10 @@ private[spark] object Utils extends Logging {
    */
   def createDirectory(dir: File): Boolean = {
     try {
-      // This sporadically fails - not sure why ... !dir.exists() && !dir.mkdirs()
-      // So attempting to create and then check if directory was created or not.
-      dir.mkdirs()
+      // SPARK-35907
+      // This could throw more meaningful exception information if directory creation failed.
+      // To be on the safe side, try to create and then check if directory was created or not.
+      Files.createDirectories(dir.toPath)
       if ( !dir.exists() || !dir.isDirectory) {
         logError(s"Failed to create directory " + dir)
       }

Review comment:
   @Shockang Could you leave a comment above the check to give the historical context, e.g.,
   
   "The check was required by File#mkdirs() because it could sporadically fail silently. After switching to Files.createDirectories(), ideally, there should no longer be silent failures. But the check is kept for safety. We can remove the check when we're sure that Files.createDirectories() would never fail silently." 
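For readers less familiar with the two APIs, here is a minimal standalone sketch of the pattern under review, assuming only JDK classes: `Files.createDirectories` reports why creation failed through an `IOException`, whereas `File#mkdirs` just returns `false`. Logging is replaced with `System.err` for self-containment; this is an illustration, not Spark's actual `Utils` code.

```scala
import java.io.{File, IOException}
import java.nio.file.Files

// Sketch only: the shape of the change under review, not Spark's actual Utils code.
def createDirectory(dir: File): Boolean = {
  try {
    // Unlike File#mkdirs(), which only returns false, this throws an IOException that
    // explains the failure (for example, a regular file already occupies the path).
    // It does nothing if the directory already exists.
    Files.createDirectories(dir.toPath)
    // Historically File#mkdirs() could fail silently, so the result is still checked.
    if (!dir.exists() || !dir.isDirectory) {
      System.err.println(s"Failed to create directory $dir")
    }
    dir.isDirectory
  } catch {
    case e: IOException =>
      System.err.println(s"Failed to create directory $dir: $e")
      false
  }
}
```

Creating a nested path succeeds in one call, calling it again is harmless, and pointing it at an existing regular file returns `false` after logging the exception message.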




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya edited a comment on pull request #32933: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


viirya edited a comment on pull request #32933:
URL: https://github.com/apache/spark/pull/32933#issuecomment-872795036


   Ah, sorry, I forgot branch-3.2 was cut and this should be in branch-3.2 too. @xuanyuanking Can you submit a PR for 3.2?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #33140: [SPARK-35881][SQL] Add support for columnar execution of final query stage in AdaptiveSparkPlanExec

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33140:
URL: https://github.com/apache/spark/pull/33140#issuecomment-872797123


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140558/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-07-02 Thread GitBox


Ngone51 commented on a change in pull request #33101:
URL: https://github.com/apache/spark/pull/33101#discussion_r662814514



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -285,9 +285,10 @@ private[spark] object Utils extends Logging {
    */
   def createDirectory(dir: File): Boolean = {
     try {
-      // This sporadically fails - not sure why ... !dir.exists() && !dir.mkdirs()
-      // So attempting to create and then check if directory was created or not.
-      dir.mkdirs()
+      // SPARK-35907
+      // This could throw more meaningful exception information if directory creation failed.

Review comment:
   nit: "SPARK-35907: This could..."




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #33101: [SPARK-35907][CORE] Instead of File#mkdirs, Files#createDirectories is expected.

2021-07-02 Thread GitBox


Ngone51 commented on a change in pull request #33101:
URL: https://github.com/apache/spark/pull/33101#discussion_r662815246



##
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##
@@ -315,10 +316,14 @@ private[spark] object Utils extends Logging {
       }
       try {
         dir = new File(root, namePrefix + "-" + UUID.randomUUID.toString)
-        if (dir.exists() || !dir.mkdirs()) {
+        // SPARK-35907
+        // This could throw more meaningful exception information if directory creation failed.

Review comment:
   nit: "SPARK-35907: This could..."
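The context lines of this hunk show a retry loop that picks a random UUID-suffixed name under `root`. A hedged sketch of how such a loop could look with `java.nio.file.Files`; `createUniqueDirectory` and its `maxAttempts` parameter are hypothetical names, and the actual patch may be structured differently.

```scala
import java.io.{File, IOException}
import java.nio.file.Files
import java.util.UUID

// Hypothetical sketch: create a uniquely named directory under `root`, retrying with a new
// random name on failure. `maxAttempts` is illustrative, not Spark's actual constant.
def createUniqueDirectory(
    root: String,
    namePrefix: String = "spark",
    maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(s"Failed to create a directory under $root after $maxAttempts attempts")
    }
    val candidate = new File(root, namePrefix + "-" + UUID.randomUUID.toString)
    try {
      // A clash with an existing name counts as a failed attempt, as in the original check.
      if (!candidate.exists()) {
        // Throws an IOException carrying the reason instead of silently returning false.
        Files.createDirectories(candidate.toPath)
        dir = candidate
      }
    } catch {
      case e: IOException =>
        System.err.println(s"Attempt $attempts failed to create $candidate: $e")
    }
  }
  dir.getCanonicalFile
}
```

Each failed attempt now leaves a log line with the underlying cause, which is the motivation stated in the SPARK-35907 comment.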




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Yikun commented on pull request #33174: [SPARK-35721][PYTHON] Path level discover for python unittests

2021-07-02 Thread GitBox


Yikun commented on pull request #33174:
URL: https://github.com/apache/spark/pull/33174#issuecomment-872798365


   cc @HyukjinKwon @ueshin @viirya @xinrong-databricks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


c21 commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662816765



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -3555,6 +3564,9 @@ class SQLConf extends Serializable with Logging {
 
   def coalesceShufflePartitionsEnabled: Boolean = getConf(COALESCE_PARTITIONS_ENABLED)
 
+  def adaptiveCustomCostEvaluatorClass: Option[String] =

Review comment:
   @cloud-fan - sure, updated.
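The new accessor returns `Option[String]`, so an unset value can mean "use the built-in evaluator". A minimal sketch of that idea, with a plain `Map` standing in for `SQLConf`; `ConfSketch`, `evaluatorName` and the key string are illustrative assumptions rather than Spark's definitions.

```scala
// Assumed key name, for illustration only.
val CustomCostEvaluatorKey = "spark.sql.adaptive.customCostEvaluatorClass"

// Hypothetical stand-in for SQLConf backed by an immutable Map.
final class ConfSketch(settings: Map[String, String]) {
  // None when no custom evaluator class has been configured.
  def adaptiveCustomCostEvaluatorClass: Option[String] = settings.get(CustomCostEvaluatorKey)
}

// Falls back to the built-in choice when the option is unset.
def evaluatorName(conf: ConfSketch): String =
  conf.adaptiveCustomCostEvaluatorClass.getOrElse("<built-in evaluator>")
```

Using `Option` instead of a sentinel string keeps "not configured" distinct from any real class name.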




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #33070:
URL: https://github.com/apache/spark/pull/33070#discussion_r662817040



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
##
@@ -212,14 +214,38 @@ object DecorrelateInnerQuery extends PredicateHelper {
   }
 
   /**
-   * Rewrite all [[DomainJoin]]s in the inner query to actual inner joins with the outer query.
+   * Rewrite all [[DomainJoin]]s in the inner query to actual joins with the outer query.
    */
   def rewriteDomainJoins(
       outerPlan: LogicalPlan,
       innerPlan: LogicalPlan,
       conditions: Seq[Expression]): LogicalPlan = innerPlan match {
-    case d @ DomainJoin(domainAttrs, child) =>
+    case d @ DomainJoin(domainAttrs, child, joinType, condition) =>
       val domainAttrMap = buildDomainAttrMap(conditions, domainAttrs)
+
+      val newChild = joinType match {
+        // Left outer domain joins are used to handle the COUNT bug.
+        case LeftOuter =>
+          // Replace the attributes in the domain join condition with the actual outer expressions
+          // and use the new join conditions to rewrite domain joins in its child. For example:
+          // DomainJoin [c'] LeftOuter (a = c') with domainAttrMap: { c' -> _1 }.
+          // Then the new conditions to use will be [(a = _1)].
+          val newConditions = condition.map(
+            _.transform { case a: Attribute => domainAttrMap.getOrElse(a, a)}
+          ).map(splitConjunctivePredicates).getOrElse(conditions)

Review comment:
   That said, for left outer domain join, we don't need to propagate the `conditions` parameter of `rewriteDomainJoins` when recursively calling `rewriteDomainJoins`?
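The substitution described in the code comment (`DomainJoin [c'] LeftOuter (a = c')` with `{ c' -> _1 }` giving `[(a = _1)]`) can be reproduced on a toy expression tree. `Expr`, `Attr`, `And`, `EqualTo`, `splitConjuncts` and `rewriteCondition` are hypothetical stand-ins for Catalyst's `Expression`, `Attribute` and `splitConjunctivePredicates`, not their real implementations.

```scala
// Toy expression tree standing in for Catalyst expressions (hypothetical).
sealed trait Expr {
  // Top-down rewrite: apply `rule` to this node if defined, then recurse into the children.
  def transform(rule: PartialFunction[Expr, Expr]): Expr = {
    val rewritten = if (rule.isDefinedAt(this)) rule(this) else this
    rewritten match {
      case And(l, r)     => And(l.transform(rule), r.transform(rule))
      case EqualTo(l, r) => EqualTo(l.transform(rule), r.transform(rule))
      case leaf          => leaf
    }
  }
}
case class Attr(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

// Splits a conjunction into its individual predicates.
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

// Replaces domain attributes with the outer expressions they map to, then splits the result.
def rewriteCondition(condition: Expr, domainAttrMap: Map[Attr, Expr]): Seq[Expr] =
  splitConjuncts(condition.transform { case a: Attr => domainAttrMap.getOrElse(a, a) })
```

With `a = c'` and `{ c' -> _1 }`, `rewriteCondition` returns `Seq(EqualTo(Attr("a"), Attr("_1")))`, matching the example in the diff. The new conditions come entirely from the domain join's own condition, which is the point of the review question.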




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


SparkQA commented on pull request #32944:
URL: https://github.com/apache/spark/pull/32944#issuecomment-872799742


   **[Test build #140567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140567/testReport)**
 for PR 32944 at commit 
[`e202aa8`](https://github.com/apache/spark/commit/e202aa87023d97bae7964cb42f1fce2d5efc1a88).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #33070: [SPARK-35551][SQL] Handle the COUNT bug for lateral subqueries

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #33070:
URL: https://github.com/apache/spark/pull/33070#discussion_r662817414



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala
##
@@ -212,14 +214,38 @@ object DecorrelateInnerQuery extends PredicateHelper {
   }
 
   /**
-   * Rewrite all [[DomainJoin]]s in the inner query to actual inner joins with the outer query.
+   * Rewrite all [[DomainJoin]]s in the inner query to actual joins with the outer query.
    */
   def rewriteDomainJoins(
       outerPlan: LogicalPlan,
       innerPlan: LogicalPlan,
       conditions: Seq[Expression]): LogicalPlan = innerPlan match {
-    case d @ DomainJoin(domainAttrs, child) =>
+    case d @ DomainJoin(domainAttrs, child, joinType, condition) =>
       val domainAttrMap = buildDomainAttrMap(conditions, domainAttrs)
+
+      val newChild = joinType match {
+        // Left outer domain joins are used to handle the COUNT bug.
+        case LeftOuter =>
+          // Replace the attributes in the domain join condition with the actual outer expressions
+          // and use the new join conditions to rewrite domain joins in its child. For example:
+          // DomainJoin [c'] LeftOuter (a = c') with domainAttrMap: { c' -> _1 }.
+          // Then the new conditions to use will be [(a = _1)].
+          val newConditions = condition.map(
+            _.transform { case a: Attribute => domainAttrMap.getOrElse(a, a)}
+          ).map(splitConjunctivePredicates).getOrElse(conditions)

Review comment:
   and let's make the code more explicit
   ```
   assert(condition.isDefined)
   val newConditions = condition.get.transform ...
   ```
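To show the difference between the two styles, here is a sketch that uses plain strings in place of expressions, purely to keep it short. The `getOrElse` form silently falls back to the incoming `conditions` when the join has no condition, while the suggested form fails fast. `substitute`, `split`, `newConditionsImplicit` and `newConditionsExplicit` are all hypothetical helpers.

```scala
// Naive textual substitution standing in for an attribute-rewriting transform.
def substitute(cond: String, attrMap: Map[String, String]): String =
  attrMap.foldLeft(cond) { case (acc, (from, to)) => acc.replace(from, to) }

// Stand-in for splitConjunctivePredicates on an "x AND y" string.
def split(cond: String): Seq[String] = cond.split(" AND ").toSeq.map(_.trim)

// Original style: a missing condition silently falls back to the incoming conditions.
def newConditionsImplicit(
    condition: Option[String],
    conditions: Seq[String],
    attrMap: Map[String, String]): Seq[String] =
  condition.map(substitute(_, attrMap)).map(split).getOrElse(conditions)

// Suggested style: a left outer domain join must carry a condition, so fail fast otherwise.
def newConditionsExplicit(condition: Option[String], attrMap: Map[String, String]): Seq[String] = {
  assert(condition.isDefined, "left outer domain join must have a join condition")
  split(substitute(condition.get, attrMap))
}
```

The explicit form surfaces a broken invariant as an `AssertionError` instead of quietly reusing unrelated conditions.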




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


c21 commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662817747



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1885,4 +1885,54 @@ class AdaptiveQueryExecSuite
       }
     }
   }
+
+  test("SPARK-35794: Allow custom plugin for cost evaluator") {
+    CostEvaluator.instantiate(
+      classOf[SimpleShuffleSortCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    intercept[IllegalArgumentException] {
+      CostEvaluator.instantiate(
+        classOf[InvalidCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    }
+
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "80") {
+      val query = "SELECT * FROM testData join testData2 ON key = a where value = '1'"
+
+      withSQLConf(SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS.key ->
+        "org.apache.spark.sql.execution.adaptive.SimpleShuffleSortCostEvaluator") {

Review comment:
   @cloud-fan - this evaluator does not change the plan, which stays the same as with the built-in evaluator for this query. Do we want to come up with a different one here? I think this just validates that the custom evaluator works.
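The test above expects `CostEvaluator.instantiate` to reject a class that is not a valid evaluator with `IllegalArgumentException`. A generic sketch of that kind of by-name plugin loading, assuming plain Java reflection and a public no-argument constructor. `instantiatePlugin` is a hypothetical helper and omits the `SparkConf` argument that the method in the test takes.

```scala
import scala.reflect.ClassTag

// Hypothetical sketch: load a class by name and check it implements the expected type T.
def instantiatePlugin[T](className: String)(implicit tag: ClassTag[T]): T = {
  val expected = tag.runtimeClass
  val clazz = Class.forName(className)
  if (!expected.isAssignableFrom(clazz)) {
    throw new IllegalArgumentException(s"$className is not a subclass of ${expected.getName}")
  }
  // Assumes the plugin class has a public no-argument constructor.
  clazz.getDeclaredConstructor().newInstance().asInstanceOf[T]
}
```

JDK classes can serve as stand-in "plugins" for a quick check: `instantiatePlugin[java.util.List[String]]("java.util.ArrayList")` succeeds, while passing `"java.lang.StringBuilder"` throws `IllegalArgumentException`.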




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sarutak commented on pull request #33181: [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals

2021-07-02 Thread GitBox


sarutak commented on pull request #33181:
URL: https://github.com/apache/spark/pull/33181#issuecomment-872800794


   retest this please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33181: [SPARK-35982][SQL] Allow from_json/to_json for map types where value types are year-month intervals

2021-07-02 Thread GitBox


SparkQA commented on pull request #33181:
URL: https://github.com/apache/spark/pull/33181#issuecomment-872801877


   **[Test build #140568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140568/testReport)**
 for PR 33181 at commit 
[`e795847`](https://github.com/apache/spark/commit/e795847f86274a1d112f11082e0b90917850acef).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33182: [SPARK-35984][SQL] Config to force applying shuffled hash join

2021-07-02 Thread GitBox


SparkQA commented on pull request #33182:
URL: https://github.com/apache/spark/pull/33182#issuecomment-872802886


   **[Test build #140556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140556/testReport)**
 for PR 33182 at commit 
[`24b39a9`](https://github.com/apache/spark/commit/24b39a9667365428a3bbd5f2fe9a92face499420).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33140: [SPARK-35881][SQL] Add support for columnar execution of final query stage in AdaptiveSparkPlanExec

2021-07-02 Thread GitBox


SparkQA removed a comment on pull request #33140:
URL: https://github.com/apache/spark/pull/33140#issuecomment-872689929


   **[Test build #140558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140558/testReport)**
 for PR 33140 at commit 
[`5dcf102`](https://github.com/apache/spark/commit/5dcf102d533da1916e910f91feafed9f626dad46).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #33182: [SPARK-35984][SQL] Config to force applying shuffled hash join

2021-07-02 Thread GitBox


SparkQA removed a comment on pull request #33182:
URL: https://github.com/apache/spark/pull/33182#issuecomment-872689863


   **[Test build #140556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140556/testReport)**
 for PR 33182 at commit 
[`24b39a9`](https://github.com/apache/spark/commit/24b39a9667365428a3bbd5f2fe9a92face499420).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872804022


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45076/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


SparkQA commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872804488


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45075/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872805469


   thanks for the review, merging to master/3.2 (since it kind of fixes AQE perf issues in the default case)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


cloud-fan closed pull request #33172:
URL: https://github.com/apache/spark/pull/33172


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking opened a new pull request #33184: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking opened a new pull request #33184:
URL: https://github.com/apache/spark/pull/33184


   ### What changes were proposed in this pull request?
   Add the functionality of cleaning up files of old versions for the RocksDB instance and RocksDBFileManager.
   
   ### Why are the changes needed?
   Part of the implementation of RocksDB state store.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   New UT added.
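As a rough illustration of "cleaning up files of old versions", the sketch below keeps the newest N versions in a directory and deletes the older ones. The `<version>.zip` naming and `deleteOldVersions` are assumptions made for illustration only. The actual RocksDB state store file layout and retention rules are defined in the PR, not here.

```scala
import java.io.File

// Hypothetical sketch: keep the `numVersionsToRetain` newest "<version>.zip" files in `dir`
// and delete older ones. Returns the deleted version numbers in ascending order.
def deleteOldVersions(dir: File, numVersionsToRetain: Int): Seq[Long] = {
  require(numVersionsToRetain > 0, "must retain at least one version")
  val VersionFile = """(\d+)\.zip""".r
  val files: Seq[File] = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
  // Files that do not match the assumed naming scheme are ignored.
  val versions: Seq[(Long, File)] = files.flatMap { f =>
    f.getName match {
      case VersionFile(v) => Some(v.toLong -> f)
      case _              => None
    }
  }
  val toDelete = versions.sortBy(_._1).dropRight(numVersionsToRetain)
  toDelete.foreach { case (_, f) => f.delete() }
  toDelete.map(_._1)
}
```

Sorting by the parsed version number rather than by file name avoids treating `10.zip` as older than `9.zip`.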


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xuanyuanking commented on pull request #33184: [SPARK-35785][SS] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on pull request #33184:
URL: https://github.com/apache/spark/pull/33184#issuecomment-872806205


   cc @HeartSaVioR and @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662823916



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1885,4 +1885,54 @@ class AdaptiveQueryExecSuite
       }
     }
   }
+
+  test("SPARK-35794: Allow custom plugin for cost evaluator") {
+    CostEvaluator.instantiate(
+      classOf[SimpleShuffleSortCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    intercept[IllegalArgumentException] {
+      CostEvaluator.instantiate(
+        classOf[InvalidCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    }
+
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "80") {
+      val query = "SELECT * FROM testData join testData2 ON key = a where value = '1'"
+
+      withSQLConf(SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS.key ->
+        "org.apache.spark.sql.execution.adaptive.SimpleShuffleSortCostEvaluator") {

Review comment:
   SGTM, let's leave it then




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


cloud-fan commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662823916



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##
@@ -1885,4 +1885,54 @@ class AdaptiveQueryExecSuite
       }
     }
   }
+
+  test("SPARK-35794: Allow custom plugin for cost evaluator") {
+    CostEvaluator.instantiate(
+      classOf[SimpleShuffleSortCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    intercept[IllegalArgumentException] {
+      CostEvaluator.instantiate(
+        classOf[InvalidCostEvaluator].getCanonicalName, spark.sparkContext.getConf)
+    }
+
+    withSQLConf(
+      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "80") {
+      val query = "SELECT * FROM testData join testData2 ON key = a where value = '1'"
+
+      withSQLConf(SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS.key ->
+        "org.apache.spark.sql.execution.adaptive.SimpleShuffleSortCostEvaluator") {

Review comment:
   SGTM




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #32477: [SPARK-35348][SQL] Support the utils for escapse the regex for ANSI SQL: SIMILAR TO … ESCAPE syntax

2021-07-02 Thread GitBox


beliefer commented on pull request #32477:
URL: https://github.com/apache/spark/pull/32477#issuecomment-872806953


   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


cloud-fan commented on pull request #32944:
URL: https://github.com/apache/spark/pull/32944#issuecomment-872807066


   @c21 can you fix the code conflicts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872807635


   cc @karenfeng 


[GitHub] [spark] cloud-fan commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872808864


   @AngersZh can you add a test?


[GitHub] [spark] HeartSaVioR closed pull request #33184: [SPARK-35785][SS][3.2] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


HeartSaVioR closed pull request #33184:
URL: https://github.com/apache/spark/pull/33184


   


[GitHub] [spark] HeartSaVioR commented on pull request #33184: [SPARK-35785][SS][3.2] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


HeartSaVioR commented on pull request #33184:
URL: https://github.com/apache/spark/pull/33184#issuecomment-872809187


   Merged to branch-3.2.


[GitHub] [spark] HeartSaVioR commented on pull request #33184: [SPARK-35785][SS][3.2] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


HeartSaVioR commented on pull request #33184:
URL: https://github.com/apache/spark/pull/33184#issuecomment-872809916


   Forgot my +1. This is just a cherry-pick of #32933.


[GitHub] [spark] cloud-fan commented on pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-02 Thread GitBox


cloud-fan commented on pull request #33175:
URL: https://github.com/apache/spark/pull/33175#issuecomment-872810020


   cc @yaooqinn 


[GitHub] [spark] c21 commented on pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


c21 commented on pull request #32944:
URL: https://github.com/apache/spark/pull/32944#issuecomment-872810966


   @cloud-fan - thanks, just rebased to latest master.


[GitHub] [spark] yaooqinn commented on a change in pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-02 Thread GitBox


yaooqinn commented on a change in pull request #33175:
URL: https://github.com/apache/spark/pull/33175#discussion_r662832269



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -662,6 +662,15 @@ case class ShowCurrentNamespace(catalogManager: CatalogManager) extends LeafComm
     AttributeReference("namespace", StringType, nullable = false)())
 }
 
+/**
+ * The logical plan of the SHOW CATALOGS command.
+ */
+case class ShowCatalogs(catalogManager: CatalogManager) extends LeafCommand {
+  override val output: Seq[Attribute] = Seq(
+    AttributeReference("catalog", StringType, nullable = false)(),
+    AttributeReference("default-namespace", StringType, nullable = false)())

Review comment:
   Why do we need this?




[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872817850


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45077/
   


[GitHub] [spark] xuanyuanking commented on pull request #33184: [SPARK-35785][SS][3.2] Cleanup support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on pull request #33184:
URL: https://github.com/apache/spark/pull/33184#issuecomment-872819404


   Thanks @HeartSaVioR!


[GitHub] [spark] SparkQA commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


SparkQA commented on pull request #32959:
URL: https://github.com/apache/spark/pull/32959#issuecomment-872821187


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45078/
   


[GitHub] [spark] Peng-Lei commented on a change in pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-02 Thread GitBox


Peng-Lei commented on a change in pull request #33175:
URL: https://github.com/apache/spark/pull/33175#discussion_r662839253



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -662,6 +662,15 @@ case class ShowCurrentNamespace(catalogManager: CatalogManager) extends LeafComm
     AttributeReference("namespace", StringType, nullable = false)())
 }
 
+/**
+ * The logical plan of the SHOW CATALOGS command.
+ */
+case class ShowCatalogs(catalogManager: CatalogManager) extends LeafCommand {
+  override val output: Seq[Attribute] = Seq(
+    AttributeReference("catalog", StringType, nullable = false)(),
+    AttributeReference("default-namespace", StringType, nullable = false)())

Review comment:
   schema of output
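The `output` attributes in the diff declare the result schema of SHOW CATALOGS. That schema has two non-nullable string columns, `catalog` and `default-namespace`. Below is a minimal Python sketch of that shape. It assumes a plain dict stands in for the catalog manager. `Attribute`, `SHOW_CATALOGS_OUTPUT` and `show_catalogs` are hypothetical names, not Spark APIs, and the sort order is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a Catalyst AttributeReference."""
    name: str
    data_type: str
    nullable: bool


# Mirrors the two output attributes declared for ShowCatalogs in the diff.
SHOW_CATALOGS_OUTPUT: List[Attribute] = [
    Attribute("catalog", "string", nullable=False),
    Attribute("default-namespace", "string", nullable=False),
]


def show_catalogs(catalogs: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """Return one (catalog, default-namespace) row per catalog, sorted by name.

    `catalogs` maps a catalog name to its default namespace (assumed input shape).
    """
    rows: List[Tuple[str, str]] = []
    for name in sorted(catalogs):
        namespace = catalogs[name]
        # Both output columns are declared non-nullable, so reject None.
        if namespace is None:
            raise ValueError(f"catalog {name!r} has no default namespace")
        rows.append((name, namespace))
    return rows
```

For example, `show_catalogs({"spark_catalog": "default"})` returns `[("spark_catalog", "default")]`.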




[GitHub] [spark] Peng-Lei commented on a change in pull request #33175: [SPARK-35973][SQL] DataSourceV2: Support SHOW CATALOGS

2021-07-02 Thread GitBox


Peng-Lei commented on a change in pull request #33175:
URL: https://github.com/apache/spark/pull/33175#discussion_r662839253



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -662,6 +662,15 @@ case class ShowCurrentNamespace(catalogManager: CatalogManager) extends LeafComm
     AttributeReference("namespace", StringType, nullable = false)())
 }
 
+/**
+ * The logical plan of the SHOW CATALOGS command.
+ */
+case class ShowCatalogs(catalogManager: CatalogManager) extends LeafCommand {
+  override val output: Seq[Attribute] = Seq(
+    AttributeReference("catalog", StringType, nullable = false)(),
+    AttributeReference("default-namespace", StringType, nullable = false)())

Review comment:
   @yaooqinn schema of output




[GitHub] [spark] linhongliu-db commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


linhongliu-db commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662843530



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -224,12 +224,12 @@ object DateTimeUtils {
* value. The return type is [[Option]] in order to distinguish between 0L and null. The following
* formats are allowed:
*
-   * ``
-   * `-[m]m`
-   * `-[m]m-[d]d`
-   * `-[m]m-[d]d `
-   * `-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
-   * `-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+   * `[+-]y*`
+   * `[+-]y*-[m]m`
+   * `[+-]y*-[m]m-[d]d`
+   * `[+-]y*-[m]m-[d]d `
+   * `[+-]y*-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
+   * `[+-]y*-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
* `[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`

Review comment:
   we simply ignore the `+`
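The new `[+-]y*` formats allow a signed year of any length. Per the comment above, a leading `+` is simply skipped. Below is a minimal Python sketch of that year-segment scan. It assumes that a `-` sign negates the year. `parse_signed_year` is a hypothetical helper, not the `DateTimeUtils` code under review.

```python
from typing import Optional, Tuple


def parse_signed_year(s: str) -> Optional[Tuple[int, int]]:
    """Scan a leading `[+-]y*` year segment.

    Returns (year, index just past the last digit), or None when no digit
    follows the optional sign. Illustrative only, not Spark's parser.
    """
    i = 0
    negative = False
    if i < len(s) and s[i] in "+-":
        # A '+' is skipped; a '-' is assumed to negate the year.
        negative = s[i] == "-"
        i += 1
    start = i
    year = 0
    while i < len(s) and "0" <= s[i] <= "9":
        year = year * 10 + (ord(s[i]) - ord("0"))
        i += 1
    if i == start:
        return None
    return (-year if negative else year, i)
```

With this scan, `"+2021-07-02"` yields the year 2021, the same as `"2021-07-02"`. Only the index where the month segment starts differs.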




[GitHub] [spark] linhongliu-db commented on a change in pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


linhongliu-db commented on a change in pull request #32959:
URL: https://github.com/apache/spark/pull/32959#discussion_r662844803



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##
@@ -269,10 +274,6 @@ object DateTimeUtils {
   i += 3
 } else if (i < 2) {
   if (b == '-') {
-if (i == 0 && j != 4) {
-  // year should have exact four digits

Review comment:
   sorry, I was thinking too much about the overflow of the whole date and missed the overflow of the year segment.
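Dropping the four-digit check means an arbitrarily long year string can overflow the accumulated segment value. Below is a minimal sketch of guarding against that. It assumes the year is held in a 32-bit signed integer, like a Scala `Int`. `INT32_MAX` and `parse_year_digits` are illustrative names, and the bound the PR actually enforces may differ.

```python
from typing import Optional

INT32_MAX = 2**31 - 1  # assumed bound: a 32-bit signed Int holding the year segment


def parse_year_digits(digits: str) -> Optional[int]:
    """Accumulate year digits, returning None instead of overflowing.

    Illustrative sketch: the check runs before each multiply-add, so the
    accumulated value never exceeds INT32_MAX.
    """
    if not digits or not all("0" <= c <= "9" for c in digits):
        return None
    value = 0
    for c in digits:
        d = ord(c) - ord("0")
        # value * 10 + d > INT32_MAX  <=>  value > (INT32_MAX - d) // 10
        if value > (INT32_MAX - d) // 10:
            return None
        value = value * 10 + d
    return value
```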




[GitHub] [spark] AmplabJenkins commented on pull request #33182: [SPARK-35984][SQL] Config to force applying shuffled hash join

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33182:
URL: https://github.com/apache/spark/pull/33182#issuecomment-872827704


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140556/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872827705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45075/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872827702


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45076/
   


[GitHub] [spark] AmplabJenkins commented on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #32959:
URL: https://github.com/apache/spark/pull/32959#issuecomment-872827699


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45078/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33182: [SPARK-35984][SQL] Config to force applying shuffled hash join

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #33182:
URL: https://github.com/apache/spark/pull/33182#issuecomment-872827704


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/140556/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872827705


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45075/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872827702


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45076/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32959: [SPARK-35780][SQL] Support DATE/TIMESTAMP literals across the full range

2021-07-02 Thread GitBox


AmplabJenkins removed a comment on pull request #32959:
URL: https://github.com/apache/spark/pull/32959#issuecomment-872827699


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45078/
   


[GitHub] [spark] SparkQA commented on pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


SparkQA commented on pull request #32944:
URL: https://github.com/apache/spark/pull/32944#issuecomment-872830092


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45079/
   


[GitHub] [spark] SparkQA commented on pull request #33038: [SPARK-35861][SS] Introduce "prefix match scan" feature on state store

2021-07-02 Thread GitBox


SparkQA commented on pull request #33038:
URL: https://github.com/apache/spark/pull/33038#issuecomment-872830269


   **[Test build #140569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140569/testReport)** for PR 33038 at commit [`ef8b767`](https://github.com/apache/spark/commit/ef8b767bc996e1d73b8fc0aaa0825b653d88ac4d).


[GitHub] [spark] SparkQA commented on pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


SparkQA commented on pull request #32944:
URL: https://github.com/apache/spark/pull/32944#issuecomment-872830393


   **[Test build #140570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140570/testReport)** for PR 32944 at commit [`c5ed8e7`](https://github.com/apache/spark/commit/c5ed8e7b1c781e13c78af7718a6cfddf773f52bb).


[GitHub] [spark] dongjoon-hyun commented on pull request #33183: [SPARK-35972][SQL] When replace ExtractValue in NestedColumnAliasing we should use semanticEquals

2021-07-02 Thread GitBox


dongjoon-hyun commented on pull request #33183:
URL: https://github.com/apache/spark/pull/33183#issuecomment-872830607


   +1 for adding test.


[GitHub] [spark] tpvasconcelos opened a new pull request #33185: [SPARK-35986][PYSPARK] Fix type hint for RDD.histogram's buckets

2021-07-02 Thread GitBox


tpvasconcelos opened a new pull request #33185:
URL: https://github.com/apache/spark/pull/33185


   
   
   ### What changes were proposed in this pull request?
   Fix the type hint for the `buckets` argument of `pyspark.rdd.RDD.histogram`.
   
   ### Why are the changes needed?
   The current type hint is incomplete.
   
![image](https://user-images.githubusercontent.com/17701527/124248180-df7fd580-db22-11eb-8391-ba0bb51d689b.png)
   From the source of `pyspark.rdd.RDD.histogram`:
   ```python
   if isinstance(buckets, int):
       ...
   elif isinstance(buckets, (list, tuple)):
       ...
   else:
       raise TypeError("buckets should be a list or tuple or number(int or long)")
   ```
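The fix comes down to making the annotation cover every branch that the runtime `isinstance` dispatch accepts: an `int` bucket count, or a `list`/`tuple` of boundaries. Below is a minimal sketch using a hypothetical stand-in function rather than PySpark's actual stub. `Buckets` and `bucket_boundaries` are illustrative names. The even spacing used for the `int` case is a simplification.

```python
from typing import List, Tuple, Union

# Hypothetical alias: a bucket count, or explicit bucket boundaries.
Buckets = Union[int, List[float], Tuple[float, ...]]


def bucket_boundaries(buckets: Buckets, lo: float, hi: float) -> List[float]:
    """Mirror the isinstance dispatch quoted above (illustrative only)."""
    if isinstance(buckets, int):
        if buckets < 1:
            raise ValueError("number of buckets must be at least 1")
        # Simplification: evenly spaced boundaries between lo and hi.
        step = (hi - lo) / buckets
        return [lo + i * step for i in range(buckets)] + [hi]
    elif isinstance(buckets, (list, tuple)):
        # Explicit boundaries are passed through unchanged.
        return list(buckets)
    else:
        raise TypeError("buckets should be a list or tuple or number(int or long)")
```

A type checker would flag any call the runtime accepts if the annotation lists only some of these alternatives.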
   
   ### Does this PR introduce _any_ user-facing change?
   Fixed the warning displayed above.
   
   
   
   ### How was this patch tested?
   Confirmed that this change fixes the warning shown above.
   


[GitHub] [spark] AmplabJenkins commented on pull request #33185: [SPARK-35986][PYSPARK] Fix type hint for RDD.histogram's buckets

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33185:
URL: https://github.com/apache/spark/pull/33185#issuecomment-872833299


   Can one of the admins verify this patch?


[GitHub] [spark] HeartSaVioR commented on pull request #32934: [WIP][SPARK-35788][SS] Metrics support for RocksDB instance

2021-07-02 Thread GitBox


HeartSaVioR commented on pull request #32934:
URL: https://github.com/apache/spark/pull/32934#issuecomment-872835089


   Shall we rebase this one to continue reviewing?


[GitHub] [spark] HyukjinKwon commented on pull request #33179: [SPARK-35981][PYTHON][TEST] Use check_exact=False to loosen the check precision

2021-07-02 Thread GitBox


HyukjinKwon commented on pull request #33179:
URL: https://github.com/apache/spark/pull/33179#issuecomment-872837810


   Merged to master.


[GitHub] [spark] HyukjinKwon closed pull request #33179: [SPARK-35981][PYTHON][TEST] Use check_exact=False to loosen the check precision

2021-07-02 Thread GitBox


HyukjinKwon closed pull request #33179:
URL: https://github.com/apache/spark/pull/33179


   


[GitHub] [spark] xuanyuanking commented on pull request #32934: [WIP][SPARK-35788][SS] Metrics support for RocksDB instance

2021-07-02 Thread GitBox


xuanyuanking commented on pull request #32934:
URL: https://github.com/apache/spark/pull/32934#issuecomment-872839096


   Yea, just finished the rebasing.


[GitHub] [spark] c21 commented on a change in pull request #33182: [SPARK-35984][SQL] Config to force applying shuffled hash join

2021-07-02 Thread GitBox


c21 commented on a change in pull request #33182:
URL: https://github.com/apache/spark/pull/33182#discussion_r662854713



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")

Review comment:
   nit: `PREFER_SORTMERGEJOIN.key` instead of `spark.sql.join.perferSortMergejoin`.
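Referencing the entry's key, as the nit suggests, keeps the doc text in sync with the real config name, so a misspelling such as `perferSortMergejoin` cannot slip in. Below is a Python analogue of that pattern. It assumes a hypothetical `ConfigEntry` class standing in for Spark's config entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEntry:
    """Hypothetical stand-in for a SQLConf config entry."""
    key: str
    default: bool


PREFER_SORTMERGEJOIN = ConfigEntry("spark.sql.join.preferSortMergeJoin", True)

# Interpolating the entry's key keeps the doc text in sync with the real key,
# instead of hand-typing (and possibly misspelling) it.
FORCE_APPLY_SHUFFLEDHASHJOIN_DOC = (
    "When true, force applying shuffled hash join even if the table sizes exceed the "
    "threshold. This is for testing/benchmarking only. If this config is set to true, the "
    f"value {PREFER_SORTMERGEJOIN.key} will be ignored."
)
```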

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##
@@ -272,14 +272,14 @@ trait JoinSelectionHelper {
 val buildLeft = if (hintOnly) {
   hintToShuffleHashJoinLeft(hint)
 } else {
-  hintToPreferShuffleHashJoinLeft(hint) ||
+  hintToPreferShuffleHashJoinLeft(hint) || conf.forceApplyShuffledHashJoin ||

Review comment:
   I think we don't want users to use this config, and it should only take effect in testing, right? Should we add a condition, e.g. `Utils.isTesting`?
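The suggestion is to make the force flag take effect only under test. A user who sets it in production would then get the normal join selection. Below is a minimal sketch of that gating. It assumes the other conditions in `JoinSelectionHelper` collapse into a single boolean. `can_build_left` and its parameters are hypothetical.

```python
def can_build_left(
    hint_prefers_left: bool,
    force_apply_shj: bool,
    is_testing: bool,
    other_conditions: bool,
) -> bool:
    """Whether the left side may be the build side of a shuffled hash join.

    The force flag only counts when is_testing is True, so setting it
    outside tests falls back to the hint and the remaining checks.
    """
    return hint_prefers_left or (force_apply_shj and is_testing) or other_conditions
```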

##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")
+    .version("3.2.0")

Review comment:
   nit: we are on `3.3.0` now I think?







[GitHub] [spark] SparkQA commented on pull request #32934: [WIP][SPARK-35788][SS] Metrics support for RocksDB instance

2021-07-02 Thread GitBox


SparkQA commented on pull request #32934:
URL: https://github.com/apache/spark/pull/32934#issuecomment-872839787


   **[Test build #140571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140571/testReport)** for PR 32934 at commit [`eea3aa0`](https://github.com/apache/spark/commit/eea3aa02796ab47635d03913bb1b1fdb3f013191).





[GitHub] [spark] AmplabJenkins commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


AmplabJenkins commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872839961


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45077/
   





[GitHub] [spark] SparkQA commented on pull request #33172: [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

2021-07-02 Thread GitBox


SparkQA commented on pull request #33172:
URL: https://github.com/apache/spark/pull/33172#issuecomment-872839907


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45077/
   





[GitHub] [spark] Ngone51 commented on a change in pull request #33116: [SPARK-35259][SHUFFLE] Rename ExternalBlockHandler Timer variables to remove incorrect millis suffix

2021-07-02 Thread GitBox


Ngone51 commented on a change in pull request #33116:
URL: https://github.com/apache/spark/pull/33116#discussion_r662860573



##
File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java
##
@@ -323,10 +323,13 @@ private void checkAuth(TransportClient client, String appId) {
 
     public ShuffleMetrics() {
       allMetrics = new HashMap<>();
-      allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatencyMillis);
-      allMetrics.put("registerExecutorRequestLatencyMillis", registerExecutorRequestLatencyMillis);
-      allMetrics.put("fetchMergedBlocksMetaLatencyMillis", fetchMergedBlocksMetaLatencyMillis);
-      allMetrics.put("finalizeShuffleMergeLatencyMillis", finalizeShuffleMergeLatencyMillis);
+      // Note that for the latency metrics, the default unit is actually nanos, not millis.
+      // The variables have been renamed, but to preserve backwards compatibility, the metric
+      // names remain unchanged. See SPARK-35259 for more details.
+      allMetrics.put("openBlockRequestLatencyMillis", openBlockRequestLatency);

Review comment:
   @dongjoon-hyun It requires a `Metric` here rather than a long value.
   
   The `Timer` doesn't seem to provide any API to get the milliseconds. The only way I see now is to implement Spark's own `Timer` by extending Dropwizard's.
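   To make the unit mismatch concrete: Dropwizard's `Timer` records durations in nanoseconds, so a metric named `...LatencyMillis` actually carries nanosecond values. The sketch below is a hypothetical `LatencyRecorder`, deliberately NOT Dropwizard's `Timer` and not the proposed subclass; it only shows storing nanos and exposing a millisecond view via `TimeUnit`.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical latency recorder, NOT Dropwizard's Timer. It stores durations in
// nanoseconds (as Dropwizard's Timer does) and offers a millisecond view.
final class LatencyRecorder {
    private long count;
    private long totalNanos;

    // Record one duration, normalizing it to nanoseconds.
    synchronized void update(long duration, TimeUnit unit) {
        count++;
        totalNanos += unit.toNanos(duration);
    }

    // Raw mean in nanoseconds: what a consumer of a "...Millis" name would misread.
    synchronized double meanNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }

    // The same mean converted to milliseconds (1 ms = 1,000,000 ns).
    synchronized double meanMillis() {
        return meanNanos() / TimeUnit.MILLISECONDS.toNanos(1);
    }
}
```

   For example, recording 4 ms and 6 ms gives a mean of 5,000,000 in nanos but 5.0 in millis, which is exactly the factor-of-a-million gap the renamed variables are meant to flag.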







[GitHub] [spark] gengliangwang opened a new pull request #33186: [SPARK-35987][SQL] The ANSI flags of Sum and Avg should be kept after being copied

2021-07-02 Thread GitBox


gengliangwang opened a new pull request #33186:
URL: https://github.com/apache/spark/pull/33186


   
   
   ### What changes were proposed in this pull request?
   
   Make the ANSI flag part of the parameter lists of the expressions `Sum` and `Average`, instead of fetching it from the session SQLConf.
   
   ### Why are the changes needed?
   
   For views, it is important to show consistent results even if the ANSI configuration is different in the running session. This is why many expressions, such as `Add` and `Divide`, make the ANSI flag part of their case class parameter list.
   
   We should make the expressions `Sum` and `Average` consistent with them.
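   The difference between reading the flag at evaluation time and capturing it as a parameter can be sketched in Java. `SessionConf` and `SumExpr` are hypothetical stand-ins, not Spark classes, and the non-ANSI wraparound is only an illustrative choice: the flag is fixed when the expression is built (e.g. when a view is analyzed), and `copy()` carries it along, the way a case-class copy keeps its parameters.

```java
// Hypothetical stand-in for a session-level config that can change between queries.
final class SessionConf {
    static boolean ansiEnabled = false;
}

// Hypothetical sum expression. The ANSI flag is a constructor parameter captured
// when the expression is created, not looked up in SessionConf during eval.
final class SumExpr {
    final boolean failOnOverflow;

    SumExpr(boolean failOnOverflow) {
        this.failOnOverflow = failOnOverflow;
    }

    // Capture the current session setting once, at construction time.
    static SumExpr fromSession() {
        return new SumExpr(SessionConf.ansiEnabled);
    }

    // A copy keeps the captured flag, mirroring a case-class copy.
    SumExpr copy() {
        return new SumExpr(failOnOverflow);
    }

    long eval(long[] values) {
        long acc = 0;
        for (long v : values) {
            // ANSI-like mode throws on overflow; otherwise the sum wraps around.
            acc = failOnOverflow ? Math.addExact(acc, v) : acc + v;
        }
        return acc;
    }
}
```

   Because the flag lives in the expression itself, changing `SessionConf.ansiEnabled` after the expression is built has no effect on it.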
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. `Sum` and `Average` inside a view always behave the same, independent of the ANSI mode SQL configuration in the current session.
   
   ### How was this patch tested?
   
   Existing unit tests.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


HyukjinKwon commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662860881



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -130,7 +130,11 @@ case class AdaptiveSparkPlanExec(
     }
   }
 
-  @transient private val costEvaluator = SimpleCostEvaluator
+  @transient private val costEvaluator =
+    conf.getConf(SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS) match {

Review comment:
   Hey, how do we use this? `CostEvaluator` isn't an API.
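   For context on the question: with such a config, the evaluator would be chosen by class name, falling back to the built-in one when unset. The body of the Scala `match` is cut off in the diff, so this is only a Java sketch of that general pattern. `CostEvaluator`, `SimpleCostEvaluator`, `ConstantCostEvaluator` and `CostEvaluators.select` are hypothetical stand-ins that work on plan strings, not Spark's classes: the named class is instantiated reflectively through its no-arg constructor, otherwise the default is used.

```java
import java.util.Optional;

// Hypothetical Java stand-in for the Scala CostEvaluator trait.
interface CostEvaluator {
    long evaluateCost(String plan);
}

// Hypothetical default: a toy cost that counts "Exchange" mentions in the plan text.
final class SimpleCostEvaluator implements CostEvaluator {
    @Override
    public long evaluateCost(String plan) {
        long count = 0;
        int idx = plan.indexOf("Exchange");
        while (idx >= 0) {
            count++;
            idx = plan.indexOf("Exchange", idx + 1);
        }
        return count;
    }
}

// Hypothetical user-supplied evaluator that would be loaded by class name.
final class ConstantCostEvaluator implements CostEvaluator {
    @Override
    public long evaluateCost(String plan) {
        return 42L;
    }
}

final class CostEvaluators {
    // Use the configured class if present, otherwise fall back to the default.
    static CostEvaluator select(Optional<String> customClassName) {
        if (!customClassName.isPresent()) {
            return new SimpleCostEvaluator();
        }
        try {
            Class<? extends CostEvaluator> cls =
                Class.forName(customClassName.get()).asSubclass(CostEvaluator.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate cost evaluator " + customClassName.get(), e);
        }
    }
}
```

   The sketch also shows why the question matters: a user can only supply such a class if the interface it implements is a stable, public API.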







[GitHub] [spark] SparkQA commented on pull request #33180: [SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script

2021-07-02 Thread GitBox


SparkQA commented on pull request #33180:
URL: https://github.com/apache/spark/pull/33180#issuecomment-872843326


   **[Test build #140551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140551/testReport)** for PR 33180 at commit [`4c4fcec`](https://github.com/apache/spark/commit/4c4fcec9d3002c7486930c49b65a57d4ace72288).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #33180: [SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script

2021-07-02 Thread GitBox


SparkQA removed a comment on pull request #33180:
URL: https://github.com/apache/spark/pull/33180#issuecomment-872635660


   **[Test build #140551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140551/testReport)** for PR 33180 at commit [`4c4fcec`](https://github.com/apache/spark/commit/4c4fcec9d3002c7486930c49b65a57d4ace72288).





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


HyukjinKwon commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662862925



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/costing.scala
##
@@ -17,16 +17,35 @@
 
 package org.apache.spark.sql.execution.adaptive
 
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Utils
 
 /**
- * Represents the cost of a plan.
+ * An interface to represent the cost of a plan.
  */
 trait Cost extends Ordered[Cost]
 
 /**
- * Evaluates the cost of a physical plan.
+ * An interface to evaluate the cost of a physical plan.
  */
 trait CostEvaluator {

Review comment:
   The whole execution package is for internal purposes. I don't think it makes much sense to make it pluggable.
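   The diff documents `Cost` as a type that `extends Ordered[Cost]`; what a caller needs from it is that two costs can be compared. A rough Java analogue, assuming hypothetical `Cost`, `SimpleCost` and `PlanChooser` types (not Spark's classes, and not a claim about how `AdaptiveSparkPlanExec` handles ties), where a candidate plan replaces the current one only if it is strictly cheaper:

```java
// Hypothetical analogue of `trait Cost extends Ordered[Cost]`.
interface Cost extends Comparable<Cost> {}

// Hypothetical single-number cost, e.g. the number of shuffles in a plan.
final class SimpleCost implements Cost {
    final long value;

    SimpleCost(long value) {
        this.value = value;
    }

    @Override
    public int compareTo(Cost that) {
        if (!(that instanceof SimpleCost)) {
            throw new IllegalArgumentException("Cannot compare SimpleCost with " + that);
        }
        return Long.compare(value, ((SimpleCost) that).value);
    }
}

final class PlanChooser {
    // Keep the current plan unless the candidate's cost is strictly lower.
    static <P> P choose(P current, Cost currentCost, P candidate, Cost candidateCost) {
        return candidateCost.compareTo(currentCost) < 0 ? candidate : current;
    }
}
```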







[GitHub] [spark] SparkQA commented on pull request #32934: [WIP][SPARK-35788][SS] Metrics support for RocksDB instance

2021-07-02 Thread GitBox


SparkQA commented on pull request #32934:
URL: https://github.com/apache/spark/pull/32934#issuecomment-872845045


   **[Test build #140571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/140571/testReport)** for PR 32934 at commit [`eea3aa0`](https://github.com/apache/spark/commit/eea3aa02796ab47635d03913bb1b1fdb3f013191).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32944: [SPARK-35794][SQL] Allow custom plugin for AQE cost evaluator

2021-07-02 Thread GitBox


HyukjinKwon commented on a change in pull request #32944:
URL: https://github.com/apache/spark/pull/32944#discussion_r662863060



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -678,6 +678,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS =
+    buildConf("spark.sql.adaptive.customCostEvaluatorClass")
+      .doc("The custom cost evaluator class to be used for adaptive execution. If not being set," +
+        " Spark will use its own SimpleCostEvaluator by default.")
+      .version("3.2.0")
+      .internal()

Review comment:
   Why is it an internal configuration?






