[GitHub] [spark] linhongliu-db commented on pull request #30363: [SPARK-33438][SQL] Eagerly init objects with defined SQL Confs for command `set -v`

2021-02-07 Thread GitBox


linhongliu-db commented on pull request #30363:
URL: https://github.com/apache/spark/pull/30363#issuecomment-774948026


   cc @viirya @maropu @HyukjinKwon, this PR has been updated based on the discussion.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

2021-02-07 Thread GitBox


HyukjinKwon commented on pull request #31384:
URL: https://github.com/apache/spark/pull/31384#issuecomment-774947870


   @gaborgsomogyi is there anybody you know who is familiar with JDBC and 
Kerberos and can review? It looks fine, but to be honest I am not very 
familiar with this area, and I don't have an environment to test in either.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31491:
URL: https://github.com/apache/spark/pull/31491#issuecomment-774841886


   **[Test build #134997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134997/testReport)**
 for PR 31491 at commit 
[`899706f`](https://github.com/apache/spark/commit/899706f5d89f29c4c4d93db92179da081f5bb10d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType

2021-02-07 Thread GitBox


SparkQA commented on pull request #31491:
URL: https://github.com/apache/spark/pull/31491#issuecomment-774947449


   **[Test build #134997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134997/testReport)**
 for PR 31491 at commit 
[`899706f`](https://github.com/apache/spark/commit/899706f5d89f29c4c4d93db92179da081f5bb10d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31516:
URL: https://github.com/apache/spark/pull/31516#issuecomment-774946282


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39590/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2021-02-07 Thread GitBox


cloud-fan commented on pull request #24559:
URL: https://github.com/apache/spark/pull/24559#issuecomment-774945050


   @rdblue Thanks for writing up the design doc! This is a very important and 
useful feature, and `UnboundFunction` seems like a very interesting idea. It 
allows function overloading (for different input schemas, people can return 
different `BoundFunction`s), but I'm wondering how it can tell Spark to add a 
cast, for example when a function accepts an int-type input but the actual 
input is byte type.
   
   Another point is that we should think about the final generated Java code 
when invoking the UDF. With whole-stage codegen (the default case), the input 
values are actually Java variables in the generated code, which means we need 
to build an `InternalRow` before invoking the new UDF; that is very 
inefficient and even worse than the current Spark Scala/Java UDF. Also, the 
type parameter for the return type has perf issues because of primitive-type 
boxing.
   
   My rough idea is
   ```
   interface ScalarFunction {
     StructType[] expectedInputTypes();
     DataType returnType();
   }

   class MyScalaFunction implements ScalarFunction {
     StructType[] expectedInputTypes() { /* ... allows int and string */ }
     DataType returnType() { return IntegerType; }

     int call(int arg) { return String.valueOf(arg).length(); }
     int call(UTF8String arg) { return arg.length(); }
   }
   ```
   The analyzer will bind the UDF with the actual input types (adding an 
implicit cast if needed) and check via reflection whether a `call` method 
exists for the given input/return types. Then in whole-stage codegen, we just 
call the `call` method with inputs of those types and assign the result to a 
Java variable. No need to build an `InternalRow`, no boxing overhead, but no 
compile-time type safety (the analyzer can still catch errors).
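   As a minimal sketch of that reflection check (the helper name and shape are 
illustrative, not an actual Spark API):
   ```scala
   // Look up a `call` method matching the resolved input classes; an analyzer
   // could raise an analysis error when no such overload exists.
   def findCallMethod(udf: AnyRef, inputClasses: Seq[Class[_]]): Option[java.lang.reflect.Method] = {
     try {
       Some(udf.getClass.getMethod("call", inputClasses: _*))
     } catch {
       case _: NoSuchMethodException => None
     }
   }
   ```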
   
   cc @viirya @maropu @kiszk @rednaxelafx 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


Ngone51 commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571835445



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   Thanks for providing the details; it makes sense to me. But please also 
note that I wouldn't have raised such optimization concerns if the `TreeMap` 
were already part of the existing implementation. That said, for a PR, I think 
it's good to have more input regardless of the final decision.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR closed pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-07 Thread GitBox


HeartSaVioR closed pull request #31471:
URL: https://github.com/apache/spark/pull/31471


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31471:
URL: https://github.com/apache/spark/pull/31471#issuecomment-774838774


   **[Test build #134999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134999/testReport)**
 for PR 31471 at commit 
[`9d6eec7`](https://github.com/apache/spark/commit/9d6eec760927d7ae01c7a4b0f0fb6457df80ce6f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-07 Thread GitBox


HeartSaVioR commented on pull request #31471:
URL: https://github.com/apache/spark/pull/31471#issuecomment-774942511


   Thanks! Merging to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job

2021-02-07 Thread GitBox


SparkQA commented on pull request #31471:
URL: https://github.com/apache/spark/pull/31471#issuecomment-774942341


   **[Test build #134999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134999/testReport)**
 for PR 31471 at commit 
[`9d6eec7`](https://github.com/apache/spark/commit/9d6eec760927d7ae01c7a4b0f0fb6457df80ce6f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on a change in pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate

2021-02-07 Thread GitBox


ulysses-you commented on a change in pull request #31509:
URL: https://github.com/apache/spark/pull/31509#discussion_r571833216



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
##
@@ -269,3 +269,62 @@ case class TypeOf(child: Expression) extends UnaryExpression {
     defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""")
   }
 }
+
+@ExpressionDescription(
+  usage = """_FUNC_(expr) - Execute all children and return the last child result.""",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1, 2);
+       2
+      > SELECT _FUNC_(1 + 2, 3 + 4);
+       7
+  """,
+  since = "3.2.0",
+  group = "misc_funcs")
+case class DelegateFunction(children: Seq[Expression]) extends Expression {
+  require(children.nonEmpty, s"$prettyName function requires children is not empty.")
+
+  private lazy val lastChild = children.last
+
+  override lazy val deterministic: Boolean = children.forall(_.deterministic)
+  override lazy val resolved: Boolean = children.forall(_.resolved)
+  override def foldable: Boolean = children.forall(_.foldable)
+  override def nullable: Boolean = lastChild.nullable
+  override def dataType: DataType = lastChild.dataType
+
+  override def eval(input: InternalRow): Any = {
+    var result: Any = null
+    children.foreach { child =>
+      result = child.eval(input)
+    }
+    result

Review comment:
   Not sure what you mean by `same child`? This function just executes the 
children one by one.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571829604



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   This is another sort of micro-optimization; the realistic latency of a 
micro-batch is 1s+ (it doesn't matter even if we consider a very tight 
micro-batch, like 500ms), and here we are worrying about creating "an" object 
per such period which will be marked as "unused" after a couple of batches.
   
   This is a clear example of why micro-optimization is bad without 
understanding the full context - an optimization should be evaluated for its 
impact and pursued only when it contributes at least 1% (I'd rather not even 
worry about 1% if the sub-optimal code is more intuitive).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571832414



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
       .reverse
   }
 
+  private var lastPurgedBatchId: Long = -1L
+
   /**
    * Removes all the log entry earlier than thresholdBatchId (exclusive).
    */
   override def purge(thresholdBatchId: Long): Unit = {
-    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-      .map(f => pathToBatchId(f.getPath))
-
-    for (batchId <- batchIds if batchId < thresholdBatchId) {
-      val path = batchIdToPath(batchId)
-      fileManager.delete(path)
-      logTrace(s"Removed metadata log file: $path")
+    val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+    if (possibleTargetBatchIds.length <= 3) {
+      // avoid using list if we only need to purge at most 3 elements
+      possibleTargetBatchIds.foreach { batchId =>
+        val path = batchIdToPath(batchId)
+        if (fileManager.exists(path)) {

Review comment:
   (Just wanted to mention: the case I'm considering is when the file doesn't 
exist - then the comparison becomes exists vs. delete. I'm going to evaluate 
this because I don't know the cost comparison between exists and delete on a 
non-existing file - if the cost difference is significant and exists is 
faster, it would become some sort of probability/heuristic. If not, we should 
simply call delete.)
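   A small sketch of the two strategies being compared, written against 
Hadoop's `FileSystem` purely for illustration (the PR itself goes through 
`CheckpointFileManager`):
   ```scala
   import org.apache.hadoop.fs.{FileSystem, Path}

   // Strategy A: guard with exists(); costs an extra round trip when the file
   // is already gone.
   def purgeWithGuard(fs: FileSystem, path: Path): Unit = {
     if (fs.exists(path)) {
       fs.delete(path, false)
     }
   }

   // Strategy B: call delete() directly; it simply returns false for a missing
   // path, so no exists() call is needed.
   def purgeDirect(fs: FileSystem, path: Path): Unit = {
     fs.delete(path, false)
   }
   ```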





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gaborgsomogyi commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

2021-02-07 Thread GitBox


gaborgsomogyi commented on pull request #31384:
URL: https://github.com/apache/spark/pull/31384#issuecomment-774940510


   Is there anything I can add/fix?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


SparkQA commented on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774939013


   **[Test build #135012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135012/testReport)**
 for PR 31508 at commit 
[`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571829604



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   This is another sort of micro-optimization; the realistic latency of a 
micro-batch is 1s+ (it doesn't matter even if we consider a very tight 
micro-batch, like 500ms), and here we are worrying about creating "an" object 
per such period which will be marked as "unused" after a couple of batches.
   
   This is a clear example of why micro-optimization is bad without 
understanding the full context - an optimization should be evaluated for its 
impact and pursued only when it contributes at least 1% (I'd rather not even 
worry about 1% if the code is more intuitive).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31519: [SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly

2021-02-07 Thread GitBox


SparkQA commented on pull request #31519:
URL: https://github.com/apache/spark/pull/31519#issuecomment-774937492


   **[Test build #135010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135010/testReport)**
 for PR 31519 at commit 
[`a980eb4`](https://github.com/apache/spark/commit/a980eb417e7ecfd0569129c6809450d762e0bdb5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31518:
URL: https://github.com/apache/spark/pull/31518#issuecomment-774937178


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39588/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-07 Thread GitBox


HyukjinKwon commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571829295



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
   `TestUtils.testCommandAvailable("/bin/bash")` won't be executed for most 
test cases, thanks to short-circuiting.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-07 Thread GitBox


beliefer commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571828443



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
   If so, `SQLQueryTestSuite` will execute 
`TestUtils.testCommandAvailable("/bin/bash")` many times.
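   (For reference, a minimal sketch - illustrative only, not part of this PR - 
of caching the check so it runs at most once per suite instance:
   ```scala
   // Memoize the availability check; per-test lookups then stay cheap.
   private lazy val bashAvailable: Boolean = TestUtils.testCommandAvailable("/bin/bash")
   ```
   )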





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774935418


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39591/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774935418


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39591/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


SparkQA commented on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774935398


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39591/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774929003


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39592/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31485:
URL: https://github.com/apache/spark/pull/31485#issuecomment-774929004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39587/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774931027


   **[Test build #135011 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135011/testReport)**
 for PR 31517 at commit 
[`4b49b84`](https://github.com/apache/spark/commit/4b49b84e0c038d286ca09039e774815f4aea7296).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on pull request #31369: [SPARK-34270][SS] Combine StateStoreMetrics should not override StateStoreCustomMetric

2021-02-07 Thread GitBox


Ngone51 commented on pull request #31369:
URL: https://github.com/apache/spark/pull/31369#issuecomment-774929505


   Hopefully, future callers will always be aware that `StateStoreMetrics` is 
combined by `StateStoreCustomMetric` rather than by 
`StateStoreCustomMetric.name`. Otherwise, usages like
   ```scala
   combinedMetrics.customMetrics.foreach { case (metric, value) =>
     longMetric(metric.name) = value  // note: "=" instead of "+="
   }
   ```
   could result in wrong metrics again.
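   For illustration, a minimal sketch of combining keyed by the metric object 
itself (the `CustomMetric` case class here is a stand-in, not the actual 
`StateStoreCustomMetric` API):
   ```scala
   final case class CustomMetric(name: String, desc: String)

   // Metrics that merely share a name but are otherwise different no longer
   // overwrite each other; values are summed per metric instance.
   def combine(all: Seq[Map[CustomMetric, Long]]): Map[CustomMetric, Long] =
     all.flatten
       .groupBy { case (metric, _) => metric }
       .map { case (metric, entries) => metric -> entries.map(_._2).sum }
   ```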



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31485:
URL: https://github.com/apache/spark/pull/31485#issuecomment-774929004


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39587/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774929003


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39592/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31516:
URL: https://github.com/apache/spark/pull/31516#issuecomment-774928806


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39590/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-07 Thread GitBox


HyukjinKwon commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571820552



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
   I meant a fix such as:
   
   ```scala
   assume(
 !testCase.inputFile.endsWith("transform.sql") ||
 TestUtils.testCommandAvailable("/bin/bash"))
   ```
   
   I tested that it only skips transform.sql when `/bin/bash` is not available:
   
   ```
   [info] - transform.sql !!! CANCELED !!! (36 milliseconds)
   [info]   
"/.../spark/sql/core/src/test/resources/sql-tests/inputs/transform.sql" ended 
with "transform.sql", and 
org.apache.spark.TestUtils.testCommandAvailable("/bin/bas") was false 
(SQLQueryTestSuite.scala:265)
   [info]   org.scalatest.exceptions.TestCanceledException:
   [info]   at 
org.scalatest.Assertions.newTestCanceledException(Assertions.scala:475)
   [info]   at 
org.scalatest.Assertions.newTestCanceledException$(Assertions.scala:474)
   [info]   at 
org.scalatest.Assertions$.newTestCanceledException(Assertions.scala:1231)
   [info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssume(Assertions.scala:1310)
   [info]   at 
org.apache.spark.sql.SQLQueryTestSuite.runTest(SQLQueryTestSuite.scala:265)
   [info]   at 
org.apache.spark.sql.SQLQueryTestSuite.$anonfun$createScalaTestCase$5(SQLQueryTestSuite.scala:247)
   [info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
   [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
   [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
   [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
   [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
   [info]   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
   [info]   at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ben-manes commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


ben-manes commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r571820166



##
File path: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala
##
@@ -58,24 +58,26 @@ private[history] class ApplicationCache(
 
   }
 
-  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+  private val cacheWriter = new CacheWriter[CacheKey, CacheEntry] {

Review comment:
   FYI, `CacheWriter` will be deprecated and replaced in 2.9, and removed in 
3.0. Instead, `Caffeine.evictionListener(RemovalListener)` will provide the 
synchronous removal behavior, and any other atomic writes can be captured 
manually via the `asMap().compute` methods. It should be a minor change for 
you when 2.9 is released.
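   A hedged sketch of that migration, assuming Caffeine 2.9+ and using plain 
String keys/values for illustration (the real cache uses 
`CacheKey`/`CacheEntry`):
   ```scala
   import com.github.benmanes.caffeine.cache.{Caffeine, RemovalCause, RemovalListener}

   // The eviction listener runs synchronously on eviction, which is the
   // behavior CacheWriter is used for today.
   val cache = Caffeine.newBuilder()
     .maximumSize(100)
     .evictionListener(new RemovalListener[String, String] {
       override def onRemoval(key: String, value: String, cause: RemovalCause): Unit =
         println(s"evicted $key ($cause)")
     })
     .build[String, String]()
   ```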





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer opened a new pull request #31519: [SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly

2021-02-07 Thread GitBox


beliefer opened a new pull request #31519:
URL: https://github.com/apache/spark/pull/31519


   ### What changes were proposed in this pull request?
   The current implementation of some DDL commands does not unify the output 
and does not pass the output attributes properly to the physical command. For 
example, the output attributes of `ShowFunctions` are not passed to 
`ShowFunctionsCommand` properly.
   
   Following the query plan, this PR passes the output attributes from 
`ShowFunctions` to `ShowFunctionsCommand`.
   
   
   ### Why are the changes needed?
   Passing the output attributes keeps the expr IDs unchanged, which avoids 
bugs when we apply more operators on top of the command's output DataFrame.
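   A simplified sketch of the expr-ID issue (illustrative only, not the PR's 
code): rebuilding the attributes gives them fresh expr IDs that no longer 
match what downstream operators were resolved against.
   ```scala
   import org.apache.spark.sql.catalyst.expressions.AttributeReference
   import org.apache.spark.sql.types.StringType

   val logicalOutput = Seq(AttributeReference("function", StringType, nullable = false)())
   val rebuiltOutput = Seq(AttributeReference("function", StringType, nullable = false)())

   // Same name and type, but different expr IDs:
   assert(logicalOutput.head.exprId != rebuiltOutput.head.exprId)
   ```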
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   Jenkins test.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31518:
URL: https://github.com/apache/spark/pull/31518#issuecomment-774922915


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39588/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


Ngone51 commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571818019



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
       .reverse
   }
 
+  private var lastPurgedBatchId: Long = -1L
+
   /**
    * Removes all the log entry earlier than thresholdBatchId (exclusive).
    */
   override def purge(thresholdBatchId: Long): Unit = {
-    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-      .map(f => pathToBatchId(f.getPath))
-
-    for (batchId <- batchIds if batchId < thresholdBatchId) {
-      val path = batchIdToPath(batchId)
-      fileManager.delete(path)
-      logTrace(s"Removed metadata log file: $path")
+    val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+    if (possibleTargetBatchIds.length <= 3) {
+      // avoid using list if we only need to purge at most 3 elements
+      possibleTargetBatchIds.foreach { batchId =>
+        val path = batchIdToPath(batchId)
+        if (fileManager.exists(path)) {

Review comment:
   Sure, please. I'm fine either way unless there's a noticeable perf 
difference.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


Ngone51 commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571817596



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   OK, I see where the problem was with my test. You're right that the latency 
is trivial.
   
   I'm not against your solution here. But since we've come this far, I'd like 
to mention one more thing: `TreeMap` tends to create a short-lived 
`AscendingSubMap` object for each batch while an Array doesn't. Although, that 
might also be trivial.
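   For reference, a minimal sketch of the eviction pattern being discussed 
(`headMap` returns a view - internally an `AscendingSubMap` - which is the 
per-batch allocation mentioned above):
   ```scala
   import java.{util => ju}

   val cachedMetadata = new ju.TreeMap[Long, String]()

   // Evict everything strictly older than (batchId - 2); clear() on the headMap
   // view removes those entries from the backing TreeMap.
   def evictOlderBatches(batchId: Long): Unit =
     cachedMetadata.headMap(batchId - 2).clear()
   ```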





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


SparkQA commented on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774921702


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39591/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pan3793 commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


pan3793 commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r571813479



##
File path: core/pom.xml
##
@@ -47,6 +47,14 @@
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
     </dependency>
+    <dependency>
+      <groupId>com.github.ben-manes.caffeine</groupId>
+      <artifactId>caffeine</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>com.github.ben-manes.caffeine </groupId>

Review comment:
   redundant space





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

2021-02-07 Thread GitBox


beliefer commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571811985



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala
##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
   `SQLQueryTestSuite` must contain `transform.sql`. Why do we need to check 
whether the test name contains `transform`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


beliefer commented on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774912164


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats

2021-02-07 Thread GitBox


SparkQA commented on pull request #31485:
URL: https://github.com/apache/spark/pull/31485#issuecomment-774910167


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39587/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31516:
URL: https://github.com/apache/spark/pull/31516#issuecomment-774906595


   **[Test build #135007 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135007/testReport)**
 for PR 31516 at commit 
[`43a6c5d`](https://github.com/apache/spark/commit/43a6c5d65e5288f8b626581ccf7f13649f7f7fc1).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command

2021-02-07 Thread GitBox


SparkQA commented on pull request #31518:
URL: https://github.com/apache/spark/pull/31518#issuecomment-774906642


   **[Test build #135005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135005/testReport)**
 for PR 31518 at commit 
[`12e569b`](https://github.com/apache/spark/commit/12e569be80f3bb03daac2dfa15b507572cafbaaa).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774906502


   **[Test build #135006 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135006/testReport)**
 for PR 31517 at commit 
[`0c5382a`](https://github.com/apache/spark/commit/0c5382af0a54c5db8cf9ffee6a7a5040be5cb1c7).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


LuciferYang commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774905437


   thx ~ @HyukjinKwon 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774903911


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39589/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774903911


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39589/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-774903659


   **[Test build #135009 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135009/testReport)**
 for PR 31517 at commit 
[`4761a5b`](https://github.com/apache/spark/commit/4761a5b24637020028f71387e8fecbd4c4f67ba1).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #31378: [SPARK-34240][SQL] Unify output of `SHOW TBLPROPERTIES` clause's output attribute's schema and ExprID

2021-02-07 Thread GitBox


AngersZhuuuu commented on pull request #31378:
URL: https://github.com/apache/spark/pull/31378#issuecomment-774901974


   ping @cloud-fan Anything else that needs updating?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774899900


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134995/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update subquery's stats when building LogicalPlan's stats

2021-02-07 Thread GitBox


SparkQA commented on pull request #31485:
URL: https://github.com/apache/spark/pull/31485#issuecomment-774901374


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39587/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31484:
URL: https://github.com/apache/spark/pull/31484#issuecomment-774899153


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39586/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774823100


   **[Test build #134995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134995/testReport)**
 for PR 31508 at commit 
[`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-774899151


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135000/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774899147







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31509: [SPARK-34396][SQL] Add a new built-in function delegate

2021-02-07 Thread GitBox


AngersZhuuuu commented on a change in pull request #31509:
URL: https://github.com/apache/spark/pull/31509#discussion_r571802245



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
##
@@ -269,3 +269,62 @@ case class TypeOf(child: Expression) extends 
UnaryExpression {
 defineCodeGen(ctx, ev, _ => 
s"""UTF8String.fromString(${child.dataType.catalogString})""")
   }
 }
+
+@ExpressionDescription(
+  usage = """_FUNC_(expr) - Execute all children and return the last child 
result.""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(1, 2);
+   2
+  > SELECT _FUNC_(1 + 2, 3 + 4);
+   7
+  """,
+  since = "3.2.0",
+  group = "misc_funcs")
+case class DelegateFunction(children: Seq[Expression]) extends Expression {
+  require(children.nonEmpty, s"$prettyName function requires children is not 
empty.")
+
+  private lazy val lastChild = children.last
+
+  override lazy val deterministic: Boolean = children.forall(_.deterministic)
+  override lazy val resolved: Boolean = children.forall(_.resolved)
+  override def foldable: Boolean = children.forall(_.foldable)
+  override def nullable: Boolean = lastChild.nullable
+  override def dataType: DataType = lastChild.dataType
+
+  override def eval(input: InternalRow): Any = {
+var result: Any = null
+children.foreach { child =>
+  result = child.eval(input)
+}
+result

Review comment:
   Hmm, how about adding a result map so we avoid re-computing the same child?
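   A minimal sketch of that idea (hypothetical, not code from this PR), keyed by the
   canonicalized child so that semantically equal children are evaluated only once per row:

   ```scala
   // Hypothetical memoized eval: `canonicalized` is the map key, so children that
   // are semantically equal share a single evaluation for each input row.
   override def eval(input: InternalRow): Any = {
     val results = scala.collection.mutable.HashMap.empty[Expression, Any]
     var result: Any = null
     children.foreach { child =>
       result = results.getOrElseUpdate(child.canonicalized, child.eval(input))
     }
     result
   }
   ```

   Whether the extra per-row map allocation pays off depends on how often duplicate
   children actually occur.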





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774899900


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134995/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


SparkQA commented on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774899751


   **[Test build #135008 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135008/testReport)**
 for PR 31504 at commit 
[`5e32ffd`](https://github.com/apache/spark/commit/5e32ffd3b10ed1d4e349cb0b972296ac7bd5b0fe).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774899147







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31484:
URL: https://github.com/apache/spark/pull/31484#issuecomment-774899153


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39586/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

2021-02-07 Thread GitBox


SparkQA commented on pull request #31508:
URL: https://github.com/apache/spark/pull/31508#issuecomment-774899206


   **[Test build #134995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134995/testReport)**
 for PR 31508 at commit 
[`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-774899151


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135000/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


HyukjinKwon closed pull request #31487:
URL: https://github.com/apache/spark/pull/31487


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


HyukjinKwon commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774897760


   Merged to master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-774842281


   **[Test build #135000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135000/testReport)**
 for PR 31419 at commit 
[`010413e`](https://github.com/apache/spark/commit/010413ee49728b5ed537636aef520f024e12ec09).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types

2021-02-07 Thread GitBox


SparkQA commented on pull request #31419:
URL: https://github.com/apache/spark/pull/31419#issuecomment-774895493


   **[Test build #135000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135000/testReport)**
 for PR 31419 at commit 
[`010413e`](https://github.com/apache/spark/commit/010413ee49728b5ed537636aef520f024e12ec09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AlterTableSetLocation(`
 * `case class AlterTableSetProperties(`
 * `case class AlterTableUnsetProperties(`
 * `  implicit class MetadataColumnHelper(attr: Attribute) `
 * `class ResolveSessionCatalog(val catalogManager: CatalogManager)`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774842883


   **[Test build #134998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)**
 for PR 31480 at commit 
[`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-07 Thread GitBox


SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774893887


   **[Test build #134998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)**
 for PR 31480 at commit 
[`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774844717







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


SparkQA commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774892805


   **[Test build #135001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135001/testReport)**
 for PR 31487 at commit 
[`0ecf1a2`](https://github.com/apache/spark/commit/0ecf1a223488eac6d293a656978e2c85fa00).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  implicit class MetadataColumnHelper(attr: Attribute) `



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


SparkQA commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774892695


   **[Test build #135002 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135002/testReport)**
 for PR 31487 at commit 
[`a8ebb43`](https://github.com/apache/spark/commit/a8ebb4326f3ec92d7eee87dc72f4eb806a1e8c7c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571798020



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : 
ClassTag](sparkSession: SparkSession, path:
   .reverse
   }
 
+  private var lastPurgedBatchId: Long = -1L
+
   /**
* Removes all the log entry earlier than thresholdBatchId (exclusive).
*/
   override def purge(thresholdBatchId: Long): Unit = {
-val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-  .map(f => pathToBatchId(f.getPath))
-
-for (batchId <- batchIds if batchId < thresholdBatchId) {
-  val path = batchIdToPath(batchId)
-  fileManager.delete(path)
-  logTrace(s"Removed metadata log file: $path")
+val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+if (possibleTargetBatchIds.length <= 3) {
+  // avoid using list if we only need to purge at most 3 elements
+  possibleTargetBatchIds.foreach { batchId =>
+val path = batchIdToPath(batchId)
+if (fileManager.exists(path)) {

Review comment:
   Yeah, that also makes sense. I'm not sure how much cost it would save, though. Let 
me play with this a bit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun closed pull request #31503: [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0

2021-02-07 Thread GitBox


dongjoon-hyun closed pull request #31503:
URL: https://github.com/apache/spark/pull/31503


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


LuciferYang commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r571795620



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala
##
@@ -192,6 +192,7 @@ class ApplicationCacheSuite extends SparkFunSuite with 
Logging with MockitoSugar
 cache.get("2")
 cache.get("3")
 
+Thread.sleep(5L)

Review comment:
   Wait for data eviction.
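   One way to avoid the sleep in tests (a hypothetical sketch, not what this PR does)
   is to run Caffeine's maintenance work on the calling thread:

   ```scala
   import com.github.benmanes.caffeine.cache.Caffeine

   // Hypothetical test-only wiring: with a same-thread executor, size-based
   // eviction happens synchronously, so no Thread.sleep is needed.
   val cache = Caffeine.newBuilder()
     .maximumSize(2L)
     .executor((r: Runnable) => r.run())
     .build[String, String]()

   cache.put("1", "a"); cache.put("2", "b"); cache.put("3", "c")
   cache.cleanUp()                       // force any pending maintenance now
   assert(cache.estimatedSize() <= 2)
   ```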





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #31503: [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0

2021-02-07 Thread GitBox


dongjoon-hyun commented on pull request #31503:
URL: https://github.com/apache/spark/pull/31503#issuecomment-774888294


   Thank you, @srowen !



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #31515: [SPARK-34346][CORE][TESTS][FOLLOWUP] Fix UT by removing core-site.xml

2021-02-07 Thread GitBox


dongjoon-hyun commented on pull request #31515:
URL: https://github.com/apache/spark/pull/31515#issuecomment-774888033


   Thank you, @srowen , @yaooqinn , @HyukjinKwon .



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner

2021-02-07 Thread GitBox


SparkQA commented on pull request #31480:
URL: https://github.com/apache/spark/pull/31480#issuecomment-774887363


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39581/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


LuciferYang commented on a change in pull request #31517:
URL: https://github.com/apache/spark/pull/31517#discussion_r571795094



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala
##
@@ -58,24 +58,26 @@ private[history] class ApplicationCache(
 
   }
 
-  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+  private val cacheWriter = new CacheWriter[CacheKey, CacheEntry] {

Review comment:
   `CacheWriter` gives synchronous removal behavior similar to Guava's, while 
Caffeine's `RemovalListener` is always invoked asynchronously.
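   For illustration, a rough sketch of the two hooks on the Caffeine 2.x builder
   (hypothetical names, not code from this PR): the writer's `delete` runs on the
   thread that triggers the removal, while the removal listener is handed to an
   executor (`ForkJoinPool.commonPool()` by default).

   ```scala
   import com.github.benmanes.caffeine.cache.{Caffeine, CacheWriter, RemovalCause, RemovalListener}

   // Hypothetical cache wiring both hooks, just to contrast when they fire.
   val cache = Caffeine.newBuilder()
     .maximumSize(10L)
     .writer[String, String](new CacheWriter[String, String] {
       override def write(key: String, value: String): Unit = ()
       override def delete(key: String, value: String, cause: RemovalCause): Unit =
         println(s"sync delete: $key ($cause)")   // before the removal call returns
     })
     .removalListener[String, String](new RemovalListener[String, String] {
       override def onRemoval(key: String, value: String, cause: RemovalCause): Unit =
         println(s"async removal: $key ($cause)") // later, on the executor
     })
     .build[String, String]()

   cache.put("a", "1")
   cache.invalidate("a")   // "sync delete" prints before invalidate() returns
   ```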





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571792973



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset 
=> OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+val added = super.add(batchId, metadata)
+if (added) {
+  // cache metadata as it will be read again
+  cachedMetadata.put(batchId, metadata)
+  // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   https://gist.github.com/HeartSaVioR/111ed75aa2dc4672e36968c02db83e26
   
   ```
   import java.lang.{Long => JLong}
   import java.util.{ArrayList, Collections, TreeMap}
   
   def c(treeMap: TreeMap[Long, String]): Long = {
 val t1 = System.nanoTime()
 treeMap.put(1, "1")
 treeMap.put(2, "3")
 treeMap.put(3, "3")
 treeMap.headMap(2, true).clear()
 (System.nanoTime() - t1)
   }
   
   def d(treeMap: TreeMap[Long, String], idx: Long, value: String): Long = {
 val t1 = System.nanoTime()
 treeMap.put(idx, value)
 treeMap.headMap(idx - 2, true).clear()
 (System.nanoTime() - t1)
   }
   
   def experimentC(): Unit = {
 val latencies = new ArrayList[JLong]()
 val warmupCount = 100
 val runCount = 1000
   
 (1 to warmupCount).foreach { _ =>
   val t = new java.util.TreeMap[Long, String]()
   c(t)
 }
   
 (1 to runCount).foreach { _ =>
   val t = new java.util.TreeMap[Long, String]()
   latencies.add(JLong.valueOf(c(t)))
 }
   
 java.util.Collections.sort(latencies)
   
 printLatencies(latencies)
   }
   
   def experimentD(): Unit = {
 val latencies = new ArrayList[JLong]()
 val warmupCount = 100
 val runCount = 1000
   
 val t = new java.util.TreeMap[Long, String]()
 (1 to warmupCount).foreach { idx =>
   d(t, idx, idx.toString)
 }
   
 val t2 = new java.util.TreeMap[Long, String]()
 (1 to runCount).foreach { idx =>
   latencies.add(JLong.valueOf(d(t2, idx, idx.toString)))
 }
   
 printLatencies(latencies)
   }
   
   def printLatencies(latencies: ArrayList[JLong]): Unit = {
 val arraySize = latencies.size()
 val minIdx = 0
 val maxIdx = arraySize - 1
 val percentile50 = (arraySize * 0.5).toInt
 val percentile90 = (arraySize * 0.9).toInt
 val percentile99 = (arraySize * 0.99).toInt
 val percentile999 = (arraySize * 0.999).toInt
 val percentile9999 = (arraySize * 0.9999).toInt
 val percentile99999 = (arraySize * 0.99999).toInt
 val percentile999999 = (arraySize * 0.999999).toInt
   
 java.util.Collections.sort(latencies)
   
 Seq(minIdx, percentile50, percentile90, percentile99, percentile999, 
percentile9999, percentile99999, percentile999999, maxIdx).foreach { idx =>
   printLatency(latencies, idx)
 }  
   }
   
   def printLatency(latencies: ArrayList[JLong], idx: Int): Unit = {
 println(s"$idx th : ${latencies.get(idx) / 1000} microseconds = 
${latencies.get(idx) / 1000000} milliseconds")
   }
   
   // experimentC()
   
   /*
   0 th : 0 microseconds = 0 milliseconds
   500 th : 0 microseconds = 0 milliseconds
   900 th : 0 microseconds = 0 milliseconds
   990 th : 0 microseconds = 0 milliseconds
   999 th : 1 microseconds = 0 milliseconds
   000 th : 9 microseconds = 0 milliseconds
   900 th : 37 microseconds = 0 milliseconds
   990 th : 223 microseconds = 0 milliseconds
   999 th : 53612 microseconds = 53 milliseconds
   */
   
   experimentD()
   
   /*
   0 th : 0 microseconds = 0 milliseconds
   500 th : 0 microseconds = 0 milliseconds
   900 th : 0 microseconds = 0 milliseconds
   990 th : 0 microseconds = 0 milliseconds
   999 th : 0 microseconds = 0 milliseconds
   000 th : 6 microseconds = 0 milliseconds
   900 th : 25 microseconds = 0 milliseconds
   990 th : 150 microseconds = 0 milliseconds
   999 th : 57887 microseconds = 57 milliseconds
   */
   ```
   
   2018 13-inch MBP, i7 quad-core 2.7Ghz
   
   ```
   ./bin/spark-shell --driver-memory 2g
   ...
   Welcome to
     ____              __
    / __/__  ___ _____/ /__
   _\ \/ _ \/ _ `/ __/  '_/
  /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
     /_/
   
   Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 
1.8.0_191)
   ```
   
   Still think this really matters?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

2021-02-07 Thread GitBox


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571794422



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset 
=> OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+val added = super.add(batchId, metadata)
+if (added) {
+  // cache metadata as it will be read again
+  cachedMetadata.put(batchId, metadata)
+  // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
   Even without warmup (commenting the warmup loop out), the results look like:
   
   ```
   // experimentC()
   
   ...
   990 th : 1632 microseconds = 1 milliseconds
   999 th : 60999 microseconds = 60 milliseconds
   
   // experimentD()
   
   ...
   990 th : 321 microseconds = 0 milliseconds
   999 th : 35074 microseconds = 35 milliseconds
   ```
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu opened a new pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistency with other `SHOW` commands

2021-02-07 Thread GitBox


AngersZhuuuu opened a new pull request #31518:
URL: https://github.com/apache/spark/pull/31518


   ### What changes were proposed in this pull request?
   Keep consistency with other `SHOW` commands, following the discussion in 
https://github.com/apache/spark/pull/31341#issuecomment-774613080
   
   ### Why are the changes needed?
   Keep consistency across the `SHOW` commands.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Not needed.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang opened a new pull request #31517: [SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache

2021-02-07 Thread GitBox


LuciferYang opened a new pull request #31517:
URL: https://github.com/apache/spark/pull/31517


   ### What changes were proposed in this pull request?
   Caffeine is a high-performance, near-optimal caching library based on Java 8. 
It is used in much the same way as Guava Cache, but performs better. The main 
purpose of this PR is to use Caffeine instead of Guava Cache.
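   To illustrate how similar the two APIs are, here is a hypothetical side-by-side
   sketch (not code from this PR; the names and sizes are made up):

   ```scala
   import com.google.common.cache.{CacheBuilder, CacheLoader => GuavaLoader}
   import com.github.benmanes.caffeine.cache.{Caffeine, CacheLoader => CaffeineLoader}

   // Guava loading cache
   val guavaCache = CacheBuilder.newBuilder()
     .maximumSize(100L)
     .build[String, String](new GuavaLoader[String, String] {
       override def load(key: String): String = key.toUpperCase
     })

   // Caffeine equivalent: a near-identical builder surface
   val caffeineCache = Caffeine.newBuilder()
     .maximumSize(100L)
     .build[String, String](new CaffeineLoader[String, String] {
       override def load(key: String): String = key.toUpperCase
     })

   guavaCache.get("spark")     // "SPARK"
   caffeineCache.get("spark")  // "SPARK"
   ```

   The near-identical surface keeps the swap mostly mechanical; the gains come from
   Caffeine's more modern eviction policy (Window TinyLFU).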
   
   
   ### Why are the changes needed?
   Use a better local cache library.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu opened a new pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistency with other `SHOW` commands

2021-02-07 Thread GitBox


AngersZhuuuu opened a new pull request #31516:
URL: https://github.com/apache/spark/pull/31516


   ### What changes were proposed in this pull request?
   Keep consistency with other `SHOW` commands, following the discussion in 
https://github.com/apache/spark/pull/31341#issuecomment-774613080
   
   ### Why are the changes needed?
   Keep consistency across the `SHOW` commands.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Not needed.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update subquery's stats when building LogicalPlan's stats

2021-02-07 Thread GitBox


SparkQA commented on pull request #31485:
URL: https://github.com/apache/spark/pull/31485#issuecomment-774878636


   **[Test build #135004 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135004/testReport)**
 for PR 31485 at commit 
[`89783c1`](https://github.com/apache/spark/commit/89783c18fdfae87d398a37438975843a4f64274d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #31341: [SPARK-34238][SQL] Unify output of SHOW PARTITIONS and pass output attributes properly

2021-02-07 Thread GitBox


AngersZhuuuu commented on pull request #31341:
URL: https://github.com/apache/spark/pull/31341#issuecomment-774876516


   > @AngersZhuuuu yea I think so
   
   Yea, will raise a follow-up PR soon.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution

2021-02-07 Thread GitBox


Ngone51 commented on pull request #30650:
URL: https://github.com/apache/spark/pull/30650#issuecomment-774876158


   cc @mridulm @tgravescs Please take another look when you're available:)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #31341: [SPARK-34238][SQL] Unify output of SHOW PARTITIONS and pass output attributes properly

2021-02-07 Thread GitBox


cloud-fan commented on pull request #31341:
URL: https://github.com/apache/spark/pull/31341#issuecomment-774876014


   @AngersZhuuuu yea I think so



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-77487







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774872223







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType

2021-02-07 Thread GitBox


AmplabJenkins removed a comment on pull request #31491:
URL: https://github.com/apache/spark/pull/31491#issuecomment-774872221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39580/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774872223







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31491:
URL: https://github.com/apache/spark/pull/31491#issuecomment-774872221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39580/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


AmplabJenkins commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-77487







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

2021-02-07 Thread GitBox


SparkQA commented on pull request #31487:
URL: https://github.com/apache/spark/pull/31487#issuecomment-774869229


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39585/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map

2021-02-07 Thread GitBox


LuciferYang commented on a change in pull request #31484:
URL: https://github.com/apache/spark/pull/31484#discussion_r571782459



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
##
@@ -406,7 +406,7 @@ object PreprocessTableInsertion extends Rule[LogicalPlan] {
   catalogTable.get.tracksPartitionsInCatalog
 if (partitionsTrackedByCatalog && normalizedPartSpec.nonEmpty) {
   // empty partition column value
-  if (normalizedPartSpec.map(_._2)
+  if (normalizedPartSpec.values
   .filter(_.isDefined).map(_.get).exists(v => v != null && v.isEmpty)) 
{

Review comment:
   7eac600 fixes this.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function

2021-02-07 Thread GitBox


SparkQA removed a comment on pull request #31504:
URL: https://github.com/apache/spark/pull/31504#issuecomment-774828953


   **[Test build #134996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134996/testReport)**
 for PR 31504 at commit 
[`3f42e91`](https://github.com/apache/spark/commit/3f42e9145b4b7452d7263d8d4ecf4646c8a51886).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


