[GitHub] [spark] linhongliu-db commented on pull request #30363: [SPARK-33438][SQL] Eagerly init objects with defined SQL Confs for command `set -v`
linhongliu-db commented on pull request #30363: URL: https://github.com/apache/spark/pull/30363#issuecomment-774948026 cc @viirya @maropu @HyukjinKwon, this PR is updated based on discussion This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers
HyukjinKwon commented on pull request #31384: URL: https://github.com/apache/spark/pull/31384#issuecomment-774947870 @gaborgsomogyi do you know anybody who is familiar with JDBC and Kerberos and can review? This looks fine, but to be honest I am not very familiar with this area and don't have an environment to test it either.
[GitHub] [spark] SparkQA removed a comment on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType
SparkQA removed a comment on pull request #31491: URL: https://github.com/apache/spark/pull/31491#issuecomment-774841886 **[Test build #134997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134997/testReport)** for PR 31491 at commit [`899706f`](https://github.com/apache/spark/commit/899706f5d89f29c4c4d93db92179da081f5bb10d).
[GitHub] [spark] SparkQA commented on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType
SparkQA commented on pull request #31491: URL: https://github.com/apache/spark/pull/31491#issuecomment-774947449 **[Test build #134997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134997/testReport)** for PR 31491 at commit [`899706f`](https://github.com/apache/spark/commit/899706f5d89f29c4c4d93db92179da081f5bb10d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31516: URL: https://github.com/apache/spark/pull/31516#issuecomment-774946282 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39590/
[GitHub] [spark] cloud-fan commented on pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API
cloud-fan commented on pull request #24559:
URL: https://github.com/apache/spark/pull/24559#issuecomment-774945050

@rdblue Thanks for writing up the design doc! This is a very important and useful feature, and the `UnboundFunction` seems like a very interesting idea. It allows function overloading (for different input schemas, people can return different `BoundFunction`s), but I'm wondering how it can tell Spark to add a Cast, for example if a function accepts int type input but the actual input is byte type.

Another point is that we should think about the final generated Java code when invoking a UDF. With whole-stage-codegen (the default case), the input values are actually Java variables in the generated Java code. This means we need to build an `InternalRow` before invoking the new UDF, which is very inefficient and even worse than the current Spark Scala/Java UDF. Also, the type parameter of the return type has perf issues because of primitive type boxing.

My rough idea is
```
interface ScalarFunction {
  StructType[] expectedInputTypes();
  DataType returnType();
}

class MyScalaFunction implements ScalarFunction {
  StructType[] expectedInputTypes() {
    // ... allows int and string
  }

  DataType returnType() {
    return IntegerType;
  }

  int call(int arg) {
    return String.valueOf(arg).length();
  }

  int call(UTF8String arg) {
    return arg.length();
  }
}
```
The analyzer will bind the UDF with the actual input types (adding an implicit cast if needed), and check via reflection whether a `call` method exists for the given input/return types. Then in whole-stage-codegen, we just call the `call` method with the given input types, and assign the result to a Java variable. No need to build an `InternalRow`, no boxing overhead, but no compile-time type safety (the analyzer can still catch errors).

cc @viirya @maropu @kiszk @rednaxelafx
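The reflection-based check described in the comment above can be illustrated with a minimal sketch in plain Scala. It uses no Spark API: `StrLenFunction` and `CallBinder` are hypothetical names, `String` stands in for `UTF8String`, and matching on exact parameter and return classes is only an assumption about how such a lookup might work.

```scala
// Hypothetical stand-in for a user-defined function with overloaded `call` methods.
class StrLenFunction {
  def call(arg: Int): Int = String.valueOf(arg).length
  def call(arg: String): Int = arg.length
}

object CallBinder {
  // Looks up a public `call` method whose parameter types match `argTypes` exactly
  // and whose return type is `returnType`. Returns None when no such overload exists.
  def findCall(
      fn: AnyRef,
      returnType: Class[_],
      argTypes: Class[_]*): Option[java.lang.reflect.Method] = {
    try {
      val m = fn.getClass.getMethod("call", argTypes: _*)
      if (m.getReturnType == returnType) Some(m) else None
    } catch {
      case _: NoSuchMethodException => None
    }
  }
}
```

A missing overload (for example a `Double` argument) yields `None`, which is the point where an analyzer could either insert a cast or report an error.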
[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
Ngone51 commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571835445

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
Thanks for providing the details; it makes sense to me. But please also note that I wouldn't have raised such optimization changes if the `TreeMap` were already an existing implementation. However, for a PR, I think it's good to have more input regardless of the final decision.
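The caching idea in the quoted hunk can be sketched independently of Spark. The hunk is cut off before the eviction code, so the `headMap(...).clear()` call below is an assumption about one way to drop old entries, and `RecentBatchCache` is a hypothetical name.

```scala
import java.{util => ju}

// Hypothetical cache that keeps only the most recent batches in memory.
class RecentBatchCache[V] {
  private val cached = new ju.TreeMap[Long, V]()

  def put(batchId: Long, value: V): Unit = {
    cached.put(batchId, value)
    // headMap(key, true) is a live view of all entries with keys <= key;
    // clearing the view removes those entries from the backing map.
    cached.headMap(batchId - 2, true).clear()
  }

  def get(batchId: Long): Option[V] = Option(cached.get(batchId))

  def size: Int = cached.size()
}
```

With this eviction rule the map never holds more than two entries when batch IDs arrive in increasing order, which is why the choice of map implementation has little practical impact here.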
[GitHub] [spark] HeartSaVioR closed pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job
HeartSaVioR closed pull request #31471: URL: https://github.com/apache/spark/pull/31471
[GitHub] [spark] SparkQA removed a comment on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job
SparkQA removed a comment on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-774838774 **[Test build #134999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134999/testReport)** for PR 31471 at commit [`9d6eec7`](https://github.com/apache/spark/commit/9d6eec760927d7ae01c7a4b0f0fb6457df80ce6f).
[GitHub] [spark] HeartSaVioR commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job
HeartSaVioR commented on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-774942511 Thanks! Merging to master.
[GitHub] [spark] SparkQA commented on pull request #31471: [SPARK-34355][SQL] Add log and time cost for commit job
SparkQA commented on pull request #31471: URL: https://github.com/apache/spark/pull/31471#issuecomment-774942341 **[Test build #134999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134999/testReport)** for PR 31471 at commit [`9d6eec7`](https://github.com/apache/spark/commit/9d6eec760927d7ae01c7a4b0f0fb6457df80ce6f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] ulysses-you commented on a change in pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate
ulysses-you commented on a change in pull request #31509:
URL: https://github.com/apache/spark/pull/31509#discussion_r571833216

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ##
@@ -269,3 +269,62 @@ case class TypeOf(child: Expression) extends UnaryExpression {
     defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""")
   }
 }
+
+@ExpressionDescription(
+  usage = """_FUNC_(expr) - Execute all children and return the last child result.""",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1, 2);
+       2
+      > SELECT _FUNC_(1 + 2, 3 + 4);
+       7
+  """,
+  since = "3.2.0",
+  group = "misc_funcs")
+case class DelegateFunction(children: Seq[Expression]) extends Expression {
+  require(children.nonEmpty, s"$prettyName function requires children is not empty.")
+
+  private lazy val lastChild = children.last
+
+  override lazy val deterministic: Boolean = children.forall(_.deterministic)
+  override lazy val resolved: Boolean = children.forall(_.resolved)
+  override def foldable: Boolean = children.forall(_.foldable)
+  override def nullable: Boolean = lastChild.nullable
+  override def dataType: DataType = lastChild.dataType
+
+  override def eval(input: InternalRow): Any = {
+    var result: Any = null
+    children.foreach { child =>
+      result = child.eval(input)
+    }
+    result

Review comment:
I'm not sure what you mean by `same child`. This function just executes the children one by one.
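The "execute the children one by one and return the last result" behavior can be shown without Catalyst. In this minimal sketch, zero-argument functions stand in for child expressions; `DelegateSketch` and `evalAll` are hypothetical names, not part of the PR.

```scala
object DelegateSketch {
  // Evaluates every thunk in order and returns the value of the last one.
  def evalAll(children: Seq[() => Any]): Any = {
    require(children.nonEmpty, "children must not be empty")
    var result: Any = null
    children.foreach { child => result = child() }
    result
  }
}
```

Every thunk runs exactly once, in sequence order, so side effects of earlier children happen even though only the last value is returned.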
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571829604

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
This is another sort of micro-optimization. The realistic latency of a micro-batch is 1s+ (it doesn't matter even if we consider a very tight micro-batch, like 500ms), and we are worrying about creating "an" object per such period, which will be marked as "unused" after a couple of batches. This is a clear example of why micro-optimization is bad without understanding the full context: an optimization should be evaluated for its impact and pursued only when it contributes at least 1% (though I'd rather not even worry about 1% if the sub-optimal code is more intuitive).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571832414

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
       .reverse
   }
 
+  private var lastPurgedBatchId: Long = -1L
+
   /**
    * Removes all the log entry earlier than thresholdBatchId (exclusive).
    */
   override def purge(thresholdBatchId: Long): Unit = {
-    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-      .map(f => pathToBatchId(f.getPath))
-
-    for (batchId <- batchIds if batchId < thresholdBatchId) {
-      val path = batchIdToPath(batchId)
-      fileManager.delete(path)
-      logTrace(s"Removed metadata log file: $path")
+    val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+    if (possibleTargetBatchIds.length <= 3) {
+      // avoid using list if we only need to purge at most 3 elements
+      possibleTargetBatchIds.foreach { batchId =>
+        val path = batchIdToPath(batchId)
+        if (fileManager.exists(path)) {

Review comment:
(Just wanted to mention: the case I'm considering is when the file doesn't exist. Then the comparison becomes `exists` vs. `delete`. I'm going to evaluate this because I don't know how the cost of `exists` compares with the cost of `delete` on a non-existent file. If the difference is significant and `exists` is faster, it's going to be some sort of probability/heuristic. If not, we should simply try calling `delete`.)
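The purge strategy in the quoted hunk (skip the directory listing when only a few batches can be stale) can be sketched against an in-memory stand-in. `InMemoryFileManager` and `PurgeSketch` are hypothetical names. The hunk is truncated, so two details are assumptions: the small branch simply calls `delete` (the "simply try calling delete" option from the comment, assuming that deleting a missing entry is harmless), and `lastPurgedBatchId` is advanced at the end of each call.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a checkpoint file manager; counts list() calls.
class InMemoryFileManager {
  private val batchFiles = mutable.SortedSet.empty[Long]
  var listCalls: Int = 0

  def create(batchId: Long): Unit = batchFiles += batchId
  def exists(batchId: Long): Boolean = batchFiles.contains(batchId)
  def delete(batchId: Long): Unit = batchFiles -= batchId // no-op if absent
  def list(): Seq[Long] = { listCalls += 1; batchFiles.toSeq }
}

class PurgeSketch(fm: InMemoryFileManager) {
  private var lastPurgedBatchId: Long = -1L

  // Removes all entries earlier than thresholdBatchId (exclusive).
  def purge(thresholdBatchId: Long): Unit = {
    val candidates = lastPurgedBatchId + 1 until thresholdBatchId
    if (candidates.length <= 3) {
      // Few candidates: delete them directly and avoid a listing.
      candidates.foreach(id => fm.delete(id))
    } else {
      // Many candidates: fall back to listing and filtering.
      fm.list().filter(_ < thresholdBatchId).foreach(id => fm.delete(id))
    }
    lastPurgedBatchId = math.max(lastPurgedBatchId, thresholdBatchId - 1)
  }
}
```

In steady state a streaming query purges about one batch per trigger, so the listing branch would be taken only after a restart or a long gap.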
[GitHub] [spark] gaborgsomogyi commented on pull request #31384: [SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers
gaborgsomogyi commented on pull request #31384: URL: https://github.com/apache/spark/pull/31384#issuecomment-774940510 Is there anything I can add/fix?
[GitHub] [spark] SparkQA commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
SparkQA commented on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774939013 **[Test build #135012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135012/testReport)** for PR 31508 at commit [`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r571829604

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment:
This is another sort of micro-optimization. The realistic latency of a micro-batch is 1s+ (it doesn't matter even if we consider a very tight micro-batch, like 500ms), and we are worrying about creating "an" object per such period, which will be marked as "unused" after a couple of batches. This is a clear example of why micro-optimization is bad without understanding the full context: an optimization should be evaluated for its impact and pursued only when it contributes at least 1% (though I'd rather not even worry about 1% if the code is more intuitive).
[GitHub] [spark] SparkQA commented on pull request #31519: [SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly
SparkQA commented on pull request #31519: URL: https://github.com/apache/spark/pull/31519#issuecomment-774937492 **[Test build #135010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135010/testReport)** for PR 31519 at commit [`a980eb4`](https://github.com/apache/spark/commit/a980eb417e7ecfd0569129c6809450d762e0bdb5).
[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31518: URL: https://github.com/apache/spark/pull/31518#issuecomment-774937178 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39588/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system
HyukjinKwon commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571829295

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
`TestUtils.testCommandAvailable("/bin/bash")` won't be executed for other files, thanks to short-circuit evaluation.
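The short-circuit argument can be checked with a small standalone sketch. `ShortCircuitSketch` and its counter are hypothetical, and `commandAvailable()` is only a stand-in for an expensive availability probe, not the real `TestUtils` method.

```scala
object ShortCircuitSketch {
  var availabilityChecks: Int = 0

  // Hypothetical stand-in for an expensive "is this command available?" probe.
  def commandAvailable(): Boolean = { availabilityChecks += 1; false }

  // `||` evaluates its right operand only when the left operand is false,
  // so the probe runs only for files named transform.sql.
  def shouldRun(inputFile: String): Boolean =
    !inputFile.endsWith("transform.sql") || commandAvailable()
}
```

For any file other than `transform.sql` the left operand is already true, so the counter stays at zero; the probe is invoked once per `transform.sql` test case.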
[GitHub] [spark] beliefer commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system
beliefer commented on a change in pull request #31466:
URL: https://github.com/apache/spark/pull/31466#discussion_r571828443

## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")

Review comment:
If so, `SQLQueryTestSuite` will execute `TestUtils.testCommandAvailable("/bin/bash")` many times.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
AmplabJenkins removed a comment on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774935418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39591/
[GitHub] [spark] AmplabJenkins commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
AmplabJenkins commented on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774935418 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39591/
[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
SparkQA commented on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774935398 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39591/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
AmplabJenkins removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774929003 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39592/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats
AmplabJenkins removed a comment on pull request #31485: URL: https://github.com/apache/spark/pull/31485#issuecomment-774929004 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39587/
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774931027 **[Test build #135011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135011/testReport)** for PR 31517 at commit [`4b49b84`](https://github.com/apache/spark/commit/4b49b84e0c038d286ca09039e774815f4aea7296).
[GitHub] [spark] Ngone51 commented on pull request #31369: [SPARK-34270][SS] Combine StateStoreMetrics should not override StateStoreCustomMetric
Ngone51 commented on pull request #31369:
URL: https://github.com/apache/spark/pull/31369#issuecomment-774929505

Hopefully, future callers will always be aware that `StateStoreMetrics` is combined by `StateStoreCustomMetric` rather than by `StateStoreCustomMetric.name`. Otherwise, usages like
```scala
combinedMetrics.customMetrics.foreach { case (metric, value) =>
  longMetric(metric.name) = value
                          ^ "=" instead of "+="
}
```
could result in wrong metrics again.
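The difference between `=` and `+=` in the comment above can be reproduced with a small sketch that does not depend on Spark. `CustomMetric` and `MetricsByName` are hypothetical; the only assumption carried over from the discussion is that two distinct metric keys may share the same name.

```scala
import scala.collection.mutable

// Hypothetical metric key: two instances can share a name but still be distinct keys.
final case class CustomMetric(name: String, desc: String)

object MetricsByName {
  // Accumulates values per metric name; `+=` keeps earlier contributions.
  def accumulate(entries: Seq[(CustomMetric, Long)]): Map[String, Long] = {
    val totals = mutable.Map.empty[String, Long].withDefaultValue(0L)
    entries.foreach { case (metric, value) => totals(metric.name) += value }
    totals.toMap
  }

  // Plain assignment: a later entry with the same name overwrites the earlier one.
  def overwrite(entries: Seq[(CustomMetric, Long)]): Map[String, Long] = {
    val totals = mutable.Map.empty[String, Long]
    entries.foreach { case (metric, value) => totals(metric.name) = value }
    totals.toMap
  }
}
```

When two keys with the same name carry the values 1 and 2, `accumulate` reports their sum under that name while `overwrite` silently keeps only the later value.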
[GitHub] [spark] AmplabJenkins commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats
AmplabJenkins commented on pull request #31485: URL: https://github.com/apache/spark/pull/31485#issuecomment-774929004 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39587/
[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
AmplabJenkins commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774929003 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39592/
[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31516: URL: https://github.com/apache/spark/pull/31516#issuecomment-774928806 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39590/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system
HyukjinKwon commented on a change in pull request #31466: URL: https://github.com/apache/spark/pull/31466#discussion_r571820552
## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")
Review comment: I meant the fix such as:
```scala
assume(
  !testCase.inputFile.endsWith("transform.sql") ||
    TestUtils.testCommandAvailable("/bin/bash"))
```
I tested that it only skips transform.sql when `/bin/bash` is not available:
```
[info] - transform.sql !!! CANCELED !!! (36 milliseconds)
[info]   "/.../spark/sql/core/src/test/resources/sql-tests/inputs/transform.sql" ended with "transform.sql", and org.apache.spark.TestUtils.testCommandAvailable("/bin/bas") was false (SQLQueryTestSuite.scala:265)
[info]   org.scalatest.exceptions.TestCanceledException:
[info]   at org.scalatest.Assertions.newTestCanceledException(Assertions.scala:475)
[info]   at org.scalatest.Assertions.newTestCanceledException$(Assertions.scala:474)
[info]   at org.scalatest.Assertions$.newTestCanceledException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssume(Assertions.scala:1310)
[info]   at org.apache.spark.sql.SQLQueryTestSuite.runTest(SQLQueryTestSuite.scala:265)
[info]   at org.apache.spark.sql.SQLQueryTestSuite.$anonfun$createScalaTestCase$5(SQLQueryTestSuite.scala:247)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
```
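The condition passed to `assume` in this comment is an implication: if the input file is `transform.sql`, then `/bin/bash` must be available. A minimal sketch of that predicate in plain Scala follows; `shouldRun` is a hypothetical helper, and the `bashAvailable` flag stands in for `TestUtils.testCommandAvailable("/bin/bash")`.

```scala
object AssumeConditionSketch {
  // Hypothetical helper mirroring the assume(...) condition: every input file
  // runs, except transform.sql when bash is unavailable.
  def shouldRun(inputFile: String, bashAvailable: Boolean): Boolean =
    !inputFile.endsWith("transform.sql") || bashAvailable
}
```

In ScalaTest, a false `assume` condition cancels the test instead of failing it, which matches the `TestCanceledException` in the log quoted in the comment.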
[GitHub] [spark] ben-manes commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
ben-manes commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r571820166
## File path: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ##
@@ -58,24 +58,26 @@ private[history] class ApplicationCache(
   }

-  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+  private val cacheWriter = new CacheWriter[CacheKey, CacheEntry] {
Review comment: FYI, `CacheWriter` will be deprecated and replaced in 2.9, and removed in 3.0. Instead, `Caffeine.evictionListener(RemovalListener)` will provide the synchronous removal behavior, and any other atomic writes can be captured manually via the `asMap().compute` methods. It should be a minor change for you when 2.9 is released.
[GitHub] [spark] beliefer opened a new pull request #31519: [SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly
beliefer opened a new pull request #31519: URL: https://github.com/apache/spark/pull/31519
### What changes were proposed in this pull request?
The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command. For example, the output attributes of `ShowFunctions` are not passed to `ShowFunctionsCommand` properly. As with the query plan, this PR passes the output attributes from `ShowFunctions` to `ShowFunctionsCommand`.
### Why are the changes needed?
Passing the output attributes keeps the expr ID unchanged, which avoids bugs when more operators are applied on top of the command's output DataFrame.
### Does this PR introduce _any_ user-facing change?
'No'.
### How was this patch tested?
Jenkins test.
[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31518: URL: https://github.com/apache/spark/pull/31518#issuecomment-774922915 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39588/
[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
Ngone51 commented on a change in pull request #31495: URL: https://github.com/apache/spark/pull/31495#discussion_r571818019
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
       .reverse
   }

+  private var lastPurgedBatchId: Long = -1L
+
   /**
    * Removes all the log entry earlier than thresholdBatchId (exclusive).
    */
   override def purge(thresholdBatchId: Long): Unit = {
-    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-      .map(f => pathToBatchId(f.getPath))
-
-    for (batchId <- batchIds if batchId < thresholdBatchId) {
-      val path = batchIdToPath(batchId)
-      fileManager.delete(path)
-      logTrace(s"Removed metadata log file: $path")
+    val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+    if (possibleTargetBatchIds.length <= 3) {
+      // avoid using list if we only need to purge at most 3 elements
+      possibleTargetBatchIds.foreach { batchId =>
+        val path = batchIdToPath(batchId)
+        if (fileManager.exists(path)) {
Review comment: Sure, please. I'm fine either way unless there's a noticeable perf difference.
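The idea in the quoted hunk, probing a few candidate batch IDs directly instead of listing the whole directory, can be sketched with an in-memory stand-in. Everything here is hypothetical: `FakeLog` and `listCalls` are illustrative names, and both the fallback branch and the way `lastPurgedBatchId` is advanced are assumptions, since the quoted diff is cut off before those parts.

```scala
import scala.collection.mutable

object PurgeSketch {
  // Hypothetical in-memory stand-in for the metadata log directory.
  final class FakeLog(initial: Set[Long]) {
    private val files = mutable.Set.empty[Long] ++= initial
    private var lastPurgedBatchId: Long = -1L
    var listCalls = 0 // counts the expensive "list directory" operations

    def batchIds: Set[Long] = files.toSet

    def purge(thresholdBatchId: Long): Unit = {
      val candidates = (lastPurgedBatchId + 1) until thresholdBatchId
      if (candidates.length <= 3) {
        // Few candidates: touch each one directly, no directory listing.
        candidates.foreach(id => files.remove(id))
      } else {
        // Assumed fallback: one listing, then delete everything below the threshold.
        listCalls += 1
        files --= files.filter(_ < thresholdBatchId).toList
      }
      // Assumed bookkeeping: remember how far purging has progressed.
      lastPurgedBatchId = math.max(lastPurgedBatchId, thresholdBatchId - 1)
    }
  }
}
```

When purging runs every batch, the candidate range stays small, so in this sketch the listing branch is only taken after a long gap between purges.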
[GitHub] [spark] Ngone51 commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
Ngone51 commented on a change in pull request #31495: URL: https://github.com/apache/spark/pull/31495#discussion_r571817596
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {

+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them
Review comment: OK, I see where the problem with my test is. You're right, the latency is trivial. I'm not against your solution here. But since we've come this far, I'd like to mention one more thing: `TreeMap` tends to create a new `AscendingSubMap` object for each batch, while an array doesn't. That said, it might also be trivial.
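The allocation mentioned in this comment can be illustrated with a small sketch. It assumes that eviction is done through `headMap(...).clear()`, which is not visible in the quoted hunk; `Cache` is a hypothetical class and `String` stands in for `OffsetSeq`.

```scala
import java.{util => ju}

object OffsetCacheSketch {
  // Hypothetical cache keyed by batch id; String stands in for OffsetSeq.
  final class Cache {
    private val cached = new ju.TreeMap[Long, String]()

    def add(batchId: Long, metadata: String): Unit = {
      cached.put(batchId, metadata)
      // headMap returns a view backed by the map, so clearing the view evicts
      // every entry with key <= batchId - 2. Each call creates a new view object.
      cached.headMap(batchId - 2, true).clear()
    }

    def size: Int = cached.size()
    def contains(batchId: Long): Boolean = cached.containsKey(batchId)
  }
}
```

After adding batches 0 through 5 in order, only batches 4 and 5 remain cached. A fixed-size array holding the two most recent entries would avoid the per-call view object, at the cost of manual index bookkeeping.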
[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
SparkQA commented on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774921702 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39591/
[GitHub] [spark] pan3793 commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
pan3793 commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r571813479 ## File path: core/pom.xml ## @@ -47,6 +47,14 @@ com.google.guava guava + + com.github.ben-manes.caffeine + caffeine + + + com.github.ben-manes.caffeine Review comment: redundant space
[GitHub] [spark] beliefer commented on a change in pull request #31466: [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system
beliefer commented on a change in pull request #31466: URL: https://github.com/apache/spark/pull/31466#discussion_r571811985
## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ##
@@ -566,7 +563,14 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
     // Filter out test files with invalid extensions such as temp files created
     // by vi (.swp), Mac (.DS_Store) etc.
     val filteredFiles = files.filter(_.getName.endsWith(validFileExtensions))
-    filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    val allFiles = filteredFiles ++ dirs.flatMap(listFilesRecursively)
+    // SPARK-32106 Since we add SQL test 'transform.sql' will use `cat` command,
+    // here we need to check command available
+    if (TestUtils.testCommandAvailable("/bin/bash")) {
+      allFiles
+    } else {
+      allFiles.filterNot(_.getName == "transform.sql")
Review comment: `SQLQueryTestSuite` must contain `transform.sql`. Why do we need to check whether the test name contains `transform`?
[GitHub] [spark] beliefer commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
beliefer commented on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774912164 retest this please
[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats
SparkQA commented on pull request #31485: URL: https://github.com/apache/spark/pull/31485#issuecomment-774910167 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39587/
[GitHub] [spark] SparkQA commented on pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31516: URL: https://github.com/apache/spark/pull/31516#issuecomment-774906595 **[Test build #135007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135007/testReport)** for PR 31516 at commit [`43a6c5d`](https://github.com/apache/spark/commit/43a6c5d65e5288f8b626581ccf7f13649f7f7fc1).
[GitHub] [spark] SparkQA commented on pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command
SparkQA commented on pull request #31518: URL: https://github.com/apache/spark/pull/31518#issuecomment-774906642 **[Test build #135005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135005/testReport)** for PR 31518 at commit [`12e569b`](https://github.com/apache/spark/commit/12e569be80f3bb03daac2dfa15b507572cafbaaa).
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774906502 **[Test build #135006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135006/testReport)** for PR 31517 at commit [`0c5382a`](https://github.com/apache/spark/commit/0c5382af0a54c5db8cf9ffee6a7a5040be5cb1c7).
[GitHub] [spark] LuciferYang commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
LuciferYang commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774905437 thx ~ @HyukjinKwon
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
AmplabJenkins removed a comment on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774903911 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39589/
[GitHub] [spark] AmplabJenkins commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
AmplabJenkins commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774903911 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39589/
[GitHub] [spark] SparkQA commented on pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
SparkQA commented on pull request #31517: URL: https://github.com/apache/spark/pull/31517#issuecomment-774903659 **[Test build #135009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135009/testReport)** for PR 31517 at commit [`4761a5b`](https://github.com/apache/spark/commit/4761a5b24637020028f71387e8fecbd4c4f67ba1).
[GitHub] [spark] AngersZhuuuu commented on pull request #31378: [SPARK-34240][SQL] Unify output of `SHOW TBLPROPERTIES` clause's output attribute's schema and ExprID
AngersZhuuuu commented on pull request #31378: URL: https://github.com/apache/spark/pull/31378#issuecomment-774901974 ping @cloud-fan Are any more updates needed?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
AmplabJenkins removed a comment on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774899900 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134995/
[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats
SparkQA commented on pull request #31485: URL: https://github.com/apache/spark/pull/31485#issuecomment-774901374 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39587/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map
AmplabJenkins removed a comment on pull request #31484: URL: https://github.com/apache/spark/pull/31484#issuecomment-774899153 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39586/
[GitHub] [spark] SparkQA removed a comment on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
SparkQA removed a comment on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774823100 **[Test build #134995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134995/testReport)** for PR 31508 at commit [`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
AmplabJenkins removed a comment on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
AmplabJenkins removed a comment on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-774899151 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135000/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
AmplabJenkins removed a comment on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774899147
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31509: [SPARK-34396][SQL] Add a new build-in function delegate
AngersZhuuuu commented on a change in pull request #31509: URL: https://github.com/apache/spark/pull/31509#discussion_r571802245
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ##
@@ -269,3 +269,62 @@ case class TypeOf(child: Expression) extends UnaryExpression {
     defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""")
   }
 }
+
+@ExpressionDescription(
+  usage = """_FUNC_(expr) - Execute all children and return the last child result.""",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1, 2);
+       2
+      > SELECT _FUNC_(1 + 2, 3 + 4);
+       7
+  """,
+  since = "3.2.0",
+  group = "misc_funcs")
+case class DelegateFunction(children: Seq[Expression]) extends Expression {
+  require(children.nonEmpty, s"$prettyName function requires children is not empty.")
+
+  private lazy val lastChild = children.last
+
+  override lazy val deterministic: Boolean = children.forall(_.deterministic)
+  override lazy val resolved: Boolean = children.forall(_.resolved)
+  override def foldable: Boolean = children.forall(_.foldable)
+  override def nullable: Boolean = lastChild.nullable
+  override def dataType: DataType = lastChild.dataType
+
+  override def eval(input: InternalRow): Any = {
+    var result: Any = null
+    children.foreach { child =>
+      result = child.eval(input)
+    }
+    result
Review comment: Hmm, how about adding a result map to avoid recomputing the same child?
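The "result map" suggestion in this comment can be sketched outside Spark as follows. This is only an illustration under stated assumptions: `Expr`, `evalLast`, and `onCompute` are hypothetical names, an `Int` stands in for the input row, and children are treated as "the same" when they are equal case-class instances, which is not how Catalyst compares expressions.

```scala
import scala.collection.mutable

object DelegateEvalSketch {
  // Hypothetical miniature expression: a name plus a function of the input.
  final case class Expr(name: String, compute: Int => Int)

  // Evaluates every child in order and returns the last result. Equal children
  // are computed only once per call because results are cached in a map.
  def evalLast(children: Seq[Expr], input: Int, onCompute: Expr => Unit = _ => ()): Int = {
    require(children.nonEmpty, "children must not be empty")
    val results = mutable.Map.empty[Expr, Int]
    children.map { child =>
      results.getOrElseUpdate(child, { onCompute(child); child.compute(input) })
    }.last
  }
}
```

Caching like this only preserves behavior when the children are deterministic; a non-deterministic child would need to be re-evaluated on every occurrence.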
[GitHub] [spark] AmplabJenkins commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
AmplabJenkins commented on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774899900 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134995/
[GitHub] [spark] SparkQA commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
SparkQA commented on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774899751 **[Test build #135008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135008/testReport)** for PR 31504 at commit [`5e32ffd`](https://github.com/apache/spark/commit/5e32ffd3b10ed1d4e349cb0b972296ac7bd5b0fe).
[GitHub] [spark] AmplabJenkins commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
AmplabJenkins commented on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-774899148
[GitHub] [spark] AmplabJenkins commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
AmplabJenkins commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774899147
[GitHub] [spark] AmplabJenkins commented on pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map
AmplabJenkins commented on pull request #31484: URL: https://github.com/apache/spark/pull/31484#issuecomment-774899153 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39586/
[GitHub] [spark] SparkQA commented on pull request #31508: [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly
SparkQA commented on pull request #31508: URL: https://github.com/apache/spark/pull/31508#issuecomment-774899206 **[Test build #134995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134995/testReport)** for PR 31508 at commit [`d964a05`](https://github.com/apache/spark/commit/d964a059a4882cecddc3dbe2d4343cbf6298ff44). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
AmplabJenkins commented on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-774899151 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135000/
[GitHub] [spark] HyukjinKwon closed pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
HyukjinKwon closed pull request #31487: URL: https://github.com/apache/spark/pull/31487
[GitHub] [spark] HyukjinKwon commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
HyukjinKwon commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774897760 Merged to master.
[GitHub] [spark] SparkQA removed a comment on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
SparkQA removed a comment on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-774842281 **[Test build #135000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135000/testReport)** for PR 31419 at commit [`010413e`](https://github.com/apache/spark/commit/010413ee49728b5ed537636aef520f024e12ec09).
[GitHub] [spark] SparkQA commented on pull request #31419: [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types
SparkQA commented on pull request #31419: URL: https://github.com/apache/spark/pull/31419#issuecomment-774895493 **[Test build #135000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135000/testReport)** for PR 31419 at commit [`010413e`](https://github.com/apache/spark/commit/010413ee49728b5ed537636aef520f024e12ec09). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AlterTableSetLocation(` * `case class AlterTableSetProperties(` * `case class AlterTableUnsetProperties(` * ` implicit class MetadataColumnHelper(attr: Attribute) ` * `class ResolveSessionCatalog(val catalogManager: CatalogManager)`
[GitHub] [spark] SparkQA removed a comment on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
SparkQA removed a comment on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-774842883 **[Test build #134998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61).
[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
SparkQA commented on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-774893887 **[Test build #134998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134998/testReport)** for PR 31480 at commit [`ab948f7`](https://github.com/apache/spark/commit/ab948f732ce95b5f409696d7c182c016c2b1bf61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
SparkQA removed a comment on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774844717
[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
SparkQA commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774892805 **[Test build #135001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135001/testReport)** for PR 31487 at commit [`0ecf1a2`](https://github.com/apache/spark/commit/0ecf1a223488eac6d293a656978e2c85fa00). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` implicit class MetadataColumnHelper(attr: Attribute) `
[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
SparkQA commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774892695 **[Test build #135002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135002/testReport)** for PR 31487 at commit [`a8ebb43`](https://github.com/apache/spark/commit/a8ebb4326f3ec92d7eee87dc72f4eb806a1e8c7c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495: URL: https://github.com/apache/spark/pull/31495#discussion_r571798020

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala ##
@@ -239,18 +239,35 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
       .reverse
   }
+  private var lastPurgedBatchId: Long = -1L
+
   /**
    * Removes all the log entry earlier than thresholdBatchId (exclusive).
    */
   override def purge(thresholdBatchId: Long): Unit = {
-    val batchIds = fileManager.list(metadataPath, batchFilesFilter)
-      .map(f => pathToBatchId(f.getPath))
-
-    for (batchId <- batchIds if batchId < thresholdBatchId) {
-      val path = batchIdToPath(batchId)
-      fileManager.delete(path)
-      logTrace(s"Removed metadata log file: $path")
+    val possibleTargetBatchIds = (lastPurgedBatchId + 1 until thresholdBatchId)
+    if (possibleTargetBatchIds.length <= 3) {
+      // avoid using list if we only need to purge at most 3 elements
+      possibleTargetBatchIds.foreach { batchId =>
+        val path = batchIdToPath(batchId)
+        if (fileManager.exists(path)) {

Review comment: Yeah that also makes sense. I'm not sure about how much the cost would be saved though. Let me play with this a bit.
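The idea in the hunk above (remember the last purged batch ID and skip the directory listing when only a few candidates remain) can be sketched without Spark. This is a minimal illustration under stated assumptions, not the PR's implementation: `InMemoryLog`, its in-memory set of batch IDs, and the `listCalls`/`existsCalls` counters are hypothetical stand-ins for the file manager, and because the quoted hunk is truncated, the fallback branch and the update of `lastPurgedBatchId` are guesses.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for the metadata directory (not Spark's file manager).
final class InMemoryLog {
  private val files = mutable.TreeSet.empty[Long] // batch IDs that "exist on disk"
  private var lastPurgedBatchId: Long = -1L
  var listCalls = 0   // how often the expensive listing was used
  var existsCalls = 0 // how often a cheap per-file existence check was used

  def add(batchId: Long): Unit = files += batchId
  def batchIds: List[Long] = files.toList

  private def list(): List[Long] = { listCalls += 1; files.toList }
  private def exists(id: Long): Boolean = { existsCalls += 1; files.contains(id) }

  /** Removes every entry whose batch ID is below thresholdBatchId (exclusive). */
  def purge(thresholdBatchId: Long): Unit = {
    val candidates = (lastPurgedBatchId + 1) until thresholdBatchId
    if (candidates.length <= 3) {
      // Few candidates: probe each one instead of listing the whole directory.
      candidates.foreach { id => if (exists(id)) files -= id }
    } else {
      // Assumed fallback: list once and delete everything below the threshold.
      list().filter(_ < thresholdBatchId).foreach(id => files -= id)
    }
    // Assumed bookkeeping: remember how far the purge has progressed.
    lastPurgedBatchId = math.max(lastPurgedBatchId, thresholdBatchId - 1)
  }
}
```

In steady state a streaming query purges about one batch per commit, so under these assumptions the listing branch would run only on the first purge after startup.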
[GitHub] [spark] dongjoon-hyun closed pull request #31503: [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0
dongjoon-hyun closed pull request #31503: URL: https://github.com/apache/spark/pull/31503
[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
LuciferYang commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r571795620

## File path: core/src/test/scala/org/apache/spark/deploy/history/ApplicationCacheSuite.scala ##
@@ -192,6 +192,7 @@ class ApplicationCacheSuite extends SparkFunSuite with Logging with MockitoSugar
     cache.get("2")
     cache.get("3")
+    Thread.sleep(5L)

Review comment: Wait for data eviction.
[GitHub] [spark] dongjoon-hyun commented on pull request #31503: [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0
dongjoon-hyun commented on pull request #31503: URL: https://github.com/apache/spark/pull/31503#issuecomment-774888294 Thank you, @srowen !
[GitHub] [spark] dongjoon-hyun commented on pull request #31515: [SPARK-34346][CORE][TESTS][FOLLOWUP] Fix UT by removing core-site.xml
dongjoon-hyun commented on pull request #31515: URL: https://github.com/apache/spark/pull/31515#issuecomment-774888033 Thank you, @srowen , @yaooqinn , @HyukjinKwon .
[GitHub] [spark] SparkQA commented on pull request #31480: [SPARK-32384][CORE] repartitionAndSortWithinPartitions avoid shuffle with same partitioner
SparkQA commented on pull request #31480: URL: https://github.com/apache/spark/pull/31480#issuecomment-774887363 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39581/
[GitHub] [spark] LuciferYang commented on a change in pull request #31517: [WIP][SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
LuciferYang commented on a change in pull request #31517: URL: https://github.com/apache/spark/pull/31517#discussion_r571795094

## File path: core/src/main/scala/org/apache/spark/deploy/history/ApplicationCache.scala ##
@@ -58,24 +58,26 @@ private[history] class ApplicationCache(
   }
-  private val removalListener = new RemovalListener[CacheKey, CacheEntry] {
+  private val cacheWriter = new CacheWriter[CacheKey, CacheEntry] {

Review comment: `CacheWriter` adopts synchronous removal behavior similar to Guava, whereas `RemovalListener` is always asynchronous.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495: URL: https://github.com/apache/spark/pull/31495#discussion_r571792973

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment: https://gist.github.com/HeartSaVioR/111ed75aa2dc4672e36968c02db83e26

```
import java.lang.{Long => JLong}
import java.util.{ArrayList, Collections, TreeMap}

// Times three puts plus one headMap eviction on a fresh map.
def c(treeMap: TreeMap[Long, String]): Long = {
  val t1 = System.nanoTime()
  treeMap.put(1, "1")
  treeMap.put(2, "3")
  treeMap.put(3, "3")
  treeMap.headMap(2, true).clear()
  (System.nanoTime() - t1)
}

// Times one put plus one headMap eviction on a long-lived map.
def d(treeMap: TreeMap[Long, String], idx: Long, value: String): Long = {
  val t1 = System.nanoTime()
  treeMap.put(idx, value)
  treeMap.headMap(idx - 2, true).clear()
  (System.nanoTime() - t1)
}

def experimentC(): Unit = {
  val latencies = new ArrayList[JLong]()
  val warmupCount = 100
  val runCount = 1000

  (1 to warmupCount).foreach { _ =>
    val t = new java.util.TreeMap[Long, String]()
    c(t)
  }

  (1 to runCount).foreach { _ =>
    val t = new java.util.TreeMap[Long, String]()
    latencies.add(JLong.valueOf(c(t)))
  }

  java.util.Collections.sort(latencies)
  printLatencies(latencies)
}

def experimentD(): Unit = {
  val latencies = new ArrayList[JLong]()
  val warmupCount = 100
  val runCount = 1000

  val t = new java.util.TreeMap[Long, String]()
  (1 to warmupCount).foreach { idx =>
    d(t, idx, idx.toString)
  }

  val t2 = new java.util.TreeMap[Long, String]()
  (1 to runCount).foreach { idx =>
    latencies.add(JLong.valueOf(d(t2, idx, idx.toString)))
  }

  printLatencies(latencies)
}

def printLatencies(latencies: ArrayList[JLong]): Unit = {
  val arraySize = latencies.size()
  val minIdx = 0
  val maxIdx = arraySize - 1
  val percentile50 = (arraySize * 0.5).toInt
  val percentile90 = (arraySize * 0.9).toInt
  val percentile99 = (arraySize * 0.99).toInt
  val percentile999 = (arraySize * 0.999).toInt
  val percentile9999 = (arraySize * 0.9999).toInt
  val percentile99999 = (arraySize * 0.99999).toInt
  val percentile999999 = (arraySize * 0.999999).toInt

  java.util.Collections.sort(latencies)

  Seq(minIdx, percentile50, percentile90, percentile99, percentile999, percentile9999,
    percentile99999, percentile999999, maxIdx).foreach { idx =>
    printLatency(latencies, idx)
  }
}

def printLatency(latencies: ArrayList[JLong], idx: Int): Unit = {
  // Latencies are recorded in nanoseconds.
  println(s"$idx th : ${latencies.get(idx) / 1000} microseconds = ${latencies.get(idx) / 1000000} milliseconds")
}

// experimentC()
/*
0 th : 0 microseconds = 0 milliseconds
500 th : 0 microseconds = 0 milliseconds
900 th : 0 microseconds = 0 milliseconds
990 th : 0 microseconds = 0 milliseconds
999 th : 1 microseconds = 0 milliseconds
000 th : 9 microseconds = 0 milliseconds
900 th : 37 microseconds = 0 milliseconds
990 th : 223 microseconds = 0 milliseconds
999 th : 53612 microseconds = 53 milliseconds
*/

experimentD()
/*
0 th : 0 microseconds = 0 milliseconds
500 th : 0 microseconds = 0 milliseconds
900 th : 0 microseconds = 0 milliseconds
990 th : 0 microseconds = 0 milliseconds
999 th : 0 microseconds = 0 milliseconds
000 th : 6 microseconds = 0 milliseconds
900 th : 25 microseconds = 0 milliseconds
990 th : 150 microseconds = 0 milliseconds
999 th : 57887 microseconds = 57 milliseconds
*/
```

2018 13-inch MBP, i7 quad-core 2.7Ghz

```
./bin/spark-shell --driver-memory 2g
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
```

Still think this really matters?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations
HeartSaVioR commented on a change in pull request #31495: URL: https://github.com/apache/spark/pull/31495#discussion_r571794422

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala ##
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()
+
+  override def add(batchId: Long, metadata: OffsetSeq): Boolean = {
+    val added = super.add(batchId, metadata)
+    if (added) {
+      // cache metadata as it will be read again
+      cachedMetadata.put(batchId, metadata)
+      // we don't access metadata for (batchId - 2) batches; evict them

Review comment: Even without warmup (commenting out),

```
// experimentC()
...
990 th : 1632 microseconds = 1 milliseconds
999 th : 60999 microseconds = 60 milliseconds

// experimentD()
...
990 th : 321 microseconds = 0 milliseconds
999 th : 35074 microseconds = 35 milliseconds
```
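For readers following the benchmark, the eviction pattern being timed (`put` followed by `headMap(idx - 2, true).clear()`) can be isolated in a few lines. This is only a sketch of the `java.util.TreeMap` calls quoted above; the `BatchCache` object and `putAndEvict` name are hypothetical and do not come from the PR.

```scala
import java.util.{TreeMap => JTreeMap}

// Hypothetical helper illustrating the put-then-evict pattern from the benchmark.
object BatchCache {
  /** Inserts the entry, then drops every key <= batchId - 2 through an inclusive headMap view. */
  def putAndEvict(cache: JTreeMap[Long, String], batchId: Long, value: String): Unit = {
    cache.put(batchId, value)
    // headMap returns a live view of the map, so clearing the view removes
    // those entries from the backing TreeMap as well.
    cache.headMap(batchId - 2, true).clear()
  }
}
```

Calling `putAndEvict` with increasing batch IDs keeps only the two most recent entries, so the map stays tiny regardless of how many batches have been processed.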
[GitHub] [spark] AngersZhuuuu opened a new pull request #31518: [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command
AngersZhuuuu opened a new pull request #31518: URL: https://github.com/apache/spark/pull/31518

### What changes were proposed in this pull request?
Keep `SHOW COLUMNS` consistent with the other `SHOW` commands, according to https://github.com/apache/spark/pull/31341#issuecomment-774613080

### Why are the changes needed?
To keep the `SHOW` commands consistent.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.
[GitHub] [spark] LuciferYang opened a new pull request #31517: [SPARK-34309][CORE][SQL] Use Caffeine instead of Guava Cache
LuciferYang opened a new pull request #31517: URL: https://github.com/apache/spark/pull/31517

### What changes were proposed in this pull request?
Caffeine is a high-performance, near-optimal caching library based on Java 8. It is used in a similar way to Guava Cache, but with better performance. The main purpose of this PR is to use Caffeine instead of Guava Cache.

### Why are the changes needed?
Use a better local cache library.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
[GitHub] [spark] AngersZhuuuu opened a new pull request #31516: [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with other `SHOW` command
AngersZhuuuu opened a new pull request #31516: URL: https://github.com/apache/spark/pull/31516

### What changes were proposed in this pull request?
Keep `SHOW PARTITIONS` consistent with the other `SHOW` commands, according to https://github.com/apache/spark/pull/31341#issuecomment-774613080

### Why are the changes needed?
To keep the `SHOW` commands consistent.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.
[GitHub] [spark] SparkQA commented on pull request #31485: [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats
SparkQA commented on pull request #31485: URL: https://github.com/apache/spark/pull/31485#issuecomment-774878636 **[Test build #135004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135004/testReport)** for PR 31485 at commit [`89783c1`](https://github.com/apache/spark/commit/89783c18fdfae87d398a37438975843a4f64274d).
[GitHub] [spark] AngersZhuuuu commented on pull request #31341: [SPARK-34238][SQL] Unify output of SHOW PARTITIONS and pass output attributes properly
AngersZhuuuu commented on pull request #31341: URL: https://github.com/apache/spark/pull/31341#issuecomment-774876516

> @AngersZhuuuu yea I think so

Yea, will raise follow up pr soon.
[GitHub] [spark] Ngone51 commented on pull request #30650: [SPARK-24818][CORE] Support delay scheduling for barrier execution
Ngone51 commented on pull request #30650: URL: https://github.com/apache/spark/pull/30650#issuecomment-774876158 cc @mridulm @tgravescs Please take another look when you're available:)
[GitHub] [spark] cloud-fan commented on pull request #31341: [SPARK-34238][SQL] Unify output of SHOW PARTITIONS and pass output attributes properly
cloud-fan commented on pull request #31341: URL: https://github.com/apache/spark/pull/31341#issuecomment-774876014 @AngersZhuuuu yea I think so
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
AmplabJenkins removed a comment on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-77487
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
AmplabJenkins removed a comment on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774872223
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType
AmplabJenkins removed a comment on pull request #31491: URL: https://github.com/apache/spark/pull/31491#issuecomment-774872221 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39580/
[GitHub] [spark] AmplabJenkins commented on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
AmplabJenkins commented on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774872223
[GitHub] [spark] AmplabJenkins commented on pull request #31491: [SPARK-34379][SQL] Map JDBC RowID to StringType rather than LongType
AmplabJenkins commented on pull request #31491: URL: https://github.com/apache/spark/pull/31491#issuecomment-774872221 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39580/
[GitHub] [spark] AmplabJenkins commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
AmplabJenkins commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-77487
[GitHub] [spark] SparkQA commented on pull request #31487: [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'
SparkQA commented on pull request #31487: URL: https://github.com/apache/spark/pull/31487#issuecomment-774869229 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39585/
[GitHub] [spark] LuciferYang commented on a change in pull request #31484: [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map
LuciferYang commented on a change in pull request #31484: URL: https://github.com/apache/spark/pull/31484#discussion_r571782459

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ##
@@ -406,7 +406,7 @@ object PreprocessTableInsertion extends Rule[LogicalPlan] {
         catalogTable.get.tracksPartitionsInCatalog
       if (partitionsTrackedByCatalog && normalizedPartSpec.nonEmpty) {
         // empty partition column value
-        if (normalizedPartSpec.map(_._2)
+        if (normalizedPartSpec.values
             .filter(_.isDefined).map(_.get).exists(v => v != null && v.isEmpty)) {

Review comment: 7eac600 fix this
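For readers skimming the diff above, here is a minimal sketch of why the two forms should be interchangeable in this check. It assumes `normalizedPartSpec` behaves like a `Map[String, Option[String]]`; the object name, the helper names, and the sample data are hypothetical and are not taken from the PR.

```scala
object PartSpecValuesSketch {
  // Hypothetical stand-in for normalizedPartSpec: partition column -> optional value.
  val normalizedPartSpec: Map[String, Option[String]] =
    Map("year" -> Some("2021"), "month" -> Some(""), "day" -> None)

  // Form before the change: project each (key, value) pair onto its value.
  def hasEmptyValueViaMap(spec: Map[String, Option[String]]): Boolean =
    spec.map(_._2).filter(_.isDefined).map(_.get).exists(v => v != null && v.isEmpty)

  // Form after the change: ask the Map for its values directly.
  def hasEmptyValueViaValues(spec: Map[String, Option[String]]): Boolean =
    spec.values.filter(_.isDefined).map(_.get).exists(v => v != null && v.isEmpty)

  def main(args: Array[String]): Unit = {
    // "month" maps to an empty string, so both checks report true.
    println(hasEmptyValueViaMap(normalizedPartSpec))
    println(hasEmptyValueViaValues(normalizedPartSpec))
  }
}
```

Both helpers walk the same values, so the change reads as a stylistic cleanup: `values` states the intent directly instead of destructuring each tuple with `_._2`.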
[GitHub] [spark] SparkQA removed a comment on pull request #31504: [SPARK-34172][SQL] Add `SHOW DATABASES` as table-valued function
SparkQA removed a comment on pull request #31504: URL: https://github.com/apache/spark/pull/31504#issuecomment-774828953 **[Test build #134996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134996/testReport)** for PR 31504 at commit [`3f42e91`](https://github.com/apache/spark/commit/3f42e9145b4b7452d7263d8d4ecf4646c8a51886).