[GitHub] [spark] AmplabJenkins removed a comment on issue #26817: [SPARK-30192][SQL] support column position in DS v2
AmplabJenkins removed a comment on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565340046 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565340080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20094/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26817: [SPARK-30192][SQL] support column position in DS v2
AmplabJenkins removed a comment on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565340054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20095/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565340073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2
AmplabJenkins commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565340054 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20095/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2
cloud-fan commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565339937 @rdblue thanks for catching the bug! comments addressed.
[GitHub] [spark] AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565340080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20094/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565340073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2
AmplabJenkins commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565340046 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2
SparkQA commented on issue #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#issuecomment-565339687 **[Test build #115286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115286/testReport)** for PR 26817 at commit [`c01f565`](https://github.com/apache/spark/commit/c01f565d048f9f84aa08616113941e5f072158c4).
[GitHub] [spark] cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2
cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#discussion_r357520684
## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala ##
@@ -101,6 +101,27 @@ trait AlterTableTests extends SharedSparkSession {
     }
   }
 
+  test("AlterTable: add column with position") {
Review comment: done
[GitHub] [spark] SparkQA commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
SparkQA commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565339667 **[Test build #115285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115285/testReport)** for PR 26875 at commit [`424e0e3`](https://github.com/apache/spark/commit/424e0e31cca5bc6d31730fc9cfbce82252518c93).
[GitHub] [spark] cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2
cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#discussion_r357520738
## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala ##
@@ -471,6 +492,27 @@ trait AlterTableTests extends SharedSparkSession {
     }
   }
 
+  test("AlterTable: update column position") {
Review comment: done
[GitHub] [spark] cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2
cloud-fan commented on a change in pull request #26817: [SPARK-30192][SQL] support column position in DS v2 URL: https://github.com/apache/spark/pull/26817#discussion_r357520387
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala ##
@@ -147,6 +137,25 @@ private[sql] object CatalogV2Util {
           replace(schema, update.fieldNames, field => Some(field.withComment(update.newComment)))
 
+        case update: UpdateColumnPosition =>
+          def updateFieldPos(struct: StructType, name: String): StructType = {
+            val oldField = struct.fields.find(_.name == name).getOrElse {
+              throw new IllegalArgumentException("field not found: " + name)
+            }
+            val withFieldRemoved = StructType(struct.fields.filter(_ != oldField))
+            addField(withFieldRemoved, oldField, update.position())
+          }
+
+          update.fieldNames() match {
+            case Array(name) =>
+              updateFieldPos(schema, name)
+            case names =>
+              replace(schema, names.init, parent => parent.dataType match {
+                case parentType: StructType =>
Review comment: done
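Editorial note: the hunk above moves a column by removing the field from its struct and re-adding it at the requested position. The sketch below illustrates that remove-then-reinsert idea on a flat list of fields, without any Spark dependency. `Field`, `Position`, `First`, `After`, and `MoveFieldSketch` are hypothetical stand-ins invented for this illustration; they are not Spark's `StructField` or column-position API, and the real `addField` helper in `CatalogV2Util` is not shown in this message and may behave differently.

```scala
// Hypothetical stand-ins for illustration only; NOT the Spark API.
final case class Field(name: String, dataType: String)

sealed trait Position
case object First extends Position
final case class After(column: String) extends Position

object MoveFieldSketch {
  // Insert `field` into `fields` at the requested position.
  def addField(fields: Vector[Field], field: Field, pos: Position): Vector[Field] =
    pos match {
      case First => field +: fields
      case After(column) =>
        val idx = fields.indexWhere(_.name == column)
        if (idx < 0) throw new IllegalArgumentException("field not found: " + column)
        val (before, after) = fields.splitAt(idx + 1)
        (before :+ field) ++ after
    }

  // Move the field called `name`: drop it, then re-add it at `pos`.
  def moveField(fields: Vector[Field], name: String, pos: Position): Vector[Field] = {
    val oldField = fields.find(_.name == name).getOrElse {
      throw new IllegalArgumentException("field not found: " + name)
    }
    val withFieldRemoved = fields.filter(_ != oldField)
    addField(withFieldRemoved, oldField, pos)
  }
}
```

In this sketch, asking to place a column after itself fails with "field not found", because the column has already been removed by the time the anchor is looked up. Whether that is the desired behavior is a design choice the sketch does not settle.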
[GitHub] [spark] HyukjinKwon commented on issue #26871: [SPARK-30238][SQL] hive partition pruning can only support string and integral types
HyukjinKwon commented on issue #26871: [SPARK-30238][SQL] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26871#issuecomment-565338376 LGTM too
[GitHub] [spark] HyukjinKwon commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
HyukjinKwon commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565338109 Merged to branch-2.4
[GitHub] [spark] fuwhu commented on issue #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex
fuwhu commented on issue #26850: [SPARK-30215][SQL] Remove PrunedInMemoryFileIndex and merge its functionality into InMemoryFileIndex URL: https://github.com/apache/spark/pull/26850#issuecomment-565338050 gently cc: @cloud-fan
[GitHub] [spark] AmplabJenkins commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
AmplabJenkins commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565337433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115277/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
AmplabJenkins removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565337433 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115277/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
AmplabJenkins removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565337428 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
AmplabJenkins commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565337428 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
SparkQA removed a comment on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565303493 **[Test build #115277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115277/testReport)** for PR 26873 at commit [`30f4fbf`](https://github.com/apache/spark/commit/30f4fbfa085b1921abe6221eead22602f1e5fc52).
[GitHub] [spark] SparkQA commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server.
SparkQA commented on issue #26873: [SPARK-30240][core] Support HTTP redirects directly to a proxy server. URL: https://github.com/apache/spark/pull/26873#issuecomment-565336936 **[Test build #115277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115277/testReport)** for PR 26873 at commit [`30f4fbf`](https://github.com/apache/spark/commit/30f4fbfa085b1921abe6221eead22602f1e5fc52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
AmplabJenkins commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565334339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115279/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
SparkQA removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565307710 **[Test build #115279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115279/testReport)** for PR 26876 at commit [`8f3b3cf`](https://github.com/apache/spark/commit/8f3b3cfba624436c312462b098bdc8f18f667182).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
AmplabJenkins removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565334333 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
AmplabJenkins commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565334333 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
AmplabJenkins removed a comment on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565334339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115279/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types
SparkQA commented on issue #26876: [SPARK-30238][SQL][2.4] hive partition pruning can only support string and integral types URL: https://github.com/apache/spark/pull/26876#issuecomment-565334150 **[Test build #115279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115279/testReport)** for PR 26876 at commit [`8f3b3cf`](https://github.com/apache/spark/commit/8f3b3cfba624436c312462b098bdc8f18f667182).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
HeartSaVioR commented on a change in pull request #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#discussion_r357514816
## File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ##
@@ -1125,6 +1126,36 @@ class StreamSuite extends StreamTest {
       }
     )
   }
+
+  // ProcessingTime trigger generates MicroBatchExecution, and ContinuousTrigger starts a
+  // ContinuousExecution
+  Seq(Trigger.ProcessingTime("1 second"), Trigger.Continuous("1 second")).foreach { trigger =>
+    test(s"SPARK-30143: stop waits until timeout if blocked - trigger: $trigger") {
+      BlockOnStopSourceProvider.enableBlocking()
+      val sq = spark.readStream.format(BlockOnStopSourceProvider.getClass.getName)
Review comment: This seems to return the class name of the object `BlockOnStopSourceProvider` (which would be `BlockOnStopSourceProvider$`), not the class itself. You may need to provide the full path as a String, or rename the object.
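Editorial note: the review comment above relies on how Scala compiles a singleton `object` on the JVM: the object's runtime class carries a trailing `$`, while a class of the same name does not. A minimal sketch of the difference follows. `DemoProvider` and `ClassNameSketch` are hypothetical names invented for this illustration and are not part of the Spark test suite. Stripping the suffix with `stripSuffix("$")` is shown as a further option that the reviewer did not mention; it only helps if a class with the unsuffixed name actually exists.

```scala
// Hypothetical names for illustration only; not taken from the Spark test suite.
class DemoProvider   // the class a loader would look up by name
object DemoProvider  // companion singleton; its runtime class name ends in "$"

object ClassNameSketch {
  def main(args: Array[String]): Unit = {
    // Runtime class of the singleton object (name carries a trailing '$').
    val objectName: String = DemoProvider.getClass.getName
    // The class itself, referenced without going through the object.
    val className: String = classOf[DemoProvider].getName

    println(s"object runtime class: $objectName")
    println(s"class name:           $className")
    // One possible workaround (an assumption here, not the reviewer's suggestion).
    println(s"suffix stripped:      ${objectName.stripSuffix("$")}")
  }
}
```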
[GitHub] [spark] viirya commented on issue #26828: [SPARK-30198][Core] BytesToBytesMap does not grow internal long array as expected
viirya commented on issue #26828: [SPARK-30198][Core] BytesToBytesMap does not grow internal long array as expected URL: https://github.com/apache/spark/pull/26828#issuecomment-565332779
> Can we at least provide a manual regression test in the PR description? so that people can try and evaluate the risk.

Good suggestion. I added one manual test case in the PR description.
[GitHub] [spark] yaooqinn commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types
yaooqinn commented on issue #26699: [SPARK-30066][SQL] Columnar execution support for interval types URL: https://github.com/apache/spark/pull/26699#issuecomment-565331192 Thanks @dongjoon-hyun for bringing up jenkins.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565330141 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115276/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565330141 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115276/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565330137 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
AmplabJenkins commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565330137 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
SparkQA removed a comment on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565299410 **[Test build #115276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115276/testReport)** for PR 26875 at commit [`5250186`](https://github.com/apache/spark/commit/5250186bf921f07fd5af0681de4df2e8b8d02cdd).
[GitHub] [spark] SparkQA commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static
SparkQA commented on issue #26875: [SPARK-30245][SQL] Add cache for Like and RLike when pattern is not static URL: https://github.com/apache/spark/pull/26875#issuecomment-565330012 **[Test build #115276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115276/testReport)** for PR 26875 at commit [`5250186`](https://github.com/apache/spark/commit/5250186bf921f07fd5af0681de4df2e8b8d02cdd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] amanomer commented on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions
amanomer commented on issue #26808: [SPARK-30184][SQL] Implement a helper method for aliasing functions URL: https://github.com/apache/spark/pull/26808#issuecomment-565329505 @cloud-fan @maropu Do we need to use `expressionWithAlias` for Average and ApproximatePercentile, too? In this PR?
[GitHub] [spark] maropu commented on a change in pull request #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0
maropu commented on a change in pull request #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0 URL: https://github.com/apache/spark/pull/26811#discussion_r357510107
## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ##
@@ -863,6 +863,21 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
         |value with same element type, but it's [array, string].
       """.stripMargin.replace("\n", " ").trim()
     assert(e2.message.contains(errorMsg2))
+
+    checkAnswer(
Review comment: Since this is a bug, can you split these three tests into a separate test unit and add a test title with the jira ID(SPARK-29600)?
[GitHub] [spark] maropu commented on a change in pull request #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0
maropu commented on a change in pull request #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0 URL: https://github.com/apache/spark/pull/26811#discussion_r357510170
## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ##
@@ -863,6 +863,21 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
         |value with same element type, but it's [array, string].
       """.stripMargin.replace("\n", " ").trim()
     assert(e2.message.contains(errorMsg2))
+
+    checkAnswer(
Review comment: Also, can you update the title, too?
[GitHub] [spark] amanomer commented on issue #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0
amanomer commented on issue #26811: [SPARK-29600][SQL] array_contains built in function is not backward compatible in 3.0 URL: https://github.com/apache/spark/pull/26811#issuecomment-565328273 cc @cloud-fan @maropu
[GitHub] [spark] sarutak commented on a change in pull request #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
sarutak commented on a change in pull request #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#discussion_r357506912
## File path: repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala ##
@@ -297,4 +299,110 @@ class ReplSuite extends SparkFunSuite with BeforeAndAfterAll {
     assertContains("successful", output)
   }
+  test("SPARK-30167: Log4j configuration for REPL should override root logger properly") {
+    val testConfiguration =
+      """
+        |# Set everything to be logged to the console
+        |log4j.rootCategory=INFO, console
+        |log4j.appender.console=org.apache.log4j.ConsoleAppender
+        |log4j.appender.console.target=System.err
+        |log4j.appender.console.layout=org.apache.log4j.PatternLayout
+        |log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
+        |
+        |# Set the log level for this class to WARN same as the default setting.
+        |log4j.logger.org.apache.spark.repl.Main=ERROR
+        |""".stripMargin
+
+    val log4jprops = Files.createTempFile("log4j.properties.d", "log4j.properties")
+    Files.write(log4jprops, testConfiguration.getBytes)
+
+    val originalRootLogger = LogManager.getRootLogger
+    val originalRootAppender = originalRootLogger.getAppender("file")
+    val originalStderr = System.err
+    val originalReplThresholdLevel = Logging.sparkShellThresholdLevel
+
+    val replLoggerLogMessage = "Log level for REPL: "
+    val warnLogMessage1 = "warnLogMessage1 should not be output"
+    val errorLogMessage1 = "errorLogMessage1 should be output"
+    val infoLogMessage1 = "infoLogMessage2 should be output"
+    val infoLogMessage2 = "infoLogMessage3 should be output"
+
+    val out = try {
+      PropertyConfigurator.configure(log4jprops.toString)
Review comment: Thanks. I've replaced it with absolute path.
[GitHub] [spark] sarutak commented on a change in pull request #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
sarutak commented on a change in pull request #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#discussion_r357506575
## File path: repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala ##
@@ -297,4 +299,110 @@ class ReplSuite extends SparkFunSuite with BeforeAndAfterAll {
     assertContains("successful", output)
   }
+  test("SPARK-30167: Log4j configuration for REPL should override root logger properly") {
+    val testConfiguration =
+      """
+        |# Set everything to be logged to the console
+        |log4j.rootCategory=INFO, console
+        |log4j.appender.console=org.apache.log4j.ConsoleAppender
+        |log4j.appender.console.target=System.err
+        |log4j.appender.console.layout=org.apache.log4j.PatternLayout
+        |log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
+        |
+        |# Set the log level for this class to WARN same as the default setting.
+        |log4j.logger.org.apache.spark.repl.Main=ERROR
+        |""".stripMargin
+
+    val log4jprops = Files.createTempFile("log4j.properties.d", "log4j.properties")
+    Files.write(log4jprops, testConfiguration.getBytes)
+
+    val originalRootLogger = LogManager.getRootLogger
+    val originalRootAppender = originalRootLogger.getAppender("file")
+    val originalStderr = System.err
+    val originalReplThresholdLevel = Logging.sparkShellThresholdLevel
+
+    val replLoggerLogMessage = "Log level for REPL: "
+    val warnLogMessage1 = "warnLogMessage1 should not be output"
+    val errorLogMessage1 = "errorLogMessage1 should be output"
+    val infoLogMessage1 = "infoLogMessage2 should be output"
+    val infoLogMessage2 = "infoLogMessage3 should be output"
+
+    val out = try {
+      PropertyConfigurator.configure(log4jprops.toString)
+
+      // Re-initialization is needed to set SparkShellLoggingFilter to ConsoleAppender
+      Main.initializeForcefully(true, false)
+      runInterpreter("local",
+        s"""
+           |import java.io.{ByteArrayOutputStream, PrintStream}
+           |
+           |import org.apache.log4j.{ConsoleAppender, Level, LogManager}
+           |
+           |val replLogger = LogManager.getLogger("${Main.getClass.getName.stripSuffix("$")}")
+           |
+           |// Log level for REPL is expected to be ERROR
+           |"$replLoggerLogMessage" + replLogger.getLevel()
+           |
+           |val bout = new ByteArrayOutputStream()
+           |
+           |// Configure stderr to let log messages output to ByteArrayOutputStream.
+           |val defaultErrStream: PrintStream = System.err
+           |try {
+           |  System.setErr(new PrintStream(bout))
+           |
+           |  // Reconfigure ConsoleAppender to reflect the stderr setting.
+           |  val consoleAppender =
+           |    LogManager.getRootLogger.getAllAppenders.nextElement.asInstanceOf[ConsoleAppender]
+           |  consoleAppender.activateOptions()
+           |
+           |  // customLogger1 is not explicitly configured neither its log level nor appender
+           |  // so this inherits the settings of rootLogger
+           |  // but ConsoleAppender can use a different log level.
+           |  val customLogger1 = LogManager.getLogger("customLogger1")
+           |  customLogger1.warn("$warnLogMessage1")
+           |  customLogger1.error("$errorLogMessage1")
+           |
+           |  // customLogger2 is explicitly configured its log level as INFO
+           |  // so info level messages logged via customLogger2 should be output.
+           |  val customLogger2 = LogManager.getLogger("customLogger2")
+           |  customLogger2.setLevel(Level.INFO)
+           |  customLogger2.info("$infoLogMessage1")
+           |
+           |  // customLogger2 is explicitly configured its log level
+           |  // so its child should inherit the settings.
+           |  val customLogger3 = LogManager.getLogger("customLogger2.child")
+           |  customLogger3.info("$infoLogMessage2")
+           |
+           |  // echo log messages
+           |  bout.toString
+           |} finally {
+           |  System.setErr(defaultErrStream)
+           |}
+           |""".stripMargin)
+    } finally {
+      // Restore log4j settings for this suite
+      val log4jproperties = Thread.currentThread()
Review comment: I think that we can't use `Logging.uninitialize` because of following 2 reasons.
1. That method doesn't reload the default log4j.properties.
2. Loggers configured in the new test case are not removed. They are removed by invoking `LogManager.resetConfiguration`.
So if we replace the restoration procedure with `Logging.uninitialize`, [this assertion](https://github.com/apache/spark/pull/26798/files#diff-e796b23ac8447d31f622de5ecac88e64R401) will fail. Th
[GitHub] [spark] AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565324028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20093/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565324019 Merged build finished. Test PASSed.
[GitHub] [spark] imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
imback82 commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp' URL: https://github.com/apache/spark/pull/26741#discussion_r357506323
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala ##
@@ -130,39 +163,12 @@ private[sql] trait LookupCatalog extends Logging {
    */
   object AsTemporaryViewIdentifier {
Review comment: sure. I can do a quick follow-up PR.
[GitHub] [spark] AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565324019 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565324028 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20093/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
SparkQA commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565323576 **[Test build #115284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115284/testReport)** for PR 26798 at commit [`df78a35`](https://github.com/apache/spark/commit/df78a354a8b0cbce4650685ec2f098a81cb0626d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322869 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322872 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115274/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322869 Merged build finished. Test PASSed.
[GitHub] [spark] turboFei commented on issue #25797: [SPARK-29043][Core] Improve the concurrent performance of History Server
turboFei commented on issue #25797: [SPARK-29043][Core] Improve the concurrent performance of History Server URL: https://github.com/apache/spark/pull/25797#issuecomment-565322918 > @turboFei Hi, could you address the review comments? This is good to have and seems close to be merged (according to [#26416 (review)](https://github.com/apache/spark/pull/26416#pullrequestreview-331596655) ). Thanks, I will address it as soon as possible. Thanks for your reminder. @HeartSaVioR
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322872 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115274/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115275/ Test PASSed.
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors URL: https://github.com/apache/spark/pull/26858#discussion_r357504875
## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ##
@@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])
Review comment: `BoundedPriorityQueue` only maintains the topK entries, so it is safe to absorb a lot of entries.
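For readers unfamiliar with the idea behind the review comment above, here is a minimal sketch of a bounded top-K queue. It is not Spark's `BoundedPriorityQueue`; the class name `TopK`, its `add` and `toSortedList` methods, and the demo object are hypothetical and assume only the Scala standard library. The point it illustrates is that memory stays at K elements no matter how many entries are offered.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a bounded priority queue (NOT Spark's class):
// it retains only the maxSize largest elements under the given ordering.
final class TopK[A](maxSize: Int)(implicit ord: Ordering[A]) {
  require(maxSize > 0, "maxSize must be positive")

  // With the ordering reversed, `head` is the smallest retained element,
  // which is the one to evict when a larger element arrives.
  private val heap = mutable.PriorityQueue.empty[A](ord.reverse)

  def add(elem: A): Unit = {
    if (heap.size < maxSize) {
      heap.enqueue(elem)
    } else if (ord.gt(elem, heap.head)) {
      heap.dequeue()      // drop the current smallest
      heap.enqueue(elem)  // keep the new, larger element
    }
    // otherwise the element is not among the top K and is discarded
  }

  def size: Int = heap.size

  // Retained elements in ascending order.
  def toSortedList: List[A] = heap.toList.sorted(ord)
}

object TopKDemo {
  def main(args: Array[String]): Unit = {
    val q = new TopK[Double](3)
    (1 to 1000).foreach(i => q.add(i.toDouble))
    println(q.size)         // 3: only K entries are ever held
    println(q.toSortedList) // List(998.0, 999.0, 1000.0)
  }
}
```

To keep the K smallest values instead (for example, the nearest distances), the same sketch could be constructed with `Ordering[Double].reverse`.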
[GitHub] [spark] AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322485 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115275/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322400 **[Test build #115274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115274/testReport)** for PR 26416 at commit [`3286f9d`](https://github.com/apache/spark/commit/3286f9dfcea52378b79caf45f88cdebea4cca8d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on a change in pull request #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
maropu commented on a change in pull request #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#discussion_r357504697
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -770,6 +770,12 @@ object SQLConf {
       .intConf
       .createWithDefault(200)
+  val THRIFTSERVER_RESULT_ESCAPE_STRUCT_STRING =
+    buildConf("spark.sql.thriftserver.result.escapeStructString.enabled")
Review comment: nit: can you follow the format with the other configs?
[GitHub] [spark] SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565292777 **[Test build #115274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115274/testReport)** for PR 26416 at commit [`3286f9d`](https://github.com/apache/spark/commit/3286f9dfcea52378b79caf45f88cdebea4cca8d5).
[GitHub] [spark] AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
AmplabJenkins commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565322485 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
SparkQA removed a comment on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565295328 **[Test build #115275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115275/testReport)** for PR 26416 at commit [`ab5d233`](https://github.com/apache/spark/commit/ab5d2332ce2490e2f3dcc6d5211dd708e02e35d9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565322032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20092/ Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
cloud-fan commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r357504518

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala ##
@@ -130,39 +163,12 @@ private[sql] trait LookupCatalog extends Logging {
    */
   object AsTemporaryViewIdentifier {

Review comment: if it's only used in test, we can remove it.
[GitHub] [spark] AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565322032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20092/ Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
maropu commented on a change in pull request #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
URL: https://github.com/apache/spark/pull/26270#discussion_r357504464

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##
@@ -770,6 +770,12 @@ object SQLConf {
       .intConf
       .createWithDefault(200)

+  val THRIFTSERVER_RESULT_ESCAPE_STRUCT_STRING =
+    buildConf("spark.sql.thriftserver.result.escapeStructString.enabled")
+      .doc("When true, escape string when needed to ensure the result returned is a valid json.")
+      .booleanConf
+      .createWithDefault(false)

Review comment: Is this a bug or a Hive compatibility issue? If so, could we set this to true by default and then update the migration guide?
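To make the doc string of the proposed config concrete, here is a minimal sketch of what "escape string when needed" could mean for a struct rendered as JSON. Everything in it is an assumption for illustration: `EscapeStructStringSketch`, `escapeJson`, and `renderStruct` are hypothetical names, all field values are treated as strings, and this is not the implementation in the PR.

```scala
object EscapeStructStringSketch {
  // Hypothetical helper: escape the characters that would break a JSON string literal.
  def escapeJson(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'  => sb.append('\\').append('"')
      case '\\' => sb.append('\\').append('\\')
      case '\n' => sb.append('\\').append('n')
      case '\r' => sb.append('\\').append('r')
      case '\t' => sb.append('\\').append('t')
      // Remaining control characters become four-digit hex escapes.
      case c if c < ' ' => sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.toString
  }

  // Hypothetical rendering of a struct whose fields are all strings (an assumption).
  def renderStruct(fields: Seq[(String, String)]): String =
    fields
      .map { case (k, v) => "\"" + escapeJson(k) + "\":\"" + escapeJson(v) + "\"" }
      .mkString("{", ",", "}")
}
```

Without the escaping step, a value containing a double quote would end the string literal early and the rendered struct would no longer parse as JSON.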
[GitHub] [spark] AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins removed a comment on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565322027 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly.
AmplabJenkins commented on issue #26798: [SPARK-30167][REPL] Log4j configuration for REPL can't override the root logger properly. URL: https://github.com/apache/spark/pull/26798#issuecomment-565322027 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup
SparkQA commented on issue #26416: [SPARK-29779][CORE] Compact old event log files and cleanup URL: https://github.com/apache/spark/pull/26416#issuecomment-565321984 **[Test build #115275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115275/testReport)** for PR 26416 at commit [`ab5d233`](https://github.com/apache/spark/commit/ab5d2332ce2490e2f3dcc6d5211dd708e02e35d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] fuwhu commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
fuwhu commented on a change in pull request #26741: [SPARK-30104][SQL] Fix catalog resolution for 'global_temp'
URL: https://github.com/apache/spark/pull/26741#discussion_r357503236

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala ##
@@ -130,39 +163,12 @@ private[sql] trait LookupCatalog extends Logging {
    */
   object AsTemporaryViewIdentifier {

Review comment: It seems this object is only used in LookupCatalogSuite. If it is only for tests, shall we move it to LookupCatalogSuite.scala?
[GitHub] [spark] AmplabJenkins commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
AmplabJenkins commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565320026 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
AmplabJenkins commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565320034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115271/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
AmplabJenkins removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565320026 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
AmplabJenkins removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565320034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115271/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
SparkQA commented on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565319563 **[Test build #115271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115271/testReport)** for PR 26270 at commit [`4f057c7`](https://github.com/apache/spark/commit/4f057c76b63f59df7f923bfaf9f3e3511bef9d09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive
SparkQA removed a comment on issue #26270: [SPARK-26544][SQL] Escape struct string in spark thriftserver to keep alignment with hive URL: https://github.com/apache/spark/pull/26270#issuecomment-565275281 **[Test build #115271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115271/testReport)** for PR 26270 at commit [`4f057c7`](https://github.com/apache/spark/commit/4f057c76b63f59df7f923bfaf9f3e3511bef9d09).
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
URL: https://github.com/apache/spark/pull/26858#discussion_r357502027

## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ##
@@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])

Review comment: This only depends on `numNearestNeighbors`, when it is small (maybe < 1?). On each partition, collect the minimum 10 values, then merge them with `treeAggregate` to get the global minimum 10 values; the max value among them is the threshold.
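The removed lines in the hunk above pick the quantile as p = M / N + err, so that (p - err) * N >= M still holds when the approximate quantile is off by up to err. A small worked sketch of that arithmetic follows; the object and method names are hypothetical, and only the formula and the 0.05 error come from the quoted diff.

```scala
object ApproxQuantileSketch {
  // Relative error used in the removed lines of the diff.
  val relativeError = 0.05

  // p = M / N + err, with M = numNearestNeighbors and N = count.
  def approxQuantile(numNearestNeighbors: Int, count: Long): Double =
    numNearestNeighbors.toDouble / count + relativeError

  // When p >= 1 the quantile would cover the whole dataset, so no filtering is possible.
  def coversWholeDataset(numNearestNeighbors: Int, count: Long): Boolean =
    approxQuantile(numNearestNeighbors, count) >= 1
}
```

For example, with M = 10 and N = 1000 the quantile is about 0.06, so even in the worst case (p - err) * N = 10 rows fall under the threshold. With M = 1000 and N = 1000 the quantile is 1.05 and the old code returned the dataset unfiltered.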
[GitHub] [spark] AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565318994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115278/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565318987 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
SparkQA removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565304845 **[Test build #115278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115278/testReport)** for PR 26771 at commit [`7537033`](https://github.com/apache/spark/commit/753703353a8cea2c8a20c799fd70cd17c7f002ed).
[GitHub] [spark] AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565318987 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565318994 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115278/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
SparkQA commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565318948 **[Test build #115278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115278/testReport)** for PR 26771 at commit [`7537033`](https://github.com/apache/spark/commit/753703353a8cea2c8a20c799fd70cd17c7f002ed). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier
AmplabJenkins removed a comment on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565318308 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20091/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier
AmplabJenkins removed a comment on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565318298 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier
AmplabJenkins commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565318308 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20091/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier
AmplabJenkins commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565318298 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier
SparkQA commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE behavior when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565317913 **[Test build #115283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115283/testReport)** for PR 26878 at commit [`4d2bc2e`](https://github.com/apache/spark/commit/4d2bc2ec9fd8130279ea97e1622406f5cd82da79).
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
URL: https://github.com/apache/spark/pull/26858#discussion_r357500230

## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ##
@@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]]
       // Limit the use of hashDist since it's controversial
       val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType)
       val hashDistCol = hashDistUDF(col($(outputCol)))
-
-      // Compute threshold to get around k elements.
-      // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M
-      // so we pick quantile p = M / N + err
-      // M: the number of nearest neighbors; N: the number of elements in dataset
-      val relativeError = 0.05
-      val approxQuantile = numNearestNeighbors.toDouble / count + relativeError
       val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol)
-      if (approxQuantile >= 1) {
-        modelDatasetWithDist
+      // for a small dataset, use BoundedPriorityQueue
+      if (count < 1000) {
+        val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double])

Review comment: This should look like:
```scala
val exactThreshold = modelDatasetWithDist
  .select(distCol)
  .as[Double]
  .rdd
  .treeAggregate(new BoundedPriorityQueue[Double](numNearestNeighbors)(Ordering[Double].reverse))(
    seqOp = (q, v) => q += v,
    combOp = (q1, q2) => q1 ++= q2,
    depth = 2
  ).toArray.max
```
This implementation should not depend on the size of the dataset.
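The suggestion above can be tried without a cluster. The sketch below is plain Scala under stated assumptions: `SmallestK` is a hypothetical stand-in for Spark's `BoundedPriorityQueue` built on `scala.collection.mutable.PriorityQueue`, and the partitions are simulated as nested sequences, so the fold and reduce only mirror the roles of `seqOp` and `combOp` in `treeAggregate`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a bounded queue: retains the k smallest values seen so far.
final class SmallestK(k: Int) {
  require(k > 0, "k must be positive")

  // Max-heap, so the head is the largest of the retained values.
  private val heap = mutable.PriorityQueue.empty[Double](Ordering[Double])

  def add(v: Double): SmallestK = {
    if (heap.size < k) heap.enqueue(v)
    else if (v < heap.head) { heap.dequeue(); heap.enqueue(v) }
    this
  }

  def merge(other: SmallestK): SmallestK = {
    other.values.foreach(add)
    this
  }

  def values: Seq[Double] = heap.toSeq

  // The largest retained value is the distance threshold; throws if nothing was added.
  def threshold: Double = heap.head
}

object ExactThresholdSketch {
  // partitions must be non-empty; each inner Seq plays the role of one RDD partition.
  def exactThreshold(partitions: Seq[Seq[Double]], k: Int): Double =
    partitions
      .map(p => p.foldLeft(new SmallestK(k))((q, v) => q.add(v))) // per-partition step (seqOp)
      .reduce((q1, q2) => q1.merge(q2))                           // merge step (combOp)
      .threshold
}
```

Each queue holds at most k values, so the memory per partition and the data exchanged during the merge depend on k only, not on the dataset size.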
[GitHub] [spark] imback82 commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE when session catalog name is provided in the identifier
imback82 commented on issue #26878: [SPARK-30248][SQL] Fix DROP TABLE when session catalog name is provided in the identifier URL: https://github.com/apache/spark/pull/26878#issuecomment-565316954 cc: @cloud-fan
[GitHub] [spark] AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565316414 Merged build finished. Test PASSed.
[GitHub] [spark] imback82 opened a new pull request #26878: [SPARK-30248][SQL] Fix DROP TABLE when session catalog name is provided in the identifier
imback82 opened a new pull request #26878: [SPARK-30248][SQL] Fix DROP TABLE when session catalog name is provided in the identifier
URL: https://github.com/apache/spark/pull/26878

### What changes were proposed in this pull request?
If a table name is qualified with session catalog name `spark_catalog`, the `DROP TABLE` command fails. For example, the following
```
sql("CREATE TABLE tbl USING json AS SELECT 1 AS i")
sql("DROP TABLE spark_catalog.tbl")
```
fails with:
```
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'spark_catalog' not found;
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:42)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists$(ExternalCatalog.scala:40)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:336)
```
This PR correctly resolves `spark_catalog` as a catalog.

### Why are the changes needed?
It's fixing a bug.

### Does this PR introduce any user-facing change?
Yes, now, the `spark_catalog.tbl` in the above example is dropped as expected.

### How was this patch tested?
Added a test.
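The failure above comes from the first name part being read as a database rather than as a catalog. As a rough illustration of the intended behavior, the sketch below strips a leading session catalog name from a multi-part identifier before the rest is looked up. `SessionCatalogNameSketch` and `stripSessionCatalog` are hypothetical names, and exact, case-sensitive matching is an assumption; this is not the resolver code changed by the PR.

```scala
object SessionCatalogNameSketch {
  // Session catalog name as given in the PR description.
  val SessionCatalogName = "spark_catalog"

  // Hypothetical helper: drop the catalog part when it names the session catalog,
  // so that Seq("spark_catalog", "tbl") is looked up as the table "tbl".
  def stripSessionCatalog(nameParts: Seq[String]): Seq[String] = nameParts match {
    case head +: rest if rest.nonEmpty && head == SessionCatalogName => rest
    case other => other
  }
}
```

A single-part name is left alone, so a table that happens to be called `spark_catalog` still resolves as a table in this sketch.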
[GitHub] [spark] AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins removed a comment on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565316420 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20090/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565316414 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
AmplabJenkins commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565316420 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20090/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
SparkQA commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565316066 **[Test build #115282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115282/testReport)** for PR 26771 at commit [`7537033`](https://github.com/apache/spark/commit/753703353a8cea2c8a20c799fd70cd17c7f002ed).
[GitHub] [spark] brkyvz commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query
brkyvz commented on issue #26771: [SPARK-30143][SS] Add a timeout on stopping a streaming query URL: https://github.com/apache/spark/pull/26771#issuecomment-565315374 retest this please
[GitHub] [spark] wangyum commented on issue #26848: [SPARK-30216][INFRA] Use python3 in Docker release image
wangyum commented on issue #26848: [SPARK-30216][INFRA] Use python3 in Docker release image URL: https://github.com/apache/spark/pull/26848#issuecomment-565314894 I tested it on the master branch. Do we need to test on branch-2.4?