[GitHub] [spark] SparkQA removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
SparkQA removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347523 **[Test build #111673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111673/testReport)** for PR 25993 at commit [`16209e4`](https://github.com/apache/spark/commit/16209e42817854389b8d1b1a42214521062564d4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347956 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347960 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111673/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347956 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347882 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16668/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347882 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347950 **[Test build #111673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111673/testReport)** for PR 25993 at commit [`16209e4`](https://github.com/apache/spark/commit/16209e42817854389b8d1b1a42214521062564d4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16668/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537347523 **[Test build #111673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111673/testReport)** for PR 25993 at commit [`16209e4`](https://github.com/apache/spark/commit/16209e42817854389b8d1b1a42214521062564d4).
[GitHub] [spark] SparkQA commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
SparkQA commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT) URL: https://github.com/apache/spark/pull/16478#issuecomment-537345896 **[Test build #111672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111672/testReport)** for PR 16478 at commit [`c138227`](https://github.com/apache/spark/commit/c13822730e66b05095e524abbf8fd4b0cea0a542).
[GitHub] [spark] AmplabJenkins removed a comment on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
AmplabJenkins removed a comment on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT) URL: https://github.com/apache/spark/pull/16478#issuecomment-537344688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16667/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
AmplabJenkins removed a comment on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT) URL: https://github.com/apache/spark/pull/16478#issuecomment-537344682 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
AmplabJenkins commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT) URL: https://github.com/apache/spark/pull/16478#issuecomment-537344688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16667/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
AmplabJenkins commented on issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT) URL: https://github.com/apache/spark/pull/16478#issuecomment-537344682 Merged build finished. Test PASSed.
[GitHub] [spark] ueshin commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
ueshin commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
URL: https://github.com/apache/spark/pull/25666#discussion_r330375481
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
@@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {
Review comment:
I think this is good enough to go. How about merging this for now and addressing it in a separate PR? `transform` does it the same way, so if a change is needed we should apply it to both, maybe at the same time.
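For readers following the hunk above, here is a minimal plain-Scala sketch of a filter loop that passes the index to the predicate only when the caller asked for it. It is not Spark's `ArrayFilter`: the names `IndexedFilterSketch`, `Pred`, `ElementOnly`, and `WithIndex` are hypothetical, and a `Vector` stands in for Spark's internal array data.

```scala
object IndexedFilterSketch {
  // Hypothetical predicate wrapper: either element-only or (element, index).
  sealed trait Pred[A]
  final case class ElementOnly[A](f: A => Boolean) extends Pred[A]
  final case class WithIndex[A](f: (A, Int) => Boolean) extends Pred[A]

  // Walk the input with an explicit counter, as the reviewed loop does,
  // and hand the counter to the predicate only in the WithIndex case.
  def filter[A](arr: Vector[A], pred: Pred[A]): Vector[A] = {
    val kept = Vector.newBuilder[A]
    var i = 0
    while (i < arr.length) {
      val keep = pred match {
        case ElementOnly(f) => f(arr(i))
        case WithIndex(f)   => f(arr(i), i)
      }
      if (keep) kept += arr(i)
      i += 1
    }
    kept.result()
  }
}
```

For example, `IndexedFilterSketch.filter(Vector(10, 20, 30, 40), IndexedFilterSketch.WithIndex[Int]((_, i) => i % 2 == 0))` keeps the elements at even positions.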
[GitHub] [spark] AmplabJenkins removed a comment on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
AmplabJenkins removed a comment on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command URL: https://github.com/apache/spark/pull/24903#issuecomment-537338604 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
AmplabJenkins removed a comment on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command URL: https://github.com/apache/spark/pull/24903#issuecomment-537338610 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/1/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
AmplabJenkins commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command URL: https://github.com/apache/spark/pull/24903#issuecomment-537338610 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/1/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
AmplabJenkins commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command URL: https://github.com/apache/spark/pull/24903#issuecomment-537338604 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
SparkQA commented on issue #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command URL: https://github.com/apache/spark/pull/24903#issuecomment-537338309 **[Test build #111671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111671/testReport)** for PR 24903 at commit [`ca4b85c`](https://github.com/apache/spark/commit/ca4b85c60b88e23ec82611b5915e73bc2edb760a).
[GitHub] [spark] sujith71955 commented on a change in pull request #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
sujith71955 commented on a change in pull request #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
URL: https://github.com/apache/spark/pull/24903#discussion_r330373508
## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2012,6 +2012,26 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+  test("SPARK-28084 check for case insensitive property of partition column name in load command") {
+    withTempDir { dir =>
+      val path = dir.toURI.toString.stripSuffix("/")
+      val dirPath = dir.getAbsoluteFile
+      Files.append("1", new File(dirPath, "part-r-11"), StandardCharsets.UTF_8)
+      withTable("part_table") {
+        withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+          sql(
+            """
+              |CREATE TABLE part_table (c STRING)
+              |PARTITIONED BY (d STRING)
+            """.stripMargin)
+          sql("LOAD DATA LOCAL INPATH '$path/part-r-11' " +
Review comment:
Yeah, I had not noticed that interpolation is happening in this statement (: Corrected it now.
[GitHub] [spark] sujith71955 commented on a change in pull request #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
sujith71955 commented on a change in pull request #24903: [SPARK-28084][SQL] Resolving the partition column name based on the resolver in sql load command
URL: https://github.com/apache/spark/pull/24903#discussion_r330373167
## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -2012,6 +2012,26 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+  test("SPARK-28084 check for case insensitive property of partition column name in load command") {
+    withTempDir { dir =>
+      val path = dir.toURI.toString.stripSuffix("/")
+      val dirPath = dir.getAbsoluteFile
+      Files.append("1", new File(dirPath, s"part-r-11"), StandardCharsets.UTF_8)
+      withTable("part_table") {
+        withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
+          sql(
+            """
+              |CREATE TABLE part_table (c STRING)
+              |PARTITIONED BY (d STRING)
+            """.stripMargin)
+          sql(s"LOAD DATA LOCAL INPATH '$path/part-r-11' " +
+            s"INTO TABLE part_table PARTITION(D ='1')")
Review comment:
Yeah, I had not noticed that interpolation is happening in this statement (: Corrected it now.
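The interpolation point raised in these two review threads can be sketched in plain Scala. This is only an illustration, not code from the PR: `InterpolationSketch` is a hypothetical object and the `path` value is made up to stand in for the temp-directory URI built in the test.

```scala
object InterpolationSketch {
  // Hypothetical stand-in for the temp-directory URI used in the test.
  val path = "file:/tmp/example"

  // No `s` prefix: the characters `$path` stay in the string literally.
  val plain = "LOAD DATA LOCAL INPATH '$path/part-r-11' "

  // With the `s` prefix: `$path` is replaced by the value of `path`.
  val interpolated = s"LOAD DATA LOCAL INPATH '$path/part-r-11' "

  // No `$` placeholder in the literal, so the `s` prefix would add nothing here.
  val tail = "INTO TABLE part_table PARTITION(D ='1')"
}
```

The `s` prefix is only needed on literals that contain a `$` placeholder; on literals such as `"part-r-11"` it has no effect.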
[GitHub] [spark] AmplabJenkins removed a comment on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable
AmplabJenkins removed a comment on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable URL: https://github.com/apache/spark/pull/25995#issuecomment-537336729 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable
AmplabJenkins commented on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable URL: https://github.com/apache/spark/pull/25995#issuecomment-537336969 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable
AmplabJenkins commented on issue #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable URL: https://github.com/apache/spark/pull/25995#issuecomment-537336729 Can one of the admins verify this patch?
[GitHub] [spark] karuppayya opened a new pull request #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable
karuppayya opened a new pull request #25995: SPARK-29324: Fix overwrite behaviour for saveAsTable
URL: https://github.com/apache/spark/pull/25995
### What changes were proposed in this pull request?
When `saveAsTable` is used in overwrite mode, the metadata of the existing table gets overwritten. This PR adds changes to retain the table's metadata after an overwrite of an existing table.
### Why are the changes needed?
Without this change, the table's metadata gets overwritten.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Added unit tests.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332388 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16665/ Test PASSed.
[GitHub] [spark] imback82 commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
imback82 commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332401 I just resolved the conflicts. Thanks @cloud-fan!
[GitHub] [spark] AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332385 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332385 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332388 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16665/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-536642091 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111609/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
SparkQA commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537332117 **[Test build #111670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111670/testReport)** for PR 25771 at commit [`88872ea`](https://github.com/apache/spark/commit/88872ea5c04a21a524f77505542c5ebc5f206da6).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537331195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111663/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537331195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111663/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins commented on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537331186 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files
AmplabJenkins removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537331186 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537330979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16664/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537330973 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins removed a comment on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537330979 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16664/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
AmplabJenkins commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537330973 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files
SparkQA removed a comment on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537305903 **[Test build #111663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111663/testReport)** for PR 25670 at commit [`7156b24`](https://github.com/apache/spark/commit/7156b247ad778361926c42ad2d6bd135cc42538a).
[GitHub] [spark] SparkQA commented on issue #25670: [SPARK-28869][CORE] Roll over event log files
SparkQA commented on issue #25670: [SPARK-28869][CORE] Roll over event log files URL: https://github.com/apache/spark/pull/25670#issuecomment-537330819 **[Test build #111663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111663/testReport)** for PR 25670 at commit [`7156b24`](https://github.com/apache/spark/commit/7156b247ad778361926c42ad2d6bd135cc42538a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
AmplabJenkins commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994#issuecomment-537329652 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
AmplabJenkins removed a comment on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994#issuecomment-537329412 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
AmplabJenkins commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994#issuecomment-537329412 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537328712 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537328716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111662/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537328716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111662/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
AmplabJenkins removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537328712 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA removed a comment on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537301414 **[Test build #111662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111662/testReport)** for PR 24898 at commit [`aae2f81`](https://github.com/apache/spark/commit/aae2f81fc69cc2e257bc025d4acec79e02b988a2).
[GitHub] [spark] liucht-inspur opened a new pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
liucht-inspur opened a new pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page URL: https://github.com/apache/spark/pull/25994

### What changes were proposed in this pull request?
This PR adds tooltips to the Executors tab column names that lack them on the history server page: RDD Blocks, Disk Used, Cores, Active Tasks, Failed Tasks, Complete Tasks, and Total Tasks.
![image](https://user-images.githubusercontent.com/28332082/66017759-b6c24a80-e50e-11e9-807b-5b076f701d2f.png)
I have modified the following code in executorspage-template.html.
Before: RDD Blocks Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks
After: RDD Blocks Disk Used Cores Active Tasks Failed Tasks Complete Tasks Total Tasks

### Why are the changes needed?
On the Executors tab of the history server page, the header row of the Summary section is formatted inconsistently. Some column names have tooltips, such as Storage Memory, Task Time(GC Time), Input, Shuffle Read, Shuffle Write, and Blacklisted, but others do not: RDD Blocks, Disk Used, Cores, Active Tasks, Failed Tasks, Complete Tasks, and Total Tasks. Oddly, in the Executors section below, all column names, including the ones listed above, have tooltips. It's important for open source projects to have a consistent style and a user-friendly UI, and I'm working on making this page more consistent and more user-friendly.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Manual tests in Chrome, Firefox, and Safari.

Authored-by: liucht-inspur
Signed-off-by: liucht-inspur
[GitHub] [spark] SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
SparkQA commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's URL: https://github.com/apache/spark/pull/24898#issuecomment-537328365 **[Test build #111662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111662/testReport)** for PR 24898 at commit [`aae2f81`](https://github.com/apache/spark/commit/aae2f81fc69cc2e257bc025d4acec79e02b988a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] ueshin commented on a change in pull request #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
ueshin commented on a change in pull request #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#discussion_r330365006
## File path: sql/core/src/test/java/test/org/apache/spark/sql/JavaHigherOrderFunctionsSuite.java
@@ -0,0 +1,221 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql;
+
+import java.util.HashMap;
+import java.util.List;
+
+import scala.collection.Seq;
+import static scala.collection.JavaConverters.mapAsScalaMap;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.types.*;
+import static org.apache.spark.sql.types.DataTypes.*;
+import static org.apache.spark.sql.functions.*;
+import org.apache.spark.sql.test.TestSparkSession;
+import static test.org.apache.spark.sql.JavaTestUtils.*;
+import test.org.apache.spark.sql.JavaTestUtils;
+
+public class JavaHigherOrderFunctionsSuite {
+    private transient TestSparkSession spark;
+    private Dataset<Row> arrDf;
+    private Dataset<Row> mapDf;
+
+    private void setUpArrDf() {
+        List<Row> data = toRows(
+            makeArray(1, 9, 8, 7),
+            makeArray(5, 8, 9, 7, 2),
+            JavaTestUtils.<Integer>makeArray(),
+            null
+        );
+        StructType schema = new StructType()
+            .add("x", new ArrayType(IntegerType, true), true);
+        arrDf = spark.createDataFrame(data, schema);
+    }
+
+    private void setUpMapDf() {
+        List<Row> data = toRows(
+            new HashMap<Integer, Integer>() {{
+                put(1, 1);
+                put(2, 2);
+            }},
+            null
+        );
+        StructType schema = new StructType()
+            .add("x", new MapType(IntegerType, IntegerType, true));
+        mapDf = spark.createDataFrame(data, schema);
+    }
+
+    @Before
+    public void setUp() {
+        spark = new TestSparkSession();
+        setUpArrDf();
+        setUpMapDf();
+    }
+
+    @After
+    public void tearDown() {
+        spark.stop();
+        spark = null;
+    }
+
+    @Test
+    public void testTransform() {
+        checkAnswer(
+            arrDf.select(transform(col("x"), x -> x.plus(1))),
+            toRows(
+                makeArray(2, 10, 9, 8),
+                makeArray(6, 9, 10, 8, 3),
+                JavaTestUtils.<Integer>makeArray(),
+                null
+            ));
+        checkAnswer(
+            arrDf.select(transform(col("x"), (x, i) -> x.plus(i))),
+            toRows(
+                makeArray(1, 10, 10, 10),
+                makeArray(5, 9, 11, 10, 6),
+                JavaTestUtils.<Integer>makeArray(),
+                null
+            ));
+    }
+
+    @Test
+    public void testFilter() {
+        checkAnswer(
+            arrDf.select(filter(col("x"), x -> x.plus(1).equalTo(10))),
+            toRows(
+                makeArray(9),
+                makeArray(9),
+                JavaTestUtils.<Integer>makeArray(),
+                null
+            ));
+    }
+
+    @Test
+    public void testExists() {
+        checkAnswer(
+            arrDf.select(exists(col("x"), x -> x.plus(1).equalTo(10))),
+            toRows(
+                true,
+                true,
+                false,
+                null
+            ));
+    }
+
+    @Test
+    public void testForall() {
+        checkAnswer(
+            arrDf.select(forall(col("x"), x -> x.plus(1).equalTo(10))),
+            toRows(
+                false,
+                false,
+                true,
+                null
+            ));
+    }
+
+    @Test
+    public void testAggregate() {
+        checkAnswer(
+            arrDf.select(aggregate(col("x"), lit(0), (acc, x) -> acc.plus(x))),
+            toRows(
+                25,
+                31,
+                0,
+                null
+            ));
+        checkAnswer(
+            arrDf.select(aggregate(col("x"), lit(0), (acc, x) -> acc.plus(x), x -> x)),
+            toRows(
+                25,
+                31,
+                0,
+                null
+            ));
+    }
+
+    @Test
+    public void testZipWith() {
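The expected rows in the quoted suite can be checked by hand with a plain-Java sketch of the same operations. This is only an illustration under stated assumptions: it uses `java.util` collections instead of Spark `Column` expressions, the class name `HigherOrderSketch` and its method names are hypothetical, and SQL null handling (the `null` row in the test data) is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Predicate;

// Hypothetical helper class: plain-Java stand-ins for the array functions
// exercised by the quoted test suite. Not part of Spark.
public class HigherOrderSketch {
  // transform with an index: applies f(element, index) to every element.
  public static List<Integer> transformWithIndex(List<Integer> xs, BinaryOperator<Integer> f) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < xs.size(); i++) {
      out.add(f.apply(xs.get(i), i));
    }
    return out;
  }

  // exists: true if at least one element satisfies the predicate.
  public static boolean exists(List<Integer> xs, Predicate<Integer> p) {
    return xs.stream().anyMatch(p);
  }

  // forall: true if every element satisfies the predicate (true for an empty list).
  public static boolean forall(List<Integer> xs, Predicate<Integer> p) {
    return xs.stream().allMatch(p);
  }

  // aggregate: left fold starting from the given zero value.
  public static int aggregate(List<Integer> xs, int zero, BinaryOperator<Integer> merge) {
    int acc = zero;
    for (int x : xs) {
      acc = merge.apply(acc, x);
    }
    return acc;
  }

  public static void main(String[] args) {
    List<Integer> first = Arrays.asList(1, 9, 8, 7);
    List<Integer> second = Arrays.asList(5, 8, 9, 7, 2);
    List<Integer> empty = Collections.emptyList();

    System.out.println(transformWithIndex(first, (x, i) -> x + i));  // [1, 10, 10, 10]
    System.out.println(transformWithIndex(second, (x, i) -> x + i)); // [5, 9, 11, 10, 6]
    System.out.println(exists(first, x -> x + 1 == 10));             // true
    System.out.println(forall(empty, x -> x + 1 == 10));             // true
    System.out.println(aggregate(first, 0, (acc, x) -> acc + x));    // 25
    System.out.println(aggregate(second, 0, (acc, x) -> acc + x));   // 31
  }
}
```

The printed values match the non-null rows expected by `testTransform`, `testExists`, `testForall`, and `testAggregate` above.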
[GitHub] [spark] dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537325767 Thank you, @MaxGekk, @cloud-fan, @srowen.
[GitHub] [spark] dongjoon-hyun closed pull request #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun closed pull request #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988
[GitHub] [spark] dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537325531 Thanks. Then, I'll merge this. This will unblock the benchmarks.
[GitHub] [spark] dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537325559 Merged to master.
[GitHub] [spark] dongjoon-hyun closed pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
dongjoon-hyun closed pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) URL: https://github.com/apache/spark/pull/25992
[GitHub] [spark] dongjoon-hyun commented on issue #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
dongjoon-hyun commented on issue #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) URL: https://github.com/apache/spark/pull/25992#issuecomment-537324686 After resolving the conflicts, I'll regenerate the results and reopen this.
[GitHub] [spark] cloud-fan commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
cloud-fan commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537324573 We are going to support all save modes in `DataFrameWriter`. For now, this change is OK with me, since it unblocks the benchmark changes.
[GitHub] [spark] maropu commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
maropu commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1) URL: https://github.com/apache/spark/pull/25992#discussion_r330362889
## File path: sql/core/benchmarks/CSVBenchmark-results.txt
@@ -2,58 +2,3 @@
 Benchmark to measure CSV read/write performance
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Parsing quoted values:                      Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------
-One quoted string                                   36998         37134        120        0.0     739953.1      1.0X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Wide rows with 1000 columns:                Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------
-Select 1000 columns                                140620        141162        737        0.0     140620.5      1.0X
-Select 100 columns                                  35170         35287        183        0.0      35170.0      4.0X
-Select one column                                   27711         27927        187        0.0      27710.9      5.1X
-count()                                              7707          7804         84        0.1       7707.4     18.2X
-Select 100 columns, one bad input field             41762         41851        117        0.0      41761.8      3.4X
-Select 100 columns, corrupt record field            48717         48761         44        0.0      48717.4      2.9X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Count a dataset with 10 columns:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------
-Select 10 columns + count()                         16001         16053         53        0.6       1600.1      1.0X
-Select 1 column + count()                           11571         11614         58        0.9       1157.1      1.4X
-count()                                              4752          4766         18        2.1        475.2      3.4X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Write dates and timestamps:                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                       1070          1072          2        9.3        107.0      1.0X
-to_csv(timestamp)                                   10446         10746        344        1.0       1044.6      0.1X
-write timestamps to files                            9573          9659        101        1.0        957.3      0.1X
-Create a dataset of dates                            1245          1260         17        8.0        124.5      0.9X
-to_csv(date)                                         7157          7167         11        1.4        715.7      0.1X
-write dates to files                                 5415          5450         57        1.8        541.5      0.2X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Read dates and timestamps:                  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                       1880          1887          8        5.3        188.0      1.0X
-read timestamps from files                          27135         27180         43        0.4       2713.5      0.1X
-infer timestamps from files                         51426         51534         97        0.2       5142.6      0.0X
-read date text from files                            1618          1622          4        6.2        161.8      1.2X
-read date from files                                20207         20218         13        0.5       2020.7      0.1X
-infer date from files                               19418         19479         94        0.5       1941.8      0.1X
-timestamp strings
[GitHub] [spark] maropu commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
maropu commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions URL: https://github.com/apache/spark/pull/25666#discussion_r330362826
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
@@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {
Review comment: Yes, if there is no big difference, I'd prefer to handle this the same way as the other expressions, e.g., https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L555
[GitHub] [spark] henrydavidge commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
henrydavidge commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions URL: https://github.com/apache/spark/pull/25666#discussion_r330362364
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala
@@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {
Review comment: OK, I tried that as well. It doesn't seem to be significantly different from the others.
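The loop under review sets the element variable on every iteration and sets the index variable only when the lambda declares one. A minimal plain-Java sketch of that shape follows. Everything in it is an assumption made for illustration: `Slot` is a hypothetical stand-in for a mutable lambda variable, `java.util.Optional` stands in for Scala's `Option`, and the part of the loop after the `if` (evaluating the lambda body and keeping matching elements) is not shown in the quoted diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a filter loop with an optional index variable.
// It mirrors only the shape of the quoted diff, not Spark's actual classes.
public class IndexedFilterSketch {
  // Mutable holder standing in for a lambda variable (assumption).
  public static final class Slot<T> {
    public T value;
  }

  // Keeps the elements for which `body` evaluates to true after the slots are set.
  public static <T> List<T> filter(
      List<T> arr,
      Slot<T> elementVar,
      Optional<Slot<Integer>> indexVar,
      BooleanSupplier body) {
    List<T> kept = new ArrayList<>();
    for (int i = 0; i < arr.size(); i++) {
      elementVar.value = arr.get(i);
      // Only pay for the index assignment when the lambda takes an index.
      if (indexVar.isPresent()) {
        indexVar.get().value = i;
      }
      if (body.getAsBoolean()) {
        kept.add(arr.get(i));
      }
    }
    return kept;
  }

  // Two-argument lambda: keep elements at even positions.
  public static List<Integer> evenPositions(List<Integer> arr) {
    Slot<Integer> x = new Slot<>();
    Slot<Integer> i = new Slot<>();
    return filter(arr, x, Optional.of(i), () -> i.value % 2 == 0);
  }

  // One-argument lambda: no index variable is defined.
  public static List<Integer> greaterThanSeven(List<Integer> arr) {
    Slot<Integer> x = new Slot<>();
    return filter(arr, x, Optional.empty(), () -> x.value > 7);
  }

  public static void main(String[] args) {
    List<Integer> data = Arrays.asList(5, 8, 9, 7, 2);
    System.out.println(evenPositions(data));    // [5, 9, 2]
    System.out.println(greaterThanSeven(data)); // [8, 9]
  }
}
```

The per-iteration `isPresent()` check is the part the reviewers discuss; per the comments above, benchmarking it against the alternative handling showed no significant difference.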
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
dongjoon-hyun commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
URL: https://github.com/apache/spark/pull/25992#discussion_r330361985

## File path: sql/core/benchmarks/CSVBenchmark-results.txt ##
@@ -2,58 +2,3 @@
 Benchmark to measure CSV read/write performance
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Parsing quoted values:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-One quoted string                                   36998          37134         120         0.0      739953.1       1.0X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Wide rows with 1000 columns:                Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Select 1000 columns                                140620         141162         737         0.0      140620.5       1.0X
-Select 100 columns                                  35170          35287         183         0.0       35170.0       4.0X
-Select one column                                   27711          27927         187         0.0       27710.9       5.1X
-count()                                              7707           7804          84         0.1        7707.4      18.2X
-Select 100 columns, one bad input field             41762          41851         117         0.0       41761.8       3.4X
-Select 100 columns, corrupt record field            48717          48761          44         0.0       48717.4       2.9X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Count a dataset with 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Select 10 columns + count()                         16001          16053          53         0.6        1600.1       1.0X
-Select 1 column + count()                           11571          11614          58         0.9        1157.1       1.4X
-count()                                              4752           4766          18         2.1         475.2       3.4X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Write dates and timestamps:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                       1070           1072           2         9.3         107.0       1.0X
-to_csv(timestamp)                                   10446          10746         344         1.0        1044.6       0.1X
-write timestamps to files                            9573           9659         101         1.0         957.3       0.1X
-Create a dataset of dates                            1245           1260          17         8.0         124.5       0.9X
-to_csv(date)                                         7157           7167          11         1.4         715.7       0.1X
-write dates to files                                 5415           5450          57         1.8         541.5       0.2X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Read dates and timestamps:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                       1880           1887           8         5.3         188.0       1.0X
-read timestamps from files                          27135          27180          43         0.4        2713.5       0.1X
-infer timestamps from files                         51426          51534          97         0.2        5142.6       0.0X
-read date text from files                            1618           1622           4         6.2         161.8       1.2X
-read date from files                                20207          20218          13         0.5        2020.7       0.1X
-infer date from files                               19418          19479          94         0.5        1941.8       0.1X
-timestamp strings
[GitHub] [spark] AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-537322311 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537322290 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
AmplabJenkins removed a comment on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-537322317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16663/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins removed a comment on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537322294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16662/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537322290 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-537322311 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
AmplabJenkins commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-537322317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16663/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
AmplabJenkins commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537322294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/16662/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
SparkQA commented on issue #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build URL: https://github.com/apache/spark/pull/25993#issuecomment-537322064 **[Test build #111667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111667/testReport)** for PR 25993 at commit [`1e95144`](https://github.com/apache/spark/commit/1e95144ab035da63a6b5578017dec95230f4edbc).
[GitHub] [spark] SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
SparkQA commented on issue #24232: [SPARK-27297] [SQL] Add higher order functions to scala API URL: https://github.com/apache/spark/pull/24232#issuecomment-537322074 **[Test build #111669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111669/testReport)** for PR 24232 at commit [`64c0f87`](https://github.com/apache/spark/commit/64c0f87a8005a27458394a14648c0e75ee514678).
[GitHub] [spark] SparkQA commented on issue #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
SparkQA commented on issue #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available URL: https://github.com/apache/spark/pull/25760#issuecomment-537322067 **[Test build #111668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111668/testReport)** for PR 25760 at commit [`a7181a5`](https://github.com/apache/spark/commit/a7181a531134501ae24c1c9b890ed2d980181925).
[GitHub] [spark] maropu commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
maropu commented on a change in pull request #25992: [SPARK-29320][TESTS] Compare `sql/core` module in JDK8/11 (Part 1)
URL: https://github.com/apache/spark/pull/25992#discussion_r330361486

## File path: sql/core/benchmarks/CSVBenchmark-results.txt ##
@@ -2,58 +2,3 @@
 Benchmark to measure CSV read/write performance
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Parsing quoted values:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-One quoted string                                   36998          37134         120         0.0      739953.1       1.0X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Wide rows with 1000 columns:                Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Select 1000 columns                                140620         141162         737         0.0      140620.5       1.0X
-Select 100 columns                                  35170          35287         183         0.0       35170.0       4.0X
-Select one column                                   27711          27927         187         0.0       27710.9       5.1X
-count()                                              7707           7804          84         0.1        7707.4      18.2X
-Select 100 columns, one bad input field             41762          41851         117         0.0       41761.8       3.4X
-Select 100 columns, corrupt record field            48717          48761          44         0.0       48717.4       2.9X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Count a dataset with 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Select 10 columns + count()                         16001          16053          53         0.6        1600.1       1.0X
-Select 1 column + count()                           11571          11614          58         0.9        1157.1       1.4X
-count()                                              4752           4766          18         2.1         475.2       3.4X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Write dates and timestamps:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                       1070           1072           2         9.3         107.0       1.0X
-to_csv(timestamp)                                   10446          10746         344         1.0        1044.6       0.1X
-write timestamps to files                            9573           9659         101         1.0         957.3       0.1X
-Create a dataset of dates                            1245           1260          17         8.0         124.5       0.9X
-to_csv(date)                                         7157           7167          11         1.4         715.7       0.1X
-write dates to files                                 5415           5450          57         1.8         541.5       0.2X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.14.4
-Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
-Read dates and timestamps:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                       1880           1887           8         5.3         188.0       1.0X
-read timestamps from files                          27135          27180          43         0.4        2713.5       0.1X
-infer timestamps from files                         51426          51534          97         0.2        5142.6       0.0X
-read date text from files                            1618           1622           4         6.2         161.8       1.2X
-read date from files                                20207          20218          13         0.5        2020.7       0.1X
-infer date from files                               19418          19479          94         0.5        1941.8       0.1X
-timestamp strings
[GitHub] [spark] HyukjinKwon opened a new pull request #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
HyukjinKwon opened a new pull request #25993: [DO-NOT-MERGE][R] Install Arrow and test Arrow optimization in AppVeyor build
URL: https://github.com/apache/spark/pull/25993

### What changes were proposed in this pull request?

This PR proposes to install Arrow and test Arrow optimization in the AppVeyor build. We're currently not testing this in CI.

### Why are the changes needed?

To check whether there is any regression and whether it works correctly.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

AppVeyor
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
URL: https://github.com/apache/spark/pull/25990#discussion_r330361239

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala ##
@@ -82,7 +80,7 @@ case class OverwriteByExpressionExecV1(
 }
 /** Some helper interfaces that use V2 write semantics through the V1 writer interface. */
-sealed trait V1FallbackWriters extends SupportsV1Write {
+sealed trait V1FallbackWriters extends SupportsV1Write { this: SupportsV1Write =>

Review comment:
unnecessary change?
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
URL: https://github.com/apache/spark/pull/25990#discussion_r330361189

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/connector/write/WriteInfoImpl.scala ##
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.write
+
+import org.apache.spark.sql.types.StructType
+
+private[sql] case class WriteInfoImpl(queryId: String,
+  schema: StructType,

Review comment:
Same as https://github.com/apache/spark/pull/25990/files#r330361054: we should use 4-space indentation.
[GitHub] [spark] cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
cloud-fan commented on a change in pull request #25990: [SPARK-29248][SQL][WIP] Pass in number of partitions to WriteBuilder
URL: https://github.com/apache/spark/pull/25990#discussion_r330361054

## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroTable.scala ##
@@ -42,8 +42,10 @@ case class AvroTable(
   override def inferSchema(files: Seq[FileStatus]): Option[StructType] =
     AvroUtils.inferSchema(sparkSession, options.asScala.toMap, files)
-  override def newWriteBuilder(options: CaseInsensitiveStringMap): WriteBuilder =
-    new AvroWriteBuilder(options, paths, formatName, supportsDataType)
+  override def newWriteBuilder(options: CaseInsensitiveStringMap,

Review comment:
nit: the code style should be
```
def fun1(
    arg1: T,
    arg2: T)...
```
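The style nit above (parameters on their own lines with a 4-space indent, body with the usual 2-space indent) can be illustrated with a small compilable sketch. `StyleSketch` and `describeWrite` are hypothetical names chosen only for this illustration; they are not part of the PR.

```scala
object StyleSketch {
  // When the parameter list does not fit on one line, each parameter goes on
  // its own line with a 4-space indent; the method body keeps a 2-space indent.
  def describeWrite(
      queryId: String,
      numPartitions: Int): String = {
    s"query=$queryId partitions=$numPartitions"
  }
}
```

Calling `StyleSketch.describeWrite("q1", 4)` builds a short description string; only the declaration layout matters here.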
[GitHub] [spark] dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537321407 @brkyvz and @cloud-fan, is this change intentional?
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
gaborgsomogyi commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
URL: https://github.com/apache/spark/pull/25760#discussion_r330360852

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ##
@@ -516,13 +524,25 @@ private[kafka010] class KafkaDataConsumer(
     fetchedData.withNewPoll(records.listIterator, offsetAfterPoll)
   }
-  private def getOrRetrieveConsumer(): InternalKafkaConsumer = _consumer match {
-    case None =>
-      _consumer = Option(consumerPool.borrowObject(cacheKey, kafkaParams))
-      require(_consumer.isDefined, "borrowing consumer from pool must always succeed.")
-      _consumer.get
+  private[kafka010] def getOrRetrieveConsumer(): InternalKafkaConsumer = {
+    if (!_consumer.isDefined) {
+      retrieveConsumer()
+    }
+    require(_consumer.isDefined, "Consumer must be defined")
+    if (!KafkaTokenUtil.isConnectorUsingCurrentToken(_consumer.get.kafkaParamsWithSecurity,
+        _consumer.get.clusterConfig)) {

Review comment:
Fixed.
[GitHub] [spark] dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks
dongjoon-hyun commented on issue #25988: [SPARK-29313][SQL] Fix failure on writing to `noop` in benchmarks URL: https://github.com/apache/spark/pull/25988#issuecomment-537321016 I'm not sure about this. In my view, we need to fix the root cause inside DSv2.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
gaborgsomogyi commented on a change in pull request #25760: [SPARK-29054][SS] Invalidate Kafka consumer when new delegation token available
URL: https://github.com/apache/spark/pull/25760#discussion_r330360862

## File path: external/kafka-0-10-token-provider/src/main/scala/org/apache/spark/kafka010/KafkaTokenUtil.scala ##
@@ -288,4 +289,18 @@ private[spark] object KafkaTokenUtil extends Logging {
     params
   }
+
+  def isConnectorUsingCurrentToken(
+      params: ju.Map[String, Object],
+      clusterConfig: Option[KafkaTokenClusterConf]): Boolean = {
+    if (params.containsKey(SaslConfigs.SASL_JAAS_CONFIG)) {
+      logDebug("Delegation token used by connector, checking if uses the latest token.")
+      val consumerJaasParams = params.get(SaslConfigs.SASL_JAAS_CONFIG).asInstanceOf[String]
+      require(clusterConfig.isDefined, "Delegation token must exist for this connector.")
+      val currentJaasParams = getTokenJaasParams(clusterConfig.get)
+      consumerJaasParams.equals(currentJaasParams)

Review comment:
Fixed.
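The check quoted in this hunk compares the JAAS configuration a connector was created with against the one derived from the current delegation token. A minimal standalone sketch of that idea follows, with several assumptions: `TokenCheckSketch` and `isUsingCurrentToken` are hypothetical names, the current JAAS string is passed in directly instead of being computed by `getTokenJaasParams`, and the `else` branch (not visible in the quoted diff) is assumed to return `true` when no JAAS config is set.

```scala
import java.{util => ju}

object TokenCheckSketch {
  // String value of Kafka's SaslConfigs.SASL_JAAS_CONFIG, inlined so the
  // sketch has no Kafka dependency.
  val SaslJaasConfig = "sasl.jaas.config"

  // Returns true when the connector's parameters either carry no JAAS config
  // (assumption: nothing to invalidate in that case) or carry exactly the
  // JAAS config derived from the current token.
  def isUsingCurrentToken(
      params: ju.Map[String, Object],
      currentJaasParams: Option[String]): Boolean = {
    if (params.containsKey(SaslJaasConfig)) {
      require(currentJaasParams.isDefined, "Delegation token must exist for this connector.")
      val consumerJaasParams = params.get(SaslJaasConfig).asInstanceOf[String]
      consumerJaasParams == currentJaasParams.get
    } else {
      true
    }
  }
}
```

A caller would treat a `false` result as a signal that the cached consumer was built with an outdated token.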
[GitHub] [spark] cloud-fan commented on a change in pull request #25955: [SPARK-29277][SQL] Add early DSv2 filter and projection pushdown (WIP)
cloud-fan commented on a change in pull request #25955: [SPARK-29277][SQL] Add early DSv2 filter and projection pushdown (WIP)
URL: https://github.com/apache/spark/pull/25955#discussion_r330360645

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.expressions.{And, SubqueryExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.DataSourceStrategy
+
+object V2ScanRelationPushDown extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
+    case PhysicalOperation(project, filters, relation: DataSourceV2Relation) =>
+      val scanBuilder = relation.newScanBuilder()
+
+      val (withSubquery, withoutSubquery) = filters.partition(SubqueryExpression.hasSubquery)
+      val normalizedFilters = DataSourceStrategy.normalizeFilters(
+        withoutSubquery, relation.output)
+
+      // `pushedFilters` will be pushed down and evaluated in the underlying data sources.
+      // `postScanFilters` need to be evaluated after the scan.
+      // `postScanFilters` and `pushedFilters` can overlap, e.g. the parquet row group filter.
+      val (pushedFilters, postScanFiltersWithoutSubquery) =
+        PushDownUtils.pushFilters(scanBuilder, normalizedFilters)
+      val postScanFilters = postScanFiltersWithoutSubquery ++ withSubquery
+      val (scan, output) = PushDownUtils.pruneColumns(
+        scanBuilder, relation, project ++ postScanFilters)
+      logInfo(
+        s"""
+           |Pushing operators to ${relation.name}
+           |Pushed Filters: ${pushedFilters.mkString(", ")}
+           |Post-Scan Filters: ${postScanFilters.mkString(",")}
+           |Output: ${output.mkString(", ")}
+         """.stripMargin)
+
+      val scanRelation = DataSourceV2ScanRelation(relation.table, scan, output)

Review comment:
Do we always need to create a `DataSourceV2ScanRelation` even if there is no filter/project above?
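The filter handling in the quoted rule (set aside filters that contain subqueries, try to push the rest, and evaluate whatever remains after the scan) can be sketched without Catalyst as below. This is a simplified illustration under stated assumptions: `Pred`, `PushDownSketch`, and `splitFilters` are hypothetical names, a boolean flag stands in for the data source's decision, and pushed and post-scan filters are kept disjoint here even though the diff notes they can overlap.

```scala
object PushDownSketch {
  // Hypothetical filter model; Spark uses Catalyst expressions instead.
  final case class Pred(sql: String, hasSubquery: Boolean, sourceCanEvaluate: Boolean)

  // Returns (pushed, postScan): filters handed to the source, and filters
  // that must still be evaluated after the scan.
  def splitFilters(filters: Seq[Pred]): (Seq[Pred], Seq[Pred]) = {
    // Filters with subqueries are never offered to the source.
    val (withSubquery, withoutSubquery) = filters.partition(_.hasSubquery)
    // Of the rest, the source accepts some and leaves the others.
    val (pushed, notPushed) = withoutSubquery.partition(_.sourceCanEvaluate)
    (pushed, notPushed ++ withSubquery)
  }
}
```

The post-scan list ends with the subquery filters, mirroring `postScanFiltersWithoutSubquery ++ withSubquery` in the diff.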
[GitHub] [spark] cloud-fan commented on a change in pull request #25955: [SPARK-29277][SQL] Add early DSv2 filter and projection pushdown (WIP)
cloud-fan commented on a change in pull request #25955: [SPARK-29277][SQL] Add early DSv2 filter and projection pushdown (WIP)
URL: https://github.com/apache/spark/pull/25955#discussion_r330360144

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ##
@@ -252,6 +259,11 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
    */
   def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
+  /**
+   * Override to provide additional rules for early projection and filter pushdown to scans.
+   */
+  def earlyScanPushDownRules: Seq[Rule[LogicalPlan]] = Nil

Review comment:
Can we use the `extendedOperatorOptimizationRules`? It also happens before any rules that depend on stats.
[GitHub] [spark] viirya commented on a change in pull request #25600: [SPARK-11150][SQL] Dynamic Partition Pruning
viirya commented on a change in pull request #25600: [SPARK-11150][SQL] Dynamic Partition Pruning
URL: https://github.com/apache/spark/pull/25600#discussion_r330359786

## File path: sql/core/src/main/scala/org/apache/spark/sql/dynamicpruning/PartitionPruning.scala ##
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.dynamicpruning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Dynamic partition pruning optimization is performed based on the type and
+ * selectivity of the join operation. During query optimization, we insert a
+ * predicate on the partitioned table using the filter from the other side of
+ * the join and a custom wrapper called DynamicPruning.
+ *
+ * The basic mechanism for DPP inserts a duplicated subquery with the filter from the other side,
+ * when the following conditions are met:
+ *    (1) the table to prune is partitioned by the JOIN key
+ *    (2) the join operation is one of the following types: INNER, LEFT SEMI (partitioned on left),
+ *    LEFT OUTER (partitioned on right), or RIGHT OUTER (partitioned on left)
+ *
+ * In order to enable partition pruning directly in broadcasts, we use a custom DynamicPruning
+ * clause that incorporates the In clause with the subquery and the benefit estimation.
+ * During query planning, when the join type is known, we use the following mechanism:
+ *    (1) if the join is a broadcast hash join, we replace the duplicated subquery with the reused
+ *    results of the broadcast,
+ *    (2) else if the estimated benefit of partition pruning outweighs the overhead of running the
+ *    subquery query twice, we keep the duplicated subquery
+ *    (3) otherwise, we drop the subquery.
+ */
+object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
+
+  /**
+   * Search the partitioned table scan for a given partition column in a logical plan
+   */
+  def getPartitionTableScan(a: Expression, plan: LogicalPlan): Option[LogicalRelation] = {
+    val srcInfo: Option[(Expression, LogicalPlan)] = findExpressionAndTrackLineageDown(a, plan)
+    srcInfo.flatMap {
+      case (resExp, l: LogicalRelation) =>
+        l.relation match {
+          case fs: HadoopFsRelation =>
+            val partitionColumns = AttributeSet(
+              l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
+            if (resExp.references.subsetOf(partitionColumns)) {
+              return Some(l)
+            } else {
+              None
+            }
+          case _ => None
+        }
+      case _ => None
+    }
+  }
+
+  /**
+   * Insert a dynamic partition pruning predicate on the left side of the join using the filter
+   * on the right side of the join.
+   *  - to be able to identify this filter during query planning, we use a custom
+   *    DynamicPruning expression that wraps a regular In expression
+   *  - we also insert a flag that indicates if the subquery duplication is worthwhile and it
+   *  should run irrespective the type of join, or is too expensive and it should be run only if
+   *  we can reuse the results of a broadcast
+   */
+  private def insertPredicate(
+      pruningKey: Expression,
+      pruningPlan: LogicalPlan,
+      filteringKey: Expression,
+      filteringPlan: LogicalPlan,
+      joinKeys: Seq[Expression],
+      hasBenefit: Boolean,
+      broadcastHint: Boolean): LogicalPlan = {
+    val reuseEnabled = SQLConf.get.dynamicPruningReuseBroadcast
+    val index = joinKeys.indexOf(filteringKey)
+    if (hasBenefit || reuseEnabled) {
+      // insert a DynamicPruning wrapper to identify the subquery during query planning
+      Filter(
+        DynamicPruningSubquery(
+          pruningKey,
+          filteringPlan,
+          joinKeys,
+          index,
+          !hasBenefit,
+
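The planning-time rule quoted in the Scaladoc above can be modelled without any Spark classes. The following is only a sketch under stated assumptions: `DppDecisionSketch`, `Decision`, `decide`, and `shouldInsert` are hypothetical names invented for illustration and are not part of the pull request. Only the `hasBenefit || reuseEnabled` condition and the order of the three cases are taken from the quoted diff.

```scala
// Hypothetical, Spark-free model of the decision described in the quoted Scaladoc.
// None of these names exist in Spark; they are assumptions for illustration.
object DppDecisionSketch {
  sealed trait Decision
  case object ReuseBroadcast extends Decision          // case (1): reuse the broadcast results
  case object KeepDuplicatedSubquery extends Decision  // case (2): run the filtering subquery again
  case object DropSubquery extends Decision            // case (3): remove the pruning filter

  // Optimization time: the quoted insertPredicate only adds the wrapper when
  // pruning is estimated to pay off or broadcast reuse is enabled.
  def shouldInsert(hasBenefit: Boolean, reuseEnabled: Boolean): Boolean =
    hasBenefit || reuseEnabled

  // Planning time: the join strategy is known, so pick one of the three cases
  // in the order the Scaladoc lists them.
  def decide(isBroadcastHashJoin: Boolean, hasBenefit: Boolean): Decision =
    if (isBroadcastHashJoin) ReuseBroadcast
    else if (hasBenefit) KeepDuplicatedSubquery
    else DropSubquery
}
```

Under this model, a predicate inserted only because reuse is enabled (`hasBenefit = false`) survives planning only if the join turns out to be a broadcast hash join, which appears to be what the `!hasBenefit` argument in the quoted code is meant to flag.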
[GitHub] [spark] maropu commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
maropu commented on a change in pull request #25666: [SPARK-28962][SQL] Provide index argument to filter lambda functions
URL: https://github.com/apache/spark/pull/25666#discussion_r330358827

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ##
@@ -369,6 +383,9 @@ case class ArrayFilter(
     var i = 0
     while (i < arr.numElements) {
       elementVar.value.set(arr.get(i, elementVar.dataType))
+      if (indexVar.isDefined) {

Review comment:
I was thinking of code like this:
```
  @transient lazy val (elementVar, mayFillIndex) = function match {
    case LambdaFunction(_, Seq(elemVar: NamedLambdaVariable), _) =>
      (elemVar, (_: Int) => {})
    case LambdaFunction(_, Seq(elemVar: NamedLambdaVariable, idxVar: NamedLambdaVariable), _) =>
      (elemVar, (i: Int) => idxVar.value.set(i))
  }

  override def nullSafeEval(inputRow: InternalRow, argumentValue: Any): Any = {
    val arr = argumentValue.asInstanceOf[ArrayData]
    val f = functionForEval
    val buffer = new mutable.ArrayBuffer[Any](arr.numElements)
    var i = 0
    while (i < arr.numElements) {
      elementVar.value.set(arr.get(i, elementVar.dataType))
      mayFillIndex(i)
      if (f.eval(inputRow).asInstanceOf[Boolean]) {
        buffer += elementVar.value.get
      }
      i += 1
    }
    new GenericArrayData(buffer)
  }
```
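The idea in the review comment above, matching on the lambda's arity once and getting back a closure that either sets the index or does nothing, can be tried outside Spark. This is a minimal sketch, not the actual `ArrayFilter` code: `IndexedFilterSketch`, `Var`, `ElemOnly`, and `ElemAndIndex` are hypothetical stand-ins for `NamedLambdaVariable` and `LambdaFunction`, and the element type is fixed to `Int` for brevity.

```scala
import scala.collection.mutable

// Hypothetical, Spark-free model of the suggestion. All names are assumptions.
object IndexedFilterSketch {
  // Stand-in for a lambda variable: a mutable slot set before each evaluation.
  final class Var[A](var value: A)

  sealed trait Lambda
  // Predicate that only reads the element slot.
  final case class ElemOnly(elem: Var[Int], body: () => Boolean) extends Lambda
  // Predicate that reads both the element slot and the index slot.
  final case class ElemAndIndex(elem: Var[Int], idx: Var[Int], body: () => Boolean) extends Lambda

  def filter(arr: Array[Int], function: Lambda): List[Int] = {
    // Resolve the lambda's shape once, so the loop has no per-element branch.
    val (elementVar, mayFillIndex, body) = function match {
      case ElemOnly(e, b) => (e, (_: Int) => (), b)
      case ElemAndIndex(e, i, b) => (e, (n: Int) => { i.value = n }, b)
    }
    val buffer = new mutable.ArrayBuffer[Int](arr.length)
    var i = 0
    while (i < arr.length) {
      elementVar.value = arr(i)
      mayFillIndex(i) // no-op when the lambda takes a single argument
      if (body()) buffer += elementVar.value
      i += 1
    }
    buffer.toList
  }
}
```

For example, with `val e = new IndexedFilterSketch.Var(0)` and `val i = new IndexedFilterSketch.Var(0)`, the call `IndexedFilterSketch.filter(Array(10, 11, 12, 13), IndexedFilterSketch.ElemAndIndex(e, i, () => i.value % 2 == 0))` keeps the elements at even positions. The trade-off is one extra function call per element in exchange for removing the `isDefined` check from the loop body.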
[GitHub] [spark] cloud-fan commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2
cloud-fan commented on issue #25771: [SPARK-28970][SQL] Implement USE CATALOG/NAMESPACE for Data Source V2 URL: https://github.com/apache/spark/pull/25771#issuecomment-537316802 I think it's ready to go. @imback82 can you fix the conflict? thanks!
[GitHub] [spark] cloud-fan commented on a change in pull request #25600: [SPARK-11150][SQL] Dynamic Partition Pruning
cloud-fan commented on a change in pull request #25600: [SPARK-11150][SQL] Dynamic Partition Pruning
URL: https://github.com/apache/spark/pull/25600#discussion_r330356993

## File path: sql/core/src/main/scala/org/apache/spark/sql/dynamicpruning/PartitionPruning.scala ##
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.dynamicpruning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Dynamic partition pruning optimization is performed based on the type and
+ * selectivity of the join operation. During query optimization, we insert a
+ * predicate on the partitioned table using the filter from the other side of
+ * the join and a custom wrapper called DynamicPruning.
+ *
+ * The basic mechanism for DPP inserts a duplicated subquery with the filter from the other side,
+ * when the following conditions are met:
+ *    (1) the table to prune is partitioned by the JOIN key
+ *    (2) the join operation is one of the following types: INNER, LEFT SEMI (partitioned on left),
+ *    LEFT OUTER (partitioned on right), or RIGHT OUTER (partitioned on left)
+ *
+ * In order to enable partition pruning directly in broadcasts, we use a custom DynamicPruning
+ * clause that incorporates the In clause with the subquery and the benefit estimation.
+ * During query planning, when the join type is known, we use the following mechanism:
+ *    (1) if the join is a broadcast hash join, we replace the duplicated subquery with the reused
+ *    results of the broadcast,
+ *    (2) else if the estimated benefit of partition pruning outweighs the overhead of running the
+ *    subquery query twice, we keep the duplicated subquery
+ *    (3) otherwise, we drop the subquery.
+ */
+object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper {
+
+  /**
+   * Search the partitioned table scan for a given partition column in a logical plan
+   */
+  def getPartitionTableScan(a: Expression, plan: LogicalPlan): Option[LogicalRelation] = {
+    val srcInfo: Option[(Expression, LogicalPlan)] = findExpressionAndTrackLineageDown(a, plan)
+    srcInfo.flatMap {
+      case (resExp, l: LogicalRelation) =>
+        l.relation match {
+          case fs: HadoopFsRelation =>
+            val partitionColumns = AttributeSet(
+              l.resolve(fs.partitionSchema, fs.sparkSession.sessionState.analyzer.resolver))
+            if (resExp.references.subsetOf(partitionColumns)) {
+              return Some(l)
+            } else {
+              None
+            }
+          case _ => None
+        }
+      case _ => None
+    }
+  }
+
+  /**
+   * Insert a dynamic partition pruning predicate on the left side of the join using the filter
+   * on the right side of the join.
+   *  - to be able to identify this filter during query planning, we use a custom
+   *    DynamicPruning expression that wraps a regular In expression
+   *  - we also insert a flag that indicates if the subquery duplication is worthwhile and it
+   *  should run irrespective the type of join, or is too expensive and it should be run only if
+   *  we can reuse the results of a broadcast
+   */
+  private def insertPredicate(
+      pruningKey: Expression,
+      pruningPlan: LogicalPlan,
+      filteringKey: Expression,
+      filteringPlan: LogicalPlan,
+      joinKeys: Seq[Expression],
+      hasBenefit: Boolean,
+      broadcastHint: Boolean): LogicalPlan = {
+    val reuseEnabled = SQLConf.get.dynamicPruningReuseBroadcast
+    val index = joinKeys.indexOf(filteringKey)
+    if (hasBenefit || reuseEnabled) {
+      // insert a DynamicPruning wrapper to identify the subquery during query planning
+      Filter(
+        DynamicPruningSubquery(
+          pruningKey,
+          filteringPlan,
+          joinKeys,
+          index,
+          !hasBenefit,
+
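Condition (2) in the quoted Scaladoc pairs each join type with the side that must hold the partitioned table. A small sketch of that table follows, assuming that join types not listed there (for example a full outer join) are simply not eligible. `DppJoinEligibilitySketch`, `JoinKind`, `Side`, and `canPrune` are hypothetical names, not Spark APIs.

```scala
// Hypothetical, Spark-free encoding of condition (2) from the quoted Scaladoc.
// All names here are assumptions made for illustration.
object DppJoinEligibilitySketch {
  sealed trait JoinKind
  case object Inner extends JoinKind
  case object LeftSemi extends JoinKind
  case object LeftOuter extends JoinKind
  case object RightOuter extends JoinKind
  case object FullOuter extends JoinKind // not listed in the Scaladoc; assumed ineligible

  sealed trait Side
  case object LeftSide extends Side
  case object RightSide extends Side

  // True when the table on `partitionedSide` may receive a pruning filter
  // built from the other side of the join.
  def canPrune(join: JoinKind, partitionedSide: Side): Boolean =
    (join, partitionedSide) match {
      case (Inner, _)              => true // either side may be pruned
      case (LeftSemi, LeftSide)    => true
      case (LeftOuter, RightSide)  => true
      case (RightOuter, LeftSide)  => true
      case _                       => false
    }
}
```

The outer-join rows reflect that only the side whose unmatched rows are discarded can be filtered safely: pruning the preserved side of an outer join would drop rows the join is required to keep.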
[GitHub] [spark] cloud-fan closed pull request #25985: [SPARK-29310][CORE][TESTS] TestMemoryManager should implement getExecutionMemoryUsageForTask()
cloud-fan closed pull request #25985: [SPARK-29310][CORE][TESTS] TestMemoryManager should implement getExecutionMemoryUsageForTask() URL: https://github.com/apache/spark/pull/25985
[GitHub] [spark] cloud-fan commented on issue #25985: [SPARK-29310][CORE][TESTS] TestMemoryManager should implement getExecutionMemoryUsageForTask()
cloud-fan commented on issue #25985: [SPARK-29310][CORE][TESTS] TestMemoryManager should implement getExecutionMemoryUsageForTask() URL: https://github.com/apache/spark/pull/25985#issuecomment-537315968 thanks, merging to master!
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537311371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111664/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
SparkQA commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537311325 **[Test build #111664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111664/testReport)** for PR 25960 at commit [`2acb51a`](https://github.com/apache/spark/commit/2acb51ab20a06ef606394c3ca650e36fcfad23c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
SparkQA removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537307379 **[Test build #111664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/111664/testReport)** for PR 25960 at commit [`2acb51a`](https://github.com/apache/spark/commit/2acb51ab20a06ef606394c3ca650e36fcfad23c2).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537311371 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/111664/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins commented on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537311369 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution
AmplabJenkins removed a comment on issue #25960: [SPARK-29283][SQL] Error message is hidden when query from JDBC, especially enabled adaptive execution URL: https://github.com/apache/spark/pull/25960#issuecomment-537311369 Merged build finished. Test PASSed.