[spark] branch master updated: [SPARK-39689][SQL] Support 2-chars `lineSep` in CSV datasource
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bb4c4778713 [SPARK-39689][SQL] Support 2-chars `lineSep` in CSV datasource bb4c4778713 is described below commit bb4c4778713c7ba1ee92d0bb0763d7d3ce54374f Author: yaohua AuthorDate: Thu Jul 7 15:22:06 2022 +0900 [SPARK-39689][SQL] Support 2-chars `lineSep` in CSV datasource ### What changes were proposed in this pull request? Univocity parser allows to set line separator to 1 to 2 characters ([code](https://github.com/uniVocity/univocity-parsers/blob/master/src/main/java/com/univocity/parsers/common/Format.java#L103)), CSV options should not block this usage ([code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala#L218)). This PR updates the requirement of `lineSep` accepting 1 or 2 characters in `CSVOptions`. Due to the limitation of supporting multi-chars `lineSep` within quotes, I propose to leave this feature undocumented and add a WARN message to users. ### Why are the changes needed? Unblock the usability of 2 characters `lineSep`. ### Does this PR introduce _any_ user-facing change? No - undocumented feature. ### How was this patch tested? New UT. Closes #37107 from Yaohua628/spark-39689. Lead-authored-by: yaohua Co-authored-by: Yaohua Zhao <79476540+yaohua...@users.noreply.github.com> Signed-off-by: Hyukjin Kwon --- .../apache/spark/sql/catalyst/csv/CSVOptions.scala | 6 +++- .../sql/execution/datasources/csv/CSVSuite.scala | 35 ++ 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala index 9daa50ba5a4..3e92c3d25eb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala @@ -215,7 +215,11 @@ class CSVOptions( */ val lineSeparator: Option[String] = parameters.get("lineSep").map { sep => require(sep.nonEmpty, "'lineSep' cannot be an empty string.") -require(sep.length == 1, "'lineSep' can contain only 1 character.") +// Intentionally allow it up to 2 for Window's CRLF although multiple +// characters have an issue with quotes. This is intentionally undocumented. 
+require(sep.length <= 2, "'lineSep' can contain only 1 character.") +if (sep.length == 2) logWarning("It is not recommended to set 'lineSep' " + + "with 2 characters due to the limitation of supporting multi-char 'lineSep' within quotes.") sep } diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala index 62dccaad1dd..bf92ffcf465 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala @@ -34,6 +34,7 @@ import org.apache.commons.lang3.exception.ExceptionUtils import org.apache.commons.lang3.time.FastDateFormat import org.apache.hadoop.io.SequenceFile.CompressionType import org.apache.hadoop.io.compress.GzipCodec +import org.apache.logging.log4j.Level import org.apache.spark.{SparkConf, SparkException, TestUtils} import org.apache.spark.sql.{AnalysisException, Column, DataFrame, Encoders, QueryTest, Row} @@ -2296,6 +2297,40 @@ abstract class CSVSuite assert(errMsg2.contains("'lineSep' can contain only 1 character")) } + Seq(true, false).foreach { multiLine => +test(s"""lineSep with 2 chars when multiLine set to $multiLine""") { + Seq("\r\n", "||", "|").foreach { newLine => +val logAppender = new LogAppender("lineSep WARN logger") +withTempDir { dir => + val inputData = if (multiLine) { +s"""name,"i am the${newLine} column1"${newLine}jack,30${newLine}tom,18""" + } else { +s"name,age${newLine}jack,30${newLine}tom,18" + } + Files.write(new File(dir, "/data.csv").toPath, inputData.getBytes()) + withLogAppender(logAppender) { +val df = spark.read + .options( +Map("header" -> "true", "multiLine" -> multiLine.toString, "lineSep" -> newLine)) + .csv(dir.getCanonicalPath) +// Due to the limitation of Univocity parser: +// multiple chars of newlines cannot be properly handled when they exist within quotes. +// Leave 2-char lineSep as an undocumented features and logWarn user +if (newLine !=
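For illustration, a minimal usage sketch of the relaxed option (not part of the commit; the SparkSession `spark` and the input path are assumed):

```scala
// Illustrative sketch: read a CSV whose records are separated by the two-character
// sequence "||", which CSVOptions now accepts. The path below is a made-up example.
val df = spark.read
  .option("header", "true")
  .option("lineSep", "||")   // 1 or 2 characters; a 2-char value logs a WARN message
  .csv("/tmp/people.csv")
df.show()
```

As the commit message notes, multi-character separators are not handled inside quoted fields, which is why the 2-character form remains undocumented and only produces a warning.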
[spark] branch master updated: [SPARK-39703][CORE][BUILD] Mima complains with Scala 2.13 for the changes in DeployMessages
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 845950b72b6 [SPARK-39703][CORE][BUILD] Mima complains with Scala 2.13 for the changes in DeployMessages 845950b72b6 is described below commit 845950b72b63f94b03436a598d9d041e662a0b53 Author: Hyukjin Kwon AuthorDate: Thu Jul 7 15:21:25 2022 +0900 [SPARK-39703][CORE][BUILD] Mima complains with Scala 2.13 for the changes in DeployMessages ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/36716. Mima with Scala 2.13 complains about the changes in `DeployMessages` for some reasons: ``` [error] spark-core: Failed binary compatibility check against org.apache.spark:spark-core_2.13:3.2.0! Found 6 potential problems (filtered 933) [error] * the type hierarchy of object org.apache.spark.deploy.DeployMessages#LaunchExecutor is different in current version. Missing types {scala.runtime.AbstractFunction7} [error]filter with: ProblemFilters.exclude[MissingTypesProblem]("org.apache.spark.deploy.DeployMessages$LaunchExecutor$") [error] * method requestedTotal()Int in class org.apache.spark.deploy.DeployMessages#RequestExecutors does not have a correspondent in current version [error]filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.requestedTotal") [error] * method copy(java.lang.String,Int)org.apache.spark.deploy.DeployMessages#RequestExecutors in class org.apache.spark.deploy.DeployMessages#RequestExecutors's type is different in current version, where it is (java.lang.String,scala.collection.immutable.Map)org.apache.spark.deploy.DeployMessages#RequestExecutors instead of (java.lang.String,Int)org.apache.spark.deploy.DeployMessages#RequestExecutors [error]filter with: ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.copy") [error] * synthetic method copy$default$2()Int in class org.apache.spark.deploy.DeployMessages#RequestExecutors has a different result type in current version, where it is scala.collection.immutable.Map rather than Int [error]filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.copy$default$2") [error] * method this(java.lang.String,Int)Unit in class org.apache.spark.deploy.DeployMessages#RequestExecutors's type is different in current version, where it is (java.lang.String,scala.collection.immutable.Map)Unit instead of (java.lang.String,Int)Unit [error]filter with: ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.this") [error] * method apply(java.lang.String,Int)org.apache.spark.deploy.DeployMessages#RequestExecutors in object org.apache.spark.deploy.DeployMessages#RequestExecutors in current version does not have a correspondent with same parameter signature among (java.lang.String,scala.collection.immutable.Map)org.apache.spark.deploy.DeployMessages#RequestExecutors, (java.lang.Object,java.lang.Object)java.lang.Object [error]filter with: ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.deploy.DeployMessages#RequestExecutors.apply") ``` https://github.com/apache/spark/runs/7221231391?check_suite_focus=true This PR adds the suggested filters. ### Why are the changes needed? 
To make the scheduled build (Scala 2.13) pass in https://github.com/apache/spark/actions/workflows/build_scala213.yml ### Does this PR introduce _any_ user-facing change? No, dev-only. The alarms are false positive. ### How was this patch tested? CI should verify this, Closes #37109 from HyukjinKwon/SPARK-39703. Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- project/MimaExcludes.scala | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index fb71155657f..3f3d8575477 100644 --- a/project/MimaExcludes.scala +++ b/project/MimaExcludes.scala @@ -54,7 +54,15 @@ object MimaExcludes { ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.Classifier.getNumClasses"), ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.Classifier.getNumClasses$default$2"), ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.OneVsRest.extractInstances"), - ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.OneVsRestModel.extractInstances") + ProblemFilters.exclude[Dir
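For context, the MiMa output quoted above already names the required exclusions; below is a sketch of how such entries look in `project/MimaExcludes.scala`, assembled from the "filter with:" hints rather than quoted from the diff:

```scala
// Sketch assembled from the "filter with:" hints above (not the verbatim diff).
// Each ProblemFilters entry silences one false-positive report for DeployMessages.
import com.typesafe.tools.mima.core._

Seq(
  ProblemFilters.exclude[MissingTypesProblem](
    "org.apache.spark.deploy.DeployMessages$LaunchExecutor$"),
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.deploy.DeployMessages#RequestExecutors.requestedTotal"),
  ProblemFilters.exclude[IncompatibleMethTypeProblem](
    "org.apache.spark.deploy.DeployMessages#RequestExecutors.copy"),
  ProblemFilters.exclude[IncompatibleResultTypeProblem](
    "org.apache.spark.deploy.DeployMessages#RequestExecutors.copy$default$2"),
  ProblemFilters.exclude[IncompatibleMethTypeProblem](
    "org.apache.spark.deploy.DeployMessages#RequestExecutors.this"),
  ProblemFilters.exclude[IncompatibleMethTypeProblem](
    "org.apache.spark.deploy.DeployMessages#RequestExecutors.apply")
)
```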
[spark] branch master updated: [SPARK-39679][SQL] TakeOrderedAndProjectExec should respect child output ordering
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 427fbee4c00 [SPARK-39679][SQL] TakeOrderedAndProjectExec should respect child output ordering 427fbee4c00 is described below commit 427fbee4c009d8d49fdb80a2e2532723eff84150 Author: ulysses-you AuthorDate: Thu Jul 7 14:20:29 2022 +0800 [SPARK-39679][SQL] TakeOrderedAndProjectExec should respect child output ordering ### What changes were proposed in this pull request? Skip local sort in `TakeOrderedAndProjectExec` if child output ordering satisfies the required. ### Why are the changes needed? TakeOrderedAndProjectExec should respect child output ordering to avoid unnecessary sort. For example: TakeOrderedAndProjectExec on the top of SortMergeJoin. ```SQL SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c2 ORDER BY t1.c1 LIMIT 100; ``` ### Does this PR introduce _any_ user-facing change? no, only improve performance ### How was this patch tested? Add benchmark test: ```sql val row = 10 * 1000 val df1 = spark.range(0, row, 1, 2).selectExpr("id % 3 as c1") val df2 = spark.range(0, row, 1, 2).selectExpr("id % 3 as c2") df1.join(df2, col("c1") === col("c2")) .orderBy(col("c1")) .limit(100) .noop() ``` Before: ``` TakeOrderedAndProject OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8370C CPU 2.80GHz TakeOrderedAndProject with SMJ:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - TakeOrderedAndProject with SMJ for doExecute3356 3414 61 0.0 335569.5 1.0X TakeOrderedAndProject with SMJ for executeCollect 3331 3370 47 0.0 333118.0 1.0X OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz TakeOrderedAndProject with SMJ:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - TakeOrderedAndProject with SMJ for doExecute3745 3766 24 0.0 374477.3 1.0X TakeOrderedAndProject with SMJ for executeCollect 3657 3680 38 0.0 365703.4 1.0X OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz TakeOrderedAndProject with SMJ:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - TakeOrderedAndProject with SMJ for doExecute2499 2554 47 0.0 249945.5 1.0X TakeOrderedAndProject with SMJ for executeCollect 2510 2515 8 0.0 250956.9 1.0X ``` After: ``` TakeOrderedAndProject OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8171M CPU 2.60GHz TakeOrderedAndProject with SMJ:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative - TakeOrderedAndProject with SMJ for doExecute 287 337 43 0.0 28734.9 1.0X TakeOrderedAndProject with SMJ for executeCollect150 170 30 0.1 15037.8 1.9X OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz TakeOrderedAndProject with SMJ:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -
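To make the pattern concrete, here is a small sketch of the query shape that benefits (table and column names are hypothetical, mirroring the SQL example in the description):

```scala
// Illustrative only: ORDER BY + LIMIT on top of a sort-merge join.
// When the join already emits rows ordered by the join key c1,
// TakeOrderedAndProjectExec can now skip its per-partition local sort
// instead of re-sorting data that is already ordered.
// `t1`, `t2`, `c1`, `c2` are hypothetical tables and columns.
import org.apache.spark.sql.functions.col

val result = spark.table("t1")
  .join(spark.table("t2"), col("c1") === col("c2"))
  .orderBy(col("c1"))
  .limit(100)
result.collect()
```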
[spark] branch branch-3.2 updated (1c0bd4c15a2 -> be891ad9908)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 1c0bd4c15a2 [SPARK-39656][SQL][3.2] Fix wrong namespace in DescribeNamespaceExec
     add be891ad9908 [SPARK-39551][SQL][3.2] Add AQE invalid plan check

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 72 --
 .../adaptive/InvalidAQEPlanException.scala | 17 +++--
 .../sql/execution/adaptive/ValidateSparkPlan.scala | 68
 .../adaptive/AdaptiveQueryExecSuite.scala | 25 +++-
 4 files changed, 141 insertions(+), 41 deletions(-)
 copy core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala => sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InvalidAQEPlanException.scala (61%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ValidateSparkPlan.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 88b983d9f2a [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo 88b983d9f2a is described below commit 88b983d9f2a7190b8d74a6176740afb65fa08223 Author: ulysses-you AuthorDate: Thu Jul 7 13:17:11 2022 +0900 [SPARK-39503][SQL][FOLLOWUP] Fix ansi golden files and typo ### What changes were proposed in this pull request? - re-generate ansi golden files - fix FunctionIdentifier parameter name typo ### Why are the changes needed? Fix ansi golden files and typo ### Does this PR introduce _any_ user-facing change? no, not released ### How was this patch tested? pass CI Closes #37111 from ulysses-you/catalog-followup. Authored-by: ulysses-you Signed-off-by: Hyukjin Kwon --- .../apache/spark/sql/catalyst/identifiers.scala| 2 +- .../approved-plans-v1_4/q83.ansi/explain.txt | 28 +++--- .../approved-plans-v1_4/q83.ansi/simplified.txt| 14 +-- .../approved-plans-v1_4/q83.sf100.ansi/explain.txt | 28 +++--- .../q83.sf100.ansi/simplified.txt | 14 +-- 5 files changed, 43 insertions(+), 43 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala index 9cae2b622a7..2de44d6f349 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala @@ -142,7 +142,7 @@ case class FunctionIdentifier(funcName: String, database: Option[String], catalo override val identifier: String = funcName def this(funcName: String) = this(funcName, None, None) - def this(table: String, database: Option[String]) = this(table, database, None) + def this(funcName: String, database: Option[String]) = this(funcName, database, None) override def toString: String = unquotedString } diff --git a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt index d281e59c727..905d29293a3 100644 --- a/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt +++ b/sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q83.ansi/explain.txt @@ -13,11 +13,11 @@ TakeOrderedAndProject (46) : : : +- * BroadcastHashJoin Inner BuildRight (8) : : : :- * Filter (3) : : : : +- * ColumnarToRow (2) - : : : : +- Scan parquet default.store_returns (1) + : : : : +- Scan parquet spark_catalog.default.store_returns (1) : : : +- BroadcastExchange (7) : : :+- * Filter (6) : : : +- * ColumnarToRow (5) - : : : +- Scan parquet default.item (4) + : : : +- Scan parquet spark_catalog.default.item (4) : : +- ReusedExchange (10) : +- BroadcastExchange (28) :+- * HashAggregate (27) @@ -29,7 +29,7 @@ TakeOrderedAndProject (46) : : +- * BroadcastHashJoin Inner BuildRight (20) : : :- * Filter (18) : : : +- * ColumnarToRow (17) - : : : +- Scan parquet default.catalog_returns (16) + : : : +- Scan parquet spark_catalog.default.catalog_returns (16) : : +- ReusedExchange (19) : +- ReusedExchange (22) +- BroadcastExchange (43) @@ -42,12 +42,12 @@ TakeOrderedAndProject (46) : +- * BroadcastHashJoin Inner BuildRight (35) : :- * Filter (33) : : +- * ColumnarToRow (32) -: : +- Scan parquet default.web_returns (31) +: : +- Scan parquet 
spark_catalog.default.web_returns (31) : +- ReusedExchange (34) +- ReusedExchange (37) -(1) Scan parquet default.store_returns +(1) Scan parquet spark_catalog.default.store_returns Output [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Batched: true Location: InMemoryFileIndex [] @@ -62,7 +62,7 @@ Input [3]: [sr_item_sk#1, sr_return_quantity#2, sr_returned_date_sk#3] Input [3]: [sr_item_sk#1, sr_return_qu
[spark] branch branch-3.3 updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.3 by this push: new 016dfeb760d [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT` 016dfeb760d is described below commit 016dfeb760dbe1109e3c81c39bcd1bf3316a3e20 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT` https://github.com/apache/spark/pull/35145 compile COVAR_POP, COVAR_SAMP and CORR in H2Dialect. Because H2 does't support COVAR_POP, COVAR_SAMP and CORR works with DISTINCT. So https://github.com/apache/spark/pull/35145 introduces a bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT. Fix bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT. 'Yes'. Bug will be fix. New test cases. Closes #37090 from beliefer/SPARK-37527_followup2. Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan (cherry picked from commit 14f2bae208c093dea58e3f947fb660e8345fb256) Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 - .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 38 +++--- 2 files changed, 32 insertions(+), 21 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 4a88203ec59..967df112af2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -55,18 +55,15 @@ private object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 2f94f9ef31e..293334084af 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1028,23 +1028,37 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel } test("scan with 
aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("select COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + - " FROM h2.test.employee where dept > 0 group by DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM h2.test.employee WHERE dept > 0
[spark] branch master updated: [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 14f2bae208c [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT` 14f2bae208c is described below commit 14f2bae208c093dea58e3f947fb660e8345fb256 Author: Jiaan Geng AuthorDate: Thu Jul 7 09:55:45 2022 +0800 [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT` ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/35145 compile COVAR_POP, COVAR_SAMP and CORR in H2Dialect. Because H2 does't support COVAR_POP, COVAR_SAMP and CORR works with DISTINCT. So https://github.com/apache/spark/pull/35145 introduces a bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT. ### Why are the changes needed? Fix bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT. ### Does this PR introduce _any_ user-facing change? 'Yes'. Bug will be fix. ### How was this patch tested? New test cases. Closes #37090 from beliefer/SPARK-37527_followup2. Authored-by: Jiaan Geng Signed-off-by: Wenchen Fan --- .../org/apache/spark/sql/jdbc/H2Dialect.scala | 15 -- .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 34 +++--- 2 files changed, 30 insertions(+), 19 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala index 124cb001b5c..5dfc64d7b6c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala @@ -62,18 +62,15 @@ private[sql] object H2Dialect extends JdbcDialect { assert(f.children().length == 1) val distinct = if (f.isDistinct) "DISTINCT " else "" Some(s"STDDEV_SAMP($distinct${f.children().head})") -case f: GeneralAggregateFunc if f.name() == "COVAR_POP" => +case f: GeneralAggregateFunc if f.name() == "COVAR_POP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_POP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" => + Some(s"COVAR_POP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "COVAR_SAMP" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"COVAR_SAMP($distinct${f.children().head}, ${f.children().last})") -case f: GeneralAggregateFunc if f.name() == "CORR" => + Some(s"COVAR_SAMP(${f.children().head}, ${f.children().last})") +case f: GeneralAggregateFunc if f.name() == "CORR" && !f.isDistinct => assert(f.children().length == 2) - val distinct = if (f.isDistinct) "DISTINCT " else "" - Some(s"CORR($distinct${f.children().head}, ${f.children().last})") + Some(s"CORR(${f.children().head}, ${f.children().last})") case _ => None } ) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala index 108348fbcd3..0a713bdb76c 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala @@ -1652,23 +1652,37 @@ class JDBCV2Suite extends QueryTest 
with SharedSparkSession with ExplainSuiteHel } test("scan with aggregate push-down: COVAR_POP COVAR_SAMP with filter and group by") { -val df = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + +val df1 = sql("SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus)" + " FROM h2.test.employee WHERE dept > 0 GROUP BY DePt") -checkFiltersRemoved(df) -checkAggregateRemoved(df) -checkPushedInfo(df, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + +checkFiltersRemoved(df1) +checkAggregateRemoved(df1) +checkPushedInfo(df1, "PushedAggregates: [COVAR_POP(BONUS, BONUS), COVAR_SAMP(BONUS, BONUS)], " + "PushedFilters: [DEPT IS NOT NULL, DEPT > 0], PushedGroupByExpressions: [DEPT]") -checkAnswer(df, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) +checkAnswer(df1, Seq(Row(1d, 2d), Row(2500d, 5000d), Row(0d, null))) + +val df2 = sql("SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus)" + + " FROM h2.test.employee
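To spell out the behavioral difference the new tests cover, here is a sketch (queries follow the `h2.test.employee` fixture used in `JDBCV2Suite`; this is not the full test):

```scala
// Without DISTINCT: H2 supports these aggregates, so H2Dialect can still compile
// them and push them down to the database.
val pushed = spark.sql(
  "SELECT COVAR_POP(bonus, bonus), COVAR_SAMP(bonus, bonus) " +
    "FROM h2.test.employee WHERE dept > 0 GROUP BY dept")

// With DISTINCT: H2 does not support DISTINCT for these aggregates, so after this fix
// H2Dialect declines to compile them and Spark evaluates the aggregation itself.
val notPushed = spark.sql(
  "SELECT COVAR_POP(DISTINCT bonus, bonus), COVAR_SAMP(DISTINCT bonus, bonus) " +
    "FROM h2.test.employee WHERE dept > 0 GROUP BY dept")
```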
[spark] branch master updated: [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2a2608ad557 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image 2a2608ad557 is described below commit 2a2608ad557e3ebb160287b7d7fd9d14c251b3c2 Author: Yikun Jiang AuthorDate: Thu Jul 7 08:59:38 2022 +0900 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image ### What changes were proposed in this pull request? This patch have two improvment: - Add `cache-from`: this will help to speed up cache build and ensure the image will NOT do full refresh if `REFRESH_DATE` is not changed by intention. - Add `FULL_REFRESH_DATE` in dockerfile: this will help force to do a full refresh. ### Why are the changes needed? Without this PR, if you change the dockerfile, the cache image will do a **complete refreshed** when dockerfile with any changes. This cause the different behavoir between ci tmp image (cache based refresh, in pyspark/sparkr/lint job) and infra cache (full refresh, in build infra cache job). Finally, if a PR refresh dockerfile, you might see pyspark/sparkr/lint CI is successful, but next pyspark/sparkr/lint CI failure after cache is refreshed (because deps may be changed when image do full refreshed). After this PR, if you change the dockerfile, the cache image job will do a cache based refreshed (use previous cache as much as possible, and refreshed the left layers when cache mismatch) to keep same behavior of pyspark/sparkr/lint job result. This behavior is similar to **static image** in some level, you can refresh the `FULL_REFRESH_DATE` to force refresh cache completely, the advantage is you can see the pyspark/sparkr/lint ci results in GA when you do full refresh. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Test local Closes #37103 from Yikun/SPARK-39522-FOLLOWUP. Authored-by: Yikun Jiang Signed-off-by: Hyukjin Kwon --- .github/workflows/build_infra_images_cache.yml | 1 + dev/infra/Dockerfile | 2 ++ 2 files changed, 3 insertions(+) diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml index 4ab27da7bdf..145769d1506 100644 --- a/.github/workflows/build_infra_images_cache.yml +++ b/.github/workflows/build_infra_images_cache.yml @@ -57,6 +57,7 @@ jobs: context: ./dev/infra/ push: true tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} + cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }} cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max - name: Image digest diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile index 8968b097251..e3ba4f6110b 100644 --- a/dev/infra/Dockerfile +++ b/dev/infra/Dockerfile @@ -18,6 +18,8 @@ # Image for building and testing Spark branches. Based on Ubuntu 20.04. FROM ubuntu:20.04 +ENV FULL_REFRESH_DATE 20220706 + ENV DEBIAN_FRONTEND noninteractive ENV DEBCONF_NONINTERACTIVE_SEEN true - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new be7dab12677 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column be7dab12677 is described below commit be7dab12677a180908b6ce37847abdda12adeb9b Author: Xinrong Meng AuthorDate: Thu Jul 7 08:53:50 2022 +0900 [SPARK-39648][PYTHON][PS][DOC] Fix type hints of `like`, `rlike`, `ilike` of Column ### What changes were proposed in this pull request? Fix type hints of `like`, `rlike`, `ilike` of Column. ### Why are the changes needed? Current type hints are incorrect so the doc is confusing: `Union["Column", "LiteralType", "DecimalLiteral", "DateTimeLiteral"]]` is hinted whereas only `str` is accepted. The PR is proposed to adjust the above issue by introducing `_bin_op_other_str`. ### Does this PR introduce _any_ user-facing change? No. Doc change only. ### How was this patch tested? Manual tests. Closes #37038 from xinrong-databricks/like_rlike. Authored-by: Xinrong Meng Signed-off-by: Hyukjin Kwon --- python/pyspark/pandas/series.py | 2 +- python/pyspark/sql/column.py| 117 +--- 2 files changed, 64 insertions(+), 55 deletions(-) diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py index a7852c110f7..838077ed7cd 100644 --- a/python/pyspark/pandas/series.py +++ b/python/pyspark/pandas/series.py @@ -5024,7 +5024,7 @@ class Series(Frame, IndexOpsMixin, Generic[T]): else: if regex: # to_replace must be a string -cond = self.spark.column.rlike(to_replace) +cond = self.spark.column.rlike(cast(str, to_replace)) else: cond = self.spark.column.isin(to_replace) # to_replace may be a scalar diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index 04458d560ee..31954a95690 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -573,57 +573,6 @@ class Column: >>> df.filter(df.name.contains('o')).collect() [Row(age=5, name='Bob')] """ -_rlike_doc = """ -SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex -match. - -Parameters --- -other : str -an extended regex expression - -Examples - ->>> df.filter(df.name.rlike('ice$')).collect() -[Row(age=2, name='Alice')] -""" -_like_doc = """ -SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.like('Al%')).collect() -[Row(age=2, name='Alice')] -""" -_ilike_doc = """ -SQL ILIKE expression (case insensitive LIKE). Returns a boolean :class:`Column` -based on a case insensitive match. - -.. versionadded:: 3.3.0 - -Parameters --- -other : str -a SQL LIKE pattern - -See Also - -pyspark.sql.Column.rlike - -Examples - ->>> df.filter(df.name.ilike('%Ice')).collect() -[Row(age=2, name='Alice')] -""" _startswith_doc = """ String starts with. Returns a boolean :class:`Column` based on a string match. @@ -656,12 +605,72 @@ class Column: """ contains = _bin_op("contains", _contains_doc) -rlike = _bin_op("rlike", _rlike_doc) -like = _bin_op("like", _like_doc) -ilike = _bin_op("ilike", _ilike_doc) startswith = _bin_op("startsWith", _startswith_doc) endswith = _bin_op("endsWith", _endswith_doc) +def like(self: "Column", other: str) -> "Column": +""" +SQL like expression. Returns a boolean :class:`Column` based on a SQL LIKE match. 
+ +Parameters +-- +other : str +a SQL LIKE pattern + +See Also + +pyspark.sql.Column.rlike + +Examples + +>>> df.filter(df.name.like('Al%')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc, "like")(other) +return Column(njc) + +def rlike(self: "Column", other: str) -> "Column": +""" +SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex +match. + +Parameters +-- +other : str +an extended regex expression + +Examples + +>>> df.filter(df.name.rlike('ice$')).collect() +[Row(age=2, name='Alice')] +""" +njc = getattr(self._jc, "rlike")(othe
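The fix is to the Python type hints only; for reference, a minimal sketch of the underlying JVM `Column` methods that these wrappers delegate to, each taking a single string pattern (a DataFrame `df` with a string column `name` is assumed):

```scala
// Minimal sketch (not from the commit): the Scala Column API mirrors what the Python
// wrappers call via self._jc, and each method takes exactly one string pattern;
// that is why the Python type hints were narrowed to `str`.
import org.apache.spark.sql.functions.col

df.filter(col("name").like("Al%"))     // SQL LIKE pattern
df.filter(col("name").rlike("ice$"))   // extended regex match
df.filter(col("name").ilike("%Ice"))   // case-insensitive LIKE (Spark 3.3+)
```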
[spark] branch master updated: [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1cf4fe5cd4d [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse 1cf4fe5cd4d is described below commit 1cf4fe5cd4dedd6ccd38fc9c159069f7c5a72191 Author: Dongjoon Hyun AuthorDate: Wed Jul 6 16:27:58 2022 -0700 [SPARK-39701][CORE][K8S][TESTS] Move `withSecretFile` to `SparkFunSuite` to reuse ### What changes were proposed in this pull request? This PR aims to move `withSecretFile` to `SparkFunSuite` and reuse it in Kubernetes tests. ### Why are the changes needed? Currently, K8s unit tests generate a leftover because it doesn't clean up the temporary secret files. By reusing the existing method, we can avoid this ``` $ build/sbt -Pkubernetes "kubernetes/test" $ git status On branch master Your branch is up to date with 'apache/master'. Untracked files: (use "git add ..." to include in what will be committed) resource-managers/kubernetes/core/temp-secret/ ``` ### Does this PR introduce _any_ user-facing change? No. This is a test-only change. ### How was this patch tested? Pass the CIs. Closes #37106 from dongjoon-hyun/SPARK-39701. Authored-by: Dongjoon Hyun Signed-off-by: Dongjoon Hyun --- .../org/apache/spark/SecurityManagerSuite.scala| 11 +--- .../scala/org/apache/spark/SparkFunSuite.scala | 16 ++- .../features/BasicExecutorFeatureStepSuite.scala | 31 +- 3 files changed, 29 insertions(+), 29 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala index 44e338c6f00..a11ecc22d0b 100644 --- a/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala +++ b/core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala @@ -29,7 +29,7 @@ import org.apache.spark.internal.config._ import org.apache.spark.internal.config.UI._ import org.apache.spark.launcher.SparkLauncher import org.apache.spark.security.GroupMappingServiceProvider -import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv, Utils} +import org.apache.spark.util.{ResetSystemProperties, SparkConfWithEnv} class DummyGroupMappingServiceProvider extends GroupMappingServiceProvider { @@ -513,14 +513,5 @@ class SecurityManagerSuite extends SparkFunSuite with ResetSystemProperties { private def encodeFileAsBase64(secretFile: File) = { Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath)) } - - private def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { -val secretDir = Utils.createTempDir("temp-secrets") -val secretFile = new File(secretDir, "temp-secret.txt") -Files.write(secretFile.toPath, contents.getBytes(UTF_8)) -try f(secretFile) finally { - Utils.deleteRecursively(secretDir) -} - } } diff --git a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala index 7922e13db69..b17aacc0a9f 100644 --- a/core/src/test/scala/org/apache/spark/SparkFunSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkFunSuite.scala @@ -18,7 +18,8 @@ package org.apache.spark import java.io.File -import java.nio.file.Path +import java.nio.charset.StandardCharsets.UTF_8 +import java.nio.file.{Files, Path} import java.util.{Locale, TimeZone} import scala.annotation.tailrec @@ -223,6 +224,19 @@ abstract class SparkFunSuite } } + /** + * Creates a temporary directory 
containing a secret file, which is then passed to `f` and + * will be deleted after `f` returns. + */ + protected def withSecretFile(contents: String = "test-secret")(f: File => Unit): Unit = { +val secretDir = Utils.createTempDir("temp-secrets") +val secretFile = new File(secretDir, "temp-secret.txt") +Files.write(secretFile.toPath, contents.getBytes(UTF_8)) +try f(secretFile) finally { + Utils.deleteRecursively(secretDir) +} + } + /** * Adds a log appender and optionally sets a log level to the root logger or the logger with * the specified name, then executes the specified function, and in the end removes the log diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala index 84c4f3b8ba3..420edddb693 100644 --- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala +++ b/resource-managers/kubernetes/core/src/test/scala/org/apa
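A hypothetical usage sketch of the relocated helper (the suite name and secret value are made up; the signature comes from the diff above):

```scala
// Hypothetical usage: any suite extending SparkFunSuite can now reuse withSecretFile;
// the temporary directory holding the secret is deleted after the function returns.
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.Files

import org.apache.spark.SparkFunSuite

class MyServiceSuite extends SparkFunSuite {
  test("reads the secret written by withSecretFile") {
    withSecretFile("s3kr3t") { secretFile =>
      val contents = new String(Files.readAllBytes(secretFile.toPath), UTF_8)
      assert(contents === "s3kr3t")
    }
  }
}
```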
[GitHub] [spark-website] holdenk commented on pull request #400: [SPARK-39512] Document docker image release steps
holdenk commented on PR #400:
URL: https://github.com/apache/spark-website/pull/400#issuecomment-1176618299

ping @MaxGekk & @tgravescs @gengliangwang since y'all had comments on the first draft, this one looking ok?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9983bdb3b88 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method 9983bdb3b88 is described below commit 9983bdb3b882a083cba9785392c3ba5d7a36496a Author: panbingkun AuthorDate: Wed Jul 6 11:27:17 2022 -0500 [SPARK-39663][SQL][TESTS] Add UT for MysqlDialect listIndexes method ### What changes were proposed in this pull request? Add complemented UT for MysqlDialect's lustIndexes method. ### Why are the changes needed? Add UT for existed function & improve test coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA. Closes #37060 from panbingkun/SPARK-39663. Authored-by: panbingkun Signed-off-by: Sean Owen --- .../spark/sql/jdbc/v2/MySQLIntegrationSuite.scala | 2 ++ .../org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala | 30 ++ .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 2 +- 3 files changed, 28 insertions(+), 6 deletions(-) diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala index 97f521a378e..6e76b74c7d8 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala @@ -119,6 +119,8 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest override def supportsIndex: Boolean = true + override def supportListIndexes: Boolean = true + override def indexOptions: String = "KEY_BLOCK_SIZE=10" testVarPop() diff --git a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala index 5f0033490d5..0f85bd534c3 100644 --- a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala +++ b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala @@ -197,6 +197,8 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu def supportsIndex: Boolean = false + def supportListIndexes: Boolean = false + def indexOptions: String = "" test("SPARK-36895: Test INDEX Using SQL") { @@ -219,11 +221,21 @@ private[v2] trait V2JDBCTest extends SharedSparkSession with DockerIntegrationFu s" The supported Index Types are:")) sql(s"CREATE index i1 ON $catalogName.new_table USING BTREE (col1)") +assert(jdbcTable.indexExists("i1")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + assert(indexes.head.indexName() == "i1") +} + sql(s"CREATE index i2 ON $catalogName.new_table (col2, col3, col5)" + s" OPTIONS ($indexOptions)") - -assert(jdbcTable.indexExists("i1") == true) -assert(jdbcTable.indexExists("i2") == true) +assert(jdbcTable.indexExists("i2")) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 2) + assert(indexes.map(_.indexName()).sorted === Array("i1", "i2")) +} // This should pass without exception sql(s"CREATE index IF NOT EXISTS i1 ON $catalogName.new_table (col1)") @@ -234,10 +246,18 @@ private[v2] trait V2JDBCTest extends 
SharedSparkSession with DockerIntegrationFu assert(m.contains("Failed to create index i1 in new_table")) sql(s"DROP index i1 ON $catalogName.new_table") -sql(s"DROP index i2 ON $catalogName.new_table") - assert(jdbcTable.indexExists("i1") == false) +if (supportListIndexes) { + val indexes = jdbcTable.listIndexes() + assert(indexes.size == 1) + assert(indexes.head.indexName() == "i2") +} + +sql(s"DROP index i2 ON $catalogName.new_table") assert(jdbcTable.indexExists("i2") == false) +if (supportListIndexes) { + assert(jdbcTable.listIndexes().isEmpty) +} // This should pass without exception sql(s"DROP index IF EXISTS i1 ON $catalogName.new_table") diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala index 24f9bac74f8..c4cb5369af9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala +++ b/sql/core/src/main/scala/org
[spark] branch master updated: [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3cf07c3e031 [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files 3cf07c3e031 is described below commit 3cf07c3e03195b95a37d3635f736fe78b70d22f7 Author: yangjie01 AuthorDate: Wed Jul 6 09:23:37 2022 -0700 [SPARK-39691][TESTS] Update `MapStatusesConvertBenchmark` result files ### What changes were proposed in this pull request? SPARK-39325 add `MapStatusesConvertBenchmark` but only upload result file generated by Java 8, so this pr supplement `MapStatusesConvertBenchmark` result generated by Java 11 and 17. On the other hand, SPARK-39626 upgraded `RoaringBitmap` from `0.9.28` to `0.9.30` and from the `IntelliJ Profiler` sampling, the hotspot path of `MapStatusesConvertBenchmark` contains `RoaringBitmap#contains`, so this pr also updated the result file generated by Java 8. ### Why are the changes needed? Update `MapStatusesConvertBenchmark` result files ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions Closes #37100 from LuciferYang/MapStatusesConvertBenchmark-result. Authored-by: yangjie01 Signed-off-by: Dongjoon Hyun --- .../MapStatusesConvertBenchmark-jdk11-results.txt | 13 + ...ts.txt => MapStatusesConvertBenchmark-jdk17-results.txt} | 8 core/benchmarks/MapStatusesConvertBenchmark-results.txt | 10 +- 3 files changed, 22 insertions(+), 9 deletions(-) diff --git a/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt new file mode 100644 index 000..96fa24175c5 --- /dev/null +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk11-results.txt @@ -0,0 +1,13 @@ + +MapStatuses Convert Benchmark + + +OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1031-azure +Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz +MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative + +Num Maps: 5 Fetch partitions:500 1324 1333 7 0.0 1324283680.0 1.0X +Num Maps: 5 Fetch partitions:1000 2650 2670 32 0.0 2650318387.0 0.5X +Num Maps: 5 Fetch partitions:1500 4018 4059 53 0.0 4017921009.0 0.3X + + diff --git a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt similarity index 54% copy from core/benchmarks/MapStatusesConvertBenchmark-results.txt copy to core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt index f41401bbe2e..0ba8d756dfc 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results.txt +++ b/core/benchmarks/MapStatusesConvertBenchmark-jdk17-results.txt @@ -2,12 +2,12 @@ MapStatuses Convert Benchmark -OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1025-azure +OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1031-azure Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz MapStatuses Convert: Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -Num Maps: 5 Fetch partitions:500 1330 1359 26 0.0 1329827185.0 1.0X -Num Maps: 5 Fetch partitions:1000 2648 2666 20 0.0 2647944453.0 0.5X -Num Maps: 5 Fetch partitions:1500 4155 4436 383 0.0 4154563448.0 0.3X +Num Maps: 5 Fetch partitions:500 1092 1104 22 0.0 1091691925.0 1.0X +Num Maps: 5 Fetch partitions:1000 2172 2192 29 0.0 2171702137.0 0.5X +Num Maps: 5 Fetch partitions:1500 3268 3291 27 0.0 3267904436.0 0.3X diff --git 
a/core/benchmarks/MapStatusesConvertBenchmark-results.txt b/core/benchmarks/MapStatusesConvertBenchmark-results.txt index f41401bbe2e..ae84abfdcc2 100644 --- a/core/benchmarks/MapStatusesConvertBenchmark-results