[spark] branch master updated (1a042cc -> 52e5cc4)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1a042cc  [SPARK-33530][CORE] Support --archives and spark.archives option natively
     add 52e5cc4  [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-programming-guide.md   |  6 +-
 .../streaming/CompactibleFileStreamLog.scala     |  8 ++-
 .../sql/execution/streaming/FileStreamSink.scala |  7 +-
 .../execution/streaming/FileStreamSinkLog.scala  | 25 ++-
 .../streaming/FileStreamSinkLogSuite.scala       | 77 +-
 5 files changed, 83 insertions(+), 40 deletions(-)
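For context, a minimal sketch of how the new sink option might be used. The option name (`retention`) and its duration-string value are assumptions drawn from the docs change listed above; see structured-streaming-programming-guide.md in the referenced commit for the authoritative description.

```scala
// Hedged sketch: file sink with a retention period on output files (SPARK-27188).
// Assumes the option is named "retention" and accepts a duration string.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sink-retention").getOrCreate()
val events = spark.readStream.format("rate").load()

val query = events.writeStream
  .format("parquet")
  .option("path", "/tmp/out")                 // output directory
  .option("checkpointLocation", "/tmp/ckpt")  // required by the file sink
  .option("retention", "7d")                  // expire metadata for outputs older than 7 days
  .start()
```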
[spark] branch master updated (2af2da5 -> 1a042cc)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 2af2da5  [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
     add 1a042cc  [SPARK-33530][CORE] Support --archives and spark.archives option natively

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/SparkContext.scala |  89 +++---
 .../src/main/scala/org/apache/spark/SparkEnv.scala |   5 +-
 .../org/apache/spark/deploy/SparkSubmit.scala      |   3 +
 .../apache/spark/deploy/SparkSubmitArguments.scala |   5 +-
 .../scala/org/apache/spark/executor/Executor.scala |  50
 .../org/apache/spark/internal/config/package.scala |  10 +++
 .../apache/spark/scheduler/TaskDescription.scala   |   9 ++-
 .../apache/spark/scheduler/TaskSetManager.scala    |   2 +
 .../main/scala/org/apache/spark/util/Utils.scala   |  52 +++--
 .../scala/org/apache/spark/SparkContextSuite.scala |  79 +++
 .../org/apache/spark/deploy/SparkSubmitSuite.scala |  37 +
 .../deploy/rest/SubmitRestProtocolSuite.scala      |   3 +
 .../CoarseGrainedExecutorBackendSuite.scala        |   2 +-
 .../org/apache/spark/executor/ExecutorSuite.scala  |   1 +
 .../CoarseGrainedSchedulerBackendSuite.scala       |   3 +-
 .../scheduler/EventLoggingListenerSuite.scala      |   3 +-
 .../spark/scheduler/TaskDescriptionSuite.scala     |   6 ++
 docs/configuration.md                              |  11 +++
 project/MimaExcludes.scala                         |   1 +
 python/docs/source/user_guide/python_packaging.rst |  27 ---
 .../MesosFineGrainedSchedulerBackendSuite.scala    |   2 +
 21 files changed, 347 insertions(+), 53 deletions(-)
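For orientation, a hedged sketch of the natively supported archives feature added here. The archive path and unpack directory below are illustrative; `spark.archives` and `--archives` are the names from the change itself, and the feature lands in Spark 3.1+.

```scala
// Hedged sketch of SPARK-33530: distributing an archive natively.
// The "#" fragment names the directory the archive is unpacked into on executors.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("archives-demo")
  .config("spark.archives", "hdfs:///deps/pyenv.tar.gz#environment") // illustrative path
  .getOrCreate()

// Roughly equivalent at submit time:
//   spark-submit --archives hdfs:///deps/pyenv.tar.gz#environment app.jar
```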
[spark] branch master updated (c50fcac -> 2af2da5)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c50fcac  [SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendered if built with Scala 2.13
     add 2af2da5  [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/streaming/FileStreamSource.scala |  2 +-
 .../execution/streaming/FileStreamSourceLog.scala  | 27 +
 .../sql/streaming/FileStreamSourceSuite.scala      | 64 ++
 3 files changed, 92 insertions(+), 1 deletion(-)
[spark] branch master updated (8016123 -> c50fcac)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 8016123  [SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading
     add c50fcac  [SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendered if built with Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/streaming/ProgressReporter.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated (aeb3649 -> 8016123)
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from aeb3649  [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests
     add 8016123  [SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading

No new revisions were added by this update.

Summary of changes:
 dev/sparktestsupport/modules.py        |  1 +
 python/pyspark/ml/classification.py    | 46 +
 python/pyspark/ml/param/__init__.py    |  6 +++
 python/pyspark/ml/pipeline.py          | 53 +--
 python/pyspark/ml/tests/test_tuning.py | 47 +++--
 python/pyspark/ml/tests/test_util.py   | 84 ++
 python/pyspark/ml/tuning.py            | 94 +++---
 python/pyspark/ml/util.py              | 38 ++
 python/pyspark/ml/util.pyi             |  6 +++
 9 files changed, 268 insertions(+), 107 deletions(-)
 create mode 100644 python/pyspark/ml/tests/test_util.py
[spark] branch master updated (596fbc1 -> aeb3649)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 596fbc1  [SPARK-33556][ML] Add array_to_vector function for dataframe column
     add aeb3649  [SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/ml/tests/test_feature.py            |   2 +-
 python/pyspark/ml/tests/test_image.py              |   6 +-
 python/pyspark/ml/tests/test_param.py              |   2 +-
 python/pyspark/ml/tests/test_persistence.py        |   2 +-
 python/pyspark/ml/tests/test_tuning.py             |   4 +-
 python/pyspark/ml/tests/test_wrapper.py            |   6 +-
 python/pyspark/sql/tests/test_arrow.py             |  28 +++---
 python/pyspark/sql/tests/test_catalog.py           |  56 +--
 python/pyspark/sql/tests/test_column.py            |  10 +-
 python/pyspark/sql/tests/test_conf.py              |   2 +-
 python/pyspark/sql/tests/test_dataframe.py         |  78 +++
 python/pyspark/sql/tests/test_datasources.py       |  10 +-
 python/pyspark/sql/tests/test_functions.py         |  22 ++---
 .../pyspark/sql/tests/test_pandas_cogrouped_map.py |  14 +--
 .../pyspark/sql/tests/test_pandas_grouped_map.py   |  32 +++---
 python/pyspark/sql/tests/test_pandas_map.py        |   8 +-
 python/pyspark/sql/tests/test_pandas_udf.py        |  32 +++---
 .../sql/tests/test_pandas_udf_grouped_agg.py       |  16 +--
 python/pyspark/sql/tests/test_pandas_udf_scalar.py | 108 ++---
 .../pyspark/sql/tests/test_pandas_udf_typehints.py |   2 +-
 python/pyspark/sql/tests/test_pandas_udf_window.py |   6 +-
 python/pyspark/sql/tests/test_types.py             |  24 ++---
 python/pyspark/sql/tests/test_udf.py               |  28 +++---
 python/pyspark/sql/tests/test_utils.py             |  15 ++-
 python/pyspark/tests/test_profiler.py              |   4 +-
 python/pyspark/tests/test_rdd.py                   |  30 +++---
 python/pyspark/tests/test_worker.py                |   2 +-
 27 files changed, 274 insertions(+), 275 deletions(-)
[spark] branch master updated (f5d2165 -> 596fbc1)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f5d2165  [SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly
     add 596fbc1  [SPARK-33556][ML] Add array_to_vector function for dataframe column

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/ml/functions.scala | 16 +-
 .../scala/org/apache/spark/ml/FunctionsSuite.scala | 18 ++--
 python/docs/source/reference/pyspark.ml.rst        |  1 +
 python/pyspark/ml/functions.py                     | 34 ++
 python/pyspark/ml/functions.pyi                    |  2 ++
 5 files changed, 68 insertions(+), 3 deletions(-)
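A short usage sketch of the new function, based on the Scala API added to ml/functions.scala in this change; the DataFrame contents are illustrative.

```scala
// Sketch: convert an array<double> column into an ML Vector column (SPARK-33556),
// e.g. to feed downstream feature pipelines that expect VectorUDT.
import org.apache.spark.ml.functions.array_to_vector
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("array-to-vector").getOrCreate()
import spark.implicits._

val df = Seq(Tuple1(Array(1.0, 2.0, 3.0))).toDF("arr")
df.select(array_to_vector(col("arr")).as("features")).show(false)
// +-------------+
// |features     |
// +-------------+
// |[1.0,2.0,3.0]|
// +-------------+
```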
[spark] branch branch-3.0 updated: [SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 242581f  [SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly

242581f is described below

commit 242581f4926c994bfc5af388cae31645112b2798
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Tue Dec 1 06:44:15 2020 +0900

    [SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly

    ### What changes were proposed in this pull request?

    This PR proposes to use the current timestamp, with a warning log, when the issue date for a token is not set up properly. The next section explains the rationale in detail.

    ### Why are the changes needed?

    Unfortunately, not every implementation respects the `issue date` in `AbstractDelegationTokenIdentifier`, which Spark relies on in its calculations. The default issue date is 0L, far from any actual issue date, which breaks the logic for calculating the next renewal date under some circumstances and leads to a 0 interval (i.e. immediate) rescheduling of token renewal.

    In HadoopFSDelegationTokenProvider, Spark calculates the token renewal interval as below:

    https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L123-L134

    The interval is calculated as `token.renew() - identifier.getIssueDate`, which is correct assuming both `token.renew()` and `identifier.getIssueDate` produce correct values, but it goes wrong when `identifier.getIssueDate` returns 0L (the default value), like below:

    ```
    20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 1603175657000 for token S3ADelegationToken/IDBroker
    20/10/13 06:34:19 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400048 for token HDFS_DELEGATION_TOKEN
    ```

    Fortunately, we pick the minimum value as a safety guard (so in this case `86400048` is picked up), but that safety guard has an unintentionally bad impact in this case.

    https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala#L58-L71

    Spark takes that "minimum" interval and blindly adds it to each token's issue date to calculate the token's next renewal date, then picks the "minimum" value again. In the problematic case, the value would be `86400048` (86400048 + 0), which is far smaller than the current timestamp.

    https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala#L228-L234

    The current timestamp is then subtracted from the next renewal date to get the interval, which is multiplied by the configured ratio to produce the final schedule interval. In the problematic case, this value goes negative.

    https://github.com/apache/spark/blob/2c64b731ae6a976b0d75a95901db849b4a0e2393/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala#L180-L188

    There's a safety guard disallowing negative values, but it simply yields 0, meaning "schedule immediately". This triggers the next calculation of the next renewal date, with the same outcome, hence the delegation token is updated immediately and continuously.

    As we fetch the token just before the calculation happens, the actual issue date is likely only slightly earlier, so it's not that dangerous to use the current timestamp as the issue date for a token whose issue date has not been set up properly. Still, it's better not to leave the token implementation as it is, so we log a warning message to let end users consult with the token implementer.

    ### Does this PR introduce _any_ user-facing change?

    Yes. End users won't encounter the tight loop of token-renewal scheduling after this PR. From the end user's perspective, there's nothing to change.

    ### How was this patch tested?

    Manually tested in the problematic environment.

    Closes #30366 from HeartSaVioR/SPARK-33440.

    Authored-by: Jungtaek Lim (HeartSaVioR)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit f5d2165c95fe83f24be9841807613950c1d5d6d0)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../security/HadoopDelegationTokenManager.scala    |  4 +++-
 .../security/HadoopFSDelegationTokenProvider.scala | 27
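For readers following the rationale above, a simplified, self-contained sketch of the fix's core idea; this is not the exact Spark code, and the names and warning text are illustrative.

```scala
// Simplified sketch of the SPARK-33440 fix: fall back to the current timestamp
// when a token identifier's issue date is clearly bogus (e.g. the 0L default),
// and warn so the token implementer can be consulted.
object IssueDateSketch {
  def issueDateOrNow(tokenKind: String, reportedIssueDate: Long): Long = {
    val now = System.currentTimeMillis()
    if (reportedIssueDate > 0L && reportedIssueDate <= now) {
      reportedIssueDate
    } else {
      Console.err.println(s"WARN: token $tokenKind reports issue date " +
        s"$reportedIssueDate, which is not set up properly; using the current " +
        "timestamp instead. Please consult the token implementer.")
      now
    }
  }

  def main(args: Array[String]): Unit = {
    // With the 0L default, the interval math now uses "now" instead of epoch 0,
    // so renewal is no longer rescheduled immediately in a tight loop.
    val renewDate = System.currentTimeMillis() + 86400000L // renewable in 24h
    val interval = renewDate - issueDateOrNow("HDFS_DELEGATION_TOKEN", 0L)
    println(s"renewal interval ~ $interval ms") // ~86400000, not ~renewDate
  }
}
```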
[spark] branch master updated (c699435 -> f5d2165)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c699435  [SPARK-33545][CORE] Support Fallback Storage during Worker decommission
     add f5d2165  [SPARK-33440][CORE] Use current timestamp with warning log in HadoopFSDelegationTokenProvider when the issue date for token is not set up properly

No new revisions were added by this update.

Summary of changes:
 .../security/HadoopDelegationTokenManager.scala    |  4 +++-
 .../security/HadoopFSDelegationTokenProvider.scala | 27 +++---
 2 files changed, 27 insertions(+), 4 deletions(-)
[spark] branch master updated (f3c2583 -> c699435)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f3c2583  [SPARK-33185][YARN][FOLLOW-ON] Leverage RM's RPC API instead of REST to fetch driver log links in yarn.Client
     add c699435  [SPARK-33545][CORE] Support Fallback Storage during Worker decommission

No new revisions were added by this update.

Summary of changes:
 core/pom.xml                                       |  41
 .../main/scala/org/apache/spark/SparkContext.scala |   1 +
 .../org/apache/spark/internal/config/package.scala |  10 +
 .../spark/shuffle/IndexShuffleBlockResolver.scala  |   2 +-
 .../org/apache/spark/storage/BlockManager.scala    |  18 +-
 .../spark/storage/BlockManagerDecommissioner.scala |   3 +
 .../org/apache/spark/storage/FallbackStorage.scala | 174 +
 .../storage/ShuffleBlockFetcherIterator.scala      |   3 +-
 .../spark/storage/FallbackStorageSuite.scala       | 269 +
 9 files changed, 517 insertions(+), 4 deletions(-)
 create mode 100644 core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala
 create mode 100644 core/src/test/scala/org/apache/spark/storage/FallbackStorageSuite.scala
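A hedged configuration sketch for the new fallback storage. The config keys below are assumptions inferred from the change summary; the authoritative names are in the docs/configuration.md and internal/config/package.scala entries this commit touches.

```scala
// Hedged sketch: enabling fallback storage for worker decommission (SPARK-33545).
// All keys below are assumptions; verify against the commit's config entries.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
  // A remote path that outlives the decommissioned worker, e.g. HDFS or S3:
  .set("spark.storage.decommission.fallbackStorage.path", "hdfs:///spark/fallback/")
// Pass this conf to SparkContext / spark-submit as usual.
```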
[spark] branch master updated: [SPARK-33185][YARN][FOLLOW-ON] Leverage RM's RPC API instead of REST to fetch driver log links in yarn.Client
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f3c2583  [SPARK-33185][YARN][FOLLOW-ON] Leverage RM's RPC API instead of REST to fetch driver log links in yarn.Client

f3c2583 is described below

commit f3c2583cc3ad6a2a24bfb09e2ee7af4e63e5bf66
Author: Erik Krogen
AuthorDate: Mon Nov 30 14:40:51 2020 -0600

    [SPARK-33185][YARN][FOLLOW-ON] Leverage RM's RPC API instead of REST to fetch driver log links in yarn.Client

    ### What changes were proposed in this pull request?

    This is a follow-on to PR #30096, which initially added support for printing direct links to the driver stdout/stderr logs in the application report output of `yarn.Client`, via the `spark.yarn.includeDriverLogsLink` configuration. That PR made use of the ResourceManager's REST APIs to fetch the information needed to construct the links. This PR proposes removing the dependency on the REST API, since the new logic is the only place in `yarn.Client` which makes use of this API, an [...]

    ### Why are the changes needed?

    While the old logic worked okay when running a Spark application in a "standard" environment with full access to Kerberos credentials, it can fail when run in an environment with restricted Kerberos credentials. In our case, this environment is represented by [Azkaban](https://azkaban.github.io/), but it likely affects other job scheduling systems as well. In such an environment, the application has delegation tokens which enable it to communicate with services such as YARN, but the [...]

    Besides this enhancement, leveraging the `YarnClient` APIs greatly simplifies the processing logic, e.g. by removing all JSON parsing.

    ### Does this PR introduce _any_ user-facing change?

    Very minimal user-facing changes on top of PR #30096. It basically expands the scope of environments in which that feature operates correctly.

    ### How was this patch tested?

    In addition to redoing the `spark-submit` testing as mentioned in PR #30096, I also tested this logic in a restricted-credentials environment (Azkaban). It succeeds where the previous logic would fail with a 401 error.

    Closes #30450 from xkrogen/xkrogen-SPARK-33185-driverlogs-followon.

    Authored-by: Erik Krogen
    Signed-off-by: Mridul Muralidharan
---
 .../org/apache/spark/deploy/yarn/Client.scala      | 67 --
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 47 ---
 .../spark/deploy/yarn/YarnClusterSuite.scala       | 31 ++
 3 files changed, 54 insertions(+), 91 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 552167c..d252e83 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -29,12 +29,8 @@ import scala.collection.immutable.{Map => IMap}
 import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, ListBuffer, Map}
 import scala.util.control.NonFatal
 
-import com.fasterxml.jackson.databind.ObjectMapper
 import com.google.common.base.Objects
 import com.google.common.io.Files
-import javax.ws.rs.client.ClientBuilder
-import javax.ws.rs.core.MediaType
-import javax.ws.rs.core.Response.Status.Family
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs._
 import org.apache.hadoop.fs.permission.FsPermission
@@ -51,7 +47,6 @@ import org.apache.hadoop.yarn.conf.YarnConfiguration
 import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException
 import org.apache.hadoop.yarn.security.AMRMTokenIdentifier
 import org.apache.hadoop.yarn.util.Records
-import org.apache.hadoop.yarn.webapp.util.WebAppUtils
 
 import org.apache.spark.{SecurityManager, SparkConf, SparkException}
 import org.apache.spark.api.python.PythonUtils
@@ -1089,9 +1084,9 @@ private[spark] class Client(
       // If DEBUG is enabled, log report details every iteration
       // Otherwise, log them every time the application changes state
       if (log.isDebugEnabled) {
-        logDebug(formatReportDetails(report, getDriverLogsLink(report.getApplicationId)))
+        logDebug(formatReportDetails(report, getDriverLogsLink(report)))
       } else if (lastState != state) {
-        logInfo(formatReportDetails(report, getDriverLogsLink(report.getApplicationId)))
+        logInfo(formatReportDetails(report, getDriverLogsLink(report)))
       }
     }
 
@@ -1192,33 +1187,31 @@ private[spark] class Client(
   }
 
   /**
-   * Fetch links to the logs of the driver for the given application ID. This requires hitting the
-   * RM
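To illustrate the RPC-based approach the commit message describes, a hedged sketch of resolving an AM (driver) container's log URL through `YarnClient`; this is not the actual yarn.Client code, and error handling and security wiring are elided.

```scala
// Hedged sketch: fetch the AM container's log link via YARN's RPC client API
// (plain RPC works with delegation tokens, unlike the Kerberos-bound REST path).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient

def amLogUrl(appId: ApplicationId): Option[String] = {
  val yarnClient = YarnClient.createYarnClient()
  yarnClient.init(new Configuration())
  yarnClient.start()
  try {
    val report = yarnClient.getApplicationReport(appId)
    val attempt =
      yarnClient.getApplicationAttemptReport(report.getCurrentApplicationAttemptId)
    Option(attempt.getAMContainerId)
      .map(yarnClient.getContainerReport) // RPC call to the RM
      .flatMap(c => Option(c.getLogUrl))  // NodeManager log link for the container
  } finally {
    yarnClient.stop()
  }
}
```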
[spark] branch master updated (6fd148f -> 030b313)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6fd148f  [SPARK-33569][SQL] Remove getting partitions by an identifier prefix
     add 030b313  [SPARK-33569][SPARK-33452][SQL][FOLLOWUP] Fix a build error in `ShowPartitionsExec`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-2.4 updated: [SPARK-33588][SQL][2.4] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 7b68757  [SPARK-33588][SQL][2.4] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`

7b68757 is described below

commit 7b6875797537ee18e8721a9e7efc70996a3635a9
Author: Max Gekk
AuthorDate: Mon Nov 30 08:39:31 2020 -0800

    [SPARK-33588][SQL][2.4] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`

    ### What changes were proposed in this pull request?

    Perform partition spec normalization in `ShowTablesCommand` according to the table schema before getting partitions from the catalog. The normalization via `PartitioningUtils.normalizePartitionSpec()` adjusts the column names in the partition specification w.r.t. the real partition column names and case sensitivity.

    ### Why are the changes needed?

    Even when `spark.sql.caseSensitive` is `false`, which is the default value, v1 `SHOW TABLE EXTENDED` is case sensitive:

    ```sql
    spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
             > USING parquet
             > partitioned by (year, month);
    spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
    spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
    Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the changes, the `SHOW TABLE EXTENDED` command respects the SQL config. For the example above, it returns the correct result:

    ```sql
    spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
    default  tbl1  false  Partition Values: [year=2015, month=1]
    Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
    Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
    InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
    Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
    Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
    Created Time: Sat Nov 28 23:25:18 MSK 2020
    Last Access: UNKNOWN
    Partition Statistics: 623 bytes
    ```

    ### How was this patch tested?

    By running the modified test suite via:
    ```
    $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DDLSuite"
    ```

    Closes #30551 from MaxGekk/show-table-case-sensitive-spec-2.4.

    Authored-by: Max Gekk
    Signed-off-by: Dongjoon Hyun
---
 .../apache/spark/sql/execution/command/tables.scala | 17 +++--
 .../resources/sql-tests/results/show-tables.sql.out |  2 +-
 .../spark/sql/execution/command/DDLSuite.scala      | 21 +
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index 1abbc72..4a75bcb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -822,12 +822,17 @@ case class ShowTablesCommand(
       //
       // Note: tableIdentifierPattern should be non-empty, otherwise a [[ParseException]]
       // should have been thrown by the sql parser.
-      val tableIdent = TableIdentifier(tableIdentifierPattern.get, Some(db))
-      val table = catalog.getTableMetadata(tableIdent).identifier
-      val partition = catalog.getPartition(tableIdent, partitionSpec.get)
-      val database = table.database.getOrElse("")
-      val tableName = table.table
-      val isTemp = catalog.isTemporaryTable(table)
+      val table = catalog.getTableMetadata(TableIdentifier(tableIdentifierPattern.get, Some(db)))
+      val tableIdent = table.identifier
+      val normalizedSpec = PartitioningUtils.normalizePartitionSpec(
+        partitionSpec.get,
+        table.partitionColumnNames,
+        tableIdent.quotedString,
+        sparkSession.sessionState.conf.resolver)
+      val partition = catalog.getPartition(tableIdent, normalizedSpec)
+      val database = tableIdent.database.getOrElse("")
+      val tableName = tableIdent.table
+      val isTemp = catalog.isTemporaryTable(tableIdent)
       val information = partition.simpleString
       Seq(Row(database, tableName, isTemp, s"$information\n"))
     }
diff --git a/sql/core/src/test/resources/sql-tests/results/show-tables.sql.out
[spark] branch branch-3.0 updated: [SPARK-33588][SQL][3.0] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 03291c8  [SPARK-33588][SQL][3.0] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`

03291c8 is described below

commit 03291c80c5b1aa2b18e53617676f36d40e01188f
Author: Max Gekk
AuthorDate: Mon Nov 30 08:37:13 2020 -0800

    [SPARK-33588][SQL][3.0] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SHOW TABLE EXTENDED`

    ### What changes were proposed in this pull request?

    Perform partition spec normalization in `ShowTablesCommand` according to the table schema before getting partitions from the catalog. The normalization via `PartitioningUtils.normalizePartitionSpec()` adjusts the column names in the partition specification w.r.t. the real partition column names and case sensitivity.

    ### Why are the changes needed?

    Even when `spark.sql.caseSensitive` is `false`, which is the default value, v1 `SHOW TABLE EXTENDED` is case sensitive:

    ```sql
    spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
             > USING parquet
             > partitioned by (year, month);
    spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
    spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
    Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`';
    ```

    ### Does this PR introduce _any_ user-facing change?

    Yes. After the changes, the `SHOW TABLE EXTENDED` command respects the SQL config. For the example above, it returns the correct result:

    ```sql
    spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
    default  tbl1  false  Partition Values: [year=2015, month=1]
    Location: file:/Users/maximgekk/spark-warehouse/tbl1/year=2015/month=1
    Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
    InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
    Storage Properties: [serialization.format=1, path=file:/Users/maximgekk/spark-warehouse/tbl1]
    Partition Parameters: {transient_lastDdlTime=1606595118, totalSize=623, numFiles=1}
    Created Time: Sat Nov 28 23:25:18 MSK 2020
    Last Access: UNKNOWN
    Partition Statistics: 623 bytes
    ```

    ### How was this patch tested?

    By running the modified test suite via:
    ```
    $ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *DDLSuite"
    ```

    Authored-by: Max Gekk
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 0054fc937f804660c6501d9d3f6319f3047a68f8)
    Signed-off-by: Max Gekk

    Closes #30549 from MaxGekk/show-table-case-sensitive-spec-3.0.

    Authored-by: Max Gekk
    Signed-off-by: Dongjoon Hyun
---
 .../spark/sql/execution/command/tables.scala   | 17 +++--
 .../sql-tests/results/show-tables.sql.out      |  2 +-
 .../spark/sql/execution/command/DDLSuite.scala | 22 ++
 3 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index fc8cc11..75e0d2c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -884,12 +884,17 @@ case class ShowTablesCommand(
       //
       // Note: tableIdentifierPattern should be non-empty, otherwise a [[ParseException]]
       // should have been thrown by the sql parser.
-      val tableIdent = TableIdentifier(tableIdentifierPattern.get, Some(db))
-      val table = catalog.getTableMetadata(tableIdent).identifier
-      val partition = catalog.getPartition(tableIdent, partitionSpec.get)
-      val database = table.database.getOrElse("")
-      val tableName = table.table
-      val isTemp = catalog.isTemporaryTable(table)
+      val table = catalog.getTableMetadata(TableIdentifier(tableIdentifierPattern.get, Some(db)))
+      val tableIdent = table.identifier
+      val normalizedSpec = PartitioningUtils.normalizePartitionSpec(
+        partitionSpec.get,
+        table.partitionColumnNames,
+        tableIdent.quotedString,
+        sparkSession.sessionState.conf.resolver)
+      val partition = catalog.getPartition(tableIdent, normalizedSpec)
+      val database = tableIdent.database.getOrElse("")
+      val tableName = tableIdent.table
+      val isTemp = catalog.isTemporaryTable(tableIdent)
       val information = partition.simpleString
[spark] branch master updated (0a612b6 -> 6fd148f)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 0a612b6  [SPARK-33452][SQL] Support v2 SHOW PARTITIONS
     add 6fd148f  [SPARK-33569][SQL] Remove getting partitions by an identifier prefix

No new revisions were added by this update.

Summary of changes:
 .../catalog/SupportsPartitionManagement.java       | 15 +++
 .../sql/connector/InMemoryPartitionTable.scala     | 10 +
 .../SupportsAtomicPartitionManagementSuite.scala   | 28 +++--
 .../catalog/SupportsPartitionManagementSuite.scala | 48 --
 .../connector/AlterTablePartitionV2SQLSuite.scala  |  6 ++-
 5 files changed, 52 insertions(+), 55 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33579][UI] Fix executor blank page behind proxy
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f6638cf  [SPARK-33579][UI] Fix executor blank page behind proxy

f6638cf is described below

commit f6638cfd624ee3b31f68e3d2b539dbae351730a4
Author: Pascal Gillet
AuthorDate: Mon Nov 30 19:31:42 2020 +0900

    [SPARK-33579][UI] Fix executor blank page behind proxy

    ### What changes were proposed in this pull request?

    Fix some hardcoded API URLs in the Web UI. More specifically, we avoid using `location.origin` when constructing URLs for internal API calls within the JavaScript, and use the `uiRoot` global variable instead.

    ### Why are the changes needed?

    On one hand, it allows us to build relative URLs. On the other hand, `uiRoot` reflects the Spark property `spark.ui.proxyBase`, which can be set to change the root path of the Web UI. If `spark.ui.proxyBase` is actually set, the original URLs become incorrect, and we end up with a blank executors page. I encountered this bug when accessing the Web UI behind a proxy (in my case, a Kubernetes Ingress). See the following link for more context: https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115

    ### Does this PR introduce _any_ user-facing change?

    Yes, as all the changes introduced are in the JavaScript for the Web UI.

    ### How were the changes tested?

    I modified/debugged the JavaScript as in the commit with the help of the developer tools in Google Chrome, while accessing the Web UI of my Spark app behind my k8s ingress.

    Closes #30523 from pgillet/fix-executors-blank-page-behind-proxy.

    Authored-by: Pascal Gillet
    Signed-off-by: Kousuke Saruta
    (cherry picked from commit 6e5446e61f278e9afac342e8f33905f5630aa7d5)
    Signed-off-by: Kousuke Saruta
---
 core/src/main/resources/org/apache/spark/ui/static/stagepage.js | 2 +-
 core/src/main/resources/org/apache/spark/ui/static/utils.js     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index ee2b7b3..b296495 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -70,7 +70,7 @@ function stageEndPoint(appId) {
             return newBaseURI + "/api/v1/applications/" + appId + "/" + appAttemptId + "/stages/" + stageId;
         }
     }
-    return location.origin + "/api/v1/applications/" + appId + "/stages/" + stageId;
+    return uiRoot + "/api/v1/applications/" + appId + "/stages/" + stageId;
 }
 
 function getColumnNameForTaskMetricSummary(columnKey) {
diff --git a/core/src/main/resources/org/apache/spark/ui/static/utils.js b/core/src/main/resources/org/apache/spark/ui/static/utils.js
index 2e46111..d15d003 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/utils.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/utils.js
@@ -105,7 +105,7 @@ function getStandAloneAppId(cb) {
   }
   // Looks like Web UI is running in standalone mode
   // Let's get application-id using REST End Point
-  $.getJSON(location.origin + "/api/v1/applications", function(response, status, jqXHR) {
+  $.getJSON(uiRoot + "/api/v1/applications", function(response, status, jqXHR) {
     if (response && response.length > 0) {
       var appId = response[0].id;
       cb(appId);
@@ -152,7 +152,7 @@ function createTemplateURI(appId, templateName) {
     var baseURI = words.slice(0, ind).join('/') + '/static/' + templateName + '-template.html';
     return baseURI;
   }
-  return location.origin + "/static/" + templateName + "-template.html";
+  return uiRoot + "/static/" + templateName + "-template.html";
 }
 
 function setDataTableDefaults() {
@@ -190,5 +190,5 @@ function createRESTEndPointForExecutorsPage(appId) {
             return newBaseURI + "/api/v1/applications/" + appId + "/" + attemptId + "/allexecutors";
         }
     }
-    return location.origin + "/api/v1/applications/" + appId + "/allexecutors";
+    return uiRoot + "/api/v1/applications/" + appId + "/allexecutors";
 }
[spark] branch master updated: [SPARK-33579][UI] Fix executor blank page behind proxy
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 6e5446e  [SPARK-33579][UI] Fix executor blank page behind proxy

6e5446e is described below

commit 6e5446e61f278e9afac342e8f33905f5630aa7d5
Author: Pascal Gillet
AuthorDate: Mon Nov 30 19:31:42 2020 +0900

    [SPARK-33579][UI] Fix executor blank page behind proxy

    ### What changes were proposed in this pull request?

    Fix some hardcoded API URLs in the Web UI. More specifically, we avoid using `location.origin` when constructing URLs for internal API calls within the JavaScript, and use the `uiRoot` global variable instead.

    ### Why are the changes needed?

    On one hand, it allows us to build relative URLs. On the other hand, `uiRoot` reflects the Spark property `spark.ui.proxyBase`, which can be set to change the root path of the Web UI. If `spark.ui.proxyBase` is actually set, the original URLs become incorrect, and we end up with a blank executors page. I encountered this bug when accessing the Web UI behind a proxy (in my case, a Kubernetes Ingress). See the following link for more context: https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115

    ### Does this PR introduce _any_ user-facing change?

    Yes, as all the changes introduced are in the JavaScript for the Web UI.

    ### How were the changes tested?

    I modified/debugged the JavaScript as in the commit with the help of the developer tools in Google Chrome, while accessing the Web UI of my Spark app behind my k8s ingress.

    Closes #30523 from pgillet/fix-executors-blank-page-behind-proxy.

    Authored-by: Pascal Gillet
    Signed-off-by: Kousuke Saruta
---
 core/src/main/resources/org/apache/spark/ui/static/stagepage.js | 2 +-
 core/src/main/resources/org/apache/spark/ui/static/utils.js     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index ee11158..2877aa8 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -70,7 +70,7 @@ function stageEndPoint(appId) {
            return newBaseURI + "/api/v1/applications/" + appId + "/" + appAttemptId + "/stages/" + stageId;
        }
    }
-    return location.origin + "/api/v1/applications/" + appId + "/stages/" + stageId;
+    return uiRoot + "/api/v1/applications/" + appId + "/stages/" + stageId;
 }
 
 function getColumnNameForTaskMetricSummary(columnKey) {
diff --git a/core/src/main/resources/org/apache/spark/ui/static/utils.js b/core/src/main/resources/org/apache/spark/ui/static/utils.js
index 7e6dd67..f4914f0 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/utils.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/utils.js
@@ -105,7 +105,7 @@ function getStandAloneAppId(cb) {
   }
   // Looks like Web UI is running in standalone mode
   // Let's get application-id using REST End Point
-  $.getJSON(location.origin + "/api/v1/applications", function(response, status, jqXHR) {
+  $.getJSON(uiRoot + "/api/v1/applications", function(response, status, jqXHR) {
     if (response && response.length > 0) {
       var appId = response[0].id;
       cb(appId);
@@ -152,7 +152,7 @@ function createTemplateURI(appId, templateName) {
     var baseURI = words.slice(0, ind).join('/') + '/static/' + templateName + '-template.html';
     return baseURI;
   }
-  return location.origin + "/static/" + templateName + "-template.html";
+  return uiRoot + "/static/" + templateName + "-template.html";
 }
 
 function setDataTableDefaults() {
@@ -193,5 +193,5 @@ function createRESTEndPointForExecutorsPage(appId) {
            return newBaseURI + "/api/v1/applications/" + appId + "/" + attemptId + "/allexecutors";
        }
    }
-    return location.origin + "/api/v1/applications/" + appId + "/allexecutors";
+    return uiRoot + "/api/v1/applications/" + appId + "/allexecutors";
 }
[spark] branch master updated (b665d58 -> 5cfbddd)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b665d58  [SPARK-28646][SQL] Fix bug of Count so as consistent with mainstream databases
     add 5cfbddd  [SPARK-33480][SQL] Support char/varchar type

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-datatypes.md                          |   2 +
 .../spark/sql/catalyst/analysis/Analyzer.scala     |   9 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala      |   6 +-
 .../sql/catalyst/analysis/ResolveCatalogs.scala    |   5 -
 .../catalyst/analysis/ResolvePartitionSpec.scala   |   4 +-
 .../catalyst/analysis/TableOutputResolver.scala    |  19 +-
 .../sql/catalyst/catalog/SessionCatalog.scala      |   7 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  17 +-
 .../sql/catalyst/plans/logical/v2Commands.scala    |   4 +-
 .../spark/sql/catalyst/util/CharVarcharUtils.scala | 276 +++
 .../sql/connector/catalog/CatalogV2Util.scala      |  18 +-
 .../datasources/v2/DataSourceV2Relation.scala      |   8 +-
 .../sql/types/{StringType.scala => CharType.scala} |  33 +-
 .../org/apache/spark/sql/types/DataType.scala      |  10 +-
 .../apache/spark/sql/types/HiveStringType.scala    |  81
 .../types/{StringType.scala => VarcharType.scala}  |  34 +-
 .../scala/org/apache/spark/sql/types/package.scala |  10 +-
 .../sql/catalyst/analysis/AnalysisSuite.scala      |  18 +-
 .../catalyst/parser/TableSchemaParserSuite.scala   |  15 +-
 .../apache/spark/sql/connector/InMemoryTable.scala |  15 +-
 .../sql/connector/catalog/CatalogV2UtilSuite.scala |   2 +-
 .../main/scala/org/apache/spark/sql/Column.scala   |   6 +-
 .../org/apache/spark/sql/DataFrameReader.scala     |   4 +-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  37 +-
 .../datasources/ApplyCharTypePadding.scala         | 135 ++
 .../execution/datasources/LogicalRelation.scala    |  18 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  19 +-
 .../execution/datasources/v2/PushDownUtils.scala   |   4 +-
 .../sql/internal/BaseSessionStateBuilder.scala     |   1 +
 .../spark/sql/streaming/DataStreamReader.scala     |   4 +-
 .../apache/spark/sql/CharVarcharTestSuite.scala    | 505 +
 .../execution/command/PlanResolutionSuite.scala    |  44 +-
 .../apache/spark/sql/sources/TableScanSuite.scala  |  14 +-
 .../spark/sql/hive/HiveSessionStateBuilder.scala   |   1 +
 .../spark/sql/hive/client/HiveClientImpl.scala     |  19 +-
 ...tSuite.scala => HiveCharVarcharTestSuite.scala} |  24 +-
 .../spark/sql/hive/HiveMetastoreCatalogSuite.scala |  15 +-
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |   4 +-
 38 files changed, 1093 insertions(+), 354 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala
 copy sql/catalyst/src/main/scala/org/apache/spark/sql/types/{StringType.scala => CharType.scala} (60%)
 delete mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/types/HiveStringType.scala
 copy sql/catalyst/src/main/scala/org/apache/spark/sql/types/{StringType.scala => VarcharType.scala} (60%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ApplyCharTypePadding.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
 copy sql/hive/src/test/scala/org/apache/spark/sql/{hive/HiveSQLInsertTestSuite.scala => HiveCharVarcharTestSuite.scala} (56%)
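A hedged usage sketch of the new CHAR/VARCHAR support; the exact length-enforcement and padding semantics are described in the docs/sql-ref-datatypes.md change listed above.

```scala
// Sketch of SPARK-33480: declaring and using char/varchar columns.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("char-varchar").getOrCreate()
spark.sql("CREATE TABLE t (c CHAR(5), v VARCHAR(3)) USING parquet")
spark.sql("INSERT INTO t VALUES ('spark', 'sql')")     // fits the declared lengths
// spark.sql("INSERT INTO t VALUES ('x', 'long')")     // would fail: exceeds VARCHAR(3)
spark.sql("SELECT * FROM t WHERE c = 'spark'").show()  // CHAR comparison is pad-aware
```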
[spark] branch master updated (225c2e2 -> b665d58)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 225c2e2  [SPARK-33498][SQL][FOLLOW-UP] Deduplicate the unittest by using checkCastWithParseError
     add b665d58  [SPARK-28646][SQL] Fix bug of Count so as consistent with mainstream databases

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/aggregate/Count.scala    | 10 ++
 sql/core/src/test/resources/sql-tests/inputs/count.sql      |  3 +++
 sql/core/src/test/resources/sql-tests/results/count.sql.out | 13 +++--
 3 files changed, 24 insertions(+), 2 deletions(-)