[spark] branch master updated (34286ae -> ba0a479)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 34286ae [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9 add ba0a479 [SPARK-35961][SQL] Only use local shuffle reader when REBALANCE_PARTITIONS_BY_NONE without CustomShuffleReaderExec No new revisions were added by this update. Summary of changes: .../sql/execution/adaptive/OptimizeLocalShuffleReader.scala | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 34286ae [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9 34286ae is described below commit 34286ae5bf120544524a6cfe934755e20eb9f835 Author: Holden Karau AuthorDate: Wed Jun 30 21:39:12 2021 -0700 [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9 ### What changes were proposed in this pull request? Bump the scalatest version to 3.2.9 ### Why are the changes needed? With the scalatestplus change to 3.2.9.0, recent sbt fails to handle the mismatch between scalatest and scalatestplus and resolve resulting in test:compile errors of not being able to find the org.scalatest package. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? sbt tags/test:compile failed before and passes with this change. Closes #33163 from holdenk/SPARK-35960-test-compile-sbt-issue. Authored-by: Holden Karau Signed-off-by: Dongjoon Hyun --- pom.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pom.xml b/pom.xml index 347c777..48307fd 100644 --- a/pom.xml +++ b/pom.xml @@ -937,7 +937,7 @@ org.scalatest scalatest_${scala.binary.version} -3.2.3 +3.2.9 test - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dc85b0b [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page dc85b0b is described below commit dc85b0b51a02b9d6c52ffb1600f26ccdd7d7829a Author: Kevin Su AuthorDate: Thu Jul 1 12:32:54 2021 +0800 [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page ### What changes were proposed in this pull request? Update the executor's page, so it can successfully hide the "Exec Loss Reason" column. ### Why are the changes needed? When unselected the checkbox "Exec Loss Reason" on the executor page, the "Active tasks" column disappears instead of the "Exec Loss Reason" column. Before: ![Screenshot from 2021-06-30 15-55-05](https://user-images.githubusercontent.com/37936015/123930908-bd6f4180-d9c2-11eb-9aba-bbfe0a237776.png) After: ![Screenshot from 2021-06-30 22-21-38](https://user-images.githubusercontent.com/37936015/123977632-bf042e00-d9f1-11eb-910e-93d615d2db47.png) ### Does this PR introduce _any_ user-facing change? Yes, The Web UI is updated. ### How was this patch tested? Pass the CIs. Closes #33155 from pingsutw/SPARK-35950. Lead-authored-by: Kevin Su Co-authored-by: Kevin Su Signed-off-by: Gengliang Wang --- .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js index ab412a8..b7fbe04 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js +++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js @@ -140,7 +140,7 @@ function totalDurationColor(totalGCTime, totalDuration) { } var sumOptionalColumns = [3, 4]; -var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 15]; +var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 25]; var execDataTable; var sumDataTable; @@ -566,7 +566,8 @@ $(document).ready(function () { {"visible": false, "targets": 9}, {"visible": false, "targets": 10}, {"visible": false, "targets": 13}, -{"visible": false, "targets": 14} +{"visible": false, "targets": 14}, +{"visible": false, "targets": 25} ], "deferRender": true }; @@ -721,7 +722,7 @@ $(document).ready(function () { " Peak Pool Memory Direct / Mapped" + " Resources" + " Resource Profile Id" + - " Exec Loss Reason" + + " Exec Loss Reason" + ""); reselectCheckboxesBasedOnTaskTableState(); - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (6a1361c -> 845e750)
This is an automated email from the ASF dual-hosted git repository. wuyi pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 6a1361c [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR TABLE` on table refreshing add 845e750 [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit No new revisions were added by this update. Summary of changes: .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++- .../executor/CoarseGrainedExecutorBackend.scala| 33 +- 2 files changed, 27 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated: [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit
This is an automated email from the ASF dual-hosted git repository. wuyi pushed a commit to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.1 by this push: new 9229eb9 [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit 9229eb9 is described below commit 9229eb9a94cfcfd8c030b7dff4ab81227fffc742 Author: yi.wu AuthorDate: Thu Jul 1 11:40:00 2021 +0800 [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit ### What changes were proposed in this pull request? This PR proposes to let `WorkerWatcher` reuse the `stopping` flag in `CoarseGrainedExecutorBackend` to avoid the duplicate call of `System.exit`. ### Why are the changes needed? As a followup of https://github.com/apache/spark/pull/32868, this PR tries to give a more robust fix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass existing tests. Closes #33028 from Ngone51/spark-35714-followup. Lead-authored-by: yi.wu Co-authored-by: wuyi Signed-off-by: yi.wu (cherry picked from commit 868a59470650cc12272de0d0b04c6d98b1fe076d) Signed-off-by: yi.wu --- .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++- .../executor/CoarseGrainedExecutorBackend.scala| 33 +- 2 files changed, 27 insertions(+), 23 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala index 43ec492..efffc9f 100644 --- a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala +++ b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala @@ -17,13 +17,10 @@ package org.apache.spark.deploy.worker -import scala.concurrent.ExecutionContext.Implicits.global -import scala.concurrent.Future -import scala.concurrent.duration._ +import java.util.concurrent.atomic.AtomicBoolean import org.apache.spark.internal.Logging import org.apache.spark.rpc._ -import org.apache.spark.util.ThreadUtils /** * Endpoint which connects to a worker process and terminates the JVM if the @@ -31,7 +28,10 @@ import org.apache.spark.util.ThreadUtils * Provides fate sharing between a worker and its associated child processes. */ private[spark] class WorkerWatcher( -override val rpcEnv: RpcEnv, workerUrl: String, isTesting: Boolean = false) +override val rpcEnv: RpcEnv, +workerUrl: String, +isTesting: Boolean = false, +isChildProcessStopping: AtomicBoolean = new AtomicBoolean(false)) extends RpcEndpoint with Logging { logInfo(s"Connecting to worker $workerUrl") @@ -53,10 +53,9 @@ private[spark] class WorkerWatcher( private def exitNonZero() = if (isTesting) { isShutDown = true -} else { - ThreadUtils.awaitResult(Future { -System.exit(-1) - }, 5.seconds) +} else if (isChildProcessStopping.compareAndSet(false, true)) { + // SPARK-35714: avoid the duplicate call of `System.exit` to avoid the dead lock + System.exit(-1) } override def receive: PartialFunction[Any, Unit] = { diff --git a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala index d607ee8..95237c9 100644 --- a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala +++ b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala @@ -60,7 +60,7 @@ private[spark] class CoarseGrainedExecutorBackend( private implicit val formats = DefaultFormats - private[this] val stopping = new AtomicBoolean(false) + private[executor] val stopping = new AtomicBoolean(false) var executor: Executor = null @volatile var driver: Option[RpcEndpointRef] = None @@ -261,18 +261,22 @@ private[spark] class CoarseGrainedExecutorBackend( reason: String, throwable: Throwable = null, notifyDriver: Boolean = true) = { -val message = "Executor self-exiting due to : " + reason -if (throwable != null) { - logError(message, throwable) -} else { - logError(message) -} +if (stopping.compareAndSet(false, true)) { + val message = "Executor self-exiting due to : " + reason + if (throwable != null) { +logError(message, throwable) + } else { +logError(message) + } -if (notifyDriver && driver.nonEmpty) { - driver.get.send(RemoveExecutor(executorId, new ExecutorLossReason(reason))) -} + if (notifyDriver && driver.nonEmpty) { +driver.get.send(RemoveExecutor(executorId, new
[spark] branch master updated (5d74ace -> 868a594)
This is an automated email from the ASF dual-hosted git repository. wuyi pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5d74ace [SPARK-35065][SQL] Group exception messages in spark/sql (core) add 868a594 [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit No new revisions were added by this update. Summary of changes: .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++- .../executor/CoarseGrainedExecutorBackend.scala| 33 +- 2 files changed, 27 insertions(+), 23 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (cd6a463 -> 5d74ace)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from cd6a463 [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles add 5d74ace [SPARK-35065][SQL] Group exception messages in spark/sql (core) No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/ResolveUnion.scala | 2 +- .../spark/sql/errors/QueryCompilationErrors.scala | 300 - .../spark/sql/errors/QueryExecutionErrors.scala| 254 - .../apache/spark/sql/DataFrameNaFunctions.scala| 7 +- .../org/apache/spark/sql/DataFrameReader.scala | 12 +- .../org/apache/spark/sql/DataFrameWriter.scala | 42 ++- .../org/apache/spark/sql/DataFrameWriterV2.scala | 3 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 28 +- .../spark/sql/RelationalGroupedDataset.scala | 17 +- .../scala/org/apache/spark/sql/RuntimeConfig.scala | 5 +- .../scala/org/apache/spark/sql/SparkSession.scala | 3 +- .../org/apache/spark/sql/UDFRegistration.scala | 155 --- .../sql/execution/AggregatingAccumulator.scala | 5 +- .../execution/BaseScriptTransformationExec.scala | 9 +- .../org/apache/spark/sql/execution/Columnar.scala | 4 +- .../ExternalAppendOnlyUnsafeRowArray.scala | 12 +- .../sql/execution/OptimizeMetadataOnlyQuery.scala | 5 +- .../org/apache/spark/sql/execution/SparkPlan.scala | 3 +- .../execution/SubqueryAdaptiveBroadcastExec.scala | 4 +- .../sql/execution/SubqueryBroadcastExec.scala | 4 +- .../execution/adaptive/AdaptiveSparkPlanExec.scala | 5 +- .../sql/execution/adaptive/QueryStageExec.scala| 2 +- .../columnar/compression/CompressionScheme.scala | 5 +- .../sql/execution/datasources/FileScanRDD.scala| 12 +- .../execution/datasources/SchemaMergeUtils.scala | 7 +- .../execution/datasources/csv/CSVFileFormat.scala | 14 +- .../spark/sql/execution/datasources/ddl.scala | 7 +- .../execution/datasources/jdbc/DriverWrapper.scala | 7 +- .../execution/datasources/jdbc/JDBCRelation.scala | 12 +- .../datasources/jdbc/JdbcRelationProvider.scala| 7 +- .../datasources/json/JsonFileFormat.scala | 14 +- .../execution/datasources/orc/OrcFileFormat.scala | 7 +- .../datasources/parquet/ParquetRowConverter.scala | 17 +- .../sql/execution/datasources/pathFilters.scala| 6 +- .../datasources/text/TextFileFormat.scala | 6 +- .../datasources/v2/AddPartitionExec.scala | 5 +- .../datasources/v2/DataSourceV2Utils.scala | 4 +- .../datasources/v2/DropPartitionExec.scala | 5 +- .../sql/execution/datasources/v2/FileScan.scala| 7 +- .../sql/execution/datasources/v2/FileTable.scala | 9 +- .../sql/execution/datasources/v2/FileWrite.scala | 6 +- .../datasources/v2/TruncatePartitionExec.scala | 4 +- .../sql/execution/datasources/v2/V2Writes.scala| 9 +- .../sql/execution/datasources/v2/csv/CSVScan.scala | 14 +- .../execution/datasources/v2/json/JsonScan.scala | 14 +- .../execution/datasources/v2/text/TextWrite.scala | 5 +- .../execution/exchange/BroadcastExchangeExec.scala | 25 +- .../spark/sql/execution/metric/SQLMetrics.scala| 5 +- .../streaming/CompactibleFileStreamLog.scala | 5 +- .../sql/execution/streaming/FileStreamSource.scala | 7 +- .../execution/streaming/ResolveWriteToStream.scala | 31 +-- .../sql/execution/streaming/StreamMetadata.scala | 6 +- .../streaming/continuous/ContinuousExecution.scala | 4 +- .../continuous/ContinuousQueuedDataReader.scala| 10 +- .../WriteToContinuousDataSourceExec.scala | 4 +- .../streaming/sources/ForeachWriterTable.scala | 4 +- .../sources/RateStreamMicroBatchStream.scala | 7 +- .../sources/TextSocketSourceProvider.scala | 7 +- .../state/HDFSBackedStateStoreProvider.scala | 12 +- .../apache/spark/sql/expressions/WindowSpec.scala | 7 +- .../scala/org/apache/spark/sql/functions.scala | 16 +- .../apache/spark/sql/internal/CatalogImpl.scala| 10 +- .../apache/spark/sql/internal/SharedState.scala| 8 +- .../org/apache/spark/sql/jdbc/DerbyDialect.scala | 5 +- .../org/apache/spark/sql/jdbc/JdbcDialects.scala | 3 +- .../apache/spark/sql/jdbc/MsSqlServerDialect.scala | 6 +- .../org/apache/spark/sql/jdbc/MySQLDialect.scala | 8 +- .../spark/sql/streaming/DataStreamReader.scala | 2 +- .../spark/sql/util/QueryExecutionListener.scala| 7 +- 69 files changed, 826 insertions(+), 457 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cd6a463 [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles cd6a463 is described below commit cd6a4638110ef3f0db8b6366be680870dfb0bcad Author: Wenchen Fan AuthorDate: Thu Jul 1 01:43:11 2021 + [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/33079, to fix a bug in corner cases: `ShufflePartitionsUtil.coalescePartitions` should either return the shuffle spec for all the shuffles, or none. If the input RDD has no partition, the `mapOutputStatistics` is None, and we should still return shuffle specs with size 0. ### Why are the changes needed? bug fix ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? a new test Closes #33158 from cloud-fan/bug. Authored-by: Wenchen Fan Signed-off-by: Wenchen Fan --- .../execution/adaptive/ShufflePartitionsUtil.scala | 43 -- .../adaptive/AdaptiveQueryExecSuite.scala | 24 2 files changed, 48 insertions(+), 19 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala index a1f2d91..1353dc9 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala @@ -96,9 +96,8 @@ object ShufflePartitionsUtil extends Logging { val numPartitions = validMetrics.head.bytesByPartitionId.length val newPartitionSpecs = coalescePartitions(0, numPartitions, validMetrics, targetSize) -assert(newPartitionSpecs.length == validMetrics.length) -if (newPartitionSpecs.head.length < numPartitions) { - newPartitionSpecs +if (newPartitionSpecs.length < numPartitions) { + attachDataSize(mapOutputStatistics, newPartitionSpecs) } else { Seq.empty } @@ -148,7 +147,8 @@ object ShufflePartitionsUtil extends Logging { if (i - 1 > start) { val partitionSpecs = coalescePartitions( partitionIndices(start), repeatValue, validMetrics, targetSize, true) - newPartitionSpecsSeq.zip(partitionSpecs).foreach(spec => spec._1 ++= spec._2) + newPartitionSpecsSeq.zip(attachDataSize(mapOutputStatistics, partitionSpecs)) +.foreach(spec => spec._1 ++= spec._2) } // find the end of this skew section, skipping partition(i - 1) and partition(i). var repeatIndex = i + 1 @@ -173,7 +173,8 @@ object ShufflePartitionsUtil extends Logging { if (numPartitions > start) { val partitionSpecs = coalescePartitions( partitionIndices(start), partitionIndices.last + 1, validMetrics, targetSize, true) - newPartitionSpecsSeq.zip(partitionSpecs).foreach(spec => spec._1 ++= spec._2) + newPartitionSpecsSeq.zip(attachDataSize(mapOutputStatistics, partitionSpecs)) +.foreach(spec => spec._1 ++= spec._2) } // only return coalesced result if any coalescing has happened. if (newPartitionSpecsSeq.head.length < numPartitions) { @@ -204,19 +205,17 @@ object ShufflePartitionsUtil extends Logging { * - coalesced partition 2: shuffle partition 2 (size 170 MiB) * - coalesced partition 3: shuffle partition 3 and 4 (size 50 MiB) * - * @return A sequence of sequence of [[CoalescedPartitionSpec]]s. which each inner sequence as - * the new partition specs for its corresponding shuffle after coalescing. For example, - * if partitions [0, 1, 2, 3, 4] and partition bytes [10, 10, 100, 10, 20] with - * targetSize 100, split at indices [0, 2, 3], the returned partition specs will be: - * CoalescedPartitionSpec(0, 2, 20), CoalescedPartitionSpec(2, 3, 100) and - * CoalescedPartitionSpec(3, 5, 30). + * @return A sequence of [[CoalescedPartitionSpec]]s. For example, if partitions [0, 1, 2, 3, 4] + * split at indices [0, 2, 3], the returned partition specs will be: + * CoalescedPartitionSpec(0, 2), CoalescedPartitionSpec(2, 3) and + * CoalescedPartitionSpec(3, 5). */ private def coalescePartitions( start: Int, end: Int, mapOutputStatistics: Seq[MapOutputStatistics], targetSize: Long, - allowReturnEmpty: Boolean = false): Seq[Seq[CoalescedPartitionSpec]] = { + allowReturnEmpty: Boolean = false): Seq[CoalescedPartitionSpec] = {
[spark] branch master updated (5ad1261 -> a98c8ae)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5ad1261 [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6 add a98c8ae [SPARK-35944][PYTHON] Introduce Name and Label type aliases No new revisions were added by this update. Summary of changes: python/pyspark/pandas/_typing.py | 6 +- python/pyspark/pandas/accessors.py | 4 +- python/pyspark/pandas/base.py| 4 +- python/pyspark/pandas/frame.py | 218 ++- python/pyspark/pandas/generic.py | 34 +++-- python/pyspark/pandas/groupby.py | 38 +++--- python/pyspark/pandas/indexes/base.py| 52 python/pyspark/pandas/indexes/multi.py | 34 +++-- python/pyspark/pandas/indexes/numeric.py | 8 +- python/pyspark/pandas/indexing.py| 128 +- python/pyspark/pandas/internal.py| 61 - python/pyspark/pandas/ml.py | 3 +- python/pyspark/pandas/mlflow.py | 5 +- python/pyspark/pandas/namespace.py | 20 +-- python/pyspark/pandas/series.py | 52 python/pyspark/pandas/utils.py | 38 +++--- 16 files changed, 343 insertions(+), 362 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5ad1261 [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6 5ad1261 is described below commit 5ad12611ec23d02b5988b92561b70dbeacf6af78 Author: Xinrong Meng AuthorDate: Thu Jul 1 09:32:25 2021 +0900 [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6 ### What changes were proposed in this pull request? Add deprecation warning for Python 3.6. ### Why are the changes needed? According to https://endoflife.date/python, Python 3.6 will be EOL on 23 Dec, 2021. We should prepare for the deprecation of Python 3.6 support in Spark in advance. ### Does this PR introduce _any_ user-facing change? N/A. ### How was this patch tested? Manual tests. Closes #33139 from xinrong-databricks/deprecate3.6_warn. Authored-by: Xinrong Meng Signed-off-by: Hyukjin Kwon --- python/pyspark/context.py | 8 1 file changed, 8 insertions(+) diff --git a/python/pyspark/context.py b/python/pyspark/context.py index 6c6a538..6c94106 100644 --- a/python/pyspark/context.py +++ b/python/pyspark/context.py @@ -230,6 +230,14 @@ class SparkContext(object): self.pythonExec = os.environ.get("PYSPARK_PYTHON", 'python3') self.pythonVer = "%d.%d" % sys.version_info[:2] +if sys.version_info[:2] < (3, 7): +with warnings.catch_warnings(): +warnings.simplefilter("once") +warnings.warn( +"Python 3.6 support is deprecated in Spark 3.2.", +FutureWarning +) + # Broadcast's __reduce__ method stores Broadcast instances here. # This allows other code to determine which Broadcast instances have # been pickled, so it can determine which Java broadcast objects to - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a5c8866 -> 9e39415)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a5c8866 [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader add 9e39415 [SPARK-35939][DOCS][PYTHON] Deprecate Python 3.6 in Spark documentation No new revisions were added by this update. Summary of changes: docs/index.md | 1 + docs/rdd-programming-guide.md | 1 + 2 files changed, 2 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun commented on pull request #349: Remove Apache Spark 2.4 from download page
dongjoon-hyun commented on pull request #349: URL: https://github.com/apache/spark-website/pull/349#issuecomment-871801270 Thank you, @maropu ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] maropu commented on pull request #349: Remove Apache Spark 2.4 from download page
maropu commented on pull request #349: URL: https://github.com/apache/spark-website/pull/349#issuecomment-871800188 late lgtm. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new a5c8866 [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader a5c8866 is described below commit a5c886619dd1573e96bbba058db099b47f0c147c Author: Chao Sun AuthorDate: Wed Jun 30 14:21:18 2021 -0700 [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader ### What changes were proposed in this pull request? Make the current vectorized Parquet reader to work with column index introduced in Parquet 1.11. In particular, this PR makes the following changes: 1. in `ParquetReadState`, track row ranges returned via `PageReadStore.getRowIndexes` as well as the first row index for each page via `DataPage.getFirstRowIndex`. 1. introduced a new API `ParquetVectorUpdater.skipValues` which skips a batch of values from a Parquet value reader. As part of the process also renamed existing `updateBatch` to `readValues`, and `update` to `readValue` to keep the method names consistent. 1. in correspondence as above, also introduced new API `VectorizedValuesReader.skipXXX` for different data types, as well as the implementations. These are useful when the reader knows that the given batch of values can be skipped, for instance, due to the batch is not covered in the row ranges generated by column index filtering. 2. changed `VectorizedRleValuesReader` to handle column index filtering. This is done by comparing the range that is going to be read next within the current RLE/PACKED block (let's call this block range), against the current row range. There are three cases: * if the block range is before the current row range, skip all the values in the block range * if the block range is after the current row range, advance the row range and repeat the steps * if the block range overlaps with the current row range, only read the values within the overlapping area and skip the rest. ### Why are the changes needed? [Parquet Column Index](https://github.com/apache/parquet-format/blob/master/PageIndex.md) is a new feature in Parquet 1.11 which allows very efficient filtering on page level (some benchmark numbers can be found [here](https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/)), especially when data is sorted. The feature is largely implemented in parquet-mr (via classes such as `ColumnIndex` and `ColumnIndexFilter`). In Spark, the non-vectorized Parquet reader c [...] Previously, [SPARK-26345](https://issues.apache.org/jira/browse/SPARK-26345) / (#31393) updated Spark to only scan pages filtered by column index from parquet-mr side. This is done by calling `ParquetFileReader.readNextFilteredRowGroup` and `ParquetFileReader.getFilteredRecordCount` API. The implementation, however, only work for a few limited cases: in the scenario where there are multiple columns and their type width are different (e.g., `int` and `bigint`), it could return incorrec [...] In order to fix the above, Spark needs to leverage the API `PageReadStore.getRowIndexes` and `DataPage.getFirstRowIndex`. The former returns the indexes of all rows (note the difference between rows and values: for flat schema there is no difference between the two, but for nested schema they're different) after filtering within a Parquet row group. The latter returns the first row index within a single data page. With the combination of the two, one is able to know which rows/values [...] ### Does this PR introduce _any_ user-facing change? Yes. Now the vectorized Parquet reader should work correctly with column index. ### How was this patch tested? Borrowed tests from #31998 and added a few more tests. Closes #32753 from sunchao/SPARK-34859. Lead-authored-by: Chao Sun Co-authored-by: Li Xian Signed-off-by: Dongjoon Hyun --- .../datasources/parquet/ParquetReadState.java | 120 ++- .../datasources/parquet/ParquetVectorUpdater.java | 12 +- .../parquet/ParquetVectorUpdaterFactory.java | 216 ++- .../parquet/VectorizedColumnReader.java| 28 ++- .../parquet/VectorizedParquetRecordReader.java | 1 + .../parquet/VectorizedPlainValuesReader.java | 51 + .../parquet/VectorizedRleValuesReader.java | 239 - .../parquet/VectorizedValuesReader.java| 13 ++ .../parquet/ParquetColumnIndexSuite.scala | 126 +++ .../datasources/parquet/ParquetIOSuite.scala | 72 ++- 10 files changed, 746 insertions(+), 132 deletions(-) diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
[spark] branch master updated (2c94fbc -> d46c1e3)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2c94fbc initial commit for skeleton ansible for jenkins worker config add d46c1e3 [SPARK-35725][SQL] Support optimize skewed partitions in RebalancePartitions No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/internal/SQLConf.scala| 10 ++ .../execution/adaptive/AdaptiveSparkPlanExec.scala | 1 + .../OptimizeSkewInRebalancePartitions.scala| 101 + .../execution/adaptive/OptimizeSkewedJoin.scala| 40 +--- .../execution/adaptive/ShufflePartitionsUtil.scala | 39 +++- .../adaptive/AdaptiveQueryExecSuite.scala | 43 - 6 files changed, 193 insertions(+), 41 deletions(-) create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewInRebalancePartitions.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (733e85f1 -> 2c94fbc)
This is an automated email from the ASF dual-hosted git repository. shaneknapp pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 733e85f1 [SPARK-35953][SQL] Support extracting date fields from timestamp without time zone add 2c94fbc initial commit for skeleton ansible for jenkins worker config No new revisions were added by this update. Summary of changes: dev/.rat-excludes | 1 + dev/ansible-for-test-node/README.md| 25 +++ .../deploy-jenkins-worker.yml | 8 + dev/ansible-for-test-node/roles/common/README.md | 4 + .../roles/common/tasks/main.yml| 4 + .../roles/common/tasks/setup_local_userspace.yml | 8 + .../roles/common/tasks/system_packages.yml | 73 .../roles/jenkins-worker/README.md | 15 ++ .../roles/jenkins-worker/defaults/main.yml | 30 .../files/python_environments/base-py3-pip.txt | 3 + .../files/python_environments/base-py3-spec.txt| 21 +++ .../files/python_environments/py36.txt | 49 ++ .../files/python_environments/spark-py2-pip.txt| 8 + .../files/python_environments/spark-py36-spec.txt | 61 +++ .../files/python_environments/spark-py3k-spec.txt | 42 + .../files/scripts/jenkins-gitcache-cron| 7 + .../files/util_scripts/kill_zinc_nailgun.py| 60 +++ .../files/util_scripts/post_github_pr_comment.py | 81 + .../files/util_scripts/session_lock_resource.py| 152 + .../roles/jenkins-worker/files/worker-limits.conf | 5 + .../roles/jenkins-worker/tasks/cleanup.yml | 12 ++ .../jenkins-worker/tasks/install_anaconda.yml | 79 + .../tasks/install_build_packages.yml | 21 +++ .../roles/jenkins-worker/tasks/install_docker.yml | 33 .../jenkins-worker/tasks/install_minikube.yml | 16 ++ .../tasks/install_spark_build_packages.yml | 183 + .../jenkins-worker/tasks/jenkins_userspace.yml | 119 ++ .../roles/jenkins-worker/tasks/main.yml| 22 +++ .../roles/jenkins-worker/vars/main.yml | 9 + dev/tox.ini| 4 +- 30 files changed, 1153 insertions(+), 2 deletions(-) create mode 100644 dev/ansible-for-test-node/README.md create mode 100644 dev/ansible-for-test-node/deploy-jenkins-worker.yml create mode 100644 dev/ansible-for-test-node/roles/common/README.md create mode 100644 dev/ansible-for-test-node/roles/common/tasks/main.yml create mode 100644 dev/ansible-for-test-node/roles/common/tasks/setup_local_userspace.yml create mode 100644 dev/ansible-for-test-node/roles/common/tasks/system_packages.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/README.md create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/defaults/main.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/base-py3-pip.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/base-py3-spec.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/py36.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py2-pip.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py36-spec.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py3k-spec.txt create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/scripts/jenkins-gitcache-cron create mode 100755 dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/kill_zinc_nailgun.py create mode 100755 dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/post_github_pr_comment.py create mode 100755 dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/session_lock_resource.py create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/files/worker-limits.conf create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/cleanup.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_anaconda.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_build_packages.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_docker.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_minikube.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_spark_build_packages.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/jenkins_userspace.yml create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/tasks/main.yml create mode 100644
[spark] branch master updated (2febd5c -> 733e85f1)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2febd5c [SPARK-35735][SQL] Take into account day-time interval fields in cast add 733e85f1 [SPARK-35953][SQL] Support extracting date fields from timestamp without time zone No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 2 +- .../spark/sql/catalyst/analysis/TypeCoercion.scala | 4 +- .../test/resources/sql-tests/inputs/extract.sql| 92 +++ .../resources/sql-tests/results/extract.sql.out| 276 ++--- .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 8 +- 5 files changed, 192 insertions(+), 190 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35735][SQL] Take into account day-time interval fields in cast
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2febd5c [SPARK-35735][SQL] Take into account day-time interval fields in cast 2febd5c is described below commit 2febd5c3f0c3a0c6660cfb340eb65316a1ca4acd Author: Angerszh AuthorDate: Wed Jun 30 16:05:04 2021 +0300 [SPARK-35735][SQL] Take into account day-time interval fields in cast ### What changes were proposed in this pull request? Support take into account day-time interval field in cast. ### Why are the changes needed? To conform to the SQL standard. ### Does this PR introduce _any_ user-facing change? An user can use `cast(str, DayTimeInterval(DAY, HOUR))`, for instance. ### How was this patch tested? Added UT. Closes #32943 from AngersZh/SPARK-35735. Authored-by: Angerszh Signed-off-by: Max Gekk --- .../spark/sql/catalyst/util/IntervalUtils.scala| 203 +++-- .../sql/catalyst/expressions/CastSuiteBase.scala | 145 ++- 2 files changed, 324 insertions(+), 24 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala index 7a6de7f..30a2fa5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala @@ -119,18 +119,27 @@ object IntervalUtils { } } + val supportedFormat = Map( +(YM.YEAR, YM.MONTH) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"), +(YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"), +(YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH"), +(DT.DAY, DT.DAY) -> Seq("[+|-]d", "INTERVAL [+|-]'[+|-]d' DAY"), +(DT.DAY, DT.HOUR) -> Seq("[+|-]d h", "INTERVAL [+|-]'[+|-]d h' DAY TO HOUR"), +(DT.DAY, DT.MINUTE) -> Seq("[+|-]d h:m", "INTERVAL [+|-]'[+|-]d h:m' DAY TO MINUTE"), +(DT.DAY, DT.SECOND) -> Seq("[+|-]d h:m:s.n", "INTERVAL [+|-]'[+|-]d h:m:s.n' DAY TO SECOND"), +(DT.HOUR, DT.HOUR) -> Seq("[+|-]h", "INTERVAL [+|-]'[+|-]h' HOUR"), +(DT.HOUR, DT.MINUTE) -> Seq("[+|-]h:m", "INTERVAL [+|-]'[+|-]h:m' HOUR TO MINUTE"), +(DT.HOUR, DT.SECOND) -> Seq("[+|-]h:m:s.n", "INTERVAL [+|-]'[+|-]h:m:s.n' HOUR TO SECOND"), +(DT.MINUTE, DT.MINUTE) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MINUTE"), +(DT.MINUTE, DT.SECOND) -> Seq("[+|-]m:s.n", "INTERVAL [+|-]'[+|-]m:s.n' MINUTE TO SECOND"), +(DT.SECOND, DT.SECOND) -> Seq("[+|-]s.n", "INTERVAL [+|-]'[+|-]s.n' SECOND") + ) + def castStringToYMInterval( input: UTF8String, startField: Byte, endField: Byte): Int = { -val supportedFormat = Map( - (YM.YEAR, YM.MONTH) -> -Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"), - (YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"), - (YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH") -) - def checkStringIntervalType(targetStartField: Byte, targetEndField: Byte): Unit = { if (startField != targetStartField || endField != targetEndField) { throw new IllegalArgumentException(s"Interval string does not match year-month format of " + @@ -151,7 +160,7 @@ object IntervalUtils { checkStringIntervalType(YM.YEAR, YM.MONTH) toYMInterval(year, month, getSign(firstSign, secondSign)) case yearMonthIndividualRegex(secondSign, value) => -safeToYMInterval { +safeToInterval { val sign = getSign("+", secondSign) if (endField == YM.YEAR) { sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR) @@ -166,7 +175,7 @@ object IntervalUtils { } } case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, suffix) => -safeToYMInterval { +safeToInterval { val sign = getSign(firstSign, secondSign) if ("YEAR".equalsIgnoreCase(suffix)) { checkStringIntervalType(YM.YEAR, YM.YEAR) @@ -202,7 +211,7 @@ object IntervalUtils { } } - private def safeToYMInterval(f: => Int): Int = { + private def safeToInterval[T](f: => T): T = { try { f } catch { @@ -213,24 +222,72 @@ object IntervalUtils { } private def toYMInterval(yearStr: String, monthStr: String, sign: Int): Int = { -safeToYMInterval { +safeToInterval { val years = toLongWithRange(YEAR, yearStr, 0, Integer.MAX_VALUE / MONTHS_PER_YEAR) val totalMonths = sign * (years * MONTHS_PER_YEAR + toLongWithRange(MONTH, monthStr, 0, 11)) Math.toIntExact(totalMonths) } } + private val normalPattern = "(\\d{1,2})" +
[spark] branch master updated (c6afd6e -> e88aa49)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from c6afd6e [SPARK-35951][DOCS] Add since versions for Avro options in Documentation add e88aa49 [SPARK-35932][SQL] Support extracting hour/minute/second from timestamp without time zone No new revisions were added by this update. Summary of changes: .../sql/catalyst/analysis/AnsiTypeCoercion.scala | 1 + .../spark/sql/catalyst/analysis/TypeCoercion.scala | 6 +- .../catalyst/expressions/datetimeExpressions.scala | 8 +- .../apache/spark/sql/types/AbstractDataType.scala | 2 +- .../expressions/DateExpressionsSuite.scala | 65 +--- .../test/resources/sql-tests/inputs/extract.sql| 66 .../resources/sql-tests/results/extract.sql.out| 182 ++--- 7 files changed, 181 insertions(+), 149 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35951][DOCS] Add since versions for Avro options in Documentation
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c6afd6e [SPARK-35951][DOCS] Add since versions for Avro options in Documentation c6afd6e is described below commit c6afd6ed5296980e81160e441a4e9bea98c74196 Author: Gengliang Wang AuthorDate: Wed Jun 30 17:24:48 2021 +0800 [SPARK-35951][DOCS] Add since versions for Avro options in Documentation ### What changes were proposed in this pull request? There are two new Avro options `datetimeRebaseMode` and `positionalFieldMatching` after Spark 3.2. We should document the since version so that users can know whether the option works in their Spark version. ### Why are the changes needed? Better documentation. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual preview on local setup. https://user-images.githubusercontent.com/1097932/123934000-ba833b00-d947-11eb-9ca5-ce8ff8add74b.png;> https://user-images.githubusercontent.com/1097932/123934126-d4bd1900-d947-11eb-8d80-69df8f3d9900.png;> Closes #33153 from gengliangwang/version. Authored-by: Gengliang Wang Signed-off-by: Gengliang Wang --- docs/sql-data-sources-avro.md | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index 7fb0ef5..94dd7e1 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -224,7 +224,7 @@ Data source options of Avro can be set via: * the `options` parameter in function `from_avro`. - Property NameDefaultMeaningScope + Property NameDefaultMeaningScopeSince Version avroSchema None @@ -244,24 +244,28 @@ Data source options of Avro can be set via: read, write and function from_avro +2.4.0 recordName topLevelRecord Top level record name in write result, which is required in Avro spec. write +2.4.0 recordNamespace "" Record namespace in write result. write +2.4.0 ignoreExtension true The option controls ignoring of files without .avro extensions in read. If the option is enabled, all files (with and without .avro extension) are loaded. The option has been deprecated, and it will be removed in the future releases. Please use the general data source option pathGlobFilter for filtering file names. read +2.4.0 compression @@ -269,6 +273,7 @@ Data source options of Avro can be set via: The compression option allows to specify a compression codec used in write. Currently supported codecs are uncompressed, snappy, deflate, bzip2 and xz. If the option is not set, the configuration spark.sql.avro.compression.codec config is taken into account. write +2.4.0 mode @@ -282,6 +287,7 @@ Data source options of Avro can be set via: function from_avro +2.4.0 datetimeRebaseMode @@ -295,12 +301,14 @@ Data source options of Avro can be set via: read and function from_avro +3.2.0 positionalFieldMatching false This can be used in tandem with the `avroSchema` option to adjust the behavior for matching the fields in the provided Avro schema with those in the SQL schema. By default, the matching will be performed using field names, ignoring their positions. If this option is set to "true", the matching will be based on the position of the fields. read and write +3.2.0 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4dd41b9 -> e3bd817)
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4dd41b9 [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching add e3bd817 [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE No new revisions were added by this update. Summary of changes: core/src/main/resources/error/README.md| 89 + core/src/main/resources/error/error-classes.json | 22 .../main/scala/org/apache/spark/SparkError.scala | 83 + .../scala/org/apache/spark/SparkException.scala| 34 - .../scala/org/apache/spark/SparkErrorSuite.scala | 137 + .../org/apache/spark/sql/AnalysisException.scala | 38 +- .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/CheckAnalysis.scala | 3 +- .../spark/sql/catalyst/analysis/package.scala | 8 ++ .../spark/sql/catalyst/parser/ParseDriver.scala| 26 +++- .../spark/sql/errors/QueryCompilationErrors.scala | 5 +- .../spark/sql/errors/QueryExecutionErrors.scala| 9 +- .../spark/sql/errors/QueryParsingErrors.scala | 3 +- .../sql/catalyst/analysis/AnalysisErrorSuite.scala | 5 + .../expressions/ArithmeticExpressionSuite.scala| 22 +++- .../sql-tests/results/ansi/interval.sql.out| 2 +- .../sql-tests/results/postgreSQL/case.sql.out | 6 +- .../sql-tests/results/postgreSQL/int8.sql.out | 6 +- .../results/postgreSQL/select_having.sql.out | 2 +- .../results/udf/postgreSQL/udf-case.sql.out| 6 +- .../udf/postgreSQL/udf-select_having.sql.out | 2 +- .../spark/sql/connector/DataSourceV2Suite.scala| 2 + .../hive/thriftserver/HiveThriftServerErrors.scala | 10 +- .../sql/hive/thriftserver/SparkSQLDriver.scala | 10 +- .../ThriftServerWithSparkContextSuite.scala| 3 + 25 files changed, 499 insertions(+), 36 deletions(-) create mode 100644 core/src/main/resources/error/README.md create mode 100644 core/src/main/resources/error/error-classes.json create mode 100644 core/src/main/scala/org/apache/spark/SparkError.scala create mode 100644 core/src/test/scala/org/apache/spark/SparkErrorSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (ab46045 -> 6a1361c)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from ab46045 [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite genCodePromotePrecision should not overwrite genCode add 6a1361c [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR TABLE` on table refreshing No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.1 updated (fe412b6 -> b6e8fab)
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a change to branch branch-3.1 in repository https://gitbox.apache.org/repos/asf/spark.git. from fe412b6 [SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter add b6e8fab [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR TABLE` on table refreshing No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (6bbfb45 -> 4dd41b9)
This is an automated email from the ASF dual-hosted git repository. gengliang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6bbfb45 [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol` add 4dd41b9 [SPARK-34365][AVRO] Add support for positional Catalyst-to-Avro schema matching No new revisions were added by this update. Summary of changes: docs/sql-data-sources-avro.md | 6 + .../apache/spark/sql/avro/AvroDeserializer.scala | 15 +- .../org/apache/spark/sql/avro/AvroFileFormat.scala | 1 + .../org/apache/spark/sql/avro/AvroOptions.scala| 8 + .../apache/spark/sql/avro/AvroOutputWriter.scala | 5 +- .../spark/sql/avro/AvroOutputWriterFactory.scala | 8 +- .../org/apache/spark/sql/avro/AvroSerializer.scala | 22 +-- .../org/apache/spark/sql/avro/AvroUtils.scala | 42 +- .../sql/v2/avro/AvroPartitionReaderFactory.scala | 1 + .../sql/avro/AvroCatalystDataConversionSuite.scala | 1 + .../apache/spark/sql/avro/AvroRowReaderSuite.scala | 1 + .../spark/sql/avro/AvroSchemaHelperSuite.scala | 24 ++- .../org/apache/spark/sql/avro/AvroSerdeSuite.scala | 164 ++--- .../org/apache/spark/sql/avro/AvroSuite.scala | 41 +- 14 files changed, 258 insertions(+), 81 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol`
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 6bbfb45 [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol` 6bbfb45 is described below commit 6bbfb45ffe75aa6c27a7bf3c3385a596637d1822 Author: Cheng Su AuthorDate: Wed Jun 30 16:25:20 2021 +0900 [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol` ### What changes were proposed in this pull request? This is the followup from https://github.com/apache/spark/pull/33012#discussion_r659440833, where we want to add `Unstable` to `FileCommitProtocol`, to give people a better idea of API. ### Why are the changes needed? Make it easier for people to follow and understand code. Clean up code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests, as no real logic change. Closes #33148 from c21/bucket-followup. Authored-by: Cheng Su Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/internal/io/FileCommitProtocol.scala | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala index 6465cc7..5cd7397 100644 --- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala +++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala @@ -20,6 +20,7 @@ package org.apache.spark.internal.io import org.apache.hadoop.fs._ import org.apache.hadoop.mapreduce._ +import org.apache.spark.annotation.Unstable import org.apache.spark.internal.Logging import org.apache.spark.util.Utils @@ -41,7 +42,11 @@ import org.apache.spark.util.Utils *(or abortTask if task failed). * 3. When all necessary tasks completed successfully, the driver calls commitJob. If the job *failed to execute (e.g. too many failed tasks), the job should call abortJob. + * + * @note This class is exposed as an API considering the usage of many downstream custom + * implementations, but will be subject to be changed and/or moved. */ +@Unstable abstract class FileCommitProtocol extends Logging { import FileCommitProtocol._ @@ -107,9 +112,7 @@ abstract class FileCommitProtocol extends Logging { * if a task is going to write out multiple files to the same dir. The file commit protocol only * guarantees that files written by different tasks will not conflict. * - * This API should be implemented and called, instead of - * [[newTaskTempFile(taskContest, dir, ext)]]. Provide a default implementation here to be - * backward compatible with custom [[FileCommitProtocol]] implementations before Spark 3.2.0. + * @since 3.2.0 */ def newTaskTempFile( taskContext: TaskAttemptContext, dir: Option[String], spec: FileNameSpec): String = { @@ -144,10 +147,7 @@ abstract class FileCommitProtocol extends Logging { * if a task is going to write out multiple files to the same dir. The file commit protocol only * guarantees that files written by different tasks will not conflict. * - * This API should be implemented and called, instead of - * [[newTaskTempFileAbsPath(taskContest, absoluteDir, ext)]]. Provide a default implementation - * here to be backward compatible with custom [[FileCommitProtocol]] implementations before - * Spark 3.2.0. + * @since 3.2.0 */ def newTaskTempFileAbsPath( taskContext: TaskAttemptContext, absoluteDir: String, spec: FileNameSpec): String = { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35948][INFRA] Simplify release scripts by removing Spark 2.4/Java7 parts
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b218cc9 [SPARK-35948][INFRA] Simplify release scripts by removing Spark 2.4/Java7 parts b218cc9 is described below commit b218cc90cfa957bdbf443ed3dbdfdf660bc1312d Author: Dongjoon Hyun AuthorDate: Wed Jun 30 16:24:03 2021 +0900 [SPARK-35948][INFRA] Simplify release scripts by removing Spark 2.4/Java7 parts ### What changes were proposed in this pull request? This PR aims to clean up Spark 2.4 and Java7 code path from the release scripts. ### Why are the changes needed? To simplify the logic. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #33150 from dongjoon-hyun/SPARK-35948. Authored-by: Dongjoon Hyun Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 28 ++-- dev/create-release/release-util.sh | 6 -- 2 files changed, 6 insertions(+), 28 deletions(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 28d4853..96b8d4e 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -177,10 +177,7 @@ fi # Depending on the version being built, certain extra profiles need to be activated, and # different versions of Scala are supported. -BASE_PROFILES="-Pmesos -Pyarn" -if [[ $SPARK_VERSION > "2.3" ]]; then - BASE_PROFILES="$BASE_PROFILES -Pkubernetes" -fi +BASE_PROFILES="-Pmesos -Pyarn -Pkubernetes" PUBLISH_SCALA_2_13=1 SCALA_2_13_PROFILES="-Pscala-2.13" @@ -188,14 +185,8 @@ if [[ $SPARK_VERSION < "3.2" ]]; then PUBLISH_SCALA_2_13=0 fi -PUBLISH_SCALA_2_12=0 +PUBLISH_SCALA_2_12=1 SCALA_2_12_PROFILES="-Pscala-2.12" -if [[ $SPARK_VERSION < "3.0." ]]; then - SCALA_2_12_PROFILES="-Pscala-2.12 -Pflume" -fi -if [[ $SPARK_VERSION > "2.4" ]]; then - PUBLISH_SCALA_2_12=1 -fi # Hive-specific profiles for some builds HIVE_PROFILES="-Phive -Phive-thriftserver" @@ -236,12 +227,9 @@ if [[ "$1" == "package" ]]; then echo "Packaging release source tarballs" cp -r spark spark-$SPARK_VERSION - # For source release in v2.4+, exclude copy of binary license/notice - if [[ $SPARK_VERSION > "2.4" ]]; then -rm -f spark-$SPARK_VERSION/LICENSE-binary -rm -f spark-$SPARK_VERSION/NOTICE-binary -rm -rf spark-$SPARK_VERSION/licenses-binary - fi + rm -f spark-$SPARK_VERSION/LICENSE-binary + rm -f spark-$SPARK_VERSION/NOTICE-binary + rm -rf spark-$SPARK_VERSION/licenses-binary tar cvzf spark-$SPARK_VERSION.tgz --exclude spark-$SPARK_VERSION/.git spark-$SPARK_VERSION echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour --output spark-$SPARK_VERSION.tgz.asc \ @@ -337,11 +325,7 @@ if [[ "$1" == "package" ]]; then BINARY_PKGS_ARGS["hadoop3.2"]="-Phadoop-3.2 $HIVE_PROFILES" if ! is_dry_run; then BINARY_PKGS_ARGS["without-hadoop"]="-Phadoop-provided" -if [[ $SPARK_VERSION < "3.0." ]]; then - BINARY_PKGS_ARGS["hadoop2.6"]="-Phadoop-2.6 $HIVE_PROFILES" -else - BINARY_PKGS_ARGS["hadoop2.7"]="-Phadoop-2.7 $HIVE_PROFILES" -fi +BINARY_PKGS_ARGS["hadoop2.7"]="-Phadoop-2.7 $HIVE_PROFILES" fi declare -A BINARY_PKGS_EXTRA diff --git a/dev/create-release/release-util.sh b/dev/create-release/release-util.sh index eb1fd38..0394fb4 100755 --- a/dev/create-release/release-util.sh +++ b/dev/create-release/release-util.sh @@ -226,11 +226,5 @@ function init_maven_sbt { MVN="build/mvn -B" MVN_EXTRA_OPTS= SBT_OPTS= - if [[ $JAVA_VERSION < "1.8." ]]; then -# Needed for maven central when using Java 7. -SBT_OPTS="-Dhttps.protocols=TLSv1.1,TLSv1.2" -MVN_EXTRA_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dhttps.protocols=TLSv1.1,TLSv1.2" -MVN="$MVN $MVN_EXTRA_OPTS" - fi export MVN MVN_EXTRA_OPTS SBT_OPTS } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35947][INFRA] Increase JVM stack size in release-build.sh
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 5312008 [SPARK-35947][INFRA] Increase JVM stack size in release-build.sh 5312008 is described below commit 5312008cca074748cac78a57359c9ad9a729df92 Author: Dongjoon Hyun AuthorDate: Wed Jun 30 16:23:13 2021 +0900 [SPARK-35947][INFRA] Increase JVM stack size in release-build.sh ### What changes were proposed in this pull request? Like SPARK-35825, this PR aims to increase JVM stack size via `MAVEN_OPTS` in release-build.sh. ### Why are the changes needed? This will mitigate the failure in publishing snapshot GitHub Action job and during the release. - https://github.com/apache/spark/actions/workflows/publish_snapshot.yml (3-day consecutive failures) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? N/A Closes #33149 from dongjoon-hyun/SPARK-35947. Authored-by: Dongjoon Hyun Signed-off-by: Hyukjin Kwon --- dev/create-release/release-build.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 57ce7b3..28d4853 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -229,7 +229,7 @@ git clean -d -f -x rm -f .gitignore cd .. -export MAVEN_OPTS="-Xmx12g" +export MAVEN_OPTS="-Xss128m -Xmx12g" if [[ "$1" == "package" ]]; then # Source and binary tarballs - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d28ca9c [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing d28ca9c is described below commit d28ca9cc9808828118be64a545c3478160fdc170 Author: Max Gekk AuthorDate: Wed Jun 30 09:44:52 2021 +0300 [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing ### What changes were proposed in this pull request? In the PR, I propose to catch all non-fatal exceptions coming `refreshTable()` at the final stage of table repairing, and output an error message instead of failing with an exception. ### Why are the changes needed? 1. The uncaught exceptions from table refreshing might be considered as regression comparing to previous Spark versions. Table refreshing was introduced by https://github.com/apache/spark/pull/31066. 2. This should improve user experience with Spark SQL. For instance, when the `MSCK REPAIR TABLE` is performed in a chain of command in SQL where catching exception is difficult or even impossible. ### Does this PR introduce _any_ user-facing change? Yes. Before the changes the `MSCK REPAIR TABLE` command can fail with the exception portrayed in SPARK-35935. After the changes, the same command outputs error message, and completes successfully. ### How was this patch tested? By existing test suites. Closes #33137 from MaxGekk/msck-repair-catch-except. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index 0876b5f..06c6847 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -675,7 +675,15 @@ case class RepairTableCommand( // This is always the case for Hive format tables, but is not true for Datasource tables created // before Spark 2.1 unless they are converted via `msck repair table`. spark.sessionState.catalog.alterTable(table.copy(tracksPartitionsInCatalog = true)) -spark.catalog.refreshTable(tableIdentWithDB) +try { + spark.catalog.refreshTable(tableIdentWithDB) +} catch { + case NonFatal(e) => +logError(s"Cannot refresh the table '$tableIdentWithDB'. A query of the table " + + "might return wrong result if the table was cached. To avoid such issue, you should " + + "uncache the table manually via the UNCACHE TABLE command after table recovering will " + + "complete fully.", e) +} logInfo(s"Recovered all partitions: added ($addedAmount), dropped ($droppedAmount).") Seq.empty[Row] } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun edited a comment on pull request #349: Remove Apache Spark 2.4 from download page
dongjoon-hyun edited a comment on pull request #349: URL: https://github.com/apache/spark-website/pull/349#issuecomment-871132652 Thank you, @HyukjinKwon and @Ngone51 . Merged to asf-site. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: Remove Apache Spark 2.4 from download page (#349)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new ed7c89c Remove Apache Spark 2.4 from download page (#349) ed7c89c is described below commit ed7c89c55d70d2df7dfdab11ba2ca9937dd647c1 Author: Dongjoon Hyun AuthorDate: Tue Jun 29 23:28:58 2021 -0700 Remove Apache Spark 2.4 from download page (#349) This PR aims to remove Apache Spark 2.4.8 download link from the download page because it's already EOL and we had better recommend the latest versions: 3.1.2, 3.0.3. We should not change the 2.4.8 binary distribution because they are still used in the test suite. --- js/downloads.js | 4 site/js/downloads.js | 4 2 files changed, 8 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index 7f82fd3..fdac347 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -13,13 +13,10 @@ function addRelease(version, releaseDate, packages, mirrored) { var sources = {pretty: "Source Code", tag: "sources"}; var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; -var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; -// 2.4.0+ -var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p12_hadoopFree, sources]; // 3.0.0+ var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources]; // 3.1.0+ @@ -28,7 +25,6 @@ var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources]; addRelease("3.1.2", new Date("06/01/2021"), packagesV11, true); addRelease("3.0.3", new Date("06/23/2021"), packagesV10, true); -addRelease("2.4.8", new Date("05/17/2021"), packagesV9, true); function append(el, contents) { el.innerHTML += contents; diff --git a/site/js/downloads.js b/site/js/downloads.js index 7f82fd3..fdac347 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -13,13 +13,10 @@ function addRelease(version, releaseDate, packages, mirrored) { var sources = {pretty: "Source Code", tag: "sources"}; var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; -var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; -// 2.4.0+ -var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p12_hadoopFree, sources]; // 3.0.0+ var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources]; // 3.1.0+ @@ -28,7 +25,6 @@ var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources]; addRelease("3.1.2", new Date("06/01/2021"), packagesV11, true); addRelease("3.0.3", new Date("06/23/2021"), packagesV10, true); -addRelease("2.4.8", new Date("05/17/2021"), packagesV9, true); function append(el, contents) { el.innerHTML += contents; - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun commented on pull request #349: Remove Apache Spark 2.4 from download page
dongjoon-hyun commented on pull request #349: URL: https://github.com/apache/spark-website/pull/349#issuecomment-871132652 Thank you, @HyukjinKwon and @Ngone51 . Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] dongjoon-hyun merged pull request #349: Remove Apache Spark 2.4 from download page
dongjoon-hyun merged pull request #349: URL: https://github.com/apache/spark-website/pull/349 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function"
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7668226 Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function" 7668226 is described below commit 76682268d746e72f0e8aa4cc64860e0bfd90f1ed Author: Max Gekk AuthorDate: Wed Jun 30 09:26:35 2021 +0300 Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function" ### What changes were proposed in this pull request? This reverts commit e6753c9402b5c40d9e2af662f28bd4f07a0bae17. ### Why are the changes needed? The `make_interval` function aims to construct values of the legacy interval type `CalendarIntervalType` which will be substituted by ANSI interval types (see SPARK-27790). Since the function has not been released yet, it would be better to don't expose it via public API at all. ### Does this PR introduce _any_ user-facing change? Should not since the `make_interval` function has not been released yet. ### How was this patch tested? By existing test suites, and GA/jenkins builds. Closes #33143 from MaxGekk/revert-make_interval. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/sql/functions.scala | 25 .../apache/spark/sql/JavaDateFunctionsSuite.java | 68 -- .../org/apache/spark/sql/DateFunctionsSuite.scala | 40 - 3 files changed, 133 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index c446d6b..ecd60ff 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -2929,31 +2929,6 @@ object functions { // /** - * (Scala-specific) Creates a datetime interval - * - * @param years Number of years - * @param months Number of months - * @param weeks Number of weeks - * @param days Number of days - * @param hours Number of hours - * @param mins Number of mins - * @param secs Number of secs - * @return A datetime interval - * @group datetime_funcs - * @since 3.2.0 - */ - def make_interval( - years: Column = lit(0), - months: Column = lit(0), - weeks: Column = lit(0), - days: Column = lit(0), - hours: Column = lit(0), - mins: Column = lit(0), - secs: Column = lit(0)): Column = withExpr { -MakeInterval(years.expr, months.expr, weeks.expr, days.expr, hours.expr, mins.expr, secs.expr) - } - - /** * Returns the date that is `numMonths` after `startDate`. * * @param startDate A date, timestamp or string. If a string, the data must be in a format that diff --git a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java deleted file mode 100644 index 2d1de77..000 --- a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java +++ /dev/null @@ -1,68 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package test.org.apache.spark.sql; - -import org.apache.spark.sql.Column; -import org.apache.spark.sql.Dataset; -import org.apache.spark.sql.Row; -import org.apache.spark.sql.RowFactory; -import org.apache.spark.sql.test.TestSparkSession; -import org.apache.spark.sql.types.StructType; -import org.junit.After; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - -import java.sql.Date; -import java.util.*; - -import static org.apache.spark.sql.types.DataTypes.*; -import static org.apache.spark.sql.functions.*; - -public class JavaDateFunctionsSuite { - private transient TestSparkSession spark; - - @Before - public void setUp() { -spark = new TestSparkSession(); -} - - @After - public void tearDown() { -spark.stop(); -spark = null; - } - - @Test - public void makeIntervalWorksWithJava() { -Column