[spark] branch master updated (34286ae -> ba0a479)

2021-06-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 34286ae  [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9
 add ba0a479  [SPARK-35961][SQL] Only use local shuffle reader when 
REBALANCE_PARTITIONS_BY_NONE without CustomShuffleReaderExec

No new revisions were added by this update.

Summary of changes:
 .../sql/execution/adaptive/OptimizeLocalShuffleReader.scala   | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9

2021-06-30 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 34286ae  [SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9
34286ae is described below

commit 34286ae5bf120544524a6cfe934755e20eb9f835
Author: Holden Karau 
AuthorDate: Wed Jun 30 21:39:12 2021 -0700

[SPARK-35960][BUILD][TEST] Bump the scalatest version to 3.2.9

### What changes were proposed in this pull request?

Bump the scalatest version to 3.2.9

### Why are the changes needed?

With the scalatestplus upgrade to 3.2.9.0, recent sbt fails to resolve the 
mismatch between scalatest and scalatestplus, resulting in test:compile errors 
because the org.scalatest package cannot be found.
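
Below is a minimal sbt sketch of that constraint (illustrative only; Spark's actual 
build pins these versions in pom.xml, as the diff below shows): the scalatest version 
must line up with the line the scalatestplus 3.2.9.x artifacts were built against, 
otherwise org.scalatest classes cannot be resolved at test:compile.

```scala
// Hypothetical build.sbt fragment, not Spark's build definition.
libraryDependencies ++= Seq(
  "org.scalatest"     %% "scalatest"       % "3.2.9"   % Test,  // must match the scalatestplus line
  "org.scalatestplus" %% "scalacheck-1-15" % "3.2.9.0" % Test   // built against scalatest 3.2.9
)
```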

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

sbt tags/test:compile failed before and passes with this change.

Closes #33163 from holdenk/SPARK-35960-test-compile-sbt-issue.

Authored-by: Holden Karau 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pom.xml b/pom.xml
index 347c777..48307fd 100644
--- a/pom.xml
+++ b/pom.xml
@@ -937,7 +937,7 @@
   
 org.scalatest
 scalatest_${scala.binary.version}
-3.2.3
+3.2.9
 test
   
   

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dc85b0b  [SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the 
executors page
dc85b0b is described below

commit dc85b0b51a02b9d6c52ffb1600f26ccdd7d7829a
Author: Kevin Su 
AuthorDate: Thu Jul 1 12:32:54 2021 +0800

[SPARK-35950][WEBUI] Failed to toggle Exec Loss Reason in the executors page

### What changes were proposed in this pull request?

Update the executors page so that it correctly hides the "Exec Loss Reason" 
column.

### Why are the changes needed?

When the "Exec Loss Reason" checkbox is unselected on the executors page,
the "Active tasks" column disappears instead of the "Exec Loss Reason" 
column.

Before:
![Screenshot from 2021-06-30 15-55-05](https://user-images.githubusercontent.com/37936015/123930908-bd6f4180-d9c2-11eb-9aba-bbfe0a237776.png)
After:
![Screenshot from 2021-06-30 22-21-38](https://user-images.githubusercontent.com/37936015/123977632-bf042e00-d9f1-11eb-910e-93d615d2db47.png)

### Does this PR introduce _any_ user-facing change?

Yes, The Web UI is updated.

### How was this patch tested?

Pass the CIs.

Closes #33155 from pingsutw/SPARK-35950.

Lead-authored-by: Kevin Su 
Co-authored-by: Kevin Su 
Signed-off-by: Gengliang Wang 
---
 .../src/main/resources/org/apache/spark/ui/static/executorspage.js | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js 
b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
index ab412a8..b7fbe04 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/executorspage.js
@@ -140,7 +140,7 @@ function totalDurationColor(totalGCTime, totalDuration) {
 }
 
 var sumOptionalColumns = [3, 4];
-var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 15];
+var execOptionalColumns = [5, 6, 7, 8, 9, 10, 13, 14, 25];
 var execDataTable;
 var sumDataTable;
 
@@ -566,7 +566,8 @@ $(document).ready(function () {
 {"visible": false, "targets": 9},
 {"visible": false, "targets": 10},
 {"visible": false, "targets": 13},
-{"visible": false, "targets": 14}
+{"visible": false, "targets": 14},
+{"visible": false, "targets": 25}
   ],
   "deferRender": true
 };
@@ -721,7 +722,7 @@ $(document).ready(function () {
   " Peak Pool Memory 
Direct / Mapped" +
   " 
Resources" +
   " Resource Profile Id" +
-  " Exec Loss Reason" +
+  " Exec Loss Reason" +
   "");
 
 reselectCheckboxesBasedOnTaskTableState();

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (6a1361c -> 845e750)

2021-06-30 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6a1361c  [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR 
TABLE` on table refreshing
 add 845e750  [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for 
WorkerWatcher to avoid the duplicate System.exit

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++-
 .../executor/CoarseGrainedExecutorBackend.scala| 33 +-
 2 files changed, 27 insertions(+), 23 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher to avoid the duplicate System.exit

2021-06-30 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 9229eb9  [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for 
WorkerWatcher to avoid the duplicate System.exit
9229eb9 is described below

commit 9229eb9a94cfcfd8c030b7dff4ab81227fffc742
Author: yi.wu 
AuthorDate: Thu Jul 1 11:40:00 2021 +0800

[SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for WorkerWatcher 
to avoid the duplicate System.exit

### What changes were proposed in this pull request?

This PR proposes to let `WorkerWatcher` reuse the `stopping` flag in 
`CoarseGrainedExecutorBackend` to avoid the duplicate call of `System.exit`.
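
A minimal, self-contained sketch of that pattern (illustrative class names; the 
real change is in the diff below): both sides share one `AtomicBoolean`, and only 
the caller that wins the compare-and-set actually triggers the JVM exit.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Simplified stand-ins for CoarseGrainedExecutorBackend and WorkerWatcher.
class Backend {
  val stopping = new AtomicBoolean(false)
  def exitExecutor(code: Int): Unit =
    if (stopping.compareAndSet(false, true)) sys.exit(code)
}

class Watcher(sharedStopping: AtomicBoolean) {
  def exitNonZero(): Unit =
    // If the backend is already stopping, this is a no-op instead of a second
    // System.exit that could deadlock during shutdown.
    if (sharedStopping.compareAndSet(false, true)) sys.exit(-1)
}

object FateSharingSketch {
  def main(args: Array[String]): Unit = {
    val backend = new Backend
    val watcher = new Watcher(backend.stopping) // the flag is shared, not duplicated
    watcher.exitNonZero() // exits; a later backend.exitExecutor(1) would be a no-op
  }
}
```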

### Why are the changes needed?

As a followup of https://github.com/apache/spark/pull/32868, this PR tries 
to give a more robust fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests.

Closes #33028 from Ngone51/spark-35714-followup.

Lead-authored-by: yi.wu 
Co-authored-by: wuyi 
Signed-off-by: yi.wu 
(cherry picked from commit 868a59470650cc12272de0d0b04c6d98b1fe076d)
Signed-off-by: yi.wu 
---
 .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++-
 .../executor/CoarseGrainedExecutorBackend.scala| 33 +-
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
index 43ec492..efffc9f 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala
@@ -17,13 +17,10 @@
 
 package org.apache.spark.deploy.worker
 
-import scala.concurrent.ExecutionContext.Implicits.global
-import scala.concurrent.Future
-import scala.concurrent.duration._
+import java.util.concurrent.atomic.AtomicBoolean
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc._
-import org.apache.spark.util.ThreadUtils
 
 /**
  * Endpoint which connects to a worker process and terminates the JVM if the
@@ -31,7 +28,10 @@ import org.apache.spark.util.ThreadUtils
  * Provides fate sharing between a worker and its associated child processes.
  */
 private[spark] class WorkerWatcher(
-override val rpcEnv: RpcEnv, workerUrl: String, isTesting: Boolean = false)
+override val rpcEnv: RpcEnv,
+workerUrl: String,
+isTesting: Boolean = false,
+isChildProcessStopping: AtomicBoolean = new AtomicBoolean(false))
   extends RpcEndpoint with Logging {
 
   logInfo(s"Connecting to worker $workerUrl")
@@ -53,10 +53,9 @@ private[spark] class WorkerWatcher(
   private def exitNonZero() =
 if (isTesting) {
   isShutDown = true
-} else {
-  ThreadUtils.awaitResult(Future {
-System.exit(-1)
-  }, 5.seconds)
+} else if (isChildProcessStopping.compareAndSet(false, true)) {
+  // SPARK-35714: avoid the duplicate call of `System.exit` to avoid the 
dead lock
+  System.exit(-1)
 }
 
   override def receive: PartialFunction[Any, Unit] = {
diff --git 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
index d607ee8..95237c9 100644
--- 
a/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
@@ -60,7 +60,7 @@ private[spark] class CoarseGrainedExecutorBackend(
 
   private implicit val formats = DefaultFormats
 
-  private[this] val stopping = new AtomicBoolean(false)
+  private[executor] val stopping = new AtomicBoolean(false)
   var executor: Executor = null
   @volatile var driver: Option[RpcEndpointRef] = None
 
@@ -261,18 +261,22 @@ private[spark] class CoarseGrainedExecutorBackend(
  reason: String,
  throwable: Throwable = null,
  notifyDriver: Boolean = true) = {
-val message = "Executor self-exiting due to : " + reason
-if (throwable != null) {
-  logError(message, throwable)
-} else {
-  logError(message)
-}
+if (stopping.compareAndSet(false, true)) {
+  val message = "Executor self-exiting due to : " + reason
+  if (throwable != null) {
+logError(message, throwable)
+  } else {
+logError(message)
+  }
 
-if (notifyDriver && driver.nonEmpty) {
-  driver.get.send(RemoveExecutor(executorId, new 
ExecutorLossReason(reason)))
-}
+  if (notifyDriver && driver.nonEmpty) {
+driver.get.send(RemoveExecutor(executorId, new 

[spark] branch master updated (5d74ace -> 868a594)

2021-06-30 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5d74ace  [SPARK-35065][SQL] Group exception messages in spark/sql 
(core)
 add 868a594  [SPARK-35714][FOLLOW-UP][CORE] Use a shared stopping flag for 
WorkerWatcher to avoid the duplicate System.exit

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/deploy/worker/WorkerWatcher.scala | 17 ++-
 .../executor/CoarseGrainedExecutorBackend.scala| 33 +-
 2 files changed, 27 insertions(+), 23 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (cd6a463 -> 5d74ace)

2021-06-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cd6a463  [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all 
the shuffles
 add 5d74ace  [SPARK-35065][SQL] Group exception messages in spark/sql 
(core)

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/ResolveUnion.scala |   2 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  | 300 -
 .../spark/sql/errors/QueryExecutionErrors.scala| 254 -
 .../apache/spark/sql/DataFrameNaFunctions.scala|   7 +-
 .../org/apache/spark/sql/DataFrameReader.scala |  12 +-
 .../org/apache/spark/sql/DataFrameWriter.scala |  42 ++-
 .../org/apache/spark/sql/DataFrameWriterV2.scala   |   3 +-
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  28 +-
 .../spark/sql/RelationalGroupedDataset.scala   |  17 +-
 .../scala/org/apache/spark/sql/RuntimeConfig.scala |   5 +-
 .../scala/org/apache/spark/sql/SparkSession.scala  |   3 +-
 .../org/apache/spark/sql/UDFRegistration.scala | 155 ---
 .../sql/execution/AggregatingAccumulator.scala |   5 +-
 .../execution/BaseScriptTransformationExec.scala   |   9 +-
 .../org/apache/spark/sql/execution/Columnar.scala  |   4 +-
 .../ExternalAppendOnlyUnsafeRowArray.scala |  12 +-
 .../sql/execution/OptimizeMetadataOnlyQuery.scala  |   5 +-
 .../org/apache/spark/sql/execution/SparkPlan.scala |   3 +-
 .../execution/SubqueryAdaptiveBroadcastExec.scala  |   4 +-
 .../sql/execution/SubqueryBroadcastExec.scala  |   4 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   5 +-
 .../sql/execution/adaptive/QueryStageExec.scala|   2 +-
 .../columnar/compression/CompressionScheme.scala   |   5 +-
 .../sql/execution/datasources/FileScanRDD.scala|  12 +-
 .../execution/datasources/SchemaMergeUtils.scala   |   7 +-
 .../execution/datasources/csv/CSVFileFormat.scala  |  14 +-
 .../spark/sql/execution/datasources/ddl.scala  |   7 +-
 .../execution/datasources/jdbc/DriverWrapper.scala |   7 +-
 .../execution/datasources/jdbc/JDBCRelation.scala  |  12 +-
 .../datasources/jdbc/JdbcRelationProvider.scala|   7 +-
 .../datasources/json/JsonFileFormat.scala  |  14 +-
 .../execution/datasources/orc/OrcFileFormat.scala  |   7 +-
 .../datasources/parquet/ParquetRowConverter.scala  |  17 +-
 .../sql/execution/datasources/pathFilters.scala|   6 +-
 .../datasources/text/TextFileFormat.scala  |   6 +-
 .../datasources/v2/AddPartitionExec.scala  |   5 +-
 .../datasources/v2/DataSourceV2Utils.scala |   4 +-
 .../datasources/v2/DropPartitionExec.scala |   5 +-
 .../sql/execution/datasources/v2/FileScan.scala|   7 +-
 .../sql/execution/datasources/v2/FileTable.scala   |   9 +-
 .../sql/execution/datasources/v2/FileWrite.scala   |   6 +-
 .../datasources/v2/TruncatePartitionExec.scala |   4 +-
 .../sql/execution/datasources/v2/V2Writes.scala|   9 +-
 .../sql/execution/datasources/v2/csv/CSVScan.scala |  14 +-
 .../execution/datasources/v2/json/JsonScan.scala   |  14 +-
 .../execution/datasources/v2/text/TextWrite.scala  |   5 +-
 .../execution/exchange/BroadcastExchangeExec.scala |  25 +-
 .../spark/sql/execution/metric/SQLMetrics.scala|   5 +-
 .../streaming/CompactibleFileStreamLog.scala   |   5 +-
 .../sql/execution/streaming/FileStreamSource.scala |   7 +-
 .../execution/streaming/ResolveWriteToStream.scala |  31 +--
 .../sql/execution/streaming/StreamMetadata.scala   |   6 +-
 .../streaming/continuous/ContinuousExecution.scala |   4 +-
 .../continuous/ContinuousQueuedDataReader.scala|  10 +-
 .../WriteToContinuousDataSourceExec.scala  |   4 +-
 .../streaming/sources/ForeachWriterTable.scala |   4 +-
 .../sources/RateStreamMicroBatchStream.scala   |   7 +-
 .../sources/TextSocketSourceProvider.scala |   7 +-
 .../state/HDFSBackedStateStoreProvider.scala   |  12 +-
 .../apache/spark/sql/expressions/WindowSpec.scala  |   7 +-
 .../scala/org/apache/spark/sql/functions.scala |  16 +-
 .../apache/spark/sql/internal/CatalogImpl.scala|  10 +-
 .../apache/spark/sql/internal/SharedState.scala|   8 +-
 .../org/apache/spark/sql/jdbc/DerbyDialect.scala   |   5 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |   3 +-
 .../apache/spark/sql/jdbc/MsSqlServerDialect.scala |   6 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   |   8 +-
 .../spark/sql/streaming/DataStreamReader.scala |   2 +-
 .../spark/sql/util/QueryExecutionListener.scala|   7 +-
 69 files changed, 826 insertions(+), 457 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles

2021-06-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cd6a463  [SPARK-35888][SQL][FOLLOWUP] Return partition specs for all 
the shuffles
cd6a463 is described below

commit cd6a4638110ef3f0db8b6366be680870dfb0bcad
Author: Wenchen Fan 
AuthorDate: Thu Jul 1 01:43:11 2021 +

[SPARK-35888][SQL][FOLLOWUP] Return partition specs for all the shuffles

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/33079, fixing a 
bug in a corner case: `ShufflePartitionsUtil.coalescePartitions` should either 
return partition specs for all the shuffles, or for none of them.

If the input RDD has no partitions, its `mapOutputStatistics` is None, and 
we should still return partition specs for it, with size 0.
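
A simplified, self-contained sketch of that invariant (illustrative names, not the 
real `ShufflePartitionsUtil` internals): the coalesced boundaries are computed once 
from the summed partition sizes, and a spec sequence is then produced for every 
shuffle, so a shuffle that reported no data still gets specs with size 0.

```scala
object CoalesceSketch {
  // Illustrative stand-in for CoalescedPartitionSpec(startIdx, endIdx, dataSize).
  final case class Spec(startIdx: Int, endIdx: Int, dataSize: Long)

  /** Returns one spec sequence per shuffle (all or none), never for a subset. */
  def coalesceAll(bytesByShuffle: Seq[Array[Long]], targetSize: Long): Seq[Seq[Spec]] = {
    if (bytesByShuffle.isEmpty) return Seq.empty
    val numPartitions = bytesByShuffle.head.length
    // Decide the coalesced boundaries once, using the summed size of each
    // partition across all shuffles, so every shuffle splits at the same indices.
    val starts = scala.collection.mutable.ArrayBuffer(0)
    var acc = 0L
    for (i <- 0 until numPartitions) {
      val size = bytesByShuffle.map(_(i)).sum
      if (acc > 0 && acc + size > targetSize) { starts += i; acc = 0L }
      acc += size
    }
    val ends = starts.drop(1) :+ numPartitions
    // Attach per-shuffle data sizes; a shuffle with no map output still gets
    // specs for every boundary, just with dataSize 0.
    bytesByShuffle.map { bytes =>
      starts.toSeq.zip(ends).map { case (s, e) => Spec(s, e, bytes.slice(s, e).sum) }
    }
  }
}
```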

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

a new test

Closes #33158 from cloud-fan/bug.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../execution/adaptive/ShufflePartitionsUtil.scala | 43 --
 .../adaptive/AdaptiveQueryExecSuite.scala  | 24 
 2 files changed, 48 insertions(+), 19 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
index a1f2d91..1353dc9 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
@@ -96,9 +96,8 @@ object ShufflePartitionsUtil extends Logging {
 
 val numPartitions = validMetrics.head.bytesByPartitionId.length
 val newPartitionSpecs = coalescePartitions(0, numPartitions, validMetrics, 
targetSize)
-assert(newPartitionSpecs.length == validMetrics.length)
-if (newPartitionSpecs.head.length < numPartitions) {
-  newPartitionSpecs
+if (newPartitionSpecs.length < numPartitions) {
+  attachDataSize(mapOutputStatistics, newPartitionSpecs)
 } else {
   Seq.empty
 }
@@ -148,7 +147,8 @@ object ShufflePartitionsUtil extends Logging {
 if (i - 1 > start) {
   val partitionSpecs = coalescePartitions(
 partitionIndices(start), repeatValue, validMetrics, targetSize, 
true)
-  newPartitionSpecsSeq.zip(partitionSpecs).foreach(spec => spec._1 ++= 
spec._2)
+  newPartitionSpecsSeq.zip(attachDataSize(mapOutputStatistics, 
partitionSpecs))
+.foreach(spec => spec._1 ++= spec._2)
 }
 // find the end of this skew section, skipping partition(i - 1) and 
partition(i).
 var repeatIndex = i + 1
@@ -173,7 +173,8 @@ object ShufflePartitionsUtil extends Logging {
 if (numPartitions > start) {
   val partitionSpecs = coalescePartitions(
 partitionIndices(start), partitionIndices.last + 1, validMetrics, 
targetSize, true)
-  newPartitionSpecsSeq.zip(partitionSpecs).foreach(spec => spec._1 ++= 
spec._2)
+  newPartitionSpecsSeq.zip(attachDataSize(mapOutputStatistics, 
partitionSpecs))
+.foreach(spec => spec._1 ++= spec._2)
 }
 // only return coalesced result if any coalescing has happened.
 if (newPartitionSpecsSeq.head.length < numPartitions) {
@@ -204,19 +205,17 @@ object ShufflePartitionsUtil extends Logging {
*  - coalesced partition 2: shuffle partition 2 (size 170 MiB)
*  - coalesced partition 3: shuffle partition 3 and 4 (size 50 MiB)
*
-   *  @return A sequence of sequence of [[CoalescedPartitionSpec]]s. which 
each inner sequence as
-   *  the new partition specs for its corresponding shuffle after 
coalescing. For example,
-   *  if partitions [0, 1, 2, 3, 4] and partition bytes [10, 10, 100, 
10, 20] with
-   *  targetSize 100, split at indices [0, 2, 3], the returned 
partition specs will be:
-   *  CoalescedPartitionSpec(0, 2, 20), CoalescedPartitionSpec(2, 3, 
100) and
-   *  CoalescedPartitionSpec(3, 5, 30).
+   *  @return A sequence of [[CoalescedPartitionSpec]]s. For example, if 
partitions [0, 1, 2, 3, 4]
+   *  split at indices [0, 2, 3], the returned partition specs will be:
+   *  CoalescedPartitionSpec(0, 2), CoalescedPartitionSpec(2, 3) and
+   *  CoalescedPartitionSpec(3, 5).
*/
   private def coalescePartitions(
   start: Int,
   end: Int,
   mapOutputStatistics: Seq[MapOutputStatistics],
   targetSize: Long,
-  allowReturnEmpty: Boolean = false): Seq[Seq[CoalescedPartitionSpec]] = {
+  allowReturnEmpty: Boolean = false): Seq[CoalescedPartitionSpec] = {
 

[spark] branch master updated (5ad1261 -> a98c8ae)

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5ad1261  [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6
 add a98c8ae  [SPARK-35944][PYTHON] Introduce Name and Label type aliases

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/_typing.py |   6 +-
 python/pyspark/pandas/accessors.py   |   4 +-
 python/pyspark/pandas/base.py|   4 +-
 python/pyspark/pandas/frame.py   | 218 ++-
 python/pyspark/pandas/generic.py |  34 +++--
 python/pyspark/pandas/groupby.py |  38 +++---
 python/pyspark/pandas/indexes/base.py|  52 
 python/pyspark/pandas/indexes/multi.py   |  34 +++--
 python/pyspark/pandas/indexes/numeric.py |   8 +-
 python/pyspark/pandas/indexing.py| 128 +-
 python/pyspark/pandas/internal.py|  61 -
 python/pyspark/pandas/ml.py  |   3 +-
 python/pyspark/pandas/mlflow.py  |   5 +-
 python/pyspark/pandas/namespace.py   |  20 +--
 python/pyspark/pandas/series.py  |  52 
 python/pyspark/pandas/utils.py   |  38 +++---
 16 files changed, 343 insertions(+), 362 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5ad1261  [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6
5ad1261 is described below

commit 5ad12611ec23d02b5988b92561b70dbeacf6af78
Author: Xinrong Meng 
AuthorDate: Thu Jul 1 09:32:25 2021 +0900

[SPARK-35938][PYTHON] Add deprecation warning for Python 3.6

### What changes were proposed in this pull request?

Add deprecation warning for Python 3.6.

### Why are the changes needed?

According to https://endoflife.date/python, Python 3.6 will be EOL on 23 
Dec, 2021.
We should prepare for the deprecation of Python 3.6 support in Spark in 
advance.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Manual tests.

Closes #33139 from xinrong-databricks/deprecate3.6_warn.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/context.py | 8 
 1 file changed, 8 insertions(+)

diff --git a/python/pyspark/context.py b/python/pyspark/context.py
index 6c6a538..6c94106 100644
--- a/python/pyspark/context.py
+++ b/python/pyspark/context.py
@@ -230,6 +230,14 @@ class SparkContext(object):
 self.pythonExec = os.environ.get("PYSPARK_PYTHON", 'python3')
 self.pythonVer = "%d.%d" % sys.version_info[:2]
 
+if sys.version_info[:2] < (3, 7):
+with warnings.catch_warnings():
+warnings.simplefilter("once")
+warnings.warn(
+"Python 3.6 support is deprecated in Spark 3.2.",
+FutureWarning
+)
+
 # Broadcast's __reduce__ method stores Broadcast instances here.
 # This allows other code to determine which Broadcast instances have
 # been pickled, so it can determine which Java broadcast objects to

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a5c8866 -> 9e39415)

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a5c8866  [SPARK-34859][SQL] Handle column index when using vectorized 
Parquet reader
 add 9e39415  [SPARK-35939][DOCS][PYTHON] Deprecate Python 3.6 in Spark 
documentation

No new revisions were added by this update.

Summary of changes:
 docs/index.md | 1 +
 docs/rdd-programming-guide.md | 1 +
 2 files changed, 2 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun commented on pull request #349: Remove Apache Spark 2.4 from download page

2021-06-30 Thread GitBox


dongjoon-hyun commented on pull request #349:
URL: https://github.com/apache/spark-website/pull/349#issuecomment-871801270


   Thank you, @maropu !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] maropu commented on pull request #349: Remove Apache Spark 2.4 from download page

2021-06-30 Thread GitBox


maropu commented on pull request #349:
URL: https://github.com/apache/spark-website/pull/349#issuecomment-871800188


   late lgtm.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

2021-06-30 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a5c8866  [SPARK-34859][SQL] Handle column index when using vectorized 
Parquet reader
a5c8866 is described below

commit a5c886619dd1573e96bbba058db099b47f0c147c
Author: Chao Sun 
AuthorDate: Wed Jun 30 14:21:18 2021 -0700

[SPARK-34859][SQL] Handle column index when using vectorized Parquet reader

### What changes were proposed in this pull request?

Make the current vectorized Parquet reader work with the column index 
introduced in Parquet 1.11. In particular, this PR makes the following changes:
1. In `ParquetReadState`, track row ranges returned via 
`PageReadStore.getRowIndexes` as well as the first row index for each page via 
`DataPage.getFirstRowIndex`.
2. Introduce a new API `ParquetVectorUpdater.skipValues` which skips a 
batch of values from a Parquet value reader. As part of the process, also 
rename the existing `updateBatch` to `readValues`, and `update` to `readValue`, 
to keep the method names consistent.
3. Correspondingly, introduce new APIs `VectorizedValuesReader.skipXXX` for 
the different data types, as well as their implementations. These are useful 
when the reader knows that a given batch of values can be skipped, for instance 
because the batch is not covered by the row ranges generated by column index 
filtering.
4. Change `VectorizedRleValuesReader` to handle column index filtering. 
This is done by comparing the range that is going to be read next within the 
current RLE/PACKED block (let's call this the block range) against the current 
row range. There are three cases (see the sketch after this list):
* if the block range is before the current row range, skip all the 
values in the block range
* if the block range is after the current row range, advance the row 
range and repeat the steps
* if the block range overlaps with the current row range, only read the 
values within the overlapping area and skip the rest.
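
A self-contained sketch of that case analysis (illustrative names and a simplified 
action list, not the real `VectorizedRleValuesReader` code): given the next block 
range and the sorted, non-overlapping row ranges kept by column index filtering, it 
decides how many values to skip and how many to read.

```scala
object RowRangeSketch {
  // Half-open [start, end) ranges of row indexes kept after column index filtering.
  final case class RowRange(start: Long, end: Long)

  /** Plans ("skip", n) / ("read", n) actions for the block range [blockStart, blockEnd). */
  def planBlock(blockStart: Long, blockEnd: Long,
                rowRanges: List[RowRange]): List[(String, Long)] = {
    val actions = List.newBuilder[(String, Long)]
    var pos = blockStart
    var ranges = rowRanges
    while (pos < blockEnd) {
      ranges match {
        case Nil =>                                  // no kept rows left: skip the rest
          actions += (("skip", blockEnd - pos)); pos = blockEnd
        case r :: rest if r.end <= pos =>            // block is after this row range: advance it
          ranges = rest
        case r :: _ if r.start >= blockEnd =>        // block is before the row range: skip it all
          actions += (("skip", blockEnd - pos)); pos = blockEnd
        case r :: _ =>                               // overlap: skip up to it, then read the overlap
          if (r.start > pos) { actions += (("skip", r.start - pos)); pos = r.start }
          val readEnd = math.min(r.end, blockEnd)
          actions += (("read", readEnd - pos)); pos = readEnd
      }
    }
    actions.result()
  }
}
```

For example, `planBlock(0, 10, List(RowRange(2, 5), RowRange(8, 12)))` yields 
skip 2, read 3, skip 3, read 2.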

### Why are the changes needed?

[Parquet Column 
Index](https://github.com/apache/parquet-format/blob/master/PageIndex.md) is a 
new feature in Parquet 1.11 which allows very efficient filtering on page level 
(some benchmark numbers can be found 
[here](https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/)),
 especially when data is sorted. The feature is largely implemented in 
parquet-mr (via classes such as `ColumnIndex` and `ColumnIndexFilter`). In 
Spark, the non-vectorized Parquet reader c [...]

Previously, 
[SPARK-26345](https://issues.apache.org/jira/browse/SPARK-26345) / (#31393) 
updated Spark to only scan pages filtered by column index from the parquet-mr 
side. This is done by calling the `ParquetFileReader.readNextFilteredRowGroup` 
and `ParquetFileReader.getFilteredRecordCount` APIs. The implementation, 
however, only works for a few limited cases: in the scenario where there are 
multiple columns and their type widths are different (e.g., `int` and 
`bigint`), it could return incorrec [...]

In order to fix the above, Spark needs to leverage the API 
`PageReadStore.getRowIndexes` and `DataPage.getFirstRowIndex`. The former 
returns the indexes of all rows (note the difference between rows and values: 
for flat schema there is no difference between the two, but for nested schema 
they're different) after filtering within a Parquet row group. The latter 
returns the first row index within a single data page. With the combination of 
the two, one is able to know which rows/values  [...]

### Does this PR introduce _any_ user-facing change?

Yes. Now the vectorized Parquet reader should work correctly with column 
index.

### How was this patch tested?

Borrowed tests from #31998 and added a few more tests.

Closes #32753 from sunchao/SPARK-34859.

Lead-authored-by: Chao Sun 
Co-authored-by: Li Xian 
Signed-off-by: Dongjoon Hyun 
---
 .../datasources/parquet/ParquetReadState.java  | 120 ++-
 .../datasources/parquet/ParquetVectorUpdater.java  |  12 +-
 .../parquet/ParquetVectorUpdaterFactory.java   | 216 ++-
 .../parquet/VectorizedColumnReader.java|  28 ++-
 .../parquet/VectorizedParquetRecordReader.java |   1 +
 .../parquet/VectorizedPlainValuesReader.java   |  51 +
 .../parquet/VectorizedRleValuesReader.java | 239 -
 .../parquet/VectorizedValuesReader.java|  13 ++
 .../parquet/ParquetColumnIndexSuite.scala  | 126 +++
 .../datasources/parquet/ParquetIOSuite.scala   |  72 ++-
 10 files changed, 746 insertions(+), 132 deletions(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetReadState.java
 

[spark] branch master updated (2c94fbc -> d46c1e3)

2021-06-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2c94fbc  initial commit for skeleton ansible for jenkins worker config
 add d46c1e3  [SPARK-35725][SQL] Support optimize skewed partitions in 
RebalancePartitions

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala|  10 ++
 .../execution/adaptive/AdaptiveSparkPlanExec.scala |   1 +
 .../OptimizeSkewInRebalancePartitions.scala| 101 +
 .../execution/adaptive/OptimizeSkewedJoin.scala|  40 +---
 .../execution/adaptive/ShufflePartitionsUtil.scala |  39 +++-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  43 -
 6 files changed, 193 insertions(+), 41 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewInRebalancePartitions.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (733e85f1 -> 2c94fbc)

2021-06-30 Thread shaneknapp
This is an automated email from the ASF dual-hosted git repository.

shaneknapp pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 733e85f1 [SPARK-35953][SQL] Support extracting date fields from 
timestamp without time zone
 add 2c94fbc  initial commit for skeleton ansible for jenkins worker config

No new revisions were added by this update.

Summary of changes:
 dev/.rat-excludes  |   1 +
 dev/ansible-for-test-node/README.md|  25 +++
 .../deploy-jenkins-worker.yml  |   8 +
 dev/ansible-for-test-node/roles/common/README.md   |   4 +
 .../roles/common/tasks/main.yml|   4 +
 .../roles/common/tasks/setup_local_userspace.yml   |   8 +
 .../roles/common/tasks/system_packages.yml |  73 
 .../roles/jenkins-worker/README.md |  15 ++
 .../roles/jenkins-worker/defaults/main.yml |  30 
 .../files/python_environments/base-py3-pip.txt |   3 +
 .../files/python_environments/base-py3-spec.txt|  21 +++
 .../files/python_environments/py36.txt |  49 ++
 .../files/python_environments/spark-py2-pip.txt|   8 +
 .../files/python_environments/spark-py36-spec.txt  |  61 +++
 .../files/python_environments/spark-py3k-spec.txt  |  42 +
 .../files/scripts/jenkins-gitcache-cron|   7 +
 .../files/util_scripts/kill_zinc_nailgun.py|  60 +++
 .../files/util_scripts/post_github_pr_comment.py   |  81 +
 .../files/util_scripts/session_lock_resource.py| 152 +
 .../roles/jenkins-worker/files/worker-limits.conf  |   5 +
 .../roles/jenkins-worker/tasks/cleanup.yml |  12 ++
 .../jenkins-worker/tasks/install_anaconda.yml  |  79 +
 .../tasks/install_build_packages.yml   |  21 +++
 .../roles/jenkins-worker/tasks/install_docker.yml  |  33 
 .../jenkins-worker/tasks/install_minikube.yml  |  16 ++
 .../tasks/install_spark_build_packages.yml | 183 +
 .../jenkins-worker/tasks/jenkins_userspace.yml | 119 ++
 .../roles/jenkins-worker/tasks/main.yml|  22 +++
 .../roles/jenkins-worker/vars/main.yml |   9 +
 dev/tox.ini|   4 +-
 30 files changed, 1153 insertions(+), 2 deletions(-)
 create mode 100644 dev/ansible-for-test-node/README.md
 create mode 100644 dev/ansible-for-test-node/deploy-jenkins-worker.yml
 create mode 100644 dev/ansible-for-test-node/roles/common/README.md
 create mode 100644 dev/ansible-for-test-node/roles/common/tasks/main.yml
 create mode 100644 
dev/ansible-for-test-node/roles/common/tasks/setup_local_userspace.yml
 create mode 100644 
dev/ansible-for-test-node/roles/common/tasks/system_packages.yml
 create mode 100644 dev/ansible-for-test-node/roles/jenkins-worker/README.md
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/defaults/main.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/base-py3-pip.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/base-py3-spec.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/py36.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py2-pip.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py36-spec.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/python_environments/spark-py3k-spec.txt
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/scripts/jenkins-gitcache-cron
 create mode 100755 
dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/kill_zinc_nailgun.py
 create mode 100755 
dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/post_github_pr_comment.py
 create mode 100755 
dev/ansible-for-test-node/roles/jenkins-worker/files/util_scripts/session_lock_resource.py
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/files/worker-limits.conf
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/cleanup.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_anaconda.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_build_packages.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_docker.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_minikube.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/install_spark_build_packages.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/jenkins_userspace.yml
 create mode 100644 
dev/ansible-for-test-node/roles/jenkins-worker/tasks/main.yml
 create mode 100644 

[spark] branch master updated (2febd5c -> 733e85f1)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2febd5c  [SPARK-35735][SQL] Take into account day-time interval fields 
in cast
 add 733e85f1 [SPARK-35953][SQL] Support extracting date fields from 
timestamp without time zone

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   2 +-
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |   4 +-
 .../test/resources/sql-tests/inputs/extract.sql|  92 +++
 .../resources/sql-tests/results/extract.sql.out| 276 ++---
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |   8 +-
 5 files changed, 192 insertions(+), 190 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35735][SQL] Take into account day-time interval fields in cast

2021-06-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2febd5c  [SPARK-35735][SQL] Take into account day-time interval fields 
in cast
2febd5c is described below

commit 2febd5c3f0c3a0c6660cfb340eb65316a1ca4acd
Author: Angerszh 
AuthorDate: Wed Jun 30 16:05:04 2021 +0300

[SPARK-35735][SQL] Take into account day-time interval fields in cast

### What changes were proposed in this pull request?
Support taking day-time interval fields into account in cast.

### Why are the changes needed?
To conform to the SQL standard.

### Does this PR introduce _any_ user-facing change?
A user can use `cast(str, DayTimeInterval(DAY, HOUR))`, for instance.
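
A hedged usage sketch (the session setup is boilerplate, not part of this patch; 
the interval string formats follow the `supportedFormat` table added below):

```scala
import org.apache.spark.sql.SparkSession

object DayTimeCastSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("cast-sketch").getOrCreate()
    // Cast strings to ANSI day-time interval types with explicit start/end fields.
    spark.sql("SELECT CAST('1 10' AS INTERVAL DAY TO HOUR)").show()
    spark.sql("SELECT CAST('1 10:30:40.123' AS INTERVAL DAY TO SECOND)").show()
    spark.sql("SELECT CAST('10:30' AS INTERVAL HOUR TO MINUTE)").show()
    spark.stop()
  }
}
```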

### How was this patch tested?
Added UT.

Closes #32943 from AngersZh/SPARK-35735.

Authored-by: Angerszh 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/util/IntervalUtils.scala| 203 +++--
 .../sql/catalyst/expressions/CastSuiteBase.scala   | 145 ++-
 2 files changed, 324 insertions(+), 24 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
index 7a6de7f..30a2fa5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
@@ -119,18 +119,27 @@ object IntervalUtils {
 }
   }
 
+  val supportedFormat = Map(
+(YM.YEAR, YM.MONTH) -> Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO 
MONTH"),
+(YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"),
+(YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH"),
+(DT.DAY, DT.DAY) -> Seq("[+|-]d", "INTERVAL [+|-]'[+|-]d' DAY"),
+(DT.DAY, DT.HOUR) -> Seq("[+|-]d h", "INTERVAL [+|-]'[+|-]d h' DAY TO 
HOUR"),
+(DT.DAY, DT.MINUTE) -> Seq("[+|-]d h:m", "INTERVAL [+|-]'[+|-]d h:m' DAY 
TO MINUTE"),
+(DT.DAY, DT.SECOND) -> Seq("[+|-]d h:m:s.n", "INTERVAL [+|-]'[+|-]d 
h:m:s.n' DAY TO SECOND"),
+(DT.HOUR, DT.HOUR) -> Seq("[+|-]h", "INTERVAL [+|-]'[+|-]h' HOUR"),
+(DT.HOUR, DT.MINUTE) -> Seq("[+|-]h:m", "INTERVAL [+|-]'[+|-]h:m' HOUR TO 
MINUTE"),
+(DT.HOUR, DT.SECOND) -> Seq("[+|-]h:m:s.n", "INTERVAL [+|-]'[+|-]h:m:s.n' 
HOUR TO SECOND"),
+(DT.MINUTE, DT.MINUTE) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MINUTE"),
+(DT.MINUTE, DT.SECOND) -> Seq("[+|-]m:s.n", "INTERVAL [+|-]'[+|-]m:s.n' 
MINUTE TO SECOND"),
+(DT.SECOND, DT.SECOND) -> Seq("[+|-]s.n", "INTERVAL [+|-]'[+|-]s.n' 
SECOND")
+  )
+
   def castStringToYMInterval(
   input: UTF8String,
   startField: Byte,
   endField: Byte): Int = {
 
-val supportedFormat = Map(
-  (YM.YEAR, YM.MONTH) ->
-Seq("[+|-]y-m", "INTERVAL [+|-]'[+|-]y-m' YEAR TO MONTH"),
-  (YM.YEAR, YM.YEAR) -> Seq("[+|-]y", "INTERVAL [+|-]'[+|-]y' YEAR"),
-  (YM.MONTH, YM.MONTH) -> Seq("[+|-]m", "INTERVAL [+|-]'[+|-]m' MONTH")
-)
-
 def checkStringIntervalType(targetStartField: Byte, targetEndField: Byte): 
Unit = {
   if (startField != targetStartField || endField != targetEndField) {
 throw new IllegalArgumentException(s"Interval string does not match 
year-month format of " +
@@ -151,7 +160,7 @@ object IntervalUtils {
 checkStringIntervalType(YM.YEAR, YM.MONTH)
 toYMInterval(year, month, getSign(firstSign, secondSign))
   case yearMonthIndividualRegex(secondSign, value) =>
-safeToYMInterval {
+safeToInterval {
   val sign = getSign("+", secondSign)
   if (endField == YM.YEAR) {
 sign * Math.toIntExact(value.toLong * MONTHS_PER_YEAR)
@@ -166,7 +175,7 @@ object IntervalUtils {
   }
 }
   case yearMonthIndividualLiteralRegex(firstSign, secondSign, value, 
suffix) =>
-safeToYMInterval {
+safeToInterval {
   val sign = getSign(firstSign, secondSign)
   if ("YEAR".equalsIgnoreCase(suffix)) {
 checkStringIntervalType(YM.YEAR, YM.YEAR)
@@ -202,7 +211,7 @@ object IntervalUtils {
 }
   }
 
-  private def safeToYMInterval(f: => Int): Int = {
+  private def safeToInterval[T](f: => T): T = {
 try {
   f
 } catch {
@@ -213,24 +222,72 @@ object IntervalUtils {
   }
 
   private def toYMInterval(yearStr: String, monthStr: String, sign: Int): Int 
= {
-safeToYMInterval {
+safeToInterval {
   val years = toLongWithRange(YEAR, yearStr, 0, Integer.MAX_VALUE / 
MONTHS_PER_YEAR)
   val totalMonths = sign * (years * MONTHS_PER_YEAR + 
toLongWithRange(MONTH, monthStr, 0, 11))
   Math.toIntExact(totalMonths)
 }
   }
 
+  private val normalPattern = "(\\d{1,2})"
+  

[spark] branch master updated (c6afd6e -> e88aa49)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c6afd6e  [SPARK-35951][DOCS] Add since versions for Avro options in 
Documentation
 add e88aa49  [SPARK-35932][SQL] Support extracting hour/minute/second from 
timestamp without time zone

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/AnsiTypeCoercion.scala   |   1 +
 .../spark/sql/catalyst/analysis/TypeCoercion.scala |   6 +-
 .../catalyst/expressions/datetimeExpressions.scala |   8 +-
 .../apache/spark/sql/types/AbstractDataType.scala  |   2 +-
 .../expressions/DateExpressionsSuite.scala |  65 +---
 .../test/resources/sql-tests/inputs/extract.sql|  66 
 .../resources/sql-tests/results/extract.sql.out| 182 ++---
 7 files changed, 181 insertions(+), 149 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35951][DOCS] Add since versions for Avro options in Documentation

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c6afd6e  [SPARK-35951][DOCS] Add since versions for Avro options in 
Documentation
c6afd6e is described below

commit c6afd6ed5296980e81160e441a4e9bea98c74196
Author: Gengliang Wang 
AuthorDate: Wed Jun 30 17:24:48 2021 +0800

[SPARK-35951][DOCS] Add since versions for Avro options in Documentation

### What changes were proposed in this pull request?

There are two new Avro options, `datetimeRebaseMode` and 
`positionalFieldMatching`, added in Spark 3.2.
We should document the since version so that users can tell whether an 
option works in their Spark version.

### Why are the changes needed?

Better documentation.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Manual preview on local setup.
https://user-images.githubusercontent.com/1097932/123934000-ba833b00-d947-11eb-9ca5-ce8ff8add74b.png

https://user-images.githubusercontent.com/1097932/123934126-d4bd1900-d947-11eb-8d80-69df8f3d9900.png

Closes #33153 from gengliangwang/version.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 docs/sql-data-sources-avro.md | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 7fb0ef5..94dd7e1 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -224,7 +224,7 @@ Data source options of Avro can be set via:
  * the `options` parameter in function `from_avro`.
 
 
-  Property 
NameDefaultMeaningScope
+  Property 
NameDefaultMeaningScopeSince
 Version
   
 avroSchema
 None
@@ -244,24 +244,28 @@ Data source options of Avro can be set via:
   
 
  read, write and function from_avro
+2.4.0
   
   
 recordName
 topLevelRecord
 Top level record name in write result, which is required in Avro 
spec.
 write
+2.4.0
   
   
 recordNamespace
 ""
 Record namespace in write result.
 write
+2.4.0
   
   
 ignoreExtension
 true
 The option controls ignoring of files without .avro 
extensions in read. If the option is enabled, all files (with and without 
.avro extension) are loaded. The option has been deprecated, 
and it will be removed in the future releases. Please use the general data 
source option pathGlobFilter
 for filtering file names.
 read
+2.4.0
   
   
 compression
@@ -269,6 +273,7 @@ Data source options of Avro can be set via:
 The compression option allows to specify a compression 
codec used in write.
   Currently supported codecs are uncompressed, 
snappy, deflate, bzip2 and 
xz. If the option is not set, the configuration 
spark.sql.avro.compression.codec config is taken into account.
 write
+2.4.0
   
   
 mode
@@ -282,6 +287,7 @@ Data source options of Avro can be set via:
   
 
 function from_avro
+2.4.0
   
   
 datetimeRebaseMode
@@ -295,12 +301,14 @@ Data source options of Avro can be set via:
   
 
 read and function from_avro
+3.2.0
   
   
 positionalFieldMatching
 false
 This can be used in tandem with the `avroSchema` option to adjust the 
behavior for matching the fields in the provided Avro schema with those in the 
SQL schema. By default, the matching will be performed using field names, 
ignoring their positions. If this option is set to "true", the matching will be 
based on the position of the fields.
 read and write
+3.2.0
   
 
 

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4dd41b9 -> e3bd817)

2021-06-30 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4dd41b9  [SPARK-34365][AVRO] Add support for positional 
Catalyst-to-Avro schema matching
 add e3bd817  [SPARK-34920][CORE][SQL] Add error classes with SQLSTATE

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/README.md|  89 +
 core/src/main/resources/error/error-classes.json   |  22 
 .../main/scala/org/apache/spark/SparkError.scala   |  83 +
 .../scala/org/apache/spark/SparkException.scala|  34 -
 .../scala/org/apache/spark/SparkErrorSuite.scala   | 137 +
 .../org/apache/spark/sql/AnalysisException.scala   |  38 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala |   2 +-
 .../sql/catalyst/analysis/CheckAnalysis.scala  |   3 +-
 .../spark/sql/catalyst/analysis/package.scala  |   8 ++
 .../spark/sql/catalyst/parser/ParseDriver.scala|  26 +++-
 .../spark/sql/errors/QueryCompilationErrors.scala  |   5 +-
 .../spark/sql/errors/QueryExecutionErrors.scala|   9 +-
 .../spark/sql/errors/QueryParsingErrors.scala  |   3 +-
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala |   5 +
 .../expressions/ArithmeticExpressionSuite.scala|  22 +++-
 .../sql-tests/results/ansi/interval.sql.out|   2 +-
 .../sql-tests/results/postgreSQL/case.sql.out  |   6 +-
 .../sql-tests/results/postgreSQL/int8.sql.out  |   6 +-
 .../results/postgreSQL/select_having.sql.out   |   2 +-
 .../results/udf/postgreSQL/udf-case.sql.out|   6 +-
 .../udf/postgreSQL/udf-select_having.sql.out   |   2 +-
 .../spark/sql/connector/DataSourceV2Suite.scala|   2 +
 .../hive/thriftserver/HiveThriftServerErrors.scala |  10 +-
 .../sql/hive/thriftserver/SparkSQLDriver.scala |  10 +-
 .../ThriftServerWithSparkContextSuite.scala|   3 +
 25 files changed, 499 insertions(+), 36 deletions(-)
 create mode 100644 core/src/main/resources/error/README.md
 create mode 100644 core/src/main/resources/error/error-classes.json
 create mode 100644 core/src/main/scala/org/apache/spark/SparkError.scala
 create mode 100644 core/src/test/scala/org/apache/spark/SparkErrorSuite.scala

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated (ab46045 -> 6a1361c)

2021-06-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ab46045  [SPARK-35886][SQL][3.0] PromotePrecision should not overwrite 
genCode
 add 6a1361c  [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR 
TABLE` on table refreshing

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated (fe412b6 -> b6e8fab)

2021-06-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fe412b6  [SPARK-35898][SQL] Fix arrays and maps in RowToColumnConverter
 add b6e8fab  [SPARK-35935][SQL][3.1][3.0] Prevent failure of `MSCK REPAIR 
TABLE` on table refreshing

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (6bbfb45 -> 4dd41b9)

2021-06-30 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6bbfb45  [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to 
`FileCommitProtocol`
 add 4dd41b9  [SPARK-34365][AVRO] Add support for positional 
Catalyst-to-Avro schema matching

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-avro.md  |   6 +
 .../apache/spark/sql/avro/AvroDeserializer.scala   |  15 +-
 .../org/apache/spark/sql/avro/AvroFileFormat.scala |   1 +
 .../org/apache/spark/sql/avro/AvroOptions.scala|   8 +
 .../apache/spark/sql/avro/AvroOutputWriter.scala   |   5 +-
 .../spark/sql/avro/AvroOutputWriterFactory.scala   |   8 +-
 .../org/apache/spark/sql/avro/AvroSerializer.scala |  22 +--
 .../org/apache/spark/sql/avro/AvroUtils.scala  |  42 +-
 .../sql/v2/avro/AvroPartitionReaderFactory.scala   |   1 +
 .../sql/avro/AvroCatalystDataConversionSuite.scala |   1 +
 .../apache/spark/sql/avro/AvroRowReaderSuite.scala |   1 +
 .../spark/sql/avro/AvroSchemaHelperSuite.scala |  24 ++-
 .../org/apache/spark/sql/avro/AvroSerdeSuite.scala | 164 ++---
 .../org/apache/spark/sql/avro/AvroSuite.scala  |  41 +-
 14 files changed, 258 insertions(+), 81 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to `FileCommitProtocol`

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6bbfb45  [SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to 
`FileCommitProtocol`
6bbfb45 is described below

commit 6bbfb45ffe75aa6c27a7bf3c3385a596637d1822
Author: Cheng Su 
AuthorDate: Wed Jun 30 16:25:20 2021 +0900

[SPARK-33298][CORE][FOLLOWUP] Add Unstable annotation to 
`FileCommitProtocol`

### What changes were proposed in this pull request?

This is the followup from 
https://github.com/apache/spark/pull/33012#discussion_r659440833, where we want 
to add `Unstable` to `FileCommitProtocol`, to give people a better idea of the API.

### Why are the changes needed?

Make it easier for people to follow and understand code. Clean up code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests, as no real logic change.

Closes #33148 from c21/bucket-followup.

Authored-by: Cheng Su 
Signed-off-by: Hyukjin Kwon 
---
 .../org/apache/spark/internal/io/FileCommitProtocol.scala  | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala 
b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
index 6465cc7..5cd7397 100644
--- a/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala
@@ -20,6 +20,7 @@ package org.apache.spark.internal.io
 import org.apache.hadoop.fs._
 import org.apache.hadoop.mapreduce._
 
+import org.apache.spark.annotation.Unstable
 import org.apache.spark.internal.Logging
 import org.apache.spark.util.Utils
 
@@ -41,7 +42,11 @@ import org.apache.spark.util.Utils
  *(or abortTask if task failed).
  * 3. When all necessary tasks completed successfully, the driver calls 
commitJob. If the job
  *failed to execute (e.g. too many failed tasks), the job should call 
abortJob.
+ *
+ * @note This class is exposed as an API considering the usage of many 
downstream custom
+ * implementations, but will be subject to be changed and/or moved.
  */
+@Unstable
 abstract class FileCommitProtocol extends Logging {
   import FileCommitProtocol._
 
@@ -107,9 +112,7 @@ abstract class FileCommitProtocol extends Logging {
* if a task is going to write out multiple files to the same dir. The file 
commit protocol only
* guarantees that files written by different tasks will not conflict.
*
-   * This API should be implemented and called, instead of
-   * [[newTaskTempFile(taskContest, dir, ext)]]. Provide a default 
implementation here to be
-   * backward compatible with custom [[FileCommitProtocol]] implementations 
before Spark 3.2.0.
+   * @since 3.2.0
*/
   def newTaskTempFile(
   taskContext: TaskAttemptContext, dir: Option[String], spec: 
FileNameSpec): String = {
@@ -144,10 +147,7 @@ abstract class FileCommitProtocol extends Logging {
* if a task is going to write out multiple files to the same dir. The file 
commit protocol only
* guarantees that files written by different tasks will not conflict.
*
-   * This API should be implemented and called, instead of
-   * [[newTaskTempFileAbsPath(taskContest, absoluteDir, ext)]]. Provide a 
default implementation
-   * here to be backward compatible with custom [[FileCommitProtocol]] 
implementations before
-   * Spark 3.2.0.
+   * @since 3.2.0
*/
   def newTaskTempFileAbsPath(
   taskContext: TaskAttemptContext, absoluteDir: String, spec: 
FileNameSpec): String = {
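
For downstream implementers, the annotation mainly signals that custom commit
protocols may need updating between Spark releases. A minimal, hypothetical sketch
of such a downstream class (the class name and prefix are made up; it assumes the
FileNameSpec-based overload shown above and delegates to the built-in
HadoopMapReduceCommitProtocol):

import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.{FileNameSpec, HadoopMapReduceCommitProtocol}

// Hypothetical downstream protocol: delegates all commit logic to the built-in
// Hadoop protocol and only customizes the generated file name prefix.
class PrefixingCommitProtocol(jobId: String, path: String)
  extends HadoopMapReduceCommitProtocol(jobId, path) {

  override def newTaskTempFile(
      taskContext: TaskAttemptContext,
      dir: Option[String],
      spec: FileNameSpec): String = {
    // Prepend a marker to the file name prefix, then delegate to the parent.
    super.newTaskTempFile(taskContext, dir, FileNameSpec("custom-" + spec.prefix, spec.suffix))
  }
}

A class like this would typically be plugged in through the commit protocol
configuration (for example spark.sql.sources.commitProtocolClass); because the
API is marked @Unstable, such an override may need adjusting when upgrading Spark.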

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35948][INFRA] Simplify release scripts by removing Spark 2.4/Java7 parts

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b218cc9  [SPARK-35948][INFRA] Simplify release scripts by removing 
Spark 2.4/Java7 parts
b218cc9 is described below

commit b218cc90cfa957bdbf443ed3dbdfdf660bc1312d
Author: Dongjoon Hyun 
AuthorDate: Wed Jun 30 16:24:03 2021 +0900

[SPARK-35948][INFRA] Simplify release scripts by removing Spark 2.4/Java7 
parts

### What changes were proposed in this pull request?

This PR aims to clean up the Spark 2.4 and Java 7 code paths from the release 
scripts.

### Why are the changes needed?

To simplify the logic.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #33150 from dongjoon-hyun/SPARK-35948.

Authored-by: Dongjoon Hyun 
Signed-off-by: Hyukjin Kwon 
---
 dev/create-release/release-build.sh | 28 ++--
 dev/create-release/release-util.sh  |  6 --
 2 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index 28d4853..96b8d4e 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -177,10 +177,7 @@ fi
 
 # Depending on the version being built, certain extra profiles need to be 
activated, and
 # different versions of Scala are supported.
-BASE_PROFILES="-Pmesos -Pyarn"
-if [[ $SPARK_VERSION > "2.3" ]]; then
-  BASE_PROFILES="$BASE_PROFILES -Pkubernetes"
-fi
+BASE_PROFILES="-Pmesos -Pyarn -Pkubernetes"
 
 PUBLISH_SCALA_2_13=1
 SCALA_2_13_PROFILES="-Pscala-2.13"
@@ -188,14 +185,8 @@ if [[ $SPARK_VERSION < "3.2" ]]; then
   PUBLISH_SCALA_2_13=0
 fi
 
-PUBLISH_SCALA_2_12=0
+PUBLISH_SCALA_2_12=1
 SCALA_2_12_PROFILES="-Pscala-2.12"
-if [[ $SPARK_VERSION < "3.0." ]]; then
-  SCALA_2_12_PROFILES="-Pscala-2.12 -Pflume"
-fi
-if [[ $SPARK_VERSION > "2.4" ]]; then
-  PUBLISH_SCALA_2_12=1
-fi
 
 # Hive-specific profiles for some builds
 HIVE_PROFILES="-Phive -Phive-thriftserver"
@@ -236,12 +227,9 @@ if [[ "$1" == "package" ]]; then
   echo "Packaging release source tarballs"
   cp -r spark spark-$SPARK_VERSION
 
-  # For source release in v2.4+, exclude copy of binary license/notice
-  if [[ $SPARK_VERSION > "2.4" ]]; then
-rm -f spark-$SPARK_VERSION/LICENSE-binary
-rm -f spark-$SPARK_VERSION/NOTICE-binary
-rm -rf spark-$SPARK_VERSION/licenses-binary
-  fi
+  rm -f spark-$SPARK_VERSION/LICENSE-binary
+  rm -f spark-$SPARK_VERSION/NOTICE-binary
+  rm -rf spark-$SPARK_VERSION/licenses-binary
 
   tar cvzf spark-$SPARK_VERSION.tgz --exclude spark-$SPARK_VERSION/.git 
spark-$SPARK_VERSION
   echo $GPG_PASSPHRASE | $GPG --passphrase-fd 0 --armour --output 
spark-$SPARK_VERSION.tgz.asc \
@@ -337,11 +325,7 @@ if [[ "$1" == "package" ]]; then
   BINARY_PKGS_ARGS["hadoop3.2"]="-Phadoop-3.2 $HIVE_PROFILES"
   if ! is_dry_run; then
 BINARY_PKGS_ARGS["without-hadoop"]="-Phadoop-provided"
-if [[ $SPARK_VERSION < "3.0." ]]; then
-  BINARY_PKGS_ARGS["hadoop2.6"]="-Phadoop-2.6 $HIVE_PROFILES"
-else
-  BINARY_PKGS_ARGS["hadoop2.7"]="-Phadoop-2.7 $HIVE_PROFILES"
-fi
+BINARY_PKGS_ARGS["hadoop2.7"]="-Phadoop-2.7 $HIVE_PROFILES"
   fi
 
   declare -A BINARY_PKGS_EXTRA
diff --git a/dev/create-release/release-util.sh 
b/dev/create-release/release-util.sh
index eb1fd38..0394fb4 100755
--- a/dev/create-release/release-util.sh
+++ b/dev/create-release/release-util.sh
@@ -226,11 +226,5 @@ function init_maven_sbt {
   MVN="build/mvn -B"
   MVN_EXTRA_OPTS=
   SBT_OPTS=
-  if [[ $JAVA_VERSION < "1.8." ]]; then
-# Needed for maven central when using Java 7.
-SBT_OPTS="-Dhttps.protocols=TLSv1.1,TLSv1.2"
-MVN_EXTRA_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g 
-Dhttps.protocols=TLSv1.1,TLSv1.2"
-MVN="$MVN $MVN_EXTRA_OPTS"
-  fi
   export MVN MVN_EXTRA_OPTS SBT_OPTS
 }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35947][INFRA] Increase JVM stack size in release-build.sh

2021-06-30 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5312008  [SPARK-35947][INFRA] Increase JVM stack size in 
release-build.sh
5312008 is described below

commit 5312008cca074748cac78a57359c9ad9a729df92
Author: Dongjoon Hyun 
AuthorDate: Wed Jun 30 16:23:13 2021 +0900

[SPARK-35947][INFRA] Increase JVM stack size in release-build.sh

### What changes were proposed in this pull request?

Like SPARK-35825, this PR aims to increase the JVM stack size via `MAVEN_OPTS` 
in release-build.sh.

### Why are the changes needed?

This will mitigate the failures in the snapshot-publishing GitHub Actions job and 
during releases.

- https://github.com/apache/spark/actions/workflows/publish_snapshot.yml 
(three consecutive days of failures)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #33149 from dongjoon-hyun/SPARK-35947.

Authored-by: Dongjoon Hyun 
Signed-off-by: Hyukjin Kwon 
---
 dev/create-release/release-build.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index 57ce7b3..28d4853 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -229,7 +229,7 @@ git clean -d -f -x
 rm -f .gitignore
 cd ..
 
-export MAVEN_OPTS="-Xmx12g"
+export MAVEN_OPTS="-Xss128m -Xmx12g"
 
 if [[ "$1" == "package" ]]; then
   # Source and binary tarballs

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table refreshing

2021-06-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d28ca9c  [SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on 
table refreshing
d28ca9c is described below

commit d28ca9cc9808828118be64a545c3478160fdc170
Author: Max Gekk 
AuthorDate: Wed Jun 30 09:44:52 2021 +0300

[SPARK-35935][SQL] Prevent failure of `MSCK REPAIR TABLE` on table 
refreshing

### What changes were proposed in this pull request?
In the PR, I propose to catch all non-fatal exceptions coming 
`refreshTable()` at the final stage of table repairing, and output an error 
message instead of failing with an exception.

### Why are the changes needed?
1. The uncaught exceptions from table refreshing might be considered a 
regression compared to previous Spark versions. Table refreshing was 
introduced by https://github.com/apache/spark/pull/31066.
2. This should improve the user experience with Spark SQL, for instance when 
`MSCK REPAIR TABLE` is performed as part of a chain of SQL commands where 
catching an exception is difficult or even impossible.

### Does this PR introduce _any_ user-facing change?
Yes. Before the changes, the `MSCK REPAIR TABLE` command can fail with the 
exception described in SPARK-35935. After the changes, the same command outputs 
an error message and completes successfully.

### How was this patch tested?
By existing test suites.

Closes #33137 from MaxGekk/msck-repair-catch-except.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../scala/org/apache/spark/sql/execution/command/ddl.scala | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
index 0876b5f..06c6847 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
@@ -675,7 +675,15 @@ case class RepairTableCommand(
 // This is always the case for Hive format tables, but is not true for 
Datasource tables created
 // before Spark 2.1 unless they are converted via `msck repair table`.
 spark.sessionState.catalog.alterTable(table.copy(tracksPartitionsInCatalog 
= true))
-spark.catalog.refreshTable(tableIdentWithDB)
+try {
+  spark.catalog.refreshTable(tableIdentWithDB)
+} catch {
+  case NonFatal(e) =>
+logError(s"Cannot refresh the table '$tableIdentWithDB'. A query of 
the table " +
+  "might return wrong result if the table was cached. To avoid such 
issue, you should " +
+  "uncache the table manually via the UNCACHE TABLE command after 
table recovering will " +
+  "complete fully.", e)
+}
 logInfo(s"Recovered all partitions: added ($addedAmount), dropped 
($droppedAmount).")
 Seq.empty[Row]
   }
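
As a hypothetical illustration of the behaviour the new error message describes
(the table name `events` is made up, and an active SparkSession `spark` is
assumed), the repair now completes even when the cache refresh fails, and the
cache can be invalidated manually afterwards:

// Repair completes; a failure in refreshTable() is now only logged.
spark.sql("MSCK REPAIR TABLE events")
// Avoid stale results in case the table was cached before the repair.
spark.sql("UNCACHE TABLE IF EXISTS events")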

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun edited a comment on pull request #349: Remove Apache Spark 2.4 from download page

2021-06-30 Thread GitBox


dongjoon-hyun edited a comment on pull request #349:
URL: https://github.com/apache/spark-website/pull/349#issuecomment-871132652


   Thank you, @HyukjinKwon and @Ngone51 .
   Merged to asf-site.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: Remove Apache Spark 2.4 from download page (#349)

2021-06-30 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ed7c89c  Remove Apache Spark 2.4 from download page (#349)
ed7c89c is described below

commit ed7c89c55d70d2df7dfdab11ba2ca9937dd647c1
Author: Dongjoon Hyun 
AuthorDate: Tue Jun 29 23:28:58 2021 -0700

Remove Apache Spark 2.4 from download page (#349)

This PR aims to remove the Apache Spark 2.4.8 download link from the download 
page because that release has reached EOL and we should recommend the latest 
versions instead: 3.1.2 and 3.0.3.

We should not change the 2.4.8 binary distributions because they are still 
used in the test suite.
---
 js/downloads.js  | 4 
 site/js/downloads.js | 4 
 2 files changed, 8 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index 7f82fd3..fdac347 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -13,13 +13,10 @@ function addRelease(version, releaseDate, packages, 
mirrored) {
 
 var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
-var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
 var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
-// 2.4.0+
-var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p12_hadoopFree, 
sources];
 // 3.0.0+
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
@@ -28,7 +25,6 @@ var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
 
 addRelease("3.1.2", new Date("06/01/2021"), packagesV11, true);
 addRelease("3.0.3", new Date("06/23/2021"), packagesV10, true);
-addRelease("2.4.8", new Date("05/17/2021"), packagesV9, true);
 
 function append(el, contents) {
   el.innerHTML += contents;
diff --git a/site/js/downloads.js b/site/js/downloads.js
index 7f82fd3..fdac347 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -13,13 +13,10 @@ function addRelease(version, releaseDate, packages, 
mirrored) {
 
 var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
-var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
 var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
-// 2.4.0+
-var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p12_hadoopFree, 
sources];
 // 3.0.0+
 var packagesV10 = [hadoop2p7, hadoop3p2, hadoopFree, sources];
 // 3.1.0+
@@ -28,7 +25,6 @@ var packagesV11 = [hadoop3p2, hadoop2p7, hadoopFree, sources];
 
 addRelease("3.1.2", new Date("06/01/2021"), packagesV11, true);
 addRelease("3.0.3", new Date("06/23/2021"), packagesV10, true);
-addRelease("2.4.8", new Date("05/17/2021"), packagesV9, true);
 
 function append(el, contents) {
   el.innerHTML += contents;

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun commented on pull request #349: Remove Apache Spark 2.4 from download page

2021-06-30 Thread GitBox


dongjoon-hyun commented on pull request #349:
URL: https://github.com/apache/spark-website/pull/349#issuecomment-871132652


   Thank you, @HyukjinKwon and @Ngone51 .
   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[GitHub] [spark-website] dongjoon-hyun merged pull request #349: Remove Apache Spark 2.4 from download page

2021-06-30 Thread GitBox


dongjoon-hyun merged pull request #349:
URL: https://github.com/apache/spark-website/pull/349


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function"

2021-06-30 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7668226  Revert "[SPARK-33995][SQL] Expose make_interval as a Scala 
function"
7668226 is described below

commit 76682268d746e72f0e8aa4cc64860e0bfd90f1ed
Author: Max Gekk 
AuthorDate: Wed Jun 30 09:26:35 2021 +0300

Revert "[SPARK-33995][SQL] Expose make_interval as a Scala function"

### What changes were proposed in this pull request?
This reverts commit e6753c9402b5c40d9e2af662f28bd4f07a0bae17.

### Why are the changes needed?
The `make_interval` function aims to construct values of the legacy 
interval type `CalendarIntervalType`, which will be superseded by ANSI interval 
types (see SPARK-27790). Since the function has not been released yet, it is 
better not to expose it via the public API at all.

### Does this PR introduce _any_ user-facing change?
It should not, since the `make_interval` function has not been released yet.

### How was this patch tested?
By existing test suites and GA/Jenkins builds.

Closes #33143 from MaxGekk/revert-make_interval.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../scala/org/apache/spark/sql/functions.scala | 25 
 .../apache/spark/sql/JavaDateFunctionsSuite.java   | 68 --
 .../org/apache/spark/sql/DateFunctionsSuite.scala  | 40 -
 3 files changed, 133 deletions(-)
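
For context, a hypothetical sketch (assuming an active SparkSession `spark`) of
how the same value can still be produced after the revert through the SQL
built-in make_interval, which is unaffected by removing the Scala wrapper:

import org.apache.spark.sql.functions.expr

// Build a legacy CalendarIntervalType value via the SQL expression instead of
// the reverted Scala DSL function.
val withInterval = spark.range(1)
  .select(expr("make_interval(0, 0, 2, 3, 0, 0, 0)").as("iv"))
withInterval.show(false)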

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index c446d6b..ecd60ff 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -2929,31 +2929,6 @@ object functions {
   
//
 
   /**
-   * (Scala-specific) Creates a datetime interval
-   *
-   * @param years Number of years
-   * @param months Number of months
-   * @param weeks Number of weeks
-   * @param days Number of days
-   * @param hours Number of hours
-   * @param mins Number of mins
-   * @param secs Number of secs
-   * @return A datetime interval
-   * @group datetime_funcs
-   * @since 3.2.0
-   */
-  def make_interval(
-  years: Column = lit(0),
-  months: Column = lit(0),
-  weeks: Column = lit(0),
-  days: Column = lit(0),
-  hours: Column = lit(0),
-  mins: Column = lit(0),
-  secs: Column = lit(0)): Column = withExpr {
-MakeInterval(years.expr, months.expr, weeks.expr, days.expr, hours.expr, 
mins.expr, secs.expr)
-  }
-
-  /**
* Returns the date that is `numMonths` after `startDate`.
*
* @param startDate A date, timestamp or string. If a string, the data must 
be in a format that
diff --git 
a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java 
b/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java
deleted file mode 100644
index 2d1de77..000
--- 
a/sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java
+++ /dev/null
@@ -1,68 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package test.org.apache.spark.sql;
-
-import org.apache.spark.sql.Column;
-import org.apache.spark.sql.Dataset;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.test.TestSparkSession;
-import org.apache.spark.sql.types.StructType;
-import org.junit.After;
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.Test;
-
-import java.sql.Date;
-import java.util.*;
-
-import static org.apache.spark.sql.types.DataTypes.*;
-import static org.apache.spark.sql.functions.*;
-
-public class JavaDateFunctionsSuite {
-  private transient TestSparkSession spark;
-
-  @Before
-  public void setUp() {
-spark = new TestSparkSession();
-}
-
-  @After
-  public void tearDown() {
-spark.stop();
-spark = null;
-  }
-
-  @Test
-  public void makeIntervalWorksWithJava() {
-Column