[spark] branch master updated (84e7036 -> c891e02)

2020-11-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 84e7036  [SPARK-33510][BUILD] Update SBT to 1.4.4
 add c891e02  Revert "[SPARK-32481][CORE][SQL] Support truncate table to move data to trash"

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/util/Utils.scala   | 25 +--
 .../org/apache/spark/sql/internal/SQLConf.scala| 14 
 .../spark/sql/execution/command/tables.scala   |  4 +-
 .../spark/sql/execution/command/DDLSuite.scala | 78 --
 4 files changed, 2 insertions(+), 119 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (c891e02 -> 60f3a730)

2020-11-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c891e02  Revert "[SPARK-32481][CORE][SQL] Support truncate table to move data to trash"
 add 60f3a730 [SPARK-33515][SQL] Improve exception messages while handling UnresolvedTable

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala  | 10 +-
 .../spark/sql/catalyst/analysis/CheckAnalysis.scala|  2 +-
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  |  4 +++-
 .../apache/spark/sql/catalyst/parser/AstBuilder.scala  | 12 
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 18 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  3 ++-
 .../org/apache/spark/sql/execution/SQLViewSuite.scala  |  8 
 .../apache/spark/sql/hive/execution/HiveDDLSuite.scala |  4 ++--
 8 files changed, 34 insertions(+), 27 deletions(-)





[spark] branch master updated (60f3a730 -> 23e9920)

2020-11-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 60f3a730 [SPARK-33515][SQL] Improve exception messages while handling UnresolvedTable
 add 23e9920  [SPARK-33511][SQL] Respect case sensitivity while resolving V2 partition specs

No new revisions were added by this update.

Summary of changes:
 .../catalyst/analysis/ResolvePartitionSpec.scala   | 27 -
 .../apache/spark/sql/util/PartitioningUtils.scala  | 47 ++
 .../command/AnalyzePartitionCommand.scala  |  2 +-
 .../apache/spark/sql/execution/command/ddl.scala   |  3 +-
 .../spark/sql/execution/command/tables.scala   |  3 +-
 .../execution/datasources/PartitioningUtils.scala  | 26 +---
 .../spark/sql/execution/datasources/rules.scala|  3 +-
 .../connector/AlterTablePartitionV2SQLSuite.scala  | 26 
 8 files changed, 98 insertions(+), 39 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/util/PartitioningUtils.scala





[spark] branch master updated (23e9920 -> f83fcb1)

2020-11-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 23e9920  [SPARK-33511][SQL] Respect case sensitivity while resolving V2 partition specs
 add f83fcb1  [SPARK-33278][SQL][FOLLOWUP] Improve OptimizeWindowFunctions to avoid transfer first to nth_value

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  9 --
 .../optimizer/OptimizeWindowFunctionsSuite.scala   | 33 --
 2 files changed, 36 insertions(+), 6 deletions(-)





[spark] branch master updated: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-23 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1bd897c  [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle
1bd897c is described below

commit 1bd897cbc4fe30eb8b7740c7232aae87081e8e33
Author: Ye Zhou 
AuthorDate: Mon Nov 23 15:16:20 2020 -0600

[SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

### What changes were proposed in this pull request?
This is one of the patches for SPIP SPARK-30602, which is needed for push-based shuffle.
Summary of changes:
This PR introduces a new RPC to be called within the Driver. When the expected shuffle push wait time is reached, the Driver calls this RPC to coordinate the shuffle map/reduce stages and to notify the external shuffle services to finalize the shuffle block merge for a given shuffle. The shuffle services then respond to the caller with the metadata about each merged shuffle partition.
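
The driver-to-shuffle-service call pattern described above can be sketched as follows. This is a simplified, self-contained stand-in: the interface and the synchronous callback are illustrative only, not the real Spark classes (in the patch the RPC is asynchronous and the listener receives a `MergeStatuses` message).

```java
import java.util.Arrays;

public class FinalizeMergeSketch {
  // Simplified stand-in for org.apache.spark.network.shuffle.MergeFinalizerListener.
  // The real API passes a MergeStatuses object; an int[] keeps this sketch self-contained.
  interface MergeFinalizerListener {
    void onShuffleMergeSuccess(int[] mergedPartitionIds);
    void onShuffleMergeFailure(Throwable e);
  }

  // Stand-in for BlockStoreClient.finalizeShuffleMerge: in Spark this sends an RPC
  // to the shuffle service and invokes the listener when the response arrives.
  static void finalizeShuffleMerge(String host, int port, int shuffleId,
                                   MergeFinalizerListener listener) {
    try {
      // Pretend the shuffle service replied with metadata for three merged partitions.
      listener.onShuffleMergeSuccess(new int[] {0, 1, 2});
    } catch (Exception e) {
      listener.onShuffleMergeFailure(e);
    }
  }

  public static void main(String[] args) {
    // The driver supplies a listener and reacts to success or failure of the merge.
    finalizeShuffleMerge("shuffle-host", 7337, 42, new MergeFinalizerListener() {
      @Override public void onShuffleMergeSuccess(int[] partitions) {
        System.out.println("merged partitions: " + Arrays.toString(partitions));
      }
      @Override public void onShuffleMergeFailure(Throwable e) {
        System.out.println("finalize failed: " + e);
      }
    });
  }
}
```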

### Why are the changes needed?
Refer to the SPIP in SPARK-30602.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
These code snippets won't be called by any existing code; they will be tested after the coordinated driver changes get merged in SPARK-32920.


Closes #30163 from zhouyejoe/SPARK-32918.

Lead-authored-by: Ye Zhou 
Co-authored-by: Min Shen 
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../spark/network/shuffle/BlockStoreClient.java| 22 +++
 .../network/shuffle/ExternalBlockStoreClient.java  | 29 +++
 .../network/shuffle/MergeFinalizerListener.java| 43 ++
 3 files changed, 94 insertions(+)

diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
index 37befcd..a6bdc13 100644
--- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
+++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java
@@ -147,6 +147,8 @@ public abstract class BlockStoreClient implements Closeable {
* @param blockIds block ids to be pushed
* @param buffers buffers to be pushed
* @param listener the listener to receive block push status.
+   *
+   * @since 3.1.0
*/
   public void pushBlocks(
   String host,
@@ -156,4 +158,24 @@ public abstract class BlockStoreClient implements Closeable {
   BlockFetchingListener listener) {
 throw new UnsupportedOperationException();
   }
+
+  /**
+   * Invoked by Spark driver to notify external shuffle services to finalize the shuffle merge
+   * for a given shuffle. This allows the driver to start the shuffle reducer stage after properly
+   * finishing the shuffle merge process associated with the shuffle mapper stage.
+   *
+   * @param host host of shuffle server
+   * @param port port of shuffle server.
+   * @param shuffleId shuffle ID of the shuffle to be finalized
+   * @param listener the listener to receive MergeStatuses
+   *
+   * @since 3.1.0
+   */
+  public void finalizeShuffleMerge(
+  String host,
+  int port,
+  int shuffleId,
+  MergeFinalizerListener listener) {
+throw new UnsupportedOperationException();
+  }
 }
diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
index eca35ed..56c06e6 100644
--- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
+++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
@@ -159,6 +159,35 @@ public class ExternalBlockStoreClient extends BlockStoreClient {
   }
 
   @Override
+  public void finalizeShuffleMerge(
+  String host,
+  int port,
+  int shuffleId,
+  MergeFinalizerListener listener) {
+checkInit();
+try {
+  TransportClient client = clientFactory.createClient(host, port);
+  ByteBuffer finalizeShuffleMerge = new FinalizeShuffleMerge(appId, shuffleId).toByteBuffer();
+  client.sendRpc(finalizeShuffleMerge, new RpcResponseCallback() {
+@Override
+public void onSuccess(ByteBuffer response) {
+  listener.onShuffleMergeSuccess(
+(MergeStatuses) BlockTransferMessage.Decoder.fromByteBuffer(response));
+}
+
+@Override
+public void onFailure(Throwable e) {
+  listener.onShuffleMergeFailure(e);
+}
+  });
+} catch (Exceptio

[spark] branch master updated (1bd897c -> 0592181)

2020-11-23 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1bd897c  [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle
 add 0592181  [SPARK-33479][DOC][FOLLOWUP] DocSearch: Support filtering search results by version

No new revisions were added by this update.

Summary of changes:
 docs/_config.yml | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)





[spark] branch master updated (0592181 -> 3ce4ab5)

2020-11-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0592181  [SPARK-33479][DOC][FOLLOWUP] DocSearch: Support filtering search results by version
 add 3ce4ab5  [SPARK-33513][BUILD] Upgrade to Scala 2.13.4 to improve exhaustivity

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/storage/StorageUtils.scala   | 2 +-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala  | 8 
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala  | 2 ++
 .../main/scala/org/apache/spark/ml/feature/RFormulaParser.scala   | 6 +-
 .../main/scala/org/apache/spark/ml/feature/StandardScaler.scala   | 2 ++
 .../scala/org/apache/spark/ml/linalg/JsonMatrixConverter.scala| 2 ++
 .../scala/org/apache/spark/ml/linalg/JsonVectorConverter.scala| 2 ++
 mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala   | 2 ++
 .../org/apache/spark/ml/optim/aggregator/HingeAggregator.scala| 3 +++
 .../org/apache/spark/ml/optim/aggregator/LogisticAggregator.scala | 3 +++
 .../src/main/scala/org/apache/spark/ml/util/Instrumentation.scala | 2 ++
 .../scala/org/apache/spark/mllib/feature/StandardScaler.scala | 2 ++
 mllib/src/main/scala/org/apache/spark/mllib/linalg/BLAS.scala | 2 ++
 mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala  | 2 ++
 .../apache/spark/mllib/linalg/distributed/IndexedRowMatrix.scala  | 4 
 .../org/apache/spark/mllib/linalg/distributed/RowMatrix.scala | 2 ++
 pom.xml   | 2 +-
 .../spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala   | 2 +-
 .../cluster/mesos/MesosFineGrainedSchedulerBackendSuite.scala | 2 +-
 .../apache/spark/sql/catalyst/expressions/jsonExpressions.scala   | 2 +-
 .../org/apache/spark/sql/catalyst/expressions/literals.scala  | 4 +++-
 .../apache/spark/sql/catalyst/expressions/objects/objects.scala   | 2 +-
 .../org/apache/spark/sql/catalyst/json/JsonInferSchema.scala  | 3 +++
 .../apache/spark/sql/catalyst/optimizer/StarSchemaDetection.scala | 6 +++---
 .../org/apache/spark/sql/catalyst/optimizer/expressions.scala | 1 +
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala   | 2 ++
 .../spark/sql/catalyst/plans/logical/basicLogicalOperators.scala  | 2 +-
 .../org/apache/spark/sql/catalyst/util/GenericArrayData.scala | 2 +-
 .../apache/spark/sql/catalyst/planning/ScanOperationSuite.scala   | 5 +
 .../apache/spark/sql/catalyst/util/ArrayDataIndexedSeqSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/execution/SparkSqlParser.scala | 6 +++---
 .../apache/spark/sql/execution/aggregate/BaseAggregateExec.scala  | 2 +-
 .../org/apache/spark/sql/execution/window/WindowExecBase.scala| 6 ++
 .../src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala | 1 +
 .../org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala  | 2 +-
 35 files changed, 77 insertions(+), 23 deletions(-)





[spark] branch master updated (3ce4ab5 -> 8380e00)

2020-11-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3ce4ab5  [SPARK-33513][BUILD] Upgrade to Scala 2.13.4 to improve exhaustivity
 add 8380e00  [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`

No new revisions were added by this update.

Summary of changes:
 .../src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala | 4 +++-
 .../scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala   | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)





[spark] branch branch-3.0 updated: [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`

2020-11-23 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 200417e  [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
200417e is described below

commit 200417e47ac400a48af61a2ce119da0041b93712
Author: Dongjoon Hyun 
AuthorDate: Mon Nov 23 19:35:58 2020 -0800

[SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`

This PR aims to change `InMemoryTable` not to use `Tuple.hashCode` for `BucketTransform`.

SPARK-32168 made `InMemoryTable` handle `BucketTransform` as a hash of a `Tuple`, which depends on the Scala version.
- https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala#L159

**Scala 2.12.10**
```scala
$ bin/scala
Welcome to Scala 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
Type in expressions for evaluation. Or try :help.

scala> (1, 1).hashCode
res0: Int = -2074071657
```

**Scala 2.13.3**
```scala
Welcome to Scala 2.13.3 (OpenJDK 64-Bit Server VM, Java 1.8.0_272).
Type in expressions for evaluation. Or try :help.

scala> (1, 1).hashCode
val res0: Int = -1669302457
```
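
The patch replaces the tuple hash with a field-wise combination that does not depend on the Scala version. A minimal Java sketch of the same computation, using the constants from the diff; `FakeIntegerType` is a hypothetical stand-in for a Catalyst `DataType` with a fixed `hashCode`:

```java
public class BucketHashSketch {
  // Hypothetical stand-in for a Catalyst DataType with a deterministic hashCode.
  static final class FakeIntegerType {
    @Override public int hashCode() { return 23; }
  }

  // Same combination as the patched InMemoryTable line: combine the value's
  // hashCode with the data type's hashCode instead of hashing a Scala Tuple,
  // whose hashCode differs between Scala 2.12 and 2.13.
  static int bucket(Object value, Object dataType, int numBuckets) {
    int valueHashCode = (value == null) ? 0 : value.hashCode();
    return ((valueHashCode + 31 * dataType.hashCode()) & Integer.MAX_VALUE) % numBuckets;
  }

  public static void main(String[] args) {
    FakeIntegerType intType = new FakeIntegerType();
    // Integer.hashCode is the int value itself, so this is stable across Scala versions:
    // (7 + 31 * 23) % 4 = 720 % 4 = 0
    System.out.println(bucket(7, intType, 4));
    // Null values hash to 0: (0 + 31 * 23) % 4 = 713 % 4 = 1
    System.out.println(bucket(null, intType, 4));
  }
}
```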

Yes. This is a correctness issue.

Pass the UT with both Scala 2.12/2.13.

Closes #30477 from dongjoon-hyun/SPARK-33524.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 8380e00419281cd1b1fc5706d23d5231356a3379)
Signed-off-by: Dongjoon Hyun 
---
 .../src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
index 616fc72..98b6a3b 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
@@ -128,7 +128,9 @@ class InMemoryTable(
 ChronoUnit.HOURS.between(Instant.EPOCH, DateTimeUtils.microsToInstant(micros))
 }
   case BucketTransform(numBuckets, ref) =>
-(extractor(ref.fieldNames, schema, row).hashCode() & Integer.MAX_VALUE) % numBuckets
+val (value, dataType) = extractor(ref.fieldNames, schema, row)
+val valueHashCode = if (value == null) 0 else value.hashCode
+((valueHashCode + 31 * dataType.hashCode()) & Integer.MAX_VALUE) % numBuckets
 }
   }
 





[spark] branch master updated (8380e00 -> f35e28f)

2020-11-23 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8380e00  [SPARK-33524][SQL][TESTS] Change `InMemoryTable` not to use Tuple.hashCode for `BucketTransform`
 add f35e28f  [SPARK-33523][SQL][TEST] Add predicate related benchmark to SubExprEliminationBenchmark

No new revisions were added by this update.

Summary of changes:
 .../SubExprEliminationBenchmark-jdk11-results.txt  |  22 +++--
 .../SubExprEliminationBenchmark-results.txt|  22 +++--
 .../execution/SubExprEliminationBenchmark.scala| 106 +++--
 3 files changed, 90 insertions(+), 60 deletions(-)

