[spark] branch branch-3.0 updated: [SPARK-30964][CORE][WEBUI] Accelerate InMemoryStore with a new index

2020-03-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3e26876  [SPARK-30964][CORE][WEBUI] Accelerate InMemoryStore with a 
new index
3e26876 is described below

commit 3e268763c345076801f1ff6f75bab67ecab87e8f
Author: Gengliang Wang 
AuthorDate: Mon Mar 2 15:48:48 2020 +0800

[SPARK-30964][CORE][WEBUI] Accelerate InMemoryStore with a new index

### What changes were proposed in this pull request?

Spark uses the class `InMemoryStore` as the KV storage for the live UI and the
history server (by default, when no LevelDB file path is provided).
In `InMemoryStore`, all the task data of one application is stored in a hashmap
whose key is the task ID and whose value is the task data. This is fine for
getting or deleting an entry with a given task ID.
However, the Spark stage UI always shows all the task data of one stage, and the
current implementation has to scan every value in the hashmap, so the time
complexity is O(numOfTasks).
Also, when there are too many stages (> spark.ui.retainedStages), Spark linearly
scans for the task data of the stages to be deleted as well.

This can be very bad for a large application with many stages and tasks. We can
improve it by allowing the natural key of an entity to have a real parent index,
so that on each lookup with a parent node provided, Spark can first look up all
the natural keys (in our case, the task IDs) and then fetch the data from the
hashmap with those keys.
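
For illustration, a minimal, hypothetical Scala sketch of the parent-index idea
(toy names and types; this is not the actual `InMemoryStore` code):

```scala
import scala.collection.mutable

object ParentIndexSketch {
  // data: natural key (taskId) -> task payload
  private val data = mutable.Map.empty[Long, String]
  // parentIndex: parent key (stageId, attemptId) -> natural keys under that parent
  private val parentIndex = mutable.Map.empty[(Int, Int), mutable.Set[Long]]

  def write(taskId: Long, stage: (Int, Int), payload: String): Unit = {
    data(taskId) = payload
    parentIndex.getOrElseUpdate(stage, mutable.Set.empty[Long]) += taskId
  }

  // O(tasksInStage) instead of O(numOfTasks): look up the natural keys first,
  // then fetch each entry directly from the hashmap.
  def tasksOfStage(stage: (Int, Int)): Seq[String] =
    parentIndex.getOrElse(stage, mutable.Set.empty[Long]).toSeq.flatMap(data.get)

  def removeStage(stage: (Int, Int)): Unit =
    parentIndex.remove(stage).foreach(_.foreach(data.remove))

  def main(args: Array[String]): Unit = {
    write(1L, (0, 0), "task-1"); write(2L, (0, 0), "task-2"); write(3L, (1, 0), "task-3")
    println(tasksOfStage((0, 0)))   // task-1 and task-2 (order may vary)
    removeStage((0, 0))
    println(data.keySet)            // Set(3)
  }
}
```

The actual patch keeps a similar per-parent set of natural keys (the `NaturalKeys`
concurrent-hashset alias shown in the diff below).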

### Why are the changes needed?

The in-memory KV store becomes very slow for large applications. We can improve
it with a new index; depending on the workload, lookups can be 10x, 100x, or even
1000x faster.
This also helps keep the Spark driver stable for large applications.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing unit tests.
Also, I ran a benchmark with the following code:
```scala
val store = new InMemoryStore()
val numberOfTasksPerStage = 1
(0 until 1000).map { sId =>
  (0 until numberOfTasksPerStage).map { taskId =>
    val task = newTaskData(sId * numberOfTasksPerStage + taskId, "SUCCESS", sId)
    store.write(task)
  }
}
val appStatusStore = new AppStatusStore(store)
var start = System.nanoTime()
appStatusStore.taskSummary(2, attemptId, Array(0, 0.25, 0.5, 0.75, 1))
println("task summary run time: " + ((System.nanoTime() - start) / 100))
val stageIds = Seq(1, 11, 66, 88)
val stageKeys = stageIds.map(Array(_, attemptId))
start = System.nanoTime()
store.removeAllByIndexValues(classOf[TaskDataWrapper], TaskIndexNames.STAGE,
  stageKeys.asJavaCollection)
println("clean up tasks run time: " + ((System.nanoTime() - start) / 100))
```

Task summary before the changes: 98642ms
Task summary after the changes: 120ms

Task clean up before the changes: 4900ms
Task clean up after the changes: 4ms

It's 800x faster after the changes in the micro-benchmark.

Closes #27716 from gengliangwang/liveUIStore.

Authored-by: Gengliang Wang 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 6b641430c37e0115bee781fed7360187b988313d)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/util/kvstore/InMemoryStore.java   | 116 +
 .../org/apache/spark/util/kvstore/KVTypeInfo.java  |   7 +-
 .../apache/spark/util/kvstore/LevelDBTypeInfo.java |   5 +-
 .../scala/org/apache/spark/status/storeTypes.scala |   2 +-
 4 files changed, 105 insertions(+), 25 deletions(-)

diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
index b33c538..db08740 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
@@ -163,6 +163,12 @@ public class InMemoryStore implements KVStore {
 }
   }
 
+  /**
+   * An alias class for the type "ConcurrentHashMap<Comparable<Object>, Boolean>", which is used
+   * as a concurrent hashset for storing natural keys and the boolean value doesn't matter.
+   */
+  private static class NaturalKeys extends ConcurrentHashMap<Comparable<Object>, Boolean> {}
+
   private static class InstanceList {
 
 /**
@@ -205,11 +211,19 @@ public class InMemoryStore implements KVStore {
 private final KVTypeInfo ti;
 private final KVTypeInfo.Accessor naturalKey;
private final ConcurrentMap<Comparable<Object>, T> data;
+private final String naturalParentIndexName;
+private final Boolean hasNaturalParentIndex;
+// A mapping from parent to the natu

[spark] branch master updated (6b64143 -> f24a460)

2020-03-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6b64143  [SPARK-30964][CORE][WEBUI] Accelerate InMemoryStore with a 
new index
 add f24a460  [SPARK-30993][SQL] Use its sql type for UDT when checking the 
type of length (fixed/var) or mutable

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/UnsafeRow.java  |  8 +
 .../codegen/GenerateUnsafeRowJoinerSuite.scala | 41 +-
 .../apache/spark/sql/UserDefinedTypeSuite.scala| 37 +++
 3 files changed, 85 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30993][SQL] Use its sql type for UDT when checking the type of length (fixed/var) or mutable

2020-03-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 6c4977d  [SPARK-30993][SQL] Use its sql type for UDT when checking the 
type of length (fixed/var) or mutable
6c4977d is described below

commit 6c4977d38f13628abfa24129ae6844146672d96d
Author: Jungtaek Lim (HeartSaVioR) 
AuthorDate: Mon Mar 2 22:33:11 2020 +0800

[SPARK-30993][SQL] Use its sql type for UDT when checking the type of 
length (fixed/var) or mutable

### What changes were proposed in this pull request?

This patch fixes a bug in UnsafeRow, which fails to handle UDTs specifically in
`isFixedLength` and `isMutable`. These methods don't check the underlying SQL type
of a UDT, so a UDT is always treated as variable-length and non-mutable.

This doesn't cause any issue when a UDT represents a complicated type, but when a
UDT is backed by a fixed-length SQL type, it opens the door to correctness issues,
because this information sometimes decides how the value should be handled.

We got a report on the user mailing list suspecting that mapGroupsWithState
handles UDTs incorrectly, but after some investigation the root cause turned out
to be GenerateUnsafeRowJoiner in the shuffle phase.


https://github.com/apache/spark/blob/0e2ca11d80c3921387d7b077cb64c3a0c06b08d7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala#L32-L43

Here, updating the position should not happen for a fixed-length column, but due
to this bug the value of a UDT whose SQL type is fixed-length gets modified, which
corrupts the value.
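
As a rough illustration of the shape of the bug and of the fix (toy types only,
not Spark's actual `DataType`/`UserDefinedType` classes):

```scala
// A UDT wraps an underlying SQL type; length/mutability checks must be made
// against that underlying type, not against the wrapper itself.
sealed trait ToyType
case object LongType extends ToyType
case object StringType extends ToyType
// A user-defined type backed by some SQL type, e.g. a timestamp-like UDT over LongType.
final case class ToyUdt(sqlType: ToyType) extends ToyType

object LengthCheck {
  // Buggy version: a UDT always falls through to "variable length".
  def isFixedLengthBuggy(dt: ToyType): Boolean = dt match {
    case LongType => true
    case _        => false
  }

  // Fixed version: unwrap the UDT and check its underlying SQL type, as the patch does.
  def isFixedLength(dt: ToyType): Boolean = dt match {
    case ToyUdt(underlying) => isFixedLength(underlying)
    case LongType           => true
    case _                  => false
  }

  def main(args: Array[String]): Unit = {
    val udtOverLong = ToyUdt(LongType)
    println(isFixedLengthBuggy(udtOverLong)) // false -> misclassified as variable-length
    println(isFixedLength(udtOverLong))      // true  -> handled like the fixed-length LongType
  }
}
```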

### Why are the changes needed?

Misclassifying the length type of a UDT can corrupt the value when the row is fed
to GenerateUnsafeRowJoiner, which is a correctness issue.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

New UT added.

Closes #27747 from HeartSaVioR/SPARK-30993.

Authored-by: Jungtaek Lim (HeartSaVioR) 
Signed-off-by: Wenchen Fan 
(cherry picked from commit f24a46011c8cba086193f697d653b6eccd029e8f)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/expressions/UnsafeRow.java  |  8 +
 .../codegen/GenerateUnsafeRowJoinerSuite.scala | 41 +-
 .../apache/spark/sql/UserDefinedTypeSuite.scala| 37 +++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
index 23e7d1f..034894b 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
@@ -95,6 +95,10 @@ public final class UnsafeRow extends InternalRow implements 
Externalizable, Kryo
   }
 
   public static boolean isFixedLength(DataType dt) {
+if (dt instanceof UserDefinedType) {
+  return isFixedLength(((UserDefinedType) dt).sqlType());
+}
+
 if (dt instanceof DecimalType) {
   return ((DecimalType) dt).precision() <= Decimal.MAX_LONG_DIGITS();
 } else {
@@ -103,6 +107,10 @@ public final class UnsafeRow extends InternalRow 
implements Externalizable, Kryo
   }
 
   public static boolean isMutable(DataType dt) {
+if (dt instanceof UserDefinedType) {
+  return isMutable(((UserDefinedType) dt).sqlType());
+}
+
 return mutableFieldTypes.contains(dt) || dt instanceof DecimalType ||
   dt instanceof CalendarIntervalType;
   }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
index 81e2993..fb1ea7b 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoinerSuite.scala
@@ -17,13 +17,15 @@
 
 package org.apache.spark.sql.catalyst.expressions.codegen
 
+import java.time.{LocalDateTime, ZoneOffset}
+
 import scala.util.Random
 
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.RandomDataGenerator
 import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
-import org.apache.spark.sql.catalyst.expressions.{JoinedRow, UnsafeProjection, 
UnsafeRow}
+import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, 
JoinedRow, UnsafeProjection, UnsafeRow}

[spark] branch master updated (f24a460 -> ac12276)

2020-03-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f24a460  [SPARK-30993][SQL] Use its sql type for UDT when checking the 
type of length (fixed/var) or mutable
 add ac12276  [SPARK-30813][ML] Fix Matrices.sprand comments

No new revisions were added by this update.

Summary of changes:
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala | 2 +-
 mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30813][ML] Fix Matrices.sprand comments

2020-03-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d64d6ee  [SPARK-30813][ML] Fix Matrices.sprand comments
d64d6ee is described below

commit d64d6ee8ac2e9a105da63cd5e310f486de42b76c
Author: Wu, Xiaochang 
AuthorDate: Mon Mar 2 08:56:17 2020 -0600

[SPARK-30813][ML] Fix Matrices.sprand comments

### What changes were proposed in this pull request?
Fix mistakes in comments

### Why are the changes needed?
There are mistakes in comments

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27564 from xwu99/fix-mllib-sprand-comment.

Authored-by: Wu, Xiaochang 
Signed-off-by: Sean Owen 
(cherry picked from commit ac122762f5091a7abf62d17595e0f5a99374ac5c)
Signed-off-by: Sean Owen 
---
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala | 2 +-
 mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
index 61d35c8..34e4366 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
@@ -1103,7 +1103,7 @@ object Matrices {
 DenseMatrix.rand(numRows, numCols, rng)
 
   /**
-   * Generate a `SparseMatrix` consisting of `i.i.d.` gaussian random numbers.
+   * Generate a `SparseMatrix` consisting of `i.i.d.` uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param density the desired density for the matrix
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index 83187d6..57edc96 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -1055,7 +1055,7 @@ object Matrices {
 DenseMatrix.rand(numRows, numCols, rng)
 
   /**
-   * Generate a `SparseMatrix` consisting of `i.i.d.` gaussian random numbers.
+   * Generate a `SparseMatrix` consisting of `i.i.d.` uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param density the desired density for the matrix


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-30813][ML] Fix Matrices.sprand comments

2020-03-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new cd8f86a  [SPARK-30813][ML] Fix Matrices.sprand comments
cd8f86a is described below

commit cd8f86aafeb2ee5796476bd2cc14e71851611062
Author: Wu, Xiaochang 
AuthorDate: Mon Mar 2 08:56:17 2020 -0600

[SPARK-30813][ML] Fix Matrices.sprand comments

### What changes were proposed in this pull request?
Fix mistakes in comments

### Why are the changes needed?
There are mistakes in comments

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
N/A

Closes #27564 from xwu99/fix-mllib-sprand-comment.

Authored-by: Wu, Xiaochang 
Signed-off-by: Sean Owen 
(cherry picked from commit ac122762f5091a7abf62d17595e0f5a99374ac5c)
Signed-off-by: Sean Owen 
---
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala | 2 +-
 mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
index 14428c6..5e3c37a 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala
@@ -1103,7 +1103,7 @@ object Matrices {
 DenseMatrix.rand(numRows, numCols, rng)
 
   /**
-   * Generate a `SparseMatrix` consisting of `i.i.d.` gaussian random numbers.
+   * Generate a `SparseMatrix` consisting of `i.i.d.` uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param density the desired density for the matrix
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index e474cfa..0c29198 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -1055,7 +1055,7 @@ object Matrices {
 DenseMatrix.rand(numRows, numCols, rng)
 
   /**
-   * Generate a `SparseMatrix` consisting of `i.i.d.` gaussian random numbers.
+   * Generate a `SparseMatrix` consisting of `i.i.d.` uniform random numbers.
* @param numRows number of rows of the matrix
* @param numCols number of columns of the matrix
* @param density the desired density for the matrix


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (ac12276 -> b517f99)

2020-03-02 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ac12276  [SPARK-30813][ML] Fix Matrices.sprand comments
 add b517f99  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone

No new revisions were added by this update.

Summary of changes:
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 python/pyspark/tests/test_context.py   |   2 -
 python/pyspark/tests/test_taskcontext.py   |   2 -
 13 files changed, 17 insertions(+), 463 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30969][CORE] Remove resource coordination support from Standalone

2020-03-02 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 148262f  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone
148262f is described below

commit 148262f3fbdf7c5da7cd147cf43bf5ebab5f5244
Author: yi.wu 
AuthorDate: Mon Mar 2 11:23:07 2020 -0800

[SPARK-30969][CORE] Remove resource coordination support from Standalone

### What changes were proposed in this pull request?

Remove automatically resource coordination support from Standalone.

### Why are the changes needed?

Resource coordination is mainly designed for the scenario where multiple workers
are launched on the same host. However, that scenario no longer really exists in
today's Spark: Spark can now start multiple executors in a single Worker, whereas
it only allowed one executor per Worker at the very beginning. So launching
multiple workers on the same host no longer helps users, and it's not worth
bringing over a complicated implementation and potential high maintain [...]

### Does this PR introduce any user-facing change?

No, it's a Spark 3.0 feature.

### How was this patch tested?

Pass Jenkins.

Closes #27722 from Ngone51/abandon_coordination.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit b517f991fe0c95a186872d38be6a2091d9326195)
Signed-off-by: Xingbo Jiang 
---
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 11 files changed, 17 insertions(+), 459 deletions(-)

diff --git a/.gitignore b/.gitignore
index 798e8ac..198fdee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -72,7 +72,6 @@ scalastyle-on-compile.generated.xml
 scalastyle-output.xml
 scalastyle.txt
 spark-*-bin-*.tgz
-spark-resources/
 spark-tests.log
 src_managed/
 streaming-tests.log
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 91188d5..bcbb7e4 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -41,7 +41,6 @@ import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat 
=> NewFileInputFor
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil}
-import org.apache.spark.deploy.StandaloneResourceUtils._
 import org.apache.spark.executor.{ExecutorMetrics, ExecutorMetricsSource}
 import org.apache.spark.input.{FixedLengthBinaryInputFormat, 
PortableDataStream, StreamInputFormat, WholeTextFileInputFormat}
 import org.apache.spark.internal.Logging
@@ -250,15 +249,6 @@ class SparkContext(config: SparkConf) extends Logging {
 
   def isLocal: Boolean = Utils.isLocalMaster(_conf)
 
-  private def isClientStandalone: Boolean = {
-val isSparkCluster = master match {
-  case SparkMasterRegex.SPARK_REGEX(_) => true
-  case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, _) => true
-  case _ => false
-}
-deployMode == "client" && isSparkCluster
-  }
-
   /**
* @return true if context is stopped or in the midst of stopping.
*/
@@ -396,17 +386,7 @@ class SparkContext(config: SparkConf) extends Logging {
 _driverLogger = DriverLogger(_conf)
 
 val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE)
-val allResources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
-_resources = {
-  // driver submitted in client mode under Standalone may have conflicting 
resources with
-  // other drivers/workers on this host. We should sync driver's resources 
info into
-  // SPARK_RESOURCES/SPARK_RESOURCES_COORDINATE_DIR/ to avoid collision.
-  if (isClientStandalone) {
-acquireResources(_conf, SPARK_DRIVER_PREFIX, allResources, 
Utils.getProcessId)
-  } else {
-allResources
-  }
-}
+_resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
 logResourceInfo(SPARK_DRIVER_PREFIX, _resources)
 
 // log out spark.app.name in the Spark driver logs
@@ -2019,9 +1999,6 @@ class Spark

[spark] branch master updated (b517f99 -> f0010c8)

2020-03-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b517f99  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone
 add f0010c8  [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala   | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++---
 .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++---
 .../src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +-
 .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +-
 .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala   | 2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests

2020-03-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 8cb23f0  [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests
8cb23f0 is described below

commit 8cb23f0cb5b20b7e49fdd16c52d6451e901d9a7a
Author: Josh Rosen 
AuthorDate: Mon Mar 2 15:20:45 2020 -0800

[SPARK-31003][TESTS] Fix incorrect uses of assume() in tests

### What changes were proposed in this pull request?

This patch fixes several incorrect uses of `assume()` in our tests.

If a call to `assume(condition)` fails then it will cause the test to be 
marked as skipped instead of failed: this feature allows test cases to be 
skipped if certain prerequisites are missing. For example, we use this to skip 
certain tests when running on Windows (or when Python dependencies are 
unavailable).

In contrast, `assert(condition)` will fail the test if the condition 
doesn't hold.

If `assume()` is accidentally substituted for `assert()`, then the resulting test
will be marked as skipped in cases where it should have failed, undermining the
purpose of the test.

This patch fixes several such cases, replacing certain `assume()` calls 
with `assert()`.
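
A minimal ScalaTest sketch of the difference (assuming ScalaTest 3.1+; the
environment-variable name below is made up):

```scala
import org.scalatest.funsuite.AnyFunSuite

class AssumeVsAssertSuite extends AnyFunSuite {

  test("assume() is for environment prerequisites") {
    // If the prerequisite is missing, the test is canceled (reported as skipped), not failed.
    assume(sys.env.contains("SOME_OPTIONAL_TOOL"), "tool not installed; skipping")
    assert(1 + 1 == 2)
  }

  test("assert() is for conditions the test itself must guarantee") {
    val maybeValue: Option[Int] = Some(42)
    // If this were assume(), an unexpected None would silently skip the test
    // instead of failing it, hiding a real regression.
    assert(maybeValue.isDefined)
    assert(maybeValue.get == 42)
  }
}
```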

Credit to ahirreddy for spotting this problem.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #27754 from JoshRosen/fix-assume-vs-assert.

Lead-authored-by: Josh Rosen 
Co-authored-by: Josh Rosen 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit f0010c81e2ef9b8859b39917bb62b48d739a4a22)
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala   | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++---
 .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++---
 .../src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +-
 .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +-
 .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala   | 2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala
index 94e251d..4488902 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala
@@ -106,7 +106,7 @@ class OrderingSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   StructField("a", dataType, nullable = true) ::
 StructField("b", dataType, nullable = true) :: Nil)
 val maybeDataGenerator = RandomDataGenerator.forType(rowType, nullable 
= false)
-assume(maybeDataGenerator.isDefined)
+assert(maybeDataGenerator.isDefined)
 val randGenerator = maybeDataGenerator.get
 val toCatalyst = 
CatalystTypeConverters.createToCatalystConverter(rowType)
 for (_ <- 1 to 50) {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
index cd2c681..8189353 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
@@ -195,7 +195,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils
   }
 
   test("SPARK-1669: cacheTable should be idempotent") {
-assume(!spark.table("testData").logicalPlan.isInstanceOf[InMemoryRelation])
+assert(!spark.table("testData").logicalPlan.isInstanceOf[InMemoryRelation])
 
 spark.catalog.cacheTable("testData")
 assertCached(spark.table("testData"))
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
index 6c824c2..5a67dce 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
@@ -1033,7 +1033,7 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 df.write.insertInto("students")
 spark.catalog.cacheTable("students")
 checkAnswer(spark.table("students"), df)
-assume(spark.catalog.isCache

[spark] branch branch-2.4 updated (cd8f86a -> 0b71b4d)

2020-03-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cd8f86a  [SPARK-30813][ML] Fix Matrices.sprand comments
 add 0b71b4d  [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/OrderingSuite.scala   | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 4 ++--
 .../src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala | 6 +++---
 .../test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala | 2 +-
 .../src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala| 2 +-
 .../scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala| 6 +++---
 .../apache/spark/sql/sources/BucketedReadWithHiveSupportSuite.scala | 2 +-
 .../spark/sql/sources/BucketedWriteWithHiveSupportSuite.scala   | 2 +-
 10 files changed, 15 insertions(+), 15 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (f0010c8 -> 473a28c)

2020-03-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f0010c8  [SPARK-31003][TESTS] Fix incorrect uses of assume() in tests
 add 473a28c  [SPARK-30991] Refactor AQE readers and RDDs

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ShuffledRowRDD.scala   | 142 -
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   2 +-
 .../adaptive/CustomShuffleReaderExec.scala |  81 
 .../execution/adaptive/CustomShuffledRowRDD.scala  | 113 
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 112 
 .../adaptive/OptimizeLocalShuffleReader.scala  |  88 +++--
 .../execution/adaptive/OptimizeSkewedJoin.scala|  72 ++-
 .../adaptive/ReduceNumShufflePartitions.scala  |  49 ++-
 .../adaptive/ShufflePartitionsCoalescer.scala  |  23 ++--
 .../execution/exchange/ShuffleExchangeExec.scala   |  12 +-
 .../ReduceNumShufflePartitionsSuite.scala  |  28 ++--
 .../ShufflePartitionsCoalescerSuite.scala  | 101 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  23 ++--
 13 files changed, 317 insertions(+), 529 deletions(-)
 create mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
 delete mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffledRowRDD.scala
 delete mode 100644 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LocalShuffledRowRDD.scala


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-30991] Refactor AQE readers and RDDs

2020-03-02 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 597  [SPARK-30991] Refactor AQE readers and RDDs
597 is described below

commit 597b5507448980e4fadbad85ffb104808081
Author: maryannxue 
AuthorDate: Mon Mar 2 16:04:00 2020 -0800

[SPARK-30991] Refactor AQE readers and RDDs

### What changes were proposed in this pull request?
This PR combines `CustomShuffledRowRDD` and `LocalShuffledRowRDD` into 
`ShuffledRowRDD`, and creates `CustomShuffleReaderExec` to unify and replace 
all existing AQE readers: `CoalescedShuffleReaderExec`, 
`LocalShuffleReaderExec` and `SkewJoinShuffleReaderExec`.
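
As a rough sketch of the unified-reader idea, the case classes below mirror the
`ShufflePartitionSpec` variants added in this patch, while the handler bodies are
placeholders rather than Spark's actual shuffle-reading code:

```scala
sealed trait ShufflePartitionSpec
case class CoalescedPartitionSpec(startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec
case class PartialReducerPartitionSpec(reducerIndex: Int, startMapIndex: Int, endMapIndex: Int)
  extends ShufflePartitionSpec
case class PartialMapperPartitionSpec(mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int)
  extends ShufflePartitionSpec

object UnifiedReaderSketch {
  // One reader handles all three kinds of partition by matching on the spec.
  def describe(spec: ShufflePartitionSpec): String = spec match {
    case CoalescedPartitionSpec(start, end) =>
      s"read reducers [$start, $end)"                  // coalesced-partitions reader
    case PartialReducerPartitionSpec(r, mStart, mEnd) =>
      s"read reducer $r, map outputs [$mStart, $mEnd)" // skew-join reader
    case PartialMapperPartitionSpec(m, rStart, rEnd) =>
      s"read mapper $m, reducers [$rStart, $rEnd)"     // local-shuffle reader
  }

  def main(args: Array[String]): Unit = {
    Seq(
      CoalescedPartitionSpec(0, 4),
      PartialReducerPartitionSpec(2, 0, 8),
      PartialMapperPartitionSpec(1, 0, 4)
    ).map(describe).foreach(println)
  }
}
```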

### Why are the changes needed?
To reduce code redundancy.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Passed existing UTs.

Closes #27742 from maryannxue/aqe-readers.

Authored-by: maryannxue 
Signed-off-by: gatorsmile 
(cherry picked from commit 473a28c1d032993c7fa515b39f2cb1e3105d65d3)
Signed-off-by: gatorsmile 
---
 .../spark/sql/execution/ShuffledRowRDD.scala   | 142 -
 .../apache/spark/sql/execution/SparkPlanInfo.scala |   2 +-
 .../adaptive/CustomShuffleReaderExec.scala |  81 
 .../execution/adaptive/CustomShuffledRowRDD.scala  | 113 
 .../execution/adaptive/LocalShuffledRowRDD.scala   | 112 
 .../adaptive/OptimizeLocalShuffleReader.scala  |  88 +++--
 .../execution/adaptive/OptimizeSkewedJoin.scala|  72 ++-
 .../adaptive/ReduceNumShufflePartitions.scala  |  49 ++-
 .../adaptive/ShufflePartitionsCoalescer.scala  |  23 ++--
 .../execution/exchange/ShuffleExchangeExec.scala   |  12 +-
 .../ReduceNumShufflePartitionsSuite.scala  |  28 ++--
 .../ShufflePartitionsCoalescerSuite.scala  | 101 ++-
 .../adaptive/AdaptiveQueryExecSuite.scala  |  23 ++--
 13 files changed, 317 insertions(+), 529 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
index 4c19f95..eb02259 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala
@@ -26,17 +26,28 @@ import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLShuffleReadMetricsReporter}
 import org.apache.spark.sql.internal.SQLConf
 
+sealed trait ShufflePartitionSpec
+
+// A partition that reads data of one or more reducers, from `startReducerIndex` (inclusive) to
+// `endReducerIndex` (exclusive).
+case class CoalescedPartitionSpec(
+  startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
+
+// A partition that reads partial data of one reducer, from `startMapIndex` (inclusive) to
+// `endMapIndex` (exclusive).
+case class PartialReducerPartitionSpec(
+  reducerIndex: Int, startMapIndex: Int, endMapIndex: Int) extends ShufflePartitionSpec
+
+// A partition that reads partial data of one mapper, from `startReducerIndex` (inclusive) to
+// `endReducerIndex` (exclusive).
+case class PartialMapperPartitionSpec(
+  mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
+
 /**
- * The [[Partition]] used by [[ShuffledRowRDD]]. A post-shuffle partition
- * (identified by `postShufflePartitionIndex`) contains a range of pre-shuffle partitions
- * (`startPreShufflePartitionIndex` to `endPreShufflePartitionIndex - 1`, inclusive).
+ * The [[Partition]] used by [[ShuffledRowRDD]].
  */
-private final class ShuffledRowRDDPartition(
-    val postShufflePartitionIndex: Int,
-    val startPreShufflePartitionIndex: Int,
-    val endPreShufflePartitionIndex: Int) extends Partition {
-  override val index: Int = postShufflePartitionIndex
-}
+private final case class ShuffledRowRDDPartition(
+  index: Int, spec: ShufflePartitionSpec) extends Partition
 
 /**
  * A dummy partitioner for use with records whose partition ids have been pre-computed (i.e. for
@@ -94,8 +105,7 @@ class CoalescedPartitioner(val parent: Partitioner, val partitionStartIndices: A
  * interfaces / internals.
  *
  * This RDD takes a [[ShuffleDependency]] (`dependency`),
- * and an optional array of partition start indices as input arguments
- * (`specifiedPartitionStartIndices`).
+ * and an array of [[ShufflePartitionSpec]] as input arguments.
 *
 * The `dependency` has the parent RDD of this RDD, which represents the dataset before shuffle
 * (i.e. map output). Elements of this RDD are (partitionId, Row) pairs.
@@ -103,79 +113,97 @@ class CoalescedPartitioner(val parent: Partitioner, val

[spark] branch master updated (473a28c -> 3956e95)

2020-03-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 473a28c  [SPARK-30991] Refactor AQE readers and RDDs
 add 3956e95  [SPARK-25202][SQL][FOLLOW-UP] Keep the old parameter name 
'pattern' at split in Scala API

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/functions.scala  | 20 ++--
 .../scala/org/apache/spark/sql/DataFrameSuite.scala  |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-25202][SQL][FOLLOW-UP] Keep the old parameter name 'pattern' at split in Scala API

2020-03-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new fafa8d8  [SPARK-25202][SQL][FOLLOW-UP] Keep the old parameter name 
'pattern' at split in Scala API
fafa8d8 is described below

commit fafa8d887ded4ca779bcd9f1cef88cc40aded0ae
Author: HyukjinKwon 
AuthorDate: Tue Mar 3 10:24:50 2020 +0900

[SPARK-25202][SQL][FOLLOW-UP] Keep the old parameter name 'pattern' at 
split in Scala API

### What changes were proposed in this pull request?

To address the concern pointed out in 
https://github.com/apache/spark/pull/7. This will make `split` 
source-compatible by removing minimal cosmetic changes.
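
As a quick illustration (a hedged sketch, not part of the patch; the `SplitNamedArgDemo` object and the local SparkSession setup are assumptions for demonstration), keeping the parameter name `pattern` means call sites that pass it as a named argument continue to compile:

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object SplitNamedArgDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
    import spark.implicits._

    val df = Seq("1,2", "4", "7,8,9").toDF("csv")
    // Positional and named-argument forms both work with the restored signature.
    df.select(explode(split($"csv", ","))).show()
    df.select(explode(split($"csv", pattern = ","))).show()

    spark.stop()
  }
}
```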

### Why are the changes needed?

For source compatibility.

### Does this PR introduce any user-facing change?

No (it will prevent potential user-facing change from the original PR)

### How was this patch tested?

A unit test was changed (so that a break in source compatibility can be detected easily).

Closes #27756 from HyukjinKwon/SPARK-25202.

Authored-by: HyukjinKwon 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 3956e95f059ba9599c3cfde29225177d29b2494a)
Signed-off-by: HyukjinKwon 
---
 .../main/scala/org/apache/spark/sql/functions.scala  | 20 ++--
 .../scala/org/apache/spark/sql/DataFrameSuite.scala  |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index c60df14..c6e8cf7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -2460,25 +2460,25 @@ object functions {
   def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
 
   /**
-   * Splits str around matches of the given regex.
+   * Splits str around matches of the given pattern.
    *
    * @param str a string expression to split
-   * @param regex a string representing a regular expression. The regex string should be
-   *              a Java regular expression.
+   * @param pattern a string representing a regular expression. The regex string should be
+   *                a Java regular expression.
    *
    * @group string_funcs
    * @since 1.5.0
    */
-  def split(str: Column, regex: String): Column = withExpr {
-    StringSplit(str.expr, Literal(regex), Literal(-1))
+  def split(str: Column, pattern: String): Column = withExpr {
+    StringSplit(str.expr, Literal(pattern), Literal(-1))
   }
 
   /**
-   * Splits str around matches of the given regex.
+   * Splits str around matches of the given pattern.
    *
    * @param str a string expression to split
-   * @param regex a string representing a regular expression. The regex string should be
-   *              a Java regular expression.
+   * @param pattern a string representing a regular expression. The regex string should be
+   *                a Java regular expression.
    * @param limit an integer expression which controls the number of times the regex is applied.
    *
    *          limit greater than 0: The resulting array's length will not be more than limit,
@@ -2491,8 +2491,8 @@ object functions {
    * @group string_funcs
    * @since 3.0.0
    */
-  def split(str: Column, regex: String, limit: Int): Column = withExpr {
-    StringSplit(str.expr, Literal(regex), Literal(limit))
+  def split(str: Column, pattern: String, limit: Int): Column = withExpr {
+    StringSplit(str.expr, Literal(pattern), Literal(limit))
   }
 
   /**
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
index 42a9073..e74d553 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@@ -196,7 +196,7 @@ class DataFrameSuite extends QueryTest
   test("explode on output of array-valued function") {
     val df = Seq(("1,2"), ("4"), ("7,8,9")).toDF("csv")
     checkAnswer(
-      df.select(explode(split($"csv", ","))),
+      df.select(explode(split($"csv", pattern = ","))),
       Row("1") :: Row("2") :: Row("4") :: Row("7") :: Row("8") :: Row("9") :: Nil)
   }
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3956e95 -> 1fac06c)

2020-03-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3956e95  [SPARK-25202][SQL][FOLLOW-UP] Keep the old parameter name 
'pattern' at split in Scala API
 add 1fac06c  Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift 
server"

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/HiveResult.scala| 61 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  3 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  2 +-
 .../spark/sql/execution/HiveResultSuite.scala  | 25 +
 .../SparkExecuteStatementOperation.scala   | 10 +---
 .../sql/hive/thriftserver/SparkSQLDriver.scala |  5 +-
 .../sql/hive/execution/HiveComparisonTest.scala|  4 +-
 7 files changed, 43 insertions(+), 67 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"

2020-03-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 4fa447c  Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift 
server"
4fa447c is described below

commit 4fa447cabc1c09170edd9da0e586660a7ae0db74
Author: Kent Yao 
AuthorDate: Tue Mar 3 14:21:20 2020 +0800

Revert "[SPARK-30808][SQL] Enable Java 8 time API in Thrift server"

This reverts commit afaeb29599593f021c9ea47e52f8c70013a4afef.

### What changes were proposed in this pull request?

Based on the result and comment from 
https://github.com/apache/spark/pull/27552#discussion_r385531744

In the Hive module, the server side renders datetime values simply with
`value.toString`, and the client side regenerates the results back in
`HiveBaseResultSet` with `java.sql.Date(Timestamp).valueOf`.
There will be an inconsistency between client and server if we use the Java 8 time APIs.
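
A minimal sketch of the mismatch (illustrative only, not part of the patch; the `ThriftDatetimeMismatchDemo` object is an assumption for demonstration): a Java 8 `Instant` rendered with `toString` uses ISO-8601, which `java.sql.Timestamp.valueOf` on the client side cannot parse back.

```
import java.sql.Timestamp
import java.time.Instant

object ThriftDatetimeMismatchDemo {
  def main(args: Array[String]): Unit = {
    // Legacy type: toString and valueOf use the same JDBC escape format,
    // so the value round-trips (e.g. "1970-01-01 00:00:00.0" in UTC).
    val legacy = new Timestamp(0L)
    println(Timestamp.valueOf(legacy.toString))

    // Java 8 type: toString is ISO-8601 ("1970-01-01T00:00:00Z"),
    // which Timestamp.valueOf rejects with IllegalArgumentException.
    val java8 = Instant.ofEpochMilli(0L)
    try Timestamp.valueOf(java8.toString)
    catch { case e: IllegalArgumentException => println(s"client cannot parse: ${e.getMessage}") }
  }
}
```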

### Why are the changes needed?

The benefit of the original change is still unclear.

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

Nah

Closes #27733 from yaooqinn/SPARK-30808.

Authored-by: Kent Yao 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 1fac06c4307c8c7a5a48a50952d48ee5b9ebccb2)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/execution/HiveResult.scala| 61 --
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  3 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  2 +-
 .../spark/sql/execution/HiveResultSuite.scala  | 25 +
 .../SparkExecuteStatementOperation.scala   | 10 +---
 .../sql/hive/thriftserver/SparkSQLDriver.scala |  5 +-
 .../sql/hive/execution/HiveComparisonTest.scala|  4 +-
 7 files changed, 43 insertions(+), 67 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
index b191840..5a2f16d 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
@@ -21,10 +21,9 @@ import java.nio.charset.StandardCharsets
 import java.sql.{Date, Timestamp}
 import java.time.{Instant, LocalDate}
 
-import org.apache.spark.sql.{Dataset, Row}
+import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, TimestampFormatter}
 import org.apache.spark.sql.execution.command.{DescribeCommandBase, ExecutedCommandExec, ShowTablesCommand}
-import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
@@ -37,43 +36,27 @@ object HiveResult {
    * Returns the result as a hive compatible sequence of strings. This is used in tests and
    * `SparkSQLDriver` for CLI applications.
    */
-  def hiveResultString(ds: Dataset[_]): Seq[String] = {
-    val executedPlan = ds.queryExecution.executedPlan
-    executedPlan match {
-      case ExecutedCommandExec(_: DescribeCommandBase) =>
-        // If it is a describe command for a Hive table, we want to have the output format
-        // be similar with Hive.
-        executedPlan.executeCollectPublic().map {
-          case Row(name: String, dataType: String, comment) =>
-            Seq(name, dataType,
-              Option(comment.asInstanceOf[String]).getOrElse(""))
-              .map(s => String.format(s"%-20s", s))
-              .mkString("\t")
-        }
-      // SHOW TABLES in Hive only output table names,
-      // while ours output database, table name, isTemp.
-      case command @ ExecutedCommandExec(s: ShowTablesCommand) if !s.isExtended =>
-        command.executeCollect().map(_.getString(1))
-      case _ =>
-        val sessionWithJava8DatetimeEnabled = {
-          val cloned = ds.sparkSession.cloneSession()
-          cloned.conf.set(SQLConf.DATETIME_JAVA8API_ENABLED.key, true)
-          cloned
-        }
-        sessionWithJava8DatetimeEnabled.withActive {
-          // We cannot collect the original dataset because its encoders could be created
-          // with disabled Java 8 date-time API.
-          val result: Seq[Seq[Any]] = Dataset.ofRows(ds.sparkSession, ds.logicalPlan)
-            .queryExecution
-            .executedPlan
-            .executeCollectPublic().map(_.toSeq).toSeq
-          // We need the types so we can output struct field names
-          val types = executedPlan.output.map(_.dataType)
-          // Reformat to match hive tab delimited output.
-          result.map(_.zip(types).map(e => toHiveString(e)))
-            .map(_.mkString("\t"))
-        }
-    }
+  def hiveResultString(executedPlan: SparkPlan): Seq[