[spark] branch branch-3.0 updated: [SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset

2020-04-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 33d25ba  [SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset
33d25ba is described below

commit 33d25ba30b7565d822d252093c5566f95148a483
Author: yi.wu 
AuthorDate: Fri Apr 17 04:48:27 2020 +

[SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset

### What changes were proposed in this pull request?

Make `UnsafeKVExternalSorter` / `VariableLengthRowBasedKeyValueBatch` also respect `UnsafeAlignedOffset` when reading records, and update some out-of-date comments.

### Why are the changes needed?

Since `BytesToBytesMap` respects `UnsafeAlignedOffset` when writing a record, `UnsafeKVExternalSorter` should also respect `UnsafeAlignedOffset` when reading the record back from `BytesToBytesMap`; otherwise it will cause a data correctness issue.

Unlike `UnsafeKVExternalSorter`, which may read records from `BytesToBytesMap`, `VariableLengthRowBasedKeyValueBatch` writes and reads records by itself. Thus, similar to #22053 and the [comment](https://github.com/apache/spark/pull/22053#issuecomment-411975239) there, the fix for `VariableLengthRowBasedKeyValueBatch` is more likely an improvement for SPARC platform support.
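
As a hedged illustration (not code from this patch; `base` and `recordOffset` are hypothetical stand-ins for a page's base object and a record's starting offset), a reader must parse a `BytesToBytesMap` record with the same UAO size the writer used:

```scala
import org.apache.spark.unsafe.UnsafeAlignedOffset

// Minimal sketch of walking one record, following the layout documented in
// the BytesToBytesMap diff below: two uaoSize-byte length slots, then key
// bytes, then value bytes.
def readRecord(base: AnyRef, recordOffset: Long): (Long, Int, Long, Int) = {
  val uaoSize = UnsafeAlignedOffset.getUaoSize
  val totalLength = UnsafeAlignedOffset.getSize(base, recordOffset)          // len(k) + len(v) + uaoSize
  val keyLength = UnsafeAlignedOffset.getSize(base, recordOffset + uaoSize)  // len(k)
  val keyOffset = recordOffset + 2L * uaoSize                                // key data starts here
  val valueOffset = keyOffset + keyLength                                    // value data follows the key
  val valueLength = totalLength - keyLength - uaoSize                        // len(v)
  (keyOffset, keyLength, valueOffset, valueLength)
}
// Hard-coding 4-byte reads (e.g. Platform.getInt) here would misparse every
// record whenever uaoSize == 8 -- the data correctness issue described above.
```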

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested `HashAggregationQueryWithControlledFallbackSuite` with `UAO_SIZE=8` to simulate the SPARC platform. The tests only pass with this fix.
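
For example, a hedged sketch of how a test might force the 8-byte code path on an x86 machine via the test-only hook added in this patch (the test body itself is elided/assumed):

```scala
import org.apache.spark.unsafe.UnsafeAlignedOffset

UnsafeAlignedOffset.setUaoSize(8) // simulate a SPARC-like platform
try {
  // ... run the aggregation queries under test ...
} finally {
  UnsafeAlignedOffset.setUaoSize(0) // 0 restores the real platform default
}
```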

Closes #28195 from Ngone51/fix_uao.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 40f9dbb6284ca6d5664ec0983faba723bc72d7f1)
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/unsafe/UnsafeAlignedOffset.java   | 14 --
 .../apache/spark/unsafe/map/BytesToBytesMap.java   | 14 +++---
 .../unsafe/sort/UnsafeInMemorySorter.java  |  2 +-
 .../VariableLengthRowBasedKeyValueBatch.java   | 34 ---
 .../sql/execution/UnsafeKVExternalSorter.java  | 20 +
 .../sql/hive/execution/AggregationQuerySuite.scala | 50 --
 6 files changed, 78 insertions(+), 56 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
index 546e878..d399e66 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
@@ -28,12 +28,20 @@ public class UnsafeAlignedOffset {
 
   private static final int UAO_SIZE = Platform.unaligned() ? 4 : 8;
 
+  private static int TEST_UAO_SIZE = 0;
+
+  // used for test only
+  public static void setUaoSize(int size) {
+    assert size == 0 || size == 4 || size == 8;
+    TEST_UAO_SIZE = size;
+  }
+
   public static int getUaoSize() {
-    return UAO_SIZE;
+    return TEST_UAO_SIZE == 0 ? UAO_SIZE : TEST_UAO_SIZE;
   }
 
   public static int getSize(Object object, long offset) {
-    switch (UAO_SIZE) {
+    switch (getUaoSize()) {
       case 4:
         return Platform.getInt(object, offset);
       case 8:
@@ -46,7 +54,7 @@ public class UnsafeAlignedOffset {
   }
 
   public static void putSize(Object object, long offset, int value) {
-    switch (UAO_SIZE) {
+    switch (getUaoSize()) {
       case 4:
         Platform.putInt(object, offset, value);
         break;
diff --git a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
index 7bdd894..b280f7c 100644
--- a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
+++ b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
@@ -54,13 +54,13 @@ import org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter;
  * probably be using sorting instead of hashing for better cache locality.
  *
  * The key and values under the hood are stored together, in the following format:
- *   Bytes 0 to 4: len(k) (key length in bytes) + len(v) (value length in bytes) + 4
- *   Bytes 4 to 8: len(k)
- *   Bytes 8 to 8 + len(k): key data
- *   Bytes 8 + len(k) to 8 + len(k) + len(v): value data
- *   Bytes 8 + len(k) + len(v) to 8 + len(k) + len(v) + 8: pointer to next pair
+ *   First uaoSize bytes: len(k) (key length in bytes) + len(v) (value length in bytes) + uaoSize
+ *   Next uaoSize bytes: len(k)
+ *   Next len(k) bytes: key data
+ *   Next len(v) bytes: value data
+ *   Last 8 bytes: pointer to next pair
  *
- * This means that the first four bytes store the entire record 

[spark] branch master updated: [SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset

2020-04-16 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 40f9dbb  [SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset
40f9dbb is described below

commit 40f9dbb6284ca6d5664ec0983faba723bc72d7f1
Author: yi.wu 
AuthorDate: Fri Apr 17 04:48:27 2020 +

[SPARK-31425][SQL][CORE] UnsafeKVExternalSorter/VariableLengthRowBasedKeyValueBatch should also respect UnsafeAlignedOffset

### What changes were proposed in this pull request?

Make `UnsafeKVExternalSorter` / `VariableLengthRowBasedKeyValueBatch` also respect `UnsafeAlignedOffset` when reading records, and update some out-of-date comments.

### Why are the changes needed?

Since `BytesToBytesMap` respects `UnsafeAlignedOffset` when writing a record, `UnsafeKVExternalSorter` should also respect `UnsafeAlignedOffset` when reading the record back from `BytesToBytesMap`; otherwise it will cause a data correctness issue.

Unlike `UnsafeKVExternalSorter`, which may read records from `BytesToBytesMap`, `VariableLengthRowBasedKeyValueBatch` writes and reads records by itself. Thus, similar to #22053 and the [comment](https://github.com/apache/spark/pull/22053#issuecomment-411975239) there, the fix for `VariableLengthRowBasedKeyValueBatch` is more likely an improvement for SPARC platform support.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested `HashAggregationQueryWithControlledFallbackSuite` with `UAO_SIZE=8` to simulate the SPARC platform. The tests only pass with this fix.

Closes #28195 from Ngone51/fix_uao.

Authored-by: yi.wu 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/unsafe/UnsafeAlignedOffset.java   | 14 --
 .../apache/spark/unsafe/map/BytesToBytesMap.java   | 14 +++---
 .../unsafe/sort/UnsafeInMemorySorter.java  |  2 +-
 .../VariableLengthRowBasedKeyValueBatch.java   | 34 ---
 .../sql/execution/UnsafeKVExternalSorter.java  | 20 +
 .../sql/hive/execution/AggregationQuerySuite.scala | 50 --
 6 files changed, 78 insertions(+), 56 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
index 546e878..d399e66 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/UnsafeAlignedOffset.java
@@ -28,12 +28,20 @@ public class UnsafeAlignedOffset {
 
   private static final int UAO_SIZE = Platform.unaligned() ? 4 : 8;
 
+  private static int TEST_UAO_SIZE = 0;
+
+  // used for test only
+  public static void setUaoSize(int size) {
+    assert size == 0 || size == 4 || size == 8;
+    TEST_UAO_SIZE = size;
+  }
+
   public static int getUaoSize() {
-    return UAO_SIZE;
+    return TEST_UAO_SIZE == 0 ? UAO_SIZE : TEST_UAO_SIZE;
   }
 
   public static int getSize(Object object, long offset) {
-    switch (UAO_SIZE) {
+    switch (getUaoSize()) {
       case 4:
         return Platform.getInt(object, offset);
       case 8:
@@ -46,7 +54,7 @@ public class UnsafeAlignedOffset {
   }
 
   public static void putSize(Object object, long offset, int value) {
-    switch (UAO_SIZE) {
+    switch (getUaoSize()) {
       case 4:
         Platform.putInt(object, offset, value);
         break;
diff --git a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
index a57cd3b..64c240c 100644
--- a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
+++ b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
@@ -54,13 +54,13 @@ import org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter;
 * probably be using sorting instead of hashing for better cache locality.
 *
 * The key and values under the hood are stored together, in the following format:
- *   Bytes 0 to 4: len(k) (key length in bytes) + len(v) (value length in bytes) + 4
- *   Bytes 4 to 8: len(k)
- *   Bytes 8 to 8 + len(k): key data
- *   Bytes 8 + len(k) to 8 + len(k) + len(v): value data
- *   Bytes 8 + len(k) + len(v) to 8 + len(k) + len(v) + 8: pointer to next pair
+ *   First uaoSize bytes: len(k) (key length in bytes) + len(v) (value length in bytes) + uaoSize
+ *   Next uaoSize bytes: len(k)
+ *   Next len(k) bytes: key data
+ *   Next len(v) bytes: value data
+ *   Last 8 bytes: pointer to next pair
 *
- * This means that the first four bytes store the entire record (key + value) length. This format
+ * It means first uaoSize bytes store the entire record (key + value + 

[spark] branch master updated (fab4ca5 -> b2e9e17)

2020-04-16 Thread jiangxb1987
This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fab4ca5  [SPARK-31450][SQL] Make ExpressionEncoder thread-safe
 add b2e9e17  [SPARK-31344][CORE] Polish implementation of barrier() and allGather()

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 114 +
 .../org/apache/spark/BarrierTaskContext.scala  |  59 ++-
 .../org/apache/spark/api/python/PythonRunner.scala |  17 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |   1 -
 python/pyspark/taskcontext.py  |  15 ++-
 5 files changed, 46 insertions(+), 160 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.0 updated: [SPARK-31450][SQL] Make ExpressionEncoder thread-safe

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e7fef70  [SPARK-31450][SQL] Make ExpressionEncoder thread-safe
e7fef70 is described below

commit e7fef70fbbea08a38316abdaa9445123bb8c39e2
Author: herman 
AuthorDate: Thu Apr 16 18:47:46 2020 -0700

[SPARK-31450][SQL] Make ExpressionEncoder thread-safe

### What changes were proposed in this pull request?
This PR moves the `ExpressionEncoder.toRow` and `ExpressionEncoder.fromRow` functions into their own function objects (`ExpressionEncoder.Serializer` & `ExpressionEncoder.Deserializer`). This effectively makes `ExpressionEncoder` stateless, thread-safe and (more) reusable. The function objects themselves are not thread-safe, but this is documented, and they should be used in a more limited scope (making it easier to reason about thread safety).
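
Illustratively, call sites change as in the hedged sketch below (`schema` is an assumed `StructType` and `row` an assumed `Row`; the pattern matches the `ImageFileFormat` diff further down):

```scala
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// Before: conversion went through the encoder itself, which carried
// mutable conversion state.
//   val unsafeRow = RowEncoder(schema).toRow(row)

// After: create a serializer function object (e.g. one per thread) and
// reuse it; the encoder itself stays stateless.
val toRow = RowEncoder(schema).createSerializer()
val unsafeRow = toRow(row)
```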

### Why are the changes needed?
ExpressionEncoders are not thread-safe. We had various (nasty) bugs because 
of this.
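
A hedged sketch of the hazard (a hypothetical repro against the pre-patch API; the real bugs surfaced as corrupted or wrong rows):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val schema = StructType(Seq(StructField("i", IntegerType)))
val shared = RowEncoder(schema)
// Before this patch, concurrent shared.toRow calls raced on the encoder's
// internal mutable conversion state.
val threads = (1 to 4).map { n =>
  new Thread(() => (1 to 1000).foreach(_ => shared.toRow(Row(n))))
}
threads.foreach(_.start())
threads.foreach(_.join())
// After this patch, each thread would instead call
// shared.createSerializer() and keep its own Serializer.
```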

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #28223 from hvanhovell/SPARK-31450.

Authored-by: herman 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit fab4ca5156d5e1cc0e976c7c27b28a12fa61eb6d)
Signed-off-by: Dongjoon Hyun 
---
 .../spark/ml/source/image/ImageFileFormat.scala|   4 +-
 .../spark/ml/source/libsvm/LibSVMRelation.scala|   4 +-
 .../mllib/linalg/UDTSerializationBenchmark.scala   |  10 +-
 .../main/scala/org/apache/spark/sql/Encoder.scala  |   9 +-
 .../sql/catalyst/encoders/ExpressionEncoder.scala  | 102 ++---
 .../spark/sql/catalyst/expressions/ScalaUDF.scala  |   4 +-
 .../scala/org/apache/spark/sql/HashBenchmark.scala |   4 +-
 .../spark/sql/UnsafeProjectionBenchmark.scala  |   4 +-
 .../catalyst/encoders/EncoderResolutionSuite.scala |  28 --
 .../catalyst/encoders/ExpressionEncoderSuite.scala |  22 ++---
 .../sql/catalyst/encoders/RowEncoderSuite.scala|  49 ++
 .../expressions/HashExpressionsSuite.scala |   6 +-
 .../expressions/ObjectExpressionsSuite.scala   |   4 +-
 .../codegen/GenerateUnsafeRowJoinerSuite.scala |   4 +-
 .../catalyst/util/ArrayDataIndexedSeqSuite.scala   |   4 +-
 .../spark/sql/catalyst/util/UnsafeArraySuite.scala |  74 +--
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  13 ++-
 .../scala/org/apache/spark/sql/SparkSession.scala  |   9 +-
 .../spark/sql/execution/SparkStrategies.scala  |   3 +-
 .../spark/sql/execution/aggregate/udaf.scala   |  15 +--
 .../execution/datasources/DataSourceStrategy.scala |   4 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |   4 +-
 .../datasources/v2/DescribeNamespaceExec.scala |   6 +-
 .../datasources/v2/DescribeTableExec.scala |   6 +-
 .../datasources/v2/ShowCurrentNamespaceExec.scala  |  11 ++-
 .../datasources/v2/ShowNamespacesExec.scala|   6 +-
 .../datasources/v2/ShowTablePropertiesExec.scala   |   6 +-
 .../execution/datasources/v2/ShowTablesExec.scala  |  12 +--
 .../continuous/ContinuousTextSocketSource.scala|   3 +-
 .../spark/sql/execution/streaming/memory.scala |   8 +-
 .../streaming/sources/ContinuousMemoryStream.scala |   2 +-
 .../streaming/sources/ForeachBatchSink.scala   |   3 +-
 .../streaming/sources/ForeachWriterTable.scala |   2 +-
 .../sql/execution/streaming/sources/memory.scala   |   4 +-
 .../apache/spark/sql/internal/CatalogImpl.scala|   3 +-
 .../spark/sql/execution/GroupedIteratorSuite.scala |  15 ++-
 .../benchmark/UnsafeArrayDataBenchmark.scala   |  29 +++---
 .../binaryfile/BinaryFileFormatSuite.scala |   2 +-
 .../apache/spark/sql/streaming/StreamTest.scala|  22 +++--
 39 files changed, 282 insertions(+), 238 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala b/mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala
index c332144..4944e0c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala
@@ -91,8 +91,8 @@ private[image] class ImageFileFormat extends FileFormat with DataSourceRegister
       if (requiredSchema.isEmpty) {
         filteredResult.map(_ => emptyUnsafeRow)
       } else {
-        val converter = RowEncoder(requiredSchema)
-        filteredResult.map(row => converter.toRow(row))
+        val toRow = RowEncoder(requiredSchema).createSerializer()
+        filteredResult.map(row => toRow(row))
       }
     }
   }
diff --git a/mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala 

[spark] branch master updated (8608189 -> fab4ca5)

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8608189  [SPARK-31446][WEBUI] Make html elements for a paged table possible to have different id attribute
 add fab4ca5  [SPARK-31450][SQL] Make ExpressionEncoder thread-safe

No new revisions were added by this update.

Summary of changes:
 .../spark/ml/source/image/ImageFileFormat.scala|   4 +-
 .../spark/ml/source/libsvm/LibSVMRelation.scala|   4 +-
 .../mllib/linalg/UDTSerializationBenchmark.scala   |  10 +-
 .../main/scala/org/apache/spark/sql/Encoder.scala  |   9 +-
 .../sql/catalyst/encoders/ExpressionEncoder.scala  | 102 ++---
 .../spark/sql/catalyst/expressions/ScalaUDF.scala  |   4 +-
 .../scala/org/apache/spark/sql/HashBenchmark.scala |   4 +-
 .../spark/sql/UnsafeProjectionBenchmark.scala  |   4 +-
 .../catalyst/encoders/EncoderResolutionSuite.scala |  28 --
 .../catalyst/encoders/ExpressionEncoderSuite.scala |  22 ++---
 .../sql/catalyst/encoders/RowEncoderSuite.scala|  49 ++
 .../expressions/HashExpressionsSuite.scala |   6 +-
 .../expressions/ObjectExpressionsSuite.scala   |   4 +-
 .../codegen/GenerateUnsafeRowJoinerSuite.scala |   4 +-
 .../catalyst/util/ArrayDataIndexedSeqSuite.scala   |   4 +-
 .../spark/sql/catalyst/util/UnsafeArraySuite.scala |  74 +--
 .../main/scala/org/apache/spark/sql/Dataset.scala  |  13 ++-
 .../scala/org/apache/spark/sql/SparkSession.scala  |   9 +-
 .../spark/sql/execution/SparkStrategies.scala  |   3 +-
 .../spark/sql/execution/aggregate/udaf.scala   |  15 +--
 .../execution/datasources/DataSourceStrategy.scala |   4 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |   4 +-
 .../datasources/v2/DescribeNamespaceExec.scala |   6 +-
 .../datasources/v2/DescribeTableExec.scala |   6 +-
 .../datasources/v2/ShowCurrentNamespaceExec.scala  |  11 ++-
 .../datasources/v2/ShowNamespacesExec.scala|   6 +-
 .../datasources/v2/ShowTablePropertiesExec.scala   |   6 +-
 .../execution/datasources/v2/ShowTablesExec.scala  |  12 +--
 .../continuous/ContinuousTextSocketSource.scala|   3 +-
 .../spark/sql/execution/streaming/memory.scala |   8 +-
 .../streaming/sources/ContinuousMemoryStream.scala |   2 +-
 .../streaming/sources/ForeachBatchSink.scala   |   3 +-
 .../streaming/sources/ForeachWriterTable.scala |   2 +-
 .../sql/execution/streaming/sources/memory.scala   |   4 +-
 .../apache/spark/sql/internal/CatalogImpl.scala|   3 +-
 .../spark/sql/execution/GroupedIteratorSuite.scala |  15 ++-
 .../benchmark/UnsafeArrayDataBenchmark.scala   |  29 +++---
 .../binaryfile/BinaryFileFormatSuite.scala |   2 +-
 .../apache/spark/sql/streaming/StreamTest.scala|  22 +++--
 39 files changed, 282 insertions(+), 238 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-31446][WEBUI] Make html elements for a paged table possible to have different id attribute

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8608189  [SPARK-31446][WEBUI] Make html elements for a paged table possible to have different id attribute
8608189 is described below

commit 86081893354a91ecf6226b82aaf5768d9e74c138
Author: Kousuke Saruta 
AuthorDate: Thu Apr 16 16:24:11 2020 -0700

[SPARK-31446][WEBUI] Make html elements for a paged table possible to have different id attribute

### What changes were proposed in this pull request?

This PR makes each id attribute for page navigations in a page unique.

`PagedTable#pageNavigation` returns HTML elements representing a page navigation for a paged table.
In the current implementation, the method generates an id that is used as the id attribute for a set of elements in the page navigation.
But some pages have two page navigations, so there are two sets of elements whose corresponding elements share the same id.
For example, there are two `form-completedJob-table-page` ids in JobsPage.
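
A small self-contained sketch of the id collision (the `form-<id>-page` naming is taken from the example above; the `-top`/`-bottom` suffixes come from the diff below):

```scala
val tableId = "completedJob-table"
val before = Seq(s"form-$tableId-page", s"form-$tableId-page") // two navigations, same id
val after = Seq(s"form-$tableId-top-page", s"form-$tableId-bottom-page")
assert(before.distinct.size < before.size) // duplicate ids in one page
assert(after.distinct.size == after.size)  // every id is now unique
```
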
### Why are the changes needed?

Each id attribute should be unique in a page.
The following is a screenshot of the warning messages shown by Chrome when I visit JobsPage (Firefox doesn't show them in my environment):
https://user-images.githubusercontent.com/4736016/79261523-f3fa9280-7eca-11ea-861d-d54f04f1b0bc.png

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I added a test case for the extended `pageNavigation`.
I also manually confirmed that there were no id-uniqueness warning messages on JobsPage and JobPage.

Closes #28217 from sarutak/unique-form-id.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/ui/PagedTable.scala | 19 --
 .../org/apache/spark/ui/PagedTableSuite.scala  | 29 ++
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
index 06b64c1..008dcc6 100644
--- a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
+++ b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
@@ -115,17 +115,18 @@ private[spark] trait PagedTable[T] {
         _dataSource.pageSize
       }
 
-      val pageNavi = pageNavigation(pageToShow, pageSize, totalPages)
+      val pageNaviTop = pageNavigation(pageToShow, pageSize, totalPages, tableId + "-top")
+      val pageNaviBottom = pageNavigation(pageToShow, pageSize, totalPages, tableId + "-bottom")
 
-        {pageNavi}
+        {pageNaviTop}
 
           {headers}
 
             {data.map(row)}
 
-        {pageNavi}
+        {pageNaviBottom}
 
     } catch {
       case e: IndexOutOfBoundsException =>
@@ -171,7 +172,11 @@ private[spark] trait PagedTable[T] {
    * > means jumping to the next page.
    * }}}
    */
-  private[ui] def pageNavigation(page: Int, pageSize: Int, totalPages: Int): Seq[Node] = {
+  private[ui] def pageNavigation(
+      page: Int,
+      pageSize: Int,
+      totalPages: Int,
+      navigationId: String = tableId): Seq[Node] = {
     // A group includes all page numbers will be shown in the page navigation.
     // The size of group is 10 means there are 10 page numbers will be shown.
     // The first group is 1 to 10, the second is 2 to 20, and so on
@@ -214,7 +219,7 @@ private[spark] trait PagedTable[T] {
 
 
-        {totalPages} Pages. Jump to
 
       . Show 
 
diff --git a/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala b/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
index d18f554..4125436 100644
--- a/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
@@ -85,6 +85,35 @@ class PagedTableSuite extends SparkFunSuite {
     assert((pagedTable.pageNavigation(93, 10, 97).head \\ "li").map(_.text.trim) ===
       Seq("<<", "<") ++ (91 to 97).map(_.toString) ++ Seq(">"))
   }
+
+  test("pageNavigation with different id") {
+    val pagedTable = new PagedTable[Int] {
+      override def tableId: String = "testTable"
+
+      override def tableCssClass: String = ""
+
+      override def dataSource: PagedDataSource[Int] = null
+
+      override def pageLink(page: Int): String = ""
+
+      override def headers: Seq[Node] = Nil
+
+      override def row(t: Int): Seq[Node] = Nil
+
+      override def pageSizeFormField: String = ""
+
+      override def pageNumberFormField: String = ""
+
+      override def goButtonFormPath: String = ""
+    }
+
+    val defaultIdNavigation = pagedTable.pageNavigation(1, 10, 2).head \\ "form"
+

[spark] branch master updated: [SPARK-31446][WEBUI] Make html elements for a paged table possible to have different id attribute

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8608189  [SPARK-31446][WEBUI] Make html elements for a paged table 
possible to have different id attribute
8608189 is described below

commit 86081893354a91ecf6226b82aaf5768d9e74c138
Author: Kousuke Saruta 
AuthorDate: Thu Apr 16 16:24:11 2020 -0700

[SPARK-31446][WEBUI] Make html elements for a paged table possible to have 
different id attribute

### What changes were proposed in this pull request?

This PR makes each id attribute for page navigations in a page unique.

`PagedTable#pageNavigation` returns HTML elements representing a page 
navigation for a paged table.
In the current implementation, the method generates an id and it's used for 
id attribute for a set of elements for the page navigation.
But some pages have two page navigations so there are two set of elements 
where corresponding elements have the same id.
For example, there are two `form-completedJob-table-page` id in JobsPage.
### Why are the changes needed?

Each id attribute should be unique in a page.
The following is a screenshot of warning messages shown with Chrome when I 
visit JobsPage (Firefox doesn't show in my environment).
https://user-images.githubusercontent.com/4736016/79261523-f3fa9280-7eca-11ea-861d-d54f04f1b0bc.png;>

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I added a test case for the extended `pageNavigation`.
I also manually tested that there were no warning messages about id 
uniqueness in JobsPage and JobPage.

Closes #28217 from sarutak/unique-form-id.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
---
 .../scala/org/apache/spark/ui/PagedTable.scala | 19 --
 .../org/apache/spark/ui/PagedTableSuite.scala  | 29 ++
 2 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala 
b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
index 06b64c1..008dcc6 100644
--- a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
+++ b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala
@@ -115,17 +115,18 @@ private[spark] trait PagedTable[T] {
 _dataSource.pageSize
   }
 
-  val pageNavi = pageNavigation(pageToShow, pageSize, totalPages)
+  val pageNaviTop = pageNavigation(pageToShow, pageSize, totalPages, 
tableId + "-top")
+  val pageNaviBottom = pageNavigation(pageToShow, pageSize, totalPages, 
tableId + "-bottom")
 
   
-{pageNavi}
+{pageNaviTop}
 
   {headers}
   
 {data.map(row)}
   
 
-{pageNavi}
+{pageNaviBottom}
   
 } catch {
   case e: IndexOutOfBoundsException =>
@@ -171,7 +172,11 @@ private[spark] trait PagedTable[T] {
* > means jumping to the next page.
* }}}
*/
-  private[ui] def pageNavigation(page: Int, pageSize: Int, totalPages: Int): 
Seq[Node] = {
+  private[ui] def pageNavigation(
+  page: Int,
+  pageSize: Int,
+  totalPages: Int,
+  navigationId: String = tableId): Seq[Node] = {
 // A group includes all page numbers will be shown in the page navigation.
 // The size of group is 10 means there are 10 page numbers will be shown.
 // The first group is 1 to 10, the second is 2 to 20, and so on
@@ -214,7 +219,7 @@ private[spark] trait PagedTable[T] {
 
 
   
-{totalPages} Pages. Jump to
   
 
   . Show 
   
diff --git a/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala 
b/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
index d18f554..4125436 100644
--- a/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/PagedTableSuite.scala
@@ -85,6 +85,35 @@ class PagedTableSuite extends SparkFunSuite {
 assert((pagedTable.pageNavigation(93, 10, 97).head \\ 
"li").map(_.text.trim) ===
   Seq("<<", "<") ++ (91 to 97).map(_.toString) ++ Seq(">"))
   }
+
+  test("pageNavigation with different id") {
+val pagedTable = new PagedTable[Int] {
+  override def tableId: String = "testTable"
+
+  override def tableCssClass: String = ""
+
+  override def dataSource: PagedDataSource[Int] = null
+
+  override def pageLink(page: Int): String = ""
+
+  override def headers: Seq[Node] = Nil
+
+  override def row(t: Int): Seq[Node] = Nil
+
+  override def pageSizeFormField: String = ""
+
+  override def pageNumberFormField: String = ""
+
+  override def goButtonFormPath: String = ""
+}
+
+val defaultIdNavigation = pagedTable.pageNavigation(1, 10, 2).head \\ 
"form"
+

[spark] branch master updated: [SPARK-29905][K8S] Improve pod lifecycle manager behavior with dynamic allocation

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b8ccd75  [SPARK-29905][K8S] Improve pod lifecycle manager behavior 
with dynamic allocation
b8ccd75 is described below

commit b8ccd755244d3cd8a81a9f4a1eafa2a4e48759d2
Author: Marcelo Vanzin 
AuthorDate: Thu Apr 16 14:15:10 2020 -0700

[SPARK-29905][K8S] Improve pod lifecycle manager behavior with dynamic 
allocation

This issue mainly shows up when dynamic allocation is enabled: because
there are many executor state changes (executors being requested,
starting to run, and later stopping), the lifecycle manager class could
end up logging information about the same executor multiple times, since
the different events would cause the same executor update to be present
in multiple pod snapshots. On top of that, it could end up making
multiple redundant calls into the API server for the same pod.

Another issue arose when the config was set not to delete executor
pods; with dynamic allocation, that means pods keep accumulating
in the API server, and every time the polling source performed a full
sync, all executors, even the finished ones that Spark technically
no longer cares about, would be processed.

The change modifies the lifecycle monitor so that it:

- logs executor updates a single time, even if they show up in
  multiple snapshots, by checking whether the state change
  happened before (see the sketch after this list).
- marks finished-but-not-deleted-in-k8s executors with a label
  so that they can be easily filtered out.
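
A minimal sketch of the deduplication idea, using hypothetical types rather 
than the real `ExecutorPodsLifecycleManager` API: remember the last state 
handled per executor and act only on actual transitions.

```
import scala.collection.mutable

object LifecycleDedupSketch {
  sealed trait ExecState
  case object PodRunning extends ExecState
  case object PodSucceeded extends ExecState

  // Last state handled per executor id; a repeated state arriving in a
  // later snapshot is skipped instead of being logged/processed again.
  private val lastState = mutable.Map.empty[Long, ExecState]

  def onExecutorUpdate(execId: Long, state: ExecState): Unit = {
    if (!lastState.get(execId).contains(state)) {
      lastState(execId) = state
      println(s"executor $execId -> $state") // handled exactly once
    }
  }

  def main(args: Array[String]): Unit = {
    onExecutorUpdate(1L, PodRunning)
    onExecutorUpdate(1L, PodRunning)   // same update in a second snapshot: skipped
    onExecutorUpdate(1L, PodSucceeded)
  }
}
```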

This reduces the amount of logging done by the lifecycle manager,
which is a minor thing in general since the logs are at debug level.
But it also reduces the amount of data that needs to be fetched
from the API server under certain configurations, and overall
reduces interaction with the API server when dynamic allocation is on.

There's also a change in the snapshot store to ensure that the
same subscriber is not called concurrently. The previous behavior was
arguably a bug, since it meant subscribers could process snapshots out
of order, or even block multiple threads (e.g. the allocator
callback was synchronized). I actually ran into the "concurrent calls"
situation in the lifecycle manager during testing, and while it did not
seem to cause problems, it did make for some head scratching while
looking at the logs. It seemed safer to fix that.
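
The "no concurrent calls to the same subscriber" part can be sketched as 
funneling each subscriber through its own single-threaded executor (a 
deliberate simplification of the snapshot store, with hypothetical names):

```
import java.util.concurrent.Executors

object SerializedSubscriberSketch {
  // One single-threaded executor per subscriber: snapshots are processed
  // strictly in submission order and never concurrently.
  private val subscriberExec = Executors.newSingleThreadExecutor()

  def notifySubscriber(snapshot: String): Unit = {
    subscriberExec.submit(new Runnable {
      override def run(): Unit = println(s"processing snapshot $snapshot")
    })
  }

  def main(args: Array[String]): Unit = {
    Seq("snap-1", "snap-2", "snap-3").foreach(notifySubscriber)
    subscriberExec.shutdown()
  }
}
```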

Unit tests were updated to check for the changes. Also tested in a real
cluster with dynamic allocation on.

Closes #26535 from vanzin/SPARK-29905.

Lead-authored-by: Marcelo Vanzin 
Co-authored-by: Marcelo Vanzin 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/deploy/k8s/Constants.scala|   1 +
 .../cluster/k8s/ExecutorPodsAllocator.scala|   2 +-
 .../cluster/k8s/ExecutorPodsLifecycleManager.scala | 156 ++---
 .../k8s/ExecutorPodsPollingSnapshotSource.scala|   1 +
 .../k8s/ExecutorPodsSnapshotsStoreImpl.scala   |  87 +---
 .../k8s/ExecutorPodsLifecycleManagerSuite.scala|  12 +-
 .../ExecutorPodsPollingSnapshotSourceSuite.scala   |   8 +-
 7 files changed, 188 insertions(+), 79 deletions(-)

diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
index a3c74ff7..759c205 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala
@@ -24,6 +24,7 @@ private[spark] object Constants {
   val SPARK_ROLE_LABEL = "spark-role"
   val SPARK_POD_DRIVER_ROLE = "driver"
   val SPARK_POD_EXECUTOR_ROLE = "executor"
+  val SPARK_EXECUTOR_INACTIVE_LABEL = "spark-exec-inactive"
 
   // Credentials secrets
   val DRIVER_CREDENTIALS_SECRETS_BASE_DIR =
diff --git 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
index b394f35..b6ea1fa 100644
--- 
a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
+++ 
b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
@@ -94,7 +94,7 @@ private[spark] class ExecutorPodsAllocator(
 
   private def onNewSnapshots(
   applicationId: String,
-  snapshots: Seq[ExecutorPodsSnapshot]): Unit = synchronized {
+  snapshots: 

[spark] branch branch-3.0 updated: [SPARK-31462][INFRA] The usage of getopts and case statement is wrong in do-release.sh

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 9690d9f  [SPARK-31462][INFRA] The usage of getopts and case statement 
is wrong in do-release.sh
9690d9f is described below

commit 9690d9feb090f6b72fc36d120f1baf3dc3de367e
Author: Kousuke Saruta 
AuthorDate: Thu Apr 16 12:54:10 2020 -0700

[SPARK-31462][INFRA] The usage of getopts and case statement is wrong in 
do-release.sh

### What changes were proposed in this pull request?

This PR (SPARK-31462) fixes the usage of getopts and the case statement in 
`do-release.sh` and `do-release-docker.sh`.

### Why are the changes needed?

In the current master, do-release.sh contains the following code.
```
while getopts "bn" opt; do
  case $opt in
b) GIT_BRANCH=$OPTARG ;;
n) DRY_RUN=1 ;;
?) error "Invalid option: $OPTARG" ;;
  esac
done
```
There are 3 wrong usages of getopts and the case statement:
1. To set $OPTARG to the argument passed for the option "b", the parameter 
for getopts should be "b:".
2. To set $OPTARG to the invalid option name that was passed, the parameter for 
getopts should start with ":".
3. It's minor, but to match the literal character "?", it's better to escape it 
like "\\?".

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I checked that $GIT_BRANCH is set when do-release.sh is launched with the -b 
option.
I also checked that the error message contains the invalid option name when 
do-release.sh is launched with an invalid option.

Closes #28234 from sarutak/fix-do-release.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 2921be7a4e17b45591c25873a55fb286599b0fd7)
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/do-release-docker.sh | 4 ++--
 dev/create-release/do-release.sh| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/create-release/do-release-docker.sh 
b/dev/create-release/do-release-docker.sh
index 4a003d7..2f794c0 100755
--- a/dev/create-release/do-release-docker.sh
+++ b/dev/create-release/do-release-docker.sh
@@ -54,7 +54,7 @@ WORKDIR=
 IMGTAG=latest
 JAVA=
 RELEASE_STEP=
-while getopts "d:hj:ns:t:" opt; do
+while getopts ":d:hj:ns:t:" opt; do
   case $opt in
 d) WORKDIR="$OPTARG" ;;
 n) DRY_RUN=1 ;;
@@ -62,7 +62,7 @@ while getopts "d:hj:ns:t:" opt; do
 j) JAVA="$OPTARG" ;;
 s) RELEASE_STEP="$OPTARG" ;;
 h) usage ;;
-?) error "Invalid option. Run with -h for help." ;;
+\?) error "Invalid option. Run with -h for help." ;;
   esac
 done
 
diff --git a/dev/create-release/do-release.sh b/dev/create-release/do-release.sh
index f1d4f3a..4f18a55 100755
--- a/dev/create-release/do-release.sh
+++ b/dev/create-release/do-release.sh
@@ -20,11 +20,11 @@
 SELF=$(cd $(dirname $0) && pwd)
 . "$SELF/release-util.sh"
 
-while getopts "bn" opt; do
+while getopts ":b:n" opt; do
   case $opt in
 b) GIT_BRANCH=$OPTARG ;;
 n) DRY_RUN=1 ;;
-?) error "Invalid option: $OPTARG" ;;
+\?) error "Invalid option: $OPTARG" ;;
   esac
 done
 





[spark] branch master updated (df27350 -> 2921be7)

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from df27350  [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 
'en'
 add 2921be7  [SPARK-31462][INFRA] The usage of getopts and case statement 
is wrong in do-release.sh

No new revisions were added by this update.

Summary of changes:
 dev/create-release/do-release-docker.sh | 4 ++--
 dev/create-release/do-release.sh| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)





[spark] branch master updated: [SPARK-31462][INFRA] The usage of getopts and case statement is wrong in do-release.sh

2020-04-16 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2921be7  [SPARK-31462][INFRA] The usage of getopts and case statement 
is wrong in do-release.sh
2921be7 is described below

commit 2921be7a4e17b45591c25873a55fb286599b0fd7
Author: Kousuke Saruta 
AuthorDate: Thu Apr 16 12:54:10 2020 -0700

[SPARK-31462][INFRA] The usage of getopts and case statement is wrong in 
do-release.sh

### What changes were proposed in this pull request?

This PR (SPARK-31462) fixes the usage of getopts and the case statement in 
`do-release.sh` and `do-release-docker.sh`.

### Why are the changes needed?

In the current master, do-release.sh contains the following code.
```
while getopts "bn" opt; do
  case $opt in
b) GIT_BRANCH=$OPTARG ;;
n) DRY_RUN=1 ;;
?) error "Invalid option: $OPTARG" ;;
  esac
done
```
There are 3 wrong usages of getopts and the case statement:
1. To set $OPTARG to the argument passed for the option "b", the parameter 
for getopts should be "b:".
2. To set $OPTARG to the invalid option name that was passed, the parameter for 
getopts should start with ":".
3. It's minor, but to match the literal character "?", it's better to escape it 
like "\\?".

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I checked that $GIT_BRANCH is set when do-release.sh is launched with the -b 
option.
I also checked that the error message contains the invalid option name when 
do-release.sh is launched with an invalid option.

Closes #28234 from sarutak/fix-do-release.

Authored-by: Kousuke Saruta 
Signed-off-by: Dongjoon Hyun 
---
 dev/create-release/do-release-docker.sh | 4 ++--
 dev/create-release/do-release.sh| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/dev/create-release/do-release-docker.sh 
b/dev/create-release/do-release-docker.sh
index 4a003d7..2f794c0 100755
--- a/dev/create-release/do-release-docker.sh
+++ b/dev/create-release/do-release-docker.sh
@@ -54,7 +54,7 @@ WORKDIR=
 IMGTAG=latest
 JAVA=
 RELEASE_STEP=
-while getopts "d:hj:ns:t:" opt; do
+while getopts ":d:hj:ns:t:" opt; do
   case $opt in
 d) WORKDIR="$OPTARG" ;;
 n) DRY_RUN=1 ;;
@@ -62,7 +62,7 @@ while getopts "d:hj:ns:t:" opt; do
 j) JAVA="$OPTARG" ;;
 s) RELEASE_STEP="$OPTARG" ;;
 h) usage ;;
-?) error "Invalid option. Run with -h for help." ;;
+\?) error "Invalid option. Run with -h for help." ;;
   esac
 done
 
diff --git a/dev/create-release/do-release.sh b/dev/create-release/do-release.sh
index f1d4f3a..4f18a55 100755
--- a/dev/create-release/do-release.sh
+++ b/dev/create-release/do-release.sh
@@ -20,11 +20,11 @@
 SELF=$(cd $(dirname $0) && pwd)
 . "$SELF/release-util.sh"
 
-while getopts "bn" opt; do
+while getopts ":b:n" opt; do
   case $opt in
 b) GIT_BRANCH=$OPTARG ;;
 n) DRY_RUN=1 ;;
-?) error "Invalid option: $OPTARG" ;;
+\?) error "Invalid option: $OPTARG" ;;
   esac
 done
 





[spark] branch branch-2.4 updated: [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 'en'

2020-04-16 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new ea75c15  [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 
'en'
ea75c15 is described below

commit ea75c15edc67cd369503f1f442cd15d9817c03ab
Author: Kousuke Saruta 
AuthorDate: Fri Apr 17 02:31:08 2020 +0900

[SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 'en'



### What changes were proposed in this pull request?

This change explicitly sets the locale of the timeline view to 'en' so that it 
has the same appearance as before the vis-timeline upgrade.

### Why are the changes needed?

We upgraded vis-timeline in #28192, and the upgraded version differs from the 
one we used before in its notation of dates.
The notation seems to depend on the locale. The following is the appearance 
in my Japanese environment:
https://user-images.githubusercontent.com/4736016/79265314-de886700-7ed0-11ea-8641-fa76b993c0d9.png

Although the notation is in Japanese, the default format is a little bit 
unnatural (e.g. 4月9日 05:39 is natural rather than 9 四月 05:39).

I found we can get the same appearance as before by explicitly setting the 
locale to 'en'.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I visited JobsPage, JobPage and StagePage and confirmed that the timeline view 
shows dates with the 'en' locale:
https://user-images.githubusercontent.com/4736016/79267107-8bfc7a00-7ed3-11ea-8a25-f6681d04a83c.png

NOTE: #28192 will be backported to branch-2.4 and branch-3.0, so this PR 
should follow #28214 and #28213.

Closes #28218 from sarutak/fix-locale-issue.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit df27350142d81a3e8941939870bfc0ab50e37a43)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/resources/org/apache/spark/ui/static/timeline-view.js | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js 
b/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
index a63ee86..5be8cff 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
@@ -28,6 +28,7 @@ function drawApplicationTimeline(groupArray, eventObjArray, 
startTime, offset) {
 showCurrentTime: false,
 start: startTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }
@@ -110,6 +111,7 @@ function drawJobTimeline(groupArray, eventObjArray, 
startTime, offset) {
 showCurrentTime: false,
 start: startTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }
@@ -194,6 +196,7 @@ function drawTaskAssignmentTimeline(groupArray, 
eventObjArray, minLaunchTime, ma
 start: minLaunchTime,
 end: maxFinishTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }


[spark] branch branch-3.0 updated: [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 'en'

2020-04-16 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new bc6f673  [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 
'en'
bc6f673 is described below

commit bc6f673da4b36616614c23d8fa69a8e420b966c2
Author: Kousuke Saruta 
AuthorDate: Fri Apr 17 02:31:08 2020 +0900

[SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 'en'



### What changes were proposed in this pull request?

This change explicitly sets the locale of the timeline view to 'en' so that it 
has the same appearance as before the vis-timeline upgrade.

### Why are the changes needed?

We upgraded vis-timeline in #28192, and the upgraded version differs from the 
one we used before in its notation of dates.
The notation seems to depend on the locale. The following is the appearance 
in my Japanese environment:
https://user-images.githubusercontent.com/4736016/79265314-de886700-7ed0-11ea-8641-fa76b993c0d9.png

Although the notation is in Japanese, the default format is a little bit 
unnatural (e.g. 4月9日 05:39 is natural rather than 9 四月 05:39).

I found we can get the same appearance as before by explicitly setting the 
locale to 'en'.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I visited JobsPage, JobPage and StagePage and confirmed that the timeline view 
shows dates with the 'en' locale:
https://user-images.githubusercontent.com/4736016/79267107-8bfc7a00-7ed3-11ea-8a25-f6681d04a83c.png

NOTE: #28192 will be backported to branch-2.4 and branch-3.0, so this PR 
should follow #28214 and #28213.

Closes #28218 from sarutak/fix-locale-issue.

Authored-by: Kousuke Saruta 
Signed-off-by: Kousuke Saruta 
(cherry picked from commit df27350142d81a3e8941939870bfc0ab50e37a43)
Signed-off-by: Kousuke Saruta 
---
 core/src/main/resources/org/apache/spark/ui/static/timeline-view.js | 3 +++
 1 file changed, 3 insertions(+)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js 
b/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
index a63ee86..5be8cff 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
@@ -28,6 +28,7 @@ function drawApplicationTimeline(groupArray, eventObjArray, 
startTime, offset) {
 showCurrentTime: false,
 start: startTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }
@@ -110,6 +111,7 @@ function drawJobTimeline(groupArray, eventObjArray, 
startTime, offset) {
 showCurrentTime: false,
 start: startTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }
@@ -194,6 +196,7 @@ function drawTaskAssignmentTimeline(groupArray, 
eventObjArray, minLaunchTime, ma
 start: minLaunchTime,
 end: maxFinishTime,
 zoomable: false,
+locale: "en",
 moment: function (date) {
   return vis.moment(date).utcOffset(offset);
 }


[spark] branch master updated (7ad6ba3 -> df27350)

2020-04-16 Thread sarutak
This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7ad6ba3  [SPARK-30564][SQL] Revert Block.length and use comment 
placeholders in HashAggregateExec
 add df27350  [SPARK-31420][WEBUI][FOLLOWUP] Make locale of timeline-view 
'en'

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/org/apache/spark/ui/static/timeline-view.js | 3 +++
 1 file changed, 3 insertions(+)



[spark] branch branch-3.0 updated: [SPARK-30564][SQL] Revert Block.length and use comment placeholders in HashAggregateExec

2020-04-16 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 020f3a3  [SPARK-30564][SQL] Revert Block.length and use comment 
placeholders in HashAggregateExec
020f3a3 is described below

commit 020f3a33dd711d05337bb42d5f65708a4aec2daa
Author: Peter Toth 
AuthorDate: Thu Apr 16 17:52:22 2020 +0900

[SPARK-30564][SQL] Revert Block.length and use comment placeholders in 
HashAggregateExec

### What changes were proposed in this pull request?
SPARK-21870 (cb0cddf#diff-06dc5de6163687b7810aa76e7e152a76R146-R149) caused a 
significant performance regression in cases where the source code size is 
fairly large, as `HashAggregateExec` uses `Block.length` to decide on splitting 
the code. The change in `length` makes sense, as comments and extra new lines 
shouldn't be taken into account when deciding on splitting, but the regular 
expression based approach is very slow and adds a big relative overhead to 
cases where the execution is [...]
This PR:
- restores `Block.length` to its original form
- places comments in `HashAggregateExec` with 
`CodegenContext.registerComment` so that they appear only when comments are 
enabled (`spark.sql.codegen.comments=true`); a minimal sketch of this 
placeholder idea follows the benchmark numbers below

Before this PR:
```
deeply nested struct field r/w:       Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem)              1137           1143           8         0.1       11368.3       0.0X
```

After this PR:
```
deeply nested struct field r/w:       Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------
250 deep x 400 rows (read in-mem)               167            180           7         0.6        1674.3       0.1X
```
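
As a rough standalone sketch of the placeholder idea (hypothetical, not the 
real `CodegenContext`): a comment registered while comments are disabled 
contributes zero characters to the generated code, so the plain `length` can 
be used for splitting without a regex pass:

```
object CommentPlaceholderSketch {
  // Stand-in for spark.sql.codegen.comments; when false, registered
  // comments emit nothing and cannot skew length-based code splitting.
  val commentsEnabled = false

  def registerComment(text: String): String =
    if (commentsEnabled) s"// $text" else ""

  def main(args: Array[String]): Unit = {
    val code =
      s"""${registerComment("evaluate aggregate function")}
         |sum += value;""".stripMargin
    // Plain length is enough now; no regex pass to strip comments.
    println(code.length)
  }
}
```
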
### Why are the changes needed?
To fix a performance regression.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing UTs.

Closes #28083 from peter-toth/SPARK-30564-use-comment-placeholders.

Authored-by: Peter Toth 
Signed-off-by: Takeshi Yamamuro 

(cherry picked from commit 7ad6ba36f28b7a5ca548950dec6afcd61e5d68b9)

Signed-off-by: Takeshi Yamamuro 
---
 .../spark/sql/catalyst/expressions/codegen/javaCode.scala  |  8 
 .../spark/sql/execution/aggregate/HashAggregateExec.scala  | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
index dff2589..1c59c3c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala
@@ -143,10 +143,10 @@ trait Block extends TreeNode[Block] with JavaCode {
 case _ => code.trim
   }
 
-  def length: Int = {
-// Returns a code length without comments
-CodeFormatter.stripExtraNewLinesAndComments(toString).length
-  }
+  // We could remove comments, extra whitespaces and newlines when calculating 
length as it is used
+  // only for codegen method splitting, but SPARK-30564 showed that this is a 
performance critical
+  // function so we decided not to do so.
+  def length: Int = toString.length
 
   def isEmpty: Boolean = toString.isEmpty
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
index 7a26fd7..87a4081 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
@@ -367,10 +367,10 @@ case class HashAggregateExec(
  """.stripMargin
   }
   code"""
- |// do aggregate for ${aggNames(i)}
- |// evaluate aggregate function
+ |${ctx.registerComment(s"do aggregate for ${aggNames(i)}")}
+ |${ctx.registerComment("evaluate aggregate function")}
  |${evaluateVariables(bufferEvalsForOneFunc)}
- |// update aggregation buffers
+ |${ctx.registerComment("update aggregation buffers")}
  |${updates.mkString("\n").trim}
""".stripMargin
 }
@@ -975,9 +975,9 @@ case class HashAggregateExec(
   

[spark] branch master updated (c76c31e -> 7ad6ba3)

2020-04-16 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from c76c31e  [SPARK-31455][SQL] Fix rebasing of not-existed timestamps
 add 7ad6ba3  [SPARK-30564][SQL] Revert Block.length and use comment 
placeholders in HashAggregateExec

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/expressions/codegen/javaCode.scala  |  8 
 .../spark/sql/execution/aggregate/HashAggregateExec.scala  | 14 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

