[spark] branch master updated (a907729 -> 3beab8d)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a907729  [SPARK-32743][SQL] Add distinct info at UnresolvedFunction toString
     add 3beab8d  [SPARK-32793][FOLLOW-UP] Minor corrections for PySpark annotations and SparkR

No new revisions were added by this update.

Summary of changes:
 R/pkg/NAMESPACE                  | 2 ++
 R/pkg/R/functions.R              | 6 ++++--
 python/pyspark/sql/functions.pyi | 4 ++--
 3 files changed, 8 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c5f6af9 -> a907729)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c5f6af9  [SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system
     add a907729  [SPARK-32743][SQL] Add distinct info at UnresolvedFunction toString

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/unresolved.scala |  5 +++-
 .../resources/sql-tests/inputs/explain-aqe.sql   |  1 +
 .../test/resources/sql-tests/inputs/explain.sql  |  6 ++++++
 .../sql-tests/results/explain-aqe.sql.out        | 33 ++
 .../resources/sql-tests/results/explain.sql.out  | 33 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4987db8 -> c5f6af9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4987db8  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors
     add c5f6af9  [SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/orc/OrcUtils.scala       |  6 +++---
 .../sql/execution/datasources/orc/OrcSourceSuite.scala       | 17 ++++++++++++++++-
 .../org/apache/spark/sql/hive/orc/OrcFileFormat.scala        |  2 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
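[Editor's note: for context on the behavior SPARK-33094 describes, here is a minimal, hypothetical usage sketch, not taken from the commit itself. It assumes the usual Spark pattern where extra data source options are merged into the Hadoop configuration seen by the underlying FileSystem; the option value and path below are placeholders.]

```scala
import org.apache.spark.sql.SparkSession

object OrcOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-options-sketch")
      .master("local[*]")
      .getOrCreate()

    // With the fix, a Hadoop config entry supplied as a data source option
    // should reach the FileSystem client that resolves the input path.
    // Both the key/value and the path are placeholders for illustration.
    val df = spark.read
      .format("orc")
      .option("fs.defaultFS", "file:///")
      .load("/tmp/events.orc")

    df.printSchema()
    spark.stop()
  }
}
```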
[spark] branch branch-2.4 updated (45a8b89 -> 3e28f49)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 45a8b89  [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
     add 3e28f49  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (782ab8e -> c1b660e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
     add c1b660e  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c1b660e  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

c1b660e is described below

commit c1b660efa31fdb79df058c847244791c9bec90ff
Author: Dongjoon Hyun
AuthorDate: Thu Oct 8 11:50:53 2020 -0700

    [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

    ### What changes were proposed in this pull request?

    This PR aims to use `LinkedHashMap` instead of `Map` for `newlyCreatedExecutors`.

    ### Why are the changes needed?

    This makes log messages (INFO/DEBUG) more readable. This is helpful when
    `spark.kubernetes.allocation.batch.size` is large and especially when K8s
    dynamic allocation is used.

    **BEFORE**
    ```
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (5,10,6,9,2,7,3,8,4).
    ```

    **AFTER**
    ```
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (2,3,4,5,6,7,8,9,10).
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CI or `build/sbt -Pkubernetes "kubernetes/test"`

    Closes #29979 from dongjoon-hyun/SPARK-K8S-LOG.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 4987db8c88b49a0c0d8503b6291455e92e114efa)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
index b394f35..66cba55 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
+++
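[Editor's note: the ordering difference in the logs above comes straight from the map implementation. The following standalone Scala sketch is illustrative only, not the actual ExecutorPodsAllocator code: scala.collection.mutable.LinkedHashMap iterates in insertion order, while HashMap's iteration order is implementation-defined.]

```scala
import scala.collection.mutable

object InsertionOrderSketch {
  def main(args: Array[String]): Unit = {
    val hashed = mutable.HashMap.empty[Long, String]
    val linked = mutable.LinkedHashMap.empty[Long, String]

    // Register executors 2..10 in creation order, as in the logs above.
    for (id <- 2L to 10L) {
      hashed(id) = s"executor-$id"
      linked(id) = s"executor-$id"
    }

    // HashMap: iteration order depends on hashing, e.g. 5,10,6,9,2,7,3,8,4
    // as in the BEFORE log; it is not guaranteed to follow insertion order.
    println(hashed.keys.mkString(","))
    // LinkedHashMap: insertion order, 2,3,4,5,6,7,8,9,10 as in the AFTER log.
    println(linked.keys.mkString(","))
  }
}
```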
[spark] branch master updated (4a47b3e -> 4987db8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4a47b3e  [DOC][MINOR] pySpark usage - removed repeated keyword causing confusion
     add 4987db8  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5effa8e -> 4a47b3e)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5effa8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
     add 4a47b3e  [DOC][MINOR] pySpark usage - removed repeated keyword causing confusion

No new revisions were added by this update.

Summary of changes:
 docs/submitting-applications.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 45a8b89  [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter

45a8b89 is described below

commit 45a8b892455daaad34d97e37356dee85256d316d
Author: Tom van Bussel
AuthorDate: Thu Oct 8 09:58:01 2020 +0200

    [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter

    Backport of #29785 to Spark 2.4

    ### What changes were proposed in this pull request?

    This PR changes `UnsafeExternalSorter` to no longer allocate any memory while
    spilling. In particular it removes the allocation of a new pointer array in
    `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever
    the next record is inserted into the sorter.

    ### Why are the changes needed?

    Without this change the `UnsafeExternalSorter` could throw an OOM while
    spilling. The following sequence of events would have triggered an OOM:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
    5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
    6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
    7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

    With the changes in the PR the following will happen instead:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array.
    5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the new large array or by throwing a `SparkOutOfMemoryError`).
    6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError`, depending on what happened in the previous step.
    7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

    Closes #29910 from tomvanbussel/backport-SPARK-32901.

    Authored-by: Tom van Bussel
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java       | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java       | 51 ++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java  | 45 --
 .../unsafe/sort/UnsafeInMemorySorterSuite.java  | 39 -
 .../apache/spark/memory/TestMemoryManager.scala | 12 ++-
 5 files changed, 143 insertions(+), 100 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index f720ccd..9552e79 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to ensure that we can always process at least
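[Editor's note: the following is a schematic Scala sketch of the allocation pattern the commit message describes, with entirely hypothetical names; it is not Spark's actual UnsafeInMemorySorter. The point is that the spill path only frees the pointer array, and the replacement is allocated lazily on the next insert, so spilling itself never has to request memory.]

```scala
// Hypothetical sketch: `allocate` stands in for a task-memory-manager call
// that may fail or trigger a spill; it is never invoked from the spill path.
class InMemorySorterSketch(allocate: Int => Array[Long]) {
  private var pointers: Array[Long] = allocate(1024)
  private var count = 0

  // Called while spilling: release the pointer array, but do NOT allocate
  // a replacement here -- that is what could previously cause an OOM.
  def freeAfterSpill(): Unit = {
    pointers = null
    count = 0
  }

  // Allocation is deferred to insertion time, outside the spill path.
  def insert(recordPointer: Long): Unit = {
    if (pointers == null) pointers = allocate(1024)
    if (count == pointers.length) {
      val bigger = allocate(pointers.length * 2)
      System.arraycopy(pointers, 0, bigger, 0, count)
      pointers = bigger
    }
    pointers(count) = recordPointer
    count += 1
  }

  def numRecords: Int = count
}
```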
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
Authored-by: Tom van Bussel Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 96 -- .../unsafe/sort/UnsafeInMemorySorter.java | 51 ++-- .../unsafe/sort/UnsafeExternalSorterSuite.java | 45 -- .../unsafe/sort/UnsafeInMemorySorterSuite.java | 39 - .../apache/spark/memory/TestMemoryManager.scala| 12 ++- 5 files changed, 143 insertions(+), 100 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index f720ccd..9552e79 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer { } if (inMemSorter == null || inMemSorter.numRecords() <= 0) { + // There could still be some memory allocated when there are no records in the in-memory + // sorter. We will not spill it however, to ensure that we can always process at least
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
Authored-by: Tom van Bussel Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 96 -- .../unsafe/sort/UnsafeInMemorySorter.java | 51 ++-- .../unsafe/sort/UnsafeExternalSorterSuite.java | 45 -- .../unsafe/sort/UnsafeInMemorySorterSuite.java | 39 - .../apache/spark/memory/TestMemoryManager.scala| 12 ++- 5 files changed, 143 insertions(+), 100 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index f720ccd..9552e79 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer { } if (inMemSorter == null || inMemSorter.numRecords() <= 0) { + // There could still be some memory allocated when there are no records in the in-memory + // sorter. We will not spill it however, to ensure that we can always process at least
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
    Authored-by: Tom van Bussel
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java          | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java          | 51 ++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java     | 45 --
 .../unsafe/sort/UnsafeInMemorySorterSuite.java     | 39 -
 .../apache/spark/memory/TestMemoryManager.scala    | 12 ++-
 5 files changed, 143 insertions(+), 100 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index f720ccd..9552e79 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to ensure that we can always process at least
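To make the fixed control flow concrete, here is a hypothetical, heavily simplified Scala sketch of the pattern described in the commit message. The `Allocator` trait, the `Sorter` class, and all member names are invented for illustration; Spark's real implementation lives in the Java classes above and routes allocation through `TaskMemoryManager`/`MemoryConsumer`. The point it demonstrates: the spill path only frees memory, and the pointer array is re-allocated lazily on the next insert.

```scala
// Hypothetical sketch, not Spark's actual code.
trait Allocator {
  // May trigger a spill on a sorter before returning, analogous to what
  // Spark's TaskMemoryManager can do when memory is tight.
  def allocate(size: Int): Array[Long]
}

final class Sorter(allocator: Allocator) {
  private var array: Array[Long] = allocator.allocate(16) // null after a spill
  private var numRecords: Int = 0

  /** Called mid-allocation by the memory manager: frees memory, allocates nothing. */
  def spill(): Unit = {
    // ... write the current run to disk ...
    array = null
    numRecords = 0
  }

  def insertRecord(pointer: Long): Unit = {
    if (array == null) {
      // Deferred allocation: happens here on insert, never inside spill().
      array = allocator.allocate(16)
    }
    if (numRecords == array.length) {
      val bigger = allocator.allocate(array.length * 2)
      if (array == null) {
        // allocate() spilled this sorter: its records are already on disk,
        // so the large array is not needed; drop it and start small again.
        array = allocator.allocate(16)
      } else {
        System.arraycopy(array, 0, bigger, 0, numRecords)
        array = bigger
      }
    }
    array(numRecords) = pointer
    numRecords += 1
  }
}
```

The key design point corresponds to step 6 of the fixed sequence: a spill observed mid-allocation invalidates the grown array, so it is discarded rather than used, and a small array is allocated only once there is a record to insert.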
[spark] branch branch-3.0 updated (a7e4318 -> 782ab8e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a7e4318  [SPARK-33089][SQL] make avro format propagate Hadoop config from DS options to underlying HDFS file system
 add 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7d6e3fb -> 5effa8e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7d6e3fb  [SPARK-33074][SQL] Classify dialect exceptions in JDBC v2 Table Catalog
 add 5effa8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

782ab8e is described below

commit 782ab8e244252696c50b4b432d07a56c374b8680
Author: HyukjinKwon
AuthorDate: Thu Oct 8 16:29:15 2020 +0900

    [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

    ### What changes were proposed in this pull request?

    This is a followup of SPARK-32646. A new JIRA was filed to control the fixed versions properly.

    When you use `map`, the body might be lazily evaluated and never executed. To avoid this, it is better to use `foreach`. See also SPARK-16694. The current code does not appear to cause any bug for now, but it is best to fix it to avoid potential issues.

    ### Why are the changes needed?

    To avoid potential issues from `map` being lazy and not executed.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran related tests. CI in this PR should verify.

    Closes #29974 from HyukjinKwon/SPARK-32646.

    Authored-by: HyukjinKwon
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 5effa8ea261ba59214afedc2853d1b248b330ca6)
    Signed-off-by: Takeshi Yamamuro
---
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 69badb4..c540007 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -185,7 +185,7 @@ class OrcFileFormat
     } else {
       // ORC predicate pushdown
       if (orcFilterPushDown) {
-        OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+        OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).foreach { fileSchema =>
           OrcFilters.createFilter(fileSchema, filters).foreach { f =>
             OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
           }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
index 1f38128..b0ddee0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
@@ -69,7 +69,7 @@ case class OrcPartitionReaderFactory(
   private def pushDownPredicates(filePath: Path, conf: Configuration): Unit = {
     if (orcFilterPushDown) {
-      OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+      OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).foreach { fileSchema =>
        OrcFilters.createFilter(fileSchema, filters).foreach { f =>
          OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
        }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
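The distinction the patch relies on can be shown with a minimal Scala sketch; the schema string and `println` bodies are invented for illustration and are not Spark code.

```scala
// Option#map vs Option#foreach when only a side effect is wanted.
val fileSchema: Option[String] = Some("struct<a:int,b:string>")

// map is a transformation: it builds and returns a new Option, which is
// silently discarded here. On a lazy collection (e.g. a view) the body
// might never run at all.
fileSchema.map(s => println(s"pushing filters for schema $s"))

// foreach exists for side effects: it returns Unit, always runs the body
// when a value is present, and makes the intent explicit.
fileSchema.foreach(s => println(s"pushing filters for schema $s"))
```

Using `foreach` also keeps callers such as `pushDownPredicates`, which return `Unit`, from building a value only to throw it away.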