[spark] branch master updated (a907729 -> 3beab8d)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from a907729  [SPARK-32743][SQL] Add distinct info at UnresolvedFunction toString
     add 3beab8d  [SPARK-32793][FOLLOW-UP] Minor corrections for PySpark annotations and SparkR

No new revisions were added by this update.

Summary of changes:
 R/pkg/NAMESPACE                  | 2 ++
 R/pkg/R/functions.R              | 6 ++++--
 python/pyspark/sql/functions.pyi | 4 ++--
 3 files changed, 8 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (c5f6af9 -> a907729)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c5f6af9  [SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system
     add a907729  [SPARK-32743][SQL] Add distinct info at UnresolvedFunction toString

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/unresolved.scala |  5 +++-
 .../resources/sql-tests/inputs/explain-aqe.sql   |  1 +
 .../test/resources/sql-tests/inputs/explain.sql  |  6 ++++++
 .../sql-tests/results/explain-aqe.sql.out        | 33 ++
 .../resources/sql-tests/results/explain.sql.out  | 33 ++
 5 files changed, 77 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (4987db8 -> c5f6af9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4987db8  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors
     add c5f6af9  [SPARK-33094][SQL] Make ORC format propagate Hadoop config from DS options to underlying HDFS file system

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/orc/OrcUtils.scala       |  6 +++---
 .../sql/execution/datasources/orc/OrcSourceSuite.scala       | 17 ++++++++++++++++-
 .../org/apache/spark/sql/hive/orc/OrcFileFormat.scala        |  2 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
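[Editor's note: for context on the behavior SPARK-33094 describes, here is a minimal, hypothetical usage sketch, not taken from the commit itself. It assumes the usual Spark pattern where extra data source options are merged into the Hadoop configuration seen by the underlying FileSystem; the option value and path below are placeholders.]

```scala
import org.apache.spark.sql.SparkSession

object OrcOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-options-sketch")
      .master("local[*]")
      .getOrCreate()

    // With the fix, a Hadoop config entry supplied as a data source option
    // should reach the FileSystem client that resolves the input path.
    // Both the key/value and the path are placeholders for illustration.
    val df = spark.read
      .format("orc")
      .option("fs.defaultFS", "file:///")
      .load("/tmp/events.orc")

    df.printSchema()
    spark.stop()
  }
}
```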
[spark] branch branch-2.4 updated (45a8b89 -> 3e28f49)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 45a8b89  [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
     add 3e28f49  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (782ab8e -> c1b660e)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
     add c1b660e  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c1b660e  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

c1b660e is described below

commit c1b660efa31fdb79df058c847244791c9bec90ff
Author: Dongjoon Hyun
AuthorDate: Thu Oct 8 11:50:53 2020 -0700

    [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

    ### What changes were proposed in this pull request?

    This PR aims to use `LinkedHashMap` instead of `Map` for `newlyCreatedExecutors`.

    ### Why are the changes needed?

    This makes log messages (INFO/DEBUG) more readable. This is helpful when
    `spark.kubernetes.allocation.batch.size` is large and especially when K8s
    dynamic allocation is used.

    **BEFORE**
    ```
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:24:21 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (5,10,6,9,2,7,3,8,4).
    ```

    **AFTER**
    ```
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 2 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 3 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 4 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 5 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 6 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 7 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 8 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 9 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 DEBUG ExecutorPodsAllocator: Executor with id 10 was not found in the Kubernetes cluster since it was created 0 milliseconds ago.
    20/10/08 10:25:17 INFO ExecutorPodsAllocator: Deleting 9 excess pod requests (2,3,4,5,6,7,8,9,10).
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the CI or `build/sbt -Pkubernetes "kubernetes/test"`

    Closes #29979 from dongjoon-hyun/SPARK-K8S-LOG.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 4987db8c88b49a0c0d8503b6291455e92e114efa)
    Signed-off-by: Dongjoon Hyun
---
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
index b394f35..66cba55 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
+++
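[Editor's note: the ordering difference in the logs above comes straight from the map implementation. The following standalone Scala sketch is illustrative only, not the actual ExecutorPodsAllocator code: scala.collection.mutable.LinkedHashMap iterates in insertion order, while HashMap's iteration order is implementation-defined.]

```scala
import scala.collection.mutable

object InsertionOrderSketch {
  def main(args: Array[String]): Unit = {
    val hashed = mutable.HashMap.empty[Long, String]
    val linked = mutable.LinkedHashMap.empty[Long, String]

    // Register executors 2..10 in creation order, as in the logs above.
    for (id <- 2L to 10L) {
      hashed(id) = s"executor-$id"
      linked(id) = s"executor-$id"
    }

    // HashMap: iteration order depends on hashing, e.g. 5,10,6,9,2,7,3,8,4
    // as in the BEFORE log; it is not guaranteed to follow insertion order.
    println(hashed.keys.mkString(","))
    // LinkedHashMap: insertion order, 2,3,4,5,6,7,8,9,10 as in the AFTER log.
    println(linked.keys.mkString(","))
  }
}
```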
[spark] branch master updated (4a47b3e -> 4987db8)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4a47b3e  [DOC][MINOR] pySpark usage - removed repeated keyword causing confusion
     add 4987db8  [SPARK-33096][K8S] Use LinkedHashMap instead of Map for newlyCreatedExecutors

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5effa8e -> 4a47b3e)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5effa8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
     add 4a47b3e  [DOC][MINOR] pySpark usage - removed repeated keyword causing confusion

No new revisions were added by this update.

Summary of changes:
 docs/submitting-applications.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 45a8b89  [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter

45a8b89 is described below

commit 45a8b892455daaad34d97e37356dee85256d316d
Author: Tom van Bussel
AuthorDate: Thu Oct 8 09:58:01 2020 +0200

    [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter

    Backport of #29785 to Spark 2.4

    ### What changes were proposed in this pull request?

    This PR changes `UnsafeExternalSorter` to no longer allocate any memory while
    spilling. In particular it removes the allocation of a new pointer array in
    `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever
    the next record is inserted into the sorter.

    ### Why are the changes needed?

    Without this change the `UnsafeExternalSorter` could throw an OOM while
    spilling. The following sequence of events would have triggered an OOM:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array.
    5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array.
    6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill.
    7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail.

    With the changes in the PR the following will happen instead:

    1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one.
    2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested.
    3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`.
    4. `UnsafeInMemorySorter` frees the old pointer array.
    5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the new large array or by throwing a `SparkOutOfMemoryError`).
    6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError`, depending on what happened in the previous step.
    7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`.

    Closes #29910 from tomvanbussel/backport-SPARK-32901.

    Authored-by: Tom van Bussel
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java       | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java       | 51 ++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java  | 45 --
 .../unsafe/sort/UnsafeInMemorySorterSuite.java  | 39 -
 .../apache/spark/memory/TestMemoryManager.scala | 12 ++-
 5 files changed, 143 insertions(+), 100 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index f720ccd..9552e79 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to ensure that we can always process at least
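[Editor's note: the following is a schematic Scala sketch of the allocation pattern the commit message describes, with entirely hypothetical names; it is not Spark's actual UnsafeInMemorySorter. The point is that the spill path only frees the pointer array, and the replacement is allocated lazily on the next insert, so spilling itself never has to request memory.]

```scala
// Hypothetical sketch: `allocate` stands in for a task-memory-manager call
// that may fail or trigger a spill; it is never invoked from the spill path.
class InMemorySorterSketch(allocate: Int => Array[Long]) {
  private var pointers: Array[Long] = allocate(1024)
  private var count = 0

  // Called while spilling: release the pointer array, but do NOT allocate
  // a replacement here -- that is what could previously cause an OOM.
  def freeAfterSpill(): Unit = {
    pointers = null
    count = 0
  }

  // Allocation is deferred to insertion time, outside the spill path.
  def insert(recordPointer: Long): Unit = {
    if (pointers == null) pointers = allocate(1024)
    if (count == pointers.length) {
      val bigger = allocate(pointers.length * 2)
      System.arraycopy(pointers, 0, bigger, 0, count)
      pointers = bigger
    }
    pointers(count) = recordPointer
    count += 1
  }

  def numRecords: Int = count
}
```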
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
Authored-by: Tom van Bussel Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 96 -- .../unsafe/sort/UnsafeInMemorySorter.java | 51 ++-- .../unsafe/sort/UnsafeExternalSorterSuite.java | 45 -- .../unsafe/sort/UnsafeInMemorySorterSuite.java | 39 - .../apache/spark/memory/TestMemoryManager.scala| 12 ++- 5 files changed, 143 insertions(+), 100 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index f720ccd..9552e79 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer { } if (inMemSorter == null || inMemSorter.numRecords() <= 0) { + // There could still be some memory allocated when there are no records in the in-memory + // sorter. We will not spill it however, to ensure that we can always process at least
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
Authored-by: Tom van Bussel Signed-off-by: herman --- .../unsafe/sort/UnsafeExternalSorter.java | 96 -- .../unsafe/sort/UnsafeInMemorySorter.java | 51 ++-- .../unsafe/sort/UnsafeExternalSorterSuite.java | 45 -- .../unsafe/sort/UnsafeInMemorySorterSuite.java | 39 - .../apache/spark/memory/TestMemoryManager.scala| 12 ++- 5 files changed, 143 insertions(+), 100 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index f720ccd..9552e79 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer { } if (inMemSorter == null || inMemSorter.numRecords() <= 0) { + // There could still be some memory allocated when there are no records in the in-memory + // sorter. We will not spill it however, to ensure that we can always process at least
[spark] branch branch-2.4 updated: [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter
This is an automated email from the ASF dual-hosted git repository. hvanhovell pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 45a8b89 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter 45a8b89 is described below commit 45a8b892455daaad34d97e37356dee85256d316d Author: Tom van Bussel AuthorDate: Thu Oct 8 09:58:01 2020 +0200 [SPARK-32901][CORE][2.4] Do not allocate memory while spilling UnsafeExternalSorter Backport of #29785 to Spark 2.4 ### What changes were proposed in this pull request? This PR changes `UnsafeExternalSorter` to no longer allocate any memory while spilling. In particular it removes the allocation of a new pointer array in `UnsafeInMemorySorter`. Instead the new pointer array is allocated whenever the next record is inserted into the sorter. ### Why are the changes needed? Without this change the `UnsafeExternalSorter` could throw an OOM while spilling. The following sequence of events would have triggered an OOM: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array, and tries to allocate a new small pointer array. 5. `TaskMemoryManager` tries to allocate the memory backing the small array using `MemoryManager`, but `MemoryManager` is unwilling to give it any memory, as the `TaskMemoryManager` is still holding on to the memory it got for the new large array. 6. `TaskMemoryManager` again asks `UnsafeExternalSorter` to spill, but this time there is nothing to spill. 7. `UnsafeInMemorySorter` receives less memory than it requested, and causes a `SparkOutOfMemoryError` to be thrown, which causes the current task to fail. With the changes in the PR the following will happen instead: 1. `UnsafeExternalSorter` runs out of space in its pointer array and attempts to allocate a new large array to replace the old one. 2. `TaskMemoryManager` tries to allocate the memory backing the new large array using `MemoryManager`, but `MemoryManager` is only willing to return most but not all of the memory requested. 3. `TaskMemoryManager` asks `UnsafeExternalSorter` to spill, which causes `UnsafeExternalSorter` to spill the current run to disk, to free its record pages and to reset its `UnsafeInMemorySorter`. 4. `UnsafeInMemorySorter` frees the old pointer array. 5. `TaskMemoryManager` returns control to `UnsafeExternalSorter.growPointerArrayIfNecessary` (either by returning the the new large array or by throwing a `SparkOutOfMemoryError`). 6. `UnsafeExternalSorter` either frees the new large array or it ignores the `SparkOutOfMemoryError` depending on what happened in the previous step. 7. `UnsafeExternalSorter` successfully allocates a new small pointer array and operation continues as normal. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Tests were added in `UnsafeExternalSorterSuite` and `UnsafeInMemorySorterSuite`. Closes #29910 from tomvanbussel/backport-SPARK-32901. 
    Authored-by: Tom van Bussel
    Signed-off-by: herman
---
 .../unsafe/sort/UnsafeExternalSorter.java          | 96 --
 .../unsafe/sort/UnsafeInMemorySorter.java          | 51 ++--
 .../unsafe/sort/UnsafeExternalSorterSuite.java     | 45 --
 .../unsafe/sort/UnsafeInMemorySorterSuite.java     | 39 -
 .../apache/spark/memory/TestMemoryManager.scala    | 12 ++-
 5 files changed, 143 insertions(+), 100 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
index f720ccd..9552e79 100644
--- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
+++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java
@@ -203,6 +203,10 @@ public final class UnsafeExternalSorter extends MemoryConsumer {
     }

     if (inMemSorter == null || inMemSorter.numRecords() <= 0) {
+      // There could still be some memory allocated when there are no records in the in-memory
+      // sorter. We will not spill it however, to ensure that we can always process at least
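To make the fixed control flow concrete, here is a hypothetical, heavily simplified Scala sketch of the pattern described in the commit message. The `Allocator` trait, the `Sorter` class, and all member names are invented for illustration; Spark's real implementation lives in the Java classes above and routes allocation through `TaskMemoryManager`/`MemoryConsumer`. The point it demonstrates: the spill path only frees memory, and the pointer array is re-allocated lazily on the next insert.

```scala
// Hypothetical sketch, not Spark's actual code.
trait Allocator {
  // May trigger a spill on a sorter before returning, analogous to what
  // Spark's TaskMemoryManager can do when memory is tight.
  def allocate(size: Int): Array[Long]
}

final class Sorter(allocator: Allocator) {
  private var array: Array[Long] = allocator.allocate(16) // null after a spill
  private var numRecords: Int = 0

  /** Called mid-allocation by the memory manager: frees memory, allocates nothing. */
  def spill(): Unit = {
    // ... write the current run to disk ...
    array = null
    numRecords = 0
  }

  def insertRecord(pointer: Long): Unit = {
    if (array == null) {
      // Deferred allocation: happens here on insert, never inside spill().
      array = allocator.allocate(16)
    }
    if (numRecords == array.length) {
      val bigger = allocator.allocate(array.length * 2)
      if (array == null) {
        // allocate() spilled this sorter: its records are already on disk,
        // so the large array is not needed; drop it and start small again.
        array = allocator.allocate(16)
      } else {
        System.arraycopy(array, 0, bigger, 0, numRecords)
        array = bigger
      }
    }
    array(numRecords) = pointer
    numRecords += 1
  }
}
```

The key design point corresponds to step 6 of the fixed sequence: a spill observed mid-allocation invalidates the grown array, so it is discarded rather than used, and a small array is allocated only once there is a record to insert.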
[spark] branch branch-3.0 updated (a7e4318 -> 782ab8e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.

from a7e4318  [SPARK-33089][SQL] make avro format propagate Hadoop config from DS options to underlying HDFS file system
 add 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7d6e3fb -> 5effa8e)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

from 7d6e3fb  [SPARK-33074][SQL] Classify dialect exceptions in JDBC v2 Table Catalog
 add 5effa8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 782ab8e  [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

782ab8e is described below

commit 782ab8e244252696c50b4b432d07a56c374b8680
Author: HyukjinKwon
AuthorDate: Thu Oct 8 16:29:15 2020 +0900

    [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential side effect at callers of OrcUtils.readCatalystSchema

    ### What changes were proposed in this pull request?

    This is a followup of SPARK-32646. A new JIRA was filed to control the fixed versions properly.

    When you use `map`, the body might be lazily evaluated and never executed. To avoid this, it is better to use `foreach`. See also SPARK-16694. The current code does not appear to cause any bug for now, but it is best to fix it to avoid potential issues.

    ### Why are the changes needed?

    To avoid potential issues from `map` being lazy and not executed.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Ran related tests. CI in this PR should verify.

    Closes #29974 from HyukjinKwon/SPARK-32646.

    Authored-by: HyukjinKwon
    Signed-off-by: Takeshi Yamamuro
    (cherry picked from commit 5effa8ea261ba59214afedc2853d1b248b330ca6)
    Signed-off-by: Takeshi Yamamuro
---
 .../org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala | 2 +-
 .../sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
index 69badb4..c540007 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
@@ -185,7 +185,7 @@ class OrcFileFormat
     } else {
       // ORC predicate pushdown
       if (orcFilterPushDown) {
-        OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+        OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).foreach { fileSchema =>
           OrcFilters.createFilter(fileSchema, filters).foreach { f =>
             OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
           }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
index 1f38128..b0ddee0 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala
@@ -69,7 +69,7 @@ case class OrcPartitionReaderFactory(
   private def pushDownPredicates(filePath: Path, conf: Configuration): Unit = {
     if (orcFilterPushDown) {
-      OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).map { fileSchema =>
+      OrcUtils.readCatalystSchema(filePath, conf, ignoreCorruptFiles).foreach { fileSchema =>
        OrcFilters.createFilter(fileSchema, filters).foreach { f =>
          OrcInputFormat.setSearchArgument(conf, f, fileSchema.fieldNames)
        }

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
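The distinction the patch relies on can be shown with a minimal Scala sketch; the schema string and `println` bodies are invented for illustration and are not Spark code.

```scala
// Option#map vs Option#foreach when only a side effect is wanted.
val fileSchema: Option[String] = Some("struct<a:int,b:string>")

// map is a transformation: it builds and returns a new Option, which is
// silently discarded here. On a lazy collection (e.g. a view) the body
// might never run at all.
fileSchema.map(s => println(s"pushing filters for schema $s"))

// foreach exists for side effects: it returns Unit, always runs the body
// when a value is present, and makes the intent explicit.
fileSchema.foreach(s => println(s"pushing filters for schema $s"))
```

Using `foreach` also keeps callers such as `pushDownPredicates`, which return `Unit`, from building a value only to throw it away.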