This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-4.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.0 by this push:
new 8e27e833a348 [SPARK-51365][SQL][TESTS] Add Envs to control the number
of `SHUFFLE_EXCHANGE/RESULT_QUERY_STAGE` threads used in test cases related to
`SharedSparkSession/TestHive`
8e27e833a348 is described below
commit 8e27e833a3480e14446cbd92ff3aadf85707ee05
Author: yangjie01 <[email protected]>
AuthorDate: Sun Mar 9 20:03:57 2025 -0700
[SPARK-51365][SQL][TESTS] Add Envs to control the number of
`SHUFFLE_EXCHANGE/RESULT_QUERY_STAGE` threads used in test cases related to
`SharedSparkSession/TestHive`
### What changes were proposed in this pull request?
This PR adds the following environment variables:
- `SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD`: Used to control
the `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD` for test cases related to
`SharedSparkSession`.
- `SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD`: Used to control
the `RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for test cases related to
`SharedSparkSession`.
- `SPARK_TEST_HIVE_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD`: Used to control
the `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD` for test cases related to
`TestHive`.
- `SPARK_TEST_HIVE_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD`: Used to
control the `RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` for test cases related to
`TestHive`.
This allows the maximum number of `SHUFFLE_EXCHANGE`/`RESULT_QUERY_STAGE`
threads used in test cases related to `SharedSparkSession`/`TestHive` to be
controlled by setting environment variables.
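The override pattern these variables rely on can be sketched standalone (a minimal sketch; `ThreadThresholdEnv` and `resolve` are illustrative names, not part of the PR or of Spark's API):

```scala
// Minimal sketch of the env-var override pattern: read the variable if set,
// otherwise fall back to the config entry's default value string.
object ThreadThresholdEnv {
  def resolve(envKey: String, defaultValueString: String): Int =
    sys.env.getOrElse(envKey, defaultValueString).toInt
}

object ThreadThresholdEnvDemo {
  def main(args: Array[String]): Unit = {
    // With the variable unset, the default (1024) wins; the CI workflow
    // below would set it to "256" for the SQL module on macOS.
    println(ThreadThresholdEnv.resolve(
      "SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD_DEMO", "1024"))
  }
}
```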
Additionally, because the macOS + Apple Silicon runners among the standard
GitHub-hosted runners have only half the memory of the other runner types
(7 GB vs. 14 GB), this PR configures the following settings in
`build_maven_java21_macos15.yml`:
```
"SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD": "256",
"SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD": "256",
"SPARK_TEST_HIVE_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD": "48",
"SPARK_TEST_HIVE_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD": "48"
```
This avoids test errors like the following in the daily tests on macOS:
```
Warning: [343.044s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 4096k, guardsize: 16k, detached.
Warning: [343.044s][warning][os,thread] Failed to start the native thread for java.lang.Thread "shuffle-exchange-1529"
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
  java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
  at java.base/java.lang.Thread.start0(Native Method)
  at java.base/java.lang.Thread.start(Thread.java:1553)
  at java.base/java.lang.System$2.start(System.java:2577)
  at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
  at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
  at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
  at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:21)
  at java.base/java.util.concurrent.CompletableFuture.asyncSupplyStage(CompletableFuture.java:1782)
  at java.base/java.util.concurrent.CompletableFuture.supplyAsync(CompletableFuture.java:2005)
  at org.apache.spark.sql.execution.SQLExecution$.withThreadLocalCaptured(SQLExecution.scala:329)
  ...
```
### Why are the changes needed?
The default value for both `SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD` and
`RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD` is 1024. Additionally, since the
`-Xss` values used in Spark test cases are relatively large by default
(`-Xss4m` for the SQL module and `-Xss64m` for the Hive module), it is
necessary to provide a way to adjust the maximum number of these threads to
accommodate different test environments, such as the daily tests on macOS.
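The arithmetic behind the memory pressure can be made concrete (a rough upper bound, under the assumption that each thread reserves the full `-Xss` stack size; `StackBudget` is an illustrative name):

```scala
// Worst-case native stack reservation: max threads x per-thread stack size.
// With the Hive module's -Xss64m and the default 1024-thread threshold, the
// bound dwarfs the 7 GB macOS runner; capping at 48 keeps it near 3 GiB.
object StackBudget {
  def worstCaseMiB(stackMiB: Int, maxThreads: Int): Int = stackMiB * maxThreads

  def main(args: Array[String]): Unit = {
    println(s"default threshold: ${worstCaseMiB(64, 1024) / 1024} GiB") // 64 GiB
    println(s"capped threshold:  ${worstCaseMiB(64, 48)} MiB")          // 3072 MiB
  }
}
```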
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Tested on macOS via GitHub Actions:
https://github.com/LuciferYang/spark/actions/runs/13745222147

### Was this patch authored or co-authored using generative AI tooling?
No
Closes #50206 from LuciferYang/SPARK-51365.
Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 34c29cf9fb95ee90a19fab72c5f0d433b9d30a40)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.github/workflows/build_maven_java21_macos15.yml | 6 +++++-
.../scala/org/apache/spark/sql/test/SharedSparkSession.scala | 6 ++++++
.../test/scala/org/apache/spark/sql/hive/test/TestHive.scala | 10 ++++++++--
3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/.github/workflows/build_maven_java21_macos15.yml b/.github/workflows/build_maven_java21_macos15.yml
index 377a67191ab4..173810be9fe9 100644
--- a/.github/workflows/build_maven_java21_macos15.yml
+++ b/.github/workflows/build_maven_java21_macos15.yml
@@ -36,5 +36,9 @@ jobs:
os: macos-15
envs: >-
{
- "OBJC_DISABLE_INITIALIZE_FORK_SAFETY": "YES"
+ "OBJC_DISABLE_INITIALIZE_FORK_SAFETY": "YES",
+ "SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD": "256",
+ "SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD": "256",
+ "SPARK_TEST_HIVE_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD": "48",
+ "SPARK_TEST_HIVE_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD": "48"
}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala b/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala
index b8348cefe7c9..245219c1756d 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala
@@ -79,6 +79,12 @@ trait SharedSparkSessionBase
StaticSQLConf.WAREHOUSE_PATH,
conf.get(StaticSQLConf.WAREHOUSE_PATH) + "/" + getClass.getCanonicalName)
conf.set(StaticSQLConf.LOAD_SESSION_EXTENSIONS_FROM_CLASSPATH, false)
+    conf.set(StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD,
+      sys.env.getOrElse("SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD",
+        StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+    conf.set(StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD,
+      sys.env.getOrElse("SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD",
+        StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
}
/**
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala
index 220d965d2860..a394d0b7393c 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala
@@ -44,7 +44,7 @@ import org.apache.spark.sql.execution.{CommandExecutionMode, QueryExecution, SQL
 import org.apache.spark.sql.hive._
 import org.apache.spark.sql.hive.client.HiveClient
 import org.apache.spark.sql.internal.{SessionState, SharedState, SQLConf, WithTestConf}
-import org.apache.spark.sql.internal.StaticSQLConf.{CATALOG_IMPLEMENTATION, WAREHOUSE_PATH}
+import org.apache.spark.sql.internal.StaticSQLConf.{CATALOG_IMPLEMENTATION, RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD, SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD, WAREHOUSE_PATH}
 import org.apache.spark.util.{ShutdownHookManager, Utils}

 // SPARK-3729: Test key required to check for initialization errors with config.
@@ -70,7 +70,13 @@ object TestHive
       // LocalRelation will exercise the optimization rules better by disabling it as
       // this rule may potentially block testing of other optimization rules such as
       // ConstantPropagation etc.
-      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName))) {
+      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName)
+      .set(SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD,
+        sys.env.getOrElse("SPARK_TEST_HIVE_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD",
+          SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+      .set(RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD,
+        sys.env.getOrElse("SPARK_TEST_HIVE_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD",
+          RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt))) {
   override def conf: SQLConf = sparkSession.sessionState.conf
 }