[ https://issues.apache.org/jira/browse/SPARK-42596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge updated SPARK-42596:
-------------------------------
    Description:
Run this PySpark script with `spark.executor.cores=1`
{code:python}
import os

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

var_name = 'OMP_NUM_THREADS'

def get_env_var():
    return os.getenv(var_name)

udf_get_env_var = udf(get_env_var)
spark.range(1).toDF("id").withColumn(f"env_{var_name}", udf_get_env_var()).show(truncate=False)
{code}
Output with release `3.3.2`:
{noformat}
+---+-----------------------+
|id |env_OMP_NUM_THREADS    |
+---+-----------------------+
|0  |null                   |
+---+-----------------------+
{noformat}
Output with release `3.3.0`:
{noformat}
+---+-----------------------+
|id |env_OMP_NUM_THREADS    |
+---+-----------------------+
|0  |1                      |
+---+-----------------------+
{noformat}

> [YARN] OMP_NUM_THREADS not set to number of executor cores by default
> ---------------------------------------------------------------------
>
>                 Key: SPARK-42596
>                 URL: https://issues.apache.org/jira/browse/SPARK-42596
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 3.3.2
>            Reporter: John Zhuge
>            Priority: Major
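A possible workaround, sketched here and not taken from the report: set the variable explicitly through Spark's documented `spark.executorEnv.[EnvironmentVariableName]` property instead of relying on the default. The helper name `omp_env_conf` is hypothetical, and whether this restores the `3.3.0` behaviour on YARN with `3.3.2` is an assumption that has not been verified against this issue.

```python
def omp_env_conf(executor_cores: int) -> dict:
    """Build Spark conf entries that pin OMP_NUM_THREADS to the executor core count.

    Hypothetical helper for illustration; only the two property names are
    real Spark configuration keys.
    """
    cores = str(executor_cores)
    return {
        "spark.executor.cores": cores,
        # Explicitly export OMP_NUM_THREADS into the executor environment
        # instead of relying on the PySpark default described above.
        "spark.executorEnv.OMP_NUM_THREADS": cores,
    }


def build_session(executor_cores: int):
    """Create a SparkSession with the entries applied (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the helper above runs without Spark

    builder = SparkSession.builder
    for key, value in omp_env_conf(executor_cores).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same entries can be passed on the command line as `--conf spark.executorEnv.OMP_NUM_THREADS=1`. Rerunning the script from the description with this setting would show whether the UDF sees the value again.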