[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267126#comment-17267126 ]

Norbert Schultz commented on SPARK-34115:
-----------------------------------------

Just tried it:
{code:java}
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala b/core/src/main/scala/org/apache/spark/util/Utils.scala
index 926f3a3c11..cc15481b4b 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -1891,7 +1891,7 @@ private[spark] object Utils extends Logging {
   /**
    * Indicates whether Spark is currently running unit tests.
    */
-  def isTesting: Boolean = {
+  lazy val isTesting: Boolean = {
     sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
   }
{code}
Solves the problem.

The lookup time of the contains call in sys.env.contains(..) is indeed 
constant; however, sys.env recreates the immutable map on every call, and 
that is what makes it slow.
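
For anyone who wants to reproduce the effect outside of Spark, here is a minimal, standalone sketch (the object and method names are mine, not Spark's, and the timings will vary by machine) comparing the original def with the proposed lazy val. Only the def pays the cost of rebuilding the environment map on every call:
{code:java}
object SysEnvCost {
  // Recomputed on every call, like the original `def isTesting`:
  // sys.env builds a fresh immutable Map from System.getenv() each time.
  def isTestingDef: Boolean =
    sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")

  // Computed once and cached, like the proposed `lazy val isTesting`.
  lazy val isTestingLazyVal: Boolean =
    sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")

  private def time(label: String)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
  }

  def main(args: Array[String]): Unit = {
    val n = 100000
    time("def      (env map rebuilt on every call)") {
      var i = 0; while (i < n) { isTestingDef; i += 1 }
    }
    time("lazy val (env map built exactly once)") {
      var i = 0; while (i < n) { isTestingLazyVal; i += 1 }
    }
  }
}
{code}
The gap grows with the number of environment variables, which is why it only became visible on the Pod with 3000+ of them.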


> Long runtime on many environment variables
> ------------------------------------------
>
>                 Key: SPARK-34115
>                 URL: https://issues.apache.org/jira/browse/SPARK-34115
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0, 2.4.7, 3.0.1
>         Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>            Reporter: Norbert Schultz
>            Priority: Major
>         Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is 
> still the same in current versions of Spark, and maybe this ticket saves 
> someone some debugging time.
> We migrated some older code to Spark 2.4.0, and suddenly the integration 
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame 
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed 
> all services as environment variables, so it had more than 3000 of them.
> Utils.isTesting is also called very often through 
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and 
> transformUp), so the cost adds up quickly.
>  
> Of course we will restrict the number of environment variables; on the other 
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> to make the check less expensive.
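
To illustrate the last point of the quoted report: a minimal sketch (all names are hypothetical, not Spark internals) of how a per-node check that scans the environment compounds during a transformDown-style traversal. Every visited node triggers one isTesting call, and each call walks every environment variable.
{code:java}
object PerNodeCheckCost {
  // Stand-in for the original Utils.isTesting: rebuilds the env map per call.
  def isTesting: Boolean =
    sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")

  // Stand-in for the real assertion: it only does further work while testing.
  def assertNotAnalysisRuleLike(): Unit = {
    if (isTesting) {
      // the real method checks analyzer state here; nothing to do in this sketch
    }
  }

  // A toy tree standing in for a logical plan, visited transformDown-style.
  final case class Node(children: Seq[Node] = Nil)

  def visit(node: Node): Int = {
    assertNotAnalysisRuleLike()          // one env scan per visited node
    1 + node.children.map(visit).sum
  }

  def main(args: Array[String]): Unit = {
    // ~10,000 nodes => ~10,000 env scans for a single traversal.
    val plan = Node(Seq.fill(100)(Node(Seq.fill(99)(Node()))))
    val start = System.nanoTime()
    val visited = visit(plan)
    println(s"visited $visited nodes in ${(System.nanoTime() - start) / 1e6} ms")
  }
}
{code}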


