[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Norbert Schultz updated SPARK-34115:
------------------------------------
    Affects Version/s: 3.0.1

> Long runtime on many environment variables
> ------------------------------------------
>
>                 Key: SPARK-34115
>                 URL: https://issues.apache.org/jira/browse/SPARK-34115
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0, 2.4.7, 3.0.1
>         Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>            Reporter: Norbert Schultz
>            Priority: Major
>         Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
> * AnalysisHelper.assertNotAnalysisRule, which calls
> * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> As Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), this cost adds up to a significant slowdown.
>
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
>
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
>
> so it is not that expensive.
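>
> A minimal sketch of that suggestion (not the actual Spark source; the
> helper name isTestingCached is made up here, and the real Utils.isTesting
> also checks a system property, which this sketch leaves out):
>
> {code:java}
> object Utils {
>   // sys.env builds a fresh immutable Map from System.getenv() on every
>   // access, so each call pays for all (here: 3000+) environment variables.
>   // A lazy val performs the lookup once; the process environment does not
>   // change after JVM startup, so caching the result is safe.
>   private lazy val isTestingCached: Boolean = sys.env.contains("SPARK_TESTING")
>
>   def isTesting: Boolean = isTestingCached
> }
> {code}
>
> With a change like this, AnalysisHelper.assertNotAnalysisRule would read a
> cached Boolean instead of rebuilding the environment map on every
> transformDown/transformUp call.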