[jira] [Comment Edited] (SPARK-34115) Long runtime on many environment variables
[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267161#comment-17267161 ]

Hyukjin Kwon edited comment on SPARK-34115 at 1/18/21, 10:42 AM:
-----------------------------------------------------------------

I see, okay. That makes sense. Can you try to open a PR? See also http://spark.apache.org/contributing.html

was (Author: hyukjin.kwon):
I see, okay. That makes sense. Can you try to open a PR? See also http://spark.apache.org/contributing.html

For a more conservative fix, if the lazy val approach does not work for any reason, we can switch from:
{{sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")}}
to:
{{sys.props.contains("spark.testing") || sys.env.contains("SPARK_TESTING")}}

> Long runtime on many environment variables
> -------------------------------------------
>
>                 Key: SPARK-34115
>                 URL: https://issues.apache.org/jira/browse/SPARK-34115
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0, 2.4.7, 3.0.1
>         Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>            Reporter: Norbert Schultz
>            Priority: Major
>         Attachments: spark-bug-34115.tar.gz
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected. On local
> machines everything ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during
> DataFrame analysis in the following functions:
> * AnalysisHelper.assertNotAnalysisRule, calling
> * Utils.isTesting
> Utils.isTesting traverses all environment variables, and it is called very
> often through AnalysisHelper.assertNotAnalysisRule (via
> AnalysisHelper.transformDown and transformUp).
> The offending build machine was a Kubernetes Pod which automatically
> exposed all services as environment variables, so it had more than 3000
> environment variables.
> Of course we will restrict the number of environment variables; on the
> other hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that it is not that expensive.
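As a minimal Scala sketch of the two variants discussed above (the object and method names here are illustrative, not Spark's actual Utils code):

{code:java}
object IsTestingSketch {
  // Conservative variant: flip the short-circuit order. sys.props is a thin,
  // cheap wrapper around System.getProperties, while scala.sys.env rebuilds
  // an immutable Map from System.getenv on every access, so its cost grows
  // with the number of environment variables.
  def isTestingReordered: Boolean =
    sys.props.contains("spark.testing") || sys.env.contains("SPARK_TESTING")

  // lazy val variant: traverse the environment at most once per JVM. This
  // assumes SPARK_TESTING / spark.testing do not change while the JVM runs.
  lazy val isTestingCached: Boolean =
    sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
}
{code}

The reordering only skips the env traversal when spark.testing is set as a system property (the common case in Spark's own tests); the lazy val caps the traversal at one per JVM regardless of how the flag is set.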
[jira] [Comment Edited] (SPARK-34115) Long runtime on many environment variables
[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267163#comment-17267163 ]

Hyukjin Kwon edited comment on SPARK-34115 at 1/18/21, 10:42 AM:
-----------------------------------------------------------------

Let's see if tests pass first, and we can also get some more feedback from other people easily with a PR.

was (Author: hyukjin.kwon):
Let's see if tests pass first, and we can also get some more feedback from other people easily.
[jira] [Comment Edited] (SPARK-34115) Long runtime on many environment variables
[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265834#comment-17265834 ]

Norbert Schultz edited comment on SPARK-34115 at 1/15/21, 8:58 AM:
-------------------------------------------------------------------

Added demonstration code based on Spark 2.4.7.

Call:
* show_fast.sh for the regular running time
* show_slow.sh for a run with a lot of environment variables

Running time (locally):
* fast: 4000 ms
* slow: 11303 ms

The calculation itself is useless, but it gives Spark SQL something to optimize. (Also tried Spark 3.0.1, which shows the same behaviour.)

was (Author: nob13):
Added demonstration code based on Spark 2.4.7.

Call:
* show_fast.sh for the regular running time
* show_slow.sh for a run with a lot of environment variables

Running time (locally):
* fast: 4000 ms
* slow: 11303 ms

The calculation itself is useless, but it gives Spark SQL something to optimize.
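Without the attached scripts, the per-call cost can be approximated with a small timing sketch (an illustration, not the attached benchmark; run it in a shell carrying thousands of environment variables, such as the Kubernetes pod described above):

{code:java}
object EnvLookupBench {
  def timeMs[A](label: String)(body: => A): A = {
    val start = System.nanoTime()
    val result = body
    println(s"$label: ${(System.nanoTime() - start) / 1000000} ms")
    result
  }

  def main(args: Array[String]): Unit = {
    // Roughly the call volume that transformDown/transformUp can generate
    // while analyzing a larger plan.
    val iterations = 100000

    // Per-call lookup, as in Utils.isTesting: scala.sys.env rebuilds a Map
    // from System.getenv on every access.
    timeMs("per-call sys.env") {
      (1 to iterations).count(_ => sys.env.contains("SPARK_TESTING"))
    }

    // Cached lookup: the environment is traversed at most once.
    lazy val isTesting = sys.env.contains("SPARK_TESTING")
    timeMs("cached lazy val") {
      (1 to iterations).count(_ => isTesting)
    }
  }
}
{code}

With a few thousand environment variables set, the gap between the two blocks should mirror the fast/slow difference reported above.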