[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-02-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34115:
-
Fix Version/s: (was: 3.1.2)

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Assignee: Norbert Schultz
>Priority: Major
> Fix For: 3.0.2, 3.1.1
>
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.
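A minimal sketch of the suggested change, for reference (a hypothetical simplification: the name isTestingCached is invented here, and the real Utils object contains more logic). Scala's sys.env rebuilds an immutable Map from System.getenv() on every call, so caching the lookup in a lazy val pays the traversal cost only once per JVM:

{code:java}
object Utils {
  // sys.env copies every environment variable into a fresh immutable Map on
  // each call, which is expensive when thousands of variables are present.
  // A lazy val evaluates the lookup once and reuses the Boolean afterwards.
  private lazy val isTestingCached: Boolean = sys.env.contains("SPARK_TESTING")

  def isTesting: Boolean = isTestingCached
}
{code}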






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-02-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-34115:
-
Fix Version/s: 3.1.1

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Assignee: Norbert Schultz
>Priority: Major
> Fix For: 3.0.2, 3.1.1, 3.1.2
>
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.
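To get a feel for the overhead, here is a rough micro-benchmark sketch (hypothetical; EnvLookupBench is invented for illustration, and timings depend on the machine and on how many environment variables are set) comparing the repeated sys.env lookup against a cached lazy val:

{code:java}
object EnvLookupBench extends App {
  val iterations = 100000

  // Repeated lookup: each sys.env call rebuilds a Map of all variables.
  val t0 = System.nanoTime()
  var i = 0
  while (i < iterations) { sys.env.contains("SPARK_TESTING"); i += 1 }
  println(s"sys.env per call: ${(System.nanoTime() - t0) / 1e6} ms")

  // Cached lookup: the environment is scanned only once, at first access.
  lazy val cached = sys.env.contains("SPARK_TESTING")
  val t1 = System.nanoTime()
  var hits = 0
  i = 0
  while (i < iterations) { if (cached) hits += 1; i += 1 }
  println(s"cached lazy val:  ${(System.nanoTime() - t1) / 1e6} ms")
}
{code}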






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-34115:
--
Parent: SPARK-33005
Issue Type: Sub-task  (was: Bug)

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Assignee: Norbert Schultz
>Priority: Major
> Fix For: 3.0.2, 3.1.2
>
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.
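Why the guard is hit so often: tree transformations visit every node of the plan recursively, and the assertion runs on each visit. An illustrative sketch follows (not Spark's actual implementation; Node, transformDown, and assertNotAnalysisRule are simplified stand-ins):

{code:java}
object TransformSketch {
  case class Node(children: Seq[Node])

  // Scans all environment variables on every call, like Utils.isTesting.
  def isTesting: Boolean = sys.env.contains("SPARK_TESTING")

  def assertNotAnalysisRule(): Unit = {
    // Only verifies anything in testing mode, but the environment check
    // itself runs unconditionally on every invocation.
    if (isTesting) { /* verification elided */ }
  }

  // Recursive transform: the assertion executes once per tree node, so a
  // plan with N nodes triggers N environment scans.
  def transformDown(node: Node): Node = {
    assertNotAnalysisRule()
    Node(node.children.map(transformDown))
  }
}
{code}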






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-15 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Affects Version/s: 3.0.1

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7, 3.0.1
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-15 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Component/s: SQL

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0, 2.4.7
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-15 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Affects Version/s: 2.4.7

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.7
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-15 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Attachment: spark-bug-34115.tar.gz

> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
> Attachments: spark-bug-34115.tar.gz
>
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-14 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Description: 
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the integration tests
on our build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for

{code:java}
sys.env.contains("SPARK_TESTING") {code}

so that the check is not that expensive.

  was:
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the unit tests on our
build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for

{code:java}
sys.env.contains("SPARK_TESTING") {code}

so that the check is not that expensive.


> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the integration
> tests on our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-14 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Description: 
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the unit tests on our
build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for

```
sys.env.contains("SPARK_TESTING")
```

so that the check is not that expensive.

  was:
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the unit tests on our
build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for
"sys.env.contains("SPARK_TESTING")" so that the check is not that expensive.


> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the unit tests on
> our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> ```
> sys.env.contains("SPARK_TESTING")
> ```
> so that the check is not that expensive.






[jira] [Updated] (SPARK-34115) Long runtime on many environment variables

2021-01-14 Thread Norbert Schultz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Schultz updated SPARK-34115:

Description: 
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the unit tests on our
build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for

{code:java}
sys.env.contains("SPARK_TESTING") {code}

so that the check is not that expensive.

  was:
I am not sure if this is a bug report or a feature request. The code is the
same in current versions of Spark, and maybe this ticket saves someone some
time debugging.

We migrated some older code to Spark 2.4.0, and suddenly the unit tests on our
build machine were much slower than expected.

On local machines they ran perfectly.

In the end it turned out that Spark was wasting CPU cycles during DataFrame
analysis in the following functions:
 * AnalysisHelper.assertNotAnalysisRule, which calls
 * Utils.isTesting

Utils.isTesting traverses all environment variables.

The offending build machine was a Kubernetes Pod which automatically exposed
all services as environment variables, so it had more than 3000 environment
variables.

Utils.isTesting is called very often through
AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
transformUp), so the cost adds up quickly.

Of course we will restrict the number of environment variables; on the other
hand, Utils.isTesting could also use a lazy val for

```
sys.env.contains("SPARK_TESTING")
```

so that the check is not that expensive.


> Long runtime on many environment variables
> --
>
> Key: SPARK-34115
> URL: https://issues.apache.org/jira/browse/SPARK-34115
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 local[2] on a Kubernetes Pod
>Reporter: Norbert Schultz
>Priority: Major
>
> I am not sure if this is a bug report or a feature request. The code is
> the same in current versions of Spark, and maybe this ticket saves someone
> some time debugging.
> We migrated some older code to Spark 2.4.0, and suddenly the unit tests on
> our build machine were much slower than expected.
> On local machines they ran perfectly.
> In the end it turned out that Spark was wasting CPU cycles during DataFrame
> analysis in the following functions:
>  * AnalysisHelper.assertNotAnalysisRule, which calls
>  * Utils.isTesting
> Utils.isTesting traverses all environment variables.
> The offending build machine was a Kubernetes Pod which automatically exposed
> all services as environment variables, so it had more than 3000 environment
> variables.
> Utils.isTesting is called very often through
> AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown and
> transformUp), so the cost adds up quickly.
> Of course we will restrict the number of environment variables; on the other
> hand, Utils.isTesting could also use a lazy val for
> {code:java}
> sys.env.contains("SPARK_TESTING") {code}
> so that the check is not that expensive.


