[jira] [Assigned] (SPARK-37329) File system delegation tokens are leaked

2021-11-15 Thread Apache Spark (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37329:


Assignee: Apache Spark

> File system delegation tokens are leaked
> 
>
> Key: SPARK-37329
> URL: https://issues.apache.org/jira/browse/SPARK-37329
> Project: Spark
> Issue Type: Bug
> Components: Security, YARN
> Affects Versions: 2.4.0
> Reporter: Wei-Chiu Chuang
> Assignee: Apache Spark
> Priority: Major
>
> On a very busy Hadoop cluster (with HDFS at-rest encryption), we found that KMS
> accumulated millions of delegation tokens that are not cancelled even after
> jobs finish, and KMS runs out of memory within a day because of this
> delegation token leak.
> We were able to reproduce the bug on a smaller test cluster and realized that
> when a Spark job starts, it acquires two delegation tokens, and only one is
> cancelled properly after the job finishes. The other one is left over and
> lingers for up to 7 days (the default Hadoop delegation token lifetime).
> YARN handles the lifecycle of a delegation token properly if its renewer is
> 'yarn'. However, Spark intentionally (a hack?) acquires a second delegation
> token with the user who submitted the job as the renewer, simply to get the
> token renewal interval. That token is then ignored but never cancelled.
> Proposal: cancel this second delegation token immediately after the token
> renewal interval is obtained.
> Environment: CDH 6.3.2 (based on Apache Spark 2.4.0), but the bug has probably
> existed since day 1.
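A minimal Python sketch of the token lifecycle described in the issue, under stated assumptions: FakeKms and run_job are hypothetical names (not Hadoop, KMS, or Spark APIs), and the 1-day renew interval and 7-day max lifetime are assumed defaults. It models the two tokens each job acquires: one with 'yarn' as renewer, which YARN cancels, and one with the job user as renewer, used only to learn the renewal interval. Skipping cancellation of the second token leaves one live token per job until its max lifetime passes; cancelling it right after renew(), as proposed, leaves none.

```python
import itertools
from dataclasses import dataclass, field

ONE_DAY_MS = 24 * 3600 * 1000   # assumed default renew interval
SEVEN_DAYS_MS = 7 * ONE_DAY_MS  # assumed default max token lifetime


@dataclass
class FakeKms:
    """Hypothetical in-memory stand-in for the KMS token store; not a Hadoop API."""
    renew_interval_ms: int = ONE_DAY_MS
    max_lifetime_ms: int = SEVEN_DAYS_MS
    live: dict = field(default_factory=dict)  # token id -> (renewer, issue time in ms)
    _ids: itertools.count = field(default_factory=itertools.count, repr=False)

    def issue(self, renewer: str, now_ms: int) -> int:
        token_id = next(self._ids)
        self.live[token_id] = (renewer, now_ms)
        return token_id

    def renew(self, token_id: int, caller: str, now_ms: int) -> int:
        renewer, _ = self.live[token_id]
        if caller != renewer:
            raise PermissionError("only the token's designated renewer may renew it")
        return now_ms + self.renew_interval_ms  # new expiration time

    def cancel(self, token_id: int) -> None:
        self.live.pop(token_id, None)

    def purge_expired(self, now_ms: int) -> None:
        # Tokens nobody cancels are only dropped once their max lifetime has elapsed.
        self.live = {t: v for t, v in self.live.items()
                     if now_ms - v[1] < self.max_lifetime_ms}


def run_job(kms: FakeKms, job_user: str, now_ms: int, cancel_probe: bool) -> int:
    # Token 1: renewer is "yarn", so YARN manages it and cancels it when the app ends.
    yarn_token = kms.issue("yarn", now_ms)
    # Token 2: renewer is the job user, acquired only so the user may call renew()
    # and learn the renewal interval (the job user cannot renew token 1).
    probe = kms.issue(job_user, now_ms)
    interval_ms = kms.renew(probe, job_user, now_ms) - now_ms
    if cancel_probe:
        kms.cancel(probe)  # the proposed fix: cancel as soon as the interval is known
    # ... the job runs here ...
    kms.cancel(yarn_token)  # YARN cleans up the token it owns
    return interval_ms


leaky, fixed = FakeKms(), FakeKms()
for t in range(1000):
    run_job(leaky, "alice", t, cancel_probe=False)
    run_job(fixed, "alice", t, cancel_probe=True)
print(len(leaky.live), len(fixed.live))  # 1000 0
```

In real Spark the corresponding change would presumably sit wherever the renewal interval is computed, calling Hadoop's Token.cancel(Configuration) on the probe token right after Token.renew(Configuration) returns.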



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37329) File system delegation tokens are leaked

2021-11-15 Thread Apache Spark (Jira)



Apache Spark reassigned SPARK-37329:


Assignee: (was: Apache Spark)

> File system delegation tokens are leaked
> 
>
> Key: SPARK-37329
> URL: https://issues.apache.org/jira/browse/SPARK-37329
> Project: Spark
> Issue Type: Bug
> Components: Security, YARN
> Affects Versions: 2.4.0
> Reporter: Wei-Chiu Chuang
> Priority: Major
>


