[jira] [Assigned] (SPARK-37329) File system delegation tokens are leaked
[ https://issues.apache.org/jira/browse/SPARK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37329:

    Assignee: Apache Spark

> File system delegation tokens are leaked
>
>                 Key: SPARK-37329
>                 URL: https://issues.apache.org/jira/browse/SPARK-37329
>             Project: Spark
>          Issue Type: Bug
>          Components: Security, YARN
>    Affects Versions: 2.4.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Apache Spark
>            Priority: Major
>
> On a very busy Hadoop cluster (with HDFS at-rest encryption) we found that KMS
> accumulated millions of delegation tokens that were not cancelled even after
> the jobs had finished, and that KMS ran out of memory within a day because of
> the delegation token leak.
> We were able to reproduce the bug in a smaller test cluster, and realized that
> when a Spark job starts, it acquires two delegation tokens, and only one is
> cancelled properly after the job finishes. The other one is left over and
> lingers for up to 7 days (the default Hadoop delegation token lifetime).
> YARN handles the lifecycle of a delegation token properly if its renewer is
> 'yarn'. However, Spark intentionally (a hack?) acquires a second delegation
> token with the job issuer as the renewer, simply to get the token renewal
> interval. The token is then ignored but not cancelled.
> Proposal: cancel the delegation token immediately after the token renewal
> interval is obtained.
> Environment: CDH6.3.2 (based on Apache Spark 2.4.0), but the bug has probably
> existed since day 1.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37329) File system delegation tokens are leaked
[ https://issues.apache.org/jira/browse/SPARK-37329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37329:

    Assignee:     (was: Apache Spark)

> File system delegation tokens are leaked
>
>                 Key: SPARK-37329
>                 URL: https://issues.apache.org/jira/browse/SPARK-37329
>             Project: Spark
>          Issue Type: Bug
>          Components: Security, YARN
>    Affects Versions: 2.4.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
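The leak and the proposed fix described in SPARK-37329 can be illustrated with a small toy model. This is only a sketch: it does not use the Hadoop or Spark APIs, and every name in it (TokenService, renewal_interval, run_job, leaked_after, the cancel_after flag) is hypothetical. The one-day renew interval is likewise an assumed value chosen for the example. The model issues one token with renewer 'yarn' (cancelled when the job ends) and a second "probe" token used only to measure the renewal interval, then counts how many tokens are still live with and without cancelling the probe.

```python
from dataclasses import dataclass
from itertools import count

# Assumed renew interval for the toy model (one day, in milliseconds).
RENEW_INTERVAL_MS = 24 * 60 * 60 * 1000


@dataclass
class Token:
    token_id: int
    renewer: str
    issue_date_ms: int


class TokenService:
    """Hypothetical in-memory stand-in for a token issuer such as KMS."""

    def __init__(self):
        self._ids = count(1)
        self.live = {}  # token_id -> Token, i.e. tokens held in memory

    def issue(self, renewer, now_ms):
        token = Token(next(self._ids), renewer, now_ms)
        self.live[token.token_id] = token
        return token

    def renew(self, token, now_ms):
        # Renewing returns the new expiration time of a live token.
        if token.token_id not in self.live:
            raise KeyError("unknown or cancelled token")
        return now_ms + RENEW_INTERVAL_MS

    def cancel(self, token):
        self.live.pop(token.token_id, None)


def renewal_interval(service, user, now_ms, cancel_after):
    """Obtain a probe token only to learn the renewal interval."""
    probe = service.issue(renewer=user, now_ms=now_ms)
    try:
        new_expiration = service.renew(probe, now_ms)
        return new_expiration - probe.issue_date_ms
    finally:
        # Proposed fix: cancel the probe as soon as the interval is known.
        if cancel_after:
            service.cancel(probe)


def run_job(service, user, now_ms, cancel_after):
    job_token = service.issue(renewer="yarn", now_ms=now_ms)
    interval = renewal_interval(service, user, now_ms, cancel_after)
    # Models YARN cancelling the token it manages when the job finishes.
    service.cancel(job_token)
    return interval


def leaked_after(jobs, cancel_after):
    """Number of tokens still live after running the given number of jobs."""
    service = TokenService()
    for i in range(jobs):
        run_job(service, "alice", now_ms=i * 1000, cancel_after=cancel_after)
    return len(service.live)
```

In this model, leaked_after(n, cancel_after=False) grows by one token per job, which mirrors the accumulation reported in the issue, while leaked_after(n, cancel_after=True) stays at zero and still returns the same interval. A real fix would have to work against the actual token objects Spark obtains; the sketch only shows why cancelling right after the interval is measured stops the growth.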