Spark session dies after ~40 hours when running against a secure Hadoop cluster.

spark-submit is run with --principal and --keytab, so Kerberos ticket renewal
works fine according to the logs.
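
For reference, the job is launched roughly like the sketch below (the class
name, jar, principal and keytab path are placeholders, not our real values):

  # long-running job on the secure cluster; --principal/--keytab let Spark
  # log in from the keytab and keep renewing the Kerberos ticket
  spark-submit \
    --master yarn-cluster \
    --principal someuser@EXAMPLE.COM \
    --keytab /path/to/someuser.keytab \
    --class com.example.LongRunningApp \
    long-running-app.jar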

Something seems to go wrong with the HDFS DFSClient connection?

These messages come up every second (complete stack trace:
http://pastebin.com/QxcQvpqm):

  16/03/11 16:04:59 WARN hdfs.LeaseRenewer: Failed to renew lease for
  [DFSClient_NONMAPREDUCE_1534318438_13] for 2802 seconds.  Will retry
  shortly ...
  org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
  token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
  cache


Then, after an hour, it stops trying:

  16/03/11 16:18:17 WARN hdfs.DFSClient: Failed to renew lease for
  DFSClient_NONMAPREDUCE_1534318438_13 for 3600 seconds (>= hard-limit =3600
  seconds.) Closing all files being written ...
  org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
  token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
  cache


It doesn't look like a Kerberos principal ticket renewal problem, because
that ticket would expire much sooner (we use the default 12-hour lifetime),
and according to the logs Spark's Kerberos ticket renewer works fine.
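
For what it's worth, the Kerberos side can be sanity-checked roughly like
this (just a sketch; the exact output depends on the krb5 setup):

  # show the current TGT, its expiry and renew-until times
  # (ours reflects the 12-hour default mentioned above)
  klist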

Is it some other HDFS delegation token renewal process that breaks?
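
If it is the delegation token side, I assume the relevant NameNode settings
are the standard dfs.namenode.delegation.token.* ones; they can be read off
the cluster like this (a sketch; values come back in milliseconds):

  # renew interval (typically defaults to 24 hours) and max lifetime (typically 7 days)
  hdfs getconf -confKey dfs.namenode.delegation.token.renew-interval
  hdfs getconf -confKey dfs.namenode.delegation.token.max-lifetime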

RHEL 6.7
Spark 1.5
Hadoop 2.6


I found HDFS-5322 and YARN-2648, which seem relevant, but I am not sure
whether they describe the same problem.
It looks like a Spark problem, as I have only seen it with Spark.
The problem is reproducible: just wait ~40 hours and the Spark session is no
good.


Thanks,
Ruslan
