I have the same scenario. I hit the HDFS_DELEGATION_TOKEN issue exactly 7
days after the Spark job starts, and then the job gets killed.
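For what it's worth, the 7-day figure matches the stock HDFS default for the
maximum delegation token lifetime, dfs.namenode.delegation.token.max-lifetime
(604800000 ms in hdfs-default.xml). The renew interval,
dfs.namenode.delegation.token.renew-interval, defaults to 86400000 ms. I
haven't confirmed that this is the cause. The sketch below just converts
those defaults to hours and days, and it assumes the cluster hasn't
overridden them in hdfs-site.xml:

```python
# Stock defaults from hdfs-default.xml (Hadoop 2.x). This assumes the cluster
# has NOT overridden them in hdfs-site.xml; check your own config.
MS_PER_HOUR = 60 * 60 * 1000

defaults_ms = {
    "dfs.namenode.delegation.token.max-lifetime": 604_800_000,
    "dfs.namenode.delegation.token.renew-interval": 86_400_000,
}


def to_hours(ms: int) -> float:
    """Convert a duration in milliseconds to hours."""
    return ms / MS_PER_HOUR


for key, ms in defaults_ms.items():
    hours = to_hours(ms)
    # :g drops the trailing ".0", so 7.0 days prints as "7"
    print(f"{key}: {hours:.0f} h = {hours / 24:g} day(s)")
```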

I'm also looking for a solution.
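The two WARN lines in Ruslan's logs quoted below are also consistent with
each other. At 16:04:59 the lease had been failing for 2802 seconds. Adding
the 13 min 18 s until 16:18:17 gives exactly the 3600-second hard limit. If
the "for N seconds" counter is measured from the last successful lease
renewal (my assumption), then the token stopped being accepted at about
15:18:17. Here is a quick check of that arithmetic, using only the
timestamps and durations from the logs:

```python
from datetime import datetime, timedelta

FMT = "%y/%m/%d %H:%M:%S"

# Timestamps and durations copied from the two WARN lines in the logs.
first_warn = datetime.strptime("16/03/11 16:04:59", FMT)
last_warn = datetime.strptime("16/03/11 16:18:17", FMT)
failing_at_first_warn = timedelta(seconds=2802)
HARD_LIMIT = timedelta(seconds=3600)  # "hard-limit =3600 seconds" in the log

elapsed_between_warnings = last_warn - first_warn
total_failing = failing_at_first_warn + elapsed_between_warnings

# Assumption: the "for N seconds" counter starts at the last successful renewal.
last_success = first_warn - failing_at_first_warn

print("between warnings:", elapsed_between_warnings)        # 0:13:18
print("total failing:", total_failing)                      # 1:00:00
print("hit hard limit:", total_failing >= HARD_LIMIT)       # True
print("last success ~", last_success.strftime("%H:%M:%S"))  # 15:18:17
```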

Regards,
Nik.

On Fri, Mar 11, 2016 at 8:10 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> The Spark session dies after ~40 hours when running against a secure
> (Kerberized) Hadoop cluster.
>
> spark-submit is run with --principal and --keytab, so Kerberos ticket
> renewal works fine according to the logs.
>
> Is something going wrong with the HDFS connection?
>
> These messages come up every second (complete stack trace:
> http://pastebin.com/QxcQvpqm):
>
> 16/03/11 16:04:59 WARN hdfs.LeaseRenewer: Failed to renew lease for
>> [DFSClient_NONMAPREDUCE_1534318438_13] for 2802 seconds.  Will retry
>> shortly ...
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
>> cache
>
>
> Then, after 1 hour, it stops trying:
>
> 16/03/11 16:18:17 WARN hdfs.DFSClient: Failed to renew lease for
>> DFSClient_NONMAPREDUCE_1534318438_13 for 3600 seconds (>= hard-limit =3600
>> seconds.) Closing all files being written ...
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
>> cache
>
>
> It doesn't look like a Kerberos principal ticket renewal problem, because
> the ticket would expire much sooner (our default is 12 hours), and the
> logs show that Spark's Kerberos ticket renewer works fine.
>
> Is it some other HDFS delegation token renewal process that breaks?
>
> RHEL 6.7
> Spark 1.5
> Hadoop 2.6
>
>
> I found HDFS-5322 and YARN-2648, which seem relevant, but I am not sure
> whether they describe the same problem.
> It seems to be a Spark problem, as I have only seen it in Spark.
> It is reproducible: just wait ~40 hours and the Spark session is no
> good.
>
>
> Thanks,
> Ruslan
>
>
>
