[ https://issues.apache.org/jira/browse/HDFS-7798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324233#comment-14324233 ]
Hudson commented on HDFS-7798:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2039 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2039/])
HDFS-7798. Checkpointing failure caused by shared KerberosAuthenticator. (Chengbing Liu via yliu) (yliu: rev 500e6a0f46d14a591d0ec082b6d26ee59bdfdf76)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/URLConnectionFactory.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Checkpointing failure caused by shared KerberosAuthenticator
> ------------------------------------------------------------
>
>                 Key: HDFS-7798
>                 URL: https://issues.apache.org/jira/browse/HDFS-7798
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>            Assignee: Chengbing Liu
>            Priority: Critical
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7798.01.patch
>
>
> We have observed occasional checkpointing failures in our real cluster. The
> standby NameNode was not able to upload the image to the active NameNode.
> After some digging, the root cause appears to be a shared
> {{KerberosAuthenticator}} in {{URLConnectionFactory}}. The authenticator is
> designed as a use-once instance and is not stateless: it has attributes such
> as {{HttpURLConnection}} and {{URL}}. When multiple threads call
> {{URLConnectionFactory#openConnection(...)}}, the shared authenticator is
> subject to a race condition, resulting in failed image uploads.
> Therefore, as a first step, without breaking the current API, I propose we
> create a new {{KerberosAuthenticator}} instance for each connection, to make
> checkpointing work. We may consider making the {{Authenticator}} design and
> implementation stateless afterwards, as {{ConnectionConfigurator}} does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)