[ https://issues.apache.org/jira/browse/HADOOP-10850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066711#comment-14066711 ]
Alejandro Abdelnur commented on HADOOP-10850:
---------------------------------------------

I've spent a good few hours looking into this.

Making {{KerberosAuthenticator}} use the JDK SPNEGO support is straightforward; it is simply a matter of removing code. We still want to keep using {{AuthenticatedURL}}, because it sets the hadoop-auth cookie, which avoids triggering SPNEGO on every request (and failing when doing a PUT/POST with a payload).

But there is a problem: (IMO) a nasty bug in the JDK SPNEGO implementation that makes it impossible to use reliably.

The (Sun) JDK {{sun.net.www.protocol.http.HttpURLConnection}} class uses the {{NegotiateAuthentication}} class to keep track of hostnames that support SPNEGO. The {{NegotiateAuthentication}} class uses a static {{HashMap<String, Boolean> supported}} that records whether or not each hostname supports SPNEGO. The first time the JDK gets a {{WWW-Authenticate: NEGOTIATE}} for a hostname, it tries to obtain the Kerberos service ticket for {{HTTP/<hostname>}}. If the service ticket is obtained, the entry in the {{supported}} map will be {{(<hostname>,TRUE)}}; if the service ticket is not obtained, the entry will be {{(<hostname>, FALSE)}}. The {{supported}} map is never purged, which means a hostname that fails once is blacklisted for the lifetime of the JVM (the singleton is kept in the bootstrap classloader, so it affects the whole JVM).

The problem is that obtaining the service ticket may fail for different reasons:

* The JVM client is not within a Kerberos login context
* The KDC is not available at the time of the call
* The {{HTTP/<hostname>}} principal does not exist at the time of the call

All of these conditions can be 'transient'. If any of them happens before a successful attempt, the hostname is blacklisted for the lifetime of the JVM. When running testcases, they pass or fail depending on the order in which they run. It is also reasonable to expect these 'transient' errors to happen in production in a long-running service (NN, Oozie).
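To illustrate the failure mode, here is a minimal sketch of a never-purged, JVM-wide verdict map like the one described above. It is a simplified model and not the JDK source: the class name {{StickySpnegoCacheSketch}}, the method {{isSupported}}, the {{ticketProbe}} callback (standing in for "try to obtain the service ticket for {{HTTP/<hostname>}}") and the hostname are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Simplified model of a sticky per-hostname SPNEGO verdict cache.
 * NOT the JDK implementation; all names here are hypothetical.
 */
class StickySpnegoCacheSketch {
    // One JVM-wide map: hostname -> "supports SPNEGO". Entries are never removed.
    private static final Map<String, Boolean> supported = new HashMap<>();

    /**
     * Returns the cached verdict for the hostname, probing only the first
     * time the hostname is seen. The probe stands in for the attempt to
     * obtain a Kerberos service ticket for HTTP/<hostname>.
     */
    static synchronized boolean isSupported(String hostname, Predicate<String> ticketProbe) {
        Boolean cached = supported.get(hostname);
        if (cached != null) {
            // The first verdict wins, even if it came from a transient failure.
            return cached;
        }
        boolean ok = ticketProbe.test(hostname);
        supported.put(hostname, ok);
        return ok;
    }

    public static void main(String[] args) {
        // First attempt: assume the KDC is unreachable, so the probe fails.
        boolean first = isSupported("nn.example.com", h -> false);
        // Second attempt: the KDC is back, but the probe is never consulted again.
        boolean second = isSupported("nn.example.com", h -> true);
        System.out.println(first + " " + second); // prints "false false"
    }
}
```

In this model the second call returns {{false}} even though the probe would now succeed, which is the order-dependent behaviour seen in the testcases: whichever attempt comes first decides the outcome for the rest of the JVM's life.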
The good thing, and the reason {{AuthenticatedURL}}/{{KerberosAuthenticator}} works fine, is that {{HttpURLConnection}} checks whether the user code is doing its own SPNEGO handling and, if so, lets the user code handle it. And because {{KerberosAuthenticator}} does not blacklist hostnames, things work and recover if necessary.

Based on these findings, I would close this JIRA and HADOOP-10453 as invalid.

> KerberosAuthenticator should not do the SPNEGO handshake
> --------------------------------------------------------
>
>                 Key: HADOOP-10850
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10850
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.4.1
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>
> As mentioned in HADOOP-10453, the JDK automatically does a SPNEGO handshake
> when opening a connection with a URL within a Kerberos login context, there
> is no need to do the SPNEGO handshake in the {{KerberosAuthenticator}},
> simply extract the auth token (hadoop-auth cookie) and do the fallback if
> necessary.

--
This message was sent by Atlassian JIRA
(v6.2#6252)