[ 
https://issues.apache.org/jira/browse/HADOOP-9421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690034#comment-13690034
 ] 

Daryn Sharp commented on HADOOP-9421:
-------------------------------------

You seem to be trying to tailor a design that only considers today's 
implementation of tokens and kerberos.  It seems "easy" when you assume there 
are only two choices.  The optimization becomes more and more complicated, and 
in many cases impossible, compared to simply doing what the server tells you 
to do.

When pluggable auth support allows a world of heterogeneous security, such as 
Knox or Rhino, requiring REINITIATE penalties becomes very expensive.

Sorry for the very long read, but these are topics I intended to address on the 
call that unfortunately didn't happen today.

+IP failover+
Using distinct service principals with IP failover isn't "insane".  With a 
shared principal, services can't be accessed directly because the host doesn't 
match the shared principal, so a different config with a hardcoded shared 
principal is needed.  Similarly, DNs won't be able to heartbeat directly into 
HA NNs.  
I'm sure there are more problems than we've already discovered investigating 
that route.

The root issue is that the client must only use the hostname that appears in the 
kerberos service principal, which means you can't access the service via all of 
its interfaces, hostnames, or even pretty CNAMEs.

If the server advertises "this is who I am" via the NEGOTIATE, then the problem 
is solved.
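
As a rough sketch of the idea (hypothetical names, not the actual SaslRpcClient 
code), the client can derive the kerberos service principal from what the server 
advertises, instead of from whatever name or address it happened to dial:

{code:java}
// Illustrative only: NEGOTIATE advertises the server's own statement of identity,
// and the client builds "protocol/serverId@REALM" from it.
public final class NegotiatedPrincipal {

  /** Fields a NEGOTIATE entry for the KERBEROS auth method might carry. */
  public record SaslAuthAdvert(String method, String mechanism,
                               String protocol, String serverId) {}

  /** Build the service principal from the server's advertisement, not from DNS. */
  public static String serverPrincipal(SaslAuthAdvert advert, String realm) {
    return advert.protocol() + "/" + advert.serverId() + "@" + realm;
  }

  public static void main(String[] args) {
    SaslAuthAdvert advert =
        new SaslAuthAdvert("KERBEROS", "GSSAPI", "nn", "nn1.example.com");
    // Works the same whether the client connected via a CNAME, a second NIC,
    // or a failover IP -- the principal comes from the server, not from DNS.
    System.out.println(serverPrincipal(advert, "EXAMPLE.COM"));
  }
}
{code}

(A real client would still validate the advertised principal against a configured 
pattern so it isn't blindly trusting whatever the server claims to be.)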

+Token selection issues+
Selecting tokens pre-connection, based on the service being a host:port or 
ip:port tuple, is a problem.  Let's take a few examples:

Using the IP precludes multi-interface host support, for instance if you want 
to have a fast/private intra-cluster network and a separate public network.  
Tokens will contain the public IP, but clients using the private interface 
(different IP) can't find them.  This isn't contrived, it's something Cloudera 
has wanted to do.
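
A toy sketch of the lookup miss (illustrative names only, not the actual 
Credentials/SecurityUtil classes):

{code:java}
import java.util.HashMap;
import java.util.Map;

public final class AddressKeyedTokens {

  // What host:port / ip:port keying boils down to.
  static String serviceKey(String hostOrIp, int port) {
    return hostOrIp + ":" + port;
  }

  public static void main(String[] args) {
    Map<String, String> tokensByService = new HashMap<>();

    // The token was fetched over the public interface, so it's keyed by the
    // service's public IP.
    tokensByService.put(serviceKey("203.0.113.10", 8020), "delegation-token");

    // The same service reached over the fast/private intra-cluster interface...
    String lookup = serviceKey("10.0.0.10", 8020);

    // ...misses, even though the token is perfectly valid for this service.
    System.out.println(tokensByService.get(lookup));   // prints: null
  }
}
{code}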

You also can't use the IP because changing a service's IP will break clients 
using tokens with the old IP.  In comes the bane of my creation, use_ip=false, 
to use the given hostname.  But you can't allow non-fully-qualified names 
because they will resolve differently depending on the DNS search path.  
There's a raft of reasons why the canonicalization isn't as straightforward as 
you'd think, which led to a custom NetUtils resolver and complicated path 
normalization.

Likewise, any sort of public proxy or NAT-ing between an external client and a 
cluster service creates an unusable token service within the grid.

The HA token logic is unnecessarily convoluted: it clones the token for a 
logical URI into multiple tokens, one with each failover address's service.
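
Roughly, the workaround amounts to something like this (illustrative, not the 
actual HDFS HA code):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class HaTokenCloning {
  public static void main(String[] args) {
    Map<String, String> tokensByService = new HashMap<>();
    String tokenForLogicalUri = "delegation-token-for-mycluster";

    // One clone per physical failover address, identical except for its service
    // key, just so address-based lookups still hit after a failover.
    for (String nnAddr : List.of("nn1.example.com:8020", "nn2.example.com:8020")) {
      tokensByService.put(nnAddr, tokenForLogicalUri);
    }
    System.out.println(tokensByService.keySet());
  }
}
{code}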

_Solution_
A clean solution to all these problems is for tokens to contain a 
server-generated opaque id.  The server's NEGOTIATE reports this id.  The client 
looks for a token with that id.  Now, no matter what interface/IP/hostname/proxy/NAT 
is used, the client will always find the token.
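
A minimal sketch of that selection, assuming a hypothetical id field on the 
token that the server's NEGOTIATE repeats:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public final class IdKeyedTokens {

  // Hypothetical token shape: it carries the server-generated opaque id.
  record Token(String serverId, byte[] password) {}

  private final Map<String, Token> tokensById = new HashMap<>();

  void addToken(Token t) {
    tokensById.put(t.serverId(), t);
  }

  // Called after reading the server's NEGOTIATE, which advertises the same id.
  Optional<Token> selectToken(String advertisedId) {
    return Optional.ofNullable(tokensById.get(advertisedId));
  }

  public static void main(String[] args) {
    IdKeyedTokens creds = new IdKeyedTokens();
    creds.addToken(new Token("cluster-ns1", new byte[] {1, 2, 3}));

    // Whatever interface/IP/hostname/proxy/NAT the client dialed, the server
    // says "I am cluster-ns1" and the right token is found.
    System.out.println(creds.selectToken("cluster-ns1").isPresent());   // true
  }
}
{code}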

If you cut out the use of the NEGOTIATE, this ability is gone.

+Supporting new auth methods+
Other new auths in the future may need the protocol/serverId hints from the 
NEGOTIATE to locate the required credentials.  Guessing may not be an option.

The RPC client shouldn't have to be modified to make a pre-connection guess for 
all the auth methods it supports.  Because...

Why should the client attempt an auth method before it _even knows if the 
server can do it_?  Let's look at some hairy examples:

The client tries to do kerberos, so it needs to generate the initial response 
to take advantage of your "optimization".  But the server isn't kerberized.  So 
the client either fails because it has no TGT (which it doesn't even need!), or 
fails trying to get a non-existent service principal.

What if the client decides to use an SSO service, but the server doesn't do 
SSO?  Take a REINITIATE penalty every time?
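
Negotiate-driven selection avoids both traps; a rough sketch (hypothetical 
names, not the real client code):

{code:java}
import java.util.List;
import java.util.Optional;

public final class AuthSelection {

  enum AuthMethod { TOKEN, KERBEROS, SIMPLE }

  // Walk the server's advertised list and pick the first method the client can
  // actually satisfy -- a missing TGT only matters if KERBEROS is truly required.
  static Optional<AuthMethod> choose(List<AuthMethod> serverOffers,
                                     boolean haveToken, boolean haveTgt) {
    for (AuthMethod offered : serverOffers) {
      switch (offered) {
        case TOKEN    -> { if (haveToken) return Optional.of(offered); }
        case KERBEROS -> { if (haveTgt)   return Optional.of(offered); }
        case SIMPLE   -> { return Optional.of(offered); }
      }
    }
    return Optional.empty();   // nothing mutually acceptable, so fail cleanly
  }

  public static void main(String[] args) {
    // A non-kerberized server never forces the client to go hunting for a TGT.
    System.out.println(choose(List.of(AuthMethod.SIMPLE), false, false));  // Optional[SIMPLE]
  }
}
{code}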

+Supporting new mechanisms+
Let's say we add support for a new mechanism like SCRAM.  Just because the 
client can do it doesn't mean all services across all clusters can do it.  The 
server's NEGOTIATE will tell the client if it can do DIGEST-MD5, SCRAM, etc.

Inter-cluster compatibility and rolling upgrades will introduce scenarios where 
the required mechanism differs, and penalizing the client to REINITIATE is not 
a valid option.
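
The standard javax.security.sasl factory already works this way on the client 
side: hand it the server's advertised mechanism list and it returns a client for 
the first mechanism an installed provider supports.  A sketch (the mechanism 
names and server details are just examples):

{code:java}
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

public final class MechanismSelection {
  public static void main(String[] args) throws SaslException {
    // Suppose the server's NEGOTIATE advertised these, preferred first.
    String[] advertised = { "SCRAM-SHA-256", "DIGEST-MD5" };

    // A real client would fill in name/password callbacks from its token here.
    CallbackHandler handler = callbacks -> { };

    SaslClient client = Sasl.createSaslClient(
        advertised, null /*authzid*/, "hdfs" /*protocol*/,
        "nn1.example.com" /*serverName*/, null /*props*/, handler);

    // If no installed provider supports SCRAM, the factory falls through to
    // DIGEST-MD5 -- exactly the kind of choice the server's list enables.
    System.out.println(client == null ? "no common mechanism"
                                      : client.getMechanismName());
  }
}
{code}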

---

In all of these scenarios, there aren't complex issues if the NEGOTIATE is used 
to choose an appropriate auth type.  In a world of multiple auths and multiple 
mechanisms per auth, requiring REINITIATE penalties is too expensive.

Ignoring all the issues I've cited, your optimization doesn't appear to have a 
positive impact on performance.  Even if it did shave a few milliseconds or 
even 100ms, will it have a measurable real-world impact?  Considering how many 
RPC requests are performed over a single connection, will the negligible 
penalty from one extra packet make any difference?

I feel like we've spent weeks haggling over an ill-suited premature 
optimization, time that could have been spent building upon this implementation. :(
                
> Convert SASL to use ProtoBuf and add lengths for non-blocking processing
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-9421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9421
>             Project: Hadoop Common
>          Issue Type: Sub-task
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sanjay Radia
>            Assignee: Daryn Sharp
>            Priority: Blocker
>         Attachments: HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, 
> HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, HADOOP-9421.patch, 
> HADOOP-9421.patch, HADOOP-9421-v2-demo.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
