> On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
> > shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
> > <https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210>
> >
> >     Why is username the "short" user name? Is that unique in Kerberos? If 
> > not, the long version should be used everywhere instead. Otherwise, one 
> > user can appear to be another in logs, etc.
> >     
> >     If "getShortUserName" is not unique, it should be avoided everywhere.
> 
> Josh Elser wrote:
>     Check out: 
> http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
>     
>     Kerberos principals are of the form: primary/instance@realm. Kerberos 
> principals are typically categorized as users and services. A user is not 
> qualified to a single instance (a host) and represents authentication across 
> the realm. For example, els...@example.com means that I can "roam". 
> Conversely, a service is typically "fixed" to a specific host. For example, 
> accumulo/node1.example....@example.com means that there is a process, logged 
> in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
> on any other host. Now, an important note if someone actually creates a 
> principal "accum...@example.com" this is unique with respect to any other 
> "accumulo/`host`@EXAMPLE.COM" principal. I'm not sure if we need to do 
> anything else other than convention of kerberos principals, or if we should 
> be including the instance in "our" username when present.
>     
>     This kind of ties back into the SystemCredentials discussion again.
> 
> Christopher Tubbs wrote:
>     Okay, so a smart configuration would make shortnames unique. However, 
> UserGroupInformation returns only the `primary` for the short name. This 
> means that user names will have to be unique across realms and instances. 
> Right now, you are storing permissions using the short name. So, any user 
> with the same primary, will be able to masquerade as any other user with the 
> same primary from a different instance and/or realm, and be able to use 
> their permissions and authorizations. That's the problem with the shortname 
> here. That's very unexpected.
> 
> Josh Elser wrote:
>     Bingo. If you look at how HDFS does their configuration, this is the same 
> convention. The lack of documentation from me leaves something to be desired 
> here, and I apologize for that.
>     
>     To save you looking at HDFS (if you care not to look), you'll see that an 
> HDFS process uses a given principal with a special replacement string 
> `_HOST`. The common convention is to use something like 
> `dn/_h...@example.com` (the realm is unimportant for this example). This 
> ensures that the same configuration files can be used across all hosts in the 
> HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
> host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
> datanodes and this is protected by the fact that access to the KDC is 
> restricted (you can't make your own user). The circle of trust is two-fold: 
> having a keytab with the correct principal and that Hadoop requires that 
> specific configuration (which restricts the principal).
> 
> Christopher Tubbs wrote:
>     My concerns here are more about the impact on users, than for the system 
> credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
> checking the realm when checking permissions/access on an authenticated 
> principal, then they are less secure than I think we should be. Referencing 
> HDFS also seems to imply that we're not so much doing Kerberos, as we are 
> implementing HDFS-specific Kerberos conventions (which are less secure, with 
> respect to data authorizations/permissions within Accumulo, than I'm 
> comfortable with).
> 
> Josh Elser wrote:
>     bq. if they aren't (minimally) checking the realm when checking 
> permissions/access on an authenticated principal
>     
>     Do you mean the instance instead of the realm? In the case of a single 
> realm, the KDC is going to verify the correct realm. Assuming you meant the 
> instance though (the optional "/hostname"), it's typical that a user has the 
> ability to use their credentials anywhere. Thus, you typically see principals 
> without instances for actual users. As far as I understand it, that's what 
> HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
> you come from, just what your name is and that you have valid credentials. I 
> don't think we're being substantially less secure by not including the 
> instance in the Accumulo principal.
> 
> Christopher Tubbs wrote:
>     No, I mean the realm, to make it only necessary to guarantee uniqueness 
> within a realm, vs. across all known realms (more reasonable of a guarantee 
> to make for a KDC user admin). We could also include the instance (when 
> specified), if we want to really be careful that users aren't sharing 
> permissions.
>     
>     In my concerns, I'm assuming we authenticate users in any realm. If we 
> are somehow restricted to a single realm (either by a "permittedRealm" 
> configuration item or by the nature of Kerberos itself), then realm isn't 
> that important, but we should discuss more about the instance. My 
> understanding is that Kerberos authenticates the user by the fully qualified 
> Kerberos principal (`primary/instance@realm`) in whatever realm they are, but 
> it doesn't have to be a specific realm (like the same one as the server), and 
> then we are truncating their identity, essentially binning people from 
> different realms into the same bucket. It's like authenticating me as 
> `Christopher Tubbs`, and then assigning me to a bucket called `Christopher` 
> where I share permissions/authorizations with all other `Christopher`s.
> 
> Josh Elser wrote:
>     Oh, I apologize, I follow you now. Your concern wasn't clicking for me.
>     
>     > My understanding is that Kerberos authenticates the user by the fully 
> qualified Kerberos principal (primary/instance@realm) in whatever realm they 
> are, but it doesn't have to be a specific realm (like the same one as the 
> server), and then we are truncating their identity, essentially binning 
> people from different realms into the same bucket
>     
>     Well, the KDC you're communicating with has to be set up for the realm 
> being requested (and if one isn't provided, it will delegate to another KDC 
> or drop you into a default realm, depending on krb5.conf). As I understand 
> it, if you haven't defined a `default_realm` in `libdefaults` in krb5.conf, 
> and a user comes in with an incorrect hostname (instance) or realm 
> specification, the KDC won't authenticate you which keeps them out of 
> Accumulo completely. I use `default_realm` locally, since I just use a dummy 
> realm instead of actually matching my laptop.
>     
>     In all honesty though, I haven't thought past single-realm KDC setups. 
> Is enforcing that clients are a member of the same realm the Accumulo server 
> principals reside in sufficient? I'm worried about scope-creep of trying to 
> do multi-realm configuration correct before single realm is adequately 
> polished.
> 
> Christopher Tubbs wrote:
>     bq. Is enforcing that clients are a member of the same realm the Accumulo 
> server principals reside in sufficient?
>     
>     Perhaps. Where would we do this? In the site configuration?
>     
>     bq. I'm worried about scope-creep of trying to do multi-realm 
> configuration correct before single realm is adequately polished.
>     
>     Understood, but I'm thinking about it from the other side. I don't want 
> to make assumptions which are valid in a narrow case, but which leave 
> security holes in a more general case. I'm also coming at this from the 
> perspective of dealing with X.509 certificates, and understanding the 
> differences between a CN and a DN.
>     
>     If we lock things down to a single realm (so we can safely omit it in our 
> internal structures), we'd still need to address the `instance` portion. For 
> that, it sounded like you were saying that `myPrimary/myInstance@myRealm` is 
> distinct from `myPrimary@myRealm` and could both be valid users according to 
> the KDC. If that's the case, I think it makes sense for the permissions 
> handler/authorizer to use the `primary/instance` for the principal and not 
> just the `primary` (which is what shortname does), because it could have 
> different permissions. If the user administrator wishes to allow 
> `myPrimary@myRealm`, then they should create such a user in the KDC (I hope 
> I'm understanding this correctly.), so we would just use `myPrimary` as the 
> user principal in Accumulo, but we shouldn't strip the instance off if it is 
> present.
> 
> Josh Elser wrote:
>     > > Is enforcing that clients are a member of the same realm the Accumulo 
> server principals reside in sufficient?
>     > Perhaps. Where would we do this? In the site configuration?
>     
>     Yeah. My thought was to just piggy-back on top of the realm provided in 
> the kerberos principal. That keeps us from having to introduce a new property 
> for something we know that might not be entirely sufficient.
>     
>     > we'd still need to address the instance portion. For that, it sounded 
> like you were saying that myPrimary/myInstance@myRealm is distinct from 
> myPrimary@myRealm and could both be valid users according to the KDC
>     
>     Yes, principals are valid (and distinct!) both with and without an 
> instance. In our case, I believe the instance being distinct is undesirable 
> (and where I was going with the reference to how Hadoop does things). Any 
> server with a given principal (or matching a certain principal) is considered 
> the Accumulo "system" user (along with the `instance.*` check we mentioned 
> earlier). A simple way to do this (without getting into complicated regex's 
> defining who is actually considered the system user) is to just treat any 
> instance also as that user. It brings a bit of coordination required in how 
> KRB principals are created, but it's the "common" configuration/deployment at 
> the cost of flexibility. I would envision leveraging something similar to the 
> `auth_to_local` RULEs 
> (http://web.mit.edu/kerberos/krb5-devel/doc/admin/conf_files/krb5_conf.html) 
> like Hadoop does, but I don't *really* want to do that right now (mapping 
> some set of principal regexs to a "user"). This would let us say things 
> like "accumulo/node1.example.com" is "accumulo" as is 
> "old_server/node2.example.com".
>     
>     For normal users, convention is that they aren't attached to an instance 
> (and are valid within the realm), and this implementation would be a 
> limitation on us for edge cases in KDC configurations.
>     
>     > If the user administrator wishes to allow myPrimary@myRealm, then they 
> should create such a user in the KDC (I hope I'm understanding this 
> correctly.), so we would just use myPrimary as the user principal in 
> Accumulo, but we shouldn't strip the instance off if it is present.
>     
>     Yes, you are correct. One thing I'm confused about is if there is ever a 
> case that a user would have an instance in their principal. Not understanding 
> why this might actually happen pushes me in the direction that truncating 
> things is ok. That covers "human" users, but "application" users would still 
> be likely tied to a specific hostname, in which case perhaps I can't punt on 
> this for now. I really just want to avoid having N `accumulo/hostname` users 
> in our "database" which would be the sum of all Accumulo server processes. The 
> regex matching would be needed to avoid that.
>     
>     Maybe this is experimental until I do that as well? Maybe I shouldn't 
> commit any of this without that? I'm not completely decided yet, but I'm 
> erring on the former presently.

bq. My thought was to just piggy-back on top of the realm provided in the 
kerberos principal.

You mean the server's own realm? That makes sense to me. We can document that 
they should match, but we'd need to make sure we explicitly check that.
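
As a rough sketch of what that explicit check could look like: the class and 
method names below are hypothetical (they are not existing Accumulo or Hadoop 
APIs), and the parsing assumes principals of the form `primary[/instance]@REALM` 
with no escaped `@` characters.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of Accumulo or Hadoop. */
final class RealmCheck {

  /** Returns the realm of a "primary[/instance]@REALM" principal, or null if none is given. */
  static String realmOf(String principal) {
    Objects.requireNonNull(principal, "principal");
    // Assumption: the name contains no escaped '@'; a real parser must handle escaping.
    int at = principal.lastIndexOf('@');
    return (at < 0 || at == principal.length() - 1) ? null : principal.substring(at + 1);
  }

  /** True only when both principals carry a realm and the two realms are identical. */
  static boolean sameRealm(String clientPrincipal, String serverPrincipal) {
    String clientRealm = realmOf(clientPrincipal);
    String serverRealm = realmOf(serverPrincipal);
    return clientRealm != null && clientRealm.equals(serverRealm);
  }

  public static void main(String[] args) {
    String server = "accumulo/node1.example.com@EXAMPLE.COM";
    System.out.println(sameRealm("alice@EXAMPLE.COM", server)); // same realm as the server
    System.out.println(sameRealm("alice@OTHER.ORG", server));   // different realm
    System.out.println(sameRealm("alice", server));             // no realm: rejected
  }
}
```

Rejecting a principal that carries no realm at all is a deliberate choice in 
this sketch, so that a missing realm is never silently treated as a match.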


bq. For normal users, convention is that they aren't attached to an instance 
(and are valid within the realm), and this implementation would be a limitation 
on us for edge cases in KDC configurations.

My concerns here are for normal users. The !SYSTEM user doesn't even have 
permissions or authorizations stored in ZK (it shouldn't anyway). I had assumed 
the !SYSTEM user would be treated specially after authentication at the 
transport layer. I don't think it should rely on the Kerberos principal. This 
relates to our other discussion about the SystemToken.

bq. One thing I'm confused about is if there is ever a case that a user would 
have an instance in their principal.

I can imagine use cases where a user has permission to access a table, but only 
from a specific, vetted system. This is analogous to OpenStack and EC2 security 
group / firewall rules which allow access only from specific sources. MySQL 
also has this concept in its permissions model.
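
To make the difference concrete, here is a small sketch, assuming we have 
already locked things down to a single realm. The helper names are made up for 
illustration, and `primaryOnly` only mimics the short-name behavior described 
earlier in this thread rather than calling into UserGroupInformation.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Accumulo or Hadoop. */
final class PrincipalNames {

  /** Drops only the realm, keeping "primary/instance" when an instance is present. */
  static String stripRealm(String principal) {
    // Assumption: no escaped '@' or '/' characters in the principal name.
    int at = principal.lastIndexOf('@');
    return at < 0 ? principal : principal.substring(0, at);
  }

  /** Keeps the primary only, which is the truncation discussed in this thread. */
  static String primaryOnly(String principal) {
    String noRealm = stripRealm(principal);
    int slash = noRealm.indexOf('/');
    return slash < 0 ? noRealm : noRealm.substring(0, slash);
  }

  public static void main(String[] args) {
    String[] principals = {"app@EXAMPLE.COM", "app/vetted1.example.com@EXAMPLE.COM"};
    Set<String> byPrimary = new HashSet<>();
    Set<String> byPrimaryAndInstance = new HashSet<>();
    for (String p : principals) {
      byPrimary.add(primaryOnly(p));
      byPrimaryAndInstance.add(stripRealm(p));
    }
    // Both principals collapse into the single user "app" when only the primary is kept.
    System.out.println(byPrimary);
    // Keeping the instance leaves two distinct users that can hold different permissions.
    System.out.println(byPrimaryAndInstance.size());
  }
}
```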

bq. That covers "human" users, but "application" users would still be likely 
tied to a specific hostname, in which case perhaps I can't punt on this for now.

Agreed.

bq. I really just want to avoid having N accumulo/hostname users in our 
"database" which would be the sum of all Accumulo server processes. The regex 
matching would be needed to avoid that.

I don't think that's the case. The system user doesn't (shouldn't) write to the 
ZK user database. Its permissions are evaluated separately, and it should never 
have any authorizations. Rather than regex matching, our discussion around the 
SystemToken might help resolve this. If the system credentials (!SYSTEM, 
SystemToken) are left as-is, then you can keep using those internally after the 
transport layer is finished. I wouldn't use the server's Kerberos principal for 
the server components. I'd keep using the existing !SYSTEM principal, but only 
after the server component is verified at the transport layer to actually 
reflect a server component.


- Christopher


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/#review66382
-----------------------------------------------------------


On Dec. 31, 2014, 4:24 p.m., Josh Elser wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29386/
> -----------------------------------------------------------
> 
> (Updated Dec. 31, 2014, 4:24 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-2815
>     https://issues.apache.org/jira/browse/ACCUMULO-2815
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> ACCUMULO-2815 Initial support for Kerberos client authentication.
> 
> Leverage SASL transport provided by Thrift which can speak GSSAPI, which 
> Kerberos implements. Introduced...
> 
> * An Accumulo KerberosToken which is an AuthenticationToken to validate users.
> * Custom thrift processor and invocation handler to ensure server RPCs have a 
> valid KRB identity and Accumulo authentication.
> * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos 
> identities seamlessly.
> * New ClientConf variables to use SASL transport and pass Kerberos server 
> principal
> * Updated ClientOpts and Shell opts to transparently use a KerberosToken when 
> SASL is enabled (no extra client work).
> 
> I believe this is the "bare minimum" for Kerberos support. They are also 
> grossly lacking in unit and integration tests. I believe that I might have 
> somehow broken the client address string in the server (I saw log messages 
> with client: null, but I'm not sure if it's due to these changes or not). A 
> necessary limitation in the Thrift server used is that, like the SSL 
> transport, the SASL transport cannot presently be used with the 
> TFramedTransport, which means none of the [half]async thrift servers will 
> function with this -- we're stuck with the TThreadPoolServer.
> 
> Performed some contrived benchmarks on my laptop (while still using it 
> myself) to get a big-picture view of the performance impact against "normal" 
> operation and Kerberos alone. Each "run" was the duration to ingest 100M 
> records using continuous-ingest, timed with `time`, using 'real'.
> 
> THsHaServer (our default), 6 runs:
> 
> Avg: 10m7.273s (607.273s)
> Min: 9m43.395s
> Max: 10m52.715s
> 
> TThreadPoolServer (no SASL), 5 runs:
> 
> Avg: 11m16.254s (676.254s)
> Min: 10m30.987s
> Max: 12m24.192s
> 
> TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:
> 
> Avg: 13m17.187s (797.187s)
> Min: 10m52.997s
> Max: 16m0.975s
> 
> The general takeaway is that there's about 15% performance degradation in its 
> initial state which is in the realm of what I expected (~10%).
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 
>   core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 
> 6fe61a5 
>   core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java 
> e75bec6 
>   core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java 
> f481cc3 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java
>  6dc846f 
>   
> core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java
>  5da803b 
>   
> core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java
>  PRE-CREATION 
>   core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f 
>   core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java 
> PRE-CREATION 
>   core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java 
> PRE-CREATION 
>   core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 
> 6eace77 
>   core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 
>   core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java 
> PRE-CREATION 
>   
> core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java
>  PRE-CREATION 
>   core/src/main/java/org/apache/accumulo/core/security/Credentials.java 
> 525a958 
>   core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 
>   
> core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java
>  PRE-CREATION 
>   
> core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 
> 40be70f 
>   
> core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java 
> PRE-CREATION 
>   proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb 
>   
> server/base/src/main/java/org/apache/accumulo/server/AccumuloServerContext.java
>  09ae4f4 
>   server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java 
> 046cfb5 
>   
> server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandler.java
>  PRE-CREATION 
>   
> server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingWrapper.java
>  PRE-CREATION 
>   server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java 
> 641c0bf 
>   
> server/base/src/main/java/org/apache/accumulo/server/rpc/ThriftServerType.java
>  PRE-CREATION 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java
>  5e81018 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SecurityUtil.java
>  29e4939 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/SystemCredentials.java
>  a59d57c 
>   
> server/base/src/main/java/org/apache/accumulo/server/security/handler/KerberosAuthenticator.java
>  PRE-CREATION 
>   
> server/base/src/main/java/org/apache/accumulo/server/thrift/UGIAssumingProcessor.java
>  PRE-CREATION 
>   
> server/base/src/test/java/org/apache/accumulo/server/AccumuloServerContextTest.java
>  PRE-CREATION 
>   
> server/base/src/test/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandlerTest.java
>  PRE-CREATION 
>   
> server/base/src/test/java/org/apache/accumulo/server/security/SystemCredentialsTest.java
>  4202a7e 
>   server/gc/src/main/java/org/apache/accumulo/gc/SimpleGarbageCollector.java 
> 93a9a49 
>   
> server/gc/src/test/java/org/apache/accumulo/gc/GarbageCollectWriteAheadLogsTest.java
>  f98721f 
>   
> server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java
>  99558b8 
>   
> server/gc/src/test/java/org/apache/accumulo/gc/replication/CloseWriteAheadLogReferencesTest.java
>  cad1e01 
>   server/master/src/main/java/org/apache/accumulo/master/Master.java 12195fa 
>   server/tracer/src/main/java/org/apache/accumulo/tracer/TraceServer.java 
> 7e33300 
>   server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java 
> d5c1d2f 
>   shell/src/main/java/org/apache/accumulo/shell/Shell.java 58308ff 
>   shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java 8167ef8 
>   shell/src/test/java/org/apache/accumulo/shell/ShellConfigTest.java 0e72c8c 
>   shell/src/test/java/org/apache/accumulo/shell/ShellOptionsJCTest.java 
> PRE-CREATION 
>   test/src/main/java/org/apache/accumulo/test/functional/ZombieTServer.java 
> eb84533 
>   
> test/src/main/java/org/apache/accumulo/test/performance/thrift/NullTserver.java
>  2ebc2e3 
>   
> test/src/test/java/org/apache/accumulo/server/security/SystemCredentialsIT.java
>  fb71f5f 
> 
> Diff: https://reviews.apache.org/r/29386/diff/
> 
> 
> Testing
> -------
> 
> Ensure existing unit tests still function. Accumulo is functional and ran 
> continuous ingest multiple times using a client with only a Kerberos identity 
> (no user/password provided). Used MIT Kerberos with Apache Hadoop 2.6.0 and 
> Apache ZooKeeper 3.4.5.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>
