Re: Review Request 29176: ACCUMULO-3420 Get Visibility Metrics from PrintInfo

2015-01-06 Thread keith

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29176/#review66882
---



core/src/test/java/org/apache/accumulo/core/file/rfile/RFileMetricsTest.java
https://reviews.apache.org/r/29176/#comment110579

Is this copied from the RFile unit test?  If can the two test be modified 
to share this code?


- kturner


On Dec. 22, 2014, 3:25 p.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29176/
 ---
 
 (Updated Dec. 22, 2014, 3:25 p.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3420
 https://issues.apache.org/jira/browse/ACCUMULO-3420
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Added an option to PrintInfo to get visibility metrics.  Prints the number of 
 times a visibilty is seen in each locality group.  Also shows how many blocks 
 in the locality group have his visibiltiy.
 
 
 Diffs
 -
 
   core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java 
 PRE-CREATION 
   core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java 
 43586dd 
   core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java 9dcb3a5 
   
 core/src/main/java/org/apache/accumulo/core/file/rfile/VisMetricsGatherer.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/file/rfile/RFileMetricsTest.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29176/diff/
 
 
 Testing
 ---
 
 Tested with a few RFiles that I made.
 
 
 Thanks,
 
 Jenna Huston
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  server/base/src/main/java/org/apache/accumulo/server/security/handler/KerberosAuthenticator.java,
   line 41
  https://reviews.apache.org/r/29386/diff/4/?file=803164#file803164line41
 
  I don't think this should be so tightly coupled with the 
  ZKAuthenticator. It doesn't seem that this coupling is necessary to provide 
  authentication functionality.
  
  Have a different implementation of an Authenticator that hijacks the 
  data storage of another implementation, could be quite confusing to 
  administrators, especially if they need to swap out the implementation.
  
  See ACCUMULO-1300 and related issues for other options.
 
 Josh Elser wrote:
 I understand the intentions for ACCUMULO-1300; however, that's not 
 implemented. Being able to leverage the existing ZKAuthorizor was wonderful 
 so I really don't want to move away from how this fundamentally works.
 
 Shouldn't the Authenticator and Authorizor be fixed per instance 
 (allowing reset via `accumulo init --reset-security` or w/e)? What do you 
 actually want changed aside from documentation to make it clear to 
 administrators (I haven't written any documentation yet).
 
 Christopher Tubbs wrote:
 So, I pointed out in the conversation around ACCUMULO-259, that I would 
 prefer a pluggable security module that integrated all the security-related 
 functions, because they are so dependent on each other. However, they were 
 implemented as 3 separately configurable pluggable components. I would still 
 like to see them merged (which doesn't preclude making one from modular 
 components themselves).
 
 For now, I'm not really sure what it means for somebody to create a 
 user when the authenticator already validates them. That's why it's 
 confusing. Are we essentially making ZKAuthenticator Kerberos-aware, or are 
 we providing a separate Kerberos Authenticator? That's really what it boils 
 down to for me. It seems like we're doing a hybrid... but I think we should 
 solidify on one or the other.
 
 Josh Elser wrote:
 I view it as a Kerberos Authenticator. The implementation details of that 
 authenticator is that is happens to persist information into ZK to handle 
 things like permissions and authorizations, but a user doesn't have to know 
 that to use it. We could, hypothetically, swap out ZK for any other 
 persistent store for that information and it would continue to work.
 
 What would help alleviate your confusion? More javadocs? A different 
 class name?
 
 Christopher Tubbs wrote:
 Well, part is documentation. Part is behavior.
 
 If it's a separate authenticator, we should make sure that it does not 
 use the same storage area in such a tightly coupled way. It should not, for 
 instance, be setting things that look like passwords, and create local 
 user, list local users, and delete local users should be unsupported 
 client side operations, just as you've done with change password (since 
 users are managed within Kerberos). We already share storage between the 
 ZKAuthenticator and the corresponding permissions handler and authorizer, so 
 it's probably okay to share the directory identified with the principal. 
 However, it can seamlessly create this directory as needed.
 
 Mainly, I'm thinking the more we can separate the local user management 
 functions from the KerberosAuthenticator, the better, since users are managed 
 in Kerberos. Some of my concerns are with the current state of things, but 
 I'm hoping not to exacerbate the problems I see with the current 
 implementations. I don't want to make it harder to make improvements later. 
 I'd rather move towards a better design, then simply add features to the 
 existing ad-hoc design.
 
 Josh Elser wrote:
 bq. It should not, for instance, be setting things that look like 
 passwords, and create local user, list local users, and delete local 
 users should be unsupported client side operations, just as you've done with 
 change password (since users are managed within Kerberos)
 
 Thanks for pointing that out. That helps me understand where your 
 concerns are coming from. Perhaps it would be cleaner to just keep a 
 reference to the ZkAuthenticator as a delegate (instead of extending it). 
 This would give us a bit more obvious control over what's happening.
 
 bq. I'm thinking the more we can separate the local user management 
 functions from the KerberosAuthenticator, the better, since users are managed 
 in Kerberos. Some of my concerns are with the current state of things, but 
 I'm hoping not to exacerbate the problems I see with the current 
 implementations
 
 Agreed, those are very.. awkward, presently. Is making sure that we throw 
 exceptions from KerberosAuthenticator when we call these non-sensical methods 
 a sufficient change? Then, we can move to a better design later on like you 
 said?

What do 

Re: Review Request 29176: ACCUMULO-3420 Get Visibility Metrics from PrintInfo

2015-01-06 Thread Mike Drob


 On Dec. 18, 2014, 5:20 p.m., Christopher Tubbs wrote:
  core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java,
   line 34
  https://reviews.apache.org/r/29176/diff/1/?file=795044#file795044line34
 
  It'd be neat if there were a MetricsGatherer interface, which this 
  class implemented. That way, you could register other MetricsGatherers with 
  the RFile Reader.
  
  The interface should have javadocs which explain when each method is 
  expected to be called (e.g., how they are used by RFile).

You mean like 
https://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/core/SolrInfoMBean.html?


- Mike


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29176/#review65502
---


On Dec. 22, 2014, 3:25 p.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29176/
 ---
 
 (Updated Dec. 22, 2014, 3:25 p.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3420
 https://issues.apache.org/jira/browse/ACCUMULO-3420
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Added an option to PrintInfo to get visibility metrics.  Prints the number of 
 times a visibilty is seen in each locality group.  Also shows how many blocks 
 in the locality group have his visibiltiy.
 
 
 Diffs
 -
 
   core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java 
 PRE-CREATION 
   core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java 
 43586dd 
   core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java 9dcb3a5 
   
 core/src/main/java/org/apache/accumulo/core/file/rfile/VisMetricsGatherer.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/file/rfile/RFileMetricsTest.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29176/diff/
 
 
 Testing
 ---
 
 Tested with a few RFiles that I made.
 
 
 Thanks,
 
 Jenna Huston
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29176: ACCUMULO-3420 Get Visibility Metrics from PrintInfo

2015-01-06 Thread Christopher Tubbs


 On Dec. 18, 2014, 12:20 p.m., Christopher Tubbs wrote:
  core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java,
   line 34
  https://reviews.apache.org/r/29176/diff/1/?file=795044#file795044line34
 
  It'd be neat if there were a MetricsGatherer interface, which this 
  class implemented. That way, you could register other MetricsGatherers with 
  the RFile Reader.
  
  The interface should have javadocs which explain when each method is 
  expected to be called (e.g., how they are used by RFile).
 
 Mike Drob wrote:
 You mean like 
 https://lucene.apache.org/solr/4_10_2/solr-core/org/apache/solr/core/SolrInfoMBean.html?

I don't know what that is for. The updated patch does what I was thinking.


- Christopher


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29176/#review65502
---


On Dec. 22, 2014, 10:25 a.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29176/
 ---
 
 (Updated Dec. 22, 2014, 10:25 a.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3420
 https://issues.apache.org/jira/browse/ACCUMULO-3420
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Added an option to PrintInfo to get visibility metrics.  Prints the number of 
 times a visibilty is seen in each locality group.  Also shows how many blocks 
 in the locality group have his visibiltiy.
 
 
 Diffs
 -
 
   core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java 
 PRE-CREATION 
   core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java 
 43586dd 
   core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java 9dcb3a5 
   
 core/src/main/java/org/apache/accumulo/core/file/rfile/VisMetricsGatherer.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/file/rfile/RFileMetricsTest.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29176/diff/
 
 
 Testing
 ---
 
 Tested with a few RFiles that I made.
 
 
 Thanks,
 
 Jenna Huston
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java,
   lines 71-79
  https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line71
 
  Why is this unsupported?
 
 Josh Elser wrote:
 Again, see earlier discussion. I have no idea what this is actually 
 supposed to destroy (we have zero documentation on the matter that I found) 
 and it's already been changed to be a noop. I am just as confident that a 
 noop is correct as an unsupported operation is correct (I have no confidence).
 
 As far as destroying a UGI, I haven't found any parallel to the 
 Destroyable interface. I'm not sure if something exists that we should be 
 calling.
 
 Christopher Tubbs wrote:
 It's a way for objects to remove their internal state (preferably by 
 securely overwriting memory) so any credentials can be removed from memory, 
 to give the user control over their credentials. It is expected to be called 
 by the user.
 
 In the earlier implementation, setting the ugi to null would have been 
 appropriate. In the latest version, you can nullify the principal. It's not 
 terribly important, since the principal alone is not sensitive, but it would 
 be helpful to have reasonable behavior when the user calls these methods to 
 avoid things like I called destroy, but it won't destroy! or My 
 application verifies the token isn't destroyed before using it, but it seems 
 to always be destroyed!.
 
 Even if it was a truly no-op, we could add an internal destroyed flag 
 to give users reasonable behavior. Since we do have the principal, we can 
 just make that null instead of adding a new flag, and use that to check the 
 state of its destruction.
 
 Also, I think the other methods which assume it is not destroyed should 
 throw IllegalStateException explicitly if it is, in fact, destroyed. But, I'm 
 not even sure we do that in the other tokens, so it's probably fine to leave 
 that behavior undefined.
 
 Josh Elser wrote:
 WRT UserGroupInformation, I don't know how (or if I even can) destroy the 
 current login which is.. awkward to me. Generally speaking, this would be 
 great to get written down somewhere -- I'll try to fire up a ticket for that 
 so we don't have this discussion again in 6mos.

The semantics of destroy are not to destroy the current login. They are to 
destroy the current token, removing any user credentials from it. These 
semantics can be accomplished with a flag, or by setting the principal to null, 
without touching the Kerberos state that they represent.


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  core/src/main/java/org/apache/accumulo/core/conf/Property.java, lines 
  165-167
  https://reviews.apache.org/r/29386/diff/4/?file=803146#file803146line165
 
  I don't think we want to expose this. It's high risk. What is the 
  reason and how is it related to this change?
 
 Josh Elser wrote:
 Again, see the previous discussion on this subject. This is necessary to 
 actually perform an apples to apples comparison of the impact of Kerberos on 
 Accumulo. Comparing KRB/SASL with TThreadPoolServer against no KRB/SASL and 
 THsHaServer is an invalid benchmark to determine the overhead of KRB.
 
 Like I said earlier, `Experimental` is not really the appropriate 
 annotation here (Internal use would be more accurate), but it's what we have. 
 Do you have a recommendation on some other change in wording/annotation to 
 convey the intent?
 
 I'd like to not have to change the codebase any time I want to verify no 
 performance regressions in future versions (having to change code would 
 preclude any non-devs from running their own benchmarks too).
 
 Christopher Tubbs wrote:
 My main concern is that I'm not entirely confident we're not exposing 
 these to users in some way, still: see ACCUMULO-2460 and ACCUMULO-2401.
 
 Josh Elser wrote:
 It sounds to me like we should introduce a new annotation then to cover 
 things not meant for user consumption. Experimental is for maybe not stable 
 or entirely working user facing things. This property is (likely) only 
 relevant for developers. Any objections to addressing this by introduction of 
 a new annotation (not in this changeset)?

That's a simple enough solution. I still kind of object to the idea that 
experimental properties have default values at all (it's experimental, so it 
should be okay to set to empty value, and should be null/unused by default), 
which was the underlying problem between ACCUMULO-2460 and ACCUMULO-2401 (the 
default value was added and used as a proxy for what probably should have been 
a constant). If that underlying problem was addressed, then using Experimental 
here would be perfectly fine. However, it may still be useful to have separate, 
clear semantics, with an additional annotation.


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  

Re: Review Request 29230: ACCUMULO-3439 Add RegexGroupBalancer

2015-01-06 Thread Eric Newton

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29230/#review66915
---


Nit: whitespace
Nit: single-line statements w/out braces


server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110619

final



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110620

this(null);



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110621

final



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110622

final



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110623

final



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110624

final



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110625

@Override



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110626

@Override



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110627

@Override



server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
https://reviews.apache.org/r/29230/#comment110628

Could this class be made static by passing in the table ID?


- Eric Newton


On Dec. 19, 2014, 1:52 a.m., kturner wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29230/
 ---
 
 (Updated Dec. 19, 2014, 1:52 a.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3439
 https://issues.apache.org/jira/browse/ACCUMULO-3439
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 The patch contains a new balancer, test, and documentation.
 
 
 Diffs
 -
 
   docs/src/main/resources/examples/README.rgbalancer PRE-CREATION 
   
 server/base/src/main/java/org/apache/accumulo/server/master/balancer/GroupBalancer.java
  PRE-CREATION 
   
 server/base/src/main/java/org/apache/accumulo/server/master/balancer/RegexGroupBalancer.java
  PRE-CREATION 
   
 server/base/src/main/java/org/apache/accumulo/server/master/balancer/TableLoadBalancer.java
  5eae890 
   
 server/base/src/test/java/org/apache/accumulo/server/master/balancer/GroupBalancerTest.java
  PRE-CREATION 
   
 test/src/test/java/org/apache/accumulo/test/functional/RegexGroupBalanceIT.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29230/diff/
 
 
 Testing
 ---
 
 The new unit test and intergeration test added in the patch run.  
 
 
 Thanks,
 
 kturner
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread keith

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29502/#review66908
---



test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java
https://reviews.apache.org/r/29502/#comment110613

Thinking a bit more about the batchScanner, `runTest()` could call another 
parameterized test that accepts a scanner.  This would make it easy to test 
scanner and batch scanner.  A reason to do this is if the batch scanner sets up 
iterators differently and does not pass auths in env, the test might catch it.

runTest(auths, shouldFail){
   runTest(scanner, auths, shouldFail);
   runTest(batchScanner, auths, shouldFail);
   batchScanner.close();
}


- kturner


On Jan. 6, 2015, 3:54 p.m., Corey Nolet wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29502/
 ---
 
 (Updated Jan. 6, 2015, 3:54 p.m.)
 
 
 Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and 
 kturner.
 
 
 Bugs: ACCUMULO-3458
 https://issues.apache.org/jira/browse/ACCUMULO-3458
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 ACCUMULO-3458 Propagating scan-time authorizations through the 
 IteratorEnvironment so that scan-time iterators can use them.
 
 
 Diffs
 -
 
   
 core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java
  4903656 
   core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a 
   core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 
 2552682 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 
 666a8af 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 
 9726266 
   
 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java
  2a79f05 
   
 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 
 72cb863 
   
 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java
  9e20cb1 
   
 core/src/test/java/org/apache/accumulo/core/client/impl/ScannerImplTest.java 
 be4d467 
   
 core/src/test/java/org/apache/accumulo/core/client/impl/TabletServerBatchReaderTest.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java
  94da7b5 
   
 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
  fa46360 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java
  4521e55 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
  4cebab7 
   
 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java
  4a45e99 
   
 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java
  a9801b0 
   
 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java
  bf35557 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java
  d1fece5 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java
  869cc33 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
  fe4b16b 
   test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java 
 PRE-CREATION 
   test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29502/diff/
 
 
 Testing
 ---
 
 Wrote an integration test to verify that ScanDataSource is actually setting 
 the authorizations on the IteratorEnvironment
 
 
 Thanks,
 
 Corey Nolet
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29176: ACCUMULO-3420 Get Visibility Metrics from PrintInfo

2015-01-06 Thread Christopher Tubbs

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29176/#review66906
---



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110609

Should be generic interface MetricsGathererT with a method T 
getMetrics(); to expose the metrics in ways other than printing.



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110603

The javadoc should describe when init is called (eg. when it is registered 
with the RFile Reader).



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110604

This javadoc is incorrect, based on its usage here. What appears to be 
passed in is the column family, not the locality group name.

new also seems to imply something is being created, when it simply 
indicates that a new one has been encountered. Maybe the start or begin 
instead of new would be more clear.

[I'm also entertaining the idea that this method could just take the first 
key (or key/value pair) from the new locality group in order to make fewer 
assumptions about how it is used, but I haven't fully convinced myself that's 
better than strictly defining this method to take the column family portion 
only.]



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110605

This could take a key and a value as two parameters, to enable gathering 
metrics of more things.



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110606

could be called startBlock or beginBlock instead of newBlock to avoid 
the implication that something new is being created.



core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java
https://reviews.apache.org/r/29176/#comment110607

javadoc should describe the output format, or rely on proposed getMetrics() 
method and defer the printing to the PrintInfo command.



core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java
https://reviews.apache.org/r/29176/#comment110608

description should specify whether this requires -v or implies -v



core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java
https://reviews.apache.org/r/29176/#comment110610

toString isn't safe here; probably better to just pass the column family 
directly, or the whole key and let the gatherer decide how to use it.



core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java
https://reviews.apache.org/r/29176/#comment110612

Needs javadoc. Should specify that you can only register one (it will 
clobber any previously set one), and that you should do so before iterating. 
This could be enforced, by checking state when this method is called, but 
minimally, the javadocs should explain.


- Christopher Tubbs


On Dec. 22, 2014, 10:25 a.m., Jenna Huston wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29176/
 ---
 
 (Updated Dec. 22, 2014, 10:25 a.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-3420
 https://issues.apache.org/jira/browse/ACCUMULO-3420
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 Added an option to PrintInfo to get visibility metrics.  Prints the number of 
 times a visibilty is seen in each locality group.  Also shows how many blocks 
 in the locality group have his visibiltiy.
 
 
 Diffs
 -
 
   core/src/main/java/org/apache/accumulo/core/file/rfile/MetricsGatherer.java 
 PRE-CREATION 
   core/src/main/java/org/apache/accumulo/core/file/rfile/PrintInfo.java 
 43586dd 
   core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java 9dcb3a5 
   
 core/src/main/java/org/apache/accumulo/core/file/rfile/VisMetricsGatherer.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/file/rfile/RFileMetricsTest.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29176/diff/
 
 
 Testing
 ---
 
 Tested with a few RFiles that I made.
 
 
 Thanks,
 
 Jenna Huston
 




Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).

bq. if they aren't (minimally) checking the realm when checking 
permissions/access on an authenticated principal

Do you mean the instance instead of the realm? In the case of a single realm, 
the KDC is going to verify the correct realm. Assuming you meant the instance 
though (the optional /hostname), it's typical that a user has the ability to 
use their credentials anywhere. Thus, you typically see principals without 
instances for actual users. As far as I understand it, that's what HDFS tends 
to follow and what I tried to as well. Accumulo doesn't care where you come 
from, just what your name is and that you have valid credentials. I don't think 
we're being substantially less secure by not including the instance in the 
Accumulo principal.


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  server/base/src/main/java/org/apache/accumulo/server/security/handler/KerberosAuthenticator.java,
   line 41
  https://reviews.apache.org/r/29386/diff/4/?file=803164#file803164line41
 
  I don't think this should be so tightly coupled with the 
  ZKAuthenticator. It doesn't seem that this coupling is necessary to provide 
  authentication functionality.
  
  Have 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/
---

(Updated Jan. 6, 2015, 11:14 p.m.)


Review request for accumulo.


Changes
---

Big set of changes with not all of the stuff I wanted to do yet.

* Found a really bad bug where ThriftTransportKey didn't respect the SASL 
principal which allowed users to reuse other user's cached transports in the 
same JVM (e.g. switched users)
* Introduced KerberosIT which users MiniKdc to do basic Kerberos testing in the 
build (always runs with Kerberos) which will help greatly with catching 
regressions.
* Introduced a `useKrbForIT` which will try to use MiniKdc for 
AccumuloClusterITs. Test coverage is no where close to 100% and I don't intend 
to do so for a first pass.

Documentation and system token changes still incoming...


Bugs: ACCUMULO-2815
https://issues.apache.org/jira/browse/ACCUMULO-2815


Repository: accumulo


Description
---

ACCUMULO-2815 Initial support for Kerberos client authentication.

Leverage SASL transport provided by Thrift which can speak GSSAPI, which 
Kerberos implements. Introduced...

* An Accumulo KerberosToken which is an AuthenticationToken to validate users.
* Custom thrift processor and invocation handler to ensure server RPCs have a 
valid KRB identity and Accumulo authentication.
* A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos 
identities seamlessly.
* New ClientConf variables to use SASL transport and pass Kerberos server 
principal
* Updated ClientOpts and Shell opts to transparently use a KerberosToken when 
SASL is enabled (no extra client work).

I believe this is the bare minimum for Kerberos support. They are also 
grossly lacking in unit and integration tests. I believe that I might have 
somehow broken the client address string in the server (I saw log messages with 
client: null, but I'm not sure if it's due to these changes or not). A 
necessary limitation in the Thrift server used is that, like the SSL transport, 
the SASL transport cannot presently be used with the TFramedTransport, which 
means none of the [half]async thrift servers will function with this -- we're 
stuck with the TThreadPoolServer.

Performed some contrived benchmarks on my laptop (while still using it myself) 
to get at big-picture view of the performance impact against normal operation 
and Kerberos alone. Each run was the duration to ingest 100M records using 
continuous-ingest, timed with `time`, using 'real'.

THsHaServer (our default), 6 runs:

Avg: 10m7.273s (607.273s)
Min: 9m43.395s
Max: 10m52.715s

TThreadPoolServer (no SASL), 5 runs:

Avg: 11m16.254s (676.254s)
Min: 10m30.987s
Max: 12m24.192s

TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:

Avg: 13m17.187s (797.187s)
Min: 10m52.997s
Max: 16m0.975s

The general takeway is that there's about 15% performance degredation in its 
initial state which is in the realm of what I expected (~10%).


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 
  core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 
6fe61a5 
  core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java 
e75bec6 
  core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java 
f481cc3 
  core/src/main/java/org/apache/accumulo/core/client/impl/MasterClient.java 
a9ad8a1 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 
6dc846f 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java
 5da803b 
  
core/src/main/java/org/apache/accumulo/core/client/security/tokens/AbstractKerberosToken.java
 PRE-CREATION 
  
core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f 
  core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 
6eace77 
  core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 
  core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java 
PRE-CREATION 
  
core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 
  core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 
  
core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java 
PRE-CREATION 
  
core/src/test/java/org/apache/accumulo/core/client/impl/ThriftTransportKeyTest.java
 PRE-CREATION 
  core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 
40be70f 
  

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 7:39 p.m., kturner wrote:
  core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java,
   line 46
  https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46
 
  This is adding UserGroupInformation to Accumulo API.  Is that a stable 
  hadoop API?  Does it have to be in API, why not just principal?
 
 Josh Elser wrote:
 UGI is marked as LimitedPrivate and Evolving, but I have no qualms 
 against including it into our API. The problem with accepting a principal 
 alone, we would then have to duplicate the login capabilities that UGI 
 already provides (e.g performing a login via password or keytab). By 
 accepting a UGI directly, we don't have to own this logic. It's my opinion 
 that it's easiest to just use UGI directly, but could see an argument for 
 transparently using it behind the scenes if you think that would be better.
 
 kturner wrote:
  I have no qualms against including it into our API
 
 Why do you think its ok to ignore the API marking?  Shouldnt we honor 
 whats marked on the assumption that someone in Hadoop land will think they 
 are free to make breaking changes based on whats makred?
 
  but could see an argument for transparently using it behind the 
 scenes if you think that would be better.
 
 From an implementation perspective, I don't know whats best.  I don't 
 know what the benefits of using UGI are.  What are the problems with not 
 using it?  Would not using it require a complex facade class?  Would not 
 using it cause interoperability problems with others using Kerberos in the 
 hadoop ecosystem?  
 
 If this object is really useful to users of Hadoop and we want to use it 
 in our API, I would advocate for opening a hadoop issue to change the 
 stability guarantees.  I am uncertain if we should use it before this change 
 is made, but thats because I don't fully understand the details.
 
 Josh Elser wrote:
 bq. Why do you think its ok to ignore the API marking? Shouldnt we honor 
 whats marked on the assumption that someone in Hadoop land will think they 
 are free to make breaking changes based on whats makred?
 
 The only difference between Stable and Evolving is that breaking changes 
 could happen at minor releases instead of just major releases (akin to 
 semver). So, just to make a point, even using only Stable classes will result 
 in us having to worry about breaking changes.
 
 It's possible that we could use a very thin shim around the most often 
 used methods of UGI, but I'm worried that would be insufficient. For example, 
 most people use `UserGroupInformation.loginUserFromKeytab` or they are have a 
 cached ticket (from `kinit`'ing on the command line before invoking the 
 application) which we can get access to via 
 `UserGroupInformation.getCurrentUser`. The other approach is to use 
 `UserGroupInformation.createProxyUser` which accepts a principal and 
 proxies another user on top of another UGI -- this is very useful for us as 
 it eliminates the need to do Authorization intersection as some server-user 
 to maintain application security. The server is running an Accumulo query for 
 the user; however Accumulo sees the user, not the server.
 
 Back to the point though: this actual assertion is a quick fail to make 
 sure that the user trying to use Kerberos is actually logged in (catch it and 
 fail before we actually try to send an RPC). We could push this farther down 
 into the codebase -- inside ConnectorImpl maybe? -- which would even moreso 
 make this a shell AuthenticationToken. To clarify, your biggest worry is 
 the UGI argument in the constructor and not using it in the implementation, 
 right?
 
 kturner wrote:
  To clarify, your biggest worry is the UGI argument in the constructor 
 and not using it in the implementation, right?
 
 Correct.  This patch adds KerberosToken to the public API w/ a public 
 method using UGI.  I would be much less concerned if UGI were only in 
 implementation of KerberosToken as that gives us a lot more wiggle room.
 
 kturner wrote:
  The only difference between Stable and Evolving is that breaking 
 changes could happen at minor releases instead of just major releases (akin 
 to semver). 
 
 Will Hadoop deprecate before changing in a minor release? Trying to think 
 through case where a minor hadoop release moves UGI.  If Hadoop does not 
 deprecate and just moves it, we would be up the creek w/o a paddle.  W/o 
 deprecation we would be forced to deprecated and introduce a facade anyway 
 because we can depend on two hadoop releases simultaneously.
 
  1. Accumulo 1.7.0 release API method w/ UGI in signature, lets call this 
 method foo1(UGI_A).
  2. Hadoop release 2.9.0 which moves UGI_A to UGI_B AND leaves UGI_A as 
 deprecated.
  3. Accumulo must release 1.8.0 w/ foo1(UGI_A) deprecated and introduce 
 foo1(UGI_B).

To bring some closure 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 5:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Christopher Tubbs


 On Dec. 30, 2014, 2:39 p.m., kturner wrote:
  core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java,
   line 46
  https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46
 
  This is adding UserGroupInformation to Accumulo API.  Is that a stable 
  hadoop API?  Does it have to be in API, why not just principal?
 
 Josh Elser wrote:
 UGI is marked as LimitedPrivate and Evolving, but I have no qualms 
 against including it into our API. The problem with accepting a principal 
 alone, we would then have to duplicate the login capabilities that UGI 
 already provides (e.g performing a login via password or keytab). By 
 accepting a UGI directly, we don't have to own this logic. It's my opinion 
 that it's easiest to just use UGI directly, but could see an argument for 
 transparently using it behind the scenes if you think that would be better.
 
 kturner wrote:
  I have no qualms against including it into our API
 
 Why do you think its ok to ignore the API marking?  Shouldnt we honor 
 whats marked on the assumption that someone in Hadoop land will think they 
 are free to make breaking changes based on whats makred?
 
  but could see an argument for transparently using it behind the 
 scenes if you think that would be better.
 
 From an implementation perspective, I don't know whats best.  I don't 
 know what the benefits of using UGI are.  What are the problems with not 
 using it?  Would not using it require a complex facade class?  Would not 
 using it cause interoperability problems with others using Kerberos in the 
 hadoop ecosystem?  
 
 If this object is really useful to users of Hadoop and we want to use it 
 in our API, I would advocate for opening a hadoop issue to change the 
 stability guarantees.  I am uncertain if we should use it before this change 
 is made, but thats because I don't fully understand the details.
 
 Josh Elser wrote:
 bq. Why do you think its ok to ignore the API marking? Shouldnt we honor 
 whats marked on the assumption that someone in Hadoop land will think they 
 are free to make breaking changes based on whats makred?
 
 The only difference between Stable and Evolving is that breaking changes 
 could happen at minor releases instead of just major releases (akin to 
 semver). So, just to make a point, even using only Stable classes will result 
 in us having to worry about breaking changes.
 
 It's possible that we could use a very thin shim around the most often 
 used methods of UGI, but I'm worried that would be insufficient. For example, 
 most people use `UserGroupInformation.loginUserFromKeytab` or they are have a 
 cached ticket (from `kinit`'ing on the command line before invoking the 
 application) which we can get access to via 
 `UserGroupInformation.getCurrentUser`. The other approach is to use 
 `UserGroupInformation.createProxyUser` which accepts a principal and 
 proxies another user on top of another UGI -- this is very useful for us as 
 it eliminates the need to do Authorization intersection as some server-user 
 to maintain application security. The server is running an Accumulo query for 
 the user; however Accumulo sees the user, not the server.
 
 Back to the point though: this actual assertion is a quick fail to make 
 sure that the user trying to use Kerberos is actually logged in (catch it and 
 fail before we actually try to send an RPC). We could push this farther down 
 into the codebase -- inside ConnectorImpl maybe? -- which would even moreso 
 make this a shell AuthenticationToken. To clarify, your biggest worry is 
 the UGI argument in the constructor and not using it in the implementation, 
 right?
 
 kturner wrote:
  To clarify, your biggest worry is the UGI argument in the constructor 
 and not using it in the implementation, right?
 
 Correct.  This patch adds KerberosToken to the public API w/ a public 
 method using UGI.  I would be much less concerned if UGI were only in 
 implementation of KerberosToken as that gives us a lot more wiggle room.
 
 kturner wrote:
  The only difference between Stable and Evolving is that breaking 
 changes could happen at minor releases instead of just major releases (akin 
 to semver). 
 
 Will Hadoop deprecate before changing in a minor release? Trying to think 
 through case where a minor hadoop release moves UGI.  If Hadoop does not 
 deprecate and just moves it, we would be up the creek w/o a paddle.  W/o 
 deprecation we would be forced to deprecated and introduce a facade anyway 
 because we can depend on two hadoop releases simultaneously.
 
  1. Accumulo 1.7.0 release API method w/ UGI in signature, lets call this 
 method foo1(UGI_A).
  2. Hadoop release 2.9.0 which moves UGI_A to UGI_B AND leaves UGI_A as 
 deprecated.
  3. Accumulo must release 1.8.0 w/ foo1(UGI_A) deprecated and introduce 
 foo1(UGI_B).
 
 Josh Elser wrote:

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet


 On Jan. 5, 2015, 9:09 p.m., Christopher Tubbs wrote:
  core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java,
   line 63
  https://reviews.apache.org/r/29502/diff/2/?file=804705#file804705line63
 
  Should this be Authorizations.EMPTY? Or should it have a default 
  implementation on WrappingIterator which calls source.getAuthorizations()?
 
 Christopher Tubbs wrote:
 make that `getSource().getAuthorizations()`

Specific to this test I returned null because all the other getters (other than 
what was being explicitly tested) were returning null. Were you thinking 
WrappingIterator should also provide a getAuthorizations() method?


 On Jan. 5, 2015, 9:09 p.m., Christopher Tubbs wrote:
  server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java,
   line 46
  https://reviews.apache.org/r/29502/diff/2/?file=804711#file804711line46
 
  I wonder if there's a better way to provide environment options, like 
  this and others, at specific scopes. Maybe use some dependency injection, 
  with annotations, like Servlet @Context or JUnit @Rule: @ScanContext 
  Authorizations auths; (throw error if type is not appropriate for context 
  during injection).

This feature would be pretty neat. Were you thinking this would extend past 
just the IteratorEnvironment into other places? Any other fields you can think 
of that would benefit from this change other than Authorizations?


- Corey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29502/#review66725
---


On Dec. 31, 2014, 3:40 p.m., Corey Nolet wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29502/
 ---
 
 (Updated Dec. 31, 2014, 3:40 p.m.)
 
 
 Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and 
 kturner.
 
 
 Bugs: ACCUMULO-3458
 https://issues.apache.org/jira/browse/ACCUMULO-3458
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 ACCUMULO-3458 Propagating scan-time authorizations through the 
 IteratorEnvironment so that scan-time iterators can use them.
 
 
 Diffs
 -
 
   
 core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java
  4903656 
   core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a 
   core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 
 2552682 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 
 666a8af 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 
 9726266 
   
 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java
  2a79f05 
   
 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 
 72cb863 
   
 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java
  9e20cb1 
   
 core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java
  15c33fa 
   
 core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java
  94da7b5 
   
 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
  fa46360 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java
  4521e55 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
  4cebab7 
   
 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java
  4a45e99 
   
 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java
  a9801b0 
   
 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java
  bf35557 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java
  d1fece5 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java
  869cc33 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
  fe4b16b 
   test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java 
 PRE-CREATION 
   test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29502/diff/
 
 
 Testing
 ---
 
 Wrote an integration test to verify that ScanDataSource is actually setting 
 the authorizations on the IteratorEnvironment
 
 
 Thanks,
 
 Corey Nolet
 




Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29502/
---

(Updated Jan. 6, 2015, 3:44 p.m.)


Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and 
kturner.


Changes
---

Fixed based on feedback from Christopher and Keith. Noticed some extra 
formatting removing whitespace in some places.


Bugs: ACCUMULO-3458
https://issues.apache.org/jira/browse/ACCUMULO-3458


Repository: accumulo


Description
---

ACCUMULO-3458 Propagating scan-time authorizations through the 
IteratorEnvironment so that scan-time iterators can use them.


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/client/BatchDeleter.java 2bfc347 
  
core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java
 4903656 
  core/src/main/java/org/apache/accumulo/core/client/Scanner.java 112179e 
  core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a 
  core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 
2552682 
  core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 
666a8af 
  core/src/main/java/org/apache/accumulo/core/client/impl/ScannerIterator.java 
1e0ac99 
  core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 
9726266 
  
core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java
 2a79f05 
  core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 
72cb863 
  
core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 
9e20cb1 
  core/src/main/java/org/apache/accumulo/core/iterators/WrappingIterator.java 
060fa76 
  
core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java
 15c33fa 
  core/src/test/java/org/apache/accumulo/core/client/impl/ScannerImplTest.java 
be4d467 
  
core/src/test/java/org/apache/accumulo/core/client/impl/TabletServerBatchReaderTest.java
 PRE-CREATION 
  
core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java
 94da7b5 
  
core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
 fa46360 
  
core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java
 4521e55 
  
core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
 4cebab7 
  
server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java
 4a45e99 
  
server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java
 a9801b0 
  
server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java
 bf35557 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java
 d1fece5 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 
869cc33 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
 fe4b16b 
  test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java 
PRE-CREATION 
  test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION 

Diff: https://reviews.apache.org/r/29502/diff/


Testing
---

Wrote an integration test to verify that ScanDataSource is actually setting the 
authorizations on the IteratorEnvironment


Thanks,

Corey Nolet



Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Corey Nolet

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29502/
---

(Updated Jan. 6, 2015, 3:54 p.m.)


Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and 
kturner.


Changes
---

Removing files which were formatted but not changed in any other way to augment 
the feature in the commit.


Bugs: ACCUMULO-3458
https://issues.apache.org/jira/browse/ACCUMULO-3458


Repository: accumulo


Description
---

ACCUMULO-3458 Propagating scan-time authorizations through the 
IteratorEnvironment so that scan-time iterators can use them.


Diffs (updated)
-

  
core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java
 4903656 
  core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a 
  core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 
2552682 
  core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 
666a8af 
  core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 
9726266 
  
core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java
 2a79f05 
  core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 
72cb863 
  
core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 
9e20cb1 
  core/src/test/java/org/apache/accumulo/core/client/impl/ScannerImplTest.java 
be4d467 
  
core/src/test/java/org/apache/accumulo/core/client/impl/TabletServerBatchReaderTest.java
 PRE-CREATION 
  
core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java
 94da7b5 
  
core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
 fa46360 
  
core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java
 4521e55 
  
core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
 4cebab7 
  
server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java
 4a45e99 
  
server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java
 a9801b0 
  
server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java
 bf35557 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java
 d1fece5 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 
869cc33 
  
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
 fe4b16b 
  test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java 
PRE-CREATION 
  test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION 

Diff: https://reviews.apache.org/r/29502/diff/


Testing
---

Wrote an integration test to verify that ScanDataSource is actually setting the 
authorizations on the IteratorEnvironment


Thanks,

Corey Nolet



Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  core/src/main/java/org/apache/accumulo/core/conf/Property.java, lines 
  165-167
  https://reviews.apache.org/r/29386/diff/4/?file=803146#file803146line165
 
  I don't think we want to expose this. It's high risk. What is the 
  reason and how is it related to this change?
 
 Josh Elser wrote:
 Again, see the previous discussion on this subject. This is necessary to 
 actually perform an apples to apples comparison of the impact of Kerberos on 
 Accumulo. Comparing KRB/SASL with TThreadPoolServer against no KRB/SASL and 
 THsHaServer is an invalid benchmark to determine the overhead of KRB.
 
 Like I said earlier, `Experimental` is not really the appropriate 
 annotation here (Internal use would be more accurate), but it's what we have. 
 Do you have a recommendation on some other change in wording/annotation to 
 convey the intent?
 
 I'd like to not have to change the codebase any time I want to verify no 
 performance regressions in future versions (having to change code would 
 preclude any non-devs from running their own benchmarks too).
 
 Christopher Tubbs wrote:
 My main concern is that I'm not entirely confident we're not exposing 
 these to users in some way, still: see ACCUMULO-2460 and ACCUMULO-2401.
 
 Josh Elser wrote:
 It sounds to me like we should introduce a new annotation then to cover 
 things not meant for user consumption. Experimental is for maybe not stable 
 or entirely working user facing things. This property is (likely) only 
 relevant for developers. Any objections to addressing this by introduction of 
 a new annotation (not in this changeset)?
 
 Christopher Tubbs wrote:
 That's a simple enough solution. I still kind of object to the idea that 
 experimental properties have default values at all (it's experimental, so 
 it should be okay to set to empty value, and should be null/unused by 
 default), which was the underlying problem between ACCUMULO-2460 and 
 ACCUMULO-2401 (the default value was added and used as a proxy for what 
 probably should have been a constant). If that underlying problem was 
 addressed, then using Experimental here would be perfectly fine. However, it 
 may still be useful to have separate, clear semantics, with an additional 
 annotation.

Set a default of  and updated the get() on the enum to default to hsha.


- Josh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/#review66382
---


On Jan. 7, 2015, 5:43 a.m., Josh Elser wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29386/
 ---
 
 (Updated Jan. 7, 2015, 5:43 a.m.)
 
 
 Review request for accumulo.
 
 
 Bugs: ACCUMULO-2815
 https://issues.apache.org/jira/browse/ACCUMULO-2815
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 ACCUMULO-2815 Initial support for Kerberos client authentication.
 
 Leverage SASL transport provided by Thrift which can speak GSSAPI, which 
 Kerberos implements. Introduced...
 
 * An Accumulo KerberosToken which is an AuthenticationToken to validate users.
 * Custom thrift processor and invocation handler to ensure server RPCs have a 
 valid KRB identity and Accumulo authentication.
 * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos 
 identities seamlessly.
 * New ClientConf variables to use SASL transport and pass Kerberos server 
 principal
 * Updated ClientOpts and Shell opts to transparently use a KerberosToken when 
 SASL is enabled (no extra client work).
 
 I believe this is the bare minimum for Kerberos support. They are also 
 grossly lacking in unit and integration tests. I believe that I might have 
 somehow broken the client address string in the server (I saw log messages 
 with client: null, but I'm not sure if it's due to these changes or not). A 
 necessary limitation in the Thrift server used is that, like the SSL 
 transport, the SASL transport cannot presently be used with the 
 TFramedTransport, which means none of the [half]async thrift servers will 
 function with this -- we're stuck with the TThreadPoolServer.
 
 Performed some contrived benchmarks on my laptop (while still using it 
 myself) to get at big-picture view of the performance impact against normal 
 operation and Kerberos alone. Each run was the duration to ingest 100M 
 records using continuous-ingest, timed with `time`, using 'real'.
 
 THsHaServer (our default), 6 runs:
 
 Avg: 10m7.273s (607.273s)
 Min: 9m43.395s
 Max: 10m52.715s
 
 TThreadPoolServer (no SASL), 5 runs:
 
 Avg: 11m16.254s (676.254s)
 Min: 10m30.987s
 Max: 12m24.192s
 
 TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:
 
 Avg: 13m17.187s (797.187s)
 Min: 

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/
---

(Updated Jan. 7, 2015, 5:43 a.m.)


Review request for accumulo.


Changes
---

Eliminates SystemKerberosToken (and then AbstractKerberosToken too). Fixed some 
warnings around config keys for kerberos configuration properties. Ensure that 
system user doesn't get an account. Fix some classes called during 
start-all.sh or stop-all.sh to not require valid krb credentials set up by the 
user (use the server login).


Bugs: ACCUMULO-2815
https://issues.apache.org/jira/browse/ACCUMULO-2815


Repository: accumulo


Description
---

ACCUMULO-2815 Initial support for Kerberos client authentication.

Leverage SASL transport provided by Thrift which can speak GSSAPI, which 
Kerberos implements. Introduced...

* An Accumulo KerberosToken which is an AuthenticationToken to validate users.
* Custom thrift processor and invocation handler to ensure server RPCs have a 
valid KRB identity and Accumulo authentication.
* A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos 
identities seamlessly.
* New ClientConf variables to use SASL transport and pass Kerberos server 
principal
* Updated ClientOpts and Shell opts to transparently use a KerberosToken when 
SASL is enabled (no extra client work).

I believe this is the bare minimum for Kerberos support. They are also 
grossly lacking in unit and integration tests. I believe that I might have 
somehow broken the client address string in the server (I saw log messages with 
client: null, but I'm not sure if it's due to these changes or not). A 
necessary limitation in the Thrift server used is that, like the SSL transport, 
the SASL transport cannot presently be used with the TFramedTransport, which 
means none of the [half]async thrift servers will function with this -- we're 
stuck with the TThreadPoolServer.

Performed some contrived benchmarks on my laptop (while still using it myself) 
to get at big-picture view of the performance impact against normal operation 
and Kerberos alone. Each run was the duration to ingest 100M records using 
continuous-ingest, timed with `time`, using 'real'.

THsHaServer (our default), 6 runs:

Avg: 10m7.273s (607.273s)
Min: 9m43.395s
Max: 10m52.715s

TThreadPoolServer (no SASL), 5 runs:

Avg: 11m16.254s (676.254s)
Min: 10m30.987s
Max: 12m24.192s

TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:

Avg: 13m17.187s (797.187s)
Min: 10m52.997s
Max: 16m0.975s

The general takeway is that there's about 15% performance degredation in its 
initial state which is in the realm of what I expected (~10%).


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 
  core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 
6fe61a5 
  core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java 
e75bec6 
  core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java 
f481cc3 
  core/src/main/java/org/apache/accumulo/core/client/impl/MasterClient.java 
a9ad8a1 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 
6dc846f 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java
 5da803b 
  
core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f 
  core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 
6eace77 
  core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 
  core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java 
PRE-CREATION 
  
core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 
  core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 
  
core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java 
PRE-CREATION 
  
core/src/test/java/org/apache/accumulo/core/client/impl/ThriftTransportKeyTest.java
 PRE-CREATION 
  core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 
40be70f 
  core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java 
PRE-CREATION 
  
minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloClusterImpl.java
 27d6b19 
  
minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloConfigImpl.java
 26c23ed 
  pom.xml ae188a0 
  proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb 
  

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29386/
---

(Updated Jan. 7, 2015, 6:16 a.m.)


Review request for accumulo.


Changes
---

Updated default value for property controlling thrift server instantiated.


Bugs: ACCUMULO-2815
https://issues.apache.org/jira/browse/ACCUMULO-2815


Repository: accumulo


Description
---

ACCUMULO-2815 Initial support for Kerberos client authentication.

Leverage SASL transport provided by Thrift which can speak GSSAPI, which 
Kerberos implements. Introduced...

* An Accumulo KerberosToken which is an AuthenticationToken to validate users.
* Custom thrift processor and invocation handler to ensure server RPCs have a 
valid KRB identity and Accumulo authentication.
* A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos 
identities seamlessly.
* New ClientConf variables to use SASL transport and pass Kerberos server 
principal
* Updated ClientOpts and Shell opts to transparently use a KerberosToken when 
SASL is enabled (no extra client work).

I believe this is the bare minimum for Kerberos support. They are also 
grossly lacking in unit and integration tests. I believe that I might have 
somehow broken the client address string in the server (I saw log messages with 
client: null, but I'm not sure if it's due to these changes or not). A 
necessary limitation in the Thrift server used is that, like the SSL transport, 
the SASL transport cannot presently be used with the TFramedTransport, which 
means none of the [half]async thrift servers will function with this -- we're 
stuck with the TThreadPoolServer.

Performed some contrived benchmarks on my laptop (while still using it myself) 
to get at big-picture view of the performance impact against normal operation 
and Kerberos alone. Each run was the duration to ingest 100M records using 
continuous-ingest, timed with `time`, using 'real'.

THsHaServer (our default), 6 runs:

Avg: 10m7.273s (607.273s)
Min: 9m43.395s
Max: 10m52.715s

TThreadPoolServer (no SASL), 5 runs:

Avg: 11m16.254s (676.254s)
Min: 10m30.987s
Max: 12m24.192s

TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs:

Avg: 13m17.187s (797.187s)
Min: 10m52.997s
Max: 16m0.975s

The general takeway is that there's about 15% performance degredation in its 
initial state which is in the realm of what I expected (~10%).


Diffs (updated)
-

  core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 
  core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 
6fe61a5 
  core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java 
e75bec6 
  core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java 
f481cc3 
  core/src/main/java/org/apache/accumulo/core/client/impl/MasterClient.java 
a9ad8a1 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 
6dc846f 
  
core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java
 5da803b 
  
core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f 
  core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java 
PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 
6eace77 
  core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 
  core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java 
PRE-CREATION 
  
core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java
 PRE-CREATION 
  core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 
  core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 
  
core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java 
PRE-CREATION 
  
core/src/test/java/org/apache/accumulo/core/client/impl/ThriftTransportKeyTest.java
 PRE-CREATION 
  core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 
40be70f 
  core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java 
PRE-CREATION 
  
minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloClusterImpl.java
 27d6b19 
  
minicluster/src/main/java/org/apache/accumulo/minicluster/impl/MiniAccumuloConfigImpl.java
 26c23ed 
  pom.xml ae188a0 
  proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb 
  
server/base/src/main/java/org/apache/accumulo/server/AccumuloServerContext.java 
09ae4f4 
  server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java 
046cfb5 
  
server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandler.java
 PRE-CREATION 
  

Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos

2015-01-06 Thread Josh Elser


 On Dec. 30, 2014, 10:46 p.m., Christopher Tubbs wrote:
  shell/src/main/java/org/apache/accumulo/shell/ShellOptionsJC.java, line 210
  https://reviews.apache.org/r/29386/diff/4/?file=803175#file803175line210
 
  Why is username the short user name? Is that unique in Kerberos? If 
  not, the long version should be used everywhere instead. Otherwise, one 
  user can appear to be another in logs, etc.
  
  If getShortUserName is not unique, it should avoided everywhere.
 
 Josh Elser wrote:
 Check out: 
 http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html
 
 Kerberos principals are of the form: primary/instance@realm. Kerberos 
 principals are typically categorized as users and services. A user is not 
 qualified to a single instance (a host) and represent authentication across 
 the realm. For example, els...@example.com means that I can roam. 
 Conversely, a service is typically fixed to a specific host. For example, 
 accumulo/node1.example@example.com means that there is a process, logged 
 in as 'accumulo' on the host 'node1.example.com'. That service can't be run 
 on any other host. Now, an important note if someone actually creates a 
 principal accum...@example.com this is unique with respect to any other 
 accumulo/`host`@EXAMPLE.COM principal. I'm not sure if we need to do 
 anything else other than convention of kerberos principals, or if we should 
 be including the instance in our username when present.
 
 This kind of ties back into the SystemCredentials discussion again.
 
 Christopher Tubbs wrote:
 Okay, so a smart configuration would make shortnames unique. However, 
 UserGroupInformation returns only the `primary` for the short name. This 
 means that user names will have to be unique across realms and instances. 
 Right now, you are storing permissions using the short name. So, any user 
 with the same primary, will be able to masquerade as any other user with the 
 same primary from a different instance and/or realm, and be able to user 
 their permissions and authorizations. That's the problem with the shortname 
 here. That's very unexpected.
 
 Josh Elser wrote:
 Bingo. If you look at how HDFS does their configuration, this is the same 
 convention. The lack of documentation from me leaves something to be desired 
 here, and I apologize for that.
 
 To save you looking at HDFS (if you care not to look), you'll see that an 
 HDFS process uses a given principal with a special replacement string 
 `_HOST`. The common convention is to use something like 
 `dn/_h...@example.com` (the realm is unimportant for this example). This 
 ensures that the same configuration files can be used across all hosts in the 
 HDFS instance, and Hadoop dynamically replaces `_HOST` with the FQDN of the 
 host. Thus, there's an implicit link that all `dn/*@EXAMPLE.COM` can act as 
 datanodes and this is protected by the fact that access to the KDC is 
 restricted (you can't make your own user). The circle of trust is two-fold: 
 having a keytab with the correct principal and that Hadoop is requires that 
 specific configuration (which restricts the principal).
 
 Christopher Tubbs wrote:
 My concerns here are more about the impact on users, than for the system 
 credentials. I don't know what HDFS is doing, but if they aren't (minimally) 
 checking the realm when checking permissions/access on an authenticated 
 principal, then they are less secure than I think we should be. Referencing 
 HDFS also seems to imply that we're not so much doing Kerberos, as we are 
 implementing HDFS-specific Kerberos conventions (which are less secure, with 
 respect to data authorizations/permissions within Accumulo, than I'm 
 comfortable with).
 
 Josh Elser wrote:
 bq. if they aren't (minimally) checking the realm when checking 
 permissions/access on an authenticated principal
 
 Do you mean the instance instead of the realm? In the case of a single 
 realm, the KDC is going to verify the correct realm. Assuming you meant the 
 instance though (the optional /hostname), it's typical that a user has the 
 ability to use their credentials anywhere. Thus, you typically see principals 
 without instances for actual users. As far as I understand it, that's what 
 HDFS tends to follow and what I tried to as well. Accumulo doesn't care where 
 you come from, just what your name is and that you have valid credentials. I 
 don't think we're being substantially less secure by not including the 
 instance in the Accumulo principal.
 
 Christopher Tubbs wrote:
 No, I mean the realm, to make it only necessary to guarantee uniqueness 
 within a realm, vs. across all known realms (more reasonable of a guarantee 
 to make for a KDC user admin). We could also include the instance (when 
 specified), if we want to really be careful that users aren't sharing 
 permissions.
 
 In my concerns, I'm assuming we 

Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment

2015-01-06 Thread Christopher Tubbs


 On Jan. 5, 2015, 4:09 p.m., Christopher Tubbs wrote:
  core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java,
   line 63
  https://reviews.apache.org/r/29502/diff/2/?file=804705#file804705line63
 
  Should this be Authorizations.EMPTY? Or should it have a default 
  implementation on WrappingIterator which calls source.getAuthorizations()?
 
 Christopher Tubbs wrote:
 make that `getSource().getAuthorizations()`
 
 Corey Nolet wrote:
 Specific to this test I returned null because all the other getters 
 (other than what was being explicitly tested) were returning null. Were you 
 thinking WrappingIterator should also provide a getAuthorizations() method?

Oh, nevermind. I misunderstood the change. This is fine.


- Christopher


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29502/#review66725
---


On Jan. 6, 2015, 10:54 a.m., Corey Nolet wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29502/
 ---
 
 (Updated Jan. 6, 2015, 10:54 a.m.)
 
 
 Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and 
 kturner.
 
 
 Bugs: ACCUMULO-3458
 https://issues.apache.org/jira/browse/ACCUMULO-3458
 
 
 Repository: accumulo
 
 
 Description
 ---
 
 ACCUMULO-3458 Propagating scan-time authorizations through the 
 IteratorEnvironment so that scan-time iterators can use them.
 
 
 Diffs
 -
 
   
 core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java
  4903656 
   core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a 
   core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 
 2552682 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 
 666a8af 
   core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 
 9726266 
   
 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java
  2a79f05 
   
 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 
 72cb863 
   
 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java
  9e20cb1 
   
 core/src/test/java/org/apache/accumulo/core/client/impl/ScannerImplTest.java 
 be4d467 
   
 core/src/test/java/org/apache/accumulo/core/client/impl/TabletServerBatchReaderTest.java
  PRE-CREATION 
   
 core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java
  94da7b5 
   
 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java
  fa46360 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java
  4521e55 
   
 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java
  4cebab7 
   
 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java
  4a45e99 
   
 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java
  a9801b0 
   
 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java
  bf35557 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java
  d1fece5 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java
  869cc33 
   
 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java
  fe4b16b 
   test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java 
 PRE-CREATION 
   test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/29502/diff/
 
 
 Testing
 ---
 
 Wrote an integration test to verify that ScanDataSource is actually setting 
 the authorizations on the IteratorEnvironment
 
 
 Thanks,
 
 Corey Nolet