Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/ --- (Updated Dec. 31, 2014, 1:46 p.m.) Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and kturner. Repository: accumulo Description --- ACCUMULO-3458 Propagating scan-time authorizations through the IteratorEnvironment so that scan-time iterators can use them. Diffs (updated) - core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java 4903656 core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 2552682 core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 666a8af core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 9726266 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java 2a79f05 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 72cb863 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 9e20cb1 core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java 15c33fa core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java 94da7b5 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java fa46360 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java 4521e55 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java 4cebab7 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java 4a45e99 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java a9801b0 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java bf35557 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java d1fece5 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 869cc33 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java fe4b16b test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java PRE-CREATION test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION Diff: https://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet
Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment
On Dec. 31, 2014, 4:30 a.m., Josh Elser wrote: core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java, line 197 https://reviews.apache.org/r/29502/diff/1/?file=804415#file804415line197 Can't you pull this from the Scanner? I didn't see a good way to get this info from the scanner. The more I think about this- a simple getter on the scanner would be massively useful. On Dec. 31, 2014, 4:30 a.m., Josh Elser wrote: server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java, line 55 https://reviews.apache.org/r/29502/diff/1/?file=804426#file804426line55 It looks like TabletIteratorEnvironment is used for minor compactions. Isn't always setting `Authorizations.EMPTY` a little misleading? Is there something more representative of having all auths we could do here? Maybe extra documentation is enough? Could also throw UnsupportedOperationException or similar when the IteratorScope is something that isn't SCAN? Good point! This should definitely be documented as a scan-time only operation. I'm on the fence about throwing an exception- I think I could go either way on that. On Dec. 31, 2014, 4:30 a.m., Josh Elser wrote: test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java, line 54 https://reviews.apache.org/r/29502/diff/1/?file=804430#file804430line54 Please create a user, assign it the auths you need, and then remove the user after the test. If this test is run against a standalone instance, it should try to leave the system in the same state the test started in. You know I was thinking about this when I was coding the test and totally forgot to change it before I created the patch. - Corey --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/#review66439 --- On Dec. 31, 2014, 1:46 p.m., Corey Nolet wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/ --- (Updated Dec. 31, 2014, 1:46 p.m.) Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and kturner. Repository: accumulo Description --- ACCUMULO-3458 Propagating scan-time authorizations through the IteratorEnvironment so that scan-time iterators can use them. Diffs - core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java 4903656 core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 2552682 core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 666a8af core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 9726266 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java 2a79f05 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 72cb863 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 9e20cb1 core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java 15c33fa core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java 94da7b5 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java fa46360 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java 4521e55 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java 4cebab7 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java 4a45e99 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java a9801b0 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java bf35557 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java d1fece5 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 869cc33 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java fe4b16b test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java PRE-CREATION test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION Diff: https://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet
Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/ --- (Updated Dec. 31, 2014, 10:40 a.m.) Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and kturner. Bugs: ACCUMULO-3458 https://issues.apache.org/jira/browse/ACCUMULO-3458 Repository: accumulo Description --- ACCUMULO-3458 Propagating scan-time authorizations through the IteratorEnvironment so that scan-time iterators can use them. Diffs - core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java 4903656 core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 2552682 core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 666a8af core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 9726266 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java 2a79f05 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 72cb863 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 9e20cb1 core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java 15c33fa core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java 94da7b5 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java fa46360 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java 4521e55 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java 4cebab7 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java 4a45e99 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java a9801b0 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java bf35557 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java d1fece5 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 869cc33 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java fe4b16b test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java PRE-CREATION test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION Diff: https://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java, line 63 https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line63 why is this a noop? Josh Elser wrote: The SASL transport in thrift is actually doing the work of sending the kerberos credentials over the wire for our RPC use. We have a TProcessor and a wrapper around the thrift interface implementation which passes this information from the thrift layer back into our Credentials objects which alleviates us from having to do (nearly) anything in the KerberosToken itself. I'm not sure if there is a better way to do it, but I'm open to suggestions. Thanks for the explanation, I suspected it was something like you said but wasnt sure. Would passing the principal across the wire be useful for sanity checks? On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java, line 46 https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46 This is adding UserGroupInformation to Accumulo API. Is that a stable hadoop API? Does it have to be in API, why not just principal? Josh Elser wrote: UGI is marked as LimitedPrivate and Evolving, but I have no qualms against including it into our API. The problem with accepting a principal alone, we would then have to duplicate the login capabilities that UGI already provides (e.g performing a login via password or keytab). By accepting a UGI directly, we don't have to own this logic. It's my opinion that it's easiest to just use UGI directly, but could see an argument for transparently using it behind the scenes if you think that would be better. I have no qualms against including it into our API Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? but could see an argument for transparently using it behind the scenes if you think that would be better. From an implementation perspective, I don't know whats best. I don't know what the benefits of using UGI are. What are the problems with not using it? Would not using it require a complex facade class? Would not using it cause interoperability problems with others using Kerberos in the hadoop ecosystem? If this object is really useful to users of Hadoop and we want to use it in our API, I would advocate for opening a hadoop issue to change the stability guarantees. I am uncertain if we should use it before this change is made, but thats because I don't fully understand the details. On Dec. 30, 2014, 7:39 p.m., kturner wrote: server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java, line 540 https://reviews.apache.org/r/29386/diff/4/?file=803157#file803157line540 why did this method need to be added? Josh Elser wrote: Client authentication with Kerberos implies that any user with valid Kerberos credentials can be authenticated with the system transparently. As such, the notion of a root super-user goes out the window -- it doesn't make sense to have a root user because that would require a new kerberos identity to be created just to delegate permissions to an actual user. By changing `accumulo init` to accept a root user (and granting the necessary system permissions to that user), we remove the need to have some extra user created only for the purposes of bootstraping the Accumulo permissions. Alternatively, we could also require that the user running `accumulo init` has Kerberos credentials and give that user system permissions. I could see this work either way. Thanks for the explanation. - kturner --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/#review66393 --- On Dec. 31, 2014, 4:36 a.m., Josh Elser wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/ --- (Updated Dec. 31, 2014, 4:36 a.m.) Review request for accumulo. Bugs: ACCUMULO-2815 https://issues.apache.org/jira/browse/ACCUMULO-2815 Repository: accumulo Description --- ACCUMULO-2815 Initial support for Kerberos client authentication. Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements. Introduced... * An Accumulo KerberosToken which is an AuthenticationToken to validate users. * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB identity and
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
On Dec. 30, 2014, 8:44 p.m., Mike Drob wrote: core/src/main/java/org/apache/accumulo/core/security/Credentials.java, lines 43-44 https://reviews.apache.org/r/29386/diff/4/?file=803153#file803153line43 Why have this? Josh Elser wrote: I had something in here at some point in time, but then removed it. Rather than just re-deleting the logger (to assumable re-add it again some day, I just left it in place). I can remove it if it bothers you. Christopher Tubbs wrote: I don't like warning suppression. We can add it later. I also noted this issue in my review, below. (I'll close mine, to de-dupe). Ok, will remove again. - Josh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/#review66394 --- On Dec. 31, 2014, 4:36 a.m., Josh Elser wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/ --- (Updated Dec. 31, 2014, 4:36 a.m.) Review request for accumulo. Bugs: ACCUMULO-2815 https://issues.apache.org/jira/browse/ACCUMULO-2815 Repository: accumulo Description --- ACCUMULO-2815 Initial support for Kerberos client authentication. Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements. Introduced... * An Accumulo KerberosToken which is an AuthenticationToken to validate users. * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB identity and Accumulo authentication. * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities seamlessly. * New ClientConf variables to use SASL transport and pass Kerberos server principal * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is enabled (no extra client work). I believe this is the bare minimum for Kerberos support. They are also grossly lacking in unit and integration tests. I believe that I might have somehow broken the client address string in the server (I saw log messages with client: null, but I'm not sure if it's due to these changes or not). A necessary limitation in the Thrift server used is that, like the SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which means none of the [half]async thrift servers will function with this -- we're stuck with the TThreadPoolServer. Performed some contrived benchmarks on my laptop (while still using it myself) to get at big-picture view of the performance impact against normal operation and Kerberos alone. Each run was the duration to ingest 100M records using continuous-ingest, timed with `time`, using 'real'. THsHaServer (our default), 6 runs: Avg: 10m7.273s (607.273s) Min: 9m43.395s Max: 10m52.715s TThreadPoolServer (no SASL), 5 runs: Avg: 11m16.254s (676.254s) Min: 10m30.987s Max: 12m24.192s TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs: Avg: 13m17.187s (797.187s) Min: 10m52.997s Max: 16m0.975s The general takeway is that there's about 15% performance degredation in its initial state which is in the realm of what I expected (~10%). Diffs - core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 6fe61a5 core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java e75bec6 core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java f481cc3 core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 6dc846f core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java 5da803b core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 6eace77 core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java PRE-CREATION
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java, line 46 https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46 This is adding UserGroupInformation to Accumulo API. Is that a stable hadoop API? Does it have to be in API, why not just principal? Josh Elser wrote: UGI is marked as LimitedPrivate and Evolving, but I have no qualms against including it into our API. The problem with accepting a principal alone, we would then have to duplicate the login capabilities that UGI already provides (e.g performing a login via password or keytab). By accepting a UGI directly, we don't have to own this logic. It's my opinion that it's easiest to just use UGI directly, but could see an argument for transparently using it behind the scenes if you think that would be better. kturner wrote: I have no qualms against including it into our API Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? but could see an argument for transparently using it behind the scenes if you think that would be better. From an implementation perspective, I don't know whats best. I don't know what the benefits of using UGI are. What are the problems with not using it? Would not using it require a complex facade class? Would not using it cause interoperability problems with others using Kerberos in the hadoop ecosystem? If this object is really useful to users of Hadoop and we want to use it in our API, I would advocate for opening a hadoop issue to change the stability guarantees. I am uncertain if we should use it before this change is made, but thats because I don't fully understand the details. bq. Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? The only difference between Stable and Evolving is that breaking changes could happen at minor releases instead of just major releases (akin to semver). So, just to make a point, even using only Stable classes will result in us having to worry about breaking changes. It's possible that we could use a very thin shim around the most often used methods of UGI, but I'm worried that would be insufficient. For example, most people use `UserGroupInformation.loginUserFromKeytab` or they are have a cached ticket (from `kinit`'ing on the command line before invoking the application) which we can get access to via `UserGroupInformation.getCurrentUser`. The other approach is to use `UserGroupInformation.createProxyUser` which accepts a principal and proxies another user on top of another UGI -- this is very useful for us as it eliminates the need to do Authorization intersection as some server-user to maintain application security. The server is running an Accumulo query for the user; however Accumulo sees the user, not the server. Back to the point though: this actual assertion is a quick fail to make sure that the user trying to use Kerberos is actually logged in (catch it and fail before we actually try to send an RPC). We could push this farther down into the codebase -- inside ConnectorImpl maybe? -- which would even moreso make this a shell AuthenticationToken. To clarify, your biggest worry is the UGI argument in the constructor and not using it in the implementation, right? - Josh --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/#review66393 --- On Dec. 31, 2014, 4:36 a.m., Josh Elser wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/ --- (Updated Dec. 31, 2014, 4:36 a.m.) Review request for accumulo. Bugs: ACCUMULO-2815 https://issues.apache.org/jira/browse/ACCUMULO-2815 Repository: accumulo Description --- ACCUMULO-2815 Initial support for Kerberos client authentication. Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements. Introduced... * An Accumulo KerberosToken which is an AuthenticationToken to validate users. * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB identity and Accumulo authentication. * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities seamlessly. * New ClientConf variables to use SASL transport and pass Kerberos server principal * Updated ClientOpts
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java, line 46 https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46 This is adding UserGroupInformation to Accumulo API. Is that a stable hadoop API? Does it have to be in API, why not just principal? Josh Elser wrote: UGI is marked as LimitedPrivate and Evolving, but I have no qualms against including it into our API. The problem with accepting a principal alone, we would then have to duplicate the login capabilities that UGI already provides (e.g performing a login via password or keytab). By accepting a UGI directly, we don't have to own this logic. It's my opinion that it's easiest to just use UGI directly, but could see an argument for transparently using it behind the scenes if you think that would be better. kturner wrote: I have no qualms against including it into our API Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? but could see an argument for transparently using it behind the scenes if you think that would be better. From an implementation perspective, I don't know whats best. I don't know what the benefits of using UGI are. What are the problems with not using it? Would not using it require a complex facade class? Would not using it cause interoperability problems with others using Kerberos in the hadoop ecosystem? If this object is really useful to users of Hadoop and we want to use it in our API, I would advocate for opening a hadoop issue to change the stability guarantees. I am uncertain if we should use it before this change is made, but thats because I don't fully understand the details. Josh Elser wrote: bq. Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? The only difference between Stable and Evolving is that breaking changes could happen at minor releases instead of just major releases (akin to semver). So, just to make a point, even using only Stable classes will result in us having to worry about breaking changes. It's possible that we could use a very thin shim around the most often used methods of UGI, but I'm worried that would be insufficient. For example, most people use `UserGroupInformation.loginUserFromKeytab` or they are have a cached ticket (from `kinit`'ing on the command line before invoking the application) which we can get access to via `UserGroupInformation.getCurrentUser`. The other approach is to use `UserGroupInformation.createProxyUser` which accepts a principal and proxies another user on top of another UGI -- this is very useful for us as it eliminates the need to do Authorization intersection as some server-user to maintain application security. The server is running an Accumulo query for the user; however Accumulo sees the user, not the server. Back to the point though: this actual assertion is a quick fail to make sure that the user trying to use Kerberos is actually logged in (catch it and fail before we actually try to send an RPC). We could push this farther down into the codebase -- inside ConnectorImpl maybe? -- which would even moreso make this a shell AuthenticationToken. To clarify, your biggest worry is the UGI argument in the constructor and not using it in the implementation, right? To clarify, your biggest worry is the UGI argument in the constructor and not using it in the implementation, right? Correct. This patch adds KerberosToken to the public API w/ a public method using UGI. I would be much less concerned if UGI were only in implementation of KerberosToken as that gives us a lot more wiggle room. On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java, line 109 https://reviews.apache.org/r/29386/diff/4/?file=803139#file803139line109 It seems like the expectation is that `getPrincipal()` would be called now? If so can this be made private? Can't remember if jcommander is ok w/ that. Josh Elser wrote: I'm not sure if JCommander supports that (my gut reaction is no), but there might also be public API implications of that... If possible it would be nice to make it private, makes it much easier for us to use this class correctly going forward. But if it does not work out, no big deal. - kturner --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/#review66393
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
On Dec. 30, 2014, 7:39 p.m., kturner wrote: core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java, line 46 https://reviews.apache.org/r/29386/diff/4/?file=803145#file803145line46 This is adding UserGroupInformation to Accumulo API. Is that a stable hadoop API? Does it have to be in API, why not just principal? Josh Elser wrote: UGI is marked as LimitedPrivate and Evolving, but I have no qualms against including it into our API. The problem with accepting a principal alone, we would then have to duplicate the login capabilities that UGI already provides (e.g performing a login via password or keytab). By accepting a UGI directly, we don't have to own this logic. It's my opinion that it's easiest to just use UGI directly, but could see an argument for transparently using it behind the scenes if you think that would be better. kturner wrote: I have no qualms against including it into our API Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? but could see an argument for transparently using it behind the scenes if you think that would be better. From an implementation perspective, I don't know whats best. I don't know what the benefits of using UGI are. What are the problems with not using it? Would not using it require a complex facade class? Would not using it cause interoperability problems with others using Kerberos in the hadoop ecosystem? If this object is really useful to users of Hadoop and we want to use it in our API, I would advocate for opening a hadoop issue to change the stability guarantees. I am uncertain if we should use it before this change is made, but thats because I don't fully understand the details. Josh Elser wrote: bq. Why do you think its ok to ignore the API marking? Shouldnt we honor whats marked on the assumption that someone in Hadoop land will think they are free to make breaking changes based on whats makred? The only difference between Stable and Evolving is that breaking changes could happen at minor releases instead of just major releases (akin to semver). So, just to make a point, even using only Stable classes will result in us having to worry about breaking changes. It's possible that we could use a very thin shim around the most often used methods of UGI, but I'm worried that would be insufficient. For example, most people use `UserGroupInformation.loginUserFromKeytab` or they are have a cached ticket (from `kinit`'ing on the command line before invoking the application) which we can get access to via `UserGroupInformation.getCurrentUser`. The other approach is to use `UserGroupInformation.createProxyUser` which accepts a principal and proxies another user on top of another UGI -- this is very useful for us as it eliminates the need to do Authorization intersection as some server-user to maintain application security. The server is running an Accumulo query for the user; however Accumulo sees the user, not the server. Back to the point though: this actual assertion is a quick fail to make sure that the user trying to use Kerberos is actually logged in (catch it and fail before we actually try to send an RPC). We could push this farther down into the codebase -- inside ConnectorImpl maybe? -- which would even moreso make this a shell AuthenticationToken. To clarify, your biggest worry is the UGI argument in the constructor and not using it in the implementation, right? kturner wrote: To clarify, your biggest worry is the UGI argument in the constructor and not using it in the implementation, right? Correct. This patch adds KerberosToken to the public API w/ a public method using UGI. I would be much less concerned if UGI were only in implementation of KerberosToken as that gives us a lot more wiggle room. The only difference between Stable and Evolving is that breaking changes could happen at minor releases instead of just major releases (akin to semver). Will Hadoop deprecate before changing in a minor release? Trying to think through case where a minor hadoop release moves UGI. If Hadoop does not deprecate and just moves it, we would be up the creek w/o a paddle. W/o deprecation we would be forced to deprecated and introduce a facade anyway because we can depend on two hadoop releases simultaneously. 1. Accumulo 1.7.0 release API method w/ UGI in signature, lets call this method foo1(UGI_A). 2. Hadoop release 2.9.0 which moves UGI_A to UGI_B AND leaves UGI_A as deprecated. 3. Accumulo must release 1.8.0 w/ foo1(UGI_A) deprecated and introduce foo1(UGI_B). - kturner --- This is
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/ --- (Updated Dec. 31, 2014, 9:24 p.m.) Review request for accumulo. Changes --- Remove UserGroupInformation from the publicness of KerberosToken. Bugs: ACCUMULO-2815 https://issues.apache.org/jira/browse/ACCUMULO-2815 Repository: accumulo Description --- ACCUMULO-2815 Initial support for Kerberos client authentication. Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements. Introduced... * An Accumulo KerberosToken which is an AuthenticationToken to validate users. * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB identity and Accumulo authentication. * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities seamlessly. * New ClientConf variables to use SASL transport and pass Kerberos server principal * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is enabled (no extra client work). I believe this is the bare minimum for Kerberos support. They are also grossly lacking in unit and integration tests. I believe that I might have somehow broken the client address string in the server (I saw log messages with client: null, but I'm not sure if it's due to these changes or not). A necessary limitation in the Thrift server used is that, like the SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which means none of the [half]async thrift servers will function with this -- we're stuck with the TThreadPoolServer. Performed some contrived benchmarks on my laptop (while still using it myself) to get at big-picture view of the performance impact against normal operation and Kerberos alone. Each run was the duration to ingest 100M records using continuous-ingest, timed with `time`, using 'real'. THsHaServer (our default), 6 runs: Avg: 10m7.273s (607.273s) Min: 9m43.395s Max: 10m52.715s TThreadPoolServer (no SASL), 5 runs: Avg: 11m16.254s (676.254s) Min: 10m30.987s Max: 12m24.192s TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs: Avg: 13m17.187s (797.187s) Min: 10m52.997s Max: 16m0.975s The general takeway is that there's about 15% performance degredation in its initial state which is in the realm of what I expected (~10%). Diffs (updated) - core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 6fe61a5 core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java e75bec6 core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java f481cc3 core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 6dc846f core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java 5da803b core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 6eace77 core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java PRE-CREATION core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 40be70f core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java PRE-CREATION proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb server/base/src/main/java/org/apache/accumulo/server/AccumuloServerContext.java 09ae4f4 server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java 046cfb5 server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingInvocationHandler.java PRE-CREATION server/base/src/main/java/org/apache/accumulo/server/rpc/TCredentialsUpdatingWrapper.java PRE-CREATION server/base/src/main/java/org/apache/accumulo/server/rpc/TServerUtils.java 641c0bf server/base/src/main/java/org/apache/accumulo/server/rpc/ThriftServerType.java PRE-CREATION server/base/src/main/java/org/apache/accumulo/server/security/SecurityOperation.java 5e81018
Re: Review Request 29386: ACCUMULO-2815 Client authentication via Kerberos
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/#review66489 --- core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java https://reviews.apache.org/r/29386/#comment110060 Need to update javadocs. - kturner On Dec. 31, 2014, 9:24 p.m., Josh Elser wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29386/ --- (Updated Dec. 31, 2014, 9:24 p.m.) Review request for accumulo. Bugs: ACCUMULO-2815 https://issues.apache.org/jira/browse/ACCUMULO-2815 Repository: accumulo Description --- ACCUMULO-2815 Initial support for Kerberos client authentication. Leverage SASL transport provided by Thrift which can speak GSSAPI, which Kerberos implements. Introduced... * An Accumulo KerberosToken which is an AuthenticationToken to validate users. * Custom thrift processor and invocation handler to ensure server RPCs have a valid KRB identity and Accumulo authentication. * A KerberosAuthenticator which extends ZKAuthenticator to support Kerberos identities seamlessly. * New ClientConf variables to use SASL transport and pass Kerberos server principal * Updated ClientOpts and Shell opts to transparently use a KerberosToken when SASL is enabled (no extra client work). I believe this is the bare minimum for Kerberos support. They are also grossly lacking in unit and integration tests. I believe that I might have somehow broken the client address string in the server (I saw log messages with client: null, but I'm not sure if it's due to these changes or not). A necessary limitation in the Thrift server used is that, like the SSL transport, the SASL transport cannot presently be used with the TFramedTransport, which means none of the [half]async thrift servers will function with this -- we're stuck with the TThreadPoolServer. Performed some contrived benchmarks on my laptop (while still using it myself) to get at big-picture view of the performance impact against normal operation and Kerberos alone. Each run was the duration to ingest 100M records using continuous-ingest, timed with `time`, using 'real'. THsHaServer (our default), 6 runs: Avg: 10m7.273s (607.273s) Min: 9m43.395s Max: 10m52.715s TThreadPoolServer (no SASL), 5 runs: Avg: 11m16.254s (676.254s) Min: 10m30.987s Max: 12m24.192s TThreadPoolServer+SASL/GSSAPI (these changes), 6 runs: Avg: 13m17.187s (797.187s) Min: 10m52.997s Max: 16m0.975s The general takeway is that there's about 15% performance degredation in its initial state which is in the realm of what I expected (~10%). Diffs - core/src/main/java/org/apache/accumulo/core/cli/ClientOpts.java f6ea934 core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java 6fe61a5 core/src/main/java/org/apache/accumulo/core/client/impl/ClientContext.java e75bec6 core/src/main/java/org/apache/accumulo/core/client/impl/ConnectorImpl.java f481cc3 core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportKey.java 6dc846f core/src/main/java/org/apache/accumulo/core/client/impl/ThriftTransportPool.java 5da803b core/src/main/java/org/apache/accumulo/core/client/security/tokens/KerberosToken.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/conf/Property.java e054a5f core/src/main/java/org/apache/accumulo/core/rpc/FilterTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/SaslConnectionParams.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/TTimeoutTransport.java 6eace77 core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java 09bd6c4 core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransport.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/rpc/UGIAssumingTransportFactory.java PRE-CREATION core/src/main/java/org/apache/accumulo/core/security/Credentials.java 525a958 core/src/test/java/org/apache/accumulo/core/cli/TestClientOpts.java ff49bc0 core/src/test/java/org/apache/accumulo/core/client/ClientConfigurationTest.java PRE-CREATION core/src/test/java/org/apache/accumulo/core/conf/ClientConfigurationTest.java 40be70f core/src/test/java/org/apache/accumulo/core/rpc/SaslConnectionParamsTest.java PRE-CREATION proxy/src/main/java/org/apache/accumulo/proxy/Proxy.java 4b048eb server/base/src/main/java/org/apache/accumulo/server/AccumuloServerContext.java 09ae4f4 server/base/src/main/java/org/apache/accumulo/server/init/Initialize.java 046cfb5
Re: Review Request 29502: ACCUMULO-3458 Adding scan authorizations to IteratorEnvironment
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/#review66493 --- core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java https://reviews.apache.org/r/29502/#comment110073 Testing this method for scanner and batch scanner would be nice. core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java https://reviews.apache.org/r/29502/#comment110070 was putting @Override on same line as method decleration intentional? test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java https://reviews.apache.org/r/29502/#comment110071 could this be more strict, should the count be 1? test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java https://reviews.apache.org/r/29502/#comment110072 Two more test would be nice. * Test the case where empty set of Auths is set on scanner, and verify AuthsIterator.FAIL is returned. * Test case where set Auths(B) is set on scanner, and verify AuthsIterator.FAIL is returned. The TransformingIterator could possible be modified to leverage this change. That could be done as a follow on issue. - kturner On Dec. 31, 2014, 3:40 p.m., Corey Nolet wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29502/ --- (Updated Dec. 31, 2014, 3:40 p.m.) Review request for accumulo, Christopher Tubbs, Eric Newton, Josh Elser, and kturner. Bugs: ACCUMULO-3458 https://issues.apache.org/jira/browse/ACCUMULO-3458 Repository: accumulo Description --- ACCUMULO-3458 Propagating scan-time authorizations through the IteratorEnvironment so that scan-time iterators can use them. Diffs - core/src/main/java/org/apache/accumulo/core/client/ClientSideIteratorScanner.java 4903656 core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java 335b63a core/src/main/java/org/apache/accumulo/core/client/impl/OfflineScanner.java 2552682 core/src/main/java/org/apache/accumulo/core/client/impl/ScannerImpl.java 666a8af core/src/main/java/org/apache/accumulo/core/client/impl/ScannerOptions.java 9726266 core/src/main/java/org/apache/accumulo/core/client/impl/TabletServerBatchReader.java 2a79f05 core/src/main/java/org/apache/accumulo/core/client/mock/MockScannerBase.java 72cb863 core/src/main/java/org/apache/accumulo/core/iterators/IteratorEnvironment.java 9e20cb1 core/src/main/java/org/apache/accumulo/core/iterators/system/VisibilityFilter.java 15c33fa core/src/test/java/org/apache/accumulo/core/iterators/DefaultIteratorEnvironment.java 94da7b5 core/src/test/java/org/apache/accumulo/core/iterators/FirstEntryInRowIteratorTest.java fa46360 core/src/test/java/org/apache/accumulo/core/iterators/user/RowDeletingIteratorTest.java 4521e55 core/src/test/java/org/apache/accumulo/core/iterators/user/TransformingIteratorTest.java 4cebab7 server/base/src/test/java/org/apache/accumulo/server/iterators/MetadataBulkLoadFilterTest.java 4a45e99 server/base/src/test/java/org/apache/accumulo/server/replication/StatusCombinerTest.java a9801b0 server/monitor/src/main/java/org/apache/accumulo/monitor/servlets/trace/NullScanner.java bf35557 server/tserver/src/main/java/org/apache/accumulo/tserver/TabletIteratorEnvironment.java d1fece5 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Compactor.java 869cc33 server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/ScanDataSource.java fe4b16b test/src/main/java/org/apache/accumulo/test/functional/AuthsIterator.java PRE-CREATION test/src/test/java/org/apache/accumulo/test/ScanIteratorIT.java PRE-CREATION Diff: https://reviews.apache.org/r/29502/diff/ Testing --- Wrote an integration test to verify that ScanDataSource is actually setting the authorizations on the IteratorEnvironment Thanks, Corey Nolet