[ https://issues.apache.org/jira/browse/CASSANDRA-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266056#comment-14266056 ]
Sam Tunnicliffe commented on CASSANDRA-8194:
--------------------------------------------

While there is a window during which a stale set of permissions is used, under normal operation I don't think this *should* present too many practical problems. Refresh is triggered by the first lookup after permissions_validity_in_ms, so we'll continue to use the stale set between that point and when that refresh actually completes. Outside of tests though, clients have no visibility/expectation about the precise load or expiry timings, so this shouldn't usually matter.

My concern would be that performing every IAuthorizer.authorize call on a single thread using StorageService.tasks, instead of distributing them across client request threads, could cause a backlog and allow the window to grow unacceptably (plus, these tasks will also be contending with other users of the shared executor). The point about the proliferation of threads and executors is valid, but maybe there's a case for a dedicated executor here. We could make it a TPE with a default pool size of 1 but allow that to be increased via a system property if necessary.

What may be more of an issue is that we'll continue to serve the stale perms for as long as the refresh keeps failing completely because IAuthorizer.authorize throws some exception. This shouldn't really happen with CassandraAuthorizer, but other IAuthorizer impls could well encounter errors when fetching perms. To guard against that, we can force an invalidation if the ListenableFutureTask encounters an exception. That would pretty much maintain current behaviour, with the client receiving an error response while the refresh fails (actually, the authorize calls after an error would serve stale perms until the exception is thrown & caught, but all subsequent calls would fail as per current behaviour).

I've attached a v3 with this second change. What are your thoughts on reverting to a dedicated executor for cache refresh?
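To make the two proposals above concrete (a dedicated refresh executor with a default pool size of 1 that a system property can raise, and a forced invalidation when a refresh throws), here is a rough plain-JDK sketch. It is not the attached v3 patch and does not use the Guava cache the real code is built on. The class name `StaleWhileRefreshCache`, the property name `example.auth_cache_refresh_threads`, and the thread name are all hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

/** Hypothetical illustration only; names and structure are assumptions, not Cassandra code. */
final class StaleWhileRefreshCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtNanos;
        // Ensures at most one background refresh is scheduled per entry.
        final AtomicBoolean refreshing = new AtomicBoolean(false);

        Entry(V value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long validityNanos;
    private final ExecutorService refreshExecutor;

    StaleWhileRefreshCache(Function<K, V> loader, long validityMillis) {
        this.loader = loader;
        this.validityNanos = TimeUnit.MILLISECONDS.toNanos(validityMillis);
        // Dedicated pool: size 1 by default, adjustable via a (hypothetical) system property.
        int poolSize = Math.max(1, Integer.getInteger("example.auth_cache_refresh_threads", 1));
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(
                poolSize, poolSize, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(),
                r -> {
                    Thread t = new Thread(r, "ExampleAuthCacheRefresh");
                    t.setDaemon(true);
                    return t;
                });
        tpe.allowCoreThreadTimeOut(true);
        this.refreshExecutor = tpe;
    }

    V get(K key) {
        Entry<V> current = entries.get(key);
        if (current == null) {
            // First load, or first call after an invalidation: load on the caller's
            // thread so that a loader failure surfaces to the client as an error.
            Entry<V> loaded = new Entry<>(loader.apply(key), System.nanoTime());
            entries.put(key, loaded);
            return loaded.value;
        }
        boolean expired = System.nanoTime() - current.loadedAtNanos >= validityNanos;
        if (expired && current.refreshing.compareAndSet(false, true)) {
            final Entry<V> stale = current;
            refreshExecutor.execute(() -> {
                try {
                    entries.put(key, new Entry<>(loader.apply(key), System.nanoTime()));
                } catch (RuntimeException e) {
                    // Forced invalidation: stop serving the stale value once a refresh fails.
                    entries.remove(key, stale);
                }
            });
        }
        // Served immediately, possibly stale while the refresh is in flight.
        return current.value;
    }
}
```

In this sketch a caller never blocks on a refresh of an existing entry. It only blocks (and can see the loader's exception) when no entry exists, which is the state a failed refresh leaves behind. A single refresh thread can still back up if the loader is slow, which is the reason for making the pool size configurable.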
Also, as I mentioned, tests do have concrete expectations about expiry of permissions and so this breaks auth_test.py:TestAuth.permissions_caching_test. I've pushed a fix [here|https://github.com/beobal/cassandra-dtest/tree/8194] and I'll open a PR shortly.

> Reading from Auth table should not be in the request path
> ---------------------------------------------------------
>
> Key: CASSANDRA-8194
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8194
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Vishy Kasar
> Assignee: Vishy Kasar
> Priority: Minor
> Fix For: 2.0.12, 3.0
>
> Attachments: 8194-V2.patch, 8194.patch, CacheTest2.java
>
>
> We use PasswordAuthenticator and PasswordAuthorizer. The system_auth has a RF of 10 per DC over 2 DCs. The permissions_validity_in_ms is 5 minutes.
> We still have few thousand requests failing each day with the trace below. The reason for this is read cache request realizing that cached entry has expired and doing a blocking request to refresh cache.
> We should have cache refreshed periodically only in the back ground. The user request should simply look at the cache and not try to refresh it.
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
>     at org.apache.cassandra.service.ClientState.authorize(ClientState.java:292)
>     at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172)
>     at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165)
>     at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149)
>     at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:75)
>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102)
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113)
>     at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1735)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4162)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4150)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256)
>     at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84)
>     at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
>     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68)
>     at org.apache.cassandra.service.ClientState$1.load(ClientState.java:278)
>     at org.apache.cassandra.service.ClientState$1.load(ClientState.java:275)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
>     ... 19 more
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
>     at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:943)
>     at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:828)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:140)
>     at org.apache.cassandra.auth.Auth.selectUser(Auth.java:245)
>     ... 28 more
> ERROR [Thrift:17232] 2014-10-24 05:06:51,004 CustomTThreadPoolServer.java (line 224) Error occurred during processing of message.
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
>     at org.apache.cassandra.service.ClientState.authorize(ClientState.java:292)
>     at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172)
>     at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165)
>     at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149)
>     at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:116)
>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102)
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113)
>     at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1735)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4162)
>     at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4150)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256)
>     at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84)
>     at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
>     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68)
>     at org.apache.cassandra.service.ClientState$1.load(ClientState.java:278)
>     at org.apache.cassandra.service.ClientState$1.load(ClientState.java:275)
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
>     ... 19 more
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
>     at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:943)
>     at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:828)
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:140)
>     at org.apache.cassandra.auth.Auth.selectUser(Auth.java:245)
>     ... 28 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)