So in the existing policy implementations, CodeSource.implies() can make DNS calls while the policy files are being parsed, which adds to startup delays.
In my ConcurrentPolicyFile implementation (a replacement for the standard Java PolicyFile implementation) I've created a URIGrant. I've taken code from Harmony to implement implies(ProtectionDomain pd), which performs wildcard matching compliant with CodeSource.implies; the only difference is that no attempt is made to resolve URIs.
Most policy files specify file based URLs for CodeSource, but in a network application where many CodeSources may be network URLs, DNS lookup causes added delays.
I've also created a CodeSourceGrant, which uses CodeSource.implies() for backward compatibility with existing Java policy files, although I'm sure that most people will simply want to revise their policy files.
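To illustrate the kind of matching described above, here is a minimal sketch of CodeSource-style wildcard matching done purely on URI text. It is not the Harmony-derived code in URIGrant; the class name UriMatch and its rules are assumptions made for illustration, and the rules are deliberately simplified (for example, default ports are not considered).

```java
import java.net.URI;

/** Hypothetical helper (not the River URIGrant): wildcard matching on URI text only. */
final class UriMatch {
    private UriMatch() {}

    /**
     * Simplified rules modelled on CodeSource.implies:
     *   a grant path ending in "/-" covers everything below that directory;
     *   a grant path ending in "/*" covers entries directly in that directory;
     *   any other grant path must match exactly.
     * Hosts are compared as strings and are never resolved.
     */
    static boolean implies(URI grant, URI codeSource) {
        URI g = grant.normalize();
        URI c = codeSource.normalize();
        if (!sameText(g.getScheme(), c.getScheme())) return false;
        if (!sameText(g.getHost(), c.getHost())) return false;
        // Assumption: a grant without an explicit port matches any port.
        if (g.getPort() != -1 && g.getPort() != c.getPort()) return false;
        String gp = g.getPath() == null ? "" : g.getPath();
        String cp = c.getPath() == null ? "" : c.getPath();
        if (gp.endsWith("/-")) {
            // Keep the trailing slash, drop the "-".
            return cp.startsWith(gp.substring(0, gp.length() - 1));
        }
        if (gp.endsWith("/*")) {
            String dir = gp.substring(0, gp.length() - 1);
            // No further '/' allowed after the directory prefix.
            return cp.startsWith(dir) && cp.indexOf('/', dir.length()) < 0;
        }
        return gp.equals(cp);
    }

    private static boolean sameText(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }
}
```

Note that under these rules a grant for a DNS name does not imply a code source given by IP address, even if the name would resolve to that address.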
The standard interface PermissionGrant is implemented by the following inheritance hierarchy of immutable classes:

                   PrincipalGrant
           _______________|_______________
           |                             |
 ProtectionDomainGrant           CertificateGrant
           |                  ___________|___________
   ClassLoaderGrant           |                     |
                          URIGrant           CodeSourceGrant
Only PrincipalGrant is publicly visible; a builder returns the correct implementation.
ProtectionDomainGrant and ClassLoaderGrant are granted dynamically by the completely new DynamicPolicyProvider (which has long since passed all tests).
CertificateGrant, URIGrant and CodeSourceGrant are used by the file based policies and by RemotePolicy, which is intended to be a service that nodes in a djinn can use to allow an administrator to update the policy (e.g. to include new certificates or principals), with all the protection of subject authentication and secure connections. RemotePolicy is idempotent: the policy is updated in one operation, so the current policy state is always known to the administrator (who is a client).
Since a file based policy is mostly read and only written when refreshed, the PermissionGrants are held in an array behind a volatile reference. Any code that reads the array first copies the reference (only the reference, not the array). The reference is replaced when the policy is updated, and an array is never mutated after it has been published.
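The publish-by-reference pattern described above can be sketched as follows. This is only an illustration: the class name GrantSnapshot is made up, and java.security.Permission stands in for PermissionGrant so that the example compiles on its own.

```java
import java.security.Permission;
import java.util.Arrays;

/** Hypothetical holder; Permission is a stand-in for PermissionGrant. */
final class GrantSnapshot {
    // Replaced wholesale on refresh; the array it points to is never mutated.
    private volatile Permission[] grants = new Permission[0];

    /** Writer: publish a private copy so later changes by the caller cannot leak in. */
    void refresh(Permission[] updated) {
        grants = Arrays.copyOf(updated, updated.length);
    }

    /** Reader: take one copy of the reference and work only from that copy. */
    boolean implies(Permission requested) {
        Permission[] snapshot = grants; // single volatile read
        for (Permission p : snapshot) {
            if (p.implies(requested)) {
                return true;
            }
        }
        return false;
    }
}
```

Readers never take a lock; a refresh that happens mid-check simply isn't seen until the next check.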
A ConcurrentMap<ProtectionDomain, PermissionCollection> (with weak keys) acts as a cache. I've also written ConcurrentPermissions, an implementation that replaces the heterogeneous java.security.Permissions class and also resolves any unresolved permissions.
However, I'm starting to wonder if it's wiser to throw away the cache altogether and simply build a java.security.Permissions on demand, then discard it immediately after use so that it is collected in the young generation (it's likely to fit in level 2 cache and never even be copied to RAM). This would eliminate contention on existing PermissionCollections that block, like SocketPermissionCollection.
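A rough sketch of the on-demand idea, assuming the permissions granted to a domain are already available as a list (the class name OnDemandCheck and that list are assumptions, not part of the policy implementation):

```java
import java.security.Permission;
import java.security.Permissions;
import java.util.List;

/** Hypothetical helper: one short-lived Permissions instance per check. */
final class OnDemandCheck {
    private OnDemandCheck() {}

    static boolean implies(List<? extends Permission> granted, Permission requested) {
        // Confined to the calling thread, so no other thread can block on it.
        Permissions perms = new Permissions();
        for (Permission p : granted) {
            perms.add(p);
        }
        // perms becomes garbage as soon as this method returns.
        return perms.implies(requested);
    }
}
```

Each thread pays the cost of rebuilding the collection, but a slow check in one thread no longer holds a lock that other threads need.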
So if you have, for instance, 100 different AccessControlContexts being checked by different threads, all containing the same ProtectionDomains for a SocketPermission, then all of them will be checked in parallel. Currently, because of blocking, each SocketPermission that performs a DNS check must either resolve or time out before its SocketPermissionCollection can release its synchronization lock (and there may be multiple SocketPermissions in a SocketPermissionCollection). Only then can another thread check its context, and so on, which explains everything coming to a standstill.
If all permission checks execute in parallel independently, without blocking, then the timeout won't be magnified.
I am considering going one step further and replacing SocketPermission and SocketPermissionCollection, and implementing DNS checks in the SocketPermissionCollection rather than SocketPermission. By doing this a matching record will be found in most cases without requiring DNS reverse lookup. If I keep this as an internal policy implementation detail, then if Oracle fixes SocketPermission, we can return to using the standard java implementation, in fact I could make it a configuration property.
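To make the ordering argument concrete, here is a toy sketch of a collection that tries every text-only match before it resolves anything. It is not a SocketPermissionCollection replacement: the class HostGrants is hypothetical, it ignores ports and actions, it is not thread safe, and the resolver argument is a stand-in for whatever DNS lookup would really be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical host-pattern collection; lookups are the last resort. */
final class HostGrants {
    private final List<String> patterns = new ArrayList<>();
    private final UnaryOperator<String> resolver; // stand-in for a DNS lookup

    HostGrants(UnaryOperator<String> resolver) {
        this.resolver = resolver;
    }

    void add(String hostPattern) {
        patterns.add(hostPattern.toLowerCase(Locale.ROOT));
    }

    boolean implies(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        // Pass 1: literal and wildcard matches on text alone, no lookup.
        for (String p : patterns) {
            if (p.equals("*") || p.equals(h)) return true;
            if (p.startsWith("*.") && h.endsWith(p.substring(1))) return true;
        }
        // Pass 2: only now resolve, and only the non-wildcard entries.
        String resolved = resolver.apply(h);
        for (String p : patterns) {
            if (!p.startsWith("*") && resolved.equals(resolver.apply(p))) return true;
        }
        return false;
    }
}
```

With this ordering, a request covered by a wildcard entry returns before any lookup is attempted, regardless of where that entry sits in the collection.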
It's an unfortunate fact that not all permission checks are performed in the policy, so replacing SocketPermission also requires the cooperation of the SecurityManager. To make matters worse, static ProtectionDomains created before my policy implementation is constructed will never consult it, so they will still contain the standard SocketPermission. The SecurityManager would therefore need to check each ProtectionDomain for both implementations, which means reimplementing SocketPermission doesn't eliminate its use entirely.
It's worth noting that SocketPermission is implemented rather poorly; the same functionality can be provided with far fewer DNS lookups, since the majority are performed completely unnecessarily. Perhaps it's worth me donating some time to OpenJDK to fix it. I'd have to check with Apache legal first, I suppose.
The problems with DNS lookup also affect the equals and hashCode methods of CodeSource and URL, so these classes shouldn't be used in collections.
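One way to keep locations in a collection without triggering lookups is to key on java.net.URI instead, since URI.equals() and URI.hashCode() work on the parsed text only. A minimal sketch follows; the class name LocationSet is made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical set of code locations keyed by URI rather than URL. */
final class LocationSet {
    private final Set<URI> locations = ConcurrentHashMap.newKeySet();

    /** URL.toURI() only re-parses the URL's string form. */
    boolean add(URL location) throws URISyntaxException {
        return locations.add(location.toURI().normalize());
    }

    boolean contains(URL location) throws URISyntaxException {
        return locations.contains(location.toURI().normalize());
    }
}
```

The trade-off is the narrower notion of equality: two URLs that name the same host differently (DNS name versus IP address) are treated as different keys.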
Cheers,

Peter.

Christopher Dolan wrote:
To simulate the problem, go to InetAddress.getHostFromNameService() in your IDE and set a breakpoint on the "nameService.getHostByAddr" line with a condition something like this:

new java.util.concurrent.CountDownLatch(1).await(15, java.util.concurrent.TimeUnit.SECONDS)

Then launch your River application from within the IDE. The latch is never counted down, so the condition waits out the full timeout and then evaluates to false; the breakpoint never suspends the thread, it only delays it. This will cause all reverse DNS lookups to stall for 15 seconds before succeeding. This will affect Reggie the worst because it has to verify so many hostnames. In a large group (a few thousand services) this will drive Reggie's thread count skyward, perhaps triggering OutOfMemory errors if it's in a 32-bit JVM.

This problem happens in the real world in facilities that allow client connections to the production LAN, but do not allow the production LAN to resolve hosts in the client LAN. This may occur due to separate IT teams, strict security rules or simple configuration errors. Because most client-server systems, like web servers, do not require the server to contact the client, this problem does not become immediately visible to IT. Instead, the question is inevitably "Why is Jini/River so sensitive to reverse DNS? All of my other services work fine."

Chris

-----Original Message-----
From: Tom Hobbs [mailto:tvho...@googlemail.com]
Sent: Monday, December 12, 2011 1:43 PM
To: dev@river.apache.org
Subject: Re: RE: Implications for Security Checks - SocketPermission, URL and DNS lookups

My biggest concern with such fundamental changes is controlling the impact they will have. I'm a pretty good example of this: I haven't experienced the troubles these changes are intended to overcome. I also haven't made any attempt to dive into these areas of the code, for any reason.

Is it possible to put together a test case which exposes these problems and also proves the solution? Obviously, a test case involving misconfigured networks is daft; in that instance a handy "is your network misconfigured" diagnostic tool or documentation would be a good idea.
Please don't interpret this concern as a criticism of your work, Peter. Far from it. It's just a comment born out of not really having any contact with the area you're working in!

Grammar and spelling have been sacrificed on the altar of messaging via mobile device.

On 12 Dec 2011 18:01, "Christopher Dolan" <christopher.do...@avid.com> wrote:

Specifically for SocketPermission, I experienced severe timeout problems with reverse DNS misconfigurations. For some LAN-based deployments, I relaxed this criterion via 'new SocketPermission("*", "accept,listen,connect,resolve")'. This was difficult to apply to a general Sun/Oracle JVM, however, because the default security policy *prepends* a ("localhost:1024-","listen") permission that triggers the reverse DNS lookup.

To avoid this inconvenient setting, I install a new java.security.Policy subclass that delegates to the default Policy except when the incoming permission is a SocketPermission. That way I don't need to modify the policy file in the JVM. The Policy.implies() override method is trivial because it just needs to do "if (permission instanceof SocketPermission) { ... }". The PermissionCollection methods were trickier to override (skip over any SocketPermission elements in the default Policy's PermissionCollection), but still only about 50 LOC.

Chris

-----Original Message-----
From: Peter Firmstone [mailto:j...@zeus.net.au]
Sent: Friday, December 09, 2011 9:28 PM
To: dev@river.apache.org
Subject: Implications for Security Checks - SocketPermission, URL and DNS lookups

DNS lookups and reverse lookups caused by the equals, hashCode and implies methods of URL and SocketPermission create some serious performance problems for distributed programs. The concurrent policy implementation I've been working on reduces lock contention between threads performing security checks.
When the SecurityManager is used to check a guard, it calls the AccessController, which retrieves the AccessControlContext from the call stack. This contains all the ProtectionDomains on the call stack (I won't go into privileged calls here). If a ProtectionDomain is dynamic it will consult the Policy before checking the static permissions it contains.

The problem with the old policy implementation is lock contention caused by multiple threads all using multiple ProtectionDomains when the time taken to perform a check is considerable, especially where identical security checks might be performed by multiple threads executing the same code.

Although the concurrent policy reduces contention between ProtectionDomains' calls to Policy.implies, there remain some fundamental problems with the implementations of SocketPermission and URL that cause unnecessary DNS lookups during the equals(), hashCode() and implies() methods.

The following bugs concern SocketPermission (please read before continuing):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6592285
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4975882 (contains a lot of valuable comments)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4671007 (fixed, perhaps incorrectly)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6501746

Anyway, to cut a long story short, DNS lookups and DNS reverse lookups are performed by the equals and hashCode implementations in SocketPermission and URL, with disastrous performance implications for policy implementations that use collections and cache the results of security permission checks.

For example, once a SocketPermission guard has been checked for a specific AccessControlContext, the result is cached by my SecurityManager to avoid repeat security checks. However, if that cache contains a SocketPermission, DNS lookups will be required, so the cache will perform slower than some other directly performed security checks!
The cache is intended to return quickly, to avoid reconsulting every ProtectionDomain on the stack.

To make matters worse, when checking a SocketPermission guard, the DNS may be consulted for every non wild card SocketPermission contained within a SocketPermissionCollection, up until the guard is implied. DNS checks are being made unnecessarily, since the wild card that matches may not require a DNS lookup at all, but because the non matching SocketPermissions are checked first, the DNS lookups and reverse lookups are still performed. This could be fixed completely by moving the responsibility for DNS lookups from SocketPermission to SocketPermissionCollection.

Two SocketPermissions are equal if they resolve to the same IP address, but their hashCodes are different! See bug 6592623.

A SocketPermission with an IP address and one with a DNS name that resolves to the identical IP address should not (in my opinion) be equal, but they are! One SocketPermission should only imply the other while DNS resolves to the same IP address; otherwise the equality of the two SocketPermissions will change if the IP address is assigned to a different domain! Object equality / identity shouldn't depend on the result of a possibly unreliable network source.

SocketPermission and SocketPermissionCollection are broken. The only solution I can think of is to re-implement these classes (from Harmony) in the policy and SecurityManager, substituting them for the existing JVM classes. This would not be visible to client developers.

SocketPermissions may also exist in a ProtectionDomain's static Permissions; these would have to be converted by the policy when merging the permissions from the ProtectionDomain with those from the policy.
Since ProtectionDomain attempts to check its own internal permissions after the policy permission check fails, DNS checks are currently performed by duplicate SocketPermissions residing in the ProtectionDomain. This will no longer occur, since the permission being checked will be converted to, say, for argument's sake, org.apache.river.security.SocketPermission. However, because some ProtectionDomains are static, they never consult the policy, so the Permissions contained in each ProtectionDomain will also require conversion. Doing so will require extending and implementing a ProtectionDomain that encapsulates the existing ProtectionDomains in the AccessControlContext, by utilising a DomainCombiner.

For CodeSource grants, the policy file based grants are defined by URLs. However, a URL's identity depends upon DNS record results, similar to the SocketPermission equals and hashCode implementations, which we have no control over.

I'm thinking about implementing URI based grants instead, to avoid DNS lookups, then allowing a policy compatibility mode to be enabled (with logging) that falls back to CodeSource grants when a URL cannot be converted to a URI. This is a much simpler fix than the SocketPermission problem.

For dynamic policy grants, because ProtectionDomain doesn't override equals (that's a good thing), the contained CodeSource must also be checked, again potentially slowing down permission checks with DNS lookups, simply because CodeSource uses URLs. Changing the dynamic grants to use URI based comparison would be relatively simple, since the URI is obtained dynamically when the dynamic grant is created.

URI based grants don't use DNS resolution and would have a narrower scope of implied CodeSources: an IP based grant won't imply a DNS domain URL based CodeSource and vice versa. Rather than rely on DNS resolution, grants could be made specifically for IPv4, IPv6 and DNS names in policy files.
URL.toURI() can be utilised to check if URI grants imply a CodeSource without resorting to DNS.

Any thoughts, comments or ideas?

N.B. It's sad that security is implemented the way it is. It would be far better if it was Executor based, since every protection domain could be checked in parallel, rather than in sequence.

Regards,

Peter.