Re: Moving River into the Semantic Web with Codebase Services & Bytecode Analysis services.

Peter Firmstone Wed, 16 Sep 2009 16:12:46 -0700

Some Implementation design thoughts on Security:

Security by namespace visibility and trust within package class loaders?

If each package is segregated into its own class loader, and all dependencies required by that package have been determined by codebase analysis, then visibility should be limited to the classes and methods discovered by the codebase server's analysis and enforced at class loading time. A local namespace visibility policy (more fine grained than Java security policies) might contain a list of allowable system methods for code originating from untrusted entities (even though the codebase is trusted and the code has been analysed). Any method signatures in the downloaded code that didn't appear in the list as allowable would not be granted visibility. A default working set could be created for distribution with River, with all disallowed methods commented out.
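A minimal sketch of class-granularity visibility enforcement at load time, assuming the allowlist has already been derived from the codebase analysis report and the local policy. `VisibilityFilteringLoader` is a hypothetical name; filtering individual methods, as proposed above, would need bytecode rewriting (for example with ASM) and is not shown. In practice the allowlist would also need `java.lang.Object` and whatever else downloaded code needs just to link.

```java
import java.util.Set;

/**
 * Hypothetical sketch: a class loader that only exposes classes whose names
 * appear in a local visibility allowlist. Class-level only; method-level
 * filtering would require rewriting the downloaded bytecode.
 */
final class VisibilityFilteringLoader extends ClassLoader {
    private final Set<String> visible;

    VisibilityFilteringLoader(ClassLoader parent, Set<String> visible) {
        super(parent);
        this.visible = visible;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Anything outside the allowlist is simply not visible through this loader.
        if (!visible.contains(name)) {
            throw new ClassNotFoundException("Not visible under local policy: " + name);
        }
        // Normal parent-first delegation for allowed names.
        return super.loadClass(name, resolve);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = new VisibilityFilteringLoader(
                ClassLoader.getSystemClassLoader(), Set.of("java.lang.Object", "java.lang.String"));
        System.out.println(loader.loadClass("java.lang.String").getName()); // java.lang.String
        try {
            loader.loadClass("java.lang.Runtime");
        } catch (ClassNotFoundException e) {
            System.out.println(e.getMessage()); // Not visible under local policy: java.lang.Runtime
        }
    }
}
```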

Then in the worst case of trust, where neither the codebase nor the origin of the code is trusted, the required dependencies and methods declared by the codebase analysis are only allowed if they are also allowed locally. So if a codebase were to submit code with undisclosed methods, those methods would not be accessible to the untrusted code. The dependency analysis information provided by the codebase forms a contract between untrusted parties.
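The worst-case rule reduces to two set operations: what the untrusted code may see is the intersection of what the codebase declared and what the local policy allows, and anything referenced in the bytecode but missing from the declaration is undisclosed. A sketch, assuming members are identified by hypothetical `owner.name descriptor` strings in JVM internal form:

```java
import java.util.Set;
import java.util.TreeSet;

final class EffectiveVisibility {
    /** Members the untrusted code may see: declared by the codebase AND allowed locally. */
    static Set<String> effective(Set<String> declaredByCodebase, Set<String> allowedLocally) {
        Set<String> result = new TreeSet<>(declaredByCodebase);
        result.retainAll(allowedLocally);
        return result;
    }

    /** Members referenced in the bytecode but not disclosed in the analysis report. */
    static Set<String> undisclosed(Set<String> referencedInBytecode, Set<String> declaredByCodebase) {
        Set<String> result = new TreeSet<>(referencedInBytecode);
        result.removeAll(declaredByCodebase);
        return result;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("java/lang/String.length()I", "java/io/File.delete()Z");
        Set<String> allowed = Set.of("java/lang/String.length()I", "java/lang/String.isEmpty()Z");
        Set<String> referenced = Set.of("java/lang/String.length()I",
                "java/io/File.delete()Z", "java/lang/System.exit(I)V");
        System.out.println(effective(declared, allowed));      // [java/lang/String.length()I]
        System.out.println(undisclosed(referenced, declared)); // [java/lang/System.exit(I)V]
    }
}
```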


Consider the following:

  1. Codebase A is trusted and has obtained its code from another
     trusted entity (whoever uploaded the code to the codebase server
     in the first place).
  2. Codebase B is untrusted.
  3. Codebase A is trusted and has obtained some code from Codebase B,
     which is untrusted.
  4. Trusted and untrusted code will be loaded into separate class
     loaders by a client JVM.

Note: my references to methods include those with protected or public visibility; the terminology may be freely interchanged with public or protected fields also.

Codebase A could bundle and sign the trusted code, and bundle the untrusted code without signing it, after analysis (where bundle means splitting an existing jar into multiple jars after analysis, one for each package).
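A sketch of the "bundle" step: grouping a jar's class entries by package so each group could be written out as its own jar, signed only if its origin is trusted. `PackageBundler.byPackage` is a hypothetical helper that works on entry names; in a real codebase service the names would come from `java.util.jar.JarFile.entries()`, and non-class resources are simply skipped here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PackageBundler {
    /** Groups class entry names such as "a/b/C.class" by package name ("a.b"). */
    static Map<String, List<String>> byPackage(List<String> entryNames) {
        Map<String, List<String>> bundles = new TreeMap<>();
        for (String entry : entryNames) {
            if (!entry.endsWith(".class")) {
                continue; // resources (manifests, properties) are ignored in this sketch
            }
            int slash = entry.lastIndexOf('/');
            // Classes in the unnamed package map to the empty package name.
            String pkg = slash < 0 ? "" : entry.substring(0, slash).replace('/', '.');
            bundles.computeIfAbsent(pkg, k -> new ArrayList<>()).add(entry);
        }
        return bundles;
    }

    public static void main(String[] args) {
        List<String> entries = List.of("org/example/api/Service.class",
                "org/example/api/Proxy.class", "org/example/impl/ServiceImpl.class",
                "META-INF/MANIFEST.MF");
        // One bundle (future jar) per package.
        byPackage(entries).forEach((pkg, classes) -> System.out.println(pkg + " -> " + classes));
    }
}
```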

The client would receive a dependency analysis report from Codebase A and would restrict the visibility of the untrusted code to the subset of declared methods that are allowed.

Codebase A might later receive trusted code that is API compatible with that of the untrusted code; this would be discovered by analysis. From then on, Codebase A would be able to provide trusted code to its trusting clients when required.

This could lead to the desirable situation where a client is receiving a marshalled object stream from an untrusted service, or vice versa, and both entities could obtain trusted bytecode for unmarshalling from their own preferred trusted codebases, regardless of the source of the marshalled object stream.
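One way a client might pick its source of bytecode, sketched under the assumption that each codebase advertises the package API signatures it can serve. The `Codebase` type and the `trusted:`/`untrusted:` result strings are hypothetical: the client prefers the first trusted codebase in its own preference order, and only falls back to the codebase annotated in the stream, treated as untrusted, when none of them can serve the signature.

```java
import java.util.List;
import java.util.Set;

final class CodebaseResolver {
    /** Hypothetical view of a codebase service: its name and the API signatures it can serve. */
    static final class Codebase {
        final String name;
        final Set<String> signatures;

        Codebase(String name, Set<String> signatures) {
            this.name = name;
            this.signatures = signatures;
        }
    }

    /**
     * Prefer trusted codebases in the client's own order; otherwise fall back to the
     * codebase annotated in the marshalled stream, which is then treated as untrusted.
     */
    static String resolve(String apiSignature, List<Codebase> trustedInPreferenceOrder,
                          String annotatedCodebase) {
        for (Codebase cb : trustedInPreferenceOrder) {
            if (cb.signatures.contains(apiSignature)) {
                return "trusted:" + cb.name;
            }
        }
        return "untrusted:" + annotatedCodebase;
    }
}
```

The stream's codebase annotation becomes a hint rather than an instruction, which is what lets both ends of a connection use bytecode from codebases they trust.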

In the worst case, code could be obtained from an untrusted codebase; however, that bytecode would not be able to access any methods that had not been declared as required dependencies by the codebase, and the declared methods would also be vetted against the local security policy. The code would then be available with degraded functionality, but would not violate the local security and namespace visibility policy: unpermitted methods would not be visible in the untrusted package's class loader.


However, I've deliberately left out a scenario:

Interoperability between trusted and untrusted code?

What about untrusted application code interacting with trusted application code? How does one restrict access for untrusted code? Who is responsible for determining which methods should be accessible by default for application packages? The package might not exist in the local JVM at load time; it may be downloaded later.

The onus in this case would have to be placed upon the trusted application package distributor (as trusted by the codebase), who may, at their discretion, change which methods untrusted code can safely have access to. Hence there will need to be a means for the codebase to allow and provide namespace visibility policies for application code also. Determining trust is left to the client. An unknown third party may become trusted by a client if that party is trusted by a trusted codebase. A friend of a friend, so to speak.
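The "friend of a friend" rule can be stated as a one-hop check. This is a sketch under the assumption that each trusted codebase publishes the set of parties it vouches for; `DelegatedTrust` and the map layout are hypothetical, and the decision stays with the client because only codebases the client itself trusts are consulted.

```java
import java.util.Map;
import java.util.Set;

final class DelegatedTrust {
    /**
     * One-hop rule: the client trusts a party directly, or some codebase the client
     * trusts vouches for that party. Vouching is deliberately not followed further.
     */
    static boolean trusts(Set<String> clientTrusted,
                          Map<String, Set<String>> vouches,
                          String party) {
        if (clientTrusted.contains(party)) {
            return true;
        }
        for (String codebase : clientTrusted) {
            if (vouches.getOrDefault(codebase, Set.of()).contains(party)) {
                return true;
            }
        }
        return false;
    }
}
```

Limiting delegation to one hop is a design choice of the sketch: a friend of a friend is trusted, but a friend of a friend of a friend is not.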

Perhaps trusted code should also be limited to the codebase's declared visibility requirements as an additional precaution, which would also help identify analysis bugs. Perhaps different namespace visibility policies could be developed for different trusted codebase entities/identities. I'm not sure if this is an essential requirement; however, the implementation could be made extensible so as not to exclude the possibility.
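A sketch of what such an extension point might look like, assuming a policy is just a predicate over member signatures. `VisibilityPolicy` and `PolicyRegistry` are hypothetical names: identities without a specific policy fall back to a default, and even trusted code is additionally limited to what its codebase declared.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical extension point: decides whether a member is visible. */
interface VisibilityPolicy {
    boolean isVisible(String memberSignature);
}

final class PolicyRegistry {
    private final Map<String, VisibilityPolicy> byIdentity = new HashMap<>();
    private final VisibilityPolicy defaultPolicy;

    PolicyRegistry(VisibilityPolicy defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    void register(String identity, VisibilityPolicy policy) {
        byIdentity.put(identity, policy);
    }

    /** Identities without a specific policy get the default one. */
    VisibilityPolicy policyFor(String identity) {
        return byIdentity.getOrDefault(identity, defaultPolicy);
    }

    /** A member must be both declared by the codebase and allowed by the identity's policy. */
    boolean isVisible(String identity, Set<String> declaredByCodebase, String member) {
        return declaredByCodebase.contains(member) && policyFor(identity).isVisible(member);
    }
}
```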


One other point:

Class load time delays caused by bytecode verification: perhaps bytecode verification could be performed by the trusted codebase, eliminating the need to verify remote code and improving load time response. Local code is not verified at load time by default. In this case an administrator would trust their codebases and would not under any circumstances allow bytecode to be utilised from untrusted sources. But then, with the new verifier in Java SE 6 resulting from JSR 202, perhaps verification time has been mitigated somewhat?


Anyone have any input or implementation suggestions?

Regards,

Peter.


Peter Firmstone wrote:

Look forward to it mate,

N.B. this line should read:

  * Codebase surrogates, for objects originating from periodically
    disconnected services, for clients to obtain their bytecode (they
    also require Refreshable References and Xuid's)

Cheers,

Peter.


Gregg Wonderly wrote:
Peter, I want to write up some questions and thoughts about this post, but can't do that right now; hopefully I can in a day or so.
Gregg Wonderly

Peter Firmstone wrote:
I've had some more thoughts on Codebase services after spending time researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase services:

   * Codebase loss
   * Codebase replication
   * Codebase upgrades
   * Codebase configuration
   * Codebase surrogates, for objects originating from periodically
     disconnected clients (they also require Refreshable References and
     Xuid's)
   * Bytecode Dependency Analysis & API signature identification, for
     Package & Class Binary Compatibility & ClassLoader Isolation
   * Bytecode Static Security Analysis, repackaging & code signing.
On the last issue, I've had some thoughts about codebases being able to act as a trust mediator to receive, analyse, repackage, sign and forward bytecode on behalf of clients. The last two items above fit into the category of Bytecode Analysis service responsibilities for codebases. Prior to loading class files, a client can have a trust relationship with one or more preferred codebase providers. A codebase provider also provides bytecode static analysis services for security and binary compatibility purposes. I got thinking about this solution after reading about the service proxy circular code verification issues for disconnected clients that Project Neuromancer exposed: a surrogate security verifier as well as a codebase surrogate.
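The "sign" step of the trust mediator could be as simple as a detached signature over each repackaged bundle, verified by the client with the codebase's public key. This sketch uses the standard `java.security.Signature` API with SHA256withRSA; it signs raw bytes rather than producing a signed jar in the JAR signing format (which `jarsigner` handles), the `BundleSigning` name is hypothetical, and key handling is deliberately simplistic.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class BundleSigning {
    /** Codebase side: detached signature over the repackaged bundle bytes. */
    static byte[] sign(byte[] bundle, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(bundle);
        return s.sign();
    }

    /** Client side: check the bundle against the trusted codebase's public key. */
    static boolean verify(byte[] bundle, byte[] sig, PublicKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(bundle);
        return s.verify(sig);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair codebaseKeys = gen.generateKeyPair();
        byte[] bundle = "bytes of a repackaged jar".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(bundle, codebaseKeys.getPrivate());
        System.out.println(verify(bundle, sig, codebaseKeys.getPublic())); // true
    }
}
```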
All this would be implemented with minimal changes to service and client configurations and no change to third-party library code, unlike my evolving objects framework proposals.
After receiving a tip-off from Michael Warres, Tim Blackman was gracious enough to share learnings from his research on class loader trees. Tim built a prototype system using message digests and was considering implementing textual Class API signatures for identifying compatibility between different classes' bytecode. Tim considered textual API signatures when he found that different vendors' compiler optimisations produced different bytecode, hence different SHA-1 signatures, even though the classes had identical and compatible APIs. I thought about this further and realised that binary compatibility for class file and package change is far more flexible than source code compatibility. While Tim concentrated on API compatibility, ensuring that objects which should be shared could be, he found that groups of class files, based on dependency analysis (this is where the replacement ClassDep code came from), required their own ClassLoaders; hence a significant number of class loader instances are required for maximum compatibility (without going into more detail).
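To make the digest-versus-API distinction concrete, here is a sketch of both identifiers. The SHA-1 digest uses the standard `MessageDigest` API; the textual signature format (sorted public/protected, non-synthetic members, with implementation-only modifiers such as `synchronized` masked out) is a hypothetical choice and not Tim's actual format. Two compilers can emit different bytes, and so different digests, for classes whose textual signatures are identical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class ApiSignature {
    // Modifiers kept in the signature; synchronized, native, strictfp etc. are dropped.
    private static final int API_MODS =
            Modifier.PUBLIC | Modifier.PROTECTED | Modifier.STATIC | Modifier.ABSTRACT | Modifier.FINAL;

    /** Hex SHA-1 of raw class bytes: changes whenever the compiler output changes. */
    static String digest(byte[] classBytes) throws NoSuchAlgorithmException {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("SHA-1").digest(classBytes)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    private static boolean exported(int mod) {
        return Modifier.isPublic(mod) || Modifier.isProtected(mod);
    }

    /** Sorted textual list of public/protected, non-synthetic methods and fields. */
    static String textual(Class<?> c) {
        List<String> members = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic() || !exported(m.getModifiers())) continue; // skips bridges, lambdas
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName)
                    .collect(Collectors.joining(","));
            members.add(Modifier.toString(m.getModifiers() & API_MODS) + " "
                    + m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
        }
        for (Field f : c.getDeclaredFields()) {
            if (f.isSynthetic() || !exported(f.getModifiers())) continue;
            members.add(Modifier.toString(f.getModifiers() & API_MODS) + " "
                    + f.getType().getName() + " " + f.getName());
        }
        Collections.sort(members);
        return String.join("\n", members);
    }
}
```

For example, `ApiSignature.textual(Runnable.class)` yields `public abstract void run()`.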
In essence, the solution I'm striving for is to solve, in a distributed world, the problem that OSGi solves in the JVM: segregation and isolation of incompatibility while allowing compatible implementations to cooperate. However, I want an implementation without commitment to any particular container or module technology, so as not to force container implementation choices on projects that already have their specific container implementations.
Rather than reinventing another container technology, all jar files a service's client requires could be uploaded to codebase services just prior to service registration. The codebase service could analyse, repackage and sign the jar files into compatible bundles (dynamic containers, if you wish), one for each ClassLoader, where each class loader represents a Package API group signature.
Using the uploaded jar files, the codebase services could generate and propagate analysis reports amongst themselves in a p2p fashion, such that between them they could determine the latest binary compatible version of a package, and the latest compatible version would always be preferred. Once the latest version is identified, a codebase service can verify it with its own analysis, in order to confirm and report malicious or malfunctioning codebase servers. Newer versions of a package found to have broken binary backward compatibility would be kept in a separate ClassLoader, as determined by their API signature, so incompatibility is isolated. There may be subgroups within a package that could also be shared between incompatible package versions, to provide improved class file and object sharing.
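A toy version of "keep only the latest version of each compatible lineage". It assumes backward compatibility can be approximated as "the new version still exports every API signature the old one did", which is far coarser than the rules in Chapter 13 of the JLS; `Bloodlines` and `Version` are hypothetical names. A version that is incompatible with every existing lineage starts a new one, which would correspond to its own ClassLoader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class Bloodlines {
    /** Hypothetical package version: a label plus the API signatures it exports. */
    static final class Version {
        final String label;
        final Set<String> api;

        Version(String label, Set<String> api) {
            this.label = label;
            this.api = api;
        }
    }

    /** Simplified backward compatibility: nothing the older version exported was removed. */
    static boolean backwardCompatible(Version older, Version newer) {
        return newer.api.containsAll(older.api);
    }

    /**
     * Processes versions in release order, keeping only the latest member of each
     * compatible lineage; an incompatible version forks a new lineage.
     */
    static List<Version> latestPerLineage(List<Version> releaseOrder) {
        List<Version> latest = new ArrayList<>();
        for (Version v : releaseOrder) {
            boolean placed = false;
            for (int i = 0; i < latest.size() && !placed; i++) {
                if (backwardCompatible(latest.get(i), v)) {
                    latest.set(i, v); // v supersedes the previous head of this lineage
                    placed = true;
                }
            }
            if (!placed) {
                latest.add(v); // fork: would be isolated in its own ClassLoader
            }
        }
        return latest;
    }
}
```

With versions v1 `{a}`, v2 `{a,b}`, v3 `{b}` (removes `a`) and v4 `{a,b,c}`, the result is two lineages headed by v4 and v3.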
Hence a client receiving bytecode could choose to channel it through one or more codebase servers that it has trust relationships with: a bytecode trust surrogate. The preferred codebase server could retrieve required bytecode that it doesn't already possess via lookup services of other codebase service locations. The bytecode recipient would retrieve analysis information detailing bytecode implementation security concerns prior to loading any bytecode. The codebase server would not execute any untrusted bytecode itself, only perform analysis using the ASM library; the aim would be for a codebase server to be as secure as possible, such that it can be considered trustworthy and as impervious to attack as possible (existing denial of service attack strategies require consideration). One could even perform tests on codebases, by uploading deliberately malicious code and checking the resulting analysis reports, or by occasionally confirming the analysis reports with other codebases or using a local codebase analysis process. Separation of concerns.
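Two of the client-side checks described above, sketched under the assumption that an analysis report boils down to a set of named concerns (strings such as `"reflection"` are hypothetical): gate loading on the report, and cross-check reports for the same bundle from several codebases, flagging any that disagree with the most common report as possibly malicious or malfunctioning.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReportCheck {
    /** Load only if every reported concern is one the local administrator accepts. */
    static boolean mayLoad(Set<String> reportedConcerns, Set<String> acceptedConcerns) {
        return acceptedConcerns.containsAll(reportedConcerns);
    }

    /**
     * Given reports for the same bundle from several codebases, return the codebases
     * whose report differs from the most common one (ties are broken arbitrarily).
     */
    static Set<String> dissenters(Map<String, Set<String>> concernsByCodebase) {
        Map<Set<String>, Integer> votes = new HashMap<>();
        for (Set<String> concerns : concernsByCodebase.values()) {
            votes.merge(concerns, 1, Integer::sum);
        }
        Set<String> majority = null;
        int best = 0;
        for (Map.Entry<Set<String>, Integer> e : votes.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                majority = e.getKey();
            }
        }
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : concernsByCodebase.entrySet()) {
            if (!e.getValue().equals(majority)) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```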
Codebase services would only be required to maintain a copy of the evolution bloodline for the latest binary backward compatible package. A package fork, or breaking of backward compatibility, would mean storing a copy of both of the latest divergent compatibility signatures; again, some unchanged class subgroups may be shared between them. Java bytecode versions (compiler specific) would also dictate which package version could be used safely in local JVMs.
Clients of services will have to accept a certain amount of downtime: once a particular instance of a package's classes is loaded into a ClassLoader, no other compatible implementation of that package can be loaded. This is only a problem for long lived service client processes. Object state will need to be persisted while the JVM restarts and reloads new bytecode (Serializable is also part of the class API). This is due to the inability of an existing ClassLoader to reload classes (Java debug excluded). Backward binary compatibility doesn't necessarily imply forward compatibility: classes and interfaces can add methods without breaking compatibility with pre-existing binaries, visibility can be increased, and abstract methods can become non-abstract. Even though some of these changes break source code compatibility, old clients aren't aware of the new methods and don't execute them. For specifics see Chapter 13, Binary Compatibility, of the Java Language Specification, 3rd Edition; this is what I plan to base the compatibility analysis upon.
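A sketch of how a few of the Chapter 13 rules might be checked member by member, assuming bytecode analysis has already produced a per-member summary (the `BinaryCompat` and `Member` types are hypothetical). Additions, wider access and abstract-to-concrete changes are not reported; removing a member, narrowing its access, or making a concrete method abstract are, since each can break pre-existing binaries. Many other JLS rules (field types, static/instance changes, superclass changes) are not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class BinaryCompat {
    /** Access levels in increasing order of visibility (ordinal order matters). */
    enum Access { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

    /** Hypothetical per-member summary produced by bytecode analysis. */
    static final class Member {
        final Access access;
        final boolean isAbstract;

        Member(Access access, boolean isAbstract) {
            this.access = access;
            this.isAbstract = isAbstract;
        }
    }

    /** Lists changes that may break pre-existing binaries; additions are never reported. */
    static List<String> breakingChanges(Map<String, Member> oldApi, Map<String, Member> newApi) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Member> e : oldApi.entrySet()) {
            Member before = e.getValue();
            Member after = newApi.get(e.getKey());
            if (after == null) {
                problems.add("removed " + e.getKey());
                continue;
            }
            if (after.access.compareTo(before.access) < 0) {
                problems.add("less access " + e.getKey());
            }
            if (after.isAbstract && !before.isAbstract) {
                problems.add("became abstract " + e.getKey());
            }
        }
        return problems;
    }
}
```

Running the check in both directions illustrates the asymmetry above: a grown API is backward compatible with the old one, but the old API is not forward compatible with the grown one.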
It would also be possible for services to utilise codebase serversin their classpath.
The issues I propose tackling are not simple obstacles, nor will they be easy to implement; some issues may even be intractable. But what the hell, who's with me? That's why we got into this in the first place, isn't it? The challenge! Project Neuromancer highlighted areas for improvement; if we address some of these, I believe that River can become the much vaunted and dreamt of semantic web.
I want problems identified so solutions can be devised; let's see objections & supporting logic, or better ideas.
Cheers,

Peter.
