Moving River into the Semantic Web with Codebase Services & Bytecode Analysis services.

Peter Firmstone Tue, 08 Sep 2009 00:50:40 -0700

I've had some more thoughts on Codebase services after spending time researching & reflecting.


Issues I'd like to see addressed or simplified using Codebase services:


   * Codebase loss
   * Codebase replication
   * Codebase upgrades
   * Codebase configuration
   * Codebase surrogates, for objects originating from periodically
     disconnected clients (they also require Refreshable References and
     Xuids)
   * Bytecode Dependency Analysis & API signature identification, for
     Package & Class Binary Compatibility & ClassLoader Isolation
   * Bytecode Static Security Analysis, repackaging & code signing.

On the last issue, I've had some thoughts about codebases being able to act as a trust mediator to receive, analyse, repackage, sign and forward bytecode on behalf of clients. The last two items above fall under the Bytecode Analysis service responsibilities of a codebase. Prior to loading class files, a client can have a trust relationship with one or more preferred codebase providers. A codebase provider also provides bytecode static analysis services for security and binary compatibility purposes. I got thinking about this solution after reading about the service proxy circular code verification issues for disconnected clients that Project Neuromancer exposed. The codebase would act as a surrogate security verifier as well as a codebase surrogate.

All this would be implemented with minimal changes to service and client configurations and no change to third party library code, unlike my evolving objects framework proposals.

After receiving a tip-off from Michael Warres, Tim Blackman was gracious enough to share learnings from his research on class loader trees. Tim built a prototype system using message digests and was considering implementing textual Class API signatures for identifying compatibility between different class bytecodes. Tim considered the textual API signatures when he found that independent vendor compiler optimisations produced different bytecode, hence different SHA-1 signatures, for classes with identical and compatible class APIs. I thought about this further and realised that Binary Compatibility for class file and package changes is far more flexible than source code compatibility. While Tim concentrated on API compatibility to ensure that objects that should be shared could be, he found that groups of class files, based on dependency analysis (this is where the replacement ClassDep code came from), required their own ClassLoaders, hence a significant number of class loader instances are required for maximum compatibility (without going into more detail).
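To make the difference between a bytecode digest and a textual API signature concrete, here is a minimal sketch. It is not Tim's prototype: the class name ApiSignature is made up for illustration, and it uses reflection on an already loaded class purely to stay self-contained, whereas a codebase server would read the class file instead of loading it. The idea is only that the digest is taken over a sorted, textual list of public and protected members, so two compilers emitting different bytecode for the same API would still agree.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: textual API signature of a class, plus its SHA-1 digest. */
final class ApiSignature {

    /** Sorted textual descriptions of the public and protected members. */
    static SortedSet<String> members(Class<?> type) {
        SortedSet<String> lines = new TreeSet<>();
        for (Field f : type.getDeclaredFields()) {
            if (isApi(f.getModifiers())) lines.add("field " + f.toGenericString());
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (isApi(c.getModifiers())) lines.add("ctor " + c.toGenericString());
        }
        for (Method m : type.getDeclaredMethods()) {
            // Skip compiler-generated methods; they are not part of the written API.
            if (isApi(m.getModifiers()) && !m.isSynthetic()) {
                lines.add("method " + m.toGenericString());
            }
        }
        return lines;
    }

    private static boolean isApi(int modifiers) {
        return Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
    }

    /** Hex SHA-1 over the class name and the sorted member list. */
    static String digest(Class<?> type) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(type.getName().getBytes(StandardCharsets.UTF_8));
            for (String line : members(type)) {
                sha1.update((byte) '\n');
                sha1.update(line.getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every Java platform", e);
        }
    }
}
```

Note that toGenericString() also prints modifiers such as synchronized or native, which arguably are not API; a real analysis would be choosier about what goes into the signature.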

In essence, the solution I'm striving for is to solve the problem in a distributed world that OSGi solves in the JVM: segregation and isolation of incompatibility while allowing compatible implementations to cooperate. However, I want an implementation without commitment to any particular container or module technology, so as not to force container implementation choices on projects that already have their specific container implementations.

Rather than reinventing another container technology, all jar files a service's client requires could be uploaded to codebase services just prior to service registration. The codebase service could analyse, repackage and sign the jar files into compatible bundles (dynamic containers if you wish), one for each ClassLoader, where each class loader represents a Package API group signature.
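A rough sketch of "one ClassLoader per Package API group signature" on the receiving side might look like the following. LoaderRegistry and its method names are invented for illustration, and the signature is treated as an opaque string handed out by the codebase service.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry that keeps one class loader per API group signature. */
final class LoaderRegistry {

    private final Map<String, ClassLoader> loaders = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    LoaderRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    /**
     * Returns the loader for the given signature, creating it from the
     * repackaged bundle jars the first time that signature is seen.
     * Later calls with the same signature reuse the existing loader,
     * so compatible code shares classes and incompatible code is isolated.
     */
    ClassLoader loaderFor(String apiSignature, URL[] bundleJars) {
        return loaders.computeIfAbsent(apiSignature,
                sig -> new URLClassLoader(bundleJars, parent));
    }
}
```

Two bundles with the same signature end up in the same loader and can share objects; a bundle with a different signature gets a loader of its own.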

Using the uploaded jar files, the codebase services could generate and propagate analysis reports amongst themselves in a p2p fashion, such that between them they could determine the latest binary compatible version of a package, which would always be preferred. Once the latest version is identified, a codebase service can verify it with its own analysis, in order to confirm the result and to report malicious or malfunctioning codebase servers. Newer versions of a Package found to have broken Binary Backward compatibility would be kept in a separate ClassLoader as determined by their API signature, thus incompatibility is isolated. There may be subgroups within a package that could also be shared between incompatible package versions to provide improved class file and object sharing.
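As a sketch of how the preferred version could fall out of the analysis reports, assume each release is reduced to a set of member signature strings and that a release stays backward compatible when it keeps every member of its predecessor. That is a big simplification of the real rules, and the names CompatibilityLines and Release are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical grouping of package releases into binary compatible lines. */
final class CompatibilityLines {

    /** Hypothetical summary of one analysed release of a package. */
    static final class Release {
        final String version;
        final Set<String> api; // member signatures exported by this release

        Release(String version, Set<String> api) {
            this.version = version;
            this.api = api;
        }
    }

    /**
     * Splits releases (oldest first) into lines. A release joins the current
     * line while it still contains every member of the previous release;
     * otherwise it starts a new line, which would get its own ClassLoader.
     */
    static List<List<Release>> partition(List<Release> oldestFirst) {
        List<List<Release>> lines = new ArrayList<>();
        List<Release> current = null;
        for (Release r : oldestFirst) {
            boolean compatible = current != null
                    && r.api.containsAll(current.get(current.size() - 1).api);
            if (!compatible) {
                current = new ArrayList<>();
                lines.add(current);
            }
            current.add(r);
        }
        return lines;
    }

    /** The latest release of each line, i.e. the versions that would be preferred. */
    static List<String> preferredVersions(List<Release> oldestFirst) {
        List<String> preferred = new ArrayList<>();
        for (List<Release> line : partition(oldestFirst)) {
            preferred.add(line.get(line.size() - 1).version);
        }
        return preferred;
    }
}
```

For example, if 1.1 only adds a method to 1.0, and 2.0 removes one, the releases 1.0 and 1.1 form one line with 1.1 preferred, while 2.0 starts a separate line.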

Hence a client receiving bytecode could choose to channel it through one or more codebase servers that it has trust relationships with. As a bytecode trust surrogate, the preferred codebase server could retrieve required bytecode that it doesn't already possess from other codebase services, located via lookup services. The bytecode recipient would retrieve analysis information detailing bytecode implementation security concerns prior to loading any bytecode. The codebase server would not execute any untrusted bytecode itself, only perform analysis using the ASM library. The aim would be for a codebase server to be as secure as possible, such that it can be considered trustworthy and as impervious to attack as possible (existing denial of service attack strategies require consideration). One could even perform tests on codebases by uploading deliberately malicious code and checking the resulting analysis reports, or by occasionally confirming the analysis reports with other codebases or with a local codebase analysis process. Separation of concerns.
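One way the recipient's check before loading could work is sketched below. Everything here is an assumption for the sake of discussion: the report fields, the ReportCheck name and the quorum rule are not an existing River API. The client only loads bytecode when enough trusted codebases have reported on exactly the bytes it received and none of them raised a concern.

```java
import java.util.List;

/** Hypothetical client-side check of analysis reports before loading bytecode. */
final class ReportCheck {

    /** Hypothetical analysis report published by one codebase service. */
    static final class AnalysisReport {
        final String codebase;       // which codebase produced the report
        final String bytecodeSha1;   // digest of the bytecode that was analysed
        final List<String> concerns; // security concerns found; empty if none

        AnalysisReport(String codebase, String bytecodeSha1, List<String> concerns) {
            this.codebase = codebase;
            this.bytecodeSha1 = bytecodeSha1;
            this.concerns = concerns;
        }
    }

    /**
     * Returns true only if at least 'quorum' reports cover the received
     * bytecode and none of those reports lists a concern.
     */
    static boolean safeToLoad(String receivedSha1, List<AnalysisReport> reports, int quorum) {
        int clean = 0;
        for (AnalysisReport r : reports) {
            if (!r.bytecodeSha1.equals(receivedSha1)) {
                continue; // report describes different bytes, ignore it
            }
            if (!r.concerns.isEmpty()) {
                return false; // a single concern vetoes loading
            }
            clean++;
        }
        return clean >= quorum;
    }
}
```

Disagreement between codebases about the same digest is exactly the signal that would be used to report a malicious or malfunctioning codebase server.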

Codebase Services would only be required to maintain a copy of the evolution bloodline for the latest binary backward compatible package. A package fork or a break in backward compatibility would mean storing a copy of both of the latest divergent compatibility signatures; again, some unchanged class subgroups may be shared between them. Java Bytecode versions (compiler specific) would also dictate which package version could be used safely in local JVMs.

Clients of services will have to accept a certain amount of downtime: once a particular instance of a package's classes is loaded into a ClassLoader, no other compatible implementation of that package can be loaded. This is only a problem for long lived service client processes. Object state will need to be persisted while the JVM restarts and reloads new bytecode (Serializable is also part of class API). This is due to the inability of an existing ClassLoader to reload classes (java debug excluded). Backward Binary compatibility doesn't necessarily imply forward compatibility: classes and interfaces can add methods without breaking compatibility with pre-existing binaries, members can become more visible, and abstract methods can become non-abstract. Even though some of these changes break source code compatibility, old clients aren't aware of the new methods and don't execute them. For specifics see Chapter 13, Binary Compatibility, of the Java Language Specification, 3rd Edition; this is what I plan to base the compatibility analysis upon.
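The asymmetry between backward and forward compatibility can be shown with a deliberately simplified model: treat each version's API as a set of member signature strings and call the new version backward compatible when nothing the old binaries could link against has gone missing. This covers only the "adding members is fine, removing them is not" part of Chapter 13; visibility and abstract/non-abstract changes are ignored, and the BinaryCompat name is made up.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified binary backward compatibility check. */
final class BinaryCompat {

    /** Members old binaries may link against that the new version no longer has. */
    static Set<String> missingMembers(Set<String> oldApi, Set<String> newApi) {
        Set<String> missing = new TreeSet<>(oldApi);
        missing.removeAll(newApi);
        return missing;
    }

    /** True when every member of the old API is still present in the new one. */
    static boolean isBackwardCompatible(Set<String> oldApi, Set<String> newApi) {
        return missingMembers(oldApi, newApi).isEmpty();
    }
}
```

With an old API of {lookup, register} and a new API of {lookup, register, cancel}, the new version is backward compatible with the old, but the old is not compatible with clients compiled against the new, since cancel is missing.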

It would also be possible for services to utilise codebase servers in their classpath.

The issues I propose tackling are not simple obstacles, nor will they be easy to implement; some may even be intractable, but what the hell, who's with me? That's why we got into this in the first place, isn't it? The challenge! Project Neuromancer highlighted areas for improvement; if we address some of these, I believe that River can become the much vaunted and dreamt of semantic web.

I want problems identified so solutions can be devised; let's see objections & supporting logic or better ideas.


Cheers,

Peter.
