Re: Moving River into the Semantic Web with Codebase Services & Bytecode Analysis services.

Peter Firmstone Tue, 08 Sep 2009 16:14:16 -0700

Look forward to it mate,

N.B. this line should read:


  * Codebase surrogates, for objects originating from periodically
    disconnected services, for clients to obtain their bytecode (they also
    require Refreshable References and Xuids)

Cheers,

Peter.


Gregg Wonderly wrote:

Peter, I want to write up some questions and thoughts about this post, but can't do that right now; hopefully I can in a day or so.
Gregg Wonderly

Peter Firmstone wrote:
I've had some more thoughts on Codebase services after spending time researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase services:

   * Codebase loss
   * Codebase replication
   * Codebase upgrades
   * Codebase configuration
   * Codebase surrogates, for objects originating from periodically
     disconnected clients (they also require Refreshable References and
     Xuids)
   * Bytecode Dependency Analysis & API signature identification, for
     Package & Class Binary Compatibility & ClassLoader Isolation
   * Bytecode Static Security Analysis, repackaging & code signing.
On the last issue, I've had some thoughts about codebases being able to act as trust mediators that receive, analyse, repackage, sign and forward bytecode on behalf of clients. The last two items above fit into the category of Bytecode Analysis service responsibilities for codebases. Prior to loading class files, a client can have a trust relationship with one or more preferred codebase providers. A codebase provider also provides bytecode static analysis services for security and binary compatibility purposes.

I got thinking about this solution after reading about the circular code verification issues for service proxies on disconnected clients that Project Neuromancer exposed. The codebase provider would act as a surrogate security verifier as well as a codebase surrogate.
All this would be implemented with minimal changes to service and client configurations and no change to third-party library code, unlike my evolving objects framework proposals.
After receiving a tip-off from Michael Warres, Tim Blackman was gracious enough to share what he learned from his research on class loader trees. Tim built a prototype system using message digests and was considering implementing textual class API signatures for identifying compatibility between different class bytecode. Tim considered textual API signatures when he found that different vendors' compiler optimisations produced different bytecode, hence different SHA-1 digests, even for classes with identical and compatible APIs. I thought about this further and realised that binary compatibility for class files and package changes is far more flexible than source code compatibility. While Tim concentrated on API compatibility, to ensure that objects which should be shared could be, he found that groups of class files, based on dependency analysis (this is where the replacement ClassDep code came from), required their own ClassLoaders; hence a significant number of class loader instances are required for maximum compatibility (without going into more detail).
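To make the difference between a file digest and a textual API signature concrete, here is a minimal sketch: a sorted listing of a class's public and protected members plus its supertypes, hashed with SHA-256 (an arbitrary choice for this sketch). `ApiSignature`, `describe`, `digest` and the text format are hypothetical. Reflection is used only to keep the sketch dependency-free; a codebase server following this proposal would read class files with ASM instead, so untrusted bytecode is never loaded. Under these assumptions, two compilers emitting different bytecode for the same source give different file digests but the same `describe` text.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: a textual API signature for one class that ignores method bodies,
 * so compiler-specific bytecode differences don't change it. Reflection stands in for ASM
 * here only to avoid a dependency; a codebase server would analyse class files unloaded.
 */
final class ApiSignature {
    // Only modifiers relevant to client code are kept (synchronized, native etc. are dropped).
    private static final int RELEVANT = Modifier.PUBLIC | Modifier.PROTECTED
            | Modifier.STATIC | Modifier.FINAL | Modifier.ABSTRACT;

    private ApiSignature() {}

    // Private and package-private members can't be linked from other packages, so skip them.
    private static boolean exported(int mods) {
        return Modifier.isPublic(mods) || Modifier.isProtected(mods);
    }

    private static String names(Class<?>[] types) {
        return Arrays.stream(types).map(Class::getTypeName).collect(Collectors.joining(","));
    }

    /** Sorted, one-member-per-line description of the class's public and protected API. */
    static String describe(Class<?> c) {
        List<String> members = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (exported(f.getModifiers()) && !f.isSynthetic()) {
                members.add("field " + Modifier.toString(f.getModifiers() & RELEVANT)
                        + " " + f.getType().getTypeName() + " " + f.getName());
            }
        }
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (exported(k.getModifiers()) && !k.isSynthetic()) {
                members.add("ctor " + Modifier.toString(k.getModifiers() & RELEVANT)
                        + " (" + names(k.getParameterTypes()) + ")");
            }
        }
        for (Method m : c.getDeclaredMethods()) {
            if (exported(m.getModifiers()) && !m.isSynthetic()) { // also skips bridge methods
                members.add("method " + Modifier.toString(m.getModifiers() & RELEVANT)
                        + " " + m.getReturnType().getTypeName() + " " + m.getName()
                        + "(" + names(m.getParameterTypes()) + ")");
            }
        }
        Collections.sort(members); // reflection order is unspecified, so sort for stability
        String supertypes = "extends "
                + (c.getSuperclass() == null ? "-" : c.getSuperclass().getName())
                + " implements " + Arrays.stream(c.getInterfaces())
                        .map(Class::getName).sorted().collect(Collectors.joining(","));
        return c.getName() + "\n" + supertypes + "\n" + String.join("\n", members);
    }

    /** SHA-256 of the textual signature, as lowercase hex. */
    static String digest(Class<?> c) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(describe(c).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every Java platform must support SHA-256", e);
        }
    }
}
```

For example, `describe(Runnable.class)` contains the line `method public abstract void run()`, and private fields such as `String.value` never appear in a signature.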
In essence, the solution I'm striving for is to solve, in a distributed world, the problem that OSGi solves within the JVM: segregation and isolation of incompatibility, while allowing compatible implementations to cooperate. However, I want an implementation without commitment to any particular container or module technology, so as not to force container implementation choices on projects that already have their own container implementations.
Rather than reinventing another container technology, all the jar files a service's clients require could be uploaded to codebase services just prior to service registration. The codebase service could analyse, repackage and sign the jar files into compatible bundles (dynamic containers, if you wish), one for each ClassLoader, where each class loader represents a package API group signature.
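A client-side sketch of "one ClassLoader per package API group signature", assuming the codebase service hands back a signature string together with the URLs of the bundle it produced for that group. `ApiGroupLoaders` and `loaderFor` are hypothetical names, and the sketch deliberately ignores dependencies between groups (every loader simply delegates to one parent), which a real implementation would have to model.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical client-side registry: one ClassLoader per package API group signature,
 * so bundles with an identical API share classes while incompatible ones stay isolated.
 * Assumption: groups don't depend on each other; each loader delegates to one parent.
 */
final class ApiGroupLoaders {
    private final ClassLoader parent;
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    ApiGroupLoaders(ClassLoader parent) {
        this.parent = parent;
    }

    /**
     * Returns the loader for an API group signature, creating it from the given bundle on
     * first use. Later requests with the same signature get the existing loader and the
     * bundle argument is ignored, since already loaded classes cannot be reloaded.
     */
    URLClassLoader loaderFor(String apiGroupSignature, URL[] bundle) {
        return loaders.computeIfAbsent(apiGroupSignature,
                sig -> new URLClassLoader(bundle, parent));
    }

    int groupCount() {
        return loaders.size();
    }
}
```

Requesting the same signature twice returns the same loader, so compatible bundles share classes and objects, while a new signature gets a fresh, isolated loader.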
Using the uploaded jar files, the codebase services could generate and propagate analysis reports amongst themselves in a p2p fashion, so that between them they could determine the latest binary compatible version of a package, and the latest compatible version would always be preferred. Once the latest version is identified, a codebase service can verify it with its own analysis, in order to confirm and report malicious or malfunctioning codebase servers. Newer versions of a package found to have broken binary backward compatibility would be kept in a separate ClassLoader, as determined by their API signature; thus incompatibility is isolated. There may be subgroups within a package that could also be shared between incompatible package versions, to provide improved class file and object sharing.
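The "latest binary compatible version" idea can be sketched as a walk over package versions in release order: each version extends the compatibility lineage it still satisfies, and a version that breaks compatibility starts a new lineage (and, on the client, a new ClassLoader). The compatibility test here is a deliberately crude assumption, "no exported member removed", standing in for the full JLS chapter 13 analysis; `PackageVersion` and `CompatibilityLineages` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model: a package version and its exported API members (textual signatures). */
record PackageVersion(String version, Set<String> api) {
    /**
     * Crude stand-in for JLS chapter 13: every member that older binaries could link
     * against must still exist. Real analysis would apply the full set of rules.
     */
    boolean backwardCompatibleWith(PackageVersion older) {
        return api.containsAll(older.api());
    }
}

final class CompatibilityLineages {
    private CompatibilityLineages() {}

    /**
     * Walks versions oldest to newest. Each version replaces the head of the first lineage
     * it stays backward compatible with; otherwise it starts a new lineage, which a client
     * would load in a separate ClassLoader. Returns the head (latest version) of each lineage.
     */
    static List<PackageVersion> heads(List<PackageVersion> oldestFirst) {
        List<PackageVersion> heads = new ArrayList<>();
        for (PackageVersion v : oldestFirst) {
            boolean placed = false;
            for (int i = 0; i < heads.size() && !placed; i++) {
                if (v.backwardCompatibleWith(heads.get(i))) {
                    heads.set(i, v);
                    placed = true;
                }
            }
            if (!placed) {
                heads.add(v); // broke compatibility with every lineage: fork
            }
        }
        return heads;
    }
}
```

For releases r1 {a()}, r2 {a(), b()}, r3 {b()} and r4 {a(), b(), c()}, r3 drops a() and so forks a second lineage, and `heads` returns [r4, r3]: r4 is the preferred version for clients built against r1 or r2. Those heads are also the only copies a codebase service would need to keep under the bloodline scheme.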
Hence a client receiving bytecode could choose to channel it through one or more codebase servers that it has trust relationships with. Acting as a bytecode trust surrogate, the preferred codebase server could retrieve required bytecode that it doesn't already possess from other codebase services, located via lookup services. The bytecode recipient would retrieve analysis information detailing implementation security concerns before loading any bytecode. The codebase server would not execute any untrusted bytecode itself, only perform analysis using the ASM library; the aim would be for a codebase server to be as secure as possible, so that it can be considered trustworthy and as impervious to attack as possible (existing denial of service attack strategies require consideration). One could even test codebases, by uploading deliberately malicious code and checking the resulting analysis reports, or by occasionally confirming the analysis reports with other codebases or with a local codebase analysis process. Separation of concerns.
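One way a client could check a codebase server's analysis before loading anything: the server signs a report that binds its findings to a SHA-256 digest of the exact bytes it analysed, and the client accepts those bytes only if the digest matches and the signature verifies against one of its trusted servers' public keys. This sketch uses only the standard JCA (`MessageDigest`, and `Signature` with SHA256withRSA); the `AnalysisReport` record, its payload layout and how trusted keys get distributed are assumptions, not existing River APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Arrays;
import java.util.List;

/** Hypothetical analysis report a codebase server signs for the bytecode it analysed. */
record AnalysisReport(byte[] bytecodeDigest, String findings, byte[] signature) {

    // Signed payload: fixed-length 32-byte digest followed by the UTF-8 findings text.
    private static byte[] payload(byte[] digest, String findings) {
        byte[] text = findings.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(digest, digest.length + text.length);
        System.arraycopy(text, 0, out, digest.length, text.length);
        return out;
    }

    static byte[] sha256(byte[] data) throws GeneralSecurityException {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    /** Codebase server side: after static analysis (not shown), sign digest + findings. */
    static AnalysisReport sign(byte[] bytecode, String findings, PrivateKey serverKey)
            throws GeneralSecurityException {
        byte[] digest = sha256(bytecode);
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(serverKey);
        s.update(payload(digest, findings));
        return new AnalysisReport(digest, findings, s.sign());
    }

    /**
     * Client side, before defining any class: the report must describe exactly the bytes
     * received and must be signed by one of the codebase servers the client trusts.
     */
    boolean vouchesFor(byte[] receivedBytecode, List<PublicKey> trustedServers)
            throws GeneralSecurityException {
        if (!MessageDigest.isEqual(bytecodeDigest, sha256(receivedBytecode))) {
            return false; // bytes were altered after analysis
        }
        for (PublicKey key : trustedServers) {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(payload(bytecodeDigest, findings));
            if (s.verify(signature)) {
                return true;
            }
        }
        return false; // not signed by any trusted codebase server
    }
}
```

A tampered jar fails the digest check, and a report signed by a server outside the trusted set fails verification, so in both cases nothing gets loaded.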
Codebase services would only be required to maintain a copy of the evolution bloodline for the latest binary backward compatible package. A package fork, or a break in backward compatibility, would mean storing a copy of both of the latest divergent compatibility signatures; again, some unchanged class subgroups may be shared between them. Java bytecode versions (compiler specific) would also dictate which package versions could be used safely in local JVMs.
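On the bytecode version point: every class file header records a major version, which a codebase service or client could compare with what the local JVM accepts before offering or loading a package version. A minimal sketch; `ClassFileVersion` is a hypothetical name, it needs Java 10+ for `Runtime.version().feature()`, and it relies on Java SE N (for N >= 2) accepting class files up to major version 44 + N, e.g. 52 for Java 8.

```java
/** Hypothetical check: can this JVM define a class from the given class file bytes? */
final class ClassFileVersion {
    private ClassFileVersion() {}

    /** Class file header layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version. */
    static int majorVersion(byte[] classFile) {
        if (classFile.length < 8) {
            throw new IllegalArgumentException("truncated class file");
        }
        int magic = ((classFile[0] & 0xff) << 24) | ((classFile[1] & 0xff) << 16)
                | ((classFile[2] & 0xff) << 8) | (classFile[3] & 0xff);
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return ((classFile[6] & 0xff) << 8) | (classFile[7] & 0xff);
    }

    /** Java SE N accepts major versions up to 44 + N (52 for Java 8, 65 for Java 21). */
    static int highestSupportedMajor() {
        return 44 + Runtime.version().feature();
    }

    static boolean loadableHere(byte[] classFile) {
        return majorVersion(classFile) <= highestSupportedMajor();
    }
}
```

A package version whose classes report a higher major version than `highestSupportedMajor()` would simply not be offered to that client.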
Clients of services will have to accept a certain amount of downtime: once a particular instance of a package's classes is loaded into a ClassLoader, no other compatible implementation of that package can be loaded in its place. This is only a problem for long-lived service client processes. Object state will need to be persisted while the JVM restarts and reloads new bytecode (Serializable is also part of the class API). This is due to the inability of an existing ClassLoader to reload classes (Java debug excluded). Binary backward compatibility doesn't necessarily imply forward compatibility: classes and interfaces can add methods without breaking compatibility with pre-existing binaries, members can become more visible, and abstract methods can become non-abstract. Even though some of these changes break source code compatibility, old clients aren't aware of the new methods and don't execute them. For specifics, see Chapter 13, Binary Compatibility, of the Java Language Specification, 3rd Edition; this is what I plan to base the compatibility analysis upon.
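A few of the chapter 13 rules mentioned above can be expressed as a member-by-member comparison between an old and a new version: removed members, reduced access, concrete-to-abstract changes and static/instance flips are reported, while added members, widened access and abstract-to-concrete changes are accepted. This covers only a small subset of the JLS rules; `Member`, `Access` and `BinaryCompatibility` are invented for illustration, and member keys are assumed to be name plus descriptor. A full analysis would also cover class-level modifiers, superclass and interface changes, `final`, and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical member model; key is name + descriptor, e.g. "size()I". */
record Member(String key, Access access, boolean isAbstract, boolean isStatic) {}

/** Ordered from least to most visible; private members are assumed already filtered out. */
enum Access { PACKAGE, PROTECTED, PUBLIC }

final class BinaryCompatibility {
    private BinaryCompatibility() {}

    /**
     * Lists changes from older to newer that would break pre-existing binaries, per a small
     * subset of JLS chapter 13. Added members are never reported: old binaries don't use them.
     */
    static List<String> breakingChanges(Map<String, Member> older, Map<String, Member> newer) {
        List<String> problems = new ArrayList<>();
        for (Member was : older.values()) {
            Member now = newer.get(was.key());
            if (now == null) {
                problems.add("removed " + was.key());
                continue;
            }
            if (now.access().compareTo(was.access()) < 0) {
                problems.add("reduced access " + was.key()); // widening is fine
            }
            if (!was.isAbstract() && now.isAbstract()) {
                problems.add("became abstract " + was.key()); // abstract to concrete is fine
            }
            if (was.isStatic() != now.isStatic()) {
                problems.add("static/instance changed " + was.key());
            }
        }
        return problems;
    }
}
```

Going from a version where `size()I` is protected and abstract to one where it is public and concrete, with `isEmpty()Z` added, reports nothing; reversing such changes or dropping members is reported.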
It would also be possible for services to utilise codebase servers in their classpath.
The issues I propose tackling are not simple obstacles, nor will they be easy to solve; some may even be intractable, but what the hell, who's with me? That's why we got into this in the first place, isn't it? The challenge! Project Neuromancer highlighted areas for improvement; if we address some of these, I believe that River can become the much vaunted and dreamt-of semantic web.
I want problems identified so solutions can be devised; let's see objections and supporting logic, or better ideas.
Cheers,

Peter.
