Look forward to it mate,
N.B. this line should read:
* Codebase surrogates, for objects originating from periodically
disconnected services for clients to obtain their bytecode (they
also require Refreshable References and
Xuid's)
Cheers,
Peter.
Gregg Wonderly wrote:
Peter, I want to write up some questions and thoughts about this post,
but can't do that right now, hopefully I can in a day or so.
Gregg Wonderly
Peter Firmstone wrote:
I've had some more thoughts on Codebase services after spending time
researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase services:
* Codebase loss
* Codebase replication
* Codebase upgrades
* Codebase configuration
* Codebase surrogates, for objects originating from periodically
disconnected clients (they also require Refreshable References and
Xuid's)
* Bytecode Dependency Analysis & API signature identification, for
Package & Class Binary Compatibility & ClassLoader Isolation
* Bytecode Static Security Analysis, repackaging & code signing.
On the last issue I've had some thoughts about codebases being able
to act as a trust mediator to receive, analyse, repackage, sign and
forward bytecode on behalf of clients. The last two items above fit
into the category of Bytecode Analysis service responsibilities for
codebases. Prior to loading class files, a client can have a trust
relationship with one or more preferred codebase providers. A
codebase provider also provides bytecode static analysis services for
security and binary compatibility purposes.
I got thinking about this solution after reading about the service
proxy circular code verification issues for disconnected clients that
Project Neuromancer exposed. The idea is a surrogate security
verifier as well as a codebase surrogate.
All this would be implemented with minimal changes to service and
client configurations and no change to third party library code,
unlike my evolving objects framework proposals.
After receiving a tip off from Michael Warres, Tim Blackman was
gracious enough to share learnings from his research on class loader
trees. Tim built a prototype system using message digests and was
considering implementing textual Class API signatures for identifying
compatibility between different class bytecode. Tim considered the
textual API signatures when he found that independent vendor compiler
optimisations produced different bytecode, hence different SHA-1
signatures, even though the classes had an identical and compatible
API. I thought about this further and realised that Binary
Compatibility for class files and package change is far more flexible
than source code compatibility. While Tim concentrated on API
compatibility to ensure that objects that should be shared could be,
he found that groups of class files, based on dependency analysis
(this is where the replacement ClassDep code came from), required
their own ClassLoaders, hence a significant number of class loader
instances are required for maximum compatibility (without going into
more detail).
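To make the difference between a bytecode digest and a textual API
signature concrete, here is a minimal sketch. It is not Tim's
prototype: the class name ApiSignature, the line format and the use of
SHA-256 are all assumptions of mine, and it uses reflection on an
already loaded (trusted) class purely to stay dependency free, where a
codebase server would read the class file without loading it.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical helper: builds a textual API signature for a loaded class. */
final class ApiSignature {

    // Only modifiers visible to callers; flags such as synchronized are ignored.
    private static final int API_MASK = Modifier.PUBLIC | Modifier.PROTECTED
            | Modifier.STATIC | Modifier.ABSTRACT | Modifier.FINAL;

    private ApiSignature() {}

    /** Public and protected members as sorted text lines (order independent). */
    static SortedSet<String> describe(Class<?> type) {
        SortedSet<String> lines = new TreeSet<>();
        lines.add("class " + type.getName());
        for (Field f : type.getDeclaredFields()) {
            if (isApi(f.getModifiers()) && !f.isSynthetic()) {
                lines.add("field " + mods(f.getModifiers()) + " "
                        + f.getType().getName() + " " + f.getName());
            }
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (isApi(c.getModifiers()) && !c.isSynthetic()) {
                lines.add("constructor " + mods(c.getModifiers()) + " "
                        + params(c.getParameterTypes()));
            }
        }
        for (Method m : type.getDeclaredMethods()) {
            if (isApi(m.getModifiers()) && !m.isSynthetic()) {
                lines.add("method " + mods(m.getModifiers()) + " "
                        + m.getReturnType().getName() + " " + m.getName()
                        + params(m.getParameterTypes()));
            }
        }
        return lines;
    }

    /** Hex digest of the textual signature, not of the bytecode itself. */
    static String digest(Class<?> type) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String line : describe(type)) {
                md.update(line.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    private static boolean isApi(int modifiers) {
        return Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
    }

    private static String mods(int modifiers) {
        return Modifier.toString(modifiers & API_MASK);
    }

    private static String params(Class<?>[] types) {
        return Arrays.stream(types).map(Class::getName)
                .collect(Collectors.joining(",", "(", ")"));
    }
}
```

Because method bodies never enter the digest, two compilers that emit
different bytecode for the same source should still yield the same
value here, which is the property the SHA-1 digests lacked.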
In essence, the solution I'm striving for is to solve the problem in
a distributed world that OSGi solves in the JVM: segregation and
isolation of incompatibility while allowing compatible
implementations to cooperate. However I want an implementation
without commitment to any particular container or module technology,
so as not to force container implementation choices on projects that
already have their specific container implementations.
Rather than reinventing another container technology, all jar files
a service's client requires could be uploaded to codebase services,
just prior to service registration. The codebase service could
analyse, repackage and sign the jar files into compatible bundles,
dynamic containers if you wish, one for each ClassLoader, where each
class loader represents a Package API group signature.
Using the uploaded jar files, the codebase services could generate
and propagate analysis reports amongst themselves in a p2p fashion,
so that between them they could determine the latest binary
compatible version of a package, and the latest compatible version
would always be preferred. Once the latest version is identified, a
codebase service can verify it with its own analysis, in order to
detect and report malicious or malfunctioning codebase servers.
Newer versions of a Package, found to have broken Binary
Backward compatibility, would be kept in a separate ClassLoader as
determined by their API signature, thus incompatibility is isolated.
There may be subgroups within a package, that could also be shared
between incompatible package versions to provide improved class file
and object sharing.
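The isolation idea above, one ClassLoader per Package API signature,
can be sketched on the client side roughly as follows. This is only
an illustration under assumptions: SignatureLoaderRegistry and
loaderFor are hypothetical names, the signature is treated as an
opaque string, and sharing of subgroups between incompatible versions
is not attempted.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry: one ClassLoader per package API signature. */
final class SignatureLoaderRegistry {

    private final Map<String, ClassLoader> loaders = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    SignatureLoaderRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    /**
     * Returns the loader already bound to this signature, or creates one
     * over the given jars. Compatible versions share a signature and so
     * share a loader; incompatible versions get separate loaders.
     */
    ClassLoader loaderFor(String apiSignature, URL[] jars) {
        return loaders.computeIfAbsent(apiSignature,
                sig -> new URLClassLoader(jars.clone(), parent));
    }

    /** Number of distinct signatures seen so far. */
    int size() {
        return loaders.size();
    }
}
```

Note that the first set of jars registered for a signature wins; later
calls with the same signature get the existing loader back, which
matches the point that a loaded package cannot be swapped in place.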
Hence a client receiving bytecode could choose to channel it through
one or more codebase servers that it has trust relationships with.
Acting as a bytecode trust surrogate, the preferred codebase server
could retrieve required bytecode that it doesn't already possess via
lookup services of other codebase service locations. The bytecode
recipient would retrieve analysis information detailing bytecode
implementation security concerns prior to loading any bytecode. The
codebase server would not execute any untrusted bytecode itself, only
perform analysis using the ASM library. The aim would be that a
codebase server is as secure as possible, such that it can be
considered trustworthy and as impervious to attack as possible
(existing denial of service attack strategies require consideration).
One could even perform tests on codebases, by uploading deliberately
malicious code and checking the resulting analysis reports, or by
occasionally confirming the analysis reports with other codebases or
with a local codebase analysis process. Separation of concerns.
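As a rough sketch of confirming analysis reports with other
codebases, a client could compare a digest of each codebase's report
for the same jar and flag the ones that disagree. The strict majority
rule, the class name ReportCrossCheck and the idea of comparing report
digests are my assumptions here, not a settled design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cross-check of analysis reports from several codebases. */
final class ReportCrossCheck {

    private ReportCrossCheck() {}

    /**
     * Takes a map of codebase id to report digest (non-null values) and
     * returns the sorted ids whose digest differs from the strict-majority
     * digest. If no digest has a strict majority, every codebase is
     * returned, since none can be confirmed.
     */
    static List<String> suspects(Map<String, String> digestByCodebase) {
        Map<String, Integer> votes = new HashMap<>();
        for (String digest : digestByCodebase.values()) {
            votes.merge(digest, 1, Integer::sum);
        }
        String majority = null;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            // At most one digest can hold more than half of the votes.
            if (e.getValue() * 2 > digestByCodebase.size()) {
                majority = e.getKey();
            }
        }
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, String> e : digestByCodebase.entrySet()) {
            if (!e.getValue().equals(majority)) {
                suspects.add(e.getKey());
            }
        }
        Collections.sort(suspects);
        return suspects;
    }
}
```

A disagreement only says that a codebase's report differs from the
rest; deciding whether it is malicious or merely malfunctioning would
still need a local analysis.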
Codebase Services would only be required to maintain a copy of the
evolution bloodline for the latest binary backward compatible
package. A package fork or breaking of backward compatibility would
mean storing a copy of both of the latest divergent compatibility
signatures, though again some unchanged class subgroups may be shared
between them. Java Bytecode versions (compiler specific) would also
dictate which package version could be used safely in local JVMs.
Clients of services will have to accept a certain amount of downtime.
Once a particular instance of a package's classes is loaded into a
classloader, no other compatible implementation of that package can
be loaded; this is only a problem for long lived service client
processes. Object state will need to be persisted while the JVM
restarts and reloads new bytecode (Serializable is also part of class
API). This is due to the inability of an existing ClassLoader to
reload classes (java debug excluded). Backward Binary compatibility
doesn't necessarily imply forward compatibility: classes and
interfaces can add methods without breaking compatibility with
pre-existing binaries, access can become less restrictive, and
abstract methods can become non abstract. Even though some of these
changes break source code compatibility, old clients aren't aware of
the new methods and don't execute them. For specifics see Chapter
13, Binary Compatibility, of the Java Language Specification, 3rd
Edition; this is what I plan to base the compatibility analysis upon.
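The asymmetry between backward and forward compatibility can be shown
with a deliberately simplified check. It covers only one rule from
that chapter (members may be added but not removed) and compares
members by name and descriptor, ignoring modifiers, so widening access
or dropping abstract does not register as a break. BinaryCompatCheck
and the descriptor strings are illustrative assumptions, not the
planned analysis.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified backward binary compatibility check. */
final class BinaryCompatCheck {

    private BinaryCompatCheck() {}

    /** Members (name plus descriptor) present in the old API but gone from the new. */
    static Set<String> missingMembers(Set<String> oldApi, Set<String> newApi) {
        Set<String> missing = new TreeSet<>(oldApi);
        missing.removeAll(newApi);
        return missing;
    }

    /** True if binaries linked against oldApi still find every member in newApi. */
    static boolean isBackwardCompatible(Set<String> oldApi, Set<String> newApi) {
        return missingMembers(oldApi, newApi).isEmpty();
    }
}
```

Calling isBackwardCompatible(v1, v2) where v2 only adds a method
returns true, while swapping the arguments returns false: a client
built against v2 may call a method that v1 never had.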
It would also be possible for services to utilise codebase servers in
their classpath.
These issues I propose tackling are not simple obstacles, nor will
they be easy to implement, and some may even be intractable, but
what the hell, who's with me? That's why we got into this in the
first place, isn't it? The challenge! Project Neuromancer
highlighted areas for improvement, if we address some of these, I
believe that River can become the much vaunted and dreamt of semantic
web.
I want problems identified so solutions can be devised; let's see
objections & supporting logic or better ideas.
Cheers,
Peter.