I've had some more thoughts on Codebase services after spending time
researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase services:
* Codebase loss
* Codebase replication
* Codebase upgrades
* Codebase configuration
* Codebase surrogates, for objects originating from periodically
disconnected clients (they also require Refreshable References and
Xuids)
* Bytecode Dependency Analysis & API signature identification, for
Package & Class Binary Compatibility & ClassLoader Isolation
* Bytecode Static Security Analysis, repackaging & code signing.
On the last issue, I've had some thoughts about codebases being able to
act as a trust mediator to receive, analyse, repackage, sign and forward
bytecode on behalf of clients. The last two items above fit into the
category of Bytecode Analysis service responsibilities for codebases.
Prior to loading class files, a client can have a trust relationship
with one or more preferred codebase providers. A codebase provider
also provides bytecode static analysis services for security and binary
compatibility purposes.
I got thinking about this solution after reading about the service
proxy circular code verification issues for disconnected clients that
Project Neuromancer exposed. The codebase would act as a surrogate
security verifier as well as a codebase surrogate.
All this would be implemented with minimal changes to service and
client configurations and no change to third party library code, unlike
my evolving objects framework proposals.
After receiving a tip-off from Michael Warres, Tim Blackman was gracious
enough to share learnings from his research on class loader trees. Tim
built a prototype system using message digests and was considering
implementing textual Class API signatures for identifying compatibility
between different versions of class bytecode. Tim considered the textual
API signatures when he found that compiler optimisations from
independent vendors produced different bytecode, hence different SHA-1
digests, even though the classes had identical and compatible APIs. I
thought about this further and realised that Binary Compatibility for
class files and package changes is far more flexible than source code
compatibility.
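To make the idea of a textual API signature concrete, here is a minimal
sketch. It is not Tim's prototype; ApiSignature, describe and digest are
hypothetical names, and it uses java.lang.reflect on an already loaded
class only to stay self-contained, whereas a codebase server as proposed
here would read the class file with ASM without loading it. The point is
that the digest is taken over the sorted API description rather than
over the raw bytecode, so compiler differences in method bodies don't
change it.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: a crude textual API signature for one class.
final class ApiSignature {
    private ApiSignature() {}

    // Only these modifiers are treated as part of the API in this sketch.
    private static final int API_MASK = Modifier.PUBLIC | Modifier.PROTECTED
            | Modifier.STATIC | Modifier.ABSTRACT | Modifier.FINAL;

    /** Sorted text lines for the public and protected members a class declares. */
    static SortedSet<String> describe(Class<?> type) {
        SortedSet<String> lines = new TreeSet<>();
        lines.add("class " + type.getName());
        for (Field f : type.getDeclaredFields()) {
            if (isApi(f.getModifiers())) {
                lines.add(line("field", f.getModifiers(),
                        f.getType().getName() + " " + f.getName()));
            }
        }
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (isApi(c.getModifiers())) {
                lines.add(line("ctor", c.getModifiers(),
                        Arrays.toString(c.getParameterTypes())));
            }
        }
        for (Method m : type.getDeclaredMethods()) {
            if (isApi(m.getModifiers()) && !m.isSynthetic()) {
                lines.add(line("method", m.getModifiers(),
                        m.getReturnType().getName() + " " + m.getName()
                                + Arrays.toString(m.getParameterTypes())));
            }
        }
        return lines;
    }

    private static boolean isApi(int modifiers) {
        return Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
    }

    private static String line(String kind, int modifiers, String text) {
        return kind + " " + Modifier.toString(modifiers & API_MASK) + " " + text;
    }

    /** Hex SHA-256 over the sorted lines, so member order never matters. */
    static String digest(SortedSet<String> lines) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String l : lines) {
                sha.update(l.getBytes(StandardCharsets.UTF_8));
                sha.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }
}
```

Calling ApiSignature.digest(ApiSignature.describe(SomeClass.class)) for
two builds of the same source should give the same value even when the
compilers emitted different method bodies. The sketch ignores
superclasses, interfaces, generics and thrown exceptions, all of which a
real signature would have to cover.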
While Tim concentrated on API compatibility, to ensure that objects
which should be shared could be, he found that groups of class files,
based on dependency analysis (this is where the replacement ClassDep
code came from), required their own ClassLoaders. Hence a significant
number of class loader instances is required for maximum compatibility
(without going into more detail).
In essence, the solution I'm striving for is to solve, in a distributed
world, the problem that OSGi solves in the JVM: segregation and isolation
of incompatibility while allowing compatible implementations to
cooperate. However, I want an implementation without commitment to any
particular container or module technology, so as not to force container
implementation choices on projects that already have their specific
container implementations.
Rather than reinventing another container technology, all jar files a
service's client requires could be uploaded to codebase services just
prior to service registration. The codebase service could analyse,
repackage and sign the jar files into compatible bundles, dynamic
containers if you wish, one for each ClassLoader, where each class
loader represents a Package API group signature.
Using the uploaded jar files, the codebase services could generate and
propagate analysis reports amongst themselves in a p2p fashion, so that
between them they could determine the latest binary compatible version
of a package, which would always be preferred. Once the latest version
is identified, a codebase service can verify it with its own analysis,
in order to confirm the result and to report malicious or malfunctioning
codebase servers. Newer versions of
a Package, found to have broken Binary Backward compatibility, would be
kept in a separate ClassLoader as determined by their API signature,
thus incompatibility is isolated. There may be subgroups within a
package, that could also be shared between incompatible package versions
to provide improved class file and object sharing.
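The grouping described above could be sketched roughly as follows. This
is only an illustration under stated assumptions: CompatibilityGroups
and its methods are hypothetical names, versions are assumed to be
registered oldest first, each version's API is reduced to a set of
member strings, and "backward compatible" is approximated as "nothing
that was there before has been removed". Each group stands for one
ClassLoader in the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical registry: one group per line of backward compatible versions.
final class CompatibilityGroups {

    /** One compatibility line; in the proposal it would map to one ClassLoader. */
    static final class Group {
        final List<String> versions = new ArrayList<>();
        Set<String> latestApi = new TreeSet<>();

        /** The newest registered version is the one that would be preferred. */
        String preferredVersion() {
            return versions.get(versions.size() - 1);
        }
    }

    private final List<Group> groups = new ArrayList<>();

    /**
     * Registers a package version (oldest first) and returns the index of
     * the group it landed in. A version joins the first group whose latest
     * API it fully retains; otherwise it starts a new, isolated group.
     */
    int register(String version, Set<String> api) {
        for (int i = 0; i < groups.size(); i++) {
            Group g = groups.get(i);
            if (api.containsAll(g.latestApi)) {
                g.versions.add(version);
                g.latestApi = new TreeSet<>(api);
                return i;
            }
        }
        Group g = new Group();
        g.versions.add(version);
        g.latestApi = new TreeSet<>(api);
        groups.add(g);
        return groups.size() - 1;
    }

    Group group(int index) {
        return groups.get(index);
    }

    int groupCount() {
        return groups.size();
    }
}
```

Registering a version that only adds members keeps it in the existing
group and makes it the preferred version there, while a version that
drops a member ends up in a group of its own. Sharing unchanged class
subgroups between groups is not modelled here.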
Hence a client receiving bytecode could choose to channel it through
one or more codebase servers that it has trust relationships with.
Acting as a bytecode trust surrogate, the preferred codebase server could
retrieve required bytecode that it doesn't already possess, using lookup
services to locate other codebase services. The bytecode recipient would
retrieve analysis information detailing bytecode implementation security
concerns prior to loading any bytecode. The codebase server would not
execute any untrusted bytecode itself; it would only perform analysis
using the ASM library. The aim would be for a codebase server to be as
secure and as impervious to attack as possible, so that it can be
considered trustworthy (existing denial of service attack strategies
require consideration). One could even perform tests on codebases by
uploading deliberately malicious code and checking the resulting analysis
reports, or by occasionally confirming the analysis reports with other
codebases or with a local codebase analysis process. Separation of
concerns.
Codebase Services would only be required to maintain a copy of the
evolution bloodline for the latest binary backward compatible package.
A package fork or a break in backward compatibility would mean storing
a copy of both of the latest divergent compatibility signatures; again,
some unchanged class subgroups may be shared between them. Java
bytecode versions (compiler specific) would also dictate which package
version could be used safely in local JVMs.
Clients of services will have to accept a certain amount of downtime.
Once a particular instance of a package's classes is loaded into a
ClassLoader, no other compatible implementation of that package can be
loaded; this is only a problem for long lived service client
processes. Object state will need to be persisted while the JVM
restarts and reloads new bytecode (Serializable is also part of class
API). This is due to the inability of an existing ClassLoader to reload
classes (java debug excluded). Backward Binary compatibility doesn't
necessarily imply forward compatibility: classes and interfaces can add
methods without breaking compatibility with pre-existing binaries,
access can become more permissive, and abstract methods can become
non-abstract. Even though some of these changes break source code
compatibility, old clients aren't aware of the new methods and don't
execute them. For specifics see Chapter 13, Binary Compatibility, of the
Java Language Specification, 3rd Edition; this is what I plan to base
the compatibility analysis upon.
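As a rough illustration of the kind of rules the analysis would apply,
here is a minimal sketch covering only the method changes mentioned
above: removal, reduced access, and a concrete method becoming abstract
are reported as breaks, while added methods, wider access and abstract
methods becoming concrete are accepted. BinaryCompatibility, MethodInfo
and breaks are hypothetical names, the map keys are assumed to be method
name plus descriptor, and this is a small subset of JLS Chapter 13, not
the full rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical checker for a handful of method-level rules only.
final class BinaryCompatibility {
    private BinaryCompatibility() {}

    /** Access levels ordered from least to most permissive. */
    enum Access { PRIVATE, PACKAGE, PROTECTED, PUBLIC }

    /** The facts about one method that this sketch compares. */
    static final class MethodInfo {
        final Access access;
        final boolean isAbstract;

        MethodInfo(Access access, boolean isAbstract) {
            this.access = access;
            this.isAbstract = isAbstract;
        }
    }

    /**
     * Returns the reasons the new version would break binaries compiled
     * against the old one; an empty list means no break was found by
     * these rules. Methods present only in newApi are ignored, because
     * old binaries never reference them.
     */
    static List<String> breaks(Map<String, MethodInfo> oldApi,
                               Map<String, MethodInfo> newApi) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, MethodInfo> e : oldApi.entrySet()) {
            String key = e.getKey();
            MethodInfo before = e.getValue();
            if (before.access == Access.PRIVATE) {
                continue; // private methods are not referenced by other classes
            }
            MethodInfo after = newApi.get(key);
            if (after == null) {
                problems.add("removed: " + key);
                continue;
            }
            if (after.access.compareTo(before.access) < 0) {
                problems.add("access reduced: " + key);
            }
            if (!before.isAbstract && after.isAbstract) {
                problems.add("made abstract: " + key);
            }
        }
        return problems;
    }
}
```

Swapping the two arguments shows the asymmetry: a change set that is
backward compatible in one direction usually reports breaks in the
other, which is why backward compatibility doesn't give forward
compatibility. Fields, constructors, class hierarchy changes and
Serializable form would all need their own rules.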
It would also be possible for services to utilise codebase servers in
their classpath.
These issues I propose tackling are not simple obstacles, nor will they
be easy to implement; some may even be intractable. But what the
hell, who's with me? That's why we got into this in the first place,
isn't it? The challenge! Project Neuromancer highlighted areas for
improvement; if we address some of these, I believe that River can
become the much vaunted and dreamt of semantic web.
I want problems identified so solutions can be devised; let's see
objections & supporting logic, or better ideas.
Cheers,
Peter.