Hi Dennis,
Reasoning, and hopefully the whys, below.
Dennis Reedy wrote:
Hi Peter,
I was hoping to take a step back for a second; perhaps it's just me who seems
to have my head spinning of late on this list. I may have missed some things,
but we've discussed many issues over the past week:
- How to advertise the DL jar(s) a service vends, allowing a client to download
requisite jars that allow the jars to be loaded from a local (trusted) location
Yes, we can use an Entry, or, as Chris pointed out, if we annotate
MarshalledInstances using a new Maven URL scheme we can extract that
info and make it available via MarshalledServiceItem (an abstract class
that extends ServiceItem).
- Given the capability above, the need for a codebase service may not be
required
Agreed
- Conventions on how to develop River services, as it relates to jar naming,
packaging and what dependencies are between the various artifacts
- How to possibly move forward with utilizing Maven repositories and the
implied capabilities of published artifacts
- The development of a Maven archetype to allow a developer to easily create a
working project in seconds
Yes to all above.
Your attention to detail and the documentation of class loader interactions
with regard to security is great. I'd like to understand the requirements of
what you have documented below, the urge to refactor MarshalledInstance, and
why the new class loader hierarchy needs to be added to River.
The urge to refactor MarshalledInstance is to allow the URL annotation
to be requested directly and passed via StreamServiceRegistrar and,
combined with delayed unmarshalling of proxies via
MarshalledServiceItem, to allow the client to provision and provide an
alternate CodeSource if need be.
StreamServiceRegistrar returns a ResultStream<ServiceItem>, so you have
to check with instanceof MarshalledServiceItem.
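
A minimal sketch of that check, using stand-in types, since
MarshalledServiceItem and StreamServiceRegistrar are only proposals in this
thread. Assumptions: ServiceItem below is a simplified stand-in for
net.jini.core.lookup.ServiceItem, a plain Iterator stands in for ResultStream,
and getService(CodeSource[]) follows the method proposed in the quoted message
further down. DelayedUnmarshalSketch is a made-up name.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for net.jini.core.lookup.ServiceItem.
class ServiceItem {
    Object service;
    ServiceItem(Object service) { this.service = service; }
}

// Hypothetical shape of the proposed class: the proxy stays marshalled
// until the client asks for it, optionally supplying its own CodeSources.
abstract class MarshalledServiceItem extends ServiceItem {
    MarshalledServiceItem() { super(null); }
    abstract Object getService(CodeSource[] cs);
}

final class DelayedUnmarshalSketch {
    // Drains the stream, unmarshalling only the items that arrived marshalled.
    static List<Object> proxies(Iterator<ServiceItem> stream, CodeSource[] local) {
        List<Object> out = new ArrayList<>();
        while (stream.hasNext()) {
            ServiceItem item = stream.next();
            if (item instanceof MarshalledServiceItem) {
                out.add(((MarshalledServiceItem) item).getService(local));
            } else {
                out.add(item.service);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ServiceItem plain = new ServiceItem("already unmarshalled");
        ServiceItem lazy = new MarshalledServiceItem() {
            @Override Object getService(CodeSource[] cs) {
                // In the proposal, null means "fall back to the default URL".
                return cs == null ? "from default URL" : "from local CodeSource";
            }
        };
        System.out.println(proxies(Arrays.asList(plain, lazy).iterator(), null));
    }
}
```

The point of the instanceof branch is that the client, not the lookup
service, decides when and from which CodeSource the proxy classes are loaded.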
The new packaging scheme can also be applied to distributed objects,
provided we create an implementation of CodebaseAccessClassLoader
(contributed by Gregg to replace RMIClassLoaderSPI) that performs or
requests local Maven archive provisioning.
The new ClassLoader hierarchy is needed to solve the class identity (fully
qualified runtime class name = class + ClassLoader), class visibility,
isolation and versioning problems that PreferredClassProvider only partially
solves.
Perhaps I'm just missing some fundamental issues, but maybe we need to take
some time and determine the whys before the hows? Is this direction fundamental
to the OSGi direction that you're taking? If so, how does this impact non-OSGi
based systems?
The changes are OSGi agnostic. OSGi will live in the application space,
so while the changes benefit OSGi, they are independent of it; the same
benefits apply to other software and OSGi isn't required.
I realised that fundamentally OSGi uses ClassLoaders to isolate
software into components, so implementation classes aren't exposed
outside of their module. OSGi does this very well, and it also
manages security concerns very well. Something else I realised: OSGi's
use of ClassLoaders is not optimal for distributed systems, because it is
difficult to determine the correct ClassLoader during deserialization.
OSGi wasn't designed with Serialization in mind. Distributed computing
introduces another dimension, like going from 2D to 3D. In OSGi you
only have one bundle version combination loaded (you can have many
bundles of different versions, but I believe typically only one of each
unique bundle instance; you can have the same package version exported
by differently versioned bundles). So how do you determine the correct
ClassLoader during unmarshalling? In River we may have many proxies
using the same jar version, but we don't want the proxy's
implementation to get all tied up in the local application bundles. We'd
be allowing the smart proxy to pollute the local application space, and
some parts of the local application could see the proxy implementation.
In our new ClassLoader tree, a smart proxy can have its own personal
ClassLoader, because the ContextClassLoader will be that of the proxy
while returned objects are deserialized, since the proxy initiated the
communication with the remote Service host. A client's parameter
implementation cannot have its own ClassLoader and must share with other
clients that use the same codebase and version, because those classes
have no link to a ClassLoader at the remote Service host. With only the
Codebase and Version to go by, since they didn't initiate the
communication, there could otherwise be many ClassLoaders containing
that codebase version and not enough information to find the right one.
The last thing I want to do is require the client to have an identity or
location to deal with that deserialization of parameters at the Service node.
Rather than take, "how you use OSGi" and apply it to River, I decided to
understand why they solved their problems the way they did and learn
from it. It is a very good solution to the problem they've solved.
However with our solution we can solve the deserialization issue for
distributed applications utilising OSGi.
Currently River uses Permission grants based on ClassLoader (so does
OSGi). What I realised was that I needed a finer grained Permission grant,
and having many ProtectionDomains inside one ClassLoader is about as fine
as you can get. Only one ClassLoader is used for the API space, for
class identity reasons, to allow maximum sharing of API classes, because
you just can't control and coordinate ClassLoader visibility in someone
else's JVM without overcoming some serious trust issues (simpler is
better; I don't even want to attempt to solve them!). There is, however,
one compromise with my approach.
By loading all API classes into the same ClassLoader, we cannot have
duplicate classes, so we must always load the latest API version, which
must not break backward compatibility. If the backward compatibility
constraints are hampering your design, it's simply better to deprecate a
package and append a number to change the package name (or create a
completely new API jar):
org.some.thing
org.some.thing2
The reason we version packages is so we don't have to rename them when
they break backward compatibility. This makes sense for implementations,
but not for API. If you're going to have long lived persistent objects, they
belong in the API space. If you don't need to persist your objects, why
not have an interface and throwaway class implementations? This solves
the problems of Serialization exposing class internal state and of
evolution. Extend the interface if you want new methods.
If a JVM has been running a long time, a new API version may have been
released; clients using only the old API functionality won't be able to
see or utilise the new functionality until we restart the JVM. That is
the compromise. But I figure it's not too bad a compromise once APIs
have stabilised and go into longer development cycles. I can handle
having to restart my JVM once every 6 months.
I think Michael Warres got to the crux of the problem with his
publication on ClassLoader issues. My interpretation of what he said is
that perhaps Java should tear apart the multiple ClassLoader concerns of
Security, Isolation and Identity and start again. I've chosen what
appears to me to be the best compromise based on Java ClassLoaders today.
So this new ClassLoader hierarchy should play nicely with Maven, OSGi and
other stuff too, because now the API is visible to everything below it in
the ClassLoader hierarchy, while the implementations below don't expose
themselves; instead, everything cooperates through the API.
OSGi can be used to synchronize ClassLoader visibility between two
separate JVMs, however that still requires the implementer to deal with
deserialization issues. With our solution, we won't have to worry much
about ClassLoader issues. With Maven, we won't have to worry about lost
codebases either.
Yep, it has been a bit of a head spin, needed your help to work out the
details before I forgot them.
There is one more detail I'd like to include in the jar archive: a list
of permissions the jar needs. I'd like to use the same format OSGi
uses, because it's been done before, so why be different? This is to solve
the "what grants does it need?" problem, so we can minimise permission
grants.
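
As an illustration only, here is a rough sketch of reading such a list. It
assumes the OSGi local-permissions convention as I understand it: a file at
OSGI-INF/permissions.perm with one encoded permission per line in the form
(type "name" "actions"), where lines starting with # or // are comments. The
OSGi Core specification is the authority on the exact grammar; the class and
method names below are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PermissionListSketch {
    // ( type "name" "actions" ), where name and actions are optional.
    private static final Pattern ENCODED = Pattern.compile(
        "\\(\\s*([\\w.$]+)(?:\\s+\"([^\"]*)\")?(?:\\s+\"([^\"]*)\")?\\s*\\)");

    // Returns {type, name, actions} triples; absent parts are null.
    static List<String[]> parse(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("//")) {
                continue; // blank line or comment
            }
            Matcher m = ENCODED.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("Not an encoded permission: " + line);
            }
            out.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
            "# permissions requested by service-proxy.jar",
            "(java.net.SocketPermission \"*:1024-\" \"connect,resolve\")",
            "(java.lang.RuntimePermission \"getClassLoader\")");
        for (String[] p : parse(file)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

A deployer could compare the parsed triples against local policy before
granting anything, which is what keeps the grants minimal.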
One more step towards the net...
Thanks
Dennis
On May 24, 2010, at 10:34 PM, Peter Firmstone wrote:
Thanks Chris,
Sounds like it's time for some MarshalledInstance refactoring?
Perhaps a Maven (generic if possible) URL scheme (with message digest support). We
need an annotation (or naming convention) that indicates whether proxies can share
ClassLoader & ProtectionDomain space, dictated by static variables and common
Principals.
A new constructor for MarshalledInstance that accepts an alternate URL too.
... and two new methods in MarshalledInstance:
Object get(ClassLoader cl, CodeSource[] cs, boolean verifyCodeBaseIntegrity);
URL[] getCodeSourceAnnotation();
Then MarshalledServiceItem could include new methods:
public URL[] getCodeSourceAnnotation();
public Object getService( CodeSource[] cs );
//If cs == null || cs missing a CodeSource use default URL.
Note here that while unmarshalling has been delayed, I haven't relinquished
control of ClassLoaders or ProtectionDomains. For example, the client can use
OSGi without dictating that the Service must also; none of the serialized
instances from method returns will need to be deserialized by OSGi, avoiding
the OSGi deserialization issue altogether.
The client application doesn't have to deal with these concerns directly. We
could write multiple ResultStreamFilters that can be chained: the filter that
matches the URL scheme will unmarshal the service, and the filter sequence will
dictate the preferred unmarshalling. The filter responsible for successful
unmarshalling would construct a new ServiceItem that is no longer marshalled,
so the next unmarshalling filter would ignore it and let it pass through. After
it is unmarshalled, another filter will check method constraints.
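
The chaining idea can be sketched roughly as below. Everything here is
hypothetical: Item stands in for ServiceItem / MarshalledServiceItem, a
java.util.function.UnaryOperator stands in for a ResultStreamFilter, the "mvn"
scheme name is invented, and the "unmarshalling" is faked with a string so the
example stays self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

final class FilterChainSketch {
    // Stand-in for a ServiceItem: 'proxy' stays null while the item is marshalled.
    static final class Item {
        final String codebase;
        final Object proxy;
        Item(String codebase, Object proxy) {
            this.codebase = codebase;
            this.proxy = proxy;
        }
        boolean isMarshalled() { return proxy == null; }
    }

    // Builds a filter that unmarshals only still-marshalled items whose
    // codebase annotation uses the given URL scheme.
    static UnaryOperator<Item> unmarshalFor(String scheme) {
        return item -> {
            if (!item.isMarshalled() || !item.codebase.startsWith(scheme + ":")) {
                return item; // not ours, or already handled: pass through
            }
            return new Item(item.codebase, "proxy unmarshalled by " + scheme + " filter");
        };
    }

    // Runs the item through every filter; list order expresses preference.
    static Item apply(List<UnaryOperator<Item>> chain, Item item) {
        for (UnaryOperator<Item> filter : chain) {
            item = filter.apply(item);
        }
        return item;
    }

    public static void main(String[] args) {
        List<UnaryOperator<Item>> chain =
            Arrays.asList(unmarshalFor("mvn"), unmarshalFor("httpmd"));
        Item in = new Item("mvn:org.example:service-proxy:1.0", null);
        System.out.println(apply(chain, in).proxy);
    }
}
```

A constraint-checking filter would simply be one more element at the end of
the list, acting only on items that are no longer marshalled.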
Method Parameters that originate from client ClassLoaders will be unmarshalled
in the Application ClassLoader space on the Service implementation node; this
is where things get hairy if the Service API method parameters are non final,
abstract or interfaces. Any class that belongs to a Service API jar will be
safely loaded into the Jini Platform ClassLoader space in its own
ProtectionDomain. Client returned parameter classes, however, will need their
own ClassLoaders.
If the Service API is loaded into a Parent ClassLoader (Jini Platform
ClassLoader) at the Service implementation node and API parameters are
extended, the client classes will need their own ClassLoader space at the
Service Implementation end. Since a service may serve many clients, these
ClassLoaders must be shared, based on identical CodeSource and Principals. The
client classes will only be accessible via the Service API interfaces or
classes (they are abstracted).
ANY CLIENT THAT IMPLEMENTS AN API Interface or extends an API parameter will
need to make its implementation package jar publicly available. Like the
proxy implementation, it is free to change, however it should be versioned
appropriately, like the proxy, and have its own jar. (This is where the Java
Package Version Spec comes in handy: we can annotate classes with Package
version and local CodeSource.) The CodeSource might contain a file URL,
however it will contain the jar archive name (which is why Dennis wants to
name packages with their versions, which can't hurt!) and, given the Package
Version Spec, it will work for OSGi bundles as well as Maven. A client using
an OSGi bundle must remember that all of the implementing classes should be in
the same bundle, and that the Service node may not be utilising OSGi, so it
shouldn't attempt to use any OSGi services in Service API parameter
implementations.
The version spec will identify compatibility of classes; the closest compatible
local CodeSource may be used, otherwise a new ClassLoader will be used. Each
client will either share all compatible CodeSource and Principals or have their
own ClassLoader space.
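
A rough sketch of the sharing rule, under simplifying assumptions: the cache
key is the jar's URI plus the set of Principal names, only identical keys
share a loader (the "closest compatible version" matching is left out), and a
plain URLClassLoader stands in for whatever loader River would really use.
SharedLoaderSketch and loaderFor are made-up names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

final class SharedLoaderSketch {
    private final ClassLoader parent; // would be the platform/API loader
    private final Map<List<Object>, ClassLoader> loaders = new ConcurrentHashMap<>();

    SharedLoaderSketch(ClassLoader parent) {
        this.parent = parent;
    }

    // Clients presenting the same codebase and Principals get the same loader;
    // anything else gets a loader of its own.
    ClassLoader loaderFor(URI codebase, Set<String> principalNames) {
        // URI is used in the key instead of URL or CodeSource because
        // URL.equals() may resolve host names.
        List<Object> key = Arrays.asList(codebase, new TreeSet<>(principalNames));
        return loaders.computeIfAbsent(key, k -> newLoader(codebase));
    }

    private ClassLoader newLoader(URI codebase) {
        try {
            return new URLClassLoader(new URL[] { codebase.toURL() }, parent);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        SharedLoaderSketch cache = new SharedLoaderSketch(ClassLoader.getSystemClassLoader());
        URI jar = URI.create("file:/tmp/service-param-1.0.jar");
        ClassLoader a = cache.loaderFor(jar, Collections.singleton("alice"));
        ClassLoader b = cache.loaderFor(jar, Collections.singleton("alice"));
        ClassLoader c = cache.loaderFor(jar, Collections.singleton("bob"));
        System.out.println((a == b) + " " + (a == c));
    }
}
```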
Gregg, do you think we could use your service-client.jar for client parameter
implementations or would this cause confusion?
Perhaps we should use:
service-param.jar
So to really round it off:
Service Implementers must produce versioned manifest jar archives of:
Smart Proxy:
Implementation jar: service.jar (depends on service-api.jar)
API jar: service-api.jar
Smart proxy jar: service-proxy.jar (depends on service-api.jar)
Selfish Smart proxy jar: service-iproxy.jar (depends on
service-api.jar)
Dumb Proxy:
Implementation jar: service.jar (depends on service-api.jar)
API jar: service-api.jar
Client Implementers must produce versioned manifest jar archives of:
Client Parameter extensions: service-param.jar
If you didn't guess correctly, the Selfish Smart proxy jar is the one whose
proxies cannot share the same ClassLoader and ProtectionDomain.
ClassLoader Structure (In addition to all your helpful comments on river-dev,
thanks also to Jim, Tim & Mike, planting the seed):
                            System ClassLoader
                                     |
                Extension ClassLoader (incl jsk-policy.jar)
                                     |
       Jini Platform ClassLoader (incl jsk-platform.jar, *-api.jar)
                                     |
           __________________________|__________________________
           |                         |                         |
Application ClassLoader     Proxy ClassLoaders    Parameter Impl ClassLoaders
(Apps & Service Impl)        (Smart Proxies)      (Remote client parameter
                                                   classes)
Advice History:
Jim: Use common Interfaces and classes in Parent ClassLoaders
Tim: Thanks for research on Dependency Trees and ClassLoader Trees and
guidance.
Mike: Research paper on ClassLoader issues.
Thanks & Praise worth mentioning:
Bob Scheifler and others for Jini's strong Security foundation.
Bill Venners for the ServiceUI, it is truly innovative
(hint: come back)
Christopher Dolan wrote:
Isn't List<URL> already present in the MarshalledInstance? Why repeat
this as an Entry? Wouldn't it be easier to just add a public accessor
to deserialize the list of URLs from MarshalledInstance.locBytes?
I apologize if this was already explained, but there's been a LOT of
email to read on this list lately.
Chris
-----Original Message-----
From: Dennis Reedy [mailto:[email protected]]
Sent: Saturday, May 22, 2010 9:29 AM
To: [email protected]
Subject: Re: Maven repository Entry was Re: Codebase service?
[CJD] ... <snip> ...
I would just go with a
List<String> dlJars;
With this you could provide support for retrieving the DL jar(s) for
non-maven systems as well. If the dlJars property contains 1 element and
is of the form groupId:artifactId:version:classifier, then maven
resolution gets used. Otherwise the DL jars can be obtained using the
codebase of the advertising service.
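
A small sketch of that decision. Assumptions on my part: the classifier is
treated as optional (three or four colon-separated parts), and each part is
restricted to letters, digits, '.', '_' and '-' so that something like
http://host:8080/x.jar is not mistaken for Maven coordinates. DlJarsSketch and
its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class DlJarsSketch {
    private static final Pattern PART = Pattern.compile("[A-Za-z0-9_.-]+");

    // Returns {groupId, artifactId, version[, classifier]} when dlJars holds
    // exactly one element of that form, otherwise null.
    static String[] mavenCoordinates(List<String> dlJars) {
        if (dlJars.size() != 1) {
            return null;
        }
        String[] parts = dlJars.get(0).split(":", -1);
        if (parts.length < 3 || parts.length > 4) {
            return null;
        }
        for (String part : parts) {
            if (!PART.matcher(part).matches()) {
                return null; // empty part, or a URL fragment such as "//host"
            }
        }
        return parts;
    }

    // "maven" means resolve through a repository; "codebase" means fetch the
    // listed jars from the codebase of the advertising service.
    static String resolutionFor(List<String> dlJars) {
        return mavenCoordinates(dlJars) != null ? "maven" : "codebase";
    }

    public static void main(String[] args) {
        System.out.println(resolutionFor(
            Collections.singletonList("org.example:calculator-dl:1.0:dl")));
        System.out.println(resolutionFor(
            Arrays.asList("calculator-dl.jar", "jsk-dl.jar")));
    }
}
```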
For maven resolution, I think you'll also want to either provide support
for parsing your maven settings.xml or include the repositories to go
find the artifact if it's not present. If the artifact is retrieved from
the repository it will have a message digest alongside it (with
either a .sha1 or .md5 extension). That can be compared against a digest
computed locally with HttpmdUtil.computeDigest() to check for updates. But
that comparison really only needs to take place for snapshots, since by
definition releases are considered immutable.
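
A sketch of the snapshot check. To keep it self-contained it uses
java.security.MessageDigest in place of HttpmdUtil.computeDigest(), and it
assumes the usual Maven convention that snapshot versions end in -SNAPSHOT.
SnapshotDigestSketch, sha1Hex and needsUpdate are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SnapshotDigestSketch {
    // Hex-encoded SHA-1 of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a required algorithm
        }
    }

    // True when a locally cached snapshot jar no longer matches the digest
    // published next to the artifact in the repository.
    static boolean needsUpdate(String version, byte[] localJar, String publishedSha1) {
        if (!version.endsWith("-SNAPSHOT")) {
            return false; // releases are treated as immutable
        }
        // A .sha1 file may carry a file name after the hash; keep the first token.
        String published = publishedSha1.trim().split("\\s+")[0];
        return !sha1Hex(localJar).equalsIgnoreCase(published);
    }

    public static void main(String[] args) {
        byte[] jar = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(needsUpdate("1.0-SNAPSHOT", jar, sha1Hex(jar)));
    }
}
```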
IMO supporting transitive deps is a must have; without that we really
don't get that far. A DL artifact may depend on another DL artifact, and
that DL artifact may have deps as well.