In the paper below, Mirrors are metadata objects that represent the
class file API, i.e. method signatures, field signatures...
The author states that they use BCEL to perform the static bytecode
analysis; we're using ASM. If a code base is trusted by a client, the
class loading overhead that ISOMOD (a modular isolation framework)
incurs could be almost entirely avoided by analysing the bytecode and
constructing the mirror objects at the code base.
This would allow a code base service to upload untrusted bytecode,
analyse it, produce the mirror object trees and provide these to a
trusting client; the client would need to be aware that the code was
not fully trusted. Trusted package authors uploading jar files to a
codebase would need to create access policies for untrusted code /
packages, to be enforced by clients during class loading and class
name resolution.
I suspect the authors use inheritance to provide a "view" of the
original class file, passing the "view" during class resolution. This
imposes limits on which methods can be hidden from another namespace:
* Classes must be public or package-private
* Classes must be non-final
* Methods must be public, package-private or protected
* Methods must be non-final
A class "view" isn't required when hiding the entire class, only for
methods.
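A minimal sketch of the inheritance-based "view" guessed at above. Whether
ISOMOD really works this way is not confirmed by the paper excerpt, and all
class names here (CoreService, RestrictedCoreServiceView, ViewDemo) are
hypothetical. Note that this variant refuses the call at run time, which is
weaker than hiding the name at load time.

```java
// Hypothetical trusted core class. It has to be non-final, with non-final
// methods, for a subclass "view" to be possible at all.
class CoreService {
    public String read() { return "data"; }
    public String delete() { return "deleted"; }
}

// Hypothetical restricted view: the method to be hidden is overridden so
// that the original implementation is unreachable through this object.
class RestrictedCoreServiceView extends CoreService {
    @Override
    public String delete() {
        throw new SecurityException("delete() is hidden in this namespace");
    }
}

class ViewDemo {
    // Untrusted code is compiled against CoreService but handed the view.
    static String tryDelete(CoreService service) {
        try {
            return service.delete();
        } catch (SecurityException e) {
            return "denied";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDelete(new CoreService()));
        System.out.println(tryDelete(new RestrictedCoreServiceView()));
    }
}
```

A final class or a final method would make the override impossible, which
is where the constraints in the list above come from.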
Trying to create access policies for working with untrusted code base
services doesn't seem to be worthwhile; it would increase the local
class loading overhead, since analysis would have to be performed
locally by the client. It would probably be best to simply revert to
the applet model of isolation.
I think initially I'll try implementing policies without constraints on
method visibility; those can wait until later.
Peter Firmstone wrote:
The comments on security below are based on this research paper:
http://pages.cpsc.ucalgary.ca/~pwlfong/Pub/fong-orr-2009-manuscript.pdf
I can see how I could restrict access by making certain classes
invisible to untrusted code using a class loader framework, however
finer grained access control applied to methods looks rather difficult
to implement. N.B. I tried contacting the paper's author, without luck.
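The class-level half of this (making whole classes invisible) can be
sketched with a plain ClassLoader subclass. This is only an illustration
under assumptions: FilteringClassLoader and its allow list are hypothetical
names, not anything from the paper or from River, and a real policy would
also have to admit java.lang.Object and the other classes every program
needs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader: names missing from the allow list are invisible to
// any code that resolves classes through this loader.
class FilteringClassLoader extends ClassLoader {
    private final Set<String> visibleNames;

    FilteringClassLoader(ClassLoader parent, Set<String> visibleNames) {
        super(parent);
        this.visibleNames = new HashSet<>(visibleNames);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        if (!visibleNames.contains(name)) {
            // Report the class as missing, so hidden names look absent.
            throw new ClassNotFoundException(name);
        }
        return super.loadClass(name, resolve);
    }

    // Convenience check used by the demo below.
    boolean canSee(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> allowed =
                new HashSet<>(Arrays.asList("java.lang.String"));
        FilteringClassLoader loader = new FilteringClassLoader(
                ClassLoader.getSystemClassLoader(), allowed);
        System.out.println(loader.canSee("java.lang.String"));
        System.out.println(loader.canSee("java.lang.Runtime"));
    }
}
```

Filtering by class name is the easy part; nothing in this sketch addresses
per-method visibility, which is the hard part raised above.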
I'm trying to figure out what the alternative, restricted view of the
entity could be. It can't be a reflective proxy, since that requires
interfaces. Could it be an overriding copy of the class, created using
reflection, where the restricted methods are overridden to hide the
original methods? Through polymorphism these would work in place of
the original class, but it wouldn't work for final classes or public
fields, which is not ideal. Google Guice does something similar to
this, however the paper above criticizes this approach as adding
runtime overheads.
Anyone have any ideas?
N.B. I'm not getting much time near my dev workstation (Ultra80
Solaris 10) right now, so I haven't done anything about the River AR2
release, but I will get there. Most of the information and my thoughts
here have been collected while on the road. Note for anyone wondering
about my health: I'm receiving treatment for a non-malignant brain
tumor, and it has shrunk by 2mm. Don't let that worry you about
offending me with comments or questions, however; I could use some
assistance ;)
Cheers,
Peter.
The guts of the paper, on page 5, read:
In a dynamically extensible software system, the trusted application
core is defined in a parent namespace, while child namespaces are
created for defining untrusted software extensions (Figure 1). Core
application services are exposed to the extension code by implicitly
importing names from the core application namespace to the extension
namespace. ISOMOD is a run-time module system designed for isolating
untrusted software extensions. It does so by controlling the
visibility of names in the namespaces in which untrusted software
extensions reside. Specifically, an ISOMOD namespace enforces two
kinds of control: (1) restricting the visibility of names that are
imported from the parent namespace, and (2) restricting the visibility
of locally defined names. When a name is placed under visibility
control, an ISOMOD namespace may (a) control which locally defined
class can “see” the name, and (b) present an alternative, restricted
view of the entity to which the name is bound. Every ISOMOD name space
is endowed with a custom name visibility policy, which specifies
visibility restrictions to be imposed on the names visible in the
namespace. When appropriately constructed, an ISOMOD policy may be
used to selectively hide core application services from untrusted
extensions (Section 4.1 and 4.2), or impose collaboration protocols
among classes defined in the extension namespace (Section 4.3). A
major contribution of this work is the design of a policy language
that can express a rich family of access control policies as
fine-grained visibility constraints.
An ISOMOD namespace is an instance of a user-defined class loader
class. An ISOMOD class loader performs extra checks on a classfile
before converting it into a Class object. Specifically, class
definition is only authorized when no external accesses in the
classfile are denied by the policy. This late enforcement (i.e., load
time) of visibility control distinguishes ISOMOD from traditional
module systems, in which visibility control is enforced only at
compile time. It is this feature that makes the ISOMOD module system
into a viable protection mechanism.
An ISOMOD namespace may be constructed at run-time by an application
core from an ISOMOD policy. This late binding of access control policy
to code not only supports the separate maintenance of code and policy,
but also supports the presentation of different views of the same
application core to different extensions.
Peter Firmstone wrote:
Some implementation design thoughts on security:
Security by namespace visibility and trust within package class
loaders?
If each package is segregated into its own class loader and all
dependencies required by that package have been determined by Code
base analysis, then visibility should be limited to the classes and
methods discovered by the codebase server analysis and enforced at
class loading time.
A local namespace visibility policy (more fine grained than Java
security policies) might contain a list of allowable system methods
for code originating from untrusted entities (even though the code
base is trusted and the code has been analysed). Any method
signatures in the downloaded code that didn't appear in the list as
allowable would not be granted visibility. A default working set
could be created for distribution with River, with all disallowed
methods commented out.
Then in the worst case of trust, where neither the code base nor the
origin of the code is trusted, the required dependencies and methods
declared by the code base analysis are only allowed if they are also
allowed locally. So if a code base were to submit code with
undisclosed methods, those methods would not be accessible to the
untrusted code. The dependency analysis information provided by the
code base forms a contract between untrusted parties.
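The vetting rule described above amounts to a set intersection: a method is
visible to untrusted code only if the code base declared it and the local
policy allows it. A minimal sketch, assuming method signatures are held as
strings in a "owner.name + JVM descriptor" form; that format and the
VisibilityVetting class are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class VisibilityVetting {
    // Signatures declared by the code base analysis, filtered down to the
    // ones the local visibility policy also allows.
    static Set<String> granted(Set<String> declaredByCodebase,
                               Set<String> allowedLocally) {
        Set<String> result = new TreeSet<>(declaredByCodebase);
        result.retainAll(allowedLocally);
        return result;
    }

    // A method the bytecode tries to use is visible only if it was both
    // declared and locally allowed; undisclosed methods fail the test.
    static boolean isVisible(String usedSignature,
                             Set<String> declaredByCodebase,
                             Set<String> allowedLocally) {
        return granted(declaredByCodebase, allowedLocally)
                .contains(usedSignature);
    }

    public static void main(String[] args) {
        Set<String> declared = new TreeSet<>(Arrays.asList(
                "java/lang/System.currentTimeMillis()J",
                "java/lang/System.exit(I)V"));
        Set<String> allowed = new TreeSet<>(Arrays.asList(
                "java/lang/System.currentTimeMillis()J",
                "java/lang/Math.abs(I)I"));
        System.out.println(granted(declared, allowed));
    }
}
```

In this example System.exit is declared but not allowed locally, and
Math.abs is allowed but was never declared, so neither would be visible.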
Consider the following:
1. Code base A is trusted and has obtained its code from another
trusted entity (whoever uploaded the code to the code base server
in the first place).
2. Code base B is untrusted.
3. Code base A is trusted and has obtained some code from Code base B
which is untrusted.
4. Trusted and Untrusted code will be loaded into separate class
loaders by a client JVM.
Note: my references to methods mean those with protected or public
visibility; the same applies to fields that are public or protected.
Code base A could bundle and sign the trusted code, and bundle
without signing the untrusted code after analysis (where bundle
means splitting an existing jar into multiple jars after analysis,
one for each package).
The client would receive a dependency analysis report from Code base
A and would restrict the visibility of the untrusted code to the
subset of declared methods that are allowed.
Code base A might later receive trusted code that is API compatible
with that of the untrusted code; this would be discovered by
analysis. From then on, Code base A would be able to provide trusted
code to its trusting clients when required.
This could lead to the desirable situation where a client is
receiving a marshalled object stream from an untrusted service, or
vice versa, and both entities obtain trusted bytecode for
unmarshalling from their own preferred trusted code bases, regardless
of the source of the marshalled object stream.
In the worst case, code could be obtained from an untrusted code
base. That bytecode would not be able to access any methods that had
not been declared as required dependencies by the code base, and the
declared methods would also be vetted against the local security
policy. The code might run with degraded functionality, but it would
not violate the local security and namespace visibility policy,
because unpermitted methods would not be visible in the untrusted
package's class loader.
However I've deliberately left out a scenario:
Interoperability between trusted and untrusted code?
What about untrusted application code interacting with trusted
application code? How does one restrict access for untrusted code?
Who is responsible for determining what methods should be accessible
by default for application packages? The package might not exist in
the local JVM at load time; it may be downloaded later.
The onus in this case would have to be placed upon the trusted
application package distributor (as trusted by the code base) who may
at their discretion, change what methods untrusted code can safely
have access to. Hence there will need to be a means for the code base
to allow and provide name space visibility policies for application
code also. Determining trust is left to the client. An unknown third
party may become trusted by a client, if that party is trusted by a
trusted code base. A friend of a friend so to speak.
Perhaps trusted code should be limited to the codebase's declared
visibility requirements as an additional precaution, which would
also help identify bugs in the analysis. Perhaps different namespace
visibility policies could be developed for different trusted codebase
entities/identities. I'm not sure if this is an essential
requirement, but the implementation could be made extensible so as
not to exclude the possibility.
One other point:
Class load time delays caused by bytecode verification; perhaps
bytecode verification could be performed by the trusted code base,
eliminating the need to verify remote code, improving load time
response. Local code is not verified at load time by default. In this
case an administrator would trust their code bases and would not
under any circumstance allow bytecode to be utilised from untrusted
sources. But then with the New Verifier in Java SE 6 as a result of
JSR202... perhaps verification time has been mitigated somewhat?
Anyone have any input or implementation suggestions?
Regards,
Peter.
Peter Firmstone wrote:
Look forward to it mate,
N.B. this line should read:
* Codebase surrogates, for objects originating from periodically
disconnected services for clients to obtain their bytecode (they
also require Refreshable References and Xuids)
Cheers,
Peter.
Gregg Wonderly wrote:
Peter, I want to write up some questions and thoughts about this
post, but can't do that right now, hopefully I can in a day or so.
Gregg Wonderly
Peter Firmstone wrote:
I've had some more thoughts on Codebase services after spending
time researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase
services:
* Codebase loss
* Codebase replication
* Codebase upgrades
* Codebase configuration
* Codebase surrogates, for objects originating from periodically
disconnected clients (they also require Refreshable References and
Xuids)
* Bytecode Dependency Analysis & API signature identification, for
Package & Class Binary Compatibility & ClassLoader Isolation
* Bytecode Static Security Analysis, repackaging & code signing.
On the last issue I've had some thoughts about Code bases being
able to act as a trust mediator to receive, analyse, repackage,
sign and forward bytecode on behalf of clients. The last two items
above fit into the category of Bytecode Analysis service
responsibilities for codebases. Prior to loading class files, a
client can have a trust relationship with one or more preferred
codebase providers. A code base provider also provides bytecode
static analysis services for security and binary compatibility
purposes.
I got thinking about this solution after reading about the service
proxy circular code verification issues for disconnected clients
that Project Neuromancer exposed: a surrogate security verifier as
well as a codebase surrogate.
All this would be implemented with minimal changes to service and
client configurations and no change to third party library code,
unlike my evolving objects framework proposals.
After receiving a tip off from Michael Warres, Tim Blackman was
gracious enough to share learnings from his research on class
loader trees. Tim built a prototype system using message digests
and was considering implementing textual class API signatures for
identifying compatibility between different class bytecode. Tim
considered the textual API signatures when he found that independent
vendor compiler optimisations produced different bytecode, hence
different SHA-1 digests, for classes with identical and compatible
class API. I thought about this further and realised that binary
compatibility for class files and package changes is far more
flexible than source code compatibility. While Tim concentrated on
API compatibility for ensuring objects that should be shared could
be, he found that groups of class files, based on dependency
analysis (this is where the replacement ClassDep code came from),
required their own ClassLoaders, hence a significant number of
class loader instances are required for maximum compatibility
(without going into more detail).
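To make the textual API signature idea concrete, here is a minimal sketch
that is not Tim's prototype. It lists the public and protected members of a
class as sorted text and digests that text instead of the raw bytecode, so
compiler-specific differences in method bodies don't change the result.
For brevity it uses reflection, which means the class is loaded; a codebase
server would instead extract the same information by static analysis (e.g.
with ASM) without loading untrusted code. ApiSignature and SampleV1 are
hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;

class ApiSignature {
    private static boolean exported(int modifiers) {
        return Modifier.isPublic(modifiers) || Modifier.isProtected(modifiers);
    }

    // Sorted textual description of the public/protected members that the
    // class itself declares (inherited members are not included).
    static Set<String> describe(Class<?> type) {
        Set<String> lines = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (exported(m.getModifiers()) && !m.isSynthetic()) {
                lines.add("method " + m);
            }
        }
        for (Field f : type.getDeclaredFields()) {
            if (exported(f.getModifiers()) && !f.isSynthetic()) {
                lines.add("field " + f);
            }
        }
        return lines;
    }

    // SHA-1 over the textual description, rendered as lowercase hex.
    static String sha1Hex(Set<String> lines) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (String line : lines) {
                md.update(line.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical class to describe: only size() is part of its exported API.
class SampleV1 {
    public int size() { return 0; }
    private void helper() { }
}
```

Two builds of SampleV1 from different compilers would give the same digest
as long as the exported members match, which is the property wanted here.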
In essence, the solution I'm striving for, is to solve the problem
in a distributed world that OSGi solves in the JVM; segregation
and isolation of incompatibility while allowing compatible
implementations to cooperate. However I want an implementation
without commitment to any particular container or module
technology, so as not to force container implementation choices on
projects that already have their specific container implementations.
Rather than reinventing another container technology, all jar
files a service's client requires, could be uploaded to codebase
services, just prior to service registration. The codebase service
could analyse, repackage and sign the jar files into compatible
bundles, dynamic containers if you wish, one for each ClassLoader,
where each class loader represents a Package API group signature.
Using the uploaded jar files, the codebase services could generate
and propagate analysis reports amongst themselves in a p2p
fashion, so that between them they could determine the latest
binary compatible version of a package, which would always be
preferred. Once the latest version is identified, a codebase
service can verify it with its own analysis, in order to detect
and report malicious or malfunctioning codebase servers. Newer
versions of a Package,
found to have broken Binary Backward compatibility, would be kept
in a separate ClassLoader as determined by their API signature,
thus incompatibility is isolated. There may be subgroups within a
package, that could also be shared between incompatible package
versions to provide improved class file and object sharing.
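One way to picture "one ClassLoader per package API signature" is a
registry keyed by the signature. The sketch below is an assumption-laden
illustration: SignatureLoaderRegistry is a hypothetical name, the signature
is treated as an opaque string, and the jar URLs are supplied by the
caller.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

class SignatureLoaderRegistry {
    private final Map<String, ClassLoader> loaders = new HashMap<>();
    private final ClassLoader parent;

    SignatureLoaderRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    // Packages with the same API signature share one loader, so their
    // classes and objects can be shared; a different signature gets its
    // own loader, which isolates the incompatible version.
    synchronized ClassLoader loaderFor(String apiSignature, URL[] jars) {
        ClassLoader loader = loaders.get(apiSignature);
        if (loader == null) {
            loader = new URLClassLoader(jars, parent);
            loaders.put(apiSignature, loader);
        }
        return loader;
    }
}
```

The first set of jars registered for a signature wins in this sketch;
choosing the latest compatible version, as described above, would need
extra bookkeeping that is left out.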
Hence a client receiving bytecode could choose to channel it
through one or more codebase servers that it has trust
relationships with. Acting as a bytecode trust surrogate, the
preferred codebase server could retrieve required bytecode that it
doesn't already possess via lookup services, from other codebase
service locations. The bytecode recipient would retrieve analysis
information detailing bytecode implementation security concerns
prior to loading any bytecode. The codebase server would not
execute any untrusted bytecode itself, only perform analysis using
the ASM library. The aim would be for a codebase server to be as
secure as possible, so that it can be considered trustworthy and
as impervious to attack as possible (existing denial of service
attack strategies require consideration). One could even perform
tests on codebases, by uploading deliberately malicious code and
checking the resulting analysis reports, or by occasionally
confirming the analysis reports with other codebases or with a
local codebase analysis process. Separation of concerns.
Codebase Services would only be required to maintain a copy of the
evolution bloodline for the latest binary backward compatible
package. A package fork or breaking of backward compatibility
would mean storing a copy of both of the latest divergent
compatibility signatures, again some unchanged class subgroups may
be shared between them. Java bytecode versions (compiler specific)
would also dictate which package version could be used safely in
local JVMs.
Clients of services will have to accept a certain amount of
downtime. Once a particular instance of a package's classes is
loaded into a classloader, no other compatible implementation of
that package can be loaded; this is only a problem for long lived
service client processes. Object state will need to be persisted
while the JVM restarts and reloads new bytecode (Serializable is
also part of class API). This is due to the inability of an
existing ClassLoader to reload classes (Java debug excluded).
Backward binary compatibility doesn't necessarily imply forward
compatibility: classes and interfaces can add methods without
breaking compatibility with pre-existing binaries, access can
become more permissive, and abstract methods can become non
abstract. Even though some of these changes break source code
compatibility, old clients aren't aware of the new methods and
don't execute them. For specifics see Chapter 13, Binary
Compatibility, of the Java Language Specification, 3rd Edition;
this is what I plan to base the compatibility analysis upon.
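As a first approximation only, the "additions don't break old binaries"
rule can be sketched as a subset test over member signatures. This is a
deliberate simplification of JLS Chapter 13, which has many more cases. It
assumes signatures are strings of member name plus descriptor with no
modifiers, so that widening access or dropping abstract leaves the string
unchanged. BinaryCompat is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BinaryCompat {
    // The new version is treated as backward compatible if every member
    // the old binaries could link against still exists in it.
    static boolean isBackwardCompatible(Set<String> oldApi,
                                        Set<String> newApi) {
        return newApi.containsAll(oldApi);
    }

    public static void main(String[] args) {
        Set<String> v1 = new HashSet<>(Arrays.asList("size()I"));
        Set<String> v2 = new HashSet<>(Arrays.asList("size()I", "clear()V"));
        // v2 only adds a method, so old clients of v1 still link.
        System.out.println(isBackwardCompatible(v1, v2));
        // The reverse fails: clients built against v2 may call clear().
        System.out.println(isBackwardCompatible(v2, v1));
    }
}
```

The asymmetry between the two calls is the backward versus forward
distinction made above.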
It would also be possible for services to utilise codebase servers
in their classpath.
These issues I propose tackling are not simple obstacles, nor will
they be easy to implement; some issues may even be intractable,
but what the hell, who's with me? That's why we got into this in
the first place, isn't it? The challenge! Project Neuromancer
highlighted areas for improvement, if we address some of these, I
believe that River can become the much vaunted and dreamt of
semantic web.
I want problems identified so solutions can be devised; let's see
objections & supporting logic or better ideas.
Cheers,
Peter.