Re: Moving River into the Semantic Web with Codebase Services & Bytecode Analysis services.

Peter Firmstone Sat, 19 Sep 2009 22:30:24 -0700

In the paper below, Mirrors are metadata objects that represent the class file API, i.e. method signatures, field signatures...

The author states they use BCEL to perform the static bytecode analysis; we're using ASM. If a code base is trusted by a client, the overhead that ISOMOD (a modular isolation framework) incurs during class loading could be almost entirely avoided by analysing the bytecode and constructing the mirror objects at the code base.

This would allow a code base service to upload untrusted bytecode, analyse it, produce the mirror object trees and provide these to a trusting client; the client would need to be aware that the code was not fully trusted. Trusted package authors uploading jar files to a codebase would need to create access policies for untrusted code / packages, to be enforced by clients during class loading and class name resolution.

I suspect the authors use inheritance to provide a "view" of the original class file, passing the "view" during class resolution. This imposes limits on what methods can be hidden from another namespace:


   * Classes must be public or package-private.
   * Classes must be non-final
   * Methods must be public, package-private or protected
   * Methods must be non-final

A class "view" is only required for hiding individual methods, not for hiding an entire class.
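As a rough illustration of the constraints listed above, here is a minimal sketch that tests them with plain reflection. ViewConstraints and its method names are hypothetical and are not taken from the paper or from River. The static check is my own addition, on the grounds that static methods cannot be overridden.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical helper (not from the paper): tests the constraints listed above. */
final class ViewConstraints {
    private ViewConstraints() {}

    /** True if a subclass "view" of c could exist: non-final, and public or package-private. */
    static boolean classAllowsView(Class<?> c) {
        int mod = c.getModifiers();
        // Nested classes may be private or protected; top-level classes never are.
        return !Modifier.isFinal(mod)
            && !Modifier.isPrivate(mod)
            && !Modifier.isProtected(mod);
    }

    /** True if a subclass "view" could override m in order to hide it. */
    static boolean methodCanBeHidden(Method m) {
        int mod = m.getModifiers();
        return classAllowsView(m.getDeclaringClass())
            && !Modifier.isPrivate(mod)  // public, package-private or protected
            && !Modifier.isFinal(mod)    // final methods cannot be overridden
            && !Modifier.isStatic(mod);  // assumption added here: statics are never overridden
    }
}
```

For example, classAllowsView(String.class) is false because String is final, so no inheritance based view of it could be built.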

Trying to create access policies for working with untrusted code base services doesn't seem to be worthwhile. It would increase the local class loading overhead, since analysis would have to be performed locally by the client. It would probably be best to simply revert to the applet model of isolation.

I think initially I'll try implementing policies without constraints on method visibility; that can wait until later.


Peter Firmstone wrote:

The comments on security below are based on this research paper:

http://pages.cpsc.ucalgary.ca/~pwlfong/Pub/fong-orr-2009-manuscript.pdf
I can see how I could restrict access by making certain classes invisible to untrusted code using a class loader framework, however finer grained access control applied to methods looks rather difficult to implement. N.B. I tried contacting the paper's author, without luck.
I'm trying to figure out what the alternative, restricted view of the entity could be. It can't be a reflective proxy, since that requires interfaces. Could it be an overriding copy of the class, created using reflection, where the restricted methods are overridden to hide the original method? Using polymorphism these would work in place of the original class, but it wouldn't work for final classes or public fields, so it's not ideal. Google-guice does something similar to this, however the paper above criticizes this approach as adding runtime overheads.
Anyone have any ideas?
N.B. I'm not getting much time near my dev workstation (Ultra 80, Solaris 10) right now, so haven't done anything about the River AR2 release, but will get there. Most of the information and my thoughts here have been collected while on the road. Note for anyone wondering about my health, I'm receiving treatment for a non-malignant brain tumor, it has shrunk by 2mm. Don't let that worry you about offending me with comments or questions however, I could use some assistance ;)
Cheers,

Peter.

The guts of the paper on page 5 reads:
In a dynamically extensible software system, the trusted application core is defined in a parent namespace, while child namespaces are created for defining untrusted software extensions (Figure 1). Core application services are exposed to the extension code by implicitly importing names from the core application namespace to the extension namespace. ISOMOD is a run-time module system designed for isolating untrusted software extensions. It does so by controlling the visibility of names in the namespaces in which untrusted software extensions reside. Specifically, an ISOMOD namespace enforces two kinds of control: (1) restricting the visibility of names that are imported from the parent namespace, and (2) restricting the visibility of locally defined names. When a name is placed under visibility control, an ISOMOD namespace may (a) control which locally defined class can “see” the name, and (b) present an alternative, restricted view of the entity to which the name is bound. Every ISOMOD name space is endowed with a custom name visibility policy, which specifies visibility restrictions to be imposed on the names visible in the namespace. When appropriately constructed, an ISOMOD policy may be used to selectively hide core application services from untrusted extensions (Section 4.1 and 4.2), or impose collaboration protocols among classes defined in the extension namespace (Section 4.3). A major contribution of this work is the design of a policy language that can express a rich family of access control policies as fine-grained visibility constraints.
An ISOMOD namespace is an instance of a user-defined class loader class. An ISOMOD class loader performs extra checks on a classfile before converting it into a Class object. Specifically, class definition is only authorized when no external accesses in the classfile are denied by the policy. This late enforcement (i.e., load time) of visibility control distinguishes ISOMOD from traditional module systems, in which visibility control is enforced only at compile time. It is this feature that makes the ISOMOD module system into a viable protection mechanism.
An ISOMOD namespace may be constructed at run-time by an application core from an ISOMOD policy. This late binding of access control policy to code not only supports the separate maintenance of code and policy, but also supports the presentation of different views of the same application core to different extensions.
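To make control (1) in the quoted passage concrete, here is a minimal sketch of a child class loader that hides selected parent names from whatever resolves classes through it. HidingClassLoader is a hypothetical name and this is not ISOMOD's implementation: it only covers hiding whole classes by name, it performs none of the classfile checks the paper describes, and it does not define any classes itself.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical child namespace: hides selected parent class names (whole classes only). */
class HidingClassLoader extends ClassLoader {
    private final Set<String> hiddenNames;

    HidingClassLoader(ClassLoader parent, Set<String> hiddenNames) {
        super(parent);
        this.hiddenNames = new HashSet<>(hiddenNames); // defensive copy of the policy
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Refuse hidden names before delegating, so the parent is never consulted for them.
        if (hiddenNames.contains(name)) {
            throw new ClassNotFoundException(name + " is not visible in this namespace");
        }
        return super.loadClass(name, resolve);
    }
}
```

A loader built with java.lang.Runtime in its hidden set would throw ClassNotFoundException for that name while still resolving java.lang.String through the parent. Method level hiding needs more than this, which is the difficulty raised above.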

Peter Firmstone wrote:
Some Implementation design thoughts on Security:
Security by namespace visibility and trust within package ClassLoaders?
If each package is segregated into its own class loader and all dependencies required by that package have been determined by codebase analysis, then visibility should be limited to the classes and methods discovered by the codebase server analysis and enforced at class loading time. A local namespace visibility policy (more fine-grained than Java security policies) might contain a list of allowable system methods for code originating from untrusted entities (even though the code base is trusted and the code has been analysed). Any method signatures in the downloaded code that didn't appear in the list as allowable would not be granted visibility. A default working set could be created for distribution with River, with all disallowed methods commented out.
Then in the worst case of trust, where neither the code base nor the origin of the code is trusted, the required dependencies and methods declared by the code base analysis are only allowed if they are also allowed locally. So if a code base were to submit code with undisclosed methods, those methods would not be accessible to the untrusted code. The dependency analysis information provided by the code base forms a contract between untrusted parties.
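The rule in the two paragraphs above amounts to a set intersection, which can be sketched as follows. VisibilityPolicy and the signature strings used with it are hypothetical; the sketch assumes signatures are compared as plain strings and that something else (for example an ASM based analysis) has already extracted the signatures the bytecode references.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical local namespace visibility policy, keyed on textual member signatures. */
final class VisibilityPolicy {
    private final Set<String> locallyAllowed;

    VisibilityPolicy(Set<String> locallyAllowed) {
        this.locallyAllowed = new TreeSet<>(locallyAllowed);
    }

    /** Signatures granted visibility: declared by the code base AND allowed locally. */
    Set<String> visible(Set<String> declaredByCodebase) {
        Set<String> result = new TreeSet<>(declaredByCodebase);
        result.retainAll(locallyAllowed);
        return result;
    }

    /** True only if every signature the bytecode references is both declared and allowed. */
    boolean permits(Set<String> declaredByCodebase, Set<String> referenced) {
        return visible(declaredByCodebase).containsAll(referenced);
    }
}
```

An undisclosed method fails permits() even when it appears in the local list, which is what makes the declared dependency report act as a contract.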
Consider the following:

1. Code base A is trusted and has obtained its code from another
trusted entity (whoever uploaded the code to the code base server
in the first place).
2. Code base B is untrusted.
3. Code base A is trusted and has obtained some code from Code base B
which is untrusted.
4. Trusted and Untrusted code will be loaded into separate class
loaders by a client JVM.
Note: my references to methods mean those with protected or public visibility; the terminology may be freely interchanged with fields that are public or protected also.
Code base A could bundle and sign the trusted code, and bundle without signing the untrusted code after analysis (where bundle means splitting an existing jar into multiple jars after analysis, one for each package).
The client would receive a dependency analysis report from Code base A; the client would restrict the visibility of the untrusted code to a subset of declared methods that are allowed.
Code base A might later receive trusted code that is API compatible with that of the untrusted code; this would be discovered by analysis. From then on, Code base A would be able to provide trusted code to its trusting clients when required.
This could lead to the desirable situation where a client is receiving a marshalled object stream from an untrusted service or vice versa; both entities could obtain trusted bytecode for unmarshalling from their own preferred trusted code bases, regardless of the source of the marshalled object stream.
In the worst case, code could be obtained from an untrusted code base, however that bytecode would not be able to access any methods that had not been declared as required dependencies by the code base; the declared methods would also be vetted against the local security policy. In the worst case the code would be available with degraded functionality, but would not violate the local security and namespace visibility policy; unpermitted methods would not be visible in the untrusted package's class loader.
However I've deliberately left out a scenario:

Interoperability between trusted and untrusted code?
What about untrusted application code interacting with trusted application code? How does one restrict access for untrusted code? Who is responsible for determining what methods should be accessible by default, for application packages? The package might not exist in the local JVM at load time; it may be downloaded later.
The onus in this case would have to be placed upon the trusted application package distributor (as trusted by the code base) who may at their discretion change what methods untrusted code can safely have access to. Hence there will need to be a means for the code base to allow and provide namespace visibility policies for application code also. Determining trust is left to the client. An unknown third party may become trusted by a client if that party is trusted by a trusted code base. A friend of a friend, so to speak.
Perhaps trusted code should be limited to the codebase's declared visibility requirements as an additional precaution, assisting with analysis bug identification too. Perhaps different namespace visibility policies could be developed for different trusted codebase entities/identities. I'm not sure if this is an essential requirement, however the implementation could be made extensible so as not to exclude the possibility.
One other point:
Class load time delays caused by bytecode verification: perhaps bytecode verification could be performed by the trusted code base, eliminating the need to verify remote code and improving load time response. Local code is not verified at load time by default. In this case an administrator would trust their code bases and would not under any circumstance allow bytecode to be utilised from untrusted sources. But then with the new verifier in Java SE 6 as a result of JSR 202... perhaps verification time has been mitigated somewhat?
Anyone have any input or implementation suggestions?

Regards,

Peter.


Peter Firmstone wrote:
Look forward to it mate,

N.B. this line should read:

* Codebase surrogates, for objects originating from periodically
disconnected services for clients to obtain their bytecode (they
also require Refreshable References and Xuid's)

Cheers,

Peter.


Gregg Wonderly wrote:
Peter, I want to write up some questions and thoughts about this post, but can't do that right now; hopefully I can in a day or so.
Gregg Wonderly

Peter Firmstone wrote:
I've had some more thoughts on Codebase services after spending time researching & reflecting.
Issues I'd like to see addressed or simplified using Codebase services:
* Codebase loss
* Codebase replication
* Codebase upgrades
* Codebase configuration
* Codebase surrogates, for objects originating from periodically
disconnected clients (they also require Refreshable References and
Xuid's)
* Bytecode Dependency Analysis & API signature identification, for
Package & Class Binary Compatibility & ClassLoader Isolation
* Bytecode Static Security Analysis, repackaging & code signing.
On the last issue I've had some thoughts about code bases being able to act as a trust mediator to receive, analyse, repackage, sign and forward bytecode on behalf of clients. The last two items above fit into the category of Bytecode Analysis service responsibilities for codebases. Prior to loading class files, a client can have a trust relationship with one or more preferred codebase providers. A code base provider also provides bytecode static analysis services for security and binary compatibility purposes.
I got thinking about this solution after reading about the service proxy circular code verification issues for disconnected clients that Project Neuromancer exposed: a surrogate security verifier as well as a codebase surrogate.
All this would be implemented with minimal changes to service and client configurations and no change to third party library code, unlike my evolving objects framework proposals.
After receiving a tip off from Michael Warres, Tim Blackman was gracious enough to share learnings from his research on class loader trees. Tim built a prototype system using message digests and was considering implementing textual Class API signatures for identifying compatibility between different versions of a class's bytecode. Tim considered the textual API signatures when he found that independent vendor compiler optimisations produced different bytecode, hence different SHA-1 signatures, for classes with identical and compatible APIs. I thought about this further and realised that binary compatibility for class files and package change is far more flexible than source code compatibility. While Tim concentrated on API compatibility for ensuring objects that should be shared could be, he found that groups of class files, based on dependency analysis (this is where the replacement ClassDep code came from), required their own ClassLoaders, hence a significant number of class loader instances are required for maximum compatibility (without going into more detail).
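The difference between digesting bytecode and digesting a textual API signature can be sketched as below. This is not Tim's prototype and it is not the ASM based analysis discussed in this thread: it uses reflection on an already loaded class purely to stay self-contained, and ApiSignature, its line format and its choice of members (declared public and protected methods and fields only) are all assumptions made for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical textual API signature: sorted lines describing public/protected members. */
final class ApiSignature {
    private ApiSignature() {}

    static SortedSet<String> of(Class<?> c) {
        SortedSet<String> lines = new TreeSet<>(); // sorted, so member order never matters
        lines.add("class " + c.getName());
        for (Method m : c.getDeclaredMethods()) {
            if (!isApi(m.getModifiers()) || m.isSynthetic()) continue;
            StringBuilder sb = new StringBuilder("method " + m.getName() + "(");
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(params[i].getName());
            }
            lines.add(sb.append(')').append(m.getReturnType().getName()).toString());
        }
        for (Field f : c.getDeclaredFields()) {
            if (isApi(f.getModifiers()) && !f.isSynthetic()) {
                lines.add("field " + f.getName() + ":" + f.getType().getName());
            }
        }
        return lines;
    }

    /** SHA-1 over the textual signature rather than over the bytecode itself. */
    static String digest(Class<?> c) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (String line : of(c)) {
                md.update((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a required JRE algorithm
        }
    }

    private static boolean isApi(int mod) {
        return Modifier.isPublic(mod) || Modifier.isProtected(mod);
    }
}
```

Because method bodies never enter the digest, two compilers that emit different instructions for the same source would still yield the same value here. A real signature would also need constructors, superclass, interfaces and modifiers.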
In essence, the solution I'm striving for is to solve the problem in a distributed world that OSGi solves in the JVM: segregation and isolation of incompatibility while allowing compatible implementations to cooperate. However I want an implementation without commitment to any particular container or module technology, so as not to force container implementation choices on projects that already have their specific container implementations.
Rather than reinventing another container technology, all jar files a service's client requires could be uploaded to codebase services just prior to service registration. The codebase service could analyse, repackage and sign the jar files into compatible bundles, dynamic containers if you wish, one for each ClassLoader, where each class loader represents a Package API group signature.
Using the uploaded jar files, the codebase services could generate and propagate analysis reports amongst themselves in a p2p fashion, such that between them they could determine the latest binary compatible version of a package, which would always be preferred. Once the latest version is identified, a codebase service can verify it with its own analysis, in order to confirm and report malicious or malfunctioning codebase servers. Newer versions of a package found to have broken binary backward compatibility would be kept in a separate ClassLoader as determined by their API signature, thus incompatibility is isolated. There may be subgroups within a package that could also be shared between incompatible package versions to provide improved class file and object sharing.
Hence a client receiving bytecode could choose to channel it through one or more codebase servers that it has trust relationships with. Acting as a bytecode trust surrogate, the preferred codebase server could retrieve required bytecode that it doesn't already possess via lookup services of other codebase service locations. The bytecode recipient would retrieve analysis information detailing bytecode implementation security concerns prior to loading any bytecode. The codebase server would not execute any untrusted bytecode itself, only perform analysis using the ASM library. The aim would be that a codebase server is as secure as possible, such that it can be considered trustworthy and as impervious to attack as possible (existing denial of service attack strategies require consideration). One could even perform tests on codebases, by uploading deliberately malicious code and checking the resulting analysis reports, or by occasionally confirming the analysis reports with other codebases or using a local codebase analysis process. Separation of concerns.
Codebase services would only be required to maintain a copy of the evolution bloodline for the latest binary backward compatible package. A package fork or breaking of backward compatibility would mean storing a copy of both of the latest divergent compatibility signatures; again some unchanged class subgroups may be shared between them. Java bytecode versions (compiler specific) would also dictate which package version could be used safely in local JVMs.
Clients of services will have to accept a certain amount of downtime: once a particular instance of a package's classes is loaded into a class loader, no other compatible implementations of that package can be loaded. This is only a problem for long-lived service client processes. Object state will need to be persisted while the JVM restarts and reloads new bytecode (Serializable is also part of class API). This is due to the inability of an existing ClassLoader to reload classes (Java debug excluded). Backward binary compatibility doesn't necessarily imply forward compatibility: classes and interfaces can add methods without breaking compatibility with pre-existing binaries, member access can become more permissive, and abstract methods can become non-abstract. Even though some of these changes break source code compatibility, old clients aren't aware of the new methods and don't execute them. For specifics see Chapter 13, Binary Compatibility, of the Java Language Specification, 3rd Edition; this is what I plan to base the compatibility analysis upon.
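The asymmetry between backward and forward compatibility can be shown with a deliberately simplified sketch that treats an API as a set of textual member signatures. BinaryCompat is a hypothetical name, and the subset test is only a first approximation: Chapter 13 of the JLS has many rules (modifier changes, superclass changes, field types and so on) that this ignores.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, much simplified compatibility check over textual member signatures. */
final class BinaryCompat {
    private BinaryCompat() {}

    /** Members that old binaries may link against but the new version no longer has. */
    static Set<String> removedMembers(Set<String> oldApi, Set<String> newApi) {
        Set<String> removed = new TreeSet<>(oldApi);
        removed.removeAll(newApi);
        return removed;
    }

    /** First approximation: nothing old clients could reference has disappeared. */
    static boolean isBackwardCompatible(Set<String> oldApi, Set<String> newApi) {
        return removedMembers(oldApi, newApi).isEmpty();
    }
}
```

If version 2 only adds a method, isBackwardCompatible(v1, v2) holds while isBackwardCompatible(v2, v1) does not: old clients run against the new package, but new clients cannot run against the old one.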
It would also be possible for services to utilise codebase servers in their classpath.
These issues I propose tackling are not simple obstacles, nor will they be easy to implement, and some issues may even be intractable, but what the hell, who's with me? That's why we got into this in the first place, isn't it? The challenge! Project Neuromancer highlighted areas for improvement; if we address some of these, I believe that River can become the much vaunted and dreamt of semantic web.
I want problems identified so solutions can be devised; let's see objections & supporting logic or better ideas.
Cheers,

Peter.
