Re: [jruby-dev] Serialization and Persistence

Alan McKean Tue, 03 Jul 2007 18:53:05 -0700

Thanks for taking the time to think this through. Let me make sure I understand. What you are proposing would decouple the runtime from the objects but still make it accessible via the ClassLoader. That seems to be a major part of the problem, but doesn't it still leave the metaclass hierarchy coupled to the object? If we persist (or serialize) the object, we would want to replace the metaclass reference with enough information to be able to restore the metaclass reference when the object is paged back in from disk (or deserialized). I was thinking that we had to decouple the runtime so that we wouldn't be persisting things like threads and decouple the metaclass so that we wouldn't be storing metaclass hierarchy and all of the method dictionaries. Of course, storing the classes and methods would open the door to having a distributed, transactional IDE, so maybe we would want to flush that to the persistent store, too. But it seems like that would not be what we want for serialization.
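To make the metaclass question concrete, here is a minimal sketch of storing only a name in place of the metaclass reference and reconnecting lazily after deserialization. All of the types here (MetaClass, MetaClassRegistry, PersistentObject) are hypothetical stand-ins, not JRuby classes, and the static registry is only an assumption standing in for wherever the metaclasses are actually rooted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Ruby metaclass; deliberately NOT Serializable.
final class MetaClass {
    final String name;
    MetaClass(String name) { this.name = name; }
}

// Hypothetical registry standing in for "wherever the metaclasses are rooted".
final class MetaClassRegistry {
    private static final Map<String, MetaClass> BY_NAME = new ConcurrentHashMap<>();

    static MetaClass lookup(String name) {
        return BY_NAME.computeIfAbsent(name, MetaClass::new);
    }
}

// Persists only the metaclass name, never the metaclass or its method tables.
class PersistentObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String metaClassName;     // written to the stream
    private transient MetaClass metaClass;  // skipped by default serialization
    private final String payload;

    PersistentObject(MetaClass metaClass, String payload) {
        this.metaClass = metaClass;
        this.metaClassName = metaClass.name;
        this.payload = payload;
    }

    // Lazily reconnect to the metaclass on first use after deserialization.
    MetaClass getMetaClass() {
        if (metaClass == null) {
            metaClass = MetaClassRegistry.lookup(metaClassName);
        }
        return metaClass;
    }

    String getPayload() { return payload; }

    // Serialize and deserialize in memory, purely to demonstrate the idea.
    static PersistentObject roundTrip(PersistentObject obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PersistentObject) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because MetaClass is not Serializable, the round trip would fail if the field were not transient, so the sketch also shows that nothing but the name is dragged along. Whether a plain name is enough to identify a Ruby metaclass (singleton classes, reopened classes) is exactly the open question.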


Comments?


On Jul 3, 2007, at 12:33 AM, Charles Oliver Nutter wrote:

Alan McKean wrote:
Since the lack of Java serialization of JRuby objects stops us dead in our tracks when trying to hook up our persistence engine, I am interested in either getting someone on this end to work on it or jumping in myself. In either case, I need some background on the JRuby runtime architecture and some guidance on particular issues. The issues are about how to detach an object from its runtime elements and how to restore them when the object gets reloaded into memory:
1) When we first tried saving a JRuby object to our database, we saw it drag along a gaggle of runtime objects. Given that it might be loaded into a different VM when it is brought in from the database, is reconnecting the object to a particular runtime important? If so, is there a way of determining which of the available runtimes would be best to connect it to?
2) Detaching an object from its 'runtime' variable and making the 'metaclass' variable transient lets us store the object in our database without dragging much else along. But we need to reconnect things when the object is reloaded into memory. Is there a canonical name for the metaclass that we could store in the database along with the instance? If not, what information is available for reconnecting? We persist type information in our Java product by storing the fully-qualified name of the class with the object, then lazily loading and initializing the connection (using the name) when we reload the object to memory. Will this work in JRuby?
If someone has thought through a strategy for deserializing a JRuby object and restoring its connections to its runtime, I would love to hear about it.

I've been looking into this a bit tonight. This email represents me rambling.
Marking metaclass and finalizer as transient are no-brainers. I'm going to go ahead and commit that.
I'm going to see what would be needed to remove getRuntime everywhere it's needed. It would be a big job...be back in a few minutes...
...ok I'm back. I think it's doable. Here's more rambling thoughts.

The runtime connection is used for a few things:

1. to construct other objects
This is mostly a self-fulfilling prophecy. Objects require a runtime when they're created, so all objects need runtime available to create objects. If we break that chain, a number of places that depend on runtime disappear.
2. to locate classes in order to construct objects
This is a little harder to eliminate. In order to construct a Ruby "String" object, you need to have access to the "String" metaclass. That means having access to the place where the "String" metaclass is stored, currently in the runtime. Again, this is largely self-fulfilling; you need access to a metaclass to construct an object, so you need to locate the metaclass, and since the metaclasses are currently rooted in the runtime, you need the runtime. But the runtime dependency is largely peripheral to the use case.
3. to access runtime-global and thread-local data at execution time
This is probably the hardest to eliminate. Every thread Ruby code creates or encounters is associated with a ThreadContext, which contains extra thread-local state needed for executing Ruby code. Every external thread that touches a given runtime is "adopted" and given a ThreadContext and a Ruby "Thread" avatar to represent it. So a given Java thread may have many Ruby "Thread" and "ThreadContext" objects associated with it, one per runtime it has touched. This allows us to share threads across runtimes, rather than having a given thread bound to a given runtime exclusively, as in many other JVM languages. But it also requires that we locate the runtime, and therefore the ThreadContext, in a different way. Therefore, we have the runtime dependency.
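As a rough illustration of the "one context per runtime per thread" arrangement, here is a minimal sketch in which each runtime owns its own ThreadLocal. MiniRuntime and MiniThreadContext are hypothetical names invented for this example; this is not how JRuby's ThreadContext is actually implemented, only one way the adoption idea could look.

```java
// Hypothetical per-thread execution state belonging to exactly one runtime.
final class MiniThreadContext {
    final MiniRuntime runtime;
    final Thread javaThread;

    MiniThreadContext(MiniRuntime runtime, Thread javaThread) {
        this.runtime = runtime;
        this.javaThread = javaThread;
    }
}

// Hypothetical runtime that "adopts" any Java thread that touches it.
final class MiniRuntime {
    final String name;

    // One ThreadLocal per runtime instance, so a context is keyed by (runtime, thread).
    private final ThreadLocal<MiniThreadContext> contexts =
            ThreadLocal.withInitial(() -> new MiniThreadContext(this, Thread.currentThread()));

    MiniRuntime(String name) { this.name = name; }

    // The first call on a given thread creates the context (adoption); later calls reuse it.
    MiniThreadContext getCurrentContext() {
        return contexts.get();
    }
}
```

The point of the sketch is that the lookup starts from the runtime: without a reference to the right MiniRuntime there is no way to reach the right context, which is the dependency described above.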
This essentially sums up all the major reasons why we have so many dependencies in code on access to a runtime object. And ultimately, requiring access to a runtime object obliterates the possibility of third-party manipulation and transport of Ruby objects.
So to summarize, the three actual reasons we depend on runtime being present are as follows:
1. to access and maintain types associated with a specific ruby worldspace
2. to access and maintain state associated with a specific ruby worldspace
3. to provide execution state and primitives for code running in a specific ruby worldspace
Now let's rewrite the list by substituting in a different concept for our top-level ruby worldspace:
1. to access and maintain types associated with a specific ClassLoader
2. to access and maintain state associated with a specific ClassLoader
3. to provide execution state and primitives for code running in a specific ClassLoader
So let's examine how we'd solve these issues.
First off, IRubyObject.getRuntime(). Let's assume that the classloader that loads the Ruby class is our chosen, ultimate worldspace:
public Ruby getRuntime() {
  // Assumes Ruby.class was loaded by the JRubyClassLoader that owns the runtime.
  JRubyClassLoader cl = (JRubyClassLoader) Ruby.class.getClassLoader();
  return cl.getRuntime();
}
Everything else largely falls out of this. Starting up a new instance of JRuby largely becomes the act of constructing the top-level classloader in which it will live and telling it to "go".
JRubyClassLoader cl = new JRubyClassLoader(..., properties);
cl.evalScript("puts 'hello'", "(eval)");
Everything lives underneath the classloader, and since all classes have access to that classloader, all code can retrieve the runtime associated with it.
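A minimal sketch of that lookup, assuming a loader that owns one runtime-like object and code that walks up its loader chain to find it. World and WorldClassLoader are hypothetical names made up for this example; they are not the JRubyClassLoader used in the snippets above, and no script evaluation is attempted here.

```java
// Hypothetical stand-in for the Ruby runtime ("worldspace").
final class World {
    final String label;
    World(String label) { this.label = label; }
}

// Hypothetical top-level loader that owns exactly one World.
final class WorldClassLoader extends ClassLoader {
    private final World world;

    WorldClassLoader(ClassLoader parent, String label) {
        super(parent);
        this.world = new World(label);
    }

    World getWorld() { return world; }

    // Walk up the loader chain from 'start' to the nearest WorldClassLoader.
    static World worldFor(ClassLoader start) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl instanceof WorldClassLoader) {
                return ((WorldClassLoader) cl).getWorld();
            }
        }
        throw new IllegalStateException("no WorldClassLoader in the loader chain");
    }
}
```

One caveat with this approach: under the usual parent-first delegation, a class that is also visible to a shared parent loader is defined by that parent, so its getClassLoader() would not lead back to the owning WorldClassLoader. The scheme only holds if the runtime classes are defined by the top-level loader itself.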
Would something like this work? The view from inside the classloader seems pretty reasonable...we already have this root context and partitioning as part of Java's classloader support, and it seems fairly natural to use it. But I'm not well-enough versed in Java serialization to know if this will solve our deserialization issues. It may require you to have more control over the object stream...but of course if you have control over the object stream, you could also just have it ask a specific runtime to unmarshal objects, avoiding the issue completely.
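For the "control over the object stream" option, one possible shape is an ObjectInputStream subclass that carries the target runtime, with each object picking it up in its private readObject hook. This is only a sketch under that assumption: Realm, RealmInputStream and RealmObject are hypothetical names, and Realm stands in for whatever runtime object the stream would be bound to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the runtime an object should be attached to.
final class Realm {
    final String name;
    Realm(String name) { this.name = name; }
}

// An object stream that knows which realm it is unmarshalling into.
final class RealmInputStream extends ObjectInputStream {
    final Realm realm;

    RealmInputStream(InputStream in, Realm realm) throws IOException {
        super(in);
        this.realm = realm;
    }
}

class RealmObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient Realm realm;  // never written to the stream
    private final String value;

    RealmObject(Realm realm, String value) {
        this.realm = realm;
        this.value = value;
    }

    Realm getRealm() { return realm; }
    String getValue() { return value; }

    // Standard serialization hook: restore fields, then attach to the stream's realm.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (!(in instanceof RealmInputStream)) {
            throw new InvalidObjectException("RealmObject must be read through a RealmInputStream");
        }
        realm = ((RealmInputStream) in).realm;
    }

    static byte[] toBytes(RealmObject obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static RealmObject fromBytes(byte[] data, Realm target) {
        try (RealmInputStream in = new RealmInputStream(new ByteArrayInputStream(data), target)) {
            return (RealmObject) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is the one noted above: this only works when the persistence engine lets you supply the stream. A third-party engine that constructs its own ObjectInputStream would hit the InvalidObjectException branch.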
Thoughts? More ideas?

- Charlie


---------------------------------------------------------------------
To unsubscribe from this list please visit:

   http://xircles.codehaus.org/manage_email


