Re: Serialization in libgobject

Andrew Paprocki Wed, 24 May 2006 08:10:09 -0700

I'll outline the implementation of a real-world generic GObject 
serialization/deserialization system that we wrote from scratch to handle these 
issues.


> a) the GType system is not self-contained enough to recursively (de-)serialize
>     nested structures and objects (e.g. POINTER or BOXED types).

We added the notion of GAttributes which can be placed on the class, property, 
or signal level and work almost exactly like .NET attributes. They are lazily 
created only at access time, so an object does not instantiate every attached 
attribute when it is created, only when an external piece of code explicitly 
asks for the attributes. We have custom serialization/deserialization 
attributes that are meant for situations such as POINTER/BOXED where the type 
system itself cannot infer what to do. While this is possible, we generally 
shy away from creating these types of properties.
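
The lazy-creation idea can be sketched in plain C. This is only an 
illustration, not the actual GAttribute API (which is not shown in this 
message): `Attr`, `AttrSlot`, `attr_slot_get` and `attrs_created` are all 
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute payload; stands in for a GAttribute instance. */
typedef struct {
    const char *name;
    int         value;
} Attr;

/* A slot records how to build the attribute but holds no instance yet. */
typedef struct {
    const char *name;
    int         initial_value;
    Attr       *instance;   /* NULL until the attribute is first requested */
} AttrSlot;

int attrs_created = 0;      /* counts real instantiations, for illustration */

/* Return the attribute for a slot, creating it on first access only.
 * Returns NULL if allocation fails. The caller owns the slot and frees
 * slot->instance when the owning class or object goes away. */
Attr *attr_slot_get(AttrSlot *slot)
{
    assert(slot != NULL);
    if (slot->instance == NULL) {
        slot->instance = malloc(sizeof *slot->instance);
        if (slot->instance == NULL)
            return NULL;
        slot->instance->name  = slot->name;
        slot->instance->value = slot->initial_value;
        attrs_created++;
    }
    return slot->instance;
}
```

Declaring an `AttrSlot` costs nothing beyond the slot itself; the `Attr` is 
allocated only when some caller asks for it, and later calls return the same 
instance.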

> b.1) should the target stream contain binary or text representations of the
>       values?

We use XML for representing the serialized data, and routinely 
serialize/deserialize from files/database records. As for transmission on the 
wire, I would suggest some other available tool be used to translate the 
serialized XML into, say, binary XML.
(http://www.w3.org/TR/wbxml/ ?)

> b.2) how are incompatibilities between object types/implementations and the
>       file format going to be handled?

All of our serialized XML data is versioned at the file and the object class 
level. I say object class, because a serialized GtkLabel could be using 
GtkLabel "v1" but GtkWidget "v2". We created a singleton registrar that allows 
classes to register GType+version to/from conversion routines. An API is 
provided so that inside the conversion routines, classes can easily transform 
properties. For example, in a conversion routine to go from "v1" to "v2", I 
could introspect for a property named "foobar", pull it out of the destination, 
and instead insert two separate "foo" and "bar" properties with the appropriate 
values. This system is extremely flexible and allows for correction of any API 
"mistakes" that later need to be fixed. 
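
A minimal sketch of such a registrar in plain C, assuming a flat bag of 
string properties in place of the real introspection API. Every name here 
(`PropBag`, `register_converter`, `upgrade_props`, `label_v1_to_v2`) is 
hypothetical, and the "x,y" encoding of "foobar" is invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS      8
#define MAX_CONVERTERS 8

/* Hypothetical stand-in for the properties parsed from one serialized object. */
typedef struct { char name[32]; char value[32]; } Prop;
typedef struct { Prop items[MAX_PROPS]; int count; } PropBag;

typedef void (*ConvertFn)(PropBag *bag);
typedef struct { const char *type_name; int from_version; ConvertFn fn; } Converter;

/* The "registrar": one global table of (type, from-version) -> converter. */
Converter converter_table[MAX_CONVERTERS];
int converter_count = 0;

int bag_find(const PropBag *bag, const char *name)
{
    for (int i = 0; i < bag->count; i++)
        if (strcmp(bag->items[i].name, name) == 0)
            return i;
    return -1;
}

/* Set a property, appending it if it does not exist yet. */
void bag_set(PropBag *bag, const char *name, const char *value)
{
    int i = bag_find(bag, name);
    if (i < 0) {
        assert(bag->count < MAX_PROPS);
        i = bag->count++;
        snprintf(bag->items[i].name, sizeof bag->items[i].name, "%s", name);
    }
    snprintf(bag->items[i].value, sizeof bag->items[i].value, "%s", value);
}

void bag_remove(PropBag *bag, int index)
{
    assert(index >= 0 && index < bag->count);
    bag->items[index] = bag->items[bag->count - 1];  /* order is not preserved */
    bag->count--;
}

void register_converter(const char *type_name, int from_version, ConvertFn fn)
{
    assert(converter_count < MAX_CONVERTERS);
    converter_table[converter_count++] = (Converter){ type_name, from_version, fn };
}

/* Step the bag from `version` up to `current`, one version at a time.
 * A step with no registered converter is a no-op, so same-named properties
 * simply carry over. Returns the version reached. */
int upgrade_props(const char *type_name, int version, int current, PropBag *bag)
{
    for (; version < current; version++)
        for (int i = 0; i < converter_count; i++)
            if (converter_table[i].from_version == version &&
                strcmp(converter_table[i].type_name, type_name) == 0) {
                converter_table[i].fn(bag);
                break;
            }
    return version;
}

/* Example v1 -> v2 converter: split "foobar" (assumed "x,y") into "foo" and "bar". */
void label_v1_to_v2(PropBag *bag)
{
    char foo[32] = "", bar[32] = "";
    int i = bag_find(bag, "foobar");
    if (i < 0)
        return;
    sscanf(bag->items[i].value, "%31[^,],%31s", foo, bar);
    bag_remove(bag, i);
    bag_set(bag, "foo", foo);
    bag_set(bag, "bar", bar);
}
```

A class would call something like 
`register_converter("GtkLabel", 1, label_v1_to_v2)` once, and the deserializer 
would call `upgrade_props` with the version found in the document before 
applying the properties to the new object.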

Future library releases can advertise "GtkWidget v1 is no longer supported" and 
it should be treated the same as an API change. Objects are responsible for 
maintaining their version number. In the same place in class_init where the 
objects register any necessary converters, that object sets its current 
version. Thus, bumping version numbers is a code change and should be treated 
as an API change. Object-level versions _only_ need to be bumped when an 
incompatible change is being made, and _only_ when converters are necessary. By 
default, no object _needs_ converters because introspection will appropriately 
map everything. 

Our system is lax in the sense that class names can be changed without 
requiring extensive converters. I could simply map "GtkLabel" to "GtkNewLabel", 
change GtkLabel's class name, and bump its version number without providing 
converters, and the deserializer will automatically map properties to ones of 
the same name in the new class if they exist. (Not that this is ever really 
used, but it has proved invaluable in the past for getting out of some sticky 
situations.)

While all this flexibility exists, even class-level versions very rarely bump. 
I believe in the past year, we've only needed to bump 2 class-level versions 
and provide converters. The point is that this is not as scary as it sounds, if 
you are thinking the code will bloat with hundreds of converters all over the 
place.

> b.3) how's file versioning going to be handled?

As I said above, we have a global file-level version as well as object class 
versions. To provide a real-world example, we have been serializing GObjects 
for years now and our file-level version is still 1. IMO, the file-level 
version is only a nice-to-have in case the format is ever going to be moved 
away from XML to a better (?) format in the future.

> b.4) how is data being handled that isn't reflected by the object property
>       interface?

We try to avoid this whenever possible because of the problem it poses to 
deserialization, but we finally bit the bullet and made a system to handle 
this. We have an attribute that we can place on the property level which allows 
us to modify the deserialization order of the properties. Not pretty, but it 
works fine.
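
One way to picture the ordering attribute, as a sketch in plain C: each 
property carries an integer `order` (a hypothetical stand-in for whatever the 
real attribute stores), and the deserializer sorts by it before setting 
anything. The names `OrderedProp` and `sort_for_deserialize` are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int         order;   /* hypothetical attribute value; lower is set earlier */
    int         index;   /* declaration position, filled in by the sorter */
} OrderedProp;

/* Sort by `order`, falling back to declaration position so that properties
 * without an explicit order keep their relative sequence (qsort alone is
 * not guaranteed to be stable). */
int ordered_prop_cmp(const void *pa, const void *pb)
{
    const OrderedProp *a = pa, *b = pb;
    if (a->order != b->order)
        return (a->order > b->order) - (a->order < b->order);
    return (a->index > b->index) - (a->index < b->index);
}

void sort_for_deserialize(OrderedProp *props, size_t n)
{
    if (n == 0)
        return;
    assert(props != NULL);
    for (size_t i = 0; i < n; i++)
        props[i].index = (int)i;
    qsort(props, n, sizeof props[0], ordered_prop_cmp);
}
```

With properties declared as "text" (0), "model" (-10), "selected-row" (10) 
and "title" (0), this yields model, text, title, selected-row.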

> b.5) how is storage of defaulting properties handled?

We handle this in two ways. We only save non-default values "by default". At 
serialization time, a "default" instance of the object is created via the type 
system and the properties of the current object are compared against it. 
Anything that differs is serialized automatically. To tweak this behavior, we 
have an attribute that can be placed on the property level to force it to 
always serialize without comparing against the "default" value. Only one such 
object instance of any given type exists at a time during serialization, and 
this is very fast (even though it sounds like it might not be). 
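
The comparison against a default instance might look like the following 
plain C sketch. `Box`, `PropSpec`, `box_init` and `box_serialize` are 
hypothetical; a real implementation would walk the GObject property list 
rather than a hand-written table, and would handle more than int values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { int width; int height; int border; } Box;  /* hypothetical object */

typedef struct {
    const char *name;
    size_t      offset;            /* offset of an int field inside Box */
    int         always_serialize;  /* stands in for the "force" attribute */
} PropSpec;

const PropSpec box_props[] = {
    { "width",  offsetof(Box, width),  0 },
    { "height", offsetof(Box, height), 0 },
    { "border", offsetof(Box, border), 1 },
};

/* Plays the role of instantiating the type with its default values. */
void box_init(Box *b) { b->width = 100; b->height = 50; b->border = 1; }

int prop_get(const Box *b, const PropSpec *p)
{
    return *(const int *)((const char *)b + p->offset);
}

/* Write only properties that differ from a freshly initialized instance,
 * plus any marked always_serialize. Returns the number of elements written;
 * stops early if the buffer is too small. */
int box_serialize(const Box *obj, char *out, size_t out_size)
{
    Box    defaults;
    size_t used = 0;
    int    written = 0;

    assert(out_size > 0);
    box_init(&defaults);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof box_props / sizeof box_props[0]; i++) {
        const PropSpec *p = &box_props[i];
        int value = prop_get(obj, p);
        if (!p->always_serialize && value == prop_get(&defaults, p))
            continue;
        int n = snprintf(out + used, out_size - used, "<%s>%d</%s>",
                         p->name, value, p->name);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* drop the partial element */
            break;
        }
        used += (size_t)n;
        written++;
    }
    return written;
}
```

For a `Box` whose only change is `width = 200`, this writes the width 
(because it differs) and the border (because it is forced), and skips the 
height.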

> b.6) how are properties classified to distinguish between ones that are used
>       at the GUI, as programming interface or to reflect serializable object
>       state, and any combinations thereof?

We handle this by attaching appropriate attributes to properties that we need 
to classify a certain way, and then we can instruct the serializer to "only 
serialize properties on a class if they have an attribute of GType <type 
here>". This way, the same system allows us to handle "deep copy" as well as 
some other exceptional cases.
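
As a rough illustration in plain C, bit flags can stand in for the attribute 
GTypes. The tags and the `select_props` helper are hypothetical and only show 
the filtering step, not how attributes are actually attached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; in the scheme described above these would be
 * attribute GTypes attached to a property. */
enum { TAG_PERSIST = 1 << 0, TAG_GUI = 1 << 1, TAG_DEEP_COPY = 1 << 2 };

typedef struct { const char *name; unsigned tags; } TaggedProp;

/* Collect the names of properties carrying at least one of the `required`
 * tags. Writes at most out_cap names and returns how many were written. */
size_t select_props(const TaggedProp *props, size_t n, unsigned required,
                    const char **out, size_t out_cap)
{
    size_t found = 0;
    assert(props != NULL && out != NULL);
    for (size_t i = 0; i < n && found < out_cap; i++)
        if (props[i].tags & required)
            out[found++] = props[i].name;
    return found;
}
```

The same property table then serves both cases: a save-to-disk pass would ask 
for `TAG_PERSIST`, and a deep-copy pass for `TAG_DEEP_COPY`.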

> b.7) how are object and structure pointers being handled?
>       - can/should they be saved by reference or recursively by value?
>       - and when restoring, factories and lookup mechanisms are needed to
>         resolve references. also, when restoring circular references on object
>         trees, properties can not anymore be restored in order.

We don't provide a system at the moment for serializing by reference, but 
individual objects can do it freely by using our custom serializer/deserializer 
attributes on the class level. Inside the custom functions, a singleton object 
coordinator can be used to provide "reference ids" if a by-reference object has 
already been serialized in the document. These ids will be resolved by the same 
singleton in the custom deserializer. Right now no one here does this, but we 
will need to provide this generic coordinator because we currently have objects 
that need by-reference serialization.
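
Since the generic coordinator does not exist yet, the following is purely a 
sketch of what one could look like in plain C. `RefCoordinator`, 
`ref_id_for`, `ref_bind` and `ref_resolve` are hypothetical names, and the 
fixed-size table is a simplification.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 64

/* Must be zero-initialized before use. */
typedef struct {
    const void *objects[MAX_REFS];  /* objects[i] has reference id i + 1 */
    int         count;
} RefCoordinator;

/* Serializer side: return the reference id for obj. *first_seen is set to 1
 * when the object is new (write it out in full) and to 0 when it was already
 * serialized (write only the id). Returns 0 if the table is full. */
int ref_id_for(RefCoordinator *rc, const void *obj, int *first_seen)
{
    assert(rc != NULL && obj != NULL && first_seen != NULL);
    for (int i = 0; i < rc->count; i++)
        if (rc->objects[i] == obj) {
            *first_seen = 0;
            return i + 1;
        }
    if (rc->count == MAX_REFS)
        return 0;
    rc->objects[rc->count++] = obj;
    *first_seen = 1;
    return rc->count;
}

/* Deserializer side: remember the object rebuilt for a given id.
 * Returns 1 on success, 0 if the id is out of range. */
int ref_bind(RefCoordinator *rc, int id, const void *obj)
{
    assert(rc != NULL);
    if (id < 1 || id > MAX_REFS)
        return 0;
    rc->objects[id - 1] = obj;
    if (id > rc->count)
        rc->count = id;
    return 1;
}

/* Deserializer side: look up a previously bound id; NULL if unknown. */
const void *ref_resolve(const RefCoordinator *rc, int id)
{
    assert(rc != NULL);
    if (id < 1 || id > rc->count)
        return NULL;
    return rc->objects[id - 1];
}
```

A custom serializer would call `ref_id_for` and emit either the full object 
or just the id; the matching custom deserializer would call `ref_bind` after 
building an object and `ref_resolve` when it meets a bare id. Circular 
references would still need a deferred fix-up pass, which this sketch does not 
attempt.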

> the border line is, serialization/deserialisation is far from being easily 
> solved generically. and any non-generic implementation of this should better
> make sure to define exact usage cases to cover, otherwise it will fall short
> on most applications and overdo central abstractions.

I don't believe it is that difficult; it is just a sub-system that needs to be 
planned out with care. No piece of our system depends on anything other than 
GObject and our GAttribute class, and we have not come across any case that is 
not handled (aside from the missing by-reference support, for which we can 
easily implement a solution).

In short, serialization/deserialization is not something you want to tackle 
without the power and flexibility of .NET style class/property/signal 
attributes. Attributes let you achieve nearly anything you desire by tagging 
introspectable metadata on introspectable pieces of GObject. The attributes 
themselves support attributes, so you can nest them indefinitely if you have a 
situation that warrants it (just like in .NET). Many articles/books on .NET 
attributes exist, but here is a good description if anyone is interested:
http://www.csharphelp.com/archives3/archive558.html

Andrew Paprocki
Bloomberg LP
_______________________________________________
gtk-devel-list mailing list
gtk-devel-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-devel-list