Re: [Pharo-project] Fuel - a fast object deployment tool

Eliot Miranda Wed, 15 Jun 2011 11:31:03 -0700

Hi Martin & Mariano,

    regarding filtering.  Yesterday my colleague Yaron and I successfully
finished our port of Fuel to Newspeak and are successfully using it to save
and restore our data sets; thank you, it's a cool framework.  We had to
implement two extensions, the first of which is the ability to save and restore
Newspeak classes, which is complex because these are instantiated classes
inside instantiated Newspeak modules, not static Smalltalk classes in the
Smalltalk dictionary.  The second extension is the ability to map specific
objects to nil, to prune objects on the way out.  I want to discuss this
latter extension.

In our data set we have a set of references to objects that are logically
not persistent and hence not to be saved.  I'm sure that this will be a
common case.  The requirement is for the pickling system to prune certain
objects, typically by arranging that when an object graph is pickled,
references to the pruned objects are replaced by references to nil.  One way
of doing this is as described below, by specifying per-class lists of
instance variables whose referents should not be saved.  But this can be
clumsy; there may be references to objects one wants to prune from e.g. more
than one class, in which case one may have to provide multiple lists of the
relevant inst vars; there may be references to objects one wants to prune
from e.g. collections (e.g. sets and dictionaries) in which case the
instance variable list approach just doesn't work.

Here are two more general schemes.  First, most directly, Fuel could
provide two filters, implemented in the default mapper, or the core
analyser.  One is a set of classes whose instances are not to be saved.  Any
reference to an instance of a class in the toBePrunedClasses set is saved as
nil.  The other is a set of instances that are not to be saved; likewise, any
reference to an instance in the toBePruned set is saved as nil.  Why have
both?  It can be convenient and efficient to filter by class (in our case we
had many instances of a specific class, all of which should be filtered, and
finding them could be time consuming), but filtering by class can be too
inflexible; there may indeed be specific instances to exclude (think, for
example, of part of the object graph that functions as a cache; pruning the
specific objects in the cache is the right thing to do; pruning all
instances of classes whose instances exist in the cache may prune too much).
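
A minimal workspace sketch of the cache case, using plain collections rather than any Fuel API (all variable names here are hypothetical), might show the difference between the two filters:

```smalltalk
"Hypothetical illustration only; no Fuel classes are involved."
| cached kept graph prunedObjectClasses prunedObjects byClass byInstance |
cached := 'cached value' copy.
kept := 'model value' copy.
graph := Array with: cached with: kept.

"Filter by class: every instance of the cached object's class is pruned."
prunedObjectClasses := IdentitySet with: cached class.
byClass := graph collect: [:each |
	(prunedObjectClasses includes: each class)
		ifTrue: [nil]
		ifFalse: [each]].

"Filter by identity: only the cached object itself is pruned."
prunedObjects := IdentitySet with: cached.
byInstance := graph collect: [:each |
	(prunedObjects includes: each)
		ifTrue: [nil]
		ifFalse: [each]].
```

Here byClass ends up with nil in both slots, since the string we wanted to keep shares a class with the cached one, whereas byInstance replaces only the cached string.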

As an example here's how we implemented pruning.  Our system is called Glue,
and we start with a mapper for Glue objects, FLGlueMapper:

FLMapper subclass: #FLGlueMapper
    instanceVariableNames: 'prunedObjectClasses newspeakClassesCluster modelClasses'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Core-Mappers'

It accepts Newspeak objects and filters instances of classes in the
prunedObjectClasses set, and as a side-effect collects certain classes that
we need in a manifest:

FLGlueMapper>>accepts: anObject
    "Tells if the received object is handled by this analyzer.  We want to hand-off
     instantiated Newspeak classes to the newspeakClassesCluster, and we want
     to record other model classes.  We want to filter-out instances of any class
     in prunedObjectClasses."
    ^anObject isBehavior
        ifTrue:
            [(self isInstantiatedNewspeakClass: anObject)
                ifTrue: [true]
                ifFalse:
                    [(anObject inheritsFrom: GlueDataObject) ifTrue:
                        [modelClasses add: anObject].
                    false]]
        ifFalse:
            [prunedObjectClasses includes: anObject class]

It prunes by mapping instances of the prunedObjectClasses to a special
cluster.  It can do this in visitObject: since any Newspeak objects it is
accepting will be visited in its visitClassOrTrait: method (i.e. it's
implicit that all arguments to visitObject: are instances of classes in the
prunedObjectClasses set).

FLGlueMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

FLPrunedObjectsCluster is a specialization of the nil,true,false cluster
that maps its objects to nil:

FLNilTrueFalseCluster subclass: #FLPrunedObjectsCluster
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Fuel-Core-Clusters'

FLPrunedObjectsCluster>>serialize: aPrunedObject on: aWriteStream
    super serialize: nil on: aWriteStream

So this would generalize by the analyser having, e.g., an FLPruningMapper as
the first mapper, and this having a prunedObjects and a prunedObjectClasses
set and going something like this:

FLPruningMapper>>accepts: anObject
    ^(prunedObjects includes: anObject)
        or: [prunedObjectClasses includes: anObject class]

FLPruningMapper>>visitObject: anObject
    analyzer
        mapAndTrace: anObject
        to: FLPrunedObjectsCluster instance
        into: analyzer clustersWithBaselevelObjects

and then one would provide accessors in FLSerializer and/or FLAnalyser to add
objects and classes to the prunedObjects and prunedObjectClasses sets.

For efficiency one could arrange that the FLPruningMapper was not added to
the sequence of mappers unless and until objects or classes were added
to the prunedObjects and prunedObjectClasses sets.
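
As a rough sketch of that lazy installation, assuming nothing about Fuel's real mapper list (the symbols below merely stand in for mapper objects, and every name is hypothetical), one could imagine:

```smalltalk
"Hypothetical sketch: symbols stand in for mapper objects; none of this is Fuel API."
| mappers pruningMapper prunedObjectClasses pruneClass |
mappers := OrderedCollection with: #defaultMapper.
pruningMapper := #pruningMapper.
prunedObjectClasses := IdentitySet new.

"Registering a class to prune installs the pruning mapper at the front,
 but only the first time it is needed."
pruneClass := [:aClass |
	prunedObjectClasses add: aClass.
	(mappers includes: pruningMapper)
		ifFalse: [mappers addFirst: pruningMapper]].

pruneClass value: String.
pruneClass value: Symbol.
```

Until pruneClass is first evaluated the mapper sequence holds only the default mapper, so serializations that prune nothing pay no extra accepts: test per object.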

I think both Yaron and I feel the Fuel framework is comprehensible and
flexible.  We enjoyed using it and while we took two passes at coming up
with the pruning scheme we liked (our first was based on not serializing
specific inst vars and was much more complex than our second, based on
pruning instances of specific classes) we got there quickly and with very
little frustration along the way.  Thank you very much.

Finally, a couple of things.  First, it may be more flexible to implement
fuelCluster as fuelClusterIn: anFLAnalyser so that if one is trying to
override certain parts of the mapping framework an implementation can access
the analyser to find existing clusters, e.g.

MyClass>>fuelClusterIn: anFLAnalyser
^self shouldBeInASpecialCluster
 ifTrue: [anFLAnalyser clusterWithId: MySpecialCluster id]
 ifFalse: [super fuelClusterIn: anFLAnalyser]

This makes it easier to find a specific unique cluster to handle a group of
objects specially.

Lastly, the class-side cluster ids are a bit of a pain.  It would be nice to
know a) are these byte values or general integer values, i.e. can there be
more than 256 types of cluster?, and b) is there any meaning to the ids?
 For example, are clusters ordered by id, or is this just an integer tag?
 Also, some class-side code to assign an unused id would be nice.
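
For what it's worth, the "assign an unused id" part could be as small as the following sketch.  It assumes ids are positive integers and that the ids already in use have been collected into a set; how to collect them from the cluster classes is left open here, and the names are hypothetical:

```smalltalk
"Hypothetical sketch: usedIds would really be gathered from the cluster classes."
| usedIds nextUnusedId |
usedIds := Set withAll: #(1 2 3 5 8).

"Among the first (usedIds size + 1) positive integers at least one is free,
 so detect: always finds a candidate."
nextUnusedId := (1 to: usedIds size + 1)
	detect: [:candidate | (usedIds includes: candidate) not].
```

With the ids above this picks 4, the smallest free value.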

You might think of virtualizing the id scheme.  For example, if FLCluster
maintained a weak array of all its subclasses then the id of a cluster could
be the index in the array, and the array could be cleaned up occasionally.
 Then each fuel serialization could start with the list of cluster class
names and ids, so that specific values of ids are specific to a particular
serialization.
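
A minimal sketch of such a per-serialization id table, using only plain collections and the two cluster class names mentioned above as symbols (the blocks and variable names are hypothetical, not Fuel API):

```smalltalk
"Hypothetical sketch of a per-serialization id table."
| clusterClassNames idFor header nameFor firstId secondId repeatedId |
clusterClassNames := OrderedCollection new.

"Writing side: a cluster class gets the next index the first time it is seen."
idFor := [:className |
	(clusterClassNames includes: className)
		ifFalse: [clusterClassNames add: className].
	clusterClassNames indexOf: className].

firstId := idFor value: #FLNilTrueFalseCluster.
secondId := idFor value: #FLPrunedObjectsCluster.
repeatedId := idFor value: #FLNilTrueFalseCluster.

"The table of names would be written at the start of the stream..."
header := clusterClassNames asArray.

"...so the reading side can map each id back to a cluster class name."
nameFor := [:id | header at: id].
```

Since the table travels with the data, the ids need only be unique within one serialization, and no class has to hard-code a value.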

Again, thanks for a great framework.

best,
Eliot

On Mon, Jun 13, 2011 at 10:16 AM, Mariano Martinez Peck <
[email protected]> wrote:

>
>
> On Thu, Jun 9, 2011 at 3:35 AM, Eliot Miranda <[email protected]> wrote:
>
>> Hi Martin and Mariano,
>>
>>     a couple of questions.  What's the right way to exclude certain
>> objects from the serialization?  Is there a way of excluding certain inst
>> vars from certain objects?
>>
>>
>
> Eliot and the rest: Martin implemented this feature in
> Fuel-MartinDias.258. For the moment, we decided to put
> #fuelIgnoredInstanceVariableNames at class side.
>
> Behavior >> fuelIgnoredInstanceVariableNames
>     "Indicates which variables have to be ignored during serialization."
>
>     ^#()
>
>
> MyClass class >> fuelIgnoredInstanceVariableNames
>   ^ #('instVar1')
>
>
> There is no impact on speed, so this is good. Now we were wondering whether
> it is common for 2 different instances of the same class to need
> different instVars ignored. Is this common? Do you usually need this?
> We checked in SIXX and it is at instance side. Java uses the 'transient'
> modifier, so it is at class side...
>
> thanks
>
>
> --
> Mariano
> http://marianopeck.wordpress.com
>
>
