On Jul 22, 2010, at 21:55, Eric Wasylishen wrote:

> On Jul 22, 2010, at 8:39 AM, Quentin Mathé wrote:
>
>> On Jul 20, 2010, at 22:32, Eric Wasylishen wrote:
>>
>>> I finally committed some code I've been working on for a while at 
>>> http://svn.gna.org/svn/etoile/branches/ericwa/ObjectMerging/
>>>
>>> The README gives a little introduction. It's basically my rethinking
>>> of how to implement CoreObject, along with some research I did on
>>> how to do object graph diff / merge with a CoreObject style object
>>> graph. Because our objects are labelled with UUID's, it turns out to
>>> be really easy (compared with a general XML diff/merge, which
>>> involves a lot of guesswork). I'm not doing anything novel, but I
>>> think the results will be very nice (with a simple implementation,
>>> too.)
>>>
>>> I'd be glad to hear what you guys think.
>>
>> I took a quick look at the code yesterday and I started to read the
>> papers mentioned in the README. I'm not sure this reimplementation is
>> what we want but I really like the result.
>
> I'm glad to hear you like it! :-)
>
>> Using a delta compression
>> based on model edit operations to record the history is really neat
>> btw :-)
>
> I'll just clarify that my code currently doesn't do any delta
> compression; it's like unpacked git (when you commit, it saves a
> snapshot of all modified objects). I will add delta compression of
> some kind, but I'm not yet sure of the best way to do it.

Ah ok.

> Doing an object graph diff and serializing that could be error
> prone - you have to make sure the object graph diff
> serialization/deserialization and creation/application are 100% robust.

I see. I thought that was an implicit property of the diff
algorithm/model you use.
There are probably some border cases to be examined and tested
carefully, but that seems achievable.
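To make the round-trip property concrete, here is a minimal, hypothetical Python sketch (illustrative only, not the actual CoreObject code): snapshots map UUID -> {property: value}, and the diff is a flat list of insert/update/unset/remove operations. Robustness then reduces to checking that applying diff(a, b) to a always yields b.

```python
# Hypothetical sketch: a snapshot maps UUID -> {property: value}.
def diff(old, new):
    """Compute edit operations turning snapshot `old` into `new`."""
    ops = []
    for uuid in new:
        if uuid not in old:
            ops.append(("insert", uuid, new[uuid]))
        elif old[uuid] != new[uuid]:
            for prop, value in new[uuid].items():
                if old[uuid].get(prop) != value:
                    ops.append(("update", uuid, prop, value))
            for prop in old[uuid]:
                if prop not in new[uuid]:
                    ops.append(("unset", uuid, prop))
    for uuid in old:
        if uuid not in new:
            ops.append(("remove", uuid))
    return ops

def apply_diff(snapshot, ops):
    """Apply edit operations, returning a new snapshot."""
    result = {u: dict(props) for u, props in snapshot.items()}
    for op in ops:
        if op[0] == "insert":
            result[op[1]] = dict(op[2])
        elif op[0] == "update":
            result[op[1]][op[2]] = op[3]
        elif op[0] == "unset":
            del result[op[1]][op[2]]
        elif op[0] == "remove":
            del result[op[1]]
    return result
```

The border cases then become property-based tests: for any two snapshots a and b, apply_diff(a, diff(a, b)) == b must hold.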

> It also increases the coupling between the low-level storage and
> higher level parts of CO.

That doesn't sound much worse than the coupling we currently have
between EtoileSerialize and CoreObject.

> Another approach is to be like git and just do a separate binary diff
> on the serialized snapshot data. I like this because it makes storage
> conceptually simpler - it's just storing whole snapshots of objects
> but happens to delta compress related ones as an implementation
> detail.

How would that work with media documents?
Suppose you work on an image that weighs 300 MB and several commits
per minute have to be made. User changes to record might even happen
at very short intervals (a second or two).

>> At first sight, the existing CoreObject and the reimplementation
>> might be closer than I thought initially. I mean, operations are
>> implemented as classes in the reimplementation and as methods in the
>> existing CoreObject, in a kind of dual way. If CoreObject as it
>> stands currently were restricted strictly to COObject/COGroup, they
>> would be almost identical, since everything would revolve around
>> operations such as insert/remove/update (-addObject:, -removeObject:
>> and -setValue:forProperty:). COObject in the reimplementation could
>> even support transparent persistence by wrapping it with a proxy
>> that automatically calls -commit for the methods mentioned just
>> before.
>
> Yeah. I'm not sure if doing a commit after every change would lead to
> too many history graph nodes or not - my idea was to keep the
> granularity of -commits to about the same as the granularity of
> undo in normal Mac applications. Ideally, the commits each have
> some meaning to the user. (I wrote some ideas in COHistoryGraphNode.h
> about attaching metadata to a commit to mark it as a "major
> checkpoint", i.e. what happens when you click our replacement for the
> save button)

Yes, it's better to have meaningful commits.

Here is my take on that…
For simple model objects (photos, music, etc.), setter or mutation
methods usually correspond to a user-visible change, so CoreObject as
it is works well here.
For compound documents and content editing (image data, text), the
mapping is more complex or nonexistent, so user-visible changes must
be recorded at a higher level.
Action handlers in EtoileUI provide an API that expresses these
user-visible changes. In fact, that was roughly the Taligent solution
for recording user actions in a persistent way.
Just to give another example… For CodeMonkey, the IDE class provides
methods that correspond to user actions, and it is wrapped with a
COProxy to record the changes.

That's not really related, but ideally I'd like to have several "undo
tracks". For example, the tracks could be:
- the document or object I'm working on
- the app-level or work context
- the library the object belongs to
- the overall UI (would record almost all other UI actions)
The last track would let me undo a window close or move. I'm not sure
this last track is a realistic idea… Undoing a shutdown is hard ;-)
Various cases would be hard to undo, or even to record, I think.
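To sketch what multiple tracks could mean (hypothetical names, illustrative Python rather than Objective-C): one global ordered log where each recorded action is tagged with the tracks it belongs to, and per-track undo walks the log backwards, skipping actions outside that track.

```python
# Hypothetical sketch: one action can belong to several undo tracks,
# so undo on the "document" track skips app-level or UI-only actions.
class UndoTracks:
    def __init__(self):
        self.actions = []  # global ordered log of (action, track set)

    def record(self, action, tracks):
        self.actions.append((action, set(tracks)))

    def undo(self, track):
        # Undo the most recent action belonging to the given track.
        for i in range(len(self.actions) - 1, -1, -1):
            action, tracks = self.actions[i]
            if track in tracks:
                del self.actions[i]
                return action
        return None
```

For example, a paragraph edit recorded on both the "document" and "ui" tracks is reachable from either, while a window move recorded only on "ui" never shows up when undoing on the document track.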

Presently this track notion is only partially related to object
contexts in CoreObject; that's why I'm planning to rework
COObjectContext into something closer to it.

> So for typing a document, doing a commit after every key press would
> probably be excessive, but after 5 words or something (IMHO the
> undo/redo granularity you get in OS X when typing is just right.)

I don't know. CoreObject consumes disk space very quickly in my tests,
so recording every key press sounds like a bad idea. A compression
scheme might or might not solve the problem.

If the recording granularity is coarser, we should still try not to
lose any key press :-) For this kind of use case, we could have
something like a temporary track/log (at a global level) or a
temporary branch (at the persistent root level) to save every key
press (or similar serialized actions that quickly accumulate). The
CoreObject garbage collector would then collect this temporary data
once it has been superseded by a recording at a higher granularity.
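A hypothetical sketch of that temporary log idea (illustrative Python, not a CoreObject API): keystrokes accumulate in a throwaway buffer, and once a coarser-grained commit supersedes them, the buffer becomes collectable.

```python
# Hypothetical sketch: key presses go to a throwaway log; a single
# coarse-grained commit supersedes them, after which the temporary
# data can simply be collected.
class KeystrokeLog:
    def __init__(self):
        self.pending = []

    def record(self, keystroke):
        self.pending.append(keystroke)

    def supersede_with_commit(self, commit_fn):
        """Fold all pending keystrokes into one meaningful commit,
        then drop the fine-grained log. Returns how many entries
        were collected."""
        commit_fn("".join(self.pending))
        collected = len(self.pending)
        self.pending.clear()
        return collected
```

Until the coarse commit happens, no key press is lost; afterwards, only the meaningful commit remains in history.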

> Another example, moving a group of objects from one place to
> another should probably be one commit. I imagine this will be
> really common; it will happen whenever things are picked/dropped, etc.
>
>> A problem I would worry about in the reimplementation is its
>> scalability with complex compound documents (e.g. a map edited with
>> a vector drawing app). For technical graphics documents, a tree
>> structure that contains 10 000 nodes and several objects as
>> properties per node doesn't seem unexpected. Computing the entire
>> object graph diff in less than 100 ms sounds impossible. How would
>> you handle such a case?
>
> I think the scalability should be pretty good. It should scale
> roughly the same as a git/mercurial repository with 10000 files. I
> haven't done any tests yet, though.

If there is no diff involved, it's easier ;-)

> I'll just quickly explain what the code currently does:
>
> - when you set a property/value of an object, the object's context
> records the object's UUID in a set, marking it as 'changed'. (just
> like git/svn; when you change files in a working copy, the system has
> a fast way of checking which files were modified; I think they check
> the modification time of all tracked files.)
>
> - when you commit, the object context just serializes the objects
> which are marked as modified. (So there is never a diff of all 10000
> objects in a normal cycle of edit - commit).
>
>
> If the user wants to see a diff of two versions of the document,
> the code would currently do a diff of all 10000 objects, which would
> likely be quite slow. However, it can easily be improved to only
> diff the objects that were changed between the two revisions, since
> each COHistoryGraphNode records a list of UUIDs which were modified
> in that commit.
>
> Of course, there are still cases where we might have to diff
> all 10000 objects. I'll need to do some benchmarking.

ok
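For illustration, the edit-commit cycle described above could be sketched like this (hypothetical Python, not the real COObjectContext API): setting a property marks the object's UUID dirty, and commit only snapshots the dirty objects, recording their UUIDs with the history node.

```python
# Hypothetical sketch of the dirty-set commit cycle: only objects
# whose UUIDs were marked 'changed' get serialized (here: snapshotted)
# at commit time, and each history entry records which UUIDs changed.
class ObjectContext:
    def __init__(self):
        self.objects = {}   # uuid -> {property: value}
        self.dirty = set()  # uuids changed since the last commit
        self.history = []   # list of (changed snapshots, changed uuids)

    def set_value(self, uuid, prop, value):
        self.objects.setdefault(uuid, {})[prop] = value
        self.dirty.add(uuid)

    def commit(self):
        changed = {u: dict(self.objects[u]) for u in self.dirty}
        self.history.append((changed, frozenset(self.dirty)))
        self.dirty.clear()
        return changed
```

Diffing two revisions then only needs the union of their recorded changed-UUID sets, not a walk over all 10000 objects.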

>> You also say selective undo support is planned but you don't explain
>> how… ?
>
> I think you can implement it pretty easily as a merge. Here's my
> current idea:
>
> Suppose these are nodes in a history graph, and the current revision
> is E.
>
> A---B---C---D---E
>
> The user wants to undo the changes made in revision B.
>
> What we could do is create a branch of B in which all edits made in B
> are undone; i.e. it's the same state as in history graph node A, so
> call it A'.
>
> A---B---C---D---E
>       \_A'
>
> Then just merge E and A' - I think this will be the same as a
> selective undo. You'll get merge conflicts if there were any changes
> in C, D, or E which overlap with the changes being undone in B, but
> this is exactly what you want.  I haven't tried it yet though - could
> be that I'm missing some detail and this is nonsense :-)

Sounds like an interesting approach.
If C, D or E don't rely on the B state in any special way, this should
work. I mean, if B involves a state change expected by C, D and E,
this state change must result in an overlap conflict, otherwise things
could break.
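To make the merge-based selective undo concrete, here is a hypothetical sketch (illustrative Python, states flattened to (UUID, property) -> value maps, not CoreObject code): undoing B means applying the inverse diff of B to E, and any key that C, D or E also changed among the keys B touched is reported as a conflict rather than silently overwritten.

```python
# Hypothetical sketch: a state is a flat map of (uuid, property) -> value.
def flat_diff(old, new):
    """Return {key: (old value, new value)} for every key that differs;
    a value of None stands for 'absent'."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}

def selective_undo(a, b, e):
    """Undo B's changes (a -> b) in the current state e by applying the
    inverse diff of B. Keys also touched after B are conflicts and are
    left untouched for the user to resolve."""
    undo = flat_diff(b, a)   # inverse of B's edits
    later = flat_diff(b, e)  # everything that happened after B
    conflicts = set(undo) & set(later)
    result = dict(e)
    for key, (_, undone_value) in undo.items():
        if key in conflicts:
            continue
        if undone_value is None:
            result.pop(key, None)
        else:
            result[key] = undone_value
    return result, conflicts
```

In the non-conflicting case this reproduces E with B's edits reverted; in the overlapping case the conflict set is exactly the "overlap conflict" mentioned above.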

btw, have you taken a look at GINA, which is mentioned in the Flexible
Object Merging Framework paper? From what I read, it uses a command
log very similar to CoreObject's, and it seems to support merging
several message/command histories; this sounded very similar to what
CoreObject intends to do.

> btw, the best resources I found on selective undo are this article,
> http://www.python.org/workshops/1997-10/proceedings/zukowski.html
> and the paper it references, "Undoing Actions in Collaborative Work:
> Framework and Experience" by Prakash and Knister.

I'll take a look.

Quentin.


_______________________________________________
Etoile-dev mailing list
[email protected]
https://mail.gna.org/listinfo/etoile-dev
