On 3 August 2010 at 00:56, Eric Wasylishen wrote:
>>> Another approach is to be like git and just do a separate binary diff
>>> on the serialized snapshot data. I like this because it makes storage
>>> conceptually simpler - it's just storing whole snapshots of objects
>>> but happens to delta compress related ones as an implementation
>>> detail.
>> How would that work with media documents?
>> Suppose you work on an image that weighs 300 MB and several commits
>> per minute have to be done. User changes to record might even
>> happen at very short intervals (every second or two).
>
> In my opinion, regardless of how CO is implemented, for photo/video
> editing what we have to keep persistent and versioned is the tree of
> drawing operations/filters that we discussed a bit already, rather
> than keeping the resulting bitmap data persistent and versioned.
>
> Two reasons:
> - you need the tree structure to do merge/selective undo,
> - and saving every snapshot would obviously eat disk space too fast
> (even with multi-terabyte drives) :-)
Yes. My point was that additional operations are easier to support
with message-based persistency: you just add a new method, whereas in
the CoreObject reimplementation you have to define a new class, I think.
For example… How would you express operations like 'blur' an image
area or 'cut a range' in a movie clip with a state-based CoreObject? I
suppose new operation subclasses would be added to express and save
them in the history graph?
> It might make sense to cache the bitmap data, but since it can be
> regenerated given the tree of drawing operations/filters, it
> probably doesn't make sense to keep old versions of the bitmap data
> given their potential size.
Agreed.
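To make the point above concrete, here is a minimal Python sketch (not CoreObject code). It assumes a flat list of filter names stands in for the tree of drawing operations, and the names FILTERS, history and bitmap_for are hypothetical. Only the small operation lists are versioned; the bitmap is a cache that holds a single version and is regenerated on demand.

```python
from typing import Callable, Dict, List, Tuple

Bitmap = Tuple[int, ...]              # stand-in for real pixel data
Filter = Callable[[Bitmap], Bitmap]

# Hypothetical filters, for illustration only.
FILTERS: Dict[str, Filter] = {
    "invert": lambda px: tuple(255 - p for p in px),
    "darken": lambda px: tuple(p // 2 for p in px),
}

def render(source: Bitmap, ops: List[str]) -> Bitmap:
    """Regenerate the bitmap by replaying the recorded operations."""
    result = source
    for name in ops:
        result = FILTERS[name](result)
    return result

# What gets versioned: one small list of operation names per revision.
history: List[List[str]] = [[], ["invert"], ["invert", "darken"]]
source: Bitmap = (0, 100, 255)

cache: Dict[int, Bitmap] = {}         # holds the bitmap of one revision only

def bitmap_for(version: int) -> Bitmap:
    """Return the bitmap of a revision, dropping any older cached bitmap."""
    if version not in cache:
        cache.clear()
        cache[version] = render(source, history[version])
    return cache[version]
```

Going back to an older revision costs a re-render rather than disk space, which is the trade-off discussed above.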
To comment a bit on the CoreObject reimplementation…
This makes me think that the need to be cautious with the behavior and
arguments of messages that trigger persistency in the existing
CoreObject is now shifted to the class that expresses the
operation. The advantage of your approach is that it automatically
reduces similar operations to a canonical operation (-removePerson:,
-addObject, -addWhateverAndPray: are automatically reduced to
-addObject:, -setValue:forProperty:, -removeObject:atIndex: etc.) and
this makes merging much easier and serialization safer.
My main concern is that I'm not sure it really solves some of the
things we want to solve:
- more transparent persistency (no explicit commit or database
connection management)
- store arbitrary objects (EtoileSerialize) or integrate foreign
object-model (COProxy)
The most problematic point would be the impossibility of adding
persistency to EtoileUI, because persistent objects must be COObject
instances and store all their data in a dictionary (no ivars).
From your perspective, iirc, supporting the extra things I outline
above introduces too much complexity. In the current state of
EtoileSerialize and CoreObject, I fully agree.
Although message-based persistency is not the panacea it appears to be
at first sight (e.g. it tends to favor big façade objects rather than
fine-grained objects with a clear role, and requires being very
cautious with the behavior and arguments of messages that trigger
persistency), I still think it's a good approach because it's
operation-based rather than state-based and it also gives more
flexibility than a single model class.
>> That's not truly related, but ideally I'd like to have several
>> "undo tracks". For example, multiple tracks would be:
>> - the document or object I'm working on
>> - the app-level or work context
>> - the library the object belongs to
>> - the overall UI (would record almost all other UI actions)
>> The last track would let me undo a window close or move. I'm not
>> sure this last track is a realistic idea… Undoing a shutdown is
>> hard ;-) Well various cases would be hard to undo or even record I
>> think.
>>
>> Presently this track notion is only partially related to object
>> contexts in CoreObject, that's why I'm planning to rework
>> COObjectContext into something closer to that.
>
> I agree; we'll really need the undo tracks feature.
>
> One way I could see implementing this in my ObjectMerging project is
> by attaching metadata to the COHistoryGraphNode for each commit,
> like this:
> {document-uuid: XXX
> app-uuid: YYY
> library-uuid: ZZZ,
> .... (maybe other tracks) .... }
Yes, that should work.
> Then, supposing you want to do an undo/redo action for a particular
> document, you first filter the overall history graph to get only
> the nodes with the correct document-uuid tag. The filtered history
> graph is then used to figure out which changes to undo at each step.
Right. In fact, that's what CoreObject does already when an object
context is restored to a past version. And this can already be
leveraged at the core object granularity level too.
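A minimal Python sketch of the filtering step described above, assuming each history node carries its track UUIDs as a plain metadata dictionary. HistoryNode and filter_history are hypothetical names, not the COHistoryGraphNode API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class HistoryNode:
    """One commit, tagged with the tracks it belongs to."""
    revision: int
    tracks: Dict[str, str] = field(default_factory=dict)

def filter_history(nodes: List[HistoryNode], track: str, uuid: str) -> List[HistoryNode]:
    """Keep only the nodes tagged with the given track UUID, in history order."""
    return [n for n in nodes if n.tracks.get(track) == uuid]

history = [
    HistoryNode(1, {"document-uuid": "doc-A", "app-uuid": "app-1"}),
    HistoryNode(2, {"document-uuid": "doc-B", "app-uuid": "app-1"}),
    HistoryNode(3, {"document-uuid": "doc-A", "app-uuid": "app-1"}),
]

# Undo/redo for document doc-A would step through this filtered list only.
doc_a_track = filter_history(history, "document-uuid", "doc-A")
```

Revisions 1 and 3 are adjacent in the doc-A track but not in the overall history, which is the non-adjacency case discussed in this thread.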
> Since the nodes in the filtered history graph likely won't be
> adjacent in the overall history graph,
Right, but what matters to undo/redo in a single core object is
whether they are adjacent in this core object history rather than in
the entire core object graph history (aka overall history graph).
If the track records every message sent to a given core object and
just consists of the combined histories of several core objects, the
nodes would be adjacent at the persistent root granularity (exactly as
is the case with a COObjectContext history currently).
To create non-adjacent nodes, the track would have to select which
messages it logs based on a predicate. It sounds like an interesting
feature, but that's not what I was thinking about.
What I was suggesting is just the possibility to have core objects
that belong to multiple object contexts at the same time rather than
a single one.
For example, when a message that triggers persistency is sent to a core
object, each track the object belongs to logs the message.
Well in reality, the track uuids would be attached to the object
revision/message in the metadata db.
> undoing them will involve selective undo, which means merge
> conflicts could occur-
Yes, if the recorded messages are selected based on a predicate, no
otherwise I think.
> but I think this is okay and probably unavoidable when you have
> multiple undo tracks.
In some advanced cases, probably yes.
> btw, what do you think of my idea of modeling the history graph
> using the COHistoryGraphNode class?
From an implementation viewpoint, I don't think it's really needed,
we could just store the same data by improving/extending the current
history table in the metadata db. Then it's easy to query the history
in various ways or leverage the history to run other queries related
to the indexed content/properties.
For building a UI that lets you browse the history, a class like that,
which makes the versioning model explicit, is nice. But I would rather
write it as a thin layer around a query result.
>>>> You also say selective undo support is planned but you don't
>>>> explain how… ?
>>>
>>> I think you can implement it pretty easily as a merge. Here's my
>>> current idea:
>>>
>>> Suppose these are nodes in a history graph, and the current
>>> revision is E.
>>>
>>> A---B---C---D---E
>>>
>>> The user wants to undo the changes made in revision B.
>>>
>>> What we could do is create a branch of B in which all edits made in B
>>> are undone; i.e. it's the same state as in history graph node A, so
>>> call it A'.
>>>
>>> A---B---C---D---E
>>>      \_A'
>>>
>>> Then just merge E and A' - I think this will be the same as a
>>> selective undo. You'll get merge conflicts if there were any changes
>>> in C, D, or E which overlap with the changes being undone in B, but
>>> this is exactly what you want. I haven't tried it yet though - could
>>> be that I'm missing some detail and this is nonsense :-)
>>
>> Sounds like an interesting approach.
>> If C, D or E don't rely on the B state in any special way, this
>> should work.
>> I mean, if B involves a state change expected by C, D and E… This
>> state change must result in an overlap conflict, otherwise things
>> could break.
>
> Right, the merge algorithm should correctly flag that as a conflict.
>
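The merge-based selective undo discussed above can be sketched in Python under strong simplifying assumptions: object state is a flat dictionary of properties, and the merge is a naive per-property three-way merge with B as the common ancestor (A' was branched from B). The function name and the sample properties are hypothetical, and this is not the ObjectMerging algorithm.

```python
from typing import Dict, List, Tuple

State = Dict[str, object]

def three_way_merge(base: State, ours: State, theirs: State) -> Tuple[State, List[str]]:
    """Merge two descendants of base; return (merged state, conflicting keys)."""
    merged: State = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t or t == b:
            value = o              # only our side changed, or both agree
        elif o == b:
            value = t              # only their side changed
        else:
            conflicts.append(key)  # both sides changed the same property
            value = o
        if value is not None:      # None stands for "property absent"
            merged[key] = value
    return merged, conflicts

A = {"title": "Draft", "color": "red"}
B = {"title": "Report", "color": "red"}               # B renamed the title
E = {"title": "Report", "color": "blue", "size": 12}  # C..E touched other things

# A' has the same state as A; undoing B is merging E with A', with base B.
undone, conflicts = three_way_merge(base=B, ours=E, theirs=A)
```

If a later revision had also changed "title", the same call would report "title" as a conflict, which is the overlap case described above.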
>> btw have you taken a look at GINA which is mentioned in the
>> Flexible Object Merging Framework paper? From what I read, it uses
>> a command log very similar to CoreObject, and seems to support
>> merging several message/command histories, this sounded very
>> similar to what CoreObject intends to do.
>
> I had a look at the GINA paper (T Berlage, A Genau. "A framework for
> shared applications with a replicated architecture"). They are
> using the Command Pattern, so you have to write a class for each
> operation which can modify document state. The command classes have
> methods like selectiveUndo, selectiveRedo, canSelectiveUndo,
> canSelectiveRedo. To merge two lists of commands, they don't do any
> transformations on them; they just concatenate the lists of commands.
>
> I also re-read the selective undo paper I mentioned ("A Framework
> for Undoing Actions in Collaborative Systems" by Prakash and
> Knister - link:
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.4793&rep=rep1&type=pdf
>
> ) Everyone interested in selective undo/merging should check this
> out, I think it's a really good paper :-).
>
> They describe a general theory of how you can selectively undo/redo/
> merge operations. You need to define three functions:
>
> inverse(op) -> op' -- such that op' does the opposite of op.
> transpose(op1, op2) -> (op2', op1') --- such that applying op1
> followed by op2 has the same effect as applying op2' followed by
> op1'. This is the central operation in merging.
> conflicts?(op1, op2) -> bool --- true if op1 and op2 conflict. (as
> defined in the paper.)
Sounds interesting. I'll take a look at that.
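As an illustration of the three functions quoted above, here is a Python sketch for array insert/remove operations. The Op class and the index-shifting rules are my own assumptions for this example, not code from the paper or from ObjectMerging, and conflicts() covers only the simplest case (removing the element that the previous operation inserted).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Op:
    kind: str    # "insert" or "remove"
    index: int
    value: str   # element inserted, or the element that was removed

def apply(items: List[str], op: Op) -> List[str]:
    """Return a new list with op applied."""
    out = list(items)
    if op.kind == "insert":
        out.insert(op.index, op.value)
    else:
        del out[op.index]
    return out

def inverse(op: Op) -> Op:
    """Return op', which does the opposite of op."""
    return Op("remove" if op.kind == "insert" else "insert", op.index, op.value)

def conflicts(op1: Op, op2: Op) -> bool:
    """True if op2 removes the very element op1 inserted (op2 cannot run first)."""
    return op1.kind == "insert" and op2.kind == "remove" and op1.index == op2.index

def transpose(op1: Op, op2: Op) -> Tuple[Op, Op]:
    """Return (op2', op1'): op1 then op2 has the same effect as op2' then op1'."""
    if conflicts(op1, op2):
        raise ValueError("conflicting operations cannot be transposed")
    i, j = op1.index, op2.index
    op2_is_before = j <= i if op2.kind == "insert" else j < i
    if op2_is_before:
        # op2 acts before op1's position: op1 must follow the shift op2 causes.
        delta = 1 if op2.kind == "insert" else -1
        return op2, Op(op1.kind, i + delta, op1.value)
    # op2 acts after op1's position: cancel the shift op1 had caused on op2.
    delta = -1 if op1.kind == "insert" else 1
    return Op(op2.kind, j + delta, op2.value), op1

op1, op2 = Op("insert", 0, "x"), Op("insert", 2, "y")
op2_t, op1_t = transpose(op1, op2)   # op2_t inserts "y" at index 1 instead of 2
```

The index stored in each operation is rewritten when two operations swap places, which is what lets array edits be reordered at all.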
For the papers you mention in the CoreObject reimplementation, both
were really good. The merge matrix is an interesting idea, I wouldn't
present that to the user, but it could be a nice way to represent the
merge settings at the developer-level. I also liked the possibility to
specify the merging node granularity (e.g. for a text document: word,
line, paragraph etc.) and the user priority per node.
> Looking at GINA from this viewpoint, the programmer writing the
> command classes has to define inverse() and conflicts(), but since
> you can't specify a transpose function in GINA, your command objects
> have to be able to be reordered without modifying them. This makes
> it tricky or impossible to write commands which modify arrays,
> because you can't store array indices in your command objects since
> they will become invalid if the array is changed before the command
> is executed. So I'm not sure if GINA really offers any interesting
> solutions.
hm ok. Do you suggest that being able to adjust the array indices per
command, based on which commands are skipped while replaying the
history, would solve the problem?
I found this paper (I haven't read it yet) that seems to adjust old
commands to support selective undo:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.755&rep=rep1&type=pdf
I had the impression GINA relied on fine-grained command/message which
can give better merging results and better feedback for conflicts than
more coarse-grained commits. But that was pure speculation I admit :-)
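To illustrate the index-adjustment idea, here is a small Python sketch restricted to insert commands (an assumption made to keep it short; removes would need extra cases). To skip one command, it is transposed past every later command, which rewrites their indices, and is then dropped from the end of the log. The names transpose, log_without and replay are hypothetical.

```python
from typing import List, Tuple

Insert = Tuple[int, str]   # (index, value): insert value at index

def transpose(op1: Insert, op2: Insert) -> Tuple[Insert, Insert]:
    """Return (op2', op1') so that op2' then op1' equals op1 then op2."""
    (i, a), (j, b) = op1, op2
    if j <= i:
        return (j, b), (i + 1, a)   # op2 lands before op1's element
    return (j - 1, b), (i, a)       # op2 lands after it: shift op2 back by one

def log_without(log: List[Insert], k: int) -> List[Insert]:
    """Return the log as if log[k] had never been executed."""
    ops = list(log)
    for pos in range(k, len(ops) - 1):
        ops[pos], ops[pos + 1] = transpose(ops[pos], ops[pos + 1])
    return ops[:-1]                 # the skipped command is now last: drop it

def replay(log: List[Insert]) -> List[str]:
    """Rebuild the array by replaying the log from an empty array."""
    items: List[str] = []
    for index, value in log:
        items.insert(index, value)
    return items

log = [(0, "a"), (0, "b"), (2, "c"), (1, "d")]
full = replay(log)                      # ['b', 'd', 'a', 'c']
skipped = replay(log_without(log, 1))   # same history without the "b" insert
```

Here the later commands (2, "c") and (1, "d") are rewritten to (1, "c") and (0, "d") so they still hit the intended positions once "b" is gone.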
> To return to CoreObject's implementation, I'm concerned that merge/
> selective undo isn't possible with pure message-based persistency.
> In order to support merging/selective undo, we need to be able to
> tell if operations conflict, get their inverse, and transpose groups
> of them. It's easy to define these for a small set of basic
> operations like setValue:forProperty:,
> insertObject:atIndex:ofProperty:, removeObject:atIndex:ofProperty:,
> etc., which is more or less what I did in ObjectMerging.
>
> But if the message log contains high level messages like
> "-refactorMethod:to:" or "-indentParagraph:" without any other
> information, you can't really do selective undo/merge. The only
> practical way I see of defining the inverse/transpose/conflicts
> functions on these high level operations is to record the high-level
> operations as a bunch of the primitive ones for which we already
> have inverse/transpose/conflicts defined.
>
> Then your message log looks something like this:
>
> {object: UUID1 recordMessage: -[setValue: 'abc' forProperty: 'bar']},
> {object: UUID1 recordMessage: -[setValue: 'def' forProperty: 'bar']},
> {object: UUID2 recordHighLevelMessage: -[refactorMethod: 'foo' to:
> 'bar'] definition: (
> {object: UUID2 message: -[setValue: 'bar' forProperty: 'name']},
> {object: UUID3 message: -[setValue: 'bar' forProperty: 'name']}
> )}
Sounds good.
Except that UUID3 must not be a core object, but just a path inside the
core object UUID2, otherwise you get a side-effect that prevents the
deterministic replay.
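A Python sketch of such a log, under assumptions that are not in the thread: each primitive records the old value as well as the new one so that it can be inverted, and the second target is a made-up path inside UUID2 ("UUID2/callSite") rather than a separate core object. SetValue, HighLevel and replay_entry are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

@dataclass(frozen=True)
class SetValue:
    """Primitive message: -setValue:forProperty: on a UUID or inner path."""
    target: str
    prop: str
    old: object
    new: object

    def inverse(self) -> "SetValue":
        return SetValue(self.target, self.prop, self.new, self.old)

@dataclass(frozen=True)
class HighLevel:
    """High-level message stored together with its primitive definition."""
    target: str
    selector: str
    definition: Tuple[SetValue, ...]

    def inverse(self) -> "HighLevel":
        # Undo the primitives in reverse order.
        undo = tuple(p.inverse() for p in reversed(self.definition))
        return HighLevel(self.target, "undo " + self.selector, undo)

Entry = Union[SetValue, HighLevel]
Store = Dict[str, Dict[str, object]]

def replay_entry(store: Store, entry: Entry) -> None:
    """Apply one log entry; high-level entries replay their primitives."""
    primitives = entry.definition if isinstance(entry, HighLevel) else (entry,)
    for p in primitives:
        store.setdefault(p.target, {})[p.prop] = p.new

store: Store = {"UUID2": {"name": "foo"}, "UUID2/callSite": {"name": "foo"}}
log = [
    SetValue("UUID1", "bar", None, "abc"),
    SetValue("UUID1", "bar", "abc", "def"),
    HighLevel("UUID2", "refactorMethod:to:", (
        SetValue("UUID2", "name", "foo", "bar"),
        SetValue("UUID2/callSite", "name", "foo", "bar"),
    )),
]
for entry in log:
    replay_entry(store, entry)
```

Since the high-level entry only touches UUID2 and a path inside it, replaying or inverting it has no side effect on another core object.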
> Now, this is really close to what I'm doing in ObjectMerging, except
> the groupings of low-level changes to record are indicated by doing
> a 'commit'. But it also no longer really looks like message-based
> persistency, because you're recording the state change.
I'm not sure I get your point. For a method or operation, I would say
the arguments encode how the state will change. You don't record the
object state but just some additional metadata/messages which could
even be represented as arguments I think. In the end, it is still
operation/message-based. For example…
objectUUID2 refactorMethod: 'foo' to: 'bar'
{
    // Here -record would ask CoreObject to append the invocations to the
    // serialized -refactorMethod:to: invocation.
    // Alternatively we could handle that in a more implicit way by
    // recording every basic persistency message invoked until the method
    // returns.
    [[objectUUID2 record] setValue: 'bar' forProperty: 'name'];
    // objectUUID3 is not a core object but a uniquely identified object
    // inside the object graph owned by the core object (UUID2).
    [[objectUUID3 record] setValue: 'bar' forProperty: 'name'];
}
can be rewritten as below:
objectUUID2 refactorMethod: 'foo' to: 'bar'
    setValue: 'bar' forProperty: 'name' on: objectUUID2
    setValue: 'bar' forProperty: 'name' on: objectUUID3
{
    [objectUUID2 setValue: 'bar' forProperty: 'name'];
    [objectUUID3 setValue: 'bar' forProperty: 'name'];
}
This new method would be the one that triggers persistency instead of
-refactorMethod:to:, which would just call it. This way you don't have
to record the intermediate -setValue:forProperty: messages; they get
encoded in the recorded message itself.
Does that make sense or am I completely off?
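The implicit variant (recording every basic persistency message invoked until the method returns) could look roughly like the Python sketch below. MessageLog, Recorded and refactor_method are hypothetical names and nothing here is existing CoreObject API; the inner object is identified by a made-up path inside UUID2.

```python
from contextlib import contextmanager
from typing import Dict, List, Optional

class MessageLog:
    def __init__(self) -> None:
        self.entries: List[dict] = []
        self._open: Optional[List[dict]] = None   # primitives of the message in progress

    @contextmanager
    def high_level(self, target: str, selector: str, *args: str):
        """Capture primitives until the block exits, then log one entry."""
        self._open = []
        try:
            yield
        finally:
            primitives, self._open = self._open, None
        # Reached only if the block did not raise.
        self.entries.append({"object": target, "message": (selector, args),
                             "definition": primitives})

    def primitive(self, target: str, prop: str, value: object) -> None:
        entry = {"object": target, "message": ("setValue:forProperty:", (value, prop))}
        (self.entries if self._open is None else self._open).append(entry)

class Recorded:
    """Object whose property changes are reported to the log."""
    def __init__(self, uuid: str, log: MessageLog) -> None:
        self.uuid, self.log = uuid, log
        self.props: Dict[str, object] = {}

    def set_value(self, value: object, prop: str) -> None:
        self.props[prop] = value
        self.log.primitive(self.uuid, prop, value)

def refactor_method(log: MessageLog, owner: Recorded, inner: Recorded,
                    old: str, new: str) -> None:
    with log.high_level(owner.uuid, "refactorMethod:to:", old, new):
        owner.set_value(new, "name")
        inner.set_value(new, "name")

log = MessageLog()
owner = Recorded("UUID2", log)
inner = Recorded("UUID2/callSite", log)   # a path inside UUID2, not a core object
refactor_method(log, owner, inner, "foo", "bar")
```

The log ends up with a single -refactorMethod:to: entry whose definition holds the two primitive messages, so the intermediate calls never appear as separate entries.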
> What do you think?
I have to write some use cases on paper and think about them :-) I
probably need to read some extra papers on the topic too.
It's a really tricky problem, and how to integrate that cleanly without
too much complexity in the entire stack from EtoileSerialize to
EtoileUI hurts my brain ;-)
I found some other papers that could potentially interest us:
- A document mark based method supporting group undo
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.8003
- The Multi-version and Single-display Strategy in Undo Scheme (looks
like an updated version of the previous one)
http://jmyang.info/papers/cit_2005_undo.pdf
http://jmyang.info/slides/cit_2005_undo.ppt (slides)
- Consistency Maintenance Based on the Mark & Retrace Technique in
Groupware Systems (yet another more updated version)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.9726
http://jmyang.info/slides/group_2005_markretrace.ppt (slides)
- Undo Any Operation at Any Time in Group Editors
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.6266
- Undoing Any Operation in Collaborative Graphics Editing Systems
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.5993
- A Flexible Undo Framework for Collaborative Editing
http://hal.inria.fr/index.php?halsid=f54l7h2e5149op3i00vq8daid5&view_this_doc=inria-00275754&version=2
- A flexible multi-mode undo mechanism for a collaborative modeling
environment
http://portal.acm.org/citation.cfm?id=1813978
- A Temporal Model for Multi-Level Undo and Redo
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.8107
- A Selective Undo Mechanism for Graphical User Interfaces Based On
Command Objects (the one I mentioned earlier)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.755
- Reusable Hierarchical Commands (sounds similar to what you suggest
to support selective undo with message-based persistency)
http://portal.acm.org/citation.cfm?id=238386.238526&type=series
- Object-based nonlinear undo model
http://www.computer.org/portal/web/csdl/doi/10.1109/CMPSAC.1997.624739
More papers about collaborative editing on tree-structured documents:
- Operation-based versus State-based Merging in Asynchronous Graphical
Collaborative Editing
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW04.pdf
- Multi-level Editing of Hierarchical Documents
http://www.springerlink.com/content/472w061011830726/
- Tree-based model algorithm for maintaining consistency in real-time
collaborative editing systems
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCEW02.pdf
- Draw-Together: Graphical Editor for Collaborative Drawing
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatCSCW06.pdf
- Maintaining Consistency in Collaboration over Hierarchical Documents
(the thesis that relates to these previous papers)
http://www.loria.fr/~ignatcla/pmwiki/pub/papers/IgnatPhDThesis06.pdf
> Anyway, I hope this didn't get too long and rambling. :-)
So do I :-)
> Maybe we should have a skype meeting sometime to discuss this?
I think so. I will be away next week on vacation. So we could
organize it around August 20/30th.
Cheers,
Quentin.
_______________________________________________
Etoile-dev mailing list
[email protected]
https://mail.gna.org/listinfo/etoile-dev