Hi Eric

On Fri, 06 May 2011 15:34 -0600, "Eric Wasylishen"
<[email protected]> wrote:
> 
> Last week I chatted with Quentin and we settled a few things:
> 
> - the undo/redo stack/tree must be persistent, so your computer
> restarting, or a process restarting, doesn't clear the stack.
> - the undo/redo user commands must apply to revision control operations
> like create branch, switch branch, merge, revert to previous version,
> etc. My favourite way of justifying this is to just look at the number of
> questions on stackoverflow.com of the form "Help, I accidentally did XYZ
> to my git/svn/hg repository, how do I undo it?" ;-)
> - applications need to have significant control over how undo/redo are
> implemented. It's part of the UI design and we don't want the database to
> be too limiting. 
> - the main failing of my ProjectDemo app was I never attempted to
> implement normal undo/redo. It was still a useful testbed for validating
> ideas, though.

These all sound okay, but I wonder if revision control operations should
be stored separately, so that we have a fully representative document
tree and another (purely linear) line of revision control operations. It
sounds difficult to store the revision control operations "in band", but
I suspect "out of band" operations have their own issues.
 
> We decided that I should try to revise OM4 with the above points in mind
> and try to build a simple composite document editor to validate it.

I have something basic that depends on my patches to OM4 and EtoileUI
which you can start with. It's just a basic outline-view editor. I'll
have to put it in a branch or something.

> > OM4 does not seem to provide a way to separate "core objects" so that you 
> > have separate object graphs that cannot overlap - it seems everything in 
> > the same repository can reference any other object in the repository. 
> 
> I originally did this because I wanted to avoid giving special properties
> to 'root' objects. My motivation for that principle is taking a 'root'
> document and moving it inside a document shouldn't change its properties
> too much (how it responds to revision control, etc.) I'm not totally sure
> how important that is, though.
> 
> > On first thought, this seems to be okay, but it makes it almost impossible 
> > to work on the same repository from different processes without a 
> > synchronisation or notification mechanism (which I don't think sqlite3 can 
> > provide).
> 
> Yeah, it'll take some extra work to support simultaneous editing between
> multiple processes.
> 
> > We need object roots at some point so that we can identify the top-level 
> > objects in a user's workspace during search.
> 
> I'm not sure it's absolutely necessary - say you have a set of search
> results which include sub-nodes of documents; you could navigate through
> the nodes' parents until you hit document root nodes.
>

I still think the synchronisation will be the killer for this type of
structure. We'll need some sort of "synchronisation server" to
coordinate access to the repository. Without clearly defined object
roots, an object could go from root to embedded by a single user action.
This could mean we have to check if an object goes from being root to
embedded on every operation if we want to maintain a separate table of
object roots. It also means we'll have to introduce something like
conflict-resolution straight away (which is difficult) as every
programme can operate on any object. I suspect it also means that we
have to synchronise the usage of every object in the tree which could
make for a busy synchronisation server (although there are viable
commercial products that do this sort of thing at a "database row"
level). I don't know what it means for objects that become "orphaned":
deleting a link to an object could make it a "root", even if that makes
no sense for that type of object (e.g. a PhoneNumber object). It might
just be easier to store a flag on the "root" object of an object tree,
and use that to locate object roots.
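
As a rough illustration of the root-flag idea, here is a minimal Python
sketch using the standard sqlite3 module. The table layout, the column
names (uuid, parent, type, is_root) and the helper functions are all
assumptions made up for this example; they are not OM4's actual schema
or API. It shows both approaches discussed: listing roots directly from
the flag, and walking up the parents from a search hit until a flagged
root is found.

```python
import sqlite3

def open_store():
    # Hypothetical schema: one row per object, with an explicit root flag.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE objects ("
        " uuid TEXT PRIMARY KEY,"
        " parent TEXT REFERENCES objects(uuid),"
        " type TEXT NOT NULL,"
        " is_root INTEGER NOT NULL DEFAULT 0)"
    )
    return db

def add_object(db, uuid, type_, parent=None, is_root=False):
    db.execute(
        "INSERT INTO objects (uuid, parent, type, is_root) VALUES (?, ?, ?, ?)",
        (uuid, parent, type_, int(is_root)),
    )

def root_objects(db):
    # Roots are located by the flag alone, not by "has no parent".
    rows = db.execute("SELECT uuid FROM objects WHERE is_root = 1 ORDER BY uuid")
    return [row[0] for row in rows]

def enclosing_root(db, uuid):
    # Walk up the parent chain until a flagged root is reached.
    # Returns None for orphans (no flagged root above them).
    seen = set()
    while uuid is not None and uuid not in seen:
        seen.add(uuid)
        row = db.execute(
            "SELECT parent, is_root FROM objects WHERE uuid = ?", (uuid,)
        ).fetchone()
        if row is None:
            return None
        parent, is_root = row
        if is_root:
            return uuid
        uuid = parent
    return None
```

With this layout, an unlinked PhoneNumber row has no parent but is not
reported as a root, because only the flag decides what counts as one.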

I hoped that we would use distinct object trees because it would be
easier to use whole object tree locking in earlier revisions and then
integrate some form of cross-context synchronisation later.  
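
A minimal sketch of what whole-tree locking could look like, assuming a
hypothetical tree_locks table keyed by the root object's identifier. The
table and the acquire/release helpers are invented for illustration. An
in-memory database is used only to keep the example self-contained;
coordinating separate processes would need a shared database file and
some policy for stale locks, neither of which is shown here.

```python
import sqlite3

def open_lock_table():
    db = sqlite3.connect(":memory:")
    # One row per locked object tree; the primary key enforces exclusivity.
    db.execute(
        "CREATE TABLE tree_locks (root TEXT PRIMARY KEY, owner TEXT NOT NULL)"
    )
    return db

def acquire(db, root, owner):
    # INSERT OR IGNORE inserts nothing if the tree is already locked,
    # so rowcount tells us whether we got the lock.
    cur = db.execute(
        "INSERT OR IGNORE INTO tree_locks (root, owner) VALUES (?, ?)",
        (root, owner),
    )
    db.commit()
    return cur.rowcount == 1

def release(db, root, owner):
    # Only the current owner can release the lock on a tree.
    cur = db.execute(
        "DELETE FROM tree_locks WHERE root = ? AND owner = ?", (root, owner)
    )
    db.commit()
    return cur.rowcount == 1
```

Because the lock is per tree, two programs working on different
documents never contend, which is the property that distinct object
trees would buy us.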
 
> > Having a table for them would also help with indexing because we could 
> > store the object's type (used for opening it) and the date it was last 
> > created/accessed so that we can improve search results (i.e. put more 
> > recently accessed objects first in a search).
> 
> That sounds good.
> 
> Overall, segregating objects is probably a good idea. My hunch is that it
> should be mostly an implementation detail and not have any observable
> effect on the use of the library, but I'm not sure.

I'm not sure if it's possible to use separate object roots without
visible effects to the user and to the developer. As you've shown in
your previous prototypes, moving internal objects from one tree to
another causes a delete operation and an insert operation, whereas a
totally integrated system just updates references. In either system, I
think we want some notion of "root objects" that is visible to the
developer and the user, so that we know when to clean up orphaned
objects, the user has a sense of where their environment begins, and
developers can correctly mark which objects can be "root" by their
nature.
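
To make the orphan clean-up point concrete, here is a small Python
sketch of a reachability check from a known set of root objects. The
dictionary-of-references representation and the function names are
assumptions for illustration only, not how OM4 stores its graph.
Anything not reachable from a declared root is treated as an orphan.

```python
def reachable_from_roots(references, roots):
    # references maps an object id to the ids it links to.
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(references.get(obj, ()))
    return live

def orphans(references, roots):
    # Collect every object mentioned anywhere, then subtract the live set.
    all_objects = set(references)
    for targets in references.values():
        all_objects.update(targets)
    return all_objects - reachable_from_roots(references, roots)
```

The check only works if the set of roots is known up front, which is
one reason to make root objects an explicit concept rather than
inferring them from the absence of incoming links.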

> I don't feel too strongly about XML vs OpenStep plist. The choice
> shouldn't have any effect on FTS, though, because I just index the values
> of string properties, not the whole file.

Ok.

> > For simple properties, we should probably have a type column that specifies 
> > its type, even if we store the value as a string. For this case, sqlite 
> > provides almost automatic type conversion, which we could use to our 
> > advantage to help make search results more relevant. For example, if the 
> > user types in a string that can be converted to a date, we could also do a 
> > "date" search on the database against any date-type properties. This will 
> > expand the list of relevant results, especially where data is stored in a 
> > different locale to the way the user is using it (e.g. different date or 
> > number formats).
> 
> Something like that sounds good. Along the same lines, we could extend
> stemming/tokenizing to dates - we just need a date parser that can locate
> loosely formatted dates in text. Then we can search for "June 9" and pick
> up documents containing "06/09/11", say, since they both get stemmed to
> the same date representation.
> 
> btw, I did a bit of research a while ago on stemmers for text and it
> sounds like the state-of-the-art free one is Hunspell (as well as being
> the state-of-the-art free spell checker - it's the openoffice.org
> spellchecker.) I've done some quick tests with it but that's about all.

Ok, I'll have to check it out. I think sqlite uses libicu for stemming.
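
For the date-matching idea above (a query like "June 9" finding a
property stored as "06/09/11"), a minimal Python sketch follows. The
list of accepted formats, the US month/day ordering for slash dates and
the helper names are all assumptions for this example. It normalises
both the query and the stored values to a (year, month, day) tuple,
where the year may be unknown, and compares those instead of the raw
strings.

```python
from datetime import datetime

# (format, whether the format carries a year). Assumed for illustration;
# "%m/%d/%y" reads slash dates in US order, and "%B" expects English
# month names under the default C locale.
FORMATS = [
    ("%Y-%m-%d", True),
    ("%m/%d/%y", True),
    ("%B %d %Y", True),
    ("%B %d", False),
]

def normalise_date(text):
    # Returns (year or None, month, day), or None if nothing parses.
    text = text.strip()
    for fmt, has_year in FORMATS:
        try:
            if has_year:
                parsed = datetime.strptime(text, fmt)
                return (parsed.year, parsed.month, parsed.day)
            # Borrow a leap year so that "February 29" parses too.
            parsed = datetime.strptime(text + " 2000", fmt + " %Y")
            return (None, parsed.month, parsed.day)
        except ValueError:
            continue
    return None

def dates_match(query, stored):
    if query is None or stored is None:
        return False
    same_day = query[1:] == stored[1:]
    # A missing year on either side matches any year.
    same_year = query[0] is None or stored[0] is None or query[0] == stored[0]
    return same_day and same_year

def search_dates(query_text, properties):
    # properties is a list of (name, stored string value) pairs.
    query = normalise_date(query_text)
    return [
        name for name, value in properties
        if dates_match(query, normalise_date(value))
    ]
```

A real implementation would also need to find dates embedded in longer
text and respect the locale the data was written in; this sketch only
handles values that are a date and nothing else.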

Cheers
Chris
-- 
  Christopher Armstrong
  carmstrong ^^AT^ fastmail dOT com /Dot/ au


_______________________________________________
Etoile-dev mailing list
[email protected]
https://mail.gna.org/listinfo/etoile-dev
