Re: [Tracker] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead

2008-12-09 Thread Philip Van Hoof
On Tue, 2008-12-09 at 18:00 +0530, Sankar wrote:

Hey Sankar,

I'm writing a plugin that will implement the Manager class as
described here. Tracker will then implement being a Registrar.

http://live.gnome.org/Evolution/Metadata

I will be using camel-db.h, as you hinted on IRC, to implement the
features in a well-performing way (direct SQLite access).

I will start working on this plugin tomorrow or next week. At the same
time I will be implementing support for it in Tracker, which will serve
as a prototype for other metadata engines.

I hope to inspire people from the Evolution team, and from the different
metadata engines, to comment on the proposed D-Bus API.

Let's try to get this valuable metadata out of those damn E-mail clients
and let's try to get it right this time. Not ad-hoc, but right.

If the namespace should move from org.gnome to org.freedesktop, we can
of course do this afterwards. The metadata.Manager part would also have
to be given a better name. But in the end the implementation would not
be affected much by such renames. Meanwhile we can prototype it in
GNOME's Evolution D-Bus namespace.

The reason for all this prototyping is that we wouldn't like to release
a Tracker that doesn't support Evolution's new summary format.

This time we're into getting it right so just hacking around the new
summary format by fixing something that wrongfully interpreted
Evolution's cache by itself instead of letting Evolution tell us about
it ... 

* Well we could do this, but really ... let's just get it right now that
  I can spend time on this. At least that's my point of view on this.

* Having other apps read Evolution's caches externally is never going to
  be generic for all E-mail clients, and is not really right. It means
  dealing with, for example, file locking and (now that it's SQLite
  based) transactions being held by Evolution, as well as the
  possibility of Evolution changing the database schema.

* It's just not very nice to do it that way in my opinion. It also puts
  an unasked-for burden on the Evolution team: having to negotiate with
  us when you want to change the schema of the database. Otherwise you
  will break a lot of people's desktops unannounced. Evolution would
  need to provide a mechanism to tell us about the version of the
  schema, for example. And we would have to implement things in Tracker
  that deal with all versions of Evolution's cache.

  One big spaghetti mess distributed over multiple projects.

  So, let's just do it right 


 On Mon, 2008-12-08 at 18:59 +0100, Philip Van Hoof wrote:
  All metadata engines are nowadays working on a method to let them get
  their metadata fed by external applications.
  Such APIs come down to storing RDF triples. A RDF triple comes down to a
  URI, a property and a value.
  
  For example (in Turtle format, which is SparQL's inline format and the
  typical w3's RDF storage format):
  We'd like to make an Evolution plugin that does this for Tracker. 
  
  Obviously, this way, supporting Beagle with the same code and Evolution
  plugin would be as easy as letting software like Beagle become an
  implementer of prox's InsertRDFTriples.
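
A minimal sketch of that idea, not the actual Tracker or Beagle API: one
batch of (URI, property, value) triples is pushed to every engine that
implements the same insert call. The names Triple, TripleSink,
InMemoryEngine and feed, as well as the example URI and property used in
the usage note, are all hypothetical.

```python
from typing import Iterable, NamedTuple, Protocol


class Triple(NamedTuple):
    subject: str    # URI of the resource, e.g. one e-mail (hypothetical form)
    predicate: str  # property name
    value: str      # property value


class TripleSink(Protocol):
    """Hypothetical interface standing in for an engine that accepts triples."""

    def insert_rdf_triples(self, triples: Iterable[Triple]) -> None: ...


class InMemoryEngine:
    """Toy metadata engine that indexes triples by subject URI."""

    def __init__(self) -> None:
        self.store: dict[str, dict[str, str]] = {}

    def insert_rdf_triples(self, triples: Iterable[Triple]) -> None:
        for t in triples:
            # Later values for the same (subject, predicate) overwrite earlier ones.
            self.store.setdefault(t.subject, {})[t.predicate] = t.value


def feed(engines: Iterable[TripleSink], triples: list[Triple]) -> None:
    """Push the same batch of triples to every registered engine."""
    for engine in engines:
        engine.insert_rdf_triples(triples)
```

With two InMemoryEngine instances standing in for two metadata engines,
feed([a, b], [Triple("email://example/INBOX#1", "subject", "Hello")])
leaves both stores with identical content, which is the point: the
plugin side does not need to know which engine is listening.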
  
  I just don't know which EPlugin hooks I should use. Iterating all
  accounts and foreach account all folders and foreach folder all
  CamelMessageInfo instances is trivial and I know how to do this.
  
  What I don't know is what reliable hooks are for:
  
* Application started
 
 org.gnome.evolution.shell.events:1.0 - es-event.c - 
 
 sample plugin:
 groupwise-account-setup/org-gnome-gw-account-setup.eplug.xml 
 
 
* Account added
 
 org.gnome.evolution.mail.config:1.0 
 
 sample plugin:
 groupwise-account-setup/org-gnome-gw-account-setup.eplug.xml 
 
 For account-added: id = org.gnome.evolution.mail.config.accountDruid
 For account-edited: id = org.gnome.evolution.mail.config.accountEditor
 
* Account removed
 
 You may have to write a new hook
 
* Folder created
* Folder deleted
* Folder moved
* Message deleted (expunged)
* Message flagged for removal 
* Message flagged as Read and as Unread
* Message flagged (generic)
* Message moved (ie. deleted + created)
* New message received
  * Full message 
  * Just the ENVELOPE
  
 
 If you try to update your metadata for each of the above operations, it
 may be overkill in terms of performance (and I believe it means more
 disk access as well, for updating your metadata store). You can add a
 new hook that fires whenever any change is made to the summary DB and
 listen to that. All the above changes will eventually have to reach the
 summary DB for them to be valid.
 
 
 However, I personally believe:
 
 More and more applications are using SQLite (Firefox and Evolution, my
 two most used apps, among them). So, it may be a better idea to directly map the
 tables in an sqlite database into the search applications' data-store
 (beagle, tracker etc.) instead of depending on 

Re: [Tracker] [Evolution] Beagle and Tracker, letting Evolution feed those beasts RDF triples instead

2008-12-09 Thread Philip Van Hoof
On Tue, 2008-12-09 at 13:59 +0100, Philip Van Hoof wrote:
 On Tue, 2008-12-09 at 18:00 +0530, Sankar wrote:
 
 Hey Sankar,
 
 I'm writing a plugin that will implement the Manager class as
 described here. Tracker will then implement being a Registrar.
 
 http://live.gnome.org/Evolution/Metadata

Early visitors of that page should refresh it, because I have already
added to and changed quite a lot of it.

This wiki page also serves as the description of the proposal. 

An experienced developer should get quite a good idea of what will be
needed in Evolution:

Keeping timestamps around for each message, so that I can do a variation
of camel_db_read_message_info_records that accepts a "since" timestamp.

For example:

camel_db_message_infos_that_changed_since (db, since, callback, userd)
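
To make the idea concrete, here is a rough sketch in Python rather than
Camel's C, using the standard sqlite3 module. The table
messages(uid, modified) and the helper make_demo_db are assumptions made
up for this illustration; Evolution's real summary schema is not shown
in this thread.

```python
import sqlite3
from typing import Callable


def message_infos_changed_since(
    db: sqlite3.Connection,
    since: int,
    callback: Callable[[str, int], None],
) -> int:
    """Call callback(uid, modified) for every row modified after `since`.

    Assumes a hypothetical table messages(uid TEXT, modified INTEGER).
    Returns the number of rows reported.
    """
    rows = db.execute(
        "SELECT uid, modified FROM messages WHERE modified > ? ORDER BY modified",
        (since,),
    )
    count = 0
    for uid, modified in rows:
        callback(uid, modified)
        count += 1
    return count


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory database to try the query against."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (uid TEXT PRIMARY KEY, modified INTEGER)")
    db.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [("1", 100), ("2", 200), ("3", 300)],
    )
    return db
```

A consumer would remember the largest timestamp it has processed and
pass it as `since` on the next run, so only the changed rows are walked
instead of the whole summary.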

Something less easy is keeping track of deleted ones too. This would be
needed for the Unset and UnsetMany calls. The only thing that has to be
kept around is the UID. With direct access to IMAP I could implement
this by searching for holes in the UID sets. For POP there's no real
other way than to just store all the UIDs or long-uids that ever got
expunged/popped/deleted and then locally deleted.
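
Both approaches can be sketched roughly as follows. This is only an
illustration: missing_uids and ExpungeLog are hypothetical names, and
the "holes" are computed here as a plain set difference between the UIDs
indexed earlier and the UIDs the server still lists.

```python
def missing_uids(previously_seen: set[int], on_server: set[int]) -> list[int]:
    """UIDs that were indexed earlier but are no longer listed (the holes)."""
    return sorted(previously_seen - on_server)


class ExpungeLog:
    """Toy tombstone log for stores that cannot be asked for holes (e.g. POP).

    Records (uid, timestamp) whenever a message is removed locally, so a
    metadata engine can later ask what disappeared since its last sync.
    """

    def __init__(self) -> None:
        self._entries: list[tuple[str, int]] = []

    def record(self, uid: str, when: int) -> None:
        self._entries.append((uid, when))

    def removed_since(self, since: int) -> list[str]:
        return [uid for uid, when in self._entries if when > since]
```

The result of either call is the list of UIDs that would be handed to
Unset or UnsetMany.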

This is of course important for accurately cleaning up metadata engines
that want to accurately be aware of removed resources (removed E-mails).

This was also a painful part back when we manually parsed the summary
files: we had to scan all existing items to check whether each one was
still in the original summary file. This meant having to parse-all,
scan-all, process-all each time we started up.

Not very nice for desktop-startup time :-(

Especially on mobile you want fast startup time and as few things as
possible to do before becoming operational. A metadata engine is of
course not good if its stores still hold inaccuracies, like metadata
about data that has long been removed.

Anyway, Evolution can easily log this, as it is either updated by IMAP's
IDLE, unsolicited EXPUNGE events or NOTIFY, or by its synchronization
with POP, or Evolution itself was responsible for deleting the E-mail
(or apply this logic to E-mail protocol super X and E-mail protocol mega Y).


Let me know what you guys think ...

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be
