Re: [Tracker] index evolution mail is broken

2008-12-02 Thread Martyn Russell

On 01/12/08 09:21, Jerry Tan wrote:

> I checked the format of Evolution's data files.
>
> Evolution stores mail metadata in an SQLite file named folders.db;
> I can open it with sqlite to check its schema.
>
> It creates one table for every folder and stores the metadata in it,
> including the read/deleted flags, subject, and From/To/Cc.
>
> But if I select "Copy folder content locally" and "sync", Evolution
> stores the mails of the selected folder as plain-text files under that
> folder.
> To support full-text search, we need to parse these files too.


Hi,

Yes, I saw this yesterday too when looking into how things work. I think
the best thing to do is to write an Evolution plugin that uses its
internal APIs, if possible.

Not sure when this will happen, though. But with the recent improvements
to the modules API in Tracker, it should be easier to implement something.


--
Regards,
Martyn
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list


Re: [Tracker] index evolution mail is broken

2008-12-01 Thread Jerry Tan

I checked the format of Evolution's data files.

Evolution stores mail metadata in an SQLite file named folders.db;
I can open it with sqlite to check its schema.

It creates one table for every folder and stores the metadata in it,
including the read/deleted flags, subject, and From/To/Cc.
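
A minimal Python sketch of that inspection, using only the standard sqlite3 module. It opens the file read-only, so looking at the schema cannot modify Evolution's cache, and maps each table (one per folder, per the description above) to its column names. The demo table and column names are hypothetical stand-ins; the real schema of folders.db has to be read from an actual file.

```python
import sqlite3
from pathlib import Path


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only (any write attempt fails)."""
    # Path.as_uri() needs an absolute path and percent-encodes odd characters.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def dump_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map every table name to the list of its column names."""
    schema: dict[str, list[str]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier: folder names may contain spaces or quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


if __name__ == "__main__":
    # Demo with hypothetical per-folder tables in an in-memory database.
    # For a real file: dump_schema(open_readonly("/path/to/folders.db")).
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "INBOX" (uid TEXT, flags INTEGER, subject TEXT)')
    conn.execute('CREATE TABLE "Sent Items" (uid TEXT, flags INTEGER)')
    for table, columns in dump_schema(conn).items():
        print(table, columns)
```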


But if I select "Copy folder content locally" and "sync", Evolution
stores the mails of the selected folder as plain-text files under that
folder.

To support full-text search, we need to parse these files too.
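
Assuming the locally synced files are ordinary RFC 822/MIME messages (an assumption; the exact on-disk layout is not confirmed here), a minimal Python sketch of pulling indexable text out of one of them could use the standard email package. It keeps the Subject and the inline text/plain parts and skips attachments.

```python
from email import message_from_bytes, policy


def indexable_text(raw: bytes) -> str:
    """Return the Subject plus every inline text/plain part of a message."""
    # policy.default yields EmailMessage objects with the modern content API.
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # Skip containers and attachments; keep inline plain-text bodies.
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            # get_content() undoes the transfer encoding and the charset.
            chunks.append(part.get_content())
    return "\n".join(chunk for chunk in chunks if chunk)


if __name__ == "__main__":
    sample = (b"Subject: Hello\n"
              b"Content-Type: text/plain; charset=utf-8\n"
              b"Content-Transfer-Encoding: base64\n\n"
              b"SGVsbG8gd29ybGQ=\n")
    print(indexable_text(sample))
```

The resulting string is what would be handed to the full-text indexer; HTML parts and other formats would need their own extractors.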



How are the Beagle developers handling this issue?


Re: [Tracker] index evolution mail is broken

2008-11-30 Thread Laurent Aguerreche
On Sunday 30 November 2008 at 22:47 +0100, Philip Van Hoof wrote:

Re: [Tracker] index evolution mail is broken

2008-11-30 Thread Philip Van Hoof
On Sun, 2008-11-30 at 19:13 +0100, Laurent Aguerreche wrote:
> On Friday 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> > On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > > Evolution 2.24 has migrated to SQLite for storing mail summaries.
> > > 
> > > So the parser that is based on parsing the summary files is broken.
> 
> What I do not understand is why the Evo hackers replaced only the
> summary files and did not include the e-mail contents...

Because storing the content of E-mail in a database, for example as a
BLOB, makes very little sense (as in: no sense whatsoever). Storing
metadata about the content of E-mail in a database does make some
sense, though.

A lot of E-mail servers (like Dovecot, Cyrus, Isode's MBox, etc.) store
E-mail content in files too. An IMAP server is actually just a database
with a frontend that happens to speak the protocol of a specific RFC. Yet
these IMAP server developers still don't store things in relational
databases.


For an E-mail client:

IMHO, the ideal would be to store E-mail messages as directories, with
each MIME part of the E-mail being a separate file within that directory,
and all other data stored in a database (headers could be stored as
triples in an RDF triple store, like 3store).
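
A rough Python sketch of this "message as a directory" layout, under assumptions of our own: leaf MIME parts are written, decoded, in walk() order as part-00, part-01 and so on (a hypothetical naming scheme), and the top-level headers come back as plain (subject, predicate, object) tuples instead of being loaded into 3store or any real RDF store.

```python
import os
from email import message_from_bytes, policy


def explode_message(raw: bytes, msg_dir: str) -> list[tuple[str, str, str]]:
    """Store each leaf MIME part of one message as its own file in msg_dir.

    Returns the top-level headers as (subject, predicate, object) triples,
    a stand-in for rows that would go into an RDF triple store.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    os.makedirs(msg_dir, exist_ok=True)
    # walk() also descends into attached message/rfc822 parts.
    leaves = [part for part in msg.walk() if not part.is_multipart()]
    for index, part in enumerate(leaves):
        # Decoded payload: the transfer encoding (base64, QP) is removed.
        data = part.get_payload(decode=True) or b""
        # Hypothetical naming scheme: part-00, part-01, ... in walk order.
        with open(os.path.join(msg_dir, f"part-{index:02d}"), "wb") as fh:
            fh.write(data)
    # Use the Message-ID as the triple subject, falling back to the directory.
    node = str(msg.get("Message-ID", "")) or msg_dir
    return [(node, name.lower(), str(value)) for name, value in msg.items()]
```

Deriving file names from the parts' own filename parameters would be friendlier, but those come from the sender and would need sanitizing first.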

rename() on the directory name can be used for the flags, just like
Maildir does, which keeps the format easy to reuse and back up. And
instead of having to parse (and download) the entire message, you can
store the individual parts of interest as individual files, reducing
format complexity. Having to MIME-parse Maildir and nearly all of the
other local formats is among the reasons why I dislike most of them.
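
A minimal Python sketch of flag changes via rename(), borrowing Maildir's ":2," info suffix and its flag letters, applied here to a per-message directory as suggested above. The directory layout is a hypothetical illustration, and ":" is not valid in Windows file names, so this assumes a POSIX file system.

```python
import os

# Maildir flag letters: Draft, Flagged, Passed, Replied, Seen, Trashed.
MAILDIR_FLAGS = set("DFPRST")


def flagged_name(path: str, flags: str) -> str:
    """Return `path` with its ":2,<flags>" suffix set to `flags`."""
    unknown = set(flags) - MAILDIR_FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s): {''.join(sorted(unknown))}")
    head, base = os.path.split(os.path.normpath(path))
    stem = base.split(":2,", 1)[0]  # strip any existing flag suffix
    # Maildir lists flag letters once each, in ASCII order.
    return os.path.join(head, f"{stem}:2,{''.join(sorted(set(flags)))}")


def set_flags(path: str, flags: str) -> str:
    """Change a message's flags with one rename(); return the new path."""
    new_path = flagged_name(path, flags)
    if new_path != os.path.normpath(path):
        # On POSIX, rename() within one file system is atomic.
        os.rename(path, new_path)
    return new_path
```

Because the flags live in the name, a backup tool or another program can read them with a plain directory listing, without opening any message.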

If a client wants to reduce disk usage, it can easily remove the
attachments of E-mails that are cached locally and also available
remotely (as with IMAP): the client would just have to unlink() a bunch
of files in a bunch of directories.
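
A small Python sketch of that cleanup. It assumes a hypothetical layout in which each message is a directory and attachment parts are files whose names start with "attachment-" (a naming scheme invented for this example); it also assumes the caller has already checked that the messages remain available on the server.

```python
import os


def drop_cached_attachments(store_root: str, prefix: str = "attachment-") -> int:
    """Delete cached attachment files under store_root; return bytes freed.

    Only safe for messages that remain available remotely (e.g. via IMAP),
    since the local copy of each attachment is gone afterwards.
    """
    freed = 0
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            if name.startswith(prefix):
                full = os.path.join(dirpath, name)
                freed += os.path.getsize(full)
                os.unlink(full)
    return freed
```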

Meanwhile, you want to store the BODYSTRUCTURE and the ENVELOPE of
E-mails, and of MIME parts that are themselves E-mails (forwarded RFC 822
messages), in a database or in another easily accessible format.

The reason is that ENVELOPE and BODYSTRUCTURE are precisely what a piece
of software like Tracker needs to 'index' E-mail folders.
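
A simplified Python sketch of keeping ENVELOPE- and BODYSTRUCTURE-like data in SQLite. The ten envelope fields follow the order of the IMAP ENVELOPE (RFC 3501), but they are stored as raw header strings rather than IMAP's parsed address lists, and the "body structure" is flattened to one row per leaf part in walk() order, not IMAP section numbers. The table and column names are hypothetical.

```python
import sqlite3
from email import message_from_bytes, policy

# Field order of the IMAP ENVELOPE structure (RFC 3501).
ENVELOPE_FIELDS = ("date", "subject", "from", "sender", "reply-to",
                   "to", "cc", "bcc", "in-reply-to", "message-id")


def envelope(msg) -> tuple[str, ...]:
    """Simplified ENVELOPE: raw header strings, not parsed address lists."""
    values = {f: str(msg.get(f, "")) for f in ENVELOPE_FIELDS}
    # As in RFC 3501, a missing Sender or Reply-To defaults to From.
    for field in ("sender", "reply-to"):
        if not values[field].strip():
            values[field] = values["from"]
    return tuple(values[f] for f in ENVELOPE_FIELDS)


def body_parts(msg) -> list[tuple[int, str, str, str, int]]:
    """Simplified BODYSTRUCTURE: one row per leaf part, in walk order."""
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    return [(i, p.get_content_type(), p.get_content_charset() or "",
             str(p.get("Content-Transfer-Encoding", "7bit")).lower(),
             len(p.get_payload(decode=True) or b""))
            for i, p in enumerate(leaves)]


def store(conn: sqlite3.Connection, uid: str, raw: bytes) -> None:
    """Parse one message and (re)store its envelope and part list."""
    msg = message_from_bytes(raw, policy=policy.default)
    cols = ", ".join(f'"{f}" TEXT' for f in ENVELOPE_FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS envelope "
                 f"(uid TEXT PRIMARY KEY, {cols})")
    conn.execute("CREATE TABLE IF NOT EXISTS bodypart (uid TEXT, idx INTEGER,"
                 " content_type TEXT, charset TEXT, encoding TEXT, size INTEGER)")
    placeholders = ", ".join("?" * (len(ENVELOPE_FIELDS) + 1))
    conn.execute(f"INSERT OR REPLACE INTO envelope VALUES ({placeholders})",
                 (uid, *envelope(msg)))
    conn.execute("DELETE FROM bodypart WHERE uid = ?", (uid,))
    conn.executemany("INSERT INTO bodypart VALUES (?, ?, ?, ?, ?, ?)",
                     [(uid, *row) for row in body_parts(msg)])
    conn.commit()
```

An indexer can then answer "which messages from X have a PDF attachment" with a join over these two tables, without touching the message files.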

You might also want to base64-decode the MIME parts before storing them,
as this makes it easier for indexers to index and scan the files. Which
transfer encoding is used depends on the e-mail, and you don't want to
needlessly force indexing software to understand your format and work
out how each MIME-part file is encoded; instead, you want to store the
original data as-is. This also makes sense from the point of view of how
you read (and store) XMP/Exif/etc. data: if it is encoded or compressed,
you need more memory before you can read (or write) it. While hard-disk
space is nearly unlimited nowadays, I/O access speed isn't.
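
A short Python sketch of the decoding step, using only the standard email package: for each leaf part it returns the original transfer encoding next to the decoded bytes that would be written to disk. The demo message is invented; it also shows the size overhead that storing base64 text would add.

```python
import base64
from email import message_from_bytes, policy


def decoded_parts(raw: bytes) -> list[tuple[str, str, bytes]]:
    """(content type, transfer encoding, decoded bytes) for each leaf part.

    The decoded bytes are what would be written to the per-part file, so an
    indexer can read them without knowing anything about MIME.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    result = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        encoding = str(part.get("Content-Transfer-Encoding", "7bit")).lower()
        result.append((part.get_content_type(), encoding,
                       part.get_payload(decode=True) or b""))
    return result


if __name__ == "__main__":
    from email.message import EmailMessage

    data = bytes(range(256)) * 12  # 3072 bytes of binary data
    msg = EmailMessage()
    msg.set_content("see attachment")
    msg.add_attachment(data, maintype="application",
                       subtype="octet-stream", filename="blob.bin")
    for ctype, encoding, payload in decoded_parts(msg.as_bytes()):
        print(ctype, encoding, len(payload))
    # base64 needs 4 characters per 3 bytes, before any line breaks.
    print("base64 size:", len(base64.b64encode(data)))
```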

> > A better idea would be, instead of trying to parse Evolution's files
> > ourselves, to make an Evolution plugin that pumps its data over to us
> > via IPC, shmem, pipe() or whatever.
> 
> The last time I read Beagle's code, I found out it was also trying to
> parse Evo's internal caches. Abstract access to Evo's caches is a good
> idea, but it can't be just a Tracker plugin; it has to be something
> usable by any program that wants to access Evo's data (including
> Evolution itself).

Sure

> It seems that the Evo hackers are trying to replace the Bonobo code in
> favour of DBus. Would it be possible for us to use DBus there too, or is
> it only for Evo-internal things? See:
> http://mail.gnome.org/archives/evolution-hackers/2008-November/msg9.html

The remote IPC (CORBA and now increasingly DBus) APIs that Evolution
exposes have nothing to do with its E-mail functions, only with
calendaring, contacts and TODO lists.

Evolution Data Server is not what I call "E-mail as a service".
Everything E-mail-related in Evolution happens inside the "evolution"
process (that's the UI, or the shell if you prefer that name).

It's just a marketing stunt that "camel" is included in the EDS
(Evolution Data Server) package. The "camel" library is technically not
part of EDS. The Evolution shell links with it dynamically and runs its
code entirely within its own process. That's unlike the services that
provide the Calendar, Tasks, Todo and Contact data, which are provided
by the 'actual' Evolution Data Server.

Evolution Data Server does "not" serve E-mail. Don't let anybody tell
you otherwise, because it doesn't. (It's not really a secret either, just
a misconception that a lot of people seem to have about Evolution.)

Camel is indeed the library that Evolution uses for its E-mail
abstraction, but Camel is a normal shared library that runs in-process,
not a service that another process (like the shell) talks to.

Even worse: you can't use Camel on top of the same "cache dir", as that
will make Camel write the same summary files, and th

Re: [Tracker] index evolution mail is broken

2008-11-30 Thread Laurent Aguerreche
On Friday 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > Evolution 2.24 has migrated to SQLite for storing mail summaries.
> > 
> > So the parser that is based on parsing the summary files is broken.

What I do not understand is why the Evo hackers replaced only the
summary files and did not include the e-mail contents...

> 
> A better idea would be, instead of trying to parse Evolution's files
> ourselves, to make an Evolution plugin that pumps its data over to us
> via IPC, shmem, pipe() or whatever.

The last time I read Beagle's code, I found out it was also trying to
parse Evo's internal caches. Abstract access to Evo's caches is a good
idea, but it can't be just a Tracker plugin; it has to be something
usable by any program that wants to access Evo's data (including
Evolution itself).

It seems that the Evo hackers are trying to replace the Bonobo code in
favour of DBus. Would it be possible for us to use DBus there too, or is
it only for Evo-internal things? See:
http://mail.gnome.org/archives/evolution-hackers/2008-November/msg9.html


Laurent.





Re: [Tracker] index evolution mail is broken

2008-11-28 Thread Philip Van Hoof
On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> Evolution 2.24 has migrated to SQLite for storing mail summaries.
> 
> So the parser that is based on parsing the summary files is broken.
> 

A better idea would be, instead of trying to parse Evolution's files
ourselves, to make an Evolution plugin that pumps its data over to us
via IPC, shmem, pipe() or whatever.
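
A minimal Python sketch of the pipe() variant. The record format (one JSON object per line) and its field names are hypothetical, and a thread stands in for the separate Evolution-plugin process so that the example runs on its own.

```python
import json
import os
import threading


def send_records(write_fd: int, records: list[dict]) -> None:
    """Producer: what a (hypothetical) Evolution plugin would do."""
    with os.fdopen(write_fd, "w", encoding="utf-8") as out:
        for record in records:
            # json.dumps escapes newlines, so each record is exactly one line.
            out.write(json.dumps(record) + "\n")
    # Closing the write end signals end-of-stream to the reader.


def receive_records(read_fd: int) -> list[dict]:
    """Consumer: the indexer reads records until the writer closes the pipe."""
    with os.fdopen(read_fd, "r", encoding="utf-8") as inp:
        return [json.loads(line) for line in inp if line.strip()]


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    mails = [{"uid": "1", "folder": "INBOX", "subject": "Hello"},
             {"uid": "2", "folder": "INBOX", "subject": "Re: Hello"}]
    # The writer runs concurrently so a full pipe buffer cannot deadlock it.
    writer = threading.Thread(target=send_records, args=(write_fd, mails))
    writer.start()
    for mail in receive_records(read_fd):
        print(mail["folder"], mail["uid"], mail["subject"])
    writer.join()
```

The point of the design is that the indexer depends only on this small, documented stream, not on Evolution's cache formats, so a format change inside Evolution stays inside the plugin.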

Right now, trying to parse Evolution's hideous file formats is quite
crazy, and each time they change a format we have to fix our code too.

It's also "not correct" to read Evolution's internal cache files.
Evolution is neither designed to cope with another process tampering
with its caches, nor will it care about that other process at all.

If you guys at Sun want to join the fun, such an Evolution plugin would
be an excellent contribution indeed. Perhaps also one for Thunderbird and
some other E-mail clients... and we can safely enter 2009!


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be
