Re: [Tracker] index evolution mail is broken
On 01/12/08 09:21, Jerry Tan wrote:
> I checked the format of Evolution's data files. Evolution stores mail metadata in a SQLite file named folders.db; I can open it with sqlite to check its schema. It creates one table per folder and stores the metadata in it, including the read/deleted flags, subject, and mail From/To/Cc.
>
> But if I select "Copy folder content locally" and then sync, Evolution stores the mails of the selected folder as plain text files under that folder. To support full-text search, we need to parse these files as well.

Hi,

Yes, I saw this yesterday too when looking into how things work. I think the best thing to do is to write a plugin for it which uses their internal APIs if possible. I am not sure when this will happen, though. But with the recent improvements to the modules API in Tracker, it should be easier to implement something.

--
Regards,
Martyn
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list
Re: [Tracker] index evolution mail is broken
I checked the format of Evolution's data files. Evolution stores mail metadata in a SQLite file named folders.db; I can open it with sqlite to check its schema. It creates one table per folder and stores the metadata in it, including the read/deleted flags, subject, and mail From/To/Cc.

But if I select "Copy folder content locally" and then sync, Evolution stores the mails of the selected folder as plain text files under that folder. To support full-text search, we need to parse these files as well.

How will the Beagle guys handle this issue?
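The schema check described above can be sketched in Python with the standard sqlite3 module. This is only an illustration: the function name is made up for the example, the location of folders.db is whatever your Evolution profile uses, and no assumption is made about the actual table or column names, so the sketch simply lists what it finds. It opens the file read-only so that nothing in the database is modified.

```python
import pathlib
import sqlite3


def describe_tables(db_path):
    """Return {table_name: [column_name, ...]} for every table in the database."""
    # Open read-only through a file: URI so the database is never written to.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for name in names:
            # Folder names may contain spaces, so quote the identifier.
            quoted = '"' + name.replace('"', '""') + '"'
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[name] = [column[1] for column in columns]  # column[1] is the name
        return schema
    finally:
        conn.close()
```

Calling describe_tables() on a copy of folders.db should show one entry per mail folder, if the layout is as described above.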
Re: [Tracker] index evolution mail is broken
On Sunday, 30 November 2008 at 22:47 +0100, Philip Van Hoof wrote:
> On Sun, 2008-11-30 at 19:13 +0100, Laurent Aguerreche wrote:
> > On Friday, 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> > > On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > > > Evolution 2.24 has migrated to using SQLite to store the mail summaries.
> > > >
> > > > So the parser, which is based on parsing the summary file, is broken.
> >
> > What I do not understand here is why the Evo hackers have only replaced the summary files without including the e-mail contents...
>
> Because storing the content of E-mail in a database, for example as a BLOB, makes relatively little sense (as in: no sense whatsoever). Storing the metadata about the content of E-mail in a database does make some sense, though.
>
> A lot of E-mail servers (like Dovecot, Cyrus, Isode's MBox, etc.) store E-mail content in files too. An IMAP server is actually just a database with a frontend that happens to talk a specific RFC. Yet these groups of IMAP server developers still don't store things in relational databases.
>
> For an E-mail client:
>
> IMHO the ideal would be to store E-mail messages as directories, with each MIME part of the E-mail being a separate file within that directory, and all other data stored in a database (headers could be stored as triples in an RDF triple store, like 3store).
>
> rename() on the folder name can be used for the flags, just like Maildir does, to make the format easy to reuse and back up. And instead of having to parse the entire message, and having to download the entire message, you can store individual parts of interest as individual files, reducing format complexity. Having to MIME-parse Maildir and nearly all of the other local formats is among the reasons why I dislike most of the local formats.
>
> If a client wants to reduce disk space, it can easily remove attachments of E-mails that are cached locally and available remotely (like with IMAP): the client would just have to unlink() a bunch of files in a bunch of directories.
>
> Meanwhile you want to store the BODYSTRUCTURE and the ENVELOPE of E-mails, and of MIME parts that are stored inside of E-mails (RFC822 forwarded messages), in a database or in an easily accessible format.
>
> The reason for that is that ENVELOPE and BODYSTRUCTURE are precisely what a piece of software like Tracker needs to 'index' E-mail folders.
>
> You might also want to base64-decode the MIME parts before storing, as this makes it easier for indexers to index/scan the files (which encoding is used depends on the E-mail; you don't want to needlessly require indexer software to become so complex that it has to understand your format and find out in which encoding the MIME-part files are encoded; you instead want to store the original, decoded data as-is). This also makes sense from the POV of how you read (and store) XMP/Exif/etc. data: if it is encoded or compressed, you need to use more memory before you can read (or write) that data. While hard disk space is near to unlimited nowadays, I/O access speed ain't.
>
> > > A better idea would be, instead of trying to parse Evolution's files ourselves, to make an Evolution plugin that pumps its data over to us over IPC, shmem, pipe() or whatever.
> >
> > The last time I read Beagle's code, I found out it was also trying to parse Evo's internal caches. Abstract access to Evo's caches is a good idea, but it can't be just a Tracker plugin; it has to be something used by any program that wants to access Evo's data (and by Evolution itself).
>
> Sure
>
> > It seems that the Evo hackers are trying to replace the Bonobo code in favour of DBus. Would it be possible for us to also use DBus there, or is it only for Evo-internal things? See:
> > http://mail.gnome.org/archives/evolution-hackers/2008-November/msg9.html
>
> The remote IPC (CORBA and now increasingly DBus) APIs that Evolution exposes have nothing to do with its E-mail functions, only with calendaring, contacts and TODO lists.
>
> Evolution Data Server is not what I call "E-mail as a service". Everything about E-mail, in Evolution, happens in-process in the "evolution" process (that's the UI, the shell if you prefer that name).
>
> It's just a marketing stunt that "camel" is included in the EDS (Evolution Data Server) package. The "camel" library is technically not part of EDS. The Evolution shell dynamically links with it, and runs its code completely in-process. That's unlike the services that provide the Calendar, Tasks, Todo and Contact data, which are provided by the 'actual' Evolution Data Server.
>
> Evolution Data Server "does not" serve E-mail. Don't let anybody tell you that it does, because it doesn't. (It's also not really a secret, just a misconception that a lot of people seem to have about Evolution.)
>
> Camel is indeed the library that Evolution uses for its E-mail abstraction, but
Re: [Tracker] index evolution mail is broken
On Sun, 2008-11-30 at 19:13 +0100, Laurent Aguerreche wrote:
> On Friday, 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> > On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > > Evolution 2.24 has migrated to using SQLite to store the mail summaries.
> > >
> > > So the parser, which is based on parsing the summary file, is broken.
>
> What I do not understand here is why the Evo hackers have only replaced the summary files without including the e-mail contents...

Because storing the content of E-mail in a database, for example as a BLOB, makes relatively little sense (as in: no sense whatsoever). Storing the metadata about the content of E-mail in a database does make some sense, though.

A lot of E-mail servers (like Dovecot, Cyrus, Isode's MBox, etc.) store E-mail content in files too. An IMAP server is actually just a database with a frontend that happens to talk a specific RFC. Yet these groups of IMAP server developers still don't store things in relational databases.

For an E-mail client:

IMHO the ideal would be to store E-mail messages as directories, with each MIME part of the E-mail being a separate file within that directory, and all other data stored in a database (headers could be stored as triples in an RDF triple store, like 3store).

rename() on the folder name can be used for the flags, just like Maildir does, to make the format easy to reuse and back up. And instead of having to parse the entire message, and having to download the entire message, you can store individual parts of interest as individual files, reducing format complexity. Having to MIME-parse Maildir and nearly all of the other local formats is among the reasons why I dislike most of the local formats.

If a client wants to reduce disk space, it can easily remove attachments of E-mails that are cached locally and available remotely (like with IMAP): the client would just have to unlink() a bunch of files in a bunch of directories.
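As a rough illustration of the layout described above (one directory per message, one decoded file per MIME part, rename() for flags, unlink() to drop attachments), here is a minimal Python sketch using only the standard library. The function names, the file naming scheme, and the reuse of Maildir's ":2," flag suffix on the message directory name are assumptions made for the example; none of this is Evolution's or Tracker's actual code.

```python
import os
from email import message_from_bytes, policy


def store_message(raw_bytes, msg_dir):
    """Write each leaf MIME part, transfer-decoded, to its own file in msg_dir."""
    os.makedirs(msg_dir, exist_ok=True)
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    written = []
    index = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers (multipart/*, message/rfc822) hold no data themselves
        index += 1
        # decode=True undoes base64 / quoted-printable, so indexers see raw data.
        payload = part.get_payload(decode=True) or b""
        name = f"{index:03d}-{part.get_content_type().replace('/', '_')}"
        with open(os.path.join(msg_dir, name), "wb") as fh:
            fh.write(payload)
        written.append(name)
    return written


def set_flags(msg_dir, flags):
    """Record flags in the directory name with a single rename() (hypothetical scheme)."""
    base = msg_dir.rstrip(os.sep).split(":2,")[0]
    new_dir = base + ":2," + "".join(sorted(set(flags)))
    os.rename(msg_dir, new_dir)
    return new_dir


def drop_attachments(msg_dir):
    """unlink() every stored part that is not text/*; return the removed names."""
    removed = []
    for name in sorted(os.listdir(msg_dir)):
        content_type = name.partition("-")[2]
        if not content_type.startswith("text_"):
            os.unlink(os.path.join(msg_dir, name))
            removed.append(name)
    return removed
```

With this layout an indexer can read the part files directly, without any MIME parsing, and freeing disk space never touches the text parts. Note that ":" in file names only works on POSIX file systems.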
Meanwhile you want to store the BODYSTRUCTURE and the ENVELOPE of E-mails, and of MIME parts that are stored inside of E-mails (RFC822 forwarded messages), in a database or in an easily accessible format.

The reason for that is that ENVELOPE and BODYSTRUCTURE are precisely what a piece of software like Tracker needs to 'index' E-mail folders.

You might also want to base64-decode the MIME parts before storing, as this makes it easier for indexers to index/scan the files (which encoding is used depends on the E-mail; you don't want to needlessly require indexer software to become so complex that it has to understand your format and find out in which encoding the MIME-part files are encoded; you instead want to store the original, decoded data as-is). This also makes sense from the POV of how you read (and store) XMP/Exif/etc. data: if it is encoded or compressed, you need to use more memory before you can read (or write) that data. While hard disk space is near to unlimited nowadays, I/O access speed ain't.

> > A better idea would be, instead of trying to parse Evolution's files ourselves, to make an Evolution plugin that pumps its data over to us over IPC, shmem, pipe() or whatever.
>
> The last time I read Beagle's code, I found out it was also trying to parse Evo's internal caches. Abstract access to Evo's caches is a good idea, but it can't be just a Tracker plugin; it has to be something used by any program that wants to access Evo's data (and by Evolution itself).

Sure

> It seems that the Evo hackers are trying to replace the Bonobo code in favour of DBus. Would it be possible for us to also use DBus there, or is it only for Evo-internal things? See:
> http://mail.gnome.org/archives/evolution-hackers/2008-November/msg9.html

The remote IPC (CORBA and now increasingly DBus) APIs that Evolution exposes have nothing to do with its E-mail functions, only with calendaring, contacts and TODO lists.

Evolution Data Server is not what I call "E-mail as a service". Everything about E-mail, in Evolution, happens in-process in the "evolution" process (that's the UI, the shell if you prefer that name).

It's just a marketing stunt that "camel" is included in the EDS (Evolution Data Server) package. The "camel" library is technically not part of EDS. The Evolution shell dynamically links with it, and runs its code completely in-process. That's unlike the services that provide the Calendar, Tasks, Todo and Contact data, which are provided by the 'actual' Evolution Data Server.

Evolution Data Server "does not" serve E-mail. Don't let anybody tell you that it does, because it doesn't. (It's also not really a secret, just a misconception that a lot of people seem to have about Evolution.)

Camel is indeed the library that Evolution uses for its E-mail abstraction, but Camel is a normal shared library that runs in-process, not a service that gets communicated with from another process (like the shell).

Even worse: you can't use Camel on top of the same "cache dir", as that will make Camel write the same summary files, and th
Re: [Tracker] index evolution mail is broken
On Friday, 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:
> On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> > Evolution 2.24 has migrated to using SQLite to store the mail summaries.
> >
> > So the parser, which is based on parsing the summary file, is broken.

What I do not understand here is why the Evo hackers have only replaced the summary files without including the e-mail contents...

> A better idea would be, instead of trying to parse Evolution's files ourselves, to make an Evolution plugin that pumps its data over to us over IPC, shmem, pipe() or whatever.

The last time I read Beagle's code, I found out it was also trying to parse Evo's internal caches. Abstract access to Evo's caches is a good idea, but it can't be just a Tracker plugin; it has to be something used by any program that wants to access Evo's data (and by Evolution itself).

It seems that the Evo hackers are trying to replace the Bonobo code in favour of DBus. Would it be possible for us to also use DBus there, or is it only for Evo-internal things? See:
http://mail.gnome.org/archives/evolution-hackers/2008-November/msg9.html

Laurent.

> Right now, trying to parse Evolution's hideous file formats is quite crazy, and each time they change their format we will have to fix our code too.
>
> It's also "not correct" to read Evolution's internal cache files. Evolution is not designed to cope with another process tampering with its caches, nor will it care about the other process at all.
>
> If you guys at Sun want to join the fun, such an Evolution plugin would be an excellent contribution indeed. Perhaps also one for Thunderbird and some other E-mail clients ... and we can safely enter 2009!
Re: [Tracker] index evolution mail is broken
On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:
> Evolution 2.24 has migrated to using SQLite to store the mail summaries.
>
> So the parser, which is based on parsing the summary file, is broken.

A better idea would be, instead of trying to parse Evolution's files ourselves, to make an Evolution plugin that pumps its data over to us over IPC, shmem, pipe() or whatever.

Right now, trying to parse Evolution's hideous file formats is quite crazy, and each time they change their format we will have to fix our code too.

It's also "not correct" to read Evolution's internal cache files. Evolution is not designed to cope with another process tampering with its caches, nor will it care about the other process at all.

If you guys at Sun want to join the fun, such an Evolution plugin would be an excellent contribution indeed. Perhaps also one for Thunderbird and some other E-mail clients ... and we can safely enter 2009!

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
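To make the "plugin pumps its data over a pipe" idea a bit more concrete, here is a minimal Python sketch of the two ends. Everything in it is an assumption made for illustration: the wire format (one JSON object per line), the field names, and the function names are hypothetical, and a real plugin would be written against Evolution's own plugin interface, which is not shown here.

```python
import json
import os


def pump_messages(messages, out_stream):
    """Producer side (stand-in for the plugin): one JSON object per line."""
    for meta in messages:
        out_stream.write(json.dumps(meta, sort_keys=True) + "\n")
    out_stream.flush()


def read_messages(in_stream):
    """Consumer side (stand-in for the indexer): yield dicts until end of stream."""
    for line in in_stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def roundtrip_over_pipe(messages):
    """Push metadata through a real pipe() and read it back.

    Both ends live in one process here, which only works because the sample
    data is far smaller than the kernel's pipe buffer; in practice the two
    ends would be separate processes.
    """
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "w", encoding="utf-8") as writer:
        pump_messages(messages, writer)
    with os.fdopen(read_fd, "r", encoding="utf-8") as reader:
        return list(read_messages(reader))
```

The point of the design is that the indexer only ever sees this stream: if Evolution changes its on-disk format again, only the producer side has to follow.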