-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112712/
-----------------------------------------------------------

(Updated Oct. 9, 2013, 9:06 a.m.)


Review request for Nepomuk.


Changes
-------

This update resolves the first issue (the "id" variable is now reused instead of 
being re-computed).

> Check the resource is non-empty before merging it

How can the resource be empty? Every e-mail has at least a title and plain-text 
content. Do you want me to check that the e-mail is actually valid and not 
empty (a corrupted MIME file, for instance)?

> This can never go in KDE4, as it would create as cyclic dependency between 
> nepomuk-core and kdepimlibs...

Are there plans to split KMime and KABC out of kdepimlibs in KDE5? These 
libraries are very handy and don't use Nepomuk themselves, so the cyclic 
dependency could be avoided. If so, maybe this review request can be kept open 
until Nepomuk and kdepimlibs are ported to KDE5?


Repository: nepomuk-core


Description
-------

This patch adds three new file extractors to Nepomuk. Two of them are of 
general use, and the third (which can be removed if it does not belong in 
Nepomuk) is specific to the use case described in this post:
http://steckdenis.be/post-2013-09-06-a-nepomuk-integration-plugin-for-firefox.html

The MIME/mbox file extractor takes an mbox file or MIME files (as found in 
Maildir directory trees) and indexes them as NMO:Message objects. The full 
content of each e-mail is indexed along with its title, sender, receiver, 
CC/BCC, date and message ID. NCO:Contact and NCO:EmailAddress resources are 
created when needed. The main use of this indexer is to index e-mails managed 
by mutt, Thunderbird or any other e-mail client that does not use Akonadi.

This indexer is a bit special because it also queries the Nepomuk server. mbox 
files are typically huge, and change every time the user adds or removes a mail. 
This can cause many re-indexing operations, and as the file is big, every 
indexing operation can take quite a long time. To speed up the process, the 
file indexer looks for already-indexed e-mails with the same message ID as the 
e-mails to be indexed. If a mail was already indexed, it is skipped. This 
reduces the amount of data transferred to the Nepomuk server (the full text of 
the mail doesn't have to be sent to the server only for it to detect a 
duplicate message), and an mbox file that took several minutes to index now 
only requires a couple of seconds.
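
The skip-if-already-indexed idea above can be sketched as follows. This is only 
an illustration in Python using the standard "mailbox" module, not the code of 
the patch (which is written in C++ with KMime). The function name 
messages_to_index is hypothetical, and the in-memory set stands in for the query 
that the real indexer sends to the Nepomuk server.

```python
import mailbox


def messages_to_index(mbox_path, already_indexed_ids):
    """Return (message_id, message) pairs for mails that still need indexing.

    already_indexed_ids is a plain set standing in for the Nepomuk query
    (assumption made for this sketch). It is updated with the IDs of the
    messages returned, so a second pass over the same file finds nothing new.
    """
    pending = []
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            message_id = message.get("Message-ID")
            if message_id is None:
                # Without an ID a duplicate cannot be detected: always index.
                pending.append((None, message))
                continue
            message_id = message_id.strip()
            if message_id in already_indexed_ids:
                # Already known: skip it without touching the message body.
                continue
            already_indexed_ids.add(message_id)
            pending.append((message_id, message))
    finally:
        box.close()
    return pending
```

Only the messages returned by such a function would need their full text sent 
to the server, which is where the time saving on large mbox files comes from.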

The vCard indexer parses vCard files using the KABC library and stores all the 
information found in them in NCO:Contact objects. vCard files containing more 
than one contact are supported. This allows users to export their contacts from 
a webmail or a contact-management application, and to have them indexed in 
Nepomuk.
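
To illustrate what "more than one contact per file" means, here is a deliberately 
simplified Python sketch that splits a vCard file into one dictionary per 
BEGIN:VCARD/END:VCARD block. The patch itself relies on KABC for parsing. The 
function name parse_vcards is hypothetical, and this sketch ignores folded lines, 
escaping and property parameters.

```python
def parse_vcards(text):
    """Return one dict per BEGIN:VCARD/END:VCARD block found in text.

    Each dict maps an upper-cased property name (FN, EMAIL, ...) to the list
    of values seen for it. Parameters such as ";TYPE=HOME" are dropped.
    """
    contacts = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        upper = line.upper()
        if upper == "BEGIN:VCARD":
            current = {}
        elif upper == "END:VCARD":
            if current is not None:
                contacts.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            key = name.split(";", 1)[0].upper()
            current.setdefault(key, []).append(value)
    return contacts
```

Each dictionary returned would correspond to one NCO:Contact resource in the 
indexer described above.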

The last indexer reads .webaction files, which consist of one line naming the 
action ("DOWNLOAD"), then one parameter per line. This file indexer is used by 
the Nepomuk Integration plugin for Firefox, which uses this kind of file to 
establish a link between a downloaded file and its original location on the 
Internet. If you don't want such a specific file indexer to be part of the 
Nepomuk libraries, it can be removed from this patch.
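
The file layout described above (an action line followed by one parameter per 
line) can be read with a few lines of code. The Python sketch below is only an 
illustration of the format, not the extractor from the patch. The function name 
parse_webaction is hypothetical, and skipping blank lines is an assumption of 
this sketch.

```python
def parse_webaction(text):
    """Split the content of a .webaction file into (action, parameters).

    The first non-empty line is the action name, every following non-empty
    line is one parameter. The parameters are returned as-is, in file order.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .webaction file")
    return lines[0], lines[1:]
```

A caller would then dispatch on the action name ("DOWNLOAD" being the only one 
mentioned in this request) and interpret the parameters accordingly.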

All these file indexers create resources but don't touch the indexed file 
itself. The reason is that an mbox file is not an e-mail and a vCard file is not 
a contact (it describes a contact), and also that these files can be temporary 
(for instance, the Firefox add-on creates a temporary MIME file whenever the 
user reads a mail on a webmail, and this file is deleted when the computer is 
shut down).


Diffs (updated)
-----

  CMakeLists.txt 6e55d5e 
  services/fileindexer/indexer/CMakeLists.txt bcf8da2 
  services/fileindexer/indexer/mimeextractor.h PRE-CREATION 
  services/fileindexer/indexer/mimeextractor.cpp PRE-CREATION 
  services/fileindexer/indexer/nepomukmimeextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/nepomukvcardextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/nepomukwebactionextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/vcardextractor.h PRE-CREATION 
  services/fileindexer/indexer/vcardextractor.cpp PRE-CREATION 
  services/fileindexer/indexer/webactionextractor.h PRE-CREATION 
  services/fileindexer/indexer/webactionextractor.cpp PRE-CREATION 

Diff: http://git.reviewboard.kde.org/r/112712/diff/


Testing
-------

Nepomuk Core builds with this patch applied. MIME, mbox (as produced by 
Thunderbird), vCard (exported from Yahoo! Mail) and webaction files are 
correctly indexed. If you want to test the webaction indexer, create a file 
somewhere (say "/tmp/test.txt"), and then put this in a .webaction file:

DOWNLOAD
http://www.example.com
http://www.example.com/test.txt
/tmp/test.txt

Then, use "nepomukindexer" to index the .webaction file. Use "nepomukshow" on 
the /tmp/test.txt file, and check that everything is okay. You can also open 
Dolphin and see that the "downloaded from" information of the test.txt file is 
correctly displayed.


Thanks,

Denis Steckelmacher
