-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112712/#review41434
-----------------------------------------------------------



>> Check the resource is non-empty before merging it

> How can the resource be empty? Every e-mail has at least a title and plain 
> text content. Do you want me to check that the e-mail is actually valid and 
> not an empty e-mail (a corrupted MIME file, for instance)?

Yup, that's right. Experience shows that any possible corrupt file will be out 
there somewhere.
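
To make the request concrete, here is a rough sketch of the kind of guard I 
mean, written in Python rather than against the patch itself. The function 
name and the rule it applies (at least one of Message-ID, From or Subject must 
be present) are my own assumptions, not something taken from mimeextractor.cpp:

```python
from email import message_from_bytes

# Hypothetical rule: a file only counts as a message if at least one of these
# identifying headers could be parsed out of it.
IDENTIFYING_HEADERS = ("Message-ID", "From", "Subject")


def looks_like_message(raw: bytes) -> bool:
    """Return True if the bytes parse to something worth merging."""
    # The parser does not raise on garbage; it just yields a message
    # without headers, so the headers have to be checked explicitly.
    msg = message_from_bytes(raw)
    return any(msg.get(name) for name in IDENTIFYING_HEADERS)
```

The extractor would skip the merge whenever a check like this fails.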

- Simeon Bird


On Oct. 9, 2013, 9:06 a.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112712/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2013, 9:06 a.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds three new file extractors to Nepomuk. Two of them are of 
> general use, and the third (which can be removed if it does not belong in 
> Nepomuk) is specific to the use case described at 
> http://steckdenis.be/post-2013-09-06-a-nepomuk-integration-plugin-for-firefox.html
> 
> The MIME/mbox file extractor takes an mbox file or MIME files (as found in 
> Maildir directory trees) and indexes them as NMO:Message objects. The full 
> content of each e-mail is indexed along with its title, sender, recipients, 
> CC/BCC, date and message ID. NCO:Contact and NCO:EmailAddress resources are 
> created when needed. The main use of this indexer is to index e-mails managed 
> by mutt, Thunderbird or any other e-mail client that does not use Akonadi.
> 
> This indexer is a bit special because it also queries the Nepomuk server. 
> mbox files are typically huge, and change every time the user adds or removes 
> a mail. This can cause many re-indexing operations and, as the file is big, 
> every indexing operation can take quite a long time. To speed up the 
> process, the file indexer looks for already-indexed e-mails with the same 
> message ID as the e-mails to be indexed. If a mail was already indexed, it is 
> skipped. This reduces the amount of data transferred to the Nepomuk server 
> (the full text of the mail doesn't have to be sent to the server only for it 
> to detect a duplicate message), and an mbox file that took several minutes to 
> index now only requires a couple of seconds.
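
For readers following along, the skip-by-message-ID idea can be sketched in a 
few lines of Python with the standard mailbox module. This is only an 
illustration of the technique, not the code in the patch: the function name is 
made up, and the indexed_ids set stands in for the query the extractor sends 
to the Nepomuk server.

```python
import mailbox
from typing import Iterator, Set


def new_messages(mbox_path: str,
                 indexed_ids: Set[str]) -> Iterator[mailbox.mboxMessage]:
    """Yield only the messages whose Message-ID has not been indexed yet."""
    # create=False: fail loudly instead of creating an empty mbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            msg_id = msg.get("Message-ID")
            # Messages without a Message-ID cannot be matched, so they are
            # always passed on to the indexer.
            if msg_id is not None and msg_id.strip() in indexed_ids:
                continue
            yield msg
    finally:
        box.close()
```

Only the messages that come out of the generator would need their full text 
sent to the server.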
> 
> The vCard indexer parses vCard files using the KABC library and stores all 
> the information found in them in NCO:Contact objects. vCard files containing 
> more than one contact are supported. This allows users to export their 
> contacts from a webmail service or a contact-management application, and to 
> have them indexed in Nepomuk.
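
As an illustration of what "more than one contact per file" involves, here is 
a small Python sketch that splits a vCard file on its BEGIN:VCARD / END:VCARD 
markers. It is not how KABC works and it is far from a full parser (no group 
prefixes, no quoted parameter values); the function name is an assumption.

```python
from typing import Dict, List, Optional


def split_vcards(text: str) -> List[Dict[str, List[str]]]:
    """Return one {property name: [values]} dict per contact in the file."""
    # Unfold continuation lines: a line starting with a space or a tab
    # continues the previous line.
    lines: List[str] = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    contacts: List[Dict[str, List[str]]] = []
    current: Optional[Dict[str, List[str]]] = None
    for line in lines:
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = {}
        elif marker == "END:VCARD":
            if current is not None:
                contacts.append(current)
            current = None
        elif current is not None and ":" in line:
            # "EMAIL;TYPE=INTERNET:a@b" -> name "EMAIL", value "a@b"
            name_and_params, value = line.split(":", 1)
            name = name_and_params.split(";", 1)[0].strip().upper()
            current.setdefault(name, []).append(value.strip())
    return contacts
```

Each dict in the result would map to one NCO:Contact resource.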
> 
> The last indexer reads .webaction files, which consist of one line naming 
> the action ("DOWNLOAD"), then one parameter per line. This file indexer is 
> used by the Nepomuk Integration plugin for Firefox, which uses this kind of 
> file to establish a link between a downloaded file and its original location 
> on the Internet. If you don't want such a specific file indexer to be part of 
> the Nepomuk libraries, it can be removed from this patch.
> 
> All these file indexers create resources but don't touch the indexed file 
> itself. The reason is that an mbox file is not an e-mail, a vCard file is not 
> a contact (it describes a contact), and also that these files can be 
> temporary (for instance, the Firefox add-on creates a temporary MIME file 
> whenever the user reads a mail in a webmail client, and this file is deleted 
> when the computer is shut down).
> 
> 
> Diffs
> -----
> 
>   CMakeLists.txt 6e55d5e 
>   services/fileindexer/indexer/CMakeLists.txt bcf8da2 
>   services/fileindexer/indexer/mimeextractor.h PRE-CREATION 
>   services/fileindexer/indexer/mimeextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/nepomukmimeextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukvcardextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukwebactionextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.h PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.h PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/112712/diff/
> 
> 
> Testing
> -------
> 
> Nepomuk Core builds with this patch applied. MIME, mbox (as produced by 
> Thunderbird), vCard (exported from Yahoo! Mail) and .webaction files are 
> correctly indexed. If you want to test the webaction indexer, create a file 
> somewhere (say "/tmp/test.txt"), and then put this in a .webaction file:
> 
> DOWNLOAD
> http://www.example.com
> http://www.example.com/test.txt
> /tmp/test.txt
> 
> Then, use "nepomukindexer" to index the .webaction file. Use "nepomukshow" on 
> the /tmp/test.txt file, and check that everything is okay. You can also open 
> Dolphin and see that the "downloaded from" information of the test.txt file 
> is correctly displayed.
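
The .webaction layout described above (an action line followed by one 
parameter per line) is simple enough to sketch a reader for it in Python. The 
class and function names are hypothetical, skipping blank lines is my 
assumption, and the parameters are kept as a plain list because the request 
does not name them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WebAction:
    action: str        # first line of the file, e.g. "DOWNLOAD"
    params: List[str]  # remaining lines, one parameter each, in file order


def parse_webaction(text: str) -> WebAction:
    """Parse the contents of a .webaction file."""
    # Assumption: blank lines and surrounding whitespace carry no meaning.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .webaction file")
    return WebAction(action=lines[0], params=lines[1:])
```

Fed the four-line example from the Testing section, this gives the action 
"DOWNLOAD" and three parameters in the order they appear in the file.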
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>

_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
