-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112712/#review41282
-----------------------------------------------------------



services/fileindexer/indexer/mimeextractor.cpp
<http://git.reviewboard.kde.org/r/112712/#comment30263>

    Use the QString id you calculated above.



services/fileindexer/indexer/mimeextractor.cpp
<http://git.reviewboard.kde.org/r/112712/#comment30264>

    Check the resource is non-empty before merging it


I tested this, and it doesn't work for me. I pointed it at my ~/.thunderbird 
and got a bunch of messages like:

nepomukstorage(21093)/nepomuk (storage service) 
Nepomuk2::Sync::ResourceIdentifier::runIdentification: DUPLICATE RESULTS!
nepomukstorage(21093)/nepomuk (storage service) 
Nepomuk2::Sync::ResourceIdentifier::runIdentification: KUrl("_:uq")  -->  
KUrl("nepomuk:/res/c8e2fb55-76f7-43ca-9ad7-1a81d997ceb3")

Virtuoso was sitting on a whole core (probably running the identifications?), and 
the short identifiers (KUrl("_:uq")) repeat ad infinitum.

Also, indexed emails never show up in Dolphin search.

I just applied it on top of git master - is there something else I should have 
applied first?

- Simeon Bird


On Sept. 13, 2013, 12:38 p.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112712/
> -----------------------------------------------------------
> 
> (Updated Sept. 13, 2013, 12:38 p.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds three new file extractors to Nepomuk. Two of them are of 
> general use, and the third (which can be removed if it doesn't belong in 
> Nepomuk) is specific to the use case described in this post:
> http://steckdenis.be/post-2013-09-06-a-nepomuk-integration-plugin-for-firefox.html
> 
> The MIME/mbox file extractor takes an mbox file or MIME files (as found in 
> Maildir directory trees) and indexes them as NMO:Message objects. The full 
> content of the e-mails is indexed along with their subject, sender, recipients, 
> CC/BCC, date and message ID. NCO:Contact and NCO:EmailAddress resources are 
> created when needed. The main use of this indexer is to index e-mails managed 
> by mutt, Thunderbird or any other e-mail client that does not use Akonadi.
> 
> This indexer is a bit special because it also queries the Nepomuk server. 
> mbox files are typically huge, and they change every time the user adds or 
> removes a mail. Each change triggers a re-indexing, and since the file is 
> big, every indexing run can take quite a long time. To speed up the process, 
> the indexer looks for already-indexed e-mails with the same message ID as the 
> e-mails to be indexed, and skips every e-mail that was already indexed. This 
> reduces the amount of data transferred to the Nepomuk server (the full text 
> of a mail no longer has to be sent to the server only for the server to 
> detect a duplicate), and an mbox file that took several minutes to index now 
> only requires a couple of seconds.
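
The skip-already-indexed idea can be sketched as follows. The sketch assumes a plain mbox layout, where each message starts with a line beginning with "From ". It uses a std::set of known Message-IDs as a stand-in for the query the patch sends to the Nepomuk server. `splitMbox`, `extractMessageId` and `messagesToIndex` are hypothetical names, not the patch's code.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split an mbox file into individual messages. Each message starts with a
// "From " separator line, which is not part of the message itself. Body lines
// starting with "From " are assumed to have been escaped as ">From ".
std::vector<std::string> splitMbox(const std::string &mbox) {
    std::vector<std::string> messages;
    std::istringstream in(mbox);
    std::string line, current;
    bool inMessage = false;
    while (std::getline(in, line)) {
        if (line.rfind("From ", 0) == 0) {        // separator line
            if (inMessage) messages.push_back(current);
            current.clear();
            inMessage = true;
            continue;
        }
        if (inMessage) current += line + "\n";
    }
    if (inMessage) messages.push_back(current);
    return messages;
}

// Return the Message-ID header value (name matched case-insensitively), or ""
// if absent. Only the header block is searched; folding is not handled here.
std::string extractMessageId(const std::string &message) {
    const std::string name = "message-id:";
    std::istringstream in(message);
    std::string line;
    while (std::getline(in, line) && !line.empty() && line != "\r") {
        if (line.size() < name.size()) continue;
        std::string prefix = line.substr(0, name.size());
        for (char &c : prefix)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (prefix != name) continue;
        const std::string value = line.substr(name.size());
        const auto first = value.find_first_not_of(" \t");
        const auto last = value.find_last_not_of(" \t\r");
        return first == std::string::npos ? "" : value.substr(first, last - first + 1);
    }
    return "";
}

// Keep only messages whose Message-ID is not already known. Messages without
// a Message-ID are always kept, since there is nothing to match them against.
std::vector<std::string> messagesToIndex(const std::string &mbox,
                                         const std::set<std::string> &alreadyIndexed) {
    std::vector<std::string> result;
    for (const auto &msg : splitMbox(mbox)) {
        const std::string id = extractMessageId(msg);
        if (id.empty() || alreadyIndexed.count(id) == 0)
            result.push_back(msg);
    }
    return result;
}
```

Doing the lookup before sending anything keeps the full text of known mails on the client, which is where the speedup described above comes from.
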
> 
> The vCard indexer parses vCard files using the KABC library and stores all 
> the information found in them in NCO:Contact objects. vCard files containing 
> more than one contact are supported. This allows users to export their 
> contacts from a webmail or a contact-management application and have them 
> indexed in Nepomuk.
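
For illustration only, here is how a file with several contacts splits into one record per BEGIN:VCARD/END:VCARD block. The patch relies on KABC for this. The `Contact` struct and `parseVCards` below are hypothetical, and the sketch ignores line folding and value escaping, which a real vCard parser must handle.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Minimal contact record; the real extractor maps KABC fields onto NCO:Contact.
struct Contact {
    std::string fullName;             // FN property
    std::vector<std::string> emails;  // every EMAIL property
};

// Split a vCard file that may hold several BEGIN:VCARD ... END:VCARD blocks
// into one Contact per block. Property parameters such as "EMAIL;TYPE=work:"
// are dropped; only the property name and the raw value are kept.
std::vector<Contact> parseVCards(const std::string &text) {
    std::vector<Contact> contacts;
    std::istringstream in(text);
    std::string line;
    Contact current;
    bool inCard = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;
        // The property name ends at the first ';' (parameters) or ':' (value).
        std::string name = line.substr(0, std::min(colon, line.find(';')));
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        const std::string value = line.substr(colon + 1);
        if (name == "BEGIN" && value == "VCARD") {
            current = Contact{};
            inCard = true;
        } else if (name == "END" && value == "VCARD" && inCard) {
            contacts.push_back(current);
            inCard = false;
        } else if (inCard && name == "FN") {
            current.fullName = value;
        } else if (inCard && name == "EMAIL") {
            current.emails.push_back(value);
        }
    }
    return contacts;
}
```
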
> 
> The last indexer reads .webaction files, which consist of one line naming 
> the action, such as "DOWNLOAD", followed by one parameter per line. This file 
> indexer is used by the Nepomuk Integration plugin for Firefox, which uses 
> this kind of file to link a downloaded file to its original location on the 
> Internet. If you don't want such a specific file indexer to be part of the 
> Nepomuk libraries, it can be removed from this patch.
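
A minimal reader for the .webaction format as described (the action on the first line, then one parameter per line). `WebAction` and `parseWebAction` are hypothetical names. The sketch deliberately doesn't interpret the parameters, since their meaning is only implied by the DOWNLOAD example in the Testing section.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed .webaction file. Names are illustrative, not the patch's.
struct WebAction {
    std::string action;               // first non-empty line, e.g. "DOWNLOAD"
    std::vector<std::string> params;  // remaining non-empty lines, in order
};

// Parse the contents of a .webaction file. A trailing '\r' (Windows line
// endings) is stripped and blank lines are skipped. Returns false if the
// file contains no action line at all.
bool parseWebAction(const std::string &text, WebAction &out) {
    std::istringstream in(text);
    std::string line;
    out = WebAction{};
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (out.action.empty())
            out.action = line;
        else
            out.params.push_back(line);
    }
    return !out.action.empty();
}
```

Keeping the parser generic and letting the caller check `action` and the parameter count leaves room for other actions without changing the file format.
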
> 
> None of these file indexers touch the indexed file itself; they only create 
> new resources. The reason is that an mbox file is not an e-mail, a vCard 
> file is not a contact (it describes one), and these files can also be 
> temporary (for instance, the Firefox add-on creates a temporary MIME file 
> whenever the user reads a mail on a webmail, and this file is deleted when 
> the computer is shut down).
> 
> 
> Diffs
> -----
> 
>   CMakeLists.txt 6e55d5e 
>   services/fileindexer/indexer/CMakeLists.txt bcf8da2 
>   services/fileindexer/indexer/mimeextractor.h PRE-CREATION 
>   services/fileindexer/indexer/mimeextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/nepomukmimeextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukvcardextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukwebactionextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.h PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.h PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/112712/diff/
> 
> 
> Testing
> -------
> 
> Nepomuk Core builds with this patch applied. MIME, mbox (as produced by 
> Thunderbird), vCard (exported from Yahoo! Mail) and .webaction files are 
> correctly indexed. To test the webaction indexer, create a file somewhere 
> (say "/tmp/test.txt"), and then put this in a .webaction file:
> 
> DOWNLOAD
> http://www.example.com
> http://www.example.com/test.txt
> /tmp/test.txt
> 
> Then, use "nepomukindexer" to index the .webaction file. Use "nepomukshow" on 
> the /tmp/test.txt file, and check that everything is okay. You can also open 
> Dolphin and see that the "downloaded from" information of the test.txt file 
> is correctly displayed.
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>

_______________________________________________
Nepomuk mailing list
https://mail.kde.org/mailman/listinfo/nepomuk
