-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113217/#review41690
-----------------------------------------------------------

Ship it!


Writing a proper parser for the binary formats is quite hard. I think this 
approach makes sense for now.

Btw, I don't see any cmake changes in the patch. I think you might have just 
forgotten to add them. Please ship this to master, and thanks for taking care 
of this.


services/fileindexer/indexer/officeextractor.cpp
<http://git.reviewboard.kde.org/r/113217/#comment30455>

    It was really simple code. Attribution wasn't really required :)



services/fileindexer/indexer/officeextractor.cpp
<http://git.reviewboard.kde.org/r/113217/#comment30456>

    Maybe put this in a QScopedPointer so that it is deleted when it goes out
    of scope? Otherwise we seem to have a minor memory leak.
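
    A minimal sketch of that idea, not code from the patch: it uses
    std::unique_ptr as a stand-in for QScopedPointer so that it compiles
    without Qt, and Tracked and useScoped() are hypothetical names chosen for
    illustration. QScopedPointer likewise deletes its pointee in its
    destructor.

```cpp
#include <memory>

// Hypothetical stand-in for the heap-allocated object discussed above.
// It keeps an external counter of how many instances are alive.
struct Tracked {
    explicit Tracked(int &liveCount) : live(liveCount) { ++live; }
    ~Tracked() { --live; }
    int &live;
};

// The smart pointer owns the object, so it is deleted on every return path,
// including the early one. With a raw pointer the early return would leak.
bool useScoped(int &liveCount, bool failEarly) {
    std::unique_ptr<Tracked> object(new Tracked(liveCount));
    if (failEarly)
        return false;   // object's destructor runs here
    return true;        // and here
}
```

    After either call the counter is back at zero, which is the property the
    scoped pointer is meant to guarantee.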


- Vishesh Handa


On Oct. 12, 2013, 1:43 p.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113217/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2013, 1:43 p.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds a File Extractor for doc, xls and ppt files (the binary MS 
> Office formats). The current version of the extractor is very simple and only 
> indexes the plain text content of the files (no title or owner information 
> is extracted). The extractor is a tiny wrapper around the "catdoc", "catppt" 
> and "xls2csv" command-line utilities. These tools are packaged in the 
> "catdoc" package of Debian and openSUSE.
> 
> These utilities are released under the GNU GPLv2. If I recall correctly, the 
> LGPLv2.1 Nepomuk libraries can use these tools provided no library calls are 
> made to them. The extractor uses QProcess to launch an instance of catdoc, 
> catppt or xls2csv, giving it the name of the file to index, and gets the 
> plain text from the standard output of this process. I hope this complies 
> with the GPL.
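
The launch-and-read-stdout approach described above can be sketched without Qt. This is only an illustration under stated assumptions: it uses POSIX popen() instead of QProcess, and captureStdout() is a hypothetical helper, not a function from the patch. Unlike QProcess with an argument list, popen() goes through the shell, so a real file name would have to be quoted safely first.

```cpp
#include <cstdio>
#include <string>

// Runs a shell command line and returns everything it wrote to its
// standard output. Returns an empty string if the command cannot be started.
std::string captureStdout(const std::string &commandLine) {
    std::string output;
    FILE *pipe = popen(commandLine.c_str(), "r");   // POSIX, not ISO C++
    if (!pipe)
        return output;

    char buffer[4096];
    std::size_t count;
    while ((count = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, count);

    pclose(pipe);
    return output;
}
```

With catdoc installed, a call such as captureStdout("catdoc report.doc") would return the plain text of a hypothetical report.doc; the tool is only ever run as a separate process, never linked against.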
> 
> The commands are located at run-time using KStandardDirs. This way, no new 
> build dependency is added to Nepomuk, and it is up to the user or the 
> distribution to add "catdoc" to the dependency list of Nepomuk. If a command 
> is not found, the indexer is disabled for the specific MIME type handled by 
> the command.
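
The run-time lookup can be sketched in the same spirit. The snippet below is an assumption-laden stand-in for the KStandardDirs lookup, using only POSIX access(); findExecutable() is a hypothetical name, and the search path is passed in explicitly rather than read from the environment.

```cpp
#include <sstream>
#include <string>
#include <unistd.h>

// Returns the full path of `name` in the first directory of `searchPath`
// (colon-separated, like $PATH) where an executable entry of that name
// exists, or an empty string if there is none.
std::string findExecutable(const std::string &name,
                           const std::string &searchPath) {
    std::istringstream directories(searchPath);
    std::string directory;
    while (std::getline(directories, directory, ':')) {
        if (directory.empty())
            continue;
        const std::string candidate = directory + "/" + name;
        if (access(candidate.c_str(), X_OK) == 0)   // POSIX permission check
            return candidate;
    }
    return std::string();
}
```

An empty result is the signal to disable the extractor for the MIME type that the missing command would have handled.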
> 
> 
> Diffs
> -----
> 
>   services/fileindexer/indexer/officeextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/officeextractor.h PRE-CREATION 
>   services/fileindexer/indexer/nepomukofficeextractor.desktop PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/113217/diff/
> 
> 
> Testing
> -------
> 
> I have run the indexer on several DOC, XLS and PPT files I have on my 
> computer. The indexer doesn't work on encrypted files (catdoc refuses to 
> parse them). This is unfortunate because some interesting Excel files are 
> password-protected only on select pages, or only the editing of certain cells 
> is prohibited. The rest of the file can contain valuable data and should be 
> indexed.
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>

_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
