-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113217/#review41770
-----------------------------------------------------------

This review has been submitted with commit 0d142dfd85039df0f6a5e07944b9d3f8577acba1 by Denis Steckelmacher to branch master.

- Commit Hook


On Oct. 12, 2013, 1:43 p.m., Denis Steckelmacher wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113217/
> -----------------------------------------------------------
>
> (Updated Oct. 12, 2013, 1:43 p.m.)
>
>
> Review request for Nepomuk.
>
>
> Repository: nepomuk-core
>
>
> Description
> -------
>
> This patch adds a File Extractor for doc, xls and ppt files (the binary MS
> Office formats). The current version of the extractor is very simple and only
> indexes the plain text content of the files (neither title nor owner
> information is extracted). The extractor is a thin wrapper around the
> "catdoc", "catppt" and "xls2csv" command-line utilities. These tools are
> packaged in the "catdoc" package of Debian and openSUSE.
>
> These utilities are released under the GNU GPLv2. If I recall correctly, the
> LGPLv2.1 Nepomuk libraries can use these tools provided no library calls are
> made to them. The extractor uses QProcess to launch an instance of catdoc,
> catppt or xls2csv, giving it the name of the file to index, and reads the
> plain text from the standard output of this process. I hope this complies
> with the GPL.
>
> The commands are located at run-time using KStandardDirs. This way, no new
> build dependency is added to Nepomuk, and it is up to the user or the
> distribution to add "catdoc" to the dependency list of Nepomuk. If a command
> is not found, the indexer is disabled for the specific MIME type handled by
> that command.
>
>
> Diffs
> -----
>
>   services/fileindexer/indexer/officeextractor.cpp PRE-CREATION
>   services/fileindexer/indexer/officeextractor.h PRE-CREATION
>   services/fileindexer/indexer/nepomukofficeextractor.desktop PRE-CREATION
>
> Diff: http://git.reviewboard.kde.org/r/113217/diff/
>
>
> Testing
> -------
>
> I have run the indexer on several DOC, XLS and PPT files I have on my
> computer. The indexer doesn't work on encrypted files (catdoc refuses to
> parse them). This is unfortunate because some interesting Excel files are
> password-protected only on selected pages, or merely prohibit editing
> certain cells. The rest of such a file can contain valuable data and should
> be indexed.
>
>
> Thanks,
>
> Denis Steckelmacher
>
>
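
The mechanism described in the review request (find the converter at run time, run it on the file, read the plain text from its standard output, and skip any MIME type whose converter is missing) can be sketched roughly as follows. This is not the code from officeextractor.cpp: it assumes a POSIX system, uses popen() instead of QProcess and a plain $PATH search instead of KStandardDirs, and the names findExecutable, shellQuote, extractPlainText and resolveExtractors are hypothetical.

```cpp
#include <cstdio>      // fread; popen and pclose are POSIX extensions
#include <cstdlib>     // std::getenv
#include <map>
#include <sstream>
#include <string>
#include <unistd.h>    // access, X_OK (POSIX)

// Search the directories listed in $PATH for an executable called `name`.
// Stand-in for the run-time lookup the patch does through KStandardDirs.
// Returns the full path, or an empty string if nothing was found.
std::string findExecutable(const std::string &name)
{
    const char *pathEnv = std::getenv("PATH");
    if (!pathEnv)
        return std::string();

    std::istringstream dirs(pathEnv);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty())
            continue;
        const std::string candidate = dir + "/" + name;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;
    }
    return std::string();
}

// Wrap an argument in single quotes so the shell started by popen()
// passes it through unchanged, even with spaces or quotes in file names.
std::string shellQuote(const std::string &arg)
{
    std::string quoted = "'";
    for (char c : arg) {
        if (c == '\'')
            quoted += "'\\''";   // close quote, escaped quote, reopen quote
        else
            quoted += c;
    }
    quoted += "'";
    return quoted;
}

// Run `command file` and collect everything written to standard output.
// Returns false if the process cannot be started or exits with an error
// (the request notes that catdoc refuses encrypted files, for example).
bool extractPlainText(const std::string &command, const std::string &file,
                      std::string &text)
{
    text.clear();
    const std::string cmdline = shellQuote(command) + " " + shellQuote(file);
    FILE *pipe = popen(cmdline.c_str(), "r");
    if (!pipe)
        return false;

    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        text.append(buffer, n);

    return pclose(pipe) == 0;
}

// Keep only the MIME types whose tool is actually installed, so the
// extractor is disabled for the others.
// Input maps MIME type -> tool name, output maps MIME type -> full path.
std::map<std::string, std::string>
resolveExtractors(const std::map<std::string, std::string> &toolByMimeType)
{
    std::map<std::string, std::string> resolved;
    for (const auto &entry : toolByMimeType) {
        const std::string path = findExecutable(entry.second);
        if (!path.empty())
            resolved[entry.first] = path;
    }
    return resolved;
}
```

A caller would build the table once, for instance mapping "application/msword" to "catdoc" (the MIME string here is an assumption, not taken from the patch), and then call extractPlainText only for MIME types that survived resolveExtractors. Running the tool as a separate process and reading its output is what keeps the GPL-licensed utilities out of the library's link dependencies.
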
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
