-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113217/#review41765
-----------------------------------------------------------

services/fileindexer/indexer/officeextractor.cpp
<http://git.reviewboard.kde.org/r/113217/#comment30490>

    It occurred to me that you don't really need to use a CustomCriteria for
    matching. You could just return the standard list of mimetypes by
    constructing the mimetype list in the constructor. That way you'll also
    avoid the extra checks at runtime. Feel free to ship this patch, and if
    you think it should be done, change the criteria in another patch.

- Vishesh Handa


On Oct. 12, 2013, 1:43 p.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113217/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2013, 1:43 p.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds a File Extractor for doc, xls and ppt files (the binary MS
> Office formats). The current version of the extractor is very simple and only
> indexes the plain text content of the files (no title or owner information
> is extracted). The extractor is a tiny wrapper around the "catdoc", "catppt"
> and "xls2csv" command-line utilities. These tools are packaged in the
> "catdoc" package of Debian and openSUSE.
> 
> These utilities are released under the GNU GPLv2. If I recall correctly, the
> LGPLv2.1 Nepomuk libraries can use these tools provided no library calls are
> made to them. The extractor uses QProcess to launch an instance of catdoc,
> catppt or xls2csv, giving it the name of the file to index, and gets the
> plain text from the standard output of this process. I hope this complies
> with the GPL.
> 
> The commands are located at run-time using KStandardDirs. This way, no new
> build dependency is added to Nepomuk, and it is up to the user or the
> distribution to add "catdoc" to the dependency list of Nepomuk. If a command
> is not found, the indexer is disabled for the specific MIME type handled by
> the command.
> 
> 
> Diffs
> -----
> 
>   services/fileindexer/indexer/officeextractor.cpp PRE-CREATION
>   services/fileindexer/indexer/officeextractor.h PRE-CREATION
>   services/fileindexer/indexer/nepomukofficeextractor.desktop PRE-CREATION
> 
> Diff: http://git.reviewboard.kde.org/r/113217/diff/
> 
> 
> Testing
> -------
> 
> I have run the indexer on several DOC, XLS and PPT files I have on my
> computer. The indexer doesn't work on encrypted files (catdoc refuses to
> parse them). This is inconvenient because some interesting Excel files are
> password-protected only on select pages, or only the editing of certain cells
> is prohibited. The rest of the file can contain valuable data and should be
> indexed.
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
