Today somebody on IRC was seeking help with indexing Word .doc files. Beagle handles Word files with wvWare. However, there is a bug in wv that lets some offending documents crash the wv library (something that neither Beagle nor Mono can do anything about). The user on IRC wanted to index a vast repository of Word files, but a lot of his files turned out to be crashers. Anybody in such a situation can either turn off Word filtering or keep adding --deny-pattern options to deny the crashers by name (this works only if there are very few crashers). Beagle cannot skip these crashers while indexing because there is no way of knowing, before the crash happens, whether a given file will cause one (duh). There is actually something else you can try ... and if luck favours you ... things can get better.
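To make the reasoning concrete, here is a rough Python sketch of why handing text extraction to a separate command-line process helps: if the tool crashes, only that child process dies, and the caller simply moves on to the next file. This is not Beagle code, only an illustration of the idea. The helper name extract_text, the 30-second timeout and the use of "antiword" as the extractor are my assumptions; substitute whichever tool you actually installed.

```python
import subprocess
import sys


def extract_text(command, timeout=30):
    """Run an external extractor and return its stdout as text.

    Returns None if the tool is missing, hangs past the timeout,
    exits with a non-zero status, or is killed by a signal.
    (Hypothetical helper for illustration; not part of Beagle.)
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Tool not installed / not executable, or it hung.
        return None
    if result.returncode != 0:
        # A crash in the child shows up here as a non-zero return code
        # (negative on POSIX when the process was killed by a signal).
        return None
    return result.stdout


def index_all(paths, make_command):
    """Map each path to its extracted text, or None if extraction failed."""
    return {path: extract_text(make_command(path)) for path in paths}


if __name__ == "__main__":
    # "antiword" is only an example extractor (an assumption); use any tool.
    for path in sys.argv[1:]:
        text = extract_text(["antiword", path])
        print(path, "skipped" if text is None else f"{len(text)} characters")
```

A file that makes the tool crash ends up as a skipped entry, and the loop carries on with the remaining files instead of stopping.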
What you need is Joe's latest magic ExternalFilter. Get any of the command-line text extraction tools for Word files (some are listed at http://www.linux.com/article.pl?sid=06/02/22/201247). Then use ExternalFilter (search the mailing list for info on ExternalFilter or ask on IRC; some details are given at http://beaglewiki.org/ExternalFiltersRepository) to filter Word .doc files using these command-line tools. Testing is actually easier with beagle-0.2.4, which contains beagle-extract-content. All you need to do is:

1) Install one of these tools ... or all of them. You never know which one will be lucky for you.
2) Change the external-filters.xml file (at the proper location).
3) Use "beagle-extract-content /path/to/file.doc" to test the performance of the filter.
[ 4) If you find Beagle is falling back to FilterDoc for indexing .doc files instead of using FilterExternal, then remove libwv1 from your system and rebuild Beagle. FilterExternal should have the maximum priority and it needs a small fix. ]

Maybe it will return something, maybe it won't return anything. The point is that even if the tool fails to extract text from a file, as long as it doesn't crash Mono, Beagle can continue to index the other files. That isn't as bad as stopping indexing altogether.

OTOH, people of the adventurous kind can try to fix the bug in wv (it's linked from the bug in Bugzilla with a subject something like "crashing in .doc" ...).

- dBera

--
-----------------------------------------------------
Debajyoti Bera @ http://dbera.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers