On 23.08.2013 12:32, Stefan Vollmar wrote:
> Dear Jörn,
>
> On 22.08.2013, at 23:49, Jörn Friedrich Dreyer wrote:
>
>> The warnings about pdf and word are from getid3 lib and can be ignored if
>> you are using search lucene. It comes with special indexers for these
>> filetypes.
>>
>> The error about not being able to determine the file format for txt files
>> also is from getid3 and might be caused by empty txt files.
> Can we get rid of the error messages?

Setting the log level to error (3) should stop them from being logged.

>> Can you check if the reported txt file has 0 bytes? Can you search for a
>> text in the pdf or word files and see if you get any results?
> The text file is not empty. We have manually scheduled a re-scan of all files
> and this might be the reason that now *some* search terms yield results with
> that txt-file, we also have hits inside the PDF file. So, in principle,
> search_lucene does seem to do something. Is there a way to monitor what
> lucene is doing exactly and whether it has already indexed a particular file
> at all?

search_lucene tracks the indexing status in the oc_lucene_status table.
There is no UI yet, so you will have to join that table to the oc_filecache
table to get meaningful information. Setting the log level to debug (0) and
tailing the owncloud.log file with a grep on "search_lucene" will give you
only search_lucene related output.

> However, simple matching of file names (which should be much simpler and is
> really helpful if you have a nested directory structure with many files) is
> not nearly as good as it could be: it required the full "readme" before
> "readme.txt" is offered as a hit, likewise all characters of "tourismus"
> before "tourismus.jpg" turns up as a potential hit.
>
> Likewise "Serverraum" finds "Serverraum" in a PDF, however "server" or "raum"
> triggers nothing. I will not say that this is useless, but it does not
> compare favorably with either the Google or the Spotlight search engine - is
> this maybe something that is configurable?

Yes, this is something we have not yet decided how to handle. Lucene search
uses the Lucene query language instead of the simple 'LIKE "%<term>%"', which
is too expensive for most systems. You should be able to use '<term>*' and
also '*<term>*' in the search field when you want to find partial matches.
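
To illustrate why "server" finds nothing while '*server*' should, here is a
minimal Python sketch of term-based matching. It is a toy model only, not the
code search_lucene or the underlying Lucene library actually runs: the
functions tokenize() and matches() are hypothetical, and it assumes that text
is split into lowercase alphabetic terms and that a query without wildcards
must equal a whole term.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Split text into lowercase alphabetic terms.

    Assumption: this mimics a simple letters-only, case-insensitive
    analyzer; the real analyzer may behave differently.
    """
    return [term.lower() for term in re.findall(r"[^\W\d_]+", text)]


def matches(query, text):
    """Return True if any term of `text` matches `query`.

    A query without '*' must equal a whole term. A '*' in the query
    stands for any run of characters (glob-style, via fnmatchcase).
    """
    pattern = query.lower()
    return any(fnmatchcase(term, pattern) for term in tokenize(text))


if __name__ == "__main__":
    for query in ("server", "server*", "*raum*", "serverraum"):
        print(query, "->", matches(query, "Serverraum"))
    for query in ("read", "read*", "readme"):
        print(query, "->", matches(query, "readme.txt"))
```

Under these assumptions "readme" hits "readme.txt" because the file name
splits into the terms "readme" and "txt", while "read" alone equals neither
term and only matches once it is written as 'read*'.
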
This also allows for more complex searches, but since the app is currently
marked as experimental, this really is stuff we need performance comparisons
and usage reports for *hint* *hint*

You can get back the simple search term behaviour of the stock search by
uncommenting
https://github.com/owncloud/apps/blob/master/search_lucene/lib/lucene.php#L196
but please bear in mind that PHP has to load the whole index on every
request, which might take a while. We still need to investigate how to
optimize this for large indexes.

so long

Jörn

-- 
Jörn Friedrich Dreyer ([email protected])
Senior Software Engineer
ownCloud GmbH

Your Data, Your Cloud, Your Way!

ownCloud GmbH, GF: Markus Rex, Holger Dyroff
Schloßäckerstrasse 26a, 90443 Nürnberg, HRB 28050 (AG Nürnberg)
_______________________________________________
Owncloud mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/owncloud
