Hi!

On Fri, Nov 16, 2007 at 02:56:26AM -0800, Scott Davies wrote:
> I've been running some multithreaded tests on Ferret. Using a single
> Ferret::Index::Index inside a DRb server, it definitely behaves for me
> as if all readers are locked out of the index when writing is going on
> in that index, not just optimization -- at least when segment merging
> happens, which is when the writes take the longest and you can
> therefore least afford to lock out all reads. This is very easy to
> notice when you add, say, your 100,000th document to the index, and
> that one write takes over 5 seconds to complete because it triggers a
> bunch of incremental segment-merging, and all queries to the index
> stall in the meantime. Or when you add your millionth document, which
> can stall all reads for over a minute. :-(
Don't get me wrong, but how often do you think you'll add your
millionth document to the index? And even if you really do index a
million documents per week - I wouldn't exactly call it bad
performance if one or two search requests *per week* take a minute to
complete, while all others are completed in less than a second...

That said, it might be possible to solve the problem of blocking
searches by not using Ferret's Index class for searching/indexing,
but by using the lower level APIs (Searcher and IndexWriter) and
doing manual synchronization (inside *one* process) - see the sketch
further below. I didn't feel the need to implement this for aaf
(yet ;-), since I think it's already fast enough to not be the
bottleneck in most real world usage scenarios (say - typical Rails
apps using aaf for full text search).

> When I try to use an IndexReader in a separate process, things are
> even worse. The IndexReader doesn't see any updates to the index
> since it was created. Not too surprising, but if I try creating a new
> IndexReader for every query, and have the Index in the other writing
> process turn on auto_flush, then the reading process crashes after a
> few (generally fewer than 100) queries, in one of at least two
> different ways selected apparently at random: [..]

Stick to the one-process-per-index rule to be on the safe side.

> Given the combination of problems above, I'm at a loss to understand
> how to use Ferret on a live website that requires reasonably fast
> turnaround between a user submitting data and the user being able to
> search over that data, unless either (1) the site only gets a few
> thousand new index entries per day and the site can be taken down for
> a few minutes daily to optimize the index, or (2) it's OK for the
> entire site to periodically stall on all queries for seconds or even
> minutes whenever segment-merging happens to kick in.

I wouldn't set the limit at a few thousand new documents per day, and
optimizing daily is only useful if you have lots of document
deletions per day.
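To make the "manual synchronization" idea a bit more concrete, here's
a rough, untested sketch. All names in it (IndexServer, the field
list, the index path, the druby URI) are made up for the example, and
I'm writing the Ferret calls down from memory, so double-check them
against the API docs before relying on any of this:

  require 'ferret'
  require 'monitor'

  class IndexServer
    def initialize(path)
      @path        = path
      @search_lock = Monitor.new  # guards searches and @searcher swaps
      @write_lock  = Monitor.new  # serializes writers
      @writer   = Ferret::Index::IndexWriter.new(:path => path)
      @searcher = Ferret::Search::Searcher.new(path)
    end

    # Writing (including any segment merging it triggers) happens
    # outside @search_lock, so searches keep running against the old
    # searcher while a long merge is going on.
    def add_document(doc)
      @write_lock.synchronize do
        @writer << doc
        @writer.commit
      end
      reopen_searcher
    end

    def search(query_string)
      query = Ferret::QueryParser.new(:fields => [:title, :content]).parse(query_string)
      results = []
      @search_lock.synchronize do
        @searcher.search_each(query) { |id, score| results << [id, score] }
      end
      results
    end

    private

    # Only this brief swap competes with searches for the lock; the
    # expensive work (the merge) already happened in add_document.
    def reopen_searcher
      @search_lock.synchronize do
        old = @searcher
        @searcher = Ferret::Search::Searcher.new(@path)
        old.close
      end
    end
  end

Exposing the single IndexServer instance via DRb then also takes care
of the one-process-per-index rule - all your Rails processes talk to
this one process instead of opening the index themselves:

  require 'drb'
  DRb.start_service('druby://localhost:9010', IndexServer.new('/path/to/index'))
  DRb.thread.join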
Cheers,
Jens

PS: If you happen to benchmark Solr against aaf's DRb server, be sure
to let us know your findings :-)

--
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
