Hi!

On Fri, Nov 16, 2007 at 02:56:26AM -0800, Scott Davies wrote:
> I've been running some multithreaded tests on Ferret.  Using a single
> Ferret::Index::Index inside a DRb server, it definitely behaves for me
> as if all readers are locked out of the index when writing is going on
> in that index, not just optimization -- at least when segment merging
> happens, which is when the writes take the longest and you can
> therefore least afford to lock out all reads.  This is very easy to
> notice when you add, say, your 100,000th document to the index, and
> that one write takes over 5 seconds to complete because it triggers a
> bunch of incremental segment-merging, and all queries to the index
> stall in the meantime.  Or when you add your millionth document, which
> can stall all reads for over a minute. :-(

Don't get me wrong, but how often do you think you'll add your millionth
document to the index? 

And even if you really do index a million documents per week, I
wouldn't exactly call it bad performance if one or two search requests
*per week* take a minute to complete while all the others finish in
less than a second...

That said, the problem of blocking searches might be solvable by not
using Ferret's Index class for searching/indexing, but instead using
the lower-level APIs (Searcher and IndexWriter) and doing manual
synchronization (inside *one* process). I haven't felt the need to
implement this for aaf (yet ;-), since I think it's already fast enough
not to be the bottleneck in most real-world usage scenarios (say,
typical Rails apps using aaf for full text search).

> When I try to use an IndexReader in a separate process, things are
> even worse.  The IndexReader doesn't see any updates to the index
> since it was created.  Not too surprising, but if I try creating a new
> IndexReader for every query, and have the Index in the other writing
> process turn on auto_flush, then the reading process crashes after a
> few (generally fewer than 100) queries, in one of at least two
> different ways selected apparently at random:

[..]

Stick to the one-process-per-index rule to be on the safe side.
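That rule usually means putting the index behind a DRb front end, so
every other process talks to the one process that owns it. A rough
sketch of that shape (the IndexServer class and its methods are
illustrative; a Hash stands in for the Ferret index so it runs without
the gem):

```ruby
require 'drb/drb'
require 'thread'

# The one process that is allowed to touch the index on disk.
class IndexServer
  def initialize
    @lock = Mutex.new
    # Real setup: @index = Ferret::Index::Index.new(:path => 'idx')
    @docs = {}
    @next_id = 0
  end

  def add(doc)
    @lock.synchronize { @docs[@next_id += 1] = doc }
  end

  def search(term)
    @lock.synchronize { @docs.values.select { |d| d.include?(term) } }
  end
end

server = IndexServer.new
DRb.start_service('druby://localhost:0', server)  # port 0: pick a free port

# Any other process would connect with the server's published URI:
#   index = DRbObject.new_with_uri('druby://somehost:port')
index = DRbObject.new_with_uri(DRb.uri)
index.add('ferret full text search')
puts index.search('ferret').size  # => 1
DRb.stop_service
```

This is essentially what aaf's DRb server does for you already.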

> Given the combination of problems above, I'm at a loss to understand
> how to use Ferret on a live website that requires reasonably fast
> turnaround between a user submitting data and the user being able to
> search over that data, unless either (1) the site only gets a few
> thousand new index entries per day and the site can be taken down for
> a few minutes daily to optimize the index, or (2) it's OK for the
> entire site to periodically stall on all queries for seconds or even
> minutes whenever segment-merging happens to kick in.

I wouldn't set the limit at a few thousand new documents per day, and
optimizing daily is only useful if you have lots of document deletions
per day.


Cheers,
Jens

PS: If you happen to benchmark Solr against aaf's DRb server, be sure to
let us know your findings :-)

-- 
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/     - The new free film database
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
