Ben --

Thanks for the detailed explanation! Yes, that does make sense. If I understand it correctly, though, something won't show up in a search until at least one index switch happens after it's been submitted, which means we're talking about a minute or so on average (not just worst-case) from submission to search result, even if the switches are being done constantly (given that each switch takes about two minutes).

For my site, I'm really hoping that most content will show up within a second or so of its submission. That simply can't happen if I'm not updating the same index I'm doing searches with. I'd be OK with the turnaround *occasionally* being a minute -- say, while an index optimization or particularly large segment merge happens. But so far it looks to me like the choices with Ferret are either:

(1) The *average* time from submission to search result is on the order of minutes. However, searches are always reasonably fast. (Your approach.)

(2) The average time from submission to search result is less than a second. However, the *worst-case* times can be minutes, and all *searches* stall over those minutes as well, which is Bad. If you don't get more than a few thousand submissions per day, you can at least schedule these outages as nightly index optimizations, but you'll have the outages one way or another. (All "same index used for reading + writing" approaches.)

I don't think either of these choices is very good for the particular site I have in mind (at least if I'm being optimistic enough about its chances of "taking off" to worry about the possibility of many thousands of submissions / day). Am I correct in my summarization of the two choices with Ferret here, or have I missed something?

Anyhow, thanks again! If those two options are in fact what I have, I think I'll run some tests with Lucene/JRuby to see whether that provides a third option as far as performance goes, and report back what sort of issues come up. (My guess is that it'll be moderately painful to set up and that the average throughput will be worse than Ferret's, but that an average submission-to-search-result turnaround time of a second or two will be achievable without the site necessarily going completely down for minutes every now and then. We'll see.)

-- Scott

On Nov 16, 2007 2:40 PM, Benjamin Krause <[EMAIL PROTECTED]> wrote:
> Scott,
>
> we're using two directories, not one, for ferret. One
> index is the passive index. it is not used for searches,
> but new indexing requests will be added to that index.
> so let's call it the indexing-index.
>
> all mongrels will use the second directory, let's call it
> searching-index. Both indexes are almost identical,
> i'll explain the differences.
>
> All our indexing requests are queued. So whenever
> you want to index something, it will be placed in the
> queue, and added to the indexing-index. After a
> certain amount of queue-items added to the index,
> we're stopping indexing. The queue will be halted.
> New requests can be added, but nothing will be
> added to the indexing-index.
>
> Now we're rsyncing the indexing-index to all machines,
> remember, searching is still done in the searching-index,
> which is outdated, but we don't mind about that :)
>
> After rsync is complete, we're switching both directories,
> so the indexing-index becomes the searching-index and
> vice versa. Actually we're just switching symlinks, so
> this will take almost no time. And even if one of the
> mongrels still has a filehandle to the old index open,
> nothing will happen, it is still using the outdated index,
> but the next request will use the new index. After that,
> the new indexing-index will be synced from the
> searching-index. As the searching-index is read-only,
> there is no risk of corrupting something during the
> sync.
>
> Now we're resuming processing the queue, until we've
> added our certain amount of queue entries, or the queue
> is empty.
>
> The downside is that the searching-index is outdated,
> but not more than a couple of minutes (about 2 minutes
> on omdb). We didn't have one corrupted index since.
> There is no downtime whatsoever, and the rsync snapshot
> will always be coherent.
>
>
> Cheers
> Ben
>
>
>
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
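
P.S. To make the "a minute or so on average" arithmetic concrete, here is a rough sketch of the model I have in mind. It is only an illustration under my own assumptions, not a measurement of Ferret or of Ben's setup: switches happen back to back every `cycle_seconds`, a submission arrives at a uniformly random moment within a cycle, and it becomes searchable at the first switch after it arrives. The function name and parameters are made up for this example. If anything the model is optimistic, since a submission that arrives while the rsync is running would have to wait for the following cycle.

```python
import random


def mean_visibility_delay(cycle_seconds: float,
                          samples: int = 100_000,
                          seed: int = 0) -> float:
    """Estimate the average wait from submission to the next index switch.

    Assumption (hypothetical model): a switch completes every
    `cycle_seconds`, and each submission arrives at a uniformly random
    point inside one cycle.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        arrival = rng.uniform(0.0, cycle_seconds)
        # The submission becomes visible when the current cycle ends.
        total += cycle_seconds - arrival
    return total / samples


if __name__ == "__main__":
    cycle = 120.0  # "about 2 minutes" per switch, taken from Ben's message
    print(f"simulated mean delay: {mean_visibility_delay(cycle):.1f}s")
    print(f"closed form (cycle / 2): {cycle / 2:.1f}s")
```

Under those assumptions the mean delay is simply half the cycle length, so a two-minute cycle gives roughly one minute, and getting the average down to a second would need the whole queue/rsync/switch cycle to finish in about two seconds.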

