Scott,

we're using two directories for Ferret, not one. One of them holds the passive index: it is not used for searches, but new indexing requests are added to it. So let's call it the indexing-index.
All mongrels use the second directory; let's call it the searching-index. Both indexes are almost identical; I'll explain the differences.

All our indexing requests are queued. So whenever you want to index something, it is placed in the queue and then added to the indexing-index. After a certain number of queue items have been added to the index, we stop indexing and halt the queue. New requests can still be queued, but nothing is added to the indexing-index.

Now we rsync the indexing-index to all machines. Remember, searching is still done in the searching-index, which is outdated, but we don't mind about that :) After the rsync is complete, we switch both directories, so the indexing-index becomes the searching-index and vice versa. Actually we're just switching symlinks, so this takes almost no time. And even if one of the mongrels still has a filehandle to the old index open, nothing will happen: it is still using the outdated index, but the next request will use the new index.

After that, the new indexing-index is synced from the searching-index. As the searching-index is read-only, there is no risk of corrupting anything during the sync. Now we resume processing the queue until we've added our certain number of queue entries or the queue is empty.

The downside is that the searching-index is outdated, but by no more than a couple of minutes (about 2 minutes on omdb). We haven't had a single corrupted index since. There is no downtime whatsoever, and the rsync snapshot will always be coherent.

Cheers
Ben
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
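
A minimal sketch of one rotation cycle as described in this message, written in Python for brevity (Ruby offers the same two calls as File.symlink and File.rename). Everything named here is an assumption for illustration and not taken from the actual omdb setup: the directories index_a and index_b, the helpers make_layout, point_symlink, mirror, run_cycle and add_document, and the batch size. A local directory copy stands in for rsync, and writing an empty file stands in for adding a document to Ferret.

```python
import os
import shutil
import tempfile
from collections import deque


def make_layout(root):
    """Create two real directories and the two role symlinks (names are hypothetical)."""
    for name in ("index_a", "index_b"):
        os.mkdir(os.path.join(root, name))
    os.symlink(os.path.join(root, "index_a"), os.path.join(root, "searching-index"))
    os.symlink(os.path.join(root, "index_b"), os.path.join(root, "indexing-index"))


def point_symlink(link, target):
    """Repoint `link` at `target` by renaming a temporary symlink over it."""
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # on POSIX the rename swaps the link in a single step


def mirror(src, dst):
    """Local stand-in for rsync: make dst an exact copy of src."""
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst)


def add_document(index_dir, doc_id):
    """Stand-in for a real indexing call: one empty file per document."""
    open(os.path.join(index_dir, doc_id), "w").close()


def run_cycle(root, queue, batch_size, add_to_index):
    searching = os.path.join(root, "searching-index")
    indexing = os.path.join(root, "indexing-index")

    # 1. Work off at most batch_size queued requests into the indexing-index.
    #    Anything beyond that stays in the queue until the next cycle.
    fresh = os.path.realpath(indexing)
    for _ in range(min(batch_size, len(queue))):
        add_to_index(fresh, queue.popleft())

    # 2. The rsync of the indexing-index to the other machines would happen
    #    here; it is omitted because this sketch runs on a single machine.

    # 3. Switch the symlinks: the fresh directory now serves searches.
    stale = os.path.realpath(searching)
    point_symlink(searching, fresh)
    point_symlink(indexing, stale)

    # 4. Bring the new indexing-index up to date from the searching-index,
    #    which nobody writes to while the copy runs.
    mirror(fresh, stale)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    make_layout(root)
    queue = deque(["doc1", "doc2", "doc3"])
    run_cycle(root, queue, batch_size=2, add_to_index=add_document)
    print(sorted(os.listdir(os.path.join(root, "searching-index"))))
    print(list(queue))
```

Renaming a temporary symlink over the old one is one common way to make the switch appear instantaneous to readers; the message itself only says that symlinks are switched, so the exact mechanism used there may differ.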

