Scott,

we're using two directories for ferret, not one. One
index is the passive index. It is not used for searches,
but new indexing requests are added to it, so
let's call it the indexing-index.

All mongrels use the second directory; let's call it the
searching-index. Both indexes are almost identical;
I'll explain the differences.

All our indexing requests are queued. So whenever
you want to index something, it is placed in the
queue and then added to the indexing-index. After a
certain number of queue items have been added to the
index, we stop indexing and the queue is halted.
New requests can still be queued, but nothing is
added to the indexing-index.
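
To make the batching concrete, here is a minimal Ruby sketch of
such a queue. It is not our actual code: the class name, the
method names and the batch size are made up for illustration, and
the block stands in for the real call that adds a document to the
indexing-index.

```ruby
# Hypothetical batching queue; all names here are illustrative only.
class BatchedIndexQueue
  attr_reader :pending

  def initialize(batch_size)
    @batch_size = batch_size
    @pending = [] # requests waiting to be indexed
  end

  # New requests are always accepted, even while indexing is halted.
  def enqueue(doc)
    @pending << doc
  end

  # Hand at most batch_size queued items to the block, then stop.
  # Returns the number of items processed in this batch.
  def process_batch
    batch = @pending.shift(@batch_size)
    batch.each { |doc| yield doc }
    batch.size
  end
end
```

Anything enqueued while the queue is halted simply stays in
`pending` until the next call to `process_batch`.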

Now we rsync the indexing-index to all machines.
Remember, searching is still done in the searching-index,
which is outdated, but we don't mind about that :)
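
As a rough sketch of the rsync step, something like the following
would do. The helper names, host names and paths are placeholders
I'm assuming for the example, not our real setup; `-a` and
`--delete` are standard rsync options (archive mode, and removing
files on the receiver that no longer exist on the sender).

```ruby
# Hypothetical helpers; hosts and paths are placeholders.

# Build the rsync argument list for one target host.
# The trailing slashes make rsync copy the directory contents
# rather than the directory itself.
def rsync_command(index_dir, host, remote_dir)
  ["rsync", "-a", "--delete",
   "#{index_dir.chomp('/')}/",
   "#{host}:#{remote_dir.chomp('/')}/"]
end

# Push the index to every host. Stops at the first failing host
# and returns false in that case, true if all transfers succeeded.
def push_index(index_dir, hosts, remote_dir)
  hosts.all? { |host| system(*rsync_command(index_dir, host, remote_dir)) }
end
```

Because the queue is halted while this runs, nothing writes to the
indexing-index during the copy.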

After the rsync is complete, we switch both directories,
so the indexing-index becomes the searching-index and
vice versa. Actually we're just switching symlinks, so
this takes almost no time. And even if one of the
mongrels still has a filehandle to the old index open,
nothing bad happens: it keeps using the outdated index,
but the next request will use the new index. After that,
the new indexing-index is synced from the
searching-index. As the searching-index is read-only,
there is no risk of corrupting anything during the
sync.
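
The symlink switch could look roughly like this in Ruby. Again,
this is a sketch under assumptions, not our production script:
the function names and the temporary-link naming are invented. It
relies on rename(2) replacing an existing symlink atomically on
POSIX systems, which is what `File.rename` calls underneath.

```ruby
# Hypothetical helpers; names are illustrative only.

# Point the symlink at link_path to new_target without a window
# in which the link does not exist.
def repoint_symlink(link_path, new_target)
  tmp = "#{link_path}.tmp.#{Process.pid}"
  File.symlink(new_target, tmp) # create the new link under a temporary name
  File.rename(tmp, link_path)   # replaces the old link in one step
end

# Exchange the targets of the two symlinks.
def swap_indexes(searching_link, indexing_link)
  old_search = File.readlink(searching_link)
  old_index  = File.readlink(indexing_link)
  repoint_symlink(searching_link, old_index)
  repoint_symlink(indexing_link, old_search)
end
```

Each link is replaced atomically, but the two replacements are
separate steps, so for a brief moment both links point at the same
directory. That is harmless here because indexing is halted.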

Now we resume processing the queue, until we've
added our certain number of queue entries or the queue
is empty.

The downside is that the searching-index is outdated,
but by no more than a couple of minutes (about 2 minutes
on omdb). We haven't had a single corrupted index since.
There is no downtime whatsoever, and the rsync snapshot
is always coherent.

Cheers
  Ben



_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
