I haven't gone up to this scale yet, but here are some points for
consideration, based on a smaller-scale system I have in production, that
may be of interest:
By clustering I presume you are only talking about replication.
When we talk about scaling and using multiple machines we need to think
of "partitioning" and "replication" - they are two techniques for solving
two different problems (scaling for data volumes and scaling for user
volumes, respectively). If you have a large amount of data you need to
partition the data into separate partitions or "shards", distributed on
multiple machines and query them in parallel. If you have a large number
of users you need to replicate copies of an index/shard and balance the
query load across multiple replicated machines.
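
A minimal sketch of how the two techniques combine, assuming each shard is
just an in-memory list of strings and each replica is an identical copy of
its shard. The names Shard and search_all are made up for illustration and
are not from any search library; a real system would call out to remote
search nodes instead.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """One partition of the data, served by several identical replicas."""

    def __init__(self, replicas):
        # replicas: list of lists of documents, each list a full copy of the shard
        self.replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))

    def pick_replica(self):
        # Replication: round-robin over the copies to spread the query load.
        return next(self._next)

    def search(self, term):
        replica = self.replicas[self.pick_replica()]
        return [doc for doc in replica if term in doc]


def search_all(shards, term):
    # Partitioning: every shard holds different data, so each one must be
    # queried (here in parallel) and the partial results merged.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = pool.map(lambda shard: shard.search(term), shards)
    return sorted(hit for hits in partial_results for hit in hits)
```

Adding shards lets the system hold more data; adding replicas per shard lets
it serve more concurrent queries.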
As for synchronizing, I have previously used a centralized database as
the definitive source of all data, from which the search nodes pull
their content. This has the benefit of using a database's powerful
transactional and backup/restore facilities to manage the integrity of
the source data. A hash of the database's primary key, taken modulo n,
can be used to allocate a search partition ID to each record, which is a
way of evenly apportioning content to your n search partitions. Databases
also have facilities to automate the timestamping of updated/deleted
records. Each search server simply reads the latest
updated/modified/deleted records from the central database for the
partition(s) it manages and patches its index accordingly. Using this
approach a search node can suffer a failure and always be able to
recover without too much trouble. There are more efficient ways of
distributing content than via JDBC and polling, and better ways of
building replicated indexes than duplicating the indexing logic on
multiple machines, but this has the advantage of being simple to
implement and may suffice if the update volume/frequency is relatively low.
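
The approach above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the production code: it uses an
in-memory SQLite database in place of the central database, a plain dict in
place of a real search index, and a caller-supplied integer for modified_at
where a real database would fill in the timestamp itself (for example via a
trigger). The table layout and all function names are hypothetical.

```python
import sqlite3
import zlib

N_PARTITIONS = 4  # hypothetical number of search partitions


def partition_for(primary_key, n=N_PARTITIONS):
    # Hash of the primary key, modulo n. crc32 is used because it is stable
    # across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(primary_key).encode("utf-8")) % n


def create_central_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE docs ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT,"
        " partition_id INTEGER NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0,"
        " modified_at INTEGER NOT NULL)"
    )
    return db


def upsert(db, doc_id, body, now):
    db.execute(
        "INSERT OR REPLACE INTO docs (id, body, partition_id, deleted, modified_at)"
        " VALUES (?, ?, ?, 0, ?)",
        (doc_id, body, partition_for(doc_id), now),
    )


def mark_deleted(db, doc_id, now):
    # Deletes are recorded as flagged rows so that search nodes can see them.
    db.execute(
        "UPDATE docs SET deleted = 1, modified_at = ? WHERE id = ?", (now, doc_id)
    )


class SearchNode:
    def __init__(self, partition_ids):
        self.partition_ids = sorted(partition_ids)
        self.index = {}      # doc id -> body; stand-in for a real search index
        self.last_seen = 0   # highest modified_at value already applied

    def poll(self, db):
        """Pull rows changed since the last poll and patch the local index."""
        marks = ",".join("?" * len(self.partition_ids))
        rows = db.execute(
            "SELECT id, body, deleted, modified_at FROM docs"
            f" WHERE partition_id IN ({marks}) AND modified_at > ?"
            " ORDER BY modified_at",
            (*self.partition_ids, self.last_seen),
        ).fetchall()
        for doc_id, body, deleted, modified_at in rows:
            if deleted:
                self.index.pop(doc_id, None)
            else:
                self.index[doc_id] = body
            self.last_seen = modified_at
        return len(rows)
```

A node that has lost its index can recover by starting a fresh SearchNode
(last_seen of 0) and polling once, which replays everything for its
partition(s) from the central database.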
Cheers
Mark