Not quite yet gone up to this scale but here are some points for consideration based on a smaller scale system I have in production that may be of interest:

By clustering I presume you are only talking about replication.
When we talk about scaling and using multiple machines we need to think of "partitioning" and "replication" - they are 2 techniques for solving two different problems (scaling for data volumes and scaling for user volumes, respectively). If you have a large amount of data you need to partition the data into separate partitions or "shards", distributed on multiple machines and query them in parallel. If you have a large number of users you need to replicate copies of an index/shard and balance the query load across multiple replicated machines.

As for synchronizing I have previously used a centralized database as the definitive source of all data from which index search nodes pull their content. This has the benefits of using a database's powerful transactional, and backup/restore facilities to manage the integrity of the source data. Modulo n of a a hash of the database's primary key can be used to allocate a search partition ID for each record which is a way of evenly apportioning content to your n search partitions. Databases also have facilities to automate the timestamping of updated/deleted records. Each search server simply reads the latest updated/modified/deleted records from the central database for the partition(s) they manage and patches their index accordingly. Using this approach a search node can suffer a failure and always be able to recover without too much trouble. There are more efficient ways of distributed content than via JDBC and polling and better ways of building replicated indexes than duplicating the indexing logic on multiple machines but this has the advantage of being simple to implement and may suffice if the updates volume/frequency is relatively low.

Cheers
Mark




                
___________________________________________________________ All New Yahoo! Mail – Tired of [EMAIL PROTECTED]@! come-ons? Let our SpamGuard protect you. http://uk.docs.yahoo.com/nowyoucan.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to