: For example, when you are indexing every hour and large document set
: is present, it takes >1 hr to index the documents. Now you are
: already behind indexing for the next hour. How do you design
: something that is robust?

fundamentally, this question is more about issues in a producer/consumer model than it is specifically about indexing...

given a situation where data comes into a queue (from some set of producers) and you want to process that data (by some set of consumers), what do you do if the producers produce faster than the consumers consume?

i know of 7 options:

  1) decrease the number of producers
  2) make the producers produce slower
  3) make the queue infinitely large
  4) make the queue block
  5) make the consumers consume faster
  6) increase the number of consumers
  7) throw away data

#1, #2 and #3 are not usually practical, but are listed for completeness. #4 may be practical in some situations, but there are no easy rules to know when. #6 tends to be very feasible in a well designed system where things can be parallelized, while #5 can frequently be achieved either by profiling and optimizing your code, or by making your code do less; which segues nicely to #7...

it may sound like a joke, but frequently big throughput gains can be made by reducing the amount of data being sent to the consumers. sometimes it's a matter of taking some work that the consumers do and making the producers do it (ie: eliminating data that you know you aren't going to index); in other cases it may truly be throwing away data: because you can see that your queue is so full, you switch into "critical info only mode" where you don't bother to process every little bit of data, just the important stuff. you make the conscious choice that it's better to be caught up on the big stuff than to fall way way behind dealing with the little stuff.

-Hoss
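
for what it's worth, here is a rough sketch of how options 4, 6 and 7 might fit together, using Python's standard queue and threading modules. the names offer, start_consumers and high_water are made up for illustration, and the 80% threshold is an arbitrary assumption, not a recommendation; treat it as one possible shape rather than the way to do it.

```python
import queue
import threading


def offer(q, item, critical, high_water=0.8, timeout=None):
    """Hypothetical producer-side helper: True if enqueued, False if dropped.

    Assumes q is a bounded queue.Queue (maxsize > 0).
    """
    # Option 7: once the queue is mostly full, switch to "critical info
    # only mode" and throw away non-critical items.
    # qsize() is only approximate when other threads are active.
    if not critical and q.qsize() >= high_water * q.maxsize:
        return False
    # Option 4: a bounded queue blocks the producer until there is room
    # (raises queue.Full if a timeout is given and expires).
    q.put(item, timeout=timeout)
    return True


def start_consumers(q, handle, count):
    """Option 6: run `count` consumer threads draining the same queue."""
    def worker():
        while True:
            item = q.get()
            try:
                if item is None:  # sentinel: stop this worker
                    return
                handle(item)
            finally:
                q.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for t in threads:
        t.start()
    return threads
```

a producer would call offer(q, doc, critical=...) for each document, and shutdown would put one None per consumer on the queue and then call q.join(). whether a given item counts as critical is the application's call; the sketch only shows where that decision plugs in.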