Don't do it that way <G>. Seriously, is this an actual or a theoretical scenario? And do you reasonably expect it to become actual? Otherwise, why bother?
And you've got other problems here. If you're indexing that much data, you'll soon outgrow your disk, unless you're replacing most of the documents.

But assuming that all this is somehow not a problem, I'd consider something like indexing by directory. That is, for an hour, collect all the incoming documents in directory d1. Then turn an indexer process loose on d1 and start collecting docs in d2. At the end of the next hour, start indexing d2 and collecting d3. When each indexing process finishes, you can use IndexWriter.addIndexes to fold its index into your main index. Or you could batch them up and add all the indexes created in the last, say, 6 hours at once. You could even split this across multiple machines if you get CPU bound.

That said, I can't stress enough that you really need to consider how long you can keep indexing data at that rate and still have any performance to speak of at search time.

If you're not indexing that much data, *and* you still have speed problems, I'd look long and hard at my code to see why indexing is taking so long. Are you closing and reopening the IndexWriter? Are you optimizing too often? Is the way you access the data (perhaps querying a database) painful?

Some real numbers would help. Things like:
How many documents are in your index?
How many arrive each hour?
How long does it take to index, say, 100 docs?
How big are the docs on input?
How much bigger do they make the index?

Have you measured *any* of these things? If so, please post the numbers. Try running everything *except* the indexing step and see whether your bottleneck is somewhere unexpected.

Anyway, hope this helps
Erick

On 5/8/07, Ram Peters <[EMAIL PROTECTED]> wrote:
I am indexing documents periodically, every hour. Here is a scenario: when you index every hour and a large document set arrives, it can take more than an hour to index the documents. Now you are already behind on indexing for the next hour. How do you design something that is robust? Thanks.
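For concreteness, here is a rough Java sketch of the directory-per-hour scheme described in the reply above. It is not Lucene code, just the scheduling skeleton under stated assumptions: BatchIndexer, IndexMerger and RotatingIndexPipeline are hypothetical names, the per-batch indexing and the final merge are left as pluggable hooks, and the worker count and merge interval are illustrative. With Lucene, the merge hook is where IndexWriter.addIndexes would be called. Something external (for example a ScheduledExecutorService) is assumed to call rotate() once an hour.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical hook: builds a separate index from the documents collected
// in one batch directory and returns where that index was written.
interface BatchIndexer {
    Path indexBatch(Path batchDir) throws Exception;
}

// Hypothetical hook: folds finished batch indexes into the main index.
// With Lucene, this is where IndexWriter.addIndexes would be called.
interface IndexMerger {
    void merge(List<Path> batchIndexes) throws Exception;
}

class RotatingIndexPipeline {
    private final Path incomingRoot;
    private final ExecutorService indexers;   // >1 worker: an overlong batch does not block the next one
    private final BatchIndexer indexer;
    private final IndexMerger merger;
    private final int mergeEvery;             // e.g. 6 = merge after six hourly batches
    private final Object mergeLock = new Object();          // one merge into the main index at a time
    private final List<Path> finished = new ArrayList<>();  // guarded by "this"
    private int batch = 1;

    RotatingIndexPipeline(Path incomingRoot, int workers, int mergeEvery,
                          BatchIndexer indexer, IndexMerger merger) throws IOException {
        this.incomingRoot = incomingRoot;
        this.indexers = Executors.newFixedThreadPool(workers);
        this.indexer = indexer;
        this.merger = merger;
        this.mergeEvery = mergeEvery;
        Files.createDirectories(currentDir());
    }

    // Directory that newly arriving documents should be written to right now (d1, d2, ...).
    synchronized Path currentDir() {
        return incomingRoot.resolve("d" + batch);
    }

    // Call once an hour: seal the current directory, start collecting into
    // the next one, and index the sealed directory in the background.
    synchronized Future<Path> rotate() throws IOException {
        Path sealed = currentDir();
        batch++;
        Files.createDirectories(currentDir());
        return indexers.submit(() -> {
            Path index = indexer.indexBatch(sealed);
            onBatchIndexed(index);
            return index;
        });
    }

    // Record a finished batch; once mergeEvery have piled up, merge them together.
    private void onBatchIndexed(Path index) throws Exception {
        List<Path> toMerge = null;
        synchronized (this) {
            finished.add(index);
            if (finished.size() >= mergeEvery) {
                toMerge = new ArrayList<>(finished);
                finished.clear();
            }
        }
        if (toMerge != null) {
            synchronized (mergeLock) {
                merger.merge(toMerge);
            }
        }
    }

    // Wait for in-flight batches, then merge whatever is left over. Documents
    // still sitting in currentDir() are NOT indexed; call rotate() first if needed.
    void close() throws Exception {
        indexers.shutdown();
        indexers.awaitTermination(1, TimeUnit.DAYS);
        List<Path> rest;
        synchronized (this) {
            rest = new ArrayList<>(finished);
            finished.clear();
        }
        if (!rest.isEmpty()) {
            synchronized (mergeLock) {
                merger.merge(rest);
            }
        }
    }
}
```

The point of giving the pool more than one worker is that a batch which overruns its hour runs alongside the next batch instead of pushing it back. Merges are serialized because Lucene allows only one IndexWriter to have a given index open for writing at a time.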