There's somebody on the mailing list who's talking about indexing a billion (with a "B") documents. I don't know how far they've gotten, but at least *somebody* has contemplated a huge archive <G>... If memory serves, s/he had already indexed a significant number of documents. You might try searching the archive for "billion"; the thread was within the last couple of weeks.
Be aware that Lucene, by default, only indexes the first 10,000 terms (roughly, words) of each field in a document, so if your starting point is a large, existing log you'll have to raise that limit (there's a call for it on IndexWriter, but I sure don't remember the name off the top of my head).

I've personally indexed over 1,000,000 documents and Lucene doesn't even breathe hard. It'd probably be worth writing a small program that generates fake data and indexes it, so you can play around and see whether you get what you need. The data won't be "real", but at least you'll have a better sense of how it behaves in your environment.

Best
Erick
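
P.S. For what it's worth, the throwaway generator could look something like the sketch below. None of this is Lucene API: FakeLogGenerator, the field names (id, level, message) and the word list are all made up for illustration, and the "sink" is just a placeholder for wherever you'd build a Lucene Document and hand it to your IndexWriter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical helper: produces fake log entries to feed an indexer. */
public class FakeLogGenerator {
    private static final String[] LEVELS = {"INFO", "WARN", "ERROR", "DEBUG"};
    private static final String[] WORDS = {
        "connection", "timeout", "user", "login", "cache",
        "disk", "retry", "request", "billing", "index"
    };

    private final Random random;

    /** A fixed seed makes every run produce the same data. */
    public FakeLogGenerator(long seed) {
        this.random = new Random(seed);
    }

    /** Builds one entry as field name -> value; the message has wordsPerMessage words. */
    public Map<String, String> nextEntry(int id, int wordsPerMessage) {
        StringBuilder message = new StringBuilder();
        for (int i = 0; i < wordsPerMessage; i++) {
            if (i > 0) {
                message.append(' ');
            }
            message.append(WORDS[random.nextInt(WORDS.length)]);
        }
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("id", Integer.toString(id));
        entry.put("level", LEVELS[random.nextInt(LEVELS.length)]);
        entry.put("message", message.toString());
        return entry;
    }

    /** Generates count entries, passes each to sink, and returns elapsed milliseconds. */
    public long generate(int count, int wordsPerMessage, Consumer<Map<String, String>> sink) {
        long start = System.nanoTime();
        for (int id = 0; id < count; id++) {
            sink.accept(nextEntry(id, wordsPerMessage));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        FakeLogGenerator generator = new FakeLogGenerator(42L);
        int[] seen = {0};
        // Placeholder sink: swap in code that turns the entry into a
        // Lucene Document and adds it through your own IndexWriter.
        long millis = generator.generate(100_000, 20, entry -> seen[0]++);
        System.out.println(seen[0] + " entries generated in " + millis + " ms");
    }
}
```

Pushing wordsPerMessage past 10,000 is an easy way to find out whether the default limit bites you: search for a word you know only appears late in the message and see if it comes back.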