Hi,

We are at the initial design stages of a public-facing, web-based search application for a U.S. Federal Agency. We have proposed a clustered Lucene architecture as the best technical solution: we feel their current system (based on Oracle) won't give the best performance, and it introduces a lot of unnecessary complexity and expense, given that the system is read-only. We also feel that the Lucene design will be very flexible and easier to maintain and administer.

Government agencies are notoriously conservative when it comes to decisions about technology, especially when open source is involved. Perhaps surprisingly, their response has been encouraging. However, they want further reassurance that other big-name organizations have successfully used Lucene for large datasets.

First, some background: we will be searching a number of repositories, the largest of which includes about 600,000 documents and might reach 10 million over the next 10 years. The documents are probably comparable to web pages in terms of average size, and would be indexed under about 10 different fields. Our plan is to partition the indexes and distribute them over a number of modern Intel/Linux servers. From what I pick up on the mailing lists, this seems well within the capabilities of Lucene.

I've looked at the Powered-by Lucene pages, but there are two problems: (i) there are no details on the size of the datasets being searched; (ii) I don't think our customer would recognize any of these organizations. In Otis' OnJava article, he lists "FedEx, Overture, Mayo Clinic, Hewlett Packard, New Scientist magazine, Epiphany, and others using, or at least evaluating, Lucene". This is more like it(!), but I want to be honest and open with our customer, and the "or at least evaluating" comment is not concrete enough, and there is no indication of scale.
The best example that I've been able to find is the Yahoo research lab. As I understand it, this is a Nutch (i.e. Lucene) implementation that's providing impressive performance over a 100 million document repository.

I would be very grateful if anyone could pass on some basic details of successful large-scale Lucene projects, and even more so if they involve a "big name" or government agency. If you are happy to pass this information on, but would prefer to keep it off the public mailing list, then please email me directly. I will respect confidentiality.

I think that this problem of reassuring customers and managers is a common one, so I would be happy to collate any responses to this as a new Wiki entry. Hopefully one day (with their permission) we will be able to add our customer to the Powered-by Lucene page too.

Thanks in advance,

Alex McManus ([EMAIL PROTECTED])