On 3/28/06, Michael Levy <[EMAIL PROTECTED]> wrote:
> I'm looking for advice on selecting a search application. I'm
> responsible for developing a new search platform for use in a historical
> research organization and museum. I've pretty much decided on Lucene as
> the library for custom servlet apps that would use the Lucene API directly.
>
> At the same time, we have a number of application ideas that could
> probably use a flexible/customizable off-the-shelf crawling
> application. For starters, it would be pretty basic stuff like indexing
> PDF files using some library, returning links that have been translated
> to point to a Tomcat virtual directory containing the files. But our
> apps could quickly get more complex as we think of new search ideas.
>
> I'm having a hard time comparing and contrasting Nutch, Solr, and
> Red-piranha. I would appreciate anyone offering your ideas or
> experiences about which of these (or any other comprehensive search
> solutions) are good for which types of applications. TIA!

I'd use Solr for highly structured data (documents with multiple
fields) and very configurable per-field text analysis. Think of it
more like a database, but designed for full-text search. We are
working on adding easy faceted browsing and indexing of SQL databases.

I'd use Nutch for web search (a free Google replacement): crawling
(discovering) and indexing web pages, and automatically handling
different types of human-readable documents like HTML, PDF, etc.

I don't have any experience with Red-piranha.

Example use cases as I see them:
  To index a music web site with lots of articles: Nutch
  To index a music collection (structured data like title, album,
  author, year, genre, etc.): Solr

-Yonik
http://incubator.apache.org/solr
Solr, The Open Source Lucene Search Server