A stepping stone to the above is that, in DB terms, a Lucene index is only one table. It has a suite of indexing features that are very different from database search; the features are oriented toward searching large bodies of text for "ideas" rather than concrete words. It searches a lot faster than a DB, and it also spends more time creating its various indexes than a DB does. Other points: you can't add or drop fields or indexes.
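For illustration, in Solr the fields of an index are declared up front in schema.xml rather than altered on the fly; a minimal sketch (the field names are assumptions for the name/address example discussed later in this thread, not anything from the list):

```xml
<!-- schema.xml excerpt (illustrative field names) -->
<!-- firstname is multiValued so one person can carry several first names -->
<field name="lastname"  type="string" indexed="true" stored="true"/>
<field name="firstname" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- tdate is the trie-based date type, efficient for range queries -->
<field name="birthdate" type="tdate"  indexed="true" stored="true"/>
```

Changing these declarations later generally means reindexing existing documents.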
On Wed, Aug 25, 2010 at 10:33 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> The SOLR wiki has lots of good information, start there:
> http://wiki.apache.org/solr/
>
> Otherwise, see below...
>
> On Wed, Aug 25, 2010 at 6:20 AM, Schreiner Wolfgang <wolfgang.schrei...@itsv.at> wrote:
>
>> Hi all,
>>
>> We are currently evaluating potential search frameworks (such as Hibernate Search) which might be suitable for use in our project (using Spring, JPA with Hibernate)...
>> I am sending this e-mail in the hope you can advise me on a few issues that would help us in our decision-making process.
>>
>> 1.) Is Lucene suitable for full-text database searches? I read that Lucene was designed to index and search documents, but how does it behave querying relational data sets in general?
>
> Let's start by talking about the phrase "full-text database searches". One thing virtually all db-centric people trip over is trying to use SOLR as if it were a database. You just can't think about tables. The first time you think about using SOLR to do something join-like, stop, take a deep breath, and think about documents instead. The general approach is to flatten your data so that each "document" contains all the relevant info. Yes, this leads to de-normalization. Yes, denormalized data makes a good DBA cringe. But that's the difference between searching and using an RDBMS.
>
> "Document" is somewhat misleading. A document in SOLR terms is just a collection of fields. And, BTW, there's no requirement that each document have the same fields (very unlike a DB).
>
>> 2.) Can we make assumptions about query performance considering combined searches, range queries on structured data, and wildcard searches? If we consider a data structure consisting of, say, 3 tables, where each table contains a few million entries (e.g. first name, last name, and address fields) and we search for common values (such as 'John', 'Smith', and 'New York') where
>>
>> a. each value by itself and each combination would result in millions of hits
>
> Sure, but what those assumptions are is totally dependent on how you've set things up. SOLR has been successfully used on indexes of several billion documents. There are tools for making all that work (i.e. replication, sharding, etc.) built into SOLR, so I suspect you can make things work. Several million documents is not that large a data set.
>
> As always, there are tradeoffs between speed and complexity. But from what you've described I see no show stoppers.
>
>> b. a person can have multiple first names and we want to make sure to receive any combination of the last name with any first name
>
> This just sounds like an OR, but the queries can be pretty complex. Some examples of what you expect would help. See multi-valued fields: a "document" can have multiple "firstname" entries. Again, not like a DB (your reflexes will trip you up on this point <G>).
>
>> c. we search for a last name and a range of birth dates
>
> Sure, range queries work just fine. Note that dates can trip you up; look at trie dates if you experiment.
>
>> 3.) Transaction safety: How does Lucene handle indexes? If we update the data model and the index, what happens to the index if anything goes wrong as soon as the data model has been persisted?
>
> A lot of work has been done to make SOLR quite robust if "anything goes wrong". That said, how are you backing up your data? That is, what is the source of the data you're going to index? If you're relying on your SOLR index to be your backup, you simply must back it up somewhere "often enough" to get by if your building burns down. I'd also think about storing your original input...
>
> This is no different than a DB.
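As a sketch of the queries discussed in (b) and (c) above, in Lucene/Solr query syntax (the field names, and the schema behind them, are assumptions rather than anything stated in the thread):

```
firstname:(John OR Johann) AND lastname:Smith
lastname:Smith AND birthdate:[1950-01-01T00:00:00Z TO 1960-12-31T23:59:59Z]
```

With a multiValued firstname field, the first query matches a document if any of its first-name entries is John or Johann, which covers the "any combination of last name with any first name" requirement.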
> You have to guard against the disk crashing, someone walking by with a powerful magnet, earthquakes, floods, fires <G>.....
>
> Do note that if you modify your index schema, no existing documents reflect the new schema; you have to reindex them.
>
>> I hope I made the issues clear to you; these are just some general thoughts about how Lucene would behave in a real-world application scenario... Any support or pointers to helpful documents or Web links is highly appreciated!
>>
>> Cheers for now,
>>
>> w

--
Lance Norskog
goks...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org