On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder <mschrae...@btsb.com> wrote: > I'm new to Lucene and just beginning my project of adding it to our web > app. We are indexing data from a MS SQL 2000 database and building > full-text search from it. > > Everything I have read says that building the index is a resource heavy > operation so we should use it sparingly. For the most part the database > table we are working from is updated once a day so as soon as the table > itself is updated we can rebuild our Lucene indexes. However, there are > a few feilds that get updated with a cronjob every 15 minutes. In terms > of speed and efficiency, what would be a better system for keeping our > data synced between the database and Lucene? > > Of course one option would be to rebuild the Lucene index each time the > cronjob runs to keep the database and Lucene index synced. We could > either return the entire database table, loop through the rows, get a > row's document in lucene remove/readd it, and do that for each row. > Alternatively after we update the main table we return just the rows > that were changed, loop through those and remove/readd them in lucene, > and do that for just the rows that have changed. > > Alternatively I have thought of using Lucene purely for search to > return just the primary key of items from our database table, then query > the database for those items and get the most up to date data from the > database to actually display our search results. This would let us use > Lucene's superior searching capabilities and searching speed, but would > still require us to pull the data to be displayed from the database. > > Another option is that we could do the same, but only return the fields > that could change frequently. This would use Lucene to store and index > the majority of what is displayed on a search results page, only using > the database to return the 2 or 3 fields that might change in a search > for each row that lucene returns. > > I'm honestly not sure what the "proper" choice should be, or if it > really depends on our own test cases. Is it perfectly okay to run an > index update every 15 minutes? How much difference would it make in > terms of search time to search with lucene AND pull from the database? > My main issue with searching with lucene but getting the actual data > from the database is that it seems like that would make our current > search system that is entirely database driven to run slower. >
Not sure what ORM framework, if any, you might be using, but some colleagues have had some success using Hibernate Search[1] for this sorta thing. I've not used it, just a pointer in case you haven't come across it... seems that it would keep you above some low-level details if it fits... --tim [1] - http://www.hibernate.org/410.html --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org