Re: Syncing lucene index with a database

Tim Williams Thu, 26 Mar 2009 17:04:16 -0700

On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder <mschrae...@btsb.com> wrote:
> I'm new to Lucene and just beginning my project of adding it to our web
> app.  We are indexing data from a MS SQL 2000 database and building
> full-text search from it.
>
> Everything I have read says that building the index is a resource heavy
> operation so we should use it sparingly.  For the most part the database
> table we are working from is updated once a day so as soon as the table
> itself is updated we can rebuild our Lucene indexes.  However, there are
> a few feilds that get updated with a cronjob every 15 minutes.  In terms
> of speed and efficiency, what would be a better system for keeping our
> data synced between the database and Lucene?
>
> Of course one option would be to rebuild the Lucene index each time the
> cronjob runs to keep the database and Lucene index synced.  We could
> either return the entire database table, loop through the rows, get a
> row's document in lucene remove/readd it, and do that for each row.
> Alternatively after we update the main table we return just the rows
> that were changed, loop through those and remove/readd them in lucene,
> and do that for just the rows that have changed.
>
> Alternatively I have thought of using Lucene purely for search to
> return just the primary key of items from our database table, then query
> the database for those items and get the most up to date data from the
> database to actually display our search results.  This would let us use
> Lucene's superior searching capabilities and searching speed, but would
> still require us to pull the data to be displayed from the database.
>
> Another option is that we could do the same, but only return the fields
> that could change frequently.  This would use Lucene to store and index
> the majority of what is displayed on a search results page, only using
> the database to return the 2 or 3 fields that might change in a search
> for each row that lucene returns.
>
> I'm honestly not sure what the "proper" choice should be, or if it
> really depends on our own test cases.  Is it perfectly okay to run an
> index update every 15 minutes? How much difference would it make in
> terms of search time to search with lucene AND pull from the database?
> My main issue with searching with lucene but getting the actual data
> from the database is that it seems like that would make our current
> search system that is entirely database driven to run slower.
>


Not sure what ORM framework, if any, you might be using, but some
colleagues have had some success using Hibernate Search[1] for this
sorta thing.  I've not used it, just a pointer in case you haven't
come across it...  seems that it would keep you above some low-level
details if it fits...

--tim

[1] - http://www.hibernate.org/410.html

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Syncing lucene index with a database

Reply via email to