You can't avoid updating the whole document. You must delete the
original document and re-add it. Fortunately, this is pretty inexpensive.

IndexWriter.updateDocument() does this under the covers...
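For reference, a rough sketch of what that looks like (assuming the 3.0.x
API; the "id" and "flags" field names and the index path are made up for
illustration):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class UpdateFlagSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("/path/to/index"));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Rebuild the *whole* document, not just the field that changed.
        Document doc = new Document();
        doc.add(new Field("id", "msg-42",
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("flags", "read",
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // ... re-add the body and every other field of the original doc here ...

        // updateDocument() = delete everything matching the term, then add.
        writer.updateDocument(new Term("id", "msg-42"), doc);
        writer.close();
    }
}

The update is keyed on a unique term; without one you'd have to delete by
query and add the new document yourself.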

Best
Erick

On Thu, Sep 9, 2010 at 12:15 AM, fulin tang <tangfu...@gmail.com> wrote:

> That is exactly what I am looking for now!
>
> Our mail search system has a field named "flags" (read/unread, etc.),
> and it changes after the email is indexed, so we need an update.
>
> But we only update that one field, more precisely a Field.Index.NOT_ANALYZED
> and Field.Store.YES field, so how can we avoid updating the whole document?
>
>
> The dream's beginning struggles at the edge of the city
> The heart's far horizon persists in the instant of each step
> My fate has buried an eternity of loneliness
>
>
>
> 2010/9/2 Chris Lu <chris...@gmail.com>:
> > If there were an API to adjust the inverted index directly, it would be
> > much more efficient.
> >
> > I guess Mirko's problem is similar to this: there could be a "main_record"
> > table and a "category" table. Each "main_record" has a "category".
> > When one "category" is changed, quite a few "main_record" rows are affected.
> >
> > If we denormalize the data, which is currently the only way to get good
> > sorting performance, we would need to re-index all the affected documents.
> > However, all that re-indexing work is quite inefficient.
> >
> > Let's suppose the "category" is using Field.Index.NOT_ANALYZED and
> > Field.Store.YES.
> >
> > So the inverted index conceptually looks like this:
> >  "category_1": doc1,doc2,doc5,doc10.
> >  "category_2": doc3,doc4,doc7,doc8.
> > If the only change is that several "category_1" records are changed to
> > "category_2" (take doc5 and doc10 for example), then after all the reindexing
> > effort the only change is:
> >  "category_1": doc1,doc2.
> >  "category_2": doc3,doc4,doc5,doc7,doc8,doc10.
> >
> > Of course, supporting this efficiently could be a big change, affecting
> > all the nice, efficient DocDelta storage.
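> >
> > For comparison, the re-indexing we have to do today is roughly the
> > following (just a sketch; the "id"/"category" field names and the
> > buildFullDocument() helper are made up for illustration):
> >
> > import java.util.List;
> >
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.Term;
> >
> > public class ReindexCategorySketch {
> >
> >     // Re-index every document whose denormalized "category" just changed.
> >     static void reindexAffected(IndexWriter writer,
> >                                 List<String> affectedIds,
> >                                 String newCategory) throws Exception {
> >         for (String id : affectedIds) {
> >             Document doc = buildFullDocument(id, newCategory);
> >             // Whole-document delete + re-add, although only one term changed.
> >             writer.updateDocument(new Term("id", id), doc);
> >         }
> >         writer.commit();
> >     }
> >
> >     // Hypothetical helper: rebuilds the complete document for one record.
> >     static Document buildFullDocument(String id, String newCategory) {
> >         Document doc = new Document();
> >         doc.add(new Field("id", id,
> >                 Field.Store.YES, Field.Index.NOT_ANALYZED));
> >         doc.add(new Field("category", newCategory,
> >                 Field.Store.YES, Field.Index.NOT_ANALYZED));
> >         // ... plus every other stored/indexed field of the main_record ...
> >         return doc;
> >     }
> > }
> >
> > So doc5 and doc10 above each get fully deleted and re-added just to move
> > one term in one posting list.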
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> > http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per request) got
> > 2.6 Million Euro funding!
> >
> > On Wed, Sep 1, 2010 at 4:29 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> The usual first choice when using Lucene to search database data is to
> >> denormalize the db data into the index. Yes, it's redundant, but it's often
> >> a better solution than trying to use both. Synchronization can be an issue,
> >> but you have to deal with that anyway since you're indexing from the db
> >> in the first place.
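> >>
> >> Concretely, denormalizing just means copying the db columns you need for
> >> sorting onto each Lucene document at index time; something like this sketch
> >> (column and field names are made up):
> >>
> >> import java.sql.ResultSet;
> >>
> >> import org.apache.lucene.document.Document;
> >> import org.apache.lucene.document.Field;
> >> import org.apache.lucene.index.IndexWriter;
> >>
> >> public class DenormalizeSketch {
> >>
> >>     // One Lucene document per joined db row; the sort column is copied
> >>     // into the index so no db round-trip is needed at search time.
> >>     static void indexRow(IndexWriter writer, ResultSet row) throws Exception {
> >>         Document doc = new Document();
> >>         doc.add(new Field("id", row.getString("record_id"),
> >>                 Field.Store.YES, Field.Index.NOT_ANALYZED));
> >>         doc.add(new Field("body", row.getString("fulltext"),
> >>                 Field.Store.NO, Field.Index.ANALYZED));
> >>         // Denormalized: this column lives only in the db, copied here for sorting.
> >>         doc.add(new Field("sort_name", row.getString("sort_name"),
> >>                 Field.Store.YES, Field.Index.NOT_ANALYZED));
> >>         writer.addDocument(doc);
> >>     }
> >> }
> >>
> >> Sorting then stays entirely inside Lucene, e.g. new Sort(new
> >> SortField("sort_name", SortField.STRING)).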
> >>
> >> But you haven't given us any indication of how much data you're talking
> >> about here. Without that detail, it's really hard to make a
> >> recommendation.
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, Sep 1, 2010 at 9:30 AM, Sertic Mirko, Bedag
> >> <mirko.ser...@bedag.ch> wrote:
> >>
> >> > The data from the db is required for sorting, and one db entry matches
> >> > many index entries, so storing it in the index would be redundant. There
> >> > would also be the challenge of keeping the index and db in sync. Any ideas?
> >> >
> >> > Mirko
> >> >
> >> > -----Original Message-----
> >> > From: Ian Lea [mailto:ian....@gmail.com]
> >> > Sent: Wednesday, 1 September 2010 15:17
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: Combine data from index and db before sorting and
> >> > pagination
> >> >
> >> > If the sorting and pagination don't require data from the database,
> >> > just do db lookups for the hits on a page, page by page as required.
> >> > But if the db data is required, I'd suggest storing it in the index.
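> >> >
> >> > The first option looks roughly like this (a sketch; the "db_id" field and
> >> > the db lookup are hypothetical):
> >> >
> >> > import org.apache.lucene.document.Document;
> >> > import org.apache.lucene.search.IndexSearcher;
> >> > import org.apache.lucene.search.Query;
> >> > import org.apache.lucene.search.ScoreDoc;
> >> > import org.apache.lucene.search.Sort;
> >> > import org.apache.lucene.search.TopDocs;
> >> >
> >> > public class PageLookupSketch {
> >> >
> >> >     // Sort and paginate purely from the index, then hit the db only for
> >> >     // the handful of records on the requested page.
> >> >     static void showPage(IndexSearcher searcher, Query query, Sort sort,
> >> >                          int page, int pageSize) throws Exception {
> >> >         TopDocs top = searcher.search(query, null, (page + 1) * pageSize, sort);
> >> >         int start = page * pageSize;
> >> >         int end = Math.min(top.scoreDocs.length, start + pageSize);
> >> >         for (int i = start; i < end; i++) {
> >> >             ScoreDoc hit = top.scoreDocs[i];
> >> >             Document stored = searcher.doc(hit.doc);
> >> >             String dbId = stored.get("db_id");
> >> >             // Hypothetical: one db lookup per hit on this page, e.g.
> >> >             // render(fetchFromDb(dbId));
> >> >         }
> >> >     }
> >> > }
> >> >
> >> > Note that each page re-collects the top (page + 1) * pageSize hits; that's
> >> > the usual trade-off for keeping paging this simple.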
> >> >
> >> >
> >> > --
> >> > Ian.
> >> >
> >> > On Wed, Sep 1, 2010 at 1:43 PM, Sertic Mirko, Bedag
> >> > <mirko.ser...@bedag.ch> wrote:
> >> > > Hi
> >> > >
> >> > >
> >> > >
> >> > > I need to implement sorting and pagination of Lucene search results.
> >> > > This is quite easy, but I have to combine data from the index with data
> >> > > from a database. The index has the fulltext data plus a unique
> >> > > identifier for a record from the database. The database stores
> >> > > additional data. Fulltext search is only done on the index. I need to
> >> > > combine the search results from the index with the additional data from
> >> > > the database before sorting and pagination.
> >> > >
> >> > >
> >> > >
> >> > > Is the IndexReader.document() method the right place to enrich the data
> >> > > from the index with data from the db? How should I implement this
> >> > > functionality with Lucene?
> >> > >
> >> > >
> >> > >
> >> > > Thanks in advance
> >> > >
> >> > > Mirko
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >
> >> >
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
