Re: 答复: 答复: Lucene in large database contexts

Erick Erickson Fri, 10 Aug 2007 06:47:21 -0700

Well, closing/opening an index is MUCH less expensive than
rebuilding the whole thing, so I don't understand part of your
statements....


It *may* (but I haven't tried it) be possible to flush the writer rather
than
close/open it. But, you MUST close/reopen the reader you search with
even if flush works like I think it does.

But it's also possible to use a two tiered approach. 1G isn't all that big.
Could
you read it into a RAMDir and use that for your searches? Then, when you add
data, you add it to *both* indexes, but close/open the RAMdir for searching.

It's also possible to keep the RAMdir as the delta between the FSdir and
"current" states of your index. Add to both and search both. Although
deletes may be a problem here.

You haven't specified how often you expect changes, though. 100/second?
1/minute? How real is "real time"? You could do something like warm up
a new reader in the background whenever you decided you needed to be
absolutely up to date and swap your "live" reader for the newly warmed up
one whenever you deemed it wise.

Or you could just close/open your reader after each modification, fire off a

couple of warmup queries at it and let the users live with slow responses
if they happen to search before your warm-up queries completed.

The point is that there are many options, but to suggest the best one, we
need some throughput numbers and a better definition of what "real time"
means. Is a one minute delay acceptable? 10 seconds? a millisecond?
the answer defines the scope of reasonable solutions.....

Best
Erick

On 8/10/07, Antonello Provenzano <[EMAIL PROTECTED]> wrote:
>
> Kai,
>
> The context I'm going to work with requires a continuous addition of
> documents to the indexes, since it's user-driven content, and this
> would require the content to be always up-to-date.
> This is the problem I'm facing, since I cannot rebuild a 1Gb (at
> least) index every time a user inserts a new entry into the database.
>
> I know Digg, for instance, is using Lucene as search engine: since the
> amount of data they're dealing with is much higher than mine, I would
> like to understand the way they used to implement this kind of
> solution.
>
> Thank you again.
> Antonello
>
>
> On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > Antonello,
> >         You are right,I think lucene indexsearcher will search the old
> information if IndexWriter was not closed(I think lucene release the Lock
> here),so I only add a few documents every time from buffer to implement
> index "real time".
> >
> > kai
> >
> >
> > 发件人: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 代表
> Antonello Provenzano
> > 发送时间: 2007年8月10日 星期五 17:59
> > 收件人: [email protected]
> > 主题: Re: 答复: Lucene in large database contexts
> >
> > Kai,
> >
> > Thanks. The problem I see it's that although I can add a Document
> > through IndexWriter or IndexModifier, this won't be searchable until
> > the index is closed and, possibly, optimized, since the score of the
> > document in the index context must be re-calculated on the basis of
> > the whole context.
> >
> > Is this assumption true? or am I completely wrong?
> >
> > Cheers.
> > Antonello
> >
> >
> > On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > > Hi, Antonello
> > >         You can use IndexWriter.addDocument(Document document) to add
> single document,same to update,delete operation.
> > >
> > > kai
> > >
> > > -----邮件原件-----
> > > 发件人: Antonello Provenzano [mailto:[EMAIL PROTECTED]
> > > 发送时间: 2007年8月10日 星期五 17:09
> > > 收件人: [email protected]
> > > 主题: Lucene in large database contexts
> > >
> > > Hi There!
> > >
> > > I've been working for a while on the implementation of a website
> > > oriented to contents that would contain millions of entries, most of
> > > them indexable (such as descriptions, texts, names, etc.).
> > > The ideal solution to make them searchable would be to use Lucene as
> > > index and search engine.
> > >
> > > The reason I'm posting the mailing list is the following: since all
> > > the entries will be stored in a database (most likely MySQL InnoDB or
> > > Oracle), what's the best technique to implement a system that indexes
> > > in "real time" (eg. when an entry is inserted into the databsse) the
> > > content and make it searchable? Based on my understanding of Lucene,
> > > such this thing is not possible, since the index must be re-created to
> > > be able to search the indexed contents. Is this true?
> > >
> > > Eventually, could anyone point me to a working example about how to
> > > implement such a similar context?
> > >
> > >
> > > Thank you for the support.
> > > Antonello
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
> >
>

Re: 答复: 答复: Lucene in large database contexts

Reply via email to