Hey guys, I am trying to do something similar: make the content searchable
as soon as it is added to the website. The way it can work in my scenario is
that I create an index for every new user account created.
Then, whenever a new document is uploaded, its contents are added to the
user's index using writer.addDocument(...). As for closing the writer: yes,
I'll close the writer and optimize after the document is added to the index.
I really think this should work. Don't you?

Thanks,
AZ

On 8/10/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Well, closing/opening an index is MUCH less expensive than rebuilding
> the whole thing, so I don't understand part of your statements...
>
> It *may* (but I haven't tried it) be possible to flush the writer rather
> than close/reopen it. But you MUST close/reopen the reader you search
> with, even if flush works like I think it does.
>
> But it's also possible to use a two-tiered approach. 1G isn't all that
> big. Could you read it into a RAMDir and use that for your searches?
> Then, when you add data, you add it to *both* indexes, but close/open
> the RAMDir for searching.
>
> It's also possible to keep the RAMDir as the delta between the FSDir and
> the "current" state of your index. Add to both and search both, although
> deletes may be a problem here.
>
> You haven't specified how often you expect changes, though. 100/second?
> 1/minute? How real is "real time"? You could do something like warm up
> a new reader in the background whenever you decided you needed to be
> absolutely up to date, and swap your "live" reader for the newly
> warmed-up one whenever you deemed it wise.
>
> Or you could just close/open your reader after each modification, fire
> off a couple of warm-up queries at it, and let the users live with slow
> responses if they happen to search before your warm-up queries complete.
>
> The point is that there are many options, but to suggest the best one,
> we need some throughput numbers and a better definition of what "real
> time" means. Is a one-minute delay acceptable? 10 seconds? A
> millisecond? The answer defines the scope of reasonable solutions...
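[Editor's note: Erick's two-tiered idea (keep the big on-disk index as-is, absorb new documents into a cheap in-memory delta, and search both) can be sketched without a Lucene dependency. The toy model below uses plain Java maps in place of Lucene's FSDirectory/RAMDirectory; the class and method names are illustrative, not Lucene's API.]

```java
import java.util.*;

// Toy model of the two-tiered approach: a big, slow-to-reopen "disk"
// tier plus a small in-memory delta tier that absorbs new documents.
// Searches consult both tiers, so new content is visible immediately
// without reopening the big index. Plain Java maps stand in for
// Lucene's FSDirectory/RAMDirectory; all names here are illustrative.
class TwoTierIndex {
    private final Map<String, Set<String>> diskTier = new HashMap<>(); // term -> doc ids
    private final Map<String, Set<String>> ramTier = new HashMap<>();  // the delta

    // New documents land only in the cheap RAM tier.
    void addDocument(String docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            ramTier.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Periodically merge the delta down and clear it, mimicking a
    // writer close/optimize followed by a reader reopen.
    void mergeDelta() {
        for (Map.Entry<String, Set<String>> e : ramTier.entrySet()) {
            diskTier.computeIfAbsent(e.getKey(), t -> new HashSet<>())
                    .addAll(e.getValue());
        }
        ramTier.clear();
    }

    // A query sees the union of both tiers.
    Set<String> search(String term) {
        Set<String> hits =
                new HashSet<>(diskTier.getOrDefault(term, Collections.<String>emptySet()));
        hits.addAll(ramTier.getOrDefault(term, Collections.<String>emptySet()));
        return hits;
    }

    public static void main(String[] args) {
        TwoTierIndex idx = new TwoTierIndex();
        idx.addDocument("doc1", "lucene real time search");
        System.out.println(idx.search("lucene")); // visible before any merge
        idx.mergeDelta();
        System.out.println(idx.search("lucene")); // still visible afterwards
    }
}
```

As Erick notes, deletes complicate this pattern: a document deleted from the delta may still exist in the disk tier, so a real implementation would also need to track tombstones and filter them out at search time.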
>
> Best
> Erick
>
> On 8/10/07, Antonello Provenzano <[EMAIL PROTECTED]> wrote:
> >
> > Kai,
> >
> > The context I'm going to work with requires a continuous addition of
> > documents to the indexes, since it's user-driven content, and this
> > requires the content to always be up to date.
> > This is the problem I'm facing, since I cannot rebuild a 1 GB (at
> > least) index every time a user inserts a new entry into the database.
> >
> > I know Digg, for instance, is using Lucene as its search engine: since
> > the amount of data they're dealing with is much larger than mine, I
> > would like to understand how they implemented this kind of solution.
> >
> > Thank you again.
> > Antonello
> >
> > On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > > Antonello,
> > > You are right. I think a Lucene IndexSearcher will keep searching
> > > the old information if the IndexWriter was not closed (I think
> > > Lucene releases the lock there), so I only add a few documents at a
> > > time from a buffer to implement a "real time" index.
> > >
> > > kai
> > >
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] on behalf of
> > > Antonello Provenzano
> > > Sent: Friday, August 10, 2007, 17:59
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Re: Lucene in large database contexts
> > >
> > > Kai,
> > >
> > > Thanks. The problem as I see it is that although I can add a Document
> > > through IndexWriter or IndexModifier, it won't be searchable until
> > > the index is closed and, possibly, optimized, since the score of the
> > > document in the index context must be recalculated on the basis of
> > > the whole context.
> > >
> > > Is this assumption true? Or am I completely wrong?
> > >
> > > Cheers.
> > > Antonello
> > >
> > > On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > > > Hi, Antonello
> > > > You can use IndexWriter.addDocument(Document document) to add a
> > > > single document; the same goes for update and delete operations.
> > > >
> > > > kai
> > > >
> > > > -----Original Message-----
> > > > From: Antonello Provenzano [mailto:[EMAIL PROTECTED]
> > > > Sent: Friday, August 10, 2007, 17:09
> > > > To: java-user@lucene.apache.org
> > > > Subject: Lucene in large database contexts
> > > >
> > > > Hi there!
> > > >
> > > > I've been working for a while on the implementation of a
> > > > content-oriented website that would contain millions of entries,
> > > > most of them indexable (such as descriptions, texts, names, etc.).
> > > > The ideal solution to make them searchable would be to use Lucene
> > > > as the index and search engine.
> > > >
> > > > The reason I'm posting to the mailing list is the following: since
> > > > all the entries will be stored in a database (most likely MySQL
> > > > InnoDB or Oracle), what's the best technique to implement a system
> > > > that indexes the content in "real time" (e.g. when an entry is
> > > > inserted into the database) and makes it searchable? Based on my
> > > > understanding of Lucene, such a thing is not possible, since the
> > > > index must be re-created to be able to search the indexed
> > > > contents. Is this true?
> > > >
> > > > Eventually, could anyone point me to a working example of how to
> > > > implement such a scenario?
> > > >
> > > > Thank you for the support.
> > > > Antonello
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
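[Editor's note: the visibility rule Kai and Antonello are circling (documents added by a writer are not seen by an already-open searcher until a new one is opened) and Erick's "warm up a new reader, then swap" suggestion can be modeled with plain Java, no Lucene required. All names below are illustrative, not Lucene's API.]

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of "warm up a new reader in the background, then swap":
// queries always hit an immutable published snapshot; added documents
// stay invisible until refresh() builds, warms, and publishes a new
// snapshot -- mirroring how an open IndexSearcher keeps seeing the old
// index until a fresh reader is opened. Plain Java; illustrative names.
class SwappableSearcher {
    private final Map<String, Set<String>> pending = new HashMap<>(); // writer side
    private final AtomicReference<Map<String, Set<String>>> live =
            new AtomicReference<Map<String, Set<String>>>(
                    new HashMap<String, Set<String>>());

    void addDocument(String docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            pending.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Build the new snapshot, fire a warm-up query at it, then publish
    // it atomically; searches in flight keep using the old snapshot.
    void refresh(String warmupTerm) {
        Map<String, Set<String>> snapshot = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
            snapshot.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        snapshot.getOrDefault(warmupTerm, Collections.<String>emptySet()); // warm-up query
        live.set(snapshot);
    }

    Set<String> search(String term) {
        return live.get().getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        SwappableSearcher s = new SwappableSearcher();
        s.addDocument("d1", "lucene database indexing");
        System.out.println(s.search("lucene")); // [] -- not yet published
        s.refresh("lucene");
        System.out.println(s.search("lucene")); // [d1] -- visible after swap
    }
}
```

In real Lucene of that era, refresh() corresponds to closing the IndexWriter and opening a new IndexSearcher over the index; later Lucene versions added near-real-time readers that make this cycle much cheaper.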
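[Editor's note: the per-user layout AZ describes at the top of the thread (one small index per user account, created with the account, so an upload touches only that user's index) can be sketched the same way. Plain Java maps stand in for per-user Lucene directories; all names are illustrative.]

```java
import java.util.*;

// Sketch of a per-user index layout: each user account gets its own
// small index, so adding a document is cheap and never triggers a
// rebuild of anyone else's index. Plain Java maps model the per-user
// Lucene directories; names here are illustrative, not Lucene's API.
class PerUserIndexes {
    // user -> (term -> doc ids); each inner map models one user's index
    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    // Called when a new user account is created.
    void createUser(String user) {
        indexes.putIfAbsent(user, new HashMap<>());
    }

    // Called when that user uploads a document; only their index grows.
    void addDocument(String user, String docId, String text) {
        Map<String, Set<String>> idx =
                indexes.computeIfAbsent(user, u -> new HashMap<>());
        for (String term : text.toLowerCase().split("\\s+")) {
            idx.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Searches are scoped to a single user's index.
    Set<String> search(String user, String term) {
        return indexes
                .getOrDefault(user, Collections.<String, Set<String>>emptyMap())
                .getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        PerUserIndexes all = new PerUserIndexes();
        all.createUser("alice");
        all.createUser("bob");
        all.addDocument("alice", "a1", "holiday photos in rome");
        System.out.println(all.search("alice", "rome")); // [a1]
        System.out.println(all.search("bob", "rome"));   // []
    }
}
```

The trade-off is fan-out: a search across all users would have to consult many small indexes, so this layout fits best when queries are naturally scoped to one account, as in AZ's upload scenario.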