Re: Reply: Reply: Lucene in large database contexts

Patrick Turcotte Thu, 05 Mar 2009 07:46:07 -0800

mkjjyy

On 8/10/07, Askar Zaidi <askar.za...@gmail.com> wrote:

Hey Guys,
I am trying to do something similar: make the content searchable as
soon as it is added to the website. The way it can work in my scenario
is that I create the index for every new user account created.
Then, whenever a new document is uploaded, its contents are added to
the user's index using writer.addDocument(...)
As for closing the writer, yes! I'll close the writer and optimize
after it's added to the index.

I really think this should work. Don't you?

thanks
AZ

On 8/10/07, Erick Erickson <erickerick...@gmail.com> wrote:
Well, closing/opening an index is MUCH less expensive than
rebuilding the whole thing, so I don't understand part of your
statements....
It *may* (but I haven't tried it) be possible to flush the writer
rather than close/open it. But you MUST close/reopen the reader you
search with, even if flush works like I think it does.
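
A minimal sketch of why the reader has to be reopened, assuming a reader
only ever sees the state of the index at the moment it was opened. This
is a toy model, not the Lucene API: ToyIndex, ToyReader and
SnapshotDemo are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index. NOT the Lucene API; all names are hypothetical.
class ToyIndex {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    // Buffer a document; no reader can see it yet.
    synchronized void addDocument(String text) {
        pending.add(text);
    }

    // Publish buffered documents to readers opened from now on.
    synchronized void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Open a reader over a copy of what is committed right now.
    synchronized ToyReader openReader() {
        return new ToyReader(new ArrayList<>(committed));
    }
}

class ToyReader {
    private final List<String> snapshot;

    ToyReader(List<String> snapshot) {
        this.snapshot = snapshot;
    }

    // Count documents containing the term (plain substring match).
    int count(String term) {
        int n = 0;
        for (String doc : snapshot) {
            if (doc.contains(term)) {
                n++;
            }
        }
        return n;
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("lucene in large databases");
        index.commit();
        ToyReader oldReader = index.openReader();

        index.addDocument("lucene real time search");
        index.commit();
        ToyReader newReader = index.openReader();

        System.out.println(oldReader.count("lucene")); // prints 1: old view
        System.out.println(newReader.count("lucene")); // prints 2: reopened view
    }
}
```

The already-open reader keeps answering from its old snapshot even after
the commit; only a freshly opened reader picks up the new document.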
But it's also possible to use a two-tiered approach. 1G isn't all that
big. Could you read it into a RAMDir and use that for your searches?
Then, when you add data, you add it to *both* indexes, but close/open
the RAMdir for searching.
It's also possible to keep the RAMdir as the delta between the FSdir
and "current" states of your index. Add to both and search both.
Although deletes may be a problem here.
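
The delta variant can be sketched roughly as follows. Again this is a
toy model rather than the Lucene API: TwoTierIndex and its methods are
hypothetical, plain lists stand in for the FSdir and the RAMdir, and
deletes are deliberately left out because, as noted above, they are the
awkward part.

```java
import java.util.ArrayList;
import java.util.List;

// Toy two-tier index. NOT the Lucene API; all names are hypothetical.
class TwoTierIndex {
    // Everything written to the main (disk) index so far.
    private final List<String> onDisk = new ArrayList<>();
    // What the possibly stale disk reader currently sees.
    private List<String> diskView = new ArrayList<>();
    // Documents added since diskView was last opened (the in-memory delta).
    private final List<String> ramDelta = new ArrayList<>();

    // Add to both tiers: the main index and the in-memory delta.
    synchronized void add(String doc) {
        onDisk.add(doc);
        ramDelta.add(doc);
    }

    // Search both tiers and sum the hits.
    synchronized int count(String term) {
        return hits(diskView, term) + hits(ramDelta, term);
    }

    // Reopen the disk view and drop the delta in one step, so that no
    // document is ever counted twice or missed.
    synchronized void reopenDiskView() {
        diskView = new ArrayList<>(onDisk);
        ramDelta.clear();
    }

    private static int hits(List<String> docs, String term) {
        int n = 0;
        for (String doc : docs) {
            if (doc.contains(term)) {
                n++;
            }
        }
        return n;
    }
}

class TwoTierDemo {
    public static void main(String[] args) {
        TwoTierIndex index = new TwoTierIndex();
        index.add("lucene delta index");
        System.out.println(index.count("lucene")); // found via the delta
        index.reopenDiskView();
        System.out.println(index.count("lucene")); // found via the disk view
    }
}
```

The expensive reopen of the large index can then happen on whatever
schedule suits, while new documents stay searchable through the small
delta in the meantime.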
You haven't specified how often you expect changes, though. 100/second?
1/minute? How real is "real time"? You could do something like warm up
a new reader in the background whenever you decided you needed to be
absolutely up to date and swap your "live" reader for the newly warmed
up one whenever you deemed it wise.
Or you could just close/open your reader after each modification, fire
off a couple of warmup queries at it and let the users live with slow
responses if they happen to search before your warm-up queries
completed.
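
One way to picture the warm-up-and-swap idea, as a sketch only: build
the new view, run the warm-up queries against it, and publish it
atomically afterwards. WarmSwapSearcher and View are hypothetical names
and a list of strings stands in for the index; in a real setup the view
would wrap a freshly opened reader, and refresh would typically run on
a background thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Toy warm-and-swap searcher. NOT the Lucene API; all names are hypothetical.
class WarmSwapSearcher {
    // Immutable point-in-time view of the documents.
    static final class View {
        private final List<String> docs;

        View(List<String> docs) {
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        int count(String term) {
            int n = 0;
            for (String doc : docs) {
                if (doc.contains(term)) {
                    n++;
                }
            }
            return n;
        }
    }

    // The view that user searches currently go through.
    private final AtomicReference<View> live;

    WarmSwapSearcher(List<String> initialDocs) {
        live = new AtomicReference<>(new View(initialDocs));
    }

    // User-facing search: always served by the current live view.
    int search(String term) {
        return live.get().count(term);
    }

    // Build a fresh view, warm it with a few queries, then swap it in.
    // Searches arriving in the meantime keep using the old view.
    void refresh(List<String> currentDocs, List<String> warmupTerms) {
        View fresh = new View(currentDocs);
        for (String term : warmupTerms) {
            fresh.count(term); // result ignored; the call is the warm-up
        }
        live.set(fresh);
    }
}

class WarmSwapDemo {
    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        docs.add("lucene index");
        WarmSwapSearcher searcher = new WarmSwapSearcher(docs);

        docs.add("lucene search");
        System.out.println(searcher.search("lucene")); // still the old view

        searcher.refresh(docs, Arrays.asList("lucene", "index"));
        System.out.println(searcher.search("lucene")); // the new view
    }
}
```

Users never hit a cold view this way; the price is that results lag
behind the index by however long the warm-up takes.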
The point is that there are many options, but to suggest the best one,
we need some throughput numbers and a better definition of what "real
time" means. Is a one minute delay acceptable? 10 seconds? a millisecond?
the answer defines the scope of reasonable solutions.....

Best
Erick

On 8/10/07, Antonello Provenzano <antone...@deveel.com> wrote:
Kai,

The context I'm going to work with requires a continuous addition of
documents to the indexes, since it's user-driven content, and this
would require the content to be always up-to-date.
This is the problem I'm facing, since I cannot rebuild a 1Gb (at
least) index every time a user inserts a new entry into the database.
I know Digg, for instance, is using Lucene as its search engine: since
the amount of data they're dealing with is much higher than mine, I
would like to understand how they implemented this kind of solution.

Thank you again.
Antonello


On 8/10/07, Kai Hu <kai...@dusee.cn> wrote:
Antonello,
You are right. I think the Lucene IndexSearcher will only see the old
information if the IndexWriter has not been closed (I think Lucene
releases the lock at that point), so I add only a few documents at a
time from a buffer to get "real time" indexing.
kai
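
The buffering approach described above might look something like this
in outline. It is a toy model, not Kai's code and not the Lucene API:
BatchingIndexer is a hypothetical name, and flush() stands in for
"write the batch, then close the writer so searchers can see it".

```java
import java.util.ArrayList;
import java.util.List;

// Toy batching indexer. NOT the Lucene API; all names are hypothetical.
class BatchingIndexer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> searchable = new ArrayList<>();
    private int flushes = 0;

    BatchingIndexer(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    // Buffer a document and flush once the batch is full.
    synchronized void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Make everything buffered so far searchable; no-op on an empty buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        searchable.addAll(buffer);
        buffer.clear();
        flushes++;
    }

    synchronized int searchableCount() {
        return searchable.size();
    }

    synchronized int flushCount() {
        return flushes;
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        BatchingIndexer indexer = new BatchingIndexer(3);
        indexer.add("doc one");
        indexer.add("doc two");
        System.out.println(indexer.searchableCount()); // batch not full yet
        indexer.add("doc three");
        System.out.println(indexer.searchableCount()); // batch flushed
    }
}
```

A smaller batch size means fresher results but more frequent
close/reopen cycles; a timer that calls flush() could bound the delay
when few documents arrive.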
From: antonellop...@gmail.com [mailto:antonellop...@gmail.com] On Behalf Of
Antonello Provenzano
Sent: Friday, August 10, 2007 17:59
To: java-user@lucene.apache.org
Subject: Re: Reply: Lucene in large database contexts

Kai,

Thanks. The problem I see is that although I can add a Document
through IndexWriter or IndexModifier, it won't be searchable until
the index is closed and, possibly, optimized, since the score of the
document in the index context must be recalculated on the basis of
the whole context.

Is this assumption true, or am I completely wrong?

Cheers.
Antonello


On 8/10/07, Kai Hu <kai...@dusee.cn> wrote:
Hi, Antonello
       You can use IndexWriter.addDocument(Document document) to add
a single document; the same goes for update and delete operations.
kai

-----Original Message-----
From: Antonello Provenzano [mailto:antonellop...@gmail.com]
Sent: Friday, August 10, 2007 17:09
To: java-user@lucene.apache.org
Subject: Lucene in large database contexts

Hi There!

I've been working for a while on the implementation of a
content-oriented website that would contain millions of entries, most
of them indexable (such as descriptions, texts, names, etc.).
The ideal solution to make them searchable would be to use Lucene as
the index and search engine.
The reason I'm posting to the mailing list is the following: since all
the entries will be stored in a database (most likely MySQL InnoDB or
Oracle), what's the best technique to implement a system that indexes
the content in "real time" (e.g. when an entry is inserted into the
database) and makes it searchable? Based on my understanding of
Lucene, such a thing is not possible, since the index must be
re-created to be able to search the indexed contents. Is this true?
Also, could anyone point me to a working example of how to implement
something similar?


Thank you for the support.
Antonello
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
