mkjjyy

On 8/10/07, Askar Zaidi <askar.za...@gmail.com> wrote:
Hey Guys,

I am trying to do something similar. Make the content search-able as soon as it is added to the website. The way it can work in my scenario is that , I
create the Index for a every new user account created.

Then, whenever a new document is uploaded, its contents are added to the
users Index using writer.addDocument(...)

As for closing the writer, yes ! I'll close the writer and optimize after
its added to the index.

I really think this should work. Don't you ?

thanks
AZ

On 8/10/07, Erick Erickson <erickerick...@gmail.com> wrote:

Well, closing/opening an index is MUCH less expensive than
rebuilding the whole thing, so I don't understand part of your
statements....

It *may* (but I haven't tried it) be possible to flush the writer rather
than
close/open it. But, you MUST close/reopen the reader you search with
even if flush works like I think it does.

But it's also possible to use a two tiered approach. 1G isn't all that
big.
Could
you read it into a RAMDir and use that for your searches? Then, when you
add
data, you add it to *both* indexes, but close/open the RAMdir for
searching.

It's also possible to keep the RAMdir as the delta between the FSdir and
"current" states of your index. Add to both and search both. Although
deletes may be a problem here.

You haven't specified how often you expect changes, though. 100/ second? 1/minute? How real is "real time"? You could do something like warm up
a new reader in the background whenever you decided you needed to be
absolutely up to date and swap your "live" reader for the newly warmed up
one whenever you deemed it wise.

Or you could just close/open your reader after each modification, fire off
a

couple of warmup queries at it and let the users live with slow responses
if they happen to search before your warm-up queries completed.

The point is that there are many options, but to suggest the best one, we need some throughput numbers and a better definition of what "real time"
means. Is a one minute delay acceptable? 10 seconds? a millisecond?
the answer defines the scope of reasonable solutions.....

Best
Erick

On 8/10/07, Antonello Provenzano <antone...@deveel.com> wrote:

Kai,

The context I'm going to work with requires a continuous addition of
documents to the indexes, since it's user-driven content, and this
would require the content to be always up-to-date.
This is the problem I'm facing, since I cannot rebuild a 1Gb (at
least) index every time a user inserts a new entry into the database.

I know Digg, for instance, is using Lucene as search engine: since the amount of data they're dealing with is much higher than mine, I would
like to understand the way they used to implement this kind of
solution.

Thank you again.
Antonello


On 8/10/07, Kai Hu <kai...@dusee.cn> wrote:
Antonello,
You are right,I think lucene indexsearcher will search the old
information if IndexWriter was not closed(I think lucene release the
Lock
here),so I only add a few documents every time from buffer to implement
index "real time".

kai


发件人: antonellop...@gmail.com [mailto:antonellop...@gmail.co m] 代表
Antonello Provenzano
发送时间: 2007年8月10日 星期五 17:59
收件人: java-user@lucene.apache.org
主题: Re: 答复: Lucene in large database contexts

Kai,

Thanks. The problem I see it's that although I can add a Document
through IndexWriter or IndexModifier, this won't be searchable until the index is closed and, possibly, optimized, since the score of the
document in the index context must be re-calculated on the basis of
the whole context.

Is this assumption true? or am I completely wrong?

Cheers.
Antonello


On 8/10/07, Kai Hu <kai...@dusee.cn> wrote:
Hi, Antonello
       You can use IndexWriter.addDocument(Document document) to
add
single document,same to update,delete operation.

kai

-----邮件原件-----
发件人: Antonello Provenzano [mailto:antonellop...@gmail.com]
发送时间: 2007年8月10日 星期五 17:09
收件人: java-user@lucene.apache.org
主题: Lucene in large database contexts

Hi There!

I've been working for a while on the implementation of a website
oriented to contents that would contain millions of entries, most of
them indexable (such as descriptions, texts, names, etc.).
The ideal solution to make them searchable would be to use Lucene as
index and search engine.

The reason I'm posting the mailing list is the following: since all
the entries will be stored in a database (most likely MySQL InnoDB
or
Oracle), what's the best technique to implement a system that
indexes
in "real time" (eg. when an entry is inserted into the databsse) the content and make it searchable? Based on my understanding of Lucene, such this thing is not possible, since the index must be re- created
to
be able to search the indexed contents. Is this true?

Eventually, could anyone point me to a working example about how to
implement such a similar context?


Thank you for the support.
Antonello


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



--- ------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org







---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to