Hey guys, I am trying to do something similar: make the content searchable
as soon as it is added to the website. The way it can work in my scenario is
that I create an index for every new user account created.
Then, whenever a new document is uploaded, its contents are added to the
user's index using writer.addDocument(...). As for closing the writer: yes,
I'll close the writer and optimize after the document is added to the index.
I really think this should work. Don't you?

Thanks,
AZ

On 8/10/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> Well, closing/opening an index is MUCH less expensive than rebuilding
> the whole thing, so I don't understand part of your statements...
>
> It *may* (but I haven't tried it) be possible to flush the writer rather
> than close/reopen it. But you MUST close/reopen the reader you search
> with, even if flush works like I think it does.
>
> But it's also possible to use a two-tiered approach. 1G isn't all that
> big. Could you read it into a RAMDir and use that for your searches?
> Then, when you add data, you add it to *both* indexes, but close/open
> the RAMDir for searching.
>
> It's also possible to keep the RAMDir as the delta between the FSDir and
> the "current" state of your index. Add to both and search both, although
> deletes may be a problem here.
>
> You haven't specified how often you expect changes, though. 100/second?
> 1/minute? How real is "real time"? You could do something like warm up
> a new reader in the background whenever you decided you needed to be
> absolutely up to date, and swap your "live" reader for the newly
> warmed-up one whenever you deemed it wise.
>
> Or you could just close/open your reader after each modification, fire
> off a couple of warm-up queries at it, and let the users live with slow
> responses if they happen to search before your warm-up queries complete.
>
> The point is that there are many options, but to suggest the best one,
> we need some throughput numbers and a better definition of what "real
> time" means. Is a one-minute delay acceptable? 10 seconds? A
> millisecond? The answer defines the scope of reasonable solutions...
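[Editor's note: Erick's two-tiered idea (keep the big on-disk index as-is, absorb new documents into a cheap in-memory delta, and search both) can be sketched without a Lucene dependency. The toy model below uses plain Java maps in place of Lucene's FSDirectory/RAMDirectory; the class and method names are illustrative, not Lucene's API.]

```java
import java.util.*;

// Toy model of the two-tiered approach: a big, slow-to-reopen "disk"
// tier plus a small in-memory delta tier that absorbs new documents.
// Searches consult both tiers, so new content is visible immediately
// without reopening the big index. Plain Java maps stand in for
// Lucene's FSDirectory/RAMDirectory; all names here are illustrative.
class TwoTierIndex {
    private final Map<String, Set<String>> diskTier = new HashMap<>(); // term -> doc ids
    private final Map<String, Set<String>> ramTier = new HashMap<>();  // the delta

    // New documents land only in the cheap RAM tier.
    void addDocument(String docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            ramTier.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Periodically merge the delta down and clear it, mimicking a
    // writer close/optimize followed by a reader reopen.
    void mergeDelta() {
        for (Map.Entry<String, Set<String>> e : ramTier.entrySet()) {
            diskTier.computeIfAbsent(e.getKey(), t -> new HashSet<>())
                    .addAll(e.getValue());
        }
        ramTier.clear();
    }

    // A query sees the union of both tiers.
    Set<String> search(String term) {
        Set<String> hits =
                new HashSet<>(diskTier.getOrDefault(term, Collections.<String>emptySet()));
        hits.addAll(ramTier.getOrDefault(term, Collections.<String>emptySet()));
        return hits;
    }

    public static void main(String[] args) {
        TwoTierIndex idx = new TwoTierIndex();
        idx.addDocument("doc1", "lucene real time search");
        System.out.println(idx.search("lucene")); // visible before any merge
        idx.mergeDelta();
        System.out.println(idx.search("lucene")); // still visible afterwards
    }
}
```

As Erick notes, deletes complicate this pattern: a document deleted from the delta may still exist in the disk tier, so a real implementation would also need to track tombstones and filter them out at search time.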
>
> Best
> Erick
>
> On 8/10/07, Antonello Provenzano <[EMAIL PROTECTED]> wrote:
> >
> > Kai,
> >
> > The context I'm going to work with requires a continuous addition of
> > documents to the indexes, since it's user-driven content, and this
> > requires the content to always be up to date.
> > This is the problem I'm facing, since I cannot rebuild a 1 GB (at
> > least) index every time a user inserts a new entry into the database.
> >
> > I know Digg, for instance, is using Lucene as its search engine: since
> > the amount of data they're dealing with is much larger than mine, I
> > would like to understand how they implemented this kind of solution.
> >
> > Thank you again.
> > Antonello
> >
> > On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > > Antonello,
> > > You are right. I think a Lucene IndexSearcher will keep searching
> > > the old information if the IndexWriter was not closed (I think
> > > Lucene releases the lock there), so I only add a few documents at a
> > > time from a buffer to implement a "real time" index.
> > >
> > > kai
> > >
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] on behalf of
> > > Antonello Provenzano
> > > Sent: Friday, August 10, 2007, 17:59
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Re: Lucene in large database contexts
> > >
> > > Kai,
> > >
> > > Thanks. The problem as I see it is that although I can add a Document
> > > through IndexWriter or IndexModifier, it won't be searchable until
> > > the index is closed and, possibly, optimized, since the score of the
> > > document in the index context must be recalculated on the basis of
> > > the whole context.
> > >
> > > Is this assumption true? Or am I completely wrong?
> > >
> > > Cheers.
> > > Antonello
> > >
> > > On 8/10/07, Kai Hu <[EMAIL PROTECTED]> wrote:
> > > > Hi, Antonello
> > > > You can use IndexWriter.addDocument(Document document) to add a
> > > > single document; the same goes for update and delete operations.
> > > >
> > > > kai
> > > >
> > > > -----Original Message-----
> > > > From: Antonello Provenzano [mailto:[EMAIL PROTECTED]
> > > > Sent: Friday, August 10, 2007, 17:09
> > > > To: java-user@lucene.apache.org
> > > > Subject: Lucene in large database contexts
> > > >
> > > > Hi there!
> > > >
> > > > I've been working for a while on the implementation of a
> > > > content-oriented website that would contain millions of entries,
> > > > most of them indexable (such as descriptions, texts, names, etc.).
> > > > The ideal solution to make them searchable would be to use Lucene
> > > > as the index and search engine.
> > > >
> > > > The reason I'm posting to the mailing list is the following: since
> > > > all the entries will be stored in a database (most likely MySQL
> > > > InnoDB or Oracle), what's the best technique to implement a system
> > > > that indexes the content in "real time" (e.g. when an entry is
> > > > inserted into the database) and makes it searchable? Based on my
> > > > understanding of Lucene, such a thing is not possible, since the
> > > > index must be re-created to be able to search the indexed
> > > > contents. Is this true?
> > > >
> > > > Eventually, could anyone point me to a working example of how to
> > > > implement such a scenario?
> > > >
> > > > Thank you for the support.
> > > > Antonello
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
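[Editor's note: the visibility rule Kai and Antonello are circling (documents added by a writer are not seen by an already-open searcher until a new one is opened) and Erick's "warm up a new reader, then swap" suggestion can be modeled with plain Java, no Lucene required. All names below are illustrative, not Lucene's API.]

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Toy model of "warm up a new reader in the background, then swap":
// queries always hit an immutable published snapshot; added documents
// stay invisible until refresh() builds, warms, and publishes a new
// snapshot -- mirroring how an open IndexSearcher keeps seeing the old
// index until a fresh reader is opened. Plain Java; illustrative names.
class SwappableSearcher {
    private final Map<String, Set<String>> pending = new HashMap<>(); // writer side
    private final AtomicReference<Map<String, Set<String>>> live =
            new AtomicReference<Map<String, Set<String>>>(
                    new HashMap<String, Set<String>>());

    void addDocument(String docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            pending.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Build the new snapshot, fire a warm-up query at it, then publish
    // it atomically; searches in flight keep using the old snapshot.
    void refresh(String warmupTerm) {
        Map<String, Set<String>> snapshot = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : pending.entrySet()) {
            snapshot.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        snapshot.getOrDefault(warmupTerm, Collections.<String>emptySet()); // warm-up query
        live.set(snapshot);
    }

    Set<String> search(String term) {
        return live.get().getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        SwappableSearcher s = new SwappableSearcher();
        s.addDocument("d1", "lucene database indexing");
        System.out.println(s.search("lucene")); // [] -- not yet published
        s.refresh("lucene");
        System.out.println(s.search("lucene")); // [d1] -- visible after swap
    }
}
```

In real Lucene of that era, refresh() corresponds to closing the IndexWriter and opening a new IndexSearcher over the index; later Lucene versions added near-real-time readers that make this cycle much cheaper.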
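[Editor's note: the per-user layout AZ describes at the top of the thread (one small index per user account, created with the account, so an upload touches only that user's index) can be sketched the same way. Plain Java maps stand in for per-user Lucene directories; all names are illustrative.]

```java
import java.util.*;

// Sketch of a per-user index layout: each user account gets its own
// small index, so adding a document is cheap and never triggers a
// rebuild of anyone else's index. Plain Java maps model the per-user
// Lucene directories; names here are illustrative, not Lucene's API.
class PerUserIndexes {
    // user -> (term -> doc ids); each inner map models one user's index
    private final Map<String, Map<String, Set<String>>> indexes = new HashMap<>();

    // Called when a new user account is created.
    void createUser(String user) {
        indexes.putIfAbsent(user, new HashMap<>());
    }

    // Called when that user uploads a document; only their index grows.
    void addDocument(String user, String docId, String text) {
        Map<String, Set<String>> idx =
                indexes.computeIfAbsent(user, u -> new HashMap<>());
        for (String term : text.toLowerCase().split("\\s+")) {
            idx.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Searches are scoped to a single user's index.
    Set<String> search(String user, String term) {
        return indexes
                .getOrDefault(user, Collections.<String, Set<String>>emptyMap())
                .getOrDefault(term, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        PerUserIndexes all = new PerUserIndexes();
        all.createUser("alice");
        all.createUser("bob");
        all.addDocument("alice", "a1", "holiday photos in rome");
        System.out.println(all.search("alice", "rome")); // [a1]
        System.out.println(all.search("bob", "rome"));   // []
    }
}
```

The trade-off is fan-out: a search across all users would have to consult many small indexes, so this layout fits best when queries are naturally scoped to one account, as in AZ's upload scenario.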