Hey Ian, thanks for the reply; I find it very useful. My report-generating scheduler runs periodically; once it is done, it invokes the indexer and exits. So from the web application I cannot tell whether the index has changed. How do I keep track of changes to the index, given that the two entities, the scheduler/indexer and the web application, are completely separate?
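A minimal sketch of one way to handle this in the web application. It assumes the web app can read the same index directory that the scheduler's indexer writes to (for example, a shared disk). In Lucene 3.0, `IndexReader.isCurrent()` reports whether a newer commit exists in the index directory, including commits made by a separate process, and `IndexReader.reopen()` returns a refreshed reader, or the same instance if nothing changed. The class `SharedSearcherHolder` and its methods are hypothetical, not part of Lucene. It keeps one shared reader open, as Ian suggests, and uses `incRef()`/`decRef()` so that a replaced reader is only closed after in-flight searches release it.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

/**
 * Hypothetical helper (not a Lucene class): one shared read-only reader per
 * web application, refreshed when another process commits to the index.
 */
public final class SharedSearcherHolder {
    private IndexReader current;

    public SharedSearcherHolder(Directory dir) throws IOException {
        current = IndexReader.open(dir, true); // read-only reader on the latest commit
    }

    /**
     * Returns an up-to-date reader with its reference count incremented.
     * Every call must be paired with release().
     */
    public synchronized IndexReader acquire() throws IOException {
        // isCurrent() re-reads the commit information in the index directory,
        // so it also notices commits made by a separate indexer process.
        if (!current.isCurrent()) {
            IndexReader refreshed = current.reopen(); // reuses unchanged segments
            if (refreshed != current) {
                IndexReader old = current;
                current = refreshed;
                // Drop the holder's own reference; the old reader actually
                // closes once all in-flight searches have released it.
                old.decRef();
            }
        }
        current.incRef();
        return current;
    }

    /** Releases a reader obtained from acquire(). */
    public void release(IndexReader reader) throws IOException {
        reader.decRef();
    }

    /** Drops the holder's reference when the application shuts down. */
    public synchronized void close() throws IOException {
        current.decRef();
    }
}
```

In a servlet, one holder (one per cluster node) can live in application scope: call `acquire()` at the start of a request, wrap the reader in `new IndexSearcher(reader)` for that request, and call `release(reader)` in a `finally` block. If checking for changes on every request turns out to be too frequent, the same check could run on a timer instead.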
Vijay

On 5/4/10, Ian Lea <ian....@gmail.com> wrote:
> For best performance you should aim to keep a shared index searcher,
> or the underlying index reader, open as long as possible. You may of
> course need to reopen it if/when the index changes. As to scope, you
> can store it wherever it makes sense for your application.
>
> --
> Ian.
>
> On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan
> <vijay.raghava...@gmail.com> wrote:
>> Hi,
>> Thanks for the reply. So I will have a dedicated servlet to search the
>> index, but does that mean the IndexSearcher does not close the index
>> and keeps it open? Is it not possible to keep it in the application
>> scope?
>>
>> Vijay
>>
>> On 5/3/10, Vijay Veeraraghavan <vijay.raghava...@gmail.com> wrote:
>>> Hi all,
>>>
>>> In a clustered environment I search the index from the web
>>> application, and I create a new IndexReader on each request. Is it
>>> expensive to do this? I read somewhere that the same reader should be
>>> reused as much as possible. Can I keep the initially created
>>> IndexReader in the session/application scope and use it for every
>>> request? Any other ideas?
>>>
>>> Vijay
>>>
>>> On 5/3/10, Vijay Veeraraghavan <vijay.raghava...@gmail.com> wrote:
>>>> Dear all,
>>>>
>>>> Regarding the reply below: isn't searching the index for each document,
>>>> and indexing it only if it is not found, similar to indexing all the PDF
>>>> documents again? Isn't that an overhead? I am not going to index the
>>>> contents of the PDFs, only the paths of the documents (so if an indexed
>>>> PDF is recreated, I need not reindex it).
>>>>
>>>> Vijay
>>>>
>>>>>> Hey there,
>>>>>>
>>>>>> you might have to implement some kind of unique identifier using an
>>>>>> indexed Lucene field.
>>>>>> When you are indexing you should fire a query with the UUID of your
>>>>>> document (maybe the path to your PDF document) and check whether the
>>>>>> document is already in the index. You could also do a boolean query
>>>>>> combining UUID, timestamp and/or a hash value to see if the document
>>>>>> has been changed. If so, you can simply update the document by its
>>>>>> UUID (something like indexwriter.updateDocument(new Term("uuid",
>>>>>> value), document);).
>>>>>>
>>>>>> Unfortunately you have to implement this yourself, but it should not
>>>>>> be that much of a deal.
>>>>>>
>>>>>> simon
>>>>>>
>>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <
>>>>>> vijay.raghava...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>> I am using Lucene 3.0 to index the PDF reports that I generate
>>>>>>> dynamically. I index the PDF file name (without extension), the file
>>>>>>> path and its absolute path as fields. I search by the file name
>>>>>>> without extension; this retrieves a list, as there are usually 2 or
>>>>>>> more files with the same name in different subdirectories. When I
>>>>>>> create the index for the first time, say with 100 PDF files in
>>>>>>> different directories, it stores each file's meta info. If I index
>>>>>>> again after my report generator scheduler has produced 500 more PDF
>>>>>>> files, for a total of 600 files in different directories, I want to
>>>>>>> add only the new files to the index. But presently it indexes
>>>>>>> everything again (600 files). How do I implement this? Keep in mind
>>>>>>> that thousands of PDF files are created on each run.
>>>>>>>
>>>>>>> P.S.: I cannot keep the meta info of the generated PDF files in Java
>>>>>>> memory and update the index by looping over that list, as there are
>>>>>>> thousands of them in a single run.
>>>>>>>
>>>>>>> new IndexWriter(FSDirectory.open(this.indexDir),
>>>>>>>         new StandardAnalyzer(Version.LUCENE_CURRENT), true,
>>>>>>>         IndexWriter.MaxFieldLength.LIMITED);
>>>>>>>
>>>>>>> Is the boolean parameter for this purpose? Please guide me.
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>> Vijay Veeraraghavan
>>>>>>>
>>>>>>> --
>>>>>>> Thanks & Regards
>>>>>>> Vijay Veeraraghavan
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
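For the incremental-indexing question at the bottom of the thread, here is a minimal sketch of Simon's unique-key idea, assuming the absolute path of each PDF is the key. Checking for an existing document is a single term lookup per file (`IndexReader.termDocs(Term)`); it does not add a document or touch the PDF, so it is much cheaper than reindexing. It also sidesteps the boolean `create` argument: in Lucene 3.0, `create=true` creates a new index and overwrites any existing one, whereas the three-argument `IndexWriter(Directory, Analyzer, MaxFieldLength)` constructor appends to an existing index or creates one if none exists. The class `IncrementalPdfIndexer` and the field names `path` and `name` are hypothetical. Taking an `Iterable<File>` means the caller can stream files from a directory walk rather than holding the whole list in memory.

```java
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

/** Hypothetical helper (not a Lucene class): adds only PDFs that are not indexed yet. */
public final class IncrementalPdfIndexer {
    static final String PATH_FIELD = "path"; // hypothetical unique-key field
    static final String NAME_FIELD = "name"; // hypothetical file-name field

    private IncrementalPdfIndexer() {}

    /** Indexes files whose absolute path is not in the index yet; returns the number added. */
    public static int indexNewFiles(Directory dir, Iterable<File> pdfFiles) throws IOException {
        // No 'create' argument: appends to an existing index, or creates one if none exists.
        // (create=true would discard the existing index on every run.)
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);
        IndexReader reader = null;
        int added = 0;
        try {
            writer.commit(); // ensure a commit point exists so a reader can be opened
            reader = IndexReader.open(dir, true); // read-only view of earlier runs
            for (File pdf : pdfFiles) {
                String path = pdf.getAbsolutePath();
                if (isIndexed(reader, path)) {
                    continue; // indexed on an earlier run: skip it
                }
                Document doc = new Document();
                doc.add(new Field(PATH_FIELD, path, Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field(NAME_FIELD, baseName(pdf), Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
                added++;
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
            writer.close(); // commits the new documents and releases the write lock
        }
        return added;
    }

    /** True if a non-deleted document has exactly this path (a term lookup, no document loading). */
    private static boolean isIndexed(IndexReader reader, String path) throws IOException {
        TermDocs docs = reader.termDocs(new Term(PATH_FIELD, path));
        try {
            return docs.next();
        } finally {
            docs.close();
        }
    }

    /** File name without its extension, e.g. "report.pdf" becomes "report". */
    private static String baseName(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }
}
```

If recreated reports ever do need refreshing, `IndexWriter.updateDocument(new Term("path", path), doc)` replaces the old document in one call, as Simon describes.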