Re: Using IndexReader in the web environment

Ian Lea Tue, 04 May 2010 06:32:41 -0700

For best performance you should aim to keep a shared index searcher,
or the underlying index reader, open as long as possible.  You may of
course need to reopen it if/when the index changes.  As to scope, you
can store it wherever it makes sense for your application.
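
As a rough illustration of "keep it open, reopen only when the index
changes", here is a minimal sketch in plain Java. SharedHolder, Opener
and StaleCheck are hypothetical names made up for this example, not
Lucene classes. With Lucene 3.0 I would expect the opener to be
something like IndexReader.open(directory, true) and the staleness test
to be !reader.isCurrent(), but treat that wiring as an assumption to
check against the javadocs.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical holder (not a Lucene class) that keeps one shared resource
 * open and replaces it only when a staleness check says it is out of date.
 */
final class SharedHolder<T extends Closeable> implements Closeable {

    /** Opens a fresh instance, e.g. a new reader over the index directory. */
    interface Opener<T> {
        T open() throws IOException;
    }

    /** Reports whether the current instance no longer reflects the index. */
    interface StaleCheck<T> {
        boolean isStale(T current) throws IOException;
    }

    private final Opener<T> opener;
    private final StaleCheck<T> staleCheck;
    private T current; // guarded by "this"

    SharedHolder(Opener<T> opener, StaleCheck<T> staleCheck) {
        this.opener = opener;
        this.staleCheck = staleCheck;
    }

    /** Returns the shared instance, opening or reopening it only when needed. */
    synchronized T get() throws IOException {
        if (current == null) {
            current = opener.open();
        } else if (staleCheck.isStale(current)) {
            T fresh = opener.open(); // open first so a failure keeps the old one
            T old = current;
            current = fresh;
            // Simplification: assumes no request is still using the old instance.
            // A real web application would need reference counting here.
            old.close();
        }
        return current;
    }

    /** Call once at application shutdown. */
    @Override
    public synchronized void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

One instance of such a holder could live in the application scope (for
example as a servlet context attribute) and be closed when the
application shuts down; each request would then just call get().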



--
Ian.


On Tue, May 4, 2010 at 10:13 AM, Vijay Veeraraghavan
<vijay.raghava...@gmail.com> wrote:
> Hi,
> Thanks for the reply. So I will have a dedicated servlet to search the
> index. Does that mean the IndexSearcher does not close the index but
> keeps it open? And is it not possible to keep it in the application
> scope?
>
> Vijay
>
> On 5/3/10, Vijay Veeraraghavan <vijay.raghava...@gmail.com> wrote:
>> Hi all,
>>
>> In a clustered environment I search the index from the web
>> application. In the web application I am creating an IndexReader on
>> each request. Is it expensive to do it this way? I read somewhere on
>> the web that one should reuse the same reader as much as possible. Can
>> I keep the initially created IndexReader in the session/application
>> scope and use the same one for each request? Any other ideas?
>>
>> Vijay
>>
>> On 5/3/10, Vijay Veeraraghavan <vijay.raghava...@gmail.com> wrote:
>>> Dear all,
>>>
>>> As replied below: if I search the index for each document and index it
>>> only when it is not found, is that not similar to indexing all the PDF
>>> documents once again? Is this not an overhead? I am not going to index
>>> the details of the PDFs (so if an indexed PDF is recreated I need not
>>> reindex it), just the paths of the documents.
>>>
>>> Vijay
>>>
>>>>> Hey there,
>>>>>
>>>>> you might have to implement some kind of unique identifier using an
>>>>> indexed Lucene field. When you are indexing, you should fire a query
>>>>> with the UUID of your document (maybe the path to your PDF document)
>>>>> and check whether the document is in the index already. You could
>>>>> also do a boolean query combining UUID, timestamp and/or a hash value
>>>>> to see if the document has been changed. If so, you can simply update
>>>>> the document by its UUID (something like
>>>>> indexwriter.updateDocument(new Term("uuid", value), document);).
>>>>>
>>>>> Unfortunately you have to implement this yourself, but it should not
>>>>> be that much of a deal.
>>>>>
>>>>> simon
>>>>>
>>>>> On Mon, May 3, 2010 at 9:21 AM, Vijay Veeraraghavan <
>>>>> vijay.raghava...@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>> I am using Lucene 3.0 to index the PDF reports that I generate
>>>>>> dynamically. I index the PDF file name (without extension), file path
>>>>>> and its absolute path as fields. I search with the file name without
>>>>>> extension; it retrieves a list, as usually 2 or more files with the
>>>>>> same name are present in different subdirectories. When I create the
>>>>>> index for the first time, assuming 100 PDF files in different
>>>>>> directories, it records the files' meta info. If I index again after
>>>>>> my report generator scheduler has produced 500 more PDF files,
>>>>>> totaling 600 files in different directories, I wish to add only
>>>>>> the new files to the index. But presently it’s doing the whole thing
>>>>>> again (600 files). How can I implement this functionality? Think of
>>>>>> the thousands of PDF files created on each run.
>>>>>>
>>>>>> P.S.: I cannot keep the meta info of the generated PDF files in Java
>>>>>> memory and update the index by looping over that list, as there are
>>>>>> thousands of files in a single run.
>>>>>>
>>>>>> new IndexWriter(FSDirectory.open(this.indexDir),
>>>>>>                 new StandardAnalyzer(Version.LUCENE_CURRENT),
>>>>>>                 true,
>>>>>>                 IndexWriter.MaxFieldLength.LIMITED);
>>>>>>
>>>>>> Is the boolean parameter for this purpose? Please guide me.
>>>>>>
>>>>>> --
>>>>>> Thanks
>>>>>> Vijay Veeraraghavan
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks & Regards
>>>>>> Vijay Veeraraghavan
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards
>>>> Vijay Veeraraghavan
>>>>
>>>
>>>
>>> --
>>> Thanks & Regards
>>> Vijay Veeraraghavan
>>>
>>
>>
>> --
>> Thanks & Regards
>> Vijay Veeraraghavan
>>
>
>
> --
> Thanks & Regards
> Vijay Veeraraghavan
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
