Re: Faceted Search using Lucene

Amin Mohammed-Coleman Sun, 01 Mar 2009 05:28:34 -0800

Just a quick point:
 public void maybeReopen() throws IOException {
   // Version of the index that the current searcher was opened on.
   long currentVersion = currentSearcher.getIndexReader().getVersion();
   if (IndexReader.getCurrentVersion(dir) != currentVersion) {
     // reopen() returns a new reader when the index has changed.
     IndexReader newReader = currentSearcher.getIndexReader().reopen();
     assert newReader != currentSearcher.getIndexReader();
     IndexSearcher newSearcher = new IndexSearcher(newReader);
     warm(newSearcher);          // warm before searches can see it
     swapSearcher(newSearcher);  // make the new searcher the current one
   }
 }


Should the above be synchronised?
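
To make the question concrete, here is a minimal sketch of a synchronised variant. It is only an illustration under stated assumptions: FakeIndex, FakeSearcher, SearcherHolder and SynchronizedReopenSketch are hypothetical stand-ins (not Lucene classes) so that the snippet runs on a plain JDK, and the reopen and warm steps are reduced to constructing a new object.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an index whose version grows on every commit. */
final class FakeIndex {
    private final AtomicLong version = new AtomicLong(1);
    long currentVersion() { return version.get(); }
    void commit() { version.incrementAndGet(); }
}

/** Hypothetical stand-in for a searcher pinned to one index version. */
final class FakeSearcher {
    final long version;
    FakeSearcher(long version) { this.version = version; }
}

/** Holds the current searcher and reopens it at most once per index change. */
final class SearcherHolder {
    private final FakeIndex index;
    private final AtomicInteger reopenCount = new AtomicInteger();
    private volatile FakeSearcher current;

    SearcherHolder(FakeIndex index) {
        this.index = index;
        this.current = new FakeSearcher(index.currentVersion());
    }

    /** The check, the reopen and the swap run as one unit under the lock. */
    synchronized void maybeReopen() {
        long latest = index.currentVersion();
        if (latest != current.version) {
            // Stands in for reopen() followed by warm().
            FakeSearcher fresh = new FakeSearcher(latest);
            reopenCount.incrementAndGet();
            current = fresh; // the swap
        }
    }

    FakeSearcher get() { return current; }
    int reopenCount() { return reopenCount.get(); }
}

final class SynchronizedReopenSketch {
    /** Commits once, lets several threads race into maybeReopen(), returns the reopen count. */
    static int runConcurrently(int threadCount) {
        FakeIndex index = new FakeIndex();
        SearcherHolder holder = new SearcherHolder(index);
        index.commit();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(holder::maybeReopen);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return holder.reopenCount();
    }

    public static void main(String[] args) {
        System.out.println("reopens: " + runConcurrently(8));
    }
}
```

With the method marked synchronized, the eight racing threads should trigger a single reopen: the later callers find the already swapped searcher and skip the work. The price is that those callers wait behind the lock while the first one reopens and warms.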

On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <[email protected]> wrote:

> Thanks.  I will rewrite, in between giving my baby her feed and playing
> with the other child and my wife, who wants me to do several other things!
>
>
>
> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
> [email protected]> wrote:
>
>>
>> Amin Mohammed-Coleman wrote:
>>
>>  Hi
>>> Thanks for your input.  I would like to have a go at doing this myself
>>> first, although Solr may be an option.
>>>
>>> * You are creating a new Analyzer & QueryParser every time, also
>>>  creating unnecessary garbage; instead, they should be created once
>>>  & reused.
>>>
>>> -- I can move the code out so that it is only created once and reused.
>>>
>>>
>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>  when nothing has changed.  This just generates unnecessary garbage
>>>  which GC then must sweep up.
>>>
>>> -- This was something I thought about.  I could move it out so that it's
>>> created once.  However, I presume that inside my code I need to check
>>> whether the IndexReaders are up to date.  This needs to be synchronized
>>> as well, I guess(?)
>>>
>>
>> Yes you should synchronize the check for whether the IndexReader is
>> current.
>>
>>> * I don't see any synchronization -- it looks like two search
>>>  requests are allowed into this method at the same time?  Which is
>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>  readers.
>>> --  So I need to extract the logic for reopening and provide a
>>> synchronisation mechanism.
>>>
>>
>> Yes.
>>
>>
>>> Ok.  So I have some work to do.  I'll refactor the code and see if I can
>>> get in line with your recommendations.
>>>
>>>
>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>> [email protected]> wrote:
>>>
>>>
>>>> On a quick look, I think there are a few problems with the code:
>>>>
>>>> * I don't see any synchronization -- it looks like two search
>>>>  requests are allowed into this method at the same time?  Which is
>>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>>  readers.
>>>>
>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>  don't see a corresponding decRef.
>>>>
>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>  meaning the unlucky search request that hits a reopen pays the
>>>>  cost.  This might be OK if the index is small enough that
>>>>  reopening & warming takes very little time.  But if index gets
>>>>  large, making a random search pay that warming cost is not nice to
>>>>  the end user.  It erodes their trust in you.
>>>>
>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>  when nothing has changed.  This just generates unnecessary garbage
>>>>  which GC then must sweep up.
>>>>
>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>  creating unnecessary garbage; instead, they should be created once
>>>>  & reused.
>>>>
>>>> You should consider simply using Solr -- it handles all this logic for
>>>> you and has been well debugged with time...
>>>>
>>>> Mike
>>>>
>>>> Amin Mohammed-Coleman wrote:
>>>>
>>>>> The reason for the indexreader.reopen is that I have a webapp which
>>>>> enables users to upload files and then search for the documents.  If I
>>>>> don't reopen, I'm concerned that the facet hit counter won't be updated.
>>>>>
>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <
>>>>> [email protected]
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>> I have been able to get the code working for my scenario, however I
>>>>>> have a question and I was wondering if I could get some help.  I have
>>>>>> a list of IndexSearchers which are used in a MultiSearcher class.  I
>>>>>> use the IndexSearchers to get each IndexReader and put them into a
>>>>>> MultiReader.
>>>>>>
>>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>> for (int i = 0; i < searchables.length; i++) {
>>>>>>   IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
>>>>>>   readers[i] = indexSearcher.getIndexReader();
>>>>>>   // reopen() returns the same instance if nothing has changed
>>>>>>   IndexReader newReader = readers[i].reopen();
>>>>>>   if (newReader != readers[i]) {
>>>>>>     readers[i].close();
>>>>>>   }
>>>>>>   readers[i] = newReader;
>>>>>> }
>>>>>>
>>>>>> multiReader = new MultiReader(readers);
>>>>>> OpenBitSetFacetHitCounter facetHitCounter =
>>>>>>     new OpenBitSetFacetHitCounter();
>>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>>
>>>>>>
>>>>>> I then use the IndexSearcher to do the facet stuff.  I end the code by
>>>>>> closing the MultiReader.  This is causing problems in another method
>>>>>> where I do some other search, as the IndexReaders are closed.  Is it OK
>>>>>> not to close the MultiReader, or should I do some additional checks in
>>>>>> the other method to see if the IndexReader is closed?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> P.S. Hope that made sense...!
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <
>>>>>> [email protected]
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> Thanks, just what I needed!
>>>>>>>
>>>>>>> Cheers
>>>>>>> Amin
>>>>>>>
>>>>>>>
>>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Amin:
>>>>>>>>
>>>>>>>>  Please take a look at this blog post:
>>>>>>>>
>>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>>> Best regards, Marcelo.
>>>>>>>>
>>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>>>>> [email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> Sorry to resend this email, but I was wondering if I could get some
>>>>>>>>> advice on this.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> Amin
>>>>>>>>>
>>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>>>>>>> Solr comes with this built in, however I would like to try this by
>>>>>>>>>> myself (something to add to my CV!).  I have been looking around and
>>>>>>>>>> I found that you can use the IndexReader and use TermVectors.  This
>>>>>>>>>> looks ok but I'm not sure how to filter the results so that a
>>>>>>>>>> particular user can only see a subset of results.  The next option I
>>>>>>>>>> was looking at was something like
>>>>>>>>>>
>>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>>
>>>>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>>>>> time a new brand is created.  Again I'm not sure how I can filter the
>>>>>>>>>> results here.  It may be that I'm using the wrong API methods to do
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Amin
>>>>>>>>>>
>>>>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>>>>> following
>>>>>>>>>>
>>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Marcelo F. Ochoa
>>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>>> ______________
>>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>>> Is Oracle 11g REST ready?
>>>>>>>>
>>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>
>
