Re: Faceted Search using Lucene

Amin Mohammed-Coleman Sun, 01 Mar 2009 04:37:59 -0800

Hi
Thanks for your input.  I would like to have a go at doing this myself
first, Solr may be an option.


* You are creating a new Analyzer & QueryParser every time, also
   creating unnecessary garbage; instead, they should be created once
   & reused.

-- I can moved the code out so that it is only created once and reused.


 * You always make a new IndexSearcher and a new MultiSearcher even
   when nothing has changed.  This just generates unnecessary garbage
   which GC then must sweep up.

-- This was something I thought about.  I could move it out so that it's
created once.  However I presume inside my code i need to check whether the
indexreaders are update to date.  This needs to be synchronized as well I
guess(?)

 * I don't see any synchronization -- it looks like two search
   requests are allowed into this method at the same time?  Which is
   dangerous... eg both (or, more) will wastefully reopen the
   readers.
--  So i need to extract the logic for reopening and provide a
synchronisation mechanism.


Ok.  So I have some work to do.  I'll refactor the code and see if I can get
inline to your recommendations.


On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
[email protected]> wrote:

>
> On a quick look, I think there are a few problems with the code:
>
>  * I don't see any synchronization -- it looks like two search
>    requests are allowed into this method at the same time?  Which is
>    dangerous... eg both (or, more) will wastefully reopen the
>    readers.
>
>  * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>    don't see a corresponding decRef.
>
>  * You reopen and warm your searchers "live" (vs with BG thread);
>    meaning the unlucky search request that hits a reopen pays the
>    cost.  This might be OK if the index is small enough that
>    reopening & warming takes very little time.  But if index gets
>    large, making a random search pay that warming cost is not nice to
>    the end user.  It erodes their trust in you.
>
>  * You always make a new IndexSearcher and a new MultiSearcher even
>    when nothing has changed.  This just generates unnecessary garbage
>    which GC then must sweep up.
>
>  * You are creating a new Analyzer & QueryParser every time, also
>    creating unnecessary garbage; instead, they should be created once
>    & reused.
>
> You should consider simply using Solr -- it handles all this logic for
> you and has been well debugged with time...
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  The reason for the indexreader.reopen is because I have a webapp which
>> enables users to upload files and then search for the documents.  If I
>> don't
>> reopen i'm concerned that the facet hit counter won't be updated.
>>
>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <[email protected]
>> >wrote:
>>
>>  Hi
>>> I have been able to get the code working for my scenario, however I have
>>> a
>>> question and I was wondering if I could get some help.  I have a list of
>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>> indexsearchers to get each indexreader and put them into a
>>> MultiIndexReader.
>>>
>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>
>>> for (int i =0 ; i < searchables.length;i++) {
>>>
>>> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>>>
>>> readers[i] = indexSearcher.getIndexReader();
>>>
>>>   IndexReader newReader = readers[i].reopen();
>>>
>>> if (newReader != readers[i]) {
>>>
>>> readers[i].close();
>>>
>>> }
>>>
>>> readers[i] = newReader;
>>>
>>>
>>>
>>> }
>>>
>>> multiReader = new MultiReader(readers);
>>>
>>> OpenBitSetFacetHitCounter facetHitCounter =
>>> newOpenBitSetFacetHitCounter();
>>>
>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>
>>>
>>> I then use the indexseacher to do the facet stuff.  I end the code with
>>> closing the multireader.  This is causing problems in another method
>>> where I
>>> do some other search as the indexreaders are closed.  Is it ok to not
>>> close
>>> the multiindexreader or should I do some additional checks in the other
>>> method to see if the indexreader is closed?
>>>
>>>
>>>
>>> Cheers
>>>
>>>
>>> P.S. Hope that made sense...!
>>>
>>>
>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <[email protected]
>>> >wrote:
>>>
>>>  Hi
>>>>
>>>> Thanks just what I needed!
>>>>
>>>> Cheers
>>>> Amin
>>>>
>>>>
>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <[email protected]>
>>>> wrote:
>>>>
>>>> Hi Amin:
>>>>
>>>>> Please take a look a this blog post:
>>>>>
>>>>>
>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>> Best regards, Marcelo.
>>>>>
>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>>  Hi
>>>>>>
>>>>>> Sorry to re send this email but I was wondering if I could get some
>>>>>> advice
>>>>>> on this.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Amin
>>>>>>
>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi
>>>>>>
>>>>>>>
>>>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>>>> Solr
>>>>>>> comes with this built in, however I would like to try this by myself
>>>>>>> (something to add to my CV!).  I have been looking around and I found
>>>>>>> that
>>>>>>> you can use the IndexReader and use TermVectors.  This looks ok but
>>>>>>> I'm
>>>>>>> not
>>>>>>> sure how to filter the results so that a particular user can only see
>>>>>>> a
>>>>>>> subset of results.  The next option I was looking at was something
>>>>>>> like
>>>>>>>
>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>> Term[] termsArray = new Term[] { term1, term2 };un
>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>
>>>>>>> The only problem here is that I have to provide the brand type each
>>>>>>> time a
>>>>>>> new brand is created.  Again I'm not sure how I can filter the
>>>>>>> results
>>>>>>> here.
>>>>>>> It may be that I'm using the wrong api methods to do this.
>>>>>>>
>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>> Amin
>>>>>>>
>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>> following
>>>>>>>
>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Marcelo F. Ochoa
>>>>> http://marceloochoa.blogspot.com/
>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>> ______________
>>>>> Want to integrate Lucene and Oracle?
>>>>>
>>>>>
>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>> Is Oracle 11g REST ready?
>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>>
>>>>>
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

Re: Faceted Search using Lucene

Reply via email to