Re: Faceted Search using Lucene

Amin Mohammed-Coleman Sun, 01 Mar 2009 06:19:29 -0800

sorrry I added

release(multiSearcher);



instead of multiSearcher.close();

On Sun, Mar 1, 2009 at 2:17 PM, Amin Mohammed-Coleman <[email protected]>wrote:

> Hi
> I've now done the following:
>
> public Summary[] search(final SearchRequest searchRequest)  
> throwsSearchExecutionException {
>
> final String searchTerm = searchRequest.getSearchTerm();
>
> if (StringUtils.isBlank(searchTerm)) {
>
> throw new SearchExecutionException("Search string cannot be empty. There
> will be too many results to process.");
>
> }
>
> List<Summary> summaryList = new ArrayList<Summary>();
>
> StopWatch stopWatch = new StopWatch("searchStopWatch");
>
> stopWatch.start();
>
> List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
>
> try {
>
> LOGGER.debug("Ensuring all index readers are up to date...");
>
> maybeReopen();
>
> LOGGER.debug("All Index Searchers are up to date. No of index searchers '"+ 
> indexSearchers.size() +
> "'");
>
>  Query query = queryParser.parse(searchTerm);
>
> LOGGER.debug("Search Term '" + searchTerm +"' ----> Lucene Query '" +
> query.toString() +"'");
>
>  Sort sort = null;
>
> sort = applySortIfApplicable(searchRequest);
>
>  Filter[] filters =applyFiltersIfApplicable(searchRequest);
>
>  ChainedFilter chainedFilter = null;
>
> if (filters != null) {
>
> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>
> }
>
> TopDocs topDocs = get().search(query,chainedFilter ,100,sort);
>
> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>
> LOGGER.debug("total number of hits for [" + query.toString() + " ] = 
> "+topDocs.
> totalHits);
>
>  for (ScoreDoc scoreDoc : scoreDocs) {
>
> final Document doc = multiSearcher.doc(scoreDoc.doc);
>
> float score = scoreDoc.score;
>
> final BaseDocument baseDocument = new BaseDocument(doc, score);
>
> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>
> summaryList.add(documentSummary);
>
> }
>
> multiSearcher.close();
>
> } catch (Exception e) {
>
> throw new IllegalStateException(e);
>
> }
>
> stopWatch.stop();
>
>  LOGGER.debug("total time taken for document seach: " +
> stopWatch.getTotalTimeMillis() + " ms");
>
> return summaryList.toArray(new Summary[] {});
>
> }
>
>
> And have the following methods:
>
> @PostConstruct
>
> public void initialiseQueryParser() {
>
> PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
> analyzer);
>
> analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), 
> newKeywordAnalyzer());
>
> queryParser = newMultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
> analyzerWrapper);
>
>  try {
>
> LOGGER.debug("Initialising multi searcher ....");
>
> this.multiSearcher = new MultiSearcher(searchers.toArray(newIndexSearcher[] 
> {}));
>
> LOGGER.debug("multi searcher initialised");
>
> } catch (IOException e) {
>
> throw new IllegalStateException(e);
>
> }
>
>  }
>
>
> Initialises mutltisearcher when this class is creared by spring.
>
>
>  private synchronized void swapMultiSearcher(MultiSearcher
> newMultiSearcher)  {
>
> try {
>
> release(multiSearcher);
>
> } catch (IOException e) {
>
> throw new IllegalStateException(e);
>
> }
>
> multiSearcher = newMultiSearcher;
>
> }
>
>   public void maybeReopen() throws IOException {
>
>  MultiSearcher newMultiSeacher = null;
>
>  boolean refreshMultiSeacher = false;
>
>  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
>
>  synchronized (searchers) {
>
>  for (IndexSearcher indexSearcher: searchers) {
>
>  IndexReader reader = indexSearcher.getIndexReader();
>
>  reader.incRef();
>
>  Directory directory = reader.directory();
>
>  long currentVersion = reader.getVersion();
>
>  if (IndexReader.getCurrentVersion(directory) != currentVersion) {
>
>  IndexReader newReader = indexSearcher.getIndexReader().reopen();
>
>  if (newReader != reader) {
>
>  reader.decRef();
>
>  refreshMultiSeacher = true;
>
>  }
>
>  reader = newReader;
>
>  IndexSearcher newSearcher = new IndexSearcher(newReader);
>
>  indexSearchers.add(newSearcher);
>
>  }
>
>  }
>
>  }
>
>
>
>  if (refreshMultiSeacher) {
>
> newMultiSeacher = new MultiSearcher(indexSearchers.toArray(newIndexSearcher[] 
> {}));
>
> warm(newMultiSeacher);
>
> swapMultiSearcher(newMultiSeacher);
>
>  }
>
>
>
>  }
>
>
>   private void warm(MultiSearcher newMultiSeacher) {
>
>  }
>
>
>
>  private synchronized MultiSearcher get() {
>
> for (IndexSearcher indexSearcher: searchers) {
>
> indexSearcher.getIndexReader().incRef();
>
> }
>
> return multiSearcher;
>
> }
>
>  private synchronized void release(MultiSearcher multiSearcher) 
> throwsIOException {
>
> for (IndexSearcher indexSearcher: searchers) {
>
> indexSearcher.getIndexReader().decRef();
>
> }
>
> }
>
>
> However I am now getting
>
>
> java.lang.IllegalStateException:
> org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
>
>
> on the call:
>
>
>  private synchronized MultiSearcher get() {
>
> for (IndexSearcher indexSearcher: searchers) {
>
> indexSearcher.getIndexReader().incRef();
>
> }
>
> return multiSearcher;
>
> }
>
>
> I'm doing something wrong ..obviously..not sure where though..
>
>
> Cheers
>
>
> On Sun, Mar 1, 2009 at 1:36 PM, Michael McCandless <
> [email protected]> wrote:
>
>>
>> I was wondering the same thing ;)
>>
>> It's best to call this method from a single BG "warming" thread, in which
>> case it would not need its own synchronization.
>>
>> But, to be safe, I'll add internal synchronization to it.  You can't
>> simply put synchronized in front of the method, since you don't want this to
>> block searching.
>>
>>
>> Mike
>>
>> Amin Mohammed-Coleman wrote:
>>
>>  just a quick point:
>>> public void maybeReopen() throws IOException {                 //D
>>>  long currentVersion = currentSearcher.getIndexReader().getVersion();
>>>  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>>>    IndexReader newReader = currentSearcher.getIndexReader().reopen();
>>>    assert newReader != currentSearcher.getIndexReader();
>>>    IndexSearcher newSearcher = new IndexSearcher(newReader);
>>>    warm(newSearcher);
>>>    swapSearcher(newSearcher);
>>>  }
>>> }
>>>
>>> should the above be synchronised?
>>>
>>> On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman <[email protected]
>>> >wrote:
>>>
>>>  thanks.  i will rewrite..in between giving my baby her feed and playing
>>>> with the other child and my wife who wants me to do several other
>>>> things!
>>>>
>>>>
>>>>
>>>> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
>>>> [email protected]> wrote:
>>>>
>>>>
>>>>> Amin Mohammed-Coleman wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>>> Thanks for your input.  I would like to have a go at doing this myself
>>>>>> first, Solr may be an option.
>>>>>>
>>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>>> creating unnecessary garbage; instead, they should be created once
>>>>>> & reused.
>>>>>>
>>>>>> -- I can moved the code out so that it is only created once and
>>>>>> reused.
>>>>>>
>>>>>>
>>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>>> when nothing has changed.  This just generates unnecessary garbage
>>>>>> which GC then must sweep up.
>>>>>>
>>>>>> -- This was something I thought about.  I could move it out so that
>>>>>> it's
>>>>>> created once.  However I presume inside my code i need to check
>>>>>> whether
>>>>>> the
>>>>>> indexreaders are update to date.  This needs to be synchronized as
>>>>>> well I
>>>>>> guess(?)
>>>>>>
>>>>>>
>>>>> Yes you should synchronize the check for whether the IndexReader is
>>>>> current.
>>>>>
>>>>> * I don't see any synchronization -- it looks like two search
>>>>>
>>>>>> requests are allowed into this method at the same time?  Which is
>>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>>> readers.
>>>>>> --  So i need to extract the logic for reopening and provide a
>>>>>> synchronisation mechanism.
>>>>>>
>>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>> Ok.  So I have some work to do.  I'll refactor the code and see if I
>>>>> can
>>>>>
>>>>>> get
>>>>>> inline to your recommendations.
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>  On a quick look, I think there are a few problems with the code:
>>>>>>>
>>>>>>> * I don't see any synchronization -- it looks like two search
>>>>>>> requests are allowed into this method at the same time?  Which is
>>>>>>> dangerous... eg both (or, more) will wastefully reopen the
>>>>>>> readers.
>>>>>>>
>>>>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>>>> don't see a corresponding decRef.
>>>>>>>
>>>>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>>>> meaning the unlucky search request that hits a reopen pays the
>>>>>>> cost.  This might be OK if the index is small enough that
>>>>>>> reopening & warming takes very little time.  But if index gets
>>>>>>> large, making a random search pay that warming cost is not nice to
>>>>>>> the end user.  It erodes their trust in you.
>>>>>>>
>>>>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>>>> when nothing has changed.  This just generates unnecessary garbage
>>>>>>> which GC then must sweep up.
>>>>>>>
>>>>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>>>> creating unnecessary garbage; instead, they should be created once
>>>>>>> & reused.
>>>>>>>
>>>>>>> You should consider simply using Solr -- it handles all this logic
>>>>>>> for
>>>>>>> you and has been well debugged with time...
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>> Amin Mohammed-Coleman wrote:
>>>>>>>
>>>>>>> The reason for the indexreader.reopen is because I have a webapp
>>>>>>> which
>>>>>>>
>>>>>>>  enables users to upload files and then search for the documents.  If
>>>>>>>> I
>>>>>>>> don't
>>>>>>>> reopen i'm concerned that the facet hit counter won't be updated.
>>>>>>>>
>>>>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>>  I have been able to get the code working for my scenario, however I
>>>>>>>>> have
>>>>>>>>> a
>>>>>>>>> question and I was wondering if I could get some help.  I have a
>>>>>>>>> list
>>>>>>>>> of
>>>>>>>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>>>>>>>> indexsearchers to get each indexreader and put them into a
>>>>>>>>> MultiIndexReader.
>>>>>>>>>
>>>>>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>>>>>
>>>>>>>>> for (int i =0 ; i < searchables.length;i++) {
>>>>>>>>>
>>>>>>>>> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>>>>>>>>>
>>>>>>>>> readers[i] = indexSearcher.getIndexReader();
>>>>>>>>>
>>>>>>>>> IndexReader newReader = readers[i].reopen();
>>>>>>>>>
>>>>>>>>> if (newReader != readers[i]) {
>>>>>>>>>
>>>>>>>>> readers[i].close();
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> readers[i] = newReader;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> multiReader = new MultiReader(readers);
>>>>>>>>>
>>>>>>>>> OpenBitSetFacetHitCounter facetHitCounter =
>>>>>>>>> newOpenBitSetFacetHitCounter();
>>>>>>>>>
>>>>>>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I then use the indexseacher to do the facet stuff.  I end the code
>>>>>>>>> with
>>>>>>>>> closing the multireader.  This is causing problems in another
>>>>>>>>> method
>>>>>>>>> where I
>>>>>>>>> do some other search as the indexreaders are closed.  Is it ok to
>>>>>>>>> not
>>>>>>>>> close
>>>>>>>>> the multiindexreader or should I do some additional checks in the
>>>>>>>>> other
>>>>>>>>> method to see if the indexreader is closed?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> P.S. Hope that made sense...!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks just what I needed!
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Amin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Amin:
>>>>>>>>>>
>>>>>>>>>> Please take a look a this blog post:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>>>>>>>>> Best regards, Marcelo.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Sorry to re send this email but I was wondering if I could get
>>>>>>>>>>>> some
>>>>>>>>>>>> advice
>>>>>>>>>>>> on this.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>> Amin
>>>>>>>>>>>>
>>>>>>>>>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman <
>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  I am looking at building a faceted search using Lucene.  I know
>>>>>>>>>>>>> that
>>>>>>>>>>>>> Solr
>>>>>>>>>>>>> comes with this built in, however I would like to try this by
>>>>>>>>>>>>> myself
>>>>>>>>>>>>> (something to add to my CV!).  I have been looking around and I
>>>>>>>>>>>>> found
>>>>>>>>>>>>> that
>>>>>>>>>>>>> you can use the IndexReader and use TermVectors.  This looks ok
>>>>>>>>>>>>> but
>>>>>>>>>>>>> I'm
>>>>>>>>>>>>> not
>>>>>>>>>>>>> sure how to filter the results so that a particular user can
>>>>>>>>>>>>> only
>>>>>>>>>>>>> see
>>>>>>>>>>>>> a
>>>>>>>>>>>>> subset of results.  The next option I was looking at was
>>>>>>>>>>>>> something
>>>>>>>>>>>>> like
>>>>>>>>>>>>>
>>>>>>>>>>>>> Term term1 = new Term("brand", "ford");
>>>>>>>>>>>>> Term term2 = new Term("brand", "vw");
>>>>>>>>>>>>> Term[] termsArray = new Term[] { term1, term2 };un
>>>>>>>>>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>>>>>>>>>
>>>>>>>>>>>>> The only problem here is that I have to provide the brand type
>>>>>>>>>>>>> each
>>>>>>>>>>>>> time a
>>>>>>>>>>>>> new brand is created.  Again I'm not sure how I can filter the
>>>>>>>>>>>>> results
>>>>>>>>>>>>> here.
>>>>>>>>>>>>> It may be that I'm using the wrong api methods to do this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would be grateful if I could get some advice on this.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>> Amin
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.  I am basically trying to do something that displays the
>>>>>>>>>>>>> following
>>>>>>>>>>>>>
>>>>>>>>>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>  --
>>>>>>>>>>> Marcelo F. Ochoa
>>>>>>>>>>> http://marceloochoa.blogspot.com/
>>>>>>>>>>> http://marcelo.ochoa.googlepages.com/home
>>>>>>>>>>> ______________
>>>>>>>>>>> Want to integrate Lucene and Oracle?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>>>>>>>>>> Is Oracle 11g REST ready?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>>>>>> For additional commands, e-mail:
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: [email protected]
>>>>>>> For additional commands, e-mail: [email protected]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
>>>>>
>>>>>
>>>>>
>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>

Re: Faceted Search using Lucene

Reply via email to