What do you mean by addDocument()? You re-index the document? In that case, when you re-index it, just make sure to run it through FacetFields.addFields(), so that its facets are re-indexed too.
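Something like the rough, untested sketch below. I'm assuming a single IndexWriter + TaxonomyWriter pair on the master index, that "key" is your unique-key field, and that rebuildDocument() / buildCategories() are helpers on your side -- facet ordinals cannot be copied verbatim between taxonomies, so the facets must be re-added through the master taxonomy:

    FacetFields facetFields = new FacetFields(masterTaxoWriter);
    IndexReader tempReader = DirectoryReader.open(tempIndexDir);
    IndexSearcher masterSearcher =
        new IndexSearcher(DirectoryReader.open(masterWriter, false));
    for (int i = 0; i < tempReader.maxDoc(); i++) { // ignoring deletions for brevity
      Document stored = tempReader.document(i);
      String key = stored.get("key");
      TopDocs hits = masterSearcher.search(new TermQuery(new Term("key", key)), 1);
      if (hits.totalHits == 0) {
        Document doc = rebuildDocument(stored);              // recreate the indexable fields
        facetFields.addFields(doc, buildCategories(stored)); // facets go through the master taxonomy
        masterWriter.addDocument(doc);
      }
    }
    tempReader.close();

That way every re-added document gets ordinals the master taxonomy knows about.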
Shai

On Wed, Jul 3, 2013 at 8:52 PM, Peng Gao <[email protected]> wrote:

> Shai,
> Thanks.
>
> I went with option #3, since in my case the temp indexes are actually
> created in separate processes. It works.
>
> Now one more complication. I have a case where I need to merge only
> unique docs from the temp indexes into the master index. I have a
> unique key for each doc. Before facets, I looped through the temp
> index and, for each doc, checked whether it was already in the master,
> calling addDocument() only if it didn't exist.
> Now that I have facets, how do I selectively merge docs?
>
> Thanks again for your help,
> Gao Peng
>
> > -----Original Message-----
> > From: Shai Erera [mailto:[email protected]]
> > Sent: Wednesday, July 03, 2013 9:02 AM
> > To: [email protected]
> > Subject: Re: Accumulating facets over a MultiReader
> >
> > Hi
> >
> > There are a couple of ways you can address that:
> >
> > Don't create an index per thread; rather, have all threads update the
> > global index. IndexWriter and TaxoWriter support multiple threads.
> >
> > -- Or, if you need to build an index per thread --
> >
> > Use a single TaxonomyWriter instance and share it between all the
> > threads. TaxoWriter is thread-safe, and that way you build a single
> > taxonomy index and can later use IW.addIndexes.
> >
> > -- Or, if you cannot share a TW instance between threads --
> >
> > Have each thread create its own taxonomy index, but then when you
> > call addIndexes, you need to do two things:
> > - Create a new TW instance and call addTaxonomy on it.
> > - Call IW.addIndexes() with an OrdinalMappingAtomicReader. Look at
> >   its jdocs for example code (see also the sketch after the quoted
> >   thread).
> >
> > Let me know if that works for you.
> >
> > Shai
> >
> > On Wed, Jul 3, 2013 at 6:14 PM, Peng Gao <[email protected]> wrote:
> >
> > > Hi Shai,
> > > Thanks for the reply.
> > > Yes, I used a single TaxonomyReader instance.
> > > I am adding facets to an existing app, which maintains two indexes:
> > > one for indexing system tools, and the other for indexing user data
> > > in folders. The system tool index contains docs describing tool
> > > usage etc., which need to be in their own index.
> > >
> > > It turned out that my problem is not the MultiReader. The problem
> > > is the index, i.e. the way it's created. The app crawls folders in
> > > multiple threads, and each thread creates a temp index. The main
> > > thread merges the temp indexes into the master index using
> > > IndexWriter.addIndexes(). If the temp indexes contain facet fields,
> > > this approach creates a bad index.
> > >
> > > Is there a way I can build a faceted index in multiple threads?
> > >
> > > - Gao Peng
> > >
> > > > -----Original Message-----
> > > > From: Shai Erera [mailto:[email protected]]
> > > > Sent: Monday, July 01, 2013 8:25 PM
> > > > To: [email protected]
> > > > Subject: Re: Accumulating facets over a MultiReader
> > > >
> > > > Hi,
> > > >
> > > > I assume that you use a single TaxonomyReader instance? It must
> > > > be the same for both indexes; that is, both indexes must share
> > > > the same taxonomy index, or otherwise their ordinals will not
> > > > match, and you may hit exceptions like this because one index may
> > > > have bigger ordinals than the taxonomy reader knows about.
> > > >
> > > > Can you share a little about your scenario and why you need to
> > > > use a MultiReader?
> > > >
> > > > Shai
> > > >
> > > > On Tue, Jul 2, 2013 at 3:31 AM, Peng Gao <[email protected]> wrote:
> > > >
> > > > > How do I accumulate counts over a MultiReader (2 IndexReaders)?
> > > > > The following code causes an IOException:
> > > > >
> > > > > ArrayList<FacetRequest> facetRequests = new ArrayList<FacetRequest>();
> > > > > for (String groupField : groupFields)
> > > > >   facetRequests.add(new CountFacetRequest(new CategoryPath(groupField, '/'), 1));
> > > > >
> > > > > FacetSearchParams facetSearchParams = new FacetSearchParams(facetRequests);
> > > > > StandardFacetsAccumulator accumulator =
> > > > >     new StandardFacetsAccumulator(facetSearchParams, reader, taxonomyReader);
> > > > > FacetsCollector facetsCollector = FacetsCollector.create(accumulator);
> > > > >
> > > > > // perform the document search and facet accumulation
> > > > > searcher.search(query, facetsCollector);
> > > > >
> > > > > // return the facet results in a proper format
> > > > > return getFacetResults(facetsCollector, sr);
> > > > >
> > > > > Here reader is a MultiReader over the 2 readers. I am using Lucene 4.3.1.
> > > > >
> > > > > The following is the call stack. It looks like it has something
> > > > > to do with the MultiReader. How do I make it work?
> > > > >
> > > > > java.io.IOException: PANIC: Got unexpected exception while trying to get/calculate total counts
> > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:156)
> > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:378)
> > > > >   at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214)
> > > > >   at com.esri.arcgis.search.SearchHandler.getFacetResults(SearchHandler.java:551)
> > > > >   at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:350)
> > > > >   at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:239)
> > > > >   at com.esri.arcgis.search.test.Searcher.invokeSearch(Searcher.java:58)
> > > > >   at com.esri.arcgis.search.test.Searcher.main(Searcher.java:32)
> > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 34
> > > > >   at org.apache.lucene.facet.search.CountingAggregator.aggregate(CountingAggregator.java:43)
> > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.fillArraysForPartition(StandardFacetsAccumulator.java:309)
> > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:168)
> > > > >   at org.apache.lucene.facet.complements.TotalFacetCounts.compute(TotalFacetCounts.java:176)
> > > > >   at org.apache.lucene.facet.complements.TotalFacetCountsCache.computeAndCache(TotalFacetCountsCache.java:157)
> > > > >   at org.apache.lucene.facet.complements.TotalFacetCountsCache.getTotalCounts(TotalFacetCountsCache.java:104)
> > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:129)
> > > > >   ... 7 more
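For reference, option #3 from the thread above could look roughly like the sketch below. It follows the pattern in OrdinalMappingAtomicReader's javadocs; the masterTaxoWriter / masterWriter / tempTaxoDir / tempIndexDir names are illustrative:

    // Merge a per-thread taxonomy into the master taxonomy, recording how
    // its ordinals map onto the master's.
    DirectoryTaxonomyWriter.OrdinalMap map =
        new DirectoryTaxonomyWriter.MemoryOrdinalMap();
    masterTaxoWriter.addTaxonomy(tempTaxoDir, map);
    int[] ordmap = map.getMap();

    // Wrap each leaf of the temp index so its facet ordinals are rewritten
    // on the fly, then add the whole index to the master.
    DirectoryReader tempReader = DirectoryReader.open(tempIndexDir);
    List<AtomicReaderContext> leaves = tempReader.leaves();
    AtomicReader[] wrapped = new AtomicReader[leaves.size()];
    for (int i = 0; i < leaves.size(); i++) {
      wrapped[i] = new OrdinalMappingAtomicReader(leaves.get(i).reader(), ordmap);
    }
    masterWriter.addIndexes(new MultiReader(wrapped));
    tempReader.close();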

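And the shared-taxonomy arrangement that makes faceting over a MultiReader work in the first place (options #1/#2 above): both indexes must be written against one taxonomy so that their ordinals agree. A minimal sketch, with illustrative directory names:

    // Index side: one taxonomy shared by every index (and every thread).
    DirectoryTaxonomyWriter taxoWriter = new DirectoryTaxonomyWriter(taxoDir);
    FacetFields facetFields = new FacetFields(taxoWriter); // TaxonomyWriter is thread-safe
    // ... each IndexWriter adds docs via facetFields.addFields(doc, categories) ...

    // Search side: one TaxonomyReader, and a MultiReader over both indexes.
    TaxonomyReader taxoReader = new DirectoryTaxonomyReader(taxoDir);
    MultiReader reader = new MultiReader(
        DirectoryReader.open(toolsIndexDir), DirectoryReader.open(userIndexDir));
    IndexSearcher searcher = new IndexSearcher(reader);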