Thanks. Yes, that's the case. I'll try it out.
Is Option 1 more expensive than re-indexing?

> -----Original Message-----
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Friday, July 05, 2013 8:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Accumulating facets over a MultiReader
>
> Yes, there are two ways to do that. First, I assume that what you want to
> do is a.addIndexes(b), and if a document D in b is already in a, you don't
> want to add it to a, right?
>
> In that case, two options:
>
> Option 1
> Iterate on the documents in b (by their primary key) and if a doc is found
> in a, delete it from b. Then reopen an IndexReader and addIndexes to a;
> the docs you deleted won't be added.
> That's the more expensive way, but the easiest to code.
>
> Option 2
> Obtain b.getLiveDocs and unset the bit of every document that exists in a.
> Then use addIndexes with an AtomicReader which overrides getLiveDocs to
> return the modified live docs.
> Same as option 1, but you don't actually do the delete operation, which is
> more costly than just unsetting a bit.
>
> Shai
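Option 2 in code might look roughly like this (a sketch against the Lucene 4.3 APIs; the keyField name, the writer/reader variables, and the brute-force per-document key lookup are placeholders, and error handling is omitted):

    import java.io.IOException;
    import org.apache.lucene.index.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.util.Bits;
    import org.apache.lucene.util.FixedBitSet;

    static void addUniqueDocs(IndexWriter writerA, IndexReader readerA,
        IndexReader readerB, String keyField) throws IOException {
      final AtomicReader b = SlowCompositeReaderWrapper.wrap(readerB);
      IndexSearcher searcherA = new IndexSearcher(readerA);

      // Start with every doc of b live, then clear the bit of each doc
      // that is already deleted in b or whose key already exists in a.
      final FixedBitSet live = new FixedBitSet(b.maxDoc());
      live.set(0, b.maxDoc());
      Bits origLive = b.getLiveDocs();
      for (int docID = 0; docID < b.maxDoc(); docID++) {
        if (origLive != null && !origLive.get(docID)) {
          live.clear(docID); // already deleted in b
          continue;
        }
        String key = b.document(docID).get(keyField);
        TotalHitCountCollector hits = new TotalHitCountCollector();
        searcherA.search(new TermQuery(new Term(keyField, key)), hits);
        if (hits.getTotalHits() > 0) {
          live.clear(docID); // duplicate: pretend it's deleted, no real delete
        }
      }

      // Wrap b so that addIndexes sees the modified live docs.
      AtomicReader filtered = new FilterAtomicReader(b) {
        @Override public Bits getLiveDocs() { return live; }
        @Override public int numDocs() { return live.cardinality(); }
      };
      writerA.addIndexes(filtered);
    }

Nothing is actually deleted from b here; the bits are only flipped in memory, and addIndexes skips every document whose live bit is cleared. If the temp index also carries facets, the wrapped reader would presumably need the ordinal mapping discussed further down the thread as well.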
>
> On Fri, Jul 5, 2013 at 6:10 PM, Peng Gao <p...@esri.com> wrote:
>
> > Shai,
> > Once again, thanks for the help.
> > Yes, I am re-indexing. Using FacetFields.addFacets() on the doc works.
> >
> > Given that I need to check for uniqueness before merging an index with
> > facets into a master, is there a better way to do it without
> > re-indexing?
> >
> > Gao Peng
> >
> > > -----Original Message-----
> > > From: Shai Erera [mailto:ser...@gmail.com]
> > > Sent: Wednesday, July 03, 2013 11:49 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Accumulating facets over a MultiReader
> > >
> > > What do you mean by addDocument()? Do you re-index it?
> > > In that case, when you re-index the document, just make sure to use
> > > FacetFields.addFacets() on it, so that its facets are re-indexed too.
> > >
> > > Shai
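A minimal sketch of that re-indexing path (assuming Lucene 4.3, where the class is org.apache.lucene.facet.index.FacetFields and, if I read the 4.3 API right, the method is spelled addFields(Document, Iterable<CategoryPath>); the field names and category values are made up):

    import java.util.Arrays;
    import org.apache.lucene.document.*;
    import org.apache.lucene.facet.index.FacetFields;
    import org.apache.lucene.facet.taxonomy.CategoryPath;
    import org.apache.lucene.facet.taxonomy.TaxonomyWriter;
    import org.apache.lucene.index.IndexWriter;

    static void reindexWithFacets(IndexWriter indexWriter,
        TaxonomyWriter taxoWriter) throws Exception {
      Document doc = new Document();
      doc.add(new StringField("idField", "doc-42", Field.Store.YES));
      doc.add(new TextField("body", "some content", Field.Store.NO));

      // Attach the facet categories to the document before adding it,
      // so the facets get (re-)indexed along with the document itself.
      FacetFields facetFields = new FacetFields(taxoWriter);
      facetFields.addFields(doc, Arrays.asList(
          new CategoryPath("format/pdf", '/'),
          new CategoryPath("author/shai", '/')));

      indexWriter.addDocument(doc);
    }

The key point is that the categories are attached to the Document before addDocument(), so the document and its facets are indexed together.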
> > >
> > > On Wed, Jul 3, 2013 at 8:52 PM, Peng Gao <p...@esri.com> wrote:
> > >
> > > > Shai,
> > > > Thanks.
> > > >
> > > > I went with option #3, since in my case the temp indexes are
> > > > actually created in separate processes.
> > > > It works.
> > > >
> > > > Now one more complication.
> > > > I have a case where I need to merge only unique docs from the temp
> > > > indexes into the master index. I have a unique key for each doc.
> > > > Before facets, I would loop through the temp index and, for each
> > > > doc, check whether it was already in the master, calling
> > > > addDocument() only if it wasn't.
> > > > Now that I have facets, how do I selectively merge docs?
> > > >
> > > > Thanks again for your help,
> > > > Gao Peng
> > > >
> > > > > -----Original Message-----
> > > > > From: Shai Erera [mailto:ser...@gmail.com]
> > > > > Sent: Wednesday, July 03, 2013 9:02 AM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: Re: Accumulating facets over a MultiReader
> > > > >
> > > > > Hi
> > > > >
> > > > > There are a couple of ways you can address that:
> > > > >
> > > > > Don't create an index per thread; rather, have all the threads
> > > > > update the global index. IndexWriter and TaxonomyWriter support
> > > > > multiple threads.
> > > > >
> > > > > -- Or, if you need to build an index per thread --
> > > > >
> > > > > Use a single TaxonomyWriter instance and share it between all the
> > > > > threads. TaxonomyWriter is thread-safe, and that way you build a
> > > > > single taxonomy index and can later use IW.addIndexes.
> > > > >
> > > > > -- Or, if you cannot share a TW instance between threads --
> > > > >
> > > > > Have each thread create its own taxonomy index, but then when you
> > > > > call addIndexes, you need to do two things:
> > > > > - Create a new TW instance and call addTaxonomy on it.
> > > > > - Call IW.addIndexes() with an OrdinalMappingAtomicReader. Look
> > > > >   at its jdocs for example code.
> > > > >
> > > > > Let me know if that works for you.
> > > > >
> > > > > Shai
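A rough sketch of that third option, along the lines of the pattern in OrdinalMappingAtomicReader's 4.3 javadocs (the writer and directory parameters are placeholders):

    import java.io.IOException;
    import java.util.List;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.MemoryOrdinalMap;
    import org.apache.lucene.facet.util.OrdinalMappingAtomicReader;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.Directory;

    static void mergeFacetedIndex(IndexWriter mainWriter,
        DirectoryTaxonomyWriter mainTaxoWriter,
        Directory tempIndexDir, Directory tempTaxoDir) throws IOException {
      // 1) Merge the temp taxonomy into the master taxonomy, recording
      //    how the temp ordinals map onto the master ordinals.
      MemoryOrdinalMap map = new MemoryOrdinalMap();
      mainTaxoWriter.addTaxonomy(tempTaxoDir, map);
      int[] ordmap = map.getMap();

      // 2) Add the temp index, rewriting its facet ordinals on the fly.
      DirectoryReader tempReader = DirectoryReader.open(tempIndexDir);
      try {
        List<AtomicReaderContext> leaves = tempReader.leaves();
        AtomicReader[] wrapped = new AtomicReader[leaves.size()];
        for (int i = 0; i < leaves.size(); i++) {
          wrapped[i] = new OrdinalMappingAtomicReader(
              leaves.get(i).reader(), ordmap);
        }
        mainWriter.addIndexes(wrapped);
      } finally {
        tempReader.close();
      }
    }

addTaxonomy() merges the temp taxonomy into the master and fills the OrdinalMap; the wrapped readers then rewrite each document's facet ordinals into the master's ordinal space during addIndexes.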
> > > > >
> > > > > On Wed, Jul 3, 2013 at 6:14 PM, Peng Gao <p...@esri.com> wrote:
> > > > >
> > > > > > Hi Shai,
> > > > > > Thanks for the reply.
> > > > > > Yes, I used a single TaxonomyReader instance.
> > > > > > I am adding facets to an existing app, which maintains two
> > > > > > indexes: one indexes system tools, and the other indexes user
> > > > > > data in folders.
> > > > > > The system tool index contains docs describing tool usage,
> > > > > > etc., and needs to be its own index.
> > > > > >
> > > > > > It turned out that my problem is not the MultiReader. The
> > > > > > problem is the index, i.e. the way it's created.
> > > > > > The app crawls folders in multiple threads, and each thread
> > > > > > creates a temp index.
> > > > > > The main thread merges the temp indexes into the master index
> > > > > > using IndexWriter.addIndexes().
> > > > > > If the temp indexes have facet indexes, this approach creates a
> > > > > > bad index.
> > > > > >
> > > > > > Is there a way I can build a faceted index in multiple threads?
> > > > > >
> > > > > > - Gao Peng
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Shai Erera [mailto:ser...@gmail.com]
> > > > > > > Sent: Monday, July 01, 2013 8:25 PM
> > > > > > > To: java-user@lucene.apache.org
> > > > > > > Subject: Re: Accumulating facets over a MultiReader
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I assume that you use a single TaxonomyReader instance? It must
> > > > > > > be the same for both indexes; that is, both indexes must share
> > > > > > > the same taxonomy index, or otherwise their ordinals would not
> > > > > > > match, and you may also hit exceptions like this one, since one
> > > > > > > index may have bigger ordinals than the taxonomy reader knows
> > > > > > > about.
> > > > > > >
> > > > > > > Can you share a little bit about your scenario and why you need
> > > > > > > to use a MultiReader?
> > > > > > >
> > > > > > > Shai
> > > > > > >
> > > > > > > On Tue, Jul 2, 2013 at 3:31 AM, Peng Gao <p...@esri.com> wrote:
> > > > > > >
> > > > > > > > How do I accumulate counts over a MultiReader (over 2
> > > > > > > > IndexReaders)?
> > > > > > > > The following code causes an IOException:
> > > > > > > >
> > > > > > > > ArrayList<FacetRequest> facetRequests =
> > > > > > > >     new ArrayList<FacetRequest>();
> > > > > > > > for (String groupField : groupFields)
> > > > > > > >   facetRequests.add(new CountFacetRequest(
> > > > > > > >       new CategoryPath(groupField, '/'), 1));
> > > > > > > >
> > > > > > > > FacetSearchParams facetSearchParams =
> > > > > > > >     new FacetSearchParams(facetRequests);
> > > > > > > > StandardFacetsAccumulator accumulator =
> > > > > > > >     new StandardFacetsAccumulator(facetSearchParams, reader,
> > > > > > > >         taxonomyReader);
> > > > > > > > FacetsCollector facetsCollector =
> > > > > > > >     FacetsCollector.create(accumulator);
> > > > > > > >
> > > > > > > > // perform document search and facet accumulation
> > > > > > > > searcher.search(query, facetsCollector);
> > > > > > > >
> > > > > > > > // return facet results in a proper format
> > > > > > > > return getFacetResults(facetsCollector, sr);
> > > > > > > >
> > > > > > > > Here reader is a MultiReader over the 2 readers. I am using
> > > > > > > > Lucene 4.3.1.
> > > > > > > >
> > > > > > > > The following is the call stack. It looks like it has
> > > > > > > > something to do with the MultiReader.
> > > > > > > > How do I make it work?
> > > > > > > >
> > > > > > > > java.io.IOException: PANIC: Got unexpected exception while trying to get/calculate total counts
> > > > > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:156)
> > > > > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:378)
> > > > > > > >   at org.apache.lucene.facet.search.FacetsCollector.getFacetResults(FacetsCollector.java:214)
> > > > > > > >   at com.esri.arcgis.search.SearchHandler.getFacetResults(SearchHandler.java:551)
> > > > > > > >   at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:350)
> > > > > > > >   at com.esri.arcgis.search.SearchHandler.search(SearchHandler.java:239)
> > > > > > > >   at com.esri.arcgis.search.test.Searcher.invokeSearch(Searcher.java:58)
> > > > > > > >   at com.esri.arcgis.search.test.Searcher.main(Searcher.java:32)
> > > > > > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 34
> > > > > > > >   at org.apache.lucene.facet.search.CountingAggregator.aggregate(CountingAggregator.java:43)
> > > > > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.fillArraysForPartition(StandardFacetsAccumulator.java:309)
> > > > > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:168)
> > > > > > > >   at org.apache.lucene.facet.complements.TotalFacetCounts.compute(TotalFacetCounts.java:176)
> > > > > > > >   at org.apache.lucene.facet.complements.TotalFacetCountsCache.computeAndCache(TotalFacetCountsCache.java:157)
> > > > > > > >   at org.apache.lucene.facet.complements.TotalFacetCountsCache.getTotalCounts(TotalFacetCountsCache.java:104)
> > > > > > > >   at org.apache.lucene.facet.search.StandardFacetsAccumulator.accumulate(StandardFacetsAccumulator.java:129)
> > > > > > > >   ... 7 more
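For reference, a minimal sketch of the arrangement Shai's first reply calls for: both indexes written against one shared taxonomy, so a MultiReader over them can be paired with a single TaxonomyReader (Lucene 4.3; the Directory parameters and the "..." steps are placeholders):

    import java.io.IOException;
    import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.Directory;

    static void searchBothIndexes(Directory toolsIndexDir,
        Directory userDataIndexDir, Directory taxoDir) throws IOException {
      DirectoryReader tools = DirectoryReader.open(toolsIndexDir);
      DirectoryReader userData = DirectoryReader.open(userDataIndexDir);
      MultiReader reader = new MultiReader(tools, userData);
      IndexSearcher searcher = new IndexSearcher(reader);
      // Opened on the SAME taxonomy directory that both indexes were
      // built against. A separate taxonomy per index is what leads to
      // ordinal mismatches and ArrayIndexOutOfBoundsExceptions like the
      // one in the trace above.
      DirectoryTaxonomyReader taxonomyReader =
          new DirectoryTaxonomyReader(taxoDir);
      // ... create FacetSearchParams, a StandardFacetsAccumulator over
      // (params, reader, taxonomyReader), and search as in the snippet
      // in the original question ...
    }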