On Tue, Oct 1, 2013 at 3:58 PM, Desidero <desid...@gmail.com> wrote:
> Benson,
>
> Rather than forcing a random number of small segments into the index using maxMergedSegmentMB, it might be better to split your index into multiple shards. You can create a specific number of balanced shards to control the parallelism and then forceMerge each shard down to 1 segment to avoid spawning extra threads per shard. Once that's done, you just open all of the shards with a MultiReader and use that with the IndexSearcher and an ExecutorService.
>
> The downside to this is that it doesn't play nicely with near real-time search, but if you have a relatively static index that gets pushed to slaves periodically it gets the job done.
>
> As Mike said, it'd be nicer if there was a way to split the docID space into virtual shards, but it's not currently available. I'm not sure if anyone is even looking into it.
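Desidero's approach (a fixed number of balanced shards, one search task per shard on an ExecutorService, results merged into one ranked list) can be illustrated without Lucene. The sketch below is a hypothetical simulation, not Lucene code: shards are in-memory lists of strings, the "score" is a plain term count, and the names ShardedSearchSketch, balancedShards, searchShard and search are invented for illustration. In Lucene itself the shard readers would be combined with a MultiReader and passed to an IndexSearcher built with an ExecutorService, as described above; that API is not exercised here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, Lucene-free simulation of "N balanced shards searched in parallel".
// Each shard is a list of document strings; the toy score is the term frequency.
final class ShardedSearchSketch {

    static final class Hit {
        final int docId;  // global doc id across all shards
        final int score;  // toy score: occurrences of the query term
        Hit(int docId, int score) { this.docId = docId; this.score = score; }
    }

    // "Worse" hits order first: lower score, or equal score and larger doc id.
    static final Comparator<Hit> WORST_FIRST = (a, b) ->
        a.score != b.score ? Integer.compare(a.score, b.score) : Integer.compare(b.docId, a.docId);

    // Deal documents round-robin so shard sizes differ by at most one.
    // Global doc d lands in shard d % n at local position d / n.
    static List<List<String>> balancedShards(List<String> docs, int n) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < n; i++) shards.add(new ArrayList<>());
        for (int d = 0; d < docs.size(); d++) shards.get(d % n).add(docs.get(d));
        return shards;
    }

    static int termFreq(String doc, String term) {
        int tf = 0;
        for (String tok : doc.split("\\s+")) if (tok.equals(term)) tf++;
        return tf;
    }

    // Adds a hit to a bounded min-heap, evicting the worst hit once it holds more than k.
    static void offer(PriorityQueue<Hit> heap, Hit h, int k) {
        heap.add(h);
        if (heap.size() > k) heap.poll();
    }

    static List<Hit> searchShard(List<String> shard, int shardIndex, int numShards, String term, int k) {
        PriorityQueue<Hit> heap = new PriorityQueue<>(WORST_FIRST);
        for (int local = 0; local < shard.size(); local++) {
            int tf = termFreq(shard.get(local), term);
            // Map the local position back to the global id (inverse of round-robin).
            if (tf > 0) offer(heap, new Hit(local * numShards + shardIndex, tf), k);
        }
        return new ArrayList<>(heap);
    }

    // One task per shard; the per-shard top-k lists are merged into a global top-k.
    static List<Hit> search(List<List<String>> shards, String term, int k, ExecutorService pool) {
        List<Callable<List<Hit>>> tasks = new ArrayList<>();
        for (int s = 0; s < shards.size(); s++) {
            final int shardIndex = s;
            tasks.add(() -> searchShard(shards.get(shardIndex), shardIndex, shards.size(), term, k));
        }
        PriorityQueue<Hit> merged = new PriorityQueue<>(WORST_FIRST);
        try {
            for (Future<List<Hit>> f : pool.invokeAll(tasks)) {
                for (Hit h : f.get()) offer(merged, h, k);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        List<Hit> result = new ArrayList<>(merged);
        result.sort(WORST_FIRST.reversed());  // best hit first
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a b", "a a", "b c", "a a a", "c", "a");
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            for (Hit h : search(balancedShards(docs, 3), "a", 2, pool)) {
                System.out.println("doc " + h.docId + " score " + h.score);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Running main prints "doc 3 score 3" followed by "doc 1 score 2". Round-robin assignment is only one way to get equal-sized shards; it is used here because it keeps the global-to-local mapping trivial (global = local × numShards + shard), which lets the merge step report global ids.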

Thanks, folks, for all the help. I'm musing about the top-level issue here, which is whether the important case is many independent queries or latency of just one. In the case where it's just one, we'll follow the shard-related advice.

>
> Regards,
> Matt
>
>
> On Tue, Oct 1, 2013 at 7:09 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> You might want to set a smallish maxMergedSegmentMB in TieredMergePolicy to "force" enough segments in the index ... sort of the opposite of optimizing.
>>
>> Really, IndexSearcher's approach to using one thread per segment is rather silly, and it's annoying/bad to expose change in behavior due to segment structure.
>>
>> I think it'd be better to carve up the overall docID space into N virtual shards. I.e., if you have 100M docs, then one thread searches docs 0-10M, another 10M-20M, etc. Nobody has created such a searcher impl but it should not be hard and it would be agnostic to the segment structure.
>>
>> But then again, this need (using concurrent hardware to reduce latency of a single query) is somewhat rare; most apps are fine using the concurrency across queries rather than within one query.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Oct 1, 2013 at 7:09 AM, Adrien Grand <jpou...@gmail.com> wrote:
>> > Hi Benson,
>> >
>> > On Mon, Sep 30, 2013 at 5:21 PM, Benson Margulies <ben...@basistech.com> wrote:
>> >> The multithreaded index searcher fans out across segments. How aggressively does 'optimize' reduce the number of segments? If the segment count goes way down, is there some other way to exploit multiple cores?
>> >
>> > forceMerge[1], formerly known as optimize, takes a parameter to configure how many segments should remain in the index.
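Mike's virtual-shard idea, splitting the docID space [0, maxDoc) into N contiguous ranges that ignore segment boundaries, comes down to intersecting each range with each segment's span. Below is a minimal sketch under stated assumptions: segment sizes are given as plain ints, segment s starts at the sum of the sizes before it (its doc base), and VirtualShards, ranges and slices are hypothetical names; no such class exists in Lucene. It only computes the work partition; running one thread per slice and restricting scoring to each local range are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the global docID space into n contiguous "virtual shards"
// and maps each one onto the (segment, local range) pieces it covers.
final class VirtualShards {

    // One piece of a virtual shard that lies inside a single segment: [fromLocal, toLocal).
    static final class Piece {
        final int segment, fromLocal, toLocal;
        Piece(int segment, int fromLocal, int toLocal) {
            this.segment = segment; this.fromLocal = fromLocal; this.toLocal = toLocal;
        }
        @Override public String toString() {
            return "seg" + segment + "[" + fromLocal + "," + toLocal + ")";
        }
    }

    // Global range [start, end) for each of n slices; long math avoids int overflow.
    static int[][] ranges(int maxDoc, int n) {
        int[][] r = new int[n][2];
        for (int i = 0; i < n; i++) {
            r[i][0] = (int) ((long) i * maxDoc / n);
            r[i][1] = (int) ((long) (i + 1) * maxDoc / n);
        }
        return r;
    }

    // Maps each global slice onto the segments it overlaps, given each segment's doc count.
    static List<List<Piece>> slices(int[] segmentSizes, int n) {
        int maxDoc = 0;
        int[] docBase = new int[segmentSizes.length];
        for (int s = 0; s < segmentSizes.length; s++) {
            docBase[s] = maxDoc;
            maxDoc += segmentSizes[s];
        }
        List<List<Piece>> out = new ArrayList<>();
        for (int[] range : ranges(maxDoc, n)) {
            List<Piece> pieces = new ArrayList<>();
            for (int s = 0; s < segmentSizes.length; s++) {
                // Intersect the slice with this segment's global span [docBase, docBase + size).
                int from = Math.max(range[0], docBase[s]);
                int to = Math.min(range[1], docBase[s] + segmentSizes[s]);
                if (from < to) pieces.add(new Piece(s, from - docBase[s], to - docBase[s]));
            }
            out.add(pieces);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three segments of very different sizes, split into 4 equal virtual shards.
        System.out.println(slices(new int[] {5, 3, 12}, 4));
    }
}
```

Running main prints [[seg0[0,5)], [seg1[0,3), seg2[0,2)], [seg2[2,7)], [seg2[7,12)]]: every slice covers five documents even though the three segments hold 5, 3 and 12, which is the "agnostic to the segment structure" property Mike describes.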
>> >
>> > Regarding multi-core usage, if your query load is high enough to use all your CPUs (there are always #cores queries running in parallel), there is generally no need to use the multi-threaded IndexSearcher. The multi-threaded index searcher can, however, help if not all CPU power is in use or if you care more about latency than throughput. It leverages the fact that the index is split into segments to parallelize query execution, so a fully merged index will actually run the query in a single thread in any case.
>> >
>> > There is no way to make query execution efficiently use several cores on a single-segment index, so if you really want to parallelize query execution, you will have to shard the index and do at the index level what the multi-threaded IndexSearcher does at the segment level.
>> >
>> > Side notes:
>> > - A single-segment index only makes terms-dictionary-intensive queries run more efficiently, so running forceMerge is generally discouraged unless the index is read-only.
>> > - The multi-threaded index searcher only parallelizes query execution in certain cases. In particular, it never parallelizes execution when the method takes a collector. This means that if you want to use TotalHitCountCollector to count matches, you will have to do the parallelization yourself.
>> >
>> > [1] http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/index/IndexWriter.html#forceMerge%28int%29
>> >
>> > --
>> > Adrien
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
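Adrien's last side note says collector-based searches are not parallelized, so counting hits across threads (the job TotalHitCountCollector does on one thread) is left to the application. One common pattern is to give each task its own private counter, so nothing mutable is shared between threads, and to sum the per-task counts at the end. The sketch below is a Lucene-free simulation under that assumption: "segments" are arrays of document strings, the matcher is a whitespace token test, and ParallelHitCount and count are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical do-it-yourself parallel hit count: one task (and one counter) per segment,
// summed at the end. Segments are plain arrays of document text, not Lucene segments.
final class ParallelHitCount {

    // Toy matcher: a document matches if any whitespace-separated token equals the term.
    static boolean matches(String doc, String term) {
        return Arrays.asList(doc.split("\\s+")).contains(term);
    }

    static int count(List<String[]> segments, String term, ExecutorService pool) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String[] segment : segments) {
            tasks.add(() -> {
                int hits = 0;  // private to this task, so no synchronization is needed
                for (String doc : segment) if (matches(doc, term)) hits++;
                return hits;
            });
        }
        int total = 0;
        try {
            // Reduce step: add up the per-segment counts once all tasks finish.
            for (Future<Integer> f : pool.invokeAll(tasks)) total += f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> segments = Arrays.asList(
            new String[] {"a b", "c"}, new String[] {"a", "a c", "d"}, new String[] {"b"});
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(count(segments, "a", pool));  // prints 3
        } finally {
            pool.shutdown();
        }
    }
}
```

The same shape applies whether the units of work are segments, physical shards, or the virtual docID ranges Mike describes; only the per-task loop changes.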