This is really nice, Aaron. You've done the bulk of the work already!!!
I think parallelism can also be provided when searching a single shard...
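Just as a quick proposal, we can do a static initialization in
BlurIndexSimpleWriter:
// needs java.util.concurrent.{ExecutorService, Executors, LinkedBlockingQueue}
// 128 search threads in total, handed out as 128/4 = 32 fixed pools of 4 threads
static final LinkedBlockingQueue<ExecutorService> executorQueue =
    new LinkedBlockingQueue<>(128 / 4);
static {
  for (int i = 0; i < 128 / 4; i++) {
    executorQueue.add(Executors.newFixedThreadPool(4));
  }
}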
----
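Incoming search request, per shard:
public IndexSearcher getIndexSearcher() {
  .....
  ExecutorService current = executorQueue.poll();
  if (current == null) {
    // All thread pools are busy, or the user has explicitly switched this off via config.
    // The search then proceeds single-threaded, using the calling thread itself
    // (IndexSearcher searches sequentially when its executor is null).
  }
  // The borrowed pool has to be offered back to executorQueue when this searcher
  // is closed; otherwise the queue drains and every search becomes sequential.
  return new IndexSearcherCloseable(indexReader, current);
}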
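A self-contained sketch of the same borrow/return idea, assuming a hypothetical ExecutorPoolSketch class (not an existing Blur class) that owns the 32 pools of 4 threads. borrow() never blocks: it returns null when every pool is in use, and that null is what the caller would hand to IndexSearcher to fall back to single-threaded search. release() is the step the closeable searcher would need to call on close.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper (not a Blur class): 128 search threads handed out
// as 32 fixed pools of 4 threads each.
class ExecutorPoolSketch {
  static final int TOTAL_THREADS = 128;
  static final int THREADS_PER_REQUEST = 4;

  private final LinkedBlockingQueue<ExecutorService> executorQueue =
      new LinkedBlockingQueue<>(TOTAL_THREADS / THREADS_PER_REQUEST);

  ExecutorPoolSketch() {
    for (int i = 0; i < TOTAL_THREADS / THREADS_PER_REQUEST; i++) {
      executorQueue.add(Executors.newFixedThreadPool(THREADS_PER_REQUEST));
    }
  }

  // Non-blocking: returns null when every pool is in use, meaning
  // "search on the calling thread".
  ExecutorService borrow() {
    return executorQueue.poll();
  }

  // Must be called when the searcher using the pool is closed,
  // otherwise the queue drains and all searches become sequential.
  void release(ExecutorService executor) {
    if (executor != null) {
      executorQueue.offer(executor);
    }
  }

  int available() {
    return executorQueue.size();
  }

  // Shuts down the pools currently in the queue (borrowed ones are not touched).
  void shutdown() {
    ExecutorService e;
    while ((e = executorQueue.poll()) != null) {
      e.shutdown();
    }
  }
}
```

Using poll() rather than take() means a busy node never queues requests behind the executor pool; the trade-off is that a request arriving at peak load runs single-threaded.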
---
Btw, in Lucene 5.x and above we can do this by overriding a single method,
IndexSearcher.slices(...)!!!
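In the Lucene 5.x line the hook is IndexSearcher's protected slices(List<LeafReaderContext>) method, which by default gives each segment its own slice (worth double-checking against the exact version in use). The grouping an override would perform can be sketched without Lucene on the classpath. SliceGrouper and group(...) are hypothetical names, and the largest-first balancing by doc count is just one reasonable choice, not what Lucene itself does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper: assigns segment indexes to at most maxSlices groups,
// placing the largest segments first, each into the currently lightest group.
class SliceGrouper {
  static List<List<Integer>> group(int[] segmentDocCounts, int maxSlices) {
    if (maxSlices < 1) {
      throw new IllegalArgumentException("maxSlices must be >= 1");
    }
    int n = Math.min(maxSlices, segmentDocCounts.length);
    List<List<Integer>> slices = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      slices.add(new ArrayList<>());
    }
    long[] load = new long[n]; // total docs assigned to each slice so far
    List<Integer> bySizeDesc = IntStream.range(0, segmentDocCounts.length)
        .boxed()
        .sorted(Comparator.comparingInt((Integer i) -> segmentDocCounts[i]).reversed())
        .collect(Collectors.toList());
    for (int seg : bySizeDesc) {
      int lightest = 0;
      for (int s = 1; s < n; s++) {
        if (load[s] < load[lightest]) {
          lightest = s;
        }
      }
      slices.get(lightest).add(seg);
      load[lightest] += segmentDocCounts[seg];
    }
    return slices;
  }
}
```

Each inner list would become one LeafSlice, so with maxSlices = 4 a single request submits at most four slice tasks to the executor, however many segments the shard has.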
On Tue, Jun 28, 2016 at 8:01 PM, Aaron McCurry wrote:
> Some time ago I created something similar; it's kind of a backport into
> Lucene 4.3:
>
>
> https://github.com/apache/incubator-blur/blob/65640200a8e7dd539c1dd4d920255c717102b9b2/blur-query/src/main/java/org/apache/blur/lucene/search/CloneableCollector.java#L25
>
> It handles searching the segments in parallel, but it doesn't place any
> limit on the degree of parallelism.
>
> Aaron
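The pattern described above (a separate collector copy per segment, merged once every segment has finished) can be sketched in plain Java. CountingCollector and ParallelSegmentSearch are hypothetical stand-ins, not Lucene or Blur classes, and each int[] plays the role of one segment's documents. As in the message, nothing here limits parallelism beyond the size of whatever executor is passed in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a Lucene Collector: counts matching docs.
class CountingCollector {
  private final IntPredicate query;
  private long hits;

  CountingCollector(IntPredicate query) {
    this.query = query;
  }

  // A fresh, empty copy, so segments searched concurrently share no state.
  CountingCollector cloneCollector() {
    return new CountingCollector(query);
  }

  void collect(int doc) {
    if (query.test(doc)) {
      hits++;
    }
  }

  // Folds a finished per-segment copy into this collector.
  void merge(CountingCollector other) {
    hits += other.hits;
  }

  long hits() {
    return hits;
  }
}

class ParallelSegmentSearch {
  // One task per segment; parallelism is bounded only by the executor passed in.
  static long search(List<int[]> segments, CountingCollector target, ExecutorService executor) {
    List<Future<CountingCollector>> futures = new ArrayList<>();
    for (int[] segment : segments) {
      CountingCollector perSegment = target.cloneCollector();
      futures.add(executor.submit(() -> {
        for (int doc : segment) {
          perSegment.collect(doc);
        }
        return perSegment;
      }));
    }
    try {
      // Merging happens on the calling thread, so merge() needs no locking.
      for (Future<CountingCollector> f : futures) {
        target.merge(f.get());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    return target.hits();
  }
}
```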
>
>
>
> On Tue, Jun 28, 2016 at 6:37 AM, Ravikumar Govindarajan wrote:
>
> > Aaron,
> >
> > Just an update:
> >
> > https://issues.apache.org/jira/browse/LUCENE-5299
> >
> > You can now use any collector and get guaranteed parallel execution. They
> > have also provided a "parallelism" hint that limits the number of search
> > threads at the request level...
> >
> > i.e., we can fix the Blur executor thread count at 128 and limit
> > "parallelism" to a maximum of 4 threads per request...
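A sketch of the "fixed shared pool, capped parallelism per request" setup described above, in plain Java. BoundedRequestExecutor is a hypothetical class, and the Semaphore-based cap is an assumption about how such a hint could be enforced, not how LUCENE-5299 implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: one pool shared by all requests on the node, plus a
// per-request cap on how many of that request's tasks may run at once.
class BoundedRequestExecutor {
  private final ExecutorService shared;

  BoundedRequestExecutor(int totalThreads) {
    this.shared = Executors.newFixedThreadPool(totalThreads);
  }

  // Runs one request's tasks with at most `parallelism` in flight; results keep task order.
  <T> List<T> runAll(List<Callable<T>> tasks, int parallelism) {
    Semaphore permits = new Semaphore(parallelism);
    List<Future<T>> futures = new ArrayList<>();
    try {
      for (Callable<T> task : tasks) {
        permits.acquire(); // the request thread waits here once the cap is reached
        futures.add(shared.submit(() -> {
          try {
            return task.call();
          } finally {
            permits.release();
          }
        }));
      }
      List<T> results = new ArrayList<>();
      for (Future<T> f : futures) {
        results.add(f.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  void shutdown() {
    shared.shutdown();
  }
}
```

Unlike the pool-of-pools approach, a request here always gets to run in parallel; it simply waits for a permit once it already has four tasks in flight.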
> >
> > On Fri, Feb 6, 2015 at 5:25 PM, Ravikumar Govindarajan wrote:
> >
> > > Thanks for the clarifications.
> > >
> > > Another point I thought about is the disk efficiency of serving random
> > > I/O. Many parallel threads could end up hitting just one or two disks in
> > > the cluster…
> > >
> > > I think I can safely skip it for my workloads.
> > >
> > > --
> > > Ravi
> > >
> > > On Fri, Feb 6, 2015 at 3:09 PM, Aaron McCurry wrote:
> > >
> > >> The ExecutorService (thread pool) put inside the IndexSearcher was an
> > >> attempt at searching the segments in parallel when possible. However,
> > >> there is a limitation in Lucene that prevents parallel segment searches
> > >> when you are using Collectors.
> > >>
> > >> https://github.com/apache/lucene-solr/blob/lucene_solr_4_3_0/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L595
> > >>
> > >> We override this method to allow for Tracing:
> > >>
> > >> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/server/IndexSearcherCloseableBase.java#L46
> > >>
> > >> and here:
> > >>
> > >> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/server/IndexSearcherCloseableSecureBase.java#L51
> > >>
> > >> I agree that if you are already running a lot of shards per server,
> > >> enhancing Lucene to search segments in parallel could become
> > >> counterproductive. On the other hand, I have seen underutilized systems
> > >> that could take advantage of parallel segment search, so, as with any
> > >> feature like this, it depends. :-)
> > >>
> > >> Aaron
> > >>
> > >> On Fri, Feb 6, 2015 at 2:39 AM, Ravikumar Govindarajan wrote:
> > >>
> > >> > Blur by default uses a SearchExecutor for the IndexSearcher. I believe
> > >> > Lucene can search the segments of a single shard in parallel.
> > >> >
> > >> > Our previous index was built on an older version of Lucene that lacked
> > >> > this feature, so we only ever searched each shard sequentially…
> > >> >
> > >> > What is the general recommendation for Blur? Is it advisable to use the
> > >> > SearchExecutor? What will happen when there are many parallel queries
> > >> > for different shards? Will the SearchExecutor become a bottleneck?
> > >> >
> > >> > Any help is much appreciated...
> > >> >
> > >> > --
> > >> > Ravi
> > >> >
> > >>
> > >
> > >
> >
>