Here's the Jira issue for the distributed search issue.
https://issues.apache.org/jira/browse/SOLR-1632
I tried applying this patch but get the same error that is posted in the
discussion section for that issue. I will be glad to help on this one too.
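
To make the scoring skew discussed further down in this thread concrete, here is a rough sketch of how the same term can get very different IDF values in a 20-million-doc index and a 200-doc index. It assumes the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)); the document frequencies and the names (`classic_idf`, `shards`) are made up purely for illustration and are not taken from the patch.

```python
import math

def classic_idf(doc_freq, num_docs):
    # Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-index statistics for one query term.
# The index sizes come from the thread; the doc_freq values are invented.
shards = {
    "large": {"num_docs": 20_000_000, "doc_freq": 2_000_000},
    "small": {"num_docs": 200, "doc_freq": 150},
}

# IDF as each core would compute it on its own.
local_idf = {
    name: classic_idf(s["doc_freq"], s["num_docs"])
    for name, s in shards.items()
}

# IDF computed from the summed statistics, which is roughly what a
# MultiSearcher-style "global" view of the indexes would use.
total_docs = sum(s["num_docs"] for s in shards.values())
total_df = sum(s["doc_freq"] for s in shards.values())
global_idf = classic_idf(total_df, total_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.4f}")
print(f"global idf = {global_idf:.4f}")
```

With numbers like these the term is weighted far more heavily in the large index than in the small one, so merging results by raw score would not be comparable unless both sides used the shared statistics.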
On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson wrote:
> Ah, I should have read more carefully...
>
> I remember this being discussed on the dev list, and I thought there might be
> a Jira attached but I sure can't find it.
>
> If you're willing to work on it, you might hop over to the solr dev list and
> start a discussion, maybe ask for a place to start. I'm sure some of the devs
> have thought about this...
>
> If nobody on the dev list says "There's already a JIRA on it", then you should
> open one. The Jira issues are generally preferred when you start getting into
> design because the comments are preserved for the next person who tries
> the idea or makes changes, etc.
>
> Best
> Erick
>
> On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess wrote:
>
> > Thanks Erick. The problem with multiple cores is that the documents are
> > scored independently in each core. I would like to be able to search across
> > both cores and have the scores 'normalized' in a way that's similar to what
> > Lucene's MultiSearcher would do. As far as I understand, multiple cores
> > would likely result in seriously skewed scores in my case since the
> > documents are not distributed evenly or randomly. I could have one
> > core/index with 20 million docs and another with 200.
> >
> > I've poked around in the code and this feature doesn't seem to exist. I
> > would be happy with finding a decent place to try to add it. I'm not sure
> > if there is a clean place for it.
> >
> > Ben
> >
> > On Oct 20, 2010, at 8:36 PM, Erick Erickson wrote:
> >
> > > It seems to me that multiple cores are along the lines you
> > > need: a single instance of Solr that can search across multiple
> > > sub-indexes that do not necessarily share schemas and are
> > > independently maintainable.
> > >
> > > This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
> > >
> > > HTH
> > > Erick
> > >
> > > On Wed, Oct 20, 2010 at 3:23 PM, ben boggess wrote:
> > >
> > >> We are trying to convert a Lucene-based search solution to a
> > >> Solr/Lucene-based solution. The problem we have is that we currently have
> > >> our data split into many indexes and Solr expects things to be in a single
> > >> index unless you're sharding. In addition to this, our indexes wouldn't
> > >> work well using the distributed search functionality in Solr because the
> > >> documents are not evenly or randomly distributed. We are currently using
> > >> Lucene's MultiSearcher to search over subsets of these indexes.
> > >>
> > >> I know this has been brought up a number of times in previous posts and the
> > >> typical response is that the best thing to do is to convert everything into
> > >> a single index. One of the major reasons for having the indexes split up
> > >> the way we do is because different types of data need to be indexed at
> > >> different intervals. You may need one index to be updated every 20 minutes
> > >> and another is only updated every week. If we move to a single index, then
> > >> we will constantly be warming and replacing searchers for the entire
> > >> dataset, and will essentially render the searcher caches useless. If we
> > >> were able to have multiple indexes, they would each have a searcher and
> > >> updates would be isolated to a subset of the data.
> > >>
> > >> The other problem is that we will likely need to shard this large single
> > >> index and there isn't a clean way to shard randomly and evenly across all
> > >> of the data. We would, however, like to shard a single data type. If we
> > >> could use multiple indexes, we would likely also be sharding a small
> > >> subset of them.
> > >>
> > >> Thanks in advance,
> > >>
> > >> Ben
> > >>
> >
>