Re: Multiple indexes inside a single core

2010-10-29 Thread Valli Indraganti
Here's the JIRA issue for the distributed search problem:
https://issues.apache.org/jira/browse/SOLR-1632

I tried applying this patch but got the same error that is posted in the
discussion section of that issue. I would be glad to help with this one too.

On Sat, Oct 23, 2010 at 2:35 PM, Erick Erickson wrote:

> Ah, I should have read more carefully...
>
> I remember this being discussed on the dev list, and I thought there might
> be a Jira attached, but I sure can't find it.
>
> If you're willing to work on it, you might hop over to the Solr dev list
> and start a discussion, maybe ask for a place to start. I'm sure some of
> the devs have thought about this...
>
> If nobody on the dev list says "There's already a JIRA on it", then you
> should open one. JIRA issues are generally preferred once you start getting
> into design because the comments are preserved for the next person who
> tries the idea or makes changes, etc.
>
> Best
> Erick
>
> On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess wrote:
>
> > Thanks Erick.  The problem with multiple cores is that the documents are
> > scored independently in each core.  I would like to be able to search
> > across both cores and have the scores 'normalized' in a way that's similar
> > to what Lucene's MultiSearcher would do.  As far as I understand, multiple
> > cores would likely result in seriously skewed scores in my case, since the
> > documents are not distributed evenly or randomly.  I could have one
> > core/index with 20 million docs and another with 200.
> >
> > I've poked around in the code and this feature doesn't seem to exist.  I
> > would be happy to find a decent place to try to add it, though I'm not
> > sure if there is a clean place for it.
> >
> > Ben
> >
> > On Oct 20, 2010, at 8:36 PM, Erick Erickson wrote:
> >
> > > It seems to me that multiple cores are along the lines of what you
> > > need: a single instance of Solr that can search across multiple
> > > sub-indexes that do not necessarily share schemas and are
> > > independently maintainable.
> > >
> > > This might be a good place to start:
> > > http://wiki.apache.org/solr/CoreAdmin
> > >
> > > HTH
> > > Erick
> > >
> > > On Wed, Oct 20, 2010 at 3:23 PM, ben boggess wrote:
> > >
> > >> We are trying to convert a Lucene-based search solution to a
> > >> Solr/Lucene-based solution.  The problem we have is that we currently
> > >> have our data split into many indexes, and Solr expects things to be in
> > >> a single index unless you're sharding.  In addition to this, our indexes
> > >> wouldn't work well with the distributed search functionality in Solr
> > >> because the documents are not evenly or randomly distributed.  We are
> > >> currently using Lucene's MultiSearcher to search over subsets of these
> > >> indexes.
> > >>
> > >> I know this has been brought up a number of times in previous posts, and
> > >> the typical response is that the best thing to do is to convert
> > >> everything into a single index.  One of the major reasons for having the
> > >> indexes split up the way we do is that different types of data need to
> > >> be indexed at different intervals.  One index may need to be updated
> > >> every 20 minutes while another is only updated every week.  If we move
> > >> to a single index, then we will constantly be warming and replacing
> > >> searchers for the entire dataset, which will essentially render the
> > >> searcher caches useless.  If we were able to have multiple indexes, they
> > >> would each have a searcher, and updates would be isolated to a subset of
> > >> the data.
> > >>
> > >> The other problem is that we will likely need to shard this large single
> > >> index, and there isn't a clean way to shard randomly and evenly across
> > >> the data.  We would, however, like to shard a single data type.  If we
> > >> could use multiple indexes, we would likely also be sharding a small
> > >> subset of them.
> > >>
> > >> Thanks in advance,
> > >>
> > >> Ben
> > >>
> >
>
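
To make Ben's point about skewed scores concrete, here is a minimal sketch using the classic Lucene DefaultSimilarity IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). The document frequencies are hypothetical numbers chosen for illustration, not figures from this thread: one query term is assumed to appear in 1,000 of the 20 million documents in the large core and in 100 of the 200 documents in the small core. Each core computes IDF from its own statistics, so the same term gets very different weights than it would if both cores were scored as one index with aggregated statistics (the kind of normalization Ben is after with MultiSearcher).

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """DefaultSimilarity-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical per-core statistics for a single query term.
cores = {
    "large": {"num_docs": 20_000_000, "doc_freq": 1_000},
    "small": {"num_docs": 200, "doc_freq": 100},
}

# IDF each core computes on its own (what independently scored cores see).
local_idf = {
    name: classic_idf(s["doc_freq"], s["num_docs"]) for name, s in cores.items()
}

# IDF if both cores were treated as one index (aggregated statistics).
total_docs = sum(s["num_docs"] for s in cores.values())
total_df = sum(s["doc_freq"] for s in cores.values())
global_idf = classic_idf(total_df, total_docs)

for name, idf in local_idf.items():
    print(f"{name}: local idf = {idf:.2f}, global idf = {global_idf:.2f}")
```

With these made-up numbers the small core weights the term at about 1.68 locally versus about 10.81 globally, so its hits would be badly underweighted next to hits from the large core.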


query results file for trec_eval

2010-10-19 Thread Valli Indraganti
Hello!

I am a student, and I am trying to run an evaluation on TREC-format
documents. I have the relevance judgments. I would like to get the output of
my queries in a form I can use with the trec_eval software. Can someone
please point me to how to make Solr produce output in this format? Or at
least point me to some material that guides me through this.

Thanks,
Valli
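
trec_eval reads a run file with one line per retrieved document in the six-column form `query_id Q0 doc_id rank score run_tag`. One possible approach is to ask Solr for the unique key and the score (for example with `fl=id,score` and `wt=json`) and convert each response. This is a minimal sketch under assumptions: the unique key field is named `id` and holds the TREC document number, and the run tag `solr_run`, the query id `401`, and the document ids in the example are made-up placeholders. It works on an already-parsed JSON response, so fetching results over HTTP is left out.

```python
from typing import Any, Dict, List

def solr_to_trec_run(qid: str, solr_response: Dict[str, Any],
                     run_tag: str = "solr_run") -> List[str]:
    """Convert one parsed Solr JSON response into trec_eval run-file lines.

    Assumes the query was sent with fl=id,score so every doc has both fields.
    """
    lines = []
    for rank, doc in enumerate(solr_response["response"]["docs"], start=1):
        # Columns: query id, literal "Q0", doc number, rank, score, run tag.
        lines.append(f"{qid} Q0 {doc['id']} {rank} {doc['score']:.6f} {run_tag}")
    return lines

# A response shaped like Solr's JSON output; the values are made up.
example = {"response": {"numFound": 2, "start": 0, "maxScore": 1.2,
                        "docs": [{"id": "FT911-3", "score": 1.2},
                                 {"id": "FT911-7", "score": 0.8}]}}

for line in solr_to_trec_run("401", example):
    print(line)
```

Writing the lines for all topics into one file gives a run that can be passed to trec_eval after the qrels file (`trec_eval qrels_file run_file`). trec_eval orders documents by the score column rather than the rank column, so requesting the score matters.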


Multiple Indexes and relevance ranking question

2010-09-30 Thread Valli Indraganti
I am new to Solr and search technologies, and I am playing around with
multiple indexes. I configured Solr for Tomcat and created two Tomcat context
fragments so that two Solr webapps are served on port 8080. I have
successfully created two separate indexes, one with each webapp.

My documents are very primitive; the structure is shown below. I have four
such documents, each with a different doc id and a number of occurrences of
the word "Hello" that increases with the name of the document (this is only
to make my analysis of the results easier). Documents one and two are in
shard 1, and three and four are in shard 2. As expected, document two is
ranked higher when querying the first index for the word Hello, and document
four is ranked higher when querying the second index. When using the shards
parameter, the scores remain unaltered.
My question is: if distributed search does not consider global IDF, how is
it able to rank these documents correctly? Or are my indexes not truly
distributed? Is something wrong with my term distribution?


   Valli1
   One
   Hello!This is a test document testing relevancy scores.
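
A possible explanation, assuming classic Lucene tf × idf scoring with IDF computed locally on each shard: in this test every document contains "Hello" and each shard holds two documents, so both shards compute exactly the same IDF (docFreq = 2, numDocs = 2). With identical IDFs, per-shard scores are directly comparable and the ordering by term frequency survives the merge. The sketch below is only an illustration: it assumes, as the description suggests, that document N contains "Hello" N times, and it ignores length norms and every other scoring factor.

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    # DefaultSimilarity-style IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def classic_tf(freq: int) -> float:
    # DefaultSimilarity-style tf: square root of the raw term frequency.
    return math.sqrt(freq)

# Hypothetical reading of the test setup: doc N contains "Hello" N times.
shards = {"shard1": {"One": 1, "Two": 2}, "shard2": {"Three": 3, "Four": 4}}

scores = {}
for shard, docs in shards.items():
    # Every doc in the shard contains the term, so docFreq == numDocs == 2,
    # which gives both shards the same local IDF.
    idf = classic_idf(doc_freq=len(docs), num_docs=len(docs))
    for name, freq in docs.items():
        scores[name] = classic_tf(freq) * idf  # tf x idf only, no norms

merged = sorted(scores, key=scores.get, reverse=True)
print(merged)  # ['Four', 'Three', 'Two', 'One']
```

If shard 2 instead held many additional documents without "Hello", its local IDF would rise above shard 1's and the merged order could change even though the term frequencies stayed the same.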