Distributed index

Hans Lund Mon, 26 Feb 2007 14:08:26 -0800

I'm trying to tailor Lucene to support a simple distributed index
for searching.


Currently I have created a simple class that generates a distributed
stats index file (when used on an FSDirectory), which in principle
contains a) the total document count and b) updated TermInfo for all
terms in an index partition.
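To make the idea concrete, here is a minimal sketch in plain Java (not the Lucene API) of how the global statistics could be derived from per-partition counts. PartitionStats and its fields are hypothetical names used only for illustration, and terms are plain strings rather than Lucene Term objects. It assumes every document lives in exactly one partition; once docs are duplicated across partitions, a plain sum would count them more than once.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-partition statistics: document count plus document frequency per term. */
public class PartitionStats {
    public final int numDocs;
    public final Map<String, Integer> docFreqs;

    public PartitionStats(int numDocs, Map<String, Integer> docFreqs) {
        this.numDocs = numDocs;
        this.docFreqs = docFreqs;
    }

    /** Sums counts across partitions; assumes no document is stored in more than one partition. */
    public static PartitionStats merge(List<PartitionStats> parts) {
        int total = 0;
        Map<String, Integer> merged = new HashMap<>();
        for (PartitionStats p : parts) {
            total += p.numDocs;
            for (Map.Entry<String, Integer> e : p.docFreqs.entrySet()) {
                // Add this partition's document frequency to the running global count.
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return new PartitionStats(total, merged);
    }
}
```

The merged object is what the stats index file would hold: one global document count and one global document frequency per term.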

Then I have implemented a simple FilterIndexReader subclass
(DistributedIndexReader) that overrides

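public int numDocs() {
    // return the global document count from the stats index file
}

public int docFreq(Term t) {
    // return the global document frequency of t from the stats index file
}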

to look up values in the stats index file.

Now, instantiating a Searchable on the DistributedIndexReader should,
in principle, score docs as if they had been in the full index. Is that
correct?
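The reasoning can be sketched as follows, again in plain Java. The formula is the idf used by Lucene's default Similarity as I understand it, log(numDocs / (docFreq + 1)) + 1; IdfSketch is a hypothetical name, and the numbers in the usage note are made up for illustration.

```java
/** Hypothetical helper showing how idf depends on which statistics are fed into it. */
public class IdfSketch {
    /** Inverse document frequency: log(numDocs / (docFreq + 1)) + 1. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** True when local and global statistics would give a term a different idf weight. */
    public static boolean differs(int localDf, int localDocs, int globalDf, int globalDocs) {
        return Math.abs(idf(localDf, localDocs) - idf(globalDf, globalDocs)) > 1e-9;
    }
}
```

For a term that occurs in 49 of the 100 docs of one partition but in only 99 of 1000 docs overall, idf(49, 100) is about 1.69 while idf(99, 1000) is about 3.30. Feeding every partition the global counts is what should make scores from different nodes comparable.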

Of course, updating such an index introduces a level of complexity
compared to a regular index, but that could be handled by creating a
DistributingIndexWriter (indexing to an array of Directory) and adding
an init method that takes an additional int setting the level of
redundancy in the index, thereby making the final index fault tolerant
(by duplicating docs from one partial index to the other partial
indexes). This would be coupled with a MasterDistSearcher whose only
function would be to dispatch a query to all nodes, merge the results,
and remove duplicates.
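A rough sketch of the two pieces, in plain Java without any Lucene classes. Everything here is an assumption made for illustration: the DistSketch and Hit names, a docKey that is unique across the whole index, placement by hash onto consecutive partitions, and duplicate removal that keeps the best-scoring copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DistSketch {
    /** A hit as a node might report it: a key unique across the whole index, plus a score. */
    public static final class Hit {
        public final String docKey;
        public final float score;

        public Hit(String docKey, float score) {
            this.docKey = docKey;
            this.score = score;
        }
    }

    /** Picks up to `redundancy` distinct partitions: a primary chosen by hash, then its successors. */
    public static List<Integer> partitionsFor(String docKey, int numPartitions, int redundancy) {
        int copies = Math.min(redundancy, numPartitions);
        int primary = Math.floorMod(docKey.hashCode(), numPartitions);
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            targets.add((primary + i) % numPartitions);
        }
        return targets;
    }

    /** Merges per-node hit lists, keeps one hit per document key, and sorts by descending score. */
    public static List<Hit> merge(List<List<Hit>> perNode, int topN) {
        Map<String, Hit> best = new HashMap<>();
        for (List<Hit> hits : perNode) {
            for (Hit h : hits) {
                Hit seen = best.get(h.docKey);
                if (seen == null || h.score > seen.score) {
                    best.put(h.docKey, h);
                }
            }
        }
        List<Hit> merged = new ArrayList<>(best.values());
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged.subList(0, Math.min(topN, merged.size()));
    }
}
```

With redundancy 2 and four partitions, each doc lands on its primary partition and the next one, so losing a single node leaves every doc searchable. The merge step only removes duplicates correctly if all nodes report the same key for the same doc.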


Any thoughts about this setup? Any pitfalls I have missed?


What I hope to accomplish is to be able to run fairly large (>100GB)
and fairly static indexes on 'cheap' equipment, preferably with the
index running in a RAMDirectory.


Cheers
Hans Lund

