Vic Bancroft wrote on 10/17/2006 02:44 AM:
> In some of my group's usage of Lucene over large document collections,
> we have split the documents across several machines.  This has led to
> a concern about whether the inverse document frequency is appropriate,
> since the score seems to be dependent on the partitioning of documents
> over indexing hosts.  We have not formulated an experiment to
> determine whether it seriously affects our results, though it has been
> discussed.

What version of Lucene are you using?  Are you using
ParallelMultiSearcher to manage the distributed indexes or have you
implemented your own mechanism?  There was a bug a couple of years ago,
in the 1.4.3 version as I recall, where ParallelMultiSearcher was not
computing df's correctly, but that has been fixed for a long time now.
The df for a term is the sum of its df's from each distributed index
and is thus independent of the partitioning.
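
To illustrate why summed df's do not depend on the partitioning, here is
a minimal sketch in plain Java.  It does not use any Lucene classes; the
class name, method names, and document counts are all made up for the
example.  The idf formula is the classic log(numDocs / (docFreq + 1)) + 1
form, assumed here only for illustration.

```java
import java.util.Arrays;

// Illustrative sketch only: plain Java, no Lucene classes involved.
final class GlobalIdfSketch {

    // Classic-style idf, assumed for illustration:
    // log(numDocs / (docFreq + 1)) + 1
    static double idf(long docFreq, long numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // A global statistic is the sum of the per-index statistics.
    static long sum(long[] perIndex) {
        return Arrays.stream(perIndex).sum();
    }

    public static void main(String[] args) {
        // Hypothetical collection: 1000 documents, 30 contain the term.
        // Partitioning A: two indexes.
        long[] dfA = {10, 20};
        long[] docsA = {500, 500};
        // Partitioning B: the same documents spread over three indexes.
        long[] dfB = {1, 4, 25};
        long[] docsB = {100, 200, 700};

        // Using only local statistics, each index gets a different idf.
        for (int i = 0; i < dfB.length; i++) {
            System.out.println("local idf, index " + i + ": "
                    + idf(dfB[i], docsB[i]));
        }

        // Using summed statistics, both partitionings agree.
        System.out.println("global idf, partitioning A: "
                + idf(sum(dfA), sum(docsA)));
        System.out.println("global idf, partitioning B: "
                + idf(sum(dfB), sum(docsB)));
    }
}
```

Both partitionings sum to the same df and the same document count, so
the idf computed from the sums is identical no matter how the documents
were split.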

Chuck

