On Tue, Nov 24, 2009 at 9:31 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote:
> Hello,
>
> Regarding that monstrous term->idf map.
> Is this something that one could use to adjust the scores in
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations scenario?
> Say a map like that was created periodically for each shard and
> distributed to all other nodes (so in the end each node has all maps
> locally). Couldn't the local scorer in the Solr instance (and in
> distributed Lucene setup) consult idfs for relevant terms in all those maps
> and adjust the scores of local scores before returning results?
>

Why would you want all nodes to have all maps? Why not merge them into one
map, then redistribute it to all nodes? One merged map would be far smaller
than many per-shard maps anyway.

Then yes, the scoring can be done locally, using this big idfMap for idf
instead of reader.docFreq(). That's what I do.

But then what are you implying should be done? Just rescale the top scores
based on the idfs before returning your top results? You'd need to know
exactly which terms hit those top-scoring documents, right? Which implies
basically the cost of explain(), doesn't it?

With the per-field scoring (the thing I do to be able to train on sub-query
field match scores), this gets easier, because you can try to hang onto this
information if the query isn't too big. But this isn't something normal
BooleanQueries will handle for you naturally.

  -jake
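
To make the merge-then-redistribute idea concrete, here is a minimal sketch in Python rather than Lucene code. It assumes each shard exports its raw document count and per-term docFreq values (idf values alone cannot be merged correctly), and it uses the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)). The function names and the shape of the shard data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def merge_shard_stats(shards):
    """Combine per-shard statistics into global ones.

    shards: iterable of (num_docs, {term: doc_freq}) pairs, one per shard.
    This input shape is an assumption made for the sketch.
    """
    total_docs = 0
    global_df = Counter()
    for num_docs, doc_freqs in shards:
        total_docs += num_docs
        global_df.update(doc_freqs)  # adds doc_freq counts term by term
    return total_docs, dict(global_df)


def build_idf_map(total_docs, global_df):
    """Build one term -> idf map from the merged counts.

    Uses idf = 1 + ln(numDocs / (docFreq + 1)), the classic Lucene formula.
    """
    return {
        term: 1.0 + math.log(total_docs / (df + 1))
        for term, df in global_df.items()
    }


def lookup_idf(idf_map, total_docs, term):
    """Return the global idf, treating an unseen term as docFreq == 0."""
    return idf_map.get(term, 1.0 + math.log(total_docs))


if __name__ == "__main__":
    shards = [
        (100, {"lucene": 10, "solr": 5}),
        (300, {"lucene": 30}),
    ]
    total_docs, global_df = merge_shard_stats(shards)
    idf_map = build_idf_map(total_docs, global_df)
    for term in ("lucene", "solr", "hadoop"):
        print(term, lookup_idf(idf_map, total_docs, term))
```

The single merged map is what each node would consult at scoring time in place of its local docFreq, so scores from different shards are computed against the same statistics and need no rescaling afterwards.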