On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote:
> Hallo,
>
> I run in following problem. Perhaps somebody can help me.
>
> I have a index with different ids in the same field
> something like
>
> <s>00000000
> <s>45678565
> <s>87854546
>
> Situation: I have different documents with the entry <s>00000000 in the
> same index.
>
>
> document 1)
>
> <s>324235678565
> <s>324dssd5678565
> <s>45678324565
> <s>00000000
> <s>8785454324326
>
>
> document 2)
>
> <s>324235678565
> <s>00000000
> <s>45678324565
> <s>8785454324326
>
>
>
> when I search for "  s:00000000 "  I receive both docs, but document 1 has
> a better scoring than document 2.

Since the s field of document 2 is shorter, I'd expect document 2 to score 
higher. As mentioned, lengthNorm() is responsible for this.
Something does not add up here. Are the documents in the same index?

> The position of <s>00000000 in doc 1 is Field[4] and in doc 2 it's
> Field[2], so this seems to effect scoring.

Lucene's default scoring is independent of absolute term positions.

> How can I disable this behaviour, so doc 1 has the same scoring as doc 2???

Simply ignore the score. The easiest way is to use the low level scoring API
with your own HitCollector. Just make sure not to retrieve document field
values until you collected all your hits.

> Which method do I have to overwrite in DefaultSimilarity.
> Has anybody any idea, any help.

In which order to you want the resulting documents presented?
The low level api gives them in index order when the query consists
of single search term, afaik.

Regards,
Ype


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to