Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

I believe you'll have to replace the default Similarity class with one of your own. I'm not sure exactly what the settings should be; maybe other list members can give you specifics. Otherwise, you'll probably have to experiment with it.

Regards,
Terry

----- Original Message -----
From: Joachim Schreiber [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, March 23, 2004 10:05 AM
Subject: Similarity - position in Field[] effects scoring - how to change?

Hello,

I have run into the following problem. Perhaps somebody can help me.

I have an index with different ids in the same field, something like:

s s45678565 s87854546

Situation: I have different documents with the entry s in the same index.

document 1) s324235678565 s324dssd5678565 s45678324565 s s8785454324326
document 2) s324235678565 s s45678324565 s8785454324326

When I search for s, I receive both docs, but document 1 has a better score than document 2. The position of s in doc 1 is Field[4] and in doc 2 it's Field[2], so this seems to affect scoring.

How can I disable this behaviour so that doc 1 has the same score as doc 2? Which method do I have to override in DefaultSimilarity?

Does anybody have an idea? Any help is welcome.

Thanks,
yo
Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

Why don't you use the explain method of IndexSearcher?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html

It is the best way to find out why your documents score differently. I suspect the lengthNorm method, which is applied at indexing time.

Julien
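To make the lengthNorm suspicion concrete, here is a minimal standalone sketch of the arithmetic. It assumes the default length normalization is 1/sqrt(number of terms in the field); the class and method names are hypothetical and no Lucene classes are used. The term counts (five ids in document 1, four in document 2) are taken from the example documents in the question.

```java
/** Standalone model of field-length normalization; not Lucene code. */
final class LengthNormSketch {

    // Assumed default formula: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical replacement: a constant makes field length irrelevant.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int doc1Terms = 5; // document 1 in the question holds five ids
        int doc2Terms = 4; // document 2 holds four ids
        System.out.println("doc 1 norm: " + lengthNorm(doc1Terms));
        System.out.println("doc 2 norm: " + lengthNorm(doc2Terms));
        System.out.println("flat norms: " + flatLengthNorm(doc1Terms)
                + " and " + flatLengthNorm(doc2Terms));
    }
}
```

Under this assumption the shorter field gets the larger factor, so length normalization alone would favour document 2, not document 1. A Similarity subclass whose lengthNorm returns a constant would remove the length effect, but because the value is computed at indexing time, the documents would have to be indexed again for it to apply.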
RE: Similarity - position in Field[] effects scoring - how to change?
Thanks to Daniel, the solution is quite simple: use the latest CVS source from the head and try the new sorting feature. It works very well ;-)

This should be documented somewhere, perhaps in the wiki. Cool new feature!

yo
Re: Similarity - position in Field[] effects scoring - how to change?
On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote:
> When I search for s, I receive both docs, but document 1 has a better score than document 2.

Since the s field of document 2 is shorter, I'd expect document 2 to score higher. As mentioned, lengthNorm() is responsible for this. Something does not add up here. Are the documents in the same index?

> The position of s in doc 1 is Field[4] and in doc 2 it's Field[2], so this seems to affect scoring.

Lucene's default scoring is independent of absolute term positions.

> How can I disable this behaviour so that doc 1 has the same score as doc 2?

Simply ignore the score. The easiest way is to use the low-level scoring API with your own HitCollector. Just make sure not to retrieve document field values until you have collected all your hits.

> Which method do I have to override in DefaultSimilarity?

In which order do you want the resulting documents presented? The low-level API gives them in index order when the query consists of a single search term, afaik.

Regards,
Ype
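The "ignore the score" suggestion can be sketched as follows. This is a standalone model, not Lucene code: HitCallback is a hypothetical stand-in with the same shape as the HitCollector callback (a document number plus a score), and collectAll plays the role of the searcher feeding hits to the collector.

```java
import java.util.ArrayList;
import java.util.List;

final class ScoreIgnoringCollectorSketch {

    /** Hypothetical stand-in for the collector callback; not the real Lucene class. */
    interface HitCallback {
        void collect(int doc, float score);
    }

    /** Records matching document numbers in arrival order and discards the score. */
    static final class DocIdCollector implements HitCallback {
        final List<Integer> docIds = new ArrayList<>();

        @Override
        public void collect(int doc, float score) {
            docIds.add(doc); // the score is deliberately unused
        }
    }

    /** Hypothetical driver that feeds hits to the collector the way a searcher would. */
    static List<Integer> collectAll(int[] docs, float[] scores) {
        DocIdCollector collector = new DocIdCollector();
        for (int i = 0; i < docs.length; i++) {
            collector.collect(docs[i], scores[i]);
        }
        return collector.docIds;
    }
}
```

Only document numbers are kept while collecting, which matches the advice to postpone retrieving field values until all hits are in. With several hundred thousand hits, a more compact structure than a list of boxed integers would probably be preferable.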
Re: Similarity - position in Field[] effects scoring - how to change?
Julien wrote:
> Why don't you use the explain method of IndexSearcher?
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html
> It is the best way to find out why your documents score differently. I suspect the lengthNorm method, which is applied at indexing time.

Yes, but I think this is not a good choice, because we would have to retrieve all the docs. That is not possible, because I have searches with 300,000 hits and more.

yo
Re: Similarity - position in Field[] effects scoring - how to change?
Terry,

> I believe you'll have to replace the default Similarity class with one of your own. Not sure exactly what the settings should be - maybe some other list members can give you specifics. Otherwise, you'll probably have to experiment with it.

I tried the new sort feature from CVS and it works well! But it's interesting: it seems to me that nobody knows exactly how scoring works ;-)

Thanks,
yo
Re: Similarity - position in Field[] effects scoring - how to change?
Ype wrote:
> Lucene's default scoring is independent of absolute term positions.

Hm...

> Simply ignore the score. The easiest way is to use the low-level scoring API with your own HitCollector. Just make sure not to retrieve document field values until you have collected all your hits.

Do you think it's possible to order by e.g. a date field without retrieving all the values from the index?

> In which order do you want the resulting documents presented? The low-level API gives them in index order when the query consists of a single search term, afaik.

Index order is OK, but not very flexible.

Regards,
yo
Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

> ... you think it's possible to order by e.g. a date field without retrieving all the values from the index?

Yes, the new sorting feature from CVS does that; see Doug's last note on the subject. (It might have been on lucene-dev; I didn't keep a copy.)

Have fun,
Ype
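The thread does not say how the CVS sorting feature is implemented. As a conceptual sketch only, ordering hits by a field without loading the stored documents could work like this, assuming a per-document array of field values is available; the dateByDoc array and all names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class FieldSortSketch {

    /**
     * Orders matching document numbers by a per-document value array.
     * dateByDoc[doc] is a hypothetical in-memory table of the date field,
     * with one entry per document number in the index.
     */
    static Integer[] sortByField(int[] matchingDocs, final long[] dateByDoc) {
        Integer[] ordered = new Integer[matchingDocs.length];
        for (int i = 0; i < matchingDocs.length; i++) {
            ordered[i] = matchingDocs[i];
        }
        // The sort is stable, so documents with equal dates keep their incoming order.
        Arrays.sort(ordered, Comparator.comparingLong((Integer doc) -> dateByDoc[doc]));
        return ordered;
    }
}
```

The comparison touches only the array, so no stored document has to be fetched during sorting; the documents themselves are needed only for the hits that are finally displayed.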