using previous results on a new search
Hello, I am new to Lucene and, more generally, to search engines. As my company has decided to base its new software on Lucene, I have a first question about Lucene's querying functionality. We are investigating the possibility of feeding previous search results into a new query. Does anyone know whether this is possible, or whether such a feature is under development? Thanks, Antoine Brun
score and frequency
Hi, I am having some problems with Lucene's scoring. I am displaying the results ordered by hits.score, and the ordering comes out correctly. However, I do not want the frequency factor to be used in the computation of the score. Is it possible to get a score that does not include the frequency factor? Regards, Niraj
Re: using previous results on a new search
p.s. This ought to go on the wiki :) It's now included in a Lucene FAQ. Otis - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: using previous results on a new search
On Jun 4, 2004, at 3:07 AM, Antoine Brun wrote:
> We are investigating the possibility to insert previous search results to a new query. Does anyone knows if it is possible or if such an evolution is under development

I suppose you mean search within search, so that the second search is constrained by the results of the first query. If so, there are two primary options:

- Use QueryFilter with the previous query as the filter (search the archives for QueryFilter and Doug's recommendations against using it for this purpose).
- Combine the previous query with the current query using BooleanQuery, with the previous query as a required clause.

The BooleanQuery approach is the recommended one.

Erik

p.s. This ought to go on the wiki :)
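Erik's second option can be pictured with a small, library-free model. The sketch below does not use Lucene's API: `ToyQuery`, `term`, `allOf` and `search` are hypothetical names, and documents are plain strings. It only illustrates why making both the previous and the current query required clauses restricts the new search to the earlier results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical and are not Lucene classes.
class SearchWithinSearch {

    // A query is anything that can say whether a document matches.
    interface ToyQuery {
        boolean matches(String doc);
    }

    // Matches documents containing the given word (case-insensitive, whitespace-tokenized).
    static ToyQuery term(final String word) {
        return doc -> Arrays.asList(doc.toLowerCase().split("\\s+")).contains(word.toLowerCase());
    }

    // Boolean conjunction: every clause is required.
    static ToyQuery allOf(final ToyQuery... clauses) {
        return doc -> {
            for (ToyQuery q : clauses) {
                if (!q.matches(doc)) return false;
            }
            return true;
        };
    }

    // Returns the documents that match the query, in their original order.
    static List<String> search(List<String> docs, ToyQuery query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (query.matches(doc)) hits.add(doc);
        }
        return hits;
    }
}
```

Calling `search(docs, allOf(previous, current))` returns only documents that matched the previous query and also match the current one, which is the "search within search" behaviour asked about.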
Re: score and frequency
Hi Erik,

Thanks for the suggestion. I tried this:

    public class RelevanceSimilarity extends DefaultSimilarity {
        public float tf(float freq) {
            System.out.println("discounting frequency");
            return (float)1;
        }
    }

and in my query class, I used:

    Similarity.setDefault(similarity);
    Hits hits = is.search(query);
    for (i = 0; i < hits.length(); i++)
        result = result + hits.score(i);

However, this is still not giving me the expected result. Do I need to do something else?

Regards,
Niraj

----- Original Message -----
From: Erik Hatcher
To: Lucene Users List
Sent: Friday, June 04, 2004 1:55 PM
Subject: Re: score and frequency

On Jun 4, 2004, at 2:52 AM, Niraj Alok wrote:
> Hi, I am having some problems with the score of lucene. I am trying to get the results displayed according to hits.score and it is giving the results correctly. However I do not want the frequency factor to be used for the computation of the score. Is it possible to get the score which does not have the frequency factor in it ?

Have a look at the javadocs for Similarity. DefaultSimilarity is used unless otherwise specified. You could subclass that and override this:

    public float tf(float freq) { return (float)Math.sqrt(freq); }

and return 1.0. This might give you the effect you want.

Erik
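To see what the suggested override changes, here is a small stand-alone sketch. It is not Lucene code: `TfRule` and `termScore` are hypothetical names, and the score is reduced to tf times idf purely for illustration, whereas the real formula has more factors. The square-root rule is the default quoted in Erik's message; the constant rule is the override.

```java
// Simplified illustration; not Lucene code. The real score has more factors.
class TfDemo {

    // Stand-in for the tf() hook of a similarity class.
    interface TfRule {
        float tf(float freq);
    }

    // Default rule quoted in the thread: square root of the raw frequency.
    static final TfRule SQRT_TF = freq -> (float) Math.sqrt(freq);

    // Override suggested in the thread: ignore how often the term occurs.
    static final TfRule CONSTANT_TF = freq -> 1.0f;

    // One term's contribution, reduced to tf * idf for the sake of the example.
    static float termScore(TfRule rule, float freq, float idf) {
        return rule.tf(freq) * idf;
    }
}
```

With the constant rule, a document that contains the term nine times contributes exactly as much for that term as one that contains it once; with the square-root rule it contributes three times as much.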
Author or SearchBean
Hi! Where can I get the mail address of the author of SearchBean (sandbox) from? Timo
Re: Author or SearchBean
SearchBean should be discussed on this list - no need to contact the original developer directly (in fact, it's a better practice to discuss open source code in the appropriate public forums). Erik On Jun 4, 2004, at 5:56 AM, [EMAIL PROTECTED] wrote: Hi! Where can I get the mail address of the author of SearchBean (sandbox) from? Timo
why the score is not always 1.0 when comparing two identical strings?
Hi,

I'm not convinced by the way Lucene computes the score. I wrote a program to compare two strings: I index the first string as if it were a document, then use QueryParser, with the same analyzer that I used for indexing, to analyze the second string and form a query from it.

On the first run, the first string was:

This is the text to index with Lucene CREATE TABLE Elements ( TYPELEMENT varchar (255) NULL , CLEELEMENT varchar (255) NULL , LIBELEM varchar (255) NULL , CODENTITE varchar (255) NULL , CLEENTITE varchar (255) NULL , DONNNEEA1 varchar (255) NULL , DONNEEB1 varchar (255) NULL , DONNEEA2 varchar (255) NULL , DONNEEB2 varchar (255) NULL , DONNEEA3 varchar (255) NULL , DONNEEB3 varchar (255) NULL , DONNEEA4 varchar (255) NULL , DONNEEB4 varchar (255) NULL , DONNEEA5 varchar (255) NULL , DONNEEB5 varchar (255) NULL , TOP1 varchar (255) NULL , TOP2 varchar (255) NULL , TOP3 varchar (255) NULL , TOP4 varchar (255) NULL , TOP5 varchar (255) NULL , QTE1 varchar (255) NULL , QTE2 varchar (255) NULL , QTE3 varchar (255) NULL , MONTANT1 varchar (255) NULL , MONTANT2 varchar (255) NULL , MONTANT3 varchar (255) NULL , DATE1 varchar (255) NULL , DATE2 varchar (255) NULL , DATE3 varchar (255) NULL , STATUT varchar (255) NULL , DATPRISENCPTSTAT varchar (255) NULL ).

I used the same string to form my query, and the final score for the two strings was 1.0.

Then something surprised me. I changed both strings to "All work and no play makes Jack a dull boy" and compared them the same way, using one as the document and the other to form the query. This time the result was not 1.0; it was 0.3033.. instead.

I use Eclipse as my Java editor. Could there be a conflict with Lucene? Any idea or suggestion as to what went wrong here?

Uddam
Re: why the score is not always 1.0 when comparing two identical strings?
You're not the first one to ask this question. I suggest you have a look at the mailing list archive and search for the messages titled 'Lucene Scoring Behavior'. Here is the link:

http://issues.apache.org/eyebrowse/SearchList?listId=&listName=lucene-user%40jakarta.apache.org&searchText=%22lucene+scoring+behavior%22&defaultField=subject&Search=Search

Cheers,
Franck

--
Franck Brisbart
R&D
http://www.kelkoo.com
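One likely explanation, offered here as an assumption and not as something stated in this thread: to my knowledge, the Hits class in Lucene releases of that period rescaled the scores it reported only when the best raw score was above 1.0. A long query that matches a long document can easily produce a raw score above 1.0, which is then shown as exactly 1.0, while a short query with a raw score below 1.0 is shown unchanged. The sketch below models that rule in plain Java; `normalize` is a hypothetical helper, not a library method.

```java
// Sketch of the rescaling rule described above; an assumption about old Lucene versions, not library code.
class ScoreNormalization {

    // Scores are expected in descending order, best hit first.
    static float[] normalize(float[] rawScores) {
        float[] shown = rawScores.clone();
        // Rescale only when the top score exceeds 1.0; smaller scores are left as they are.
        if (shown.length > 0 && shown[0] > 1.0f) {
            float scale = 1.0f / shown[0];
            for (int i = 0; i < shown.length; i++) {
                shown[i] *= scale;
            }
        }
        return shown;
    }
}
```

Under this rule a score of 1.0 means "best hit, raw score above 1.0" and not "identical strings", so two identical short strings need not score 1.0.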
Re: score and frequency
Hi to all.

Maybe the term frequency is not the only parameter you need to override to customize the score attributed by Lucene. Maybe you should consider the normalisation factor, the idf and the coord factor?

Philippe

From: Niraj Alok
Reply-To: Lucene Users List
To: Lucene Users List
Subject: Re: score and frequency
Date: Fri, 4 Jun 2004 15:13:32 +0530
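For orientation, two of the factors Philippe lists can be restated in plain Java. The formulas below are the ones I understand DefaultSimilarity to have used at the time (idf as log(numDocs / (docFreq + 1)) + 1, length normalisation as 1 / sqrt(numTerms)); treat them as an assumption to check against the Similarity javadocs. `ScoreFactors` is a hypothetical class name.

```java
// Plain-Java restatement of two scoring factors; not the Lucene class itself.
class ScoreFactors {

    // Rarer terms (small docFreq) weigh more.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Normalisation factor: matches in shorter fields weigh more.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}
```

Both factors still influence the score after tf has been overridden, which is one reason the ranking may not change in the way expected.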
Re: score and frequency
Hi,

Be careful to set the default similarity with 'Similarity.setDefault(similarity)' before creating your search instance (IndexSearcher). If you change the default similarity afterwards, the searcher will still use the old one. You would do better to call the 'searcher.setSimilarity' method on your searcher.

Franck
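The pitfall Franck describes is an ordinary "default read once at construction" effect. The toy model below reproduces it without Lucene; `ToySimilarity`, `ToySearcher` and `defaultSimilarity` are hypothetical stand-ins, and the assumption that the searcher copies the default when it is created is taken from Franck's description.

```java
// Toy model of the pitfall; ToySimilarity and ToySearcher are hypothetical, not Lucene classes.
class DefaultCaptureDemo {

    static class ToySimilarity {
        final String name;
        ToySimilarity(String name) { this.name = name; }
    }

    // Process-wide default, analogous in spirit to a static default similarity.
    static ToySimilarity defaultSimilarity = new ToySimilarity("default");

    static class ToySearcher {
        // The default is read once, when the searcher is created.
        private ToySimilarity similarity = defaultSimilarity;

        void setSimilarity(ToySimilarity s) { this.similarity = s; }
        ToySimilarity getSimilarity() { return similarity; }
    }
}
```

A searcher created before the default is changed keeps the old similarity; calling `setSimilarity` on that searcher is the explicit way to switch it.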
Re: Distributed searches and RAM Dir
> Look up Mark Harwood and Lucene. ..provided some nice sequential UML diagrams with notes

Those notes went missing recently when the ISP canned my free account. I've resurrected them at my new site here: http://www.inperspective.com/lucene/distrib/index.htm

Cheers
Mark
Re: problems with lucene in multithreaded environment
Jayant Kumar wrote:
> Please find enclosed jvmdump.txt which contains a dump of our search program after about 20 seconds of starting the program. Also enclosed is the file queries.txt which contains few sample search queries.

Thanks for the data. This is exactly what I was looking for.

> Thread-14 prio=1 tid=0x080a7420 nid=0x468e waiting for monitor entry [4d61a000..4d61ac18]
>   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
>   - waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)
> Thread-12 prio=1 tid=0x080a58e0 nid=0x468e waiting for monitor entry [4d51a000..4d51ad18]
>   at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:112)
>   - waiting to lock 0x44c95228 (a org.apache.lucene.index.TermInfosReader)

These are all stuck looking terms up in the dictionary (TermInfos). Things would be much faster if your queries didn't have so many terms.

> Query : ( ( ( ( ( FIELD1: proof OR FIELD2: proof OR FIELD3: proof OR FIELD4: proof OR FIELD5: proof OR FIELD6: proof OR FIELD7: proof )
> AND ( FIELD1: george bush OR FIELD2: george bush OR FIELD3: george bush OR FIELD4: george bush OR FIELD5: george bush OR FIELD6: george bush OR FIELD7: george bush ) )
> AND ( FIELD1: script OR FIELD2: script OR FIELD3: script OR FIELD4: script OR FIELD5: script OR FIELD6: script OR FIELD7: script ) )
> AND ( ( FIELD1: san OR FIELD2: san OR FIELD3: san OR FIELD4: san OR FIELD5: san OR FIELD6: san OR FIELD7: san )
> OR ( ( FIELD1: war OR FIELD2: war OR FIELD3: war OR FIELD4: war OR FIELD5: war OR FIELD6: war OR FIELD7: war )
> OR ( ( FIELD1: gulf OR FIELD2: gulf OR FIELD3: gulf OR FIELD4: gulf OR FIELD5: gulf OR FIELD6: gulf OR FIELD7: gulf )
> OR ( ( FIELD1: laden OR FIELD2: laden OR FIELD3: laden OR FIELD4: laden OR FIELD5: laden OR FIELD6: laden OR FIELD7: laden )
> OR ( ( FIELD1: ttouristeat OR FIELD2: ttouristeat OR FIELD3: ttouristeat OR FIELD4: ttouristeat OR FIELD5: ttouristeat OR FIELD6: ttouristeat OR FIELD7: ttouristeat )
> OR ( ( FIELD1: pow OR FIELD2: pow OR FIELD3: pow OR FIELD4: pow OR FIELD5: pow OR FIELD6: pow OR FIELD7: pow )
> OR ( FIELD1: bin OR FIELD2: bin OR FIELD3: bin OR FIELD4: bin OR FIELD5: bin OR FIELD6: bin OR FIELD7: bin ) ) ) ) ) ) ) ) )
> AND RANGE: ([ 0800 TO 1100 ])
> AND ( S_IDa: (7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 14 OR 15 OR 16 OR 17 ) or S_IDb: (2 ) )

All your queries look for terms in fields 1-7. If you instead combined the contents of fields 1-7 in a single field, and searched that field, then your searches would contain far fewer terms and be much faster.

Also, I don't know how many terms your RANGE queries match, but that could also be introducing large numbers of terms which would slow things down too.

But, still, you have identified a bottleneck: TermInfosReader caches a TermEnum and hence access to it must be synchronized. Caching the enum greatly speeds sequential access to terms, e.g., when merging, performing range or prefix queries, etc. Perhaps however the cache should be done through a ThreadLocal, giving each thread its own cache and obviating the need for synchronization...

Please tell me if you are able to simplify your queries and if that speeds things. I'll look into a ThreadLocal-based solution too.

Doug
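Doug's suggestion to fold fields 1-7 into one searchable field can be sketched without any Lucene API. The helper names below (`combine`, `termLookups`) are hypothetical, and counting lookups as "words times fields searched" is a simplifying assumption for illustration, not a measurement from the thread.

```java
import java.util.Map;

// Illustration only; field names and helpers are hypothetical, no Lucene API is used.
class CatchAllField {

    // Index-time step: concatenate the text of every field into one extra field.
    static String combine(Map<String, String> fields) {
        StringBuilder all = new StringBuilder();
        for (String value : fields.values()) {
            if (all.length() > 0) all.append(' ');
            all.append(value);
        }
        return all.toString();
    }

    // Number of term lookups a query needs when each word is tried in each field.
    static int termLookups(int words, int fieldsSearched) {
        return words * fieldsSearched;
    }
}
```

Under that assumption, a ten-word query costs 70 dictionary lookups across seven fields but only 10 against a single combined field. The trade-off is that the combined field no longer tells you which original field matched.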
Re: Writing a stemmer
Leo Galambos wrote:
> Erik Hatcher wrote:
> > > How proficient must I be in a language for which I wish to write the stemmer?
> > I would venture to say you would need to be an expert in a language to write a decent stemmer.
>
> I'm sorry for a self-promo ;), but the stemmer of egothor project can be adapted to any language, and you needn't be a language expert. Moreover, the stemmer achieves better F-measure than Porter's stemmers.

No reason to be too modest, Leo.. I tested your stemmer on English, Swedish and Polish texts (including F-measure vs. training set size plots), and it works exceptionally well indeed. Highly recommended!

--
Best regards,
Andrzej Bialecki
- Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
- FreeBSD developer (http://www.freebsd.org)
Re: problems with lucene in multithreaded environment
Doug Cutting wrote:
> Please tell me if you are able to simplify your queries and if that speeds things. I'll look into a ThreadLocal-based solution too.

I've attached a patch that should help with the thread contention, although I've not tested it extensively.

I still don't fully understand why your searches are so slow, though. Are the indexes stored on the local disk of the machine? Indexes accessed over the network can be very slow.

Anyway, give this patch a try. Also, if anyone else can try this and report back whether it makes multi-threaded searching faster, or anything else slower, or is buggy, that would be great.

Thanks,

Doug

Index: src/java/org/apache/lucene/index/TermInfosReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/TermInfosReader.java,v
retrieving revision 1.6
diff -u -u -r1.6 TermInfosReader.java
--- src/java/org/apache/lucene/index/TermInfosReader.java 20 May 2004 11:23:53 -0000 1.6
+++ src/java/org/apache/lucene/index/TermInfosReader.java 4 Jun 2004 21:45:15 -0000
@@ -29,7 +29,8 @@
   private String segment;
   private FieldInfos fieldInfos;

-  private SegmentTermEnum enumerator;
+  private ThreadLocal enumerators = new ThreadLocal();
+  private SegmentTermEnum origEnum;
   private long size;

   TermInfosReader(Directory dir, String seg, FieldInfos fis)
@@ -38,19 +39,19 @@
     segment = seg;
     fieldInfos = fis;

-    enumerator = new SegmentTermEnum(directory.openFile(segment + ".tis"),
-                                     fieldInfos, false);
-    size = enumerator.size;
+    origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
+                                   fieldInfos, false);
+    size = origEnum.size;
     readIndex();
   }

   public int getSkipInterval() {
-    return enumerator.skipInterval;
+    return origEnum.skipInterval;
   }

   final void close() throws IOException {
-    if (enumerator != null)
-      enumerator.close();
+    if (origEnum != null)
+      origEnum.close();
   }

   /** Returns the number of term/value pairs in the set. */
@@ -58,6 +59,15 @@
     return size;
   }

+  private SegmentTermEnum getEnum() {
+    SegmentTermEnum enum = (SegmentTermEnum)enumerators.get();
+    if (enum == null) {
+      enum = terms();
+      enumerators.set(enum);
+    }
+    return enum;
+  }
+
   Term[] indexTerms = null;
   TermInfo[] indexInfos;
   long[] indexPointers;
@@ -102,16 +112,17 @@
   }

   private final void seekEnum(int indexOffset) throws IOException {
-    enumerator.seek(indexPointers[indexOffset],
-                    (indexOffset * enumerator.indexInterval) - 1,
+    getEnum().seek(indexPointers[indexOffset],
+                   (indexOffset * getEnum().indexInterval) - 1,
                    indexTerms[indexOffset], indexInfos[indexOffset]);
   }

   /** Returns the TermInfo for a Term in the set, or null. */
-  final synchronized TermInfo get(Term term) throws IOException {
+  TermInfo get(Term term) throws IOException {
     if (size == 0) return null;

-    // optimize sequential access: first try scanning cached enumerator w/o seeking
+    // optimize sequential access: first try scanning cached enum w/o seeking
+    SegmentTermEnum enumerator = getEnum();
     if (enumerator.term() != null                 // term is at or past current
         && ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
             || term.compareTo(enumerator.term()) >= 0)) {
@@ -128,6 +139,7 @@

   /** Scans within block for matching term. */
   private final TermInfo scanEnum(Term term) throws IOException {
+    SegmentTermEnum enumerator = getEnum();
     while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {}
     if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0)
       return enumerator.termInfo();
@@ -136,10 +148,12 @@
   }

   /** Returns the nth term in the set. */
-  final synchronized Term get(int position) throws IOException {
+  final Term get(int position) throws IOException {
     if (size == 0) return null;

-    if (enumerator != null && enumerator.term() != null && position >= enumerator.position &&
+    SegmentTermEnum enumerator = getEnum();
+    if (enumerator != null && enumerator.term() != null &&
+        position >= enumerator.position &&
         position < (enumerator.position + enumerator.indexInterval))
       return scanEnum(position);                  // can avoid seek

@@ -148,6 +162,7 @@
   }

   private final Term scanEnum(int position) throws IOException {
+    SegmentTermEnum enumerator = getEnum();
     while(enumerator.position < position)
       if (!enumerator.next())
         return null;
@@ -156,12 +171,13 @@
   }

   /** Returns the position of a Term in the set or -1. */
-  final synchronized long getPosition(Term term) throws IOException {
+  final long getPosition(Term term) throws IOException {
     if (size == 0) return -1;

     int indexOffset = getIndexOffset(term);
     seekEnum(indexOffset);

+    SegmentTermEnum enumerator = getEnum();
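The idea behind the patch, one cached enumerator per thread in place of one shared enumerator guarded by `synchronized`, can be shown in isolation. This is a minimal sketch using the generic `ThreadLocal` API of later Java versions; `Cursor` and `PerThreadCursor` are hypothetical names standing in for `SegmentTermEnum` and `TermInfosReader`, and the code is not taken from the patch.

```java
// Minimal sketch of a per-thread cache; Cursor is a hypothetical stand-in for the cached term enumerator.
class PerThreadCursor {

    static class Cursor {
        int position = -1; // mutable state that must not be shared between threads
    }

    // Each thread lazily gets its own Cursor, so no lock is needed to use it.
    private final ThreadLocal<Cursor> cursors = ThreadLocal.withInitial(Cursor::new);

    Cursor getCursor() {
        return cursors.get();
    }
}
```

Repeated calls on one thread return the same `Cursor`, which preserves the benefit of caching for sequential access, while different threads never see each other's cursor. The cost is one cached object per thread that touches the reader.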
RE: Writing a stemmer
Leo,

Thanks for your reply. I have taken a look at egothor.org. It does appear to be pretty simple. However, I need to use Lucene as my search engine. From what I understand, it appears that I need to be pretty conversant (if not an expert) with a language for which I wish to write a stemmer. Moreover, can this stemmer be used only with the egothor search engine? Can I use it with Lucene? If yes, how?

Regards,
Anil

-----Original Message-----
From: Leo Galambos
Sent: Thursday, June 03, 2004 8:54 PM
To: Lucene Users List
Subject: Re: Writing a stemmer
Re: score and frequency
I have set the similarity with searcher.setSimilarity and have also tried setting the coord factor to 1. Here is the problem, shown by example.

Let's say I have titles to be displayed depending upon the search. If I search for ice hockey with the default similarity, my results are (score, then title):

0.9994        ice hockey
0.75          ice hockey
0.75          ice hockey
0.17402513    winter Olympics: hockey, ice, medallists
0.073680125   ice age
0.020266924   National Hockey League
0.018420031   Cracking the Ice Age
0.011512519   ground-ice
0.0069075115  ice hockey: British Sekonda Superleague Play-Off Championship: finals

But if I set the similarity to my overridden one, the results become:

0.9994        ice hockey
0.75          ice hockey
0.75          ice hockey
0.22104037    ice age
0.17402513    winter Olympics: hockey, ice, medallists
0.060800765   National Hockey League
0.055260092   Cracking the Ice Age
0.034537554   ground-ice
0.020722535   ice hockey: British Sekonda Superleague Play-Off Championship: finals

I want all the titles that contain both ice and hockey to come above the rest (to have higher scores). That is, I would like the results to appear like this:

ice hockey
ice hockey
ice hockey
winter Olympics: hockey, ice, medallists
ice hockey: British Sekonda Superleague Play-Off Championship: finals
ice age
National Hockey League
Cracking the Ice Age
ground-ice

My overridden similarity class contains just this method:

    public float coord(int overlap, int maxOverlap) { return 1.0f; }

I feel it is the weight factor that is producing the undesirable results. Any help in this regard would be highly appreciated.

Regards,
Niraj

----- Original Message -----
From: Brisbart Franck
To: Lucene Users List
Sent: Friday, June 04, 2004 8:46 PM
Subject: Re: score and frequency
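A possible reading of these results, offered as a hedged observation: the coord factor is the part of the score that rewards a document for matching more of the query terms, so returning a constant 1.0 removes exactly the preference being asked for. The sketch below is not Lucene code. `overlap` and `coord` are hypothetical helpers, and the default-style formula (overlap divided by maxOverlap) is my understanding of DefaultSimilarity, to be checked against the javadocs.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration of the coord factor; not Lucene code, and the scoring is deliberately simplified.
class CoordDemo {

    // Counts how many of the query words occur in the title (case-insensitive, split on non-letters).
    static int overlap(String title, List<String> queryWords) {
        List<String> tokens = Arrays.asList(title.toLowerCase().split("[^a-z]+"));
        int n = 0;
        for (String w : queryWords) {
            if (tokens.contains(w)) n++;
        }
        return n;
    }

    // Default-style coord: the fraction of query words the document matches.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
```

For the query words ice and hockey, "winter Olympics: hockey, ice, medallists" gets a coord of 1.0 while "ice age" gets 0.5. A coord that grows faster than linearly with the overlap would push two-term matches further up, though whether that is enough to produce the desired order also depends on the other factors.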