Thanks, Ann. You gave me some good pointers.
I see that the navigation menu is what gives me all the trouble with ranking. Does anybody know a way to make the parser skip some content? I would like the parser to skip the global header and the navigation menu, so that the content contains only the unique text of each page and not everything. I guess this is not a simple thing.

Regards,
Ronny

-----Original Message-----
From: Annona Keene [mailto:[EMAIL PROTECTED]]
Sent: 26 June 2007 20:52
To: [EMAIL PROTECTED]
Subject: Re: The ranking is wrong

Hi Ronny,

Have you looked at your explanation page to see where the document score is coming from? Often this is very helpful, especially when the rankings are not what you would expect.

In my experience, Luke doesn't show you the boosts you set. Don't be concerned if Luke always says 1.

You say that the actual parking document has parking as part of a combined word. What analyzer are you using? Are you stemming? If you're only matching exactly, parkingxxxx won't match parking. That's just something to keep in mind.

First step I'd suggest: check your explanation page. That will tell you how many times it's matching each field in each document.

Good luck, and have a good day,
Ann

----- Original Message ----
From: "Naess, Ronny" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, June 26, 2007 8:36:58 AM
Subject: The ranking is wrong

I have indexed our intranet with Nutch-0.9. I run the query 'parking location:stavanger language:no' (two extra fields added) and I receive some hits.

The Nutch client does not rank the hits quite as expected:

1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola

How it should have been:

1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (should not have been there at all if possible, but I reckon it is not easy to avoid indexing navigation menus, since they are part of the page)

The page "Parking - Stavanger Airport, Sola" has parking in the title, parking in the content (20+ times in some form, mostly in combined words like xxxparking or parkingxxx, but also about 5 times as just parking) and even parking in the URL.

I guess I have to alter the boosting for some fields. I tried to raise the boost in the index-basic plugin (hardcoding it), but I can't see any changes in the index. Luke tells me that the field boosts are still 1.0 even after I changed them. Am I doing it wrong?

Even if I search only for 'parking', without filtering on location, I receive a lot of hits, but all of them are front pages. All of these pages seem to have a high boost, outranking the real parking page(s) big time.

Any help is appreciated.

Best regards,
Ronny N.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
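
On the question of keeping the global header and navigation menu out of the indexed content: below is a minimal, standalone sketch of the idea in Python, not Nutch code. It assumes the header and menu sit in elements with the ids "header" and "nav" (hypothetical names, the real templates may differ) and that the markup is properly nested. The function name extract_unique_text is made up for the illustration. In Nutch the same logic would have to live somewhere in the parsing step, for instance in a custom parse plugin, which is not shown here.

```python
from html.parser import HTMLParser

# Hypothetical ids of the blocks to leave out; adjust to the real page templates.
SKIP_IDS = {"header", "nav"}

# Void elements never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collects page text, ignoring everything inside the SKIP_IDS elements.

    Assumes properly nested markup (every non-void tag is closed).
    """

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while we are inside a skipped block
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth += 1          # nested tag inside a skipped block
        elif dict(attrs).get("id") in SKIP_IDS:
            self.skip_depth = 1           # entering a skipped block

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth > 0:
            self.skip_depth -= 1          # leaving a tag inside a skipped block

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def extract_unique_text(html):
    """Return the text of the page without header and navigation text."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        '<html><body>'
        '<div id="header">Stavanger Airport, Sola</div>'
        '<ul id="nav"><li><a href="/parking">Parking</a></li></ul>'
        '<div id="content"><h1>Frontpage</h1><p>Welcome.</p></div>'
        '</body></html>'
    )
    print(extract_unique_text(page))  # prints: Frontpage Welcome.
```

With something like this in place, a front page whose only mention of parking is the menu link would no longer contain the word in its content field, so it should not match a content query for parking at all.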
