I have indexed our intranet with Nutch-0.9. When I run the query 'parking location:stavanger language:no' I receive some hits. (location and language are two extra fields I added.)
The Nutch client does not rank the hits quite as expected:

1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola

How it should have been:

1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (ideally it should not be there at all, but I reckon it is not easy to avoid indexing navigation menus, since they are part of the page)

The page "Parking - Stavanger Airport, Sola" has parking in the title, parking in the content (20+ times in some form, mostly in compound words like xxxparking or parkingxxx, but also about 5 times as the standalone word parking) and even parking in the URL.

I guess I have to alter the boosting for some fields. I tried to raise the boost in the index-basic plugin (by hardcoding it), but I can't see any changes in the index. Luke tells me that the field boost is still 1.0 even after I changed it. Am I doing it wrong?

Even if I search only for 'parking', without filtering on location, I receive a lot of hits, but all of them are the various frontpages. All of these pages seem to have a high boost, outranking the real parking page(s) big time.

Any help is appreciated.

Best regards,
Ronny N.
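
The ranking described above is what one would expect if a per-page boost multiplies the term-match score. The following toy sketch in Python illustrates that suspicion only. It is not Nutch or Lucene code: the function name toy_score, the term counts and the boost values are all invented, and damping the term count with a square root is an assumption made for the illustration.

```python
# Toy model only: NOT Nutch/Lucene code. All names and numbers are invented.

def toy_score(term_matches, doc_boost):
    """Damped term count (square root) multiplied by a per-document boost."""
    return (term_matches ** 0.5) * doc_boost

# Hypothetical pages: many matches with a low boost vs. few matches with a high boost.
pages = {
    "Parking": toy_score(term_matches=25, doc_boost=1.0),
    "Transport and parking": toy_score(term_matches=4, doc_boost=3.0),
    "Frontpage": toy_score(term_matches=1, doc_boost=5.5),
}

# Highest score first.
ranked = sorted(pages, key=pages.get, reverse=True)
print(ranked)
```

With these made-up numbers the page with by far the most matches ends up last, which mirrors the order seen in the search results. If something like this is going on, the page-level boost would matter more than the per-field boosts in index-basic.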
