I have indexed our intranet with Nutch-0.9.

I run the query 'parking location:stavanger language:no' and I receive some
hits (location and language are two extra fields I added).

The Nutch client does not rank the hits quite as I expected:
1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola

How it should have been ranked:
1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (should not have been there at
all if possible, but I reckon it is not easy to avoid indexing navigation
menus, since they are part of the page)

The page "Parking - Stavanger Airport, Sola" has parking in the title,
parking in the content (20+ times in some way, mostly combined words
like xxxparking, or parkingxxx, but also about 5 times as only parking)
and even parking in the url.

I guess I have to alter the boosting for some fields. I tried to raise the
boost in the index-basic plugin (hardcoding it), but I can't see any change
in the index. Luke tells me that the field boost is still 1.0 even after
I changed it. Am I doing it wrong?

Even if I search only for 'parking', without filtering on location, I
receive a lot of hits, but all of them are the various front pages.
All of these pages seem to have a high boost, outranking the real parking
page(s) by a wide margin.
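
To illustrate what I suspect is going on, here is a toy sketch in Python. It
is not Nutch's or Lucene's actual scoring formula, and every number in it
(field boosts, page boost, term counts) is made up for illustration; the
function name toy_score is hypothetical too. It only shows how a large
page-level boost could let a front page with a single menu match outrank a
page that matches in title, url and content.

```python
import math

# Hypothetical per-field boosts, chosen only for this illustration.
FIELD_BOOSTS = {"title": 1.5, "url": 4.0, "content": 1.0}


def toy_score(doc_boost, field_boosts, term_freqs):
    """Toy model: page boost times the sum of boosted, dampened term counts."""
    return doc_boost * sum(
        field_boosts.get(field, 1.0) * math.sqrt(tf)
        for field, tf in term_freqs.items()
    )


# "Parking" page: matches in title, url and content, but an ordinary page boost.
parking_page = toy_score(1.0, FIELD_BOOSTS, {"title": 1, "url": 1, "content": 5})

# Front page: one match from the navigation menu, but a large (assumed) page boost.
front_page = toy_score(10.0, FIELD_BOOSTS, {"content": 1})

print("parking page:", round(parking_page, 2))
print("front page:  ", round(front_page, 2))
```

In this toy model the front page wins purely because of its page boost, which
is the behaviour I seem to be seeing.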

Any help is appreciated.


Best regards, 

Ronny N.
