Thanks, Ann.

You gave me some good pointers.

I see that the navigation menu is causing me all the trouble with
ranking. Does anybody know a way to make the parser skip some content?
I would like the parser to skip the global header and navigation menu so
the indexed content contains only the unique text, not everything. I guess
this is not a simple thing.
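
One possible approach, as a minimal sketch: if the site templates can wrap the global header and navigation menu in marker comments, the HTML can be cleaned before it reaches the parser (for example in a custom plugin). The `<!-- noindex -->` / `<!-- /noindex -->` markers and the `NavStripper` class are hypothetical, not a Nutch feature.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical pre-parse cleanup: drops everything between
 * noindex / /noindex marker comments.
 * The marker names are an assumption, not a Nutch convention.
 */
public class NavStripper {

    // (?s) lets '.' cross newlines; the lazy '.*?' stops at the nearest closing marker,
    // so several marked blocks on one page are removed separately.
    private static final Pattern MARKED_BLOCK =
        Pattern.compile("(?s)<!--\\s*noindex\\s*-->.*?<!--\\s*/noindex\\s*-->");

    /** Returns the HTML with every marked block (header, menu, ...) removed. */
    public static String strip(String html) {
        return MARKED_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<html><body>"
            + "<!-- noindex --><ul><li>Parking</li><li>Transport</li></ul><!-- /noindex -->"
            + "<h1>Opening hours</h1>"
            + "</body></html>";
        // Prints <html><body><h1>Opening hours</h1></body></html>
        System.out.println(strip(page));
    }
}
```

Regex removal like this is only safe when the markers are well-formed and never nested; anything outside the markers is indexed as before.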

Regards,
Ronny


-----Original Message-----
From: Annona Keene [mailto:[EMAIL PROTECTED] 
Sent: 26 June 2007 20:52
To: [EMAIL PROTECTED]
Subject: Re: The ranking is wrong

Hi Ronny,

Have you looked at your explanation page to see where the document score
is coming from? Often this is very helpful, especially when the rankings
are not what you would expect.

In my experience, Luke doesn't show you the boosts you set, so don't be
concerned if Luke always says 1.

You say that the actual parking document has parking as part of a
combined word. What analyzer are you using? Are you stemming? If you're
only matching exactly, parkingxxxx won't match parking.  That's just
something to keep in mind.

First step I'd suggest: check your explanation page. That will tell you
how many times it's matching each field in each document.

Good luck, and have a good day,
Ann

----- Original Message ----
From: "Naess, Ronny" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, June 26, 2007 8:36:58 AM
Subject: The ranking is wrong

I have indexed our intranet with Nutch-0.9.

I run the query 'parking location:stavanger language:no' and receive some
hits. (location and language are two extra fields I added.)

The Nutch client ranks the hits not quite as expected:
1. Transport and parking - Stavanger Airport, Sola
2. Frontpage - Stavanger Airport, Sola
3. Parking - Stavanger Airport, Sola

How it should have been:
1. Parking - Stavanger Airport, Sola
2. Transport and parking - Stavanger Airport, Sola
3. Frontpage - Stavanger Airport, Sola (ideally it should not appear at
   all, but I reckon it is not easy to avoid indexing navigation menus
   since they are part of the page)

The page "Parking - Stavanger Airport, Sola" has parking in the title,
parking in the content (20+ times in some form, mostly in compound words
like xxxparking or parkingxxx, but also about 5 times as the standalone
word parking) and even parking in the URL.

I guess I have to alter the boosting for some fields. I tried to raise
the boost in the index-basic plugin (hardcoding it), but I can't see any
change in the index. Luke tells me that the field boost is still 1.0 even
after I changed it. Am I doing it wrong?

Even if I search only for 'parking' without filtering on location, I
receive a lot of hits, but they are all front pages of the different
sites. All of these pages seem to have a high boost, outranking the real
parking page(s) by a wide margin.
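
A toy model (not Nutch or Lucene code) may help show how boosts interact with term matches. As far as I know, Lucene multiplies the document boost and field boost into each field's length norm at index time instead of storing them as separate values, which fits Ann's remark that Luke always shows 1. The sketch scores each field as sqrt(termFreq) × docBoost × fieldBoost / sqrt(fieldLength) and sums the fields; idf, queryNorm and coord are left out. All numbers, and the `ToyBoostScore`/`Field` names, are hypothetical. A high document boost on a front page (in Nutch this may come from a link-based page score) is assumed purely for illustration.

```java
import java.util.List;

/** Toy Lucene-style scoring: boosts enter only through the per-field norm. */
public class ToyBoostScore {

    /** One field of one document; all values used below are made up. */
    record Field(String name, int termFreq, int length, float fieldBoost) {}

    /** Document and field boosts folded into one value, as at index time. */
    static double norm(float docBoost, Field f) {
        return docBoost * f.fieldBoost() / Math.sqrt(f.length());
    }

    /** Sums per-field contributions, printing each one like a tiny explanation. */
    static double score(String title, float docBoost, List<Field> fields) {
        double total = 0.0;
        for (Field f : fields) {
            double part = Math.sqrt(f.termFreq()) * norm(docBoost, f);
            System.out.printf("%-12s %-8s %.4f%n", title, f.name(), part);
            total += part;
        }
        return total;
    }

    public static void main(String[] args) {
        // Front page: "parking" only in a 300-term body (e.g. the menu), high doc boost.
        double front = score("Frontpage", 10.0f,
            List.of(new Field("content", 2, 300, 1.0f)));
        // Parking page: matches in title, body and URL, but a low doc boost.
        double parking = score("Parking", 0.5f, List.of(
            new Field("title", 1, 5, 1.0f),
            new Field("content", 5, 800, 1.0f),
            new Field("url", 1, 6, 1.0f)));
        System.out.printf("front=%.4f parking=%.4f%n", front, parking);
    }
}
```

With these made-up numbers the front page wins (about 0.82 against 0.47); raising the title field boost to 4.0f in this toy lifts the parking page to about 1.14. The record syntax needs Java 16 or later.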

Any help is appreciated.


Best regards, 

Ronny N.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
