Hi,

I am having problems with search results in Nutch. I am using nutch-0.7.2.
I crawl only hand-picked pages, and I have crawled 5 sites in total.

I get a lot of results from one site (let's say Site#A), and they are
always ranked higher than results from the other 4 sites.
When I click on 'Show all hits', the results on the FIRST 10 PAGES are all
from Site#A. Only the last results are from the other sites.

I believe the reasons are:
a) Site#A has a lot of internal links to its own pages on each page.
b) For a query I typed, there are 63 results in total. Of these 63,
55 come from Site#A.
Hence, their score is high.

Q1: Do you know how I can solve this problem?

Q2: Since I am hand-picking the exact pages, which fields can I
a) reduce/disable? Like URL, anchor... what else?
b) increase/enable? Like title... what else? Since the focus is going to be
on content, which fields do I need to increase?
c) Can I change some class or Nutch behavior other than this?

Q3: I have already crawled 5 sites. Should I recrawl these sites with these
values set?

Q4: If I don't need to recrawl, do I need to run some other operations, like
re-indexing?

Q5: Will it help if I move to a newer version of Nutch? Should I recrawl
after moving to the newer version, since the previous crawl data is not
binary compatible?

Any help will be appreciated. Thanks.
-- 
View this message in context: 
http://www.nabble.com/nutch-search-results-problem-tf3648963.html#a10192478
Sent from the Nutch - User mailing list archive at Nabble.com.

