My Nutch 0.7.1 always tries to fetch the same page twice.

Today I checked the code from trunk and found that:
1) the HTML parser creates an Outlink[]
2) some code in core Nutch also tries to create an Outlink[] from the
plain (parsed) text
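
To illustrate why two extraction passes might produce a double fetch, here is a minimal sketch. It is not Nutch code: the class and both helper methods are hypothetical, URLs are plain strings rather than Outlink objects, and it assumes the two link lists are simply combined before fetching, which I have not verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Nutch.
public class OutlinkMergeSketch {

    // Naive merge: a URL found by both passes appears twice in the result.
    static List<String> mergeNaive(List<String> fromHtml, List<String> fromText) {
        List<String> all = new ArrayList<>(fromHtml);
        all.addAll(fromText);
        return all;
    }

    // De-duplicating merge: keeps the first occurrence of each URL, in order.
    static List<String> mergeDistinct(List<String> fromHtml, List<String> fromText) {
        Set<String> seen = new LinkedHashSet<>(fromHtml);
        seen.addAll(fromText);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Assumed sample data: the same link is found in the markup and in the text.
        List<String> fromHtml = Arrays.asList("http://example.com/a");
        List<String> fromText = Arrays.asList("http://example.com/a", "http://example.com/b");

        System.out.println("naive:    " + mergeNaive(fromHtml, fromText));
        System.out.println("distinct: " + mergeDistinct(fromHtml, fromText));
    }
}
```

If something like the naive merge happens somewhere in the pipeline, the same URL would be queued twice, which would match what I see.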

I didn't have much time to check further.
Another strange behavior: the "anchor text" is sometimes huge and does
not match what I see on the web page.



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
