Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> I read through your email exchange, and setting aside all emotional
>> content I think this is a valid request - indeed, as far as I can
>> tell other major crawlers don't follow these links. We could either
>> remove this, or make it optional (default not to use them).
>
> Is this as simple as deleting line 60 from DOMContentUtils.java (in
> the html-parser plugin)?

Yes.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers