There is a config option in nutch-default.xml:

  <property>
    <name>db.ignore.internal.links</name>
    <value>true</value>
    <description>If true, when adding new links to a page, links from
    the same host are ignored. This is an effective way to limit the
    size of the link database, keeping only the highest quality links.
    </description>
  </property>

Override it in nutch-site.xml with a value of false.

Hrishikesh Agashe wrote:
> Hi,
>
> It seems that Nutch is not considering URLs with relative paths
> (<img src="../img/abc.jpg">) etc.
> Is there any flag / patch to enable this in 1.0? If not, does anyone have
> idea about how this can be achieved by changing code?
>
> --Hrishi
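
For reference, the override mentioned at the top of this reply could look roughly like the following in conf/nutch-site.xml. This is only a sketch: it assumes the file has no other overrides yet, and the description text is my own wording, not taken from the Nutch distribution.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default from nutch-default.xml (which is true). -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
    <description>Keep links from the same host when adding new links
    to a page (illustrative description).</description>
  </property>
</configuration>
```

Going by the property description, the setting applies when new links are added to a page, so link data that was already built is presumably unaffected until it is regenerated.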
