There is a config option in nutch-default.xml:

<property>
  <name>db.ignore.internal.links</name>
  <value>true</value>
  <description>If true, when adding new links to a page, links from
  the same host are ignored.  This is an effective way to limit the
  size of the link database, keeping only the highest quality
  links.
  </description>
</property>

Override it in nutch-site.xml with a value of false.
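A minimal sketch of such an override, assuming your nutch-site.xml uses the usual `<configuration>` root element (properties set there take precedence over the defaults in nutch-default.xml):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep links that point to the same host, e.g. relative
       paths such as "../img/abc.jpg", instead of discarding them. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```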


Hrishikesh Agashe wrote:
> Hi,
>
> It seems that Nutch is not picking up URLs with relative paths (e.g.
> <img src="../img/abc.jpg">).
> Is there any flag or patch to enable this in 1.0? If not, does anyone have
> an idea how this can be achieved by changing the code?
>
> --Hrishi
>
