Dear Nutch developers:

Is there any way to inject URLs and define the inlink for those URLs? How
and where can I find the inlinks of a given URL?

Example:

We inject the URL www.example.com/john_doe and start the crawl. Later the
crawler reaches, for example, the URL www.example.com/john_doe4.

*=> www.example.com/john_doe*
==> www.example.com/john_doe1
====> www.example.com/john_doe4
==> www.example.com/john_doe2
====> www.example.com/john_doe5
==> www.example.com/john_doe3
====> www.example.com/john_doe6

Is there any way to find the base (inlink) URL www.example.com/john_doe when
starting from a crawled URL such as www.example.com/john_doe4?
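
To make the goal concrete, here is a minimal sketch of the lookup I have in mind. It is not Nutch code: the DISCOVERED_FROM map and the find_seed function are hypothetical names, and I am assuming that for each crawled URL we could somehow get the one URL it was discovered from.

```python
# Hypothetical data, not a Nutch structure: each crawled URL mapped to the
# URL it was discovered from, mirroring the example tree above.
DISCOVERED_FROM = {
    "www.example.com/john_doe1": "www.example.com/john_doe",
    "www.example.com/john_doe2": "www.example.com/john_doe",
    "www.example.com/john_doe3": "www.example.com/john_doe",
    "www.example.com/john_doe4": "www.example.com/john_doe1",
    "www.example.com/john_doe5": "www.example.com/john_doe2",
    "www.example.com/john_doe6": "www.example.com/john_doe3",
}


def find_seed(url, discovered_from):
    """Walk parent links upward until a URL with no recorded parent is reached.

    That URL is treated as the injected (base) URL. Raises ValueError if the
    parent links form a cycle, since then no base URL can be determined.
    """
    seen = set()
    while url in discovered_from:
        if url in seen:
            raise ValueError("cycle in parent links at " + url)
        seen.add(url)
        url = discovered_from[url]
    return url


if __name__ == "__main__":
    print(find_seed("www.example.com/john_doe4", DISCOVERED_FROM))
```

In this sketch, asking for www.example.com/john_doe4 walks up through www.example.com/john_doe1 and ends at www.example.com/john_doe, which is the answer I would like to get from Nutch.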

Thanks in advance.

Cheers,
MyD
