Dear Nutch developers,

Is there any way to inject URLs and define the inlink for those URLs? How and where can I find the inlink for a given URL?
Example: We inject the URL www.example.com/john_doe and start the crawl. At some point the crawler is fetching www.example.com/john_doe4:

=> www.example.com/john_doe
==> www.example.com/john_doe1
====> www.example.com/john_doe4
==> www.example.com/john_doe2
====> www.example.com/john_doe5
==> www.example.com/john_doe3
====> www.example.com/john_doe6

Given www.example.com/john_doe4, is there any way to find the base (inlink) URL www.example.com/john_doe?

Thanks in advance.

Cheers,
MyD