Re: [Nutch-general] nutch scrawls only relative links

Denis Pimenov Wed, 24 Jan 2007 07:43:37 -0800

Denis Pimenov wrote:

I tried the rule +^.* in crawl-urlfilter.txt, but it doesn't work: Nutch
still doesn't crawl relative links, only absolute ones.
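
For reference, here is the resolution I would expect for the relative link,
as a minimal sketch using the JDK's java.net.URI rather than Nutch's own
code. The start-page URL http://mydomain.com:8080/index.jsp and the class
and method names are assumptions made up for illustration.

```java
import java.net.URI;

public class ResolveRelativeLink {
    // Resolve an href found on a page against that page's URL,
    // the same way a browser turns a relative link into an absolute one.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        // Hypothetical start page; only the host and port come from my example.
        String startPage = "http://mydomain.com:8080/index.jsp";

        // Relative link from the start page.
        System.out.println(resolve(startPage, "/test/my.jsp"));

        // An already absolute link is returned unchanged.
        System.out.println(resolve(startPage, "http://mydomain.com:8080/test/my.jsp"));
    }
}
```

Both calls should yield the same absolute URL, so I would expect a URL
filter to treat the two links identically once the relative one has been
resolved.
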
> Hello,
>
> I am a newbie in Nutch. It seems to me that crawling does not follow
> relative URLs by default. How can I fix this?
>
> For example, a relative link on the start page,
> <a href="/test/my.jsp">, is not crawled (although browsers open it with
> the proper prefix), but an absolute link,
> <a href="http://mydomain.com:8080/test/my.jsp">, is crawled fine. Is
> there a configuration file or something else that fixes this? I have
> seen this question in the mail archive, but it wasn't answered.
>
> Denis Pimenov
>
>
Denis Pimenov



