Augustin, Stefan wrote:
> Hello,
>
> I want to crawl a web site which uses
> <meta name="robots" content="nofollow" />
> in the HTML HEAD,
> which should be XHTML instead of plain HTML.
> But wget seems to ignore this control information.
>
> Unfortunately, I can't change the code in the HTML pages of this web server.

If I understand you correctly, I think you meant that "wget seems to
obey this control information"; otherwise, what would be preventing you
from crawling the web site?

Have a look at http://wget.addictivecode.org/FrequentlyAskedQuestions#robots
for the solution.

-- 
Micah J. Cowan
http://micah.cowan.name/
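
For reference, a minimal sketch of the kind of setting that FAQ entry points to, assuming the `robots` command documented in the GNU Wget manual is available in your version. With it set to `off`, wget stops honouring both robots.txt and the `nofollow` robots meta tag. The file location shown is the usual per-user startup file; adjust it if yours differs.

```
# ~/.wgetrc (per-user wget startup file; location is an assumption)
# Disable the robot exclusion handling for every wget run by this user.
robots = off
```

The same setting can be applied to a single run from the command line with `wget -r -e robots=off URL`, where URL stands for the site to crawl. Since this overrides the site owner's stated crawling preferences, it is best reserved for sites you are permitted to mirror.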
