Hello, I want to crawl a web site that uses <meta name="robots" content="nofollow" /> in the HTML HEAD (the self-closing tag suggests the pages are XHTML rather than plain HTML). But wget seems to ignore this directive.
Unfortunately, I can't change the code of the HTML pages on this web server. Can somebody help me?

- Is this a bug (or an unimplemented feature) in wget?
- If so, is there a fix available?

Best regards,
Stefan Augustin
Siemens AG, Corporate Technology, CT IC 1
