[Bug-wget] Bug in

Augustin, Stefan Thu, 04 Mar 2010 14:28:57 -0800

Hello,

I want to crawl a web site which uses
 <meta name="robots" content="nofollow" />
in the HTML HEAD,
which is presumably XHTML rather than plain HTML.
But wget seems to ignore this control information.
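
To confirm independently of wget that a page really carries this directive, a small check can help. The following is only a minimal sketch using Python's standard html.parser; the helper names (RobotsMetaParser, robots_directives, may_follow_links) are made up for illustration and are not part of wget. It treats the XHTML self-closing form the same as a plain HTML start tag, and it assumes that "nofollow" or "none" in the content attribute means links should not be followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names. Self-closing tags
        # such as <meta ... /> also end up here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in the given markup."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_follow_links(html):
    """Assumption: 'nofollow' or 'none' forbids following links on the page."""
    return not ({"nofollow", "none"} & robots_directives(html))


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="nofollow" /></head></html>'
    print(robots_directives(page))
    print(may_follow_links(page))
```

If may_follow_links() returns False for a saved copy of the page, the directive is present in the markup and the question is only how the crawler handles it.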


Unfortunately, I can't change the code in the HTML pages of this web server.
Can somebody help me?
- Is this a bug (or a feature that is not implemented) in wget?
- If so, is there a fix available?



Best regards
Stefan Augustin

Siemens AG
Corporate Technology
CT IC 1
Otto-Hahn-Ring 6
81739 München, Deutschland
Tel.: +49 (89) 636-47061 
Fax: +49 (89) 636-49438 
Mobil: +49 (172) 8455616
mailto:[email protected]

Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme;
Managing Board: Peter Löscher, Chairman; Wolfgang Dehen, Heinrich Hiesinger, Joe
Kaeser, Barbara Kux, Hermann Requardt, Siegfried Russwurm, Peter Y. Solmssen;
Registered offices: Berlin and München, Germany; Commercial registries: Berlin
Charlottenburg, HRB 12300, München, HRB 6684; WEEE-Reg.-No. DE 23691322
