On 24/03/2012 15:53, valdis.kletni...@vt.edu wrote:
> On Sat, 24 Mar 2012 10:26:48 -0000, Dave said:
>
>> Don't the -e robots=off, --page-requisites and -H wget directives
>> enable one to collect all the necessary files that are called from a page?
>
> No, not *all* the files, for the same reason that if you visit a page with
> NoScript enabled, you may end up with missing content and/or big open spaces
> on the page.
>
> Consider a page that has Javascript on it:
>
> todaysfile = "http://www.news-site.com/" + date_as_string;
> document.load(todaysfile);
>
> Unless you interpret the javascript, you don't know what URL will get loaded,
> because yesterday and tomorrow will get a different URL. So basically,
> if you try to pull it down with wget or similar, you will miss *all* the stuff
> that's pulled down via Javascript (and probably via css as well - does wget
> know how to follow CSS references?). On many modern web designs,
> this ends up being the vast majority of the content.

Thanks Valdis,

Some things are pretty obvious when pointed out.

Dave

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
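A rough illustration of Valdis's point: a fetcher that does not run JavaScript can only follow URLs that are written out literally in the HTML. The Python sketch below collects URLs from a few common tag attributes. It is an assumption-laden toy, not how wget actually decides what a page requisite is; the URL_ATTRS set, the static_urls helper and the sample page are all hypothetical. The daily URL built by the inline script is never found, because it does not exist until the script runs.

```python
from html.parser import HTMLParser

# Hypothetical subset of (tag, attribute) pairs a non-JavaScript fetcher
# might treat as links or page requisites. Not wget's real list.
URL_ATTRS = {("img", "src"), ("script", "src"), ("link", "href"), ("a", "href")}


class StaticLinkExtractor(HTMLParser):
    """Collect URLs that appear literally in tag attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Only attribute values are inspected; inline script text is
        # delivered to handle_data() and is never interpreted here.
        for name, value in attrs:
            if (tag, name) in URL_ATTRS and value:
                self.urls.append(value)


def static_urls(html):
    """Return the attribute URLs found in html, in document order."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical page modelled on the quoted example: the daily file's URL is
# assembled at run time, so no complete URL for it appears in the markup.
PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="/style.css">
    <script src="/loader.js"></script>
  </head>
  <body>
    <img src="/logo.png">
    <script>
      todaysfile = "http://www.news-site.com/" + date_as_string;
      document.load(todaysfile);
    </script>
  </body>
</html>
"""

if __name__ == "__main__":
    for url in static_urls(PAGE):
        print(url)
```

Running it prints /style.css, /loader.js and /logo.png; nothing from www.news-site.com shows up, which is the content a static mirror would silently miss.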