I've used wget extensively for web preservation. It's a remarkably powerful tool, but there are a few notable features and caveats to be aware of:
1) You should absolutely use the --warc-file=<NAME> and --warc-header=<STRING> options. These create a WARC file alongside the usual wget file dump, capturing essential information for preservation: process provenance, the raw server requests and responses, and the payload data before wget rewrites it. The --warc-header option embeds user-supplied metadata (the name, purpose, etc. of the capture) in the WARC's warcinfo record. You likely won't use the WARC for access, but keeping it as the preservation copy of the site is invaluable.

2) JavaScript, AJAX queries, links embedded in rich media, and the like are completely opaque to wget. As such, you'll need to QC aggressively to ensure that you captured everything you intended to. My method was to run a generic wget capture[1], QC it, and manually download any missing objects. I'd then pass everything back through wget to create a complete WARC file containing the full capture. It's janky, but it gets the job done.

3) Do be careful of commenting options (sort, reply, and permalink links), which often turn into spider traps. Recent versions of wget (1.14 and later) have regex support, so you can blacklist URL patterns that you know will trap the crawler.

If the site is proving stubborn, I can take a look off-list.

Best of luck,
Alex

[1] I've used the following successfully:

wget --user-agent="AmigaVoyager/3.2 (AmigaOS/MC680x0)" --warc-file=<FILENAME> --warc-header="<STRING>" --page-requisites -e robots=off --random-wait --wait=5 --recursive --level=0 --no-parent --convert-links <URL>
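For the spider traps in point 3, wget's --reject-regex option (available since 1.14) excludes matching URLs from the crawl. A sketch, assuming a hypothetical site whose comment-sort and calendar links are the culprits; the pattern and URL are illustrative, not from any real capture:

```shell
# Hypothetical reject pattern for comment-sort links and calendar pages,
# two common spider traps; adapt it to the URLs you see looping in your
# own crawl log.
REJECT='(\?|&)(sort|replytocom)=|/calendar/[0-9]{4}/'

# wget's --reject-regex uses POSIX ERE by default (--regex-type=posix),
# so the pattern can be sanity-checked locally with grep -E before
# committing to a long crawl:
echo 'https://example.org/post?sort=newest' | grep -Eq "$REJECT" && echo trapped

# The crawl itself, with the trap URLs excluded:
wget --warc-file=site-capture --recursive --level=0 --no-parent \
     --page-requisites --reject-regex "$REJECT" https://example.org/
```

Checking the pattern with grep first is cheap insurance: a too-broad regex silently drops pages you wanted, and you only find out at QC time.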