A number of others have suggested other approaches, but since you started with wget, here are the two wget commands I recently used to archive a WordPress-behind-ezproxy site. The first logs into ezproxy and saves the login as a cookie. The second uses the cookie to access the site through ezproxy.

wget --no-check-certificate --keep-session-cookies --save-cookies cookies.txt \
  --post-data 'user=yeatesst&pass=PASSWORD&auth=d1&url' \
  https://login.EZPROXYMACHINE/login

wget --restrict-file-names=windows --default-page=index.php -e robots=off \
  --mirror --user-agent="" --ignore-length \
  --keep-session-cookies --save-cookies cookies.txt --load-cookies cookies.txt \
  --recursive --page-requisites --convert-links --backup-converted \
  "http://WORDPRESSMACHINE.EZPROXYMACHINE/BLOGNAME"

cheers
stuart

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric Phetteplace
Sent: Monday, 6 October 2014 7:44 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] wget archiving for dummies

Hey C4L,

If I wanted to archive a Wordpress site, how would I do so?

More elaborate: our library recently got a "donation" of a remote Wordpress site, sitting one directory below the root of a domain. I can tell from a cursory look it's a Wordpress site. We've never archived a website before and I don't need to do anything fancy, just download a workable copy as it presently exists. I've heard this can be as simple as:

wget -m $PATH_TO_SITE_ROOT

but that's not working as planned. Wget's convert links feature doesn't seem to be quite so simple; if I download the site, disable my network connection, then host locally, some 20 resources aren't available. Mostly images which are under the same directory. Possibly loaded via AJAX.

Advice?

(Anticipated) pertinent advice: I shouldn't be doing this at all, we should outsource to Archive-It or similar, who actually know what they're doing. Yes/no?

Best,
Eric