I have MacPorts set up on my machine

https://www.macports.org

so that makes it easy to add additional packages. With that I just do

sudo port install wget

enter my computer's password and it installs wget. Once you have wget you can crawl/archive an entire site with this command:

wget -mcrpk -o process.log http://www.somesite.com

Parameters:

o - Output of the process is written to a file instead of the display. In this case I logged everything to process.log
m - Mirror. Turns on recursion and timestamping, with no limit on recursion depth
c - Continues a partly-downloaded transfer. Probably not as big an issue on a functioning web site
r - Recursion. Already implied by m, so it wasn't strictly needed. Figured it didn't hurt.
p - Download any page dependencies like CSS, images etc.
k - Convert links in the downloaded pages so they point at the local copies and don't keep trying to link off to the original site or path

The flags are case-sensitive, so type them in lowercase as in the command above. The uppercase letters mean different things to wget.
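If the single-letter flags are hard to remember, the same command can be spelled out with wget's long option names. Here is a rough sketch of that. The mirror_cmd helper is just something I made up for illustration, and it only prints the command rather than running it. The URL is the same placeholder as above.

```shell
#!/bin/sh
# Long-option spelling of: wget -mcrpk -o process.log <url>
# mirror_cmd is a hypothetical helper; it only prints the command.
mirror_cmd() {
    echo wget --mirror --continue --recursive --page-requisites \
        --convert-links --output-file=process.log "$1"
}

# Dry run: show the command for the placeholder site.
mirror_cmd http://www.somesite.com
```

Copy the printed line into Terminal to actually start the crawl.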

That should be it. You'll end up with a folder with everything in it. You can turn on Apache on your Mac and browse the site with Safari if you like. You can also open the files directly in Safari, but some stuff like Ajax-loaded dynamic content won't work that way. You probably won't want to re-host it as is, since some shared assets will be duplicated. Another caveat is that if you have secret URLs that are not linked from anywhere on your site, the crawler will not be able to find them.
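For finding and browsing the result: wget puts the mirror in a folder named after the site's host. The sketch below works out that folder name from the URL. The site_dir helper is hypothetical, and the commented-out last line swaps in Python's built-in web server as a quick alternative to Apache, assuming python3 is installed on your machine.

```shell
#!/bin/sh
# wget saves a mirror under a folder named after the host,
# e.g. ./www.somesite.com for the placeholder URL used above.
# site_dir is a hypothetical helper that derives that name.
site_dir() {
    # Strip the scheme, then anything after the first slash.
    host="${1#*://}"
    echo "${host%%/*}"
}

dir="$(site_dir http://www.somesite.com)"
echo "$dir"

# Assumption: python3 is installed. Uncomment to serve the folder
# at http://localhost:8000 instead of turning on Apache.
# (cd "$dir" && python3 -m http.server 8000)
```

With the server running you can point Safari at http://localhost:8000 and press Control-C in Terminal when you are done.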

CB

On 2/22/15 6:54 PM, Sabahattin Gucukoglu wrote:
Yep, both httrack and wget are options for OS X, and they’re both command-line 
accessible.  I’d choose httrack first, as that’s generally better at this sort 
of thing, but wget will work also if the site is not too complex and/or you 
just want the static files.


--
¯\_(ツ)_/¯
