I have MacPorts set up on my machine (https://www.macports.org),
which makes it easy to add extra packages. With that I just run:
sudo port install wget
then enter my computer's password, and it installs wget. Once you have
wget, you can crawl and archive an entire site with this command:
wget -mcrpk -o process.log http://www.somesite.com
Parameters (wget's flags are case-sensitive, so these are all lowercase):
-o: writes the output of the process (the log) to a file instead of the
display. In this case I logged everything to process.log.
-m: mirror. Turns on recursion and time-stamping with unlimited depth.
-c: continues a partly downloaded transfer. Probably not as big an issue
on a functioning web site.
-r: recursion. This probably isn't needed, since -m already implies it,
but I figured it didn't hurt.
-p: downloads any page dependencies like CSS, images, etc.
-k: converts links in the downloaded pages to relative links pointing at
the local copies, so browsing doesn't keep jumping back to the original
site or path.
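For readability, the same crawl can be spelled out with wget's long option names. This is a minimal sketch wrapped in a hypothetical `mirror_site` shell function (the function name is my own, not part of wget); --recursive is left out because --mirror already turns it on.

```shell
# Hypothetical helper: mirror one site using wget's long option names.
# Equivalent to: wget -mcrpk -o process.log URL
# (--mirror already implies --recursive, so -r is dropped here.)
mirror_site() {
  wget --mirror \
       --continue \
       --page-requisites \
       --convert-links \
       --output-file=process.log \
       "$1"
}

# Usage:
# mirror_site http://www.somesite.com
```

The short and long forms behave the same; the long form is just easier to read back later.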
That should be it. You'll end up with a folder (named after the host,
e.g. www.somesite.com) with everything in it. You can turn on Apache on
your Mac and browse the site with Safari if you like. You can also open
the files directly in Safari, but some things, like dynamic content
loaded via Ajax, won't work. You probably won't want to re-host it as
is, since some shared assets will be duplicated. Another caveat: if your
site has secret URLs that aren't linked from anywhere, the crawler won't
be able to find them.
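If you do know those unlinked URLs, one possible workaround is to give wget a list of start URLs with its -i option instead of a single address, so the unlinked pages get crawled in the same run. This is a sketch under assumptions: the `mirror_from_list` function name, the urls.txt file and the example URLs are all hypothetical.

```shell
# Hypothetical helper: mirror a site starting from every URL in a list
# file (one URL per line), so pages nothing links to can still be fetched.
# Same flags as the main command; -r is omitted because -m implies it.
mirror_from_list() {
  wget -mcpk -o process.log -i "$1"
}

# Example list (hypothetical URLs):
# printf '%s\n' http://www.somesite.com/ \
#   http://www.somesite.com/unlinked-page.html > urls.txt
# mirror_from_list urls.txt
```

Putting the home page and the unlinked pages in one list keeps everything in a single session, so -k can convert links across all of them.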
CB
On 2/22/15 6:54 PM, Sabahattin Gucukoglu wrote:
Yep, both httrack and wget are options for OS X, and they’re both command-line
accessible. I’d choose httrack first, as that’s generally better at this sort
of thing, but wget will also work if the site is not too complex and/or you
just want the static files.
--
¯\_(ツ)_/¯
--
You received this message because you are subscribed to the Google Groups
"MacVisionaries" group.