wget works fine too. httrack is easier to configure to get sites from across domains, which is useful for sites with outgoing links. Plus, I was already using it in another script ;) If someone wants to post a version that uses wget, I'd be happy to switch.
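Since a wget version came up, here is a rough, untested sketch of how one of the httrack calls in the script might translate. It assumes POSIX sh and GNU wget. The function name `mirror_site`, the `DRY_RUN` switch and the domain-whitelist argument are illustrative and not part of the original script. In wget, cross-domain crawling is done with `--span-hosts` plus a `--domains` whitelist, which is roughly the extra configuration that httrack's `--near`/`-%e2` options save you.

```shell
#!/bin/sh
# Hypothetical wget stand-in for one httrack call in the mirror script.
# Usage: mirror_site DIR URL DEPTH DOMAINS
#   DIR     local directory to mirror into
#   URL     start page
#   DEPTH   recursion depth (the script's httrack -r2 / -r3)
#   DOMAINS comma-separated hosts wget may follow links onto; wget has no
#           separate "external depth" like httrack's -%e2, so a whitelist
#           is used instead (an assumption about what you want to follow)
# Set DRY_RUN=1 to print the wget command instead of running it.

UA="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2"

mirror_site() {
    if [ $# -ne 4 ]; then
        echo "usage: mirror_site DIR URL DEPTH DOMAINS" >&2
        return 2
    fi
    dir=$1 url=$2 depth=$3 domains=$4
    # Build the argument list in the function's own positional parameters.
    #   --span-hosts + --domains: leave the start host, but only for listed hosts
    #   --page-requisites: also fetch the images, CSS and scripts pages need
    #   --convert-links: rewrite links so the local copy browses without the web
    # Recursive wget obeys robots.txt unless told otherwise (-e robots=off).
    set -- wget --recursive --level="$depth" \
        --span-hosts --domains="$domains" \
        --page-requisites --convert-links \
        --user-agent="$UA" \
        --directory-prefix="$dir" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `mirror_site /usr/local/freenet/mirror/XXXXXXX http://XXXXXX.org/ 2 XXXXXX.org` would stand in for the first httrack line. wget stores each host in its own subdirectory of DIR. Unlike `httrack --update`, this re-fetches everything on each run: combining `--timestamping` with `--convert-links` needs `--backup-converted`, which leaves `.orig` files in the tree that freesitemgr would then insert.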
I don't agree that this is piracy: Google, Archive.org, and others cache and store sites. As long as the robot respects robots.txt, I don't see this as unprecedented.

-Colin

On Jul 10, 2006, at 6:34 PM, Matthew Toseland wrote:

> Why httrack and not wget?
>
> Incidentally, the Freenet Project does not endorse piracy; that includes
> mirroring sites without permission. :)
>
> On Mon, Jul 10, 2006 at 06:29:31PM -0400, Colin Davis wrote:
>> While exclusive Freenet content is preferred, until we have more of
>> that it might make sense to cache some popular websites into
>> Freenet.
>>
>> This gives people a way to browse these sites, and their links,
>> without having to go through the public internet. I've been inserting
>> a few of these, but it tends to overload my node. It would help if
>> others started doing the same thing.
>>
>> As I understand it, there is no harm in inserting the same page
>> multiple times, since it will just collide on the CHK and not be
>> stored twice.
>>
>> PyFCP is a great tool for uploading content from a cron script. You can
>> set up a site once, and then just run freesitemgr update each day.
>> http://www.freenet.org.nz/pyfcp/
>>
>> The script I've been using follows:
>>
>> #!/bin/sh
>>
>> cd /usr/local/freenet
>>
>> # Move to each directory and get the files from the website, recursing
>> # two (or 3) levels.
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r2 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.org \
>>   -F "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2" \
>>   +*.gif +*.jpg +*.png +*.js +*.css
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r2 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.com \
>>   -F "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2" \
>>   +*.gif +*.jpg +*.png +*.js +*.css
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r3 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.com \
>>   -F "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2" \
>>   +*.gif +*.jpg +*.png +*.js +*.css
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update http://XXXXXXX.org/
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r3 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.net \
>>   -F "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2" \
>>   +*.gif +*.jpg +*.png +*.js +*.css
>>
>> # Move the httrack cache files out of the directories before the insert.
>> # The httrack cache is a waste of space in Freenet: it's big and not very
>> # helpful to the insert, but we want it saved so we don't keep
>> # downloading the same thing.
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>>
>> # Tell PyFCP to actually insert the content, and keep a log of it.
>> /usr/bin/freesitemgr -v -v update | tee ~/nodelog.txt
>>
>> # Move the cache files back from the temp dirs.
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>>
>> # Request the keys, to help spread them.
>> cd /usr/local/freenet/mirror/temp
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> wget --mirror http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>>
>> # Request from elsewhere, to help spread them.
>> # This requests through Apophis, who is nice enough to open his node up,
>> # which lets me use it to spread them ;)
>> cd /usr/local/freenet/mirror/temp
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>>
>> _______________________________________________
>> Devl mailing list
>> Devl at freenetproject.org
>> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>
> --
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.
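The two runs of `mv` commands in the script exist to keep `hts-cache` out of the insert. Below is a minimal sketch of the same idea as a loop, assuming every site lives in its own directory under `/usr/local/freenet/mirror` and using a hypothetical holding directory `HOLD_ROOT` (the real destination is redacted in the script). It puts the caches back even if the insert fails or is interrupted, so a bad run does not leave httrack without its cache. It also appends to the log with `tee -a` rather than overwriting it; that is a choice made here, not something the original does.

```shell
#!/bin/sh
# Sketch: keep each site's httrack cache out of the tree while freesitemgr
# inserts it, and always put the cache back afterwards.
# MIRROR_ROOT matches the script; HOLD_ROOT is a hypothetical holding area.
MIRROR_ROOT=${MIRROR_ROOT:-/usr/local/freenet/mirror}
HOLD_ROOT=${HOLD_ROOT:-/usr/local/freenet/cache-hold}

# The insert step from the script; -a appends so earlier runs stay in the log.
run_insert() {
    /usr/bin/freesitemgr -v -v update | tee -a "$HOME/nodelog.txt"
}

# Move MIRROR_ROOT/<site>/hts-cache to HOLD_ROOT/<site>.
park_caches() {
    mkdir -p "$HOLD_ROOT" || return 1
    for cache in "$MIRROR_ROOT"/*/hts-cache; do
        [ -d "$cache" ] || continue            # unmatched glob or stray file
        site=$(basename "$(dirname "$cache")")
        if [ -e "$HOLD_ROOT/$site" ]; then     # left over from an aborted run
            echo "refusing to overwrite $HOLD_ROOT/$site" >&2
            return 1
        fi
        mv "$cache" "$HOLD_ROOT/$site" || return 1
    done
}

# Move every HOLD_ROOT/<site> back to MIRROR_ROOT/<site>/hts-cache,
# never clobbering a cache that is already in place.
restore_caches() {
    for held in "$HOLD_ROOT"/*; do
        [ -d "$held" ] || continue
        dest="$MIRROR_ROOT/$(basename "$held")/hts-cache"
        if [ -e "$dest" ]; then
            echo "not restoring $held: $dest exists" >&2
            continue
        fi
        mv "$held" "$dest"
    done
}

# Park the caches, run the insert, restore the caches, and return the
# insert's exit status.
insert_without_caches() {
    park_caches || { restore_caches; return 1; }
    # On Ctrl-C or kill during the insert, restore the caches before exiting.
    trap 'restore_caches; exit 1' INT TERM
    run_insert
    status=$?
    trap - INT TERM
    restore_caches
    return "$status"
}
```

In the script this would replace both `mv` runs and the `freesitemgr` line with a single `insert_without_caches` call.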

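The key-request steps in the script could likewise be folded into a loop. This is a sketch under assumptions: the keys sit one per line in a hypothetical `KEYS_FILE`, and `--level=2` is a guess, whereas the original's `--mirror` has no depth limit. wget's `--delete-after` option deletes each file once it has been downloaded and is documented as useful for pre-fetching pages through a proxy, so the temp directory and the `rm -rf` steps are no longer needed.

```shell
#!/bin/sh
# Sketch: request each inserted key through one or more fproxy front ends so
# the data is pulled across other nodes. KEYS_FILE is hypothetical; the
# actual keys are redacted in the original script.
KEYS_FILE=${KEYS_FILE:-/usr/local/freenet/mirror/keys.txt}   # one key per line

# Fetch one URL and the pages it links to, then throw the files away.
# --delete-after removes each file after download, replacing rm -rf temp/*.
# The depth of 2 is a guess; the original used --mirror (no depth limit).
fetch_url() {
    wget --quiet --recursive --level=2 --no-directories --delete-after "$1"
}

# Each argument is a URL prefix; every key is appended to every prefix.
request_keys() {
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue                 # skip blank lines
        for prefix in "$@"; do
            # Redirect stdin so nothing fetch_url runs can eat the key list.
            fetch_url "$prefix$key" < /dev/null
        done
    done < "$KEYS_FILE"
}
```

Called as `request_keys http://127.0.0.1:8888/ 'http://apophis.li/fn.php?url=http://127.0.0.1:8888/'`, it requests every key from the local node and then through the gateway used in the script. Quoting the gateway URL keeps the shell from treating `?` as a glob character.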