wget works fine too.

httrack is easier to configure to fetch pages across domains,
which is useful for sites with outgoing links. Plus, I was already
using it in another script ;) If someone wants to post a version that
uses wget, I'd be happy to switch.
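
For anyone who wants to try the wget route, here is a rough, untested
sketch of what one mirror step might look like. The function name
mirror_site and the DRY_RUN switch are made up for illustration, and the
directory and URL are placeholders. The wget options are standard ones,
but I have not checked that they fetch the same set of pages as the
httrack flags in my script. wget follows robots.txt by default on
recursive fetches, and this sketch leaves that alone.

```shell
#!/bin/sh
# Hypothetical wget-based replacement for one httrack mirror step.
# Usage: mirror_site DIR URL [DEPTH]
# With DRY_RUN=1 the command is printed instead of being run.
mirror_site() {
    dir=$1
    url=$2
    depth=${3:-2}    # recursion depth, default 2 (like -r2)
    ua="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de) AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2"

    # Build the command line as the positional parameters.
    set -- wget \
        --recursive --level="$depth" \
        --span-hosts \
        --page-requisites \
        --convert-links \
        --wait=1 \
        --user-agent="$ua" \
        --directory-prefix="$dir" \
        "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example (prints the command only):
#   DRY_RUN=1 mirror_site /usr/local/freenet/mirror/XXXXXXX http://XXXXXX.org 2
```

--span-hosts is what lets wget leave the starting domain, and
--page-requisites pulls in the images, CSS and JS a page needs. This
sketch does not try to reproduce httrack's cache, so each run may fetch
pages again.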


I don't agree that this is piracy. Google, Archive.org, and others
cache and store sites. As long as the robot respects robots.txt, I
don't think this is unprecedented.

-Colin


On Jul 10, 2006, at 6:34 PM, Matthew Toseland wrote:

> Why httrack and not wget?
>
> Incidentally the freenet project does not endorse piracy; that
> includes mirroring sites without permission. :)
>
> On Mon, Jul 10, 2006 at 06:29:31PM -0400, Colin Davis wrote:
>> While exclusive Freenet content is preferred, until we have more of
>> that, it might make sense to cache some popular websites into
>> Freenet.
>>
>> This gives people a way to browse these sites, and their links,
>> without having to go through the public internet. I've been inserting
>> a few of these, but it tends to overload my node. If others wanted
>> to start doing the same thing, it would help.
>>
>> As I understand it, there isn't harm in inserting the same page
>> multiple times, since it will just collide on the CHK, and not be
>> stored twice.
>>
>> PyFCP is a great tool for uploading content from a cron script. You
>> can set up a site once, and then just run freesitemgr update each day.
>> http://www.freenet.org.nz/pyfcp/
>>
>>
>> The script I've been using follows-
>>
>> #!/bin/sh
>>
>> cd /usr/local/freenet
>>
>>
>> # Move to each directory and get the files from the website, recursing
>> # two (or three) levels.
>> # Browser User-Agent string passed to httrack via -F.
>> UA="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; de-de)"
>> UA="$UA AppleWebKit/412.6 (KHTML, like Gecko) Safari/412.2"
>>
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r2 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.org -F "$UA" \
>>   "+*.gif" "+*.jpg" "+*.png" "+*.js" "+*.css"
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r2 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.com -F "$UA" \
>>   "+*.gif" "+*.jpg" "+*.png" "+*.js" "+*.css"
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r3 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.com -F "$UA" \
>>   "+*.gif" "+*.jpg" "+*.png" "+*.js" "+*.css"
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update http://XXXXXXX.org/
>> cd /usr/local/freenet/mirror/XXXXXXX
>> httrack --mirror --update --mirrorlinks -r3 -%e2 -H3 -C2 --near \
>>   http://XXXXXX.net -F "$UA" \
>>   "+*.gif" "+*.jpg" "+*.png" "+*.js" "+*.css"
>>
>>
>> # Move the httrack cache files out of the directories before the
>> # insert. The cache is big and not very helpful to the insert, but we
>> # want it saved so we don't keep downloading the same thing.
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>>
>> #Tell PyFCP to actually insert the content, and keep a log of it.
>> /usr/bin/freesitemgr -v -v  update | tee ~/nodelog.txt
>>
>>
>> #Move the cache files back from the temp dirs
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>> mv /usr/local/freenet/mirror/XXXXXXX/hts-cache/* \
>>    /usr/local/freenet/mirror/XXXXXXX/hts-cache/
>>
>>
>>
>> #request the keys, to help spread them.
>> cd /usr/local/freenet/mirror/temp
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror  http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> wget --mirror  http://127.0.0.1:8888/USK@ XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>>
>>
>> # Request from elsewhere, to help spread them. This requests from
>> # Apophis, who is nice enough to open his node up, which lets me use
>> # his to spread ;)
>> cd /usr/local/freenet/mirror/temp
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ \
>>   XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ \
>>   XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>> wget --mirror http://apophis.li/fn.php?url=http://127.0.0.1:8888/USK@ \
>>   XXXXXXX XXXXXXX
>> rm -rf /usr/local/freenet/mirror/temp/*
>>
>>
>> _______________________________________________
>> Devl mailing list
>> Devl at freenetproject.org
>> http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
>>
>
> -- 
> Matthew J Toseland - toad at amphibian.dyndns.org
> Freenet Project Official Codemonkey - http://freenetproject.org/
> ICTHUS - Nothing is impossible. Our Boss says so.

