On Wed, 7 Oct 2009 21:39:34 +0200 Patrik Lembke <bla...@chebab.com> wrote:
> On Sun, 28 Jun 2009 19:13:53 +0200 Patrik Lembke <bla...@chebab.com>
> wrote:
> > B''H
> >
> > On Tue, 16 Jun 2009 21:59:13 +0930 Karl Goetz <k...@kgoetz.id.au>
> > wrote:
> > > On Tue, 16 Jun 2009 11:51:10 +0200
> > > Let us know how you progress.
> > > kk
> >
> > Well, currently I have a sort of working spider, just that it is a
> > bit too good at crawling. For example, it's hard to know what parts
> > of the page are really wiki and what's not, example:
> > http://wiki.gnewsense.org/ForumMain/
> >
> > So I will have to use some sort of blacklist of pages not to fetch
> > (currently it just checks that it is a page on wiki.gnewsense.org).
> >
> > Image download and rewrite, as well as "wiki-links" rewrite, are
> > also implemented currently.
> >
> > So I mostly need a list of "bad" places, like the forum and
> > http://wiki.gnewsense.org/Site/FASTMembership
> >
>
> Bumping this since I really need this information to continue. If this
> project is no longer interesting please let me know.
>

Have a look at the pagelists from [1]. If you need explicit lists I can
unbreak the lines ('page list' needs to be 'pagelist' for the magic to
happen).

[1] http://wiki.gnewsense.org/Profiles/Kgoetz#toc7

kk

-- 
Karl Goetz, (Kamping_Kaiser / VK5FOSS)
Debian contributor / gNewSense Maintainer
http://www.kgoetz.id.au
No, I won't join your social networking group
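
For reference, the blacklist check discussed in the quoted message could be sketched roughly as below. This is only an illustration, not the spider's actual code: the function name should_fetch and the constant BLACKLIST_PREFIXES are hypothetical, and the two blacklist entries are just the example URLs mentioned in the thread (the real list would come from the pagelists).

```python
from urllib.parse import urlsplit

# Host the spider is restricted to (taken from the thread).
WIKI_HOST = "wiki.gnewsense.org"

# Hypothetical blacklist of path prefixes; only the two examples from
# the thread are listed here, the real list would be longer.
BLACKLIST_PREFIXES = ("/ForumMain", "/Site/FASTMembership")


def should_fetch(url, host=WIKI_HOST, blacklist=BLACKLIST_PREFIXES):
    """Return True if url is on the wiki host and not under a blacklisted path."""
    parts = urlsplit(url)
    if parts.hostname != host:
        return False
    # Normalise a trailing slash so "/ForumMain/" and "/ForumMain" match alike.
    path = parts.path.rstrip("/") or "/"
    for prefix in blacklist:
        # Match the prefix itself or anything below it, but not
        # unrelated pages that merely share the leading characters.
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True
```

Matching on whole path components (rather than a plain substring) is a design assumption here, so that a page such as /ForumMainX would still be fetched.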
_______________________________________________
gNewSense-dev mailing list
gNewSense-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gnewsense-dev