On Wed, 7 Oct 2009 21:39:34 +0200
Patrik Lembke <bla...@chebab.com> wrote:

> On Sun, 28 Jun 2009 19:13:53 +0200 Patrik Lembke <bla...@chebab.com>
> wrote:
> > B''H
> > 
> > On Tue, 16 Jun 2009 21:59:13 +0930 Karl Goetz <k...@kgoetz.id.au>
> > wrote:
> > > On Tue, 16 Jun 2009 11:51:10 +0200
> > > Let us know how you progress.
> > > kk
> > 
> > Well, currently I have a sort-of-working spider, just that it is a
> > bit too good at crawling. For example, it's hard to know which parts
> > of the site are really wiki and which are not, e.g.:
> > http://wiki.gnewsense.org/ForumMain/
> > 
> > So I will have to use some sort of blacklist of pages not to fetch
> > (currently it just checks that it is a page on wiki.gnewsense.org).
> > 
> > Image download and rewriting, as well as "wiki-link" rewriting, are
> > also implemented currently.
> > 
> > So I mostly need a list of "bad" places, like the forum and
> > http://wiki.gnewsense.org/Site/FASTMembership
> > 
> 
> Bumping this since I really need this information to continue. If this
> project is no longer interesting please let me know.
> 

Have a look at the pagelists from [1]. If you need explicit lists I can
unbreak the lines ('page list' needs to be 'pagelist' for the magic to
happen).
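
For the blacklist itself, something along these lines might do. This is
only a minimal sketch, assuming the spider is written in Python (the
thread does not say); should_fetch, WIKI_HOST and BLACKLIST_PREFIXES are
made-up names for illustration, and the two prefixes are just the "bad"
places quoted above, not a complete list.

```python
from urllib.parse import urlsplit

# The only host the spider is meant to crawl (taken from the thread).
WIKI_HOST = "wiki.gnewsense.org"

# Hypothetical blacklist: just the two paths mentioned in this thread.
# A real list would presumably be longer.
BLACKLIST_PREFIXES = ("/ForumMain", "/Site/FASTMembership")


def should_fetch(url, blacklist=BLACKLIST_PREFIXES):
    """Return True if url is on the wiki host and not under a blacklisted path."""
    parts = urlsplit(url)
    # Existing check from the thread: stay on wiki.gnewsense.org.
    if parts.hostname != WIKI_HOST:
        return False
    # Ignore a trailing slash so "/ForumMain/" and "/ForumMain" match alike.
    path = parts.path.rstrip("/") or "/"
    for prefix in blacklist:
        # Match the page itself or anything below it, but not an
        # unrelated page that merely starts with the same letters.
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True
```

The spider would call should_fetch(url) before queuing each link, so
new "bad" places only need to be added to the tuple.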

[1] http://wiki.gnewsense.org/Profiles/Kgoetz#toc7
kk
 
-- 
Karl Goetz, (Kamping_Kaiser / VK5FOSS)
Debian contributor / gNewSense Maintainer
http://www.kgoetz.id.au
No, I won't join your social networking group


_______________________________________________
gNewSense-dev mailing list
gNewSense-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gnewsense-dev
