On Sun, 28 Jun 2009 19:13:53 +0200
Patrik Lembke <bla...@chebab.com> wrote:

> B''H
>
> On Tue, 16 Jun 2009 21:59:13 +0930
> Karl Goetz <k...@kgoetz.id.au> wrote:
>
> > On Tue, 16 Jun 2009 11:51:10 +0200
> > Let us know how you progress.
> > kk
>
> Well, currently I have a sort of working spider, just that it is a bit
> too good at crawling. For example, it's hard to know which parts of the
> site are really wiki and which are not, for example:
> http://wiki.gnewsense.org/ForumMain/
>
> So I will have to use some sort of blacklist of pages not to fetch
> (currently it just checks that it is a page on wiki.gnewsense.org).
>
> Image download and rewrite, and "wiki-links" rewrite, are also
> implemented currently.
>
> So I mostly need a list of "bad" places, like the forum and
> http://wiki.gnewsense.org/Site/FASTMembership
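To make the request concrete, here is a minimal sketch of the kind of blacklist check described in the quoted message. It is not the spider's actual code, which is not shown in this thread: the names `WIKI_HOST`, `BLACKLIST_PREFIXES` and `should_fetch` are hypothetical, and matching on path prefixes is an assumption. Only the two listed paths come from the message; the rest of the list is exactly what is still missing.

```python
from urllib.parse import urlsplit

# Hypothetical names for illustration; not taken from the real spider.
WIKI_HOST = "wiki.gnewsense.org"

# Path prefixes that should not be fetched. Only these two are known
# from the thread; further entries would need to be supplied.
BLACKLIST_PREFIXES = (
    "/ForumMain",
    "/Site/FASTMembership",
)


def should_fetch(url):
    """Return True if url is on the wiki host and not under a blacklisted path."""
    parts = urlsplit(url)

    # Same check the spider is described as doing already: stay on the wiki host.
    if parts.hostname != WIKI_HOST:
        return False

    # Normalise a trailing slash so "/ForumMain/" and "/ForumMain" compare equal.
    path = parts.path.rstrip("/") or "/"

    for prefix in BLACKLIST_PREFIXES:
        # Reject the prefix itself and anything below it, but not an
        # unrelated page whose name merely starts with the same letters.
        if path == prefix or path.startswith(prefix + "/"):
            return False
    return True
```

With a check like this, adding a new "bad" place would only mean appending one more prefix to the tuple.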

Bumping this since I really need this information to continue. If this
project is no longer interesting, please let me know.

-- 
Patrik Lembke
www: http://blambi.chebab.com/
jabber: bla...@lysator.liu.se
GnuPG-key: http://gpg.chebab.com/8FA11A15.asc

_______________________________________________
gNewSense-dev mailing list
gNewSense-dev@nongnu.org
http://lists.nongnu.org/mailman/listinfo/gnewsense-dev