> Just got a copy of Plucker last week and it's been a great help to me (I
> switched from AvantGo, when I found it refused to download some of the
> 200k-300k files I wanted to read offline).
Ah, another wonderful limitation of AvantGo. You just gave me an
idea for another article for the Plucker homepage =)
> If no one else is doing it I might teach myself Python/http and add it
> myself.
	Save yourself some of that time and effort, and pick up Sitescooper.
It can "diff" pages for speed, skipping content that has not changed (along
with dozens of other neat features). I think you'll find it does what you
want, though it's a bit daunting to set up at first.
> On a related note, does anyone else think a 'MAXLINKS' feature would be
> good? Several of the pages I am interested in have a history page which
> makes it easy to download all the latest updates, but obviously a whole
> year's worth of page history is rather a lot, and so some way of grabbing
> only the first N links could be helpful (currently I've hacked my local
> scripts to stop at 15, but it's not very configurable).
	I actually suggested something like this a while ago: a
"depth-first" vs. "breadth-first" gathering option, with a top limit on the
number of links followed. This ties in with the handling of ^C in the
parser, which should not just trash any currently gathered data, but
instead take what it has collected up to the point of the ^C and put it
into a pdb file, broken links and all.
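	For what it's worth, the capped breadth-first gather could be
sketched roughly like this in Python (the names `gather` and `get_links`
are mine, not anything in Plucker; real code would fetch pages over HTTP
where the toy link graph stands in here):

```python
from collections import deque

def gather(start, get_links, max_links=15):
    """Breadth-first link gathering with a hard cap on pages taken.

    start: the root URL.
    get_links: callable returning the links found on a page.
    max_links: stop after this many pages, keeping what we have so far.
    """
    seen = {start}
    queue = deque([start])
    gathered = []
    while queue and len(gathered) < max_links:
        url = queue.popleft()
        gathered.append(url)          # fetching/parsing would happen here
        for link in get_links(url):
            if link not in seen:      # don't queue a page twice
                seen.add(link)
                queue.append(link)
    return gathered

# Toy link graph standing in for real HTTP fetches.
graph = {"home": ["a", "b", "c"], "a": ["d"], "b": [], "c": ["a"], "d": []}
pages = gather("home", lambda u: graph.get(u, []), max_links=3)
# pages is ["home", "a", "b"]: the cap stops the gather after 3 pages.
```

Breadth-first matters here: with a cap of N you get the N links nearest
the starting page (the latest updates on a history page), rather than
chasing one chain of links all the way down.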
/d