Larry W. Virden writes:
> I like having one site per file kind of better than having one large
> plucker file. I suspect there are things I could do about this with
> plucker as well.
Sure, you could do the same thing with plucker.
I use sitescooper for most of my daily/weekly news reading, because
its caching and smarts (content start/content end and table handling)
clean up a lot of sites that were hard to read when I was getting
them directly with Spider.py (like Dave Barry, or Reuters).
> Each of us probably does it differently, so we'll all have different
> answers. I personally either point directly to the URL I want, using
> plucker-build -H -f and so on, or I'll use separate sitename.html files in
Before I used sitescooper, I had .plucker/html/news.html and
.plucker/html/weekly.html to pluck different sets of sites.
And I still use Spider.py to follow the links that plucker puts into
memopad.
> I'd love to get some user-submitted home.html examples of sites that
> they actively spider, so I can add them to the server itself, and allow
> other users to benefit from that knowledge and content. Hint hint.
Here are the two I used.
...Akkana
<html>
<head>
<title>News Links</title>
</head>
<body>
<h1>Akkana's News Links</h1>
News links, suitable for Plucker:
<p>
<a href="http://partners.nytimes.com/nytimes-partners/omnisky/technology.html"
STAYONHOST NOIMAGES MAXDEPTH=2>NY Times Technology</a>
<br>
<a href="http://news.bbc.co.uk/text_only.stm"
STAYONHOST NOIMAGES MAXDEPTH=2>BBC News</a>
<br>
<a href="http://www.theregister.co.uk"
STAYONHOST NOIMAGES MAXDEPTH=2>The Register</a>
<br>
<a href="http://www.wired.com/news_drop/palmpilot/topstories/"
STAYONHOST NOIMAGES MAXDEPTH=2>Wired</a>
<br>
<!--
Too much image/table crap, need to find a text feed
<a href="http://news.kusp.org/"
STAYONHOST NOIMAGES MAXDEPTH=2>KUSP</a>
<br>
-->
<a href="http://slashdot.org/index.pl?light=1&amp;noboxes=1"
STAYONHOST NOIMAGES MAXDEPTH=1>Slashdot</a>
<br>
<a href="http://linuxtoday.com/indexpalm.php3"
STAYONHOST NOIMAGES MAXDEPTH=1>Linux Today</a>
<br>
<!-- Sites that aren't that good after all:
<a href="http://la.adnfo.com/pdq/wire.html"
STAYONHOST NOIMAGES MAXDEPTH=2>LA Times Newswire</a>
<br>
-->
</body>
</html>
<html>
<head>
<title>Weekly Links</title>
</head>
<body>
<h1>Weekly Links</h1>
Weekly links, suitable for Plucker:
<p>
<a href="http://mobile.theonion.com/"
STAYONHOST NOIMAGES MAXDEPTH=2>The Onion</a>
<br>
<a href="http://lwn.net/bigpage.php3"
STAYONHOST NOIMAGES MAXDEPTH=1>LWN</a>
<br>
<a href="http://kt.zork.net/kernel-traffic/latest_print.html"
STAYONHOST NOIMAGES MAXDEPTH=1>Kernel Traffic</a>
</body>
</html>
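
For anyone who wants to check what a home file like the two above will
actually ask for before plucking it, here is a rough sketch in Python. It is
not part of Plucker or Spider.py; the names HomeLinkParser and parse_home are
made up for illustration, and the only things taken from the files above are
the attribute names STAYONHOST, NOIMAGES and MAXDEPTH. It lists each live
link with its per-link settings and skips anything inside an HTML comment.

```python
from html.parser import HTMLParser


class HomeLinkParser(HTMLParser):
    """Collect <a> links and the per-link attributes written on them.

    Hypothetical helper for illustration; not Plucker's own parser.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # one dict per finished <a>...</a>
        self._current = None   # the link being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # HTMLParser lowercases attribute names; valueless attributes
        # such as STAYONHOST arrive with a value of None.
        attr_map = dict(attrs)
        depth = attr_map.get("maxdepth")
        self._current = {
            "url": attr_map.get("href"),
            "stayonhost": "stayonhost" in attr_map,
            "noimages": "noimages" in attr_map,
            "maxdepth": int(depth) if depth else None,
            "title": "",
        }

    def handle_data(self, data):
        # Text between <a> and </a> becomes the link title.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.links.append(self._current)
            self._current = None


def parse_home(html_text):
    """Return the links found in html_text, ignoring commented-out ones."""
    parser = HomeLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

To try it on one of your own files, something like this should do (the path
is the one mentioned earlier in this message; adjust it to wherever yours
lives):

```python
from pathlib import Path

home = Path.home() / ".plucker" / "html" / "news.html"
for link in parse_home(home.read_text()):
    print(link["maxdepth"], link["title"], link["url"])
```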