On Mon, 2005-02-14 at 17:37 -0500, David A. Desrosiers wrote:
> > Agreed, and yet I barely have time (actually, I -don't- have time)
> > to pay adequate attention to the FLOSS projects I've already more or
> > less committed my time to.
>
> Unfortunately, many of us also have little or no time to spend
> on these kinds of issues as well, so relying on the users to do as
> much groundwork as they can to help us to fix it as fast as possible.
>
> So let's try to do that... what exactly is the problem you've
> seen, and what exactly is the kind of result you'd expect? From what I
> read, you basically want to exclude the building of the viewer
> components, if not explicitly specified, yes?

I saw it myself, and got past it, but it required an extra iteration.
But yes, a more adaptive autoconf would likely be valuable.

> > That's about what I expected then, except I was expecting to only do
> > an RSS feed...
>
> Why not just use the links that provide those RSS feeds
> instead?
>
> http://plkr.org/rss.pl
> http://plkr.org/rdf.pl

I'm actually using http://plkr.org/rss.pl, but I'm converting it to HTML
prior to converting it to .pdb using pyplucker. And I only today told my
conversion script to add a MAXDEPTH=2.

> > Should adding -c to my plucker-build help?
>
> I'm not even sure that works anymore.. lemme give it a test:
>
> [time passes]
>
> ...well, it writes the cache, but doesn't appear to check the
> upstream site's Last-Modified header (if present), so it just
> refetches the content over and over. Bummer.

Sounds like a nice feature.

> > (How do you check if something has changed, without first
> > downloading it? Is there some sort of timestamping and/or message
> > digesting going on that I'm not familiar with?)
>
> Generally, you issue a HEAD request to the server's resource,
> and check the Last-Modified date (if present) and fetch it if it is
> more-recent than the local copy of that stored resource. Most current
> webservers support HEAD, but not all of them support Last-Modified
> header, and many types of dynamic pages (even when the content they
> serve up doesn't change) will present a new Last-Modified date, which
> would force a re-fetch. There's a trick to checking that as well, by
> checking Content-Length of the resource, but this also requires that
> you keep (and track) these items locally, in some sort of local dbm,
> cache, or whatever... at fetch time.

Ah, cool. Thanks for the info.

> > OK, no problem. I've reduced the frequency to once every two weeks,
> > instead of once per day. The site doesn't seem to change much
> > anyway.
>
> Well, what is the purpose? To find new News articles? Or to
> find new pages on the site? Why turn the whole site into a Plucker
> document with your Plucker spider a few times a day? I can easily just
> create a new .pdb of the site when it changes, and you can just fetch
> that daily, hourly, or whatever... or just use the RSS/RDF feeds.

I just want an overview of what new stuff is happening in the plucker
community. Spidering the entire website is unintended.

> > Thanks. BTW, does plucker-build respect robots.txt?
>
> The Python, Java, and C++ versions of the Plucker distillers
> presently, do not.

Ew... I guess that's related. :)

> David A. Desrosiers
> [EMAIL PROTECTED]
> http://gnu-designs.com
> _______________________________________________
> plucker-list mailing list
> [email protected]
> http://lists.rubberchicken.org/mailman/listinfo/plucker-list
>
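
For anyone who wants to experiment with the HEAD / Last-Modified /
Content-Length check described in the quoted text, here is a rough
Python sketch. It is not how plucker-build works, and the names
head_metadata and needs_refetch are made up for illustration. It assumes
the caller stores the returned metadata somewhere local (a dbm, a JSON
file, whatever) at fetch time and passes it back in on the next run.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def head_metadata(url, timeout=10):
    """Issue a HEAD request and return the two headers we track."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return {
            "last_modified": resp.headers.get("Last-Modified"),
            "content_length": resp.headers.get("Content-Length"),
        }


def _parse_date(value):
    """Parse an HTTP date header; return None if missing or malformed."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


def needs_refetch(cached, remote):
    """Return True if the resource should be downloaded again.

    cached: metadata saved at the previous fetch, or None if never fetched.
    remote: metadata just returned by head_metadata().
    """
    if cached is None:
        return True  # nothing stored locally yet

    old_dt = _parse_date(cached.get("last_modified"))
    new_dt = _parse_date(remote.get("last_modified"))
    if old_dt is not None and new_dt is not None:
        try:
            if new_dt <= old_dt:
                return False  # server copy is not newer than ours
        except TypeError:
            pass  # naive vs. aware datetimes; fall through to the size check

    # Last-Modified is newer, missing, or unusable. Dynamic pages often send
    # a fresh date every time, so fall back to comparing Content-Length.
    # This is only a heuristic: an edit that keeps the size identical is missed.
    old_len = cached.get("content_length")
    new_len = remote.get("content_length")
    if old_len is not None and new_len is not None:
        return old_len != new_len

    return True  # no usable metadata at all, so fetch to be safe
```

A spider would call head_metadata(url) for each link, skip the download
when needs_refetch(...) returns False, and otherwise fetch the page and
save the new metadata alongside it.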

