Re: Plucker for Linux

Dan Stromberg Mon, 14 Feb 2005 14:45:37 -0800

On Mon, 2005-02-14 at 17:37 -0500, David A. Desrosiers wrote:
> > Agreed, and yet I barely have time (actually, I -don't- have time) 
> > to pay adequate attention to the FLOSS projects I've already more or 
> > less committed my time to.
> 
>       Unfortunately, many of us also have little or no time to spend 
> on these kinds of issues, so we rely on the users to do as 
> much groundwork as they can to help us fix it as fast as possible.
> 
>       So let's try to do that... what exactly is the problem you've 
> seen, and what exactly is the kind of result you'd expect? From what I 
> read, you basically want to exclude the building of the viewer 
> components, if not explicitly specified, yes?


I saw it myself, and got past it, but it required an extra iteration.
But yes, a more adaptive autoconf would likely be valuable.

> > That's about what I expected then, except I was expecting to only do 
> > an RSS feed...
> 
>       Why not just use the links that provide those RSS feeds 
> instead?
> 
>       http://plkr.org/rss.pl
>       http://plkr.org/rdf.pl

I'm actually using http://plkr.org/rss.pl, but I'm converting it to HTML
prior to converting it to .pdb using pyplucker.  And I only today told
my conversion script to add a MAXDEPTH=2.
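
For anyone curious, the RSS-to-HTML step can be quite small. What follows
is only a rough Python 3 sketch, not my actual script: rss_to_html is a
made-up name, and it assumes a plain RSS 2.0 layout (channel/item/title/link
with no XML namespaces), which may not match what rss.pl really emits.

```python
import xml.etree.ElementTree as ET
from html import escape


def rss_to_html(rss_text):
    """Turn an RSS 2.0 document (as a string) into a simple HTML link list.

    Assumes un-namespaced <channel>/<item>/<title>/<link> elements.
    """
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element; not the RSS 2.0 layout assumed here")

    feed_title = escape(channel.findtext("title", default="Feed"))
    rows = []
    for item in channel.findall("item"):
        title = escape(item.findtext("title", default="(untitled)"))
        link = escape(item.findtext("link", default=""), quote=True)
        rows.append('<li><a href="%s">%s</a></li>' % (link, title))

    return (
        "<html><head><title>%s</title></head><body>\n"
        "<h1>%s</h1>\n<ul>\n%s\n</ul>\n</body></html>\n"
        % (feed_title, feed_title, "\n".join(rows))
    )
```

The resulting page is just a list of links, so the depth limit given to the
distiller decides how far beyond the linked articles the spider wanders.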

> > Should adding -c to my plucker-build help?
> 
>       I'm not even sure that works anymore.. lemme give it a test:
> 
>       [time passes]
> 
>       ...well, it writes the cache, but doesn't appear to check the 
> upstream site's Last-Modified header (if present), so it just 
> refetches the content over and over.

Bummer.  Sounds like a nice feature.

> > (How do you check if something has changed, without first 
> > downloading it?  Is there some sort of timestamping and/or message 
> > digesting going on that I'm not familiar with?)
> 
>       Generally, you issue a HEAD request to the server's resource, 
> and check the Last-Modified date (if present) and fetch it if it is 
> more-recent than the local copy of that stored resource. Most current 
> webservers support HEAD, but not all of them support Last-Modified 
> header, and many types of dynamic pages (even when the content they 
> serve up doesn't change) will present a new Last-Modified date, which 
> would force a re-fetch. There's a trick to checking that as well, by 
> checking Content-Length of the resource, but this also requires that 
> you keep (and track) these items locally, in some sort of local dbm, 
> cache, or whatever... at fetch time.

Ah, cool.  Thanks for the info.
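
To make sure I follow, here is roughly how I picture the HEAD-then-compare
idea in Python 3. It is a sketch under my own assumptions, not what
plucker-build does: head_metadata, needs_refetch and the layout of the
cached dict are names I made up, and the Content-Length comparison is only
a heuristic (an edit that keeps the same length would be missed).

```python
import urllib.request
from email.utils import parsedate_to_datetime


def head_metadata(url, timeout=10):
    """Issue a HEAD request; return (Last-Modified, Content-Length) header
    values as strings, or None for a header the server did not send."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Last-Modified"), resp.headers.get("Content-Length")


def needs_refetch(cached, last_modified, content_length):
    """Decide whether to download again.

    cached is None (never fetched) or a dict with the "last_modified" and
    "content_length" strings recorded at the previous fetch.
    """
    if cached is None:
        return True

    old_lm = cached.get("last_modified")
    if last_modified and old_lm:
        try:
            # Not newer than our copy: nothing to do.
            if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(old_lm):
                return False
        except (TypeError, ValueError):
            pass  # unparseable date; fall through to the length check

    # Last-Modified is missing, unparseable, or newer (dynamic pages often
    # stamp every response), so fall back to comparing Content-Length.
    old_len = cached.get("content_length")
    if content_length is not None and old_len is not None:
        return content_length != old_len

    return True  # nothing to compare against; be safe and refetch
```

A caller would run head_metadata(url), pass the two values to
needs_refetch() along with whatever was stored last time, and update the
stored values after each real download.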

> > OK, no problem.  I've reduced the frequency to once every two weeks, 
> > instead of once per day.  The site doesn't seem to change much 
> > anyway.
> 
>       Well, what is the purpose? To find new News articles? Or to 
> find new pages on the site? Why turn the whole site into a Plucker 
> document with your Plucker spider a few times a day? I can easily just 
> create a new .pdb of the site when it changes, and you can just fetch 
> that daily, hourly, or whatever... or just use the RSS/RDF feeds.

I just want an overview of what new stuff is happening in the plucker
community.  Spidering the entire website is unintended.

> > Thanks.  BTW, does plucker-build respect robots.txt?
> 
>       The Python, Java, and C++ versions of the Plucker distillers 
> presently do not.

Ew...  I guess that's related.  :)
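
In the meantime a wrapper script could do the check itself before handing
URLs to the distiller. A minimal Python 3 sketch using the standard
library's urllib.robotparser follows; build_parser, allowed and the
"ExampleSpider" user-agent string are placeholders of mine, not anything
Plucker defines.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Build a parser from the text of a robots.txt file.

    For a live site you could instead call rp.set_url(".../robots.txt")
    followed by rp.read() to fetch and parse it in one go.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


def allowed(rp, url, user_agent="ExampleSpider"):
    """Return True if the rules permit user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)
```

The script would then skip any URL for which allowed() returns False.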

> 
> David A. Desrosiers
> [EMAIL PROTECTED]
> http://gnu-designs.com
> _______________________________________________
> plucker-list mailing list
> [email protected]
> http://lists.rubberchicken.org/mailman/listinfo/plucker-list
>
