> I thought Michael had suggested a while back that we could thread the
> retriever to allow more than one page to be retrieved at a time.  Has any
> work been done on this?

        Let me throw a wrench in the works. Those threads have to share a
memory space. Why? Because what if two separate URLs coincidentally link
to the same offsite page? Without shared state, each thread fetches that
page on its own, and you'll now have two separate versions of the same
content in your cache, each with its own unique identifier. The Perl parser
(yes, defunct for now) has some code which builds an array of 'seen' and
'bad' links, and constantly does a check and garbage collection on them.
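
        To make the concern concrete, here is a rough sketch in Python (not
the Perl parser's actual code, and not anything in the retriever today) of
what a shared 'seen'/'bad' registry might look like. The names SeenLinks,
claim, mark_bad, and retrieve are made up for illustration, and fetch stands
in for whatever really pulls the page down.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SeenLinks:
    """Hypothetical shared registry of 'seen' and 'bad' links."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()
        self._bad = set()

    def claim(self, url):
        """Return True only for the first thread that asks about this URL."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True

    def mark_bad(self, url):
        """Record a URL that could not be retrieved."""
        with self._lock:
            self._bad.add(url)

    def is_bad(self, url):
        with self._lock:
            return url in self._bad


def retrieve(links, fetch, workers=4):
    """Fetch each distinct URL at most once across all worker threads."""
    registry = SeenLinks()
    cache = {}
    cache_lock = threading.Lock()

    def worker(url):
        # Another thread already owns (or already tried) this URL.
        if not registry.claim(url):
            return
        try:
            body = fetch(url)
        except Exception:
            registry.mark_bad(url)
            return
        with cache_lock:
            cache[url] = body

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every worker call has finished.
        list(pool.map(worker, links))
    return cache, registry
```

The point is only that the check-and-record step happens under one lock, so
two threads that reach the same offsite page cannot both decide it is new.
This sketch does no garbage collection of old entries; that would need a
policy of its own.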

        Just a design thought to consider, that's all...


/d

