On 04/06/12 04:41 AM, Per Jessen wrote:
Hi Jack

not sure how valid these comments might be, I have zero knowledge about

Your solution appears to address what I think of as the "first half of
the issue" - that a file may have several URLs (one per mirror).  Pick
the wrong one, and a previously cached copy will not be found.

Yes, exactly. I have only tackled the first half of the issue you describe.

If looking up the currently cached content is fast/efficient, rewriting
the header accordingly sounds okay, but I can't help thinking that it
would be easier to do what I do with Squid - rewrite the URLs when they
are stored?
If <primary> is the primary location, e.g.
http://download.services.openoffice.org, and <mirror1-9> are mirrors,
then files retrieved from <mirror1-9> are stored as if they were
fetched from <primary>.  On subsequent retrievals, you would have a
direct cache hit with no need to look at the header.
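
To make the store-time rewrite concrete, here is a minimal sketch in Python. It is not Squid's actual rewriter interface: the mirror hostnames, the `canonical_url` helper and the in-memory `cache` dict are all hypothetical, and it assumes every mirror uses the same path layout as the primary.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical, hand-maintained configuration (an assumption for this sketch).
PRIMARY = "download.services.openoffice.org"
MIRRORS = {"mirror1.example.org", "mirror2.example.org"}


def canonical_url(url: str) -> str:
    """Return the cache key for url: a mirror host is replaced by the primary host."""
    parts = urlsplit(url)
    if parts.hostname in MIRRORS:
        parts = parts._replace(netloc=PRIMARY)
    return urlunsplit(parts)


# Store under the canonical key, then look up a request for a different mirror.
cache = {}
cache[canonical_url("http://mirror1.example.org/files/a.iso")] = b"payload"
hit = cache.get(canonical_url("http://mirror2.example.org/files/a.iso"))
```

Both requests map to the same key, so the second one hits the cache without any header inspection. The cost is that the mirror list has to come from somewhere.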

Hmm, is there any way to automatically discover the list of mirrors? I know you automatically retrieve the list of mirrors from http://mirrors.opensuse.org/list/all.html, and that you are looking for something less messy than scraping this HTML. But I think the proxy administrator must still manually configure where to find the list of mirrors for each content distribution network (openSUSE, OpenOffice, etc.).

A strong motivation for using Metalink is that no manual intervention is required by the proxy administrator. Any content distribution network that supports Metalink should be discovered automatically.

We are also thinking of examining "Digest: ..." headers. If a response
has a "Location: ..." header that's not already cached and a "Digest:
..." header, then the plugin would check the cache for a matching
digest. If found then it would rewrite the "Location: ..." header with
the cached URL.
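
A rough sketch of that check, in Python for illustration only. The `rewrite_location` function and the two dicts standing in for the cache (`cache_by_url`, `url_by_digest`) are hypothetical names, not an existing plugin API, and the sketch assumes the cache keeps an index from digest value to cached URL.

```python
def rewrite_location(headers: dict, cache_by_url: dict, url_by_digest: dict) -> dict:
    """Return headers with Location pointing at a cached copy, if a digest matches."""
    location = headers.get("Location")
    digest = headers.get("Digest")
    # Nothing to do without both headers, or if the target is already cached.
    if location is None or digest is None or location in cache_by_url:
        return headers
    # A Digest header may carry several comma-separated values; try each one.
    for value in (v.strip() for v in digest.split(",")):
        cached_url = url_by_digest.get(value)
        if cached_url is not None:
            rewritten = dict(headers)  # leave the caller's dict untouched
            rewritten["Location"] = cached_url
            return rewritten
    return headers
```

Each digest value costs one index lookup, regardless of how many mirrors carry the file.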

I'm not really very familiar with metalink, what is your thinking behind
wanting to use the digest to identify a cached object?

My thinking is that looking up cached content by digest might produce some additional cache hits where scanning the list of "Link: <...>; rel=duplicate" headers would not, e.g. if the content was downloaded from a server outside of the CDN, so that its URL is not among the "Link: <...>; rel=duplicate" headers.

It might also be more efficient, because the digest needs to be looked up only once, vs. scanning a possibly long list of "Link: <...>; rel=duplicate" URLs.
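
For comparison, the Link-scanning approach could look roughly like the Python sketch below. It assumes each "Link:" header value carries a single link, and `duplicate_urls`, `find_cached_duplicate` and `cache_by_url` are hypothetical names chosen for the illustration.

```python
import re

# Matches "<url>" followed by its ";"-separated parameters.
_LINK = re.compile(r'\s*<([^>]+)>(.*)', re.S)


def duplicate_urls(link_values):
    """Extract the URLs of all links marked rel=duplicate."""
    urls = []
    for value in link_values:
        m = _LINK.match(value)
        if not m:
            continue
        params = [p.strip() for p in m.group(2).split(";") if p.strip()]
        # Accept both rel=duplicate and rel="duplicate".
        if any(p.replace('"', "").lower() == "rel=duplicate" for p in params):
            urls.append(m.group(1))
    return urls


def find_cached_duplicate(link_values, cache_by_url):
    """Return the first rel=duplicate URL that is already cached, else None."""
    for url in duplicate_urls(link_values):
        if url in cache_by_url:  # one cache lookup per listed mirror
            return url
    return None
```

This needs up to one cache lookup per mirror listed, and it can only find copies that were fetched from a URL the server advertises.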
