On 04/06/12 04:41 AM, Per Jessen wrote:
Hi Jack,
Not sure how valid these comments might be; I have zero knowledge about
ATS.
Your solution appears to address what I think of as the "first half of
the issue" - that a file may have several URLs (one per mirror). Pick
the wrong one, and a previously cached copy will not be found.
Yes, exactly. I have only tackled the first half of the issue you describe.
If looking up the currently cached content is fast/efficient, rewriting
the header accordingly sounds okay, but I can't help thinking that it
would be easier to do what I do with Squid - rewrite the URLs when they
are stored?
If <primary> is the primary location, e.g.
http://download.services.openoffice.org, and <mirror1-9> are mirrors,
then files retrieved from <mirror1-9> are stored as if they were
fetched from <primary>. On subsequent retrievals, you would have a
direct cache hit with no need to look at the header.
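
To make the store-time rewrite concrete, here is a minimal sketch in Python. It is not the actual Squid configuration or any ATS API; the mirror prefixes and the store_key helper are hypothetical names chosen for illustration, and only the primary host comes from the example above.

```python
# Hypothetical configuration: only PRIMARY is taken from the discussion;
# the mirror prefixes below are made-up placeholders.
PRIMARY = "http://download.services.openoffice.org"
MIRRORS = [
    "http://mirror1.example.org/openoffice",
    "http://mirror2.example.net/pub/ooo",
]


def store_key(url: str) -> str:
    """Return the URL under which a response should be stored and looked up.

    URLs below a known mirror prefix are mapped onto the primary location,
    so every mirror copy of a file shares a single cache entry.
    """
    for prefix in MIRRORS:
        if url == prefix or url.startswith(prefix + "/"):
            return PRIMARY + url[len(prefix):]
    # Unknown hosts (including the primary itself) keep their own URL.
    return url
```

The same function would be applied both when storing and when looking up, which is why a later request to any listed mirror becomes a direct hit. The cost is that the mirror list has to come from somewhere.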
Hmm, is there any way to automatically discover the list of mirrors? I
know you automatically retrieve the list of mirrors from
http://mirrors.opensuse.org/list/all.html, and you are looking for
something less messy than scraping this HTML. But I think the proxy
administrator must manually configure where to find the list of mirrors,
for each content distribution network (openSUSE, OpenOffice, etc.).
A strong motivation for using Metalink is that no manual intervention is
required by the proxy administrator. Any content distribution network
that supports Metalink should be automatically discovered.
We are also thinking of examining "Digest: ..." headers. If a response
has a "Location: ..." header that's not already cached and a "Digest:
..." header, then the plugin would check the cache for a matching
digest. If found then it would rewrite the "Location: ..." header with
the cached URL.
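
As a rough sketch of that digest check, written in Python rather than as a real ATS plugin: the digest_index mapping (digest to cached URL) and the cache_urls set are assumptions standing in for whatever lookup the cache actually offers, and parse_digest and rewrite_location are hypothetical helper names.

```python
def parse_digest(value):
    """Split a Digest header value such as 'SHA-256=abc=, MD5=xyz==' into a dict."""
    digests = {}
    for item in value.split(","):
        # Split at the first '=' only, so base64 padding stays in the value.
        algo, sep, encoded = item.strip().partition("=")
        if sep:
            digests[algo.upper()] = encoded
    return digests


def rewrite_location(headers, cache_urls, digest_index):
    """Return headers, with Location replaced by a cached URL of equal digest.

    headers      -- dict of response headers
    cache_urls   -- set of URLs currently cached (assumed lookup)
    digest_index -- dict mapping (algorithm, digest) to a cached URL (assumed)
    """
    location = headers.get("Location")
    digest = headers.get("Digest")
    # Nothing to do without both headers, or if Location is already cached.
    if not location or not digest or location in cache_urls:
        return headers
    for algo, encoded in parse_digest(digest).items():
        cached = digest_index.get((algo, encoded))
        if cached:
            return {**headers, "Location": cached}
    return headers
```

Whether the cache can be indexed by digest at all is the open question here; the sketch simply assumes such an index exists.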
I'm not really very familiar with Metalink. What is your thinking behind
wanting to use the digest to identify a cached object?
My thinking is that looking up cached content by digest might result in
some additional cache hits where scanning the list of "Link: <...>;
rel=duplicate" headers did not, e.g. if the content was downloaded from
a server outside of the CDN, and therefore the URL is not among the
"Link: <...>; rel=duplicate" headers.
It might also be more efficient because the digest should be looked up
only once, vs. scanning a possibly long list of "Link: <...>;
rel=duplicate" URLs.
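
For comparison, the scan over "Link: <...>; rel=duplicate" headers could look roughly like the Python sketch below. The parsing is deliberately simplified (it assumes no commas or semicolons inside quoted parameter values), and duplicate_urls, first_cached and the cache_urls set are hypothetical names, not part of any existing plugin.

```python
import re

# One link-value: <URL> followed by zero or more ';param' parts.
_LINK = re.compile(r'<([^>]+)>((?:\s*;\s*[^;,]+)*)')


def duplicate_urls(link_values):
    """Collect the URLs of all rel=duplicate links, in header order."""
    urls = []
    for value in link_values:
        for url, params in _LINK.findall(value):
            for param in params.split(";"):
                name, _, val = param.strip().partition("=")
                rels = val.strip('"').lower().split()
                if name.lower() == "rel" and "duplicate" in rels:
                    urls.append(url)
                    break
    return urls


def first_cached(link_values, cache_urls):
    """Return the first rel=duplicate URL that is already cached, else None."""
    for url in duplicate_urls(link_values):
        if url in cache_urls:  # one cache lookup per listed mirror
            return url
    return None
```

This needs one lookup per listed mirror, whereas the digest approach needs a single lookup, and it cannot find a copy whose URL is not in the list.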