On 1/25/2013 6:35 PM, Bram Neijt wrote:
Hi Eliezer,
I can't help you much with the details of how you can get Squid to
work with the data in the metalink files.
I'm a Squid developer (not a core one), so that's OK; I wasn't expecting
anyone here to help me with Squid code.
Maybe I can help with some pointers on what you are trying to do. If
you would care to explain the approach: what data will squid look at,
what will squid then do?
OK, I will try to describe what we want to happen, from both the proxy's
point of view and the client's.
The cache proxy we are talking about is a forward proxy that handles
HTTP and FTP only.
I developed a feature called store-id (see details below) which I hope
will get into squid.HEAD and Squid 3.3 in the next few weeks.
This feature allows admins to prevent duplication of HTTP objects in the
cache, using a small helper program that decides, for each request URL,
what its store ID is.
This feature is the successor of the store_url_rewrite feature that
existed in Squid 2.7.
Squid is in no way a metalink client for now, and from many security
aspects it is not advisable for a proxy to be one.
Other than that, Squid and other proxies can benefit a lot from metalinks.
For full metalink clients everything is fine, since the hashes are
available and they support partial content.
Proxies face other issues that do not exist for full metalink clients.
Since Squid does not implement caching for partial content, the only
benefit Squid can draw from metalinks is identifying duplicates of one
object by URL.
The main issue in this case is that metalinks rely on hashes to verify
the downloaded content, while the store-id feature works only on a URL.
This can open a very deep security hole for cache poisoning.
Implementing a same-origin\same-domain policy is not an option, since
the URLs in a metalink file can point to different domains\IPs and
subdirectories.
A same-filename policy does not apply either, since a simple
script\rewrite can fake the filename.
Another issue, not directly related to metalinks but rather to Squid and
perhaps other cache software, is that the relevant metalink data is
supposed to arrive in the response to the original request. At that
stage of the download it does not help much, since the store-id has
already been decided.
One thing I could do is use the first link any user tries as the
store-id for the other URLs from the same metalink file.
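To illustrate the idea, here is a small Python sketch (not Squid code, just an illustration) of how a helper could map every mirror URL listed in a Metalink (RFC 5854) file to one hash-based store-id; the "metalink.squid.internal" naming and the parsing details are my own assumptions for this example.

```python
# Sketch: derive one unified store-id for all mirror URLs in a
# Metalink (RFC 5854) file, keyed by the file's SHA-256 hash.
# The ".squid.internal" suffix follows the convention used in the
# sourceforge example below; everything here is hypothetical.
import xml.etree.ElementTree as ET

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

def store_id_map(metalink_xml):
    """Map every mirror URL in the metalink to a hash-based store-id."""
    mapping = {}
    root = ET.fromstring(metalink_xml)
    for f in root.findall("ml:file", NS):
        sha = f.find("ml:hash[@type='sha-256']", NS)
        if sha is None:
            continue  # without a hash we cannot safely merge mirrors
        store_id = "http://metalink.squid.internal/sha-256/" + sha.text
        for url in f.findall("ml:url", NS):
            mapping[url.text] = store_id
    return mapping

example = """<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.tar.gz">
    <hash type="sha-256">deadbeef</hash>
    <url>http://mirror1.example.com/pub/example.tar.gz</url>
    <url>http://mirror2.example.org/files/example.tar.gz</url>
  </file>
</metalink>"""

m = store_id_map(example)
```

With something like this, both mirror URLs end up under the same store-id, though as noted above, trusting URLs from a metalink file without verifying the hash is exactly the cache-poisoning risk being discussed.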
I know this was a bit long and not directly related, but since the RFC
for clients is being written and is now in draft form, I think it is
worth raising these issues and perhaps deciding on a way to cover these
gaps for the benefit of proxies.
* store-id feature details:
The helper gets the request URL and decides the "store-id" based on the
admin's algorithms.
For example, if the admin knows about a CDN\mirror pattern for a URL,
such as on SourceForge (a real-world example):
^http:\/\/.*\.dl\.sourceforge\.net\/(.*)
All download mirrors in their network share the same URL path but have a
different .dl.sourceforge.net subdomain.
All requests for the file /xyx/example.tar.gz can be retrieved using:
http://examplemirro1.dl.sourceforge.net/xyx/example.tar.gz
http://examplemirro2.dl.sourceforge.net/xyx/example.tar.gz
http://examplemirro3.dl.sourceforge.net/xyx/example.tar.gz
In this case the admin can use a store-id such as
"http://dl.sourceforge.net.squid.internal/xyx/example.tar.gz", which
will cause Squid to store the requests from any of the mirrors as one
unified object.
The result is that if the URL\file\object already exists in the cache
from an earlier request to one mirror, the current request to another
mirror will be served from cache rather than from the origin server.
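As an illustration only, a minimal store-id helper for this SourceForge pattern could look like the Python sketch below; the exact reply syntax ("OK store-id=...") is my assumption here, not the final helper protocol.

```python
#!/usr/bin/env python
# Minimal store-id helper sketch for the sourceforge example above.
# Assumes a simple line-based protocol: one request URL in per line,
# one reply line out. The reply format shown is illustrative only.
import re
import sys

MIRROR = re.compile(r"^http://[^/]+\.dl\.sourceforge\.net/(.*)")

def store_id(url):
    """Return the unified store-id for a sourceforge mirror URL, or None."""
    m = MIRROR.match(url)
    if m:
        return "http://dl.sourceforge.net.squid.internal/" + m.group(1)
    return None

def main():
    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        sid = store_id(fields[0])  # first token is the request URL
        sys.stdout.write("OK store-id=%s\n" % sid if sid else "ERR\n")
        sys.stdout.flush()

if __name__ == "__main__":
    main()
```

Requests that do not match the mirror pattern fall through unchanged, so only the known CDN URLs get merged into one cache object.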
<SNIP>
Best regards,
Eliezer
--
You received this message because you are subscribed to the Google Groups "Metalink
Discussion" group.