On 1/25/2013 6:35 PM, Bram Neijt wrote:
> Hi Eliezer,

> I can't help you much with the details of how you can get Squid to
> work with the data in the metalink files.
I'm a Squid developer (not a core one), so that's fine; I wasn't expecting anyone here to help me with the Squid code.

> Maybe I can help with some pointers on what you are trying to do. If
> you would care to explain the approach: what data will squid look at,
> what will squid then do?

OK, I will try to describe what we want to happen, from both the proxy's point of view and the client's.

The cache proxy we are talking about is a forward proxy for HTTP+FTP only. I developed a feature called store-id (see details below) which I hope will get into squid.HEAD and Squid 3.3 in the next weeks. This feature allows admins to prevent duplication of HTTP objects in the cache using a small helper program which decides, for each request URL, what its ID is. This feature is the successor of the store_url_rewrite feature that existed in Squid 2.7.

Squid is in no way a metalink client for now, and from many security aspects it is not advisable for a proxy to be one.
Apart from that, Squid and other proxies can benefit a lot from metalinks.
For full metalink clients everything is fine, since the hashes are available and they support partial content. Proxies have other issues which do not exist for full metalink clients: since Squid does not implement caching of partial content, the only benefit Squid can take from metalinks is identifying duplicates of one object by URL.
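For context, this is roughly what a metalink file gives a client to work with: a per-file hash plus a list of mirror URLs. The fragment below is a made-up, minimal metalink 4 (RFC 5854)-style example; the hash value and hostnames are placeholders, not real data:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.tar.gz">
    <!-- placeholder digest, not a real hash -->
    <hash type="sha-256">0000placeholder0000</hash>
    <url>http://examplemirror1.dl.sourceforge.net/xyx/example.tar.gz</url>
    <url>http://examplemirror2.dl.sourceforge.net/xyx/example.tar.gz</url>
  </file>
</metalink>
```

A full client verifies the hash after downloading and can fetch ranges from several mirrors; a proxy keying only on URLs sees just the url entries and never checks the hash.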

The main issue in this case is that metalinks rely on hashes to verify the downloaded content, while the store-id feature works only on a URL.
This can open a very deep security hole for cache poisoning.
Implementing a same-origin/same-domain policy is not an option, since the URLs in a metalink file can point to different domains/IPs and subdirectories. A same-filename policy also does not apply, since a simple script/rewrite can fake it.

Another issue, not related directly to metalinks but more to Squid (and maybe some other cache software), is that the relevant metalink data is supposed to arrive in the response to the original request, which at that stage of the download does not help much, since the store-id has already been decided. One thing I could do is use the first link any user tries as the store-id for the other URLs listed in the same metalink file.
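To make the "first link wins" idea concrete, here is a purely illustrative sketch (not Squid code; all names are mine): record which URLs a metalink file groups together, and hand back the first URL of that group that any client actually requested as the store-id for all the others.

```python
# Illustrative sketch only: map every URL from one metalink <file> entry
# to the first URL of that set that any client actually requested, and
# reuse that first URL as the store-id for the rest.

class MetalinkStoreIds:
    def __init__(self):
        self.first_seen = {}   # frozenset of mirror URLs -> first URL requested
        self.url_to_set = {}   # individual URL -> its mirror set

    def register_metalink(self, mirror_urls):
        """Record that these URLs all describe the same object."""
        mirror_set = frozenset(mirror_urls)
        for url in mirror_urls:
            self.url_to_set[url] = mirror_set

    def store_id(self, url):
        """Return the store-id for a request URL (the URL itself if unknown)."""
        mirror_set = self.url_to_set.get(url)
        if mirror_set is None:
            return url  # not part of any known metalink
        # setdefault keeps whichever URL of the set was asked for first
        return self.first_seen.setdefault(mirror_set, url)
```

The open problem from above still applies: nothing here verifies a hash, so two mirrors listed in a hostile metalink file would still be unified blindly.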

I know this was a bit long and not directly related, but since the RFC for clients is being written and is now in draft, I think it is good to raise these issues and maybe decide on a way to cover these gaps for the benefit of proxies.

* store-id feature details:
The helper gets the request URL and decides the "store-id" based on the admin's algorithms. For example, if the admin knows about a CDN/mirror pattern of a URL, such as on SourceForge (a real-world example):
^http:\/\/.*\.dl\.sourceforge\.net\/(.*)
All download mirrors in their network share the same URL path but use a different .dl.sourceforge.net subdomain, so the file /xyx/example.tar.gz can be retrieved using any of
http://examplemirror1.dl.sourceforge.net/xyx/example.tar.gz
http://examplemirror2.dl.sourceforge.net/xyx/example.tar.gz
http://examplemirror3.dl.sourceforge.net/xyx/example.tar.gz

In this case the admin can use a store-id such as "http://dl.sourceforge.net.squid.internal/xyx/example.tar.gz"; this will cause Squid to store the requests from any of the mirrors as one unified object. The result is that if the object already exists in the cache from an earlier request to one mirror, a later request to another mirror will be served from the cache rather than from the origin server.
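As a concrete sketch of such a helper (assuming the simplest dialect: one request URL per input line, one store-id, or the URL unchanged, per output line; check your Squid version's actual helper protocol before relying on this):

```python
#!/usr/bin/env python
# Sketch of a store-id helper for the SourceForge mirror pattern above.
# Assumes one URL per stdin line and one store-id per stdout line; the
# real helper protocol may carry extra fields depending on Squid version.
import re
import sys

MIRROR = re.compile(r'^http://.*\.dl\.sourceforge\.net/(.*)$')

def store_id(url):
    m = MIRROR.match(url)
    if m:
        # Collapse every mirror onto one internal cache key.
        return 'http://dl.sourceforge.net.squid.internal/' + m.group(1)
    return url  # unknown URL: the store-id stays the URL itself

if __name__ == '__main__':
    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        sys.stdout.write(store_id(fields[0]) + '\n')
        sys.stdout.flush()  # Squid expects prompt, unbuffered replies
```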

<SNIP>
Best regards,
Eliezer

--
You received this message because you are subscribed to the Google Groups "Metalink 
Discussion" group.