Hi Eliezer,

I'm not yet clear on how Squid would come to know about the metalink.
Will an admin add the metalink to a list, so that you have a list of
supported/trusted metalinks, or will Squid detect the download of a
metalink and do something with it? Because I was in the mood, I've
added a section on both scenarios.

==== If Squid detects a metalink download and tries to do something smart
I don't really see a way of having metalinks interact with the mirror
path identifier/store-id features. The big problem, I think, is that
you have to be sure that a client requesting any of the metalink URLs
actually has the metalink itself and will verify the integrity of the
download afterwards. Otherwise the client could just be visiting one
of the URLs mentioned in a metalink that happened to pass through the
proxy, and would receive the wrong data without noticing.

One thing I could think of is having Squid treat the URLs in a
metalink file as probably almost static and increase their cache time,
regardless of what the real HTTP server hosting the file responds with
in the future.
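To make that concrete, here is a minimal sketch of pulling the mirror URLs out of a metalink document, which is the first step Squid would need either way. It assumes the Metalink 4.0 (RFC 5854) XML namespace; the sample document, file name, and mirror hostnames are made up for illustration.

```python
# Sketch: extract the candidate download URLs from a Metalink 4.0
# (RFC 5854) document. Older "metalink 3.0" files use a different
# schema and would need separate handling.
import xml.etree.ElementTree as ET

ML_NS = "{urn:ietf:params:xml:ns:metalink}"

def metalink_urls(xml_text):
    """Return the mirror URLs listed in a metalink document."""
    root = ET.fromstring(xml_text)
    return [url.text.strip()
            for url in root.iter(ML_NS + "url")
            if url.text]

# Hypothetical sample document (hosts are placeholders)
sample = """<?xml version="1.0" encoding="UTF-8"?>
<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.tar.gz">
    <url>http://mirror1.example.org/pub/example.tar.gz</url>
    <url>http://mirror2.example.net/pub/example.tar.gz</url>
  </file>
</metalink>"""

print(metalink_urls(sample))
# -> ['http://mirror1.example.org/pub/example.tar.gz',
#     'http://mirror2.example.net/pub/example.tar.gz']
```

Squid would then have to decide what to do with that list, which is exactly the trust question above.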

==== If Squid trusts the metalink content, for example because it was added by an admin
Then Squid could use the URLs to generate a regex for the store-id
extraction, and you could have Squid consider the different URLs equal
using your plugin. You might need to add some functionality to the
plugin (for example, a search-and-replace regex instead of the "group
1 determines the store id" approach I gathered from your example).
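By "search-and-replace" I mean something like the following sketch: a line-based helper in the style of squid 2.7's store_url_rewrite that applies (pattern, replacement) rules to each URL. The single rule here reuses the sourceforge pattern from your message below; the exact helper I/O protocol depends on the squid version your plugin targets, so treat the loop as illustrative.

```python
# Minimal sketch of a search-and-replace store-id helper:
# read one URL per line, apply rewrite rules, emit the store id.
import re
import sys

# (search pattern, replacement) pairs; this one collapses all
# *.dl.sourceforge.net mirrors into a single internal store id.
RULES = [
    (r"^http://[^/]*\.dl\.sourceforge\.net/(.*)",
     r"http://dl.sourceforge.net.squid.internal/\1"),
]

def store_id(url):
    """Rewrite url with the first matching rule, else return it unchanged."""
    for pattern, repl in RULES:
        new_url, count = re.subn(pattern, repl, url)
        if count:
            return new_url
    return url

def main():
    for line in sys.stdin:
        sys.stdout.write(store_id(line.strip()) + "\n")
        sys.stdout.flush()  # the proxy expects one unbuffered reply per request

if __name__ == "__main__":
    main()
```

With rules expressed as substitutions rather than "group 1 is the id", one table can cover mirror networks whose common part is not simply a path suffix.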


Good luck with the plugin!

Bram





On Mon, Jan 28, 2013 at 3:08 PM, Eliezer Croitoru <[email protected]> wrote:
> On 1/25/2013 6:35 PM, Bram Neijt wrote:
>>
>> Hi Eliezer,
>>
>> I can't help you much with the details of how you can get Squid to
>> work with the data in the metalink files.
>
> I'm a Squid developer (not a core one), so it's OK; I didn't plan on
> anyone here helping me with Squid code.
>
>
>> Maybe I can help with some pointers on what you are trying to do. If
>> you would care to explain the approach: what data will squid look at,
>> what will squid then do?
>
>
> OK, I will try to describe what we want to happen from the proxy's point of
> view and from the client's.
>
> The cache proxy we are talking about is a forward one, an HTTP+FTP-only
> proxy.
> I developed a feature called store-id (see details below) which I hope will
> get into squid.HEAD and squid 3.3 in the next few weeks.
> This feature allows admins to prevent duplication of HTTP objects in the
> cache using a small program which decides, for each request URL, what its
> ID is.
> This feature is the successor of the store_url_rewrite feature that existed
> in squid 2.7.
>
> Squid is in no way a metalink client for now, and from many security
> aspects it's not advised for a proxy to be one.
> Other than that, Squid and other proxies can benefit a lot from metalinks.
> For full metalink clients everything is fine, since the hashes are
> available and they support partial content.
> With proxies there are other issues which don't exist for full metalink
> clients.
> Since Squid doesn't implement caching for partial content, the only benefit
> Squid gets from metalinks is identifying duplicates of one object by URL.
>
> The main issue in this case is that metalinks rely on hashes to verify the
> downloaded content, while the store-id feature actually works only on a
> URL.
> This can open a very deep security hole for cache poisoning.
> Implementing a same-origin\same-domain policy is not an option, since the
> URL objects in metalink files can come from different domains\IPs and
> subdirectories.
> A same-filename policy also doesn't apply, since a simple script\rewrite
> can fake it.
>
> Another issue, not related directly to metalinks but more to Squid and
> maybe some other cache software, is that the relevant metalink data is
> supposed to be in the response to the original request, which at this stage
> of the download does not help much since the store-id has already been
> decided.
> I could do something else, such as using the first link any user tries as
> the store-id for the same URLs from the metalink file.
>
> I know this was a bit long and not directly related, but since the RFC for
> clients is being written and is in draft form now, I think it's good to
> raise these issues and maybe decide on a way to cover these gaps for
> proxies' benefit.
>
> * store-id feature details:
> The helper gets the request URL and decides the "store-id" based on the
> admin's algorithms.
> Suppose the admin knows about a CDN\mirror pattern for a URL, such as with
> sourceforge (real-world example):
> ^http:\/\/.*\.dl\.sourceforge\.net\/(.*)
> All download mirrors in their network have the same URL path but a
> different .dl.sourceforge.net subdomain, so the file /xyx/example.tar.gz
> can be retrieved using any of:
> http://examplemirro1.dl.sourceforge.net/xyx/example.tar.gz
> http://examplemirro2.dl.sourceforge.net/xyx/example.tar.gz
> http://examplemirro3.dl.sourceforge.net/xyx/example.tar.gz
>
> In this case the admin can use a store-id such as
> "http://dl.sourceforge.net.squid.internal/xyx/example.tar.gz"; this will
> cause Squid to store the requests from any of the mirrors as one unified
> object.
> The result is that if the URL\file\object already exists in the cache from
> an older request to one mirror, the current request to another mirror will
> be served from cache rather than from the origin server.
>
> <SNIP>
> Best regards,
> Eliezer
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Metalink Discussion" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
>
> Visit this group at
> http://groups.google.com/group/metalink-discussion?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
