Re: [squid-users] cache github zip repositories

2016-09-16 Thread Hardik Dangar
Amos,

Thank you very much for your reply,

But I would just like to confirm this,

I thought ETAG is something which we can verify like,

- when the server is sending response it sends ETAG.
- squid then decides to cache it with ETAG
- now if somebody tries to fetch the same URL given they are on same squid
network, squid will send previously saved copy of the ETag along with the
request in a "If-None-Match" field.
- On this subsequent request, the server may now compare the client's ETag
with the ETag for the current version of the resource. If the ETag values
match, meaning that the resource has not changed, then the server may send
back a very short response with a HTTP 304 Not Modified status. The 304
status tells the client that its cached version is still good and that it
should use that.
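
The exchange described in the list above can be sketched in a few lines of Python. This is only an illustration of the If-None-Match / 304 handshake, not Squid's code: the Origin and RevalidatingCache classes are made up for the example, and this toy cache revalidates on every request, which a real cache only does once it considers its stored copy stale.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Origin:
    """Hypothetical origin server holding a single resource."""
    body: bytes
    etag: str

    def get(self, if_none_match: Optional[str] = None) -> Tuple[int, Optional[bytes], str]:
        # If the validator sent by the cache matches, answer 304 with no body.
        if if_none_match is not None and if_none_match == self.etag:
            return 304, None, self.etag
        # Otherwise send the full current representation.
        return 200, self.body, self.etag


class RevalidatingCache:
    """Toy cache that asks the origin to validate its copy on every request."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.stored: Optional[Tuple[bytes, str]] = None  # (body, etag)

    def fetch(self) -> Tuple[str, bytes]:
        stored_etag = self.stored[1] if self.stored else None
        status, body, etag = self.origin.get(if_none_match=stored_etag)
        if status == 304:
            # Stored copy is still good; serve it without a new download.
            return "REVALIDATED", self.stored[0]
        # First fetch, or the ETag changed: replace the stored copy.
        self.stored = (body, etag)
        return "FETCHED", body
```

With this sketch, a second fetch of an unchanged resource returns "REVALIDATED" and the stored body, while changing the origin's body and etag makes the next fetch return "FETCHED" with the new body.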

Now comparing this to our GitHub example.
- when we request https://codeload.github.com/hardikdangar/test/zip/master
it has
 ETag: "9ea9838812d6f7bc53763eb1577da04e2fa473d5"
- even if you request it after a day ETAG remains the same as long as
repository is not changed.
- So when i request that file again squid would not send request with ETAG
to server ?

( just to clarify, github does not change ETAG unless you change files in
repo). My concern is if ETAG is not changed then we want to use the cache
but if it has changed then we want to download a copy of the new version.

Now, based on your answer, squid will not request the file again
until the min time defined in refresh_pattern has passed. What if i set the min
time to something like 1 minute in refresh_pattern ? Will it cause squid to
check the ETAG with the server and serve the cached file if the ETAG is not
changed, or will it just drop the cached file after a minute and download a
new file ?

Sorry for all of my noob questions but i am just trying to understand squid
and its options. It's really fascinating software and i really appreciate
your answers here.

Thank you very much.
Have a good day.




___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] cache github zip repositories

2016-09-16 Thread Amos Jeffries
On 16/09/2016 7:09 p.m., Hardik Dangar wrote:
> *Amos,*
> 
> Thanks for the reply but it seems i am not able to tell you what i
> want to do. i don't want to cache repo files. i want to cache .zip
> files only. i don't want .git file to cache but only .zip files which
> are fetch from github.com,

I understand that. Maybe my reply was not clear about the problem being
faced.


> 
> Also you have said things about commits but i am talking about zip
> file which is given by github via download button or composer fetches
> those files via command line directly. as soon as someone commits zip
> file's ETag is changed when you fetch it.

That is because the .zip file is generated by github on-demand.

When someone commits something to the repository the contents of the
next .zip to be downloaded change. The ETag is, or represents, a hash of
the .zip current contents.
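
To illustrate the point about a content-derived ETag, here is a minimal Python sketch. It assumes the validator is a SHA-1 hex digest of the archive bytes; the thread does not say how GitHub actually computes its ETag, and etag_for is a made-up helper name.

```python
import hashlib


def etag_for(content: bytes) -> str:
    """Return a quoted strong ETag derived from the content bytes.

    Assumption for illustration only: SHA-1 over the raw bytes.
    """
    return '"' + hashlib.sha1(content).hexdigest() + '"'


before_commit = etag_for(b"zip bytes generated before the commit")
after_commit = etag_for(b"zip bytes generated after the commit")
```

The same bytes always give the same ETag, and any change to the generated archive gives a different one, but only someone who fetches the new archive (or asks the server to validate) can learn that.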


> 
> So there is no way to achieve this ? Do you think there is a way to
> achieve this ?
> 

Since the .zip file (and ETag) is able to change at any time Squid
cannot know whether the file it has from a previous request is still
usable. Current versions of Squid must fetch a new .zip to find out - at
that point they might as well just deliver the new copy.

Future Squid releases** will allow these .zip objects to be cached and
reused for a short period defined by a matching refresh_pattern line.

But really, unless github change how they create those .zip files
caching of them will not work very well.


** If you really need this behaviour right now you can build the latest
Squid-4 snapshot and apply Eduards patch from
.

Amos



Re: [squid-users] cache github zip repositories

2016-09-16 Thread Hardik Dangar
Amos,

Thanks for the reply but it seems i am not able to tell you what i
want to do. i don't want to cache repo files. i want to cache .zip
files only. i don't want .git files to be cached, only the .zip files
which are fetched from github.com.

Also you have said things about commits but i am talking about the zip
file which is given by github via the download button, or which composer
fetches via the command line directly. as soon as someone commits, the zip
file's ETag is changed when you fetch it.

So there is no way to achieve this ? Do you think there is a way to
achieve this ?


> Consider: how does Squid know the ETag has changed on the server?
>
> What you know about things happening in RL is not what Squid knows.
>
> In fact how do *you* know someone else did not commit a change during
> that ~1 second it takes to look at the page and click the download button?
> Simply, you don't, and cannot until the new object has been fetched.
>
> Likewise, Squid cannot know if the object is the same until it has
> fetched a MISS from the server. Except that Squid does not look at the
> previous page content, so it cannot even 'see' if there is a commit
> listed there that might be different since whenever it got the previous
> object.
>
> There is no Cache-Control or Expires header indicating a specific
> storage timeout or revalidation procedure. So refresh_pattern defaults
> will be used. These responses will be cached for the refresh_pattern
> 'Min' duration (900 minutes) before being considered for revalidation.
>
> NP 1: Synthesizing Last-Modified from the Date header is only just being
> fixed in Squid the past few weeks, and some parts of it still to be
> committed. So I would not expect that response to be revalidated, just
> re-fetched fully in older Squid.
>
> NP 2: The Vary header indicates that every person logged in gets a
> differently cached response based on how their credentials are hashed on
> each request (in Authorization tokens). So caching these objects will
> not help much with many developers involved. It will be of most help for
> the anonymous visitors where username is always a generic NIL value.
>
> HTH
> Amos


Re: [squid-users] cache github zip repositories

2016-09-14 Thread Amos Jeffries
On 15/09/2016 11:54 a.m., Hardik Dangar wrote:
> Hello,
> 
> I am trying to cache Github zip URL's so it can be effectively cached as a
> composer(php dependency management tool) uses them and in our local setup (
> we are about 40 developers on a Lan and it will really help us managing
> cache.). My squid version is 3.5.12 and our squid cache server is ubuntu
> 16.04. Here is squid.conf file we use,
> https://gist.github.com/hardikdangar/df31d5bce725eff66e06f3abd6e77600
> 
> Here is the part which I want to cache,
> say for example you want to download repo from GitHub then URL looks like
> https://github.com/hardikdangar/test/archive/master.zip
> but it redirects to the following,
> https://codeload.github.com/hardikdangar/test/zip/master
> 
> You can see the response parameters via redbot.org
> https://redbot.org/?uri=https%3A%2F%2Fcodeload.github.com%2Fhardikdangar%2Ftest%2Fzip%2Fmaster
> 
>   HTTP/1.1 200 OK
> Content-Length: 929
> Access-Control-Allow-Origin: https://render.githubusercontent.com
> Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'
> Strict-Transport-Security: max-age=31536000
> Vary: Authorization,Accept-Encoding
> X-Content-Type-Options: nosniff
> X-Frame-Options: deny
> X-XSS-Protection: 1; mode=block
> ETag: "9ea9838812d6f7bc53763eb1577da04e2fa473d5"
> Content-Type: application/zip
> Content-Disposition: attachment; filename=test-master.zip
> X-Geo-Block-List:
> Date: Wed, 14 Sep 2016 23:24:44 GMT
> X-GitHub-Request-Id: 77092BF1:7F40:346461:57D9DC3C
> 
> Now if i do any change to above repository github does change ETAG and if i
> don't change anything then ETAG remains the same so i believe we should be
> able to cache those .zip files.
> 
> By default, squid does not cache codeload.github.com, to put it into cache,
> I added,
> refresh_pattern codeload.github.com 900 20% 4320 reload-into-ims
> 
> Now as per my understanding this should check etag as Last-Modified is not
> provided by github for each new request. This does cache the zip file but
> what happens is in next request even if i change the content and etag
> changes squid sends the cached file from its cache instead of downloading
> new file.
> 
> I have no clue why this happens. Can anyone help me figure out what's wrong
> here? why squid does not detect new etag when repository is updated? why it
> sends cache file even though there is new file available.
> 

Consider: how does Squid know the ETag has changed on the server?

What you know about things happening in real life is not what Squid knows.

In fact, how do *you* know someone else did not commit a change during
that ~1 second it takes to look at the page and click the download button?
Simply, you don't, and cannot until the new object has been fetched.

Likewise, Squid cannot know if the object is the same until it has
fetched a MISS from the server. Except that Squid does not look at the
previous page content, so it cannot even 'see' if there is a commit
listed there that might be different since whenever it got the previous
object.

There is no Cache-Control or Expires header indicating a specific
storage timeout or revalidation procedure. So refresh_pattern defaults
will be used. These responses will be cached for the refresh_pattern
'Min' duration (900 minutes) before being considered for revalidation.
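
As a rough illustration of the 'Min' rule for this case, here is a Python sketch. It parses the refresh_pattern line from the original post and only covers a response with no Cache-Control, no Expires and no Last-Modified; the percent and max fields are parsed but not used. The names RefreshPattern, parse_refresh_pattern and is_fresh_by_min_rule are invented for the example and are not Squid internals.

```python
import re
from typing import NamedTuple, Tuple


class RefreshPattern(NamedTuple):
    regex: str
    min_minutes: int
    percent: int
    max_minutes: int
    options: Tuple[str, ...]


def parse_refresh_pattern(line: str) -> RefreshPattern:
    """Parse 'refresh_pattern [-i] regex min percent% max [options]'."""
    fields = line.split()
    if not fields or fields[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    rest = fields[1:]
    if rest and rest[0] == "-i":  # optional case-insensitive flag
        rest = rest[1:]
    regex, min_s, pct_s, max_s, *opts = rest
    return RefreshPattern(regex, int(min_s), int(pct_s.rstrip("%")),
                          int(max_s), tuple(opts))


def is_fresh_by_min_rule(rule: RefreshPattern, url: str, age_minutes: float) -> bool:
    """Simplified check for a response that carries no explicit expiry
    information and no Last-Modified: fresh while younger than 'min'."""
    if re.search(rule.regex, url) is None:
        return False  # this rule does not apply to the URL
    return age_minutes < rule.min_minutes


rule = parse_refresh_pattern(
    "refresh_pattern codeload.github.com 900 20% 4320 reload-into-ims")
url = "https://codeload.github.com/hardikdangar/test/zip/master"
```

Under these assumptions, is_fresh_by_min_rule(rule, url, 10) is true, so a copy that is ten minutes old is served from cache without contacting the server, which matches the behaviour reported in the original post.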


NP 1: Synthesizing Last-Modified from the Date header is only just being
fixed in Squid the past few weeks, and some parts of it still to be
committed. So I would not expect that response to be revalidated, just
re-fetched fully in older Squid.


NP 2: The Vary header indicates that every person logged in gets a
differently cached response based on how their credentials are hashed on
each request (in Authorization tokens). So caching these objects will
not help much with many developers involved. It will be of most help for
the anonymous visitors where username is always a generic NIL value.
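
A small Python sketch of why the Vary header splits the cache per user. It builds a lookup key from the URL plus the request header values named in Vary; the variant_key function and the token strings are hypothetical, and Squid's real variant key is computed differently.

```python
from typing import Dict, Tuple


def variant_key(url: str, vary: str, request_headers: Dict[str, str]) -> Tuple:
    """Build a cache key from the URL and the request headers listed in Vary."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # A header missing from the request contributes an empty value.
    return (url,) + tuple((n, lowered.get(n, "")) for n in sorted(names))


URL = "https://codeload.github.com/hardikdangar/test/zip/master"
VARY = "Authorization,Accept-Encoding"

dev_a = variant_key(URL, VARY, {"Authorization": "token aaa"})
dev_b = variant_key(URL, VARY, {"Authorization": "token bbb"})
anon_1 = variant_key(URL, VARY, {})
anon_2 = variant_key(URL, VARY, {})
```

Here dev_a and dev_b differ, so each logged-in developer gets a separate cache entry, while anon_1 and anon_2 are equal and can share one stored copy.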

HTH
Amos



[squid-users] cache github zip repositories

2016-09-14 Thread Hardik Dangar
Hello,

I am trying to cache GitHub zip URLs, since composer (a PHP dependency
management tool) downloads them. In our local setup we are about 40
developers on a LAN, so a shared cache would really help us.
My squid version is 3.5.12 and our squid cache server runs Ubuntu
16.04. Here is the squid.conf file we use,
https://gist.github.com/hardikdangar/df31d5bce725eff66e06f3abd6e77600

Here is the part which I want to cache,
say for example you want to download repo from GitHub then URL looks like
https://github.com/hardikdangar/test/archive/master.zip
but it redirects to the following,
https://codeload.github.com/hardikdangar/test/zip/master

You can see the response parameters via redbot.org
https://redbot.org/?uri=https%3A%2F%2Fcodeload.github.com%2Fhardikdangar%2Ftest%2Fzip%2Fmaster

  HTTP/1.1 200 OK
Content-Length: 929
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "9ea9838812d6f7bc53763eb1577da04e2fa473d5"
Content-Type: application/zip
Content-Disposition: attachment; filename=test-master.zip
X-Geo-Block-List:
Date: Wed, 14 Sep 2016 23:24:44 GMT
X-GitHub-Request-Id: 77092BF1:7F40:346461:57D9DC3C

Now if i do any change to above repository github does change ETAG and if i
don't change anything then ETAG remains the same so i believe we should be
able to cache those .zip files.

By default, squid does not cache codeload.github.com, to put it into cache,
I added,
refresh_pattern codeload.github.com 900 20% 4320 reload-into-ims

Now as per my understanding this should check the etag on each new request,
as Last-Modified is not provided by github. This does cache the zip file, but
on the next request, even if i change the content and the etag
changes, squid sends the cached file from its cache instead of downloading
the new file.

I have no clue why this happens. Can anyone help me figure out what's wrong
here? why squid does not detect new etag when repository is updated? why it
sends cache file even though there is new file available.

Thank you very much in advance for reading up to this point and have a good
day.