Re: [squid-users] Not all html objects are being cached

2017-02-01 Thread Eliezer Croitoru
After some time reading the thread and getting to the bottom of it, I think I
have an idea of how to give another angle on the caching subject.
This is an example access.log which I put together with the help of Amos:
https://gist.github.com/elico/2ea2253ef1c09872ba90becb961acd91

It can reveal to the system administrator a couple of good reasons why an
object wasn't cached. It can serve both academic and real-world decisions.
I believe that today Squid does a nice job deciding what to cache and what
not to, within the limits of any LRU-based caching system/software.
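
For example, even a single default-format access.log line points at the
reason. Take the line from the original post (the annotation below is my own
reading of it, not the output of any tool):

1485272001.646    423 10.10.10.136 TCP_MISS/200 223422 GET http://www.wykop.pl/ - DIRECT/185.66.120.38 text/html

TCP_MISS/200 says the reply came from the origin server; the next step is to
look at the reply headers (Cache-Control, Expires, Vary, Set-Cookie) to see
which of them forbids storing or reusing the object.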

I think Wireshark was a good choice to analyze the situation.
There is a possibility that some CMS or web site will do things the wrong way
and this software will force an object to be "non-cachable", but from there to
breaking the fundamentals of HTTP there is a lot to learn.
First learn (we are here to assist), and then break if required.
There are many tools to break caching "rules", but Squid tries to play the
role of the most "friendly" one on the Internet.
On the one hand it gives you a couple of very good APIs and configuration
options, but it requires you, the admin, to know what you are doing. When you
don't know or do not understand, take a couple of minutes to find the right
document or tutorial (textual or video).
If you cannot find these, we are here to the rescue, to help if needed.
Just ask!!!
The answers will appear sooner or later in the thread.

All the best,
Eliezer


Eliezer Croitoru
Linux System Administrator
Mobile: +972-5-28704261
Email: elie...@ngtech.co.il


-Original Message-
From: squid-users [mailto:squid-users-boun...@lists.squid-cache.org] On Behalf 
Of boruc
Sent: Tuesday, January 24, 2017 5:53 PM
To: squid-users@lists.squid-cache.org
Subject: [squid-users] Not all html objects are being cached

Hi everyone,

I was wondering why some of the visited pages are not being cached (I mean
"main" pages, like www.example.com). If I visit 50 pages, only 10 will be
cached. The text below is from my log files:

store.log:
1485272001.646 RELEASE -1  04F7FA9EAA7FE3D531A2224F4C7DDE5A  200 1485272011 -1 375007920 text/html -1/222442 GET http://www.wykop.pl/

access.log
1485272001.646    423 10.10.10.136 TCP_MISS/200 223422 GET http://www.wykop.pl/ - DIRECT/185.66.120.38 text/html

According to the Squid wiki: "if a RELEASE code was logged with file number
, the object existed only in memory, and was released from memory."
- I understand that the requested html file wasn't saved to disk, but why?
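
(I guess a quick way to see the reply headers Squid is acting on would be
something like the following, assuming curl is installed; I'm just not sure
which of the returned headers matters here.)

curl -sI -x http://127.0.0.1:3128 http://www.wykop.pl/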

I'm also posting my squid.conf below. I'd be grateful for your answers!


acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl my_network src 192.168.0.0/24
acl my_phone src 192.168.54.0/24
acl my_net dst 192.168.0.0/24
acl mgr src 10.48.5.0/24
acl new_net src 10.10.10.0/24
acl ex_ft url_regex -i "/etc/squid3/excluded_filetypes.txt"
acl ex_do url_regex -i "/etc/squid3/excluded_domains.txt"   # doesn't include any of the 50 visited pages

acl SSL_ports port 443
acl Safe_ports port 80  # http
acl Safe_ports port 21  # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70  # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

http_access allow my_network
http_access allow my_phone
http_access allow my_net
http_access allow mgr
http_access allow new_net
http_access allow manager localhost
http_access deny manager

http_access deny !Safe_ports

http_access deny CONNECT !SSL_ports

http_access allow localhost
http_access allow all

http_port 3128

maximum_object_size_in_memory 1024 KB

cache_dir ufs /var/spool/squid3 1000 16 256

cache_store_log /var/log/squid3/store.log

coredump_dir /var/spool/squid3

cache deny ex_ft
cache deny ex_do

refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?)  0    0%      0
refresh_pattern (Release|Packages(.gz)*)$  0   20%     2880

refresh_pattern .               1000    20%     4320

request_header_access Accept-Encoding deny all



--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Not-all-html-objects-are-being-cached-tp4681293.html
Sent from the Squid - Users mailing list archive at Nabble.com.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-02-01 Thread Yuri Voinov
You're welcome.

I do not understand why the hell you have latched onto me. I have my own
point of view on the problem. Tell your tales to the guy who started this
thread. I know the developers' position.

So, let's stop this useless discussion. It is only wasted time.

01.02.2017 21:48, Amos Jeffries wrote:
> On 28/01/2017 1:35 a.m., Yuri wrote:
>> I just want to have a choice and an opportunity to say - "F*ck you, man,
>> I'm the System Administrator".
> Does that go down well in parties or something?
>
>> If you do not want to violate the RFC - remove violations HTTP at all.
>> If you remember, this mode is now enabled by default.
> That mode does not mean what you seem to think it means.
>
> It means that *some* *specific* things which are known not to cause much
> damage are allowed which violate HTTP _a little bit_ when it helps the
> traffic work better. Most things it does is enabling Squid to detect and
> talk with broken software that are themselves not quite following HTTP
> right.
>  For example, a client forgetting to %20 some whitespace inside a URL.
>
>> You do not have to teach me that I use. I - an administrator and wish to
>> be able to select tools. And do not be in a situation where the choice
>> is made for me.
>>
>
> Have you tried starting regular conversations with your friends and
> family with the words "F*k you, man, I'm the System Administrator" so
> they know that your way is always right no matter what. Then proceeding
> to say everything else in the conversation at the loudest volume your
> mouth can produce while injecting weird words randomly into each
> sentence? just because you were created with those abilities you might
> as well try using them. It definitely will make conversations short and
> efficient (hmm.. just like 100% caching makes HTTP 'quick').
>
>
> Anyhow, my point is all languages have rules and protocols of behaviour
> that have to be followed for the sentences/messages to be called
> "speaking" that language. If you don't follow those rules you are simply
> not speaking that language. You might be speaking some other language or
> just being a weirdo - either way you are not speaking that language.
>
> HTTP is as much a language as any spoken one. It is just for Internet
> software to 'talk' to each other. By not following its rules you are ...
> well ... not using HTTP.
>
> What you keep saying about how you/admin "must" be allowed to violate
> HTTP just because you are administrator and want to. That makes as much
> sense as being proud about shouting at everyone you talk to in real
> life. It's dumb, on a scale that demonstrates one is not worthy of the
> privilege of being a sysadmin and can lead to early retirement in a
> small padded cell.
>
>
 Antonio, you've seen at least once, so I complained about the
 consequences of my own actions?
>>> You seem to continually complain that people are recommending not to
>>> try going
>>> against standards, or trying to defeat the anti-caching directives on
>>> websites
>>> you find.
>>>
>>> It's your choice to try doing that; people are saying "but if you do
>>> that, bad
>>> things will happen, or things will break, or it just won't work the
>>> way you
>>> want it to", and then you say "but I don't like having to follow the
>>> rules".
>>>
>>> That's what I meant about complaining about the consequences of your
>>> actions.
>> It is my right and my choice. Personally, I do not complain of the
>> consequences, having enough tools to solve any problem.
>>
> Hahahahaha "not complain about the consequences", ROFLMAO.
> Thanks dude, I needed a good laugh today.
>
> Amos
>
> ___
> squid-users mailing list
> squid-users@lists.squid-cache.org
> http://lists.squid-cache.org/listinfo/squid-users

-- 
Bugs to the Future


0x613DEC46.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-02-01 Thread Amos Jeffries
On 28/01/2017 1:35 a.m., Yuri wrote:
> 
> I just want to have a choice and an opportunity to say - "F*ck you, man,
> I'm the System Administrator".

Does that go down well at parties or something?

> 
> If you do not want to violate the RFC - remove violations HTTP at all.
> If you remember, this mode is now enabled by default.

That mode does not mean what you seem to think it means.

It means that *some* *specific* things which are known not to cause much
damage are allowed which violate HTTP _a little bit_ when it helps the
traffic work better. Most things it does is enabling Squid to detect and
talk with broken software that are themselves not quite following HTTP
right.
 For example, a client forgetting to %20 some whitespace inside a URL.

> 
> You do not have to teach me that I use. I - an administrator and wish to
> be able to select tools. And do not be in a situation where the choice
> is made for me.
> 


Have you tried starting regular conversations with your friends and
family with the words "F*k you, man, I'm the System Administrator", so
they know that your way is always right no matter what? Then proceeding
to say everything else in the conversation at the loudest volume your
mouth can produce, while injecting weird words randomly into each
sentence? Just because you were created with those abilities you might
as well try using them. It definitely will make conversations short and
efficient (hmm.. just like 100% caching makes HTTP 'quick').


Anyhow, my point is all languages have rules and protocols of behaviour
that have to be followed for the sentences/messages to be called
"speaking" that language. If you don't follow those rules you are simply
not speaking that language. You might be speaking some other language or
just being a weirdo - either way you are not speaking that language.

HTTP is as much a language as any spoken one. It is just for Internet
software to 'talk' to each other. By not following its rules you are ...
well ... not using HTTP.

You keep saying that you, the admin, "must" be allowed to violate
HTTP just because you are the administrator and want to. That makes as much
sense as being proud of shouting at everyone you talk to in real
life. It's dumb, on a scale that demonstrates one is not worthy of the
privilege of being a sysadmin, and can lead to early retirement in a
small padded cell.


> 
>>
>>> Antonio, you've seen at least once, so I complained about the
>>> consequences of my own actions?
>> You seem to continually complain that people are recommending not to
>> try going
>> against standards, or trying to defeat the anti-caching directives on
>> websites
>> you find.
>>
>> It's your choice to try doing that; people are saying "but if you do
>> that, bad
>> things will happen, or things will break, or it just won't work the
>> way you
>> want it to", and then you say "but I don't like having to follow the
>> rules".
>>
>> That's what I meant about complaining about the consequences of your
>> actions.
> It is my right and my choice. Personally, I do not complain of the
> consequences, having enough tools to solve any problem.
> 

Hahahahaha "not complain about the consequences", ROFLMAO.
Thanks dude, I needed a good laugh today.

Amos

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread joseph
I'm not here to fight. Don't mention the RFC, because Squid is already
violating the RFC just by using --enable-http-violations.
Please re-read my post, or get someone to translate its structure;
otherwise there is no benefit in explaining or defending the RFC shit.
So please read my point of view carefully, or else you're wasting time with a
one-year-experienced guy.
Bye folks





--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Not-all-html-objects-are-being-cached-tp4681293p4681368.html
Sent from the Squid - Users mailing list archive at Nabble.com.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri Voinov


27.01.2017 19:35, Garri Djavadyan wrote:
> On Fri, 2017-01-27 at 17:58 +0600, Yuri wrote:
>> 27.01.2017 17:54, Garri Djavadyan wrote:
>>> On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
 --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
 Connecting to 127.0.0.1:3128... connected.
 Proxy request sent, awaiting response...
 HTTP/1.1 200 OK
 Cache-Control: no-cache, no-store
 Pragma: no-cache
 Content-Type: text/html
 Expires: -1
 Server: Microsoft-IIS/8.0
 CorrelationVector: BzssVwiBIUaXqyOh.1.1
 X-AspNet-Version: 4.0.30319
 X-Powered-By: ASP.NET
 Access-Control-Allow-Headers: Origin, X-Requested-With,
 Content-
 Type,
 Accept
 Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
 Access-Control-Allow-Credentials: true
 P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD
 TAI
 TELo
 OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
 X-Frame-Options: SAMEORIGIN
 Vary: Accept-Encoding
 Content-Encoding: gzip
 Date: Fri, 27 Jan 2017 09:29:56 GMT
 Content-Length: 13322
 Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com;
 expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
 Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com;
 expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
 Strict-Transport-Security: max-age=0; includeSubDomains
 X-CCC: NL
 X-CID: 2
 X-Cache: MISS from khorne
 X-Cache-Lookup: MISS from khorne:3128
 Connection: keep-alive
 Length: 13322 (13K) [text/html]
 Saving to: 'index.html'

index.html  100%[==>]  13.01K  --.-KB/s    in 0s

 2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved
 [13322/13322]

 Can you explain me - for what static index.html has this:

 Cache-Control: no-cache, no-store
 Pragma: no-cache

 ?

 What can be broken to ignore CC in this page?
>>> Hi Yuri,
>>>
>>>
>>> Why do you think the page returned for URL
>>> [https://www.microsoft.com/ru-kz/] is static and not dynamically
>>> generated one?
>> And for me, what's the difference? Does it change anything? In
>> addition, 
>> it is easy to see on the page and even the eyes - strangely enough -
>> to 
>> open its code. And? What do you see there?
> I see an official home page of Microsoft company for KZ region. The
> page is full of javascripts and products offer. It makes sense to
> expect that the page could be changed intensively enough.
In essence, what is there to say, beyond a general discussion of
particulars or examples? As I said, this is just one example; there are a
lot of them. And I think sometimes it's better to chew than to talk.
>
>
>>> The index.html file is default file name for wget.
>> And also the name of the default home page in the web. Imagine - I
>> know 
>> the obvious things. But the question was about something else.
>>> man wget:
>>>--default-page=name
>>> Use name as the default file name when it isn't known
>>> (i.e., for
>>> URLs that end in a slash), instead of index.html.
>>>
>>> In fact the https://www.microsoft.com/ru-kz/index.html is a stub
>>> page
>>> (The page you requested cannot be found.).
>> You living in wrong region. This is geo-dependent page, as obvious,
>> yes?
> What I mean is the pages https://www.microsoft.com/ru-kz/ and
> https://www.microsoft.com/ru-kz/index.html are not same. You can easily
> confirm it.
>
>
>> Again. What is the difference? I open it from different
>> workstations, 
>> from different browsers - I see the same thing. The code is
>> identical. I 
>> can is to cache? Yes or no?
> I'm a new member of Squid community (about 1 year). While tracking for
> community activity I found that you can't grasp the advantages of
> HTTP/1.1 over HTTP/1.0 for caching systems. Especially, its ability to
> _safely_ cache and serve same amount (but I believe even more) of the
> objects as HTTP/1.0 compliant caches do (while not breaking internet).
> The main tool of HTTP/1.1 compliant proxies is _revalidation_ process.
> HTTP/1.1 compliant caches like Squid tend to cache all possible objects
> but later use revalidation for dubious requests. In fact the
> revalidation is not costly process, especially using conditional GET
> requests.
Nuff said. Let's stop wasting time. Take a look at the attachment.
>
> I found that most of your complains in the mail list and Bugzilla are
> related to HTTPS scheme. FYI: The primary tool (revalidation) does not
> work for HTTPS scheme using all current Squid branches at the moment.
> See bug 4648.
Forget about it. I have solved all of my problems by now.
>
> Try to apply the proposed patch and update all related bug reports.
I have no unresolved problems with caching. For me personally, this
debate is only of academic interest. You can continue to spen

Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Garri Djavadyan
On Fri, 2017-01-27 at 06:15 -0800, joseph wrote:
> hi its not about https scheme its about evrything

Hi,

First of all, I can't brag about my English and writing style either, but
your writing style is _very_ offensive to other members. Please try to do
better. It is very difficult to catch the idea of many of your sentences;
I believe punctuation marks could help a lot. Thanks in advance.

> i decide not to involve with arg...
> but why not its the last one i should say it once
> they ar right most of the ppl admin have no knwoleg so its ok to baby
> sit
> them as its
> but
> --enable-http-violations should be fully ignore cache control and in
> refresh
> pattern  admin shuld control the behavior of his need else they
> should  take
> of  —enable-http-violations or alow us to do so
> controlling the 
> Pragma: no-cache and  Cache-Control: no-cache + + ++ +
> in both request reply

Squid, as an HTTP/1.1 compliant cache, successfully caches and serves
CC:no-cache replies. Below is an excerpt from RFC 7234:

5.2.2.2.  no-cache

   The "no-cache" response directive indicates that the response MUST
   NOT be used to satisfy a subsequent request without successful
   validation on the origin server.

The key word is _validation_. There is nothing bad about revalidation.
It is inexpensive, but saves us from possible problems. The log entry
'TCP_REFRESH_UNMODIFIED' should be as welcome as TCP_HIT or TCP_MEM_HIT.

Example:

$ curl -v -s -x http://127.0.0.1:3128 http://sandbox.comnet.local/test.bin >/dev/null

< HTTP/1.1 200 OK
< Last-Modified: Wed, 31 Aug 2016 19:00:00 GMT
< Accept-Ranges: bytes
< Content-Length: 262146
< Content-Type: application/octet-stream
< Expires: Thu, 01 Dec 1994 16:00:00 GMT
< Date: Fri, 27 Jan 2017 14:55:09 GMT
< Server: Apache
< ETag: "ea0cd5-40002-53b62b438ac00"
< Cache-Control: no-cache
< Age: 3
< X-Cache: HIT from gentoo.comnet.uz
< Via: 1.1 gentoo.comnet.uz (squid/3.5.23-BZR)
< Connection: keep-alive

1485528912.222     18 127.0.0.1 TCP_REFRESH_UNMODIFIED/200 262565 GET http://sandbox.comnet.local/test.bin - HIER_DIRECT/192.168.24.5 application/octet-stream


As you can see, there are no problems with the no-cache reply.


I advise you to consider every specific case where you believe Squid's
transition to HTTP/1.1 compliance prevents you from caching something.

Garri
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread joseph
hi, it's not about the https scheme, it's about everything.
I decided not to get involved in the argument...
but why not - it's the last thing, so I'll say it once.
They are right that most admins have no knowledge, so it's OK to babysit
them as it is.
But --enable-http-violations should fully ignore cache control, and through
refresh_pattern the admin should control the behavior according to his needs;
otherwise they should take --enable-http-violations out, or allow us to
control
Pragma: no-cache and Cache-Control: no-cache etc.
in both request and reply.
And it's up to us to fix broken sites, since almost 80% or more of web
admins/programmers use those headers just to prevent caching, not because
caching breaks the page.
It has nothing to do with some old damaged page where we can fix the object
to be fresh.
Soon all web programmers will use those controls and squid will suck - we
will end up having a cache server that is not able to cache at all, lol.
Let other admins use squid without --enable-http-violations if they are
worried about breaking bad, broken sites,
and let the good admins who know what they are doing control what they need,
using --enable-http-violations fully open, with no restrictions at all.
https is rarely used; it can't be used everywhere, depending on the country.
bye
joseph
As for my setup: I have http only, and as it is, squid saves me only 5% of
all the http bandwidth.




--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Not-all-html-objects-are-being-cached-tp4681293p4681365.html
Sent from the Squid - Users mailing list archive at Nabble.com.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Garri Djavadyan
On Fri, 2017-01-27 at 17:58 +0600, Yuri wrote:
> 
> 27.01.2017 17:54, Garri Djavadyan wrote:
> > On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
> > > --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
> > > Connecting to 127.0.0.1:3128... connected.
> > > Proxy request sent, awaiting response...
> > > HTTP/1.1 200 OK
> > > Cache-Control: no-cache, no-store
> > > Pragma: no-cache
> > > Content-Type: text/html
> > > Expires: -1
> > > Server: Microsoft-IIS/8.0
> > > CorrelationVector: BzssVwiBIUaXqyOh.1.1
> > > X-AspNet-Version: 4.0.30319
> > > X-Powered-By: ASP.NET
> > > Access-Control-Allow-Headers: Origin, X-Requested-With,
> > > Content-
> > > Type,
> > > Accept
> > > Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
> > > Access-Control-Allow-Credentials: true
> > > P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD
> > > TAI
> > > TELo
> > > OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
> > > X-Frame-Options: SAMEORIGIN
> > > Vary: Accept-Encoding
> > > Content-Encoding: gzip
> > > Date: Fri, 27 Jan 2017 09:29:56 GMT
> > > Content-Length: 13322
> > > Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com;
> > > expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
> > > Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com;
> > > expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
> > > Strict-Transport-Security: max-age=0; includeSubDomains
> > > X-CCC: NL
> > > X-CID: 2
> > > X-Cache: MISS from khorne
> > > X-Cache-Lookup: MISS from khorne:3128
> > > Connection: keep-alive
> > > Length: 13322 (13K) [text/html]
> > > Saving to: 'index.html'
> > > 
> > > index.html  100%[==>]  13.01K  --.-KB/s    in 0s
> > > 
> > > 2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved
> > > [13322/13322]
> > > 
> > > Can you explain me - for what static index.html has this:
> > > 
> > > Cache-Control: no-cache, no-store
> > > Pragma: no-cache
> > > 
> > > ?
> > > 
> > > What can be broken to ignore CC in this page?
> > 
> > Hi Yuri,
> > 
> > 
> > Why do you think the page returned for URL
> > [https://www.microsoft.com/ru-kz/] is static and not dynamically
> > generated one?
> 
> And for me, what's the difference? Does it change anything? In
> addition, 
> it is easy to see on the page and even the eyes - strangely enough -
> to 
> open its code. And? What do you see there?

I see the official Microsoft home page for the KZ region. The page is full
of javascript and product offers. It makes sense to expect that the page
could change quite intensively.


> > The index.html file is default file name for wget.
> 
> And also the name of the default home page in the web. Imagine - I
> know 
> the obvious things. But the question was about something else.
> > 
> > man wget:
> >    --default-page=name
> > Use name as the default file name when it isn't known
> > (i.e., for
> > URLs that end in a slash), instead of index.html.
> > 
> > In fact the https://www.microsoft.com/ru-kz/index.html is a stub
> > page
> > (The page you requested cannot be found.).
> 
> You living in wrong region. This is geo-dependent page, as obvious,
> yes?

What I mean is that the pages https://www.microsoft.com/ru-kz/ and
https://www.microsoft.com/ru-kz/index.html are not the same. You can easily
confirm it.


> Again. What is the difference? I open it from different
> workstations, 
> from different browsers - I see the same thing. The code is
> identical. I 
> can is to cache? Yes or no?

I'm a new member of the Squid community (about 1 year). While following the
community activity I found that you can't grasp the advantages of
HTTP/1.1 over HTTP/1.0 for caching systems; especially, its ability to
_safely_ cache and serve the same amount (but I believe even more) of
objects as HTTP/1.0 compliant caches do, while not breaking the Internet.
The main tool of HTTP/1.1 compliant proxies is the _revalidation_ process.
HTTP/1.1 compliant caches like Squid tend to cache all possible objects
but later use revalidation for dubious requests. In fact, revalidation is
not a costly process, especially when using conditional GET requests.
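
On the wire a revalidation is just a conditional GET; roughly like this (a
sketch reusing the validators from one of my test objects, not a capture):

GET /test.bin HTTP/1.1
Host: sandbox.comnet.local
If-Modified-Since: Wed, 31 Aug 2016 19:00:00 GMT
If-None-Match: "ea0cd5-40002-53b62b438ac00"

HTTP/1.1 304 Not Modified
ETag: "ea0cd5-40002-53b62b438ac00"

A 304 carries no body, so the cache serves its stored copy and only a few
hundred bytes of headers cross the wire.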

I found that most of your complaints on the mailing list and in Bugzilla
are related to the HTTPS scheme. FYI: the primary tool (revalidation) does
not work for the HTTPS scheme in any current Squid branch at the moment.
See bug 4648.

Try to apply the proposed patch and update all related bug reports.

HTH


Garri
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri



27.01.2017 18:25, Antony Stone wrote:

On Friday 27 January 2017 at 13:15:21, Yuri wrote:


27.01.2017 18:05, Antony Stone wrote:


You're entitled to do whatever you want to, following standards and
recommendations or not - just don't complain when choosing not to follow
those standards and recommendations results in behaviour different from
what you wanted (or what someone else intended).

All this crazy debate reminds me of Microsoft Windows. Windows is better
to know why the administrator should not have full access. Windows is
better to know how to work. Windows is better to know how to tell the
system administrator so that he called the system administrator.

That should remind you of OS X and Android as well, at the very least (and
quite possibly systemd as well)

My opinion is that it's your choice whether to run Microsoft Windows (or Apple
OS X, or Google Android) or not - but you have to accept it as a whole
package; you can't say "I want some of the neat features, but I want them to
work *my* way".

If you don't accept all aspects of the package, then don't use it.
I just want to have a choice and an opportunity to say - "F*ck you, man, 
I'm the System Administrator".


If you do not want to violate the RFC - remove the HTTP violations entirely.
If you remember, this mode is now enabled by default.


You do not have to teach me what to use. I am an administrator and wish to
be able to select my tools - and not to be in a situation where the choice
is made for me.






Antonio, have you seen even once that I complained about the
consequences of my own actions?

You seem to continually complain that people are recommending not to try going
against standards, or trying to defeat the anti-caching directives on websites
you find.

It's your choice to try doing that; people are saying "but if you do that, bad
things will happen, or things will break, or it just won't work the way you
want it to", and then you say "but I don't like having to follow the rules".

That's what I meant about complaining about the consequences of your actions.
It is my right and my choice. Personally, I do not complain about the
consequences, having enough tools to solve any problem.


Enough lecturing me. The OP asked why his static html did not get cached.
He has been given the explanation that in fact there be dragons there, and
why his desire to cache anything and everything is wrong.



Antony.



___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Antony Stone
On Friday 27 January 2017 at 13:15:21, Yuri wrote:

> 27.01.2017 18:05, Antony Stone wrote:
> 
> > You're entitled to do whatever you want to, following standards and
> > recommendations or not - just don't complain when choosing not to follow
> > those standards and recommendations results in behaviour different from
> > what you wanted (or what someone else intended).
> 
> All this crazy debate reminds me of Microsoft Windows. Windows is better
> to know why the administrator should not have full access. Windows is
> better to know how to work. Windows is better to know how to tell the
> system administrator so that he called the system administrator.

That should remind you of OS X and Android as well, at the very least (and 
quite possibly systemd as well)

My opinion is that it's your choice whether to run Microsoft Windows (or Apple 
OS X, or Google Android) or not - but you have to accept it as a whole 
package; you can't say "I want some of the neat features, but I want them to 
work *my* way".

If you don't accept all aspects of the package, then don't use it.

> Antonio, you've seen at least once, so I complained about the
> consequences of my own actions?

You seem to continually complain that people are recommending not to try going 
against standards, or trying to defeat the anti-caching directives on websites 
you find.

It's your choice to try doing that; people are saying "but if you do that, bad 
things will happen, or things will break, or it just won't work the way you 
want it to", and then you say "but I don't like having to follow the rules".

That's what I meant about complaining about the consequences of your actions.


Antony.

-- 
"Life is just a lot better if you feel you're having 10 [small] wins a day 
rather than a [big] win every 10 years or so."

 - Chris Hadfield, former skiing (and ski racing) instructor

   Please reply to the list;
 please *don't* CC me.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri



27.01.2017 18:05, Antony Stone wrote:

On Friday 27 January 2017 at 12:58:52, Yuri wrote:


Again. What is the difference? I open it from different workstations,
from different browsers - I see the same thing. The code is identical. I
can is to cache? Yes or no?

You're entitled to do whatever you want to, following standards and
recommendations or not - just don't complain when choosing not to follow those
standards and recommendations results in behaviour different from what you
wanted (or what someone else intended).
All this crazy debate reminds me of Microsoft Windows. Windows knows better
why the administrator should not have full access. Windows knows better how
things should work. Windows knows better what to tell the system
administrator - who is a system administrator in name only.


Antonio, have you seen even once that I complained about the
consequences of my own actions?




Oh, and by the way, what did you mean earlier when you said:


You either wear pants or remove the cross, as they say.

?

This is the punchline of a good Russian joke about a priest who had sex.
I meant that one should either stop having sex or remove the pectoral
cross. The point is the need to be consistent.



Antony.



___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Antony Stone
On Friday 27 January 2017 at 12:58:52, Yuri wrote:

> Again. What is the difference? I open it from different workstations,
> from different browsers - I see the same thing. The code is identical. I
> can is to cache? Yes or no?

You're entitled to do whatever you want to, following standards and 
recommendations or not - just don't complain when choosing not to follow those 
standards and recommendations results in behaviour different from what you 
wanted (or what someone else intended).

Oh, and by the way, what did you mean earlier when you said:

> You either wear pants or remove the cross, as they say.

?


Antony.

-- 
"640 kilobytes (of RAM) should be enough for anybody."

 - Bill Gates

   Please reply to the list;
 please *don't* CC me.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri
I understand the desire to conclusively prove one's case. But for the
sake of objectivity - are only dynamic pages generated dynamically? Maybe
the decision should still be left to the administrator? If I see that
something is broken, or users complain to me - has the *cache deny*
directive been cancelled yet?



27.01.2017 17:54, Garri Djavadyan wrote:

On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:

--2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store
Pragma: no-cache
Content-Type: text/html
Expires: -1
Server: Microsoft-IIS/8.0
CorrelationVector: BzssVwiBIUaXqyOh.1.1
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-
Type,
Accept
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Credentials: true
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI
TELo
OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Fri, 27 Jan 2017 09:29:56 GMT
Content-Length: 13322
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Strict-Transport-Security: max-age=0; includeSubDomains
X-CCC: NL
X-CID: 2
X-Cache: MISS from khorne
X-Cache-Lookup: MISS from khorne:3128
Connection: keep-alive
Length: 13322 (13K) [text/html]
Saving to: 'index.html'

index.html  100%[==>]  13.01K  --.-KB/s    in 0s

2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]

Can you explain me - for what static index.html has this:

Cache-Control: no-cache, no-store
Pragma: no-cache

?

What can be broken to ignore CC in this page?

Hi Yuri,


Why do you think the page returned for URL [https://www.microsoft.com/ru-kz/]
is static and not dynamically generated one?

The index.html file is default file name for wget.

man wget:
   --default-page=name
Use name as the default file name when it isn't known (i.e., for
URLs that end in a slash), instead of index.html.

In fact the https://www.microsoft.com/ru-kz/index.html is a stub page
(The page you requested cannot be found.).


Garri
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri



27.01.2017 17:54, Garri Djavadyan wrote:

On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:

--2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
HTTP/1.1 200 OK
Cache-Control: no-cache, no-store
Pragma: no-cache
Content-Type: text/html
Expires: -1
Server: Microsoft-IIS/8.0
CorrelationVector: BzssVwiBIUaXqyOh.1.1
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Access-Control-Allow-Headers: Origin, X-Requested-With, Content-
Type,
Accept
Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
Access-Control-Allow-Credentials: true
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI
TELo
OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
X-Frame-Options: SAMEORIGIN
Vary: Accept-Encoding
Content-Encoding: gzip
Date: Fri, 27 Jan 2017 09:29:56 GMT
Content-Length: 13322
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com;
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
Strict-Transport-Security: max-age=0; includeSubDomains
X-CCC: NL
X-CID: 2
X-Cache: MISS from khorne
X-Cache-Lookup: MISS from khorne:3128
Connection: keep-alive
Length: 13322 (13K) [text/html]
Saving to: 'index.html'

index.html  100%[==>]  13.01K  --.-KB/s    in 0s

2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]

Can you explain me - for what static index.html has this:

Cache-Control: no-cache, no-store
Pragma: no-cache

?

What can be broken to ignore CC in this page?

Hi Yuri,


Why do you think the page returned for URL [https://www.microsoft.com/ru-kz/]
is static and not dynamically generated one?
And to me, what's the difference? Does it change anything? Besides, it is
easy to look at the page with your own eyes and - strangely enough - to
open its code. And? What do you see there?


The index.html file is default file name for wget.
And it is also the default home page name on the web. Imagine that - I know
the obvious things. But the question was about something else.


man wget:
   --default-page=name
Use name as the default file name when it isn't known (i.e., for
URLs that end in a slash), instead of index.html.

In fact the https://www.microsoft.com/ru-kz/index.html is a stub page
(The page you requested cannot be found.).

You are living in the wrong region. This is a geo-dependent page, obviously, yes?

Again: what is the difference? I open it from different workstations,
from different browsers - I see the same thing. The code is identical. So
can I cache it? Yes or no?



Garri
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Garri Djavadyan
On Fri, 2017-01-27 at 15:47 +0600, Yuri wrote:
> --2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
> Connecting to 127.0.0.1:3128... connected.
> Proxy request sent, awaiting response...
>    HTTP/1.1 200 OK
>    Cache-Control: no-cache, no-store
>    Pragma: no-cache
>    Content-Type: text/html
>    Expires: -1
>    Server: Microsoft-IIS/8.0
>    CorrelationVector: BzssVwiBIUaXqyOh.1.1
>    X-AspNet-Version: 4.0.30319
>    X-Powered-By: ASP.NET
>    Access-Control-Allow-Headers: Origin, X-Requested-With, Content-
> Type, 
> Accept
>    Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
>    Access-Control-Allow-Credentials: true
>    P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI
> TELo 
> OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
>    X-Frame-Options: SAMEORIGIN
>    Vary: Accept-Encoding
>    Content-Encoding: gzip
>    Date: Fri, 27 Jan 2017 09:29:56 GMT
>    Content-Length: 13322
>    Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com; 
> expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
>    Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.2; domain=.microsoft.com; 
> expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
>    Strict-Transport-Security: max-age=0; includeSubDomains
>    X-CCC: NL
>    X-CID: 2
>    X-Cache: MISS from khorne
>    X-Cache-Lookup: MISS from khorne:3128
>    Connection: keep-alive
> Length: 13322 (13K) [text/html]
> Saving to: 'index.html'
> 
> index.html  100%[==>]  13.01K  --.-KB/s    in 0s
> 
> 2017-01-27 15:29:57 (32.2 MB/s) - 'index.html' saved [13322/13322]
> 
> Can you explain me - for what static index.html has this:
> 
> Cache-Control: no-cache, no-store
> Pragma: no-cache
> 
> ?
> 
> What can be broken to ignore CC in this page?

Hi Yuri,


Why do you think the page returned for the URL
[https://www.microsoft.com/ru-kz/] is a static and not a dynamically
generated one?

The index.html file is the default file name for wget.

man wget:
  --default-page=name
   Use name as the default file name when it isn't known (i.e., for
   URLs that end in a slash), instead of index.html.

In fact the https://www.microsoft.com/ru-kz/index.html is a stub page
(The page you requested cannot be found.).


Garri
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-27 Thread Yuri



27.01.2017 9:10, Amos Jeffries wrote:

On 27/01/2017 9:46 a.m., Yuri Voinov wrote:


27.01.2017 2:44, Matus UHLAR - fantomas wrote:

26.01.2017 2:22, boruc wrote:

After a little bit of analyzing requests and responses with WireShark I
noticed that many sites that weren't cached had different
combination of
below parameters:

Cache-Control: no-cache, no-store, must-revalidate, post-check,
pre-check,
private, public, max-age, public
Pragma: no-cache

On 26.01.17 02:44, Yuri Voinov wrote:

If the webmaster has done this - he had good reason to. Trying to break
the RFC in this way, you break the Internet.

Actually, no. If the webmaster has done the above - he has no damn
idea what
those mean (private and public?) , and how to provide properly cacheable
content.

It was sarcasm.


You may have intended it to be. But you spoke the simple truth.

Other than 'public' there really are situations which have "good reason"
to send that set of controls all at once.

For example; any admin who wants a RESTful or SaaS application to
actually work for all their potential customers.


I have been watching the below cycle take place for the past 20 years in
HTTP:

Webmaster: dont cache this please.

   "Cache-Control: no-store"

Proxy Admin: ignore-no-store


Webmaster: I meant it. Dont deliver anything you cached without fetching
a updated version.

   ... "no-store, no-cache"

Proxy Admin: ignore-no-cache


Webmaster: really you MUST revalidate before using ths data.

  ... "no-store, no-cache, must-revalidate"

Proxy Admin: ignore-must-revalidate


Webmaster: Really I meant it. This is non-storable PRIVATE DATA!

... "no-store, no-cache, must-revalidate, private"

Proxy Admin: ignore-private


Webmaster: Seriously. I'm changing it on EVERY request! dont store it.

... "no-store, no-cache, must-revalidate, private, max-age=0"
"Expires: -1"

Proxy Admin: ignore-expires


Webmaster: are you one of those dumb HTTP/1.0 proxies who dont
understand Cache-Control?

"Pragma: no-cache"
"Expires: 1 Jan 1970"

Proxy Admin: hehe! I already ignore-no-cache ignore-expires


Webmaster: F*U!  May your clients batch up their traffic to slam you
with it all at once!

... "no-store, no-cache, must-revalidate, private, max-age=0,
pre-check=1, post-check=1"


Proxy Admin: My bandwidth! I need to cache more!

Webmaster: Doh! Oh well, so I have to write my application to force new
content then.

Proxy Admin: ignore-reload


Webmaster: Now What? Oh HTTPS wont have any damn proxies in the way

... the cycle repeats again within HTTPS. Took all of 5 years this time.

... the cycle repeats again within SPDY. That took only ~1 year.

... the cycle repeats again within CoAP. The standards are not even
finished yet and its underway.


Stop this cycle of stupidity. It really HAS "broken the Internet".
All that would be just great if webmasters were conscientious. I will
give just one example.


Only one example.

root @ khorne /patch # wget -S http://www.microsoft.com
--2017-01-27 15:29:54--  http://www.microsoft.com/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 302 Found
  Server: AkamaiGHost
  Content-Length: 0
  Location: http://www.microsoft.com/ru-kz/
  Date: Fri, 27 Jan 2017 09:29:54 GMT
  X-CCC: NL
  X-CID: 2
  X-Cache: MISS from khorne
  X-Cache-Lookup: MISS from khorne:3128
  Connection: keep-alive
Location: http://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  http://www.microsoft.com/ru-kz/
Reusing existing connection to 127.0.0.1:3128.
Proxy request sent, awaiting response...
  HTTP/1.1 301 Moved Permanently
  Server: AkamaiGHost
  Content-Length: 0
  Location: https://www.microsoft.com/ru-kz/
  Date: Fri, 27 Jan 2017 09:29:54 GMT
  Set-Cookie: 
akacd_OneRF=1493285394~rv=7~id=6a2316770abdbb58a85c16676a0f84fd; path=/; 
Expires=Thu, 27 Apr 2017 09:29:54 GMT

  X-CCC: NL
  X-CID: 2
  X-Cache: MISS from khorne
  X-Cache-Lookup: MISS from khorne:3128
  Connection: keep-alive
Location: https://www.microsoft.com/ru-kz/ [following]
--2017-01-27 15:29:54--  https://www.microsoft.com/ru-kz/
Connecting to 127.0.0.1:3128... connected.
Proxy request sent, awaiting response...
  HTTP/1.1 200 OK
  Cache-Control: no-cache, no-store
  Pragma: no-cache
  Content-Type: text/html
  Expires: -1
  Server: Microsoft-IIS/8.0
  CorrelationVector: BzssVwiBIUaXqyOh.1.1
  X-AspNet-Version: 4.0.30319
  X-Powered-By: ASP.NET
  Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, 
Accept

  Access-Control-Allow-Methods: GET, POST, PUT, DELETE, OPTIONS
  Access-Control-Allow-Credentials: true
  P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo 
OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"

  X-Frame-Options: SAMEORIGIN
  Vary: Accept-Encoding
  Content-Encoding: gzip
  Date: Fri, 27 Jan 2017 09:29:56 GMT
  Content-Length: 13322
  Set-Cookie: MS-CV=BzssVwiBIUaXqyOh.1; domain=.microsoft.com; 
expires=Sat, 28-Jan-2017 09:29:56 GMT; path=/
  Set-Cookie: MS-CV=BzssV

Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread Amos Jeffries
On 27/01/2017 9:46 a.m., Yuri Voinov wrote:
> 
> 
> 27.01.2017 2:44, Matus UHLAR - fantomas wrote:
>>> 26.01.2017 2:22, boruc wrote:
 After a little bit of analyzing requests and responses with WireShark I
 noticed that many sites that weren't cached had different
 combination of
 below parameters:

 Cache-Control: no-cache, no-store, must-revalidate, post-check,
 pre-check,
 private, public, max-age, public
 Pragma: no-cache
>>
>> On 26.01.17 02:44, Yuri Voinov wrote:
>>> If the webmaster has done this - he had good reason to. Trying to break
>>> the RFC in this way, you break the Internet.
>>
>> Actually, no. If the webmaster has done the above - he has no damn
>> idea what
>> those mean (private and public?) , and how to provide properly cacheable
>> content.
> It was sarcasm.


You may have intended it to be. But you spoke the simple truth.

Other than 'public' there really are situations which have "good reason"
to send that set of controls all at once.

For example; any admin who wants a RESTful or SaaS application to
actually work for all their potential customers.


I have been watching the below cycle take place for the past 20 years in
HTTP:

Webmaster: dont cache this please.

  "Cache-Control: no-store"

Proxy Admin: ignore-no-store


Webmaster: I meant it. Dont deliver anything you cached without fetching
an updated version.

  ... "no-store, no-cache"

Proxy Admin: ignore-no-cache


Webmaster: really you MUST revalidate before using this data.

 ... "no-store, no-cache, must-revalidate"

Proxy Admin: ignore-must-revalidate


Webmaster: Really I meant it. This is non-storable PRIVATE DATA!

... "no-store, no-cache, must-revalidate, private"

Proxy Admin: ignore-private


Webmaster: Seriously. I'm changing it on EVERY request! dont store it.

... "no-store, no-cache, must-revalidate, private, max-age=0"
"Expires: -1"

Proxy Admin: ignore-expires


Webmaster: are you one of those dumb HTTP/1.0 proxies who dont
understand Cache-Control?

"Pragma: no-cache"
"Expires: 1 Jan 1970"

Proxy Admin: hehe! I already ignore-no-cache ignore-expires


Webmaster: F*U!  May your clients batch up their traffic to slam you
with it all at once!

... "no-store, no-cache, must-revalidate, private, max-age=0,
pre-check=1, post-check=1"


Proxy Admin: My bandwidth! I need to cache more!

Webmaster: Doh! Oh well, so I have to write my application to force new
content then.

Proxy Admin: ignore-reload


Webmaster: Now What? Oh HTTPS wont have any damn proxies in the way

... the cycle repeats again within HTTPS. Took all of 5 years this time.

... the cycle repeats again within SPDY. That took only ~1 year.

... the cycle repeats again within CoAP. The standards are not even
finished yet and it's already underway.


Stop this cycle of stupidity. It really HAS "broken the Internet".


HTH
Amos
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread Amos Jeffries
On 27/01/2017 9:44 a.m., Matus UHLAR - fantomas wrote:
>> 26.01.2017 2:22, boruc wrote:
>>> After a little bit of analyzing requests and responses with WireShark I
>>> noticed that many sites that weren't cached had different combination of
>>> below parameters:
>>>
>>> Cache-Control: no-cache, no-store, must-revalidate, post-check,
>>> pre-check,
>>> private, public, max-age, public
>>> Pragma: no-cache
> 
> On 26.01.17 02:44, Yuri Voinov wrote:
>> If the webmaster has done this - he had good reason to. Trying to break
>> the RFC in this way, you break the Internet.
> 
> Actually, no. If the webmaster has done the above - he has no damn idea
> what
> those mean (private and public?) , and how to provide properly cacheable
> content.
> 


I think boruc has just listed all the cache controls he has noticed in
one line. Not actually what is being seen ...


> Which is very common and also a reason why many proxy admins tend to ignore
> those controls...
> 

... the URLs used for expanded details show the usual combos webmasters
use to 'fix' broken behaviour of such proxies. For example adding
"no-cache, private, max-age=0" to get around proxies ignoring various of
the controls.

Amos
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread Amos Jeffries
On 27/01/2017 11:08 a.m., reinerotto wrote:
>> reply_header_access Cache-Control deny all<
> Will this only affect downstream caches, or will this squid itself also
> ignore any Cache-Control header info
> received from upstream ? 
> 

It will only affect the clients' caches, e.g. their browser cache.
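
That is (a sketch using the directive already quoted in this thread):

# strips the header only from replies relayed to clients;
# Squid's own storage decision was already made from the original header
reply_header_access Cache-Control deny all

Squid itself still obeys the Cache-Control it received from upstream; only
the copy forwarded downstream loses the header.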

Amos

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread reinerotto
>reply_header_access Cache-Control deny all<
Will this only affect downstream caches, or will this Squid itself also
ignore any Cache-Control header info received from upstream?




--
View this message in context: 
http://squid-web-proxy-cache.1019090.n4.nabble.com/Not-all-html-objects-are-being-cached-tp4681293p4681339.html
Sent from the Squid - Users mailing list archive at Nabble.com.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread Yuri Voinov


27.01.2017 2:44, Matus UHLAR - fantomas wrote:
>> 26.01.2017 2:22, boruc wrote:
>>> After a little bit of analyzing requests and responses with WireShark I
>>> noticed that many sites that weren't cached had different
>>> combination of
>>> below parameters:
>>>
>>> Cache-Control: no-cache, no-store, must-revalidate, post-check,
>>> pre-check,
>>> private, public, max-age, public
>>> Pragma: no-cache
>
> On 26.01.17 02:44, Yuri Voinov wrote:
>> If the webmaster has done this - he had good reason to. Trying to break
>> the RFC in this way, you break the Internet.
>
> Actually, no. If the webmaster has done the above - he has no damn
> idea what
> those mean (private and public?) , and how to provide properly cacheable
> content.
It was sarcasm.
>
> Which is very common and also a reason why many proxy admins tend to
> ignore
> those controls...
>

-- 
Bugs to the Future


0x613DEC46.asc
Description: application/pgp-keys


signature.asc
Description: OpenPGP digital signature
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-26 Thread Matus UHLAR - fantomas

26.01.2017 2:22, boruc wrote:

After a little bit of analyzing requests and responses with WireShark I
noticed that many sites that weren't cached had different combination of
below parameters:

Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
private, public, max-age, public
Pragma: no-cache


On 26.01.17 02:44, Yuri Voinov wrote:

If the webmaster has done this - he had good reason to. Trying to break
the RFC in this way, you break the Internet.


Actually, no. If the webmaster has done the above, he has no damn idea what
those mean (private and public?), or how to provide properly cacheable
content.

Which is very common and also a reason why many proxy admins tend to ignore
those controls...
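
Properly cacheable content would instead send one coherent set of headers,
e.g. (a sketch; the values are made up):

Cache-Control: public, max-age=3600
Last-Modified: Wed, 31 Aug 2016 19:00:00 GMT
ETag: "abc123"

i.e. an explicit freshness lifetime plus validators for revalidation, not a
pile of contradictory directives.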

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.   -- Matthew D. Fuller
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Not all html objects are being cached

2017-01-25 Thread Amos Jeffries
On 26/01/2017 9:44 a.m., Yuri Voinov wrote:
> 
> 
> 26.01.2017 2:22, boruc wrote:
>> After a little bit of analyzing requests and responses with WireShark I
>> noticed that many sites that weren't cached had different combination of
>> below parameters:
>>
>> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
>> private, public, max-age, public
>> Pragma: no-cache
> If the webmaster has done this - he had good reason to. Trying to break
> the RFC in this way, you break the Internet.

Instead use the latest Squid you can. Squid by default caches as much as
it can within the restrictions imposed by the web environment. But
'latest is best' etc. since we are still working on support for HTTP/1.1
features.


I recommend you use the tool at  to check URL
cacheability instead of Wireshark. It will tell you what those controls
actually *mean* with regard to cacheability, not just that they are used;
and whether there are other problems you may not have noticed in the
various different ways there are to fetch any given URL.


The Squid options available are mostly for disabling some caching
operation - so that if you are in a situation where disabling operation
X causes operation Y to cache better you can tune the behaviour.

You can't really *force* things which are not cacheable to be stored.
They will just be replaced with a newer copy shortly after with no
benefit gained - just some possibly nasty side effects, or real monetary
costs.
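
Legitimate tuning looks more like this (a sketch; the pattern and times are
made up, and no violation options are involved):

# freshness heuristic for responses carrying no explicit expiry info:
# min 1 day, 50% of object age, max 1 week
refresh_pattern -i \.(jpg|png)$ 1440 50% 10080

That only adjusts the heuristic used when the origin gave no explicit
freshness information; it does not override explicit Cache-Control.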


>>
>> There is a possibility to disable this in squid by using
> Don't do it.
>> request_header_access and reply_header_access, however it doesn't work for
>> me, many pages aren't still in cache. I am currently using lines below:
>>
>> request_header_access Cache-Control deny all
>> request_header_access Pragma deny all
>> request_header_access Accept-Encoding deny all
>> reply_header_access Cache-Control deny all
>> reply_header_access Pragma deny all
>> reply_header_access Accept-Encoding deny all
>>

Ah, changing the headers on the *outgoing* traffic does not in any way
affect how Squid interprets the _previously_ received inbound messages.

==> In other words: doing the above is pointless and screws everybody
using your proxy over. Don't do that.


By erasing the Cache-Control response header delivered along with that
content you are technically in violation of international copyright law.
==> Don't do that.


By removing the Accept-Encoding header on requests (only) you can improve
the HIT ratio (by a small amount), but at the cost of a 50-90% bandwidth
increase on each MISS - so the cost increase usually swamps the gains.

==> Making this change leads to the opposite of what you intended. Don't
do that.
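
The mechanics, for illustration: a client that sends

Accept-Encoding: gzip, deflate

allows the server to reply with Content-Encoding: gzip, which is where that
50-90% saving on text bodies comes from. Strip the request header and every
MISS is fetched as the identity (uncompressed) body. The small HIT-ratio
gain exists only because all clients then share a single uncompressed
variant instead of several Vary: Accept-Encoding variants.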


Removing the Accept-Encoding header on responses is just pointless. There
it controls POST/PUT payload data, which Squid cannot cache anyway. So all
you did was prevent the clients from using less bandwidth.

==> More bandwidth, more costs. Don't do that.


Removing the Pragma header is also pointless. It's used by very ancient
software from the 1990s and such.

==> If the web application was actually using Pragma for anything
important (some do) you just screwed them over, with no gains to
yourself. Don't do that.


>> I could also try refresh_pattern, but I don't think the code below will
>> work, because not every URL ends with .html or .htm (you visit
>> /www.example.com/, not /www.example.com/index.html/):
>> refresh_pattern -i \.(html|htm)$  1440   40% 40320 ignore-no-cache
>> ignore-no-store ignore-private override-expire reload-into-ims
>>


Quite. So configure the correct options.

No software is psychic enough to do the operation X you want when you
configure it to do *only* some other, non-X operation.
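
As a pointer in the right direction: refresh_pattern regexes are matched
against the whole URL, so a page with no .html suffix is still covered by a
catch-all rule. A sketch, with illustrative numbers:

# URLs with an explicit .htm/.html suffix
refresh_pattern -i \.(html|htm)$ 1440 40% 40320
# everything else, including http://www.example.com/
refresh_pattern . 1440 40% 40320

The first matching pattern wins, so the catch-all must come last.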


Amos



Re: [squid-users] Not all html objects are being cached

2017-01-25 Thread Yuri Voinov


On 26.01.2017 2:22, boruc wrote:
> After a little bit of analyzing requests and responses with Wireshark, I
> noticed that many sites that weren't cached had different combinations of
> the parameters below:
>
> Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
> private, public, max-age, public
> Pragma: no-cache
If the webmaster has done this, he had good reason to. By trying to break
the RFC in this way, you break the Internet.
>
> There is a possibility to disable this in squid by using
Don't do it.
> request_header_access and reply_header_access; however, it doesn't work
> for me - many pages still aren't in the cache. I am currently using the
> lines below:
>
> request_header_access Cache-Control deny all
> request_header_access Pragma deny all
> request_header_access Accept-Encoding deny all
> reply_header_access Cache-Control deny all
> reply_header_access Pragma deny all
> reply_header_access Accept-Encoding deny all
>
> I could also try refresh_pattern, but I don't think the code below will
> work, because not every URL ends with .html or .htm (you visit
> /www.example.com/, not /www.example.com/index.html/):
> refresh_pattern -i \.(html|htm)$  1440   40% 40320 ignore-no-cache
> ignore-no-store ignore-private override-expire reload-into-ims
>
> Thank you in advance.
You're welcome.
>
>
>





Re: [squid-users] Not all html objects are being cached

2017-01-25 Thread boruc
After a little bit of analyzing requests and responses with Wireshark, I
noticed that many sites that weren't cached had different combinations of
the parameters below:

Cache-Control: no-cache, no-store, must-revalidate, post-check, pre-check,
private, public, max-age, public
Pragma: no-cache

There is a possibility to disable this in Squid by using
request_header_access and reply_header_access; however, it doesn't work for
me - many pages still aren't in the cache. I am currently using the lines
below:

request_header_access Cache-Control deny all
request_header_access Pragma deny all
request_header_access Accept-Encoding deny all
reply_header_access Cache-Control deny all
reply_header_access Pragma deny all
reply_header_access Accept-Encoding deny all

I could also try refresh_pattern, but I don't think the code below will
work, because not every URL ends with .html or .htm (you visit
/www.example.com/, not /www.example.com/index.html/):
refresh_pattern -i \.(html|htm)$  1440   40% 40320 ignore-no-cache
ignore-no-store ignore-private override-expire reload-into-ims

Thank you in advance.





[squid-users] Not all html objects are being cached

2017-01-24 Thread boruc
Hi everyone,

I was wondering why some of the visited pages are not being cached (I mean
"main" pages, like www.example.com). If I visit 50 pages, only 10 will be
cached. The text below is from my log files:

store.log:
1485272001.646 RELEASE -1  04F7FA9EAA7FE3D531A2224F4C7DDE5A  200
1485272011 -1 375007920 text/html -1/222442 GET http://www.wykop.pl/

access.log:
1485272001.646    423 10.10.10.136 TCP_MISS/200 223422 GET
http://www.wykop.pl/ - DIRECT/185.66.120.38 text/html

According to the Squid wiki: "if a RELEASE code was logged with file number
, the object existed only in memory, and was released from memory."
I understand that the requested html file wasn't saved to disk, but why?

I'm also posting my squid.conf below. I'd be grateful for your answers!


acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl my_network src 192.168.0.0/24
acl my_phone src 192.168.54.0/24
acl my_net dst 192.168.0.0/24
acl mgr src 10.48.5.0/24
acl new_net src 10.10.10.0/24
acl ex_ft url_regex -i "/etc/squid3/excluded_filetypes.txt"
acl ex_do url_regex -i "/etc/squid3/excluded_domains.txt"  # doesn't include any of the 50 visited pages

acl SSL_ports port 443
acl Safe_ports port 80  # http
acl Safe_ports port 21  # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70  # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT

http_access allow my_network
http_access allow my_phone
http_access allow my_net
http_access allow mgr
http_access allow new_net
http_access allow manager localhost
http_access deny manager

http_access deny !Safe_ports

http_access deny CONNECT !SSL_ports

http_access allow localhost
http_access allow all

http_port 3128

maximum_object_size_in_memory 1024 KB

cache_dir ufs /var/spool/squid3 1000 16 256

cache_store_log /var/log/squid3/store.log

coredump_dir /var/spool/squid3

cache deny ex_ft
cache deny ex_do

refresh_pattern ^ftp:       1440    20%     10080
refresh_pattern ^gopher:    1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0 0%  0
refresh_pattern (Release|Packages(.gz)*)$  0   20% 2880

refresh_pattern .   1000   20% 4320

request_header_access Accept-Encoding deny all


