special chars in URI.

2014-05-13 Thread Eliezer Croitoru

I am not sure, but I am almost sure, that I asked this before.
I have a Google URL that came into my external_acl helper and could not
be parsed by the URI library.

The issue is that the query string contains symbols such as "|", aka the pipe character.

The client definitely sent it that way.
What I am not sure of is whether the logs and external_acl helpers should
see the literal "|", or rather the percent-encoded value.


What do you think?

Eliezer


Re: [MERGE] Squid Patch (revision 10487)

2014-05-13 Thread Michael Pye

On 2014-05-13 11:53, Amos Jeffries wrote:

On 13/05/2014 10:52 p.m., Amos Jeffries wrote:

On 13/05/2014 4:56 a.m., Michael Pye wrote:

On 2014-05-12 16:52, Amos Jeffries wrote:

Here is the guts of the original patch as should be relevant to
Squid-3.5 (or 3.4).
http://master.squid-cache.org/~amosjeffries/patches/Pye_s35.patch


Many thanks Amos, I've just tried this and it does indeed query the
sibling for stale content as we want.

I think there is some logic that will need updating however, possibly in
HttpHeader.cc, as when an object becomes stale and the sibling does not
have a fresh copy, the origin is queried but the "Age:" and
"X-Cache: HIT from" headers are still on the new request, even though it
was a fresh check from the origin. I think these headers need to be
deleted/updated.


Is this a fully new set of headers+object from the server?
 or a revalidation update/refresh from the server?

On the former Age and X-Cache are correct because it is "just" a fresher
update of the initial cache HIT object.


Oops I meant s/former/latter/.


Sorry - I reread what you said. So if it's a revalidation and the content 
hasn't changed (304 from origin), under normal Squid *without* this 
patch, Age will return to 0 and count up to the max-age again before 
doing another revalidation. A max-age 60 second object will remain in 
cache for 60 seconds before revalidation.


With the patch applied, if squidA gets a copy of the content from 
siblingB where Age already has X seconds set, then when that object is 
next revalidated against the origin (the siblingB copy having expired), 
its Age remains at X seconds, counting up to max-age. The max-age 60 
object is now revalidating against the origin every 30 seconds; I would 
have expected it to return to revalidating every 60 seconds.


Thanks
Michael




Re: [PATCH] experiment in private caching

2014-05-13 Thread Alex Rousskov
On 05/13/2014 04:39 AM, Amos Jeffries wrote:
> On 13/05/2014 9:04 a.m., Alex Rousskov wrote:
>> Performance concerns: To determine whether the requested object is in
>> the cache, SMP Store requires semi-expensive lookups (one for each SMP
>> cache_dir and one for the shared cache_mem). AFAICT, this patch doubles
>> the number of such lookups per typical miss response due to a new
>> http.cc loop. The implementation using existing Store APIs (suggested
>> below) avoids this additional overhead.

> There are also large(r) performance issues with the StoreID approach.

The "large(r)" part is debatable and probably depends on the traffic mix
and the environment: The proposed changes slightly penalize nearly all
miss transactions but greatly benefit the "broken" ones.


> Using StoreID it becomes impossible for the unverified requests to ever
> HIT on any existing content cached from verified sources. This actively
> reduces both the HIT ratio

Yes, as I disclosed.


>  and safety of responses. HIT being safer than MISS in these cases.

Not sure what you mean by "safer" in this context, but whether HIT or
MISS is correct (from client and server points of view) depends on the
nature of the brokenness, doesn't it? If the Squid cache is stale and the
client is connecting to the right/fresh address, then MISS is actually
the correct action (i.e., it will serve the intended content) even
though it is more "expensive".


> In theory the one I picked has a higher overall HIT ratio possible.

Agreed. I do not think "highest hit ratio possible" trumps other
concerns in this case.





>> Reusing Store ID will not allow requests that fail validation to hit
>> cached entries that correspond to successfully validated requests, but
>> (a) that could be a good thing in some cases and (b) I wonder whether
>> that is a small price to pay for avoiding new uncertainties related to a
>> new way to generate a store key?
> 
> Regarding (a) - which cases?

The client destination is actually correct. Squid DNS is
inaccurate/misconfigured/etc., and Squid cache has wrong/stale content
(from server point of view). The client should go where it wants to go
instead of getting stale content from the cache.


> I know it can work. It is just being driven by the "higher HIT ratio"
> group. So options that decrease the HIT ratio are not accepted, and we
> may as well do nothing if those are the only options available. I
> personally am happy not adding any of this patch.

Let's not add it then! If you want, you can still offer the "group" many
other options, including:

1a) An option to disable Host validation in their environment.
Can be ACL-driven.

1b) An option not to treat requests that failed validation specially
as far as caching is concerned (allow HITs and caching as usual).
Can be ACL-driven.

2)  A StoreID-based implementation with lower HIT ratio than (1a) or
(1b) but increased protection from malicious requests.


In the future, if you think a patch should not be added, or you are
happy about it not being added, please say so explicitly. Since you are
wearing both developer and committer hats, you have to play devil's
advocate (well, every developer should, but committers are obligated to
do so). I would not have spent hours on the review had I known that you
were happy to abandon the patch!


>> BTW, are you sure that serving content cached by regular requests to
>> requests that fail validation is always desirable? Is it possible that
>> some failed-validation requests will actually get to a different/newer
>> version of the site (e.g., when DNS changed but Squid does not know
>> about that yet)? If you are not sure, please note that the patch will
>> introduce breakage in these cases that does not exist today. Using the
>> Store ID approach avoids that AFAICT.
> 
> It is possible without this patch. If it is a problem then the site
> cache controls are broken or Squid is explicitly configured with
> refresh_pattern abuses.

Not necessarily, but this is not a question of "fresh content" from HTTP
rules point of view. It is a question of client getting what it wants.

The site servers may have moved and updated their content after the
move. In the worst case, imagine Squid trying to validate the cached
content and failing to do that because the server has changed its IP
address but Squid DNS does not have access to the new address yet. Squid
will then respond with an error or serve stale content. Neither is what
the client (or server) wants and deserves in this case.

You may argue that the server is at fault in that case, but that does
not help clients and Squid admins serving them.


>> Request adaptation happens after host validation. Request adaptation may
>> change the request URI and/or the Host header. However, storeSecurityKey
>> will remain set if it was set before adaptation and will not be set if
>> the new request actually becomes invalid. Are both behaviors desirable?
>>
> 
> That we need to discuss some more.

Re: Squid Won't Cache private

2014-05-13 Thread Amos Jeffries
On 13/05/2014 11:05 p.m., Julius Thomas wrote:
> Hi Amos,
> 
> thanks for your answer.
> These are the full reply headers.
> 

Okay. For caching, the server needs to produce at least a Date: header
with the current timestamp of when it is responding, and a Last-Modified
header with the timestamp the object was created/modified (or at least
the same detail as Date). Squid's refresh_pattern needs those to
estimate a future freshness duration for caching.
 The alternative is to provide an Expires: header.

Responses using "no-cache" must provide the above and should also
provide an ETag header to improve revalidation accuracy.

Note that "no-cache" in HTTP/1.1 is equivalent to "must-revalidate". So
Squid-3.2 and later will send an If-Modified-Since revalidation
(expecting a 304 Not Modified) on every client request. This should not
be confused with not caching.

> Squid is acting as a reverse proxy.
> I can't change the reply headers from the origin server.

Then that result is not cacheable with the current Squid releases.

> 
> cache_peer *.*.*.* parent 80 0 sourcehash no-digest name=test1 weight=3
> cache_peer *.*.*.* parent 80 0 sourcehash no-digest name=test2 weight=2
> 

Those should also have the originserver option set, I think.

Amos

> Regards
> 
> julius
> 
> 
> On 13.05.2014 13:01, Amos Jeffries wrote:
>> On 13/05/2014 7:35 a.m., Julius Thomas wrote:
>>> Dear Squid Developers,
>>>
>>> we are using squid 3.4.2 '--enable-http-violations'
>>> I can't cache an xml file with these headers:
>>> Cache-Control: no-cache
>>> Connection: Keep-Alive
>>> Content-Length: 168
>>> Content-Type: (empty)
>>>
>>> I tried different things, different configs, googled, but I can't get the
>>> answer:
>>> refresh_pattern -i \.crossdomain.(xml)$ 10080 90% 43200 override-expire
>>> ignore-no-cache ignore-no-store ignore-private
>>>
>> Hi Julius,
>>Is that the full set of reply headers?
>>
>>What are the request headers used?
>>
>>How is this traffic arriving into Squid? (forward-proxy, reverse-proxy
>> or intercepted)
>>
>> Because there are no Date or Last-Modified headers in there to tell
>> Squid how long the object may be cached.
>>
>> NP: ignore-no-cache is a meaningless configuration option in squid 3.4.
>>
>>
>> Amos
>>
> 
> -- 
> *Julius Thomas*
> Geschäftsführer
> 
> 
> 3Q Medien GmbH
> Wetzlarer Str. 86
> D-14482 Potsdam
> Fon +49 (0)331 / 27 97 866 - 6
> Fax +49 (0)331 / 27 97 866 - 1
> 
> www.3qsdn.com
> Vertrieb: sa...@3qsdn.com
> 
> 
> Hauptsitz der Gesellschaft: Siemensstr. 3, 84478 Waldkraiburg
> Vertretungsberechtigte Geschäftsführer: Holm Krämer, Julius Thomas
> Registergericht: Handelsregister Traunstein, HRB 19129
> 
> 



Re: [MERGE] Squid Patch (revision 10487)

2014-05-13 Thread Michael Pye

On 2014-05-13 11:53, Amos Jeffries wrote:

On 13/05/2014 10:52 p.m., Amos Jeffries wrote:

On 13/05/2014 4:56 a.m., Michael Pye wrote:

On 2014-05-12 16:52, Amos Jeffries wrote:

Here is the guts of the original patch as should be relevant to
Squid-3.5 (or 3.4).
http://master.squid-cache.org/~amosjeffries/patches/Pye_s35.patch


Many thanks Amos, I've just tried this and it does indeed query the
sibling for stale content as we want.

I think there is some logic that will need updating however, possibly in
HttpHeader.cc, as when an object becomes stale and the sibling does not
have a fresh copy, the origin is queried but the "Age:" and
"X-Cache: HIT from" headers are still on the new request, even though it
was a fresh check from the origin. I think these headers need to be
deleted/updated.


Is this a fully new set of headers+object from the server?
 or a revalidation update/refresh from the server?


It's a revalidation, so it sends an If-Modified-Since to the origin, 
which replies with a "304 Not Modified". At this point it seems the Age: 
needs recalculating.


Thanks
Michael


Re: [PATCH] Fix for Squid 3.4.5 segfault

2014-05-13 Thread Alex Rousskov
On 05/13/2014 02:45 AM, Steve Hill wrote:
> On 12.05.14 18:20, Alex Rousskov wrote:
>> The Token class in v3.4 uses an ugly union (instead of "struct") for the
>> data member. Thus, data.timespec should be identical to data.string. The
>> fact that changing .timespec to .string makes a difference indicates
>> that something else is probably broken.


> Ok, I hadn't spotted that.  On closer inspection, in 3.4.4 it is a
> union, but 3.4.5 changes it to a struct

Sorry, I apparently used stale v3.4 branch code. You are right: Trunk
and v3.4.5 code/problems are the same in this area.


> In any case, I don't see any reason for having both "timespec" and
> "string" since they are both the same thing, so I think removing
> timespec is still probably the right thing to do for the sake of
> readability.

I am not an expert in this code area, but I agree: Your fix should be
committed to both trunk and v3.4 branch IMO.


Thank you,

Alex.



Re: [PATCH] Use SBuf for tunnel.cc I/O

2014-05-13 Thread Alex Rousskov
On 05/13/2014 04:03 AM, Amos Jeffries wrote:
> On 13/05/2014 10:46 a.m., Alex Rousskov wrote:
>> On 05/11/2014 11:16 AM, Amos Jeffries wrote:
>>> Unlike the char* API the buffer can be appended to before write(2) is
>>> completed and the appended bytes will be handled as part of the original
>>> write request.

>> That approach scares me because the caller has to be extra careful not
>> to leave the being-updated-area of the buffer in inconsistent state
>> across async calls. However, I doubt we have such complex callers today,
>> and the approach also reduces the number of usually pointless write
>> callback calls, so I am willing to experiment.

> If you have a better approach I would like to hear it.

The simple alternative is to use the current approach of fixed-size
Write() sequences. It is difficult for me to say which approach is
better, but it is clear which one is simpler and which one is scary.
Again, I do not object to experimenting with the new scary approach.


> The raw-pointer usage which appears to be scaring you is the existing
> status quo anyway...

The above "scare" has nothing to do with raw pointers.



>  * SBuf API passes a reference/raw-pointer to the smart pointer SBuf
> instance. All operations on the MemBlob backing store remain valid.
>   => requirement that the caller do nothing to the SBuf instance other
> than allocate a new one while write is scheduled.

I do not think there is such a requirement. In fact, such a requirement
would defeat the purpose of allowing SBuf updates during the Comm
writing sequence. Also, such a requirement is nearly impossible to
enforce because the caller may be destroyed for reasons outside its
control, destroying the SBuf instance with it.


>==> Note this is the same requirement as made on both other API
> already. 

Not exactly. None of the existing Comm write APIs allow the caller to
change the size of to-be-written data after the Write call (the scary
experimental part I have agreed to above) and all of them allow the
caller to completely and safely disassociate itself from the writing
sequence via providing the free function for the buffer (the buf2
objection discussed below).

All APIs can be abused or misused in similar ways, but I am not focusing
on abuse/misuse right now. Both the "scary" experiment and the buf2 mess
exist outside the intentional abuse areas.


>> You forgot to mention the fact that the new Write API, unlike the
>> existing two, relies on the caller to keep the buffer area alive. That
>> requirement, IMO, is a show-stopper. I strongly object to that change.
> 
> If by "buffer area" you mean MemBlob that is not a requirement. 

I mean the object buf2 points to and everything that object points to.
If the caller destroys the object buf2 points to (which may happen for
many reasons, some of which are beyond the caller's control), Squid will
crash or worse. I do not think we should design new APIs like that.

The caller can also change the object that buf2 points to (to append
more data for writing). Such changes are the "scary" part I agreed to
above. They scare me, but I do not object to them.


>  One either sends an SBuf which is a member of the Job/*StateData object
> itself or an SBuf member of the Params object which is delivered to the
> callback. Both methods ensure it's alive until at least the callback is
> scheduled.

If Comm was storing an SBuf object, the show-stopping problem would not
exist. With the buf2 design, Comm stores a pointer to an SBuf object
instead. I do not think that should be allowed. We worked hard on
SBuf/MemBlob API to prevent such storage.


>> Please avoid these and similar changes if they are unrelated to your
>> patch scope.

> With the removal of client.len/server.len byte accounting the
> delayId.bytesIn() is now used directly for delay pool byte accounting.
> Which makes the delayId wrappers of all types needless.

Please explain that in the proposed commit message. It is not a good
idea to force others to guess whether these seemingly unrelated changes
are accidental or intentional.


>>
>>> -debugs(26, DBG_DATA, "Tunnel server PUSH Payload: \n" << Raw("", tunnelState->server.buf, tunnelState->server.len) << "\n--");
>>> +debugs(26, DBG_DATA, "Tunnel server PUSH Payload: \n" << tunnelState->server.buf << "\n--");
>>
>> Please do not remove Raw() protection for dumping raw data.

> SBuf provides generic operator<<(std::ostream&) for debugs.

I know. That generic operator is not suitable for dumping raw data. Use
Raw() for that. I believe Raw() can be used with SBufs as-is, but you
can also add an SBuf-friendly version of Raw() if you prefer to add that
to the patch scope.


> If Raw() protection is required on the buffer it needs to be provided by
> that. 

No, it should not be provided by that generic << operator. That operator
is for writing "meaningful" strings. Raw() is for debugging/dumping
raw/opaque data. We have opened

Jenkins build is back to normal : 3.HEAD-x64-debian-unstable #595

2014-05-13 Thread noc
See 



Build failed in Jenkins: 3.HEAD-amd64-FreeBSD-10-clang #78

2014-05-13 Thread noc
See 

Changes:

[Amos Jeffries] Regression: segfault logging with %tg format specifier

In trunk rev.13387 Token class data member was converted from union to
struct without adding initializer for the timespec field.

timespec is a redundant field anyway, just remove it.

--
[...truncated 28579 lines...]
../../../src/icmp/Icmp4.cc:116:9: error: member access into incomplete type 'struct icmp'
    icmp->icmp_code = 0;
    ^
../../../src/icmp/Icmp4.cc:95:28: note: forward declaration of 'icmp'
    LOCAL_ARRAY(char, pkt, MAX_PKT4_SZ);
                           ^
../../../src/icmp/Icmp.h:40:91: note: expanded from macro 'MAX_PKT4_SZ'
#define MAX_PKT4_SZ (MAX_PAYLOAD + sizeof(struct timeval) + sizeof (char) + sizeof(struct icmphdr) + 1)
                                                                                          ^
../../../src/icmp/Icmp4.h:48:17: note: expanded from macro 'icmphdr'
#define icmphdr icmp
                ^
../../../include/leakcheck.h:10:54: note: expanded from macro 'LOCAL_ARRAY'
#define LOCAL_ARRAY(type,name,size) static type name[size]
                                                     ^
[The same "incomplete type 'struct icmp'" error, with identical notes, repeats for Icmp4.cc:117 (icmp_cksum), :118 (icmp_id), :119 (icmp_seq), :123 (pointer arithmetic on 'icmp + 1') and :138 (CheckSum), before the log is cut off.]

Re: Squid Won't Cache private

2014-05-13 Thread Julius Thomas

Hi Amos,

thanks for your answer.
These are the full reply headers.

Squid is acting as a reverse proxy.
I can't change the reply headers from the origin server.

cache_peer *.*.*.* parent 80 0 sourcehash no-digest name=test1 weight=3
cache_peer *.*.*.* parent 80 0 sourcehash no-digest name=test2 weight=2

Regards

julius


On 13.05.2014 13:01, Amos Jeffries wrote:

On 13/05/2014 7:35 a.m., Julius Thomas wrote:

  Dear Squid Developers,

we are using squid 3.4.2 '--enable-http-violations'
I can't cache an xml file with these headers:
Cache-Control: no-cache
Connection: Keep-Alive
Content-Length: 168
Content-Type: (empty)

I tried different things, different configs, googled, but I can't get the answer:
refresh_pattern -i \.crossdomain.(xml)$ 10080 90% 43200 override-expire
ignore-no-cache ignore-no-store ignore-private


Hi Julius,
   Is that the full set of reply headers?

   What are the request headers used?

   How is this traffic arriving into Squid? (forward-proxy, reverse-proxy
or intercepted)

Because there are no Date or Last-Modified headers in there to tell
Squid how long the object may be cached.

NP: ignore-no-cache is a meaningless configuration option in squid 3.4.


Amos



--
*Julius Thomas*
Geschäftsführer





Re: Squid Won't Cache private

2014-05-13 Thread Amos Jeffries
On 13/05/2014 7:35 a.m., Julius Thomas wrote:
> Dear Squid Developers,
>
> we are using squid 3.4.2 '--enable-http-violations'
> I can't cache an xml file with these headers:
> Cache-Control: no-cache
> Connection: Keep-Alive
> Content-Length: 168
> Content-Type: (empty)
>
> I tried different things, different configs, googled, but I can't get the
> answer:
> refresh_pattern -i \.crossdomain.(xml)$ 10080 90% 43200 override-expire
> ignore-no-cache ignore-no-store ignore-private

Hi Julius,
  Is that the full set of reply headers?

  What are the request headers used?

  How is this traffic arriving into Squid? (forward-proxy, reverse-proxy
or intercepted)

Because there are no Date or Last-Modified headers in there to tell
Squid how long the object may be cached.

NP: ignore-no-cache is a meaningless configuration option in squid 3.4.


Amos



Re: [MERGE] Squid Patch (revision 10487)

2014-05-13 Thread Amos Jeffries
On 13/05/2014 10:52 p.m., Amos Jeffries wrote:
> On 13/05/2014 4:56 a.m., Michael Pye wrote:
>> On 2014-05-12 16:52, Amos Jeffries wrote:
>>> Here is the guts of the original patch as should be relevant to
>>> Squid-3.5 (or 3.4).
>>> http://master.squid-cache.org/~amosjeffries/patches/Pye_s35.patch
>>
>> Many thanks Amos, I've just tried this and it does indeed query sibling
>> for stale content as we want.
>>
>> I think there is some logic that will need updating however in
>> HttpHeader.cc possibly, as when an object becomes stale and the sibling
>> does not have a fresh copy, the origin is queried but the "Age:" and
>> "X-Cache:HIT from" headers are still on the new request, even though it
>> was a fresh check from the origin. I think these headers need to be
>> deleted/updated.
> 
> Is this a fully new set of headers+object from the server?
>  or a revalidation update/refresh from the server?
> 
> On the former Age and X-Cache are correct because it is "just" a fresher
> update of the initial cache HIT object.

Oops I meant s/former/latter/.

Amos


Re: [MERGE] Squid Patch (revision 10487)

2014-05-13 Thread Amos Jeffries
On 13/05/2014 4:56 a.m., Michael Pye wrote:
> On 2014-05-12 16:52, Amos Jeffries wrote:
>> Here is the guts of the original patch as should be relevant to
>> Squid-3.5 (or 3.4).
>> http://master.squid-cache.org/~amosjeffries/patches/Pye_s35.patch
> 
> Many thanks Amos, I've just tried this and it does indeed query sibling
> for stale content as we want.
> 
> I think there is some logic that will need updating however in
> HttpHeader.cc possibly, as when an object becomes stale and the sibling
> does not have a fresh copy, the origin is queried but the "Age:" and
> "X-Cache:HIT from" headers are still on the new request, even though it
> was a fresh check from the origin. I think these headers need to be
> deleted/updated.

Is this a fully new set of headers+object from the server?
 or a revalidation update/refresh from the server?

On the former Age and X-Cache are correct because it is "just" a fresher
update of the initial cache HIT object.

Amos



Re: [PATCH] Fix for Squid 3.4.5 segfault

2014-05-13 Thread Amos Jeffries
On 13/05/2014 8:45 p.m., Steve Hill wrote:
> On 12.05.14 18:20, Alex Rousskov wrote:
> 
>> The Token class in v3.4 uses an ugly union (instead of "struct") for the
>> data member. Thus, data.timespec should be identical to data.string. The
>> fact that changing .timespec to .string makes a difference indicates
>> that something else is probably broken.
> 
> Ok, I hadn't spotted that.  On closer inspection, in 3.4.4 it is a
> union, but 3.4.5 changes it to a struct without initialising the
> timespec (which wasn't necessary when it was a union).
> 
> In any case, I don't see any reason for having both "timespec" and
> "string" since they are both the same thing, so I think removing
> timespec is still probably the right thing to do for the sake of
> readability.
> 


Agreed and applied. Thank you.

Amos


Re: [PATCH] experiment in private caching

2014-05-13 Thread Amos Jeffries
On 13/05/2014 9:04 a.m., Alex Rousskov wrote:
> On 05/11/2014 05:29 AM, Amos Jeffries wrote:
>> This patch seeks to reduce the constant MISS situation we have from
>> false failures doing Host: header verify.
> 
> General comments:
> 
> FWIW, I have a general feeling we are going to regret the proposed
> change, but I do not have enough specific reasons to object to it. If
> verification failures are so prevalent that we need this change, I
> wonder whether the verification itself is such a good idea...

So far it is just a few CDN networks (Google and Akamai predominantly).
Unfortunately they are also popular, and the kind where an attacker can
host their server as easily as the victim. Thus similar-IP is no
protection at all.

These CDNs are so large that, AFAICT, they use an entire /24 worth of
IPs per end-user network but return only 10-20 of them in any given DNS
lookup, leaving ~230-240 IPs "unverified". If the admin does not follow
the recommendation of ensuring they and their clients use the same
recursive resolvers, this explodes to a full /16 or more. Clients using
Google's recursive resolvers get shunted randomly all over the world as
their resolver exit point moves, which screws with geo-DNS based CDN
responses like these.


> 
> One of the problems with this patch is that it implicitly adds a second,
> optional store key to a transaction. This may look innocent enough
> because the code changes are simple, but I suspect it may create
> significant Store API/design headaches long-term, in an area where the
> current Store API is already quite bad. Below, I will suggest a
> different implementation vector that does not solve all the problems but
> keeps Store API principles intact.
> 
> Performance concerns: To determine whether the requested object is in
> the cache, SMP Store requires semi-expensive lookups (one for each SMP
> cache_dir and one for the shared cache_mem). AFAICT, this patch doubles
> the number of such lookups per typical miss response due to a new
> http.cc loop. The implementation using existing Store APIs (suggested
> below) avoids this additional overhead.

There are also large(r) performance issues with the StoreID approach.

Using StoreID it becomes impossible for unverified requests to ever
HIT on any existing content cached from verified sources. This actively
reduces both the HIT ratio and the safety of responses, HIT being safer
than MISS in these cases.

In theory, the approach I picked allows a higher overall HIT ratio:
 * best-case scenario, all unverified traffic gets HITs.
 * worst-case, all of it becomes MISSes for the unverified IP and the
ratio drops to one cached object/variant per unique dstIP, which is the
maximum possible in the Store-ID approach.


> 
> If somebody is complaining about these misses now, I bet somebody will
> later ask you to merge several destination IPs together, to further
> reduce miss ratio in cases where _multiple_ destination IPs correspond
> to the same host and all fail validation. I do not like using "slippery
> slope" arguments so I am not using this as an objection, but do keep
> such possible requests in mind as you evaluate different design choices.
> 

Oh yes, that one has already come up. See top comment.


> 
> Specific comments follow.
> 
>> +/// An additional context to add to the store-ID key for security.
>> +String storeSecurityKey;
> 
> Can we reuse a part of the existing Store ID API here, instead of adding
> one more way to form a cache key? That is, instead of appending
> storeSecurityKey to the computed key hash, can we just make it a part of
> the stored Store ID value itself?

We can. But see above. I think we would end up with a worse HIT ratio
than we have right now.

> 
> Reusing Store ID will not allow requests that fail validation to hit
> cached entries that correspond to successfully validated requests, but
> (a) that could be a good thing in some cases and (b) I wonder whether
> that is a small price to pay for avoiding new uncertainties related to a
> new way to generate a store key?

Regarding (a) - which cases?

> 
> Today there are two major store key paths:
> 
>   1. From store_id, if any.
>   2. From Request URI if there is no store_id.
> 
> If you adjust your patch to use Store ID API, we will have:
> 
>   1. From store_id, if any.
>   2. From Request URI if there is no store_id.
> 
> No changes!
> 
> Today there is one way to get store_id set:
> 
>   i. If there is store_id helper:
>  From store_id helper response.
> 
> If you adjust your patch to use Store ID API, we will have:
> 
>   i. If there is store_id helper:
>  From store_id helper response,
>  with storeSecurityKey appended when needed.
> 
>   ii. If there is no store_id helper:
>   From Request URI with storeSecurityKey appended
>   (always needed if store ID is set without the store_id helper)
> 
> The added benefit of the Store ID API approach is that the
> storeSecurityKey information may not even need to go into the
> HttpRequest in t

Re: [PATCH] Use SBuf for tunnel.cc I/O

2014-05-13 Thread Amos Jeffries
On 13/05/2014 10:46 a.m., Alex Rousskov wrote:
> On 05/11/2014 11:16 AM, Amos Jeffries wrote:
> 
>> This patch adds a Comm::Write API for accepting SBuf output buffers.
> 
> 
>> Unlike the char* API the buffer can be appended to before write(2) is
>> completed and the appended bytes will be handled as part of the original
>> write request.
> 
> That approach scares me because the caller has to be extra careful not
> to leave the being-updated-area of the buffer in inconsistent state
> across async calls. However, I doubt we have such complex callers today,
> and the approach also reduces the number of usually pointless write
> callback calls, so I am willing to experiment.

If you have a better approach I would like to hear it.

The only one I can think of is possibly ref-counting the SBuf instance
itself. Adding that capability, though, seems to imply that
SBuf::Pointer in general is better than just passing SBuf references
around (it's not).


The raw-pointer usage which appears to be scaring you is the existing
status quo anyway...

 * char* API passes a raw pointer to the store/buffer. Any alterations
of the content race the actual write(2) call for those bytes, appends
are lost on callback, and truncations are not seen by the write(2).
Corrupted output one way or another.
  => requirement that the caller do nothing other than allocate a new
buffer while the write is scheduled. The buffer may be re-used after the
callback.

 * MemBuf API passes a raw pointer to its _internal_ data store/buffer.
It goes further by passing the store deallocator. Any alterations of the
content race the actual write(2) call for those bytes, appends are
either lost on callback or reallocate the backing store for the caller,
and truncations are not seen by the write(2). All operations are
potentially done on a buffer pointer already deallocated by comm.
  => requirement that the caller do nothing other than MemBuf::init()
while the write is scheduled (i.e. allocate a new buffer). The same
condition applies even after the callback (ouch).

 * SBuf API passes a reference/raw-pointer to the smart-pointer SBuf
instance. All operations on the MemBlob backing store remain valid.
  => requirement that the caller do nothing to the SBuf instance other
than allocate a new one while the write is scheduled.
   ==> Note this is the same requirement as made by both other APIs
already, but no longer restricting what the caller can do with the
backing store; i.e. it can continue to use the SBuf it has for regular
SBuf operations.


Since these SBufs are members of a caller Job (or equivalent state
object) rather than dynamically allocated/deallocated SBuf objects,
that seems a reasonable requirement.


> 
>> All other behaviour is identical between the three Comm::Write APIs.
> 
> You forgot to mention the fact that the new Write API, unlike the
> existing two, relies on the caller to keep the buffer area alive. That
> requirement, IMO, is a show-stopper. I strongly object to that change.

If by "buffer area" you mean MemBlob, that is not a requirement. It can
have anything at all done to it during Comm::Write. If it is cleared, the
write will probably deliver 0 bytes written to the callback. Most
callbacks assume that means the connection has terminated unexpectedly.
So it's not advisable, but it is possible without causing any internal
problems in comm.

If by "buffer area" you mean SBuf, see the comparison with the other
two APIs above.

I think it is reasonable to retain that requirement, especially since
it only applies to the small SBuf instance.
 One either sends an SBuf which is a member of the Job/*StateData object
itself, or an SBuf member of the Params object which is delivered to the
callback. Both methods ensure it stays alive at least until the callback
is scheduled.


If/when we start delivering lists of small SBuf objects, that is a whole
other API needed for SBufList. Or perhaps a Job loop writing each SBuf
in the list individually ... with the Job doing so keeping the relevant
SBuf instance alive.

> 
> The buf2 hack has already been added during unaudited "Convert the
> ConnStateData input buffer to SBuf" commit (r13324), so there is little
> I can do about it until it causes a bug, but I do object to the use of
> such raw/unprotected links between Comm and its callers in any code I am
> lucky to review before the commit.
> 
> If others overrule this objection, please at least change the
> Comm::Write() sb argument type from reference to a pointer. Passing an
> argument by reference (implying temporary use) and then taking an
> address of that argument for long term object storage is wrong.

Fine.

> 
>> /// Buffer to store read(2) into when set.
>> // This is a pointer to the Jobs buffer rather than an SBuf using
>> // the same store since we cannot know when or how the Job will
>> // alter its SBuf while we are reading.
>> SBuf *buf2;
>>
>> // Legacy c-string buffers used when buf2 is unset.
>> char *buf;
> 
> 
> Another side-

Re: [PATCH] Fix for Squid 3.4.5 segfault

2014-05-13 Thread Steve Hill

On 12.05.14 18:20, Alex Rousskov wrote:


The Token class in v3.4 uses an ugly union (instead of "struct") for the
data member. Thus, data.timespec should be identical to data.string. The
fact that changing .timespec to .string makes a difference indicates
that something else is probably broken.


Ok, I hadn't spotted that.  On closer inspection, in 3.4.4 it is a 
union, but 3.4.5 changed it to a struct without initialising the 
timespec (which wasn't necessary when it was a union).


In any case, I don't see any reason for having both "timespec" and 
"string" since they are both the same thing, so I think removing 
timespec is still probably the right thing to do for the sake of 
readability.


--
 - Steve Hill
   Technical Director
   Opendium Limited http://www.opendium.com

Direct contacts:
   Instant messenger: xmpp:st...@opendium.com
   Email: st...@opendium.com
   Phone: sip:st...@opendium.com

Sales / enquiries contacts:
   Email: sa...@opendium.com
   Phone: +44-844-9791439 / sip:sa...@opendium.com

Support contacts:
   Email: supp...@opendium.com
   Phone: +44-844-4844916 / sip:supp...@opendium.com