Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 4/07/2012 4:54 p.m., Alex Rousskov wrote:

On 07/03/2012 08:10 PM, Amos Jeffries wrote:


[channel-ID] (OK/ERR/BH) [key-pairs] blob terminator

How will the generic code be able to tell where key-pairs end and blob
begins?

The generic code will have a format for key-pair of ( 1#token =
1#(token / quoted-string) )  [to be finalized still] with a set of known
key names. If the bytes being parsed do not match that format and the
known keys, they are not part of the key-pairs.

There are two problems with this approach, IMO:

   * It cannot handle a body that just happened to start with supported
key-value pair characters.


This is shared with any body which just happens to start with the BS 
token as well. This is a good reason to use key-pair across the board 
and not document any blob support.




   * It cannot detect a case of helper sending unsupported key-value
pairs (if that helper may send a body too).

Yes.


It also requires extra code/work to pre-register all known key names,
but that is minor.


This is not an issue. We have to code operations for handling any 
accepted key-pair in the handler callbacks anyway.






The plan for patch-#2 is to discuss key-pair syntax (which we seem to
have started), and move some particular key-pair parsing from helper
handlers into the shared one. Opening up user=, tag=, password=, log=,
message= keys to non-ACL helper responses will be VERY useful for RADIUS
and systems needing to maintain cross-helper request state.

I think we should keep it as simple as possible without causing problems
with existing parsers and leaving some room for future extensions:

* key/name: alphanumeric char followed by anything except (whitespace or
'=' or terminator)

* value: either quoted-string or anything except (whitespace or
terminator); could be empty

The parser strips whitespace and strips quotes around quoted strings.
Quoted strings use backslash to escape any octet.
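As an illustration of that grammar, a single-pair parser could look roughly like this (a sketch under the rules above; `parsePair` and its signature are invented, not Squid code):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustration only (not Squid code; parsePair is an invented name).
// key:   alphanumeric char followed by anything except whitespace or '='
// value: quoted-string (backslash escapes any octet) or a run of
//        non-whitespace characters; may be empty
// On success fills key/value and advances pos past the pair.
static bool parsePair(const std::string &in, size_t &pos,
                      std::string &key, std::string &value)
{
    while (pos < in.size() && isspace((unsigned char)in[pos]))
        ++pos; // strip leading whitespace
    const size_t keyStart = pos;
    if (pos >= in.size() || !isalnum((unsigned char)in[pos]))
        return false; // a key must start with an alphanumeric char
    while (pos < in.size() && !isspace((unsigned char)in[pos]) && in[pos] != '=')
        ++pos;
    if (pos >= in.size() || in[pos] != '=')
        return false; // not a key=value pair
    key = in.substr(keyStart, pos - keyStart);
    ++pos; // skip '='
    value.clear();
    if (pos < in.size() && in[pos] == '"') { // quoted-string value
        ++pos;
        while (pos < in.size() && in[pos] != '"') {
            if (in[pos] == '\\' && pos + 1 < in.size())
                ++pos; // backslash escapes any octet
            value += in[pos++];
        }
        if (pos >= in.size())
            return false; // unterminated quoted-string
        ++pos; // skip closing quote
    } else {
        while (pos < in.size() && !isspace((unsigned char)in[pos]))
            value += in[pos++];
    }
    return true;
}
```

An empty value (`log=`) and a quoted value with embedded spaces both parse; a token without '=' makes the function return false, which is how a caller would notice that the key-pair area has ended.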


Okay. So modelling on the ssl_crtd parser instead of external_acl_type 
parser.






If you want one format for all, you probably need something like

  [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator

where BS is some special marker that starts all bodies and cannot be
found in key-pairs. Since any body-carrying message is likely to have
new lines (and, hence, would need a non-newline terminator), this BS
sequence could be something like "body:\n", I guess.
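To make the BS idea concrete, here is a sketch of splitting a reply on such a marker (an illustration of the proposal only; the marker text and function name are assumptions, not Squid code):

```cpp
#include <cassert>
#include <string>

// Illustration of the proposal (not Squid code; the marker text and
// function name are assumptions): everything after the first "body:\n"
// is the body, because "body:" cannot appear as a key (':' is not a
// valid key character).
static bool splitOnBodyMarker(const std::string &reply,
                              std::string &pairs, std::string &body)
{
    static const std::string marker = "body:\n";
    const size_t at = reply.find(marker);
    if (at == std::string::npos) {
        pairs = reply;
        body.clear();
        return false; // no body present
    }
    pairs = reply.substr(0, at);
    body = reply.substr(at + marker.size());
    return true; // body found after the marker
}
```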

We REQUIRE a format that does not rely on new special markers like that
in order to cope with the existing helpers responses in a
backward-compatible way without additional user configuration.

When your parser sees n=v after ERR and n is a known key name, how
would it distinguish these two possibilities:

   * This is a start of a blob
   * This is a key=value pair


The second. *IF* it was in the [key-pair] area directly after ERR. Once 
blob starts everything is blob.



Or would you advocate moving the external_acl_type protocol=N.N config
option into the helper child object for use by all helpers in
determining what key-pairs and format are accepted?

I am probably missing some important use cases, but my generic parser
would just call a virtual method for each key=value pair found (with no
pre-registration of supported keys) and another virtual method for the
body blob, if any. It will parse everything using the key, value, and
BS formats discussed above.


You mean function pointers right? not virtual methods in the HelperReply 
object?
 and um, this is a high-use code path - both in parsing and in 
retrieving the details out of HelperReply.


That style of generic parser is good for new formats but not so much for 
backward compatibility with old formats. I am going to have to manually 
receive things like the TT/LD/AF/NA NTLM result codes and map the 
existing format into the record of the HelperReply object.


Amos


Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 12:02 AM, Amos Jeffries wrote:
 On 4/07/2012 4:54 p.m., Alex Rousskov wrote:
 On 07/03/2012 08:10 PM, Amos Jeffries wrote:

 [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator
 How will the generic code be able to tell where key-pairs end and blob
 begins?
 The generic code will have a format for key-pair of ( 1#token =
 1#(token / quoted-string) )  [to be finalized still] with a set of known
 key names. If the bytes being parsed do not match that format and the
 known keys, they are not part of the key-pairs.
 There are two problems with this approach, IMO:

* It cannot handle a body that just happened to start with supported
 key-value pair characters.
 
 This is shared with any body which just happens to start with the BS
 token as well. 

No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a key-value
pair _and_ if a body starts with BS as well, there is no problem because
we already know to expect a body there (after BS).

[ but I am glad you acknowledge this issue with the BS-less approach ]


 I think we should keep it as simple as possible without causing problems
 with existing parsers and leaving some room for future extensions:

 * key/name: alphanumeric char followed by anything except (whitespace or
 '=' or terminator)

 * value: either quoted-string or anything except (whitespace or
 terminator); could be empty

 The parser strips whitespace and strips quotes around quoted strings.
 Quoted strings use backslash to escape any octet.

 Okay. So modelling on the ssl_crtd parser instead of external_acl_type
 parser.

I am trying to model this on the above as simple as ... criteria. I do
not really remember the specifics of any particular parser we have (so
adjustments may be necessary).



 If you want one format for all, you probably need something like

   [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator

 where BS is some special marker that starts all bodies and cannot be
 found in key-pairs. Since any body-carrying message is likely to have
 new lines (and, hence, would need a non-newline terminator), this BS
 sequence could be something like "body:\n", I guess.
 We REQUIRE a format that does not rely on new special markers like that
 in order to cope with the existing helpers responses in a
 backward-compatible way without additional user configuration.
 When your parser sees n=v  after ERR and n is a known key name, how
 would it distinguish these two possibilities:

* This is a start of a blob
* This is a key=value pair
 
 The second. *IF* it was in the [key-pair] area directly after ERR.

And how do you decide that IF? It is exactly the same question I asked,
just phrased differently. The BS-less approach does not work (as you
have already acknowledged in the beginning of this email): The end of
[key-pair] area is indistinguishable from the beginning of a blob in
this case. In other words, you do not know whether the blob has
started or not. That is exactly the problem BS is solving (because it is
not a valid key).


 Or would you advocate moving the external_acl_type protocol=N.N config
 option into the helper child object for use by all helpers on
 determining what key-pairs and format is accepted?
 I am probably missing some important use cases, but my generic parser
 would just call a virtual method for each key=value pair found (with no
 pre-registration of supported keys) and another virtual method for the
 body blob, if any. It will parse everything using the key, value, and
 BS formats discussed above.
 
 You mean function pointers right? not virtual methods in the HelperReply
 object?

I did mean virtual methods. The child parsers will form HelperReply's
kid objects by converting key-value pairs and other parsed strings into
helper-specific replies. Those parsers may be integrated with
HelperReply hierarchy or be separate from it. If they are integrated, it
would look like this:

class HelperReply {
...
    virtual void setResult(const string &);
    virtual void setKeyValue(const string &, const string &);
    virtual void setBody(const MemBuf &);
};
...

// a sketch of a child implementation of setKeyValue()
void
Ssl::CrtdReply::setKeyValue(const string &k, const string &v)
{
    if (k == "foo")
        this->foo_ = StringToInt(v); // a key known to this child
    else
        HelperReply::setKeyValue(k, v); // defer unknown keys to the parent
}


  and um, this is a high-use code path - both in parsing and in
 retrieving the details out of HelperReply.

Agreed. I cannot think of a significantly more efficient but still
general approach that results in helper-specific ready-to-use replies.


 That style of generic parser is good for new formats but not so much for
 backward-compatibility from old formats. I am going to have to manually
 receive things like the TT/LD/AF/NA NTLM result codes and map the
 existing format into record of the HelperReply object.

Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator

Re: Generic helper I/O format

2012-07-04 Thread Henrik Nordström
ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:

 No, it is not. BS is required if the body is present and BS is not a
 valid key name. Thus, BS cannot be confused with a start of a key-value
 pair _and_ if a body starts with BS as well, there is no problem because
 we already know to expect a body there (after BS).

Why do we need a blob marker? Why not simply use key=value pairs across
the board?


 Why not have NtmlHelperReply::setResult() do the mapping? Is there
 something format-incompatible with those NTLM result codes that the
 generic parser cannot parse them using the following format?
 
   [channel-ID] result [key-pairs] [BS blob] terminator

NTLM does not need a blob. It's just a value. Moving to key=value is fine.
Also worth noting: '=' is not a valid character in NTLM blobs, so the
same parser can be used if tokens without '=' are supported
(keyless values).

Regards
Henrik
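Henrik's keyless-values idea can be sketched like this (an illustration only, not Squid code; an anonymous value is represented here as a pair with an empty key):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Illustration only (not Squid code): split a reply line into tokens;
// a token without '=' becomes an anonymous value (empty key), so old
// NTLM-style replies parse with the same loop as key=value replies.
static std::vector<std::pair<std::string, std::string> >
tokenize(const std::string &line)
{
    std::vector<std::pair<std::string, std::string> > out;
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        const size_t eq = tok.find('=');
        if (eq == std::string::npos)
            out.push_back(std::make_pair(std::string(), tok)); // keyless value
        else
            out.push_back(std::make_pair(tok.substr(0, eq),
                                         tok.substr(eq + 1)));
    }
    return out;
}
```

This deliberately ignores quoted values with embedded whitespace; it only shows how keyless tokens and key=value pairs can share one tokenizer.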



Re: Squid-3.2 status update

2012-07-04 Thread Alex Rousskov
On 06/27/2012 03:12 AM, Amos Jeffries wrote:

 A quick review of the other major bugs shows that each will take some
 large design and code changes to implement a proper fix or even a
 workaround.
 
 
 Are there any objections to ignoring these bugs when considering a 3.2
 stable release:

Our definition of a stable release has two criteria:

1. Meant for production caches.

2. begin when all known major bugs have been fixed [for 14 days].

Criterion #1 should probably be interpreted as Squid Project considers
the version suitable for production deployment. If you think we are
there, I have no objections -- I do not have enough information to say
whether enough users will be satisfied with current v3.2 code in
production today. Perhaps this is something we should ask on squid-users
after we close all bugs that we think should be closed?

As for Criterion #2, your question means that either we stop considering
those bugs as major OR we change criterion #2. IMHO, we should adjust
that criterion so that we do not have to play these games where we mark
something as a major bug but then decide that, in the interest of a
speedier stable designation, we are going to ignore it.

An adjusted criterion could be phrased as

2'.  begin when #1 is satisfied for at least 14 days


This gives us enough flexibility to release [what we consider
suitable-for-production] code that might have major bugs in some
environments. I added at least because otherwise we may have to
release v3.3 as stable 14 days after v3.2 is marked stable :-). In
practice, the version should have enough improvements to warrant its
numbering and its release but I do not want to digress in that discussion.



 3124 - Cache manager stops responding when multiple workers used
   ** requires implementing non-blocking IPC packets between workers and
 coordinator.

Has this been discussed somewhere? IPC communication is already
non-blocking so I suspect some other issue is at play here. The specific
examples of mgr commands in the bug report (userhash, sourcehash,
client_list, and netdb) seem like non-essential in most environments
and, hence, not justifying the major designation, but perhaps they
indicate some major implementation problem that must be fixed.


 3389 - Auto-reconnect for tcp access_log
   ** requires asynchronous handling of log opening and blocking Squid
 operation

Since we have stable file-based logging, this bug does not have to block
a stable designation if TCP logging is declared experimental. You
already have a patch that addresses 90% of the core problem for those
who care.

If you do not want to mark TCP logging as experimental and highlight
this shortcoming, then the bug ought to be fixed IMHO because there is
consensus that accurate logging is critical for many deployments.


 3478 - Host verify catching dynamic CDN hosted sites
   ** requires designing a CONNECT and bump handling mechanism

I am not an expert on this, but it feels like we are trying to enforce a
[good] rule ignored by the [bad] real world, especially in interception
environments. As a result, Squid lies and scares admins for no good
reason (in most cases). We will not win this battle.

I suggest that the host_verify_strict off behavior is adjusted to
cause no harm, even if some malicious requests will get through.

If you do not want to do that, please add a [fast] ACL so that admins
are not stuck without a solution and can whitelist bad (or all) sites.


Said that, the bug report itself does not explicitly say that something
is _seriously_ broken, does it? I bet the cache.log messages are
excessive on any busy site with a diverse user population, but we can
rate-limit these messages and downgrade the severity of the bug while
waiting for a real use case where these new checks break things (despite
host_verify_strict being off).



 3517 - Workers ldap digest
   ** requires SMP atomic access support for all user credentials

This is not a blocker IMO. SMP has several known limitations, complex
authentication schemes being one of them. This does not affect stability
of supported SMP configurations.


 Which would leave us with only these to locate (any takers?) :
 
 3551 - store_rebuild.cc:116: store_errors == 0 assertion

It would be nice to figure this one out, at least for ufs, because many
folks will try ufs with SMP and there is clearly some kind of corruption
problem there. I assigned the bug to self for now.

However, if I cannot reproduce it, I will not be able to make much
progress. Please note that the original reporter moved on to rock store
and no longer considers this bug to be affecting him (per comment
#10).


 3556 - assertion failed: comm.cc:1093: isOpen(fd)

I recommend adding a guard for the comm_close() call in the Connection
destructor to avoid the call for !isOpen(fd) orphan connections. And
print the value of isOpen() in the BUG message.


 3562 - StoreEntry::kickProducer Segmentation fault

I suspect Squid is corrupting 

Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 02:34 PM, Henrik Nordström wrote:
 ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: 

 Why not simply use key=value pairs across the board?

We need blobs because some values in use today are not expressed as
tokens or quoted strings (the two value formats Amos is considering).

We could, of course, _encode_ those inconvenient values so that things
like SSL certificates can be represented as huge quoted strings. Is that
what you are getting at? I am not against that, even though I am a
little worried about the added encoding overhead. This will require
changing ssl_crtd helper code, but I do not think that would be a big
problem.



 Why not have NtmlHelperReply::setResult() do the mapping? Is there
 something format-incompatible with those NTLM result codes that the
 generic parser cannot parse them using the following format?

   [channel-ID] result [key-pairs] [BS blob] terminator
 
 NTLM do not need a blob.

We are talking about a general format, not NTLM-specific format. Blob is
optional. If NTLM helper does not need it, it will not use it.


 It's just a value. Moving to key=value is fine.
 Additionally worth noting is that = is not a valid character in NTLM
 blobs so same parser can be used if tokens without = is supported
 (keyless values).

Yes, anonymous values or valueless names can be allowed if it helps
existing helpers. Can somebody paste a typical NTLM helper response or two?


Thank you,

Alex.


Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 05.07.2012 08:34, Henrik Nordström wrote:

ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:


No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a key-value
pair _and_ if a body starts with BS as well, there is no problem
because we already know to expect a body there (after BS).


Why do we need a blob marker? Why not simply use key=value pairs across
the board?


We need it as a temporary measure to handle old helpers responses.

I agree on the desire to move to key-pair across the board and am
trying to reach it out of this forest of individual fixed-syntax
parsers. I think it's possible (easy even) to do it using a blob instead
of new config settings.







Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator


NTLM do not need a blob. It's just a value. Moving to key=value is fine.
Additionally worth noting is that = is not a valid character in NTLM
blobs so same parser can be used if tokens without = is supported
(keyless values).



Good point here. Same stands for all the other helpers.

Alex,
 before starting I went through each of the helper response styles we 
have and documented exactly what comes and goes 
(http://wiki.squid-cache.org/Features/AddonHelpers) to determine that 
none of the response formats started with a token containing [a-z0-9] 
then =. We are *not* starting from a random-input state where blob 
can overlap with key.


Anything which does not start with a registered result status code is 
automatically blob. The ones which do use result are safely mapped to 
the key-pair format based on the presence of a valid key= token, or its 
absence indicating blob to be handled by the old parser.
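That first-token dispatch can be sketched as follows (an illustration only, not Squid code; the function name is invented and the registered result-code set here is just a stand-in):

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustration of the dispatch described above (not Squid code): a reply
// whose first token is a registered result code goes through the
// key-pair parser; anything else is blob for the old per-helper parser.
static bool looksLikeNewFormat(const std::string &reply)
{
    static const std::set<std::string> knownResults = {"OK", "ERR", "BH"};
    const std::string first = reply.substr(0, reply.find(' '));
    return knownResults.count(first) > 0;
}
```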


Have a look through that wiki page at the set of Result lines sent back 
to Squid and it's clear.


The proposed shared format is *exactly* what external ACL already 
processes. It also, if you look at the code, has the proposed background 
blob for backward compatibility with some squid-2.5 era helpers which 
only sent a reason field in plain text (random input! yuck; but as an 
established PoC it seems not to have had any complaints).


Amos



Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 05.07.2012 10:15, Alex Rousskov wrote:

On 07/04/2012 02:34 PM, Henrik Nordström wrote:

ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:



Why not simply use key=value pairs across the board?


We need blobs because some values in use today are not expressed as
tokens or quoted strings (the two value formats Amos is considering).

We could, of course, _encode_ those inconvenient values so that things
like SSL certificates can be represented as huge quoted strings. Is that
what you are getting at? I am not against that, even though I am a
little worried about the added encoding overhead. This will require
changing ssl_crtd helper code, but I do not think that would be a big
problem.




Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator


NTLM do not need a blob.


We are talking about a general format, not NTLM-specific format. Blob is
optional. If NTLM helper does not need it, it will not use it.



It's just a value. Moving to key=value is fine.
Additionally worth noting is that = is not a valid character in NTLM
blobs so same parser can be used if tokens without = is supported
(keyless values).


Yes, anonymous values or valueless names can be allowed if it helps
existing helpers. Can somebody paste a typical NTLM helper response 
or two?




http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme

If you note, the reason field is usually something other than OK/ERR, 
so we can parse the old syntax differently to the new OK/ERR responses.


Where there is an overlap (BH), the reason field is an error message: 
usually the program name and a colon ':' (not valid in the proposed 
key-name), or a number in brackets (again not valid in a key name), or a 
plain-text message (no '=' expected in human-readable words).



Amos



Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 04:52 PM, Amos Jeffries wrote:
 On 05.07.2012 08:34, Henrik Nordström wrote:
 ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov:
 
 Alex,
  before starting I went through each of the helper response styles we
 have and documented exactly what comes and goes
 (http://wiki.squid-cache.org/Features/AddonHelpers) to determine that
 none of the response formats started with a token containing [a-z0-9]
 then =. We are *not* starting from a random-input state where blob
 can overlap with key.

I thought you wanted a general format. I agree that current bodies will
not clash with current keys.

Alex.


 Anything which does not start with a registered result status code, is
 blob automatically. The ones which do use result are safely mapped
 to the key-pair format based on the presence of a valid key= token, or
 its absence indicating blob to be handled by the old parser.
 
 Have a look through that wiki page at the set of Result line sent back
 to Squid and its clear.



 The proposed shared format is *exactly* what external ACL already
 processes. It also if you look at the code has the proposed background
 blob for backward compatibility with some squid-2.5 era helpers which
 only sent a reason field in plain-text (random input! yuck, as an
 established PoC it seems not to have had any complaints).
 
 Amos




Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 05:07 PM, Amos Jeffries wrote:

 http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme
 
 If you note, the reason field is usually something other than OK/ERR,
 so we can parse the old syntax differently to the new OK/ERR responses.
 
 Where there is an overlap (BH) the reason field is an error message,
 usually the program name and a ':' colon (not valid in the proposed
 key-name) or number in brackets (again not valid in key name) or plain
 text message (no '=' expected on human-readable words).

You can let the setResult() and setKeyValue() methods in HelperReply
children indicate that what follows is a blob:

void
NtlmHelperReply::setResult(const string &result)
{
    if (result == "BH") {
        // ... remember that the result was BH ...
        // no more key=value pairs; read the entire reason
        expectBody(true);
    }
    // ...
}


This is not something you can easily express in a static format spec,
but it will work nicely to avoid custom code when handling reason
phrases in BH responses or ssl_crtd certificates. After expectBody() is
called, the general parser will get everything up to the terminator
and call setBody().


HTH,

Alex.


Re: Couple pointers

2012-07-04 Thread Alex Rousskov
On 07/03/2012 11:11 PM, Amos Jeffries wrote:
 On 4/07/2012 6:04 a.m., Alex Rousskov wrote:
 On 07/02/2012 05:42 PM, Amos Jeffries wrote:

   ... is anyone able to come up with a script or tester to detect
 classes
 which violate our Big-3 constructor/destructor/assignment operator
 guideline? I know there are portions of the code which are broken today
 and it is easily overlooked, this would be a nice one to enforce
 automatically.
 Hi Amos,

  Yes, Measurement Factory has purchased a Coverity static analysis
 license to automatically detect bugs like the ones you describe. I have
 run one Squid test during evaluation, and the tool did detect missing
 constructor/destructor/assignment operators (among hundreds of other
 bugs and non-bugs).

 We are waiting for Factory sysadmin to set the tool up and for Squid
 Project sysadmin to setup [automated] Squid tests the results of which
 can then be somehow shared with Squid developers (we did get Coverity
 permission to do that). I hope this will be done in July.


 Sweet. I wondered what happened with Coverity. We had an FOSS account
 there years ago, but after the ring-level changes they stopped contacting
 us and stopped scanning recent sources; it kept scanning daily but only
 on old sources.

Last time I asked, somebody (either you or Henrik IIRC) told me that
trying to work with their free scan project was hopeless. Having control
over the tool will give us a lot more flexibility, of course. It remains
to be seen whether the bugs found by these scans will justify the
expense, but we need to get the tool installed and running first.


Cheers,

Alex.


Re: Squid-3.2 status update

2012-07-04 Thread Amos Jeffries

On 05.07.2012 10:00, Alex Rousskov wrote:

On 06/27/2012 03:12 AM, Amos Jeffries wrote:

A quick review of the other major bugs shows that each will take some
large design and code changes to implement a proper fix or even a
workaround.


Are there any objections to ignoring these bugs when considering a 3.2
stable release:


Our definition of a stable release has two criteria:

1. Meant for production caches.

2. begin when all known major bugs have been fixed [for 14 days].

Criterion #1 should probably be interpreted as Squid Project considers
the version suitable for production deployment. If you think we are
there, I have no objections -- I do not have enough information to say
whether enough users will be satisfied with current v3.2 code in
production today. Perhaps this is something we should ask on squid-users
after we close all bugs that we think should be closed?

As for Criterion #2, your question means that either we stop considering
those bugs as major OR we change criterion #2. IMHO, we should adjust
that criterion so that we do not have to play these games where we mark
something as a major bug but then decide that in the interest of a
speedier stable designation we are going to ignore it.

An adjusted criterion could be phrased as

2'.  begin when #1 is satisfied for at least 14 days


This gives us enough flexibility to release [what we consider
suitable-for-production] code that might have major bugs in some
environments. I added at least because otherwise we may have to
release v3.3 as stable 14 days after v3.2 is marked stable :-). In
practice, the version should have enough improvements to warrant its
numbering and its release but I do not want to digress in that
discussion.





3124 - Cache manager stops responding when multiple workers used
  ** requires implementing non-blocking IPC packets between workers and
coordinator.


Has this been discussed somewhere? IPC communication is already
non-blocking so I suspect some other issue is at play here. The specific
examples of mgr commands in the bug report (userhash, sourcehash,
client_list, and netdb) seem like non-essential in most environments
and, hence, not justifying the major designation, but perhaps they
indicate some major implementation problem that must be fixed.



UNIX sockets apparently guarantee that write() blocks until the 
recipient process has read() the packet. That means each IPC packet is 
blocked behind whatever long AsyncCall or delay the recipient has 
going on. Last I looked, the coordinator handling function also called 
component handler functions synchronously for them to create the 
response IPC packet.


AFAIK this is waiting on the Subscription and generic (immediate-ACK) 
IPC packets, which will free up the coordinator and workers for other 
async operations even if a large process is underway.






3389 - Auto-reconnect for tcp access_log
  ** requires asynchronous handling of log opening and blocking Squid
operation


Since we have stable file-based logging, this bug does not have to block
a stable designation if TCP logging is declared experimental. You
already have a patch that addresses 90% of the core problem for those
who care.

If you do not want to mark TCP logging as experimental and highlight
this shortcoming, then the bug ought to be fixed IMHO because there is
consensus that accurate logging is critical for many deployments.



3478 - Host verify catching dynamic CDN hosted sites
  ** requires designing a CONNECT and bump handling mechanism


I am not an expert on this, but it feels like we are trying to enforce a
[good] rule ignored by the [bad] real world, especially in interception
environments. As a result, Squid lies and scares admins for no good
reason (in most cases). We will not win this battle.

I suggest that the host_verify_strict off behavior is adjusted to
cause no harm, even if some malicious requests will get through.



It does that now. The no harm means we can't re-write the request 
headers to something we are not sure about; doing so would actively 
cause problems if we got it wrong.
 The current state is that Squid goes DIRECT, instead of through peers, 
breaking interception+cluster setups.



I can open that up again, but it will mean updating the CVE to indicate 
2nd-stage proxies are still vulnerable.




If you do not want to do that, please add a [fast] ACL so that admins
are not stuck without a solution and can whitelist bad (or all) sites.



Said that, the bug report itself does not explicitly say that something
is _seriously_ broken, does it? I bet the cache.log messages are
excessive on any busy site with a diverse user population, but we can
rate-limit these messages and downgrade the severity of the bug while
waiting for a real use case where these new checks break things (despite
host_verify_strict being off).



cache_peer relay is almost completely disabled for some major sites. 
Everything else works well.







Re: Squid-3.2 status update

2012-07-04 Thread Alex Rousskov
On 07/04/2012 05:34 PM, Amos Jeffries wrote:
 On 05.07.2012 10:00, Alex Rousskov wrote:
 3478 - Host verify catching dynamic CDN hosted sites
   ** requires designing a CONNECT and bump handling mechanism

 I am not an expert on this, but it feels like we are trying to enforce a
 [good] rule ignored by the [bad] real world, especially in interception
 environments. As a result, Squid lies and scares admins for no good
 reason (in most cases). We will not win this battle.

 I suggest that the host_verify_strict off behavior is adjusted to
 cause no harm, even if some malicious requests will get through.


 It does that now. The no harm means we can't re-write the request
 headers to something we are not sure about and would actively cause
 problems if we got it wrong.
  The current state is that Squid goes DIRECT, instead of through peers.
 Breaking interception+cluster setups.

That last part means do harm to those admins who discover nonworking
setups that used to work fine (from their perspective). I understand
that your definition of harm may be different from theirs. This
conflict should be resolved by configuration knobs IMO.


 cache_peer relay is almost completely disabled for some major sites.
 Everything else works well.

Well, we can wait for somebody to complain about that and then decide
what to do, I guess. With some luck, nobody will complain.

I certainly do not insist on treating this issue as a blocker for v3.2
stable designation; only suggesting ways to close it.


Cheers,

Alex.


[PATCH] Do not release entries that should be kept in local memory cache.

2012-07-04 Thread Dmitry Kurochkin
Hi all.

Local memory caching does not work in recent Squid trunk.  The patch
attempts to fix that.


Since r11969, Squid calls trimMemory() for all entries, including
non-swappable, to release unused MemObjects memory.  But it should not
release objects that are or should be stored in local memory cache.
StoreEntry::trimMemory() has a check for IN_MEMORY status for that.
But IN_MEMORY status is set too late in
StoreController::handleIdleEntry(), after trimMemory() marks entry for
release:

  clientReplyContext::removeStoreReference()

storeUnregister()
  StoreEntry::swapOut()
StoreEntry::trimMemory()
  StoreEntry::releaseRequest()

StoreEntry::unlock()
  StoreController::handleIdleEntry() // never get here because entry is
    set IN_MEMORY status             // already marked for release

The patch adds StoreEntry::keepInLocalMemory() method to check if an
entry should be stored in local memory cache.  If it is true, do not
release entry in trimMemory().

Regards,
  Dmitry

=== modified file 'src/Store.h'
--- src/Store.h	2012-01-13 13:49:26 +
+++ src/Store.h	2012-07-05 01:23:03 +
@@ -101,40 +101,42 @@ public:
 void makePrivate();
 void setPublicKey();
 void setPrivateKey();
 void expireNow();
 void releaseRequest();
 void negativeCache();
 void cacheNegatively();		/** \todo argh, why both? */
 void invokeHandlers();
 void purgeMem();
 void cacheInMemory(); /// start or continue storing in memory cache
 void swapOut();
 /// whether we are in the process of writing this entry to disk
 bool swappingOut() const { return swap_status == SWAPOUT_WRITING; }
 void swapOutFileClose(int how);
 const char *url() const;
 int checkCachable();
 int checkNegativeHit() const;
 int locked() const;
 int validToSend() const;
 bool memoryCachable() const; /// may be cached in memory
+/// may be cached in memory and local memory cache is not overflowing
+bool keepInLocalMemory() const;
 void createMemObject(const char *, const char *);
 void hideMemObject(); /// no mem_obj for callers until createMemObject
 void dump(int debug_lvl) const;
 void hashDelete();
 void hashInsert(const cache_key *);
 void registerAbort(STABH * cb, void *);
 void reset();
 void setMemStatus(mem_status_t);
 void timestampsSet();
 void unregisterAbort();
 void destroyMemObject();
 int checkTooSmall();
 
 void delayAwareRead(const Comm::ConnectionPointer &conn, char *buf, int len, AsyncCall::Pointer callback);
 
 void setNoDelay (bool const);
 bool modifiedSince(HttpRequest * request) const;
 /// has ETag matching at least one of the If-Match etags
 bool hasIfMatchEtag(const HttpRequest &request) const;
 /// has ETag matching at least one of the If-None-Match etags

=== modified file 'src/store.cc'
--- src/store.cc	2012-01-13 13:49:26 +
+++ src/store.cc	2012-07-05 00:30:45 +
@@ -1446,40 +1446,46 @@ StoreEntry::memoryCachable() const
 if (mem_obj->data_hdr.size() == 0)
 return 0;
 
 if (mem_obj->inmem_lo != 0)
 return 0;
 
 if (!Config.onoff.memory_cache_first && swap_status == SWAPOUT_DONE && refcount == 1)
 return 0;
 
 if (Config.memShared && IamWorkerProcess()) {
 const int64_t expectedSize = mem_obj->expectedReplySize();
 // objects of unknown size are not allowed into memory cache, for now
 if (expectedSize < 0 ||
 expectedSize > static_cast<int64_t>(Config.Store.maxInMemObjSize))
 return 0;
 }
 
 return 1;
 }
 
+bool
+StoreEntry::keepInLocalMemory() const
+{
+return memoryCachable() && (mem_node::InUseCount() <= store_pages_max);
+}
+
 int
 StoreEntry::checkNegativeHit() const
 {
 if (!EBIT_TEST(flags, ENTRY_NEGCACHED))
 return 0;
 
 if (expires <= squid_curtime)
 return 0;
 
 if (store_status != STORE_OK)
 return 0;
 
 return 1;
 }
 
 /**
  * Set object for 

Re: Generic helper I/O format

2012-07-04 Thread Amos Jeffries

On 5/07/2012 11:10 a.m., Alex Rousskov wrote:

On 07/04/2012 04:52 PM, Amos Jeffries wrote:

On 05.07.2012 08:34, Henrik Nordström wrote:

On Wed, 2012-07-04 at 13:02 -0600, Alex Rousskov wrote:

Alex,
  before starting I went through each of the helper response styles we
have and documented exactly what comes and goes
(http://wiki.squid-cache.org/Features/AddonHelpers) to determine that
none of the response formats started with a token containing [a-z0-9]
then '='. We are *not* starting from a random-input state where blob
can overlap with key.

I thought you wanted a general format. I agree that current bodies will
not clash with current keys.


General as in shared by all helpers yes. Preferably where we can add 
fields at will in future without having to code around positioning of 
specific details and special cases.


The blob only exists in this discussion for two reasons: the old helpers'
backward compatibility requires it, and you wanted to discuss a body
field for the responses. Even without properly understanding the specifics
of why you wanted to discuss it, I don't think it's a good idea since we
can key-pair and encode anything new.


If we can agree on making it key-pair across the board for all the 
details which might be passed back we can move on.
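For concreteness, the key-pair grammar under discussion (an optional channel-ID, a result code, then pairs whose key starts with an alphanumeric character and whose value is either a backslash-escaped quoted string or a bare token) could be parsed along these lines. This is only an illustrative Python sketch of the proposal, not Squid's actual parser, and the grammar was still being finalized at the time:

```python
import re

# key: alphanumeric-ish start, then anything except whitespace or '='
# value: quoted-string (backslash escapes any octet) or bare token
PAIR_RE = re.compile(r'(\w[^\s=]*)=("(?:\\.|[^"\\])*"|\S+)')

def parse_helper_response(line):
    """Parse '[channel-ID] result [key-pairs]' into its parts."""
    parts = line.strip().split(None, 1)
    channel = None
    if parts and parts[0].isdigit():          # optional channel-ID
        channel = int(parts[0])
        line = parts[1] if len(parts) > 1 else ""
    parts = line.split(None, 1)
    result = parts[0] if parts else ""        # OK / ERR / BH
    rest = parts[1] if len(parts) > 1 else ""
    pairs = {}
    for key, value in PAIR_RE.findall(rest):
        if value.startswith('"'):
            # strip quotes and undo backslash escaping
            value = re.sub(r'\\(.)', r'\1', value[1:-1])
        pairs[key] = value
    return channel, result, pairs
```

A response such as `0 OK user=bob message="hello \"world\""` then yields channel `0`, result `OK`, and the two decoded pairs, while a bare `ERR` parses with no channel and no pairs.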


Amos



Re: Generic helper I/O format

2012-07-04 Thread Robert Collins
On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries <squ...@treenet.co.nz> wrote:

Why do we need backwards compat in the new protocol?

As an alternative, consider setting a protocol= option on the helpers,
making the default our latest-and-greatest, and folk running
third-party helpers can set protocol=v1 or whatever to get backwards
compat.

This lets us also warn when we start deprecating the old protocol,
that it's going away.

-Rob


Re: [PATCH] Do not release entries that should be kept in local memory cache.

2012-07-04 Thread Alex Rousskov
On 07/04/2012 07:31 PM, Dmitry Kurochkin wrote:

 Since r11969, Squid calls trimMemory() for all entries, including
 non-swappable, to release unused MemObjects memory.  But it should not
 release objects that are or should be stored in local memory cache.

Agreed, at least for now. I suspect the "We must not let go any data for
IN_MEMORY objects" rule no longer applies because the trimMemory() method is
not called for IN_MEMORY objects any more, but I doubt we should
investigate that right now, even if we can relax or simplify that
condition under some circumstances.


 The patch adds StoreEntry::keepInLocalMemory() method to check if an
 entry should be stored in local memory cache.  If it is true, do not
 release entry in trimMemory().

That does not feel right, on several levels:

A1) A store entry itself cannot tell whether it will be eventually kept
in some cache. That [sometimes complex] decision is the responsibility
of the cache, not the entry. The moved decision code for the non-shared
memory cache is now misplaced.

This problem is easy to fix. Just have StoreEntry::trimMemory() ask the
new boolean StoreController::keepInLocalMemory(e) whether the Store
wants to keep the given [busy or idle] entry whole. If not, the trimming
code can proceed. The old StoreController::handleIdleEntry() will call
the same method when memStore is nil.

However, perhaps we should decide how to handle (A2) and (B2) below
before we fix the minor design bugs of (A1) and (B1).


A2) Will your changes have much effect in non-SMP mode after the memory
cache is completely filled? The (mem_node::InUseCount() <=
store_pages_max) condition will be very often false when that happens,
right? Thus, many/most entries will continue to get trimmed as they are
getting trimmed now. Do you agree?


B1) These changes appear to ignore the shared memory cache which has a
different decision logic when it comes to keeping an entry in the cache.
See MemStore::considerKeeping().

To fix this, we can copy (or move?) some of the checks from
MemStore::considerKeeping() to the new MemStore::keepInLocalMemory(e)
that StoreController::keepInLocalMemory(e) will call, similar to how
current StoreController::handleIdleEntry() is organized. Which checks to
copy or move? I am thinking about the ones that do not depend on whether
the entry is fully loaded, but others might make sense too.


B2) Should there be more trimming checks, in addition to the ones in
MemStore::considerKeeping()? I think so. When we start reading the entry
from the server it is "in transit". It is not in the memory cache yet.
We cannot keep all cachable in transit entries in RAM. There has to be
a limit or we may run out of RAM. However, I do not see where we can get
that limit.

Should the shared memory cache keep a [shared] total size of these
in-transit cachable entries and start trimming them if they are not
going to fit? This will create a kind of reservation system for
in-transit cachable entries. However, the entry size may be
changing/growing so we do not know how much to reserve in general. We
can cheat today because the current shared memory cache uses fixed-size
slots (but we will have to solve this problem later).


How did the old (before SMP changes) code solve that problem? What
prevented it from creating too-many in-transit cachable entries, keeping
them all in RAM, and eventually running out of RAM if the conditions are
right? Nothing??

If that is the case, then I suggest that we use the same logic and
prevent trimming of cachable in-transit objects regardless of the
current RAM usage state. The keepInLocalMemory(e) methods will ignore
the current memory cache size and capacity then, solving (A2) and (B2):
If it might be cached later in RAM, it should not be trimmed now, even
if the memory cache is full.

If we do that, we should add some source code comments to document this
understanding/assumption though.


Thank you,

Alex.


Re: Generic helper I/O format

2012-07-04 Thread Alex Rousskov
On 07/04/2012 10:00 PM, Amos Jeffries wrote:

 The blob only exists in this discussion for two reasons: the old helpers'
 backward compatibility requires it, and you wanted to discuss a body
 field for the responses. Even without properly understanding the specifics
 of why you wanted to discuss it, I don't think it's a good idea since we
 can key-pair and encode anything new.

But if blob is required for backward compatibility, why not use it for
new stuff as well, especially since it will reduce encoding overheads?
Seems like a good solution to me. And since specific helper replies can
tell the generic parser when to expect the blob, we do not have to argue
about the BS part; that part is not needed.


 If we can agree on making it key-pair across the board for all the
 details which might be passed back we can move on.

But we cannot use key-pair across the board because, as you said, the
old helpers' backward compatibility requires [the blob].

I am OK with going key-pair across the board if no other helper (old or
new) is going to use blobs. Otherwise, the [new] helpers that can
benefit from blobs should be able to use them. Why waste a good feature
if it is required for other reasons anyway?


Thank you,

Alex.


Re: Squid-3.2 status update

2012-07-04 Thread Alex Rousskov
On 07/04/2012 05:34 PM, Amos Jeffries wrote:
 3124 - Cache manager stops responding when multiple workers used
   ** requires implementing non-blocking IPC packets between workers and
 coordinator.

 Has this been discussed somewhere? IPC communication is already
 non-blocking so I suspect some other issue is at play here. The specific
 examples of mgr commands in the bug report (userhash, sourcehash,
 client_list, and netdb) seem like non-essential in most environments
 and, hence, not justifying the major designation, but perhaps they
 indicate some major implementation problem that must be fixed.


 UNIX sockets apparently guarantee the write() is blocked until recipient
 process has read() the packet.

That is not true in general. I just wrote a basic UDS client and server
to test this (attached), and I can send packets much faster than the
server reads them. Linux keeps a queue of messages. The I/O may become
blocking if the queue is full, but I suspect select(2) or equivalent
will not let us send a new message under that condition (or the send
will fail rather than block).

There is a sysctl option (net.unix.max_dgram_qlen in recent kernels)
that controls the number of messages that can be queued between the
client and server.

It is possible that UDS sockets behave differently in some environments
that I have not tested, but I doubt it.

Why do you think that UNIX sockets block write() until recipient has
read() the packet?
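Alex's attached Perl scripts are not reproduced here, but the behaviour he describes is easy to demonstrate. The sketch below (Python, using an assumed temporary socket path) shows an AF_UNIX SOCK_DGRAM client sending several datagrams before the server ever calls recv(): the kernel queues them rather than blocking the writer until each one is read.

```python
import os
import socket
import tempfile

# Demonstration of the queueing behaviour described above: a UDS
# datagram sender does not block until the receiver read()s each
# packet; the kernel queues datagrams (up to net.unix.max_dgram_qlen).
path = os.path.join(tempfile.mkdtemp(), "uds")

server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
server.bind(path)

client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
client.connect(path)

# Send five datagrams while the server is not reading at all.
for i in range(5):
    client.send(b"msg %d" % i)  # returns immediately; message is queued

# Only now does the server drain the queue; nothing was lost.
received = [server.recv(64) for _ in range(5)]

client.close()
server.close()
os.unlink(path)
```

Only once the per-socket queue fills does send() block (or fail, for non-blocking sockets), which matches the client-side output below where blocking starts around message #12.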


 Last I
 looked the coordinator handling function also called component handler
 functions synchronously for them to create the response IPC packet.

Ipc::Coordinator::handleCacheMgrRequest() starts an async job to satisfy
the received cache manager request.

There are some Ipc::Coordinator::handle*() methods that create the final
response synchronously, but they should all be very fast and not worth
creating an async job.

Are you talking about some other coordinator handling functions that
block for a long time?


 AFAIK this is waiting on the Subscription and generic (immediate-ACK)
 IPC packets, which will free up the coordinator and workers for other
 async operations even if a large process is underway.

IIRC, subscription was needed to resolve IPC linking problems. It is
possible that it is needed for this bug as well, but since I cannot tell
what this bug is, I do not know whether subscription is the solution. I
thought you knew because of your "requires implementing non-blocking IPC
packets" solution summary. That is why I started asking questions...

Alex.
P.S. Output of the attached UDS server that sleeps to be slower than the
client. All sent messages are received, some after the client is gone:

 $ ./uds-server.pl /tmp/uds
 1341466849 waiting for messages
 1341466853 got msg #01 after 4.00 seconds ... sleeping for 3.00 seconds
 1341466856 got msg #02 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466859 got msg #03 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466862 got msg #04 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466865 got msg #05 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466868 got msg #06 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466871 got msg #07 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466874 got msg #08 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466877 got msg #09 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466880 got msg #10 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466883 got msg #11 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466886 got msg #12 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466889 got msg #13 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466892 got msg #14 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466895 got msg #15 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466898 got msg #16 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466901 got msg #17 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466904 got msg #18 after 3.00 seconds ... sleeping for 3.00 seconds
 1341466907 got msg #19 after 3.00 seconds ... sleeping for 3.00 seconds


UDS client that sends as fast as it can. Note the blocking after the
queue gets full around msg #12 (we are using blocking I/O here):

 $ ./uds-client.pl /tmp/uds
 1341466853 sending with max queue length of 10 messages
 1341466853 sent msg #01 after 0.00 seconds
 1341466853 sent msg #02 after 0.00 seconds
 1341466853 sent msg #03 after 0.00 seconds
 1341466853 sent msg #04 after 0.00 seconds
 1341466853 sent msg #05 after 0.00 seconds
 1341466853 sent msg #06 after 0.00 seconds
 1341466853 sent msg #07 after 0.00 seconds
 1341466853 sent msg #08 after 0.00 seconds
 1341466853 sent msg #09 after 0.00 seconds
 1341466853 sent msg #10 after 0.00 seconds
 1341466853 sent msg #11 after 0.00 seconds
 1341466853 sent msg #12 after 0.00 seconds
 1341466856 sent msg #13 after 3.00 seconds
 1341466859 sent msg #14 after 3.00 seconds
 1341466862 sent msg #15 after