Re: Generic helper I/O format
On 4/07/2012 4:54 p.m., Alex Rousskov wrote:
> On 07/03/2012 08:10 PM, Amos Jeffries wrote:
>> [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator
>
> How will the generic code be able to tell where key-pairs end and blob
> begins?

The generic code will have a format for key-pair of
( 1#token = 1#(token / quoted-string) ) [to be finalized still] with a
set of known key names. If the bytes being parsed do not match that
format and those keys, they are not part of the key-pairs.

> There are two problems with this approach, IMO:
>
> * It cannot handle a body that just happened to start with supported
>   key-value pair characters.

This is shared with any body which just happens to start with the BS
token as well. This is a good reason to use key-pairs across the board
and not document any blob support.

> * It cannot detect a case of a helper sending unsupported key-value
>   pairs (if that helper may send a body too).

Yes.

> It also requires extra code/work to pre-register all known key names,
> but that is minor.

This is not an issue. We have to code operations for handling any
accepted key-pair in the handler callbacks anyway.

The plan for patch #2 is to discuss key-pair syntax (which we seem to
have started) and move some particular key-pair parsing from helper
handlers into the shared one. Opening up the user=, tag=, password=,
log=, and message= keys to non-ACL helper responses will be VERY useful
for RADIUS and systems needing to maintain cross-helper request state.

> I think we should keep it as simple as possible without causing
> problems with existing parsers and leaving some room for future
> extensions:
>
> * key/name: alphanumeric char followed by anything except (whitespace
>   or '=' or terminator)
>
> * value: either quoted-string or anything except (whitespace or
>   terminator); could be empty
>
> The parser strips whitespace and strips quotes around quoted strings.
> Quoted strings use backslash to escape any octet.

Okay. So modelling on the ssl_crtd parser instead of the
external_acl_type parser.
> If you want one format for all, you probably need something like
>
>   [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator
>
> where BS is some special marker that starts all bodies and cannot be
> found in key-pairs. Since any body-carrying message is likely to have
> new lines (and, hence, would need a non-newline terminator), this BS
> sequence could be something like "body:\n", I guess.

We REQUIRE a format that does not rely on new special markers like that
in order to cope with the existing helper responses in a
backward-compatible way without additional user configuration.

> When your parser sees n=v after ERR and n is a known key name, how
> would it distinguish these two possibilities:
>
> * This is a start of a blob
> * This is a key=value pair

The second, *IF* it was in the [key-pair] area directly after ERR. Once
the blob starts, everything is blob.

Or would you advocate moving the external_acl_type protocol=N.N config
option into the helper child object for use by all helpers in
determining what key-pairs and format are accepted?

> I am probably missing some important use cases, but my generic parser
> would just call a virtual method for each key=value pair found (with no
> pre-registration of supported keys) and another virtual method for the
> body blob, if any. It will parse everything using the key, value, and
> BS formats discussed above.

You mean function pointers, right? Not virtual methods in the
HelperReply object? And um, this is a high-use code path - both in
parsing and in retrieving the details out of HelperReply.

That style of generic parser is good for new formats but not so much for
backward compatibility with old formats. I am going to have to manually
receive things like the TT/LD/AF/NA NTLM result codes and map the
existing format into the record of the HelperReply object.

Amos
Re: Generic helper I/O format
On 07/04/2012 12:02 AM, Amos Jeffries wrote:
> On 4/07/2012 4:54 p.m., Alex Rousskov wrote:
>> On 07/03/2012 08:10 PM, Amos Jeffries wrote:
>>> [channel-ID] (OK/ERR/BH) [key-pairs] blob terminator
>>
>> How will the generic code be able to tell where key-pairs end and
>> blob begins?
>
> The generic code will have a format for key-pair of
> ( 1#token = 1#(token / quoted-string) ) [to be finalized still] with a
> set of known key names. If the bytes being parsed do not match that
> format and keys it is not part of the key-pairs.
>
>> There are two problems with this approach, IMO:
>>
>> * It cannot handle a body that just happened to start with supported
>>   key-value pair characters.
>
> This is shared with any body which just happens to start with the BS
> token as well.

No, it is not. BS is required if the body is present and BS is not a
valid key name. Thus, BS cannot be confused with a start of a key-value
pair _and_ if a body starts with BS as well, there is no problem because
we already know to expect a body there (after BS).

[ but I am glad you acknowledge this issue with the BS-less approach ]

>> I think we should keep it as simple as possible without causing
>> problems with existing parsers and leaving some room for future
>> extensions:
>>
>> * key/name: alphanumeric char followed by anything except (whitespace
>>   or '=' or terminator)
>>
>> * value: either quoted-string or anything except (whitespace or
>>   terminator); could be empty
>>
>> The parser strips whitespace and strips quotes around quoted strings.
>> Quoted strings use backslash to escape any octet.
>
> Okay. So modelling on the ssl_crtd parser instead of external_acl_type
> parser.

I am trying to model this on the above "as simple as ..." criteria. I do
not really remember the specifics of any particular parser we have (so
adjustments may be necessary).

>> If you want one format for all, you probably need something like
>>
>>   [channel-ID] (OK/ERR/BH) [key-pairs] [BS body] terminator
>>
>> where BS is some special marker that starts all bodies and cannot be
>> found in key-pairs.
>> Since any body-carrying message is likely to have new lines (and,
>> hence, would need a non-newline terminator), this BS sequence could
>> be something like "body:\n", I guess.
>
> We REQUIRE a format that does not rely on new special markers like
> that in order to cope with the existing helper responses in a
> backward-compatible way without additional user configuration.

>> When your parser sees n=v after ERR and n is a known key name, how
>> would it distinguish these two possibilities:
>>
>> * This is a start of a blob
>> * This is a key=value pair
>
> The second. *IF* it was in the [key-pair] area directly after ERR.

And how do you decide that IF? It is exactly the same question I asked,
just phrased differently. The BS-less approach does not work (as you
have already acknowledged in the beginning of this email): the end of
the [key-pair] area is indistinguishable from the beginning of a blob in
this case. In other words, you do not know whether the blob has started
or not. That is exactly the problem BS is solving (because it is not a
valid key).

> Or would you advocate moving the external_acl_type protocol=N.N config
> option into the helper child object for use by all helpers on
> determining what key-pairs and format is accepted?
>
>> I am probably missing some important use cases, but my generic parser
>> would just call a virtual method for each key=value pair found (with
>> no pre-registration of supported keys) and another virtual method for
>> the body blob, if any. It will parse everything using the key, value,
>> and BS formats discussed above.
>
> You mean function pointers right? not virtual methods in the
> HelperReply object?

I did mean virtual methods. The child parsers will form HelperReply's
kid objects by converting key-value pairs and other parsed strings into
helper-specific replies. Those parsers may be integrated with the
HelperReply hierarchy or be separate from it. If they are integrated, it
would look like this:

  class HelperReply {
      ...
      virtual void setResult(string);
      virtual void setKeyValue(string, string);
      virtual void setBody(MemBuf);
  };

  ...

  // a sketch of a child implementation of setKeyValue()
  Ssl::CrtdReply::setKeyValue(string k, string v) {
      if (k == foo)
          this->foo_ = StringToInt(v);
      else
          HelperReply::parseKeyValue(k, v);
  }

> and um, this is a high-use code path - both in parsing and in
> retrieving the details out of HelperReply.

Agreed. I cannot think of a significantly more efficient but still
general approach that results in helper-specific ready-to-use replies.

> That style of generic parser is good for new formats but not so much
> for backward-compatibility from old formats. I am going to have to
> manually receive things like the TT/LD/AF/NA NTLM result codes and map
> the existing format into the record of the HelperReply object.

Why not have NtmlHelperReply::setResult() do the mapping? Is there
something format-incompatible with those NTLM result codes that the
generic parser cannot parse them using the following format?

  [channel-ID] result [key-pairs] [BS blob] terminator
Re: Generic helper I/O format
On Wed, 2012-07-04 at 13:02 -0600, Alex Rousskov wrote:
> No, it is not. BS is required if the body is present and BS is not a
> valid key name. Thus, BS cannot be confused with a start of a key-value
> pair _and_ if a body starts with BS as well, there is no problem
> because we already know to expect a body there (after BS).

Why do we need a blob marker? Why not simply use key=value pairs across
the board?

> Why not have NtmlHelperReply::setResult() do the mapping? Is there
> something format-incompatible with those NTLM result codes that the
> generic parser cannot parse them using the following format?
>
>   [channel-ID] result [key-pairs] [BS blob] terminator

NTLM does not need a blob. It's just a value. Moving to key=value is
fine.

Additionally worth noting is that '=' is not a valid character in NTLM
blobs, so the same parser can be used if tokens without '=' are
supported (keyless values).

Regards
Henrik
Re: Squid-3.2 status update
On 06/27/2012 03:12 AM, Amos Jeffries wrote:
> A quick review of the other major bugs shows that each will take some
> large design and code changes to implement a proper fix or even a
> workaround. Are there any objections to ignoring these bugs when
> considering a 3.2 stable release:

Our definition of a stable release has two criteria:

  1. Meant for production caches.
  2. begin when all known major bugs have been fixed [for 14 days].

Criterion #1 should probably be interpreted as "Squid Project considers
the version suitable for production deployment". If you think we are
there, I have no objections -- I do not have enough information to say
whether enough users will be satisfied with current v3.2 code in
production today. Perhaps this is something we should ask on squid-users
after we close all the bugs that we think should be closed?

As for Criterion #2, your question means that either we stop considering
those bugs as major OR we change criterion #2. IMHO, we should adjust
that criterion so that we do not have to play these games where we mark
something as a major bug but then decide that, in the interest of a
speedier "stable" designation, we are going to ignore it. An adjusted
criterion could be phrased as:

  2'. begin when #1 is satisfied for at least 14 days

This gives us enough flexibility to release [what we consider
suitable-for-production] code that might have major bugs in some
environments. I added "at least" because otherwise we may have to
release v3.3 as stable 14 days after v3.2 is marked stable :-). In
practice, the version should have enough improvements to warrant its
numbering and its release, but I do not want to digress into that
discussion.

> 3124 - Cache manager stops responding when multiple workers used
> ** requires implementing non-blocking IPC packets between workers and
>    coordinator.

Has this been discussed somewhere? IPC communication is already
non-blocking, so I suspect some other issue is at play here.
The specific examples of mgr commands in the bug report (userhash,
sourcehash, client_list, and netdb) seem non-essential in most
environments and, hence, do not justify the "major" designation, but
perhaps they indicate some major implementation problem that must be
fixed.

> 3389 - Auto-reconnect for tcp access_log
> ** requires asynchronous handling of log opening and blocking Squid
>    operation

Since we have stable file-based logging, this bug does not have to block
a stable designation if TCP logging is declared experimental. You
already have a patch that addresses 90% of the core problem for those
who care. If you do not want to mark TCP logging as experimental and
highlight this shortcoming, then the bug ought to be fixed IMHO because
there is consensus that accurate logging is critical for many
deployments.

> 3478 - Host verify catching dynamic CDN hosted sites
> ** requires designing a CONNECT and bump handling mechanism

I am not an expert on this, but it feels like we are trying to enforce a
[good] rule ignored by the [bad] real world, especially in interception
environments. As a result, Squid lies and scares admins for no good
reason (in most cases). We will not win this battle. I suggest that the
"host_verify_strict off" behavior is adjusted to cause no harm, even if
some malicious requests will get through. If you do not want to do that,
please add a [fast] ACL so that admins are not stuck without a solution
and can whitelist bad (or all) sites.

That said, the bug report itself does not explicitly say that something
is _seriously_ broken, does it? I bet the cache.log messages are
excessive on any busy site with a diverse user population, but we can
rate-limit these messages and downgrade the severity of the bug while
waiting for a real use case where these new checks break things (despite
host_verify_strict being off).

> 3517 - Workers ldap digest
> ** requires SMP atomic access support for all user credentials

This is not a blocker IMO.
SMP has several known limitations, complex authentication schemes being
one of them. This does not affect the stability of supported SMP
configurations.

> Which would leave us with only these to locate (any takers?):
>
> 3551 - store_rebuild.cc:116: "store_errors == 0" assertion

It would be nice to figure this one out, at least for ufs, because many
folks will try ufs with SMP and there is clearly some kind of corruption
problem there. I assigned the bug to self for now. However, if I cannot
reproduce it, I will not be able to make much progress. Please note that
the original reporter moved on to rock store and does not consider this
bug to be affecting him any more (per comment #10).

> 3556 - assertion failed: comm.cc:1093: "isOpen(fd)"

I recommend adding a guard for the comm_close() call in the Connection
destructor to avoid the call for !isOpen(fd) orphan connections. And
print the value of isOpen() in the BUG message.

> 3562 - StoreEntry::kickProducer Segmentation fault

I suspect Squid is corrupting
Re: Generic helper I/O format
On 07/04/2012 02:34 PM, Henrik Nordström wrote:
> On Wed, 2012-07-04 at 13:02 -0600, Alex Rousskov wrote:
>
> Why not simply use key=value pairs across the board?

We need blobs because some values in use today are not expressed as
tokens or quoted strings (the two value formats Amos is considering). We
could, of course, _encode_ those inconvenient values so that things like
SSL certificates can be represented as huge quoted strings. Is that what
you are getting at? I am not against that, even though I am a little
worried about the added encoding overhead. This will require changing
the ssl_crtd helper code, but I do not think that would be a big
problem.

>> Why not have NtmlHelperReply::setResult() do the mapping? Is there
>> something format-incompatible with those NTLM result codes that the
>> generic parser cannot parse them using the following format?
>>
>>   [channel-ID] result [key-pairs] [BS blob] terminator
>
> NTLM do not need a blob.

We are talking about a general format, not an NTLM-specific format. The
blob is optional. If the NTLM helper does not need it, it will not use
it.

> It's just a value. Moving to key=value is fine. Additionally worth
> noting is that = is not a valid character in NTLM blobs so same parser
> can be used if tokens without = is supported (keyless values).

Yes, anonymous values or valueless names can be allowed if it helps
existing helpers. Can somebody paste a typical NTLM helper response or
two?

Thank you,

Alex.
Re: Generic helper I/O format
On 05.07.2012 08:34, Henrik Nordström wrote:
> On Wed, 2012-07-04 at 13:02 -0600, Alex Rousskov wrote:
>> No, it is not. BS is required if the body is present and BS is not a
>> valid key name. Thus, BS cannot be confused with a start of a
>> key-value pair _and_ if a body starts with BS as well, there is no
>> problem because we already know to expect a body there (after BS).
>
> Why do we need a blob marker? Why not simply use key=value pairs
> across the board?

We need it as a temporary measure to handle old helper responses. I
agree on the desire to move to key-pairs across the board and am trying
to reach it out of this forest of individual fixed-syntax parsers. I
think it is possible (easy even) to do it using a blob instead of new
config settings.

>> Why not have NtmlHelperReply::setResult() do the mapping? Is there
>> something format-incompatible with those NTLM result codes that the
>> generic parser cannot parse them using the following format?
>>
>>   [channel-ID] result [key-pairs] [BS blob] terminator
>
> NTLM do not need a blob. It's just a value. Moving to key=value is
> fine. Additionally worth noting is that = is not a valid character in
> NTLM blobs so same parser can be used if tokens without = is supported
> (keyless values).

Good point here. Same stands for all the other helpers.

Alex, before starting I went through each of the helper response styles
we have and documented exactly what comes and goes
(http://wiki.squid-cache.org/Features/AddonHelpers) to determine that
none of the response formats started with a token containing [a-z0-9]
then '='. We are *not* starting from a random-input state where blob can
overlap with key.

Anything which does not start with a registered result status code is
blob, automatically. The ones which do use a result code are safely
mapped to the key-pair format based on the presence of a valid key=
token, or its absence indicating blob to be handled by the old parser.
Have a look through that wiki page at the set of "Result" lines sent
back to Squid and it's clear.
The proposed shared format is *exactly* what external ACL already
processes. It also, if you look at the code, has the proposed background
blob for backward compatibility with some squid-2.5 era helpers which
only sent a reason field in plain text (random input! yuck; as an
established PoC it seems not to have had any complaints).

Amos
Re: Generic helper I/O format
On 05.07.2012 10:15, Alex Rousskov wrote:
> On 07/04/2012 02:34 PM, Henrik Nordström wrote:
>> Why not simply use key=value pairs across the board?
>
> We need blobs because some values in use today are not expressed as
> tokens or quoted strings (the two value formats Amos is considering).
> We could, of course, _encode_ those inconvenient values so that things
> like SSL certificates can be represented as huge quoted strings. Is
> that what you are getting at? I am not against that, even though I am
> a little worried about the added encoding overhead. This will require
> changing ssl_crtd helper code, but I do not think that would be a big
> problem.
>
>>> Why not have NtmlHelperReply::setResult() do the mapping? Is there
>>> something format-incompatible with those NTLM result codes that the
>>> generic parser cannot parse them using the following format?
>>>
>>>   [channel-ID] result [key-pairs] [BS blob] terminator
>>
>> NTLM do not need a blob.
>
> We are talking about a general format, not NTLM-specific format. Blob
> is optional. If NTLM helper does not need it, it will not use it.
>
>> It's just a value. Moving to key=value is fine. Additionally worth
>> noting is that = is not a valid character in NTLM blobs so same
>> parser can be used if tokens without = is supported (keyless values).
>
> Yes, anonymous values or valueless names can be allowed if it helps
> existing helpers. Can somebody paste a typical NTLM helper response or
> two?

http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme

If you note, the reason field is usually something other than OK/ERR, so
we can parse the old syntax differently to the new OK/ERR responses.
Where there is an overlap (BH), the reason field is an error message:
usually the program name and a ':' colon (not valid in the proposed
key-name), or a number in brackets (again not valid in a key name), or a
plain-text message (no '=' expected in human-readable words).

Amos
Re: Generic helper I/O format
On 07/04/2012 04:52 PM, Amos Jeffries wrote:
> On 05.07.2012 08:34, Henrik Nordström wrote:
>> On Wed, 2012-07-04 at 13:02 -0600, Alex Rousskov wrote:
>
> Alex, before starting I went through each of the helper response
> styles we have and documented exactly what comes and goes
> (http://wiki.squid-cache.org/Features/AddonHelpers) to determine that
> none of the response formats started with a token containing [a-z0-9]
> then '='. We are *not* starting from a random-input state where blob
> can overlap with key.

I thought you wanted a general format. I agree that current bodies will
not clash with current keys.

Alex.

> Anything which does not start with a registered result status code is
> blob, automatically. The ones which do use a result code are safely
> mapped to the key-pair format based on the presence of a valid key=
> token, or its absence indicating blob to be handled by the old parser.
> Have a look through that wiki page at the set of "Result" lines sent
> back to Squid and it's clear.
>
> The proposed shared format is *exactly* what external ACL already
> processes. It also, if you look at the code, has the proposed
> background blob for backward compatibility with some squid-2.5 era
> helpers which only sent a reason field in plain text (random input!
> yuck; as an established PoC it seems not to have had any complaints).
>
> Amos
Re: Generic helper I/O format
On 07/04/2012 05:07 PM, Amos Jeffries wrote:
> http://wiki.squid-cache.org/Features/AddonHelpers#Negotiate_and_NTLM_Scheme
>
> If you note, the reason field is usually something other than OK/ERR,
> so we can parse the old syntax differently to the new OK/ERR responses.
> Where there is an overlap (BH) the reason field is an error message,
> usually the program name and a ':' colon (not valid in the proposed
> key-name) or number in brackets (again not valid in key name) or plain
> text message (no '=' expected on human-readable words).

You can let the setResult() and setKeyValue() methods in HelperReply
children indicate that what follows is a blob:

  NtlmHelperReply::setResult(string result) {
      if (result == BH) {
          ... remember that the result was BH ...
          // no more key=value pairs; read the entire reason
          expectBody(true);
      }
      ...
  }

This is not something you can easily express in a static format spec,
but it will work nicely to avoid custom code when handling reason
phrases in BH responses or ssl_crtd certificates. After expectBody() is
called, the general parser will get everything up to the terminator and
call setBody().

HTH,

Alex.
Re: Couple pointers
On 07/03/2012 11:11 PM, Amos Jeffries wrote:
> On 4/07/2012 6:04 a.m., Alex Rousskov wrote:
>> On 07/02/2012 05:42 PM, Amos Jeffries wrote:
>>> ... is anyone able to come up with a script or tester to detect
>>> classes which violate our Big-3 constructor/destructor/assignment
>>> operator guideline? I know there are portions of the code which are
>>> broken today and it is easily overlooked; this would be a nice one
>>> to enforce automatically.
>>
>> Hi Amos,
>>
>> Yes, Measurement Factory has purchased a Coverity static analysis
>> license to automatically detect bugs like the ones you describe. I
>> have run one Squid test during evaluation, and the tool did detect
>> missing constructor/destructor/assignment operators (among hundreds
>> of other bugs and non-bugs). We are waiting for the Factory sysadmin
>> to set the tool up and for the Squid Project sysadmin to set up
>> [automated] Squid tests, the results of which can then be somehow
>> shared with Squid developers (we did get Coverity permission to do
>> that). I hope this will be done in July.
>
> Sweet. I wondered what happened with Coverity. We had an FOSS account
> there years ago, but after the ring level changes they stopped
> contacting us and even scanning recent sources; they kept scanning
> daily but only on old sources.

Last time I asked, somebody (either you or Henrik IIRC) told me that
trying to work with their free scan project was hopeless. Having control
over the tool will give us a lot more flexibility, of course. It remains
to be seen whether the bugs found by these scans will justify the
expense, but we need to get the tool installed and running first.

Cheers,

Alex.
Re: Squid-3.2 status update
On 05.07.2012 10:00, Alex Rousskov wrote:
> On 06/27/2012 03:12 AM, Amos Jeffries wrote:
>> A quick review of the other major bugs shows that each will take some
>> large design and code changes to implement a proper fix or even a
>> workaround. Are there any objections to ignoring these bugs when
>> considering a 3.2 stable release:
>
> Our definition of a stable release has two criteria:
>
>   1. Meant for production caches.
>   2. begin when all known major bugs have been fixed [for 14 days].
>
> Criterion #1 should probably be interpreted as "Squid Project considers
> the version suitable for production deployment". If you think we are
> there, I have no objections -- I do not have enough information to say
> whether enough users will be satisfied with current v3.2 code in
> production today. Perhaps this is something we should ask on
> squid-users after we close all the bugs that we think should be
> closed?
>
> As for Criterion #2, your question means that either we stop
> considering those bugs as major OR we change criterion #2. IMHO, we
> should adjust that criterion so that we do not have to play these
> games where we mark something as a major bug but then decide that, in
> the interest of a speedier "stable" designation, we are going to
> ignore it. An adjusted criterion could be phrased as:
>
>   2'. begin when #1 is satisfied for at least 14 days
>
> This gives us enough flexibility to release [what we consider
> suitable-for-production] code that might have major bugs in some
> environments. I added "at least" because otherwise we may have to
> release v3.3 as stable 14 days after v3.2 is marked stable :-). In
> practice, the version should have enough improvements to warrant its
> numbering and its release, but I do not want to digress into that
> discussion.
>
>> 3124 - Cache manager stops responding when multiple workers used
>> ** requires implementing non-blocking IPC packets between workers and
>>    coordinator.
>
> Has this been discussed somewhere? IPC communication is already
> non-blocking, so I suspect some other issue is at play here.
> The specific examples of mgr commands in the bug report (userhash,
> sourcehash, client_list, and netdb) seem non-essential in most
> environments and, hence, do not justify the "major" designation, but
> perhaps they indicate some major implementation problem that must be
> fixed.

UNIX sockets apparently guarantee that the write() is blocked until the
recipient process has read() the packet. Meaning each IPC packet is
blocked behind whatever longer AsyncCall or delay the recipient has
going on. Last I looked, the coordinator handling function also called
component handler functions synchronously for them to create the
response IPC packet.

AFAIK this is waiting on the Subscription and generic (immediate-ACK)
IPC packets, which will free up the coordinator and workers for other
async operations even if a large process is underway.

>> 3389 - Auto-reconnect for tcp access_log
>> ** requires asynchronous handling of log opening and blocking Squid
>>    operation
>
> Since we have stable file-based logging, this bug does not have to
> block a stable designation if TCP logging is declared experimental.
> You already have a patch that addresses 90% of the core problem for
> those who care. If you do not want to mark TCP logging as experimental
> and highlight this shortcoming, then the bug ought to be fixed IMHO
> because there is consensus that accurate logging is critical for many
> deployments.

>> 3478 - Host verify catching dynamic CDN hosted sites
>> ** requires designing a CONNECT and bump handling mechanism
>
> I am not an expert on this, but it feels like we are trying to enforce
> a [good] rule ignored by the [bad] real world, especially in
> interception environments. As a result, Squid lies and scares admins
> for no good reason (in most cases). We will not win this battle. I
> suggest that the "host_verify_strict off" behavior is adjusted to
> cause no harm, even if some malicious requests will get through.

It does that now.
The "no harm" means we can't re-write the request headers to something
we are not sure about and which would actively cause problems if we got
it wrong. The current state is that Squid goes DIRECT instead of through
peers, breaking interception+cluster setups. I can open that up again,
but it will mean updating the CVE to indicate that 2nd-stage proxies are
still vulnerable.

> If you do not want to do that, please add a [fast] ACL so that admins
> are not stuck without a solution and can whitelist bad (or all) sites.
>
> That said, the bug report itself does not explicitly say that
> something is _seriously_ broken, does it? I bet the cache.log messages
> are excessive on any busy site with a diverse user population, but we
> can rate-limit these messages and downgrade the severity of the bug
> while waiting for a real use case where these new checks break things
> (despite host_verify_strict being off).

cache_peer relay is almost completely disabled for some major sites.
Everything else works well.
Re: Squid-3.2 status update
On 07/04/2012 05:34 PM, Amos Jeffries wrote:
> On 05.07.2012 10:00, Alex Rousskov wrote:
>>> 3478 - Host verify catching dynamic CDN hosted sites
>>> ** requires designing a CONNECT and bump handling mechanism
>>
>> I am not an expert on this, but it feels like we are trying to
>> enforce a [good] rule ignored by the [bad] real world, especially in
>> interception environments. As a result, Squid lies and scares admins
>> for no good reason (in most cases). We will not win this battle. I
>> suggest that the "host_verify_strict off" behavior is adjusted to
>> cause no harm, even if some malicious requests will get through.
>
> It does that now.
>
> The "no harm" means we can't re-write the request headers to something
> we are not sure about and which would actively cause problems if we
> got it wrong. The current state is that Squid goes DIRECT instead of
> through peers, breaking interception+cluster setups.

That last part means "do harm" to those admins who discover nonworking
setups that used to work fine (from their perspective). I understand
that your definition of harm may be different from theirs. This conflict
should be resolved by configuration knobs IMO.

> cache_peer relay is almost completely disabled for some major sites.
> Everything else works well.

Well, we can wait for somebody to complain about that and then decide
what to do, I guess. With some luck, nobody will complain. I certainly
do not insist on treating this issue as a blocker for the v3.2 stable
designation; I am only suggesting ways to close it.

Cheers,

Alex.
[PATCH] Do not release entries that should be kept in local memory cache.
Hi all.

Local memory caching does not work in recent Squid trunk. This patch
attempts to fix that.

Since r11969, Squid calls trimMemory() for all entries, including
non-swappable ones, to release unused MemObjects memory. But it should
not release objects that are, or should be, stored in the local memory
cache. StoreEntry::trimMemory() has a check for IN_MEMORY status for
that, but IN_MEMORY status is set too late, in
StoreController::handleIdleEntry(), after trimMemory() has already
marked the entry for release:

  clientReplyContext::removeStoreReference()
    storeUnregister()
      StoreEntry::swapOut()
        StoreEntry::trimMemory()
          StoreEntry::releaseRequest()
    StoreEntry::unlock()
      StoreController::handleIdleEntry() // sets IN_MEMORY status;
                                         // never get here because entry
                                         // is already marked for release

The patch adds a StoreEntry::keepInLocalMemory() method to check whether
an entry should be stored in the local memory cache. If it returns true,
trimMemory() does not release the entry.

Regards,
Dmitry
=== modified file 'src/Store.h'
--- src/Store.h	2012-01-13 13:49:26 +0000
+++ src/Store.h	2012-07-05 01:23:03 +0000
@@ -101,40 +101,42 @@
 public:
     void makePrivate();
     void setPublicKey();
     void setPrivateKey();
     void expireNow();
     void releaseRequest();
     void negativeCache();
     void cacheNegatively();     /** \todo argh, why both? */
     void invokeHandlers();
     void purgeMem();
     void cacheInMemory(); /// start or continue storing in memory cache
     void swapOut();
     /// whether we are in the process of writing this entry to disk
     bool swappingOut() const { return swap_status == SWAPOUT_WRITING; }
     void swapOutFileClose(int how);
     const char *url() const;
     int checkCachable();
     int checkNegativeHit() const;
     int locked() const;
     int validToSend() const;
     bool memoryCachable() const; /// may be cached in memory
+    /// may be cached in memory and local memory cache is not overflowing
+    bool keepInLocalMemory() const;
     void createMemObject(const char *, const char *);
     void hideMemObject(); /// no mem_obj for callers until createMemObject
     void dump(int debug_lvl) const;
     void hashDelete();
     void hashInsert(const cache_key *);
     void registerAbort(STABH * cb, void *);
     void reset();
     void setMemStatus(mem_status_t);
     void timestampsSet();
     void unregisterAbort();
     void destroyMemObject();
     int checkTooSmall();
     void delayAwareRead(const Comm::ConnectionPointer &conn, char *buf, int len, AsyncCall::Pointer callback);
     void setNoDelay (bool const);
     bool modifiedSince(HttpRequest * request) const;
     /// has ETag matching at least one of the If-Match etags
     bool hasIfMatchEtag(const HttpRequest &request) const;
     /// has ETag matching at least one of the If-None-Match etags

=== modified file 'src/store.cc'
--- src/store.cc	2012-01-13 13:49:26 +0000
+++ src/store.cc	2012-07-05 00:30:45 +0000
@@ -1446,40 +1446,46 @@
 StoreEntry::memoryCachable() const
     if (mem_obj->data_hdr.size() == 0)
         return 0;

     if (mem_obj->inmem_lo != 0)
         return 0;

     if (!Config.onoff.memory_cache_first && swap_status == SWAPOUT_DONE && refcount == 1)
         return 0;

     if (Config.memShared && IamWorkerProcess()) {
         const int64_t expectedSize = mem_obj->expectedReplySize();
         // objects of unknown size are not allowed into memory cache, for now
         if (expectedSize < 0 ||
             expectedSize > static_cast<int64_t>(Config.Store.maxInMemObjSize))
             return 0;
     }

     return 1;
 }

+bool
+StoreEntry::keepInLocalMemory() const
+{
+    return memoryCachable() && (mem_node::InUseCount() <= store_pages_max);
+}
+
 int
 StoreEntry::checkNegativeHit() const
 {
     if (!EBIT_TEST(flags, ENTRY_NEGCACHED))
         return 0;

     if (expires <= squid_curtime)
         return 0;

     if (store_status != STORE_OK)
         return 0;

     return 1;
 }

 /**
  * Set object for
Re: Generic helper I/O format
On 5/07/2012 11:10 a.m., Alex Rousskov wrote: On 07/04/2012 04:52 PM, Amos Jeffries wrote: On 05.07.2012 08:34, Henrik Nordström wrote: ons 2012-07-04 klockan 13:02 -0600 skrev Alex Rousskov: Alex, before starting I went through each of the helper response styles we have and documented exactly what comes and goes (http://wiki.squid-cache.org/Features/AddonHelpers) to determine that none of the response formats started with a token containing [a-z0-9] followed by '='. We are *not* starting from a random-input state where blob can overlap with key. I thought you wanted a general format. I agree that current bodies will not clash with current keys. General as in shared by all helpers, yes. Preferably one where we can add fields at will in future without having to code around the positioning of specific details and special cases. The blob only exists in this discussion for two reasons: the old helpers' backward compatibility requires it, and you wanted to discuss a body field for the responses. Even without properly understanding the specifics of why you wanted to discuss it, I don't think it's a good idea, since we can key-pair and encode anything new. If we can agree on making it key-pair across the board for all the details which might be passed back, we can move on. Amos
Re: Generic helper I/O format
On Thu, Jul 5, 2012 at 4:00 PM, Amos Jeffries squ...@treenet.co.nz wrote: Why do we need backwards compat in the new protocol? As an alternative, consider setting a protocol= option on the helpers, making the default our latest-and-greatest, and folk running third-party helpers can set protocol=v1 or whatever to get backwards compat. This also lets us warn, when we start deprecating the old protocol, that it's going away. -Rob
Re: [PATCH] Do not release entries that should be kept in local memory cache.
On 07/04/2012 07:31 PM, Dmitry Kurochkin wrote: Since r11969, Squid calls trimMemory() for all entries, including non-swappable, to release unused MemObjects memory. But it should not release objects that are or should be stored in local memory cache.

Agreed, at least for now. I suspect the "We must not let go any data for IN_MEMORY objects" comment no longer applies because the trimMemory() method is not called for IN_MEMORY objects any more, but I doubt we should investigate that right now, even if we can relax or simplify that condition under some circumstances.

The patch adds StoreEntry::keepInLocalMemory() method to check if an entry should be stored in local memory cache. If it is true, do not release entry in trimMemory().

That does not feel right, on several levels:

A1) A store entry itself cannot tell whether it will eventually be kept in some cache. That [sometimes complex] decision is the responsibility of the cache, not the entry. The moved decision code for the non-shared memory cache is now misplaced. This problem is easy to fix. Just have StoreEntry::trimMemory() ask the new boolean StoreController::keepInLocalMemory(e) whether the Store wants to keep the given [busy or idle] entry whole. If not, the trimming code can proceed. The old StoreController::handleIdleEntry() will call the same method when memStore is nil. However, perhaps we should decide how to handle (A2) and (B2) below before we fix the minor design bugs of (A1) and (B1).

A2) Will your changes have much effect in non-SMP mode after the memory cache is completely filled? The (mem_node::InUseCount() <= store_pages_max) condition will very often be false when that happens, right? Thus, many/most entries will continue to get trimmed as they are getting trimmed now. Do you agree?

B1) These changes appear to ignore the shared memory cache, which has a different decision logic when it comes to keeping an entry in the cache. See MemStore::considerKeeping(). To fix this, we can copy (or move?) some of the checks from MemStore::considerKeeping() to the new MemStore::keepInLocalMemory(e) that StoreController::keepInLocalMemory(e) will call, similar to how the current StoreController::handleIdleEntry() is organized. Which checks to copy or move? I am thinking about the ones that do not depend on whether the entry is fully loaded, but others might make sense too.

B2) Should there be more trimming checks, in addition to the ones in MemStore::considerKeeping()? I think so. When we start reading the entry from the server, it is in transit. It is not in the memory cache yet. We cannot keep all cachable in-transit entries in RAM. There has to be a limit or we may run out of RAM. However, I do not see where we can get that limit. Should the shared memory cache keep a [shared] total size of these in-transit cachable entries and start trimming them if they are not going to fit? This would create a kind of reservation system for in-transit cachable entries. However, the entry size may be changing/growing, so we do not know how much to reserve in general. We can cheat today because the current shared memory cache uses fixed-size slots (but we will have to solve this problem later).

How did the old (before SMP changes) code solve that problem? What prevented it from creating too many in-transit cachable entries, keeping them all in RAM, and eventually running out of RAM if the conditions are right? Nothing?? If that is the case, then I suggest that we use the same logic and prevent trimming of cachable in-transit objects regardless of the current RAM usage state. The keepInLocalMemory(e) methods would then ignore the current memory cache size and capacity, solving (A2) and (B2): if an entry might be cached later in RAM, it should not be trimmed now, even if the memory cache is full. If we do that, we should add some source code comments to document this understanding/assumption though.

Thank you, Alex.
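The (A1) delegation — the entry asks the store layer whether to keep it whole, instead of deciding itself — can be sketched with a toy model. All class and member names below are illustrative stand-ins for Squid's real StoreEntry/StoreController/MemStore classes, and the predicates are placeholders, not the actual considerKeeping() logic:

```cpp
#include <cassert>

// Toy stand-in for the shared memory cache (MemStore).
struct MemStoreModel {
    // placeholder for the subset of MemStore::considerKeeping() checks
    // that do not depend on the entry being fully loaded
    bool keepInLocalMemory(bool memoryCachable) const {
        return memoryCachable;
    }
};

// Toy stand-in for StoreController: the cache, not the entry, decides.
struct StoreControllerModel {
    const MemStoreModel *memStore = nullptr; // nil when the cache is not shared

    bool keepInLocalMemory(bool memoryCachable, bool localCacheFull) const {
        if (memStore) // shared memory cache has its own decision logic
            return memStore->keepInLocalMemory(memoryCachable);
        return memoryCachable && !localCacheFull; // non-shared cache rule
    }
};

// Toy stand-in for StoreEntry: trimMemory() merely asks the controller.
struct StoreEntryModel {
    bool memoryCachable = true;
    int trimCount = 0;

    void trimMemory(const StoreControllerModel &ctrl, bool localCacheFull) {
        if (ctrl.keepInLocalMemory(memoryCachable, localCacheFull))
            return;      // keep the entry whole
        ++trimCount;     // stand-in for the actual trimming work
    }
};
```

The point of the shape is only that the keep-or-trim policy lives in the controller (and memory cache) while the entry supplies its own properties, matching the proposed StoreController::keepInLocalMemory(e) split.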
Re: Generic helper I/O format
On 07/04/2012 10:00 PM, Amos Jeffries wrote: The blob only exists in this discussion for two reasons; the old helpers backward compatibility requires it, and you wanted to discuss a body field for the responses. Even not understanding properly the specifics of why you wanted to discuss it I don't think it's a good idea since we can key-pair and encode anything new. But if blob is required for backward compatibility, why not use it for new stuff as well, especially since it will reduce encoding overheads? Seems like a good solution to me. And since specific helper replies can tell the generic parser when to expect the blob, we do not have to argue about the BS part; that part is not needed. If we can agree on making it key-pair across the board for all the details which might be passed back we can move on. But we cannot use key-pair across the board because, as you said, the old helpers backward compatibility requires [the blob]. I am OK with going key-pair across the board if no other helper (old or new) is going to use blobs. Otherwise, the [new] helpers that can benefit from blobs should be able to use them. Why waste a good feature if it is required for other reasons anyway? Thank you, Alex.
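A minimal sketch of the key-pair rules under discussion (key: alphanumeric first character, then anything except whitespace or '='; value: either a quoted-string where backslash escapes any octet, or a bare token; surrounding quotes stripped). Everything here is illustrative — parseHelperReply and ParsedReply are made-up names, not Squid's HelperReply code — and channel-ID and blob/body handling are deliberately omitted:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

struct ParsedReply {
    std::string result;                       // OK / ERR / BH
    std::map<std::string, std::string> pairs; // key=value annotations
};

static ParsedReply parseHelperReply(const std::string &line)
{
    ParsedReply r;
    size_t i = 0;
    auto skipWs = [&] {
        while (i < line.size() && isspace((unsigned char)line[i])) ++i;
    };

    skipWs(); // result code token first
    size_t start = i;
    while (i < line.size() && !isspace((unsigned char)line[i])) ++i;
    r.result = line.substr(start, i - start);

    for (;;) {
        skipWs();
        if (i >= line.size())
            break;
        // key must start with an alphanumeric character
        if (!isalnum((unsigned char)line[i]))
            break; // not a key-pair; stop parsing here
        start = i;
        while (i < line.size() && line[i] != '=' && !isspace((unsigned char)line[i])) ++i;
        if (i >= line.size() || line[i] != '=')
            break; // bare token, not key=value
        const std::string key = line.substr(start, i - start);
        ++i; // skip '='
        std::string value;
        if (i < line.size() && line[i] == '"') { // quoted-string value
            ++i;
            while (i < line.size() && line[i] != '"') {
                if (line[i] == '\\' && i + 1 < line.size())
                    ++i; // backslash escapes any octet
                value += line[i++];
            }
            if (i < line.size())
                ++i; // closing quote
        } else { // bare-token value: anything except whitespace
            while (i < line.size() && !isspace((unsigned char)line[i]))
                value += line[i++];
        }
        r.pairs[key] = value;
    }
    return r;
}
```

A real parser would hand whatever follows the last key-pair to per-helper handling rather than silently stopping; the sketch only shows where key-pair parsing ends.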
Re: Squid-3.2 status update
On 07/04/2012 05:34 PM, Amos Jeffries wrote: 3124 - Cache manager stops responding when multiple workers used ** requires implementing non-blocking IPC packets between workers and coordinator. Has this been discussed somewhere? IPC communication is already non-blocking, so I suspect some other issue is at play here. The specific examples of mgr commands in the bug report (userhash, sourcehash, client_list, and netdb) seem non-essential in most environments and, hence, do not justify the "major" designation, but perhaps they indicate some major implementation problem that must be fixed. UNIX sockets apparently guarantee the write() is blocked until recipient process has read() the packet. That is not true in general. I just wrote a basic UDS client and server to test this (attached), and I can send packets much faster than the server reads them. Linux keeps a queue of messages. The I/O may become blocking if the queue is full, but I suspect select(2) or equivalent will not let us send a new message under that condition (or the send will fail rather than block). There is a sysctl option (net.unix.max_dgram_qlen in recent kernels) that controls the number of messages that can be queued between the client and server. It is possible that UDS sockets behave differently in some environments that I have not tested, but I doubt it. Why do you think that UNIX sockets block write() until the recipient has read() the packet? Last I looked, the coordinator handling function also called component handler functions synchronously for them to create the response IPC packet. Ipc::Coordinator::handleCacheMgrRequest() starts an async job to satisfy the received cache manager request. There are some Ipc::Coordinator::handle*() methods that create the final response synchronously, but they should all be very fast and not worth creating an async job for. Are you talking about some other coordinator handling functions that block for a long time?
AFAIK this is waiting on the Subscription and generic (immediate-ACK) IPC packets, which will free up the coordinator and workers for other async operations even if a large process is underway. IIRC, subscription was needed to resolve IPC linking problems. It is possible that it is needed for this bug as well, but since I cannot tell what this bug is, I do not know whether subscription is the solution. I thought you knew, because of your "requires implementing non-blocking IPC packets" solution summary. That is why I started asking questions... Alex.

P.S. Output of the attached UDS server that sleeps to be slower than the client. All sent messages are received, some after the client is gone:

$ ./uds-server.pl /tmp/uds
1341466849 waiting for messages
1341466853 got msg #01 after 4.00 seconds ... sleeping for 3.00 seconds
1341466856 got msg #02 after 3.00 seconds ... sleeping for 3.00 seconds
1341466859 got msg #03 after 3.00 seconds ... sleeping for 3.00 seconds
1341466862 got msg #04 after 3.00 seconds ... sleeping for 3.00 seconds
1341466865 got msg #05 after 3.00 seconds ... sleeping for 3.00 seconds
1341466868 got msg #06 after 3.00 seconds ... sleeping for 3.00 seconds
1341466871 got msg #07 after 3.00 seconds ... sleeping for 3.00 seconds
1341466874 got msg #08 after 3.00 seconds ... sleeping for 3.00 seconds
1341466877 got msg #09 after 3.00 seconds ... sleeping for 3.00 seconds
1341466880 got msg #10 after 3.00 seconds ... sleeping for 3.00 seconds
1341466883 got msg #11 after 3.00 seconds ... sleeping for 3.00 seconds
1341466886 got msg #12 after 3.00 seconds ... sleeping for 3.00 seconds
1341466889 got msg #13 after 3.00 seconds ... sleeping for 3.00 seconds
1341466892 got msg #14 after 3.00 seconds ... sleeping for 3.00 seconds
1341466895 got msg #15 after 3.00 seconds ... sleeping for 3.00 seconds
1341466898 got msg #16 after 3.00 seconds ... sleeping for 3.00 seconds
1341466901 got msg #17 after 3.00 seconds ... sleeping for 3.00 seconds
1341466904 got msg #18 after 3.00 seconds ... sleeping for 3.00 seconds
1341466907 got msg #19 after 3.00 seconds

UDS client that sends as fast as it can. Note the blocking after the queue gets full around msg #12 (we are using blocking I/O here):

$ ./uds-client.pl /tmp/uds
1341466853 sending with max queue length of 10 messages
1341466853 sent msg #01 after 0.00 seconds
1341466853 sent msg #02 after 0.00 seconds
1341466853 sent msg #03 after 0.00 seconds
1341466853 sent msg #04 after 0.00 seconds
1341466853 sent msg #05 after 0.00 seconds
1341466853 sent msg #06 after 0.00 seconds
1341466853 sent msg #07 after 0.00 seconds
1341466853 sent msg #08 after 0.00 seconds
1341466853 sent msg #09 after 0.00 seconds
1341466853 sent msg #10 after 0.00 seconds
1341466853 sent msg #11 after 0.00 seconds
1341466853 sent msg #12 after 0.00 seconds
1341466856 sent msg #13 after 3.00 seconds
1341466859 sent msg #14 after 3.00 seconds
1341466862 sent msg #15 after
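The queueing behavior shown in the Perl transcripts above — the kernel accepts many AF_UNIX datagrams before the receiver read()s any of them — can also be demonstrated in C++. This sketch is mine, not part of the thread's attachments; it uses a non-blocking SOCK_DGRAM socketpair (on Linux) and counts how many datagrams can be queued before send() fails with EAGAIN:

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Send datagrams without the receiver reading any, until the kernel queue
// fills up. Returns how many datagrams were queued (capped at `cap`), or
// -1 on failure or if the receiver could not drain everything afterwards.
static int countQueuedDatagrams(int cap)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK); // fail with EAGAIN instead of blocking

    const char msg[] = "ping";
    int sent = 0;
    while (sent < cap && send(fds[0], msg, sizeof(msg), 0) == (ssize_t)sizeof(msg))
        ++sent;

    // the receiver can still drain everything that was queued
    char buf[sizeof(msg)];
    int received = 0;
    while (recv(fds[1], buf, sizeof(buf), MSG_DONTWAIT) > 0)
        ++received;

    close(fds[0]);
    close(fds[1]);
    return (received == sent) ? sent : -1;
}
```

Under default net.unix.max_dgram_qlen and socket buffer sizes this queues well more than one message before push-back; a blocking sender would only stall once that queue is full, which matches the client transcript stalling around msg #13.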