RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes
On Thu, 2003-03-20 at 10:03, Lightfoot.Michael wrote:

> As I stated (not very clearly) I have had no success in getting a core dump from squid 2.5. I have (before Robert's suggestion and since, to make sure I hadn't missed anything) read the FAQ section on core dumps and debugging. I am considering running in debug mode for a while, but this _is_ a production server and I don't want to impact any further on my users.

Something you might try:

* Build an unpatched squid-2.5 with a *different* prefix (configure --prefix=foo).
* Give it an unused port (say 3140 :}).
* Ensure your ulimits etc. and any Solaris requirements for getting a core are theoretically correct.
* Run it with -N.
* Now use kill -SIGSEGV on the pid of this test squid.

That should cause a core file to be generated. And if it doesn't, it will let you experiment, without impacting your users, to find out how to get a core on Solaris.

Rob
--
GPG key available at: http://users.bigpond.net.au/robertc/keys.txt.
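The same experiment can be rehearsed without any Squid build at all. What follows is a minimal C sketch, not Squid code: the function names `raise_core_limit` and `segv_core_probe` are hypothetical, and it assumes a POSIX system where `WCOREDUMP` is available (it is not part of POSIX, hence the `#ifdef`). It raises the core-size limit, sends SIGSEGV to an idle child, and reports whether the system says a core was dumped.

```c
/* Sketch only: not part of Squid. Function names are hypothetical. */
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raise the soft core-file size limit to the hard limit, which is the
 * limit "ulimit -c" adjusts from a shell. Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Fork an idle child, send it SIGSEGV, and report the outcome:
 *   1  child died from SIGSEGV and the system reports a core dump
 *   0  child died from SIGSEGV but no core dump was reported
 *  -1  anything else (fork/kill/wait failure, unexpected exit) */
int segv_core_probe(void)
{
    int status;
    pid_t pid;

    raise_core_limit();             /* best effort; the child inherits the limit */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);   /* make sure the default action applies */
        for (;;)
            pause();                /* idle until the signal arrives */
    }
    if (kill(pid, SIGSEGV) != 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGSEGV)
        return -1;
#ifdef WCOREDUMP
    return WCOREDUMP(status) ? 1 : 0;
#else
    return 0;                       /* this platform cannot report core dumps */
#endif
}
```

Calling `segv_core_probe()` from a small `main()` and printing the result gives a quick answer to "can this account, in this directory, get a core at all?". A result of 0 points at limits or system core-file policy rather than at Squid. Where the file actually lands is system-dependent and is not checked by this sketch.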
RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes
> Something you might try:
>
> * Build an unpatched squid-2.5 with a *different* prefix (configure --prefix=foo).
> * Give it an unused port (say 3140 :}).
> * Ensure your ulimits etc. and any Solaris requirements for getting a core are theoretically correct.
> * Run it with -N.
> * Now use kill -SIGSEGV on the pid of this test squid.
>
> That should cause a core file to be generated. And if it doesn't, it will let you experiment, without impacting your users, to find out how to get a core on Solaris.

Good suggestion. I do have a test squid config on the same box which currently uses the same squid binary, without SF turned on (there are two tokens added to squid.conf that enable or disable SF).

Incidentally, I don't get crashes when I turn the SF config off, even while running the SF-patched code. This does support the theory that it is something specific in their patches that is causing the two failure symptoms, and that if the code is not traversed squid behaves perfectly.

Please accept my apologies for the uncalled-for accusation of arrogance yesterday. I guess I'm under a bit of pressure this week, but that's not a credible excuse, is it? :-)

BTW, I did indirectly follow your suggestions in your original message a month ago - I later turned off aufs and then upgraded squid 2.5S1 to a more recent snapshot a couple of weeks ago. Neither made any difference. :-(

Michael Lightfoot
Unix Consultant
ISG Host Systems
Comcare
+61 2 62750680
Re: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes
On Wed, 2003-03-19 at 11:45, Lightfoot.Michael wrote:

> Quite a while ago (over a month) I reported a problem to this list with squid 2.5STABLE1 and Secure Computing's SmartFilter software. SmartFilter integrates with squid by patching several source files to redirect the URL to itself (rather than using the standard redirector interface), by adding a couple of lines to squid.conf and by adding a couple of fields to access.log. At that stage I was running 2.5STABLE1-20021118, SmartFilter 3.1.1 on Solaris 7. I am also running Cameron Simpson's Ad Zapper as a standard redirector (12 instances).
>
> I have been conversing with Secure Computing's technical support ever since and after some time they decided that Solaris 7 was unsupported. I have since upgraded the server to Solaris 9 with all the latest patches (uname -a reports SunOS minotaur.comcare.gov.au 5.9 Generic_112233-04 sun4u sparc SUNW,Ultra-60), which of course is also unsupported (they are promising support soon and have started testing). This caused the frequency of crashes to increase dramatically for a few days while I got around to upgrading squid to 2.5STABLE2-20030318 and SmartFilter 3.2.0 yesterday. Ad Zapper is regularly upgraded by an automatic download every few days.
>
> I am now getting crashes about once per hour or so this morning. Mostly all squid's cache.log tells me is that it had a segment violation, but I did get the following a short time back:
>
> 2003/03/19 10:46:18| comm_accept: FD 26: (130) Software caused connection abort
> 2003/03/19 10:46:18| httpAccept: FD 26: accept failure: (130) Software caused connection abort
> 2003/03/19 10:46:51| assertion failed: store_client.c:201: sc->callback == NULL
> 2003/03/19 10:47:01| Starting Squid Cache version 2.5.STABLE2-20030318 for sparc-sun-solaris2.9...
> The offending code segment is:
>
> /* copy bytes requested by the client */
> void
> storeClientCopy(store_client * sc,
>     StoreEntry * e,
>     off_t seen_offset,
>     off_t copy_offset,
>     size_t size,
>     char *buf,
>     STCB * callback,
>     void *data)
> {
>     assert(!EBIT_TEST(e->flags, ENTRY_ABORTED));
>     debug(20, 3) ("storeClientCopy: %s, seen %d, want %d, size %d, cb %p, cbdata %p\n",
>         storeKeyText(e->hash.key),
>         (int) seen_offset,
>         (int) copy_offset,
>         (int) size,
>         callback,
>         data);
>     assert(sc != NULL);
> #if STORE_CLIENT_LIST_DEBUG
>     assert(sc == storeClientListSearch(e->mem_obj, data));
> #endif
>     assert(sc->callback == NULL);
>     assert(sc->entry == e);
>     sc->seen_offset = seen_offset;
>     sc->callback = callback;
>     sc->copy_buf = buf;
>     sc->copy_size = size;
>     sc->copy_offset = copy_offset;
>     storeClientCopy2(e, sc);
> }
>
> Does the above mean anything to anybody? How can I get a better indication of where the segment violation is occurring?

No segment violation is occurring. A logic violation is occurring and triggering an assert. Asserts are used to detect programmer error - where the programmer has either misused an internal API, or hasn't covered some corner case, and that resulted in inconsistent internal state.

> And please no lectures about source code hacks by commercial vendors! :-)

I won't lecture you, but I also can't support you as I don't know what the code you are running looks like. Nor do I want to know.

> I am also running squid 2.5STABLE1 on another server under Solaris 2.6 without SmartFilter or Ad Zapper. It hasn't missed a beat.

Right. That should give you a clue :}.

Rob
--
GPG key available at: http://users.bigpond.net.au/robertc/keys.txt.
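The kind of invariant that `assert(sc->callback == NULL)` guards can be illustrated with a toy model. This is a sketch with hypothetical names (`toy_client`, `toy_copy`, `toy_deliver`), not Squid's store_client. It assumes an API that allows only one outstanding copy request per client, with the pending callback cleared when data is delivered. Issuing a second request before the first is delivered trips the assert, and with the standard C `assert` the process then dies from SIGABRT, not SIGSEGV.

```c
/* Toy model only: hypothetical names, not Squid's store_client. */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void copy_cb(void *data);

struct toy_client {
    copy_cb *callback;      /* non-NULL while a copy request is outstanding */
    void *data;
};

/* Register a copy request; only one may be outstanding at a time. */
void toy_copy(struct toy_client *sc, copy_cb *cb, void *data)
{
    assert(sc != NULL);
    assert(sc->callback == NULL);   /* the invariant under discussion */
    sc->callback = cb;
    sc->data = data;
}

/* Deliver data: clear the pending callback first, then invoke it, so the
 * callback itself may legally issue the next request. */
void toy_deliver(struct toy_client *sc)
{
    copy_cb *cb = sc->callback;

    sc->callback = NULL;
    if (cb != NULL)
        cb(sc->data);
}

static void count_cb(void *data)
{
    ++*(int *) data;
}

/* Correct use: request, deliver, request, deliver. Returns the call count. */
int correct_use(void)
{
    struct toy_client sc = { NULL, NULL };
    int n = 0;

    toy_copy(&sc, count_cb, &n);
    toy_deliver(&sc);
    toy_copy(&sc, count_cb, &n);
    toy_deliver(&sc);
    return n;
}

/* Misuse in a child process: two requests with no delivery in between.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 on fork/wait failure. */
int misuse_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct toy_client sc = { NULL, NULL };
        int n = 0;

        toy_copy(&sc, count_cb, &n);
        toy_copy(&sc, count_cb, &n);    /* invariant violated here */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Built without `NDEBUG`, `correct_use()` completes normally while `misuse_signal()` reports SIGABRT. The point of the model is that the function containing the assert is not necessarily the one at fault: the assert only detects that some earlier caller left a request pending, which is why the stack frames leading into it matter.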
RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes
On Wed, 2003-03-19 at 14:22, Lightfoot.Michael wrote:

Your email is (I think) taken as intended.

>>> Does the above mean anything to anybody? How can I get a better indication of where the segment violation is occurring?

>> No segment violation is occurring. A logic violation is occurring and triggering an assert. Asserts are used to detect programmer error - where the programmer has either misused an internal API, or hasn't covered some corner case, and that resulted in inconsistent internal state.

> Well, can you tell me whether this is a squid code problem or a SmartFilter problem?

Not for sure, no. But I really do believe it is a SmartFilter-patch-to-squid problem.

> And I _know_ this is a different error to the non-committal one about a segment violation. I thought they _might_ be related.

It's quite likely that they are.

> Neither produces a core file so I can't even get past square one.

Ah. Well, if you can trigger the segment violation under gdb you will be able to enter 'bt' to get a back trace. There is a FAQ entry on creation of core files (http://www.squid-cache.org/Doc/FAQ/FAQ-11.html#ss11.19). I'm not a Solaris guru, so I can't be of much help there sight unseen, I'm afraid.

>>> And please no lectures about source code hacks by commercial vendors! :-)

>> I won't lecture you, but I also can't support you as I don't know what the code you are running looks like. Nor do I want to know.

> And I won't bother you with it, except to note that store_client.c line 201 and the surrounding code is not changed by the SmartFilter patches. In fact none of the store*.c files are changed by SmartFilter. Perhaps this assert is really a squid problem, triggered by something that the SmartFilter code does?

Until we get the same assert from a non-SmartFilter squid-2.5, I'm assuming that the assert is triggered by the SmartFilter code. It's highly likely that something they do is causing this, and it's only the squid code that is detecting it. store_client is used most heavily by client_side.c.
store_client provides the logic to grab data from *either* the disk or the upstream server. To troubleshoot, we really need to know what's in the stack frames immediately leading into the assert (and likewise for the segment violation). Running under gdb is a good way to get this if you cannot get a core file. There is a FAQ entry on how to get gdb to do the right thing as well.

>> Right. That should give you a clue :}.

> And that isn't a lecture, smiley or not?

I didn't think it was. I was indicating that the 99.% probable cause of the fault is in the proprietary code. I wasn't intending to talk about the ethics or political issues relating to proprietary patches to GPL code. I could, if you wanted me to - but that is best for a different list, at a different time.

> Robert, you gave me a useless answer last time as well. Please don't be so arrogant.

This I take some offence to. I wasn't intending arrogance. I am trying to point you in the right direction. In terms of a useless reply, my previous one (http://www.squid-cache.org/mail-archive/squid-users/200302/0016.html) is much less than useless. It's concise, yes, but:

Your version stamp in the previously reported fault was prior to the aufs bugfix that is on the 2.5 errata page (http://www.squid-cache.org/Versions/v2/2.5/bugs/). Telling you to apply the errata was apparently a waste of time, or you might not have called that reply useless. And in fact, it appears to have reduced the rate of the segfault you were experiencing, no?

As to telling you to create a core when you said that one was not being created: you had not specified that you had trouble *getting one*, rather that one was *not being created*. For all I knew, your ulimits were/are too low.

Please, take my advice *before* calling me arrogant. If the advice is wrong, then yes, call me that, and it may be accurate. Calling me arrogant when you hadn't followed through on the advice you got is, well, silly.
> What I am looking for is some guidance on how I can beat Secure Computing about the head. They are being largely unhelpful, you are being wholly unhelpful. Maybe you could read my message again and try to give me an answer to my question: What can I do to get better diagnostics?

To get better diagnostics you need to:

* get a stack trace. Somehow. Anyhow. That's the key initial step.

Alternatively, you could up the debug levels via the debug_options squid.conf statement, which will be *very* noisy in the log files, but may give you *some* insight.

I am sympathetic to your plight with Secure Computing. I really am. But there really is very little I can do: I can't examine your source without being tainted. I can't refer to my source here to provide guidance. And without a stack trace, I cannot even *guess* at the root cause of the fault. What I can say, I pretty much already have. I make my living off squid consulting, and it