RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes

2003-03-19 Thread Robert Collins
On Thu, 2003-03-20 at 10:03, Lightfoot.Michael wrote:


 As I stated (not very clearly) I have had no success in getting a core
 dump from squid 2.5.  I have (before Robert's suggestion and since to
 make sure I hadn't missed anything) read the FAQ section on core dumps
 and debugging.  I am considering running in debug mode for a while, but
 this _is_ a production server and I don't want to impact any further on
 my users.

Something you might try:
build an unpatched squid-2.5 with a *different* prefix (configure
--prefix=foo)

give it an unused port (say 3140 :}).

ensure your ulimits etc. and any Solaris requirements for getting a core
are theoretically correct.

run it with -N.

now, use kill -SIGSEGV <pid> on this test squid

that should cause a core file to be generated. And if it doesn't, it
will let you experiment without impacting your users to find out how to
get a core on Solaris.
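
The experiment above can be rehearsed at the shell before a test squid even exists. This is only a rough sketch, assuming a POSIX-style sh: `sleep` stands in for the test squid so nothing touches a real proxy, and the squid command line in the comment uses a hypothetical prefix and config path. The status that `wait` reports (128 plus the signal number) is what bash and dash do; other shells may encode it differently.

```shell
#!/bin/sh
# Rehearse "kill the test process and look for a core" with a stand-in.
# `sleep` plays the part of the test squid; swap in the real command later.

# Ask for the largest core size this shell is allowed. A hard limit of 0
# means no core can be written, whatever the process does.
ulimit -c unlimited 2>/dev/null || echo "cannot raise core limit; hard limit is $(ulimit -H -c)"
echo "core size limit: $(ulimit -c)"

# Start the stand-in in the background and remember its pid.
# Hypothetical real equivalent:
#   /opt/squid-test/sbin/squid -N -f /opt/squid-test/etc/squid.conf &
sleep 60 &
test_pid=$!

# Deliver the same signal a segment violation would raise.
kill -SEGV "$test_pid"

# Collect the exit status without tripping `set -e` style error handling.
status=0
wait "$test_pid" || status=$?
echo "exit status: $status"

# Where a core lands is system-specific; only the current directory is checked.
ls -l core* 2>/dev/null || echo "no core file in $(pwd)"
```

If the status shows a signal death but no core appears, the remaining suspects are the core size limit and the system's core file settings, which is exactly what the test instance lets you experiment with.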

Rob

-- 
GPG key available at: http://users.bigpond.net.au/robertc/keys.txt.




RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes

2003-03-19 Thread Lightfoot.Michael
 
 Something you might try:
 build an unpatched squid-2.5 with a *different* prefix (configure
 --prefix=foo)
 
 give it an unused port (say 3140 :}).
 
 ensure your ulimits etc. and any Solaris requirements for
 getting a core are theoretically correct.
 
 run it with -N.
 
 now, use kill -SIGSEGV <pid> on this test squid
 
 that should cause a core file to be generated. And if it
 doesn't, it will let you experiment without impacting your
 users to find out how to get a core on Solaris.
 
Good suggestion.  I do have a test squid config on the same box which
currently uses the same squid binary, without SF turned on (there are
two tokens added to squid.conf that enable or disable SF.)  Incidentally,
I don't get crashes when I turn the SF config off, even while running
the SF patched code.  This does support the theory that it is something
specific in their patches that is causing the two failure symptoms, and
that if the code is not traversed squid behaves perfectly.

Please accept my apologies for the uncalled-for accusation of arrogance
yesterday.  I guess I'm under a bit of pressure this week, but that's
not a credible excuse, is it?  :-)

BTW, I did indirectly follow your suggestions in your original message a
month ago - I later turned off aufs and then upgraded squid 2.5S1 to a
more recent snapshot a couple of weeks ago.  Neither made any
difference.  :-(




Michael Lightfoot
Unix Consultant
ISG Host Systems
Comcare
+61 2 62750680




Re: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes

2003-03-18 Thread Robert Collins
On Wed, 2003-03-19 at 11:45, Lightfoot.Michael wrote:
 Quite a while ago (over a month) I reported a problem to this list with
 squid 2.5STABLE1 and Secure Computing's SmartFilter software.
 SmartFilter integrates with squid by patching several source files to
 redirect the URL to itself (rather than using the standard redirector
 interface,) by adding a couple of lines to squid.conf and by adding a
 couple of fields to access.log.
 
 At that stage I was running 2.5STABLE1-20021118, SmartFilter 3.1.1 on
 Solaris 7.  I am also running Cameron Simpson's Ad Zapper as a standard
 redirector (12 instances.)
 
 I have been conversing with Secure Computing's technical support ever
 since and after some time they decided that Solaris 7 was unsupported.
 I have since upgraded the server to Solaris 9 with all the latest
 patches (uname -a reports SunOS minotaur.comcare.gov.au 5.9
 Generic_112233-04 sun4u sparc SUNW,Ultra-60), which of course is also
 unsupported (they are promising support soon and have started
 testing.)
 
 This caused the frequency of crashes to increase dramatically for a few
 days while I got around to upgrading squid to 2.5STABLE2-20030318 and
 SmartFilter 3.2.0 yesterday.  Ad Zapper is regularly upgraded by an
 automatic download every few days.
 
 I am now getting crashes about once per hour or so this morning.  Mostly
 all squid's cache.log tells me is that it had a segment violation, but I
 did get the following a short time back:
 
 2003/03/19 10:46:18| comm_accept: FD 26: (130) Software caused connection abort
 2003/03/19 10:46:18| httpAccept: FD 26: accept failure: (130) Software caused connection abort
 2003/03/19 10:46:51| assertion failed: store_client.c:201: "sc->callback == NULL"
 2003/03/19 10:47:01| Starting Squid Cache version 2.5.STABLE2-20030318 for sparc-sun-solaris2.9...
 
 The offending code segment is:
 
 /* copy bytes requested by the client */
 void
 storeClientCopy(store_client * sc,
     StoreEntry * e,
     off_t seen_offset,
     off_t copy_offset,
     size_t size,
     char *buf,
     STCB * callback,
     void *data)
 {
     assert(!EBIT_TEST(e->flags, ENTRY_ABORTED));
     debug(20, 3) ("storeClientCopy: %s, seen %d, want %d, size %d, cb %p, cbdata %p\n",
         storeKeyText(e->hash.key),
         (int) seen_offset,
         (int) copy_offset,
         (int) size,
         callback,
         data);
     assert(sc != NULL);
 #if STORE_CLIENT_LIST_DEBUG
     assert(sc == storeClientListSearch(e->mem_obj, data));
 #endif
     assert(sc->callback == NULL);
     assert(sc->entry == e);
     sc->seen_offset = seen_offset;
     sc->callback = callback;
     sc->copy_buf = buf;
     sc->copy_size = size;
     sc->copy_offset = copy_offset;
     storeClientCopy2(e, sc);
 }
 
 Does the above mean anything to anybody?  How can I get a better
 indication of where the segment violation is occurring?  

No segment violation is occurring. A logic violation is occurring and
triggering an assert. Asserts are used to detect programmer error -
where the programmer has either misused an internal API, or hasn't
covered some corner case and that resulted in inconsistent internal
state.
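
One way to tell the two kinds of death apart from outside the process is the signal that ended it. The sketch below assumes that a failed assert ends in abort(), as the standard C assert() does (whether Squid's own assert wrapper behaves identically is an assumption here), while a segment violation arrives as SIGSEGV. A throwaway `sleep` process stands in for squid, and `died_from` is a made-up helper name.

```shell
#!/bin/sh
# Report which signal ended a stand-in process, using the status from `wait`.
# abort() (the usual end of a failed assert) raises SIGABRT;
# a segment violation raises SIGSEGV.

died_from() {
    # $1: signal name to deliver to a throwaway `sleep` process
    sleep 60 &
    pid=$!
    kill -"$1" "$pid"
    rc=0
    wait "$pid" || rc=$?
    # In bash and dash, a status above 128 means "killed by signal (status - 128)".
    if [ "$rc" -gt 128 ]; then
        kill -l "$((rc - 128))"
    else
        echo "exited normally with status $rc"
    fi
}

abrt_name=$(died_from ABRT)
segv_name=$(died_from SEGV)
echo "assert-style death: SIG$abrt_name"
echo "segment violation:  SIG$segv_name"
```

Either signal normally leaves a core file when core dumps are enabled, so the same stack trace advice applies to both failure symptoms.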

 And please no
 lectures about source code hacks by commercial vendors!  :-)

I won't lecture you, but I also can't support you as I don't know what
the code you are running looks like. Nor do I want to know.

 I am also running squid 2.5STABLE1 on another server under Solaris 2.6
 without SmartFilter or Ad Zapper.  It hasn't missed a beat.

Right. That should give you a clue :}.

Rob
-- 
GPG key available at: http://users.bigpond.net.au/robertc/keys.txt.




RE: [squid-users] Squid 2.5 and SmartFilter causing frequent crashes

2003-03-18 Thread Robert Collins
On Wed, 2003-03-19 at 14:22, Lightfoot.Michael wrote:

Your email is (I think) taken as intended.

   Does the above mean anything to anybody?  How can I get a better 
   indication of where the segment violation is occurring?
  
  No segment violation is occurring. A logic violation is
  occurring and triggering an assert. Asserts are used to detect
  programmer error - where the programmer has either misused an
  internal API, or hasn't covered some corner case and that
  resulted in inconsistent internal state.
  
 Well can you tell me whether this is a squid code problem or a
 SmartFilter problem?

Not for sure, no. But I really do believe it is a
SmartFilter-patch-to-squid problem.

   And I _know_ this is a different error to the
 non-committal one about a segment violation.  I thought they _might_ be
 related.

It's quite likely that they are.

   Neither produces a core file so I can't even get past square
 one.

Ah. Well, if you can trigger the segment violation under gdb you will be
able to enter 'bt' to get a back trace. There is a FAQ entry on creation
of core files (http://www.squid-cache.org/Doc/FAQ/FAQ-11.html#ss11.19).
I'm not a Solaris guru, so I can't be of much help there sight unseen,
I'm afraid.

   And please no
   lectures about source code hacks by commercial vendors!  :-)
  
  I won't lecture you, but I also can't support you as I don't 
  know what the code you are running looks like. Nor do I want to know.
  
 And I won't bother you with it, except to note that store_client.c line
 201 and the surrounding code is not changed by the SmartFilter patches.
 In fact none of the store*.c files are changed by SmartFilter.  Perhaps
 this assert is really a squid problem, triggered by something that the
 SmartFilter code does?

Until we get the same assert from a non-smartfilter squid-2.5, I'm
assuming that the assert is triggered by the SmartFilter code. It's
highly likely that something they do is causing this, and it's only the
squid code that is detecting it. store_client is used most heavily by
client_side.c. store_client provides the logic to grab data from
*either* the disk or the upstream server.

To troubleshoot, we really need to know what's in the stack frames
immediately leading into the assert (and likewise for the segment
violation). Running under gdb is a good way to get this if you cannot
get a core file. There is a FAQ entry on how to get gdb to do the right
thing as well.
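
For illustration only, running a test instance under gdb might look like the sketch below. The binary and config paths are hypothetical placeholders, `run_under_gdb` is a made-up helper name, and the `-batch`, `-ex` and `--args` options assume a reasonably recent gdb. Passing SIGPIPE through quietly is a common precaution for network daemons and is an assumption here, not a quote from the FAQ.

```shell
#!/bin/sh
# Sketch: run a test squid in the foreground under gdb and print a backtrace
# when it stops on a fatal signal. Both paths are hypothetical placeholders.
SQUID_BIN=${SQUID_BIN:-/opt/squid-test/sbin/squid}
SQUID_CONF=${SQUID_CONF:-/opt/squid-test/etc/squid.conf}

run_under_gdb() {
    # -batch: exit once the listed commands finish; each -ex runs one gdb command.
    # SIGPIPE is passed through quietly so routine socket closes do not stop the run.
    # `run` returns when the process stops (e.g. on SIGSEGV or SIGABRT); `bt` then
    # prints the stack frames leading into the failure.
    gdb -batch \
        -ex 'handle SIGPIPE pass nostop noprint' \
        -ex run \
        -ex bt \
        --args "$SQUID_BIN" -N -f "$SQUID_CONF"
}

# The function is only defined here; call run_under_gdb by hand on the test instance.
```

Redirecting the output of `run_under_gdb` to a file keeps the backtrace around for a bug report even if the terminal session is lost.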

  Right. That should give you a clue :}.
  
 And that isn't a lecture, smiley or not?

I didn't think it was. I was indicating that the 99.% probable cause
of the fault is in the proprietary code. I wasn't intending to talk
about the ethics or political issues relating to proprietary patches to
GPL code. I could, if you wanted me to - but that is best for a
different list, at a different time.

 Robert, you gave me a useless answer last time as well.  Please don't be
 so arrogant.

This I take some offence to. I wasn't intending arrogance. I am trying
to point you in the right direction.

In terms of a useless reply, my previous one
(http://www.squid-cache.org/mail-archive/squid-users/200302/0016.html)
is far from useless. It's concise, yes, but:
Your version stamp in the previously reported fault was prior to the
aufs bugfix that is on the 2.5 errata page
(http://www.squid-cache.org/Versions/v2/2.5/bugs/).

Telling you to apply the errata was apparently a waste of time, or you
might not have called that reply useless. And in fact, it appears to
have reduced the rate of the segfault you were experiencing, no?

As to telling you to create a core when you said that one was not being
created, you had not specified that you had trouble *getting one*,
rather that one was *not being created*. For all I knew, your
ulimits were (or are) too low.

Please, take my advice *before* calling me arrogant. If the advice is
wrong: then yes, call me that, and it may be accurate. Calling me
arrogant when you hadn't followed through on the advice you got is,
well. Silly.

   What I am looking for is some guidance on how I can beat
 Secure Computing about the head.  They are being largely unhelpful, you
 are being wholly unhelpful.
   Maybe you could read my message again and
 try to give me an answer to my question: What can I do to get better
 diagnostics?

To get better diagnostics you need to:
* get a stack trace. Somehow. Anyhow. 

That's the key initial step.

Alternatively, you could up the debug levels via the debug_options
squid.conf statement, which will be *very* noisy in the log files, but
may give you *some* insight.
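
As a sketch of what that could look like in squid.conf: the section number 20 is taken from the debug(20, 3) call in the storeClientCopy code quoted earlier in the thread, and level 3 matches that call. Whether section 20 alone is enough to show the lead-up to the assert is an assumption.

```
# Keep everything at the usual quiet level, but log section 20
# (the section used by the quoted storeClientCopy debug call) at level 3.
debug_options ALL,1 20,3
```

Raising only one section keeps cache.log far quieter than raising ALL, which matters on a production server.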

I am sympathetic to your plight with Secure Computing. I really am. But
there really is very little I can do: I can't examine your source
without being tainted. I can't refer to my source here to provide
guidance. And without a stack trace, I cannot even *guess* at the root
cause of the fault.

What I can say, I pretty much already have.

I make my living off squid consulting, and it