Anton Tilstra wrote:
>>> Lars Stavholm wrote:
>>>
>>> dspam has served us well for a good while,
>>> but on occasion it fails with:
>>>
>>> dspam[11075]: segfault at 00002aaaaad8ca68 rip 000000000041ea8b rsp 0000000040ffae88 error 4
>>>
>>> ...and I have no idea how to even start trying to solve the problem,
>>> whatever the problem is. Can anyone give me some advice on this one?
>>>
>>> Any input appreciated, I haven't got a clue at this point.
>> Apologies, I left out some crucial info:
>>
>> Platform: SuSE Linux 10.2 (64 bit)
>
> I'm by no means an expert, but if I didn't have anything else to go on,
> I would see if I could isolate the source of the problem as best as I
> could (e.g. hardware or software).
>
> First I think I would make sure it's not a hardware issue, and RAM would
> be my first suspect. Do you have the option to temporarily run the
> machine with different memory modules (borrowed from another machine
> perhaps), or take one out at a time to troubleshoot? Or, the best way
> that I know to "test" memory in the machine is to actually do a Linux
> kernel compilation - I have seen this cause segfaults due to bad memory
> where memory diagnostics (memtest86 specifically) did not find errors
> even after extended testing. If a kernel compilation causes segfaults,
> especially when it's not consistently at the same place each time,
> there's a good chance you have a bad memory stick.

I don't think it's a hardware issue, since we're using the same binaries
on three different 64 bit SuSE Linux 10.2 boxes, and they all give the
same symptom.

> Something else to check out on the hardware side may be an overheating
> problem, but that's probably more of a long shot. Is there a common
> denominator that you have found with the segfaults such as peak system
> loads? Can you monitor the system's temperature?

We are constantly monitoring these machines' temperatures, and we have
found no problems so far.

>> Build options:
>> ./configure --prefix=/usr
>> --bindir=/usr/sbin
>> --sysconfdir=/etc
>> --libexecdir=/usr/lib
>> --libdir=/usr/lib64
>> --with-dspam-home=/var/lib/dspam
>> --mandir=/usr/share/man
>> --enable-daemon
>> --enable-debug
>> --enable-clamav
>> --enable-syslog
>> --disable-trusted-user-security
>>
>> I've enabled Debug mode, but the log shows absolutely nothing
>> of any use at all, just normal processing messages suddenly
>> interrupted (by the segfault).
>>
>> While writing this and thinking about it, we've used a 32 bit
>> successfully, i.e. same source different build. Any chance this
>> might be some 64 bit platform issue maybe?
>
> I'm not sure either way about the 64 bit platform itself being an issue.
> Do you have other machines running the same platform? Are there any
> other programs at all that segfault, either on this machine or another
> 64 bit one?

As mentioned above.

> One thing that comes to mind in that department is a compiler issue (not
> my first guess, but who knows), assuming I'm understanding correctly
> that you've compiled it yourself on this machine. If so, is there a 64
> bit SuSE package of DSPAM you could try to run? This would of course be
> pretty involved, so I don't think I would do that on a production box.

To my knowledge there's no dspam rpm package available for these
platforms, that's why we built our own. Also, we do a 32 bit build as
well, and that one works just fine.
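As a side note on the kernel message quoted at the top: the trailing "error 4" is the x86 page-fault error code, and it can be decoded bit by bit. The sketch below is only an illustration. The helper names `parse_segfault_line` and `decode_segfault_error` are made up for this example, and it assumes the line format shown in the quoted log (hexadecimal fields, `rip`/`rsp` or the `ip`/`sp` spelling) and only the five long-standing error-code bits.

```python
import re

# Long-standing bits of the x86 page-fault error code that the kernel
# prints as "error N". Each entry: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Assumed line format, based on the message quoted in this thread.
SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) r?ip ([0-9a-f]+) r?sp ([0-9a-f]+) error ([0-9a-f]+)"
)


def parse_segfault_line(line):
    """Return the hex fields of a kernel segfault line as ints, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    addr, rip, rsp, error = (int(field, 16) for field in match.groups())
    return {"addr": addr, "rip": rip, "rsp": rsp, "error": error}


def decode_segfault_error(code):
    """Translate the error code into a list of human-readable facts."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


if __name__ == "__main__":
    line = ("dspam[11075]: segfault at 00002aaaaad8ca68 "
            "rip 000000000041ea8b rsp 0000000040ffae88 error 4")
    info = parse_segfault_line(line)
    print(hex(info["addr"]), hex(info["rip"]), decode_segfault_error(info["error"]))
```

For the line in this thread, that reads as a user-mode read of a page that is not mapped, with the faulting instruction at 0x41ea8b. This does not say why the address was bad, but the instruction pointer is something a debugger or a core dump from the same binary could be matched against.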

> Anyway, just some thoughts off the top of my head, I hope you find the
> problem.

Thank you for your input, it is much appreciated when there's almost
nothing to go on.

Thank you
/Lars
