Anton Tilstra wrote:
>>> Lars Stavholm wrote:
>>>
>>> dspam has served us well for a good while,
>>> but on occasion it fails with:
>>>
>>> dspam[11075]: segfault at 00002aaaaad8ca68 rip 000000000041ea8b rsp
>>> 0000000040ffae88 error 4
>>>
>>> ...and I have no idea how to even start trying to solve the problem,
>>> whatever the problem is. Can anyone give me some advice on this one?
>>>
>>> Any input appreciated, I haven't got a clue at this point.
>> Apologies, I left out some crucial info:
>>
>> Platform: SuSE Linux 10.2 (64 bit)
> 
> I'm by no means an expert, but if I didn't have anything else to go on,
> I would see if I could isolate the source of the problem as best as I
> could (e.g. hardware or software).
> 
> First I think I would make sure it's not a hardware issue, and RAM would
> be my first suspect. Do you have the option to temporarily run the
> machine with different memory modules (borrowed from another machine
> perhaps), or take one out at a time to troubleshoot? Or, the best way
> that I know to "test" memory in the machine is to actually do a Linux
> kernel compilation - I have seen this cause segfaults due to bad memory
> where memory diagnostics (memtest86 specifically) did not find errors
> even after extended testing. If a kernel compilation causes segfaults,
> especially when it's not consistently at the same place each time,
> there's a good chance you have a bad memory stick.

I don't think it's a hardware issue, since we're using the same
binaries on three different 64 bit SuSE Linux 10.2 boxes, and they
all give the same symptom.
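
For what it's worth, the one piece of data we do have, the kernel
line quoted at the top, can be decoded a little. Below is a rough
Python sketch, not anything from dspam itself. The regex assumes the
x86-64 message layout exactly as it appears in our log, and the
meaning of the error bits (bit 0 = page present, bit 1 = write,
bit 2 = user mode) is the standard x86 page-fault error code.
decode_segfault and addr2line_command are helper names made up for
this mail. The addr2line call is only useful if the binary carries
debug symbols and the rip falls inside the main executable rather
than a shared library, which I'm assuming here; /usr/sbin/dspam
follows from the --bindir in our build options.

```python
import re

# Assumed layout: the x86-64 kernel message as quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse one kernel segfault line and decode the page-fault error bits."""
    # Collapse whitespace so a line wrapped by the mail client still matches.
    m = SEGFAULT_RE.search(" ".join(line.split()))
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m.group("err"), 16)
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        "page_present": bool(err & 1),               # bit 0
        "access": "write" if err & 2 else "read",    # bit 1
        "mode": "user" if err & 4 else "kernel",     # bit 2
    }


def addr2line_command(binary, rip):
    """Build (but do not run) an addr2line call for the faulting rip."""
    # Hypothetical helper; -f prints the function name, -e names the binary.
    return ["addr2line", "-f", "-e", binary, hex(rip)]


line = ("dspam[11075]: segfault at 00002aaaaad8ca68 rip 000000000041ea8b "
        "rsp 0000000040ffae88 error 4")
info = decode_segfault(line)
print(info["access"], info["mode"], info["page_present"])
print(" ".join(addr2line_command("/usr/sbin/dspam", info["rip"])))
```

For our line this reports a user-mode read from an address with no
page mapped, which suggests a bad pointer being dereferenced around
rip 0x41ea8b, but says nothing yet about why.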

> Something else to check out on the hardware side may be an overheating
> problem, but that's probably more of a long shot. Is there a common
> denominator that you have found with the segfaults such as peak system
> loads? Can you monitor the system's temperature?

We are constantly monitoring the temperature of these machines,
and we have found no problems so far.

>> Build options:
>> ./configure --prefix=/usr
>>             --bindir=/usr/sbin
>>             --sysconfdir=/etc
>>             --libexecdir=/usr/lib
>>             --libdir=/usr/lib64
>>             --with-dspam-home=/var/lib/dspam
>>             --mandir=/usr/share/man
>>             --enable-daemon
>>             --enable-debug
>>             --enable-clamav
>>             --enable-syslog
>>             --disable-trusted-user-security
>>
>> I've enabled Debug mode, but the log shows absolutely nothing
>> of any use at all, just normal processing messages suddenly
>> interrupted (by the segfault).
>>
>> While writing this and thinking about it, we've used a 32 bit
>> build successfully, i.e. same source, different build. Any chance
>> this might be some 64 bit platform issue?
> 
> I'm not sure either way about the 64 bit platform itself being an issue.
> Do you have other machines running the same platform? Are there any
> other programs at all that segfault, either on this machine or another
> 64 bit one?

As mentioned above.

> One thing that comes to mind in that department is a compiler issue (not
> my first guess, but who knows), assuming I'm understanding correctly
> that you've compiled it yourself on this machine. If so, is there a 64
> bit SuSE package of DSPAM you could try to run? This would of course be
> pretty involved, so I don't think I would do that on a production box.

To my knowledge there's no dspam rpm package available for these
platforms, which is why we built our own. Also, we do a 32 bit build
as well, and that one works just fine.

> Anyway, just some thoughts off the top of my head, I hope you find the
> problem.

Thank you for your input, it is much appreciated when there's
almost nothing to go on.

Thank you
/Lars
