I'll jump in and mention that I noticed the same error while I was
testing a couple of diffs I sent to tech@ a while back.  In my case,
since I was operating on a test box with no load whatsoever, the bug
manifested itself when I ran "relayctl reload" multiple times in
quick succession.  When I was testing, it only
happened when I had relays that used SSL in some way, and the stack
trace ended somewhere in the bowels of OpenSSL, which was way over my
head.  If it's helpful I can fire up my test bed and see if I can
capture the stack trace again.
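
For reference, a minimal sketch of that reproduction as a shell
function.  The default count of 10 and the RELOAD_CMD override are
assumptions added for illustration (not what was originally run), and
it assumes relayctl exits non-zero once relayd has gone away.

```sh
#!/bin/sh
# Sketch: issue several reloads back to back to try to provoke the crash.
# RELOAD_CMD is a hypothetical override so the loop can be dry-run
# without relayd; by default the real "relayctl reload" is run.

reload_burst() {
	count=${1:-10}		# number of reloads; 10 is an arbitrary default
	i=0
	while [ "$i" -lt "$count" ]; do
		i=$((i + 1))
		# Intentionally unquoted so "relayctl reload" splits into words.
		if ${RELOAD_CMD:-relayctl reload}; then
			echo "reload $i ok"
		else
			# Assumption: a non-zero exit means relayd is no longer answering.
			echo "reload $i failed"
			return 1
		fi
	done
}
```

Calling "reload_burst 20" as root on a box with SSL relays configured
would approximate the quick succession of reloads described above.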

It sounds similar to what you're seeing, Zack, but if it's not the
same I apologize for my unintentional thread-jacking.  The error that
I logged a while back was "relayd in free(): error: chunk is already
free 0x203490200"; however, I don't know if that was with the
stock relayd or after my attempts to fix it.  I'll try again with a
stock relayd later on tonight and report my results.

(Zack, your dmesg and relayd.conf didn't come through--send them
inline, not as attachments.)

I'm late to a meeting, but if a dmesg and/or relayd.conf are
requested, I can and will provide them later.

--
Seth



On Wed, Jan 18, 2012 at 2:24 PM, Zack G. <posixb...@gmail.com> wrote:
> Here's what I can tell you:
>
> When the system is under high PPS load, relayd seems to restart
> (and frequently at that) unless I significantly raise the check
> delays and timeouts.  Otherwise, relayd functions normally (excepting
> the lost hce child) with the lower, more preferable values.
>
> This bug is elusive as hell and doesn't rear its head often.  But,
> when it does, it usually does this repeatedly and continuously.  I use
> a command to auto-restart relayd when it dies with signal 6, and the output
> ends up looking like:
>
> Tue Jan 17 13:33:40 MST 2012
> restarted
> Tue Jan 17 13:34:04 MST 2012
> restarted
> Tue Jan 17 13:34:28 MST 2012
> restarted
> Tue Jan 17 13:34:56 MST 2012
> restarted
> Tue Jan 17 13:35:06 MST 2012
> restarted
> Tue Jan 17 13:35:24 MST 2012
> restarted
> Tue Jan 17 13:35:48 MST 2012
> restarted
> Tue Jan 17 13:35:55 MST 2012
> restarted
> Tue Jan 17 13:36:20 MST 2012
> restarted
>
> So, as you can see, this occurs rather frequently during times of
> high PPS load.
>
> The error I see when running relayd with -dv is:
>
> relayd in free(): error: bogus pointer (double free?) 0x206ac8000
> lost child: hce terminated; signal 6
> pfe exiting, pid 12691
> relay exiting, pid 31468
> relay exiting, pid 5714
> relay exiting, pid 2319
> relay exiting, pid 19145
> relay exiting, pid 20233
> parent terminating, pid 5977
>
> dmesg.boot.bz2 as requested by the FAQ is attached.
>
> I've also included a copy of the relayd.conf.bz2.
>
> I wish I could provide you with more information, but, this is as much
> as I can provide at this point in time.  Unfortunately, this problem
> is mostly an issue on our production router (as it's the only one
> that receives such high traffic at any given point in time).  I can't
> tweak around with it enough to get further trace information and I
> don't have the time/resources to dig further into this issue at the
> moment.
>
> I hope this is enough to get started on the bug.  If you need any more
> information from me on my environment, I will do my best to get it for
> you.
>
> Happy hacking and all the best,
>
> Zack
>
> [demime 1.01d removed an attachment of type application/x-bzip2 which had a
> name of relayd.conf.bz2]
>
> [demime 1.01d removed an attachment of type application/x-bzip2 which had a
> name of dmesg.boot.bz2]
