On Wed, 2015-02-25 at 09:04 -0500, Michael Goulish wrote:
> Good point!  I'm afraid it will take me the rest of my life
> to reproduce under valgrind .. but ... I'll see what I can do....

Try this in your environment:
 export MALLOC_PERTURB_=66
That makes malloc fill memory with 0x42 bytes (66 decimal) as soon as it
is freed, so when you open the core dump in gdb it is obvious if someone
is using freed memory.

It's not as informative as valgrind but has no performance impact that I
can detect, and it often helps to crash faster and closer to the real
problem. Freed memory can hold valid-seeming values for a while so your
code may not notice immediately, whereas 0x42424242 is rarely valid for
anything.
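
A minimal sketch of how you might double-check the fill byte and limit the
setting to a single run instead of exporting it for the whole session.
perturb_byte is a made-up helper name (not part of glibc), and the sh -c
command is only a stand-in for your own test program:

```shell
# Hypothetical helper: show, in hex, the byte that a given
# MALLOC_PERTURB_ value (0-255) writes into freed memory.
perturb_byte() {
    printf '0x%02x\n' "$1"
}

perturb_byte 66    # prints 0x42

# Set the variable for one command only; replace the sh -c stand-in
# with the program you actually want to run.
MALLOC_PERTURB_=66 sh -c 'echo "MALLOC_PERTURB_ is $MALLOC_PERTURB_"'
```

Setting it per command keeps unrelated programs in the same shell from
running with perturbed memory.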
 
> In the meantime -- I'm not sure what to do with a Jira if the
> provenance is in doubt...

Maybe just put a note on it till we know more.

> 
> 
> ----- Original Message -----
> > This isn't necessarily a proton bug. Nothing in the referenced checkin
> > actually touches the logic around allocating/freeing error strings, it
> > merely causes pn_send/pn_recv to make use of pn_io_t's pn_error_t where
> > previously it threw away the error information. This would suggest that
> > there is perhaps a pre-existing bug in dispatch where it is calling
> > pn_send/pn_recv with a pn_io_t that has been freed, and it is only now
> > triggering due to the additional asserts that are encountered due to not
> > ignoring the error information.
> > 
> > I could be mistaken, but I would try reproducing this under valgrind. That
> > will tell you where the first free occurred and that should hopefully make
> > it obvious whether this is indeed a proton bug or whether dispatch is
> > somehow freeing the pn_io_t sooner than it should.
> > 
> > (FWIW, if it is indeed a proton bug, then I would agree it is a blocker.)
> > 
> > --Rafael
> > 
> > On Wed, Feb 25, 2015 at 7:54 AM, Michael Goulish <mgoul...@redhat.com>
> > wrote:
> > 
> > > ...but if not, somebody please feel free to correct me.
> > >
> > > The Jira that I just created -- PROTON-826 -- is for a
> > > bug I found with my topology testing of the Dispatch Router,
> > > in which I repeatedly kill and restart a router and make
> > > sure that the router network comes back to the same topology
> > > that it had before.
> > >
> > > As of checkin 01cb00c -- which had no Jira -- it is pretty
> > > easy for my test to blow core.  It looks like an error
> > > string is being double-freed (maybe) in the proton library.
> > >
> > > ( full info in the Jira.  https://issues.apache.org/jira/browse/PROTON-826
> > > )
> > >
> > >
> > >
> > 

