[ https://issues.apache.org/jira/browse/PROTON-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541496#comment-17541496 ]

Clifford Jansen commented on PROTON-2543:
-----------------------------------------

Thank you for the bug report and suggested patch.

Unfortunately your suggested fix targets the symptom you are seeing but not the 
underlying problem.

It should never be possible that p->resched_cutoff is non-null while 
p->polled_resched_count is zero, so your code should have no effect.  Yet we 
know it does.

The patch allows the proactor to keep running even though one of its critical 
scheduling lists is in an undefined state.  This could lead to crashes or hangs 
even further removed from the actual problem.

Have you tried running your reproducer with a "Debug" CMake build?  There are 
several asserts in the code that might catch the broken list earlier or point 
us closer to a good place to look.

Alternatively, can your reproducer be pared down and shared in this JIRA?

Otherwise, is it possible for you to trigger the bug using rr?  In the crash 
analysis it should be possible to find the point at which the list loses 
its integrity, somewhere between the most recent poller_do_epoll() and a 
subsequent resched_pop_front().

> Crash in epoll.c resched_pop_front
> ----------------------------------
>
>                 Key: PROTON-2543
>                 URL: https://issues.apache.org/jira/browse/PROTON-2543
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>            Reporter: Fredrik Hallenberg
>            Assignee: Clifford Jansen
>            Priority: Major
>         Attachments: qpid-epoll-crash.patch
>
>
> During stress testing it is fairly easy to reproduce a segfault in 
> resched_pop_front. Using gdb it is easy to see that polled_resched_front can 
> be zero when entering this function which causes the value to wrap and then a 
> crash in later calls.
> polled_resched_front is not checked before this function is called in one 
> instance; the trivial fix, checking this value, is in the attached patch and 
> seems to work.
> Tested with Qpid Proton C++ 0.37.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

