[ https://issues.apache.org/jira/browse/PROTON-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489339#comment-16489339 ]

ASF GitHub Bot commented on PROTON-1842:
----------------------------------------

GitHub user cliffjansen opened a pull request:

    https://github.com/apache/qpid-proton/pull/145

    Potential chained epoll impl for PROTON-1842

    Removes temp fix, adds chained/secondary epollfd_2.
    
    Still has a failure in the helgrind threaderciser racecheck. I am still unsure whether this is a real problem or a false positive.
    
    The previous temporary fix has the same threaderciser error, so this is at least not a regression.
    
    I have not been able to detect any change in performance with my various 
epoll proactor load tests.
    
    Instrumented runs show that secondary/chained arming occurs about 0.1% of the time in the proton ctest suite. It is closer to 0.001% in dispatch router runs under heavy load, but that figure may be less indicative of the overall frequency.
    
    Chaining is unlikely to occur if socket output does not fill the kernel buffer (i.e. writes never return EWOULDBLOCK), or if a write event mask is needed as a result of input processing (say, a flow event) and EPOLLIN must be armed along with EPOLLOUT.
    
    Chaining *is* likely to occur if the socket is quiet (i.e. no recent output and waiting on EPOLLIN) and the pconnection then receives a pn_connection_wake that produces enough output to fill the kernel output buffer.
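To make the conditions above concrete, here is a hypothetical decision function that encodes only what these paragraphs state. It is not proton's epoll.c logic, and the names (`choose_arming`, `arm_target`, and the three flags) are invented for illustration. It assumes the socket is registered with EPOLLONESHOT. It also assumes, as our reading of the commit message rather than a confirmed detail, that re-arming a registration that is still armed in the primary epollfd would break the 1-1 count between registrations and callbacks.

```c
#include <stdbool.h>
#include <sys/epoll.h>

/* Hypothetical outcome: where (if anywhere) the socket should be armed. */
typedef enum { ARM_NOTHING, ARM_PRIMARY, ARM_SECONDARY } arm_target;

/*
 * primary_armed: socket is still armed (one-shot) in the primary epollfd,
 *                e.g. a quiet connection waiting on EPOLLIN.
 * write_blocked: the last write hit EWOULDBLOCK/EAGAIN, so EPOLLOUT is needed.
 * want_in:       the connection still wants input notifications.
 * *mask receives the events to arm (0 when nothing needs arming).
 */
arm_target choose_arming(bool primary_armed, bool write_blocked, bool want_in,
                         unsigned *mask) {
    *mask = 0;
    if (!write_blocked) {
        /* Kernel buffer not full: no EPOLLOUT needed, so no chaining. */
        if (!primary_armed && want_in) {
            *mask = EPOLLIN;
            return ARM_PRIMARY;
        }
        return ARM_NOTHING;
    }
    if (!primary_armed) {
        /* The primary arming was already consumed (e.g. while handling
         * input), so arm it once with both directions. */
        *mask = EPOLLOUT | (want_in ? EPOLLIN : 0);
        return ARM_PRIMARY;
    }
    /* Quiet socket still armed for EPOLLIN in the primary, and a wake
     * produced enough output to hit EWOULDBLOCK: arm EPOLLOUT on the
     * secondary epollfd instead of touching the primary registration. */
    *mask = EPOLLOUT;
    return ARM_SECONDARY;
}
```

Only the last branch chains. That matches the observation that chaining is rare in practice: it requires a quiet connection, a wake, and a full kernel send buffer all at once.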

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cliffjansen/qpid-proton p1842_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/qpid-proton/pull/145.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #145
    
----
commit 1f43f37149e56180424ba0434a98d812ed7ebfb7
Author: Cliff Jansen <cjansen@...>
Date:   2018-05-23T21:45:56Z

    PROTON-1842: revert 79d9019 temporary mitigation for follow-on long term fix

commit ca5c1b55f58aecab5be3c19e14de1d799f92307d
Author: Cliff Jansen <cjansen@...>
Date:   2018-05-24T15:46:37Z

    PROTON-1842: epoll proactor - add secondary/chained epollfd to maintain 1-1 
count between epoll registrations and eventual callbacks on pconnections
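A minimal, Linux-only sketch of the general chaining technique follows. It is not the code from this pull request, which names its secondary descriptor epollfd_2 and will differ in detail. Here, a secondary epoll fd is registered (one-shot, EPOLLIN) inside the primary epoll fd. That lets an EPOLLOUT interest be armed on the secondary without modifying a socket registration that is already armed for EPOLLIN in the primary. The tag values, the helper names, and the socketpair standing in for a pconnection socket are all hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical tags identifying what each epoll event refers to. */
enum { TAG_SOCK_IN = 1, TAG_SOCK_OUT = 2, TAG_SECONDARY = 3 };

/* Register fd in epfd for one-shot delivery of `events`, tagged with `tag`. */
static int arm_oneshot(int epfd, int fd, uint32_t events, uint64_t tag) {
    struct epoll_event ev;
    ev.events = events | EPOLLONESHOT;
    ev.data.u64 = tag;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait on the primary; if it reports the secondary as readable, fetch the
 * chained event from the secondary without blocking.
 * Returns the event tag, 0 on timeout, -1 on error. */
static long wait_one(int primary, int secondary, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(primary, &ev, 1, timeout_ms);
    if (n <= 0) return n;
    if (ev.data.u64 != TAG_SECONDARY) return (long)ev.data.u64;
    n = epoll_wait(secondary, &ev, 1, 0);
    return n == 1 ? (long)ev.data.u64 : -1;
}

/* Fills out[0..2] with the tags seen in three successive waits.
 * Returns 0 on success, -1 on any setup or I/O error. */
int chained_epoll_demo(long out[3]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    int primary = epoll_create1(0);
    int secondary = epoll_create1(0);
    int rc = -1;
    if (primary < 0 || secondary < 0) goto done;
    /* Quiet connection: the primary is armed for input only. */
    if (arm_oneshot(primary, sv[0], EPOLLIN, TAG_SOCK_IN) < 0) goto done;
    /* Output is now needed: arm EPOLLOUT on the secondary rather than
     * modifying the registration already armed in the primary. */
    if (arm_oneshot(secondary, sv[0], EPOLLOUT, TAG_SOCK_OUT) < 0) goto done;
    /* Chain the secondary into the primary. */
    if (arm_oneshot(primary, secondary, EPOLLIN, TAG_SECONDARY) < 0) goto done;

    out[0] = wait_one(primary, secondary, 1000); /* chained EPOLLOUT */
    out[1] = wait_one(primary, secondary, 0);    /* nothing else pending */
    if (write(sv[1], "x", 1) != 1) goto done;
    out[2] = wait_one(primary, secondary, 1000); /* original EPOLLIN */
    rc = 0;
done:
    if (primary >= 0) close(primary);
    if (secondary >= 0) close(secondary);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Called from a `main`, `chained_epoll_demo(out)` should return 0 and leave `out` as {2, 0, 1}. The EPOLLOUT arrives through the secondary. The second wait times out because both one-shot registrations on the output path are now disarmed. The primary's original EPOLLIN registration still fires exactly once when data arrives. In other words, each arming yields one callback.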

----


> [c] Dispatch/Proton crashes when opening/closing connections
> ------------------------------------------------------------
>
>                 Key: PROTON-1842
>                 URL: https://issues.apache.org/jira/browse/PROTON-1842
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: proton-c-0.22.0
>            Reporter: Chuck Rolke
>            Assignee: Alan Conway
>            Priority: Major
>         Attachments: helloworld.cpp, race.tsan, race.vg
>
>
> Using Proton C++ example code modified to open and close connections by the
> thousands in the main loop, and having the event loop short-circuit any
> messaging with:
> {{  void on_connection_open(proton::connection& c) {}}
> {{      c.close();}}
> {{  }}}
> and then directing this client example at a Dispatch Router 1.1.0, we
> eventually (after 100,000 to 1,000,000 connection open/close cycles) see the
> router crash with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:466: 
> wake_pop_front: Assertion `p->wakes_in_progress' failed.}}
> and with:
> {{qdrouterd: /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014: 
> proactor_do_epoll: Assertion `ee->type == PCONNECTION_TIMER' failed.}}
> This issue appears to happen only when qpid-dispatch accepts the open/close
> event stream. The Proton C++ example _server_direct_ and the C example _direct_
> work properly with the same open/close event stream, even as it mounts into
> the tens of millions of connections.
> A core dump backtrace for the PCONNECTION_TIMER failure reads:
> {{(gdb) bt}}
> {{#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51}}
> {{#1  0x00007f795c712c41 in __GI_abort () at abort.c:79}}
> {{#2  0x00007f795c709f7a in __assert_fail_base (fmt=0x7f795c85a260 
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
> assertion=assertion@entry=0x7f795d72e15a "ee->type == PCONNECTION_TIMER", }}
> {{    file=file@entry=0x7f795d72de98 
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=line@entry=2014, }}
> {{    function=function@entry=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> 
> "proactor_do_epoll") at assert.c:92}}
> {{#3  0x00007f795c709ff2 in __GI___assert_fail (assertion=0x7f795d72e15a 
> "ee->type == PCONNECTION_TIMER", file=0x7f795d72de98 
> "/home/chug/git/qpid-proton/c/src/proactor/epoll.c", line=2014, }}
> {{    function=0x7f795d72e320 <__PRETTY_FUNCTION__.6307> "proactor_do_epoll") 
> at assert.c:101}}
> {{#4  0x00007f795d72d29f in proactor_do_epoll (p=0x26b7310, can_block=true) 
> at /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2014}}
> {{#5  0x00007f795d72d30e in pn_proactor_wait (p=0x26b7310) at 
> /home/chug/git/qpid-proton/c/src/proactor/epoll.c:2030}}
> {{#6  0x00007f795dbe89ad in thread_run (arg=0x26be750) at 
> /home/chug/git/qpid-dispatch/src/server.c:946}}
> {{#7  0x00007f795d50e50b in start_thread (arg=0x7f794f486700) at 
> pthread_create.c:465}}
> {{#8  0x00007f795c7d216f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
