[ https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307412#comment-14307412 ]

ASF subversion and git services commented on DISPATCH-106:
----------------------------------------------------------

Commit 1657604 from [~mgoulish] in branch 'dispatch/trunk'
[ https://svn.apache.org/r1657604 ]

DISPATCH-106 : pn link corruption after router restart

Events must be processed one more time on a dead connector,
because processing those events is what causes the stale links
to be cleaned up.
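
A minimal sketch of the idea described above, not the actual change in
r1657604. Every name here (sketch_connector_t, sketch_process_events,
sketch_teardown, sketch_dead_connector) is hypothetical and stands in
for the real Dispatch and Proton types. It assumes that link cleanup
happens only as a side effect of event processing, so skipping the last
pass on a closed connector leaves link records behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real Dispatch or Proton structures. */
#define SKETCH_MAX_LINKS 8

typedef struct {
    bool closed;                        /* connector is dead              */
    int  pending[SKETCH_MAX_LINKS];     /* link ids with a queued event   */
    int  n_pending;                     /* number of queued events        */
    bool registered[SKETCH_MAX_LINKS];  /* router still tracks this link  */
} sketch_connector_t;

/* Handle every queued event; each one drops one link record. */
static int sketch_process_events(sketch_connector_t *c)
{
    int handled = 0;
    for (int i = 0; i < c->n_pending; i++) {
        c->registered[c->pending[i]] = false;
        handled++;
    }
    c->n_pending = 0;
    return handled;
}

/* Tear down a dead connector. With final_pass set, events are processed
 * one more time first. Returns the number of stale link records left. */
static int sketch_teardown(sketch_connector_t *c, bool final_pass)
{
    assert(c != NULL && c->closed);
    if (final_pass)
        sketch_process_events(c);

    int stale = 0;
    for (int i = 0; i < SKETCH_MAX_LINKS; i++)
        if (c->registered[i])
            stale++;
    return stale;
}

/* Build a dead connector with two links whose close events are queued. */
static sketch_connector_t sketch_dead_connector(void)
{
    sketch_connector_t c = {0};
    c.closed = true;
    c.registered[1] = true;
    c.registered[3] = true;
    c.pending[0] = 1;
    c.pending[1] = 3;
    c.n_pending = 2;
    return c;
}
```

In this model, tearing down without the final pass leaves both link
records registered, while tearing down with it leaves none.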

> pn link corruption after router restart
> ---------------------------------------
>
>                 Key: DISPATCH-106
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-106
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Router Node
>    Affects Versions: 0.3
>            Reporter: michael goulish
>             Fix For: 0.4
>
>
> With the standard 6-node demo network (A-D, X, Y), after killing and 
> restarting node Y, I see a bad link on router D, which causes D to crash.
> Here is the sequence of events from the logs of the routers and the
> topologist testing program:
>   01:05:05.367 Killing router Y, pid 20074
>   01:05:05.367 Sleeping 30 seconds
>   01:05:35.367 Restarting router Y, pid 20120
>   01:05:38     Router D : last "valid origins" post to its log file :
>                Node QDR.C valid origins: []
>   01:05:46     Router D posts to its log file:
>                Exited Router Flux Mode
>   01:06:05.368 checking for crash after node bounce
>                ( no crash detected )
>   01:06:17     last post to router D log file
>                ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 ls_seq=2 mobile_seq=0)
>   01:06:35.369 second check for crash. (none detected)
>   01:06:35.370 getting topology
>                ( Node D fails to respond.  PID 20072 )
>                ( core file, timestamped 01:06 )
>   Here is the backtrace from router D's core file:
>   {
>     #0  pn_string_get (string=0xfdfdfdfdbabecafe) at /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
>     #1  0x00007ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at /home/mick/dispatch/src/router_agent.c:112
>     #2  0x00007ff73fa8e7dd in qd_entity_refresh_router_link (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
>         at /home/mick/dispatch/src/router_agent.c:120
>     #3  0x0000003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
>     #4  0x0000003e408056bc in ffi_call () from /lib64/libffi.so.6
>     #5  0x00007ff737d2dc8b in _ctypes_callproc () from /usr/lib64/python2.7/lib-dynload/_ctypes.so
>     #6  0x00007ff737d27a85 in PyCFuncPtr_call () from /usr/lib64/python2.7/lib-dynload/_ctypes.so
>     #7  0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
>     #8  0x00000036df4de37c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>     #9  0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>     #10 0x00000036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>     #11 0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>     #12 0x00000036df4e088f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>     #13 0x00000036df4e21dd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>     #14 0x00000036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
>     #15 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
>     #16 0x00000036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
>     #17 0x00000036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
>     #18 0x00000036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
>     #19 0x00000036df44a29e in PyObject_CallFunction () from /lib64/libpython2.7.so.1.0
>     #20 0x00007ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, msg=0x7ff728019bd0, link_id=0
>         at /home/mick/dispatch/src/python_embedded.c:519
>     #21 0x00007ff73fa92533 in router_rx_handler (context=0x1db5fd0, link=0x7ff730008710, delivery=0x7ff73004cc50)
>         at /home/mick/dispatch/src/router_node.c:922
>     #22 0x00007ff73fa7fa16 in do_receive (pnd=0x1e359a0) at /home/mick/dispatch/src/container.c:221
>     #23 0x00007ff73fa7fea3 in process_handler (container=0x1dbd6f0, unused=0x1e0a050, qd_conn=0x1e2c6a0)
>         at /home/mick/dispatch/src/container.c:362
>     #24 0x00007ff73fa80135 in handler (handler_context=0x1dbd6f0, conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS,
>         qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438
>     #25 0x00007ff73fa98346 in process_connector (qd_server=0x1d78460, cxtr=0x1e1b9b0)
>         at /home/mick/dispatch/src/server.c:322
>     #26 0x00007ff73fa98c1f in thread_run (arg=0x1d70d30) at /home/mick/dispatch/src/server.c:546
>     #27 0x0000003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
> ...
> }
>   Let's go up to qd_router_link_name
>   at /home/mick/dispatch/src/router_agent.c:112
>   (gdb) print * link
>         $1 =
>         {
>           prev = 0x7ff72800b210,
>           next = 0x7ff72800b390,
>           mask_bit = 3,
>           link_type = QD_LINK_ROUTER,
>           link_direction = QD_OUTGOING,
>           owning_addr = 0x1d7d6c0,
>           waypoint = 0x0,
>           link = 0x7ff7280099d0,
>           connected_link = 0x0,
>           ref = 0x7ff72800f350,
>           target = 0x0,
>           event_fifo =
>           {
>             head = 0x0,
>             tail = 0x0,
>             scratch = 0x0,
>             size = 0
>           },
>           msg_fifo =
>           {
>             head = 0x7ff73003c230,
>             tail = 0x7ff73003bb70,
>             scratch = 0x7ff73003b9f0,
>             size = 102
>           }
>         }
>   (gdb) print * (link->link)
>         $2 =
>         {
>           pn_sess = 0x7ff72804b7b0,
>           pn_link = 0x7ff72804d6a0,
>           context = 0x7ff72800b2d0,
>           node = 0x1db6bb0,
>           drain_mode = false
>         }
>   (gdb) print * (link->link->pn_link)
> $3 = {
>   endpoint = {
>     type = 33686018,
>     state = 33686018,
>     error = 0x202020202020202,
>     condition = {
>       name = 0x202020202020202,
>       description = 0x202020202020202,
>       info = 0x202020202020202
>     },
>     remote_condition = {
>       name = 0x202020202020202,
>       description = 0x202020202020202,
>       info = 0x202020202020202
>     },
>     endpoint_next = 0x202020202020202,
>     endpoint_prev = 0x202020202020202,
>     transport_next = 0x202020202020202,
>     transport_prev = 0x202020202020202,
>     modified = 2,
>     freed = 2,
>     posted_final = 2
>   },
>   source = {
>     address = 0x202020202020202,
>     properties = 0x202020202020202,
>     capabilities = 0x202020202020202,
>     outcomes = 0x202020202020202,
>     filter = 0x202020202020202,
>     durability = (PN_DELIVERIES | unknown: 33686016),
>     expiry_policy = 33686018,
>     timeout = 33686018,
>     type = 33686018,
>     distribution_mode = (PN_DIST_MODE_MOVE | unknown: 33686016),
>     dynamic = 2
>   },
>   target = {
>     address = 0x202020202020202,
>     properties = 0x202020202020202,
>     capabilities = 0x202020202020202,
>     outcomes = 0x202020202020202,
>     filter = 0x202020202020202,
>     durability = (PN_DELIVERIES | unknown: 33686016),
>     expiry_policy = 33686018,
> ( etc.  -- it's all garbage. )
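
A note on reading the dump above: 33686018 is 0x02020202, and the
pointer fields read 0x0202020202020202, so the pn_link contents shown
are the single byte 0x02 repeated. That is consistent with the struct
having been freed or overwritten while the router still held a pointer
to it, although the report itself does not say what wrote the pattern.
The helpers below are a hypothetical sketch (the names
sketch_is_fill_pattern32 and sketch_is_fill_pattern64 are not part of
Dispatch or Proton) for checking whether a value is one repeated byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when all four bytes of v are identical, e.g. 0x02020202. */
static bool sketch_is_fill_pattern32(uint32_t v)
{
    uint32_t b = v & 0xffu;
    return v == b * UINT32_C(0x01010101);
}

/* True when all eight bytes of v are identical,
 * e.g. 0x0202020202020202. */
static bool sketch_is_fill_pattern64(uint64_t v)
{
    uint64_t b = v & 0xffu;
    return v == b * UINT64_C(0x0101010101010101);
}
```

Note that zero also counts as a repeated byte under this check, so a
null pointer would match; a real diagnostic would want to exclude it.
A value such as 0x7ff72804d6a0 (the pn_link address in the dump) does
not match.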



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)