[jira] [Created] (DISPATCH-2255) Investigate enable_mask for removal of malloc

2021-10-01 Thread michael goulish (Jira)
michael goulish created DISPATCH-2255:
-

 Summary: Investigate enable_mask for removal of malloc
 Key: DISPATCH-2255
 URL: https://issues.apache.org/jira/browse/DISPATCH-2255
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish


Find out how often enable_mask() is called in log.c.

See if it would be practical to remove the malloc() and free() in it.
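
A minimal sketch of the direction this suggests, assuming enable_mask()
currently builds a temporary level-name string with malloc()/free() on each
call (the real signature in log.c may differ). A fixed-size stack buffer
takes the allocator off this path entirely:

{noformat}
/* Hypothetical sketch only -- enable_mask()'s real signature in log.c is
 * not reproduced here.  If the function currently does
 * "char *buf = malloc(n); ... free(buf);" per call, a stack buffer avoids
 * the allocator on what may be a hot logging path. */
#include <stdio.h>
#include <string.h>

#define MASK_BUF_SIZE 128

static const char *level_names[] = {"trace", "debug", "info", "warning",
                                    "error"};

/* Build a "+"-separated list of enabled level names into a stack buffer. */
static void format_enable_mask(unsigned mask, char *buf, size_t size)
{
    buf[0] = '\0';
    for (unsigned i = 0; i < sizeof level_names / sizeof level_names[0]; i++) {
        if (mask & (1u << i)) {
            if (buf[0])
                strncat(buf, "+", size - strlen(buf) - 1);
            strncat(buf, level_names[i], size - strlen(buf) - 1);
        }
    }
}

int main(void)
{
    char buf[MASK_BUF_SIZE];      /* stack allocation: no malloc()/free() */
    format_enable_mask(0x0Au, buf, sizeof buf);
    printf("%s\n", buf);          /* prints "debug+warning" */
    return 0;
}
{noformat}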



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Updated] (DISPATCH-1956) log.c rewrite to reduce locking scope

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated DISPATCH-1956:
--
Summary: log.c rewrite to reduce locking scope  (was: Potential deadlock: 
logging lock vs entity cache lock)

> log.c rewrite to reduce locking scope
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.18.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}
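
The new summary points at the standard remedy: shrink the region where the
log lock is held so that nothing executed under it (such as the pool
allocation in frames #4-#6 of the first stack, which reaches the
entity-cache lock) can acquire another mutex. A sketch of that shape, with
hypothetical names rather than the actual log.c code:

{noformat}
/* Sketch of "reduce locking scope" -- hypothetical names, not the actual
 * log.c code.  All formatting (and anything that might allocate, and so
 * reach the entity-cache lock) happens before the log lock is taken, so
 * the log lock is never held while another mutex is acquired. */
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char log_sink[512];

void log_write(const char *fmt, ...)
{
    char local[512];
    va_list ap;

    va_start(ap, fmt);                   /* format with no lock held */
    vsnprintf(local, sizeof local, fmt, ap);
    va_end(ap);

    pthread_mutex_lock(&log_lock);       /* lock held only to publish */
    snprintf(log_sink, sizeof log_sink, "%s", local);
    pthread_mutex_unlock(&log_lock);
}

int main(void)
{
    log_write("router %s converged", "A");
    printf("%s\n", log_sink);
    return 0;
}
{noformat}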



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1956:
-

Assignee: michael goulish  (was: Michael Goulish)

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.18.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (DISPATCH-2173) 30-Mesh Behaving Badly

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-2173.
-
Resolution: Won't Fix

It has been pointed out to me that a 30-mesh is not very realistic.

I was forced to admit that this was probably true.

> 30-Mesh Behaving Badly
> --
>
> Key: DISPATCH-2173
> URL: https://issues.apache.org/jira/browse/DISPATCH-2173
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>        Reporter: michael goulish
>    Assignee: michael goulish
>Priority: Major
>
> While testing scale-up of full-mesh networks I encountered some Bad Behavior 
> at 30 nodes. (435 connections.)
> On my first try, 15 of the routers died.
> On my second try, no nodes died – but the network never converged. It 
> consumed all available CPU (32 cores) for three minutes, and the 30 routers 
> printed a combined total of more than 1000 radius calculations to their logs 
> by the time I became wrathful and cast them all into the Bitbucket of Woe.
>  
> For reference, those radius calculations are how I decide that the network 
> has converged – everybody has settled down and agreed on the topology and 
> stopped talking about it. The last thing each router prints to its log is a 
> radius calculation, and then it's done. This may happen multiple times for 
> each router, but when the total number of such prints stops changing – the 
> network has converged.
>  
> For 15 or 20 routers, the number of such prints was 20 or 40 or so. When this 
> test exceeded that by 25x, I decided it was never going to quit.
>  
> ...Now looking at the logs to see if I can figure out what was happening...
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2252) Document router shutdown process

2021-09-23 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419066#comment-17419066
 ] 

michael goulish commented on DISPATCH-2252:
---

...And if, along the way, I see anything that clearly needs improvement or 
investigation, I will file a Jira for that too.

> Document router shutdown process
> 
>
> Key: DISPATCH-2252
> URL: https://issues.apache.org/jira/browse/DISPATCH-2252
> Project: Qpid Dispatch
>  Issue Type: Improvement
>    Reporter: michael goulish
>        Assignee: michael goulish
>Priority: Minor
>
> Investigate the router shutdown process in detail, and produce a document in 
> the docs directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-2252) Document router shutdown process

2021-09-23 Thread michael goulish (Jira)
michael goulish created DISPATCH-2252:
-

 Summary: Document router shutdown process
 Key: DISPATCH-2252
 URL: https://issues.apache.org/jira/browse/DISPATCH-2252
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish


Investigate the router shutdown process in detail, and produce a document in 
the docs directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-2173) 30-Mesh Behaving Badly

2021-06-15 Thread michael goulish (Jira)
michael goulish created DISPATCH-2173:
-

 Summary: 30-Mesh Behaving Badly
 Key: DISPATCH-2173
 URL: https://issues.apache.org/jira/browse/DISPATCH-2173
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Reporter: michael goulish
Assignee: michael goulish


While testing scale-up of full-mesh networks I encountered some Bad Behavior at 
30 nodes. (435 connections.)

On my first try, 15 of the routers died.

On my second try, no nodes died – but the network never converged. It consumed 
all available CPU (32 cores) for three minutes, and the 30 routers printed a 
combined total of more than 1000 radius calculations to their logs by the time 
I became wrathful and cast them all into the Bitbucket of Woe.

 

For reference, those radius calculations are how I decide that the network has 
converged – everybody has settled down and agreed on the topology and stopped 
talking about it. The last thing each router prints to its log is a radius 
calculation, and then it's done. This may happen multiple times for each 
router, but when the total number of such prints stops changing – the network 
has converged.
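
A toy model of that stopping rule (hypothetical helper, not router code):
poll the combined count of radius-calculation log lines and declare
convergence once the count has held steady for a few consecutive polls.

{noformat}
/* Toy model of the convergence heuristic described above -- hypothetical
 * code, not from the router.  The network is declared converged when the
 * combined radius-calculation print count stops changing. */
#include <stdio.h>

#define STABLE_POLLS_REQUIRED 3

static int is_converged(int prev_total, int cur_total, int *stable_polls)
{
    if (cur_total == prev_total)
        (*stable_polls)++;        /* count unchanged: one more stable poll */
    else
        *stable_polls = 0;        /* still changing: start over */
    return *stable_polls >= STABLE_POLLS_REQUIRED;
}

int main(void)
{
    int totals[] = {12, 20, 33, 40, 40, 40, 40};   /* sample poll results */
    int prev = -1, stable = 0;
    for (int i = 0; i < 7; i++) {
        if (is_converged(prev, totals[i], &stable))
            printf("converged at poll %d (total %d)\n", i, totals[i]);
        prev = totals[i];
    }
    return 0;
}
{noformat}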

 

For 15 or 20 routers, the number of such prints was 20 or 40 or so. When this 
test exceeded that by 25x, I decided it was never going to quit.

 

...Now looking at the logs to see if I can figure out what was happening...

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-2122) Data race on alloc pool descriptor initialization

2021-06-14 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-2122:
-

Assignee: michael goulish  (was: Ken Giusti)

> Data race on alloc pool descriptor initialization
> -
>
> Key: DISPATCH-2122
> URL: https://issues.apache.org/jira/browse/DISPATCH-2122
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.16.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: race-condition, tsan
> Fix For: 1.17.0
>
>
> 65: WARNING: ThreadSanitizer: data race (pid=566240) 
> 65: Read of size 4 at 0x7f67599ae2c0 by thread T4: 
> 65: #0 qd_alloc 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:324 
> (libqpid-dispatch.so+0x6a1f2) 
> 65: #1 new_qd_link_ref_t 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 
> (libqpid-dispatch.so+0x79ae5) 
> 65: #2 qdr_node_connect_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 
> (libqpid-dispatch.so+0x121a78) 
> 65: #3 CORE_link_deliver 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 
> (libqpid-dispatch.so+0x127f1c) 
> 65: #4 qdr_link_process_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 
> (libqpid-dispatch.so+0x1045c6) 
> 65: #5 CORE_link_push 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 
> (libqpid-dispatch.so+0x127d00) 
> 65: #6 qdr_connection_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 
> (libqpid-dispatch.so+0xc4bec) 
> 65: #7 AMQP_writable_conn_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 
> (libqpid-dispatch.so+0x122d42) 
> 65: #8 writable_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 
> (libqpid-dispatch.so+0x7b2e2) 
> 65: #9 qd_container_handle_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 
> (libqpid-dispatch.so+0x7cfd5) 
> 65: #10 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 
> (libqpid-dispatch.so+0x130537) 
> 65: #11 thread_run 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 
> (libqpid-dispatch.so+0x13063a) 
> 65: #12 _thread_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 
> (libqpid-dispatch.so+0xad37a) 
> 65: #13   (libtsan.so.0+0x2d33f) 
> 65: 
> 65: Previous write of size 4 at 0x7f67599ae2c0 by thread T2 (mutexes: write 
> M10): 
> 65: #0 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:307 
> (libqpid-dispatch.so+0x6a14b) 
> 65: #1 qd_alloc 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:325 
> (libqpid-dispatch.so+0x6a20b) 
> 65: #2 new_qd_link_ref_t 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 
> (libqpid-dispatch.so+0x79ae5) 
> 65: #3 qdr_node_connect_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 
> (libqpid-dispatch.so+0x121a78) 
> 65: #4 CORE_link_deliver 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 
> (libqpid-dispatch.so+0x127f1c) 
> 65: #5 qdr_link_process_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 
> (libqpid-dispatch.so+0x1045c6) 
> 65: #6 CORE_link_push 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 
> (libqpid-dispatch.so+0x127d00) 
> 65: #7 qdr_connection_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 
> (libqpid-dispatch.so+0xc4bec) 
> 65: #8 AMQP_writable_conn_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 
> (libqpid-dispatch.so+0x122d42) 
> 65: #9 writable_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 
> (libqpid-dispatch.so+0x7b2e2) 
> 65: #10 qd_container_handle_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 
> (libqpid-dispatch.so+0x7cfd5) 
> 65: #11 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 
> (libqpid-dispatch.so+0x130537) 
> 65: #12 thread_run 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 
> (libqpid-dispatch.so+0x13063a) 
> 65: #13 _thread_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 
> (libqpid-dispatch.so+0xad37a) 
> 65: #14   (libtsan.so.0+0x2d33f)
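
The two stacks show a classic lazy-initialization race: both threads reach
the descriptor's one-time setup inside qd_alloc_init. One standard remedy,
sketched here with hypothetical names (the project's actual fix may differ),
is to funnel the initialization through pthread_once:

{noformat}
/* Sketch of a standard lazy-init remedy -- hypothetical names, not the
 * actual alloc_pool.c fix.  pthread_once runs desc_init exactly once and
 * gives every later caller a happens-before edge on the descriptor. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    size_t object_size;
} pool_descriptor_t;

static pool_descriptor_t desc;
static pthread_once_t    desc_once = PTHREAD_ONCE_INIT;

static void desc_init(void)
{
    desc.object_size = 64;        /* one-time setup, now race-free */
}

void *pool_alloc(void)
{
    pthread_once(&desc_once, desc_init);    /* safe from any thread */
    return malloc(desc.object_size);
}

int main(void)
{
    void *p = pool_alloc();
    printf("allocated %zu bytes\n", desc.object_size);
    free(p);
    return 0;
}
{noformat}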



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357425#comment-17357425
 ] 

michael goulish commented on DISPATCH-1956:
---

I meant to close my *PR*.  Cripes.

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1956.
-
Resolution: Fixed

Closing this one in favor of a better one coming shortly.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Reopened] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reopened DISPATCH-1956:
---

No, wait.  I didn't mean it to say 'fixed'.  Dang.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357401#comment-17357401
 ] 

michael goulish commented on DISPATCH-1956:
---

Hold on – I think I have a much better solution to this.  Need another hour or 
two...


> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-03 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356529#comment-17356529
 ] 

michael goulish commented on DISPATCH-1956:
---

This might be an improvement in code logic, but it will introduce changes in 
behavior that are not relevant to this PR.  Indeed – when I tried it, I got a 
test failure. 

Any code clean-up like this suggestion should be pursued as part of a separate 
PR just for that purpose. And then we can fix whatever issues it may introduce 
as part of that PR.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-24 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350564#comment-17350564
 ] 

michael goulish commented on DISPATCH-1956:
---

Using Ken's reproducer, I cannot see exactly the same backtrace from latest 
master. But I see many reports of a similar cycle, so I will pick one of those 
and proceed.

 

Here it is:

65: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
65: Cycle in lock order graph: M11
65: 
65: Mutex M9 acquired here while holding mutex M11 in main thread:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 push_event src/entity_cache.c:61 
65: #3 qd_entity_cache_add src/entity_cache.c:67
65: #4 qd_log_source_lh src/log.c:373
65: #5 qd_log_source_lh src/log.c:362
65: #6 qd_log_source src/log.c:381 
65: #7 qd_log_initialize src/log.c:516
65: #8 qd_dispatch src/dispatch.c:90 
65: #9 main_process router/src/main.c:92
65: #10 main router/src/main.c:369
65: 
65: Mutex M11 previously acquired by the same thread here:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_log_source src/log.c:380 
65: #3 qd_log_initialize src/log.c:516
65: #4 qd_dispatch src/dispatch.c:90 
65: #5 main_process router/src/main.c:92
65: #6 main router/src/main.c:369
65: 
65: Mutex M11 acquired here while holding mutex M9 in main thread:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_vlog_impl src/log.c:436
65: #3 qd_log_impl src/log.c:462 
65: #4 qd_python_log src/python_embedded.c:545
65: #5   
65: #6 main_process router/src/main.c:97
65: #7 main router/src/main.c:369
65: 
65: Mutex M9 previously acquired by the same thread here:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57 
65: #2 qd_entity_refresh_begin src/entity_cache.c:78
65: #3 ffi_call_unix64  
65: #4 main_process router/src/main.c:97
65: #5 main router/src/main.c:369
65:
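
Stripped of the stacks, this is the textbook ABBA inversion: one path takes
the log lock and then the entity-cache lock, the other takes the same two
locks in the opposite order. A minimal illustration with hypothetical names,
not dispatch code:

{noformat}
/* Minimal ABBA illustration of the cycle above -- hypothetical names.
 * Path 1 mirrors qd_log_source -> qd_entity_cache_add (log lock, then
 * entity lock); path 2 mirrors qd_entity_refresh_begin -> qd_vlog_impl
 * (entity lock, then log lock).  Two threads interleaving these can each
 * block forever on the lock the other holds. */
#include <pthread.h>

static pthread_mutex_t log_lock    = PTHREAD_MUTEX_INITIALIZER;  /* "M11" */
static pthread_mutex_t entity_lock = PTHREAD_MUTEX_INITIALIZER;  /* "M9"  */

static void register_log_source(void)    /* log lock -> entity lock */
{
    pthread_mutex_lock(&log_lock);
    pthread_mutex_lock(&entity_lock);
    pthread_mutex_unlock(&entity_lock);
    pthread_mutex_unlock(&log_lock);
}

static void refresh_and_log(void)        /* entity lock -> log lock */
{
    pthread_mutex_lock(&entity_lock);
    pthread_mutex_lock(&log_lock);
    pthread_mutex_unlock(&log_lock);
    pthread_mutex_unlock(&entity_lock);
}

int main(void)
{
    /* Sequential here; running these on two threads concurrently is
     * exactly the schedule TSan warns can deadlock. */
    register_log_source();
    refresh_and_log();
    return 0;
}
{noformat}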

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> 

[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347716#comment-17347716
 ] 

michael goulish commented on DISPATCH-1956:
---

Thanks, Ken, that works!

I was commenting out this:

  #deadlock:qd_vlog_impl

I can see it now.

Tally ho!
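
For anyone following along: tsan.supp (attached to this issue) is a
ThreadSanitizer suppression file, and commenting an entry out re-enables the
matching reports. Roughly, with an illustrative path:

{noformat}
# tsan.supp -- each line suppresses matching reports; a leading '#'
# comments an entry out.  "deadlock:" matches lock-order-inversion reports,
# "race:" matches data-race reports, by function name in the stack.
deadlock:qd_vlog_impl
race:qd_log_entity

# Point TSan at the file when running the tests (path is illustrative):
#   TSAN_OPTIONS="suppressions=$PWD/tsan.supp" ctest -VV
{noformat}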

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347377#comment-17347377
 ] 

michael goulish commented on DISPATCH-1956:
---

I will try the QE technique, and something I haven't tried before ... running 
multiple ctests at once! Yow!

Except we're never going to establish the original frequency. It is unknowable. 
Imponderable. Ineffable.

SO I will run the test enough times to support a proof-by-vigorous-handwaving!

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347365#comment-17347365
 ] 

michael goulish commented on DISPATCH-1956:
---

I am trying to restore my 'mgoulish' RH account, but I need help from someone 
with magical powers.

 

I assumed that if I ran ctest, that would be sufficient. But now that you 
inform me that TSan issues do not reliably manifest, I will run ctest more 
times and see if I can get it to show itself.

But if we don't know how it was observed, nor with what frequency, how will 
we know when it is fixed?


> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347330#comment-17347330
 ] 

michael goulish commented on DISPATCH-1956:
---

As of recent code on master, this is gone.

If I unsuppress the following issues:

{{   #race:qd_vlog_impl}}
{{   #deadlock:qd_vlog_impl}}
{{   #race:qd_log_entity}}

...and then run {{ctest -VV}}, I get 2676 mentions of the qd_vlog_impl 
race (yikes!), 6 mentions of the qd_log_entity race, and 0 mentions of this 
qd_vlog_impl deadlock.

 

I guess this should be closed, but I do not seem to have permission to close it.

I will try to get my account fixed.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-29 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-2088.
-
Resolution: Fixed

Fixed by Chuck's PR:

https://github.com/apache/qpid-dispatch/pull/1174
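
For the record, the backtrace below shows qd_buffer_dec_fanout entered with
buf=0x0, so the decrement touches a near-NULL address (ref=0x14 is just the
fanout member's offset from a NULL base). The sketch here shows that failure
shape and the obvious guard; it is hypothetical and may not match what PR 1174
actually changed:

{noformat}
/* Hypothetical sketch, not the PR 1174 fix.  With buf == NULL,
 * &buf->fanout is a bogus near-zero address (0x14 with this layout),
 * and the atomic decrement faults. */
#include <stdatomic.h>

typedef struct {
    char        pad[20];    /* illustrative: places fanout at offset 0x14 */
    atomic_uint fanout;
} buffer_t;

static void buffer_dec_fanout(buffer_t *buf)
{
    if (!buf)               /* the backtrace shows buf == 0x0 */
        return;
    atomic_fetch_sub(&buf->fanout, 1);
}
{noformat}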

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>        Reporter: michael goulish
>Assignee: Charles E. Rolke
>Priority: Blocker
> Fix For: 1.16.0
>
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-28 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334743#comment-17334743
 ] 

michael goulish commented on DISPATCH-2088:
---

Here you go!

 

 


(gdb) thread apply all bt

{color:#172b4d}Thread 33{color} (Thread 0x7fa320ff9640 (LWP 53393)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2fb60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

{color:#172b4d}Thread 32{color} (Thread 0x7fa2e8ff9640 (LWP 53408)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2a8000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

{color:#172b4d}Thread 31{color} (Thread 0x7fa2e37fe640 (LWP 53409)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2bc000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7fa30effd640 (LWP 53396)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa30b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7fa30f7fe640 (LWP 53395)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2fc000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7fa30cff9640 (LWP 53400)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2e4000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

*{color:#de350b}Thread 27{color}* (Thread 0x7fa2eb7fe640 (LWP 53403)):
#0 0x7fa343d8350c in send () from /lib64/libpthread.so.0
#1 0x7fa343dbe718 in snd (s=512, b=, fd=25) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:333
#2 pni_raw_write (send=, set_error=, 
sock=, conn=) at 
/home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:566
#3 pni_raw_write (send=, set_error=, sock=25, 
conn=0x7fa2dc129cf0) at 
/home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:554
#4 pni_raw_connection_process (sched_ready=, t=0x7fa2dc129c30) 
at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:388
#5 process (tsk=0x7fa2dc129c30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2230
#6 next_event_batch (p=, can_block=true) at 
/home/mick/latest/qpid-proton/c/src

[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-28 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334707#comment-17334707
 ] 

michael goulish commented on DISPATCH-2088:
---

 

I cannot repro with Debug build.

400 iterations with no failure.

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>        Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334451#comment-17334451
 ] 

michael goulish commented on DISPATCH-2088:
---

I'm afraid only the last few lines have anything in them.

 

 

2021-04-28 01:04:03.818860 -0400 ROUTER_CORE (info) [C190][L379] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818868 -0400 ROUTER_CORE (info) [C191][L380] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818877 -0400 ROUTER_CORE (info) [C191][L381] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818882 -0400 ROUTER_CORE (info) [C192][L382] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818893 -0400 ROUTER_CORE (info) [C192][L383] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818905 -0400 ROUTER_CORE (info) [C193][L384] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818913 -0400 ROUTER_CORE (info) [C193][L385] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818926 -0400 ROUTER_CORE (info) [C194][L386] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818931 -0400 ROUTER_CORE (info) [C194][L387] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818944 -0400 ROUTER_CORE (info) [C195][L388] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818949 -0400 ROUTER_CORE (info) [C195][L389] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818957 -0400 ROUTER_CORE (info) [C196][L390] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819046 -0400 ROUTER_CORE (info) [C196][L391] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819052 -0400 ROUTER_CORE (info) [C197][L392] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819059 -0400 ROUTER_CORE (info) [C197][L393] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819074 -0400 ROUTER_CORE (info) [C198][L394] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819081 -0400 ROUTER_CORE (info) [C198][L395] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819087 -0400 ROUTER_CORE (info) [C199][L396] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819096 -0400 ROUTER_CORE (info) [C199][L397] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:34.431844 -0400 TCP_ADAPTOR (info) [C181] 
PN_RAW_CONNECTION_DISCONNECTED connector
2021-04-28 01:04:34.431903 -0400 TCP_ADAPTOR (info) [C180] EOS
2021-04-28 01:04:34.431956 -0400 ROUTER_CORE (info) [C181][L361] Link lost: 
del=1 presett=1 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no
2021-04-28 01:04:34.432011 -0400 ROUTER_CORE (info) [C181][L360] Link lost: 
del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no
2021-04-28 01:04:34.432026 -0400 ROUTER_CORE (info) [C181] Connection Closed
2021-04-28 01:04:34.432479 -0400 TCP_ADAPTOR (info) [C183] 
PN_RAW_CONNECTION_DISCONNECTED connector
./r_one_router_Br: line 7: 27584 Segmentation fault (core dumped) qdrouterd 
--config ./Br_1.conf

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>        Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughp

[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1780#comment-1780
 ] 

michael goulish commented on DISPATCH-2088:
---

Apparently it helps if you let the code cool down a while.

I tried it again after a break and it crashed immediately – same backtrace.  ( 
And with "-P 10" on the iperf client. )

So that is 2 crashes in 42 attempts.

 

Here is my router config file:

router {
 mode: interior
 id: Br
 workerThreads: 32
 }

tcpListener {
 host: 10.10.10.1
 port: 9090
 address: throughput
 siteId: my-site
 }


 tcpConnector {
 host: 10.10.10.1
 port: 8080
 address: throughput
 siteId: my-site
 }

 

 

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333284#comment-17333284
 ] 

michael goulish commented on DISPATCH-2088:
---

*The iperf commands I used in the test:*

   iperf3 -s -p 8080    # server

   iperf3 -c 10.10.10.1 -p 9090 -t 60 -P 10    # client



( The router's TCP listener was on port 9090, while its TCP connector was on 
8080. )
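
( So the data path under test was:
   iperf3 client -> router tcpListener :9090 -> router tcpConnector -> iperf3 server :8080 )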

 

*Reproducibility:*

 

  Not trivial.

  I reduced test time to 10 seconds and tried 40 more times – without 
reproducing the crash. 10 of those trials were with 100 parallel streams on the 
iperf sender, and 10 were with 200 parallel streams.

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>        Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)
michael goulish created DISPATCH-2088:
-

 Summary: SEGV in qd_buffer_dec_fanout
 Key: DISPATCH-2088
 URL: https://issues.apache.org/jira/browse/DISPATCH-2088
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Protocol Adaptors
Reporter: michael goulish


*code from 2021-04-26-afternoon*

{

  dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
  proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936

}

 

*Test*
 * Doing 1-router TCP throughput testing across high-bandwidth link.
 * Router has 32 worker threads.
 * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
 * Router is sustaining 10+ Gbit/sec during test.
 * SEGV happens at end of test.

 

Here's the backtrace:

 

{color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
{color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
{color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
{color:#de350b}#3 qd_message_stream_data_release 
(stream_data=0x7f01b80038c8){color}
{color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
{color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
(conn=conn@entry=0x7f0218012a88){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
{color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
(tc=0x7f0218012a88){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
{color:#de350b}#6 0x7f023707491d in router_core_thread 
(arg=0x1e6ccb0){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
{color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish edited comment on PROTON-2362 at 4/1/21, 4:49 PM:
--

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

With all actions enabled    crash 10
-no-close-connect           crash 12
-no-listen                  crash 0   hang 2
-no-close-listen            NO PROBLEMS
-no-connect                 NO PROBLEMS
-no-close-connect           crash 10  hang 2
-no-wake                    crash 11
-no-timeout                 crash 11
no-cancel-timeout           crash 12

 

 


was (Author: michaelgoulish):
OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
 {{-no-close-connect    crash 12}}

-no-listen                        crash 0 hang 2
 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
 {color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
 {{-no-close-connect    crash 10 hang 2}}
 {{-no-wake crash 11}}
 {{-no-timeout  crash 11}}
 {{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish edited comment on PROTON-2362 at 4/1/21, 4:48 PM:
--

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
 {{-no-close-connect    crash 12}}

-no-listen                        crash 0 hang 2
 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
 {color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
 {{-no-close-connect    crash 10 hang 2}}
 {{-no-wake crash 11}}
 {{-no-timeout  crash 11}}
 {{no-cancel-timeout    crash 12}}

 

 


was (Author: michaelgoulish):
OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
{{-no-close-connect    crash 12}}

{{ -no-listen   crash 0 hang 2}}
{color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
{color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
{{-no-close-connect    crash 10 hang 2}}
{{-no-wake crash 11}}
{{-no-timeout  crash 11}}
{{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish commented on PROTON-2362:
-

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
{{-no-close-connect    crash 12}}

{{ -no-listen   crash 0 hang 2}}
{color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
{color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
{{-no-close-connect    crash 10 hang 2}}
{{-no-wake crash 11}}
{{-no-timeout  crash 11}}
{{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313151#comment-17313151
 ] 

michael goulish commented on PROTON-2362:
-

I am running batches of 50 threaderciser tests, 64 threads each, turning off 
one feature at a time, and counting failures.

See if you can spot the case, below, that I feel may be interesting.

 

All Features On   crash: 10

-no-close-connect    crash: 12

-no-listen   crash: 0    hang: 2

-no-close-listen  (y) :) (*)(*r)(*g) *NO PROBLEMS* (*g)(*r)(*) :) (y)

(sorry, I can't figure out how to make the above text blink)

 

 

 

p.s.

    _"Brontosaurus"_ means _"Thunder Lizard"_, a kind of dinosaur.  

  I do not have a dinosaur.

  _"Brontonomicon",_ on the other hand, means _"What the Thunder Said"_    
or   _"Words of the Thunder"_   or possibly   _"The Book of Thunder"_.

  That's what I've got. 

  And when the thunder speaks, the software had better listen.

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>      Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-03-31 Thread michael goulish (Jira)
michael goulish created PROTON-2362:
---

 Summary: c-threaderciser timed out on 32-core machine.
 Key: PROTON-2362
 URL: https://issues.apache.org/jira/browse/PROTON-2362
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish


Using recent master – maybe 3 days old or so – I just ran Proton's ctest, after 
turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 threads.

 

Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.

( 1.5e18 femtoseconds. )

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-31 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312627#comment-17312627
 ] 

michael goulish commented on DISPATCH-2014:
---

I just ran Proton's ctest suite with the THREADERCISER turned on – on my box 
with 32 physical cores, 64 'threads'.

Test number 6 – "c-threaderciser" – timed out after 1500 seconds.

 

> Router TCP Adapter crash with high thread count and load
> 
>
> Key: DISPATCH-2014
> URL: https://issues.apache.org/jira/browse/DISPATCH-2014
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> Using latest proton and dispatch master code as of 3 hours ago.
> Testing router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used 'hey' load generator to send 
> it HTTP requests to a TCP listener which router forwarded to Nginx on same 
> machine. 
> Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each 
> sender throttled to 10 messages per second.
> It survived many tests, but crashed around test with 200 senders.
> I believe this is easily repeatable – I will go check that now.
>  
> Here is the thread that crashed:
> {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b} #2 process (tsk=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
>  
> {color:#172b4d}And here are all the threads:{color}
> {color:#de350b}(gdb) thread apply all bt{color}
> {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
> {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
> {color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
> batch=batch@entry=0x7f326811a578) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
> {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1140{color}
> {color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 process (tsk=) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b}#4 next_event_batch (p=, can_block=true) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
> {color:#de350b}#0 0x0

[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-22 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306496#comment-17306496
 ] 

michael goulish commented on DISPATCH-2014:
---

When I used 64 dispatch worker threads and hit it with 200 'hey' senders – each 
test 30 seconds long – it died 3 out of 4 times.  (SEGV)

 

When I went down to 32 dispatch worker threads, it survived 3 out of 3 tests 
with 200 senders, and then 3 out of 3 tests with 500 senders, and then 3 out of 
3 tests with 1000 senders.

 

> Router TCP Adapter crash with high thread count and load
> 
>
> Key: DISPATCH-2014
> URL: https://issues.apache.org/jira/browse/DISPATCH-2014
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>        Reporter: michael goulish
>Priority: Major
>
> Using latest proton and dispatch master code as of 3 hours ago.
> Testing router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used 'hey' load generator to send 
> it HTTP requests to a TCP listener which router forwarded to Nginx on same 
> machine. 
> Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each 
> sender throttled to 10 messages per second.
> It survived many tests, but crashed around test with 200 senders.
> I believe this is easily repeatable – I will go check that now.
>  
> Here is the thread that crashed:
> {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b} #2 process (tsk=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
>  
> {color:#172b4d}And here are all the threads:{color}
> {color:#de350b}(gdb) thread apply all bt{color}
> {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
> {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
> {color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
> batch=batch@entry=0x7f326811a578) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
> {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1140{color}
> {color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 process (tsk=) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b}#4 next_event_batch (p=, can_block=true) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#7 0x7f33181b2

[jira] [Created] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-22 Thread michael goulish (Jira)
michael goulish created DISPATCH-2014:
-

 Summary: Router TCP Adapter crash with high thread count and load
 Key: DISPATCH-2014
 URL: https://issues.apache.org/jira/browse/DISPATCH-2014
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Protocol Adaptors
Reporter: michael goulish


Using latest proton and dispatch master code as of 3 hours ago.

Testing router TCP adapter on a machine with 32 cores / 64 threads.

I gave the router 64 worker threads, then used 'hey' load generator to send it 
HTTP requests to a TCP listener which router forwarded to Nginx on same 
machine. 

Multiple tests with increasing numbers of parallel senders: 10, 20, 30, ... Each 
sender was throttled to 10 messages per second.
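
For reference, a hypothetical reconstruction of one such run using rakyll/hey 
(the listener address is illustrative; -c sets the number of concurrent 
senders, -q throttles each sender to 10 requests/sec, -z bounds the run):

   hey -z 30s -c 200 -q 10 http://10.10.10.1:9090/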

It survived many tests, but crashed around test with 200 senders.

I believe this is easily repeatable – I will go check that now.

 

Here is the thread that crashed:

{color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b} #2 process (tsk=){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
{color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
{color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
{color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
{color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

 

{color:#172b4d}And here are all the threads:{color}


{color:#de350b}(gdb) thread apply all bt{color}

{color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
{color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
{color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
batch=batch@entry=0x7f326811a578) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
{color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
/home/mick/latest/qpid-dispatch/src/server.c:1140{color}
{color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

{color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 process (tsk=) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
{color:#de350b}#4 next_event_batch (p=, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
{color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
/home/mick/latest/qpid-dispatch/src/server.c:1107{color}
{color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

{color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
{color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
{color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
batch=batch@entry=0x7f32c8063af8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
{color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40

Re: [VOTE] Release Qpid Dispatch Router 1.12.0 (RC1)

2020-04-28 Thread Michael Goulish
+1

It has now completed 120 million messages without problem in the following
test:

  * single router
  * 250 receivers, 250 senders, 250 addresses
  * senders in 5 groups, each throttled to send messages at different
intervals: {13, 17, 19, 23, 29} msec.
  * router has *64 worker threads* (machine has 64 threads)
  * average rate of messages at start of test is 12,500 per second. (Slows
down as senders finish sending their assigned 1 million messages each.)
  * messages are not presettled.

Also note -- *no memory growth.*




On Mon, Apr 27, 2020 at 5:49 PM Fernando Giorgetti 
wrote:

> +1
> - Executed integration tests with activemq artemis and tests are passing
> - Smoke tests against Openshift 3.11 and 4.3 (building a custom qdrouterd
> image)
> - DISPATCH-1626 - Verified new system test against RHEL6, 7 and 8
>
>
> On Mon, Apr 27, 2020 at 6:03 PM Ganesh Murthy 
> wrote:
>
> > On Mon, Apr 27, 2020 at 3:22 PM Ganesh Murthy 
> wrote:
> > >
> > > Hello All,
> > >
> > >  Please cast your vote on this thread to release RC1 as the
> > > official Qpid Dispatch Router version  1.12.0.
> > >
> > > RC1 of Qpid Dispatch Router version 1.12.0 can be found here:
> > >
> > > https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.12.0-rc1/
> > >
> > > The following improvements, and bug fixes are introduced in 1.12.0:
> > >
> > I missed to list a major new feature -
> > >
> > New Feature -
> > DISPATCH-975 - Policy has no provision for limiting user message size
> >
> > My apologies.
> > Thanks.
> >
> > > Improvements -
> > > DISPATCH-1479 - multicast/routing behaviour doc improvements
> > > DISPATCH-1608 - Display workerThreads in the output of qdstat -g
> > > and qdmanage query --type=router
> > > DISPATCH-1611 - In debug mode, provide time and backtrace of
> > > leaked pool allocations
> > > DISPATCH-1615 - Backtrace of leaked allocations does not show
> object
> > address
> > > DISPATCH-1616 - Scraper could export facts for creating sequence
> > diagrams
> > > DISPATCH-1617 - Prevent router startup if edge or standalone
> > > routers have 'edge' role listeners
> > >
> > > Bug fixes -
> > > DISPATCH-1581 - Policy counters are int and should be uint64
> > > DISPATCH-1593 - Fix legend in console's topology view
> > > DISPATCH-1606 - Qpid dispatch console keeps trying to open
> > > connections when using empty username and password against a listener
> > > configured with SASL plain
> > > DISPATCH-1607 - [test] one_router
> > > test_48_connection_uptime_last_dlv ConnectionUptimeLastDlvTest
> > > intermittent fail
> > > DISPATCH-1609 - Policy denial logs omit the 'C' in the connection
> ID
> > > DISPATCH-1610 - qd_pn_free_link_session_t objects leaking when
> > > connections are socket closed
> > > DISPATCH-1612 - Automatically fill in the address and port that
> > > was used to serve the console into the console's connect form
> > > DISPATCH-1613 - Remove error log that is issued when >
> > > QDR_N_PRIORITY router links attach
> > > DISPATCH-1614 - Edge router crash when interior closes edge uplink
> > > connection
> > > DISPATCH-1618 - Server shutdown leaks policy setting objects
> > > DISPATCH-1622 - Router crash when trying to create connector via
> > qdmanage
> > > DISPATCH-1626 - On released callback invoked twice for same
> delivery
> > tag
> > > DISPATCH-1627 - Occasional leak of qd_iterator_buffer during
> > > system_tests_link_route_credit test
> > > DISPATCH-1628 - Crash after enforcing oversize message connection
> > close
> > > DISPATCH-1630 - Coverity issues on master branch
> > >
> > > Thanks.
> >
> > -
> > To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> > For additional commands, e-mail: users-h...@qpid.apache.org
> >
> >
>


Re: [jira] [Created] (PROTON-2189) proactor C client has abnormally long pauses during message send

2020-04-06 Thread Michael Goulish
>
>
> *Hey Mick, *
> *That did the trick - color me impressed.*
>


OK.

Ken, impressed:
[image: image.png]


On Mon, Apr 6, 2020 at 9:18 AM Ken Giusti  wrote:

> Hey Mick,
>
> That did the trick - color me impressed.
>
> If that's the intended behavior for sending streaming messages it wasn't
> clear to me.   I'll update the JIRA with your recommendations, thanks!
>
>
> On Sat, Apr 4, 2020 at 12:32 PM Michael Goulish 
> wrote:
>
>> Ken --
>>
>> I have a proactor client in Mercury and it handles timeouts differently
>> than your code.
>> I don't remember now exactly why this was necessary, but I do vaguely
>> recall weird
>> behavior before I fixed it, at aconway's suggestion.
>>
>> I think that, in the PN_PROACTOR_TIMEOUT case in the event handler, in
>> my code
>> it was not correct to send at that time and then reset the timer. (As you
>> are doing.)
>>
>> Instead, in that case all I did was to wake the cnx:
>>  pn_connection_wake ( context->connection );
>>
>> You can then send, and reset the timer only when you get an event of type:
>>  PN_CONNECTION_WAKE:
>>  send_message ( context );
>>  pn_proactor_set_timeout ( context->proactor, context->throttle );
>>
>> In my case, I am sending entire messages -- but this is how I get it to
>> 'throttle' the send-rate, i.e. send 1 message per N msec.
>>
>>
>> I don't remember exactly what Bad Thing was happening to me before I
>> started doing it this way, but I have a Bad Feeling that it may have been
>> similar to what you describe.
>>
>>
>>
>>
>>
>> On Sat, Apr 4, 2020 at 12:00 PM Ken Giusti (Jira) 
>> wrote:
>>
>>> Ken Giusti created PROTON-2189:
>>> --
>>>
>>>  Summary: proactor C client has abnormally long pauses
>>> during message send
>>>  Key: PROTON-2189
>>>  URL: https://issues.apache.org/jira/browse/PROTON-2189
>>>  Project: Qpid Proton
>>>   Issue Type: Bug
>>>   Components: proton-c
>>> Affects Versions: proton-c-0.30.0
>>>  Environment: To compile the clients install qpid-proton-c-devel
>>> and simply compile:
>>>
>>> gcc  -O2 -g -Wall -lqpid-proton -lm -o clogger clogger.c
>>>
>>> To reproduce my test, build qdrouterd and run it in the background.
>>> You need to have a consumer attached.  There is a test receiver client
>>> in the qdrouterd build in /tests/test-receiver.  This receiver
>>> is designed to handle streaming messages (by default sent to 'test-address')
>>>
>>> Run the consumer in the background then run each clogger (default params
>>> are fine).
>>>
>>> You should observe that clogger-reactor runs smoothly (use
>>> PN_TRACE_FRM=1 on qdrouterd as well).
>>>
>>> You'll see clogger-reactor send the message header, then nothing for
>>> awhile, then send the entire message.
>>>
>>> Use "-D" for debug output to see how many bytes have been written to
>>> pn_link_send()
>>>
>>>
>>> Reporter: Ken Giusti
>>>  Attachments: clogger-proactor.c, clogger-reactor.c
>>>
>>> I have a proactor-based C test client that has the ability to slowly
>>> send extremely large messages slowly.  This is done by sending 'chunks' of
>>> body data with pauses in between.
>>>
>>> This client was designed to test large streaming messages against
>>> qdrouterd.
>>>
>>> The behavior of this client is unexpected - I would expect the message
>>> data to appear "on the wire" in bursts relatively quickly.  In reality the
>>> data is buffered - in some cases over 1 GB is buffered - before it is
>>> written (as indicated by the lack @transfer frames dumped by the client AND
>>> the qdrouterd).  In some cases it takes up to 30 seconds before the
>>> client's data starts being written to the client.
>>>
>>> I've refactored the client to use reactor instead and the data flows as
>>> expected.  There is minimal buffering and no abnormally long pauses.
>>>
>>> The clients are attached.
>>>
>>> It is quite likely the proactor client is incorrectly implemented, but I
>>> used the qdrouterd I/O loop as the model and cannot see what may be wrong.
>>>
>>>
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian Jira
>>> (v8.3.4#803005)
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
>>> For additional commands, e-mail: dev-h...@qpid.apache.org
>>>
>>>
>
> --
> -K
>


Re: [jira] [Created] (PROTON-2189) proactor C client has abnormally long pauses during message send

2020-04-04 Thread Michael Goulish
Ken --

I have a proactor client in Mercury and it handles timeouts differently
than your code.
I don't remember now exactly why this was necessary, but I do vaguely
recall weird
behavior before I fixed it, at aconway's suggestion.

I think that, in the PN_PROACTOR_TIMEOUT case in the event handler, in my
code
it was not correct to send at that time and then reset the timer. (As you
are doing.)

Instead, in that case all I did was to wake the cnx:
 pn_connection_wake ( context->connection );

You can then send, and reset the timer only when you get an event of type:
 PN_CONNECTION_WAKE:
 send_message ( context );
 pn_proactor_set_timeout ( context->proactor, context->throttle );

In my case, I am sending entire messages -- but this is how I get it to
'throttle' the send-rate, i.e. send 1 message per N msec.
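
Put together, the handler comes out shaped roughly like this -- a sketch only,
where the context struct and send_message() are assumptions, not Mercury's
actual code:

 #include <proton/connection.h>
 #include <proton/event.h>
 #include <proton/proactor.h>

 /* Assumed context shape. */
 typedef struct {
   pn_proactor_t   *proactor;
   pn_connection_t *connection;
   pn_millis_t      throttle;    /* msec between sends */
 } app_context_t;

 void send_message ( app_context_t *context );   /* defined elsewhere */

 void handle ( app_context_t *context, pn_event_t *event )
 {
   switch ( pn_event_type(event) ) {
     case PN_PROACTOR_TIMEOUT:
       /* Don't send here -- wake the connection instead, so the send
          happens in the connection's own event batch. */
       pn_connection_wake ( context->connection );
       break;
     case PN_CONNECTION_WAKE:
       /* Safe to send now; then re-arm the timer for the next message. */
       send_message ( context );
       pn_proactor_set_timeout ( context->proactor, context->throttle );
       break;
     default:
       break;
   }
 }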


I don't remember exactly what Bad Thing was happening to me before I
started doing it this way, but I have a Bad Feeling that it may have been
similar to what you describe.





On Sat, Apr 4, 2020 at 12:00 PM Ken Giusti (Jira)  wrote:

> Ken Giusti created PROTON-2189:
> --
>
>  Summary: proactor C client has abnormally long pauses during
> message send
>  Key: PROTON-2189
>  URL: https://issues.apache.org/jira/browse/PROTON-2189
>  Project: Qpid Proton
>   Issue Type: Bug
>   Components: proton-c
> Affects Versions: proton-c-0.30.0
>  Environment: To compile the clients install qpid-proton-c-devel
> and simply compile:
>
> gcc  -O2 -g -Wall -lqpid-proton -lm -o clogger clogger.c
>
> To reproduce my test, build qdrouterd and run it in the background.
> You need to have a consumer attached.  There is a test receiver client in
> the qdrouterd build in /tests/test-receiver.  This receiver is
> designed to handle streaming messages (by default sent to 'test-address')
>
> Run the consumer in the background then run each clogger (default params
> are fine).
>
> You should observe that clogger-reactor runs smoothly (use PN_TRACE_FRM=1
> on qdrouterd as well).
>
> You'll see clogger-reactor send the message header, then nothing for
> awhile, then send the entire message.
>
> Use "-D" for debug output to see how many bytes have been written to
> pn_link_send()
>
>
> Reporter: Ken Giusti
>  Attachments: clogger-proactor.c, clogger-reactor.c
>
> I have a proactor-based C test client that has the ability to slowly send
> extremely large messages slowly.  This is done by sending 'chunks' of body
> data with pauses in between.
>
> This client was designed to test large streaming messages against
> qdrouterd.
>
> The behavior of this client is unexpected - I would expect the message
> data to appear "on the wire" in bursts relatively quickly.  In reality the
> data is buffered - in some cases over 1 GB is buffered - before it is
> written (as indicated by the lack of @transfer frames dumped by the client
> AND the qdrouterd).  In some cases it takes up to 30 seconds before the
> client's data starts being written to the wire.
>
> I've refactored the client to use reactor instead and the data flows as
> expected.  There is minimal buffering and no abnormally long pauses.
>
> The clients are attached.
>
> It is quite likely the proactor client is incorrectly implemented, but I
> used the qdrouterd I/O loop as the model and cannot see what may be wrong.
>
>
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
> For additional commands, e-mail: dev-h...@qpid.apache.org
>
>


Re: [VOTE] Release Qpid Dispatch Router 1.10.0 (RC2)

2019-12-16 Thread Michael Goulish
*+ 1*

I did a 2-million link test, i.e. 1 million in, 1 million out, spread
evenly over 100 connections -- on a single router, coming from a single
client -- using Gordon's python client made for this purpose -- and nothing
blew up or melted down. And it did not use an exorbitant amount of memory
like it used to before Gordon's fix, nor did it slow down linearly as the
number of links increased.

NOTE that the router does become unresponsive while all those links are
being torn down -- i.e. it does not respond to qdstat calls -- but that is
not a regression and the fix is believed to be in Proton code, not in the
router.
This is significant, because it is easy for the user to conclude that the
router has simply crashed while it is not responding.
A fix for this issue will hopefully be in the next release.







On Sat, Dec 14, 2019 at 11:16 PM Ganesh Murthy  wrote:

> Hello All,
>
>  Please cast your vote on this thread to release RC2 as the
> official Qpid Dispatch Router version  1.10.0.
>
> RC2 of Qpid Dispatch Router version 1.10.0 can be found here:
>
> https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.10.0-rc2/
>
> The following features, improvements, and bug fixes are introduced in
> 1.10.0:
>
> Features -
> DISPATCH-1441 - optparse python library deprecated, migrate to argparse
> DISPATCH-1442 - Add a metadata field to the router management entity
> DISPATCH-1463 - Detect deliveries that are stuck in the router for
> a long time
> DISPATCH-1467 - Add build option to enable Address Sanitizer build
> (ASAN)
>
> Improvements -
> DISPATCH-1186 - qdstat to include csv output format
> DISPATCH-1358 - Port console to patternfly 4 / React
> DISPATCH-1369 - Update console dependencies to avoid npm security
> warnings
> DISPATCH-1399 - Allow arrows on console's topology page to be disabled
> DISPATCH-1409 - Update qdstat -l output to include the current credit
> DISPATCH-1411 - Add timestamp and router name to header of qdstat
> output
> DISPATCH-1412 - Policy C code performance issue:  reload python
> module for each call
> DISPATCH-1416 - Policy log could include denial counts on connection
> close
> DISPATCH-1419 - Add status to connector mgmt schema
> DISPATCH-1427 - Allow policy to define user (group) specific
> connection limits
> DISPATCH-1434 - Add new attribute saslPasswordFile to the connector
> entity
> DISPATCH-1438 - Have ctest parse the routers debug dump files for
> memory pool leaks
> DISPATCH-1439 - Expose create time/last transfer time through the
> Connection management entity
> DISPATCH-1440 - Deprecate the passwordFile field in sslProfile and
> consolidate all password scenarios to use  the password field
> DISPATCH-1445 - Update saslPassword attribute in connector entity
> to use openssl style prefixes
> DISPATCH-1446 - system_tests_qdmanage failing on test_get_log
> DISPATCH-1450 - Add build option to enable thread sanitizer build
> DISPATCH-1454 - system_tests_one_router failing due to changes in
> qpid-proton
> DISPATCH-1455 - Two system tests failing after optparse to
> argparse migration
> DISPATCH-1465 - system_tests_policy.test_verify_z_connection_stats
> fails
> DISPATCH-1466 - flake8 errors in system_test.py
> DISPATCH-1471 - [test] When string comparison asserts fail the
> strings are not printed
> DISPATCH-1480 - Address Sanitizer leak in system_tests_multi_phase
> DISPATCH-1491 - bottleneck adding or removing addresses in mobile
> address engine
> DISPATCH-1500 - inefficiencies in handling large MAU messages
> DISPATCH-1507 - Don't collapse small number of edge routers and
> clients into single circle
> DISPATCH-1516 - Trace log the peer delivery id and link id when
> linking and unlinking peers
>
> Bug fixes -
> DISPATCH-1172 - Link routes and auto links activated on wrong
> connections if many route-container conns exist
> DISPATCH-1258 - Crash executing http test
> DISPATCH-1377 - system_tests_topology_disposition failing on
> machine with python3 only
> DISPATCH-1418 - The default forwarding treatment is not overridden
> by the treatment in the address configuration
> DISPATCH-1421 - Attaching link to unavailable address sets source
> address to null in attach reply
> DISPATCH-1423 - Multicast sender with no receiver has first 250
> messages released
> DISPATCH-1426 - Repetitive receiver fail over causes memory leak
> DISPATCH-1428 - route connection not indexed by 'connection' field
> of connector
> DISPATCH-1431 - system_tests_one_router_failing on
> test_19_semantics_multicast
> DISPATCH-1433 - system_tests_delivery_abort failing due to
> receiver connecting late
> DISPATCH-1443 - Unable to run ctest on Centos 8
> DISPATCH-1453 - Adding "defaultDistribution: unavailable"
> overrides the regular handling of transaction coordinator link refusal
> DISPATCH-1460 - Router control 

Re: [VOTE] Release Qpid Dispatch Router 1.9.0 (RC2)

2019-09-17 Thread Michael Goulish
+1

Two large-scale tests done, with no trouble.

Test 1

5 routers linear network (A...E)
500 senders on router A, 500 receivers on router E
1 unique address per client-pair
distribution : closest
senders throttled to 10 messages per second
non-pre-settled, 100 byte payload messages.
1e4 messages per sender.


Test 2
-
5 router linear network
500 receivers on E
each 50 receivers share 1 multicast address
10 senders on A, each with a unique multicast address
each sender will be sending to 50 receivers
100 byte payload msgs non-pre-settled
senders throttled to 10 messages per second
1e4 messages per sender

On Mon, Sep 16, 2019 at 2:17 PM Ganesh Murthy  wrote:

> Hello All,
>
>  Please cast your vote on this thread to release RC2 as the
> official Qpid Dispatch Router version  1.9.0.
>
> RC2 of Qpid Dispatch Router version 1.9.0 can be found here:
>
> https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.9.0-rc2/
>
> The following improvements, and bug fixes are introduced in 1.9.0:
>
> Improvements -
> DISPATCH-480 - Default tests timeout is too short for some machines
> DISPATCH-1266 - Improve router's handling of unsettled multicast
> deliveries
> DISPATCH-1338 - Improvements to edge router documentation
> DISPATCH-1345 - Reduce the number of QDR_LINK_FLOW events by
> coalescing credit grants
> DISPATCH-1346 - Create documentation for priority delivery
> DISPATCH-1347 - Update documentation for Dispatch Router console
> DISPATCH-1350 - Update logging/monitoring documentation
> DISPATCH-1353 - Document how to configure access policy control on
> router-initiated connections
> DISPATCH-1354 - Interrouter annotation processing uses slow methods
> DISPATCH-1370 - Move the schema, connect, and entities tabs to the
> right in the console
> DISPATCH-1372 - alloc_pool intrusive linked list can be replaced
> by a linked stack
> DISPATCH-1374 - Add qdstat options --all-routers and all-entities
> which display statistics of all routers and displays all entities
> DISPATCH-1376 - Make it easier to change the product name in the
> console
> DISPATCH-1379 - Message receive performance improvements
> DISPATCH-1381 - Create documentation for configuring fallback
> destinations
> DISPATCH-1382 - Document ability to force-close a connection from
> the web console
> DISPATCH-1385 - qd_message_list_t is dead code
> DISPATCH-1388 - Authorization doc fails to describe vhost
> abstraction clearly
> DISPATCH-1396 - Doc how to start the router
>
> Bugs -
> DISPATCH-1359 - Set ctest timeout to 300 seconds.
> DISPATCH-1361 - system_tests_fallback_dest hanging in some cases
> DISPATCH-1362 - Shutdown crash when trying to clean up fallback
> addresses
> DISPATCH-1365 - Table of links with delayed deliveries is showing
> all endpoint links
> DISPATCH-1378 - missing lock of connection's links_with_work list
> DISPATCH-1380 - qdrouterd leaves dangling qd_link_t pointer
> DISPATCH-1383 - system_tests_policy is timing out
> DISPATCH-1387 - Coverity issues on master branch
> DISPATCH-1391 - Proton link reference not cleared on router link
> objects during session close
> DISPATCH-1394 - qd_check_message() incorrectly validates partially
> received messages
> DISPATCH-1398 - "Expression with no effect" warning for console web
> DISPATCH-1404 - message annotation parsing incorrectly uses
> ->remainder for current buffer capacity
> DISPATCH-1406 - Inter-router link stall on receive client failover
> DISPATCH-1407 - Memory leak on link policy denial
> DISPATCH-1408 - system_tests_distribution failing when running
> under valgrind
> DISPATCH-1410 - attach of auto-links not logged
> DISPATCH-1413 - system_tests_two_routers.py failing intermittently on
> Travis
> DISPATCH-1417 - Crash when connection_wake ctx points to freed memory
>
> Thanks.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
> For additional commands, e-mail: dev-h...@qpid.apache.org
>
>


Re: [VOTE] Release Qpid Dispatch Router 1.9.0 (RC1)

2019-09-11 Thread Michael Goulish
May I vote DELAY?
Could you please not conclude this vote until you receive my vote?

I am seeing something that I don't understand with a large multicast test,
and I would like to cast my vote only after I have a chance to talk to Ken
Giusti about it.

Unfortunately I am out today seeing a doctor about a brain transplant, but
I think I will be able to contact Ken sometime today if all goes well and
we will get this resolved.




On Mon, Sep 9, 2019 at 5:36 PM Ganesh Murthy  wrote:

> Hello All,
>
>  Please cast your vote on this thread to release RC1 as the
> official Qpid Dispatch Router version  1.9.0.
>
> RC1 of Qpid Dispatch Router version 1.9.0 can be found here:
>
> https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.9.0-rc1/
>
> The following  improvements, and bug fixes are introduced in 1.9.0:
>
> Improvements -
> DISPATCH-480 - Default tests timeout is too short for some machines
> DISPATCH-1266 - Improve router's handling of unsettled multicast
> deliveries
> DISPATCH-1338 - Improvements to edge router documentation
> DISPATCH-1345 - Reduce the number of QDR_LINK_FLOW events by
> coalescing credit grants
> DISPATCH-1346 - Create documentation for priority delivery
> DISPATCH-1347 - Update documentation for Dispatch Router console
> DISPATCH-1350 - Update logging/monitoring documentation
> DISPATCH-1353 - Document how to configure access policy control on
> router-initiated connections
> DISPATCH-1354 - Interrouter annotation processing uses slow methods
> DISPATCH-1370 - Move the schema, connect, and entities tabs to the
> right in the console
> DISPATCH-1372 - alloc_pool intrusive linked list can be replaced
> by a linked stack
> DISPATCH-1374 - Add qdstat options --all-routers and all-entities
> which display statistics of all routers and displays all entities
> DISPATCH-1376 - Make it easier to change the product name in the
> console
> DISPATCH-1379 - Message receive performance improvements
> DISPATCH-1381 - Create documentation for configuring fallback
> destinations
> DISPATCH-1382 - Document ability to force-close a connection from
> the web console
> DISPATCH-1385 - qd_message_list_t is dead code
> DISPATCH-1388 - Authorization doc fails to describe vhost
> abstraction clearly
> DISPATCH-1396 - Doc how to start the router
>
> Bug fixes -
> DISPATCH-1359 - Set ctest timeout to 300 seconds.
> DISPATCH-1361 - system_tests_fallback_dest hanging in some cases
> DISPATCH-1362 - Shutdown crash when trying to clean up fallback
> addresses
> DISPATCH-1365 - Table of links with delayed deliveries is showing
> all endpoint links
> DISPATCH-1378 - missing lock of connection's links_with_work list
> DISPATCH-1380 - qdrouterd leaves dangling qd_link_t pointer
> DISPATCH-1383 - system_tests_policy is timing out
> DISPATCH-1387 - Coverity issues on master branch
> DISPATCH-1391 - Proton link reference not cleared on router link
> objects during session close
> DISPATCH-1394 - qd_check_message() incorrectly validates partially
> received messages
> DISPATCH-1398 - "Expression with no effect" warning for console web
> DISPATCH-1404 - message annotation parsing incorrectly uses
> ->remainder for current buffer capacity
> DISPATCH-1406 - Inter-router link stall on receive client failover
> DISPATCH-1407 - Memory leak on link policy denial
> DISPATCH-1408 - system_tests_distribution failing when running
> under valgrind
> DISPATCH-1410 - attach of auto-links not logged
> DISPATCH-1413 - system_tests_two_routers.py failing intermittently on
> Travis
>
>
> Thanks.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
> For additional commands, e-mail: dev-h...@qpid.apache.org
>
>


[jira] [Assigned] (DISPATCH-1368) Link (address) priority is ignored by the second hop router

2019-06-14 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1368:
-

Assignee: michael goulish

> Link (address) priority is ignored by the second hop router
> ---
>
> Key: DISPATCH-1368
> URL: https://issues.apache.org/jira/browse/DISPATCH-1368
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.8.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
> Fix For: 1.9.0
>
>
> Address-based priority is only enforced on the egress of the first hop router.
> In a 3 router linear network:
> Sender --> Router A --> Router B --> Router C --> Receiver
> Message delivery is properly sent via the inter-router links between Router A 
> and Router B.
> However, those messages are all forwarded on the default priority (4) between 
> router B and C.
> [C --> Receiver is fine - priority doesn't apply to egress endpoint links]
> The expectation is that the message priority is honored across all 
> inter-router links.
> [Reproducer|https://github.com/kgiusti/dispatch/tree/DISPATCH-1368-reproducer]
> Build the router, then run the priority test (ctest -VV -R priority).
> Then grep for "DELIVERIES" in the log files:
>  grep "DELIVERIES" 
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/*.log
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/A.log:2019-06-14
>  11:10:00.324389 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=20 8=0 7=28 
> 6=0 5=0 4(default)=21 3=0 2=12 1=0 0=343 
> (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188)
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/B.log:2019-06-14
>  11:10:00.302570 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=0 8=0 7=0 
> 6=0 5=0 4(default)=172 3=0 2=0 1=0 0=286 
> (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188)
> ...
> Notice the counts on A (tx to B) - these are correct.
> On B all msgs are sent priority 4 (default) to C - this is wrong.
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Re: [VOTE] Release Qpid Dispatch Router 1.8.0 (RC1)

2019-06-11 Thread Michael Goulish
executive summary:   I change my vote to  *+1*

but with JIRAs.


OK! I have something.

I didn't remember it, but my notes show that I have seen this exact failure
before, also after a clean install.
( I mean the authz failure. The other one, the http failure, is already
explained.)

So this means that the problem has got to be with my machine setup ---
there is some package that the authz test requires that I do not install
after a clean OS upgrade.

I do have a great plethora of SASL and SSL packages installed, so I don't
know what it might be, and I am having trouble tracing into the Python
code. If anyone has a clue, please let me know.

But, since the problem is limited to the test not warning about a required
package, I will do two things:

  1. raise a JIRA,

  2. and change my vote to

 *+1*




On Tue, Jun 11, 2019 at 12:04 PM Ganesh Murthy  wrote:

> On Tue, Jun 11, 2019 at 11:58 AM Michael Goulish 
> wrote:
>
> > Of course I will not change my vote based on a suggestion ( from me and
> > Gordon ) that the failure I am seeing might be caused by a missing
> > package.
> >
> > What I will do is start looking into this to see if that is indeed the
> > case.
> > And then we will make a change that detects and warns about that case.
> > And then I will change my vote.
> >
> Agreed, fair enough.
>
> >
> >
> >
> > On Tue, Jun 11, 2019 at 11:28 AM Chuck Rolke  wrote:
> >
> > > +1
> > >
> > > * Checked checksums
> > > * Build/test Fedora 29, python 3
> > >   * Fails system_tests_http (known problem, not a regression)
> > > * Build/test Fedora 28, python 2
> > >   * Occasional test fail system_tests_fallback_dest
> > > known problem in new test code and not in mission code; fix is
> > already
> > > on master; not a regression
> > >
> > >
> > > - Original Message -
> > > > From: "Ganesh Murthy" 
> > > > To: us...@qpid.apache.org, dev@qpid.apache.org
> > > > Sent: Friday, June 7, 2019 11:23:40 AM
> > > > Subject: [VOTE] Release Qpid Dispatch Router 1.8.0 (RC1)
> > > >
> > > > Hello All,
> > > >  Please cast your vote on this thread to release RC1 as the
> > > > official Qpid Dispatch Router version  1.8.0.
> > > >
> > > > RC1 of Qpid Dispatch Router version 1.8.0 can be found here:
> > > >
> > > > https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.8.0-rc1/
> > > >
> > > > The following features, improvements, and bug fixes are introduced in
> > > 1.8.0:
> > > >
> > > > Features -
> > > >DISPATCH-1337 - Fallback Destination for Unreachable Addresses
> > > >
> > > > Improvements -
> > > > DISPATCH-1308 - Console access to the force-close a connection
> > > feature
> > > > DISPATCH-1320 - Make it easier to use separate logos for upstream
> > > > and downstream masthead
> > > > DISPATCH-1321 - Set rpath for qpid-proton (and other
> dependencies)
> > > > when they are found in nonstandard location
> > > > DISPATCH-1329 - Edge router system test needs skip test
> convenience
> > > > switches
> > > > DISPATCH-1340 - Show settlement rate and delayed deliveries in
> > client
> > > > popup
> > > > DISPATCH-1341 - Add list of delayed links to console's overview
> > page
> > > > DISPATCH-1348 - Avoid qdr_error_t allocation if not necessary
> > > > DISPATCH-1356 - Remove the dotted line around routers that
> > > > indicates the router is fixed.
> > > > DISPATCH-1357 - Change the name of the 'Kill' feature to 'Close'
> > > >
> > > > Bug fixes -
> > > > DISPATCH-974 - Getting connections via the router management
> > > > protocol causes AMQP framing errors
> > > > DISPATCH-1230 - System test failing with OpenSSL >= 1.1 -
> > > > system_tests_ssl
> > > > DISPATCH-1312 - Remove cmake option USE_MEMORY_POOL
> > > > DISPATCH-1317 - HTTP system test is failing on python2.6
> > > > DISPATCH-1318 - edge_router system test failing
> > > > DISPATCH-1322 - Edge router drops disposition when remote
> receiver
> > > closes
> > > > DISPATCH-1323 - Deprecate addr and externalAddr attributes of
> > > > autoLink entity. Add address and externalAddress instead.
> > > > DISPATCH-1324 - [tools] Scraper uses deprecated cgi.escape function

Re: [VOTE] Release Qpid Dispatch Router 1.8.0 (RC1)

2019-06-11 Thread Michael Goulish
Of course I will not change my vote based on a suggestion ( from me and
Gordon ) that the failure I am seeing might be caused by a missing
package.

What I will do is start looking into this to see if that is indeed the case.
And then we will make a change that detects and warns about that case.
And then I will change my vote.



On Tue, Jun 11, 2019 at 11:28 AM Chuck Rolke  wrote:

> +1
>
> * Checked checksums
> * Build/test Fedora 29, python 3
>   * Fails system_tests_http (known problem, not a regression)
> * Build/test Fedora 28, python 2
>   * Occasional test fail system_tests_fallback_dest
> known problem in new test code and not in mission code; fix is already
> on master; not a regression
>
>
> - Original Message -
> > From: "Ganesh Murthy" 
> > To: us...@qpid.apache.org, dev@qpid.apache.org
> > Sent: Friday, June 7, 2019 11:23:40 AM
> > Subject: [VOTE] Release Qpid Dispatch Router 1.8.0 (RC1)
> >
> > Hello All,
> >  Please cast your vote on this thread to release RC1 as the
> > official Qpid Dispatch Router version  1.8.0.
> >
> > RC1 of Qpid Dispatch Router version 1.8.0 can be found here:
> >
> > https://dist.apache.org/repos/dist/dev/qpid/dispatch/1.8.0-rc1/
> >
> > The following features, improvements, and bug fixes are introduced in
> 1.8.0:
> >
> > Features -
> >DISPATCH-1337 - Fallback Destination for Unreachable Addresses
> >
> > Improvements -
> > DISPATCH-1308 - Console access to the force-close a connection
> feature
> > DISPATCH-1320 - Make it easier to use separate logos for upstream
> > and downstream masthead
> > DISPATCH-1321 - Set rpath for qpid-proton (and other dependencies)
> > when they are found in nonstandard location
> > DISPATCH-1329 - Edge router system test needs skip test convenience
> > switches
> > DISPATCH-1340 - Show settlement rate and delayed deliveries in client
> > popup
> > DISPATCH-1341 - Add list of delayed links to console's overview page
> > DISPATCH-1348 - Avoid qdr_error_t allocation if not necessary
> > DISPATCH-1356 - Remove the dotted line around routers that
> > indicates the router is fixed.
> > DISPATCH-1357 - Change the name of the 'Kill' feature to 'Close'
> >
> > Bug fixes -
> > DISPATCH-974 - Getting connections via the router management
> > protocol causes AMQP framing errors
> > DISPATCH-1230 - System test failing with OpenSSL >= 1.1 -
> > system_tests_ssl
> > DISPATCH-1312 - Remove cmake option USE_MEMORY_POOL
> > DISPATCH-1317 - HTTP system test is failing on python2.6
> > DISPATCH-1318 - edge_router system test failing
> > DISPATCH-1322 - Edge router drops disposition when remote receiver
> closes
> > DISPATCH-1323 - Deprecate addr and externalAddr attributes of
> > autoLink entity. Add address and externalAddress instead.
> > DISPATCH-1324 - [tools] Scraper uses deprecated cgi.escape function
> > DISPATCH-1325 - Sender connections to edge router that connect
> > 'too soon' never get credit
> > DISPATCH-1326 - Anonymous messages are released by edge router
> > even if there is a receiver for the messages
> > DISPATCH-1330 - Q2 stall due to incorrect msg buffer ref count
> > decrement on link detach
> > DISPATCH-1334 - Background map on topology page incorrect height
> > DISPATCH-1335 - After adding client, topology page shows new icon
> > in upper-left corner
> > DISPATCH-1339 - Multiple consoles attached to a router are showing
> > as separate icons
> >
> >
> > Thanks.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
> > For additional commands, e-mail: dev-h...@qpid.apache.org
> >
> >
>
> -
> To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
> For additional commands, e-mail: users-h...@qpid.apache.org
>
>


[jira] [Created] (PROTON-2046) pn_connection_set_container should check for null or empty string

2019-05-13 Thread michael goulish (JIRA)
michael goulish created PROTON-2046:
---

 Summary: pn_connection_set_container should check for null or 
empty string
 Key: PROTON-2046
 URL: https://issues.apache.org/jira/browse/PROTON-2046
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


pn_connection_set_container() makes no checks of the ID string that gets passed 
in. This value is expected to be unique, so it should probably check for NULL 
and empty-string.

I was passing in empty strings and it was cheerfully accepting them.
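
Something like this guard is what I have in mind -- a sketch with
illustrative names, not proton's actual internals:

    #include <assert.h>
    #include <proton/connection.h>

    /* Sketch only: a defensive wrapper around pn_connection_set_container().
       A real fix inside proton would need an agreed error strategy rather
       than assert(). */
    static void set_container_checked ( pn_connection_t * connection,
                                        const char * container )
    {
        assert ( connection );
        assert ( container && container[0] != '\0' );  /* reject NULL and "" */
        pn_connection_set_container ( connection, container );
    }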

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809913#comment-16809913
 ] 

michael goulish commented on DISPATCH-1309:
---

Chuck –

Are you sure you mean "5672" ?

More normal for the console would be "5673".

I could not get mine to crash with 50 repetitions of { connect + disconnect },
with 5673 – with one router or my whole Death Star network.

When I tried it with 5672, I could not get it to connect at all.

 

 

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809806#comment-16809806
 ] 

michael goulish commented on DISPATCH-1309:
---

Yee hah!

Chuck's comment reminded me – I believe I have also seen crashes *only* when 
the console was attached.

Furthermore, I think I have seen crashes, maybe not *only* but *more often*,
when I was *shutting down* a console *while* the network was still running.

 

I tried that just now – with 1.6 code.  I had to start, stop, and restart the 
console 11 times, but then it happened. Boom. With this core:

 

#0 pn_collector_put (collector=0x4242424242424242, 
 clazz=0x7f0e99c38520 , context=0x0,
 type=type@entry=PN_CONNECTION_WAKE)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/event.c:134
#1 0x7f0e99ca6258 in http_thread_run (v=0x2036850)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/http-libwebsockets.c:731
#2 0x7f0e995df50b in start_thread () from /lib64/libpthread.so.0
#3 0x7f0e988a338f in clone () from /lib64/libc.so.6

 

Which is one I have seen before.

Now I have *some hope* of getting some kind of baseline, based on number of 
crashes per console stop-and-restart, so that I can do some kind of vivisection 
of the code.

 

 

 

 

 

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-02 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808055#comment-16808055
 ] 

michael goulish commented on DISPATCH-1309:
---

And since the above comment I have not been able to get another crash

:(

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-02 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808007#comment-16808007
 ] 

michael goulish commented on DISPATCH-1309:
---

OK! I thought Mercury might help reproduce this more easily, and ... it did.

I made a 13-router star-shaped network  ( the Death Star ) – 12 routers in a 
circle and one at the center.

There was 1 receiver at every router on the circle all hoping for 1 million 
messages. 1 sender at the center router, trying to make all the receivers happy.

 

It ran for a good amount of time – I could see the traffic turning all the 
links green using the console – and then 7 routers crashed all at once, 
generating 5 different types of core files.

 

Which follow.

 

##
 # Type 1
 ##

#0 0x7f230750 in raise () from /lib64/libc.so.6
#1 0x7f231d31 in abort () from /lib64/libc.so.6
#2 0x7f23bbba905a in __assert_fail_base () from /lib64/libc.so.6
#3 0x7f23bbba90d2 in __assert_fail () from /lib64/libc.so.6
#4 0x7f23bc9b8e6f in __pthread_tpp_change_priority () from 
/lib64/libpthread.so.0
#5 0x7f23bc9af8fb in __pthread_mutex_lock_full () from 
/lib64/libpthread.so.0
#6 0x7f23bd044309 in qdra_config_address_create_CT (core=0x7f23a805e0d8,
 name=, query=0x7f23a00307d8, in_body=)
 at 
/home/mick/latest/qpid-dispatch-1.6.0/src/router_core/agent_config_address.c:446
#7 0x in ?? ()

in qdra_config_address_create_CT
 (gdb) list
 441 addr->priority = priority;
 442 pattern = 0;
 443
 444 qd_iterator_reset_view(iter, ITER_VIEW_ALL);
 445 qd_parse_tree_add_pattern(core->addr_parse_tree, iter, addr);
 446 DEQ_INSERT_TAIL(core->addr_config, addr);
 447
 448 //
 449 // Compose the result map for the response.
 450 //

 

##
 # Type 2
 ##

#0 connection_wake (conn=)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/remote_sasl.c:241
 #1 0x7f7cef4884cb in pni_sasl_impl_free (transport=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:181
 #2 pn_sasl_free (transport=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:764
 #3 0x7f7cef480b90 in pn_transport_finalize (object=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:665
 #4 0x7f7cef472a99 in pn_class_decref (clazz=0x7f7cef69aca0 ,
 clazz@entry=0x7f7cef69a520 , object=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:95
 #5 0x7f7cef472cbf in pn_decref (object=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:253
 #6 0x7f7cef480851 in pn_transport_free (transport=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:644
 #7 0x7f7cef47b994 in pn_connection_driver_destroy 
(d=d@entry=0x7f7cd4014d98)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/connection_driver.c:94
 #8 0x7f7cef25b604 in pconnection_final_free (pc=0x7f7cd40147f0)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:889
 #9 0x7f7cef25c4fc in pconnection_cleanup (pc=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:905
 #10 0x7f7cef25d295 in pconnection_process (pc=0x7f7cd40147f0, 
events=,
 timeout=timeout@entry=false, topup=false, is_io_2=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:1273
 #11 0x7f7cef25dd03 in proactor_do_epoll (p=0x1ee9600, 
can_block=can_block@entry=true)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2139
 #12 0x7f7cef25ef2a in pn_proactor_wait (p=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2157
 #13 0x7f7cef7057af in thread_run (arg=0x1db7960)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/server.c:994
 #14 0x7f7cef04150b in start_thread () from /lib64/libpthread.so.0
 #15 0x7f7cee30538f in clone () from /lib64/libc.so.6

 

##
 # Type 3
 ##


 #0 qd_hash_internal_retrieve_with_hash (hash=,
 key=key@entry=0x7f140c097ad8, h=, h=)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:204
#1 0x7f1432401a15 in qd_hash_internal_retrieve (key=0x7f140c097ad8, 
h=0x7f141c000bc0)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:219
#2 qd_hash_retrieve (h=0x7f141c000bc0, key=key@entry=0x7f140c097ad8,
 val=val@entry=0x7ffe6c6ac638) at 
/home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:270
#3 0x7f14324312e6 in qdr_lookup_terminus_address_CT (core=0xb656c0,
 dir=, conn=conn@entry=0x7f140c076798, terminus=0x7f140c086258,
 link_route=link_route@entry=0x7ffe6c6ac77d,
 unavailable=unavailable@entry=0x7ffe6c6ac77e, core_endpoint=0x7ffe6c6ac77f,
 accept_dyn

[jira] [Closed] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-22 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1280.
-
Resolution: Fixed

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-22 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798961#comment-16798961
 ] 

michael goulish commented on DISPATCH-1280:
---

LWS developer pushed patch. I got through 100 iterations of my reproducer on 
master with no crash.  (I could not do enough iterations before to get a real 
baseline, but I did get one crash in first 20 tries.)

I think it's a dead bug.

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-21 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798302#comment-16798302
 ] 

michael goulish edited comment on DISPATCH-1280 at 3/21/19 5:53 PM:


Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

 ( On LWS version 3.0.1 the crash happens every time. )

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    https://github.com/warmcat/libwebsockets/issues/1527

 

 


was (Author: mgoulish):
Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

 

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    https://github.com/warmcat/libwebsockets/issues/1527

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-21 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798302#comment-16798302
 ] 

michael goulish commented on DISPATCH-1280:
---

Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

 

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    https://github.com/warmcat/libwebsockets/issues/1527

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797619#comment-16797619
 ] 

michael goulish commented on DISPATCH-1280:
---

reproduced with simple example.

What I did:

  1. from the lws code tree for 3.0.1 (7 Sep 2018, 
fb31602ff9aeb88267fb8132d48df31195782ae5) use the example 
minimal-examples/http-server/minimal-http-server-tls.

  2. Alter the .c file this way (a sketch of the whole altered file follows
the backtrace below):

     info.options = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT | LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT ;

3. build and run it. It listens on https://localhost:7681

4. In browser, do this request:  http://localhost:7681/index.html

big bada boom.

 

#0 0x7f63281fff60 in SSL_get0_alpn_selected () from /lib64/libssl.so.1.1
#1 0x7f632880ea17 in lws_tls_server_conn_alpn () from 
/usr/local/lib/libwebsockets.so.13
#2 0x7f632880ee98 in lws_server_socket_service_ssl () from 
/usr/local/lib/libwebsockets.so.13
#3 0x7f632880d1ad in rops_handle_POLLIN_listen () from 
/usr/local/lib/libwebsockets.so.13
#4 0x7f6328800389 in lws_service_fd_tsi () from 
/usr/local/lib/libwebsockets.so.13
#5 0x7f6328816ce7 in _lws_plat_service_tsi.part.1 () from 
/usr/local/lib/libwebsockets.so.13
#6 0x7f6328800455 in lws_service () from /usr/local/lib/libwebsockets.so.13
#7 0x00400965 in main (argc=1, argv=0x7fff71638b68) at 
minimal-http-server-tls.c:87
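
For reference, the whole altered file is roughly this (a sketch from memory:
the cert file names are the ones the stock example ships with, and I have
left out the http mount it sets up to serve files):

    #include <string.h>
    #include <libwebsockets.h>

    /* Sketch of minimal-http-server-tls with the one-line alteration:
       LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT added to info.options. */
    static struct lws_protocols protocols[] = {
        { "http", lws_callback_http_dummy, 0, 0 },
        { NULL, NULL, 0, 0 }
    };

    int main ( void )
    {
        struct lws_context_creation_info info;
        struct lws_context * context;

        memset ( &info, 0, sizeof info );
        info.port      = 7681;
        info.protocols = protocols;
        info.ssl_cert_filepath        = "localhost-100y.cert";
        info.ssl_private_key_filepath = "localhost-100y.key";
        info.options   = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT |
                         LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT;

        context = lws_create_context ( &info );
        if ( ! context )
            return 1;

        /* Service the event loop; hit the port with plain http to crash. */
        while ( lws_service ( context, 1000 ) >= 0 )
            ;

        lws_context_destroy ( context );
        return 0;
    }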

 

Next I will see if this still happens with latest code.

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797575#comment-16797575
 ] 

michael goulish commented on DISPATCH-1280:
---

Looked at closed issues back to release date of v2.4.2  (8 March 2018).

Nothing looks like the issue we are seeing.

Closed issues are here:

https://github.com/warmcat/libwebsockets/issues?page=11&q=is%3Aissue+is%3Aclosed

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797391#comment-16797391
 ] 

michael goulish commented on DISPATCH-1280:
---

It sounds like this happens all the time. Is that true? Not a rare occurrence? 

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-18 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1280:
-

Assignee: michael goulish

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>        Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1215) several memory leaks in edge-router soak test

2018-12-07 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1215:
-

 Summary: several memory leaks in edge-router soak test
 Key: DISPATCH-1215
 URL: https://issues.apache.org/jira/browse/DISPATCH-1215
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish


Using recent master code trees (dispatch and proton)...

The test sets up a simple 3-linear router network, A-B-C, and attaches 100 edge 
routers to A. It then kills one edge router, replaces it, and repeats that 
kill-and-replace operation 50 times. (At which point I manually killed router 
A.)

Router A was running under valgrind, and produced the following output:
 
[mick@colossus ~]$ /usr/bin/valgrind --leak-check=full
--show-leak-kinds=definite --trace-children=yes
--suppressions=/home/mick/latest/qpid-dispatch/tests/valgrind.supp
/home/mick/latest/install/dispatch/sbin/qdrouterd --config
/home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I
/home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409== Memcheck, a memory error detector
==9409== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9409== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9409== Command: /home/mick/latest/install/dispatch/sbin/qdrouterd --config
/home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I
/home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409==
^C==9409==
==9409== Process terminating with default action of signal 2 (SIGINT)
==9409==    at 0x61C0A37: kill (in /usr/lib64/libc-2.26.so)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== HEAP SUMMARY:
==9409==     in use at exit: 6,933,690 bytes in 41,903 blocks
==9409==   total heap usage: 669,024 allocs, 627,121 frees, 92,449,020 bytes allocated
==9409==
==9409== 8,640 (480 direct, 8,160 indirect) bytes in 20 blocks are definitely lost in loss record 4,229 of 4,323
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x4E947CF: qd_server_run (server.c:1284)
==9409==    by 0x40186E: main_process (main.c:112)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== 14,256 (792 direct, 13,464 indirect) bytes in 33 blocks are definitely lost in loss record 4,261 of 4,323
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== 575,713 (24 direct, 575,689 indirect) bytes in 1 blocks are definitely lost in loss record 4,321 of 4,323
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E83FCA: qdr_add_link_ref (router_core.c:518)
==9409==    by 0x4E7A3BF: qdr_link_inbound_first_attach_CT (connections.c:1517)
==9409==    by 0x4E8484B: router_core_thread (router_core_thread.c:116)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== LEAK SUMMARY:
==9409==    definitely lost: 1,296 bytes in 54 blocks
==9409==    indirectly lost: 597,313 bytes in 3,096 blocks
==9409==      possibly lost: 1,473,248 bytes in 6,538 blocks
==9409==    still reachable: 4,861,833 bytes in 32,215 blocks
==9409==         suppressed: 0 bytes in 0 blocks
==9409== Reachable blocks (those to which a pointer was found) are not shown.
==9409== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9409==
==9409== For counts of detected and suppressed errors, rerun with: -v
==9409== ERROR SUMMARY: 1040 errors from 1040 contexts (suppressed: 0 from 0)
[mick@colossus ~]$
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1155) dueling httpRootDirs

2018-10-24 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1155:
-

 Summary: dueling httpRootDirs
 Key: DISPATCH-1155
 URL: https://issues.apache.org/jira/browse/DISPATCH-1155
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish
Assignee: michael goulish


The new version of qpid-dispatch-router uses
"/usr/share/qpid-dispatch/console/stand-alone" as the default httpRootDir. But
when installing the new qpid-dispatch-console package, the pages are available
at "/usr/share/qpid-dispatch/console".

This forces the user to define httpRootDir on the listener to bypass this issue.
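
That workaround is a listener entry of this shape (a sketch; host and port
are illustrative):

    listener {
        host: 0.0.0.0
        port: 8672
        http: true
        httpRootDir: /usr/share/qpid-dispatch/console
    }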

Ted suggests this fix:

 
Remove the default behavior for httpRootDir. If it is not specified in the 
configuration for a listener, then HTTP requests shall be rejected on 
connections to that listener. Such a listener would only be usable for AMQP 
over websockets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-959) Rate limiting policy

2018-10-18 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655668#comment-16655668
 ] 

michael goulish commented on DISPATCH-959:
--

This is not a bug, it's a new feature.

> Rate limiting policy
> 
>
> Key: DISPATCH-959
> URL: https://issues.apache.org/jira/browse/DISPATCH-959
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Policy Engine, Routing Engine
>Affects Versions: 1.0.1
>Reporter: Chuck Rolke
>Priority: Major
> Fix For: Backlog
>
>
> Router administrators would like rate-limiting policies to allow different 
> classes of users. A network-rate limit similar to how home cable networks are 
> provisioned for bandwidth is a classic model and is being considered as the 
> first choice.
> A message-per-second limit might be easier to enforce. But a single user 
> message may have a large data section, or have a small data section but have 
> huge message annotations. Thus a user might consume a lot of network 
> bandwidth with only a few messages.
> It is still unclear at what level the rate limiting should be applied. 
> Choices are:
>  * Per vhost
>  * Per vhost connection
>  * Per vhost user



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (DISPATCH-1139) support prioritized addresses

2018-10-18 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1139.
-
Resolution: Implemented

> support prioritized addresses
> -
>
> Key: DISPATCH-1139
> URL: https://issues.apache.org/jira/browse/DISPATCH-1139
> Project: Qpid Dispatch
>  Issue Type: New Feature
>  Components: Router Node, Routing Engine, Tests
>        Reporter: michael goulish
>    Assignee: michael goulish
>Priority: Major
>
> Support a new field in the address descriptor in router configuration files 
> that will assign a priority to the address.
> Any message that does not have an intrinsic priority already assigned will 
> inherit the priority of the address to which it is sent.  If no priority is 
> explicitly assigned to an address, then it will be assigned the default 
> priority.
>  
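
A configuration entry of the shape described above would look roughly like
this (a sketch; the prefix and priority value are illustrative):

    address {
        prefix: heavy-traffic
        priority: 7
    }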



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Resolved] (DISPATCH-1140) tests for message priority

2018-10-12 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-1140.
---
Resolution: Duplicate

Sorry – I should have just included this with DISPATCH-1139.

When I PR that one, it will have a test that looks at both message and address 
priority.

 

 

> tests for message priority
> --
>
> Key: DISPATCH-1140
> URL: https://issues.apache.org/jira/browse/DISPATCH-1140
> Project: Qpid Dispatch
>  Issue Type: New Feature
>    Reporter: michael goulish
>        Assignee: michael goulish
>Priority: Major
>
> The message priority code recently checked in ( in DISPATCH-1096 ) should 
> have at least the following two tests:
>  
>  # Make a two-router network, A and B. Send messages from A to B, confirm 
> that they arrive, then kill and restart B and send and confirm more messages. 
> Do this test  once with B connecting to A, and once with A connecting to B.
>  # Two-router network again. Send some messages from A to B (i.e. sender 
> attached to A, rcvr to B) – sending at least one message of each priority.   
> ( 0 - 9, inclusive ). Send management commands to A to see how many outgoing 
> inter-router links had message traffic go over them. The number should be 10.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1140) tests for message priority

2018-10-05 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1140:
-

 Summary: tests for message priority
 Key: DISPATCH-1140
 URL: https://issues.apache.org/jira/browse/DISPATCH-1140
 Project: Qpid Dispatch
  Issue Type: New Feature
Reporter: michael goulish
Assignee: michael goulish


The message priority code recently checked in ( in DISPATCH-1096 ) should have 
at least the following two tests:

 
 # Make a two-router network, A and B. Send messages from A to B, confirm that 
they arrive, then kill and restart B and send and confirm more messages. Do 
this test  once with B connecting to A, and once with A connecting to B.
 # Two-router network again. Send some messages from A to B (i.e. sender 
attached to A, rcvr to B) – sending at least one message of each priority.   ( 
0 - 9, inclusive ). Send management commands to A to see how many outgoing 
inter-router links had message traffic go over them. The number should be 10.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Resolved] (DISPATCH-1096) support AMQP prioritized messages

2018-10-05 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-1096.
---
Resolution: Implemented

I will open a separate Jira for tests that this code needs.

> support AMQP prioritized messages
> -
>
> Key: DISPATCH-1096
> URL: https://issues.apache.org/jira/browse/DISPATCH-1096
> Project: Qpid Dispatch
>  Issue Type: New Feature
>    Reporter: michael goulish
>        Assignee: michael goulish
>Priority: Major
> Fix For: 1.4.0
>
>
> Detect priority info from message header in the router code.
> Create separate inter-router links for the various priorities.
> Per connection (i.e. not globally across the router) service high-priority 
> inter-router links before low priority links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (PROTON-1949) no message header if priority == default

2018-10-05 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-1949.
---
Resolution: Not A Problem

We have found a nice workaround for this (probably better, actually) and do not 
need proton to change anything.

 

> no message header if priority == default
> 
>
> Key: PROTON-1949
> URL: https://issues.apache.org/jira/browse/PROTON-1949
> Project: Qpid Proton
>  Issue Type: Bug
>    Reporter: michael goulish
>Priority: Major
>
> Proton does not send a message header if there would be nothing in it but the 
> priority field, and if the priority was set to the default value (4). 
> At the router level, we are allowing the user to set priorities on addresses. 
> Those priorities will be given to any message sent to that address if the 
> message otherwise had no priority set.
> So - we need to be able to distinguish between messages that were assigned 
> the default priority, and messages in which the priority was left undefined.
> We would like proton to send the priority field in the message header if the 
> user sets any priority. Then we will be able to interpret no header, or no 
> priority field in the header as "no priority was assigned".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1949) no message header if priority == default

2018-10-05 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640115#comment-16640115
 ] 

michael goulish commented on PROTON-1949:
-

Nolo contendere.

We have decided that it is better to give precedence to the address's priority, 
which means that we do not need an ability in the message to express _no value_.

I will close this as not-a-bug.

 

 

> no message header if priority == default
> 
>
> Key: PROTON-1949
> URL: https://issues.apache.org/jira/browse/PROTON-1949
> Project: Qpid Proton
>  Issue Type: Bug
>    Reporter: michael goulish
>Priority: Major
>
> Proton does not send a message header if there would be nothing in it but the 
> priority field, and if the priority was set to the default value (4). 
> At the router level, we are allowing the user to set priorities on addresses. 
> Those priorities will be given to any message sent to that address if the 
> message otherwise had no priority set.
> So - we need to be able to distinguish between messages that were assigned 
> the default priority, and messages in which the priority was left undefined.
> We would like proton to send the priority field in the message header if the 
> user sets any priority. Then we will be able to interpret no header, or no 
> priority field in the header as "no priority was assigned".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1139) support prioritized addresses

2018-10-04 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1139:
-

 Summary: support prioritized addresses
 Key: DISPATCH-1139
 URL: https://issues.apache.org/jira/browse/DISPATCH-1139
 Project: Qpid Dispatch
  Issue Type: New Feature
  Components: Router Node, Routing Engine, Tests
Reporter: michael goulish
Assignee: michael goulish


Support a new field in the address descriptor in router configuration files 
that will assign a priority to the address.

Any message that does not have an intrinsic priority already assigned will 
inherit the priority of the address to which it is sent.  If no priority is 
explicitly assigned to an address, then it will be assigned the default 
priority.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.

2018-10-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638896#comment-16638896
 ] 

michael goulish commented on DISPATCH-1126:
---

Pending fix for this in PR 384.

> ERROR Attempt to attach too many inter-router links for priority sheaf.
> ---
>
> Key: DISPATCH-1126
> URL: https://issues.apache.org/jira/browse/DISPATCH-1126
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.3.0
> Environment: Fedora 28
>  * Three router network in linear arrangement A - B - C.
>  * B has a listener; A and C connect to it
>  
>Reporter: Chuck Rolke
>Assignee: michael goulish
>Priority: Major
> Attachments: taj-GRN.log
>
>
> Some state probably not cleaned up when router connections are lost. 10 
> messages
>     (error) Attempt to attach too many inter-router links for priority sheaf.
> appear when routers reconnect.
> Start the network. Then kill routers A and C and restart them. Router B 
> prints the messages.
> Log file attached



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-1949) no message header if priority == default

2018-10-04 Thread michael goulish (JIRA)
michael goulish created PROTON-1949:
---

 Summary: no message header if priority == default
 Key: PROTON-1949
 URL: https://issues.apache.org/jira/browse/PROTON-1949
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish


Proton does not send a message header if there would be nothing in it but the 
priority field, and if the priority was set to the default value (4). 

At the router level, we are allowing the user to set priorities on addresses. 
Those priorities will be given to any message sent to that address if the 
message otherwise had no priority set.

So - we need to be able to distinguish between messages that were assigned the 
default priority, and messages in which the priority was left undefined.

We would like proton to send the priority field in the message header if the 
user sets any priority. Then we will be able to interpret no header, or no 
priority field in the header as "no priority was assigned".
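
A minimal illustration of the ambiguity, using the real proton-c message API
(the scenario is exactly the one described above; nothing here is a proposed
fix):

#include <proton/message.h>

/* Both messages below encode identically on the wire, because proton
 * elides a header whose only content would be the default priority (4). */
static void priority_ambiguity(void)
{
    pn_message_t *unset     = pn_message();      /* priority never touched */
    pn_message_t *explicit4 = pn_message();
    pn_message_set_priority(explicit4, 4);       /* explicitly the default */

    /* Neither message carries a header, so a receiver cannot distinguish
     * "no priority was assigned" from "priority 4 was requested". */
    pn_message_free(unset);
    pn_message_free(explicit4);
}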

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1135) Router A leaks memory when router B killed and restarted.

2018-10-01 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1135:
-

 Summary: Router A leaks memory when router B killed and restarted.
 Key: DISPATCH-1135
 URL: https://issues.apache.org/jira/browse/DISPATCH-1135
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish


I set up a 2-node router network, with B connecting to A.

No clients.

I repeatedly killed and restarted B, giving 3 seconds after each kill and 
after each restart for the network to settle down.  This was repeated 100 
times.  The same router A ran for the duration of the test.

The 'ps' program, run repeatedly on router A, indicated that it was leaking 
about 82 KB per kill-and-restart.  Using 'qdstat -m' on A after each 
kill-and-restart showed the following difference between iteration 1 and 
iteration 100.  (Note: this shows growth of only 44 KB per iteration.)

 

As far back as I looked (about 1 year), I saw similar behavior.

 

In the chart below, the first column "size" is the number of bytes in a single 
struct of that type.

"In-threads" means how many of each struct are currently being used.

 

Note that, although there are no clients, the routers will be sending some 
messages to each other.

 

 

type                    size   in-threads   in-threads    item     byte
                               test 1       test 100      growth   growth
=========================================================================
qd_buffer_t              536      256          2944         2688   1440768
qd_message_content_t    1056      128          1216         1088   1148928
qd_iterator_t            160      448          7488         7040   1126400
qd_parsed_field_t         88      256          2880         2624    230912
qdr_delivery_t           248      256          1152          896    222208
qd_message_t             160      256          1088          832    133120
qd_connection_t         2320       32            64           32     74240
qdr_general_work_t        64       64           448          384     24576
qdr_link_t               360      192           256           64     23040
qd_bitmask_t              24      192          1088          896     21504
qdr_connection_work_t     48       64           384          320     15360
qdr_link_work_t           48       64           384          320     15360
qd_link_t                 96      128           256          128     12288
qdr_link_ref_t            24       64           448          384      9216
qd_parsed_turbo_t         64      128           256          128      8192
qd_link_ref_t             24       64           256          192      4608
qdr_error_t               24       64           256          192      4608
qd_deferred_call_t        32       64           192          128      4096
qdr_terminus_t            64      192           256           64      4096
qdr_delivery_ref_t        24       64           128           64      1536

 

(All other structs have zero growth, or in one case negative growth.)

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.

2018-09-24 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1126:
-

Assignee: michael goulish

> ERROR Attempt to attach too many inter-router links for priority sheaf.
> ---
>
> Key: DISPATCH-1126
> URL: https://issues.apache.org/jira/browse/DISPATCH-1126
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.3.0
> Environment: Fedora 28
>  * Three router network in linear arrangement A - B - C.
>  * B has a listener; A and C connect to it
>  
>Reporter: Chuck Rolke
>Assignee: michael goulish
>Priority: Major
> Attachments: taj-GRN.log
>
>
> Some state probably not cleaned up when router connections are lost. 10 
> messages
>     (error) Attempt to attach too many inter-router links for priority sheaf.
> appear when routers reconnect.
> Start the network. Then kill routers A and C and restart them. Router B 
> prints the messages.
> Log file attached



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1096) support AMQP prioritized messages

2018-09-19 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621117#comment-16621117
 ] 

michael goulish commented on DISPATCH-1096:
---

The priority code should make messages default to priority 4 when there is no 
priority in the header, or no header at all in the message.

The proton library leaves out the message header (well, makes it an empty list) 
if there would otherwise be nothing but a default priority value in there.

> support AMQP prioritized messages
> -
>
> Key: DISPATCH-1096
> URL: https://issues.apache.org/jira/browse/DISPATCH-1096
> Project: Qpid Dispatch
>  Issue Type: New Feature
>    Reporter: michael goulish
>        Assignee: michael goulish
>Priority: Major
> Fix For: 1.4.0
>
>
> Detect priority info from message header in the router code.
> Create separate inter-router links for the various priorities.
> Per connection (i.e. not globally across the router) service high-priority 
> inter-router links before low priority links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1096) support AMQP prioritized messages

2018-08-06 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1096:
-

 Summary: support AMQP prioritized messages
 Key: DISPATCH-1096
 URL: https://issues.apache.org/jira/browse/DISPATCH-1096
 Project: Qpid Dispatch
  Issue Type: New Feature
Reporter: michael goulish
Assignee: michael goulish


Detect priority info from message header in the router code.

Create separate inter-router links for the various priorities.

Per connection (i.e. not globally across the router) service high-priority 
inter-router links before low priority links.
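
The resulting per-priority link structure is what later tickets (e.g.
DISPATCH-1126) call a "priority sheaf".  Roughly, and with hypothetical names
rather than the actual router data structures:

#define N_PRIORITIES 10                /* AMQP priorities 0..9 */

typedef struct qdr_link_t qdr_link_t;  /* opaque; the real type lives in the router */

void process_link(qdr_link_t *link);   /* hypothetical servicing helper */

/* Hypothetical per-connection sheaf: one inter-router link per priority. */
typedef struct {
    qdr_link_t *links[N_PRIORITIES];
} link_sheaf_t;

/* Service high-priority links before low-priority ones, per connection. */
static void service_sheaf(link_sheaf_t *sheaf)
{
    for (int p = N_PRIORITIES - 1; p >= 0; p--)
        if (sheaf->links[p])
            process_link(sheaf->links[p]);
}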



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-873) new routes calculated wrongly after connector deletion

2017-11-09 Thread michael goulish (JIRA)
michael goulish created DISPATCH-873:


 Summary: new routes calculated wrongly after connector deletion
 Key: DISPATCH-873
 URL: https://issues.apache.org/jira/browse/DISPATCH-873
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 1.0.0
Reporter: michael goulish
Priority: Blocker
 Fix For: 1.0.0


I have a 3-mesh network with nodes A, B, C.
B-->A cost is 10
C-->A cost is 10
B-->C cost is 100.

Initial route from B to C is calculated correctly as B,A,C : cost == 20.

But after I used qdmanage to delete the connector from B to A, I get no further 
messages delivered from B to C.
Using qdstat to look at routing table, it looks wrong:

Both B and C think they can only get to each other by going through A.  But 
there is now no route that way, because B-->A has been deleted.  They should be 
using the direct connection B-->C. Yet they both calculate the cost 
correctly as 100.



===
A
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors   valid-origins
A          (self)    -     1          ['C']       []
B          C         -     1    110   ['A', 'C']  []
C          -         1     1    10    ['A', 'B']  ['B']
===
B
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors   valid-origins
B          (self)    -     1          ['C']       []
C          A         -     1    100   []          []
===
C
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors   valid-origins
A          -         0     1    10    ['C']       []
B          A         -     1    100   ['A', 'C']  ['A']
C          (self)    -     1          ['A', 'B']  []
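
As a sanity check on the expected result, here is a standalone brute-force
shortest-path computation over this topology with the B-A connector removed
(illustrative only, not router code); it confirms that the only remaining
B-to-C route is the direct link at cost 100:

#include <stdio.h>

#define INF 1000000   /* "no edge" */

int main(void)
{
    enum { A, B, C, N };
    /* Costs from the report: C-A 10, B-C 100; B-A deleted (INF). */
    int cost[N][N] = {
        { 0,   INF, 10  },   /* A */
        { INF, 0,   100 },   /* B */
        { 10,  100, 0   },   /* C */
    };
    /* Floyd-Warshall: tiny and exact for three nodes. */
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (cost[i][k] + cost[k][j] < cost[i][j])
                    cost[i][j] = cost[i][k] + cost[k][j];
    printf("B->C cost: %d\n", cost[B][C]);   /* prints 100: direct B-C link */
    return 0;
}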





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-870) connection improperly reopened from closed connector

2017-11-03 Thread michael goulish (JIRA)
michael goulish created DISPATCH-870:


 Summary: connection improperly reopened from closed connector
 Key: DISPATCH-870
 URL: https://issues.apache.org/jira/browse/DISPATCH-870
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 1.0.0
Reporter: michael goulish
Priority: Major


I have a 3-mesh router network, ABC, and I am sending messages from B to C.  
The route being used is B,A,C -- because I have configured it to be cheaper 
than B,C .

I use the management interface to kill the connector from C to A.  For the next 
two seconds my messages are released.  I use another management call to confirm 
that the connector has really been removed.  (I also see it happening in the C 
code, at fn qd_connection_manager_delete_connector().)

What We Expect: the network should re-route to start sending these messages on 
the route B,C -- because that is now the only route available.

What We Observe: after 2 seconds, the function try_open_lh() is called.  It 
reopens the connection from C to A even though the connector has been removed.
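
One plausible shape for a fix, sketched with hypothetical names (the real code
path involves qd_connection_manager_delete_connector() and try_open_lh(), per
the description above): the deferred reopen must re-check that its connector
still exists before acting.

#include <stddef.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real connector struct. */
typedef struct qd_connector_t {
    bool deleted;       /* tombstone set when the connector is removed */
    /* ... */
} qd_connector_t;

/* Hypothetical guard in the reconnect timer: if the connector was
 * deleted while the timer was pending, do not reopen the connection. */
static void try_open_guarded(qd_connector_t *ct)
{
    if (ct == NULL || ct->deleted)
        return;             /* connector was removed; stay closed */
    /* ... the existing try_open_lh() logic would run here ... */
}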





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-04-11 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-1408.
---
   Resolution: Fixed
Fix Version/s: 0.18.0

Fixed with checkin d22f124b0534983f6557850e48f13317ec6df0e5

> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>    Assignee: michael goulish
> Fix For: 0.18.0
>
> Attachments: jira_proton_1408_reproducer.tar.gz
>
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Updated] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-15 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-1408:

Attachment: jira_proton_1408_reproducer.tar.gz

Everything you need in a tidy little package.
I have 10 out of 10 reproductions with this.


> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Assignee: Alan Conway
> Attachments: jira_proton_1408_reproducer.tar.gz
>
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-15 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926821#comment-15926821
 ] 

michael goulish commented on PROTON-1408:
-

I can now reproduce the problem 100%, after just a couple of minutes instead 
of the 9 or 27 hours it took initially.
This is done by:
  1. storing deliveries in the receiver and only acking when I get 100,000
  2. Altering proton code so that the first outgoing ID it uses is already 
close to 2^31 - 1

I am now packaging up all my stuff for the reproducer.
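
For context: AMQP delivery-ids are 32-bit serial numbers that wrap, and 6.4
billion is roughly 2^32 + 2^31, i.e. the second time a monotonically
increasing id crosses the 2^31 boundary -- which fits the reproduction recipe
of starting near 2^31 - 1.  The conventional wraparound-safe comparison looks
like the sketch below (illustrative; not necessarily the exact patch to
pn_do_disposition()):

#include <stdint.h>
#include <stdbool.h>

/* Serial-number comparison (RFC 1982 style): compare the signed
 * difference instead of the raw unsigned values. */
static bool serial_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* A naive "a < b" agrees with serial_lt() until the ids straddle the
 * wrap point.  For example serial_lt(0xFFFFFFF0, 0x00000010) is true
 * (0x10 is 32 ids "ahead"), while the raw unsigned comparison says
 * the opposite. */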


> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Assignee: Alan Conway
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-01 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890860#comment-15890860
 ] 

michael goulish commented on PROTON-1408:
-

Using proton and dispatch code from 17 Feb 2017, I am running 5 simultaneous 
tests on a large machine, each with 1 router, 2 senders, 2 receivers.
So far I have no reproduction of the slow-down.  All the senders have gone 
beyond 8 billion messages with no slowdown at all.
OS is RHEL 7.2 .





> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>        Reporter: michael goulish
>Assignee: Alan Conway
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-02-17 Thread michael goulish (JIRA)
michael goulish created PROTON-1408:
---

 Summary: long-lived connections suffer large performance hit after 
many messages
 Key: PROTON-1408
 URL: https://issues.apache.org/jira/browse/PROTON-1408
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


In long-running soak tests, in which connections are never taken down, I am 
seeing a sudden & severe performance degradation when the number of messages 
over the connection reaches about 6.4 billion.  

This is happening in tests with two senders, two receivers & one router 
intermediating.  

I have tried C libUV clients as well as CPP clients.  Behavior is not 
identical, but I see sudden performance drop, ie. 8x throughput decrease or 
worse, in both cases.

Alan / Ted / Ken see an issue in use of improper comparison logic in 
pn_do_disposition(), in transport.c  . I am trying to prove this now.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-372) qdstat should have a timeout command line argument

2016-06-08 Thread michael goulish (JIRA)
michael goulish created DISPATCH-372:


 Summary: qdstat should have a timeout command line argument
 Key: DISPATCH-372
 URL: https://issues.apache.org/jira/browse/DISPATCH-372
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish


qdstat should have a timeout command line argument.
But it doesn't.

Sometimes when the router is busy, it is helpful to allow a longer timeout.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage

2016-06-08 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321466#comment-15321466
 ] 

michael goulish commented on DISPATCH-369:
--

...and without anything interesting showing up in the output from 'qdstat -m'.



> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>    Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage

2016-06-08 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321448#comment-15321448
 ] 

michael goulish commented on DISPATCH-369:
--

I rebuilt dispatch without the memory pooling feature, expecting that this 
would make the memory blow-ups go away.  It did not!  On the 7th run of my 
test, I saw memory go from 60 MB  (Resident Set Size) to 480 MB between one 
printout of 'top' and the next.  (3 seconds)  -- same behavior I was seeing 
with memory pooling enabled.

> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>    Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Updated] (DISPATCH-369) investigate excursions in memory usage

2016-06-07 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated DISPATCH-369:
-
Attachment: n_senders_vs_MEM_three_trials.jpg

Results of repeating each test three times, showing occasional excursions in 
memory usage.



> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>    Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-369) investigate excursions in memory usage

2016-06-07 Thread michael goulish (JIRA)
michael goulish created DISPATCH-369:


 Summary: investigate excursions in memory usage
 Key: DISPATCH-369
 URL: https://issues.apache.org/jira/browse/DISPATCH-369
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.6.0
Reporter: michael goulish
Assignee: michael goulish


I don't know if this is a bug or not.  I'm Jirifying it as a way of remembering 
an interesting behavior that my testing has shown, so that I can continue 
developing the testing and  come back to this later.

...


While measuring router memory usage under varying message rate and number of 
senders -- when I run the same test multiple times, I am occasionally (about 1 
in 4 times or so) seeing a test in which memory usage is much higher than the 
others.

For example:
  In this test:
  {
straight-through topology ( 1 sender --> 1 address --> 1 receiver )
200 senders
200 messages per second
100 bytes per message
  }

I record router memory usage at the point when all receivers are just hitting 
10,000 messages.   (This is because it grows -- see previous JIRA.)

In three iterations I get the following memory usage:

   66 MB
   63 MB
  181 MB

Something similar, but less drastic, happened occasionally at lower levels in 
the test.  

In this case, this is a tripling of memory usage for the same scenario.  I 
doubt that this is the result of slightly  different timing in a block 
allocation of data structures.  What just happened?

Start by investigating with "qdstat -m"  and see if that shows some or all of 
the difference.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Re: Review Request 48362: PROTON-1221: c++ container::schedule() support.

2016-06-07 Thread Michael Goulish



> 
> In other words, are we trying for a fixed frequency or an "at least"
> delay?
> 

What I need this feature for is to be able to send messages at a fixed 
frequency.


-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-344) memory growth after repeated calls from qdstat -m

2016-05-24 Thread michael goulish (JIRA)
michael goulish created DISPATCH-344:


 Summary: memory growth after repeated calls from qdstat -m
 Key: DISPATCH-344
 URL: https://issues.apache.org/jira/browse/DISPATCH-344
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
Reporter: michael goulish


0. version of dispatch code is   0.6.0 RC3
1. bring up a router
2. do not attach any clients, except...
3. ...repeatedly invoke qdstat -m on the router 

result:

After 1000 calls from "qdstat -m", top shows that router memory has grown by 
4947968 bytes.  The output from "qdstat -m" accounts for about 63% of that, or 
3144448 bytes.

Here are the data types that increased, according to qdstat, ordered from 
largest to smallest.



Um.   This table looked really nice when it was in a fixed-width font.




  type                   size   total    total   increase   increase
                                before   after   structs    bytes
  ===================================================================
  qd_log_entry_t         2104      112    1040        928    1952512
  qd_buffer_t             536       80    1120       1040     557440
  qd_field_iterator_t     128      192    1280       1088     139264
  qdr_delivery_t          136       64     512        448      60928
  qdr_connection_t        216       64     320        256      55296
  qdr_field_t              40      192    1280       1088      43520
  qd_connection_t         224       64     256        192      43008
  qd_message_content_t    640       16      80         64      40960
  qd_message_t            128      192     512        320      40960
  qdpn_connector_t        600       16      64         48      28800
  qdr_general_work_t       64       64     512        448      28672
  qdr_connection_work_t    56       64     512        448      25088
  qd_composite_t          112       64     256        192      21504
  qdr_link_t              264       16      80         64      16896
  qd_composed_field_t      64       64     256        192      12288
  qdr_terminus_t           64       64     256        192      12288
  qdr_delivery_ref_t       24       64     512        448      10752
  qdr_link_ref_t           24       64     512        448      10752
  qd_parsed_field_t        80      128     256        128      10240
  qdr_action_t            160      256     320         64      10240
  qd_link_t                48       64     256        192       9216
  qdr_error_t              24        0     320        320       7680
  qd_deferred_call_t       32       64     256        192       6144


grand total increase from qdstat:   3144448
grand total increase from top:   4947968



Here is the script I used
This input window is breaking some lines.   >:-(   


#! /bin/bash

echo "NOTE:  router should already be running."

INSTALL_ROOT=${SHACKLETON_ROOT}/install
PROTON_INSTALL_DIR=${INSTALL_ROOT}/proton
DISPATCH_INSTALL_DIR=${INSTALL_ROOT}/dispatch

QDSTAT=${DISPATCH_INSTALL_DIR}/bin/qdstat

export LD_LIBRARY_PATH=${DISPATCH_INSTALL_DIR}/lib64:${PROTON_INSTALL_DIR}/lib64
export PYTHONPATH=${DISPATCH_INSTALL_DIR}/lib/qpid-dispatch/python:${DISPATCH_INSTALL_DIR}/lib/python2.7/site-packages:${PROTON_INSTALL_DIR}/lib64/proton/bindings/python

ROUTER_PID=`ps -aef | grep qdrouterd | grep -v grep | awk '{print $2}'`

count=1
while [ $count -lt 1001 ]
do
  echo "==="
  echo "TEST $count"
  echo "==="
  count=$(( $count + 1 ))

  top -b -n 1 -p ${ROUTER_PID}

  ${QDSTAT} -m

  sleep 3
done




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2016-04-29 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264603#comment-15264603
 ] 

michael goulish commented on PROTON-992:


Dispatch is not yet immune to this issue.
Also, I think Proton needs to let the application handle initialization and 
shutdown of Cyrus SASL.

I made a test that brings up a 6-router network, and randomly kills and 
restarts routers.
I get a router core, usually within 5 iterations, because of this issue.

Here is how I fixed it:

  1. Let dispatch code call sasl_client_init() and sasl_server_init()  at the 
top of qd_server_run(), and remove these calls from Proton.  If Proton keeps these 
calls to itself, it cannot prevent two threads from simultaneously getting 
into sasl_*_init().  SegV City.

  2. Prevent proton from calling sasl_{client,server}_done(), in 
pni_sasl_impl_free().   Being thread-agnostic, Proton cannot possibly know when 
it's safe to dispose of the sasl object, which is being used by many threads.   
Both of those Cyrus calls affect global state by NULLing out a global pointer 
that stores the mechanisms string.

With these changes, my test has now run to 400 iterations with no crash.
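
A sketch of the exactly-once initialization described in point 1, using
pthread_once(); qd_server_sasl_init() is a hypothetical wrapper, not the
actual dispatch function:

#include <pthread.h>
#include <sasl/sasl.h>

static pthread_once_t sasl_init_once = PTHREAD_ONCE_INIT;

static void do_sasl_init(void)
{
    sasl_client_init(0);                /* no client callbacks          */
    sasl_server_init(0, "qdrouterd");   /* appname here is illustrative */
}

/* Call at the top of qd_server_run(): pthread_once() guarantees the
 * init functions run exactly once even if several threads race here. */
void qd_server_sasl_init(void)
{
    pthread_once(&sasl_init_once, do_sasl_init);
}

/* Per point 2 above, the matching teardown calls
 * sasl_client_done()/sasl_server_done() are simply never made while
 * any thread may still be using SASL. */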



> Proton's use of Cyrus SASL is not thread-safe.
> --
>
> Key: PROTON-992
> URL: https://issues.apache.org/jira/browse/PROTON-992
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Affects Versions: 0.10
>    Reporter: michael goulish
>Assignee: Andrew Stitcher
>Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
> * you supply mutex functions (see sasl_set_mutex())
> * you make no libsasl calls until sasl_client/server_init() completes
> * no libsasl calls are made after sasl_done() is begun
> * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that that sasl_set* calls are not thread safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in :
>   pni_init_client()
>   pni_init_server(), and
>   pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-296) segfault on router startup

2016-04-27 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260774#comment-15260774
 ] 

michael goulish commented on DISPATCH-296:
--

I have also seen this crash, with same frequency Gordon is describing.
In my case, I have a network of 6 routers.  I repeatedly kill one and replace 
it.
After a few such kills and restarts, I see this crash.

After instrumenting the Cyrus SASL code, I see a bad situation just before the 
crash: two threads from the same process are both inside the Cyrus fn 
sasl_client_init() within a few microseconds of each other.

The Cyrus SASL code for the fn sasl_client_init() has a little logic to try to 
protect against multiple calls to the function -- but it will not work in a 
multi-threaded environment except by luck.

MDEBUG proton called sasl_client_init.  PID 28668 TID 7f1ac85a01c0  TIME 
1461781160.774368  <- different threads in same fn 7 usec apart  
MDEBUG proton called sasl_client_init.  PID 28668 TID 7f1abaca1700  TIME 
1461781160.774375  <- just before crash in sasl_dispose
MDEBUG proton calling sasl_dispose.   PID 28668 TID  7f1ac85a01c0  TIME 
1461781160.77
MDEBUG proton calling sasl_dispose.   PID 28668  TID 7f1abaca1700  TIME 
1461781160.774532




> segfault on router startup
> --
>
> Key: DISPATCH-296
> URL: https://issues.apache.org/jira/browse/DISPATCH-296
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Container
>Affects Versions: 0.6
>Reporter: Gordon Sim
> Attachments: multiconnect.conf
>
>
> Starting up a router with a couple of connectors (connectingto qpidd 
> instances in my case), the router occasionally (maybe one in five) crashes 
> with a segfault.
> {noformat}
> (gdb) bt
> #0  0x7629c76e in sasl_client_add_plugin () from /lib64/libsasl2.so.3
> #1  0x7629cf58 in sasl_client_init () from /lib64/libsasl2.so.3
> #2  0x7796ecff in pni_init_client 
> (transport=transport@entry=0x7fffdc008fc0) at 
> /home/gordon/projects/proton/proton-c/src/sasl/cyrus_sasl.c:115
> #3  0x7796e87e in pn_do_mechanisms (transport=0x7fffdc008fc0, 
> frame_type=, channel=, args=, 
> payload=)
> at /home/gordon/projects/proton/proton-c/src/sasl/sasl.c:703
> #4  0x77959b26 in pni_dispatch_action (payload=0x7fffe96f2360, 
> args=0x7fffdc0091c0, channel=0, frame_type=1 '\001', lcode=, 
> transport=0x7fffdc008fc0)
> at /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:74
> #5  pni_dispatch_frame (args=0x7fffdc0091c0, transport=0x7fffdc008fc0, 
> frame=...) at 
> /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:116
> #6  pn_dispatcher_input (transport=0x7fffdc008fc0, bytes=0x7fffdc00f358 "", 
> available=0, batch=false, halt=0x7fffdc009144) at 
> /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:135
> #7  0x7795fbba in transport_consume 
> (transport=transport@entry=0x7fffdc008fc0) at 
> /home/gordon/projects/proton/proton-c/src/transport/transport.c:1751
> #8  0x779630d2 in pn_transport_process 
> (transport=transport@entry=0x7fffdc008fc0, size=) at 
> /home/gordon/projects/proton/proton-c/src/transport/transport.c:2860
> #9  0x77bb08e3 in qdpn_connector_process (c=0x7fffdc0068c0) at 
> /home/gordon/projects/dispatch/src/posix/driver.c:761
> #10 0x77bc3a91 in process_connector (cxtr=0x7fffdc0068c0, 
> qd_server=0x702b50) at /home/gordon/projects/dispatch/src/server.c:683
> #11 thread_run (arg=0x87b9b0) at 
> /home/gordon/projects/dispatch/src/server.c:958
> #12 0x7772660a in start_thread () from /lib64/libpthread.so.0
> #13 0x76c8ba4d in clone () from /lib64/libc.so.6
> {noformat}
> other threads:
> {noformat}
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x77fd1180 (LWP 19319))]
> #0  0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x777289cd in pthread_mutex_lock () from /lib64/libpthread.so.0
> #2  0x77bb1239 in sys_mutex_lock (mutex=0x702da0) at 
> /home/gordon/projects/dispatch/src/posix/threading.c:70
> #3  0x77bc4723 in qd_timer (qd=qd@entry=0x604240, 
> cb=cb@entry=0x77bc11b0 , context=context@entry=0x702b50) at 
> /home/gordon/projects/dispatch/src/timer.c:89
> #4  0x77bc3f33 in qd_server_run (qd=0x604240) at 
> /home/gordon/projects/dispatch/src/server.c:1349
> #5  0x00401ac7 in main_process 
> (config_path=config_path@entry=0x7fffe090 
> "./etc/qpid-dispatch/multiconnect.conf", 
> 

Re: [dispatch] Agreeing on terminology about directions.

2016-02-23 Thread Michael Goulish


I think this is confusing, because the link origins
seem a little router-centric, while the link message 
directions are client-centric.

Why not make everything clearly router-centric in
this context?  Like this:


  links made by clients are "client links"  (or "client-originated links")
  links made by routers are "router links"  (or "router-originated links")

  links pointing at clients are pointing "outward"
  links pointing at routers are pointing "inward"
  links between two routers are "internal"





- Original Message -
> Direction of message flow is tricky to describe in router
> configuration. "in" and "out", "sender" and "receiver" all have
> opposite meaning depending on whether you are thinking from a router or
> client perspective.
> 
> Here's what I would propose based on the existing use of "in" and "out"
> in the linkRoutePattern configuration. This is the opposite of how I
> usually think of "in" and "out" but I suspect this is a 50/50 issue
> where it doesn't matter which we pick as long as we pick one.
> 
> Note I'm using "relay" to mean router-initiated links, which currently
> means waypoints and link-routes.
> 
> ===
> Directional terminology
> ===
> 
> Connections: can be established to the router (via listeners) or from
> the router (via connectors) The direction that the connection was made
> has no effect on how it can be used.
> 
> Links opened from outside the router are "client" links.  Links opened
> by the router are "relay" links.
> 
> Links that receive messages from the router network are called *in*
> links because messages flow from the network *into* a client or are
> relayed *into* an external system.
> 
> Links that send messages *to* the router network are called "out" links
> because messages flow *out* of clients or external systems to the
> network.
> 
> 
> Shout if I've got this wrong or if anyone has ideas for less ambiguous
> terms than in/out or send/receive.
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
> For additional commands, e-mail: dev-h...@qpid.apache.org
> 
> 

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-210) try an epoll-based driver ...

2016-01-29 Thread michael goulish (JIRA)
michael goulish created DISPATCH-210:


 Summary: try an epoll-based driver ...
 Key: DISPATCH-210
 URL: https://issues.apache.org/jira/browse/DISPATCH-210
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish



...to improve scalability to large numbers of attached messaging apps.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Review Request 38439: PROTON-992 : introduce pn_init() fn

2015-09-16 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38439/
---

Review request for qpid, Alan Conway, Kenneth Giusti, and Ted Ross.


Repository: qpid-proton-git


Description
---

PROTON-992 : introduce a pn_init() fn, so we can initialize things like Cyrus 
SASL exactly-once.  
New file proton.h to hold declaration, because this doesn't seem to fit 
anywhere else.
There is a flag in the sasl code to show whether this new interface has been 
used or not.
If not -- allow code to function as before.  Don't require current proton apps 
to change.


Diffs
-

  proton-c/include/proton/proton.h PRE-CREATION 
  proton-c/src/sasl/cyrus_sasl.c 809bad5 
  proton-c/src/sasl/none_sasl.c 674326f 
  proton-c/src/sasl/sasl-internal.h b3f4c7f 
  proton-c/src/util.c e2c6727 

Diff: https://reviews.apache.org/r/38439/diff/


Testing
---

ctest -VV to confirm that not using the new interface does not break anything.

And I hacked a copy of dispatch router to make it call the new pn_init() from 
main -- and confirm that the cyrus sasl server init and client init fns are 
called exactly once.


Thanks,

michael goulish



[jira] [Created] (DISPATCH-157) add sasl tests to dispatch unit tests

2015-08-26 Thread michael goulish (JIRA)
michael goulish created DISPATCH-157:


 Summary: add sasl tests to dispatch unit tests
 Key: DISPATCH-157
 URL: https://issues.apache.org/jira/browse/DISPATCH-157
 Project: Qpid Dispatch
  Issue Type: Improvement
  Components: Tests
Affects Versions: 0.5
Reporter: michael goulish
Assignee: michael goulish
 Fix For: 0.5


Add a complete set of sasl tests to the Dispatch unit test framework.
Ensure correct behavior for the cross-product of 

   authenticatePeer  := { no, yes, insecureOk }
  x
   saslMechanisms:= { NONE, PLAIN, DIGEST-MD5, CRAM-MD5, GSSAPI, SRP }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Re: Review Request 36992: PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants

2015-08-05 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36992/
---

(Updated Aug. 5, 2015, 9:52 a.m.)


Review request for qpid, Kenneth Giusti and Ted Ross.


Changes
---

Improvements from kgiusti's feedback.


Repository: qpid-proton-git


Description
---

PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants.  

Note:
{
  I am skipping the java implementation for now,
  because I would like to get the C version out of
  my life.  (And this diff is already quite large
  enough, I think.)

  I will write a separate JIRA for the java version
  and make it refer to this one.  I will work on it
  as a background task.
}

This diff adds two pieces of functionality:

  1. extract numeric default values from AMQP
 xml files and store them as defined constants
 in protocol.h  (PROTON-930)

  2. add code to enforce AMQP 1.0 spec mandated
 behavior with respect to handle-max values
 -- i.e. negotiated limits on number of links --
 per session.  (PROTON-886)

These two changes are combined into one checkin
because of some earlier feedback I got suggesting
that I not check in PROTON-930 until I had some
code that could actually use the constants that it
creates.

The code that is generated by the changes in
proton-c/src/protocol.h.py show up in the file
protocol.h.  It is this:

- begin snippet ---
/* Numeric default values */
#define AMQP_OPEN_MAX_FRAME_SIZE_DEFAULT 4294967295
#define AMQP_OPEN_CHANNEL_MAX_DEFAULT 65535
#define AMQP_BEGIN_HANDLE_MAX_DEFAULT 4294967295
#define AMQP_SOURCE_TIMEOUT_DEFAULT 0
#define AMQP_TARGET_TIMEOUT_DEFAULT 0
- end snippet ---
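
For the PROTON-886 half, the rule being enforced is (roughly) that the
handle-max in effect on a session is the smaller of the local limit and what
the peer advertised, defaulting to the spec value when absent; the review
feedback elsewhere in this thread suggests pn_min() for exactly this.  An
illustrative sketch using the generated constant, not proton's actual
internals:

#include <stdint.h>

#define AMQP_BEGIN_HANDLE_MAX_DEFAULT 4294967295u  /* from the snippet above */

/* Negotiated per-session handle limit: the smaller of what we are
 * willing to accept and what the peer's begin frame advertised. */
static uint32_t effective_handle_max(uint32_t local_max, uint32_t remote_max)
{
    return local_max < remote_max ? local_max : remote_max;
}

/* An attach naming a handle above this limit must be treated as a
 * session error per the AMQP 1.0 spec. */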


Diffs (updated)
-

  proton-c/bindings/python/proton/__init__.py 46b9466 
  proton-c/include/proton/session.h 94d2869 
  proton-c/src/engine/engine-internal.h 727f50d 
  proton-c/src/engine/engine.c 9043e0b 
  proton-c/src/protocol.h.py bbc0dfe 
  proton-c/src/transport/transport.c 6abf862 
  tests/python/proton_tests/engine.py 7a1d539 

Diff: https://reviews.apache.org/r/36992/diff/


Testing
---

ctest -VV with Java tests running.


Thanks,

michael goulish



Re: Review Request 36992: PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants

2015-08-05 Thread michael goulish


 On July 31, 2015, 8:27 p.m., Kenneth Giusti wrote:
  proton-c/src/engine/engine.c, line 2230
  https://reviews.apache.org/r/36992/diff/1/?file=1026182#file1026182line2230
 
  use pn_min() in utils.h

Wow, we ... have a min macro.  I...  That's OK!  Absolutely.


 On July 31, 2015, 8:27 p.m., Kenneth Giusti wrote:
  proton-c/src/protocol.h.py, line 39
  https://reviews.apache.org/r/36992/diff/1/?file=1026183#file1026183line39
 
  Yay! another chance to make you a Python programmer!
  
  You don't need the standalone int conversion statement to test the type 
  (it's not 'natural' python):
  
  int(default)
  
  Use '%d' as the format flag (instead of %s), and pass it the 
  int(default) in the print() statement.

Bah.  What's 'natural'?  Standalone makes the intent clearer.  Now I'm 
putting a comment there.  Next they'll be telling me how to use whitespace.  
Oh, wait...


- michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36992/#review93765
---


On Aug. 5, 2015, 9:52 a.m., michael goulish wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36992/
 ---
 
 (Updated Aug. 5, 2015, 9:52 a.m.)
 
 
 Review request for qpid, Kenneth Giusti and Ted Ross.
 
 
 Repository: qpid-proton-git
 
 
 Description
 ---
 
 PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants.  
 
 Note:
 {
   I am skipping the java implementation for now,
   because I would like to get the C version out of
   my life.  (And this diff is already quite large
   enough, I think.)
 
   I will write a separate JIRA for the java version
   and make it refer to this one.  I will work on it
   as a background task.
 }
 
 This diff adds two pieces of functionality:
 
   1. extract numeric default values from AMQP
  xml files and store them as defined constants
  in protocol.h  (PROTON-930)
 
   2. add code to enforce AMQP 1.0 spec mandated
  behavior with respect to handle-max values
  -- i.e. negotiated limits on number of links --
  per session.  (PROTON-886)
 
 These two changes are combined into one checkin
 because of some earlier feedback I got suggesting
 that I not check in PROTON-930 until I had some
 code that could actually use the constants that it
 creates.
 
 The code generated by the changes in
 proton-c/src/protocol.h.py shows up in the file
 protocol.h.  It is this:
 
 - begin snippet ---
 /* Numeric default values */
 #define AMQP_OPEN_MAX_FRAME_SIZE_DEFAULT 4294967295
 #define AMQP_OPEN_CHANNEL_MAX_DEFAULT 65535
 #define AMQP_BEGIN_HANDLE_MAX_DEFAULT 4294967295
 #define AMQP_SOURCE_TIMEOUT_DEFAULT 0
 #define AMQP_TARGET_TIMEOUT_DEFAULT 0
 - end snippet ---
 
 
 Diffs
 -
 
   proton-c/bindings/python/proton/__init__.py 46b9466 
   proton-c/include/proton/session.h 94d2869 
   proton-c/src/engine/engine-internal.h 727f50d 
   proton-c/src/engine/engine.c 9043e0b 
   proton-c/src/protocol.h.py bbc0dfe 
   proton-c/src/transport/transport.c 6abf862 
   tests/python/proton_tests/engine.py 7a1d539 
 
 Diff: https://reviews.apache.org/r/36992/diff/
 
 
 Testing
 ---
 
 ctest -VV with Java tests running.
 
 
 Thanks,
 
 michael goulish
 




Re: [VOTE] Move Dispatch to git

2015-08-05 Thread Michael Goulish

We should move the code to git,
Any way you look at it.
It's the best code control system
So new and shiny that it glistens.

We should move it any day.
We should move it right away.
But one fact here we should construe.
When I say We, I'm meaning You.

:-)


[X] Yes - move the qpid-dispatch repo to git
[ ] No - no change: leave qpid-dispatch on subversion



-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Re: Dispatch devs: any interest in moving to git?

2015-08-03 Thread Michael Goulish

+1

It does seem clear that the development universe has decided that git
is the preferred technology.  If nothing else, the move is worth it so that
developers on this project will not have to keep two different source code
control systems in their heads.

Also, I do _not_ think that git is just a fashion.  It's a fundamentally
different way for source code control to work, and I bet that most projects
will still be using it 10 years from now.  I sure cannot say that about svn.



- Original Message -

Show of hands: would any of the dispatch developers object to moving the 
dispatch project from subversion to git?

I've been using the git-svn conduit with dispatch, and, well, it just feels 
like a hack.

Sure, I'm personally biased towards git, and maybe there really isn't a solid 
technical reason behind moving to it (though I'd bet some github users would 
argue that).

Opinions?

-K

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



Review Request 36992: PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants

2015-07-31 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36992/
---

Review request for qpid, Kenneth Giusti and Ted Ross.


Repository: qpid-proton-git


Description
---

PROTON-886 and PROTON-930 -- handle_max and AMQP numeric default constants.  

Note:
{
  I am skipping the java implementation for now,
  because I would like to get the C version out of
  my life.  (And this diff is already quite large
  enough, I think.)

  I will write a separate JIRA for the java version
  and make it refer to this one.  I will work on it
  as a background task.
}

This diff adds two pieces of functionality:

  1. extract numeric default values from AMQP
 xml files and store them as defined constants
 in protocol.h  (PROTON-930)

  2. add code to enforce AMQP 1.0 spec mandated
 behavior with respect to handle-max values
 -- i.e. negotiated limits on number of links --
 per session.  (PROTON-886)

These two changes are combined into one checkin
because of some earlier feedback I got suggesting
that I not check in PROTON-930 until I had some
code that could actually use the constants that it
creates.

The code generated by the changes in
proton-c/src/protocol.h.py shows up in the file
protocol.h.  It is this:

- begin snippet ---
/* Numeric default values */
#define AMQP_OPEN_MAX_FRAME_SIZE_DEFAULT 4294967295
#define AMQP_OPEN_CHANNEL_MAX_DEFAULT 65535
#define AMQP_BEGIN_HANDLE_MAX_DEFAULT 4294967295
#define AMQP_SOURCE_TIMEOUT_DEFAULT 0
#define AMQP_TARGET_TIMEOUT_DEFAULT 0
- end snippet ---


Diffs
-

  proton-c/bindings/python/proton/__init__.py 46b9466 
  proton-c/include/proton/session.h 94d2869 
  proton-c/src/engine/engine-internal.h 727f50d 
  proton-c/src/engine/engine.c 9043e0b 
  proton-c/src/protocol.h.py bbc0dfe 
  proton-c/src/transport/transport.c 6abf862 
  tests/python/proton_tests/engine.py 7a1d539 

Diff: https://reviews.apache.org/r/36992/diff/


Testing
---

ctest -VV with Java tests running.


Thanks,

michael goulish



Re: Review Request 35798: PROTON-919: make the C impl behave same as Java wrt channel_max error

2015-07-17 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35798/
---

(Updated July 17, 2015, 2:39 p.m.)


Review request for qpid, Andrew Stitcher and Kenneth Giusti.


Repository: qpid-proton-git


Description
---

Alter the C impl to return an error code if we attempt to change local
channel_max after the OPEN frame has been sent.   Also alter the python
binding to detect that error code and throw an exception.  This way, the C
and Java versions of one of the tests of channel_max functionality have the
same behavior.
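
A rough Python sketch of the binding-side pattern described above.
Apart from PN_OK, the names here are placeholders for illustration,
not the actual proton API:

- begin snippet ---
PN_OK = 0   # success code; this change adds it to error.h

class SessionException(Exception):
    pass

def set_channel_max(transport, value):
    # _set_channel_max stands in for the underlying C call; assume it
    # returns PN_OK on success and a negative error code once the OPEN
    # frame has already gone out.
    err = transport._set_channel_max(value)
    if err != PN_OK:
        raise SessionException("cannot change channel_max after OPEN is sent")
- end snippet ---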


Diffs
-

  proton-c/include/proton/error.h c6c7d2e 
  proton-c/include/proton/transport.h 483f5a9 
  proton-c/src/transport/transport.c ff80e21 
  tests/python/proton_tests/engine.py 258665d 

Diff: https://reviews.apache.org/r/35798/diff/


Testing (updated)
---

ctest -VV   ---  C and Java

Please note:  This Jira changes the public interface in that it adds #define 
PN_OK 0 to the list of possible error return values in error.h


Thanks,

michael goulish



Re: Review Request 36546: PROTON-930: extract numeric default values from AMQP xml at build-time.

2015-07-17 Thread michael goulish


 On July 16, 2015, 2:51 p.m., Andrew Stitcher wrote:
  proton-c/src/protocol.h.py, line 32
  https://reviews.apache.org/r/36546/diff/1/?file=1013400#file1013400line32
 
  Perhaps the constants should have a prefix like AMQP_ ? It would make 
  them a bit long though, and I don't remember if that is consistent with the 
  other definitions output here or not.
  
  Perhaps ping Rafi and ask him for an opinion too.

I do think the AMQP_ prefix should be prepended, to make it clear that these 
constants are extracted from the spec xml.   But I am asking Rafi to take a 
glance.


- michael


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36546/#review91888
---


On July 16, 2015, 1:52 p.m., michael goulish wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/36546/
 ---
 
 (Updated July 16, 2015, 1:52 p.m.)
 
 
 Review request for qpid and Andrew Stitcher.
 
 
 Repository: qpid-proton-git
 
 
 Description
 ---
 
 PROTON-930: extract numeric default values from AMQP xml at build-time.
 
 
 Diffs
 -
 
   proton-c/src/protocol.h.py bbc0dfe 
   proton-c/src/transport/transport.c 7bce3b5 
 
 Diff: https://reviews.apache.org/r/36546/diff/
 
 
 Testing
 ---
 
 build
 ctest -VV
 
 
 Thanks,
 
 michael goulish
 




Re: Review Request 36546: PROTON-930: extract numeric default values from AMQP xml at build-time.

2015-07-17 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36546/
---

(Updated July 17, 2015, 9:09 a.m.)


Review request for qpid, Andrew Stitcher and Rafael Schloming.


Changes
---

Prepend AMQP_ to the spec-mandated default constants.  (I meant to do this 
before.)  The names are as long as they need to be for clarity.


Repository: qpid-proton-git


Description
---

PROTON-930: extract numeric default values from AMQP xml at build-time.


Diffs (updated)
-

  proton-c/src/protocol.h.py bbc0dfe 
  proton-c/src/transport/transport.c 7bce3b5 

Diff: https://reviews.apache.org/r/36546/diff/


Testing (updated)
---

build
ctest -VV

This change adds the following new code near the top of protocol.h:

/* Numeric default values */
#define AMQP_OPEN_MAX_FRAME_SIZE_DEFAULT 4294967295
#define AMQP_OPEN_CHANNEL_MAX_DEFAULT 65535
#define AMQP_BEGIN_HANDLE_MAX_DEFAULT 4294967295
#define AMQP_SOURCE_TIMEOUT_DEFAULT 0
#define AMQP_TARGET_TIMEOUT_DEFAULT 0


Thanks,

michael goulish



Re: Review Request 35798: PROTON-919: make the C impl behave same as Java wrt channel_max error

2015-07-17 Thread michael goulish

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35798/
---

(Updated July 17, 2015, 4:10 p.m.)


Review request for qpid, Andrew Stitcher and Kenneth Giusti.


Changes
---

Last time I apparently uploaded a diff here before I finished making changes 
in the code.   This diff represents what I actually checked in.


Repository: qpid-proton-git


Description
---

Alter the C impl to return an error code if we attempt to change local
channel_max after the OPEN frame has been sent.   Also alter the python
binding to detect that error code and throw an exception.  This way, the C
and Java versions of one of the tests of channel_max functionality have the
same behavior.


Diffs (updated)
-

  proton-c/bindings/python/proton/__init__.py d5dcceb 
  proton-c/include/proton/error.h 2ed2f31 
  proton-c/include/proton/transport.h 483f5a9 
  proton-c/src/transport/transport.c 7bce3b5 
  tests/python/proton_tests/engine.py c18683f 

Diff: https://reviews.apache.org/r/35798/diff/


Testing
---

ctest -VV   ---  C and Java

Please note:  This Jira changes the public interface in that it adds #define 
PN_OK 0 to the list of possible error return values in error.h


Thanks,

michael goulish


