[jira] [Updated] (DISPATCH-1019) Messaging instability in networks with many clients / addresses.

2018-05-29 Thread Marcel Meulemans (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Meulemans updated DISPATCH-1019:
---
Description: 
After DISPATCH-966 has been fixed I am still experiencing problems in a network 
with many clients / addresses. I am running a three node fully connected mesh 
of dispatch routers with 1 attached clients all with two address messaging 
at around 100 msg/sec.

In the logs I am seeing the following errors:

{{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient 
Data to Determine Tag}}
 {{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't 
convert message field body}}

Which, in turn, lead to python errors like:

{{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 
area=0 mobile_seq=0)}}
{{2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message 
processing}}
{{ Traceback (most recent call last):}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, 
in handleControlMessage}}
{{ self.mobile_address_engine.handle_mau(msg, now)}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, 
in handle_mau}}
{{ node = self.node_tracker.router_node(msg.id)}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, 
in router_node}}
{{ return self.nodes[node_id]}}
{{ KeyError: None}}
{{2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: 
opcode=MAU body=None}}

I have tracked down the cause of the "Insufficient Data to Determine Tag" 
message to the following: During the call to {{qd_parse}} of the {{MAU}} 
message the {{qd_iterator_t}} reaches the end of the buffer list before it 
should. Specifically the call to {{qd_iterator_advance}} here 
[https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails; 
to move forward a certain number of bytes (e.g. 31) even though the 
{{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 
84802). The "fail" is because we reach the end of the buffer list before we 
should (here 
[https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]).

What I have not been able to figure out yet is why this happens because it is 
not consistent. Many large MAU message are parsed correctly only sometimes not. 
I am able to reproduce these errors every time I run my tests. There may be a 
time component involved because the more logging I to the router code, the less 
often the errors seem to occur.

  was:
After DISPATCH-966 has been fixed I am still experiencing problems in a network 
with many clients / addresses. I am running a three node fully connected mesh 
of dispatch routers with 1 attached clients all with two address messaging 
at around 100 msg/sec.

In the logs I am seeing the following errors:

{{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient 
Data to Determine Tag}}
 {{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't 
convert message field body}}

Which, in turn, lead to python errors like:

{{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 
area=0 mobile_seq=0)}}
{{ 2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message 
processing}}
{{ Traceback (most recent call last):}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, 
in handleControlMessage}}
{{self.mobile_address_engine.handle_mau(msg, now)}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, 
in handle_mau}}
{{node = self.node_tracker.router_node(msg.id)}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in 
router_node}}
{{return self.nodes[node_id]}}
{{ KeyError: None}}
{{ 2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: 
opcode=MAU body=None}}

I have tracked down the cause of the "Insufficient Data to Determine Tag" 
message to the following: During the call to {{qd_parse}} of the {{MAU}} 
message the {{qd_iterator_t}} reaches the end of the buffer list before it 
should. Specifically the call to {{qd_iterator_advance}} here 
[https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails; 
to move forward a certain number of bytes (e.g. 31) even though the 
{{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 
84802). The "fail" is because we reach the end of the buffer list before we 
should (here 
[https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]).

What I have not been able to figure out yet is why this happens because it is 
not consistent. Many large MAU message are parsed correctly only sometimes not. 
I am able to reproduce these errors every 

[jira] [Updated] (DISPATCH-1019) Messaging instability in networks with many clients / addresses.

2018-05-29 Thread Marcel Meulemans (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Meulemans updated DISPATCH-1019:
---
Description: 
After DISPATCH-966 has been fixed I am still experiencing problems in a network 
with many clients / addresses. I am running a three node fully connected mesh 
of dispatch routers with 1 attached clients all with two address messaging 
at around 100 msg/sec.

In the logs I am seeing the following errors:

{{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient 
Data to Determine Tag}}
 {{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't 
convert message field body}}

Which, in turn, lead to python errors like:

{{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 
area=0 mobile_seq=0)}}
{{ 2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message 
processing}}
{{ Traceback (most recent call last):}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, 
in handleControlMessage}}
{{self.mobile_address_engine.handle_mau(msg, now)}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, 
in handle_mau}}
{{node = self.node_tracker.router_node(msg.id)}}
{{File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, in 
router_node}}
{{return self.nodes[node_id]}}
{{ KeyError: None}}
{{ 2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: 
opcode=MAU body=None}}

I have tracked down the cause of the "Insufficient Data to Determine Tag" 
message to the following: During the call to {{qd_parse}} of the {{MAU}} 
message the {{qd_iterator_t}} reaches the end of the buffer list before it 
should. Specifically the call to {{qd_iterator_advance}} here 
[https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails; 
to move forward a certain number of bytes (e.g. 31) even though the 
{{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 
84802). The "fail" is because we reach the end of the buffer list before we 
should (here 
[https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]).

What I have not been able to figure out yet is why this happens because it is 
not consistent. Many large MAU message are parsed correctly only sometimes not. 
I am able to reproduce these errors every time I run my tests. There may be a 
time component involved because the more logging I to the router code, the less 
often the errors seem to occur.

  was:
After DISPATCH-966 has been fixed I am still experiencing problems in a network 
with many clients / addresses. I am running a three node fully connected mesh 
of dispatch routers with 1 attached clients all with two address messaging 
at around 100 msg/sec.

In the logs I am seeing the following errors:

{{2018-05-29 14:31:05.145732 + ERROR (error) Invalid message: Insufficient 
Data to Determine Tag}}
{{2018-05-29 14:31:05.145748 + ERROR (error) Invalid message: Can't convert 
message field body}}

Which, in turn, lead to python errors like:

{{2018-05-29 14:31:05.145971 + ROUTER_MA (trace) RCVD: MAU(id=None pv=1 
area=0 mobile_seq=0)}}
{{2018-05-29 14:31:05.146130 + ROUTER (error) Exception in control message 
processing}}
{{Traceback (most recent call last):}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/engine.py", line 157, 
in handleControlMessage}}
{{ self.mobile_address_engine.handle_mau(msg, now)}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/mobile.py", line 97, 
in handle_mau}}
{{ node = self.node_tracker.router_node(msg.id)}}
{{ File "/usr/lib/python2.7/qpid_dispatch_internal/router/node.py", line 363, 
in router_node}}
{{ return self.nodes[node_id]}}
{{KeyError: None}}
{{2018-05-29 14:31:05.146175 + ROUTER (error) Control message error: 
opcode=MAU body=None}}

I have tracked down the cause of the "Insufficient Data to Determine Tag" 
message to the following: During the call to {{qd_parse}} of the {{MAU}} 
message the {{qd_iterator_t}} reaches the end of the buffer list before it 
should. Specifically the call to {{qd_iterator_advance}} here 
[https://github.com/apache/qpid-dispatch/blob/master/src/parse.c#L151,] "fails; 
to move forward a certain number of bytes (e.g. 31) even though the 
{{iterator->view_pointer->remaining}} value has plenty of bytes left (e.g. 
84802). The "fail" is because we reach the end of the buffer list before we 
should (here 
[https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323|https://github.com/apache/qpid-dispatch/blob/master/src/iterator.c#L323).]).

What I have not been able to figure out yet is why this happens because it is 
not consistent. Many large MAU message are parsed correctly only sometimes not. 
I am able to reproduce these errors every