[ 
https://issues.apache.org/jira/browse/PROTON-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827294#comment-16827294
 ] 

Chuck Rolke commented on PROTON-2033:
-------------------------------------

Working through the analysis shows that post-0.27 proton did not 'break dispatch 
self tests'.
 * Dispatch had a latent bug and some (presumably) proton timing changes 
exposed that bug.
 * Dispatch self tests were unprepared for the python receiver to get 
on_settled callbacks.

That said, before dismissing this issue a discussion of the timing change and 
the on_settled callbacks is in order.
h3. Test Overview

 For review, how did the test work?
{code:java}
                     +-----------+            +-----------+
                     |   INT.A   |            |    EA1    |
  +----------+   +-------+   +-------+  edge  |       +-------+   +--------+
  | receiver |<--+ 21001 |   | 21000 |<-------*       | 21002 |<--+ sender |
  +----------+   +-------+   +-------+        |       +-------+   +--------+
                     | interior  |            |   edge    |
                     +-----------+            +-----------+
               ^                          ^                     ^
             CONN1                      CONN2                 CONN3
{code}
 
  
 From a single reactor container, a proton python sender connects to the edge 
router EA1 and a proton python receiver connects to the interior router INT.A.
 * The test injects some number of messages at the sender and verifies that 
they are all accepted.
 * The receiver is then closed (its connection stays open) and the sender 
injects 50 more messages.
 * The expectation is that all 50 of those messages are released.
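
The pass criterion described in the list above can be sketched as a small tally check. This is only an illustration and not the code in system_tests_edge_router: the helper name check_phase, the outcome strings and the phase-one message count of 10 are all hypothetical.

```python
from collections import Counter


def check_phase(outcomes, expected_state, expected_count):
    """Return (ok, tally) for one phase of the test.

    outcomes is a list of disposition names seen by the sender, for example
    "accepted", "released" or "modified" (hypothetical labels).
    The phase passes only if every expected message ended in expected_state.
    """
    tally = Counter(outcomes)
    ok = (tally[expected_state] == expected_count
          and sum(tally.values()) == expected_count)
    return ok, tally


# Phase 1: receiver attached, every message should be accepted.
phase1_ok, _ = check_phase(["accepted"] * 10, "accepted", 10)

# Phase 2: receiver closed, all 50 messages should be released.
phase2_ok, _ = check_phase(["released"] * 50, "released", 50)

# A run where a few messages come back 'modified' fails the phase.
mixed_ok, mixed_tally = check_phase(
    ["released"] * 47 + ["modified"] * 3, "released", 50)
```

Keeping the full tally, rather than a bare pass/fail flag, makes it easier to report which outcomes were actually seen when a run fails.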

h3. Timing Change

A significant change was measured in the time between the receiver client 
calling receiver.close() and the corresponding detach arriving at INT.A port 
21001.
||Time (ms)||0.27.x||master||
|run 1|3.2|14.5|
|run 2|3.3|17.3|
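
For scale, the slowdown implied by the table can be worked out directly. The snippet below uses only the four measured values; the names timings_ms and slowdown are just for illustration.

```python
# Detach latency in milliseconds, copied from the table above.
timings_ms = {
    "run 1": {"0.27.x": 3.2, "master": 14.5},
    "run 2": {"0.27.x": 3.3, "master": 17.3},
}


def slowdown(row):
    """Return (ratio, extra_ms) of master relative to 0.27.x for one run."""
    return row["master"] / row["0.27.x"], row["master"] - row["0.27.x"]


for run, row in timings_ms.items():
    ratio, extra_ms = slowdown(row)
    print(f"{run}: master is {ratio:.1f}x slower (+{extra_ms:.1f} ms)")
```

On these two runs master takes roughly 4.5 and 5.2 times as long as 0.27.x to get the detach to the router.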

These timings are reflected in the self test behavior:
 * The 0.27.x code is "so fast" that it detaches the receiver before any of the 
messages in the 50-message storm get through INT.A. All of the messages are 
released by the router network; none are released or modified by the receiver.
 * The master branch is "so slow" that INT.A sends at least one, and possibly 
up to a dozen, messages to the receiver. The receiver is then handling a close 
in the face of incoming messages, and the sender sees some messages as 
'modified' rather than just released.
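
One way to picture this race is a toy model in which the router starts forwarding the storm at some point after receiver.close() and the detach either beats the first message or does not. This is a sketch under loud assumptions: first_arrival_ms and interval_ms are invented for illustration and were not measured, and real forwarding is certainly not this regular.

```python
def delivered_before_detach(detach_ms, first_arrival_ms, interval_ms, total=50):
    """Count messages the router forwards to the receiver before the detach.

    detach_ms        time from receiver.close() to the detach reaching INT.A
    first_arrival_ms time at which INT.A could forward the first message
                     (ASSUMED value, not a measurement)
    interval_ms      spacing between forwarded messages (ASSUMED value)
    """
    if detach_ms <= first_arrival_ms:
        # Detach wins the race: the router releases everything itself.
        return 0
    forwarded = int((detach_ms - first_arrival_ms) // interval_ms) + 1
    return min(forwarded, total)


# Illustrative parameters only; the detach times are the measured ones.
ASSUMED_FIRST_ARRIVAL_MS = 6.0
ASSUMED_INTERVAL_MS = 1.0

for label, detach_ms in [("0.27.x", 3.2), ("master", 14.5)]:
    count = delivered_before_detach(
        detach_ms, ASSUMED_FIRST_ARRIVAL_MS, ASSUMED_INTERVAL_MS)
    print(label, "messages reaching the receiver:", count)
```

The point of the model is only that a detach latency below the first forwarding time yields zero deliveries to the receiver, while any latency above it lets a handful through.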

Is there an explanation for why the client response to the receiver.close() is 
so slow compared to 0.27.x?
h3. Receiver on_settled callbacks

These appear when the receiver is closing in the face of incoming messages. 
This may not be a new issue; the dispatch self tests simply never had to deal 
with these callbacks before. If it *is* a new issue, some consideration must be 
given to how existing clients (like dispatch self tests) must be modified to 
deal with them.
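
As a rough idea of what such a modification might look like, a test handler shared by the sender and the receiver could check which link an on_settled callback belongs to before counting it. The sketch below does not import proton so that it runs standalone: FakeLink and FakeEvent are stand-ins, SettleCounter is a hypothetical handler, and the use of event.link.is_receiver assumes the proton python Link attribute of that name.

```python
class FakeLink:
    """Stand-in for a proton link; carries only the attribute used below."""

    def __init__(self, is_receiver):
        self.is_receiver = is_receiver


class FakeEvent:
    """Stand-in for a proton event; carries only the link."""

    def __init__(self, link):
        self.link = link


class SettleCounter:
    """Hypothetical test handler that tolerates receiver-side on_settled."""

    def __init__(self):
        self.sender_settled = 0
        self.receiver_settled = 0

    def on_settled(self, event):
        if event.link.is_receiver:
            # Settlement reported on the receiver's link: record it, but do
            # not let it disturb the sender-side accounting.
            self.receiver_settled += 1
            return
        self.sender_settled += 1


handler = SettleCounter()
handler.on_settled(FakeEvent(FakeLink(is_receiver=False)))
handler.on_settled(FakeEvent(FakeLink(is_receiver=True)))
handler.on_settled(FakeEvent(FakeLink(is_receiver=False)))
```

Counting the receiver-side callbacks separately, instead of dropping them silently, keeps them visible in the test output while this question is still open.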

 

> qpid-proton changes 0.27.x to master(9ff0a) break qpid-dispatch self test
> -------------------------------------------------------------------------
>
>                 Key: PROTON-2033
>                 URL: https://issues.apache.org/jira/browse/PROTON-2033
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c, python-binding
>    Affects Versions: proton-c-0.28.0
>         Environment: Fedora 29, python 3.7, proton and dispatch debug builds
>            Reporter: Chuck Rolke
>            Priority: Major
>
> While cleaning up qpid-dispatch for a 1.6 follow-on release an intermittent 
> failure in one of the tests was exposed. Analysis of the failure is in 
> https://issues.apache.org/jira/browse/DISPATCH-1322 in an attached text 
> document 
> https://issues.apache.org/jira/secure/attachment/12965891/DISPATCH-1322-analysis.txt
> Another part of the cleanup was fixing the failing test to better report what 
> went wrong. See https://issues.apache.org/jira/browse/DISPATCH-1318
> These issues happened using qpid-proton master @9ff0a. Temporarily the 
> qpid-proton version was reverted to branch 0.27.x @560ba. Using proton 0.27.x 
> branch the qpid-dispatch self test does not fail. It passed several thousand 
> times in a loop. Moving proton back to the master branch brought the dispatch 
> failures back.
> From the qpid-dispatch perspective this issue is serious. If a receiver 
> detaches a link while a sender is sending to it then the sender may lose a 
> disposition.
> No attempt yet has been made to bisect proton to see where the different 
> behavior starts showing up. That may be the next best avenue of research.
> Another theory for the error is that qpid-dispatch has been mishandling links 
> all along.
> To help expose the problem one may clone 
> [https://github.com/ChugR/qpid-dispatch] , or add it as a remote, and then 
> check out branch DISPATCH-1318-cwip. Test system_tests_edge_router has been 
> gutted to run only test_12 in a loop and to print details of the error(s).
> {{  cd build}}
> {{  make}}
> {{  ctest -VV -R edge}}
>  
>  



