[ 
https://issues.apache.org/jira/browse/PROTON-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508786#comment-13508786
 ] 

Gordon Sim edited comment on PROTON-157 at 12/3/12 3:05 PM:
------------------------------------------------------------

I believe I now understand how this comes about. The delivery buffer is filled 
up with outgoing transfers. However, as credit is added by the peer, more 
deliveries can be queued up in the tpwork queue. Then a disposition comes in 
for a portion of the previously sent messages. This results in the broker 
settling those transfers which causes those deliveries to be added back onto 
the tpwork queue, behind the deliveries yet to be sent. Another delivery may 
now be added on to the tp work queue. If the pwork queue is then processed, it 
goes through the first batch of unset deliveries, but there is still no space 
in the delivery bufer, so these are left as is. It then processes the batch of 
settled messages, cleans up the delivery buffer and removes them from tpwork. 
Now finally it gets to the next unsent delivery, added after these. Since the 
buffer has now been cleaned, this gets assigned the next delivery id. However 
there are transfers ahead of it in the queue that have *not* yet had a delivery 
state assigned. The next time tp work is processed, these will get delivery 
state assigned with the delivery id out of sync.

A suggested fix (that works for my case and passes tests) is to detect when we 
have removed deliveries from tp work, and start processing from the head. This 
may not be optimal in all cases of course, but it is a simple change:

{noformat}
Index: proton-c/src/engine/engine.c
===================================================================
--- proton-c/src/engine/engine.c        (revision 1416552)
+++ proton-c/src/engine/engine.c        (working copy)
@@ -2219,7 +2219,11 @@
         pn_clear_tpwork(delivery);
       }
 
-      delivery = delivery->tpwork_next;
+      if (delivery->tpwork) {
+        delivery = delivery->tpwork_next;
+      } else {
+        delivery = conn->tpwork_head;
+      }
     }
   }
{noformat}
                
      was (Author: gsim):
    I believe I now understand how this comes about. The delivery buffer is 
filled up with outgoing transfers. However, as credit is added by the peer, 
more deliveries can be queued up in the tpwork queue. Then a disposition comes 
in for a portion of the previously sent messages. This results in the broker 
settling those transfers which causes those deliveries to be added back onto 
the tpwork queue, behind the deliveries yet to be sent. Another delivery may 
now be added on to the tp work queue. If the pwork queue is then processed, it 
goes through the first batch of unset deliveries, but there is still no space 
in the delivery bufer, so these are left as is. It then processes the batch of 
settled messages, cleans up the delivery buffer and removes them from tpwork. 
Now finally it gets to the next unsent delivery, added after these. Since the 
buffer has now been cleaned, this gets assigned the next delivery id. However 
there are transfers ahead of it in the queue that have *not* yet had a delivery 
state assigned. The next time tp work is processed, these will get delivery 
state assigned with the delivery id out of sync.

A suggested fix (that works for my case and passes tests) is to detect when we 
have removed deliveries from tp work, and start processing from the head. This 
may not be optimal in all cases of course, but it is a simple change:

{noformat}
Index: proton-c/src/engine/engine.c
===================================================================
--- proton-c/src/engine/engine.c        (revision 1416552)
+++ proton-c/src/engine/engine.c        (working copy)
@@ -2219,7 +2219,11 @@
         pn_clear_tpwork(delivery);
       }
 
-      delivery = delivery->tpwork_next;
+      if (delivery->tpwork) {
+        delivery = delivery->tpwork_next;
+      } else {
+        delivery = conn->tpwork_head;
+      }
     }
   }
{noformat}
                  
> invalid delivery-id sent(?)
> ---------------------------
>
>                 Key: PROTON-157
>                 URL: https://issues.apache.org/jira/browse/PROTON-157
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-c
>    Affects Versions: 0.2
>            Reporter: Gordon Sim
>
> E.g. the following trace is from the client side (the peer was qpidd) and 
> appears to show a delivery-id being skipped:
> [0xb2a800:1] <- TRANSFER @20 [1, 1520, b"\xf0\x05\x00\x00\x00\x00\x00\x00", 
> 0, false, false] (177) 
> "\x00Sp\xc0\x08\x05BP\x00@@R\x01\x00Ss\xd0\x00\x00\x00\x11\x00\x00\x00\x0d@@@@@@@@@@@@@\x00St\xd1\x00\x00\x00\x1a\x00\x00\x00\x04\xa1\x02snp\x00\x00\x16\xbd\xa1\x02ts\x80\x12\xc8\xfb\xe4\xdb\xd1\xc5\xdf\x00Su\xa0dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> [0xb2a800:1] <- TRANSFER @20 [1, 1521, b"\xf1\x05\x00\x00\x00\x00\x00\x00", 
> 0, false, false] (177) 
> "\x00Sp\xc0\x08\x05BP\x00@@R\x01\x00Ss\xd0\x00\x00\x00\x11\x00\x00\x00\x0d@@@@@@@@@@@@@\x00St\xd1\x00\x00\x00\x1a\x00\x00\x00\x04\xa1\x02snp\x00\x00\x16\xbe\xa1\x02ts\x80\x12\xc8\xfb\xe4\xdb\xd1\xe8c\x00Su\xa0dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> [0xb2a800:1] <- TRANSFER @20 [1, 1522, b"\xf2\x05\x00\x00\x00\x00\x00\x00", 
> 0, false, false] (177) 
> "\x00Sp\xc0\x08\x05BP\x00@@R\x01\x00Ss\xd0\x00\x00\x00\x11\x00\x00\x00\x0d@@@@@@@@@@@@@\x00St\xd1\x00\x00\x00\x1a\x00\x00\x00\x04\xa1\x02snp\x00\x00\x16\xbf\xa1\x02ts\x80\x12\xc8\xfb\xe4\xdb\xd2\x8b\x95\x00Su\xa0dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> [0xb2a800:1] <- TRANSFER @20 [1, 1523, b"\xf3\x05\x00\x00\x00\x00\x00\x00", 
> 0, false, false] (177) 
> "\x00Sp\xc0\x08\x05BP\x00@@R\x01\x00Ss\xd0\x00\x00\x00\x11\x00\x00\x00\x0d@@@@@@@@@@@@@\x00St\xd1\x00\x00\x00\x1a\x00\x00\x00\x04\xa1\x02snp\x00\x00\x16\xc0\xa1\x02ts\x80\x12\xc8\xfb\xe4\xdb\xd2\xa8\xa5\x00Su\xa0dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> [0xb2a800:1] <- TRANSFER @20 [1, 1525, b"\xf4\x05\x00\x00\x00\x00\x00\x00", 
> 0, false, false] (177) 
> "\x00Sp\xc0\x08\x05BP\x00@@R\x01\x00Ss\xd0\x00\x00\x00\x11\x00\x00\x00\x0d@@@@@@@@@@@@@\x00St\xd1\x00\x00\x00\x1a\x00\x00\x00\x04\xa1\x02snp\x00\x00\x16\xc1\xa1\x02ts\x80\x12\xc8\xfb\xe4\xdb\xd2\xc01\x00Su\xa0dXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> [0xb2a800:0] -> CLOSE @24 [@29 [:"amqp:session:invalid-field"]]
> ERROR amqp:session:invalid-field sequencing error, expected delivery-id 1524, 
> got 1525
> It is of course possible that I am doing something wrong in the broker code, 
> but I can't think what would cause something like this (since I can't 
> influence the delivery ids directly). It only seems to happen when there is a 
> prefetch greater than 900.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to