[ https://issues.apache.org/jira/browse/DISPATCH-1256 ]
Ken Giusti resolved DISPATCH-1256.
----------------------------------
    Fix Version/s: 1.9.0
       Resolution: Fixed

We've identified that the bulk of the inter-router bottleneck is heavy processing of Message Annotations (MA) on the inter-router links. Release 1.9.0 contains a number of optimizations that improve inter-router forwarding; however, further improvements to the MA processing code path can still be made. That work is tracked in DISPATCH-1389.

> Throughput gets worse as more routers are added
> -----------------------------------------------
>
>                 Key: DISPATCH-1256
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1256
>             Project: Qpid Dispatch
>          Issue Type: Bug
>            Reporter: Alan Conway
>            Assignee: Ted Ross
>            Priority: Major
>             Fix For: 1.9.0
>
>         Attachments: reproducer.tgz
>
>
> Creating a chain of routers gives worse throughput as the chain gets longer.
> Reproducer:
> Install quiver from here: https://github.com/alanconway/quiver (extended with --send-url)
> Run the attached script like this:
>     qdr-chain.sh quiver --arrow c --count 1000 --send-url //:10001/testme //:10004/testme
> This sets up a chain of 4 routers, each connected to the previous one, and waits for qdstat -nv at router 1 to show 4 routers in the topology. It then receives messages from address 'testme' at router 4 while sending to address 'testme' at router 1.
> This runs the routers with enough tracing to run the dispatch scraper tool, which may affect the results. However, 1-million-message runs without tracing show a drop-off in throughput going from 1m0 (no router) up to 1m4 (4-router chain) - the more routers, the worse it gets. This seems consistent with the credit delays seen in the traced case.
> 1m0/out.txt:End-to-end rate ..................................... 296,824 messages/s
> 1m0/out.txt: 50% ........ 2 ms   99.90% ........ 6 ms
> 1m1/out.txt:End-to-end rate ..................................... 133,851 messages/s
> 1m1/out.txt: 50% ........ 1 ms   99.90% ........ 3 ms
> 1m2/out.txt:End-to-end rate ...................................... 59,011 messages/s
> 1m2/out.txt: 50% ........ 2 ms   99.90% ........ 8 ms
> 1m3/out.txt:End-to-end rate ...................................... 42,146 messages/s
> 1m3/out.txt: 50% ........ 3 ms   99.90% ....... 11 ms
> 1m4/out.txt:End-to-end rate ...................................... 32,415 messages/s
> 1m4/out.txt: 50% ........ 4 ms   99.90% ....... 13 ms
> The output (config, logs and scrapes) from a 1000-message, 4-router tracing run is included, as is the script to generate new data. Runs of 10k messages are feasible, but the HTML gets large and slow to view. In my experiments the results are similar.
> From file:scrape.html#c_connections, the sender connection is starved of credit far more than any other connection: 20x more starvation instances, 40x longer total starvation, and a 100x longer wait for initial credit.
> Overall, the flow of credit is slower than the flow of messages, and each additional router reduces the throughput of credit, which in turn reduces message throughput.
> Observations:
> - The initial credit delay is excessive (100ms in this case, when worst-case message latency is 33ms); something is wrong there.
> - Inter-router link credit is small (250) compared to application credit (1000). Giving full credit to a sender creates a credit-sharing problem, but we could give full credit to the inter-router links (see the receiver-side sketch after this list).
> - Re-evaluate our credit-sharing strategy to perform better in the common unshared case.
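> To make the credit numbers concrete, here is a minimal receiver-side sketch using the Qpid Proton Python API, showing how an application grants and replenishes the kind of 1000-credit window seen in the traces. The URL and address are assumptions taken from this reproducer; this is an illustration, not quiver's actual implementation:
>
>     from proton.handlers import MessagingHandler
>     from proton.reactor import Container
>
>     class CreditReceiver(MessagingHandler):
>         def __init__(self, url, window):
>             # prefetch=0 disables the handler's automatic credit so the
>             # window can be granted explicitly
>             super().__init__(prefetch=0)
>             self.url = url
>             self.window = window
>
>         def on_start(self, event):
>             event.container.create_receiver(self.url)
>
>         def on_link_opened(self, event):
>             # grant the full application window up front; this emits a
>             # 'flow' frame like the "(0,1000)" in the extracts below
>             event.receiver.flow(self.window)
>
>         def on_message(self, event):
>             # replenish one credit per delivery; each replenishment is
>             # another 'flow' frame travelling back toward the sender
>             event.receiver.flow(1)
>
>     Container(CreditReceiver("localhost:10004/testme", 1000)).run()
>
> Whatever window the application grants here, the sender behind the router chain still only sees the 250-credit inter-router window noted in the observations above.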
> Extracts from scrape:
> ◊ 2019-01-23 13:55:16.524477 D0#141 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (0,1000)
> ◊ 2019-01-23 13:55:16.633700 A0#287 1 [A0_6] -> [] quiver-sender-3db6ad3f flow [0,0] (0,250)
> Initial delay 109ms, much worse than the worst-case message latency of 33ms.
> ◊ 2019-01-23 13:55:16.639758 D0#153 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (3,1000)
> ◊ 2019-01-23 13:55:16.642682 A0#543 1 [A0_6] -> [] quiver-sender-3db6ad3f flow [0,0] (168,85)
> ◊ 2019-01-23 13:55:16.645205 D0#163 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (5,1000)
> ◊ 2019-01-23 13:55:16.645448 D0#167 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (6,1000)
> ◊ 2019-01-23 13:55:16.646510 A0#713 1 [A0_6] <- [] quiver-sender-3db6ad3f transfer [0,0] (252) transfer_281 - 158 bytes no more credit (accepted settled 0.028323 S)
> ◊ 2019-01-23 13:55:16.647142 D0#193 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (7,1000)
> ◊ 2019-01-23 13:55:16.647198 D0#195 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (10,1000)
> ◊ 2019-01-23 13:55:16.647915 D0#214 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (16,1000)
> ◊ 2019-01-23 13:55:16.648060 D0#217 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (25,1000)
> ◊ 2019-01-23 13:55:16.651294 D0#239 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (28,1000)
> ◊ 2019-01-23 13:55:16.652235 D0#257 4 [D0_3] <- [] quiver-receiver-2ebb30e5 flow [0,0] (41,1000)
> ◊ 2019-01-23 13:55:16.654294 A0#802 1 [A0_6] -> [] quiver-sender-3db6ad3f flow [0,0] (253,9) credit restored
> Here we see that the receiver has given 9 credit responses in the same time that the front router gave only 2. Either the router is slow to forward messages, or it is not giving back credit promptly when it does (or both). See flow.txt for the complete history of flow events at the sender and receiver, and scrape.html for the complete history of everything.
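> As a sanity check on the 109ms figure above, the timestamps of the first receiver flow and the first sender flow can be differenced directly (a throwaway snippet, not part of the reproducer):
>
>     from datetime import datetime
>
>     FMT = "%H:%M:%S.%f"
>     # receiver grants 1000 credits vs. sender finally receives its 250
>     t_receiver_flow = datetime.strptime("13:55:16.524477", FMT)
>     t_sender_flow = datetime.strptime("13:55:16.633700", FMT)
>
>     delay_ms = (t_sender_flow - t_receiver_flow).total_seconds() * 1000
>     print("initial credit delay: %.0f ms" % delay_ms)  # prints 109 ms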