[jira] [Comment Edited] (DISPATCH-957) Unbalanced memory consumption in a 2 routers configuration and specific workload
[ https://issues.apache.org/jira/browse/DISPATCH-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443122#comment-16443122 ] Ken Giusti edited comment on DISPATCH-957 at 4/18/18 8:08 PM:
--
Thanks for the logs, they are quite revealing. I don't think this is a router bug; I think it's a client issue.

Taking a look at router1.log I see the following: periodically a client connects (every 30 seconds) but seems to clean up its links - I'm confident that's your status collector. Ignoring that, I see:

1) 19:59:23ish - Servers begin to connect. 4 links are created for each connection - this is expected for RPC servers.
2) 19:58:08ish - Servers done connecting.
3) 19:59:23ish - a "link storm" hits - a couple thousand links are established in less than 2 seconds.
4) 20:01:07 - link storm over. All links remain up for the rest of the log.

So here's what I think is happening: after the servers finish connecting, the test starts and the RPC clients begin making RPC calls. Each client has created a receive link for the RPC reply, and each client has its own unique reply-to address. Once a server receives and processes a request, it sends the reply to that reply-to address. This means the server creates a unique reply link for every client it has received a request from. That's where all the links are coming from.

IIRC, the oslo.messaging driver has a periodic task that expires these reply-to links after they've been idle for > 600 seconds. Once the test is over, would it be possible to leave the servers undisturbed for > 600 seconds, then get a qdstat -a from router1? I suspect the number of in-use qdr_link_t's will drop after these links are cleaned up.
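[editorial note] As an illustration of where these per-client reply links come from, here is a minimal python-qpid-proton sketch - not the oslo.messaging driver itself, and the host and "rpc/server" address are hypothetical. Each client opens a dynamic receiver, the router assigns it a unique source address to use as reply-to, and every server that answers that client ends up opening a sender link back to that address - hence the link fan-out described above.

{code:python}
from proton import Message
from proton.handlers import MessagingHandler
from proton.reactor import Container

class RpcClient(MessagingHandler):
    """One RPC client: a sender for requests plus a dynamic receiver
    whose router-assigned address becomes this client's unique reply-to."""

    def on_start(self, event):
        conn = event.container.connect("amqp://router0:5672")            # hypothetical host
        self.sender = event.container.create_sender(conn, "rpc/server")  # hypothetical address
        # dynamic=True asks the router to allocate a unique source address;
        # this is the per-client reply link discussed in the comment above.
        self.receiver = event.container.create_receiver(conn, None, dynamic=True)

    def on_link_opened(self, event):
        if event.receiver == self.receiver:
            reply_to = self.receiver.remote_source.address
            self.sender.send(Message(body="ping", reply_to=reply_to))

    def on_message(self, event):
        # The server's reply arrives over a link it created back to our
        # unique reply-to address - one such link per client it has served.
        print("reply:", event.message.body)
        event.connection.close()

Container(RpcClient()).run()
{code}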
[jira] [Comment Edited] (DISPATCH-957) Unbalanced memory consumption in a 2 routers configuration and specific workload
[ https://issues.apache.org/jira/browse/DISPATCH-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437863#comment-16437863 ] Matthieu Simonin edited comment on DISPATCH-957 at 4/13/18 8:53 PM:

In attachment, the log files of the two qdrouterd:
* router0 is where clients are connected
* router1 is where servers are connected

I ran (conf in attachment):
* {{oo deploy --driver=router g5k}}
* {{oo test_case_1 --nbr_clients=100 --nbr_servers=100 --nbr_calls=1000 --pause=0.1 --timeout 200}}

edit: I forgot to mention that I observed the same behaviour (increased memory consumption on router1)

> Unbalanced memory consumption in a 2 routers configuration and specific
> workload
>
> Key: DISPATCH-957
> URL: https://issues.apache.org/jira/browse/DISPATCH-957
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Router Node
> Affects Versions: 1.0.1
> Environment:
> * At the time I was experimenting, I built the router from source and used 22400df dockerized. It's available on Docker Hub: msimonin/qdrouterd:22400df or msimonin/qdrouterd-collectd:22400f
> * The ombt version used embeds the following libraries:
> oslo.messaging==5.35.0
> pyngus==2.2.2
> python-qpid-proton==0.19.0
> * I used ombt-orchestrator to deploy the whole stack using the g5k provider (see https://github.com/msimonin/ombt-orchestrator/). In a local machine setup, the vagrant provider can be used, but I'm not sure it is reasonable to scale to the above number of agents. I've nevertheless attached the configuration used.
> * Host Linux distribution is debian9
>
> Reporter: Matthieu Simonin
> Assignee: Ken Giusti
> Priority: Major
> Attachments: call.png, cast.png, conf.yaml, conf.yaml, inc-calls.png, mem_usage.tar.gz, router0.log, router1.log
>
> After discussion with Ken Giusti we deemed it appropriate to file a bug to track the following behavior.
> Note also that the exact version used in the following description isn't exactly 1.1.0 but one built from source 22400df (master back in February).
> I started two interconnected routers (router0 and router1):
> router0 is where all my consumers connect; router1 is where all my producers connect.
> The workload is an RPC test using the oslo.messaging library, using calls (resp. casts): clients keep sending messages and block for the response (resp. do not block).
> I've attached some observations:
> 1) With 100 consumers and 100 producers and calls, I observe a higher memory consumption on router0 compared to router1 (see call.png). Casts seem to affect the router memory less. Calls usually require more resources because of the return values flowing back to the producer, but I wouldn't expect this big a difference.
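[editorial note] For context, here is a minimal oslo.messaging sketch of the call vs. cast pattern this workload exercises. It is not the ombt test case itself; the transport URL, topic, and method names are hypothetical.

{code:python}
# Minimal sketch of the RPC call vs. cast pattern exercised by the test.
# Transport URL, topic, and method names are hypothetical placeholders.
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_transport(
    cfg.CONF, url="amqp://router0.example:5672//")   # clients attach to router0
target = oslo_messaging.Target(topic="ombt_topic")
client = oslo_messaging.RPCClient(transport, target)

# call(): blocks until the server's reply comes back over the client's
# unique reply-to address - this is the traffic that fans out reply links.
result = client.call({}, "echo", payload="ping")

# cast(): fire-and-forget, no reply flows back to the client.
client.cast({}, "notify", payload="ping")
{code}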
> I've attached a tgz in which you'll find the results of qdstat -a, -m, -l:
> * before the benchmark (start)
> * early during the benchmark (during)
> * late during the benchmark (during-1)
> * after the benchmark completed (after)
>
> 2) I've run a second test, incrementally increasing (#clients, #servers): [50, 100, 200, 500] (calls only), see inc-calls.png.
> In this case the difference in memory consumption between router0 and router1 is [50MB, 100MB, 300MB, 1.5GB].
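[editorial note] Following up on the suggestion to re-run qdstat after the idle reply links expire, here is a small hypothetical helper that samples the qdr_link_t allocator row from {{qdstat -m}} over time. It assumes qdstat is on PATH and the router management endpoint is reachable at localhost:5672; it is one way to watch whether in-use link records drop once the > 600 second idle expiry kicks in.

{code:python}
# Hypothetical helper: sample the qdr_link_t allocator row from `qdstat -m`
# every few minutes to see whether in-use link records drop once idle
# reply-to links are reaped (> 600 s of inactivity).
import subprocess
import time

def qdr_link_t_row(host="localhost", port="5672"):
    out = subprocess.run(
        ["qdstat", "-m", "-b", "%s:%s" % (host, port)],
        capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.strip().startswith("qdr_link_t"):
            return line.strip()
    return "qdr_link_t row not found"

if __name__ == "__main__":
    for _ in range(5):
        print(time.strftime("%H:%M:%S"), qdr_link_t_row())
        time.sleep(300)  # sample well past the 600 s idle-link expiry
{code}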