[ https://issues.apache.org/jira/browse/MESOS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-7376:
-----------------------------------
    Summary: Reduce copying of the Registry to improve Registrar performance.
             (was: Long registry updates when the number of agents is high)

> Reduce copying of the Registry to improve Registrar performance.
> ----------------------------------------------------------------
>
>                 Key: MESOS-7376
>                 URL: https://issues.apache.org/jira/browse/MESOS-7376
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.3.0
>            Reporter: Ilya Pronin
>            Assignee: Ilya Pronin
>            Priority: Critical
>
> During scale testing we discovered that as the number of registered agents 
> grows, the time it takes to update the registry quickly grows to unacceptable 
> values. At some point the update time starts exceeding 
> {{registry_store_timeout}}, yet the timeout does not fire.
> With 55k agents we saw the following ({{registry_store_timeout=20secs}}):
> {noformat}
> I0331 17:11:21.227442 36472 registrar.cpp:473] Applied 69 operations in 3.138843387secs; attempting to update the registry
> I0331 17:11:24.441409 36464 log.cpp:529] LogStorage.set: acquired the lock in 74461ns
> I0331 17:11:24.441541 36464 log.cpp:543] LogStorage.set: started in 51770ns
> I0331 17:11:26.869323 36462 log.cpp:628] LogStorage.set: wrote append at position=6420881 in 2.41043644secs
> I0331 17:11:26.869454 36462 state.hpp:179] State.store: storage.set has finished in 2.428189561secs (b=1)
> I0331 17:11:56.199453 36469 registrar.cpp:518] Successfully updated the registry in 34.971944192secs
> {noformat}
> This is caused by repeated copying of the {{Registry}}. Each copy duplicates 
> a large object graph and takes roughly 0.4 sec (with 55k agents).
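
To illustrate why copying a large registry is expensive, here is a minimal sketch contrasting a deep copy with a move. It is not the Mesos code or the actual fix: the real Registry is a protobuf message, while the {{Registry}} and {{AgentEntry}} structs and the {{makeRegistry}}, {{copyRegistry}} and {{takeRegistry}} helpers below are hypothetical stand-ins invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one agent record in the registry.
struct AgentEntry {
  std::string id;
  std::string hostname;
};

// Hypothetical stand-in for the Registry: one entry per registered agent.
struct Registry {
  std::vector<AgentEntry> agents;
};

// Builds a registry holding `count` synthetic agents.
Registry makeRegistry(std::size_t count) {
  Registry registry;
  registry.agents.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    const std::string n = std::to_string(i);
    registry.agents.push_back({"agent-" + n, "host-" + n + ".example.com"});
  }
  return registry;
}

// Deep copy: allocates a new buffer and copies every entry (including its
// strings), so the cost grows with the number of agents.
Registry copyRegistry(const Registry& source) {
  return source;
}

// Move: hands over the existing buffer instead of duplicating it, so the
// cost does not depend on the number of agents. `source` is left empty.
Registry takeRegistry(Registry&& source) {
  return std::move(source);
}
```

Calling {{copyRegistry}} on a registry built with {{makeRegistry(55000)}} duplicates all 55k entries, whereas {{takeRegistry(std::move(registry))}} only transfers ownership of the existing storage. Whether a copy can be replaced by a move (or by passing a reference) depends on whether the caller still needs the original afterwards.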



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
