[ https://issues.apache.org/jira/browse/MESOS-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781090#comment-16781090 ]
Meng Zhu commented on MESOS-8345:
---------------------------------

Uploaded three perf traces:

During a scale test, we tested the scalability of concurrent subscribers. Three perf traces were collected:

mesos-master_ui_p1.stacks.gz   // baseline, with a few (one?) subscribers
mesos-master_ui_p10.stacks.gz  // added 10 more subscribers
mesos-master_ui_p19.stacks.gz  // added 19 subscribers

At 19, slowness and timeouts were reported:

"having a tough time getting a response from /mesos/api/v1?subscribe"
"73s/100MB response time, now getting timeouts (504)"

> Improve master responsiveness while serving state information.
> --------------------------------------------------------------
>
>                 Key: MESOS-8345
>                 URL: https://issues.apache.org/jira/browse/MESOS-8345
>             Project: Mesos
>          Issue Type: Epic
>          Components: HTTP API, master
>            Reporter: Benjamin Mahler
>            Assignee: Alexander Rukletsov
>            Priority: Major
>              Labels: mesosphere, performance
>         Attachments: mesos-master_ui_p1.stacks.gz,
> mesos-master_ui_p10.stacks.gz, mesos-master_ui_p19.stacks.gz
>
>
> Currently, when state is requested from the master, the response is built
> using the master actor. This means that while the master is building an
> expensive state response, it is blocked and cannot process other events.
> This in turn can lead to higher latency on further requests to state.
> Previous performance improvements to JSON generation (MESOS-4235) alleviated
> this issue, but for large clusters with many clients this can still be a
> problem.
> It's possible to serve state outside of the master actor by streaming the
> state (re-using the existing streaming operator API) into one or more other
> actors and serving from there.
> NOTE: I believe this approach will incur a small performance cost during
> master failover, since the master has to perform an additional copy of state
> that it fans out.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
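
The idea in the issue description (stream state updates out of the master actor and serve state requests from another actor) can be illustrated with a minimal sketch. This is Python, not Mesos' C++/libprocess code, and every name in it (`Master`, `StateReplica`, `update_task`, `serve_state`) is hypothetical. It assumes state is just a map from task ID to status, and a plain thread with a queue stands in for the second actor.

```python
import json
import queue
import threading


class StateReplica:
    """Hypothetical stand-in for an actor that serves state outside the master."""

    def __init__(self):
        self._events = queue.Queue()   # stream of updates from the master
        self._state = {}               # the replica's own copy of the state
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def publish(self, event):
        # Called by the master: a cheap enqueue, no serialization work.
        self._events.put(event)

    def _run(self):
        # Apply streamed updates to the replica's copy, one at a time.
        while True:
            task_id, status = self._events.get()
            with self._lock:
                self._state[task_id] = status
            self._events.task_done()

    def flush(self):
        # Block until every update published so far has been applied.
        self._events.join()

    def serve_state(self):
        # The expensive part (JSON generation) runs on the caller's thread,
        # against a snapshot, so the master is never involved.
        with self._lock:
            snapshot = dict(self._state)
        return json.dumps(snapshot, sort_keys=True)


class Master:
    """Hypothetical master: mutates its own state and fans out each change."""

    def __init__(self, replica):
        self._tasks = {}
        self._replica = replica

    def update_task(self, task_id, status):
        self._tasks[task_id] = status
        self._replica.publish((task_id, status))


if __name__ == "__main__":
    replica = StateReplica()
    master = Master(replica)
    master.update_task("task-1", "TASK_STAGING")
    master.update_task("task-1", "TASK_RUNNING")
    replica.flush()
    print(replica.serve_state())
```

In this sketch the master only pays for an enqueue per change, while subscribers and state readers contend on the replica instead. The replica holds a second full copy of the state, which corresponds to the additional copy mentioned in the NOTE above.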