[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15133914#comment-15133914
 ] 

Tymofii commented on MESOS-3307:
--------------------------------

Yes, it generates JSON much faster now, but we still have a large number of 
completed tasks and frameworks there, which we don't care about for service 
discovery but want to keep for history.
Wouldn't it be great to have some basic filtering for the /state endpoint, to 
get only active tasks/frameworks, only tasks or a particular framework, only 
slave information, etc.?
The recently introduced /state-summary endpoint doesn't fit service discovery 
requirements.
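
Until such filtering exists on the master, a consumer can only drop the 
history after downloading it. The following is a minimal client-side sketch 
in Python, not a Mesos API. It assumes the layout visible in the issue quoted 
below (a "frameworks" list whose entries carry "tasks" and "completed_tasks"); 
the top-level "completed_frameworks" key and both function names are 
assumptions made for illustration.

```python
import copy
import json


def strip_completed(state):
    """Return a copy of a parsed /state document without completed history.

    Assumed layout: a top-level "frameworks" list whose entries carry
    "tasks" and "completed_tasks". The top-level "completed_frameworks"
    key is an assumption about the response shape.
    """
    slim = copy.deepcopy(state)
    slim.pop("completed_frameworks", None)
    for framework in slim.get("frameworks", []):
        framework.pop("completed_tasks", None)
    return slim


def compact_size(doc):
    """Size in bytes of the compact JSON encoding (roughly `jq -c . | wc -c`)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical miniature state document, for illustration only.
    sample = {
        "frameworks": [
            {
                "id": "framework-a",
                "tasks": [{"id": "task-3"}],
                "completed_tasks": [{"id": "task-1"}, {"id": "task-2"}],
            }
        ],
        "completed_frameworks": [{"id": "framework-old"}],
    }
    slim = strip_completed(sample)
    print(compact_size(sample), "bytes before,", compact_size(slim), "bytes after")
```

This only reduces what the consumer has to parse and keep. The master still 
serializes the full response, which is why server-side filtering is what is 
being asked for here.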

> Configurable size of completed task / framework history
> -------------------------------------------------------
>
>                 Key: MESOS-3307
>                 URL: https://issues.apache.org/jira/browse/MESOS-3307
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Ian Babrou
>            Assignee: Kevin Klues
>              Labels: mesosphere
>             Fix For: 0.27.0
>
>
> We are trying to make Mesos work with multiple frameworks and mesos-dns at 
> the same time. The goal is to have a set of frameworks per team / project on 
> a single Mesos cluster.
> At this point our Mesos state.json is at 4 MB and it takes a while to 
> assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
> pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
> Here's the problem:
> {noformat}
> mesos λ curl -s http://mesos-master:5050/master/state.json | jq .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
>    1 "20150606-001827-252388362-5050-5982-0003"
>   16 "20150606-001827-252388362-5050-5982-0005"
>   18 "20150606-001827-252388362-5050-5982-0029"
>   73 "20150606-001827-252388362-5050-5982-0007"
>  141 "20150606-001827-252388362-5050-5982-0009"
>  154 "20150820-154817-302720010-5050-15320-0000"
>  289 "20150606-001827-252388362-5050-5982-0004"
>  510 "20150606-001827-252388362-5050-5982-0012"
>  666 "20150606-001827-252388362-5050-5982-0028"
>  923 "20150116-002612-269165578-5050-32204-0003"
> 1000 "20150606-001827-252388362-5050-5982-0001"
> 1000 "20150606-001827-252388362-5050-5982-0006"
> 1000 "20150606-001827-252388362-5050-5982-0010"
> 1000 "20150606-001827-252388362-5050-5982-0011"
> 1000 "20150606-001827-252388362-5050-5982-0027"
> mesos λ fgrep 1000 -r src/master
> src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 100000;
> src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000;
> {noformat}
> Active tasks are just 6% of the state.json response:
> {noformat}
> mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
>        1   14796 4138942
> mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
>       16      37  252774
> {noformat}
> I see four options that can improve the situation:
> 1. Add a query string parameter to exclude completed tasks from state.json 
> and use it in mesos-dns and similar tools. There is no need for mesos-dns to 
> know about completed tasks; it's just extra load on the master and mesos-dns.
> 2. Make the history size configurable.
> 3. Make JSON serialization faster. With tens of thousands of tasks, even 
> without history, it would take a lot of time to serialize tasks for 
> mesos-dns. Doing it every 60 seconds instead of every 5 seconds isn't really 
> an option.
> 4. Create an event bus for the Mesos master. Marathon has one and it'd be 
> nice to have it in Mesos. This way mesos-dns could avoid polling master state 
> and switch to listening for events.
> All can be done independently.
> Note to Mesosphere folks: please start distributing debug symbols with your 
> distribution. I have been asking for this for a while and it would be really 
> helpful: 
> https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
> Perf report for leading master: 
> !http://i.imgur.com/iz7C3o0.png!
> I'm on 0.23.0.
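
Option 2 in the quoted description (a configurable history size) can be 
pictured as a bounded per-framework buffer that discards the oldest entry 
once the limit is reached. The Python sketch below illustrates that idea 
only; it is not how the Mesos master implements its history. The class and 
parameter names are hypothetical, and the default of 1000 simply mirrors the 
MAX_COMPLETED_TASKS_PER_FRAMEWORK constant shown in the fgrep output.

```python
from collections import deque


class CompletedTaskHistory:
    """Keeps at most `max_completed_tasks_per_framework` entries per framework.

    Hypothetical illustration of a configurable history limit.
    """

    def __init__(self, max_completed_tasks_per_framework=1000):
        self._limit = max_completed_tasks_per_framework
        self._history = {}

    def record(self, framework_id, task_id):
        # A deque with maxlen drops its oldest element when a new one is
        # appended to a full buffer.
        buffer = self._history.setdefault(framework_id, deque(maxlen=self._limit))
        buffer.append(task_id)

    def completed(self, framework_id):
        # Oldest retained entry first; empty list for unknown frameworks.
        return list(self._history.get(framework_id, ()))


if __name__ == "__main__":
    history = CompletedTaskHistory(max_completed_tasks_per_framework=3)
    for n in range(5):
        history.record("framework-a", "task-%d" % n)
    print(history.completed("framework-a"))
```

With a limit of 3 and five recorded tasks, only the three most recent task 
ids remain. Lowering the limit shrinks the history that has to be serialized 
on every state request.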



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
