[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001956#comment-15001956
 ] 

Felix Bechstein commented on MESOS-3307:
----------------------------------------

I'm watching that issue to. I still feel, that we need a flag to limit the 
state size.
We are currently running ~30 frameworks + a lot of spark batch jobs registering 
new frameworks every now and then. This leads to ~50k completed tasks in the 
state resulting in a 10MB state.json. Nobody ever reads these very old 
completed tasks. So why hold them in memory or ship them every few seconds.

Keep in mind, that web browsers need to parse that json every few seconds to 
update the UI.

> Configurable size of completed task / framework history
> -------------------------------------------------------
>
>                 Key: MESOS-3307
>                 URL: https://issues.apache.org/jira/browse/MESOS-3307
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Ian Babrou
>
> We try to make Mesos work with multiple frameworks and mesos-dns at the same 
> time. The goal is to have set of frameworks per team / project on a single 
> Mesos cluster.
> At this point our mesos state.json is at 4mb and it takes a while to 
> assembly. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
> pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
> Here's the problem:
> {noformat}
> mesos λ curl -s http://mesos-master:5050/master/state.json | jq 
> .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
>    1 "20150606-001827-252388362-5050-5982-0003"
>   16 "20150606-001827-252388362-5050-5982-0005"
>   18 "20150606-001827-252388362-5050-5982-0029"
>   73 "20150606-001827-252388362-5050-5982-0007"
>  141 "20150606-001827-252388362-5050-5982-0009"
>  154 "20150820-154817-302720010-5050-15320-0000"
>  289 "20150606-001827-252388362-5050-5982-0004"
>  510 "20150606-001827-252388362-5050-5982-0012"
>  666 "20150606-001827-252388362-5050-5982-0028"
>  923 "20150116-002612-269165578-5050-32204-0003"
> 1000 "20150606-001827-252388362-5050-5982-0001"
> 1000 "20150606-001827-252388362-5050-5982-0006"
> 1000 "20150606-001827-252388362-5050-5982-0010"
> 1000 "20150606-001827-252388362-5050-5982-0011"
> 1000 "20150606-001827-252388362-5050-5982-0027"
> mesos λ fgrep 1000 -r src/master
> src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 100000;
> src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 
> 1000;
> {noformat}
> Active tasks are just 6% of state.json response:
> {noformat}
> mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
>        1   14796 4138942
> mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
>       16      37  252774
> {noformat}
> I see four options that can improve the situation:
> 1. Add query string param to exclude completed tasks from state.json and use 
> it in mesos-dns and similar tools. There is no need for mesos-dns to know 
> about completed tasks, it's just extra load on master and mesos-dns.
> 2. Make history size configurable.
> 3. Make JSON serialization faster. With 10000s of tasks even without history 
> it would take a lot of time to serialize tasks for mesos-dns. Doing it every 
> 60 seconds instead of every 5 seconds isn't really an option.
> 4. Create event bus for mesos master. Marathon has it and it'd be nice to 
> have it in Mesos. This way mesos-dns could avoid polling master state and 
> switch to listening for events.
> All can be done independently.
> Note to mesosphere folks: please start distributing debug symbols with your 
> distribution. I was asking for it for a while and it is really helpful: 
> https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
> Perf report for leading master: 
> !http://i.imgur.com/iz7C3o0.png!
> I'm on 0.23.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to