[ https://issues.apache.org/jira/browse/MESOS-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161297#comment-14161297 ]
Benjamin Mahler commented on MESOS-1862:
----------------------------------------

https://reviews.apache.org/r/26392/

> Performance regression in the Master's http metrics.
> ----------------------------------------------------
>
>                 Key: MESOS-1862
>                 URL: https://issues.apache.org/jira/browse/MESOS-1862
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>
> As part of the change to hold on to terminal unacknowledged tasks in the
> master, we introduced a performance regression during the following patch:
> https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247
> {noformat}
> commit 0760b007ad65bc91e8cea377339978c78d36d247
> Author: Benjamin Mahler <bmah...@twitter.com>
> Date:   Thu Sep 11 10:48:20 2014 -0700
>     Minor cleanups to the Master code.
>     Review: https://reviews.apache.org/r/25566
> {noformat}
> Rather than keeping a running count of allocated resources, we now compute
> resources on-demand. This was done in order to ignore terminal task's
> resources.
> As a result of this change, the /stats.json and /metrics/snapshot endpoints
> on the master have slowed down substantially on large clusters.
> {noformat}
> $ time curl localhost:5050/health
> real 0m0.004s
> user 0m0.001s
> sys 0m0.002s
> $ time curl localhost:5050/stats.json > /dev/null
> real 0m15.402s
> user 0m0.001s
> sys 0m0.003s
> $ time curl localhost:5050/metrics/snapshot > /dev/null
> real 0m6.059s
> user 0m0.002s
> sys 0m0.002s
> {noformat}
> {{perf top}} reveals some of the resource computation during a request to
> stats.json:
> {noformat: perf top}
> Events: 36K cycles
>  10.53%  libc-2.5.so         [.] _int_free
>   9.90%  libc-2.5.so         [.] malloc
>   8.56%  libmesos-0.21.0.so  [.] std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::
>   8.23%  libc-2.5.so         [.] _int_malloc
>   5.80%  libstdc++.so.6.0.8  [.] std::_Rb_tree_increment(std::_Rb_tree_node_base*)
>   5.33%  [kernel]            [k] _raw_spin_lock
>   3.13%  libstdc++.so.6.0.8  [.] std::string::assign(std::string const&)
>   2.95%  libmesos-0.21.0.so  [.] process::SocketManager::exited(process::ProcessBase*)
>   2.43%  libmesos-0.21.0.so  [.] mesos::Resource::MergeFrom(mesos::Resource const&)
>   1.88%  libmesos-0.21.0.so  [.] mesos::internal::master::Slave::used() const
>   1.48%  libstdc++.so.6.0.8  [.] __gnu_cxx::__atomic_add(int volatile*, int)
>   1.45%  [kernel]            [k] find_busiest_group
>   1.41%  libc-2.5.so         [.] free
>   1.38%  libmesos-0.21.0.so  [.] mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
>   1.13%  libmesos-0.21.0.so  [.] mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&)
>   1.12%  libmesos-0.21.0.so  [.] mesos::Resource::SharedDtor()
>   1.07%  libstdc++.so.6.0.8  [.] __gnu_cxx::__exchange_and_add(int volatile*, int)
>   0.94%  libmesos-0.21.0.so  [.] google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&)
>   0.92%  libstdc++.so.6.0.8  [.] operator new(unsigned long)
>   0.88%  libmesos-0.21.0.so  [.] mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
>   0.75%  libmesos-0.21.0.so  [.] mesos::matches(mesos::Resource const&, mesos::Resource const&)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
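
For readers unfamiliar with the trade-off described in the issue (a running count of allocated resources versus computing them on demand), the following is a minimal sketch in Python, not Mesos code. The names `Task`, `Slave`, `add_task`, `mark_terminal`, `used_on_demand` and `used_running` are hypothetical, resources are reduced to a single CPU figure, and nothing here is claimed to match the change under review at r/26392. It only illustrates why an on-demand sum costs time proportional to the number of tasks on every metrics request, while a running count that is adjusted when a task turns terminal can be read in constant time and still ignores terminal tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    cpus: float
    terminal: bool = False  # terminal but unacknowledged tasks are still retained


@dataclass
class Slave:
    tasks: dict = field(default_factory=dict)
    running_cpus: float = 0.0  # running count, updated on every state change

    def add_task(self, task_id, task):
        self.tasks[task_id] = task
        if not task.terminal:
            self.running_cpus += task.cpus

    def mark_terminal(self, task_id):
        # The task stays in the map, but its resources leave the running count.
        task = self.tasks[task_id]
        if not task.terminal:
            task.terminal = True
            self.running_cpus -= task.cpus

    def used_on_demand(self):
        # Walks every retained task on each call: O(number of tasks).
        return sum(t.cpus for t in self.tasks.values() if not t.terminal)

    def used_running(self):
        # Reads the maintained total: O(1).
        return self.running_cpus
```

Both accessors return the same figure as long as every state change goes through `add_task` and `mark_terminal`; the cost of the running count is that each mutation path has to keep the total consistent.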