Benjamin Mahler created MESOS-1862:
--------------------------------------

             Summary: Performance regression in the Master's http metrics.
                 Key: MESOS-1862
                 URL: https://issues.apache.org/jira/browse/MESOS-1862
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 0.21.0
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler
            Priority: Blocker


As part of the change to hold on to terminal unacknowledged tasks in the 
master, we introduced a performance regression in the following patch:

https://github.com/apache/mesos/commit/0760b007ad65bc91e8cea377339978c78d36d247
{noformat}
commit 0760b007ad65bc91e8cea377339978c78d36d247
Author: Benjamin Mahler <bmah...@twitter.com>
Date:   Thu Sep 11 10:48:20 2014 -0700

    Minor cleanups to the Master code.

    Review: https://reviews.apache.org/r/25566
{noformat}

Rather than keeping a running count of allocated resources, we now compute 
resources on demand. This was done in order to ignore terminal tasks' resources.
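
To illustrate the trade-off, here is a minimal sketch contrasting the two approaches. The {{Task}} and {{Slave}} types below are hypothetical, simplified stand-ins (a single {{cpus}} scalar instead of full {{Resources}}), not the actual Master code:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins; not the real Mesos types.
struct Task {
  double cpus;
  bool terminal;  // Terminal tasks may still be held until acknowledged.
};

struct Slave {
  std::vector<Task> tasks;

  // Approach A: a running count, adjusted on every task state change.
  // Reading it is O(1), but every code path that changes task state
  // must remember to update it.
  double usedCpusRunning = 0.0;

  void addTask(const Task& task) {
    tasks.push_back(task);
    if (!task.terminal) {
      usedCpusRunning += task.cpus;
    }
  }

  void markTerminal(std::size_t index) {
    Task& task = tasks[index];
    if (!task.terminal) {
      task.terminal = true;
      usedCpusRunning -= task.cpus;
    }
  }

  // Approach B: compute on demand, skipping terminal tasks.
  // There is no bookkeeping to get wrong, but each call walks every task.
  double usedCpusOnDemand() const {
    double total = 0.0;
    for (const Task& task : tasks) {
      if (!task.terminal) {
        total += task.cpus;
      }
    }
    return total;
  }
};

// A metrics-style aggregate built on approach B: the cost of one request
// grows with the total number of tasks across all slaves.
double totalUsedCpus(const std::vector<Slave>& slaves) {
  double total = 0.0;
  for (const Slave& slave : slaves) {
    total += slave.usedCpusOnDemand();
  }
  return total;
}
```

With approach B, an endpoint that reports usage has to walk every task on every slave per request, so its cost scales with cluster size. In the real code each step also merges protobuf {{Resource}} objects rather than adding doubles.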

As a result of this change, the /stats.json and /metrics/snapshot endpoints on 
the master have slowed down substantially on large clusters.

{noformat}
$ time curl localhost:5050/health
real    0m0.004s
user    0m0.001s
sys     0m0.002s

$ time curl localhost:5050/stats.json > /dev/null
real    0m15.402s
user    0m0.001s
sys     0m0.003s

$ time curl localhost:5050/metrics/snapshot > /dev/null
real    0m6.059s
user    0m0.002s
sys     0m0.002s
{noformat}

Running {{perf top}} during a request to stats.json shows time going to the 
resource computation (e.g. Slave::used() and the Resource MergeFrom calls) 
and to memory allocation:
{noformat: perf top}
Events: 36K cycles
 10.53%  libc-2.5.so             [.] _int_free
  9.90%  libc-2.5.so             [.] malloc
  8.56%  libmesos-0.21.0.so      [.] std::_Rb_tree<process::ProcessBase*, process::ProcessBase*, std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>, std::allocator<process::ProcessBase*> >::
  8.23%  libc-2.5.so             [.] _int_malloc
  5.80%  libstdc++.so.6.0.8      [.] std::_Rb_tree_increment(std::_Rb_tree_node_base*)
  5.33%  [kernel]                [k] _raw_spin_lock
  3.13%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
  2.95%  libmesos-0.21.0.so      [.] process::SocketManager::exited(process::ProcessBase*)
  2.43%  libmesos-0.21.0.so      [.] mesos::Resource::MergeFrom(mesos::Resource const&)
  1.88%  libmesos-0.21.0.so      [.] mesos::internal::master::Slave::used() const
  1.48%  libstdc++.so.6.0.8      [.] __gnu_cxx::__atomic_add(int volatile*, int)
  1.45%  [kernel]                [k] find_busiest_group
  1.41%  libc-2.5.so             [.] free
  1.38%  libmesos-0.21.0.so      [.] mesos::Value_Range::MergeFrom(mesos::Value_Range const&)
  1.13%  libmesos-0.21.0.so      [.] mesos::Value_Scalar::MergeFrom(mesos::Value_Scalar const&)
  1.12%  libmesos-0.21.0.so      [.] mesos::Resource::SharedDtor()
  1.07%  libstdc++.so.6.0.8      [.] __gnu_cxx::__exchange_and_add(int volatile*, int)
  0.94%  libmesos-0.21.0.so      [.] google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&)
  0.92%  libstdc++.so.6.0.8      [.] operator new(unsigned long)
  0.88%  libmesos-0.21.0.so      [.] mesos::Value_Ranges::MergeFrom(mesos::Value_Ranges const&)
  0.75%  libmesos-0.21.0.so      [.] mesos::matches(mesos::Resource const&, mesos::Resource const&)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
