[ https://issues.apache.org/jira/browse/MESOS-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238797#comment-14238797 ]
Jie Yu commented on MESOS-2182:
-------------------------------

Digging a little more:

{code}
ProcessReference ProcessManager::use(const UPID& pid)
{
  if (pid.ip == __ip__ && pid.port == __port__) {
    synchronized (processes) {
      if (processes.count(pid.id) > 0) {
        // Note that the ProcessReference constructor _must_ get
        // called while holding the lock on processes so that waiting
        // for references is atomic (i.e., race free).
        return ProcessReference(processes[pid.id]);
      }
    }
  }

  return ProcessReference(NULL);
}
{code}

That means every time we do `dispatch(self(), ...)`, we end up contending on the `processes` lock. So if there is a really big critical section (like the one in SocketManager::exited()), performance will be hurt pretty badly.


> Performance issue in libprocess SocketManager.
> ----------------------------------------------
>
>                 Key: MESOS-2182
>                 URL: https://issues.apache.org/jira/browse/MESOS-2182
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>            Reporter: Benjamin Mahler
>            Priority: Blocker
>
> Noticed an issue in production under which the master is slow to respond
> after failover for ~15 minutes.
> After looking at some perf data, the top offender is:
> {noformat}
> 12.02% mesos-master libmesos-0.21.0-rc3.so [.]
> std::_Rb_tree<process::ProcessBase*, process::ProcessBase*,
> std::_Identity<process::ProcessBase*>, std::less<process::ProcessBase*>,
> std::allocator<process::ProcessBase*> >::erase(process::ProcessBase* const&)
> ...
> 3.29% mesos-master libmesos-0.21.0-rc3.so [.]
> process::SocketManager::exited(process::ProcessBase*)
> {noformat}
> It appears that in the SocketManager, whenever an internal Process exits, we
> loop over all the links unnecessarily:
> {code}
> void SocketManager::exited(ProcessBase* process)
> {
>   // An exited event is enough to cause the process to get deleted
>   // (e.g., by the garbage collector), which means we can't
>   // dereference process (or even use the address) after we enqueue at
>   // least one exited event. Thus, we save the process pid.
>   const UPID pid = process->pid;
>
>   // Likewise, we need to save the current time of the process so we
>   // can update the clocks of linked processes as appropriate.
>   const Time time = Clock::now(process);
>
>   synchronized (this) {
>     // Iterate through the links, removing any links the process might
>     // have had and creating exited events for any linked processes.
>     foreachpair (const UPID& linkee, set<ProcessBase*>& processes, links) {
>       processes.erase(process);
>
>       if (linkee == pid) {
>         foreach (ProcessBase* linker, processes) {
>           CHECK(linker != process) << "Process linked with itself";
>
>           synchronized (timeouts) {
>             if (Clock::paused()) {
>               Clock::update(linker, time);
>             }
>           }
>
>           linker->enqueue(new ExitedEvent(linkee));
>         }
>       }
>     }
>
>     links.erase(pid);
>   }
> }
> {code}
> On clusters with 10,000s of slaves, this means we hold the socket manager
> lock for a very expensive loop erasing nothing from a set! This is because
> the master contains links from the Master Process to each slave. However,
> when a random ephemeral Process terminates, we don't need to loop over each
> slave link.
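
One way to remove that full scan, as the last paragraph suggests, would be for the SocketManager to also track which pids each process has linked to. The sketch below is a minimal standalone model of that idea, not the libprocess code: pids are modeled as `std::string`, process handles as `int`, the `linkees` reverse map is hypothetical, and locking, clock updates, and ExitedEvent delivery are omitted.

{code}
#include <cassert>
#include <map>
#include <set>
#include <string>

// Minimal stand-ins: a pid is a string, a process handle is an int.
typedef std::string UPID;
typedef int Process;

// linkee pid -> processes that linked to it (mirrors 'links' above).
std::map<UPID, std::set<Process> > links;

// Hypothetical reverse map: process -> pids it has links to.
std::map<Process, std::set<UPID> > linkees;

void link(Process linker, const UPID& to)
{
  links[to].insert(linker);
  linkees[linker].insert(to);
}

// Cleanup that only touches entries involving 'process', instead of
// scanning every entry in 'links' the way the loop above does.
void exited(Process process, const UPID& pid)
{
  // Remove the links this process held to other processes.
  std::map<Process, std::set<UPID> >::iterator it = linkees.find(process);
  if (it != linkees.end()) {
    for (std::set<UPID>::const_iterator to = it->second.begin();
         to != it->second.end();
         ++to) {
      links[*to].erase(process);
    }
    linkees.erase(it);
  }

  // The processes that had linked to 'pid' would be sent ExitedEvents
  // here; in this model we simply drop the entry.
  links.erase(pid);
}

int main()
{
  link(1, "slave(1)@host:5051");
  link(2, "slave(2)@host:5051");

  exited(1, "observer(1)@host:5050");

  assert(links["slave(1)@host:5051"].empty());        // Removed without a full scan.
  assert(links["slave(2)@host:5051"].count(2) == 1);  // Unrelated link untouched.
  return 0;
}
{code}

With something along these lines, the work done while holding the socket manager lock is proportional to the number of links the exiting process actually participates in, not to the total number of slaves the master has linked.
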
> While we hold this lock, the following calls will block:
> {code}
> class SocketManager
> {
> public:
>   Socket accepted(int s);
>   void link(ProcessBase* process, const UPID& to);
>   PID<HttpProxy> proxy(const Socket& socket);
>   void send(Encoder* encoder, bool persist);
>   void send(const Response& response,
>             const Request& request,
>             const Socket& socket);
>   void send(Message* message);
>   Encoder* next(int s);
>   void close(int s);
>   void exited(const Node& node);
>   void exited(ProcessBase* process);
>   ...
> {code}
> As a result, the slave observers and the master can block calling send()!
> Short term, we will try to fix this issue by removing the unnecessary
> looping. Longer term, it would be nice to avoid all this locking when
> sending on independent sockets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
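
On the contention described in the comment at the top of this message: ProcessManager::use() only reads the `processes` map, so one standard way to reduce that kind of contention is a reader-writer lock, where concurrent dispatches take the shared side and only process registration/cleanup takes the exclusive side. The sketch below is a standalone illustration of that technique (C++17 `std::shared_mutex`, with a made-up `Registry` type and `shared_ptr<int>` standing in for processes), not the libprocess implementation or the fix planned for this ticket.

{code}
#include <iostream>
#include <map>
#include <memory>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Made-up registry standing in for ProcessManager's 'processes' map.
class Registry
{
public:
  void add(const std::string& id)
  {
    std::unique_lock<std::shared_mutex> lock(mutex);  // Exclusive: writers are rare.
    processes[id] = std::make_shared<int>(0);
  }

  // Read-only lookup, analogous to ProcessManager::use(): every dispatch
  // performs one. With a shared lock, concurrent lookups no longer
  // serialize on a single mutex; they only wait for writers.
  std::shared_ptr<int> use(const std::string& id) const
  {
    std::shared_lock<std::shared_mutex> lock(mutex);
    auto it = processes.find(id);
    return it == processes.end() ? nullptr : it->second;
  }

private:
  mutable std::shared_mutex mutex;
  std::map<std::string, std::shared_ptr<int>> processes;
};

int main()
{
  Registry registry;
  registry.add("master");

  // Many threads doing lookups (like dispatch(self(), ...)) proceed in
  // parallel instead of contending with each other.
  std::vector<std::thread> threads;
  for (int i = 0; i < 8; i++) {
    threads.emplace_back([&registry]() {
      for (int j = 0; j < 100000; j++) {
        registry.use("master");
      }
    });
  }

  for (std::thread& t : threads) {
    t.join();
  }

  std::cout << "done" << std::endl;
  return 0;
}
{code}

Whether a shared lock is actually safe for ProcessManager depends on everything else the `processes` lock guards (for example, the reference counting behind ProcessReference), so this only shows the direction, not a drop-in change.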