[ https://issues.apache.org/jira/browse/MESOS-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-391:
----------------------------------
    Labels: twitter  (was: newbie twitter)

> Slave GarbageCollector needs to also take into account the number of links, when determining removal time.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-391
>                 URL: https://issues.apache.org/jira/browse/MESOS-391
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Ritwik Yadav
>              Labels: twitter
>
> The slave garbage collector does not take into account the number of links
> present in a directory. If we create a lot of executor directories (up to
> LINK_MAX), a GC is therefore not necessarily triggered.
> As a result, the slave crashes:
> F0313 21:40:02.926494 33746 paths.hpp:233] CHECK_SOME(mkdir) failed: Failed to create executor directory '/var/lib/mesos/slaves/201303090208-1937777162-5050-38880-267/frameworks/201103282247-0000000019-0000/executors/thermos-1363210801777-mesos-meta_slave_0-27-e74e4b30-dcf1-4e88-8954-dd2b40b7dd89/runs/499fcc13-c391-421c-93d2-a56d1a4a931e': Too many links
> *** Check failure stack trace: ***
>     @     0x7f9320f82f9d  google::LogMessage::Fail()
>     @     0x7f9320f88c07  google::LogMessage::SendToLog()
>     @     0x7f9320f8484c  google::LogMessage::Flush()
>     @     0x7f9320f84ab6  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9320c70312  _CheckSome::~_CheckSome()
>     @     0x7f9320c9dd5c  mesos::internal::slave::paths::createExecutorDirectory()
>     @     0x7f9320c9e60d  mesos::internal::slave::Framework::createExecutor()
>     @     0x7f9320c7a7f7  mesos::internal::slave::Slave::runTask()
>     @     0x7f9320c9cb43  ProtobufProcess<>::handler4<>()
>     @     0x7f9320c8678b  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9320c9d1ab  ProtobufProcess<>::visit()
>     @     0x7f9320e4c774  process::MessageEvent::visit()
>     @     0x7f9320e40a1d  process::ProcessManager::resume()
>     @     0x7f9320e41268  process::schedule()
>     @     0x7f932055973d  start_thread
>     @     0x7f931ef3df6d  clone
> The fix here is to take into account the number of links (st_nlink) when
> determining whether we need to GC.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
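
The check proposed in the issue could look roughly like the sketch below. It is only an illustration, not the Mesos implementation: the helper names `linkUsageExceeds` and `nearLinkLimit` and the `fraction` threshold are hypothetical. It relies on the POSIX calls `stat()` (for the `st_nlink` field of `struct stat`) and `pathconf()` with `_PC_LINK_MAX`, and assumes a file system where each subdirectory adds one link to its parent, which is the traditional behaviour but is not guaranteed everywhere.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper (not part of Mesos): true once `nlink` has reached
// `fraction` of `linkMax`. A non-positive `linkMax` means "no known limit".
bool linkUsageExceeds(unsigned long long nlink, long linkMax, double fraction)
{
  assert(fraction > 0.0 && fraction <= 1.0);

  if (linkMax <= 0) {
    return false;
  }

  return static_cast<double>(nlink) >= fraction * static_cast<double>(linkMax);
}

// Hypothetical helper (not part of Mesos): inspects `directory` and reports
// whether its link count is close to the file system's LINK_MAX. Returns
// false if the directory cannot be inspected or no limit is reported.
bool nearLinkLimit(const std::string& directory, double fraction)
{
  struct stat s;
  if (::stat(directory.c_str(), &s) != 0) {
    return false;
  }

  // pathconf() returns -1 on error or when the limit is indeterminate.
  const long linkMax = ::pathconf(directory.c_str(), _PC_LINK_MAX);

  return linkUsageExceeds(
      static_cast<unsigned long long>(s.st_nlink), linkMax, fraction);
}
```

A garbage collector could call something like `nearLinkLimit(runsDirectory, 0.9)` before creating a new executor run directory and, if it returns true, schedule older directories for removal early rather than waiting for the usual removal time. Here `runsDirectory` and the 0.9 threshold are assumed values chosen for the example.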