Adar Dembo created KUDU-1433:
--------------------------------

             Summary: MaintenanceManager::GetMaintenanceManagerStatusDump can crash a server
                 Key: KUDU-1433
                 URL: https://issues.apache.org/jira/browse/KUDU-1433
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 0.5.0
            Reporter: Adar Dembo
            Priority: Blocker
             Fix For: 0.9.0


The tserver Andrew and I have been using for the hackathon crashed when we hit 
the /maintenance-manager URL. The crash:

{noformat}
F0429 19:18:42.514312 35122 maintenance_manager.h:54] Check failed: valid_
*** Check failure stack trace: ***
    @     0x7fab69c2cf4d  google::LogMessage::Fail()
    @     0x7fab69c2ee4d  google::LogMessage::SendToLog()
    @     0x7fab69c2ca89  google::LogMessage::Flush()
    @     0x7fab69c2f8ef  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fab6f8b16e6  kudu::MaintenanceManager::GetMaintenanceManagerStatusDump()
    @     0x7fab70d56f68  kudu::tserver::TabletServerPathHandlers::HandleMaintenanceManagerPage()
    @     0x7fab70d57d34  boost::detail::function::void_function_obj_invoker2<>::invoke()
    @     0x7fab6ffc1cfc  kudu::Webserver::RunPathHandler()
    @     0x7fab6ffc2716  kudu::Webserver::BeginRequestCallback()
    @     0x7fab6ffc28dc  kudu::Webserver::BeginRequestCallbackStatic()
    @     0x7fab6ffce32e  handle_request
    @     0x7fab6ffd0c2e  process_new_connection
    @     0x7fab6ffd12cc  worker_thread
    @     0x7fab6be98aa1  start_thread
    @     0x7fab67e1593d  clone
    @              (nil)  (unknown)
{noformat}

I suspect that at least one op has an UpdateStats() method that neither calls 
a setter on the MaintenanceMgrStats object passed into it nor writes cached 
previous stats into that object, which would leave the object's valid_ flag 
unset. LogGC, FlushDeltaMemStores, and FlushMRS all behave this way. There's 
nothing necessarily wrong with that (though it would be worth recalling why we 
don't cache stats in these ops), so the fix belongs in 
GetMaintenanceManagerStatusDump: it must not read stats objects whose valid_ 
flag is false.
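
To make the proposed fix concrete, here is a minimal, self-contained sketch of 
the idea. It is not the Kudu code: OpStats, Op, DumpStatus, and the op name 
"SomeCachingOp" are hypothetical stand-ins, and the assumption that each setter 
marks the object valid is inferred from the "Check failed: valid_" message 
rather than taken from the source. The point is only that the dump checks 
validity before calling any getter.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the stats object. Getters assert on valid_,
// mimicking the "Check failed: valid_" failure in the trace above.
class OpStats {
 public:
  bool valid() const { return valid_; }
  // Assumption: any setter marks the stats as valid.
  void set_runnable(bool r) { runnable_ = r; valid_ = true; }
  void set_ram_anchored(uint64_t b) { ram_anchored_ = b; valid_ = true; }
  bool runnable() const { assert(valid_); return runnable_; }
  uint64_t ram_anchored() const { assert(valid_); return ram_anchored_; }

 private:
  bool valid_ = false;
  bool runnable_ = false;
  uint64_t ram_anchored_ = 0;
};

// Hypothetical op. When updates_stats is false, UpdateStats() leaves the
// passed-in object untouched, as suspected for LogGC and the flush ops.
struct Op {
  std::string name;
  bool updates_stats;
  void UpdateStats(OpStats* stats) const {
    if (updates_stats) {
      stats->set_runnable(true);
      stats->set_ram_anchored(1024);
    }
  }
};

// Sketch of a status dump that never calls a getter on invalid stats.
std::string DumpStatus(const std::vector<Op>& ops) {
  std::ostringstream out;
  for (const Op& op : ops) {
    OpStats stats;
    op.UpdateStats(&stats);
    out << op.name << ": ";
    if (stats.valid()) {
      out << "runnable=" << stats.runnable()
          << " ram_anchored=" << stats.ram_anchored();
    } else {
      out << "stats unavailable";  // skip the getters instead of crashing
    }
    out << "\n";
  }
  return out.str();
}
```

Whether the real dump should print a placeholder, omit the op, or fall back to 
default values for such ops is a separate design choice that this sketch does 
not settle.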

I think this was introduced about a year ago by commit 5e1f45e.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
