Alexey Serbin created KUDU-3680:
-----------------------------------

             Summary: LogRollingITest.TestLogCleanupOnStartup: TSAN reports 
race due to concurrency issue in glog GetLoggingDirectories()
                 Key: KUDU-3680
                 URL: https://issues.apache.org/jira/browse/KUDU-3680
             Project: Kudu
          Issue Type: Bug
          Components: test
            Reporter: Alexey Serbin
         Attachments: log-rolling-itest.txt.xz

Apparently, the following code in glog's src/logging.cc isn't thread-safe and 
that's becomes apparent in TSAN builds of the 
LogRollingITest.TestLogCleanupOnStartup test scenario when scheduler anomalies 
are present:

{code}
static vector<string>* logging_directories_list;                                
                                                                                
const vector<string>& GetLoggingDirectories() {                                 
  // Not strictly thread-safe but we're called early in InitGoogle().           
  if (logging_directories_list == NULL) {                                       
    logging_directories_list = new vector<string>;                              
                                                                                
    if ( !FLAGS_log_dir.empty() ) {                                             
      // A dir was specified, we should use it                                  
      logging_directories_list->push_back(FLAGS_log_dir);                       
    } else {                                                                    
      GetTempDirectories(logging_directories_list);                             
#ifdef GLOG_OS_WINDOWS                                                          
      char tmp[MAX_PATH];                                                       
      if (GetWindowsDirectoryA(tmp, MAX_PATH))                                  
        logging_directories_list->push_back(tmp);                               
      logging_directories_list->push_back(".\\");                               
#else                                                                           
      logging_directories_list->push_back("./");                                
#endif                                                                          
    }                                                                           
  }                                                                             
  return *logging_directories_list;                                             
} 
{code}

The actual race happens in the code above, and it affects two threads iterating 
over the std::vector<string> instance which is accessed via const reference and 
being iterated over in the LogFileObject::Write() method for two corresponding 
LogFileObject objects:
{code}
      const vector<string> & log_dirs = GetLoggingDirectories();                
                                                                                
      // Go through the list of dirs, and try to create the log file in each    
      // until we succeed or run out of options                                 
      bool success = false;                                                     
      for (vector<string>::const_iterator dir = log_dirs.begin();               
           dir != log_dirs.end();                                               
           ++dir) {                                                             
        ...                                                           
      } 
{code}

The proper solution for GetLoggingDirectories() would be using atomic 
semantics, where the static pointer is swapped atomically using 
std::atomic::compare_exchange_strong() or similar construct only when the 
underlying std::vector<string> object has been populated with all the necessary 
entries.

The full log with the race detected by TSAN is attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to