[ 
https://issues.apache.org/jira/browse/KUDU-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3680:
--------------------------------
    Description: 
Apparently, the following code in glog's src/logging.cc isn't thread-safe and 
that becomes apparent in TSAN builds of the 
LogRollingITest.TestLogCleanupOnStartup test scenario when scheduler anomalies 
are present:

{code}
static vector<string>* logging_directories_list;                                
                                                                                
const vector<string>& GetLoggingDirectories() {                                 
  // Not strictly thread-safe but we're called early in InitGoogle().           
  if (logging_directories_list == NULL) {                                       
    logging_directories_list = new vector<string>;                              
                                                                                
    if ( !FLAGS_log_dir.empty() ) {                                             
      // A dir was specified, we should use it                                  
      logging_directories_list->push_back(FLAGS_log_dir);                       
    } else {                                                                    
      GetTempDirectories(logging_directories_list);                             
#ifdef GLOG_OS_WINDOWS                                                          
      char tmp[MAX_PATH];                                                       
      if (GetWindowsDirectoryA(tmp, MAX_PATH))                                  
        logging_directories_list->push_back(tmp);                               
      logging_directories_list->push_back(".\\");                               
#else                                                                           
      logging_directories_list->push_back("./");                                
#endif                                                                          
    }                                                                           
  }                                                                             
  return *logging_directories_list;                                             
} 
{code}

The actual race happens in the code above, and it affects two concurrently 
running threads iterating over the std::vector<string> instance which is 
accessed via const reference in the LogFileObject::Write() method for two 
different LogFileObject objects:
{code}
      const vector<string> & log_dirs = GetLoggingDirectories();                
                                                                                
      // Go through the list of dirs, and try to create the log file in each    
      // until we succeed or run out of options                                 
      bool success = false;                                                     
      for (vector<string>::const_iterator dir = log_dirs.begin();               
           dir != log_dirs.end();                                               
           ++dir) {                                                             
        ...                                                           
      } 
{code}

The proper solution for GetLoggingDirectories() would be using atomic 
semantics, where the static pointer is swapped atomically using 
std::atomic::compare_exchange_strong() or similar construct only when the 
underlying std::vector<string> object has been populated with all the necessary 
entries.

The full log of the LogRollingITest.TestLogCleanupOnStartup scenario with the 
race detected by TSAN is attached.

  was:
Apparently, the following code in glog's src/logging.cc isn't thread-safe and 
that becomes apparent in TSAN builds of the 
LogRollingITest.TestLogCleanupOnStartup test scenario when scheduler anomalies 
are present:

{code}
static vector<string>* logging_directories_list;                                
                                                                                
const vector<string>& GetLoggingDirectories() {                                 
  // Not strictly thread-safe but we're called early in InitGoogle().           
  if (logging_directories_list == NULL) {                                       
    logging_directories_list = new vector<string>;                              
                                                                                
    if ( !FLAGS_log_dir.empty() ) {                                             
      // A dir was specified, we should use it                                  
      logging_directories_list->push_back(FLAGS_log_dir);                       
    } else {                                                                    
      GetTempDirectories(logging_directories_list);                             
#ifdef GLOG_OS_WINDOWS                                                          
      char tmp[MAX_PATH];                                                       
      if (GetWindowsDirectoryA(tmp, MAX_PATH))                                  
        logging_directories_list->push_back(tmp);                               
      logging_directories_list->push_back(".\\");                               
#else                                                                           
      logging_directories_list->push_back("./");                                
#endif                                                                          
    }                                                                           
  }                                                                             
  return *logging_directories_list;                                             
} 
{code}

The actual race happens in the code above, and it affects two concurrently 
running threads iterating over the std::vector<string> instance which is 
accessed via const reference in the LogFileObject::Write() method for two 
different LogFileObject objects:
{code}
      const vector<string> & log_dirs = GetLoggingDirectories();                
                                                                                
      // Go through the list of dirs, and try to create the log file in each    
      // until we succeed or run out of options                                 
      bool success = false;                                                     
      for (vector<string>::const_iterator dir = log_dirs.begin();               
           dir != log_dirs.end();                                               
           ++dir) {                                                             
        ...                                                           
      } 
{code}

The proper solution for GetLoggingDirectories() would be using atomic 
semantics, where the static pointer is swapped atomically using 
std::atomic::compare_exchange_strong() or similar construct only when the 
underlying std::vector<string> object has been populated with all the necessary 
entries.

The full log with the race detected by TSAN is attached.


> LogRollingITest.TestLogCleanupOnStartup: TSAN reports race due to concurrency 
> issue in glog GetLoggingDirectories()
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3680
>                 URL: https://issues.apache.org/jira/browse/KUDU-3680
>             Project: Kudu
>          Issue Type: Bug
>          Components: test
>            Reporter: Alexey Serbin
>            Priority: Major
>         Attachments: log-rolling-itest.txt.xz
>
>
> Apparently, the following code in glog's src/logging.cc isn't thread-safe and 
> that becomes apparent in TSAN builds of the 
> LogRollingITest.TestLogCleanupOnStartup test scenario when scheduler 
> anomalies are present:
> {code}
> static vector<string>* logging_directories_list;                              
>   
>                                                                               
>   
> const vector<string>& GetLoggingDirectories() {                               
>   
>   // Not strictly thread-safe but we're called early in InitGoogle().         
>   
>   if (logging_directories_list == NULL) {                                     
>   
>     logging_directories_list = new vector<string>;                            
>   
>                                                                               
>   
>     if ( !FLAGS_log_dir.empty() ) {                                           
>   
>       // A dir was specified, we should use it                                
>   
>       logging_directories_list->push_back(FLAGS_log_dir);                     
>   
>     } else {                                                                  
>   
>       GetTempDirectories(logging_directories_list);                           
>   
> #ifdef GLOG_OS_WINDOWS                                                        
>   
>       char tmp[MAX_PATH];                                                     
>   
>       if (GetWindowsDirectoryA(tmp, MAX_PATH))                                
>   
>         logging_directories_list->push_back(tmp);                             
>   
>       logging_directories_list->push_back(".\\");                             
>   
> #else                                                                         
>   
>       logging_directories_list->push_back("./");                              
>   
> #endif                                                                        
>   
>     }                                                                         
>   
>   }                                                                           
>   
>   return *logging_directories_list;                                           
>   
> } 
> {code}
> The actual race happens in the code above, and it affects two concurrently 
> running threads iterating over the std::vector<string> instance which is 
> accessed via const reference in the LogFileObject::Write() method for two 
> different LogFileObject objects:
> {code}
>       const vector<string> & log_dirs = GetLoggingDirectories();              
>   
>                                                                               
>   
>       // Go through the list of dirs, and try to create the log file in each  
>   
>       // until we succeed or run out of options                               
>   
>       bool success = false;                                                   
>   
>       for (vector<string>::const_iterator dir = log_dirs.begin();             
>   
>            dir != log_dirs.end();                                             
>   
>            ++dir) {                                                           
>   
>         ...                                                           
>       } 
> {code}
> The proper solution for GetLoggingDirectories() would be using atomic 
> semantics, where the static pointer is swapped atomically using 
> std::atomic::compare_exchange_strong() or similar construct only when the 
> underlying std::vector<string> object has been populated with all the 
> necessary entries.
> The full log of the LogRollingITest.TestLogCleanupOnStartup scenario with the 
> race detected by TSAN is attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to