Alexey Serbin created KUDU-3680:
-----------------------------------
Summary: LogRollingITest.TestLogCleanupOnStartup: TSAN reports
race due to concurrency issue in glog GetLoggingDirectories()
Key: KUDU-3680
URL: https://issues.apache.org/jira/browse/KUDU-3680
Project: Kudu
Issue Type: Bug
Components: test
Reporter: Alexey Serbin
Attachments: log-rolling-itest.txt.xz
Apparently, the following code in glog's src/logging.cc isn't thread-safe and
that's becomes apparent in TSAN builds of the
LogRollingITest.TestLogCleanupOnStartup test scenario when scheduler anomalies
are present:
{code}
static vector<string>* logging_directories_list;
const vector<string>& GetLoggingDirectories() {
// Not strictly thread-safe but we're called early in InitGoogle().
if (logging_directories_list == NULL) {
logging_directories_list = new vector<string>;
if ( !FLAGS_log_dir.empty() ) {
// A dir was specified, we should use it
logging_directories_list->push_back(FLAGS_log_dir);
} else {
GetTempDirectories(logging_directories_list);
#ifdef GLOG_OS_WINDOWS
char tmp[MAX_PATH];
if (GetWindowsDirectoryA(tmp, MAX_PATH))
logging_directories_list->push_back(tmp);
logging_directories_list->push_back(".\\");
#else
logging_directories_list->push_back("./");
#endif
}
}
return *logging_directories_list;
}
{code}
The actual race happens in the code above, and it affects two threads iterating
over the std::vector<string> instance which is accessed via const reference and
being iterated over in the LogFileObject::Write() method for two corresponding
LogFileObject objects:
{code}
const vector<string> & log_dirs = GetLoggingDirectories();
// Go through the list of dirs, and try to create the log file in each
// until we succeed or run out of options
bool success = false;
for (vector<string>::const_iterator dir = log_dirs.begin();
dir != log_dirs.end();
++dir) {
...
}
{code}
The proper solution for GetLoggingDirectories() would be using atomic
semantics, where the static pointer is swapped atomically using
std::atomic::compare_exchange_strong() or similar construct only when the
underlying std::vector<string> object has been populated with all the necessary
entries.
The full log with the race detected by TSAN is attached.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)