[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072 ] Andrew Wong commented on KUDU-3119: --- The race isn't quite where I expected, per the following lines in the logs: {code:java} Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638): #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56 (libkudu_fs.so+0x102d70) #1 void spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, std::__1::pair<>&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8) #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56) #3 std::__1::pair<>* spp::sparsegroup<>::set >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f) #4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned long, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba) #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910) #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1) #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0) #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (libkudu_fs.so+0xe6a27) ... Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637): #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14 (libkudu_fs.so+0x102d1c) #1 void spp::sparsegroup<>::_set_aux > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8) #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56) #3 std::__1::pair<>* spp::sparsegroup<>::set > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f) #4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba) #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910) #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1) #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0) #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (l
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778 ] Alexey Serbin commented on KUDU-3119: - I changed the priority to BLOCKER in the context of cutting a new release soon. It would be great to clarify on this: # If this is a real race, could this affect data consistency, leading to data corruption or alike in the long run? # If this is a race that could cause a corruption (done either by the {{kudu}} CLI tool or by kudu tablet server,) it should be fixed before cutting the upcoming release. > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Blocker > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-ze > ro status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:5 > 6:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metad > ata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limi > t of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155693#comment-17155693 ] Alexey Serbin commented on KUDU-3119: - One more TSAN trace.f [^kudu-tool-test.3.txt.xz] > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Minor > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-ze > ro status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:5 > 6:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metad > ata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limi > t of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155096#comment-17155096 ] Alexey Serbin commented on KUDU-3119: - It seems the issue started manifesting itself after https://github.com/apache/kudu/commit/98f44f4537ceddffedaf9afce26b634c4ab2142a > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Minor > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-ze > ro status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:5 > 6:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metad > ata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limi > t of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155078#comment-17155078 ] Alexey Serbin commented on KUDU-3119: - One more instance of TSAN race report, log is attached. [^kudu-tool-test.20200709.txt.xz] I guess there is real race if trying to add a block at the same index from two different threads. Yes, there is a lock per index, but consider what happens when two threads trying to access using {{operator[]}} if an element at the index which was not present: {noformat} bool LogBlockManager::AddLogBlock(LogBlockRefPtr lb) { // InsertIfNotPresent doesn't use move semantics, so instead we just // insert an empty scoped_refptr and assign into it down below rather // than using the utility function. int index = lb->block_id().id() & kBlockMapMask; std::lock_guard l(*managed_block_shards_[index].lock); auto& blocks_by_block_id = *managed_block_shards_[index].blocks_by_block_id; LogBlockRefPtr* entry_ptr = &blocks_by_block_id[lb->block_id()]; if (*entry_ptr) { // Already have an entry for this block ID. return false; } ... {noformat} > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Minor > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-ze > ro status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:5 > 6:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metad > ata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limi > t of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)