[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072
 ] 

Andrew Wong commented on KUDU-3119:
---

The race isn't quite where I expected, per the following lines in the logs:

{code:java}
  Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638):
#0 spp::sparsegroup<>::_sizing(unsigned int) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56
 (libkudu_fs.so+0x102d70)
#1 void 
spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, 
std::__1::allocator > >&, unsigned char, std::__1::pair<>&, 
spp::integral_constant) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31
 (libkudu_fs.so+0x102ac8)
#2 void 
spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, 
std::__1::allocator > >&, unsigned char, unsigned char, 
std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13
 (libkudu_fs.so+0x102a56)
#3 std::__1::pair<>* spp::sparsegroup<>::set 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9
 (libkudu_fs.so+0x10295f)
#4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned 
long, std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25
 (libkudu_fs.so+0x1036ba)
#5 std::__1::pair<>& 
spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22
 (libkudu_fs.so+0x101910)
#6 std::__1::pair<>& 
spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, 
kudu::BlockIdEqual, kudu::MemTrackerAllocator >, 
std::__1::allocator > > > 
>::DefaultValue>(kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28
 (libkudu_fs.so+0x1014a1)
#7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29
 (libkudu_fs.so+0xeece0)
#8 
kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr)
 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32
 (libkudu_fs.so+0xe6a27)
...

  Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637):
#0 spp::sparsegroup<>::_sizing(unsigned int) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14
 (libkudu_fs.so+0x102d1c)
#1 void spp::sparsegroup<>::_set_aux > 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&, spp::integral_constant) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31
 (libkudu_fs.so+0x102ac8)
#2 void 
spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, 
std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13
 (libkudu_fs.so+0x102a56)
#3 std::__1::pair<>* spp::sparsegroup<>::set > 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9
 (libkudu_fs.so+0x10295f)
#4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25
 (libkudu_fs.so+0x1036ba)
#5 std::__1::pair<>& 
spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22
 (libkudu_fs.so+0x101910)
#6 std::__1::pair<>& 
spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, 
kudu::BlockIdEqual, kudu::MemTrackerAllocator >, 
std::__1::allocator > > > 
>::DefaultValue>(kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28
 (libkudu_fs.so+0x1014a1)
#7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29
 (libkudu_fs.so+0xeece0)
#8 
kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr)
 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32
 (l

[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778
 ] 

Alexey Serbin commented on KUDU-3119:
-

I changed the priority to BLOCKER in the context of cutting a new release soon. 
 It would be great to clarify on this:

# If this is a real race, could this affect data consistency, leading to data 
corruption or alike in the long run?
# If this is a race that could cause a corruption (done either by the {{kudu}} 
CLI tool or by kudu tablet server,) it should be fixed before cutting the 
upcoming release.

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Blocker
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-07-10 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155693#comment-17155693
 ] 

Alexey Serbin commented on KUDU-3119:
-

One more TSAN trace.f [^kudu-tool-test.3.txt.xz] 

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Minor
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-07-09 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155096#comment-17155096
 ] 

Alexey Serbin commented on KUDU-3119:
-

It seems the issue started manifesting itself after 
https://github.com/apache/kudu/commit/98f44f4537ceddffedaf9afce26b634c4ab2142a

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Minor
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-07-09 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155078#comment-17155078
 ] 

Alexey Serbin commented on KUDU-3119:
-

One more instance of TSAN race report, log is attached. 
[^kudu-tool-test.20200709.txt.xz] 

I guess there is real race if trying to add a block at the same index from two 
different threads.  Yes, there is a lock per index, but consider what happens 
when two threads trying to access using {{operator[]}} if an element at the 
index which was not present:

{noformat}
bool LogBlockManager::AddLogBlock(LogBlockRefPtr lb) {
  // InsertIfNotPresent doesn't use move semantics, so instead we just
  // insert an empty scoped_refptr and assign into it down below rather
  // than using the utility function.
  int index = lb->block_id().id() & kBlockMapMask;
  std::lock_guard l(*managed_block_shards_[index].lock);
  auto& blocks_by_block_id = *managed_block_shards_[index].blocks_by_block_id;
  LogBlockRefPtr* entry_ptr = &blocks_by_block_id[lb->block_id()];
  if (*entry_ptr) {
// Already have an entry for this block ID.
return false;
  }
...
{noformat}

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Minor
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)