[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072 ] Andrew Wong commented on KUDU-3119: --- The race isn't quite where I expected, per the following lines in the logs:
{code:java}
Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638):
  #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56 (libkudu_fs.so+0x102d70)
  #1 void spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, std::__1::pair<>&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8)
  #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56)
  #3 std::__1::pair<>* spp::sparsegroup<>::set >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f)
  #4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned long, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba)
  #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910)
  #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1)
  #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0)
  #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (libkudu_fs.so+0xe6a27)
  ...
Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637):
  #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14 (libkudu_fs.so+0x102d1c)
  #1 void spp::sparsegroup<>::_set_aux > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8)
  #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56)
  #3 std::__1::pair<>* spp::sparsegroup<>::set > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f)
  #4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba)
  #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910)
  #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1)
  #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0)
  #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (l
[jira] [Created] (KUDU-3181) Compilation manager queue may have too many tasks
Li Zhiming created KUDU-3181: Summary: Compilation manager queue may have too many tasks Key: KUDU-3181 URL: https://issues.apache.org/jira/browse/KUDU-3181 Project: Kudu Issue Type: Bug Components: codegen Reporter: Li Zhiming Attachments: heap.svg When a client frequently scans thousands of different columns, the code_cache_hits rate is quite low. Compilation tasks are then frequently submitted to the queue, but the compilation manager thread cannot consume the queue quickly enough. The queue can accumulate tons of entries, each of which retains a copy of the schema metadata, so a lot of memory is consumed for a long time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2373) maintenance-manager-num-threads=0 causes CHECK fail during server startup
[ https://issues.apache.org/jira/browse/KUDU-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175786#comment-17175786 ] Bankim Bhavsar commented on KUDU-2373: -- It'd be good to add DEFINE_validator() for this flag. > maintenance-manager-num-threads=0 causes CHECK fail during server startup > - > > Key: KUDU-2373 > URL: https://issues.apache.org/jira/browse/KUDU-2373 > Project: Kudu > Issue Type: Bug > Components: server >Affects Versions: 1.7.0 >Reporter: Dan Burkert >Assignee: Mahesh Reddy >Priority: Minor > Labels: trivial > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-2373) maintenance-manager-num-threads=0 causes CHECK fail during server startup
[ https://issues.apache.org/jira/browse/KUDU-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bankim Bhavsar reassigned KUDU-2373: Assignee: Mahesh Reddy > maintenance-manager-num-threads=0 causes CHECK fail during server startup > - > > Key: KUDU-2373 > URL: https://issues.apache.org/jira/browse/KUDU-2373 > Project: Kudu > Issue Type: Bug > Components: server >Affects Versions: 1.7.0 >Reporter: Dan Burkert >Assignee: Mahesh Reddy >Priority: Minor > Labels: trivial > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778 ] Alexey Serbin edited comment on KUDU-3119 at 8/11/20, 6:44 PM: --- I changed the priority to BLOCKER in the context of cutting a new release soon. It would be great to clarify the following: # If this is a real race, could it affect data consistency, leading to data corruption or the like in the long run? # If this is a race that could cause corruption (done either by the {{kudu}} CLI tool or by the kudu tablet server), I think it should be fixed before cutting the next release. was (Author: aserbin): I changed the priority to BLOCKER in the context of cutting a new release soon. It would be great to clarify the following: # If this is a real race, could it affect data consistency, leading to data corruption or the like in the long run? # If this is a race that could cause corruption (done either by the {{kudu}} CLI tool or by the kudu tablet server), it should be fixed before cutting the upcoming release. 
> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Blocker > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-zero status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:56:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metadata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limit of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778 ] Alexey Serbin commented on KUDU-3119: - I changed the priority to BLOCKER in the context of cutting a new release soon. It would be great to clarify the following: # If this is a real race, could it affect data consistency, leading to data corruption or the like in the long run? # If this is a race that could cause corruption (done either by the {{kudu}} CLI tool or by the kudu tablet server), it should be fixed before cutting the upcoming release. > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Blocker > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-zero status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:56:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metadata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limit of 1048576; it is already as high as it can go > 
I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin updated KUDU-3119: Priority: Blocker (was: Minor) > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Blocker > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-zero status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:56:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metadata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limit of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3176) Backup & restore incompatibility
[ https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Bukor updated KUDU-3176: --- Priority: Blocker (was: Critical) > Backup & restore incompatibility > > > Key: KUDU-3176 > URL: https://issues.apache.org/jira/browse/KUDU-3176 > Project: Kudu > Issue Type: Bug >Reporter: Attila Bukor >Assignee: Attila Bukor >Priority: Blocker > > The ownership field in the backup metadata introduced in KUDU-3090 seems to have > backward/forward compatibility issues: restoring a backup with the > post-ownership backup tool fails when the backup was created on a pre-ownership > cluster with the matching backup tool. Other combinations might also fail but I > haven't reproduced them so far. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3176) Backup & restore incompatibility
[ https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175771#comment-17175771 ] Andrew Wong commented on KUDU-3176: --- What was the error seen here? Do you have application logs for the restore job? Or can you at least point to what the issue is? > Backup & restore incompatibility > > > Key: KUDU-3176 > URL: https://issues.apache.org/jira/browse/KUDU-3176 > Project: Kudu > Issue Type: Bug >Reporter: Attila Bukor >Assignee: Attila Bukor >Priority: Critical > > The ownership field in the backup metadata introduced in KUDU-3090 seems to have > backward/forward compatibility issues: restoring a backup with the > post-ownership backup tool fails when the backup was created on a pre-ownership > cluster with the matching backup tool. Other combinations might also fail but I > haven't reproduced them so far. -- This message was sent by Atlassian Jira (v8.3.4#803005)