[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072
 ] 

Andrew Wong commented on KUDU-3119:
---

The race isn't quite where I expected, per the following lines in the logs:

{code:java}
  Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638):
    #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56 (libkudu_fs.so+0x102d70)
    #1 void spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, std::__1::pair<>&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8)
    #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56)
    #3 std::__1::pair<>* spp::sparsegroup<>::set >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f)
    #4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned long, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba)
    #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910)
    #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1)
    #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0)
    #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (libkudu_fs.so+0xe6a27)
    ...

  Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637):
    #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14 (libkudu_fs.so+0x102d1c)
    #1 void spp::sparsegroup<>::_set_aux > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8)
    #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56)
    #3 std::__1::pair<>* spp::sparsegroup<>::set > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f)
    #4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba)
    #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910)
    #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1)
    #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0)
    #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (l
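
// The frames above show two threads (T5 holding M1638, T6 holding M1637) both
// reaching spp::sparse_hash_map::operator[] through LogBlockManager::AddLogBlock
// on what appears to be the same map. A minimal standalone sketch of that
// pattern follows; it is illustrative only, not Kudu code, and the map and
// mutex names are hypothetical. Each thread locks a *different* mutex, so the
// shared map itself is unsynchronized and TSAN flags the insertion as a race.
#include <mutex>
#include <thread>
#include <unordered_map>

std::unordered_map<int, int> shared_map;  // stands in for the shared block map
std::mutex m1;                            // hypothetical mutex, like "M1637"
std::mutex m2;                            // hypothetical mutex, like "M1638"

void Insert(std::mutex* m, int key) {
  std::lock_guard<std::mutex> l(*m);      // each thread takes its own lock...
  shared_map[key] = key;                  // ...but both mutate the same map: race
}

int main() {
  std::thread t1(Insert, &m1, 1);
  std::thread t2(Insert, &m2, 2);
  t1.join();
  t2.join();
  return 0;
}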

[jira] [Created] (KUDU-3181) Compilation manager queue may have too many tasks

2020-08-11 Thread Li Zhiming (Jira)
Li Zhiming created KUDU-3181:


 Summary: Compilation manager queue may have too many tasks
 Key: KUDU-3181
 URL: https://issues.apache.org/jira/browse/KUDU-3181
 Project: Kudu
  Issue Type: Bug
  Components: codegen
Reporter: Li Zhiming
 Attachments: heap.svg

When a client frequently scans thousands of different columns, the 
code_cache_hits rate is quite low. Compilation tasks are then frequently 
submitted to the queue, but the compilation manager thread cannot consume the 
queue quickly enough. The queue can accumulate tons of entries, each of which 
retains a copy of the schema metadata, so a lot of memory is consumed for a 
long time.
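
For illustration only (this is not Kudu's actual CompilationManager interface), 
the usual mitigation for this kind of buildup is to bound the pending-task 
queue and drop new compilation requests once it is full, instead of queueing 
each one together with its own copy of the schema metadata:

{code:cpp}
// Hypothetical sketch of a bounded submission queue; names and structure are
// assumptions, not the existing codegen code.
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>

class BoundedCompileQueue {
 public:
  explicit BoundedCompileQueue(size_t max_pending) : max_pending_(max_pending) {}

  // Returns false (and drops the task) when the queue is already full, so a
  // burst of code-cache misses cannot pin thousands of schema copies in memory.
  bool TrySubmit(std::function<void()> task) {
    std::lock_guard<std::mutex> l(mu_);
    if (tasks_.size() >= max_pending_) return false;
    tasks_.push_back(std::move(task));
    return true;
  }

 private:
  const size_t max_pending_;
  std::mutex mu_;
  std::deque<std::function<void()>> tasks_;
};
{code}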



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-2373) maintenance-manager-num-threads=0 causes CHECK fail during server startup

2020-08-11 Thread Bankim Bhavsar (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175786#comment-17175786
 ] 

Bankim Bhavsar commented on KUDU-2373:
--

It'd be good to add DEFINE_validator() for this flag.
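
A minimal sketch of what such a validator could look like, assuming the flag is 
defined as an int32 named {{maintenance_manager_num_threads}} (the exact flag 
definition and logging in Kudu may differ):

{code:cpp}
#include <cstdio>
#include <gflags/gflags.h>

DEFINE_int32(maintenance_manager_num_threads, 1,
             "Number of maintenance manager threads.");

// Rejects non-positive values at flag-parse time, so startup fails with a
// clear error instead of hitting a CHECK later.
static bool ValidateMaintenanceThreads(const char* flagname, int32_t value) {
  if (value > 0) return true;
  fprintf(stderr, "Invalid value for --%s: %d (must be positive)\n",
          flagname, value);
  return false;
}
DEFINE_validator(maintenance_manager_num_threads, &ValidateMaintenanceThreads);
{code}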

> maintenance-manager-num-threads=0 causes CHECK fail during server startup
> -
>
> Key: KUDU-2373
> URL: https://issues.apache.org/jira/browse/KUDU-2373
> Project: Kudu
>  Issue Type: Bug
>  Components: server
>Affects Versions: 1.7.0
>Reporter: Dan Burkert
>Assignee: Mahesh Reddy
>Priority: Minor
>  Labels: trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-2373) maintenance-manager-num-threads=0 causes CHECK fail during server startup

2020-08-11 Thread Bankim Bhavsar (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bankim Bhavsar reassigned KUDU-2373:


Assignee: Mahesh Reddy

> maintenance-manager-num-threads=0 causes CHECK fail during server startup
> -
>
> Key: KUDU-2373
> URL: https://issues.apache.org/jira/browse/KUDU-2373
> Project: Kudu
>  Issue Type: Bug
>  Components: server
>Affects Versions: 1.7.0
>Reporter: Dan Burkert
>Assignee: Mahesh Reddy
>Priority: Minor
>  Labels: trivial
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778
 ] 

Alexey Serbin edited comment on KUDU-3119 at 8/11/20, 6:44 PM:
---

I changed the priority to BLOCKER in the context of cutting a new release soon. 
 It would be great to clarify the following:

# If this is a real race, could it affect data consistency, leading to data 
corruption or the like in the long run?
# If this is a race that could cause corruption (whether via the {{kudu}} 
CLI tool or the kudu tablet server), I think it should be fixed before cutting 
the next release.


was (Author: aserbin):
I changed the priority to BLOCKER in the context of cutting a new release soon. 
 It would be great to clarify the following:

# If this is a real race, could it affect data consistency, leading to data 
corruption or the like in the long run?
# If this is a race that could cause corruption (whether via the {{kudu}} 
CLI tool or the kudu tablet server), it should be fixed before cutting the 
upcoming release.

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Blocker
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Alexey Serbin (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175778#comment-17175778
 ] 

Alexey Serbin commented on KUDU-3119:
-

I changed the priority to BLOCKER in the context of cutting a new release soon. 
 It would be great to clarify the following:

# If this is a real race, could it affect data consistency, leading to data 
corruption or the like in the long run?
# If this is a race that could cause corruption (whether via the {{kudu}} 
CLI tool or the kudu tablet server), it should be fixed before cutting the 
upcoming release.

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Blocker
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Alexey Serbin (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3119:

Priority: Blocker  (was: Minor)

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Blocker
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266:
>  Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: 
> process exited with non-ze
> ro status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
>  W0506 17:5
> 6:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log 
> directory (fs_wal_dir) as metad
> ata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory 
> manager: real 0.007s
> user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open 
> files per process limi
> t of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm 
> with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3176) Backup & restore incompatibility

2020-08-11 Thread Attila Bukor (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Bukor updated KUDU-3176:
---
Priority: Blocker  (was: Critical)

> Backup & restore incompatibility
> 
>
> Key: KUDU-3176
> URL: https://issues.apache.org/jira/browse/KUDU-3176
> Project: Kudu
>  Issue Type: Bug
>Reporter: Attila Bukor
>Assignee: Attila Bukor
>Priority: Blocker
>
> The ownership in the backup metadata introduced in KUDU-3090 seems to have 
> backward/forward compatibility issues: restoring a backup with the 
> post-ownership backup tool fails when the backup was created on a 
> pre-ownership cluster with the matching (pre-ownership) backup tool. Other 
> combinations might also fail, but I haven't reproduced them so far.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3176) Backup & restore incompatibility

2020-08-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175771#comment-17175771
 ] 

Andrew Wong commented on KUDU-3176:
---

What was the error seen here? Do you have application logs for the restore job? 
Or can you at least point to what the issue is?

> Backup & restore incompatibility
> 
>
> Key: KUDU-3176
> URL: https://issues.apache.org/jira/browse/KUDU-3176
> Project: Kudu
>  Issue Type: Bug
>Reporter: Attila Bukor
>Assignee: Attila Bukor
>Priority: Critical
>
> The ownership in the backup metadata introduced in KUDU-3090 seems to have 
> backward/forward compatibility issues: restoring a backup with the 
> post-ownership backup tool fails when the backup was created on a 
> pre-ownership cluster with the matching (pre-ownership) backup tool. Other 
> combinations might also fail, but I haven't reproduced them so far.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)