[kudu-CR] [compaction] Add memory estimation unit test
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20787 ) Change subject: [compaction] Add memory estimation unit test .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/20787/6/src/kudu/tablet/compaction_policy-test.cc File src/kudu/tablet/compaction_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/20787/6/src/kudu/tablet/compaction_policy-test.cc@651 PS6, Line 651: *1024*1024 > style nit for here and below: separate operation symbol from the operands b One more point that Ashwani pointed to. With trying to set FLAGS_memory_limit_hard_bytes, there are two issues: * Once it's set, the hard memory limit is modified for all the tests in this binary, and the setting isn't changing back even if rolling back the flag's setting. That's might lead to very unexpected outcomes, especially if randomizing the set of scenarios to run (i.e. when using `--gtest_shuffle`). * From the other side, is this setting at line 651 actually effective (i.e. does it set the hard memory limit as expected)? Did you check that InitLimits() hasn't been called earlier when running this test scenario? -- To view, visit http://gerrit.cloudera.org:8080/20787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5397c7e7bf29942a610b0b8203b6918811c49bbb Gerrit-Change-Number: 20787 Gerrit-PatchSet: 6 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Ádám Bakai Gerrit-Comment-Date: Tue, 30 Jan 2024 06:16:15 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 73: > Patch Set 73: > > Thank you very much for the contribution! Thanks for reviewing this big patch! -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 05:55:24 + Gerrit-HasComments: No
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 73: (2 comments) http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt File src/kudu/fs/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44 PS71, Line 44: kudu_util > Well, NO_TESTS is rather something about the way the binaries are built, no Done http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc File src/kudu/fs/dir_manager.cc: http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc@220 PS72, Line 220: // In non-test en > Please use IsGTest() instead. You could find its usages in data_dirs.cc an Done -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 05:54:45 + Gerrit-HasComments: Yes
[kudu-CR] [compaction] Add tests to generate high memory rowset compaction
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20816 ) Change subject: [compaction] Add tests to generate high memory rowset compaction .. [compaction] Add tests to generate high memory rowset compaction This change implements a better approach to test out high memory compaction case. There is an existing patch: https://gerrit.cloudera.org/#/c/19278/ that uses different approach which does not scale well because of excessive memory usage by client. Patch contains two tests: 1. TestRowSetCompactionProceedWithNoBudgetingConstraints (generates deltas with memory requirements under budget) 2. TestRowSetCompactionSkipWithBudgetingConstraints (generates deltas with memory requirements over budget) and a helper function: GenHighMemConsumptionDeltas Using this helper function, callers can generate deltas of different sizes as per test needs. The size_factor can be used to achieve that. Increasing size_factor by 1 adds 1 MB worth of deltas as far as rowset compaction memory consumption is concerned. Change-Id: I1996558e71c49314c6acf12faf854c796548318c Reviewed-on: http://gerrit.cloudera.org:8080/20816 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/tablet/compaction-test.cc 1 file changed, 141 insertions(+), 0 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/20816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1996558e71c49314c6acf12faf854c796548318c Gerrit-Change-Number: 20816 Gerrit-PatchSet: 10 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [compaction] Add tests to generate high memory rowset compaction
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20816 ) Change subject: [compaction] Add tests to generate high memory rowset compaction .. Patch Set 9: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/20816/6/src/kudu/tablet/compaction-test.cc File src/kudu/tablet/compaction-test.cc: http://gerrit.cloudera.org:8080/#/c/20816/6/src/kudu/tablet/compaction-test.cc@637 PS6, Line 637: // into consideration the amount of free memory left and based on > The problem with setting FLAGS_memory_limit_hard_bytes in the test is that Yep, with the initialization g_hard_limit it's quite limiting, indeed. Alternatively, you could separate these tests into their own binary, if it makes sense. It's up to you -- if you are quite confident this approach is good enough with updated thresholds, then I'm OK with this approach. Anyway, if after pushing this patch it starts failing from time to time, it will be necessary to reconsider this approach. -- To view, visit http://gerrit.cloudera.org:8080/20816 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1996558e71c49314c6acf12faf854c796548318c Gerrit-Change-Number: 20816 Gerrit-PatchSet: 9 Gerrit-Owner: Ashwani Raina Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 30 Jan 2024 05:51:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/20725 ) Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. KUDU-3527 Fix block manager test when using 64k container block alignment BlockManagerTest.TestMetadataOkayDespiteFailure might fail on system where we use 64k alignment for data blocks. Root cause: Currently tablets fail to load if .metadata is missing but there is still a non-empty ".data" file. If FLAGS_env_inject_eio is set to greater than zero, then there is a chance that, when we delete the container file, we only delete the ".meta", but leave the ".data" file. So deleting containers with injected io errors is expected to sometime prevent the block manager from restarting properly. However container deletion almost never occurred in this test until we run it on the new RHEL 8.8 ARM with 64K page size. Why is it stable on x86_64: On x86_64 we usually use 4k block alignment. We write 6080 byte data into a block, which is padded to 8k. So in the current test settings we have 32 blocks in a container when it becomes full (FLAGS_log_container_max_size = 256k). Later we delete exactly half of the 500 blocks we wrote. The chance of deleting all 32 blocks in a container is very small, and even if it happens, it still has around 0.09 chance to become corrupted. It is a bit flaky, but it would fail less than once in a billion run. If you dramatically decrease the FLAGS_log_container_max_size flag, the test starts to occasionally fail on a traditional x86_64 machine too. Why is it unstable with 64k alignment: With 64k alignment (currently used on ARM RHEL 8.8 with 64k page size), there are 4 blocks in a full container file. We write 500 blocks, so we expect to have nearly 125 full files. If we delete exactly half of the blocks, we will make many (full) container file empty. Some of them might fail to be deleted properly leaving a lonely non-empty .data file without .metadata. On my RHEL machine the test fails 97-98% of the time for this exact reason. Solution: The test TestMetadataOkayDespiteFailure was supposed to test reloading the block manager with containers having deleted blocks, with some previous failed delete. It (probably) never tested the case when container deletion occurs. + Disabled container deletion, so the test scope remains the same as it was with smaller block alignments. + Add a new (currently disabled) test, to see how block manager handles the above described situation. Filed a JIRA issue to track the issue: KUDU-3528. The original issue is not ARM specific, and far from trivial to solve, and was always in the system. Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Reviewed-on: http://gerrit.cloudera.org:8080/20725 Reviewed-by: Ashwani Raina Reviewed-by: Alexey Serbin Tested-by: Alexey Serbin --- M src/kudu/fs/block_manager-test.cc 1 file changed, 97 insertions(+), 16 deletions(-) Approvals: Ashwani Raina: Looks good to me, but someone else must approve Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Alexey Serbin has removed a vote on this change. Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20725 ) Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. Patch Set 15: Verified+1 unrelated test failures (ASAN, TSAN) -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 30 Jan 2024 05:33:26 + Gerrit-HasComments: No
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 73: Thank you very much for the contribution! -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 05:29:21 + Gerrit-HasComments: No
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 73: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 05:28:07 + Gerrit-HasComments: No
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Alexey Serbin has removed a vote on this change. Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 73: Verified+1 unrelated test failures: * CatalogManagerConfigurations/MasterStressTest.Test/2 (ASAN): KUDU-3481 * MultiThreadedTabletTest/3.DeleteAndReinsert (RELEASE): KUDU-2667 * SecurityITest.TestNonDefaultPrincipalMultipleMaster (RELEASE) * ToolTest.TestHmsList (RELEASE) * AutoRebalancerTest.NextLeaderResumesAutoRebalancing (TSAN) -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 05:27:52 + Gerrit-HasComments: No
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Hello Alexey Serbin, Yuqi Du, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18569 to look at the new patch set (#73). Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. KUDU-3371 [fs] Use RocksDB to store LBM metadata Since the LogBlockContainerNativeMeta stores block records sequentially in the metadata file, the live blocks maybe in a very low ratio, so it may cause serious disk space amplification and long bootstrap times. This patch introduces a new class LogBlockContainerRdbMeta which uses RocksDB to store LBM metadata, a new item will be Put() into RocksDB when a new block is created in LBM, and the item will be Delete() from RocksDB when the block is removed from LBM. Data in RocksDB can be maintained by RocksDB itself, i.e. deleted items will be GCed so it's not needed to rewrite the metadata as how we do in LogBlockContainerNativeMeta. The implementation also reuses most logic of the base class LogBlockContainer, the main difference with LogBlockContainerNativeMeta is that LogBlockContainerRdbMeta stores block records metadata in RocksDB rather than in a native file. The main implementation of interfaces from the base class including: a. Create a container Data file is created similar to LogBlockContainerNativeMeta, but the metadata part is stored in RocksDB with keys constructed as ".", and values are the same to the records stored in metadata file of LogBlockContainerNativeMeta. b. Open a container Similar to LogBlockContainerNativeMeta, and it's not needed to check the metadata part, because it has been checked when loading containers during the bootstrap phase. c. Destroy a container If the container is dead (full and no live blocks), remove the data file, and clean up metadata part, by deleting all the keys prefixed by "". d. Load a container (by ProcessRecords()) Iterate the RocksDB in the key range [, ), because dead blocks have been deleted directly, thus only live block records will be populated, we can use them as LogBlockContainerNativeMeta. e. Create blocks in a container Put() serialized BlockRecordPB records into RocksDB, keys are constructed the same to the above. f. Remove blocks from a container Construct the keys same to the above, Delete() them from RocksDB in batch. This patch contains the following changes: - Adds a new block manager type named 'logr', it uses RocksDB to store LBM metadata. The new block manager is enabled by setting --block_manager=logr. - Related tests add new parameterized value to test the case of "--block_manager=logr". It's optional to use RocksDB, we can use the former LBM as before, we will introduce more tools to convert data between the two implementations in the future. The optimization is obvious as shown in JIRA KUDU-3371, it shows that the time spent to re-open tablet server's metadata when 99.99% of all the records removed reduced about 9.5 times when using LogBlockContainerRdbMeta instead of LogBlockContainerNativeMeta. Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f --- M src/kudu/benchmarks/CMakeLists.txt M src/kudu/client/CMakeLists.txt M src/kudu/consensus/CMakeLists.txt M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.cc M src/kudu/fs/dir_manager.cc M src/kudu/fs/dir_manager.h M src/kudu/fs/dir_util.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/fs_report.cc M src/kudu/fs/fs_report.h M src/kudu/fs/log_block_manager-test-util.cc M src/kudu/fs/log_block_manager-test-util.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/dense_node-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/server/CMakeLists.txt M src/kudu/tablet/compaction-test.cc M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/util/CMakeLists.txt M thirdparty/build-definitions.sh 32 files changed, 1,782 insertions(+), 185 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/18569/73 -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 73 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu
[kudu-CR] KUDU-3371 [fs] Use RocksDB to store LBM metadata
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18569 ) Change subject: KUDU-3371 [fs] Use RocksDB to store LBM metadata .. Patch Set 72: Code-Review+2 (4 comments) Please address one nit about using IsGTest() instead using NO_TESTS macro. Otherwise, this looks good to me! http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt File src/kudu/fs/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/CMakeLists.txt@44 PS71, Line 44: kudu_util > In https://gerrit.cloudera.org/c/18569/71/src/kudu/fs/dir_manager.cc#221, I Well, NO_TESTS is rather something about the way the binaries are built, not whether some code is running in a test context. Please consider updating this and using IsGTest() utility function that's declared in test_util_prod.h http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc File src/kudu/fs/dir_manager.cc: http://gerrit.cloudera.org:8080/#/c/18569/71/src/kudu/fs/dir_manager.cc@207 PS71, Line 207: RdbDir::RdbDir(Env* env, DirMetrics* metrics, > Because it's a static member, it causes a link error if not define it here. Ah, indeed -- I missed this. Thank you for the clarification. http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc File src/kudu/fs/dir_manager.cc: http://gerrit.cloudera.org:8080/#/c/18569/72/src/kudu/fs/dir_manager.cc@220 PS72, Line 220: #if defined(NO_TESTS) Please use IsGTest() instead. You could find its usages in data_dirs.cc and a few other places. http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh File thirdparty/build-definitions.sh: http://gerrit.cloudera.org:8080/#/c/18569/71/thirdparty/build-definitions.sh@1192 PS71, Line 1192: -DWITH_LZ4=ON > I added a new option "-Dlz4_ROOT_DIR=$PREFIX" for RocksDB to find lz4 accor That's perfect, thank you! -- To view, visit http://gerrit.cloudera.org:8080/18569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie72f6914eb5653a9c034766c6cd3741a8340711f Gerrit-Change-Number: 18569 Gerrit-PatchSet: 72 Gerrit-Owner: Yingchun Lai Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: KeDeng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yifan Zhang Gerrit-Reviewer: Yingchun Lai Gerrit-Reviewer: Yuqi Du Gerrit-Comment-Date: Tue, 30 Jan 2024 01:56:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/20725 ) Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. Patch Set 15: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@981 PS14, Line 981: // The chance that we delete the : // .metadata but fail to delete the .data file is (1-x) * x > Look at the last 2 lines of LogBlockContainerNativeMeta::~LogBlockContainer I see -- thank you for the clarification. Indeed, in the destructor the only error handling is adding a warning log message if an operation returns non-OK status, that's it. For some reason, I was under impression the removal of those files is done in a different place, but I should have double-checked. Then it's a good news: this estimate is a precise one. -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Tue, 30 Jan 2024 01:34:18 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Ashwani Raina has posted comments on this change. ( http://gerrit.cloudera.org:8080/20725 ) Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Mon, 29 Jan 2024 16:30:28 + Gerrit-HasComments: No
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/20725 ) Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/20725/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20725/14//COMMIT_MSG@21 PS14, Line 21: RHEL 8.8 ARM 64k core > Did you mean "RHEL 8.8 ARM with 64K page size"? Done http://gerrit.cloudera.org:8080/#/c/20725/14//COMMIT_MSG@36 PS14, Line 36: 64k core > nit: 64K page size ? Done http://gerrit.cloudera.org:8080/#/c/20725/14//COMMIT_MSG@51 PS14, Line 51: Added a jira to find a solution > nit: Filed a JIRA issue to track the issue Done http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@161 PS14, Line 161: CreateABlock > style nit: why not just CreateBlock() ? If you want to emphasize it's a si Thank you. I just used the original lambda name (create_a_block), did not gave it much thought. I will rename it to CreateBlock() http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@942 PS14, Line 942: is > are Done http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@949 PS14, Line 949: resistent > resistant Done http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@981 PS14, Line 981: // The chance that we delete the : // .metadata but fail to delete the .data file is (1-x) * x > I guess that's a bit simplistic given that the deletion of these two files Look at the last 2 lines of LogBlockContainerNativeMeta::~LogBlockContainerNativeMeta(). There is no error handling there. You can verify that the delete indeed happens when this smart pointers are reset: + Put a breakpoint to line 995 (CommitDeletedBlocks) in block_manager-test.cc and run the new test (use --gtest_also_run_disabled_tests) + When the breakpoint is hit, put a breakpoint into "Status DeleteFile(...)" in env_posix.cc. Resume the debugging. You will see a .data then a .metadata delete. you can check the debug callstack to see they are indeed called from ~LogBlockContainerNativeMeta() So the deletion error chances are independent as long as it is our artificial error and not a real disk error. http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@1022 PS14, Line 1022: , > nit: remove the comma? Done http://gerrit.cloudera.org:8080/#/c/20725/14/src/kudu/fs/block_manager-test.cc@1028 PS14, Line 1028: " > nit: remove the stray quote? Done -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 14 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka Gerrit-Comment-Date: Mon, 29 Jan 2024 16:26:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3527 Fix block manager test when using 64k container block alignment
Hello Mahesh Reddy, Alexey Serbin, Ashwani Raina, Kudu Jenkins, Abhishek Chennaka, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20725 to look at the new patch set (#15). Change subject: KUDU-3527 Fix block manager test when using 64k container block alignment .. KUDU-3527 Fix block manager test when using 64k container block alignment BlockManagerTest.TestMetadataOkayDespiteFailure might fail on system where we use 64k alignment for data blocks. Root cause: Currently tablets fail to load if .metadata is missing but there is still a non-empty ".data" file. If FLAGS_env_inject_eio is set to greater than zero, then there is a chance that, when we delete the container file, we only delete the ".meta", but leave the ".data" file. So deleting containers with injected io errors is expected to sometime prevent the block manager from restarting properly. However container deletion almost never occurred in this test until we run it on the new RHEL 8.8 ARM with 64K page size. Why is it stable on x86_64: On x86_64 we usually use 4k block alignment. We write 6080 byte data into a block, which is padded to 8k. So in the current test settings we have 32 blocks in a container when it becomes full (FLAGS_log_container_max_size = 256k). Later we delete exactly half of the 500 blocks we wrote. The chance of deleting all 32 blocks in a container is very small, and even if it happens, it still has around 0.09 chance to become corrupted. It is a bit flaky, but it would fail less than once in a billion run. If you dramatically decrease the FLAGS_log_container_max_size flag, the test starts to occasionally fail on a traditional x86_64 machine too. Why is it unstable with 64k alignment: With 64k alignment (currently used on ARM RHEL 8.8 with 64k page size), there are 4 blocks in a full container file. We write 500 blocks, so we expect to have nearly 125 full files. If we delete exactly half of the blocks, we will make many (full) container file empty. Some of them might fail to be deleted properly leaving a lonely non-empty .data file without .metadata. On my RHEL machine the test fails 97-98% of the time for this exact reason. Solution: The test TestMetadataOkayDespiteFailure was supposed to test reloading the block manager with containers having deleted blocks, with some previous failed delete. It (probably) never tested the case when container deletion occurs. + Disabled container deletion, so the test scope remains the same as it was with smaller block alignments. + Add a new (currently disabled) test, to see how block manager handles the above described situation. Filed a JIRA issue to track the issue: KUDU-3528. The original issue is not ARM specific, and far from trivial to solve, and was always in the system. Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 --- M src/kudu/fs/block_manager-test.cc 1 file changed, 97 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/25/20725/15 -- To view, visit http://gerrit.cloudera.org:8080/20725 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7e325bde502b7d7f39dd17fa84cb7eb42a3d7c20 Gerrit-Change-Number: 20725 Gerrit-PatchSet: 15 Gerrit-Owner: Zoltan Martonka Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Ashwani Raina Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Zoltan Martonka
[kudu-CR] [log block manager] Write lock for deletion
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20901 to look at the new patch set (#6). Change subject: [log_block_manager] Write lock for deletion .. [log_block_manager] Write lock for deletion Both removing blocks and compacting metadata manipulates the live_blocks_ value and file content. This way it can happen that on thread A live_blocks_ is updated, on thread B metadata is compacted and live_blocks_ value is overwritten, then thread A writes deletion records in the metadata file. This way it creates inconsistency between file content and live_blocks_ value, which leads to incorrect behaviour to skip compaction later. This fix makes sure that CompactMetadata and deletion can not overlap each other. No new test is needed because it fixes LogBlockManagerTest:: TestContainerBlockLimitingByMetadataSizeWithCompaction flakiness. Change-Id: I73712a5fd8eccf23621f8abda3e99ebdf10a769f --- M src/kudu/fs/log_block_manager.cc 1 file changed, 2 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/01/20901/6 -- To view, visit http://gerrit.cloudera.org:8080/20901 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I73712a5fd8eccf23621f8abda3e99ebdf10a769f Gerrit-Change-Number: 20901 Gerrit-PatchSet: 6 Gerrit-Owner: Ádám Bakai Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3535 Clear log cache while tombstoning a tablet replica.
Hello Kudu Jenkins, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20822 to look at the new patch set (#10). Change subject: KUDU-3535 Clear log cache while tombstoning a tablet replica. .. KUDU-3535 Clear log cache while tombstoning a tablet replica. The log cache of a replica still exists even if the replica has been already tombstoned. This problem might take place if we decrease the replication factor of a table with high throughput. So we should clear the log cache while deleting the replica with delete type TABLET_DATA_TOMBSTONED. Change-Id: I6cf545e604f80d41e7ebd9660acfd2e928cd27a9 --- M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log_cache.cc M src/kudu/consensus/log_cache.h 5 files changed, 31 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/20822/10 -- To view, visit http://gerrit.cloudera.org:8080/20822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6cf545e604f80d41e7ebd9660acfd2e928cd27a9 Gerrit-Change-Number: 20822 Gerrit-PatchSet: 10 Gerrit-Owner: Song Jiacheng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Song Jiacheng Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR] KUDU-3535 Clear log cache while tombstoning a replica.
Hello Kudu Jenkins, Wang Xixu, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20822 to look at the new patch set (#9). Change subject: KUDU-3535 Clear log cache while tombstoning a replica. .. KUDU-3535 Clear log cache while tombstoning a replica. The log cache of a replica still exists even if the replica has been already tombstoned. This problem might take place if we decrease the replication factor of a table with high throughput. So we should clear the log cache while deleting the replica with delete type TABLET_DATA_TOMBSTONED. Change-Id: I6cf545e604f80d41e7ebd9660acfd2e928cd27a9 --- M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log_cache.cc M src/kudu/consensus/log_cache.h 5 files changed, 31 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/20822/9 -- To view, visit http://gerrit.cloudera.org:8080/20822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6cf545e604f80d41e7ebd9660acfd2e928cd27a9 Gerrit-Change-Number: 20822 Gerrit-PatchSet: 9 Gerrit-Owner: Song Jiacheng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Song Jiacheng Gerrit-Reviewer: Wang Xixu <1450306...@qq.com>
[kudu-CR] [rpc] validate security-related parameters earlier
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20961 ) Change subject: [rpc] validate security-related parameters earlier .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20961/1/src/kudu/server/rpc_server.cc File src/kudu/server/rpc_server.cc: http://gerrit.cloudera.org:8080/#/c/20961/1/src/kudu/server/rpc_server.cc@166 PS1, Line 166: return Status::InvalidArgument( +1 -- To view, visit http://gerrit.cloudera.org:8080/20961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie16537c440041c9ee74276da35f2599592ef7e04 Gerrit-Change-Number: 20961 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Abhishek Chennaka Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Mon, 29 Jan 2024 09:10:23 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3535 Clear log cache while tombstoning a replica.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/20822 ) Change subject: KUDU-3535 Clear log cache while tombstoning a replica. .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/20822/8/src/kudu/consensus/consensus_queue-test.cc File src/kudu/consensus/consensus_queue-test.cc: http://gerrit.cloudera.org:8080/#/c/20822/8/src/kudu/consensus/consensus_queue-test.cc@1194 PS8, Line 1194: a replica is tombstoned Would it better to create a replica and tombstone it, then verify that it's log cache will be cleared? http://gerrit.cloudera.org:8080/#/c/20822/8/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: http://gerrit.cloudera.org:8080/#/c/20822/8/src/kudu/consensus/consensus_queue.h@379 PS8, Line 379: FRIEND_TEST(ConsensusQueueTest, ClearLogCacheWhileClosing); It is better to put it in dictionary order. -- To view, visit http://gerrit.cloudera.org:8080/20822 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cf545e604f80d41e7ebd9660acfd2e928cd27a9 Gerrit-Change-Number: 20822 Gerrit-PatchSet: 8 Gerrit-Owner: Song Jiacheng Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Song Jiacheng Gerrit-Reviewer: Wang Xixu <1450306...@qq.com> Gerrit-Comment-Date: Mon, 29 Jan 2024 09:07:23 + Gerrit-HasComments: Yes