Yan-Daojiang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23844 )
Change subject: [tools] Add --preserve_table_ids flag to unsafe_rebuild ...................................................................... Patch Set 3: (8 comments) Thanks for the thorough review! I've addressed all the feedback - please take another look when you have a chance. http://gerrit.cloudera.org:8080/#/c/23844/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23844/2//COMMIT_MSG@7 PS2, Line 7: Add --preserve_table_ids flag to unsafe_rebuild > nit: https://kudu.apache.org/docs/command_line_tools_reference.html#master- Thank you for reminding me! I will work on this modification after the code is finalized. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc@3865 PS2, Line 3865: eserve_table_ids flag preserves o > Do you think a couple more unit tests need to be added that manipulates --t Thanks for the suggestion! I've added two new tests that specifically test the `--tserver_enforce_access_control` scenario: 1. `TestRebuildMasterPreserveTableIdsWithAccessControl`: Verifies that when `--tserver_enforce_access_control=true` is enabled, using `--preserve_table_ids` during rebuild allows subsequent scans to succeed. 2. `TestRebuildMasterWithoutPreserveTableIdsAccessControlFails`: Demonstrates the problem that `--preserve_table_ids` solves - when `--tserver_enforce_access_control=true` is enabled and we rebuild WITHOUT `--preserve_table_ids`, scans fail. These tests directly validate the core use case mentioned in the commit message. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc@3980 PS2, Line 3980: LOG(INFO) << "Table_id after rebuild without preserve_table_ids: " << rebuilt_table_id; : ASSERT_NE(original_table_id, rebuilt_table_id) : << "Table ID should change after rebuild without --preserve_table_ids"; : } > Make it simpler by using ASSERT_NE or similar macros and throw error when c Done. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@68 PS2, Line 68: "'kudu pbc dump tablet-meta/<tablet-id>' on each server and taking the ma > Move this above and make necessary alignments to the whole text, to maintai Thanks for the catch! Done. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@67 PS2, Line 67: "viable value can be determined manually using " : "'kudu pbc dump tablet-meta/<tablet-id>' on each server and taking the max value " : "found across all tablets. In tablet server versions >= 1.16, Kudu will determine " : "the proper value for each tablet automatically."); : DECLARE_string(tables); > Move this above default_schema_version. Done. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@261 PS2, Line 261: o we fall back to generating a new one in t > Here and below (line 319-320): Good catch! You're right - `has_table_id()` returns true only when the field is explicitly set by the sender. In practice, tablet servers always set meaningful (non-empty) table IDs when they set the field at all, so the `empty()` check is redundant. I've simplified the code by removing the redundant `empty()` checks and added comments. http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@268 PS2, Line 268: ARNING) << Substitute( : "Tablet server cannot provide a valid table_id for table '$0'. " > This would be a post facto situation where event of unsuccessful table id p Good suggestion! I've: 1. Simplified the log message to just state the fact. 2. Added a code comment explaining why this happens: "Older Kudu versions do not populate table_id in tablet status, so we fall back to generating a new one in that case." http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@315 PS2, Line 315: > Question: Good question! The `table->id()` comes from the first replica processed for this table (set in `CreateTable()`), while `replica_table_id` comes from subsequent replicas of the same table being processed in `CheckTableConsistency()`. When rebuilding from multiple tablet servers, we process replicas in arbitrary order - different tablets of the same table, or different replicas of the same tablet on different tservers. In normal operation, all replicas of a table should report the same table_id. This check is a defensive measure to catch rare anomalies like metadata corruption or inconsistent state across tservers (e.g. someone edited the relevant metadata using kudu pbc edit.). If table_ids don't match, it indicates something is wrong with the cluster and we should report an error rather than silently continue. -- To view, visit http://gerrit.cloudera.org:8080/23844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ae4353564922312d646f0323271d804e32e3b0d Gerrit-Change-Number: 23844 Gerrit-PatchSet: 3 Gerrit-Owner: Yan-Daojiang <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yan-Daojiang <[email protected]> Gerrit-Comment-Date: Mon, 12 Jan 2026 07:55:13 +0000 Gerrit-HasComments: Yes
