Yan-Daojiang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23844 )

Change subject: [tools] Add --preserve_table_ids flag to unsafe_rebuild
......................................................................


Patch Set 3:

(8 comments)

Thanks for the thorough review! I've addressed all the feedback - please take 
another look when you have a chance.

http://gerrit.cloudera.org:8080/#/c/23844/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23844/2//COMMIT_MSG@7
PS2, Line 7: Add --preserve_table_ids flag to unsafe_rebuild
> nit: https://kudu.apache.org/docs/command_line_tools_reference.html#master-
Thank you for reminding me! I will work on this modification after the code is 
finalized.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc@3865
PS2, Line 3865: eserve_table_ids flag preserves o
> Do you think a couple more unit tests need to be added that manipulates --t
Thanks for the suggestion! I've added two new tests that specifically test the 
`--tserver_enforce_access_control` scenario:

1. `TestRebuildMasterPreserveTableIdsWithAccessControl`: Verifies that when 
`--tserver_enforce_access_control=true` is enabled, using 
`--preserve_table_ids` during rebuild allows subsequent scans to succeed.

2. `TestRebuildMasterWithoutPreserveTableIdsAccessControlFails`: Demonstrates 
the problem that `--preserve_table_ids` solves - when 
`--tserver_enforce_access_control=true` is enabled and we rebuild WITHOUT 
`--preserve_table_ids`, scans fail.

These tests directly validate the core use case mentioned in the commit message.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/kudu-admin-test.cc@3980
PS2, Line 3980:     LOG(INFO) << "Table_id after rebuild without 
preserve_table_ids: " << rebuilt_table_id;
              :     ASSERT_NE(original_table_id, rebuilt_table_id)
              :         << "Table ID should change after rebuild without 
--preserve_table_ids";
              :   }
> Make it simpler by using ASSERT_NE or similar macros and throw error when c
Done.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@68
PS2, Line 68:  "'kudu pbc dump tablet-meta/<tablet-id>' on each server and 
taking the ma
> Move this above and make necessary alignments to the whole text, to maintai
Thanks for the catch! Done.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@67
PS2, Line 67:               "viable value can be determined manually using "
            :               "'kudu pbc dump tablet-meta/<tablet-id>' on each 
server and taking the max value "
            :               "found across all tablets. In tablet server 
versions >= 1.16, Kudu will determine "
            :               "the proper value for each tablet automatically.");
            : DECLARE_string(tables);
> Move this above default_schema_version.
Done.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@261
PS2, Line 261: o we fall back to generating a new one in t
> Here and below (line 319-320):
Good catch! You're right - `has_table_id()` returns true only when the field is 
explicitly set by the sender. In practice, tablet servers always set meaningful 
(non-empty) table IDs when they set the field at all, so the `empty()` check is 
redundant.

I've simplified the code by removing the redundant `empty()` checks and added 
comments.


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@268
PS2, Line 268: ARNING) << Substitute(
             :           "Tablet server cannot provide a valid table_id for 
table '$0'. "
> This would be a post facto situation where event of unsuccessful table id p
Good suggestion! I've:

1. Simplified the log message to just state the fact.
2. Added a code comment explaining why this happens: "Older Kudu versions do 
not populate table_id in tablet status, so we fall back to generating a new one 
in that case."


http://gerrit.cloudera.org:8080/#/c/23844/2/src/kudu/tools/master_rebuilder.cc@315
PS2, Line 315:
> Question:
Good question! The `table->id()` comes from the first replica processed for 
this table (set in `CreateTable()`), while `replica_table_id` comes from 
subsequent replicas of the same table being processed in 
`CheckTableConsistency()`.

When rebuilding from multiple tablet servers, we process replicas in arbitrary 
order - different tablets of the same table, or different replicas of the same 
tablet on different tservers.

In normal operation, all replicas of a table should report the same table_id. 
This check is a defensive measure to catch rare anomalies like metadata 
corruption or inconsistent state across tservers (e.g. someone edited the 
relevant metadata using kudu pbc edit.). If table_ids don't match, it indicates 
something is wrong with the cluster and we should report an error rather than 
silently continue.



--
To view, visit http://gerrit.cloudera.org:8080/23844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ae4353564922312d646f0323271d804e32e3b0d
Gerrit-Change-Number: 23844
Gerrit-PatchSet: 3
Gerrit-Owner: Yan-Daojiang <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yan-Daojiang <[email protected]>
Gerrit-Comment-Date: Mon, 12 Jan 2026 07:55:13 +0000
Gerrit-HasComments: Yes

Reply via email to