[jira] [Comment Edited] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450910#comment-17450910 ] dengke edited comment on KUDU-3326 at 11/30/21, 7:04 AM:
-
The following is a summary of the changes involved in this discussion:
1. When deleting, add a parameter that determines whether to delete the table directly. If it is not set, the trashed table will be retained for a period of time by default.
2. ListTables() does not display trashed tables. Displaying them would require a new API with new parameters.
3. After a table is deleted, we do not need to modify its name. Instead, it is managed through a newly added, independent list of trashed tables, with a "trashed_time" field that records the deletion time. When reloading, the destination container is determined by whether the table has a "trashed_time" field (trashed tables go into a separate container).
4. When recalling, users could recall a table by specifying its table ID, and optionally give the recalled table a new name.
Is there anything else to add?

was (Author: koppa):
The following is a summary of the changes involved in this discussion:
1. When deleting, add a parameter that determines whether to delete the table directly. If it is not set, the trashed table will be retained for a period of time by default.
2. ListTables() does not display trashed tables. Displaying them would require a new API with new parameters.
3. After a table is deleted, we do not need to modify its name. Instead, it is managed through a newly added, independent list of trashed tables, with a "trashed_time" field that records the deletion time. When reloading, the destination container is determined by whether the table has a "trashed_time" field (trashed tables go into a separate container).
4. When recalling, users could recall a table by specifying its table ID, and optionally give the recalled table a new name.
Is there anything else to add?

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
> Issue Type: New Feature
> Components: api, CLI, client, master, test
> Reporter: dengke
> Assignee: dengke
> Priority: Major
>
> h2. Brief description:
> Soft delete means that Kudu does not delete a table immediately after receiving the command to delete it. Instead, it marks the table and sets a validity period. After the validity period expires, Kudu checks again whether the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and promptly after an accidental deletion.
> h2. Relevant modification points:
> 1. After a table is deleted, it will be renamed to KUDU_TRASHED:<timestamp>:<original table name>, which makes it a trash table.
> 2. The contents of the trash table are exactly the same as those of the original table. Although the trash table cannot be renamed, added, or deleted directly, it can be read and written normally. By default, the trash table is retained for a period of time (e.g. 7 days, configurable through a parameter). The compaction priority of the trash table will be set to the lowest level to save system resources.
> 3. The master needs a new thread that processes expired trash tables and performs the actual deletion.
> 4. Creating a table with the same name as the original table is allowed, and the newly created table can be deleted normally.
> 5. Deleted tables can be recalled, except in two situations: a table with the original name already exists, or the trash table has expired.
> 6. KUDU_TRASHED is a reserved string. Users are not allowed to create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu tool to soft deletion.
> 8. Adapt the Java API to soft deletion.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
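The original proposal above combines three mechanics: a rename to KUDU_TRASHED:<timestamp>:<original table name>, a reserved prefix that user tables may not use, and a retention period (7 days in the example) after which the trash table may be truly deleted. The sketch below shows those three checks in isolation. It is only an illustration under stated assumptions: the function names are hypothetical, and the timestamp is assumed to be whole seconds since the Unix epoch, which the proposal does not specify.

```python
from datetime import datetime, timedelta, timezone

# The prefix and the 7-day example retention come from the proposal;
# everything else here is a hypothetical illustration.
TRASH_PREFIX = "KUDU_TRASHED"
DEFAULT_RETENTION = timedelta(days=7)


def trashed_name(original: str, deleted_at: datetime) -> str:
    """Build KUDU_TRASHED:<timestamp>:<original table name>."""
    # Assumption: the timestamp is whole seconds since the Unix epoch.
    return f"{TRASH_PREFIX}:{int(deleted_at.timestamp())}:{original}"


def validate_user_table_name(name: str) -> None:
    """Reject user-created tables whose names use the reserved prefix."""
    if name.startswith(TRASH_PREFIX):
        raise ValueError(f"table names starting with {TRASH_PREFIX} are reserved")


def is_expired(deleted_at: datetime, now: datetime,
               retention: timedelta = DEFAULT_RETENTION) -> bool:
    """True once the trash table has outlived its retention period."""
    return now - deleted_at >= retention
```

For example, trashed_name("A", datetime(2021, 11, 30, tzinfo=timezone.utc)) yields KUDU_TRASHED:1638230400:A. Because two deletions of the same table name within the same second would collide, this naming scheme is one reason the later discussion moves toward tracking trashed tables by table ID instead.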
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450910#comment-17450910 ] dengke commented on KUDU-3326:
--
The following is a summary of the changes involved in this discussion:
1. When deleting, add a parameter that determines whether to delete the table directly. If it is not set, the trashed table will be retained for a period of time by default.
2. ListTables() does not display trashed tables. Displaying them would require a new API with new parameters.
3. After a table is deleted, we do not need to modify its name. Instead, it is managed through a newly added, independent list of trashed tables, with a "trashed_time" field that records the deletion time. When reloading, the destination container is determined by whether the table has a "trashed_time" field (trashed tables go into a separate container).
4. When recalling, users could recall a table by specifying its table ID, and optionally give the recalled table a new name.
Is there anything else to add?
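The summarized design keeps a trashed table under its original name in a separate container keyed by table ID, records a trashed_time, and lets users recall by table ID with an optional new name. The following is a minimal in-memory sketch of that flow, assuming a hypothetical SoftDeleteCatalog class and a hypothetical skip_trash flag standing in for the "delete directly" parameter; it is not Kudu's catalog manager, and times are plain numbers of seconds for simplicity. The recall checks mirror point 5 of the original proposal (no recall if the name is taken or the trash table has expired), and purge_expired stands in for the master's cleanup thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableInfo:
    table_id: str
    name: str
    trashed_time: Optional[float] = None  # None means the table is live


class SoftDeleteCatalog:
    """Hypothetical catalog that keeps trashed tables in their own container."""

    def __init__(self, retention_sec: float = 7 * 24 * 3600):
        self.live_by_name: Dict[str, TableInfo] = {}
        self.trashed_by_id: Dict[str, TableInfo] = {}
        self.retention_sec = retention_sec

    def create(self, table_id: str, name: str) -> None:
        if name in self.live_by_name:
            raise ValueError(f"table {name} already exists")
        self.live_by_name[name] = TableInfo(table_id, name)

    def delete(self, name: str, now: float, skip_trash: bool = False) -> None:
        table = self.live_by_name.pop(name)
        if skip_trash:
            return  # hard delete: nothing is retained
        # The name is not modified; only trashed_time records the deletion.
        table.trashed_time = now
        self.trashed_by_id[table.table_id] = table

    def recall(self, table_id: str, now: float,
               new_name: Optional[str] = None) -> TableInfo:
        table = self.trashed_by_id.get(table_id)
        if table is None:
            raise KeyError(f"no trashed table with ID {table_id}")
        if now - table.trashed_time >= self.retention_sec:
            raise ValueError(f"trashed table {table_id} has expired")
        name = new_name or table.name
        if name in self.live_by_name:
            raise ValueError(f"table {name} already exists; pass a new name")
        del self.trashed_by_id[table_id]
        table.name, table.trashed_time = name, None
        self.live_by_name[name] = table
        return table

    def purge_expired(self, now: float) -> List[str]:
        """Actually delete trashed tables whose retention has elapsed."""
        expired = [tid for tid, t in self.trashed_by_id.items()
                   if now - t.trashed_time >= self.retention_sec]
        for tid in expired:
            del self.trashed_by_id[tid]
        return expired
```

Keying the trash container by table ID is what lets a new table reuse the original name (point 4 of the proposal) without any renaming, since the live name map never sees trashed entries.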
[jira] [Commented] (KUDU-1959) Hard to tell when a cluster is done starting up
[ https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450871#comment-17450871 ] ASF subversion and git services commented on KUDU-1959:
---
Commit 6d81fe44b51844942bc8433931d663531547c4b8 in kudu's branch refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6d81fe4 ]

KUDU-1959 - Add tests for /startup page and metrics for tservers

This patch implements tests for the startup page using a mini tablet server:
- We inject latency into tablet bootstrapping while reading the webpage every 10 milliseconds, and validate the status of each step.
- We fail a data directory and validate the status of each startup step.

We also validate the following startup metrics in the above scenarios (the log_block_manager* metrics apply only when the log block manager is used):
- log_block_manager_total_containers_startup
- log_block_manager_processed_containers_startup
- log_block_manager_containers_processing_time_startup
- tablets_num_total_startup
- tablets_num_opened_startup
- tablets_opening_time_startup

Additionally, we fix a race condition in the Kudu tablet server web UI. The race occurs if the tablet server is started while the web UI is being curled continuously. The cause appears to be that, as part of the change https://gerrit.cloudera.org/#/c/17730/, the webserver is started before the path handlers are registered.

Change-Id: I9f432b4eb813e51214b4d6b3c5b7b4c89426f47f
Reviewed-on: http://gerrit.cloudera.org:8080/17990
Reviewed-by: Andrew Wong
Tested-by: Andrew Wong

> Hard to tell when a cluster is done starting up
> ---
>
> Key: KUDU-1959
> URL: https://issues.apache.org/jira/browse/KUDU-1959
> Project: Kudu
> Issue Type: Improvement
> Components: ops-tooling
> Reporter: Jean-Daniel Cryans
> Assignee: Abhishek
> Priority: Major
> Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell when it's "done". Right now, these are the things I do:
> - Run ksck and wait until most tablets are not in the "unavailable" or "bootstrapping" state.
> - Watch the metrics and see when the data under management is close to where it was before restarting (it grows as tablets get bootstrapped).
> - Look at the tablet server web UIs for tablets and compare how many are done bootstrapping vs. in progress vs. not started.
> Ideas on how to improve this:
> - In the master's web UI for tablet servers, show how many tablets are running vs. not running (I wouldn't add anything about tombstoned tablets).
> - Add metrics for tablets in different states.
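With the startup metrics listed in the commit above, "is this tablet server done starting up?" can be answered from numbers rather than by eyeballing ksck or the web UI. The sketch below is a hypothetical helper, not part of Kudu: the metric names come from the commit message, but it assumes the values have already been scraped (for example from the tablet server's metrics web endpoint) into a plain name-to-value dict, and it treats a missing metric as 0, since the log_block_manager* metrics exist only with the log block manager.

```python
from typing import Mapping


def startup_progress(metrics: Mapping[str, int]) -> float:
    """Fraction of tablets opened during startup (1.0 if there are none)."""
    total = metrics.get("tablets_num_total_startup", 0)
    opened = metrics.get("tablets_num_opened_startup", 0)
    return 1.0 if total == 0 else opened / total


def startup_done(metrics: Mapping[str, int]) -> bool:
    """Startup is treated as done once all containers and tablets are processed."""
    containers_total = metrics.get("log_block_manager_total_containers_startup", 0)
    containers_done = metrics.get("log_block_manager_processed_containers_startup", 0)
    return containers_done >= containers_total and startup_progress(metrics) >= 1.0
```

One caveat of this simple check: if it runs before the totals are populated, a server with zero reported tablets reads as done, so a real tool would also want some signal that the counts have been initialized.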
[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3341:
-
Description:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and any outdated replicas can eventually be deleted.
was:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and any outdated replicas can eventually be deleted.
> Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
> --
>
> Key: KUDU-3341
> URL: https://issues.apache.org/jira/browse/KUDU-3341
> Project: Kudu
> Issue Type: Improvement
> Components: master
> Reporter: YifanZhang
> Assignee: YifanZhang
> Priority: Minor
>
> Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
> Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and any outdated replicas can eventually be deleted.
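The proposed change boils down to a retry decision: keep retrying DeleteTablet after an RPC failure until the tserver is considered unresponsive, but give up immediately on WRONG_SERVER_UUID. The sketch below expresses that policy as a pure function. It is a hypothetical illustration, not the catalog manager's actual C++ code: the enum and function names are invented, and only the 1-hour default for --unresponsive_ts_rpc_timeout_ms comes from the issue text.

```python
from enum import Enum, auto

# Default from the issue text (1 hour), expressed in milliseconds.
UNRESPONSIVE_TS_RPC_TIMEOUT_MS = 60 * 60 * 1000


class DeleteTabletOutcome(Enum):
    OK = auto()
    RPC_FAILURE = auto()        # e.g. the tserver was shut down
    WRONG_SERVER_UUID = auto()  # the tserver restarted with a new UUID


def should_retry_delete_tablet(outcome: DeleteTabletOutcome, elapsed_ms: int,
                               timeout_ms: int = UNRESPONSIVE_TS_RPC_TIMEOUT_MS) -> bool:
    if outcome is DeleteTabletOutcome.OK:
        return False
    if outcome is DeleteTabletOutcome.WRONG_SERVER_UUID:
        # Retrying cannot help: only a tserver restart fixes the UUID, and the
        # full tablet report sent after that restart lets the master clean up.
        return False
    # Transient RPC failures are retried until the tserver is deemed unresponsive.
    return elapsed_ms < timeout_ms
```

Keeping the decision separate from the RPC plumbing makes the one behavioral change (the WRONG_SERVER_UUID branch) easy to test in isolation.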
[jira] [Assigned] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang reassigned KUDU-3341: Assignee: YifanZhang
[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3341:
-
Description:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and any outdated replicas can eventually be deleted.
was:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and outdated replicas can eventually be deleted.
[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3341: - Summary: Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error (was: Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR)
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450798#comment-17450798 ] Andrew Wong commented on KUDU-3326:
---
Alternatively, to avoid the whole question of naming conventions, rather than relying on table renames (which incur some IO on the tablet servers to persist metadata), we could introduce to the catalog manager a separate list of trashed tables that isn't visible to users via the normal {{ListTables}} and {{OpenTable}} calls. When loading all the tables into memory, Kudu could move a table into a separate container (i.e. not {{table_ids_map_}} or {{normalized_table_names_map_}}) based on whether it has a "trashed_time" field. When recalling, we could have users recall by specifying a table ID, and potentially giving a new name.
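The load-time idea in this comment is a simple partition: when the catalog is read into memory, any table with a "trashed_time" field goes into a third container instead of the ID and name maps, so normal listing and lookup by name never see it. The sketch below illustrates that partition with hypothetical types. table_ids_map mirrors the name of Kudu's {{table_ids_map_}}, but the name map here is only a stand-in for {{normalized_table_names_map_}} (no name normalization is performed), and trashed_tables_map is an invented name for the new container.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class PersistedTable:
    table_id: str
    name: str
    trashed_time: Optional[int] = None  # set only for trashed tables


TableMap = Dict[str, PersistedTable]


def load_tables(rows: Iterable[PersistedTable]) -> Tuple[TableMap, TableMap, TableMap]:
    """Split persisted tables into live (by ID and by name) and trashed maps."""
    table_ids_map: TableMap = {}
    table_names_map: TableMap = {}  # stand-in for a normalized-name map
    trashed_tables_map: TableMap = {}
    for t in rows:
        if t.trashed_time is not None:
            # Trashed tables never enter the name map, so a new table may
            # reuse the name and ordinary lookups cannot see them.
            trashed_tables_map[t.table_id] = t
        else:
            table_ids_map[t.table_id] = t
            table_names_map[t.name] = t
    return table_ids_map, table_names_map, trashed_tables_map


def list_tables(table_names_map: TableMap) -> List[str]:
    """Default listing: only live tables are visible."""
    return sorted(table_names_map)
```

Because the trashed entry is keyed by table ID, a trashed table and a newly created live table with the same name can coexist without any rename.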
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450792#comment-17450792 ] Andrew Wong commented on KUDU-3326:
---
Sorry for the late response here!
{quote}So in your opinion, only one trash table is allowed to exist to meet our design requirements?{quote}
I wouldn't be against it, at least with the caveats mentioned. That said, while we're thinking about the design here, I do think it wouldn't be too difficult to come up with a naming convention that does satisfy uniqueness constraints. For instance, we could add the creation timestamp to the trashed table's name, or better yet, the table ID. E.g. instead of KUDU_TRASH:A, we could name it {{KUDU_TRASH::A}} or {{KUDU_TRASH::A}}.
{quote}Should this be distinguished by adding command parameters, or is it more convenient to mark the trash table directly when listing?{quote}
I think adding an argument to the {{ListTables()}} API (or adding a new API with the argument) that opts into showing trashed tables seems reasonable. I think the default should be to not show them, though, especially as they will not be visible to Impala.
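The uniqueness concern is easy to see concretely: if table A is deleted, recreated, and deleted again, a name-only trash convention produces the same trashed name twice, while embedding the table ID keeps the names distinct. The placeholders in the suggested names above were lost in this archive, so the KUDU_TRASH:<table_id>:<name> format below is a hypothetical illustration, not the format proposed in the comment. The sketch also shows the suggested opt-in listing, with include_trashed as a hypothetical parameter name that defaults to hiding trashed tables.

```python
from typing import List

TRASH_PREFIX = "KUDU_TRASH:"


def trash_name_by_name(name: str) -> str:
    # Name-only convention: two deletions of "A" produce the same name.
    return f"{TRASH_PREFIX}{name}"


def trash_name_by_id(table_id: str, name: str) -> str:
    # Hypothetical format embedding the table ID, which is unique per table.
    return f"{TRASH_PREFIX}{table_id}:{name}"


def list_tables(all_names: List[str], include_trashed: bool = False) -> List[str]:
    """Trashed tables are hidden unless the caller opts in."""
    return [n for n in all_names
            if include_trashed or not n.startswith(TRASH_PREFIX)]


# Table "A" is created, deleted, recreated, and deleted again.
deleted = [("id1", "A"), ("id2", "A")]
by_name = {trash_name_by_name(n) for _, n in deleted}
by_id = {trash_name_by_id(i, n) for i, n in deleted}
```

Here by_name collapses to a single entry (a collision), while by_id keeps both. Defaulting include_trashed to False matches the suggestion that trashed tables stay out of default listings, especially since they will not be visible to Impala.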
[jira] [Assigned] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed
[ https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-38:
---
Assignee: Andrew Wong (was: Todd Lipcon)
> bootstrap should not replay logs that are known to be fully flushed
> ---
>
> Key: KUDU-38
> URL: https://issues.apache.org/jira/browse/KUDU-38
> Project: Kudu
> Issue Type: Sub-task
> Components: tablet
> Affects Versions: M3
> Reporter: Todd Lipcon
> Assignee: Andrew Wong
> Priority: Major
> Labels: data-scalability, roadmap-candidate, startup-time
>
> Currently the bootstrap process will process all of the log segments, including those that can be trivially determined to contain only durable edits. This makes startup unnecessarily slow.
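The improvement described here amounts to filtering log segments before replay: a segment whose operations are all already durable in on-disk data contributes nothing and can be skipped. The sketch below shows that filter under a simplifying assumption. It presumes a single op index at or below which every op is known to be flushed; in a real tablet, durability is tracked per in-memory store, so that index would have to be derived (for example, something like the lowest op index still needed by an unflushed store). LogSegment and its max_op_index field are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogSegment:
    seqno: int
    max_op_index: int  # highest op index written to this segment (hypothetical)


def segments_to_replay(segments: List[LogSegment],
                       flushed_op_index: int) -> List[LogSegment]:
    """Return, in seqno order, the segments bootstrap still needs to replay.

    Assumes every op with index <= flushed_op_index is durable on disk, so a
    segment whose ops are all at or below that index can be skipped.
    """
    ordered = sorted(segments, key=lambda s: s.seqno)
    return [s for s in ordered if s.max_op_index > flushed_op_index]
```

A segment that straddles the flushed index is still replayed in full; skipping individual ops within it is left to the normal replay logic.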
[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3341:
-
Description:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and outdated replicas can eventually be deleted.
was:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and outdated replicas can eventually be deleted.
[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR
[ https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YifanZhang updated KUDU-3341:
-
Component/s: master
Description:
Sometimes a tablet server may be shut down because of detected disk failures, and the server is then re-added to the cluster with all of its data cleared. Its replicas may be re-replicated to other servers after {{--follower_unavailable_considered_failed_sec}} seconds. The master then sends DeleteTablet RPCs to this tserver, but receives either an RPC failure (the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started with a new UUID), and keeps retrying to delete the tablets for up to {{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour).
Retrying is unnecessary when a WRONG_SERVER_UUID error is received, because the server UUID can only be corrected by restarting the tablet server. At that point, full tablet reports are sent to the master, and outdated replicas can eventually be deleted.
Summary: Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR (was: Catalog Manager should stop retrying DeleteTablet when receive WRON)
[jira] [Created] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRON
YifanZhang created KUDU-3341:
Summary: Catalog Manager should stop retrying DeleteTablet when receive WRON
Key: KUDU-3341
URL: https://issues.apache.org/jira/browse/KUDU-3341
Project: Kudu
Issue Type: Improvement
Reporter: YifanZhang