[jira] [Comment Edited] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread dengke (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450910#comment-17450910
 ] 

dengke edited comment on KUDU-3326 at 11/30/21, 7:04 AM:
-

The following is a summary of the changes proposed in this discussion:
1. When deleting a table, add a parameter that determines whether the table is 
deleted immediately. If not, it becomes a trash table and is retained for a 
period of time by default. 

2. ListTables() does not display trash tables. To display them, we need to add 
a new API with new parameters.

3. After a table is deleted, we do not need to modify its name. Instead, we 
manage it directly through a newly added, independent list of trashed tables 
and use a "trashed_time" field to record the deletion time. When reloading, the 
destination container is chosen according to whether the table has a 
"trashed_time" field (trash tables go into a separate container).

4. When recalling, users could recall a table by specifying its table ID, and 
potentially give the recalled table a new name.

Is there anything else that should be added?



> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that Kudu does not delete a table immediately after 
> receiving the command to delete it. Instead, it marks the table and sets a 
> validity period. After the validity period expires, it checks again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and promptly 
> after an accidental deletion.
> h2. Relevant modification points:
>  1. After a table is deleted, it is renamed to 
> KUDU_TRASHED:<timestamp>:<original table name> and becomes a trash table.
>  2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, added or deleted directly, 
> it can be read and written normally. The trash table is retained for a 
> period of time by default (such as 7 days, which can be modified through 
> parameters). The compaction priority of the trash table is set to the 
> lowest to save system resources.
>  3. The master needs a new thread that processes expired trash tables and 
> performs the real deletion.
>  4. Creating a table with the same name as the original table is allowed, 
> and the newly created table can be deleted normally.
>  5. Recalling deleted tables is allowed, except in two situations: a table 
> with the same original name exists, or the trash table has expired.
>  6. KUDU_TRASHED is a reserved string for the system. Users are not allowed 
> to create tables whose names start with KUDU_TRASHED.
>  7. Adapt the Kudu tool to soft deletion.
>  8. Adapt the Java API to soft deletion.
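
The rename-based scheme in the description above (points 1, 2, 5 and 6) can be 
illustrated with a small Python sketch. This is not Kudu code: every function 
name here is hypothetical, the timestamp is assumed to be whole seconds, and 
the 7-day retention is only the example value mentioned in point 2.

```python
TRASH_PREFIX = "KUDU_TRASHED"               # reserved string from point 6
DEFAULT_RETENTION_SECS = 7 * 24 * 60 * 60   # "such as 7 days" from point 2


def is_reserved_name(table_name):
    """Users may not create tables whose names start with the prefix."""
    return table_name.startswith(TRASH_PREFIX)


def trashed_name(original_name, trashed_at_secs):
    """Build KUDU_TRASHED:<timestamp>:<original table name>."""
    return f"{TRASH_PREFIX}:{trashed_at_secs}:{original_name}"


def parse_trashed_name(name):
    """Split a trash table name back into (timestamp, original name)."""
    # maxsplit=2 keeps any ':' inside the original table name intact.
    prefix, timestamp, original_name = name.split(":", 2)
    if prefix != TRASH_PREFIX:
        raise ValueError(f"not a trash table name: {name!r}")
    return int(timestamp), original_name


def is_expired(name, now_secs, retention_secs=DEFAULT_RETENTION_SECS):
    """True once the retention period has passed (assumed inclusive)."""
    trashed_at, _ = parse_trashed_name(name)
    return now_secs - trashed_at >= retention_secs
```

A background thread on the master (point 3) would then only need to call 
something like is_expired() on each trash table periodically.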



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread dengke (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450910#comment-17450910
 ] 

dengke commented on KUDU-3326:
--

The following is a summary of the changes proposed in this discussion:
1. When deleting a table, add a parameter that determines whether the table is 
deleted immediately. If not, it becomes a trash table and is retained for a 
period of time by default. 
2. ListTables() does not display trash tables. To display them, we need to add 
a new API with new parameters.
3. After a table is deleted, we do not need to modify its name. Instead, we 
manage it directly through a newly added, independent list of trashed tables 
and use a "trashed_time" field to record the deletion time. When reloading, the 
destination container is chosen according to whether the table has a 
"trashed_time" field (trash tables go into a separate container).
4. When recalling, users could recall a table by specifying its table ID, and 
potentially give the recalled table a new name.
Is there anything else that should be added?



[jira] [Commented] (KUDU-1959) Hard to tell when a cluster is done starting up

2021-11-29 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450871#comment-17450871
 ] 

ASF subversion and git services commented on KUDU-1959:
---

Commit 6d81fe44b51844942bc8433931d663531547c4b8 in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=6d81fe4 ]

KUDU-1959 - Add tests for /startup page and metrics for tservers

This patch implements the tests for the startup page using a mini tablet
server.
 - We inject latency into tablet bootstrapping while reading the webpage
   every 10 milliseconds and validating the status of each step.
 - We fail a data directory and validate the status of each startup step.

We also validate the following startup metrics in the above scenarios
(log_block_manager* metrics only when the log block manager is in use):
 - log_block_manager_total_containers_startup
 - log_block_manager_processed_containers_startup
 - log_block_manager_containers_processing_time_startup
 - tablets_num_total_startup
 - tablets_num_opened_startup
 - tablets_opening_time_startup

Additionally, we fix a race condition in the Kudu tablet server WebUI.
This race condition occurs if the tablet server is started while the
WebUI is continuously curled. The cause appears to be that the webserver
is started before the path handlers are registered, as a part of
the change https://gerrit.cloudera.org/#/c/17730/

Change-Id: I9f432b4eb813e51214b4d6b3c5b7b4c89426f47f
Reviewed-on: http://gerrit.cloudera.org:8080/17990
Reviewed-by: Andrew Wong 
Tested-by: Andrew Wong 
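
The polling approach described in the commit message can be illustrated with 
a small Python sketch. It is not the test code from the patch: the function 
name and the fetch_metrics callable are hypothetical, and the assumptions that 
the opened count never decreases and never exceeds the total are mine. Only 
the two metric names and the 10 ms interval come from the message above.

```python
import time


def poll_startup(fetch_metrics, interval_s=0.01, timeout_s=5.0):
    """Poll startup metrics until every tablet is opened.

    fetch_metrics is a hypothetical callable returning a dict that holds
    tablets_num_total_startup and tablets_num_opened_startup.
    Returns the list of opened counts observed, in order.
    """
    deadline = time.monotonic() + timeout_s
    seen = []
    while time.monotonic() < deadline:
        metrics = fetch_metrics()
        total = metrics["tablets_num_total_startup"]
        opened = metrics["tablets_num_opened_startup"]
        # Assumed invariants: progress never goes backwards or past the total.
        if seen and opened < seen[-1]:
            raise AssertionError("opened tablet count went backwards")
        if opened > total:
            raise AssertionError("opened tablet count exceeds total")
        seen.append(opened)
        if opened == total:
            return seen
        time.sleep(interval_s)
    raise TimeoutError("startup did not finish before the timeout")
```

A caller would pass a function that scrapes the tablet server's metrics and 
returns them as a dict; how the metrics are fetched is left out on purpose.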


> Hard to tell when a cluster is done starting up
> ---
>
> Key: KUDU-1959
> URL: https://issues.apache.org/jira/browse/KUDU-1959
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Reporter: Jean-Daniel Cryans
>Assignee: Abhishek
>Priority: Major
>  Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell 
> when it's "done". Right now, these are the things I do:
>  - Run ksck and wait until most tablets are no longer in the "unavailable" or 
> "bootstrapping" state.
>  - Watch the metrics and see when the data under management is close to where 
> it was before restarting (it grows as tablets are getting bootstrapped).
>  - Look at the tablet server web UIs for tablets and compare how many are done 
> bootstrapping vs. in the process of bootstrapping vs. not started.
> Ideas on how to improve this:
>  - In the master's web UI for tablet servers, show how many tablets are 
> running VS not running (I wouldn't add anything about tombstoned tablets)
>  - Add metrics for tablets in different states.





[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang updated KUDU-3341:
-
Description: 
Sometimes a tablet server is shut down because of detected disk failures, and 
the server is later re-added to the cluster with all of its data cleared.

Its replicas may be re-replicated after 
{{--follower_unavailable_considered_failed_sec}} seconds. The master then 
sends DeleteTablet RPCs to this tserver, but receives either an RPC failure 
(the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started 
with a new UUID), and keeps retrying the deletion until 
{{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour) has elapsed.

Retrying is not necessary when a WRONG_SERVER_UUID error is received, because 
the server UUID can only be corrected by restarting the tablet server. At that 
point a full tablet report is sent to the master, and any outdated replicas 
can finally be deleted.
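
The behaviour proposed in the description above can be illustrated with a 
small retry predicate in Python. This is only a sketch, not the Catalog 
Manager's code: the enum, the function name and its parameters are 
hypothetical. Only the WRONG_SERVER_UUID error and the one-hour default come 
from the description.

```python
from enum import Enum, auto


class DeleteTabletOutcome(Enum):
    """Hypothetical outcomes of one DeleteTablet RPC attempt."""
    OK = auto()
    RPC_FAILURE = auto()        # e.g. the tserver is shut down
    WRONG_SERVER_UUID = auto()  # the tserver came back with a new UUID


# Default of --unresponsive_ts_rpc_timeout_ms per the description (1 hour).
DEFAULT_TIMEOUT_MS = 60 * 60 * 1000


def should_retry_delete_tablet(outcome, elapsed_ms,
                               timeout_ms=DEFAULT_TIMEOUT_MS):
    """Decide whether the master should send DeleteTablet again."""
    if outcome is DeleteTabletOutcome.OK:
        return False
    if outcome is DeleteTabletOutcome.WRONG_SERVER_UUID:
        # Proposed change: only a tserver restart can fix the UUID, and the
        # full tablet report sent after it lets the master clean up.
        return False
    # Other failures keep retrying until the timeout has elapsed.
    return elapsed_ms < timeout_ms
```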



> Catalog Manager should stop retrying DeleteTablet when receive 
> WRONG_SERVER_UUID error
> --
>
> Key: KUDU-3341
> URL: https://issues.apache.org/jira/browse/KUDU-3341
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: YifanZhang
>Assignee: YifanZhang
>Priority: Minor


[jira] [Assigned] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang reassigned KUDU-3341:


Assignee: YifanZhang



[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang updated KUDU-3341:
-
Description: 
Sometimes a tablet server is shut down because of detected disk failures, and 
the server is later re-added to the cluster with all of its data cleared.

Its replicas may be re-replicated after 
{{--follower_unavailable_considered_failed_sec}} seconds. The master then 
sends DeleteTablet RPCs to this tserver, but receives either an RPC failure 
(the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started 
with a new UUID), and keeps retrying the deletion until 
{{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour) has elapsed.

Retrying is not necessary when a WRONG_SERVER_UUID error is received, because 
the server UUID can only be corrected by restarting the tablet server. At that 
point a full tablet report is sent to the master, and any outdated replicas 
can finally be deleted.





[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_SERVER_UUID error

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang updated KUDU-3341:
-
Summary: Catalog Manager should stop retrying DeleteTablet when receive 
WRONG_SERVER_UUID error  (was: Catalog Manager should stop retrying 
DeleteTablet when receive WRONG_UUID_ERROR)



[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450798#comment-17450798
 ] 

Andrew Wong commented on KUDU-3326:
---

Alternatively, to avoid the whole question of naming convention, rather than 
relying on table renames (which incurs some IO on the tablet servers to persist 
metadata), we could introduce a separate list of trashed tables to the catalog 
manager that isn't visible to users via normal {{ListTable}} and {{OpenTable}} 
calls. When loading all the tables into memory, based on whether the table has a 
"trashed_time" field, Kudu could move the table into a separate container (i.e. 
not {{table_ids_map_}} or {{normalized_table_names_map_}}). When recalling, we 
could have users recall by specifying a table ID, and potentially giving a new 
name.
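
The separate-container idea in the comment above can be illustrated with a 
small Python sketch. It is not the catalog manager's implementation: the class 
and method names are hypothetical, the map names only mirror 
{{table_ids_map_}} and {{normalized_table_names_map_}}, name normalization is 
omitted, and refusing a recall when the name is already taken is an assumption 
carried over from the issue description.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class TableInfo:
    table_id: str
    name: str
    trashed_time: Optional[int] = None  # set only for trashed tables


class Catalog:
    def __init__(self) -> None:
        self.table_ids_map: Dict[str, TableInfo] = {}
        self.normalized_table_names_map: Dict[str, TableInfo] = {}
        # Hypothetical separate container for trashed tables, keyed by ID.
        self.trashed_tables_map: Dict[str, TableInfo] = {}

    def load(self, persisted: Iterable[TableInfo]) -> None:
        """Route each table by whether it has a trashed_time."""
        for table in persisted:
            if table.trashed_time is not None:
                self.trashed_tables_map[table.table_id] = table
            else:
                self.table_ids_map[table.table_id] = table
                self.normalized_table_names_map[table.name] = table

    def list_tables(self) -> List[str]:
        """Normal listing never sees trashed tables."""
        return sorted(self.normalized_table_names_map)

    def soft_delete(self, name: str, now: int) -> str:
        """Move a live table to the trash without renaming it."""
        table = self.normalized_table_names_map.pop(name)
        del self.table_ids_map[table.table_id]
        table.trashed_time = now
        self.trashed_tables_map[table.table_id] = table
        return table.table_id

    def recall(self, table_id: str, new_name: Optional[str] = None) -> None:
        """Restore a trashed table by ID, optionally under a new name."""
        table = self.trashed_tables_map[table_id]
        name = new_name if new_name is not None else table.name
        if name in self.normalized_table_names_map:
            raise ValueError(f"table name already in use: {name!r}")
        del self.trashed_tables_map[table_id]
        table.name = name
        table.trashed_time = None
        self.table_ids_map[table_id] = table
        self.normalized_table_names_map[name] = table
```

Because trashed tables are keyed only by ID, two trashed tables that shared a 
name never collide, which is what sidesteps the naming-convention question.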



[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450792#comment-17450792
 ] 

Andrew Wong commented on KUDU-3326:
---

Sorry for the late response here!
{quote}So in your opinion, only one trash table is allowed to exist to meet our 
design requirements?
{quote}
I wouldn't be against it, at least with the caveats mentioned.

That said, while we're thinking about design here, I do think it wouldn't be 
too difficult to come up with a naming convention that does satisfy uniqueness 
constraints. For instance, we could add the creation timestamp to the trashed 
table's name, or better yet, the table ID. E.g. instead of KUDU_TRASH:A, we 
could name it {{KUDU_TRASH::A}} or {{{}KUDU_TRASH::A{}}}.

 {quote}
Should this be distinguished by adding command parameters, or is it more 
convenient to mark trash tables directly when listing?
 {quote}

I think adding an argument to the {{ListTables()}} API (or adding a new API 
with the argument) that opts into showing trashed tables seems reasonable. I 
think the default should be to not show them though; especially as they will 
not be visible to Impala.





[jira] [Assigned] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed

2021-11-29 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-38:
---

Assignee: Andrew Wong  (was: Todd Lipcon)

> bootstrap should not replay logs that are known to be fully flushed
> ---
>
> Key: KUDU-38
> URL: https://issues.apache.org/jira/browse/KUDU-38
> Project: Kudu
>  Issue Type: Sub-task
>  Components: tablet
>Affects Versions: M3
>Reporter: Todd Lipcon
>Assignee: Andrew Wong
>Priority: Major
>  Labels: data-scalability, roadmap-candidate, startup-time
>
> Currently the bootstrap process will process all of the log segments, 
> including those that can be trivially determined to contain only durable 
> edits. This makes startup unnecessarily slow.





[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang updated KUDU-3341:
-
Description: 
Sometimes a tablet server is shut down because of detected disk failures, and 
the server is later re-added to the cluster with all of its data cleared.

Its replicas may be re-replicated after 
{{--follower_unavailable_considered_failed_sec}} seconds. The master then 
sends DeleteTablet RPCs to this tserver, but receives either an RPC failure 
(the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started 
with a new UUID), and keeps retrying the deletion until 
{{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour) has elapsed.

Retrying is not necessary when a WRONG_SERVER_UUID error is received, because 
the server UUID can only be corrected by restarting the tablet server. At that 
point a full tablet report is sent to the master, and outdated replicas can 
finally be deleted.




[jira] [Updated] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRONG_UUID_ERROR

2021-11-29 Thread YifanZhang (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YifanZhang updated KUDU-3341:
-
Component/s: master
Description: 
Sometimes a tablet server is shut down because of detected disk failures, and 
the server is later re-added to the cluster with all of its data cleared.

Its replicas may be re-replicated after 
{{--follower_unavailable_considered_failed_sec}} seconds. The master then 
sends DeleteTablet RPCs to this tserver, but receives either an RPC failure 
(the tserver was shut down) or a WRONG_SERVER_UUID error (the tserver started 
with a new UUID), and keeps retrying the deletion until 
{{--unresponsive_ts_rpc_timeout_ms}} (default 1 hour) has elapsed.

Retrying is not necessary when a WRONG_SERVER_UUID error is received, because 
the server UUID can only be corrected by restarting the tablet server. At that 
point a full tablet report is sent to the master, and outdated replicas can 
finally be deleted.
Summary: Catalog Manager should stop retrying DeleteTablet when receive 
WRONG_UUID_ERROR  (was: Catalog Manager should stop retrying DeleteTablet when 
receive WRON)



[jira] [Created] (KUDU-3341) Catalog Manager should stop retrying DeleteTablet when receive WRON

2021-11-29 Thread YifanZhang (Jira)
YifanZhang created KUDU-3341:


 Summary: Catalog Manager should stop retrying DeleteTablet when 
receive WRON
 Key: KUDU-3341
 URL: https://issues.apache.org/jira/browse/KUDU-3341
 Project: Kudu
  Issue Type: Improvement
Reporter: YifanZhang





