[ 
https://issues.apache.org/jira/browse/KUDU-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xixu Wang updated KUDU-3452:
----------------------------
    Description: 
h1. Background

In my case, a new Kudu table (history_data_table) is created every day to 
store historical data, and a new partition is added to another table 
(business_data_table) to hold that day's data. These tables and partitions 
all require 3 replicas. This business logic is implemented by Python scripts. 
My Kudu cluster contains 3 masters and 3 tservers, and the flag 
--catalog_manager_check_ts_count_for_create_table is set to false.

Sometimes one tserver becomes unavailable. The table creation task then 
retries continuously and keeps failing until the tserver becomes healthy 
again. The master logs this error:

{color:#ff8b00}E0222 11:10:32.767140 3321 catalog_manager.cc:672] Error 
processing pending assignments: Invalid argument: error selecting replicas for 
tablet 41dffa9783f14f36a5b6c35e89075c1a, state:0: Not enough tablet servers are 
online for table 'test_table'. Need at least 3 replicas, but only 2 tablet 
servers are available{color}

{color:#172b4d}Because there are not enough tablet servers to host all 3 
replicas, the tablet is never created and never reaches the running state. 
Reads and writes to this tablet therefore fail, even though 2 tservers are 
available and could host 2 replicas.{color}

 

An already created tablet can still serve requests even if one of its 3 
replicas becomes unavailable. Why can't a three-replica table be created when 
only 2 tservers are healthy?
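
The question above rests on majority arithmetic: Kudu replicates tablets with 
Raft consensus, so a tablet with 3 replicas keeps working while 2 of them are 
alive. A minimal sketch of that arithmetic follows. The function names are 
hypothetical and are not part of Kudu.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replication_factor // 2 + 1


def can_form_majority(replication_factor: int, live_replicas: int) -> bool:
    """True if the live replicas are enough to elect a leader and commit writes."""
    return live_replicas >= majority(replication_factor)


if __name__ == "__main__":
    # RF=3 with one tserver down: 2 of 3 replicas are alive.
    print(majority(3))              # 2
    print(can_form_majority(3, 2))  # True
    print(can_form_majority(3, 1))  # False
```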

 

Besides, a valid table creation task is affected by other invalid tasks. In 
the example above, a table creation task with RF=1 still does not succeed 
even though more than one tablet server is alive. This is because the 
background task manager aborts the whole pass as soon as one tablet creation 
task fails, and then starts a new pass that tries to execute all tasks again.
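
The blocking behaviour described above can be illustrated with a simplified 
model. This is only a sketch under stated assumptions, not Kudu's actual 
catalog manager code: the task tuples, the exception, and both functions are 
hypothetical. It contrasts a pass that is discarded on the first failure with 
a pass that isolates each task.

```python
from typing import List, Tuple

Task = Tuple[str, int]  # (table name, replication factor); hypothetical shape


class NotEnoughTservers(Exception):
    """Raised when fewer tservers are alive than the replication factor."""


def select_replicas(rf: int, live_tservers: int) -> int:
    if live_tservers < rf:
        raise NotEnoughTservers(
            f"Need at least {rf} replicas, but only {live_tservers} "
            "tablet servers are available")
    return rf


def pass_abort_on_failure(tasks: List[Task], live_tservers: int) -> List[str]:
    """One failing task discards the whole pass, as described in the issue."""
    created = []
    try:
        for name, rf in tasks:
            select_replicas(rf, live_tservers)
            created.append(name)
    except NotEnoughTservers:
        return []  # nothing is created; every task is retried in the next pass
    return created


def pass_isolated(tasks: List[Task], live_tservers: int) -> List[str]:
    """A failing task is skipped; the remaining tasks still proceed."""
    created = []
    for name, rf in tasks:
        try:
            select_replicas(rf, live_tservers)
        except NotEnoughTservers:
            continue
        created.append(name)
    return created


if __name__ == "__main__":
    pending = [("rf3_table", 3), ("rf1_table", 1)]
    print(pass_abort_on_failure(pending, live_tservers=2))  # []
    print(pass_isolated(pending, live_tservers=2))          # ['rf1_table']
```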

 

 
h1. Design

A new flag, --support_create_tablet_without_enough_healthy_tservers, is 
added. The original logic stays the same by default. When this flag is set to 
true, a three-replica tablet can be created successfully, and its status 
shows that it is missing one replica. This tablet can be read and written 
normally.

 

There are 4 things to do:
 # A tool to cancel a table creation task.
 # A tool to show the running table creation tasks.
 # A method to create a table without enough healthy tservers.
 # Make valid table creation tasks unaffected by other invalid tasks.
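
As a rough illustration of item 3, the sketch below shows how a replica count 
might be chosen when the new flag is enabled. It is not the proposed patch. 
The function and its parameters are hypothetical, and the rule that a 
majority of the requested replicas must still be placeable is an assumption 
that the description above does not state.

```python
def replicas_to_place(rf: int, live_tservers: int,
                      allow_under_replicated: bool) -> int:
    """Return how many replicas to place now for a tablet with factor `rf`.

    `allow_under_replicated` stands in for
    --support_create_tablet_without_enough_healthy_tservers.
    """
    if live_tservers >= rf:
        return rf  # original behaviour: enough tservers, place all replicas

    majority = rf // 2 + 1  # assumption: require at least a Raft majority
    if allow_under_replicated and live_tservers >= majority:
        return live_tservers  # tablet starts with fewer replicas than rf

    raise ValueError(
        f"Need at least {rf} replicas, but only {live_tservers} "
        "tablet servers are available")


if __name__ == "__main__":
    print(replicas_to_place(3, 3, allow_under_replicated=False))  # 3
    print(replicas_to_place(3, 2, allow_under_replicated=True))   # 2
```

With the flag off, the call with 2 live tservers raises the same kind of 
error as before, so the existing behaviour is preserved in this model.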


> Support creating three-replicas table or partition when only 2 tservers 
> healthy
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-3452
>                 URL: https://issues.apache.org/jira/browse/KUDU-3452
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
