[ https://issues.apache.org/jira/browse/KUDU-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yejiabao_h updated KUDU-3325:
-----------------------------
    Attachment: image-2021-10-06-19-23-51-769.png

> When wal is deleted, fault recovery and load balancing are abnormal
> -------------------------------------------------------------------
>
>                 Key: KUDU-3325
>                 URL: https://issues.apache.org/jira/browse/KUDU-3325
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: yejiabao_h
>            Priority: Major
>         Attachments: image-2021-10-06-15-36-40-996.png, 
> image-2021-10-06-15-36-53-813.png, image-2021-10-06-15-37-09-520.png, 
> image-2021-10-06-15-37-24-776.png, image-2021-10-06-15-37-42-533.png, 
> image-2021-10-06-15-37-54-782.png, image-2021-10-06-15-38-06-575.png, 
> image-2021-10-06-15-38-17-388.png, image-2021-10-06-15-38-29-176.png, 
> image-2021-10-06-15-38-39-852.png, image-2021-10-06-15-38-53-343.png, 
> image-2021-10-06-15-39-03-296.png, image-2021-10-06-19-23-51-769.png
>
>
> h3. 1. Use kudu leader_step_down to create multiple WAL entries
> ./kudu tablet leader_step_down  $MASTER_IP   1299f5a939d2453c83104a6db0cae3e7 
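> A minimal sketch of repeating the step-down to generate several WAL entries 
> (the loop count and sleep are assumptions; each step-down triggers a new 
> election, which writes a new term change into the tablet's WAL):
> TABLET_ID=1299f5a939d2453c83104a6db0cae3e7
> for i in $(seq 1 5); do
>   ./kudu tablet leader_step_down $MASTER_IP $TABLET_ID
>   sleep 5   # give the replicas time to elect a new leader
> done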
> h4. wal
> !image-2021-10-06-15-36-40-996.png!
> h4. cmeta
> !image-2021-10-06-15-36-53-813.png!
> h3. 2. Stop one tserver to trigger tablet recovery, so that the opid_index is 
> flushed to cmeta
> !image-2021-10-06-15-37-09-520.png!
> h4. wal
> !image-2021-10-06-15-37-24-776.png!
> h4. cmeta
> !image-2021-10-06-15-37-42-533.png!
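> A sketch of how the tserver in step 2 can be stopped (the systemd service name 
> is an assumption; adapt it to how the tservers are launched):
> sudo systemctl stop kudu-tserver   # on one tablet server node
> # After --follower_unavailable_considered_failed_sec (default 300s) the tablet
> # is re-replicated; the resulting config change flushes a new opid_index to cmeta.
> ./kudu cluster ksck $MASTER_IP     # watch the re-replication progress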
> h3. 3. Stop all tservers and delete the tablet's WAL
> !image-2021-10-06-15-37-54-782.png!
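> A sketch of step 3, run on every tablet server node (the service name and WAL 
> path are assumptions; the WAL segments live under the directory configured 
> with --fs_wal_dir):
> TABLET_ID=1299f5a939d2453c83104a6db0cae3e7
> sudo systemctl stop kudu-tserver
> rm -rf $FS_WAL_DIR/wals/$TABLET_ID   # delete this tablet's WAL directory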
> h3. 4. Start all tservers
> We can see that the index in the WAL starts counting from 1 again, but the 
> opid_index recorded in cmeta is still 20, the value from before the WAL was 
> deleted.
>  
> h4. wal
> !image-2021-10-06-15-38-06-575.png!
>  
> h4. cmeta
> !image-2021-10-06-15-38-17-388.png!
>  
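> A sketch of how the mismatch can be inspected while the tserver is stopped 
> (the fs flags and the cmeta path are assumptions; the cmeta location depends 
> on --fs_metadata_dir):
> TABLET_ID=1299f5a939d2453c83104a6db0cae3e7
> # WAL entries start again from index 1:
> ./kudu local_replica dump wals $TABLET_ID --fs_wal_dir=$FS_WAL_DIR --fs_data_dirs=$FS_DATA_DIRS
> # ...but the consensus metadata still records opid_index=20 from before the deletion:
> ./kudu pbc dump $FS_WAL_DIR/consensus-meta/$TABLET_ID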
> h3. 5. Stop a tserver to trigger fault recovery
> !image-2021-10-06-15-38-29-176.png!
> When the leader recovers a replica and the master requests a Raft config 
> change to add the new replica to the config, the change is ignored on the 
> leader replica because the new opid_index is smaller than the opid_index 
> recorded in cmeta.
>  
> h3. 6. Delete all WALs
> !image-2021-10-06-15-38-39-852.png!
> h3. 7. Run kudu cluster rebalance
> ./kudu cluster rebalance $MASTER_IP
> !image-2021-10-06-15-38-53-343.png!
> !image-2021-10-06-15-39-03-296.png!
> The rebalance also fails when it tries to change the Raft config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
