[ 
https://issues.apache.org/jira/browse/HDDS-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved HDDS-8173.
------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

The pull request is now merged.  Thanks, [~hemantk]!

> SchemaV3 RocksDB entries are not removed after container delete
> ---------------------------------------------------------------
>
>                 Key: HDDS-8173
>                 URL: https://issues.apache.org/jira/browse/HDDS-8173
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: db
>            Reporter: Christos Bisias
>            Assignee: Hemant Kumar
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>         Attachments: rocksDBContainerDelete.diff
>
>
> After a container is deleted, all of its RocksDB entries should be removed 
> too. Instead, the container's metadata and block data remain intact in the DB.
>  
> The problem appears to stem from the call to 
> {code:java}
> BlockUtils.removeContainerFromDB(containerData, conf){code}
> which does not clear the entries in the datanode SchemaV3 RocksDB for the 
> given container ID.
>  
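> A minimal sketch (not the actual Ozone code) of what clearing a container's 
> entries from the shared SchemaV3 RocksDB could look like: assuming the keys 
> are prefixed with the big-endian container ID, a range delete per column 
> family (e.g. block_data, metadata) with the plain RocksDB Java API would 
> remove them:
> {code:java}
> import java.nio.ByteBuffer;
> import java.util.List;
> import org.rocksdb.ColumnFamilyHandle;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
> 
> public final class ContainerPrefixCleanup {
>   // Deletes the key range [containerId, containerId + 1) in every column
>   // family, i.e. every key carrying that container-ID prefix.
>   // The big-endian long prefix is an assumption for illustration only.
>   public static void removeContainer(RocksDB db, List<ColumnFamilyHandle> cfs,
>       long containerId) throws RocksDBException {
>     byte[] begin = ByteBuffer.allocate(Long.BYTES).putLong(containerId).array();
>     byte[] end = ByteBuffer.allocate(Long.BYTES).putLong(containerId + 1).array();
>     for (ColumnFamilyHandle cf : cfs) {
>       db.deleteRange(cf, begin, end);
>     }
>   }
> }
> {code}
>  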
> We can reproduce this issue on a docker cluster as follows: 
>  * start a docker cluster with 5 datanodes
>  * put a key under a bucket to create a container
>  * close the container
>  * decommission 2 of the datanodes that hold replicas of the container
>  * recommission the datanodes
>  * container should be over-replicated
>  * ReplicationManager should issue a container delete for 2 datanodes
>  * check one of the two datanodes
>  * the container should be deleted
>  * check the RocksDB block data entries for the container 
>  
> on {color:#00875a}master{color}, from the Ozone root
>  
> {code:java}
> ❯ cd hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone {code}
> edit {color:#00875a}docker-config{color} and add the two configs below, which 
> are needed for decommission
>  
>  
> {code:java}
> OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
> OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm {code}
> start the Ozone cluster with 5 datanodes, connect to the SCM and create a key
>  
> {code:java}
> ❯ docker-compose up --scale datanode=5 -d
> ❯ docker exec -it ozone_scm_1 bash
> bash-4.2$ ozone sh volume create /vol1   
> bash-4.2$ ozone sh bucket create /vol1/bucket1
> bash-4.2$ ozone sh key put /vol1/bucket1/key1 /etc/hosts{code}
> close the container and check which datanodes it's on
> {code:java}
> bash-4.2$ ozone admin container close 1
> bash-4.2$ ozone admin container info 1 
> ...
> {code}
> check the SCM roles to get the SCM IP and port
>  
> {code:java}
> bash-4.2$ ozone admin scm roles
> 99960cfeda73:9894:LEADER:62393063-a1e0-4d5e-bcf5-938cf09a9511:172.25.0.4 
> {code}
>  
> check the datanode list to get the IP and hostname of 2 of the datanodes the 
> container is on
>  
> {code:java}
> bash-4.2$ ozone admin datanode list
> ... {code}
> decommission both datanodes
>  
>  
> {code:java}
> bash-4.2$ ozone admin datanode decommission -id=scmservice 
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> {code}
> wait until both datanodes are decommissioned; at that point, checking the 
> container's info shows that it also has replicas placed on other datanodes
>  
>  
> recommission both datanodes
>  
> {code:java}
> bash-4.2$ ozone admin datanode recommission -id=scmservice 
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname>  {code}
> After a few minutes, the SCM logs show
>  
> {code:java}
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO 
> replication.LegacyReplicationManager: Container #1 is over replicated. 
> Expected replica count is 3, but found 5.
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO 
> replication.LegacyReplicationManager: Sending delete container command for 
> container #1 to datanode 
> d6461c13-c2fa-4437-94f5-f75010a49069(ozone_datanode_2.ozone_default/172.25.0.11)
> 2023-03-15 18:24:53,811 [ReplicationMonitor] INFO 
> replication.LegacyReplicationManager: Sending delete container command for 
> container #1 to datanode 
> 6b077eea-543b-47ca-abf2-45f26c106903(ozone_datanode_5.ozone_default/172.25.0.6)
>  {code}
> connect to one of the datanodes where the container is being deleted
>  
> check that the container is deleted
> {code:java}
> bash-4.2$ ls 
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/current/containerDir0/
> bash-4.2$  {code}
> check RocksDB
> {code:java}
> bash-4.2$ ozone debug ldb --db 
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/DS-a8a72696-e4cf-42a6-a66c-04f0b614fde4/container.db
>  scan --column-family=block_data {code}
> Block data for the deleted container is still there
> {code:java}
>   "blockID": {
>     "containerBlockID": {
>       "containerID": 1,
>       "localID": 111677748019200001 {code}
> {color:#00875a}metadata{color} and {color:#00875a}block_data{color} still 
> have the entries while {color:#00875a}deleted_blocks{color} and 
> {color:#00875a}delete_txns{color} are empty.
>  
> I've also attached a diff with a test added under 
> {color:#00875a}TestContainerPersistence{color} that verifies the above issue.
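>  
> As an independent check (plain RocksDB Java API, not Ozone's test utilities), 
> a sketch of the same verification: open {color:#00875a}container.db{color} 
> read-only and look for remaining {color:#00875a}block_data{color} keys with 
> the container-ID prefix. The prefix encoding (big-endian long) is an 
> assumption for illustration.
> {code:java}
> import java.nio.ByteBuffer;
> import java.nio.charset.StandardCharsets;
> import java.util.ArrayList;
> import java.util.List;
> import org.rocksdb.*;
> 
> public final class CheckContainerEntries {
>   public static void main(String[] args) throws RocksDBException {
>     String dbPath = args[0];            // e.g. .../DS-.../container.db
>     long containerId = Long.parseLong(args[1]);
> 
>     // Open the DB read-only with all of its column families.
>     List<byte[]> cfNames = RocksDB.listColumnFamilies(new Options(), dbPath);
>     List<ColumnFamilyDescriptor> descs = new ArrayList<>();
>     for (byte[] name : cfNames) {
>       descs.add(new ColumnFamilyDescriptor(name));
>     }
>     List<ColumnFamilyHandle> handles = new ArrayList<>();
>     try (RocksDB db = RocksDB.openReadOnly(new DBOptions(), dbPath, descs, handles)) {
>       byte[] prefix = ByteBuffer.allocate(Long.BYTES).putLong(containerId).array();
>       for (int i = 0; i < cfNames.size(); i++) {
>         if (!"block_data".equals(new String(cfNames.get(i), StandardCharsets.UTF_8))) {
>           continue;
>         }
>         // Seek to the container prefix; any key found there is a leftover entry.
>         try (RocksIterator it = db.newIterator(handles.get(i))) {
>           it.seek(prefix);
>           boolean leftover = it.isValid() && startsWith(it.key(), prefix);
>           System.out.println("block_data entries left for container "
>               + containerId + ": " + leftover);
>         }
>       }
>     }
>   }
> 
>   private static boolean startsWith(byte[] key, byte[] prefix) {
>     if (key.length < prefix.length) {
>       return false;
>     }
>     for (int i = 0; i < prefix.length; i++) {
>       if (key[i] != prefix[i]) {
>         return false;
>       }
>     }
>     return true;
>   }
> }
> {code}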



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
