[ https://issues.apache.org/jira/browse/HDDS-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze resolved HDDS-8173.
------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

The pull request is now merged. Thanks, [~hemantk]!

> SchemaV3 RocksDB entries are not removed after container delete
> ---------------------------------------------------------------
>
>                 Key: HDDS-8173
>                 URL: https://issues.apache.org/jira/browse/HDDS-8173
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: db
>            Reporter: Christos Bisias
>            Assignee: Hemant Kumar
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>         Attachments: rocksDBContainerDelete.diff
>
>
> After a container is deleted, all RocksDB entries for that container should
> be removed from RocksDB. Instead, its metadata and block data remain intact
> in the DB.
>
> The problem appears to stem from the call to
> {code:java}
> BlockUtils.removeContainerFromDB(containerData, conf){code}
> which does not clear the entries in the datanode SchemaV3 RocksDB for the
> given container ID.
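> (Context for the fix: in SchemaV3 all containers on a volume share a single
> RocksDB instance, and each key is prefixed with the container ID, so removing
> a container means deleting its key range from every column family. Below is a
> minimal sketch of that idea against the plain RocksDB Java API; the
> "<containerID>|" key encoding and the class/method names are illustrative
> assumptions, not Ozone's actual code.)
> {code:java}
> import java.nio.charset.StandardCharsets;
> import org.rocksdb.ColumnFamilyHandle;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksDBException;
>
> public final class ContainerPrefixCleanup {
>
>   // Delete every key carrying the container's prefix from one column
>   // family. Assumes keys look like "<containerID>|<suffix>"; Ozone's real
>   // SchemaV3 key encoding differs, but the deleteRange idea is the same.
>   public static void deleteContainerEntries(RocksDB db,
>       ColumnFamilyHandle cf, long containerID) throws RocksDBException {
>     byte[] begin = (containerID + "|").getBytes(StandardCharsets.UTF_8);
>     byte[] end = nextPrefix(begin); // exclusive upper bound for the prefix
>     db.deleteRange(cf, begin, end); // removes all keys in [begin, end)
>   }
>
>   // Bump the last byte so the bound sorts just past every prefixed key.
>   private static byte[] nextPrefix(byte[] prefix) {
>     byte[] end = prefix.clone();
>     end[end.length - 1]++; // '|' (0x7C) + 1 = '}', no overflow here
>     return end;
>   }
> }
> {code}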
> We can reproduce this issue on a docker cluster as follows:
>  * start a docker cluster with 5 datanodes
>  * put a key under a bucket to create a container
>  * close the container
>  * decommission 2 of the datanodes that hold replicas of the container
>  * recommission the datanodes
>  * the container should now be over-replicated
>  * ReplicationManager should issue a container delete to 2 datanodes
>  * check one of those two datanodes
>  * the container should be deleted
>  * check the RocksDB block data entries for the container
>
> On {color:#00875a}master{color}, from the ozone root:
> {code:java}
> ❯ cd hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone {code}
> Edit {color:#00875a}docker-config{color} and add the two configs below,
> which are needed for decommission:
> {code:java}
> OZONE-SITE.XML_ozone.scm.nodes.scmservice=scm
> OZONE-SITE.XML_ozone.scm.address.scmservice.scm=scm {code}
> Start an Ozone cluster with 5 datanodes, connect to scm and create a key:
> {code:java}
> ❯ docker-compose up --scale datanode=5 -d
> ❯ docker exec -it ozone_scm_1 bash
> bash-4.2$ ozone sh volume create /vol1
> bash-4.2$ ozone sh bucket create /vol1/bucket1
> bash-4.2$ ozone sh key put /vol1/bucket1/key1 /etc/hosts{code}
> Close the container and check which datanodes it is on:
> {code:java}
> bash-4.2$ ozone admin container close 1
> bash-4.2$ ozone admin container info 1
> ...
> {code}
> Check the scm roles, to get the scm IP and port:
> {code:java}
> bash-4.2$ ozone admin scm roles
> 99960cfeda73:9894:LEADER:62393063-a1e0-4d5e-bcf5-938cf09a9511:172.25.0.4
> {code}
> Check the datanode list, to get the IP and hostname of 2 datanodes the
> container is on:
> {code:java}
> bash-4.2$ ozone admin datanode list
> ... {code}
> Place both datanodes on decommission:
> {code:java}
> bash-4.2$ ozone admin datanode decommission -id=scmservice
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> {code}
> Wait until both datanodes are decommissioned; at that point, if we check
> the container's info, we can see that it has replicas placed on other
> datanodes as well.
>
> Recommission both datanodes:
> {code:java}
> bash-4.2$ ozone admin datanode recommission -id=scmservice
> --scm=172.25.0.4:9894 <datanodeIP>/<datanodeHostname> {code}
> After a few minutes, in the scm logs:
> {code:java}
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Container #1 is over replicated.
> Expected replica count is 3, but found 5.
> 2023-03-15 18:24:53,810 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Sending delete container command for
> container #1 to datanode
> d6461c13-c2fa-4437-94f5-f75010a49069(ozone_datanode_2.ozone_default/172.25.0.11)
> 2023-03-15 18:24:53,811 [ReplicationMonitor] INFO
> replication.LegacyReplicationManager: Sending delete container command for
> container #1 to datanode
> 6b077eea-543b-47ca-abf2-45f26c106903(ozone_datanode_5.ozone_default/172.25.0.6)
> {code}
> Connect to one of those two datanodes; the container has been deleted there.
>
> Check that the container is deleted:
> {code:java}
> bash-4.2$ ls
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/current/containerDir0/
> bash-4.2$ {code}
> Check RocksDB:
> {code:java}
> bash-4.2$ ozone debug ldb --db
> /data/hdds/hdds/CID-ca9fef0f-9af2-4dbf-af02-388d624c2f10/DS-a8a72696-e4cf-42a6-a66c-04f0b614fde4/container.db
> scan --column-family=block_data {code}
> The block data for the deleted container is still there:
> {code:java}
> "blockID": {
>   "containerBlockID": {
>     "containerID": 1,
>     "localID": 111677748019200001 {code}
> {color:#00875a}metadata{color} and {color:#00875a}block_data{color} still
> hold the entries, while {color:#00875a}deleted_blocks{color} and
> {color:#00875a}delete_txns{color} are empty.
>
> I've also attached a diff with a test added under
> {color:#00875a}TestContainerPersistence{color} that verifies the above
> issue.
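> (For reference, a hedged sketch of the verification idea behind such a
> test, not the attached diff itself: after the container delete, iterate the
> block_data column family and count any keys that still carry the container's
> prefix. The "<containerID>|" key encoding is again an illustrative
> assumption.)
> {code:java}
> import java.nio.charset.StandardCharsets;
> import java.util.Arrays;
> import org.rocksdb.ColumnFamilyHandle;
> import org.rocksdb.RocksDB;
> import org.rocksdb.RocksIterator;
>
> public final class SchemaV3DeleteCheck {
>
>   // Count block_data keys still carrying the container's prefix. A passing
>   // container delete should leave this at 0; before the fix it does not.
>   public static long countContainerKeys(RocksDB db,
>       ColumnFamilyHandle blockData, long containerID) {
>     byte[] prefix = (containerID + "|").getBytes(StandardCharsets.UTF_8);
>     long count = 0;
>     try (RocksIterator it = db.newIterator(blockData)) {
>       for (it.seek(prefix); it.isValid() && startsWith(it.key(), prefix);
>           it.next()) {
>         count++;
>       }
>     }
>     return count;
>   }
>
>   private static boolean startsWith(byte[] key, byte[] prefix) {
>     return key.length >= prefix.length
>         && Arrays.equals(Arrays.copyOf(key, prefix.length), prefix);
>   }
> }
> {code}
> A test in the spirit of the attached diff would call
> {{countContainerKeys(...)}} after deleting the container and assert that the
> result is 0.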