Benoit Sigoure created HBASE-29108:
--------------------------------------

             Summary: regionserver does not cleanup storefiles written to .tmp directory when failing to close the storefiles during compaction
                 Key: HBASE-29108
                 URL: https://issues.apache.org/jira/browse/HBASE-29108
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.5.10
            Reporter: Benoit Sigoure
Background:
* When HBase performs a compaction, it writes the compaction result (1 or more storefiles) to a file in HDFS under {{/hbase/data/<namespace>/<table>/<region>/{*}.tmp{*}/<columnfamily>/<storefile>}}.
* Once the compaction succeeds, the storefile is renamed to {{/hbase/data/<namespace>/<table>/<region>/<columnfamily>/<storefile>}}, i.e. moved out of the {{.tmp}} directory to where storefiles are stored and read to serve client RPCs.
* When a compaction fails, in some cases cleanup is performed and the storefile under the {{.tmp}} directory is deleted. In other cases, however, the storefile is left under the {{.tmp}} directory (e.g. when one of the datanodes to which the storefile's last block was being written gets {{{}SIGKILL{}}}'ed).

Problem:
* In certain cases, a storefile under {{.tmp}} will contain one corrupt block replica and one good block replica (e.g. one replica is corrupt with reason {{GENSTAMP_MISMATCH}}, meaning its generation stamp differs, and/or has a shorter file length than the good replica's). The namenode detects this block corruption and reports it in its metrics.
* The corrupt replicas remain corrupt, and the good block replica is not re-replicated to other datanodes to repair the corruption.
* The storefile under {{.tmp}} remains "open" / under construction from the regionserver's point of view.

Impact:
* *No* visible impact on HBase clients (storefiles under {{.tmp}} are not read to return data to clients).
* It can trip up alerts & monitoring: the namenode reports corrupt blocks that do not fix themselves until the affected regions are reopened or the regionservers are restarted.
* Decommissioning of datanodes can be blocked indefinitely: a block that has a corrupt replica but belongs to a file that is still open is not re-replicated to other datanodes even when a good replica is available, so the datanode holding the only good replica of such a block cannot be decommissioned.

Workaround:
* A region can be re-opened (e.g. by restarting the regionserver on which the region is open), which causes the region's {{.tmp}} directory to be deleted recursively once the region is opened again, removing all corrupt blocks and leftover storefiles.

This bug report was written by Tomas Baltrunas at Arista.

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
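For illustration only, the write-to-{{.tmp}}-then-rename commit pattern described under Background, together with the cleanup-on-failure step whose absence this report is about, can be sketched as follows. This is not HBase's actual code: the function name and the simplified directory layout ({{<columnfamily>/.tmp/<storefile>}} rather than the region-level {{.tmp}} directory) are hypothetical.

```python
import os


def write_storefile_atomically(final_path: str, data: bytes) -> None:
    """Sketch of the commit-via-rename pattern (hypothetical, not HBase code).

    Write the result under a temporary location, then atomically rename it
    into place on success; on failure, delete the partial temporary file.
    """
    # Simplified layout: a .tmp directory next to the final storefile.
    tmp_dir = os.path.join(os.path.dirname(final_path), ".tmp")
    os.makedirs(tmp_dir, exist_ok=True)
    tmp_path = os.path.join(tmp_dir, os.path.basename(final_path))
    try:
        with open(tmp_path, "wb") as f:
            f.write(data)
        # Success: move the file out of .tmp to where readers look for it.
        os.replace(tmp_path, final_path)
    except OSError:
        # Failure: remove the partial file so nothing lingers under .tmp.
        # The bug reported here is that this cleanup does not always happen,
        # leaving an open, possibly corrupt file behind in HDFS.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the failure-path cleanup is skipped (as in the bug), the leftover file under {{.tmp}} is invisible to readers but still occupies blocks, which is why the impact shows up in HDFS metrics and decommissioning rather than in client-visible behavior.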