Bram Schuur created HBASE-29784:
-----------------------------------

             Summary: DeleteFamilyVersion is not effectuated even though it is 
committed to WAL
                 Key: HBASE-29784
                 URL: https://issues.apache.org/jira/browse/HBASE-29784
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.6.3
         Environment: JDK: 21.0.9
HBase: 2.6.3
Hadoop: 3.4.2
Arch: x86
OS: Containerized Linux
            Reporter: Bram Schuur


We are running HBase 2.6.3 as a datastore and sometimes wipe data with 
DeleteFamilyVersion. Intermittently and non-deterministically, HBase appears to 
ignore a DeleteFamilyVersion that we issued for a row, so the data we meant to 
erase becomes visible again.
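
To make the expected behavior concrete, here is a minimal Python sketch of how we understand DeleteFamilyVersion: it should hide every Put in the same row and family whose timestamp equals the marker's timestamp. The names `Cell` and `visible_puts` are hypothetical and exist only for this illustration. This is a simplified model, not HBase's implementation, and it deliberately ignores other delete types, sequence ids, max versions and TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical, simplified stand-in for an HBase cell.
    row: bytes
    family: bytes
    qualifier: bytes
    timestamp: int
    type: str  # "Put" or "DeleteFamilyVersion"


def visible_puts(cells):
    """Return the Puts that are not masked by a DeleteFamilyVersion marker.

    Assumption: a marker masks all Puts with the same row, family and
    exact timestamp, regardless of qualifier.
    """
    masked = {
        (c.row, c.family, c.timestamp)
        for c in cells
        if c.type == "DeleteFamilyVersion"
    }
    return [
        c
        for c in cells
        if c.type == "Put" and (c.row, c.family, c.timestamp) not in masked
    ]
```

Under this model, every Put written at the marker's timestamp should be hidden once the marker exists, while Puts at other timestamps stay visible.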

We started capturing more extensive WAL logs for our regions. They show that the 
DeleteFamilyVersion we issue is committed to the WAL, yet the data is still 
visible through the API after a flush/compaction of the region. There are no 
errors in the logs.

Below is a snippet of the data we traced:

Data as queried from the HBase API:

{code}
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:\x00/1765693071241000000/Put/vlen=1/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:description/1765693071241000000/Put/vlen=4/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainIdentifier/1765693071241000000/Put/vlen=60/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainName/1765693071241000000/Put/vlen=8/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:identifiers/1765693071241000000/Put/vlen=94/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:lastUpdateTimestamp/1765693071241000000/Put/vlen=8/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerIdentifier/1765693071241000000/Put/vlen=39/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:layerName/1765693071241000000/Put/vlen=12/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:name/1765693071241000000/Put/vlen=26/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:order/1765693071241000000/Put/vlen=10/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:properties/1765693071241000000/Put/vlen=208/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:tags/1765693071241000000/Put/vlen=184/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeIdentifier/1765693071241000000/Put/vlen=72/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:typeName/1765693071241000000/Put/vlen=10/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~\x00/1765693071241000000/Put/vlen=11/seqid=0
\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW/1765693071241000000/Put/vlen=16/seqid=0
{code}

Data in the captured WAL:
{code}
...
Sequence=10628094, table=sg__default__vertices, 
region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14 06:17:51 
UTC 2025
...
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:~\x00, 
timestamp=1765693071241000000, type=Put
    value: \x03\x01Componen\xF4
cell total size sum: 96
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainIdentifier, 
timestamp=1765693071241000000, type=Put
    value: 
\x02\x03\x01urn:stackpack:stackstate-k8s-agent-v2:shared:domain:agen\xF4
cell total size sum: 160
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:identifiers, 
timestamp=1765693071241000000, type=Put
    value: 
\x02!\x01\x01\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000
cell total size sum: 184
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:domainName, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01Agen\xF4
cell total size sum: 96
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeName, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01proces\xF3
cell total size sum: 96
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:name, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01containerd-shim-runc-v\xB2
cell total size sum: 112
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:description, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01\x81
cell total size sum: 96
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:typeIdentifier, 
timestamp=1765693071241000000, type=Put
    value: 
\x02\x03\x01\xC4\x01urn:stackpack:stackstate-k8s-agent-v2:shared:component-type:process
cell total size sum: 168
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerIdentifier, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01urn:stackpack:common:layer:processe\xF3
cell total size sum: 136
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:layerName, 
timestamp=1765693071241000000, type=Put
    value: \x02\x03\x01Processe\xF3
cell total size sum: 104
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:properties, 
timestamp=1765693071241000000, type=Put
    value: \x02 
\x01\x04\x03\x01hos\xF4\x03\x01i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01external_i\xE4\x03\x01\xD7\x01urn:process:/i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.io:68116:1765692998000\x03\x01pi\xE4\x03\x016811\xB6\x03\x01create_tim\xE5\x03\x01176569299800\xB0
cell total size sum: 296
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:tags, 
timestamp=1765693071241000000, type=Put
    value: 
\x02!\x01\x07\x03\x01host:i-06fb48dc80ed9944b-preprod-dev.preprod.stackstate.i\xEF\x03\x01stackpack:agen\xF4\x03\x01pid:6811\xB6\x03\x01user:roo\xF4\x03\x01os:linu\xF8\x03\x01command:/usr/bin/containerd-shim-runc-v\xB2\x03\x01process_category:executabl\xE5
cell total size sum: 272
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:order, 
timestamp=1765693071241000000, type=Put
    value: \x02\x0A\x00\x00\x00\x00\x00\x00\x00\x00
cell total size sum: 96
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:lastUpdateTimestamp, 
timestamp=1765693071241000000, type=Put
    value: \x02\x09\x92\xFE\x90\xB8\xE3f
cell total size sum: 112
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, 
column=cf:~e\x06SYNCED\x00\x00\xDA\xB7\xE2\xD3\x8FW, 
timestamp=1765693071241000000, type=Put
    value: \x01\x06Synced\x00\x00x\x87\xBA\xDE\xF7\xFE
cell total size sum: 112
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:\x00, 
timestamp=1765693071241000000, type=Put
    value: \x01
cell total size sum: 80
...
position: 1481623
...
Sequence=10628100, table=sg__default__vertices, 
region=834ed0ff02e8d7d42b88ad5666a4b1e8, at write timestamp=Sun Dec 14 06:17:51 
UTC 2025
...
row=\x00\x00\xC7\xEBs\xFB\xCA\xDA, column=cf:, timestamp=1765693071241000000, 
type=DeleteFamilyVersion
    value: 
cell total size sum: 80
...
position: 1531651

{code}
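
For anyone who wants to repeat the cross-check on their own dumps, the comparison between the scan output and the WAL marker can be sketched in Python as below. The helpers `parse_cell` and `should_have_been_masked` are hypothetical names. The parser assumes the `row/family:qualifier/timestamp/type/vlen=N/seqid=N` layout shown in the scan output above and that the family name is known in advance (`cf` here), since row keys and qualifiers may themselves contain `/`.

```python
def parse_cell(line, family="cf"):
    """Parse one scan-output line into (row, family, qualifier, ts, type)."""
    # Split from the right: timestamp, type, vlen and seqid never contain '/'.
    head, ts, cell_type, _vlen, _seqid = line.strip().rsplit("/", 4)
    # Assumption: the first "/<family>:" separates the row key from the qualifier.
    row, sep, qualifier = head.partition("/" + family + ":")
    if not sep:
        raise ValueError(f"family {family!r} not found in {line!r}")
    return row, family, qualifier, int(ts), cell_type


def should_have_been_masked(scan_lines, marker_row, marker_family, marker_ts):
    """List qualifiers of Puts that a DeleteFamilyVersion marker should hide."""
    hits = []
    for line in scan_lines:
        if not line.strip():
            continue
        row, _fam, qualifier, ts, cell_type = parse_cell(line, marker_family)
        if cell_type == "Put" and row == marker_row and ts == marker_ts:
            hits.append(qualifier)
    return hits


# Two lines copied from the scan output in this report (raw strings keep
# the escaped row key exactly as printed).
scan_lines = [
    r"\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:description/1765693071241000000/Put/vlen=4/seqid=0",
    r"\x00\x00\xC7\xEBs\xFB\xCA\xDA/cf:domainName/1765693071241000000/Put/vlen=8/seqid=0",
]
leaked = should_have_been_masked(
    scan_lines, r"\x00\x00\xC7\xEBs\xFB\xCA\xDA", "cf", 1765693071241000000
)
print(leaked)
```

Any qualifier this reports is a cell that is still returned by the API even though its row, family and timestamp match the DeleteFamilyVersion recorded in the WAL.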

What could be the cause? I checked the bug tracker but found nothing 
resembling our symptoms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
