Rui Wang created HDDS-4708:
------------------------------

             Summary: Optimization: update RetryCount less frequently (update 
once per ~100)
                 Key: HDDS-4708
                 URL: https://issues.apache.org/jira/browse/HDDS-4708
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
            Reporter: Rui Wang
            Assignee: Rui Wang


SCM maintains a DeleteBlockTransaction table [1]. Each transaction record in
this table carries a retry count [2]. The count is incremented every time SCM
retries the delete transaction; once it exceeds the maximum limit, SCM stops
retrying and an admin can analyze why some blocks failed to delete.

Because the count is written to the DB on every retry, I want to discuss
whether it is worth optimizing this by maintaining the retry count as in-memory
state and only writing it to the DB once it exceeds the limit (thus leaving it
available for further analysis).

The reason for this idea is that in SCM HA we replicate DB changes over Ratis,
so persisting the retry count on every increment will cost roughly 3x compared
to now.

The drawback of only updating the retry count at the limit is that, if SCM
restarts, the in-memory retry count will be lost and counting will start over.
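
For illustration, here is a minimal, hypothetical sketch of the idea (the class,
method, and store-interface names below are made up for this discussion and are
not Ozone's actual API): increment the count in memory and only write it out
every N retries (e.g. ~100, per the summary) or once the limit is exceeded.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryRetryTracker {

  // Hypothetical callback standing in for the real DB/Ratis write path.
  public interface RetryCountStore {
    void persistRetryCount(long txId, int retryCount);
  }

  // In-memory counts; lost if SCM restarts (the drawback noted above).
  private final Map<Long, Integer> retryCounts = new ConcurrentHashMap<>();
  private final RetryCountStore store;
  private final int maxRetryLimit;
  private final int flushInterval; // e.g. 100 -> persist once per ~100 retries

  public InMemoryRetryTracker(RetryCountStore store, int maxRetryLimit,
      int flushInterval) {
    this.store = store;
    this.maxRetryLimit = maxRetryLimit;
    this.flushInterval = flushInterval;
  }

  /**
   * Increments the in-memory retry count for a transaction and persists it
   * only every flushInterval retries or once the limit is exceeded, instead
   * of writing to the DB on every retry.
   *
   * @return true if the transaction has exceeded the retry limit.
   */
  public boolean onRetry(long txId) {
    int count = retryCounts.merge(txId, 1, Integer::sum);
    boolean exceededLimit = count > maxRetryLimit;
    if (exceededLimit || count % flushInterval == 0) {
      store.persistRetryCount(txId, count);
    }
    return exceededLimit;
  }
}
{code}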


[1]: 
https://github.com/apache/ozone/blob/master/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java#L70
[2]: 
https://github.com/apache/ozone/blob/master/hadoop-hdds/interface-server/src/main/proto/ScmServerDatanodeHeartbeatProtocol.proto#L331



