[ 
https://issues.apache.org/jira/browse/HDFS-12283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131191#comment-16131191
 ] 

Anu Engineer commented on HDFS-12283:
-------------------------------------

Hi [~yuanbo] Thanks for getting this done. This is a critical part of Ozone and 
I really appreciate your work on this. Sorry, my comments are a little late. My 
comments are on v3 of the patch.

*  {{DeletedBlockLogImpl.java: Line 53}} #LATEST_TXID#  -- Do we need this 
key? Since this table will have nothing but TXIDs, do we need to keep the 
latest one under a specific key? In other words, since TXIDs are monotonically 
increasing numbers and the keys are ordered by TXID, I am thinking that just 
seeking to the last key is all you really need. Please let me know what I am 
missing here. If you are doing this because {{getRangeKVs}} doesn't support 
seeking to the last key, then we should fix that. I know that [~cheersyang] 
suggested this change in an earlier review, but in my mind, RocksDB and LevelDB 
are simple trees, so seekToEnd should be relatively efficient, and I hope both 
DBs have that interface.
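To illustrate the seek-to-last idea, here is a minimal sketch using a 
{{TreeMap}} as a stand-in for the ordered key space (RocksDB's Java iterator 
exposes {{seekToLast()}} for the real thing); the class and method names are 
illustrative, not the patch's:

```java
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for the ordered LevelDB/RocksDB key
// space. Because TXIDs are monotonically increasing and keys are ordered
// by TXID, the latest TXID falls out of a seek to the last key, with no
// separate LATEST_TXID bookkeeping.
class LatestTxidSeek {
  private final TreeMap<Long, byte[]> store = new TreeMap<>();

  void put(long txid, byte[] value) {
    store.put(txid, value);
  }

  // Equivalent of an iterator seekToLast(): an O(log n) walk down the
  // right spine of the tree, so no LATEST_TXID key is needed.
  long latestTxid() {
    return store.isEmpty() ? 0L : store.lastKey();
  }
}
```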
* {{DeletedBlockLogImpl.java#commitTransactions}}  This is a hypothetical 
question. During the commitTransactions call, imagine one of the TXIDs is 
invalid. Does it abort the whole list, or do you want to catch the exception 
for that TXID and continue? Not really sure if that can ever happen or if we 
should be worried about it.
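To make the question concrete, here is one possible shape for the per-TXID 
handling, sketched with hypothetical names ({{commitOne}} and the validity 
check are not the patch's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: instead of aborting the whole batch on one invalid TXID,
// catch the failure for that TXID, record it, and keep committing the rest.
class CommitSketch {
  static List<Long> commitTransactions(List<Long> txids) {
    List<Long> failed = new ArrayList<>();
    for (long txid : txids) {
      try {
        commitOne(txid);
      } catch (IllegalArgumentException e) {
        // Record the invalid TXID and continue with the remaining ones.
        failed.add(txid);
      }
    }
    return failed;
  }

  private static void commitOne(long txid) {
    if (txid < 0) { // stand-in for "invalid TXID"
      throw new IllegalArgumentException("invalid TXID " + txid);
    }
    // the real implementation would remove the TXID from the log here
  }
}
```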

* {{DeleteBlockLogImpl.java#incrementCount}}  Would like to understand what 
happens after we set this to -1;
{code}
// if the retry time exceeds the MAX_RETRY_TIME value
// then set the retry value to -1, stop retrying.
if (block.getCount() > maxRetry) {
  block.setCount(-1);
}
{code}
For the sake of argument, let us say we hit this line and set block.count to 
-1. What do we do now? Never delete the block? What if the machines were 
switched off and came back up after you set this to -1? Can I suggest that we 
start warning once it is above the maxRetry limit and write a log statement on 
every maxRetry-th hit? For example, if maxRetry is 5, we log a warning on the 
5th, 10th and 15th tries. But never set this value to -1 and never give up. To 
summarize, while I fully understand why you are setting the value to -1, it is 
not clear to me what happens after that.
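A sketch of the never-give-up warning policy I am suggesting (the helper name 
is hypothetical, not patch code):

```java
// Sketch only: never mark a block as permanently failed; instead, warn on
// every maxRetry-th attempt (the 5th, 10th, 15th, ... when maxRetry is 5)
// and keep retrying.
class RetryWarn {
  static boolean shouldWarn(int count, int maxRetry) {
    return count >= maxRetry && count % maxRetry == 0;
  }
}
```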

* Maybe we need to add an option in SCMCLI to purge these entries -- perhaps 
file a JIRA for that. 
* {{addTransactions(Map<String, List<String>> blockMap)}} -- is there a size 
limit on the list of blocks in this call's argument? We might want to cap it 
at 50K or so.
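Something like this hypothetical guard is what I have in mind (the 50K 
constant is just the number floated above; the method is illustrative):

```java
import java.util.List;
import java.util.Map;

// Sketch only: reject addTransactions calls whose total block count
// exceeds a cap, so a single call cannot flood the log.
class AddTxLimit {
  static final int MAX_BLOCKS_PER_CALL = 50_000; // assumed cap

  static void checkSize(Map<String, List<String>> blockMap) {
    int total = 0;
    for (List<String> blocks : blockMap.values()) {
      total += blocks.size();
    }
    if (total > MAX_BLOCKS_PER_CALL) {
      throw new IllegalArgumentException(
          "addTransactions called with " + total
          + " blocks; limit is " + MAX_BLOCKS_PER_CALL);
    }
  }
}
```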

* {{addTransactions}} -- {{batch.put(LATEST_TXID, Longs.toByteArray(latestId));}} 
-- if we are seeking to the last key via a seek method, we will not need to do 
this. Please note, I am not asking if we can iterate to the last key; I want to 
seek to the last key. In terms of a tree, what I am imagining is that we will 
only walk a couple of nodes that point to the larger values of the tree.


* {{addTransactions}} I think there is a small ordering error here. Please do 
correct me if I am wrong.
{code}
215       deletedStore.writeBatch(batch);
216       this.latestTxid = latestId;
{code}
Imagine that right after we execute line 215 (I am assuming that is a flush to 
disk for the time being), the execution fails. When we come back into this 
function next time, we do this.
{code}
206 long latestId = this.latestTxid;
{code}
Now suppose the txid we wrote to the database was, say, 105, and we failed 
before we could update {{this.latestTxid}}. We will end up reusing TXIDs, 
which means that we will lose some TXIDs or the value part of some TXIDs. It 
is like going back in time. The fix might be to change the order of these lines.
{code}
215       this.latestTxid = latestId;
216       deletedStore.writeBatch(batch);
{code}
With this order, the worst that can happen is that our TXIDs can have holes, 
since we updated {{this.latestTxid}} before the write succeeded.
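A toy sketch of the reordered sequence with a simulated write failure (all 
names are illustrative): a crash between the two steps now leaves a hole in 
the TXID sequence instead of reusing IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: advance the in-memory latestTxid BEFORE the durable write.
// If the write fails, the next allocation skips the lost TXID rather than
// handing it out again.
class TxidAllocator {
  private long latestTxid = 0;
  private final List<Long> persisted = new ArrayList<>();
  boolean failWrite = false; // toggles a simulated crash before the flush

  long addTransaction() {
    long id = latestTxid + 1;
    latestTxid = id;  // step 1: advance the in-memory counter first
    writeBatch(id);   // step 2: then persist; a failure here leaves a
    return id;        // hole in the sequence, never a reused TXID
  }

  private void writeBatch(long id) {
    if (failWrite) {
      throw new IllegalStateException("simulated crash before flush");
    }
    persisted.add(id);
  }

  List<Long> persisted() {
    return persisted;
  }
}
```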

* {{ozone.scm.block.deletion.max.retry}} -- use this config for logging warnings? 
* In tests -- add a testCommitTransactions with one TXID being invalid? 

Nits: 
* line 62: typo or rename: largerThanLaestReadTxid --> 
largerThanLatestReadTxid or getNextTxid 
* line 158: The comment seems to refer to MAX_RETRY_TIME in 
{{incrementCount}}, but no such variable exists.

> Ozone: DeleteKey-5: Implement SCM DeletedBlockLog
> -------------------------------------------------
>
>                 Key: HDFS-12283
>                 URL: https://issues.apache.org/jira/browse/HDFS-12283
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, scm
>            Reporter: Weiwei Yang
>            Assignee: Yuanbo Liu
>         Attachments: HDFS-12283.001.patch, HDFS-12283-HDFS-7240.001.patch, 
> HDFS-12283-HDFS-7240.002.patch, HDFS-12283-HDFS-7240.003.patch
>
>
> The DeletedBlockLog is a persisted log in SCM to keep track of container 
> blocks which are under deletion. It maintains info about under-deletion 
> container blocks notified by KSM, and the state of how each is processed. We 
> can use RocksDB to implement the 1st version of the log; the schema looks like
> ||TxID||ContainerName||Block List||ProcessedCount||
> |0|c1|b1,b2,b3|0|
> |1|c2|b1|3|
> |2|c2|b2, b3|-1|
> Some explanations
> # TxID is an incremental long value transaction ID for ONE container and 
> multiple blocks
> # Container name is the name of the container
> # Block list is a list of block IDs
> # ProcessedCount is the number of times SCM has sent this record to the 
> datanode; it represents the "state" of the transaction. It is in the range 
> \[-1, 5\]: -1 means the transaction eventually failed after some retries, and 
> 5 is the max number of retries.
> We need to define {{DeletedBlockLog}} as an interface and implement this with 
> RocksDB {{MetadataStore}} as the first version.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
