[ https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030676#comment-16030676 ]
Weiwei Yang commented on HDFS-11886:
------------------------------------

Thanks [~vagarychen] for raising this problem and [~anu] for the design doc. Let me check that I understand this correctly. The proposal adds a *ksm-keys-under-progress.db* in KSM; a key is moved from *ksm-keys-under-progress.db* to *ksm.db* only if all the steps finish successfully. This introduces additional disk writes:
# put key into the in-progress db -> add key
# delete key from the in-progress db -> commit key 1
# add key to ksm db -> commit key 2

Do we really need to persist this? Can we store the state in memory only? If all steps succeed, commit the key to *ksm.db*; otherwise discard it. If KSM crashes before a key is committed, that key won't be written to the KSM namespace, because the cache will be gone after KSM restarts. This is like a write cache in front of ksm.db.

Another question: why do we need to return a flag to OzoneHandler to determine whether a container needs to be created? I am wondering why we need these additional RPC calls. Why not let SCM create the container on the datanodes if necessary and simply return an open container to the client?

Thanks.

> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>
> Key: HDFS-11886
> URL: https://issues.apache.org/jira/browse/HDFS-11886
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Chen Liang
> Attachments: design-notes-putkey.pdf
>
>
> Ozone's putKey operation involves a couple of steps:
> 1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore.
> 2. The allocated block gets returned to the client; the client checks whether the
> container needs to be created on the datanode and, if so, creates it.
> 3. The client writes the data to the container.
> It is possible that 1 succeeded but 2 or 3 failed. In this case there will
> be an entry in KSM's local metastore, but the key is actually nowhere to be
> found. We need to revert 1 if 2 or 3 failed in this case.
> To resolve this, we need at least two things to be implemented first.
> 1. We need deleteKey() to be added to KSM first.
> 2. We also need container reports to be implemented first, so that SCM can
> track whether the container has actually been added.
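
To make the in-memory alternative raised in the comment concrete, here is a minimal sketch of a pending-key cache sitting in front of the durable store. Everything in it is an assumption for illustration: the class name PendingKeyCache, the methods begin/commit/abort/lookup, and the plain map standing in for ksm.db are hypothetical and are not taken from the Ozone code base or the design doc.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: keys under progress live only in memory and reach
 * the durable store (a map standing in for ksm.db) on commit.
 */
class PendingKeyCache {
    // Stand-in for ksm.db; a real KSM would write to its on-disk metastore.
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    // In-memory only: a restart loses it, so uncommitted keys vanish with it.
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    /** Step 1: remember the allocated block for a key without a disk write. */
    boolean begin(String key, String blockInfo) {
        return pending.putIfAbsent(key, blockInfo) == null;
    }

    /** After the data write succeeds: publish the key and drop the pending entry. */
    boolean commit(String key) {
        String blockInfo = pending.remove(key);
        if (blockInfo == null) {
            return false; // never started, already committed, or aborted
        }
        committed.put(key, blockInfo);
        return true;
    }

    /** When container creation or the data write fails: discard the key. */
    boolean abort(String key) {
        return pending.remove(key) != null;
    }

    /** Only committed keys are visible in the namespace. */
    Optional<String> lookup(String key) {
        return Optional.ofNullable(committed.get(key));
    }
}
```

In this sketch a putKey would call begin() after allocateBlock, then commit() once the data is written, or abort() if step 2 or 3 fails. Only commit() touches the durable store, so a failed or interrupted putKey leaves no entry behind. The trade-off is that a KSM restart silently drops every in-flight key, and this sketch does nothing about blocks already allocated for them.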