[ 
https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030676#comment-16030676
 ] 

Weiwei Yang commented on HDFS-11886:
------------------------------------

Thanks [~vagarychen] for raising this problem and [~anu] for the design doc.
Let me know if I understand this correctly. The proposal adds a 
*ksm-keys-under-progress.db* to KSM; a key is moved from 
*ksm-keys-under-progress.db* to *ksm.db* only if all the steps finish 
successfully. This introduces additional writes to disk:

  # put key to inprogress db -> add key
  # delete key in inprogress db -> commit key 1
  # add key to ksm db -> commit key 2

Do we really need to persist this? Can we store the state in memory only? A 
key is committed to *ksm.db* only if all steps succeed; otherwise it is 
discarded. If KSM crashes before a key is committed, that key won't be written 
to the KSM namespace, because the in-memory cache is gone after KSM restarts. 
This is like a write cache in front of ksm.db.
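
To make the in-memory idea concrete, here is a minimal sketch in Python. It is not KSM code: the class name, the method names, and the use of a plain dict in place of ksm.db are all assumptions made for illustration.

```python
class InMemoryPendingKeys:
    """Hypothetical write cache in front of a persistent key store."""

    def __init__(self, persistent_db):
        self._db = persistent_db  # stands in for ksm.db (assumed dict-like)
        self._pending = {}        # memory only; gone after a restart

    def add_key(self, key, block_info):
        # Record the in-progress key without touching the persistent store.
        self._pending[key] = block_info

    def commit_key(self, key):
        # Raises KeyError if the key was never added or was lost in a restart.
        block_info = self._pending.pop(key)
        self._db[key] = block_info  # the only write to the persistent store

    def abort_key(self, key):
        # A failed put simply drops the in-memory entry.
        self._pending.pop(key, None)


ksm_db = {}
cache = InMemoryPendingKeys(ksm_db)
cache.add_key("vol/bucket/a", "block-1")
cache.commit_key("vol/bucket/a")
cache.add_key("vol/bucket/b", "block-2")

# Simulate a KSM restart: the persistent store survives, the cache does not.
cache = InMemoryPendingKeys(ksm_db)
```

In this sketch only commit_key writes to the persistent store, so a successful put costs one write instead of three, and the uncommitted key never appears in the namespace after the simulated restart.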

Another question: why do we need to return a flag to OzoneHandler to determine 
whether a container needs to be created? I am wondering why we need these 
additional RPC calls. Why not let SCM create the container on the datanodes if 
necessary and simply return an open container to the client?
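
A rough sketch of that alternative, in Python. The class and method names are hypothetical and do not reflect the actual SCM interface; the datanode call is replaced by a list that records creations.

```python
class ScmSketch:
    """Hypothetical stand-in for SCM; not the real SCM interface."""

    def __init__(self):
        self.open_containers = set()  # containers already created on datanodes
        self.create_calls = []        # record of creations, for illustration

    def _create_on_datanodes(self, container):
        # In a real system this would be the SCM-to-datanode call.
        self.create_calls.append(container)
        self.open_containers.add(container)

    def allocate_block(self, container):
        # Create the container first if it does not exist yet, so the
        # caller always receives a container that is ready for writes.
        if container not in self.open_containers:
            self._create_on_datanodes(container)
        return container


scm = ScmSketch()
first = scm.allocate_block("container-1")
second = scm.allocate_block("container-1")
```

With this shape the client never sees a "needs creation" flag: the container is created at most once, on the first allocation, and every caller gets back an open container.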

Thanks.

> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>
>                 Key: HDFS-11886
>                 URL: https://issues.apache.org/jira/browse/HDFS-11886
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Chen Liang
>         Attachments: design-notes-putkey.pdf
>
>
> Ozone's putKey operation involves several steps:
> 1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore.
> 2. The allocated block is returned to the client; the client checks whether 
> the container needs to be created on the datanode and, if so, creates it.
> 3. The client writes the data to the container.
> It is possible that 1 succeeds but 2 or 3 fails; in this case there will 
> be an entry in KSM's local metastore, but the key is actually nowhere to be 
> found. We need to revert 1 if 2 or 3 fails in this case. 
> To resolve this, we need at least two things to be implemented first.
> 1. We need deleteKey() to be added to KSM first. 
> 2. We also need container reports to be implemented first such that SCM can 
> track whether the container is actually added.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
