[ 
https://issues.apache.org/jira/browse/HDFS-11886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030685#comment-16030685
 ] 

Anu Engineer commented on HDFS-11886:
-------------------------------------

bq. do we really need to persist this? Can we store the state in memory only? 
bq. Only if all succeed, commit this to ksm.db, otherwise dispose it. 
I am fine with doing that, and it does make for a simpler system. Thanks for 
pointing it out. There are two things that we must keep in mind.
# In the long run, we will have KSM HA; that is, KSM state will be replicated 
via Raft (Ratis). This means that we have to write this data to a log, so that if 
the active KSM dies, a follower is able to pick up the state.
# Again, in the future, we will have to support an API which allows end users to 
put single files that are larger than 5 GB. This will be achieved by putting 
multiple files into the system and using an API to connect these independent 
uploads into a single file. While these independent uploads are going on, 
saving that state might be useful.
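The second point above can be sketched as a multipart-style upload tracker: parts are uploaded independently, and a final call connects them into one key. If the in-progress map is persisted (or Raft-replicated), a restarted or follower KSM can resume from it. All class and method names below are hypothetical, not the actual KSM API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: in-progress uploads tracked separately from committed keys.
class MultipartSketch {
  // uploadId -> ordered list of block ids uploaded so far (the state
  // worth persisting while independent uploads are in flight)
  private final Map<String, List<String>> pending = new LinkedHashMap<>();
  // committed key name -> ordered block ids
  private final Map<String, List<String>> committed = new LinkedHashMap<>();

  void addPart(String uploadId, String blockId) {
    pending.computeIfAbsent(uploadId, k -> new ArrayList<>()).add(blockId);
  }

  // Connect the independent uploads into a single key.
  void complete(String uploadId, String keyName) {
    committed.put(keyName, pending.remove(uploadId));
  }

  List<String> blocksOf(String keyName) {
    return committed.get(keyName);
  }
}
```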

But as I said, in the short term it is okay to do what you are suggesting; in 
the long run, we will have to persist this data. Please note that we have one I/O 
for a full block write: assuming a block size of 256 MB, KSM is doing one 
local I/O for each 256 MB, plus 3 I/Os on the datanodes. Yes, it can be expensive, 
but it is nothing more than what HDFS does today.
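A quick worked example of that I/O estimate: one ksm.db write per allocated block, and one write per replica on the datanode side (assuming 3x replication, as in HDFS). The helper names are illustrative only.

```java
// Sketch: count metadata and data writes for a file of a given size,
// under the assumptions stated in the comment above.
class IoEstimate {
  static final long BLOCK_SIZE = 256L * 1024 * 1024; // 256 MB blocks
  static final int REPLICATION = 3;                  // HDFS-style 3x

  // one local ksm.db write per allocated block (round up)
  static long ksmWrites(long fileBytes) {
    return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
  }

  // one write per replica per block on the datanodes
  static long datanodeWrites(long fileBytes) {
    return ksmWrites(fileBytes) * REPLICATION;
  }
}
```

So a 1 GB file costs 4 KSM local writes and 12 datanode writes under these assumptions.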


bq. Another question: why we need to return a flag to OzoneHandler to determine 
if a container needs to be created? I

We have debated this a lot and have gone both ways. Originally we had SCM talk to 
datanodes to create the containers. Then we realized that reasoning about the 
state of the SCM <-> datanode interaction is more complex if SCM is actively 
connecting to datanodes. Hence we borrowed the model that is used in HDFS.

That is, datanodes talk to SCM via heartbeats, and all commands that SCM wants to 
send are delivered via the heartbeat response. The client who wants to write to a 
block ends up creating the block in the HDFS world; so in the Ozone world, the 
ozone handler acts as the DFSClient of the HDFS world.
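The heartbeat model described above can be sketched as follows: SCM never dials out to datanodes; it only queues commands (e.g. a container-create request) and hands them back when the datanode heartbeats in. The class and method names are illustrative, not the real SCM protocol.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Sketch: commands flow from SCM to datanodes only via heartbeat responses.
class ScmSketch {
  private final Map<String, Queue<String>> pendingCommands = new HashMap<>();

  // SCM decides a datanode should do something and queues the command;
  // it never opens a connection to the datanode itself.
  void queueCommand(String datanodeId, String command) {
    pendingCommands.computeIfAbsent(datanodeId, k -> new ArrayDeque<>())
        .add(command);
  }

  // Datanode heartbeats in; the response carries any queued commands.
  List<String> heartbeat(String datanodeId) {
    List<String> response = new ArrayList<>();
    Queue<String> q = pendingCommands.getOrDefault(datanodeId, new ArrayDeque<>());
    while (!q.isEmpty()) {
      response.add(q.poll());
    }
    return response;
  }
}
```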

bq. ksm-keys-under-progress.db to ksm.db. This introduces more times of writes 
to disk

Agreed. In fact, [~xyao] suggested that we use the same database with a flag. 
This has the advantage that we use an LSM tree (LevelDB), where these writes are 
optimized.
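A minimal sketch of that single-database-with-a-flag idea: in-progress and committed keys live in the same store, distinguished by a state byte in the value, and the commit is just a second put, which an LSM tree (such as LevelDB) turns into a sequential log append. A `TreeMap` stands in for LevelDB here; the flag names are illustrative, not the actual schema.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: one sorted key/value store holds both in-progress and
// committed keys, distinguished by a leading state byte in the value.
class KeyStateDb {
  static final byte IN_PROGRESS = 0;
  static final byte COMMITTED = 1;

  private final Map<String, byte[]> db = new TreeMap<>(); // stand-in for LevelDB

  // First put: record the key as in progress alongside its block info.
  void startPutKey(String key, byte[] blockInfo) {
    db.put(key, withState(IN_PROGRESS, blockInfo));
  }

  // Second put: flip the flag. Two puts, but both are the sequential
  // write pattern an LSM tree optimizes for.
  void commitKey(String key) {
    byte[] v = db.get(key);
    if (v != null) {
      v[0] = COMMITTED;
      db.put(key, v);
    }
  }

  boolean isCommitted(String key) {
    byte[] v = db.get(key);
    return v != null && v[0] == COMMITTED;
  }

  private static byte[] withState(byte state, byte[] payload) {
    byte[] out = new byte[payload.length + 1];
    out[0] = state;
    System.arraycopy(payload, 0, out, 1, payload.length);
    return out;
  }
}
```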


> Ozone : improve error handling for putkey operation
> ---------------------------------------------------
>
>                 Key: HDFS-11886
>                 URL: https://issues.apache.org/jira/browse/HDFS-11886
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Chen Liang
>         Attachments: design-notes-putkey.pdf
>
>
> Ozone's putKey operation involves a couple of steps:
> 1. KSM calls allocateBlock on SCM and writes this info to KSM's local metastore.
> 2. The allocated block gets returned to the client; the client checks to see if 
> a container needs to be created on the datanode, and if yes, creates the container.
> 3. The client writes the data to the container.
> It is possible that 1 succeeded but 2 or 3 failed; in this case there will 
> be an entry in KSM's local metastore, but the key is actually nowhere to be 
> found. We need to revert 1 if 2 or 3 failed in this case. 
> To resolve this, we need at least two things to be implemented first.
> 1. We need deleteKey() to be added to KSM first. 
> 2. We also need container reports to be implemented first such that SCM can 
> track whether the container is actually added.
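The revert described in the issue can be sketched as a compensating delete: if the container/data steps fail after the block allocation was recorded, the KSM-side entry is rolled back with deleteKey(). Interface and method names are hypothetical stand-ins for the real KSM/datanode APIs.

```java
// Sketch: putKey with a compensating deleteKey() on failure, so no
// orphan metastore entry survives a failed write.
class PutKeySketch {
  interface Ksm {
    String allocateBlock(String key); // step 1: record in KSM metastore
    void deleteKey(String key);       // revert for step 1
  }

  interface Datanode {
    void writeBlock(String blockId, byte[] data); // steps 2 and 3 combined
  }

  static boolean putKey(Ksm ksm, Datanode dn, String key, byte[] data) {
    String blockId = ksm.allocateBlock(key);
    try {
      dn.writeBlock(blockId, data);
      return true;
    } catch (RuntimeException e) {
      ksm.deleteKey(key); // revert step 1 so no orphan entry remains
      return false;
    }
  }
}
```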



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
