[jira] [Commented] (HDFS-11888) Ozone: SCM: use container state machine for open containers allocated for key/blcoks

Anu Engineer (JIRA) Mon, 21 Aug 2017 14:29:40 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-11888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135866#comment-16135866
 ]


Anu Engineer commented on HDFS-11888:
-------------------------------------

[~xyao] Thanks for creating this patch. I think this is a brilliant patch and 
Node management, datanode state machines etc. should all move into this 
pattern. I had considered this pattern and was worried about the state 
explosion, but now that I am reading the code it is so much simpler to read and 
understand, so thank you for the patch. Some very minor code comments follow.

* {{BlockContainerInfo#addUsed}} -- Should we rename this ? Since it is 
possible to delete blocks? And may be changeUsed -- so you can pass a negative 
arg via size or just add one more function.

* {{BlockManagerImpl#loadAllocatedContainers}}
LOG.warn("Container {} allocated by block service can't be found in SCM", 
containerName);
Should we log this as an Error? I like the fact that we can continue, but may 
it is a more significant issue?  

* {{BlockManagerImpl#loadAllocatedContainers}} LOG.warn("Failed loading open 
container, continue next...");
This error message seems misleading, same with messages below where it talks 
about open containers. We are opening all Allocated containers now, right? 

* {{BlockManagerImpl#updateContainer}} nit: There is a commented out if 
statement, which I think you can remove.

* {{BlockManagerImpl#updateContainer}} nit: can we please a comment here, that 
we are relying on the lock in the {{allocateBlock}} function to make sure that  
{{contianers}} map remains consistent.

* {{BlockManagerImpl#allocateBlock}} Line 325: Can we please add a logging 
statement here so that we know why the IOException happened ? or return the ex 
to the user via making SCMException a wrapper to the IOException we got?  I 
just want some way to see the original exception.

* {{containerManager.getContainer}} -- Can this function ever return null ?  
More of a question. We seem to use this with possible return of null and 
assuming the return will be valid. {code}
 ContainerInfo container =
              containerManager.getContainer(containerName);
          if (container == null) {
            LOG.warn("Container {} allocated by block service" +
                "can't be found in SCM", containerName);
            return true;
          }
{code}
in Line 406/407
{code}
 return containerManager.getContainer(
          DFSUtil.bytes2String(containerBytes)).getPipeline();
{code}
* {{BlockManagerImpl#deleteBlock}} -- Do we need to update the ContainerInfo 
usedSize ? 
* {{ContainerInfo#setState}} Instead of Time.now() can we please use 
{{this.stateEnterTime = Time.monotonicNow()}}. I am just worried that in places 
wit Daylight Saving you will see the clock wind backwards.
* {{ContainerMapping- ctor}} while it is very clear for me  when I read code 
(probably because I did the code review for the state machine) it might be a 
good idea to write some comments about the state machine. Especially some 
comments on final states and transitions.  Understanding this state machine is 
going to be vital in understanding container management in SCM. Even over 
commenting this would not hurt us.
* {{ContianerMapping#allocateContainer Line 231}} Replace Time.now 
==>{{.setStateEnterTime(Time.monotonicNow())}}
* {{ContianerMapping#deleteContainer}} -- Is it possible we need any changes to 
state during this call ? Does the state move to deleted ? Should this call fire 
an event ? More of question than an assertion.





> Ozone: SCM: use container state machine for open containers allocated for 
> key/blcoks 
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-11888
>                 URL: https://issues.apache.org/jira/browse/HDFS-11888
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11888-HDFS-7240.001.patch
>
>
> SCM BlockManager provision a pool of containers upon block creation request 
> that can't be satisfied with current open containers in the pool. However, 
> only one container is returned with the creationFlag to the client. The other 
> container provisioned in the same batch will not have this flag. Client can't 
> assume to use these containers that has not been created on SCM datanodes, 
> This ticket is opened to fix the issue by persist the createContainerNeeded 
> flag for the provisioned containers. The flag will be eventually cleared by 
> processing container report from datanode when the container report handler 
> is fully implemented on SCM. 
> For now, we will use a default batch size of 1 for 
> ozone.scm.container.provision_batch_size so that the container will be 
> created on demand upon the first block allocation into the container. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-11888) Ozone: SCM: use container state machine for open containers allocated for key/blcoks

Reply via email to