[ https://issues.apache.org/jira/browse/HDFS-11888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16135866#comment-16135866 ]
Anu Engineer commented on HDFS-11888: ------------------------------------- [~xyao] Thanks for creating this patch. I think this is a brilliant patch and Node management, datanode state machines etc. should all move into this pattern. I had considered this pattern and was worried about the state explosion, but now that I am reading the code it is so much simpler to read and understand, so thank you for the patch. Some very minor code comments follow. * {{BlockContainerInfo#addUsed}} -- Should we rename this ? Since it is possible to delete blocks? And may be changeUsed -- so you can pass a negative arg via size or just add one more function. * {{BlockManagerImpl#loadAllocatedContainers}} LOG.warn("Container {} allocated by block service can't be found in SCM", containerName); Should we log this as an Error? I like the fact that we can continue, but may it is a more significant issue? * {{BlockManagerImpl#loadAllocatedContainers}} LOG.warn("Failed loading open container, continue next..."); This error message seems misleading, same with messages below where it talks about open containers. We are opening all Allocated containers now, right? * {{BlockManagerImpl#updateContainer}} nit: There is a commented out if statement, which I think you can remove. * {{BlockManagerImpl#updateContainer}} nit: can we please a comment here, that we are relying on the lock in the {{allocateBlock}} function to make sure that {{contianers}} map remains consistent. * {{BlockManagerImpl#allocateBlock}} Line 325: Can we please add a logging statement here so that we know why the IOException happened ? or return the ex to the user via making SCMException a wrapper to the IOException we got? I just want some way to see the original exception. * {{containerManager.getContainer}} -- Can this function ever return null ? More of a question. We seem to use this with possible return of null and assuming the return will be valid. {code} ContainerInfo container = containerManager.getContainer(containerName); if (container == null) { LOG.warn("Container {} allocated by block service" + "can't be found in SCM", containerName); return true; } {code} in Line 406/407 {code} return containerManager.getContainer( DFSUtil.bytes2String(containerBytes)).getPipeline(); {code} * {{BlockManagerImpl#deleteBlock}} -- Do we need to update the ContainerInfo usedSize ? * {{ContainerInfo#setState}} Instead of Time.now() can we please use {{this.stateEnterTime = Time.monotonicNow()}}. I am just worried that in places wit Daylight Saving you will see the clock wind backwards. * {{ContainerMapping- ctor}} while it is very clear for me when I read code (probably because I did the code review for the state machine) it might be a good idea to write some comments about the state machine. Especially some comments on final states and transitions. Understanding this state machine is going to be vital in understanding container management in SCM. Even over commenting this would not hurt us. * {{ContianerMapping#allocateContainer Line 231}} Replace Time.now ==>{{.setStateEnterTime(Time.monotonicNow())}} * {{ContianerMapping#deleteContainer}} -- Is it possible we need any changes to state during this call ? Does the state move to deleted ? Should this call fire an event ? More of question than an assertion. > Ozone: SCM: use container state machine for open containers allocated for > key/blcoks > ------------------------------------------------------------------------------------- > > Key: HDFS-11888 > URL: https://issues.apache.org/jira/browse/HDFS-11888 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone > Affects Versions: HDFS-7240 > Reporter: Xiaoyu Yao > Assignee: Xiaoyu Yao > Attachments: HDFS-11888-HDFS-7240.001.patch > > > SCM BlockManager provision a pool of containers upon block creation request > that can't be satisfied with current open containers in the pool. However, > only one container is returned with the creationFlag to the client. The other > container provisioned in the same batch will not have this flag. Client can't > assume to use these containers that has not been created on SCM datanodes, > This ticket is opened to fix the issue by persist the createContainerNeeded > flag for the provisioned containers. The flag will be eventually cleared by > processing container report from datanode when the container report handler > is fully implemented on SCM. > For now, we will use a default batch size of 1 for > ozone.scm.container.provision_batch_size so that the container will be > created on demand upon the first block allocation into the container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org