[ https://issues.apache.org/jira/browse/HDFS-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243424#comment-14243424 ]
Chris Nauroth commented on HDFS-7470:
-------------------------------------

Creating a new {{FSNamesystem}} instance without running the full shutdown sequence on the old one would create a risk of some dangerous side effects.

* A new namesystem lock instance would get created, and there would be no synchronization of multiple threads around this. This could violate mutual exclusion: two different threads could hold the two different lock instances and each think that mutual exclusion has been enforced.
* We wouldn't reap background threads inside things like the {{BlockManager}} and {{CacheManager}}. Over time, we'd slowly leak threads and eventually hit {{OutOfMemoryError}} conditions.
* I can't remember whether we hold an open file descriptor on the edit log when running as SecondaryNameNode. If we do, then discarding the old {{FSNamesystem}} without a proper shutdown would leak a file descriptor.

In general, there are widespread assumptions throughout the codebase that {{FSNamesystem}} is instantiated exactly once and retained for the entire process lifetime.
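The first bullet above can be illustrated with a minimal, self-contained Java sketch. The plain {{ReentrantLock}} objects and variable names below are stand-ins for the namesystem lock, not actual HDFS code:

```java
import java.util.concurrent.locks.ReentrantLock;

public class TwoLockHazard {
    public static void main(String[] args) {
        // Stand-ins for the lock owned by the old FSNamesystem and the
        // lock owned by a freshly created replacement instance.
        ReentrantLock oldNamesystemLock = new ReentrantLock();
        ReentrantLock newNamesystemLock = new ReentrantLock();

        // "Thread A" enters the critical section via the old instance.
        oldNamesystemLock.lock();

        // "Thread B" (simulated in the same thread for brevity) locks the
        // *new* instance -- and succeeds, because the two lock objects
        // know nothing about each other. Mutual exclusion is gone.
        boolean acquired = newNamesystemLock.tryLock();
        System.out.println("second lock acquired: " + acquired); // true
    }
}
```

Both "threads" end up inside the critical section at once, which is exactly the hazard of swapping in a new {{FSNamesystem}} while the old one is still in use.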
> SecondaryNameNode needs twice the memory when calling reloadFromImageFile
> -------------------------------------------------------------------------
>
>                 Key: HDFS-7470
>                 URL: https://issues.apache.org/jira/browse/HDFS-7470
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>         Attachments: HDFS-7470.1.patch, HDFS-7470.patch
>
> histo information at 2014-12-02 01:19
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     186449630    19326123016  [Ljava.lang.Object;
>    2:     157366649    15107198304  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     183409030    11738177920  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     157358401     5244264024  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    6:      29253275     1872719664  [B
>    7:       3230821      284312248  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2756284      110251360  java.util.ArrayList
>    9:        469158       22519584  org.apache.hadoop.fs.permission.AclEntry
>   10:           847       17133032  [Ljava.util.HashMap$Entry;
>   11:        188471       17059632  [C
>   12:        314614       10067656  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   13:        234579        9383160  com.google.common.collect.RegularImmutableList
>   14:         49584        6850280  <constMethodKlass>
>   15:         49584        6356704  <methodKlass>
>   16:        187270        5992640  java.lang.String
>   17:        234579        5629896  org.apache.hadoop.hdfs.server.namenode.AclFeature
> {quote}
> histo information at 2014-12-02 01:32
> {quote}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     355838051    35566651032  [Ljava.lang.Object;
>    2:     302272758    29018184768  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    3:     352500723    22560046272  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    4:     302264510    10075087952  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    5:     177120233     9374983920  [B
>    6:             3     3489661000  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    7:       6191688      544868544  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
>    8:       2799256      111970240  java.util.ArrayList
>    9:        890728       42754944  org.apache.hadoop.fs.permission.AclEntry
>   10:        330986       29974408  [C
>   11:        596871       19099880  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>   12:        445364       17814560  com.google.common.collect.RegularImmutableList
>   13:           844       17132816  [Ljava.util.HashMap$Entry;
>   14:        445364       10688736  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   15:        329789       10553248  java.lang.String
>   16:         91741        8807136  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction
>   17:         49584        6850280  <constMethodKlass>
> {quote}
> And the stack trace shows it was doing reloadFromImageFile:
> {quote}
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getInode(FSDirectory.java:2426)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:160)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:121)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:902)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:888)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.reloadFromImageFile(FSImage.java:562)
> at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:1048)
> at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:536)
> at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:388)
> at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:354)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1630)
> at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
> at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:350)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
> So before doing reloadFromImageFile, I think we need to release the old
> namesystem to prevent a SecondaryNameNode OOM.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
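The doubling the reporter describes is visible in the two histograms (e.g. {{INodeFile}} grows from ~157M to ~302M instances between 01:19 and 01:32). Independent of HDFS internals, it follows from keeping the old namespace reachable while the new image is loaded. A minimal Java sketch of the pattern and the suggested fix; all names here are illustrative only and this is not the actual HDFS-7470 patch:

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadSketch {
    // Stand-in for loading an fsimage into memory (illustrative only).
    static Map<Long, byte[]> loadImage() {
        Map<Long, byte[]> m = new HashMap<>();
        for (long i = 0; i < 1000; i++) {
            m.put(i, new byte[1024]); // pretend these are INodes/blocks
        }
        return m;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> namespace = loadImage();

        // Problematic pattern: the old map stays reachable through
        // 'namespace' for the whole duration of the new load, so peak
        // heap usage is roughly twice the namespace size:
        //   namespace = loadImage();

        // Suggested pattern: drop the old reference first so the GC can
        // reclaim it while the new image is being loaded.
        namespace = null;
        namespace = loadImage();

        System.out.println("reloaded entries: " + namespace.size()); // 1000
    }
}
```

In the real code this corresponds to tearing down (or at least dereferencing) the old namesystem state before {{reloadFromImageFile}} builds the new one, rather than after.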