[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933856#comment-13933856 ]

Andrew Wang commented on HDFS-6102:
-----------------------------------

Doing some back-of-the-envelope math while looking at INodeDirectorySection in 
fsimage.proto: we store a packed uint64 per child. These are varints, but 
assuming the worst case where each one takes the full 10 bytes, the 64MB 
default max message size leaves room for roughly 6.7 million entries.
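
To make that arithmetic concrete, here it is spelled out (a rough sketch; the 
10-byte figure is the maximum wire length of a uint64 varint, and 64MB is 
protobuf's default message size limit):

{code:java}
// Worst-case estimate: how many packed uint64 child ids fit into a single
// INodeDirectorySection message before tripping protobuf's size limit.
public class DirEntrySizeEstimate {
  public static void main(String[] args) {
    long messageLimit = 64L << 20;  // 64MB default protobuf message size limit
    long worstCaseVarint = 10;      // a uint64 varint encodes to at most 10 bytes
    long maxChildren = messageLimit / worstCaseVarint;
    System.out.printf("~%.1f million entries%n", maxChildren / 1e6);  // ~6.7
  }
}
{code}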

There are a couple of approaches here:

1. Split the directory section up into multiple messages, such that each 
message stays under the limit.
2. Up the default from 64MB to the maximum supported value of 512MB, add a 
release note, and assume no one will realistically hit it (see the sketch 
after this list).
3. Enforce a configurable maximum on the number of entries per directory.
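
To illustrate the second option, here is a minimal sketch of raising the limit 
on the CodedInputStream used to parse the image (the file handling and the 
hand-off to the generated parsers are simplified placeholders, not the actual 
loader code in FSImageFormatProtobuf):

{code:java}
import com.google.protobuf.CodedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

// Sketch: parse the fsimage through a CodedInputStream with a raised size
// limit instead of relying on protobuf's 64MB default.
public class RaisedSizeLimit {
  public static void main(String[] args) throws Exception {
    try (InputStream in = new FileInputStream(args[0])) {
      CodedInputStream cis = CodedInputStream.newInstance(in);
      cis.setSizeLimit(512 << 20);  // bump the 64MB default up to 512MB
      // ... hand cis to the generated FsImageProto parsers, as the loader does ...
    }
  }
}
{code}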

I think #3 is the best solution here, under the assumption that no one will 
need 6 million things in a single directory. It still needs a release note, of 
course.
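
For #3, a rough sketch of what the enforcement could look like on the write 
path is below; the config key name, default value, and method are illustrative 
placeholders rather than the actual NameNode code:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

// Illustrative only: cap the number of children per directory so that one
// directory can never produce an INodeDirectorySection entry over the limit.
public class DirectoryItemLimit {
  private final int maxDirItems;

  public DirectoryItemLimit(Configuration conf) {
    // Hypothetical key and default, chosen well under the ~6.7M ceiling above.
    this.maxDirItems = conf.getInt("dfs.namenode.fs-limits.max-directory-items",
        1024 * 1024);
  }

  public void checkAddChild(String parentPath, int currentChildren) throws IOException {
    if (currentChildren + 1 > maxDirItems) {
      throw new IOException("Cannot add to " + parentPath + ": directory already has "
          + maxDirItems + " items (configured maximum)");
    }
  }
}
{code}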

> Cannot load an fsimage with a very large directory
> --------------------------------------------------
>
>                 Key: HDFS-6102
>                 URL: https://issues.apache.org/jira/browse/HDFS-6102
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Blocker
>
> Found by [~schu] during testing. We were creating a bunch of directories in a 
> single directory to blow up the fsimage size, and we ended up hitting this 
> error when trying to load the resulting very large fsimage:
> {noformat}
> 2014-03-13 13:57:03,901 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 INodes.
> 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/dfs/nn/current/fsimage_0000000000024532742, cpktTxId=0000000000024532742)
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
>         at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>         at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>         at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
>         at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
>         at com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9839)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9770)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
> ...
> {noformat}
> Some further research reveals that protobuf enforces a 64MB maximum message 
> size by default, which seems to be what we're hitting here.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
