[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-6102:
------------------------------
    Attachment: hdfs-6102-2.patch

Good idea Haohui, the new patch adds some precondition checks and removes that if statement. It also adds a new test for the preconditions.

> Cannot load an fsimage with a very large directory
> ---------------------------------------------------
>
>                 Key: HDFS-6102
>                 URL: https://issues.apache.org/jira/browse/HDFS-6102
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.4.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>            Priority: Blocker
>         Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch
>
>
> Found by [~schu] during testing. We were creating a bunch of directories in a single directory to blow up the fsimage size, and we ended up hitting this error when trying to load the resulting very large fsimage:
> {noformat}
> 2014-03-13 13:57:03,901 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 INodes.
> 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/dfs/nn/current/fsimage_00024532742, cpktTxId=00024532742)
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
>         at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>         at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>         at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
>         at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
>         at com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9839)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9770)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
>         at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
>         at 52)
> ...
> {noformat}
> Some further research reveals there's a 64MB max size per PB message, which seems to be what we're hitting here.
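For readers following along, here is a minimal sketch of the kind of precondition check described above, using Guava's Preconditions. The class, constant name, and bound are illustrative assumptions of mine, not taken from the patch:

{code:java}
import com.google.common.base.Preconditions;

public class DirectoryLimitCheck {
  // Hypothetical cap (not the patch's actual constant): keep a directory's
  // children small enough that its DirEntry protobuf message stays well
  // under the 64MB CodedInputStream default limit.
  static final int MAX_DIR_ITEMS = 64 * 100 * 1024;

  static void checkMaxDirItems(int maxDirItems) {
    // Fail fast on a bad configuration value instead of failing much
    // later at fsimage load time.
    Preconditions.checkArgument(
        maxDirItems > 0 && maxDirItems <= MAX_DIR_ITEMS,
        "Cannot set dfs.namenode.fs-limits.max-directory-items to %s; "
            + "must be between 1 and %s", maxDirItems, MAX_DIR_ITEMS);
  }

  public static void main(String[] args) {
    checkMaxDirItems(1024 * 1024);  // fine
    checkMaxDirItems(0);            // throws IllegalArgumentException
  }
}
{code}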
[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-6102:
------------------------------
    Status: Patch Available  (was: Open)
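As an aside, the exception quoted in this issue points at CodedInputStream.setSizeLimit() as an escape hatch. For reference, a minimal sketch of what raising the limit looks like; this is illustrative only, since the patch instead caps directory size so the default limit is never reached, and the single parseFrom call below is not how the image loader actually reads the section:

{code:java}
import com.google.protobuf.CodedInputStream;
import org.apache.hadoop.hdfs.server.namenode.FsImageProto;

import java.io.IOException;
import java.io.InputStream;

public class RaisedLimitParse {
  static FsImageProto.INodeDirectorySection.DirEntry parseWithBigLimit(
      InputStream in) throws IOException {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    // The default per-stream limit is 64MB; raise it to 256MB
    // (an arbitrary example value) before parsing.
    cis.setSizeLimit(256 * 1024 * 1024);
    return FsImageProto.INodeDirectorySection.DirEntry.parseFrom(cis);
  }
}
{code}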
[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-6102:
------------------------------
    Attachment: hdfs-6102-1.patch

Patch attached. It's dead simple: it just ups the default in DFSConfigKeys and hdfs-default.xml and adds some notes. I also took the opportunity to set the max component limit in DFSConfigKeys, since I noticed that HDFS-6055 didn't do that.

I manually tested by adding a million dirs to a single dir, and we hit the limit. The NN was able to start up again afterwards, and the fsimage itself was only 78MB (most of that probably going to the INode names). I think this is the best case, not the worst case, since IIRC the inode numbers start low and count up, but if someone wants to verify my envelope math, I think it's good to go.
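To make the envelope math concrete, here is a rough re-derivation with my own numbers (the starting inode id is a guess): each child of a directory is recorded as a varint-encoded inode id in the directory's DirEntry message, so low, sequentially assigned ids encode in just a few bytes each:

{code:java}
// Rough envelope math for the "best case" above: low, sequential inode
// ids encode as short protobuf varints, so even a million children stay
// far below the 64MB CodedInputStream default.
public class EnvelopeMath {
  // Byte length of the protobuf varint encoding of a value.
  static int varintSize(long v) {
    int size = 1;
    while ((v >>>= 7) != 0) {
      size++;
    }
    return size;
  }

  public static void main(String[] args) {
    long children = 1_000_000L;   // the limit hit in the manual test
    long firstId = 16_385L;       // hypothetical starting inode id
    long bytes = 0;
    for (long id = firstId; id < firstId + children; id++) {
      bytes += varintSize(id);
    }
    // Prints roughly 2.9 MB: ids below 2^21 need only 3 varint bytes.
    System.out.printf("child id payload: %.1f MB%n",
        bytes / (1024.0 * 1024.0));
  }
}
{code}

Even the worst case of ten varint bytes per id would put a million children around 10MB, comfortably under the 64MB default.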