[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory

2014-03-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6102:
--

Attachment: hdfs-6102-2.patch

Good idea Haohui; the new patch adds some precondition checks and removes that if 
statement. It also adds a new test for the preconditions.
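For readers following along, here is a minimal sketch of the kind of precondition 
check being described (the class name and the exact bound are hypothetical; see the 
attached patch for what was actually added):

{code:java}
import com.google.common.base.Preconditions;

// Hypothetical illustration only, not the patch itself: validate the
// configured max-directory-items value up front, so a bad setting fails
// fast instead of producing an fsimage that cannot be loaded later.
public class DirItemLimitCheckSketch {
  // Assumed upper bound, chosen only to keep the example concrete.
  private static final int MAX_DIR_ITEMS = 64 * 100 * 1000;

  static int checkMaxDirItems(int maxDirItems) {
    Preconditions.checkArgument(maxDirItems > 0 && maxDirItems <= MAX_DIR_ITEMS,
        "maxDirItems must be between 1 and %s, got %s", MAX_DIR_ITEMS, maxDirItems);
    return maxDirItems;
  }
}
{code}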

> Cannot load an fsimage with a very large directory
> --
>
> Key: HDFS-6102
> URL: https://issues.apache.org/jira/browse/HDFS-6102
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch
>
>
> Found by [~schu] during testing. We were creating a bunch of directories in a 
> single directory to blow up the fsimage size, and we ended up hitting this 
> error when trying to load the resulting very large fsimage:
> {noformat}
> 2014-03-13 13:57:03,901 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 
> INodes.
> 2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Failed to load image from 
> FSImageFile(file=/dfs/nn/current/fsimage_00024532742, 
> cpktTxId=00024532742)
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> at 
> com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> at 
> com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
> at 
> com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
> at 
> com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9839)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.<init>(FsImageProto.java:9770)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
> at 
> org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
> at 52)
> ...
> {noformat}
> Some further research reveals there's a 64 MB default maximum size per protobuf 
> message, which seems to be what we're hitting here.
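For reference, the escape hatch the exception message points at looks roughly like 
this when parsing with protobuf-java directly (a sketch only; the helper class below 
is made up for illustration, and the fix being discussed here caps directory size 
rather than raising the parser limit):

{code:java}
import com.google.protobuf.CodedInputStream;
import java.io.InputStream;

// Hypothetical helper, not part of the HDFS code base: builds a
// CodedInputStream whose per-message size limit is raised above the
// 64 MB protobuf default, as suggested by the exception text.
public class LenientCodedInput {
  static CodedInputStream newLenientStream(InputStream in) {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    cis.setSizeLimit(256 * 1024 * 1024); // e.g. 256 MB instead of the 64 MB default
    return cis;
  }
}
{code}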





[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory

2014-03-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6102:
--

Status: Patch Available  (was: Open)






[jira] [Updated] (HDFS-6102) Cannot load an fsimage with a very large directory

2014-03-13 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6102:
--

Attachment: hdfs-6102-1.patch

Patch attached. It's dead simple: it just ups the default in DFSConfigKeys and 
hdfs-default.xml and adds some notes. I also took the opportunity to set the 
max component limit in DFSConfigKeys, since I noticed that HDFS-6055 didn't do 
that.
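To make "ups the default" concrete, the change presumably looks something like the 
constants below (a sketch from memory; the exact key names and default values should 
be checked against the patch and hdfs-default.xml):

{code:java}
// Hypothetical sketch of the defaults being discussed; the values are assumptions.
public class FsLimitsDefaultsSketch {
  public static final String DFS_NAMENODE_MAX_DIRECTORY_ITEMS_KEY =
      "dfs.namenode.fs-limits.max-directory-items";
  public static final int DFS_NAMENODE_MAX_DIRECTORY_ITEMS_DEFAULT = 1024 * 1024;

  public static final String DFS_NAMENODE_MAX_COMPONENT_LENGTH_KEY =
      "dfs.namenode.fs-limits.max-component-length";
  public static final int DFS_NAMENODE_MAX_COMPONENT_LENGTH_DEFAULT = 255;
}
{code}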

I manually tested by adding a million dirs to a single dir, and we hit the limit. 
The NN was able to start up again afterwards, and the fsimage itself was only 78MB 
(most of that probably going to the INode names). I think this is the best case, 
not the worst case, since IIRC the inode numbers start low and count up, but if 
someone wants to verify my envelope math I think it's good to go.
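Rough envelope numbers for anyone double-checking (my own reading, not from the patch):

{noformat}
~1M children in one directory
x <= ~10 bytes per varint-encoded uint64 inode id in the DirEntry message
= ~10 MB worst case for that single DirEntry, comfortably under the 64 MB
  protobuf limit; the rest of the 78MB image is presumably the INodeSection
  (names, permissions), which is split across many small per-INode messages.
{noformat}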




--
This message was sent by Atlassian JIRA
(v6.2#6252)