[ 
https://issues.apache.org/jira/browse/HDFS-14771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922518#comment-16922518
 ] 

Stephen O'Donnell commented on HDFS-14771:
------------------------------------------

The failures on branch 2 made me double-check branch 3. I believed I had 
tested this, but I must have made a mistake when I did.

An image with the sub-sections will not load in the 3.3.x branch without the 
patch in place. It will fail with a message like this:

 
{code:java}
2019-09-04 11:48:35,638 ERROR namenode.FSImage: Failed to load image from FSImageFile(file=/tmp/hadoop-sodonnell/dfs/name/current/fsimage_0000000000000000002, cpktTxId=0000000000000000002)
java.io.IOException: Unrecognized section INODE_SUB
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:309)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:238)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:954)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:938)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:800)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:720)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:643)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:705)
{code}
This is because of this code, in FSImageFormatProtobuf, which I cannot believe 
I missed:
{code:java}
String n = s.getName();
SectionName sectionName = SectionName.fromString(n);
if (sectionName == null) {
  throw new IOException("Unrecognized section " + n);
}
{code}
Basically, it reads the section name as text from the image and attempts to 
convert it into an enum value. When the name is not recognized, the lookup 
returns null and the load fails.
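
To make the failure mode concrete, here is a minimal, self-contained sketch of that lookup. It is not the Hadoop source: the class name SectionLookupSketch, the reduced set of enum constants and the resolve helper are assumptions made purely for illustration. It only shows how a name-to-enum lookup that returns null for unknown names turns a new section such as INODE_SUB into an IOException on a loader that predates it.

```java
import java.io.IOException;

// Simplified stand-in for the loader's section lookup; NOT the Hadoop source.
class SectionLookupSketch {

  // Hypothetical subset of the section names an unpatched loader knows about.
  enum SectionName {
    NS_INFO, INODE, INODE_DIR;

    // Returns the matching constant, or null when the name is unknown.
    static SectionName fromString(String name) {
      for (SectionName n : values()) {
        if (n.name().equals(name)) {
          return n;
        }
      }
      return null;
    }
  }

  // Same shape as the check quoted above: an unknown name aborts the load.
  static SectionName resolve(String name) throws IOException {
    SectionName sectionName = SectionName.fromString(name);
    if (sectionName == null) {
      throw new IOException("Unrecognized section " + name);
    }
    return sectionName;
  }

  public static void main(String[] args) {
    // INODE is known to this sketch; INODE_SUB is not, so it is rejected.
    for (String name : new String[] {"INODE", "INODE_SUB"}) {
      try {
        System.out.println("Loaded section " + resolve(name));
      } catch (IOException e) {
        System.out.println(e.getMessage());
      }
    }
  }
}
```

Because the check rejects every name it does not know, any image that carries a section added by a newer writer is unreadable by an older reader, regardless of whether the rest of the image is compatible.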

Looking at the branches, 3.1 has layout version -64, and then 3.2 and 3.3 have 
-65.

I wonder what the best solution to this is. One option is to make the feature 
off by default, so that a user would need to make a decision to enable it. 
Another option may be to disable the feature while a rolling upgrade is in 
progress, so the new feature would not be enabled until the upgrade was 
finalized. Or we could create a new layout version for the 3.3 branch, but 
that does not really help with backporting this to branch 2.
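
As a rough illustration of how the first two options could combine, the sketch below gates the writing of sub-sections on an opt-in flag and on no rolling upgrade being in progress. Everything in it is hypothetical: the class, the method and both parameters are invented for this example and do not correspond to existing Hadoop configuration keys or APIs.

```java
// Hypothetical guard for the image writer; names are illustrative only.
class SubSectionGuardSketch {

  // Write sub-sections only when the user has opted in AND no rolling
  // upgrade is in progress, so an older NameNode never sees them mid-upgrade.
  static boolean shouldWriteSubSections(boolean enabledByUser,
                                        boolean rollingUpgradeInProgress) {
    return enabledByUser && !rollingUpgradeInProgress;
  }

  public static void main(String[] args) {
    System.out.println(shouldWriteSubSections(true, false));
    System.out.println(shouldWriteSubSections(true, true));
    System.out.println(shouldWriteSubSections(false, false));
  }
}
```

With a guard of this shape, an image written before the upgrade is finalized stays readable by the previous version, which keeps a downgrade possible.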

> Backport HDFS-14617 to branch-2 (Improve fsimage load time by writing 
> sub-sections to the fsimage index)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14771
>                 URL: https://issues.apache.org/jira/browse/HDFS-14771
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.10.0
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>              Labels: release-blocker
>         Attachments: HDFS-14771.branch-2.001.patch, 
> HDFS-14771.branch-2.002.patch
>
>
> This JIRA aims to backport HDFS-14617 to branch-2: improve fsimage load 
> time by writing sub-sections to the fsimage index.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
