[ https://issues.apache.org/jira/browse/HDFS-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erik Krogen resolved HDFS-14500.
--------------------------------
    Resolution: Fixed

> NameNode StartupProgress continues to report edit log segments after the LOADING_EDITS phase is finished
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14500
>                 URL: https://issues.apache.org/jira/browse/HDFS-14500
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>             Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: HDFS-14500-branch-2.001.patch, HDFS-14500.000.patch, HDFS-14500.001.patch
>
>
> When testing out a cluster with the edit log tailing fast path feature
> enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in
> safe mode for an extended period of time, preventing the NameNode from fully
> completing its startup sequence. We noticed that the Startup Progress web UI
> displayed many edit log segments (millions of them).
> I traced this problem back to {{StartupProgress}}. Within
> {{FSEditLogLoader}}, the loader continually tries to update the startup
> progress with a new {{Step}} any time that it loads edits. Per the Javadoc
> for {{StartupProgress}}, this should be a no-op once startup is completed:
> {code:title=StartupProgress.java}
>  * After startup completes, the tracked data is frozen.  Any subsequent updates
>  * or counter increments are no-ops.
> {code}
> However, {{StartupProgress}} only implements that logic once the _entire_
> startup sequence has been completed.
> When {{FSEditLogLoader}} calls {{beginStep()}}, it adds the step to the
> {{LOADING_EDITS}} phase:
> {code:title=FSEditLogLoader.java}
> StartupProgress prog = NameNode.getStartupProgress();
> Step step = createStartupProgressStep(edits);
> prog.beginStep(Phase.LOADING_EDITS, step);
> {code}
> In our case this phase had ended long before, so it makes no sense to keep
> adding steps to it. I believe it is a bug that {{StartupProgress}} accepts
> such steps instead of ignoring them; once a phase is complete, it should no
> longer change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
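
For illustration only, the behavior the description asks for (a phase that stops accepting steps once it has ended) could look roughly like the sketch below. This is not the attached patch and not the real {{StartupProgress}} class: {{PhaseTrackerSketch}}, its methods, the string step names, and the boolean return value are all hypothetical, chosen to keep the example self-contained.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for StartupProgress (NOT the Hadoop class).
class PhaseTrackerSketch {
  // Phase names are assumptions for this sketch; only LOADING_EDITS is exercised.
  enum Phase { LOADING_FSIMAGE, LOADING_EDITS, SAVING_CHECKPOINT, SAFEMODE }

  // Steps recorded per phase, in insertion order.
  private final Map<Phase, Set<String>> steps = new EnumMap<>(Phase.class);
  // Phases that have ended; their data is treated as frozen.
  private final Set<Phase> ended = EnumSet.noneOf(Phase.class);

  synchronized void beginPhase(Phase phase) {
    if (!ended.contains(phase)) {
      steps.computeIfAbsent(phase, p -> new LinkedHashSet<>());
    }
  }

  synchronized void endPhase(Phase phase) {
    ended.add(phase);
  }

  // Records the step only while the phase is still open.
  // Returns false (a no-op) if the phase has already ended.
  synchronized boolean beginStep(Phase phase, String step) {
    if (ended.contains(phase)) {
      return false;
    }
    steps.computeIfAbsent(phase, p -> new LinkedHashSet<>()).add(step);
    return true;
  }

  synchronized int stepCount(Phase phase) {
    Set<String> recorded = steps.get(phase);
    return recorded == null ? 0 : recorded.size();
  }

  public static void main(String[] args) {
    PhaseTrackerSketch prog = new PhaseTrackerSketch();
    prog.beginPhase(Phase.LOADING_EDITS);
    prog.beginStep(Phase.LOADING_EDITS, "segment-1");
    prog.endPhase(Phase.LOADING_EDITS);
    // Simulates edit log tailing that keeps loading edits after the phase ended.
    prog.beginStep(Phase.LOADING_EDITS, "segment-2");
    System.out.println(prog.stepCount(Phase.LOADING_EDITS)); // prints 1
  }
}
```

The point of the sketch is that the check is made per phase rather than against completion of the whole startup sequence, so a NameNode that is stuck in safe mode would still stop accumulating {{LOADING_EDITS}} steps.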