[ https://issues.apache.org/jira/browse/HDFS-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849052#comment-15849052 ]

Jing Zhao commented on HDFS-4025:
---------------------------------

Thanks for updating the patch, [~hanishakoneru]. The latest patch looks 
pretty good to me. Some minor comments:
# In hdfs-default.xml, "i" --> "if" (also "wil" --> "will")
{code}
+  <name>dfs.journalnode.enable.sync</name>
+  <value>true</value>
+  <description>
+    If true, the journal nodes wil sync with each other. The journal nodes
+    will periodically gossip with other journal nodes to compare edit log
+    manifests and i they detect any missing log segment, they will download
+    it from the other journal nodes.
+  </description>
+</property>
{code}
# In JournalNodeSyncer.java, the following code will generate an 
{{UnsupportedOperationException}} since thisJournalEditLogs is an immutable 
list. In fact this add op can be skipped.
{code}
          if (success) {
            thisJournalEditLogs.add(missingLog);
          }
{code}
# Maybe "Transferring" can be changed to "Downloading"?
{code}
LOG.info("Transferring Missing Edit Log from " + url + " to " + jnStorage
        .getRoot());
{code}
# {{finalEditsFile}} should be {{tmpEditsFile}}.
{code}
    LOG.info("Downloaded file " + tmpEditsFile.getName() + " size " +
        finalEditsFile.length() + " bytes.");
{code}
# In {{TestJournalNodeSync}}, {{jid}} can be declared as final, and 
{{editLogExists}} can be private.
# For {{deleteEditLog}}, we can either change the while loop to an if, or 
refresh the logFile instance within the while loop.
{code}
+   while (logFile.isInProgress()) {
+      dfsCluster.getNameNode(0).getRpcServer().rollEditLog();
{code}
# The following code can be simplified to "Assert.assertTrue("Couldn't delete 
edit log file", deleteFile.delete());"
{code}
+    if (!deleteFile.delete()) {
+      assert false: "Couldn't delete edit log file";
+      return null;
+    }
{code}
# In {{generateEditLog}}, let's also check the result of {{doAnEdit}}, i.e., 
use "Assert.assertTrue(doAnEdit());"
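
To illustrate item 2: calling {{add}} on a read-only list compiles fine but fails at runtime. The standalone sketch below assumes {{thisJournalEditLogs}} behaves like a view returned by {{Collections.unmodifiableList}}; the patch may build the list differently (Guava's {{ImmutableList}} throws the same exception from {{add}}), and the segment names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableListAddDemo {

  /**
   * Returns true if add() on a read-only view throws
   * UnsupportedOperationException.
   */
  public static boolean addThrows() {
    List<String> backing = new ArrayList<>();
    backing.add("edits_1-10"); // illustrative segment name
    // Hypothetical stand-in for thisJournalEditLogs: an unmodifiable view.
    List<String> thisJournalEditLogs = Collections.unmodifiableList(backing);
    String missingLog = "edits_11-20"; // illustrative segment name
    try {
      thisJournalEditLogs.add(missingLog);
      return false; // not reached for an unmodifiable view
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("add() threw UnsupportedOperationException: " + addThrows());
  }
}
```

Running {{main}} prints {{add() threw UnsupportedOperationException: true}}, which is why simply dropping the {{add}} call is the easiest fix.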

> QJM: Synchronize past log segments to JNs that missed them
> ----------------------------------------------------------
>
>                 Key: HDFS-4025
>                 URL: https://issues.apache.org/jira/browse/HDFS-4025
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Hanisha Koneru
>             Fix For: QuorumJournalManager (HDFS-3077)
>
>         Attachments: HDFS-4025.000.patch, HDFS-4025.001.patch, 
> HDFS-4025.002.patch, HDFS-4025.003.patch, HDFS-4025.004.patch, 
> HDFS-4025.005.patch, HDFS-4025.006.patch, HDFS-4025.007.patch, 
> HDFS-4025.008.patch, HDFS-4025.009.patch
>
>
> Currently, if a JournalManager crashes and misses some segment of logs, and 
> then comes back, it will be re-added as a valid part of the quorum on the 
> next log roll. However, it will not have a complete history of log segments 
> (i.e., any individual JN may have gaps in its transaction history). This 
> mirrors the behavior of the NameNode when there are multiple local 
> directories specified.
> However, it would be better if a background thread noticed these gaps and 
> "filled them in" by grabbing the segments from other JournalNodes. This 
> increases the resilience of the system when JournalNodes get reformatted or 
> otherwise lose their local disk.
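
The gap detection described above can be sketched as a manifest comparison: a JN compares the finalized segments it holds with a peer's list and treats every segment it lacks as one to download. This is a minimal standalone sketch, not the {{JournalNodeSyncer}} code from the patch; {{Segment}} and {{findMissingSegments}} are hypothetical names, and two segments are assumed to match only when both their start and end transaction IDs agree.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingSegmentFinder {

  /** Hypothetical minimal stand-in for a finalized edit log segment. */
  public static final class Segment {
    public final long startTxId;
    public final long endTxId;

    public Segment(long startTxId, long endTxId) {
      this.startTxId = startTxId;
      this.endTxId = endTxId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Segment)) {
        return false;
      }
      Segment other = (Segment) o;
      return startTxId == other.startTxId && endTxId == other.endTxId;
    }

    @Override
    public int hashCode() {
      return Long.hashCode(startTxId) * 31 + Long.hashCode(endTxId);
    }

    @Override
    public String toString() {
      return "[" + startTxId + "-" + endTxId + "]";
    }
  }

  /**
   * Returns the segments in the peer's manifest that the local journal lacks.
   * The local list is only read (copied into a set), never modified.
   */
  public static List<Segment> findMissingSegments(List<Segment> local, List<Segment> remote) {
    Set<Segment> have = new HashSet<>(local);
    List<Segment> missing = new ArrayList<>();
    for (Segment s : remote) {
      if (!have.contains(s)) {
        missing.add(s);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    List<Segment> local = new ArrayList<>();
    local.add(new Segment(1, 10));
    local.add(new Segment(21, 30));
    List<Segment> remote = new ArrayList<>();
    remote.add(new Segment(1, 10));
    remote.add(new Segment(11, 20));
    remote.add(new Segment(21, 30));
    System.out.println("Missing: " + findMissingSegments(local, remote));
  }
}
```

Running {{main}} prints {{Missing: [[11-20]]}}. A real syncer would then download each such segment from the peer, as the {{dfs.journalnode.enable.sync}} description puts it; the set lookup keeps the comparison linear in the size of the two manifests.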



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
