[ 
https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2709:
---------------------------------

    Attachment: HDFS-2709-HDFS-1623.patch

I was indeed able to reproduce this error, usually within about 20 iterations.

Here's an updated patch which fixes that occasional test failure. I don't think 
this will still be an issue once HDFS-2738 goes in, but the fix to get this 
test to pass without that was this change in {{EditLogFileInputStream}}:

{code}
@@ -185,8 +203,9 @@ class EditLogFileInputStream extends EditLogInputStream {
           + logVersion + ". Current version = "
           + HdfsConstants.LAYOUT_VERSION + ".");
     }
-    assert logVersion <= Storage.LAST_UPGRADABLE_LAYOUT_VERSION :
-      "Unsupported version " + logVersion;
+    if (logVersion > Storage.LAST_UPGRADABLE_LAYOUT_VERSION) {
+      throw new IOException("Unsupported version " + logVersion);
+    }
     return logVersion;
   }
{code}

I then ran 70 iterations of {{TestHASafeMode}} with this patch, and never saw 
an error.
                
> HA: Appropriately handle error conditions in EditLogTailer
> ----------------------------------------------------------
>
>                 Key: HDFS-2709
>                 URL: https://issues.apache.org/jira/browse/HDFS-2709
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Critical
>         Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
> HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
> HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
> HDFS-2709-HDFS-1623.patch
>
>
> Currently if the edit log tailer experiences an error replaying edits in the 
> middle of a file, it will go back to retrying from the beginning of the file 
> on the next tailing iteration. This is incorrect since many of the edits will 
> have already been replayed, and not all edits are idempotent.
> Instead, we either need to (a) support reading from the middle of a finalized 
> file (ie skip those edits already applied), or (b) abort the standby if it 
> hits an error while tailing. If "a" isn't simple, let's do "b" for now and 
> come back to 'a' later since this is a rare circumstance and better to abort 
> than be incorrect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to