[ https://issues.apache.org/jira/browse/KUDU-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237923#comment-15237923 ]
Mike Percy commented on KUDU-1414:
----------------------------------
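It's possible that the entries were fsynced and then a couple of bytes along the
boundary between the last 2 entries were corrupted. Then Kudu was killed,
leaving the segment without a footer. Recovery would truncate both entries as
if they were a partial write, and we would not detect the data loss.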
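To make that failure mode concrete, here is a minimal C++ sketch. It assumes a toy framing (a 4-byte little-endian length, a 4-byte FNV-1a checksum standing in for a real CRC, then the payload) rather than Kudu's actual segment format, and the names `AppendEntry` and `LenientRecover` are hypothetical. `LenientRecover` models the behavior described in this issue: it stops at the first entry that is short or fails its checksum and treats everything after it as a torn write.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy WAL framing (hypothetical, not Kudu's on-disk format):
// [4-byte little-endian payload length][4-byte checksum][payload]
using Bytes = std::vector<uint8_t>;

// FNV-1a, standing in for a real checksum such as CRC32C.
uint32_t Checksum(const uint8_t* data, size_t n) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < n; ++i) {
    h ^= data[i];
    h *= 16777619u;
  }
  return h;
}

void PutU32(Bytes* out, uint32_t v) {
  for (int i = 0; i < 4; ++i) out->push_back(static_cast<uint8_t>(v >> (8 * i)));
}

uint32_t GetU32(const Bytes& in, size_t off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[off + i]) << (8 * i);
  return v;
}

void AppendEntry(Bytes* wal, const std::string& payload) {
  PutU32(wal, static_cast<uint32_t>(payload.size()));
  PutU32(wal, Checksum(reinterpret_cast<const uint8_t*>(payload.data()), payload.size()));
  wal->insert(wal->end(), payload.begin(), payload.end());
}

// Lenient recovery: keep entries up to the first bad one and silently treat
// the rest as a partial write. Returns the number of entries kept and sets
// *valid_end to the offset where the segment would be truncated.
size_t LenientRecover(const Bytes& wal, size_t* valid_end) {
  size_t off = 0;
  size_t count = 0;
  while (off + 8 <= wal.size()) {
    uint32_t len = GetU32(wal, off);
    if (len > wal.size() - off - 8) break;  // entry runs past EOF
    if (GetU32(wal, off + 4) != Checksum(wal.data() + off + 8, len)) break;  // bad checksum
    off += 8 + len;
    ++count;
  }
  *valid_end = off;
  return count;
}
```

If two bytes straddling the boundary between the third and fourth entries are flipped, `LenientRecover` keeps only the first two entries and returns normally: both fsynced entries are dropped and nothing reports an error.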
It's not safe to scan forward looking for valid offsets in this way. As I wrote
above, the only safe approach I can think of is to look for a known pattern
that constitutes "blank", the simplest of which is zeroes in the rest of the
file. If we find that pattern, I believe we can safely assume truncation
occurred.
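A sketch of the "blank tail" rule, again assuming a toy framing (4-byte little-endian length, 4-byte checksum, payload) and a hypothetical `SafeToTruncate` helper that is handed the offset of the first entry that failed to read. It allows dropping that entry only when everything after the entry's declared end is zero, so a bad record followed by another nonzero record is refused rather than silently truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy framing (hypothetical, not Kudu's format): [u32 len][u32 checksum][payload]
using Bytes = std::vector<uint8_t>;
constexpr size_t kHeaderBytes = 8;

uint32_t GetU32(const Bytes& in, size_t off) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(in[off + i]) << (8 * i);
  return v;
}

// True if every byte in [from, seg.size()) is zero ("blank").
bool AllZeroFrom(const Bytes& seg, size_t from) {
  return std::all_of(seg.begin() + static_cast<std::ptrdiff_t>(from), seg.end(),
                     [](uint8_t b) { return b == 0; });
}

// bad_off is the offset of the first entry that failed to parse or checksum;
// everything before it is assumed valid. Dropping [bad_off, EOF) is accepted
// only if it looks like one partially written record followed by blank space.
bool SafeToTruncate(const Bytes& seg, size_t bad_off) {
  assert(bad_off <= seg.size());
  size_t remaining = seg.size() - bad_off;
  // Too short to hold even a header: no complete entry can be lost.
  if (remaining < kHeaderBytes) return true;
  uint32_t len = GetU32(seg, bad_off);
  // Declared record runs past EOF: consistent with a torn final append.
  if (len > remaining - kHeaderBytes) return true;
  // Anything after the bad record's declared end must be zero.
  return AllZeroFrom(seg, bad_off + kHeaderBytes + len);
}
```

In the border-corruption case, the entry after the bad one is still nonzero, so the check returns false and the caller can report corruption instead of truncating. The sketch trusts the length field of the first bad entry; if corruption pushed that length past EOF, the check would still accept the truncation, so a real implementation would need an extra bound there.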
> Corrupting multiple log entries at the end of a WAL file may go undetected
> --------------------------------------------------------------------------
>
> Key: KUDU-1414
> URL: https://issues.apache.org/jira/browse/KUDU-1414
> Project: Kudu
> Issue Type: Bug
> Components: log
> Affects Versions: 0.8.0
> Reporter: Mike Percy
>
> While looking at KUDU-1377, I investigated how we handle WAL truncation when
> corruption is detected. The way the code is written today, a trailing series
> of corrupt log entries is truncated with only a warning in the log.
> I'll post a unit test demonstrating this behavior.
> One way to get around this is to consider a record partially written, and
> therefore safe to truncate, only if it is followed by zeros rather than by
> arbitrary bad records. We would have to maintain this invariant when
> preallocating space and when truncating partial records before continuing to
> write.
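The writer side of that invariant could look like the following sketch. `ToySegment` is a hypothetical in-memory stand-in for a preallocated segment file: preallocation fills the reserved space with zeros, and `TruncateTo` zeroes everything from the last valid entry onward before the writer resumes. Without that zeroing, a new entry shorter than the torn bytes it overwrites would leave stale nonzero bytes behind it, which a zeros-only reader would later report as corruption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory model of a preallocated WAL segment.
class ToySegment {
 public:
  // Preallocation: reserve the whole segment up front, filled with zeros, so
  // every byte past the write offset reads as "blank".
  explicit ToySegment(size_t capacity) : data_(capacity, 0) {}

  // Appends raw bytes at the current write offset.
  void Write(const std::vector<uint8_t>& bytes) {
    if (bytes.size() > data_.size() - offset_) throw std::length_error("segment full");
    std::copy(bytes.begin(), bytes.end(), data_.begin() + offset_);
    offset_ += bytes.size();
  }

  // Called on reopen after a crash, once the reader has decided that only
  // [0, valid_end) holds complete entries. Re-blanks the torn bytes and
  // everything after them before any new entry is written.
  void TruncateTo(size_t valid_end) {
    assert(valid_end <= data_.size());
    std::fill(data_.begin() + valid_end, data_.end(), 0);
    offset_ = valid_end;
  }

  // The invariant from the issue: every byte at or after the write offset is zero.
  bool TailIsBlank() const {
    return std::all_of(data_.begin() + offset_, data_.end(),
                       [](uint8_t b) { return b == 0; });
  }

  const std::vector<uint8_t>& data() const { return data_; }
  size_t offset() const { return offset_; }

 private:
  std::vector<uint8_t> data_;
  size_t offset_ = 0;
};
```

For example, after writing a 4-byte entry, simulating a torn 6-byte append, and calling `TruncateTo(4)`, a subsequent 2-byte entry lands at offsets 4 and 5 and the bytes that the torn write had touched beyond it read as zero again.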