[ 
https://issues.apache.org/jira/browse/PHOENIX-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Gwalani updated PHOENIX-7672:
--------------------------------------
    Summary: ReplicationLogReplay should acquire lease on unclosed file before 
processing  (was: ReplicationLogReplay should acquire lease on unclosed file 
before processing them)

> ReplicationLogReplay should acquire lease on unclosed file before processing
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-7672
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7672
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
>
>
> As of now, if a replication log file is not closed gracefully and a reader 
> starts reading it before the HDFS lease on the file times out, the reader 
> may see only partial data. Hence replication replay must ensure that the 
> file is closed or, if it is not, acquire the lease, and then validate the 
> header and trailer in the file before processing it.
> Pseudo code (by Andrew Purtell)
> {code:java}
> isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw ClassCastException - bad!
> if (isClosed) {
>     // This will assert that the file has a valid header and trailer.
>     reader = createLogReader(filePath); // may throw IOE
> } else {
>     // Recover lease via custom method using LeaseRecoverable#recoverLease.
>     // Wait until we get the lease with retries.
>     recoverLease(filePath); // may throw IOE
>     if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
>         try {
>             // This will assert that the file has a valid header and trailer.
>             reader = createLogReader(filePath); // may throw IOE
>         } catch (IOException e) { // how about MissingTrailerException to give clarity?
>             // check some exception details to confirm it was a trailer issue.
>             // if not a trailer issue, just rethrow the exception. otherwise,
>             // should we continue even though the file is truncated? we are never going
>             // to get that truncated data back, whatever it was. ignoring the whole
>             // file converts potential data loss into certain data loss.
>             LOG.warn("Replication log file {} is missing its trailer, continuing", filePath);
>             reader = createLogReader(filePath, false); // may throw another IOE
>         }
>     } else {
>         // Ignore the file.
>         LOG.info("Ignoring zero length replication log file {}", filePath);
>     }
> }
> // Clean up. Remove the replication log file at filePath.
> {code}
> After PHOENIX-7669, the low-level reader will throw appropriate exceptions, 
> and those need to be handled in replication log replay as part of this Jira 
> (along with the lease acquisition logic described in the pseudo code above).
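
The "recover lease with retries" step from the pseudo code, together with a guard against the ClassCastException it flags, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Phoenix implementation: `LeaseOps` is a hypothetical stand-in modeled on the `isFileClosed` and `recoverLease` methods of Hadoop's `LeaseRecoverable` (using a plain `String` path so the example compiles without Hadoop), and `ensureClosed`, `maxAttempts`, and `pauseMs` are invented names.

```java
import java.io.IOException;

class LeaseRecoverySketch {

    /** Hypothetical stand-in for the two LeaseRecoverable methods used in the pseudo code. */
    interface LeaseOps {
        boolean isFileClosed(String path) throws IOException;
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Returns true once the file is known to be closed, either because it already
     * was or because lease recovery succeeded within maxAttempts tries.
     * Returns false if every attempt failed, so the caller can decide what to do.
     */
    static boolean ensureClosed(Object fs, String path, int maxAttempts, long pauseMs)
            throws IOException, InterruptedException {
        // Check the capability up front instead of casting blindly,
        // which is the ClassCastException risk noted in the pseudo code.
        if (!(fs instanceof LeaseOps)) {
            throw new IOException("File system does not support lease recovery: "
                    + fs.getClass().getName());
        }
        LeaseOps ops = (LeaseOps) fs;
        if (ops.isFileClosed(path)) {
            return true; // closed gracefully, nothing to recover
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // A successful recovery, or a file that has closed in the meantime, ends the loop.
            if (ops.recoverLease(path) || ops.isFileClosed(path)) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMs); // fixed pause between attempts
            }
        }
        return false;
    }
}
```

A caller would only go on to create the log reader (and validate header and trailer) after `ensureClosed` returns true. What to do when it returns false, for example retrying the file in a later replay round, is left open here.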



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
