[ 
https://issues.apache.org/jira/browse/PHOENIX-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Gwalani updated PHOENIX-7672:
--------------------------------------
    Description: 
As of now, if a replication log file is not closed gracefully and reader start 
reading it before HDFS lease timeout on the file, it can lead to partial data 
being read by the reader. Hence the replication replay must ensure either file 
is closed or if not, acquire the lease and accordingly validate header and 
trailer in the file before processing it.

Pseudo code (by Andrew Purtell)
{code:java}
isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw 
ClassCastException - bad!
if (isClosed) {
    // This will assert that the file has a valid header and trailer.
    reader = createLogReader(filePath); // may throw IOE
} else {
    // Recover lease via custom method using LeaseRecoverable#recoverLease.
    // Wait until we get the lease with retries.
    recoverLease(filePath); // may throw IOE
    if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
        try {
            // This will assert that the file has a valid header and trailer.
            reader = createLogReader(filePath); // may throw IOE
        } catch (IOException e) { // how about MissingTrailerException to give 
clarity? 
            // check some exception details to confirm it was a trailer issue.
            // if not a trailer issue, just rethrow the exception. otherwise,
            // should we continue even though the file is truncated? we are 
never going
            // to get that truncated data back, whatever it was. ignoring the 
whole
            // file converts potential data loss into certain data loss.
            LOG.warn("Replication log file {} is missing its trailer, 
continuing", filePath);
            reader = createLogReader(filePath, false); // may throw another IOE
        }
    } else {
        // Ignore the file.
        LOG.info("Ignoring zero length replication log file {}", filePath);
    }
}
// Clean up. Remove the replication log file at filePath. {code}
After PHOENIX-7669, the low level reader would throw appropriate exceptions 
(MissingTrailerException, MissingHeaderException) to signal why it failed to 
initialze the reader (in cases when file is unclosed/corrupt)


In HBase, there is already a utility class 
([RecoverLeaseFSUtils|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L88C38])
 for this which hasĀ 
[recoverFileLease|https://github.com/apache/hbase/blob/c039e446ed73d7bda96e471be28946ebebc46144/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L98]
 method implemented (along with retries). This can be brought in to Phoenix 
Sources and updated as required.

DOD: Acquire lease logic as mentioned in above pseudo code and handle the 
exceptions appropriately while creating Replication Log Reader.

  was:
As of now, if a replication log file is not closed gracefully and reader start 
reading it before HDFS lease timeout on the file, it can lead to partial data 
being read by the reader. Hence the replication replay must ensure either file 
is closed or if not, acquire the lease and accordingly validate header and 
trailer in the file before processing it.

Pseudo code (by Andrew Purtell)
{code:java}
isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw 
ClassCastException - bad!
if (isClosed) {
    // This will assert that the file has a valid header and trailer.
    reader = createLogReader(filePath); // may throw IOE
} else {
    // Recover lease via custom method using LeaseRecoverable#recoverLease.
    // Wait until we get the lease with retries.
    recoverLease(filePath); // may throw IOE
    if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
        try {
            // This will assert that the file has a valid header and trailer.
            reader = createLogReader(filePath); // may throw IOE
        } catch (IOException e) { // how about MissingTrailerException to give 
clarity? 
            // check some exception details to confirm it was a trailer issue.
            // if not a trailer issue, just rethrow the exception. otherwise,
            // should we continue even though the file is truncated? we are 
never going
            // to get that truncated data back, whatever it was. ignoring the 
whole
            // file converts potential data loss into certain data loss.
            LOG.warn("Replication log file {} is missing its trailer, 
continuing", filePath);
            reader = createLogReader(filePath, false); // may throw another IOE
        }
    } else {
        // Ignore the file.
        LOG.info("Ignoring zero length replication log file {}", filePath);
    }
}
// Clean up. Remove the replication log file at filePath. {code}
After PHOENIX-7669{-}{{-}}, the low level reader would throw appropriate 
exceptions (MissingTrailerException, MissingHeaderException) to signal why it 
failed to initialze the reader (in cases when file is unclosed/corrupt)


DOD: Acquire lease logic as mentioned in above pseudo code and handle the 
exceptions appropriately while creating Replication Log Reader.


> Replication Log Replay should acquire lease on unclosed file before processing
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7672
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7672
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
>
>
> As of now, if a replication log file is not closed gracefully and reader 
> start reading it before HDFS lease timeout on the file, it can lead to 
> partial data being read by the reader. Hence the replication replay must 
> ensure either file is closed or if not, acquire the lease and accordingly 
> validate header and trailer in the file before processing it.
> Pseudo code (by Andrew Purtell)
> {code:java}
> isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw 
> ClassCastException - bad!
> if (isClosed) {
>     // This will assert that the file has a valid header and trailer.
>     reader = createLogReader(filePath); // may throw IOE
> } else {
>     // Recover lease via custom method using LeaseRecoverable#recoverLease.
>     // Wait until we get the lease with retries.
>     recoverLease(filePath); // may throw IOE
>     if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
>         try {
>             // This will assert that the file has a valid header and trailer.
>             reader = createLogReader(filePath); // may throw IOE
>         } catch (IOException e) { // how about MissingTrailerException to 
> give clarity? 
>             // check some exception details to confirm it was a trailer issue.
>             // if not a trailer issue, just rethrow the exception. otherwise,
>             // should we continue even though the file is truncated? we are 
> never going
>             // to get that truncated data back, whatever it was. ignoring the 
> whole
>             // file converts potential data loss into certain data loss.
>             LOG.warn("Replication log file {} is missing its trailer, 
> continuing", filePath);
>             reader = createLogReader(filePath, false); // may throw another 
> IOE
>         }
>     } else {
>         // Ignore the file.
>         LOG.info("Ignoring zero length replication log file {}", filePath);
>     }
> }
> // Clean up. Remove the replication log file at filePath. {code}
> After PHOENIX-7669, the low level reader would throw appropriate exceptions 
> (MissingTrailerException, MissingHeaderException) to signal why it failed to 
> initialze the reader (in cases when file is unclosed/corrupt)
> In HBase, there is already a utility class 
> ([RecoverLeaseFSUtils|https://github.com/apache/hbase/blob/master/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L88C38])
>  for this which hasĀ 
> [recoverFileLease|https://github.com/apache/hbase/blob/c039e446ed73d7bda96e471be28946ebebc46144/hbase-asyncfs/src/main/java/org/apache/hadoop/hbase/util/RecoverLeaseFSUtils.java#L88C22-L98]
>  method implemented (along with retries). This can be brought in to Phoenix 
> Sources and updated as required.
> DOD: Acquire lease logic as mentioned in above pseudo code and handle the 
> exceptions appropriately while creating Replication Log Reader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to