[ https://issues.apache.org/jira/browse/PHOENIX-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Himanshu Gwalani updated PHOENIX-7672:
--------------------------------------
    Summary: Replication Log Replay should acquire lease on unclosed file before processing (was: ReplicationLogReplay should acquire lease on unclosed file before processing)

> Replication Log Replay should acquire lease on unclosed file before processing
> ------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7672
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7672
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Himanshu Gwalani
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
>
>
> Currently, if a replication log file is not closed gracefully and a reader
> starts reading it before the HDFS lease on the file times out, the reader may
> see only part of the data. The replication replay must therefore ensure that
> the file is closed or, if it is not, acquire the lease on it, and then
> validate the file's header and trailer before processing it.
> Pseudo code (by Andrew Purtell):
> {code:java}
> isClosed = ((LeaseRecoverable) fs).isFileClosed(filePath); // May throw ClassCastException - bad!
> if (isClosed) {
>   // This will assert that the file has a valid header and trailer.
>   reader = createLogReader(filePath); // may throw IOE
> } else {
>   // Recover lease via custom method using LeaseRecoverable#recoverLease.
>   // Wait until we get the lease with retries.
>   recoverLease(filePath); // may throw IOE
>   if (fs.getFileStatus(filePath).getLen() > 0) { // may throw IOE
>     try {
>       // This will assert that the file has a valid header and trailer.
>       reader = createLogReader(filePath); // may throw IOE
>     } catch (IOException e) { // how about MissingTrailerException to give clarity?
>       // check some exception details to confirm it was a trailer issue.
>       // if not a trailer issue, just rethrow the exception. otherwise,
>       // should we continue even though the file is truncated? we are never going
>       // to get that truncated data back, whatever it was. ignoring the whole
>       // file converts potential data loss into certain data loss.
>       LOG.warn("Replication log file {} is missing its trailer, continuing", filePath);
>       reader = createLogReader(filePath, false); // may throw another IOE
>     }
>   } else {
>     // Ignore the file.
>     LOG.info("Ignoring zero length replication log file {}", filePath);
>   }
> }
> // Clean up. Remove the replication log file at filePath.
> {code}
> After PHOENIX-7669, the low-level reader would throw appropriate exceptions.
> Those exceptions need to be handled in replication log replay as part of this
> Jira, along with the lease-acquisition logic shown in the pseudo code above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
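A minimal Java sketch of the replay flow in the pseudo code above, assuming hypothetical stand-in types so it compiles without Hadoop or Phoenix on the classpath: `ReplayFs`, `LeaseRecoverableFs` (loosely mirroring Hadoop's `LeaseRecoverable#isFileClosed` and `#recoverLease`), `LogReader`, `LogReaderFactory`, `MissingTrailerException` and `ReplayOpener` are illustrative names, not existing APIs, and `maxAttempts`/`backoffMs` are illustrative retry parameters. It departs from the pseudo code in two deliberate ways: an `instanceof` check replaces the cast that could throw `ClassCastException`, and a filesystem without lease support is assumed to have no open writers and is treated as closed. Only a `MissingTrailerException` triggers the lenient, trailer-less reopen; any other `IOException` propagates, as the pseudo code's comments suggest.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.logging.Logger;

// Hypothetical stand-in for the part of a filesystem that replay needs.
interface ReplayFs {
    long getLength(String path) throws IOException;
}

// Hypothetical: loosely mirrors the two LeaseRecoverable methods the pseudo code uses.
interface LeaseRecoverableFs extends ReplayFs {
    boolean isFileClosed(String path) throws IOException;
    boolean recoverLease(String path) throws IOException;
}

// Hypothetical: raised by the reader when the header is valid but the trailer is missing.
class MissingTrailerException extends IOException {
    MissingTrailerException(String message) { super(message); }
}

interface LogReader {}

interface LogReaderFactory {
    // requireTrailer == false lets the reader accept a truncated, trailer-less file.
    LogReader open(String path, boolean requireTrailer) throws IOException;
}

final class ReplayOpener {
    private static final Logger LOG = Logger.getLogger(ReplayOpener.class.getName());

    private final ReplayFs fs;
    private final LogReaderFactory readers;
    private final int maxAttempts;
    private final long backoffMs;

    ReplayOpener(ReplayFs fs, LogReaderFactory readers, int maxAttempts, long backoffMs) {
        this.fs = fs;
        this.readers = readers;
        this.maxAttempts = maxAttempts;
        this.backoffMs = backoffMs;
    }

    /** Returns a reader for the file, or null if it is empty and can be skipped. */
    LogReader openForReplay(String path) throws IOException {
        // instanceof check instead of an unchecked cast. Assumption: a filesystem
        // without lease support has no open writers, so its files count as closed.
        if (!(fs instanceof LeaseRecoverableFs) || ((LeaseRecoverableFs) fs).isFileClosed(path)) {
            return readers.open(path, true); // closed cleanly: header and trailer must be valid
        }
        recoverLeaseWithRetries((LeaseRecoverableFs) fs, path);
        if (fs.getLength(path) == 0) {
            LOG.info("Ignoring zero length replication log file " + path);
            return null;
        }
        try {
            return readers.open(path, true);
        } catch (MissingTrailerException e) {
            // Only a missing trailer is tolerated; any other IOException propagates.
            LOG.warning("Replication log file " + path + " is missing its trailer, continuing");
            return readers.open(path, false);
        }
    }

    private void recoverLeaseWithRetries(LeaseRecoverableFs lfs, String path) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (lfs.recoverLease(path)) {
                return; // lease released and file closed
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException("Interrupted recovering lease on " + path);
                }
            }
        }
        throw new IOException("Could not recover lease on " + path + " after " + maxAttempts + " attempts");
    }
}
```

Returning `null` for a zero-length file leaves the final cleanup step (removing the replication log file) to the caller, which keeps lease handling separate from file deletion.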