[ https://issues.apache.org/jira/browse/PHOENIX-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Himanshu Gwalani updated PHOENIX-7669:
--------------------------------------
    Description:
As of now, initializing the ReplicationLogReader performs optional trailer validation ([code reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77]) and no header validation: it seeks to the first row, which throws an IOException if the header is missing, but it does not check whether the header is as expected.

The writer also currently writes the header lazily, i.e. on receiving the first mutation for the log file. This can leave empty (zero-length) log files on the target cluster if the RS crashes before any mutation is written to the target, and the target then cannot validate the header to confirm that it is a correct log file.

DOD:
1. The source writer must add the header as soon as the file is created (instead of waiting for a new mutation).
2. While initializing the ReplicationLogReader, by default it should validate that the file has a valid header and trailer.
3. Another initialization method for the Reader that optionally allows skipping the trailer validation (to handle cases where the RS was not able to close the file successfully).
4. Throw MissingTrailerException / InvalidTrailerException in case of a missing/corrupt trailer (and similarly for the header, i.e. MissingHeaderException).

  was:
As of now, initializing the ReplicationLogReader performs optional trailer validation ([code reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77]) and no header validation: it seeks to the first row, which throws an IOException if the header is missing, but it does not check whether the header is as expected.

The writer also currently writes the header lazily, i.e. on receiving the first mutation for the log file. This can leave empty (zero-length) log files on the target cluster if the RS crashes before any mutation is written to the target, and the target then cannot validate the header to confirm that it is a correct log file.

DOD:
1. The source writer must add the header as soon as the file is created (instead of waiting for a new mutation).
2. While initializing the ReplicationLogReader, it should validate that the file has a valid header and trailer.
3. Another initialization method that optionally allows skipping the trailer validation (to handle cases where the RS was not able to close the file successfully).
4. Throw MissingTrailerException / InvalidTrailerException in case of a missing/corrupt trailer (and similarly for the header, i.e. MissingHeaderException).

> Enhance Header and Trailer validations to gracefully handle unclosed files
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-7669
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7669
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
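
For illustration only, the four DOD items in the description could be sketched roughly as follows. This is not the Phoenix implementation: the class `LogFileSketch`, the magic values `PLOG`/`GOLP`, the in-memory byte layout, and `InvalidHeaderException` are all hypothetical, and only the names MissingHeaderException, MissingTrailerException and InvalidTrailerException come from the description. The sketch also simplifies the missing/corrupt distinction: a file too short to hold a trailer counts as "missing", and any other mismatch counts as "invalid".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Exception names follow the issue text; InvalidHeaderException is an assumed counterpart.
class MissingHeaderException extends IOException {
    MissingHeaderException(String msg) { super(msg); }
}
class InvalidHeaderException extends IOException {
    InvalidHeaderException(String msg) { super(msg); }
}
class MissingTrailerException extends IOException {
    MissingTrailerException(String msg) { super(msg); }
}
class InvalidTrailerException extends IOException {
    InvalidTrailerException(String msg) { super(msg); }
}

// Hypothetical stand-in for a log file: header magic, records, trailer magic.
final class LogFileSketch {
    static final byte[] HEADER_MAGIC = "PLOG".getBytes(StandardCharsets.US_ASCII);
    static final byte[] TRAILER_MAGIC = "GOLP".getBytes(StandardCharsets.US_ASCII);

    // DOD 1: the header is written at creation time, not on the first mutation.
    static ByteArrayOutputStream create() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER_MAGIC, 0, HEADER_MAGIC.length);
        return out;
    }

    // A clean close appends the trailer; a crashed writer never reaches this point.
    static byte[] close(ByteArrayOutputStream out) {
        out.write(TRAILER_MAGIC, 0, TRAILER_MAGIC.length);
        return out.toByteArray();
    }

    // DOD 2: by default, validate both header and trailer.
    static void validate(byte[] file) throws IOException {
        validate(file, true);
    }

    // DOD 3: the caller may skip trailer validation for files that were not closed cleanly.
    static void validate(byte[] file, boolean validateTrailer) throws IOException {
        int h = HEADER_MAGIC.length;
        int t = TRAILER_MAGIC.length;
        // DOD 4: distinct exceptions for missing and corrupt header/trailer.
        if (file.length < h) {
            throw new MissingHeaderException("file too short to contain a header");
        }
        if (!Arrays.equals(Arrays.copyOfRange(file, 0, h), HEADER_MAGIC)) {
            throw new InvalidHeaderException("unexpected header bytes");
        }
        if (!validateTrailer) {
            return;
        }
        if (file.length < h + t) {
            throw new MissingTrailerException("file too short to contain a trailer");
        }
        byte[] tail = Arrays.copyOfRange(file, file.length - t, file.length);
        if (!Arrays.equals(tail, TRAILER_MAGIC)) {
            throw new InvalidTrailerException("unexpected trailer bytes");
        }
    }
}
```

With this layout, a file whose writer crashed right after creation is no longer zero length: `validate(file, false)` accepts it on the strength of the header alone, while the default `validate(file)` rejects it with MissingTrailerException.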