[ 
https://issues.apache.org/jira/browse/PHOENIX-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Gwalani updated PHOENIX-7669:
--------------------------------------
    Description: 
Currently, initializing the ReplicationLogReader performs optional trailer 
validation ([code 
reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77])
but no header validation: it seeks to the first row, which throws an 
IOException if the header is missing, but it does not check whether the 
header is as expected.

Also, the writer currently writes the header lazily (i.e. on receiving the 
first mutation for the log file). This can leave empty (zero-length) log files 
on the target cluster if the RS crashes before any mutation is written to the 
target, and the target would not be able to validate that it is a correct log 
file (essentially, validate the header).
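
A minimal sketch of the eager-header idea, for illustration only. The class 
name EagerHeaderWriter, the magic bytes, the version field and the record 
framing are all assumptions and are not taken from the Phoenix code base; the 
only point shown is that the header is written and flushed in the constructor, 
so a file is never zero-length once the writer exists.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical writer, not the actual Phoenix log file writer.
final class EagerHeaderWriter implements AutoCloseable {
    // Assumed header layout: 4 magic bytes followed by a 4-byte version.
    static final byte[] MAGIC = "PLOG".getBytes(StandardCharsets.US_ASCII);
    static final int VERSION = 1;

    private final DataOutputStream out;

    EagerHeaderWriter(OutputStream stream) throws IOException {
        this.out = new DataOutputStream(stream);
        // Write the header at creation time instead of on the first mutation.
        out.write(MAGIC);
        out.writeInt(VERSION);
        out.flush();
    }

    // Appends one length-prefixed record (assumed framing).
    void append(byte[] record) throws IOException {
        out.writeInt(record.length);
        out.write(record);
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

With this shape, a writer that crashes before the first append still leaves 
behind a file that starts with a recognizable header.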

DOD:
1. The source writer must add the header as soon as the file is created 
(instead of waiting for a new mutation)
2. While initializing the ReplicationLogReader, it should validate that the 
file has a valid header and trailer
3. Provide another initialization method that optionally allows skipping the 
trailer validation (to deal with scenarios where the RS was not able to close 
the file successfully)
4. Throw MissingTrailerException / InvalidTrailerException in case of a 
missing/corrupt trailer (and similar for the header, i.e. 
MissingHeaderException)
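
The reader-side items above could look roughly like the following sketch. It 
is not the Phoenix implementation: LogFileValidator, the header and trailer 
layouts, and InvalidHeaderException are hypothetical and introduced only for 
illustration. The other exception names come from the list above, but their 
definitions here are placeholders.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MissingHeaderException extends IOException {
    MissingHeaderException(String msg) { super(msg); }
}
// Hypothetical, by analogy with InvalidTrailerException.
class InvalidHeaderException extends IOException {
    InvalidHeaderException(String msg) { super(msg); }
}
class MissingTrailerException extends IOException {
    MissingTrailerException(String msg) { super(msg); }
}
class InvalidTrailerException extends IOException {
    InvalidTrailerException(String msg) { super(msg); }
}

// Hypothetical validator working on an in-memory copy of the file.
final class LogFileValidator {
    // Assumed layouts: header = magic + version, trailer = record count + magic.
    static final byte[] MAGIC = "PLOG".getBytes(StandardCharsets.US_ASCII);
    static final int VERSION = 1;
    static final int HEADER_LEN = MAGIC.length + Integer.BYTES;
    static final int TRAILER_LEN = Integer.BYTES + MAGIC.length;

    // Default entry point: validates both header and trailer.
    static void validate(byte[] file) throws IOException {
        validate(file, true);
    }

    // Second entry point: the trailer check can be skipped for unclosed files.
    static void validate(byte[] file, boolean validateTrailer) throws IOException {
        if (file.length < HEADER_LEN) {
            throw new MissingHeaderException("file too short to contain a header");
        }
        ByteBuffer header = ByteBuffer.wrap(file, 0, HEADER_LEN);
        byte[] magic = new byte[MAGIC.length];
        header.get(magic);
        if (!Arrays.equals(magic, MAGIC) || header.getInt() != VERSION) {
            throw new InvalidHeaderException("unexpected magic or version");
        }
        if (!validateTrailer) {
            return;
        }
        if (file.length < HEADER_LEN + TRAILER_LEN) {
            throw new MissingTrailerException("file too short to contain a trailer");
        }
        byte[] tail = Arrays.copyOfRange(file, file.length - MAGIC.length, file.length);
        if (!Arrays.equals(tail, MAGIC)) {
            throw new MissingTrailerException("trailer magic not found at end of file");
        }
        int recordCount = ByteBuffer.wrap(file, file.length - TRAILER_LEN, Integer.BYTES).getInt();
        if (recordCount < 0) {
            throw new InvalidTrailerException("negative record count: " + recordCount);
        }
    }
}
```

In this sketch a header-only file (for example, one left behind by an RS that 
crashed right after creating it) passes validate(file, false) and fails 
validate(file) with MissingTrailerException, while a zero-length file fails 
both with MissingHeaderException.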

  was:
As of now, while initializing the ReplicationLogReader, it has optional trailer 
validation ([code 
reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77])
 and no validation for header (it seeks to the first row, which would throw 
IOException if header is missing, but not validate if header is as expected or 
not).

Also the writer as of now writes header in lazy fashion (i.e. on receiving the 
first mutation for log file). This can lead to empty (zero length) log files on 
target cluster if RS does not receive any mutations for log file rotation time 
(1 min). This would also make it difficult 

DOD:
1. Source writer must add header as soon as the file is created (instead of 
waiting for new mutation)
2. While initializing the ReplicationLogReader, it should validate that file 
has valid header and trailer
3. Another initialization method that optionally allows skipping the trailer 
validation (to deal with scenarios when RS was not able to close the file 
successfully)
4. Throw MissingTrailerException / InvalidTrailerException in case of 
missing/corrupt trailer (and similar for header, i.e. MissingHeaderException)


> Enhance Header and Trailer validations to gracefully handle unclosed files
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-7669
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7669
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
>
>
> Currently, initializing the ReplicationLogReader performs optional trailer 
> validation ([code 
> reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77])
> but no header validation: it seeks to the first row, which throws an 
> IOException if the header is missing, but it does not check whether the 
> header is as expected.
> Also, the writer currently writes the header lazily (i.e. on receiving the 
> first mutation for the log file). This can leave empty (zero-length) log 
> files on the target cluster if the RS crashes before any mutation is written 
> to the target, and the target would not be able to validate that it is a 
> correct log file (essentially, validate the header).
> DOD:
> 1. The source writer must add the header as soon as the file is created 
> (instead of waiting for a new mutation)
> 2. While initializing the ReplicationLogReader, it should validate that the 
> file has a valid header and trailer
> 3. Provide another initialization method that optionally allows skipping the 
> trailer validation (to deal with scenarios where the RS was not able to close 
> the file successfully)
> 4. Throw MissingTrailerException / InvalidTrailerException in case of a 
> missing/corrupt trailer (and similar for the header, i.e. 
> MissingHeaderException)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
