[ 
https://issues.apache.org/jira/browse/PHOENIX-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Gwalani updated PHOENIX-7669:
--------------------------------------
    Description: 
Currently, initializing the ReplicationLogReader performs optional trailer 
validation ([code 
reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77])
 but no header validation: it seeks to the first row, which throws an 
IOException if the header is missing, but it does not check whether the header 
is as expected.

The writer also currently writes the header lazily (i.e. on receiving the 
first mutation for the log file). This can leave empty (zero-length) log files 
on the target cluster if the RS crashes before any mutation is written to the 
target, and the target would then be unable to validate that the file is a 
correct log file (essentially, to validate the header).

DOD:
1. The source writer must write the header as soon as the file is created 
(instead of waiting for the first mutation).
2. By default, initializing the ReplicationLogReader should validate that the 
file has a valid header and trailer.
3. Another initialization method for the Reader should optionally allow 
skipping the trailer validation (to handle cases where the RS was not able to 
close the file successfully).
4. Throw MissingTrailerException / InvalidTrailerException for a missing or 
corrupt trailer (and similarly for the header, i.e. MissingHeaderException).
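
The reader-side behaviour in items 2 to 4 could look roughly like the sketch 
below. It is only an illustration under stated assumptions, not the Phoenix 
implementation: the class name, the 4-byte magic values, the in-memory byte 
array standing in for a file, and InvalidHeaderException are all hypothetical. 
Only MissingHeaderException, MissingTrailerException and 
InvalidTrailerException are named in this issue.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only: the magic values and file layout are hypothetical,
// not the actual Phoenix replication log format.
class LogFileValidationSketch {
    static final byte[] HEADER_MAGIC = "PLOG".getBytes(StandardCharsets.US_ASCII);
    static final byte[] TRAILER_MAGIC = "GOLP".getBytes(StandardCharsets.US_ASCII);

    static class MissingHeaderException extends IOException {
        MissingHeaderException(String msg) { super(msg); }
    }
    // Assumed name: the issue only names MissingHeaderException for the header.
    static class InvalidHeaderException extends IOException {
        InvalidHeaderException(String msg) { super(msg); }
    }
    static class MissingTrailerException extends IOException {
        MissingTrailerException(String msg) { super(msg); }
    }
    static class InvalidTrailerException extends IOException {
        InvalidTrailerException(String msg) { super(msg); }
    }

    /** Default path (item 2): validate both header and trailer. */
    static void validate(byte[] file) throws IOException {
        validate(file, true);
    }

    /** Alternate path (item 3): the caller may skip the trailer check for unclosed files. */
    static void validate(byte[] file, boolean validateTrailer) throws IOException {
        int h = HEADER_MAGIC.length;
        int t = TRAILER_MAGIC.length;
        if (file.length < h) {
            throw new MissingHeaderException("file too short to contain a header");
        }
        if (!Arrays.equals(Arrays.copyOfRange(file, 0, h), HEADER_MAGIC)) {
            throw new InvalidHeaderException("unexpected header bytes");
        }
        if (!validateTrailer) {
            return; // header is valid; trailer deliberately not checked
        }
        if (file.length < h + t) {
            throw new MissingTrailerException("file too short to contain a trailer");
        }
        byte[] tail = Arrays.copyOfRange(file, file.length - t, file.length);
        if (!Arrays.equals(tail, TRAILER_MAGIC)) {
            throw new InvalidTrailerException("unexpected trailer bytes");
        }
    }

    /** Returns "OK" or the simple name of the exception that validation threw. */
    static String check(byte[] file, boolean validateTrailer) {
        try {
            validate(file, validateTrailer);
            return "OK";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    /** Concatenates byte arrays to build small test files in memory. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] headerOnly = HEADER_MAGIC.clone(); // what an eager writer leaves after a crash
        byte[] closed = concat(HEADER_MAGIC, new byte[] {1, 2, 3}, TRAILER_MAGIC);
        System.out.println(check(new byte[0], true));
        System.out.println(check(headerOnly, true));
        System.out.println(check(headerOnly, false));
        System.out.println(check(closed, true));
    }
}
```

With an eagerly written header (item 1), a file left behind by a crashed RS is 
at least header-sized, so the trailer-skipping path can still confirm that it 
is a log file. In this simplified layout an unclosed file that already holds 
records ends in record bytes and is reported as InvalidTrailerException. 
Telling a missing trailer from a corrupt one would need knowledge of the real 
format.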

  was:
Currently, initializing the ReplicationLogReader performs optional trailer 
validation ([code 
reference|https://github.com/apache/phoenix/blob/295848b44600689c626e404fd7a37e84f3c14d02/phoenix-core-server/src/main/java/org/apache/phoenix/replication/log/LogFileFormatReader.java#L58-L77])
 but no header validation: it seeks to the first row, which throws an 
IOException if the header is missing, but it does not check whether the header 
is as expected.

The writer also currently writes the header lazily (i.e. on receiving the 
first mutation for the log file). This can leave empty (zero-length) log files 
on the target cluster if the RS crashes before any mutation is written to the 
target, and the target would then be unable to validate that the file is a 
correct log file (essentially, to validate the header).

DOD:
1. The source writer must write the header as soon as the file is created 
(instead of waiting for the first mutation).
2. Initializing the ReplicationLogReader should validate that the file has a 
valid header and trailer.
3. Another initialization method should optionally allow skipping the trailer 
validation (to handle cases where the RS was not able to close the file 
successfully).
4. Throw MissingTrailerException / InvalidTrailerException for a missing or 
corrupt trailer (and similarly for the header, i.e. MissingHeaderException).


> Enhance Header and Trailer validations to gracefully handle unclosed files
> --------------------------------------------------------------------------
>
>                 Key: PHOENIX-7669
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7669
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Himanshu Gwalani
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: PHOENIX-7562-feature
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
