metsw24-max opened a new pull request, #664:
URL: https://github.com/apache/logging-log4cxx/pull/664

   This PR fixes an off-by-one error in the validation of three-byte UTF-8 
sequences during decoding.
   
   ## Problem
   
   `Transcoder::decode` incorrectly rejected the canonical UTF-8 encoding of 
`U+0800`.
   
   The three-byte overlong validation check used:
   
   ```cpp
   if (rv <= 0x800)
   ```
   
   which treated `0x0800` itself as invalid, even though it is the smallest 
valid code point that legitimately requires a three-byte UTF-8 sequence.
   
   As a result, valid UTF-8 input containing the bytes:
   
   ```text
   E0 A0 80
   ```
   
   (the canonical encoding of `U+0800`) was decoded as `0xFFFF`, causing the 
caller to substitute `Transcoder::LOSSCHAR` and silently corrupt the decoded 
output.
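
To make the boundary concrete, here is a minimal sketch of the shortest-form length rule for UTF-8. The helper name `utf8Length` is hypothetical and is not part of log4cxx; it only illustrates why `0x07FF` is the last two-byte code point and `0x0800` the first three-byte one.

```cpp
#include <cstdint>

// Hypothetical helper (not a log4cxx API): number of bytes in the
// shortest UTF-8 encoding of a code point. Assumes cp <= 0x10FFFF.
inline int utf8Length(std::uint32_t cp)
{
    if (cp < 0x80)    return 1;  // 7 payload bits
    if (cp < 0x800)   return 2;  // 11 payload bits, up to 0x07FF
    if (cp < 0x10000) return 3;  // 16 payload bits, starts at 0x0800
    return 4;
}
```

A three-byte sequence whose decoded value is below `0x800` is therefore overlong, while one that decodes to exactly `0x800` is already in its shortest form.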
   
   ## Fix
   
   Change the validation condition to:
   
   ```cpp
   if (rv < 0x800)
   ```
   
   This preserves rejection of true overlong encodings while correctly 
accepting `U+0800`.
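
For illustration, a standalone sketch of a three-byte decode step with the corrected check follows. It is not the actual `Transcoder::decode` code: the function name `decodeThreeByte` and its signature are assumptions made for this example, and only the `rv < 0x800` comparison and the `0xFFFF` failure value are taken from the description above.

```cpp
#include <cstdint>

// Hypothetical standalone helper, not the log4cxx implementation.
// Decodes one three-byte UTF-8 sequence and returns 0xFFFF on invalid
// input, mirroring the failure value mentioned in this PR.
// Surrogate code points are deliberately not checked in this sketch.
inline std::uint32_t decodeThreeByte(unsigned char b0,
                                     unsigned char b1,
                                     unsigned char b2)
{
    const std::uint32_t invalid = 0xFFFF;

    // Lead byte must be 1110xxxx; continuation bytes must be 10xxxxxx.
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
    {
        return invalid;
    }

    std::uint32_t rv = ((b0 & 0x0Fu) << 12)
                     | ((b1 & 0x3Fu) << 6)
                     |  (b2 & 0x3Fu);

    // Overlong check: values below 0x800 fit in one or two bytes.
    // With `<=` here, the valid sequence E0 A0 80 would be rejected.
    if (rv < 0x800)
    {
        return invalid;
    }
    return rv;
}
```

Under these assumptions, `decodeThreeByte(0xE0, 0xA0, 0x80)` yields `0x0800`, while the overlong sequence `E0 9F BF` (which would decode to `0x07FF`) is still rejected.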
   
   ## Tests
   
   Added a regression test, `testDecodeUTF8_U0800`, which:
   
   * decodes the literal UTF-8 byte sequence `E0 A0 80`
   * verifies the decoded output matches `Transcoder::encode(0x0800, …)`
   * asserts that no `LOSSCHAR` is introduced

