metsw24-max opened a new pull request, #664: URL: https://github.com/apache/logging-log4cxx/pull/664
This PR fixes an off-by-one error in the UTF-8 three-byte decoding validation logic.

## Problem

`Transcoder::decode` incorrectly rejected the canonical UTF-8 encoding of `U+0800`. The three-byte overlong validation check used:

```cpp
if (rv <= 0x800)
```

which treated `0x0800` itself as invalid, even though it is the smallest valid code point that legitimately requires a three-byte UTF-8 sequence.

As a result, valid UTF-8 input containing the bytes:

```text
E0 A0 80
```

(the canonical encoding of `U+0800`) was decoded as `0xFFFF`, causing the caller to substitute `Transcoder::LOSSCHAR` and silently corrupt the decoded output.

## Fix

Change the validation condition to:

```cpp
if (rv < 0x800)
```

This preserves rejection of true overlong encodings while correctly accepting `U+0800`.

## Tests

Added `testDecodeUTF8_U0800` regression coverage which:

* decodes the literal UTF-8 byte sequence `E0 A0 80`
* verifies the decoded output matches `Transcoder::encode(0x0800, …)`
* asserts that no `LOSSCHAR` is introduced
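
To make the boundary condition concrete, here is a minimal standalone sketch of a three-byte UTF-8 decoder with the corrected overlong check. It is not the actual `Transcoder::decode` implementation: the function name `decodeThreeByteUtf8` and the constant `kInvalid` are hypothetical, chosen only for illustration. The `0xFFFF` error value mirrors the behavior described above, and the surrogate check is an extra assumption that this PR does not discuss.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical error sentinel, mirroring the 0xFFFF value described in the PR.
constexpr std::uint32_t kInvalid = 0xFFFF;

// Decodes exactly one three-byte UTF-8 sequence starting at p.
// Returns the code point, or kInvalid if the sequence is malformed.
std::uint32_t decodeThreeByteUtf8(const unsigned char* p, std::size_t len)
{
    if (len < 3)
    {
        return kInvalid; // truncated input
    }
    if ((p[0] & 0xF0) != 0xE0)
    {
        return kInvalid; // lead byte must be 1110xxxx
    }
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
    {
        return kInvalid; // continuation bytes must be 10xxxxxx
    }

    const std::uint32_t rv = ((p[0] & 0x0Fu) << 12)
                           | ((p[1] & 0x3Fu) << 6)
                           |  (p[2] & 0x3Fu);

    // Overlong check: values below 0x800 fit in one or two bytes.
    // Using `<` (not `<=`) accepts U+0800, the smallest three-byte code point.
    if (rv < 0x800)
    {
        return kInvalid;
    }
    // Assumption beyond the PR text: reject UTF-16 surrogate code points.
    if (rv >= 0xD800 && rv <= 0xDFFF)
    {
        return kInvalid;
    }
    return rv;
}
```

With this check, `E0 A0 80` decodes to `0x0800`, while the overlong sequence `E0 9F BF` (which would otherwise yield `0x07FF`) is still rejected.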
