metsw24-max opened a new pull request, #669:
URL: https://github.com/apache/logging-log4cxx/pull/669

   Reject UTF-8 encodings of UTF-16 surrogate halves (`U+D800–U+DFFF`) during 
decoding in `Transcoder::decode()`.
   
   RFC 3629 §3 explicitly forbids surrogate-half values in UTF-8. Prior to this 
patch, the decoder accepted these sequences and treated them as valid code 
points, allowing malformed Unicode to enter internal `LogString` 
representations and later be re-emitted unchanged by downstream components.
   
   This patch fixes the issue by rejecting surrogate-half values during the 
existing 3-byte UTF-8 validation path.
   
   ---
   
   ## Changes
   
   ### Decoder Validation
   
   Updated the validation check in:
   
   `src/main/cpp/transcoder.cpp`
   
   from:
   
   ```cpp
   if (rv <= 0x800)
   ```
   
   to:
   
   ```cpp
   if (rv <= 0x800 || (0xD800 <= rv && rv <= 0xDFFF))
   ```
   
   The existing `rv <= 0x800` condition is intentionally left unchanged, even 
though its `<=` also rejects `U+0800` itself, because that comparison belongs 
to the separate `utf8-u0800-boundary-check` issue.
   
   The new clause rejects all UTF-16 surrogate-half code points, which are 
invalid in UTF-8.
   
   ---
   
   ## Tests Added
   
   Added regression coverage in:
   
   `src/test/cpp/helpers/transcodertestcase.cpp`
   
   ### `testDecodeUTF8_RejectSurrogate`
   
   Verifies that the invalid UTF-8 sequence:
   
   ```text
   ED A0 80
   ```
   
   (previously decoded as `U+D800`) is now rejected and converted into 
`LOSSCHAR` substitutions.
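
To see why `ED A0 80` used to decode as `U+D800`, the 3-byte bit arithmetic can be worked through directly. The sketch below is a standalone illustration, not the log4cxx code: `combineThreeBytes` and `rejectedByPatchedCheck` are hypothetical names, and only the condition inside the latter is copied from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a log4cxx function): combines the payload bits of
// a 3-byte UTF-8 sequence 1110xxxx 10xxxxxx 10xxxxxx into one code point.
inline std::uint32_t combineThreeBytes(unsigned char b0, unsigned char b1, unsigned char b2)
{
    assert((b0 & 0xF0) == 0xE0); // lead byte of a 3-byte sequence
    assert((b1 & 0xC0) == 0x80); // first continuation byte
    assert((b2 & 0xC0) == 0x80); // second continuation byte
    return ((b0 & 0x0Fu) << 12) | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
}

// Hypothetical wrapper around the patched condition quoted in this description.
inline bool rejectedByPatchedCheck(std::uint32_t rv)
{
    return rv <= 0x800 || (0xD800 <= rv && rv <= 0xDFFF);
}
```

With these helpers, `combineThreeBytes(0xED, 0xA0, 0x80)` yields `0xD800` (`0xD << 12` plus `0x20 << 6`), which passes the old `rv <= 0x800` test but is caught by the new surrogate clause.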
   
   ### `testDecodeUTF8_SurrogateBoundaries`
   
   Validates correct handling around the surrogate range boundaries:
   
   * `U+D7FF` → accepted
   * `U+D800` → rejected
   * `U+DBFF` → rejected
   * `U+DC00` → rejected
   * `U+DFFF` → rejected
   * `U+E000` → accepted
   
   This confirms that the new clause rejects exactly the surrogate range 
`U+D800–U+DFFF` and still accepts the code points on either side of it.
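
For reference, the byte sequences behind these six boundary cases can be derived with a small encoder. This is only a sketch: `encodeThreeBytes` and `isSurrogateEncoding` are hypothetical helpers that are not part of log4cxx, and the actual test may construct its inputs differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes a code point in [U+0800, U+FFFF] as three
// UTF-8 bytes. It deliberately does NOT reject surrogates, so it can also
// produce the malformed inputs a decoder test needs.
inline std::array<unsigned char, 3> encodeThreeBytes(std::uint32_t cp)
{
    assert(0x0800 <= cp && cp <= 0xFFFF);
    return {{
        static_cast<unsigned char>(0xE0 | (cp >> 12)),
        static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F)),
        static_cast<unsigned char>(0x80 | (cp & 0x3F))
    }};
}

// Hypothetical helper: a 3-byte sequence encodes a surrogate half exactly
// when the lead byte is 0xED and the second byte lies in 0xA0..0xBF.
inline bool isSurrogateEncoding(const std::array<unsigned char, 3>& b)
{
    return b[0] == 0xED && (b[1] & 0xE0) == 0xA0;
}
```

Under this encoding the boundary inputs are `ED 9F BF` (`U+D7FF`), `ED A0 80` (`U+D800`), `ED AF BF` (`U+DBFF`), `ED B0 80` (`U+DC00`), `ED BF BF` (`U+DFFF`), and `EE 80 80` (`U+E000`). Only the four middle sequences satisfy `isSurrogateEncoding`.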

