N0tre3l opened a new issue, #290:
URL: https://github.com/apache/logging-log4net/issues/290

   Summary
   MaskXmlInvalidCharacters() in Transform.cs processes UTF-16 code units 
instead of Unicode code points. This causes supplementary characters 
(U+10000–U+10FFFF) encoded as surrogate pairs to be replaced with “?”, 
resulting in silent data corruption in XML log output.
   
   Root Cause
   The regex [^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD] operates on UTF-16 char 
units. A supplementary character is stored as a surrogate pair, and each 
surrogate (U+D800–U+DFFF) falls in the gap between \uD7FF and \uE000, so the 
negated class matches both halves and both are replaced.
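
   The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the actual log4net source: the class and method names (SurrogateMaskRepro, Mask) are hypothetical, and only the pattern is taken from this report. It assumes the mask is applied with a plain Regex.Replace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical repro helper; names are illustrative, not log4net's API.
public static class SurrogateMaskRepro
{
    // Pattern as quoted in this issue. .NET regexes match one UTF-16
    // code unit (System.Char) at a time, not one code point.
    private static readonly Regex InvalidChars =
        new Regex(@"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]");

    // Replaces every code unit matched by the pattern with the mask string.
    public static string Mask(string text, string mask)
    {
        return InvalidChars.Replace(text, mask);
    }

    // Number of UTF-16 code units the pattern would replace.
    public static int CountMasked(string text)
    {
        return InvalidChars.Matches(text).Count;
    }
}
```

   Calling SurrogateMaskRepro.CountMasked("admin\uD83D\uDD11") (the string "admin" followed by U+1F511 written as its surrogate pair) should report two matches, one per surrogate, even though the input contains only one user-visible character beyond "admin".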
   
   Impact
   
   - Silent corruption of non-BMP characters (emoji, CJK extensions, symbols)
   - Loss of original data in XmlLayoutSchemaLog4J XML output
   - Affects structured logs consumed by downstream systems
   
   Example
   Input: admin🔑
   Output: admin?? (one mask character per surrogate)
   
   Suggestion
   Use code-point aware processing (surrogate pair handling or 
XmlConvert.IsXmlChar) instead of regex over UTF-16 code units.
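
   One possible shape for such a fix is sketched below. This is an illustration under stated assumptions, not a proposed patch: XmlMaskSketch and MaskInvalidXmlChars are hypothetical names, and the sketch assumes that well-formed surrogate pairs should pass through unchanged while unpaired surrogates and other invalid characters are still masked. It relies on XmlConvert.IsXmlChar, which takes a single char, so surrogate pairs are detected separately with char.IsHighSurrogate and char.IsLowSurrogate.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical sketch of surrogate-aware masking; not log4net's code.
public static class XmlMaskSketch
{
    public static string MaskInvalidXmlChars(string text, string mask)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // A high surrogate followed by a low surrogate encodes one
            // supplementary code point (U+10000 to U+10FFFF), which is
            // a valid XML 1.0 character. Keep both halves.
            if (char.IsHighSurrogate(c)
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate already copied
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                // Valid BMP character.
                sb.Append(c);
            }
            else
            {
                // Control characters, U+FFFE/U+FFFF, and unpaired
                // surrogates are still masked.
                sb.Append(mask);
            }
        }
        return sb.ToString();
    }
}
```

   With this approach, "admin\uD83D\uDD11" would be returned unchanged, while an input containing a lone surrogate or a control character such as U+0000 would still have that code unit replaced by the mask. Whether a loop like this or an adjusted regex is preferable for log4net is a design choice left to the maintainers.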
   
   Severity
   Low (data integrity issue)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
