N0tre3l opened a new issue, #290: URL: https://github.com/apache/logging-log4net/issues/290
Summary

MaskXmlInvalidCharacters() in Transform.cs processes UTF-16 code units instead of Unicode code points. As a result, supplementary characters (U+10000–U+10FFFF), which are encoded as surrogate pairs, are replaced with "?", silently corrupting XML log output.

Root Cause

The regex [^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD] operates on UTF-16 char units. A supplementary character is represented as a surrogate pair, and both surrogates (U+D800–U+DFFF) fall outside the allowed ranges, so both are replaced.

Impact

- Silent corruption of non-BMP characters (emoji, CJK extensions, symbols)
- Loss of original data in XmlLayoutSchemaLog4J XML output
- Affects structured logs consumed by downstream systems

Example

Input: admin🔑
Output: admin?

Suggestion

Use code-point-aware processing (surrogate pair handling or XmlConvert.IsXmlChar) instead of a regex over UTF-16 code units.

Severity

Low (data integrity issue)
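
The difference between code-unit and code-point masking described under Root Cause can be sketched outside .NET. The following is a minimal TypeScript analogue, not log4net code: a JavaScript regex without the `u` flag also matches one UTF-16 code unit at a time, so it reproduces the same class of behaviour. The function names `maskByCodeUnit`, `isXmlBmpChar`, and `maskByCodePoint` are hypothetical and chosen only for illustration; the character ranges follow the XML 1.0 Char production.

```typescript
// Character class quoted in the issue. Without the `u` flag, a JavaScript
// regex matches one UTF-16 code unit at a time (an analogue of the reported
// behaviour, not the actual log4net implementation).
const INVALID_CODE_UNITS = /[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]/g;

// Hypothetical stand-in for the code-unit based masking.
function maskByCodeUnit(text: string, mask: string): string {
  return text.replace(INVALID_CODE_UNITS, mask);
}

// True for BMP code units allowed by the XML 1.0 Char production.
function isXmlBmpChar(unit: number): boolean {
  return unit === 0x09 || unit === 0x0a || unit === 0x0d ||
    (unit >= 0x20 && unit <= 0xd7ff) ||
    (unit >= 0xe000 && unit <= 0xfffd);
}

// Hypothetical surrogate-aware variant: a well-formed surrogate pair encodes
// a code point in U+10000..U+10FFFF, which XML 1.0 allows, so it is kept.
// Lone surrogates and other disallowed code units are still masked.
function maskByCodePoint(text: string, mask: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const nextIsLow = next >= 0xdc00 && next <= 0xdfff;
    if (isHigh && nextIsLow) {
      out += text.charAt(i) + text.charAt(i + 1);
      i++; // skip the low surrogate that was just copied
    } else if (isXmlBmpChar(unit)) {
      out += text.charAt(i);
    } else {
      out += mask;
    }
  }
  return out;
}

// "\uD83D\uDD11" is the surrogate pair for U+1F511 (the key emoji).
const input = "admin\uD83D\uDD11";
const corrupted = maskByCodeUnit(input, "?");
const preserved = maskByCodePoint(input, "?");
```

In this sketch `maskByCodeUnit` matches each surrogate separately and so masks both halves of the pair, while `maskByCodePoint` returns the input unchanged and still masks a lone surrogate or a control character such as U+0000. A .NET fix would presumably follow the same shape using the framework's own surrogate checks, but that mapping is an assumption and is not taken from the log4net source.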
