FreeAndNil opened a new pull request, #291:
URL: https://github.com/apache/logging-log4net/pull/291

   fixes #290
   
   The regex [^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD] operated on individual 
UTF-16 code units, so both halves of a valid surrogate pair were replaced. 
This silently corrupted supplementary characters (U+10000–U+10FFFF), such as 
emoji, in XML log output.
   
   Fix by prepending a surrogate-pair alternative to the regex so valid pairs 
are matched and preserved as a unit; only lone surrogates and other XML-illegal 
code units are replaced with the mask string.
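
For illustration only, here is a minimal sketch of the idea, not the code in this PR. The class name `XmlMaskSketch`, the method names, and the exact shape of the new pattern are assumptions; only the original character class is taken from the description above. The first alternative consumes a valid high+low pair as a unit, and the match evaluator re-emits it unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of log4net.
public static class XmlMaskSketch
{
    // Old behaviour: any single UTF-16 code unit outside the XML 1.0 BMP
    // ranges matches, which includes each half of a surrogate pair.
    private static readonly Regex PerCodeUnit =
        new Regex(@"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]");

    // Assumed new shape: group 1 matches a valid surrogate pair first,
    // so the second alternative only ever sees lone surrogates and
    // other XML-illegal code units.
    private static readonly Regex SurrogateAware =
        new Regex(@"([\uD800-\uDBFF][\uDC00-\uDFFF])|[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]");

    public static string MaskOld(string text, string mask) =>
        PerCodeUnit.Replace(text, mask);

    public static string MaskNew(string text, string mask) =>
        // Keep valid pairs as-is; replace everything else that matched.
        SurrogateAware.Replace(text, m => m.Groups[1].Success ? m.Value : mask);
}
```

With a mask of "?", `MaskOld` turns "a\U0001F600b" into "a??b", while `MaskNew` leaves it unchanged and still masks a lone "\uD83D".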
   
   Also optimise CountSubstrings: use a char loop for single-character 
substrings (all current callers) and StringComparison.Ordinal for the 
multi-character CDATA token path.
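
A rough sketch of the two paths described above, under stated assumptions: the class name `CountSketch`, the signature, the null/empty handling, and the non-overlapping counting are guesses for illustration and may differ from the actual method in this PR.

```csharp
using System;

// Hypothetical helper for illustration; not the log4net implementation.
public static class CountSketch
{
    public static int CountSubstrings(string text, string substring)
    {
        // Assumption: null or empty inputs count as zero occurrences.
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(substring))
        {
            return 0;
        }

        int count = 0;

        // Fast path: a single character needs only a plain char comparison.
        if (substring.Length == 1)
        {
            char target = substring[0];
            foreach (char c in text)
            {
                if (c == target)
                {
                    count++;
                }
            }
            return count;
        }

        // Multi-character path: ordinal search avoids culture-sensitive
        // comparison. Matches are counted without overlap (assumption).
        int index = 0;
        while ((index = text.IndexOf(substring, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += substring.Length;
        }
        return count;
    }
}
```

For example, `CountSketch.CountSubstrings("x]]>y]]>", "]]>")` takes the ordinal path and `CountSketch.CountSubstrings("a<b<c", "<")` takes the char loop.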
   
   Add unit tests covering surrogate pair preservation, lone surrogates, and 
CountSubstrings edge cases.

