> On Apr 26, 2022, at 10:55 AM, David Blevins <[email protected]> wrote:
> 
> I'd need to check on the character encoding issue you mention.  In my mind 
> the original code and current code is trying to create a string of max 
> snippet length.  If it doesn't do that, it's a bug.

So I dug into this, and it looks like counting bytes is badly flawed, while 
counting chars is about as good as it gets in Java.

In UTF-8, a single character can take anywhere from 1 to 4 bytes.  The 
character `ñ` has a string length of 1 but a byte length of 2.  If you grabbed 
the first 3 bytes of "mañana" you'd get "ma�..."
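
A minimal sketch of that truncation, for anyone who wants to reproduce it.  The 
class and method names (ByteTruncation, firstBytes) are made up for this email 
and are not taken from the snippet code itself; the only assumption is that the 
bytes are UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteTruncation {

    // Hypothetical helper: decode only the first maxBytes bytes of the
    // UTF-8 encoding of text, the way a byte-counting snippet would.
    static String firstBytes(String text, int maxBytes) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] head = Arrays.copyOf(utf8, Math.min(maxBytes, utf8.length));
        // A truncated multi-byte sequence decodes to U+FFFD, the
        // replacement character.
        return new String(head, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "ma\u00f1ana"; // "manana" with n-tilde as the third letter

        System.out.println(word.length());                                // 6 chars
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length); // 7 bytes

        // Cutting at 3 bytes splits the two-byte n-tilde in half.
        System.out.println(firstBytes(word, 3).equals("ma\uFFFD"));       // true

        // Cutting at 3 chars keeps the whole character.
        System.out.println(word.substring(0, 3).equals("ma\u00f1"));      // true
    }
}
```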

If you create a string from a four-byte UTF-8 character, you of course get 
4 bytes in the OutputStream, but you also get a String instance that claims to 
be of length 2, not 1, because Java stores it as a surrogate pair of two chars.  
If you call substring(0,1) on that you get half of the pair, which is an 
unprintable result.
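
Here is a rough sketch of the four-byte case.  The code point U+1F600 is just 
an example I picked, and the class and method names are hypothetical; any 
character outside the Basic Multilingual Plane should behave the same way.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {

    // Hypothetical helper: build a String from the four UTF-8 bytes
    // that encode U+1F600 (F0 9F 98 80).
    static String fourByteChar() {
        byte[] utf8 = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = fourByteChar();

        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 bytes
        System.out.println(s.length());                                // 2 chars
        System.out.println(s.codePointCount(0, s.length()));           // 1 code point

        // substring(0,1) keeps only the first half of the surrogate pair.
        String half = s.substring(0, 1);
        System.out.println(Character.isHighSurrogate(half.charAt(0))); // true
    }
}
```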

So we fixed a bug in the switch from OutputStream to Writer.  Any remaining 
issues come from counting chars passed to the Writer, a limitation shared with 
java.lang.String, so users should not be surprised if they occasionally see a 
funny character at the end of the snippet.


-David



