The issue is this line (line 135):

    $text =~ tr/\xA0\xAD/ /d;
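To make the effect concrete, here is a minimal sketch of what that tr does to the same character in its two forms. The helper name strip_nbsp_shy and the variable names are mine, not the module's, and the decode step at the end is only one possible caller-side workaround, not something the module does itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper wrapping the tr/// quoted above: NBSP (0xA0) becomes
# a space, and the soft hyphen (0xAD) is deleted because of /d.
sub strip_nbsp_shy {
    my ($text) = @_;
    $text =~ tr/\xA0\xAD/ /d;
    return $text;
}

# Show a string as space-separated hex ordinals, one per character.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

# "à" as one character (what you get with -Mutf8 or after decoding).
my $chars = "\x{E0}";

# "à" as undecoded UTF-8: the two bytes 0xC3 0xA0.
my $bytes = "\xC3\xA0";

# The character string passes through untouched.
print "characters: ", hex_dump(strip_nbsp_shy($chars)), "\n";   # E0

# The byte string loses its second byte, which looks like an NBSP.
print "bytes:      ", hex_dump(strip_nbsp_shy($bytes)), "\n";   # C3 20

# Assumed workaround: decode the bytes before handing them over.
my $decoded = decode('UTF-8', $bytes);
print "decoded:    ", hex_dump(strip_nbsp_shy($decoded)), "\n"; # E0
```

The middle case is the one from the bug report: the 0xA0 continuation byte is replaced by a space, so the output is no longer valid UTF-8.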
That line works great if your data is in a Unicode string. It also works great if your data is a byte string using Latin-1. It works very poorly if your UTF-8 data is in a byte string.

In the example given in the original bug report, -Mutf8 was not used, so the data is treated as a series of (two) Latin-1 characters. For à those are 0xC3 and 0xA0, and the tr turns the 0xA0 into a space. With -Mutf8 the character survives:

vauxhall ok % perl -MHTML::FormatText -Mutf8 -C6 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000  c3 a0 0a                                          |...|
00000003
vauxhall ok % perl -MHTML::FormatText -Mutf8 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000  e0 0a                                             |..|
00000002

I suspect the correct fix for this bug is documentation.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187