The issue is this line (line 135):

  $text =~ tr/\xA0\xAD/ /d;

This works well if your data is in a Unicode (character) string.  It
also works if your data is a byte string using Latin-1.  It works very
poorly if your UTF-8 data is in a byte string: any \xA0 byte inside a
multibyte sequence is turned into a space, and any \xAD byte is deleted.
In the example given in the original bug report, -Mutf8 was not used, so
the data was treated as a series of (two) Latin-1 characters.

vauxhall ok % perl -MHTML::FormatText -Mutf8 -C6 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000  c3 a0 0a                                          |...|
00000003
vauxhall ok % perl -MHTML::FormatText -Mutf8 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000  e0 0a                                             |..|
00000002
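
For comparison, the failing case can be reproduced without the module.
This is only a sketch: strip_nbsp and hex_of are hypothetical helpers
of mine, and strip_nbsp does nothing but apply the same tr as line 135.
It runs that tr over the UTF-8 bytes of "à" and over the decoded
character:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for line 135: NBSP becomes a space,
# the soft hyphen is deleted.
sub strip_nbsp {
    my ($text) = @_;
    $text =~ tr/\xA0\xAD/ /d;
    return $text;
}

# Hypothetical helper: show each character of a string as two hex digits.
sub hex_of {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

my $bytes = "\xC3\xA0";               # UTF-8 encoding of U+00E0, as a byte string
my $chars = decode('UTF-8', $bytes);  # the single character U+00E0

# Byte string: the trailing \xA0 byte is mistaken for NBSP.
my $mangled = strip_nbsp($bytes);

# Character string: U+00E0 is neither U+00A0 nor U+00AD, so it is untouched.
my $intact = strip_nbsp($chars);

print "byte string:      ", hex_of($mangled), "\n";   # C3 20
print "character string: ", hex_of($intact),  "\n";   # E0
```

The byte-string result is no longer valid UTF-8, since the lead byte
C3 is now followed by a plain space.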

I suspect the correct fix for this bug is documentation.
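
What the documentation might recommend, as a rough sketch: decode at
the input boundary, format character strings, and encode on the way
out.  format_chars below is a hypothetical stand-in for the formatter
(it only applies the tr from line 135), and the sample text is made
up:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the formatter; it only applies the tr from line 135.
sub format_chars {
    my ($text) = @_;
    $text =~ tr/\xA0\xAD/ /d;
    return $text;
}

# UTF-8 bytes as they might arrive from a file or socket:
# "café", a no-break space, then "au lait".
my $input_bytes = "caf\xC3\xA9\xC2\xA0au lait";

my $chars        = decode('UTF-8', $input_bytes);  # bytes -> characters
my $formatted    = format_chars($chars);           # NBSP -> space, é untouched
my $output_bytes = encode('UTF-8', $formatted);    # characters -> bytes

binmode STDOUT;   # the data is already encoded, so write it as raw bytes
print $output_bytes, "\n";
```

With the data decoded first, only the real no-break space is replaced
and the bytes of "é" come out unchanged.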

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187
