On Oct. 9, 2006, at 8:34 PM, John Gruber wrote:
> Michel Fortin <[EMAIL PROTECTED]> wrote on 10/9/06 at 8:26 PM:
>
>> If anyone is interested in a fix for PHP Markdown, just change
>> the call to the `strlen` function within detab to a call to
>> `mb_strlen($line, 'utf-8')`. I'll fix this for the next
>> version.
>
> Will that still work if people pass in Windows Latin 1 or Mac
> Roman-encoded text? Yes, I'm too lazy to try it...

I haven't tried it inside PHP Markdown yet, but I've tested
`mb_strlen` and it seems to treat each byte of an invalid UTF-8
sequence as an individual character. So the net result is that text
in ISO Latin, Windows Latin, or Mac Roman will work fine unless it
contains byte sequences which happen to be valid UTF-8. For instance,
"é" in UTF-8 is seen as "√©" in Mac Roman, so if you have "√©" in a
Mac Roman-encoded text it'll be treated as only one character. I'm not
sure how high that risk is across all character combinations, but it
is obviously less problematic than the current behaviour is for UTF-8.
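
To make the "é" case concrete, here is a minimal sketch comparing the two functions on the raw bytes. It uses only `strlen` and `mb_strlen`; the variable names are just for illustration, and it assumes the mbstring extension is loaded.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$utf8 = "\xC3\xA9";

$byteCount = strlen($utf8);              // counts bytes
$charCount = mb_strlen($utf8, 'UTF-8');  // counts UTF-8 characters

// In Mac Roman, 0xC3 is "√" and 0xA9 is "©", so Mac Roman text
// containing "√©" holds these same two bytes, and mb_strlen also
// counts them as a single character.
$macRoman = "\xC3\xA9";
$macCount = mb_strlen($macRoman, 'UTF-8');

printf("%d %d %d\n", $byteCount, $charCount, $macCount); // prints "2 1 1"
```

So the miscount only happens for Mac Roman text when two or more adjacent bytes happen to form a valid UTF-8 sequence, as they do here.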
Another solution is to omit the 'utf-8' parameter and rely on PHP's
internal encoding being the same as the input's. (The internal encoding
can be set by the user using `mb_internal_encoding('utf-8')`.) Doing
that, however, implies that PHP Markdown will work with something other
than UTF-8 by default, and I'm not so sure that's a good idea.

Yet another solution is a distinct configuration variable set to
UTF-8 by default.
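
A rough sketch of what that last option could look like. This is not the actual PHP Markdown code: `$markdown_encoding` and `detab_line` are hypothetical names made up for the example, and only `mb_strlen`, `explode`, `array_shift`, and `str_repeat` are real PHP functions.

```php
<?php
// Hypothetical setting, not an actual PHP Markdown variable.
$markdown_encoding = 'UTF-8';

// Hypothetical helper: expand each tab in one line to the next tab
// stop, measuring the current column in characters of $encoding
// instead of in bytes.
function detab_line($line, $tab_width = 4, $encoding = 'UTF-8') {
    $blocks = explode("\t", $line);
    $out = array_shift($blocks);
    foreach ($blocks as $block) {
        // Number of spaces needed to reach the next tab stop.
        $amount = $tab_width - mb_strlen($out, $encoding) % $tab_width;
        $out .= str_repeat(' ', $amount) . $block;
    }
    return $out;
}

// "é" followed by a tab: one character wide, so three spaces are added.
// With strlen the "é" would count as two columns and get only two spaces.
$result = detab_line("\xC3\xA9\tx", 4, $markdown_encoding);
```

Passing `mb_internal_encoding()` as the third argument instead would give the behaviour of the second option, where the internal encoding decides.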
Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/