On Oct 9, 2006, at 8:34 PM, John Gruber wrote:

Michel Fortin <[EMAIL PROTECTED]> wrote on 10/9/06 at 8:26 PM:

If anyone is interested in a fix for PHP Markdown, just change
the call to the `strlen` function within detab to a call to
`mb_strlen($line, 'utf-8')`. I'll fix this for the next
version.

Will that still work if people pass in Windows Latin 1 or Mac
Roman-encoded text? Yes, I'm too lazy to try it...

I haven't tried it inside PHP Markdown yet, but I've tested `mb_strlen` and it seems to treat any invalid UTF-8 byte sequence as individual characters. So the net result is that text in ISO Latin, Windows Latin, or Mac Roman will work fine unless it contains byte sequences which happen to be valid UTF-8. For instance, "é" in UTF-8 is seen as "√©" in Mac Roman, so if you have "√©" in a Mac Roman-encoded text it'll be counted as only one character. I'm not sure how high that risk is across all character combinations, but it is obviously less of a problem than what the current behaviour does to UTF-8 text.
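To make the "√©" case concrete, here is a minimal sketch, assuming the mbstring extension is loaded. It only checks the two bytes in question; the variable names are mine and nothing here comes from PHP Markdown itself.

```php
<?php
// The two bytes 0xC3 0xA9: "é" in UTF-8, but "√©" when read as Mac Roman.
$bytes = "\xC3\xA9";

$byte_count = strlen($bytes);             // counts bytes
$char_count = mb_strlen($bytes, 'UTF-8'); // counts UTF-8 characters

echo "strlen: $byte_count, mb_strlen: $char_count\n";
// prints "strlen: 2, mb_strlen: 1"
```

So a Mac Roman author who really meant the two characters "√©" would get a tab stop that is off by one column, while a UTF-8 author gets the right width for "é".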

Another solution is to omit the 'utf-8' parameter and rely on the PHP internal encoding being the same as the input's. (The internal encoding can be set by the user with `mb_internal_encoding('utf-8')`.) Doing that, however, implies that PHP Markdown will work with something other than UTF-8 by default, and I'm not so sure that's a good idea.
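A quick sketch of what relying on the internal encoding means in practice, again assuming mbstring is available: the same two bytes are counted differently depending on what `mb_internal_encoding` was last set to.

```php
<?php
// Same bytes as before: "é" in UTF-8, two separate characters in ISO-8859-1.
$bytes = "\xC3\xA9";

mb_internal_encoding('UTF-8');
$as_utf8 = mb_strlen($bytes);    // no encoding argument: uses the internal encoding

mb_internal_encoding('ISO-8859-1');
$as_latin1 = mb_strlen($bytes);  // now every byte is one character

echo "UTF-8: $as_utf8, ISO-8859-1: $as_latin1\n";
// prints "UTF-8: 1, ISO-8859-1: 2"
```

The result then depends on the server configuration or on whatever the calling script set earlier, which is why I hesitate to make it the default.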

Yet another solution is a distinct configuration variable for the input encoding, set to UTF-8 by default.
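Here is roughly how that could look. This is only a sketch, not the actual PHP Markdown code: the constant `MD_SKETCH_ENCODING` and the function `detab_line_sketch` are hypothetical names, and I'm assuming detab pads each tab up to the next multiple of the tab width.

```php
<?php
// Hypothetical configuration constant (not a real PHP Markdown setting).
if (!defined('MD_SKETCH_ENCODING')) {
    define('MD_SKETCH_ENCODING', 'UTF-8');
}

// Expand tabs in a single line to spaces, measuring width in characters
// of the configured encoding rather than in bytes.
function detab_line_sketch($line, $tab_width = 4) {
    $blocks = explode("\t", $line);
    $out = array_shift($blocks); // text before the first tab is kept as-is
    foreach ($blocks as $block) {
        // Number of spaces needed to reach the next tab stop.
        $pad = $tab_width - mb_strlen($out, MD_SKETCH_ENCODING) % $tab_width;
        $out .= str_repeat(' ', $pad) . $block;
    }
    return $out;
}

// "é" is one character, so three spaces are needed to reach column 4.
$expanded = detab_line_sketch("\xC3\xA9\tx");
```

With plain `strlen` the "é" would count as two columns and only two spaces would be inserted. Someone with Latin 1 input could define the constant as 'ISO-8859-1' before loading the file.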


Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/


_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
