Re: Detab should be multi-byte aware?

John Gruber Mon, 09 Oct 2006 20:51:56 -0700

Michel Fortin <[EMAIL PROTECTED]> wrote on 10/9/06 at 9:33 PM:

> I haven't tried it inside PHP Markdown yet, but I've tested
> `mb_strlen`, and it seems to treat each byte of an invalid UTF-8
> sequence as an individual character. The neat result is that
> text in ISO Latin, Windows Latin, or Mac Roman will work fine
> unless it contains sequences that happen to be valid UTF-8. For
> instance, "é" in UTF-8 is seen as "√©" in Mac Roman, so if you
> have "√©" in a Mac Roman-encoded text, it'll be treated as only
> one character. I'm not sure how high that risk is across all
> character combinations, but it is obviously less of a problem
> than the current behaviour is for UTF-8.
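A rough Python sketch of the counting rule described above, for illustration only: it is not how `mb_strlen` or PHP Markdown is actually implemented. A well-formed UTF-8 character counts as one column, and any byte that is not part of one counts as a column on its own. The helper names (`lenient_utf8_length`, `detab`, `_utf8_sequence_length`) and the tab width of 4 are assumptions made for this example.

```python
def _utf8_sequence_length(data: bytes, i: int) -> int:
    """Byte length of the well-formed UTF-8 character starting at data[i],
    or 0 if none starts there. (Hypothetical helper.)"""
    for k in range(1, 5):  # a UTF-8 character is 1 to 4 bytes long
        if i + k > len(data):
            break
        try:
            data[i:i + k].decode("utf-8")  # strict: rejects malformed input
            return k  # shortest valid slice is exactly one character
        except UnicodeDecodeError:
            continue
    return 0


def lenient_utf8_length(data: bytes) -> int:
    """Count characters: valid UTF-8 sequences count once, stray bytes
    (e.g. Latin-1 or Mac Roman text) count one each."""
    count = 0
    i = 0
    while i < len(data):
        i += _utf8_sequence_length(data, i) or 1  # invalid byte -> 1 char
        count += 1
    return count


def detab(data: bytes, tab_width: int = 4) -> bytes:
    """Expand tabs to the next tab stop, measuring columns with
    lenient_utf8_length instead of raw byte counts."""
    out_lines = []
    for line in data.split(b"\n"):
        pieces = line.split(b"\t")  # 0x09 never occurs inside a UTF-8 sequence
        result = bytearray()
        column = 0
        for piece in pieces[:-1]:
            result += piece
            column += lenient_utf8_length(piece)
            pad = tab_width - column % tab_width
            result += b" " * pad
            column += pad
        result += pieces[-1]
        out_lines.append(bytes(result))
    return b"\n".join(out_lines)


# The collision described above: Mac Roman "√©" is the bytes C3 A9,
# which is also the UTF-8 encoding of "é", so it counts as one character.
print(lenient_utf8_length("√©".encode("mac_roman")))
```

With this rule, a UTF-8 "é" followed by a tab is padded with three spaces (one column used), where a byte-based count would give only two. Latin-1 text still gets one column per byte, as before.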


That sounds great -- fits right in with my idea that UTF-8 and
only UTF-8 should be officially supported, but other encodings
should "just work" insofar as they've always "just worked" in
Markdown.

It's one of those things where the only time those character
sequences are likely to come up is when you're actually talking
about them as character sequences that look like UTF-8, e.g. this
very message.

> Yet another solution is a separate configuration variable for
> the input encoding, set to UTF-8 by default.


I think it's simpler and better to just say "use UTF-8".

-J.G.
_______________________________________________
Markdown-Discuss mailing list
Markdown-Discuss@six.pairlist.net
http://six.pairlist.net/mailman/listinfo/markdown-discuss
