Paul Rosen writes:
| UNICODE Advantage: Any character in any language can be displayed.
| UNICODE Disadvantage: Everyone using the structure needs to be UNICODE
| aware. Are there systems and computer languages that can't handle it?

There is one semi-exception: The UTF-8 encoding has the property that
7-bit ASCII is unchanged. So a dumb program that thinks it's producing
7-bit ASCII is also producing valid UTF-8 text, and doesn't actually
need any enhancement. If a file is 7-bit ASCII, you can add a
"Content-Type: text/plain; charset=UTF-8" (or the equivalent in whatever
markup you're using), and it'll be correct.

| That's why I was wondering if there should be some type of switch passed to
| the parser about whether to output UNICODE.

Good idea. And it would be useful if the first line of an ABC file (or
tune) gave both the version of ABC used and the character set. Assuming
UTF-8 by default might be a good idea. Allowing the syntax "charset=XYZ"
would be a good idea.

An ongoing problem, of course, is that when programmers try reading up
on Unicode, most of the things they read cause them to throw up their
hands and decide to wait a few more years until it becomes something
that a merely-human programmer can actually understand and maybe even
use sanely. I don't think this was the intent of the Unicode crowd, and
I don't think that Unicode is actually all that complicated. But a lot
of people can take a simple idea and describe it in a way that makes it
wonderfully complex and incomprehensible.

(My favorite musical example is to inject the phrase "transposing
instrument" into a discussion, and watch gleefully as the discussion
breaks down into a hopeless muddle of people talking past each other in
terms that nobody can quite understand. ;-)
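For anyone who wants to poke at the UTF-8 point and the "charset=XYZ" idea, here is a rough Python sketch. The first-line layout ("%abc-2.0 charset=...") and the function names declared_charset and decode_abc are made up for illustration; nothing here is taken from an ABC standard. It only shows that a 7-bit ASCII tune decodes identically either way, and that a parser could fall back to UTF-8 when no charset is declared.

```python
import re

# Hypothetical first-line syntax, e.g. b"%abc-2.0 charset=ISO-8859-1".
# The layout is an assumption for this sketch, not part of any ABC standard.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def declared_charset(first_line: bytes, default: str = "utf-8") -> str:
    """Return the charset named by 'charset=XYZ' on the line, else the default."""
    match = _CHARSET_RE.search(first_line)
    return match.group(1).decode("ascii") if match else default


def decode_abc(data: bytes) -> str:
    """Decode a whole file using the charset declared on its first line."""
    first_line = data.split(b"\n", 1)[0]
    return data.decode(declared_charset(first_line))


# A "dumb" 7-bit ASCII tune: same text whether read as ASCII or as UTF-8.
plain = b"X:1\nT:Example\nK:D\n"
assert plain.decode("ascii") == plain.decode("utf-8")

# A Latin-1 file that says so on its first line ("\u00fc" is u-umlaut).
latin1 = "%abc-2.0 charset=ISO-8859-1\nT:F\u00fcr Elise\n".encode("iso-8859-1")
title_line = decode_abc(latin1).splitlines()[1]
```

With no declaration at all, decode_abc(plain) just treats the bytes as UTF-8, which is the default suggested above and is harmless for pure ASCII input.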