Re: Corrigendum #9
On 06/07/2014 10:33 PM, Asmus Freytag wrote:
On 6/7/2014 9:19 PM, Karl Williamson wrote:
On 06/02/2014 11:00 AM, Shawn Steele wrote:

To further my understanding, can someone provide examples of how these are used in actual practice? I can't think of any offhand, and the closest I get is the old escape characters that made a dot matrix printer shift modes, or old word processor internal formatting sequences.

Here's an example of a possible use. Twenty-some years ago I wrote a front-end to the Unix diff utility. Showing the differences between files (usually 2 versions of the same program's code) is an extremely common programming activity. I do it many times a day. One reason is to try to find out why a bug has crept in. In doing so, there are some differences that are not relevant to the task at hand, and their being shown is a significant distraction.

For example, in programming, one might have renamed a variable (identifier) because its purpose has changed somewhat and the name should accurately reflect its new function so the reader is not subconsciously misled. It would be nice to be able to suppress the variable name changes from the difference display. There could be thousands of them. By changing the name in each file version to the same noncharacter during the diff, these differences won't be displayed, and there would not be any possible conflict with the input files having that noncharacter in them. (For display, the noncharacter is changed back to the original value in its respective file.) Further, one might want to ignore the name changes of two variables. Just use a second noncharacter, up to 66.

I wrote this long before noncharacters were available. What I do instead is scan the files for rarely used characters until I find enough that aren't in the files. For example, U+009F is unlikely to appear. Scanning the files takes time. This step could be omitted for noncharacters that are known to be illegal in the input.
This "illegal in the input, so I'm free to assume I can use them for my purposes" was definitely the primary(!) design goal discussed when the set of 32 were added to Unicode. Having UTC backpedal from that, many years after the original design, based on a single meeting and without public review, is really a breakdown of the process.

A./

I should note that this front-end to 'diff' changes the input files, writes the modified versions out, and calls 'diff' with those modified files as its inputs. By using noncharacters, it would be depending on 'diff' 1) to not use them, 2) to not filter them out, and 3) on the system being able to store and retrieve them in files. I think a revision to the text was advisable to clarify that 2) and 3) were acceptable. I haven't heard anybody on this thread disagree with that.

But item 1) shows how tricky this issue really is. My utility looks like a fancier 'diff' to the people who call it, so they would be justified in wanting it not to use noncharacters, because they have their own purposes for them. If some of those callers were themselves utilities, their callers might want to use noncharacters for their own purposes. And so on and so on. I don't have a good answer, except to say that Asmus' characterization above looks reasonable.

The purpose of public reviews is to try to get a broad range of ideas, and if none are forthcoming, then the fact that there was such a review should be an adequate defense of the ultimate decision. Not holding a review is an invitation to lingering suspicions on the part of the public about the motives behind any such decision. These can fester, and the trust level is permanently diminished. There will always be people who won't like the decision, and who will assume that the deciders are malevolent. But the vast majority will accept a decision that seems to have been made in good faith after public input. This is just how things work, no matter what the venue or issue.
It may be that the UTC thought this was minor enough not to require a review, but if so, time has shown that perception to have been incorrect.

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
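The masking technique described in the message above can be illustrated with a minimal sketch. It is not the author's utility: it uses Python's standard difflib in place of writing modified files and calling the external 'diff', and the function names (pick_placeholders, mask, unmask, diff_ignoring_renames) are hypothetical. It assumes identifiers are matched as whole words and draws placeholders only from U+FDD0..U+FDEF, still checking that they are absent from the inputs.

```python
import difflib
import re

# U+FDD0..U+FDEF: 32 of the 66 Unicode noncharacters (the remaining 34 are the
# last two code points of each plane and are left out here for brevity).
NONCHARACTERS = [chr(cp) for cp in range(0xFDD0, 0xFDF0)]


def pick_placeholders(texts, count, candidates=NONCHARACTERS):
    """Return `count` candidate characters that occur in none of `texts`."""
    unused = [c for c in candidates if not any(c in t for t in texts)]
    if len(unused) < count:
        raise ValueError("not enough unused placeholder characters")
    return unused[:count]


def mask(text, names, placeholders):
    """Replace each whole-word identifier with its placeholder character."""
    for name, placeholder in zip(names, placeholders):
        text = re.sub(r"\b" + re.escape(name) + r"\b", placeholder, text)
    return text


def unmask(line, names, placeholders):
    """Restore the original identifiers for display."""
    for name, placeholder in zip(names, placeholders):
        line = line.replace(placeholder, name)
    return line


def diff_ignoring_renames(old, new, renames):
    """Unified diff of two texts that hides the (old_name, new_name) renames."""
    old_names = [a for a, _ in renames]
    new_names = [b for _, b in renames]
    marks = pick_placeholders([old, new], len(renames))
    masked_old = mask(old, old_names, marks).splitlines()
    masked_new = mask(new, new_names, marks).splitlines()
    result = []
    diff = difflib.unified_diff(masked_old, masked_new, "old", "new", lineterm="")
    for i, line in enumerate(diff):
        if i < 2 or line.startswith("@@"):
            result.append(line)  # file headers and hunk headers pass through
        elif line.startswith("+"):
            result.append(unmask(line, new_names, marks))  # text from the new file
        else:
            result.append(unmask(line, old_names, marks))  # removed and context lines
    return result
```

Calling diff_ignoring_renames(old, new, [("count", "total")]) on two versions that differ by that rename plus one real change reports only the real change. The same pick_placeholders scan also covers the older approach of hunting for rarely used characters: pass a different candidates list.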
RE: Corrigendum #9
I should note that this front-end to 'diff' changes the input files, writes the modified versions out, and calls 'diff' with those modified files as its inputs. By using noncharacters, it would be depending on 'diff' to 1) not use them, and 2) to not filter them out, and 3) for the system to be able to store and retrieve them in files.

In my view that is still internal to your app's use of these characters :)

The original text doesn't say that my application cannot store them in, or retrieve them from, files for internal use. On the contrary, I'd expect proprietary formats for internal use to require that.

I agree that the original text is a bit vague on the question of tools to inspect/modify/whatever your internal use.

-Shawn
Re: Swift
It does allow some usage that may surprise code reviewers. For example, this is a valid Swift program:

let s = 
let s︀ = 
let ︀ = 
let all = s + s︀ + ︀

The value of the constant “all” is . Or at least it is as long as mail software doesn’t harm the variation selectors…

Norbert

On Jun 5, 2014, at 9:06, Mark Davis ☕️ m...@macchiato.com wrote:

I haven't done any analysis, but at first glance it looks like it is based on http://www.unicode.org/reports/tr31/#Alternative_Identifier_Syntax

Mark
Il meglio è l’inimico del bene

On Thu, Jun 5, 2014 at 5:46 PM, Jeff Senn s...@maya.com wrote:

Has anyone figured out whether character sequences that are non-canonical (de)compositions but could be recomposed to the same result are the same identifier or not? That is: are identifiers merely sequences of characters, or are they intended to be comparable as “Unicode strings” (under some sort of compatibility rule)?

On Jun 5, 2014, at 11:27 AM, Martin v. Löwis mar...@v.loewis.de wrote:

On 04.06.14 11:28, Andre Schappo wrote:

The restrictions seem a little like IDNA2008. Does anyone have links to info giving a detailed explanation/tabulation of allowed and disallowed Unicode characters for Swift variable and constant names?
The language reference is at https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html

For reference, the definition of identifier-character is (read each line as an alternative)

identifier-character → Digit 0 through 9
identifier-character → U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF, or U+FE20–U+FE2F
identifier-character → identifier-head

where identifier-head is

identifier-head → Upper- or lowercase letter A through Z
identifier-head → U+00A8, U+00AA, U+00AD, U+00AF, U+00B2–U+00B5, or U+00B7–U+00BA
identifier-head → U+00BC–U+00BE, U+00C0–U+00D6, U+00D8–U+00F6, or U+00F8–U+00FF
identifier-head → U+0100–U+02FF, U+0370–U+167F, U+1681–U+180D, or U+180F–U+1DBF
identifier-head → U+1E00–U+1FFF
identifier-head → U+200B–U+200D, U+202A–U+202E, U+203F–U+2040, U+2054, or U+2060–U+206F
identifier-head → U+2070–U+20CF, U+2100–U+218F, U+2460–U+24FF, or U+2776–U+2793
identifier-head → U+2C00–U+2DFF or U+2E80–U+2FFF
identifier-head → U+3004–U+3007, U+3021–U+302F, U+3031–U+303F, or U+3040–U+D7FF
identifier-head → U+F900–U+FD3D, U+FD40–U+FDCF, U+FDF0–U+FE1F, or U+FE30–U+FE44
identifier-head → U+FE47–U+FFFD
identifier-head → U+10000–U+1FFFD, U+20000–U+2FFFD, U+30000–U+3FFFD, or U+40000–U+4FFFD
identifier-head → U+50000–U+5FFFD, U+60000–U+6FFFD, U+70000–U+7FFFD, or U+80000–U+8FFFD
identifier-head → U+90000–U+9FFFD, U+A0000–U+AFFFD, U+B0000–U+BFFFD, or U+C0000–U+CFFFD
identifier-head → U+D0000–U+DFFFD or U+E0000–U+EFFFD

As the construction principle for this list, they say

Identifiers begin with an upper case or lower case letter A through Z, an underscore (_), a noncombining alphanumeric Unicode character in the Basic Multilingual Plane, or a character outside the Basic Multilingual Plane that isn’t in a Private Use Area. After the first character, digits and combining Unicode characters are also allowed.
Regards, Martin
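The identifier grammar quoted in this message can be turned into a small membership test. The following is only a sketch, not Swift's lexer: is_identifier and the two range tables are hypothetical names, the underscore is included on the strength of the quoted prose rather than the listed productions, and keywords and other lexical rules are ignored.

```python
# Code point ranges transcribed from the identifier-head productions quoted above.
HEAD_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),
    (0x5F, 0x5F),  # underscore: taken from the prose description (assumption)
    (0xA8, 0xA8), (0xAA, 0xAA), (0xAD, 0xAD), (0xAF, 0xAF),
    (0xB2, 0xB5), (0xB7, 0xBA),
    (0xBC, 0xBE), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0xFF),
    (0x100, 0x2FF), (0x370, 0x167F), (0x1681, 0x180D), (0x180F, 0x1DBF),
    (0x1E00, 0x1FFF),
    (0x200B, 0x200D), (0x202A, 0x202E), (0x203F, 0x2040), (0x2054, 0x2054),
    (0x2060, 0x206F),
    (0x2070, 0x20CF), (0x2100, 0x218F), (0x2460, 0x24FF), (0x2776, 0x2793),
    (0x2C00, 0x2DFF), (0x2E80, 0x2FFF),
    (0x3004, 0x3007), (0x3021, 0x302F), (0x3031, 0x303F), (0x3040, 0xD7FF),
    (0xF900, 0xFD3D), (0xFD40, 0xFDCF), (0xFDF0, 0xFE1F), (0xFE30, 0xFE44),
    (0xFE47, 0xFFFD),
] + [(plane << 16, (plane << 16) | 0xFFFD) for plane in range(0x1, 0xF)]

# Allowed only after the first character: digits and the combining-mark ranges.
CONTINUE_ONLY_RANGES = [
    (0x30, 0x39),
    (0x300, 0x36F), (0x1DC0, 0x1DFF), (0x20D0, 0x20FF), (0xFE20, 0xFE2F),
]


def _in_ranges(cp, ranges):
    """True if code point `cp` lies in any inclusive (low, high) range."""
    return any(low <= cp <= high for low, high in ranges)


def is_identifier(name):
    """Check `name` against the quoted identifier-head / identifier-character rules."""
    if not name:
        return False
    if not _in_ranges(ord(name[0]), HEAD_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), HEAD_RANGES) or _in_ranges(ord(ch), CONTINUE_ONLY_RANGES)
        for ch in name[1:]
    )
```

Under these tables, the variation selector U+FE00 falls inside U+FDF0–U+FE1F and so counts as an identifier-head, which is consistent with the surprising example earlier in the thread; the noncharacter U+FDD0 and the Private Use Area starting at U+E000 are excluded.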