Re: Corrigendum #9

2014-06-08 Thread Karl Williamson

On 06/07/2014 10:33 PM, Asmus Freytag wrote:

On 6/7/2014 9:19 PM, Karl Williamson wrote:

On 06/02/2014 11:00 AM, Shawn Steele wrote:

To further my understanding, can someone provide examples of how
these are used in actual practice?  I can't think of any offhand and
the closest I get is like the old escape characters to get a dot
matrix printer to shift modes, or old word processor internal
formatting sequences.



Here's an example of a possible use.  Some 20 years ago I wrote a
front-end to the Unix diff utility.  Showing the differences between
files (usually 2 versions of the same program's code) is an extremely
common programming activity.  I do it many times a day.  One reason is
to try to find out why a bug has crept in.  In doing so, there are
some differences that are not relevant to the task at hand, and their
being shown is a significant distraction. For example, in programming,
one might have renamed a variable (identifier) because its purpose has
changed somewhat and the name should accurately reflect its new
function so the reader is not subconsciously misled.  It would be nice
to be able to suppress the variable name changes from the difference
display. There could be thousands of them.  By changing the name in
each file version to the same noncharacter during the diff, these
differences won't be displayed, and there would not be any possible
conflict with the input files having that noncharacter in them.  (For
display, the noncharacter is changed back to the original value in its
respective file.)  Further, one might want to ignore the name changes
of two variables.  Just use a second noncharacter; up to 66 are available.

I wrote this long before noncharacters were available.  What I do
instead is scan the files for rarely used characters until I find
enough that aren't in the files.  For example, U+009F is unlikely to
appear.  Scanning the files takes time.  This step could be omitted
for noncharacters that are known to be illegal in the input.



This "illegal in the input, so I'm free to assume I can use them for my
purposes" was definitely the primary(!) design goal discussed when the
set of 32 was added to Unicode. Having UTC backpedal from that, many
years after the original design, based on a single meeting and without
public review, is really a breakdown of the process.

A./


I should note that this front-end to 'diff' changes the input files, 
writes the modified versions out, and calls 'diff' with those modified 
files as its inputs.  By using noncharacters, it would be depending on 
'diff' to 1) not use them, and 2) to not filter them out, and 3) for the 
system to be able to store and retrieve them in files.
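
The technique described in this thread (replace a renamed identifier with one shared noncharacter in both inputs, diff, then restore the names for display) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual front-end: Python's difflib stands in for the external 'diff', the inputs are strings rather than files, and the function names and the `renames` list of (old_name, new_name) pairs are hypothetical.

```python
import difflib
import re

# The 32 contiguous noncharacters U+FDD0..U+FDEF. The full set of 66 also
# includes the last two code points of each plane; this sketch ignores those.
NONCHARACTERS = [chr(cp) for cp in range(0xFDD0, 0xFDF0)]


def mask_renames(old_text, new_text, renames):
    """Replace each (old_name, new_name) pair with one shared noncharacter."""
    if len(renames) > len(NONCHARACTERS):
        raise ValueError("more renames than placeholders in this sketch")
    for placeholder, (old_name, new_name) in zip(NONCHARACTERS, renames):
        # Refuse to proceed if an input already uses this noncharacter.
        if placeholder in old_text or placeholder in new_text:
            raise ValueError("input already contains U+%04X" % ord(placeholder))
        old_text = re.sub(r"\b%s\b" % re.escape(old_name), placeholder, old_text)
        new_text = re.sub(r"\b%s\b" % re.escape(new_name), placeholder, new_text)
    return old_text, new_text


def diff_ignoring_renames(old_text, new_text, renames):
    """Diff the masked texts, then put the real names back for display."""
    masked_old, masked_new = mask_renames(old_text, new_text, renames)
    lines = difflib.unified_diff(
        masked_old.splitlines(), masked_new.splitlines(), lineterm="", n=0
    )
    shown = []
    for line in lines:
        for placeholder, (old_name, new_name) in zip(NONCHARACTERS, renames):
            # Removed lines come from the old file, everything else from the new.
            name = old_name if line.startswith("-") else new_name
            line = line.replace(placeholder, name)
        shown.append(line)
    return shown
```

With a rename of `total` to `sum_`, lines that differ only by that rename drop out of the output, and only the remaining changes are shown. Note that the sketch itself has to check the inputs for the placeholder, which is exactly the dependency on callers not using noncharacters discussed in item 1).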


I think a revision to the text was advisable to clarify that 2) and 3) 
were acceptable.  I haven't heard anybody on this thread disagree with 
that.


But item 1) shows how tricky this issue really is.  My utility looks 
like a fancier 'diff' to those people who call it, so they would be 
justified in wanting it not to use noncharacters because they have their 
own purposes for them.  If some of those callers were themselves 
utilities, their callers might want to use noncharacters for their own 
purposes.  And so on and so on.


I don't have a good answer, except to say that Asmus' characterization 
above looks reasonable.


The purpose of public reviews is to try to get a broad range of ideas, 
and if none are forthcoming, then the fact that there was such a review 
should be an adequate defense of the ultimate decision.  Not holding a 
review is an invitation to lingering suspicions on the part of the 
public about the motives behind any such decision.  These can fester and 
the trust level is permanently diminished.  There will always be people 
who won't like the decision, and who will assume that the deciders are 
malevolent.  But the vast majority will accept a decision that seems to 
have been made in good faith after public input.


This is just how things work, no matter what the venue or issue.  It may 
be that the UTC thought this was minor enough to not require a review, 
but if so, time has shown that to have been an incorrect perception.

___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


RE: Corrigendum #9

2014-06-08 Thread Shawn Steele
 I should note that this front-end to 'diff' changes the input files, writes 
 the modified versions out, and calls 'diff' with those modified files as its 
 inputs.  By using noncharacters, it would be depending on 'diff' to 1) not 
 use them, and 2) to not filter them out, and 3) for the system to be able to 
 store and retrieve them in files.

In my view that is still internal to your app's use of these characters :)

The original text doesn't say that my application cannot store and retrieve them 
from files for internal use.  On the contrary, I'd expect proprietary formats 
for internal use to require that.  I agree that the original text is a bit 
vague on the question of tools to inspect/modify/whatever your internal use.

-Shawn



Re: Swift

2014-06-08 Thread Norbert Lindenberg
It does allow some usage that may surprise code reviewers – for example, this 
is a valid Swift program:

let s = 
let s︀ = 
let ︀ = 
let all = s + s︀ + ︀

The value of the constant “all” is . Or at least it is as long as mail 
software doesn’t harm the variation selectors…

Norbert


On Jun 5, 2014, at 9:06 , Mark Davis ☕️ m...@macchiato.com wrote:

 I haven't done any analysis, but on first glance it looks like it is based on 
 
 http://www.unicode.org/reports/tr31/#Alternative_Identifier_Syntax
 
 
 Mark
 
 — Il meglio è l’inimico del bene —
 
 
 On Thu, Jun 5, 2014 at 5:46 PM, Jeff Senn s...@maya.com wrote:
 Has anyone figured out whether character sequences that are non-canonical 
 (de)compositions but could be recomposed to the same result
 are the same identifier or not?
 
 That is: are identifiers merely sequences of characters or intended to be 
 comparable as “Unicode strings” (under some sort of compatibility rule)?
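
To make the question concrete, here is a small sketch of the two possible comparison rules, using Python's standard unicodedata module. It says nothing about which rule Swift actually applies; the function names are hypothetical and NFC is chosen only as one example of a normalization form.

```python
import unicodedata


def same_code_points(a, b):
    """Identifiers compared as raw sequences of code points."""
    return a == b


def same_under_nfc(a, b):
    """Identifiers compared after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
```

Under the first rule `precomposed` and `decomposed` are two different identifiers; under the second they are the same one.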
 
 On Jun 5, 2014, at 11:27 AM, Martin v. Löwis mar...@v.loewis.de wrote:
 
  On 04.06.14 11:28, Andre Schappo wrote:
  The restrictions seem a little like IDNA2008. Anyone have links to
  info giving a detailed explanation/tabulation of allowed and non
  allowed Unicode chars for Swift Variable and Constant names?
 
  The language reference is at
 
  https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/Swift_Programming_Language/LexicalStructure.html
 
  For reference, the definition of identifier-character is (read each
  line as an alternative)
 
  identifier-character → Digit 0 through 9
  identifier-character → U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF, or
  U+FE20–U+FE2F
  identifier-character → identifier-head
 
  where identifier-head is
 
  identifier-head → Upper- or lowercase letter A through Z
  identifier-head → U+00A8, U+00AA, U+00AD, U+00AF, U+00B2–U+00B5, or
  U+00B7–U+00BA
  identifier-head → U+00BC–U+00BE, U+00C0–U+00D6, U+00D8–U+00F6, or
  U+00F8–U+00FF
  identifier-head → U+0100–U+02FF, U+0370–U+167F, U+1681–U+180D, or
  U+180F–U+1DBF
  identifier-head → U+1E00–U+1FFF
  identifier-head → U+200B–U+200D, U+202A–U+202E, U+203F–U+2040, U+2054,
  or U+2060–U+206F
  identifier-head → U+2070–U+20CF, U+2100–U+218F, U+2460–U+24FF, or
  U+2776–U+2793
  identifier-head → U+2C00–U+2DFF or U+2E80–U+2FFF
  identifier-head → U+3004–U+3007, U+3021–U+302F, U+3031–U+303F, or
  U+3040–U+D7FF
  identifier-head → U+F900–U+FD3D, U+FD40–U+FDCF, U+FDF0–U+FE1F, or
  U+FE30–U+FE44
  identifier-head → U+FE47–U+FFFD
  identifier-head → U+10000–U+1FFFD, U+20000–U+2FFFD, U+30000–U+3FFFD, or
  U+40000–U+4FFFD
  identifier-head → U+50000–U+5FFFD, U+60000–U+6FFFD, U+70000–U+7FFFD, or
  U+80000–U+8FFFD
  identifier-head → U+90000–U+9FFFD, U+A0000–U+AFFFD, U+B0000–U+BFFFD, or
  U+C0000–U+CFFFD
  identifier-head → U+D0000–U+DFFFD or U+E0000–U+EFFFD
 
  As the construction principle for this list, they say
 
  Identifiers begin with an upper case or lower case letter A through Z,
  an underscore (_), a noncombining alphanumeric Unicode character in the
  Basic Multilingual Plane, or a character outside the Basic Multilingual
  Plane that isn’t in a Private Use Area. After the first character, digits
  and combining Unicode characters are also allowed.
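
The ranges listed above can be turned into a small checker, which makes it easy to probe edge cases. This is only a sketch of the grammar as quoted in this message, not Swift's actual lexer. It assumes the supplementary-plane ranges run from U+10000–U+1FFFD through U+E0000–U+EFFFD, takes the underscore from the prose description rather than from the listed productions, and uses hypothetical function names.

```python
# Ranges for identifier-head, transcribed from the productions quoted above.
HEAD_RANGES = [
    (0x41, 0x5A), (0x61, 0x7A),   # letters A through Z, upper and lower case
    (0x5F, 0x5F),                 # underscore (from the prose description)
    (0x00A8, 0x00A8), (0x00AA, 0x00AA), (0x00AD, 0x00AD), (0x00AF, 0x00AF),
    (0x00B2, 0x00B5), (0x00B7, 0x00BA),
    (0x00BC, 0x00BE), (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x00FF),
    (0x0100, 0x02FF), (0x0370, 0x167F), (0x1681, 0x180D), (0x180F, 0x1DBF),
    (0x1E00, 0x1FFF),
    (0x200B, 0x200D), (0x202A, 0x202E), (0x203F, 0x2040), (0x2054, 0x2054),
    (0x2060, 0x206F),
    (0x2070, 0x20CF), (0x2100, 0x218F), (0x2460, 0x24FF), (0x2776, 0x2793),
    (0x2C00, 0x2DFF), (0x2E80, 0x2FFF),
    (0x3004, 0x3007), (0x3021, 0x302F), (0x3031, 0x303F), (0x3040, 0xD7FF),
    (0xF900, 0xFD3D), (0xFD40, 0xFDCF), (0xFDF0, 0xFE1F), (0xFE30, 0xFE44),
    (0xFE47, 0xFFFD),
]
# Planes 1 through 14, each without its last two code points (noncharacters).
HEAD_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 15)]

# Extra ranges allowed only after the first character.
CONTINUE_RANGES = [
    (0x30, 0x39),                 # digits 0 through 9
    (0x0300, 0x036F), (0x1DC0, 0x1DFF), (0x20D0, 0x20FF), (0xFE20, 0xFE2F),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_identifier_head(ch):
    return _in_ranges(ch, HEAD_RANGES)


def is_identifier_character(ch):
    return is_identifier_head(ch) or _in_ranges(ch, CONTINUE_RANGES)


def is_identifier(s):
    """True if s is non-empty and matches head followed by identifier characters."""
    return (
        len(s) > 0
        and is_identifier_head(s[0])
        and all(is_identifier_character(ch) for ch in s[1:])
    )
```

Under these ranges a lone variation selector such as U+FE00 is accepted as a complete identifier, as is U+200B ZERO WIDTH SPACE, while the noncharacters U+FDD0–U+FDEF are rejected because they fall in the gap between U+FDCF and U+FDF0.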
 
  Regards,
  Martin