[Issue 3455] Some Unicode characters not allowed in identifiers

2015-06-08 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=3455

Andrei Alexandrescu  changed:

        What|Removed     |Added
     Version|unspecified |D2

--


[Issue 3455] Some Unicode characters not allowed in identifiers

2009-10-30 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455


Matti Niemenmaa  changed:

        What|Removed |Added
    Keywords|        |spec
          CC|        |matti.niemenmaa+dbugzi...@iki.fi
    Platform|Other   |All
  OS/Version|Linux   |All
    Severity|normal  |enhancement


--- Comment #1 from Matti Niemenmaa 2009-10-30 09:51:09 PDT ---
As http://www.digitalmars.com/d/1.0/lex.html#identifier very clearly states,
the allowed characters in identifiers are those defined in the C99 standard,
ISO/IEC 9899:1999(E) Annex D. Have a look at it:
http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf

９ (fullwidth digit nine), code point 0xff19, is not in that list. In fact, the
highest code point in the list is 0xd7a3.
This is not a bug; it is an enhancement request.

However, rather than an arbitrary, frozen list, I /would/ prefer basing it
simply on Unicode properties, as Java does: identifiers may start with
letters or letter numbers, and may contain, in addition to those, connecting
punctuation, decimal digits, and combining and non-spacing marks. In other
words:

Identifiers may start with code points from the general categories Ll, Lm, Lo,
Lt, Lu, Nl.

Identifiers may contain code points from the general categories Ll, Lm, Lo, Lt,
Lu, Mc, Mn, Nd, Nl, No, Pc.
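
As a rough illustration of the two category lists above, here is a minimal
Python sketch built on the standard `unicodedata` module. The names
`START_CATEGORIES`, `CONTINUE_CATEGORIES` and `is_proposed_identifier` are
hypothetical, and the leading-underscore exception is an assumption added here
(`_` is in category Pc, so the literal lists would reject it as a first
character). It is not how dmd or Java implement their checks.

```python
import unicodedata

# Category sets copied from the proposal above.
START_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mc", "Mn", "Nd", "No", "Pc"}


def is_proposed_identifier(name: str) -> bool:
    """Return True if `name` satisfies the category-based rule sketched above."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # Assumption: a leading underscore stays legal even though "_" is in Pc.
    if first != "_" and unicodedata.category(first) not in START_CATEGORIES:
        return False
    # Every later code point must come from the wider "may contain" set.
    return all(unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in rest)


if __name__ == "__main__":
    # U+FF19 (fullwidth digit nine) is in Nd: fine inside a name, not at the start.
    for candidate in ["x\uff19", "\uff19x", "değişken", "_tmp", "9lives"]:
        print(repr(candidate), is_proposed_identifier(candidate))
```

Under this rule the character from the original report would be accepted
anywhere except in first position, just like an ASCII digit.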

Java also allows Cc and Cf, of whose usefulness I'm not so convinced. These are
control characters and things like "soft hyphen", which isn't even supposed to
be displayed unless the word line-wraps. Too much potential for confusion IMHO.
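
To make the confusion concrete, here is a small Python sketch (the variable
names `plain` and `hidden` are illustrative only) showing two strings that
typically render identically but differ by one soft hyphen, U+00AD, which is
in category Cf:

```python
import unicodedata

plain = "counter"
hidden = "coun\u00adter"  # U+00AD SOFT HYPHEN between "coun" and "ter"

soft_hyphen_category = unicodedata.category("\u00ad")
same_name = plain == hidden

print(soft_hyphen_category)     # general category of the soft hyphen
print(same_name)                # whether the two names compare equal
print(len(plain), len(hidden))  # lengths in code points
```

If Cf were allowed in identifiers, these would be two distinct names that look
the same in most editors.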

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
--- You are receiving this mail because: ---


[Issue 3455] Some Unicode characters not allowed in identifiers

2009-10-30 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455



--- Comment #2 from Andrei Alexandrescu 2009-10-30 11:40:05 PDT ---
(In reply to comment #1)
> As http://www.digitalmars.com/d/1.0/lex.html#identifier very clearly states,
> the allowed characters in identifiers are those defined in the C99 standard,
> ISO/IEC 9899:1999(E) Annex D. Have a look at it:
> http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
> 
> ９, code point 0xff19, is not in that list. The maximum one is 0xd7a3, in fact.
> This is not a bug, this is an enhancement.
> 
> However, rather than an arbitrary and frozen list, I /would/ prefer basing it
> simply on Unicode properties, such as Java's choice: identifiers may start with
> letters or numeric letters, and may contain, in addition to those, connecting
> punctuation, decimal digits, and combining and non-spacing marks. In other
> words:
> 
> Identifiers may start with code points from the general categories Ll, Lm, Lo,
> Lt, Lu, Nl.
> 
> Identifiers may contain code points from the general categories Ll, Lm, Lo, Lt,
> Lu, Mc, Mn, Nd, Nl, No, Pc.
> 
> Java also allows Cc and Cf, of whose usefulness I'm not so convinced. These are
> control characters and things like "soft hyphen", which isn't even supposed to
> be displayed unless the word line-wraps. Too much potential for confusion IMHO.

Oh ok. Thanks Matti. I'm leaving this as an enhancement request. Currently the
error message is:

invalid UTF-8 sequence
unsupported char 0x99

This is factually incorrect because the UTF-8 sequence is correct. I suggest
instead:

Unicode character 0xFF19 not allowed in a symbol
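
For reference, a short Python sketch of why the sequence is well-formed:
U+FF19 encodes to the three bytes EF BC 99 and decodes back cleanly. That the
0x99 in the current message is the trailing byte of that sequence is a guess,
not something confirmed in this thread. `suggested_message` is a hypothetical
helper that only formats the wording proposed above.

```python
ch = "\uff19"  # FULLWIDTH DIGIT NINE

encoded = ch.encode("utf-8")
encoded_hex = " ".join(f"{b:02x}" for b in encoded)
print(encoded_hex)

# Strict decoding raises UnicodeDecodeError on malformed input; here it succeeds.
round_trip = encoded.decode("utf-8", errors="strict")


def suggested_message(c: str) -> str:
    """Format the replacement diagnostic using the full code point."""
    return f"Unicode character 0x{ord(c):04X} not allowed in a symbol"


print(suggested_message(ch))
```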


Andrei



[Issue 3455] Some Unicode characters not allowed in identifiers

2009-12-12 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455


Walter Bright  changed:

        What|Removed |Added
          CC|        |bugzi...@digitalmars.com


--- Comment #3 from Walter Bright 2009-12-12 00:17:37 PST ---
I'm slowly becoming convinced that allowing Unicode characters in identifiers
is just a bad idea anyway. While there is plenty of interest in writing code
that manipulates Unicode and has Unicode strings, there is little interest in
writing the code itself in Unicode. There's a growing consensus that code
should be written in ASCII, for a long list of reasons.

For C compatibility, D should support the C identifiers, but I don't think
there's an advantage to going beyond that. For instance, the Unicode character
used in Andrei's test case won't even display properly in Explorer.

I'll fix the error message, then call it resolved.



[Issue 3455] Some Unicode characters not allowed in identifiers

2009-12-12 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455


Kosmonaut  changed:

        What|Removed |Added
          CC|        |kosmon...@tempinbox.com


--- Comment #4 from Kosmonaut 2009-12-12 14:21:14 PST ---
Relevant SVN commit:
http://www.dsource.org/projects/dmd/changeset/292



[Issue 3455] Some Unicode characters not allowed in identifiers

2009-12-31 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455


Walter Bright  changed:

        What|Removed |Added
      Status|NEW     |RESOLVED
  Resolution|        |FIXED


--- Comment #5 from Walter Bright 2009-12-31 11:11:58 PST ---
Fixed in dmd 1.054 and 2.038.



[Issue 3455] Some Unicode characters not allowed in identifiers

2009-12-31 Thread d-bugmail
http://d.puremagic.com/issues/show_bug.cgi?id=3455


Ali Cehreli  changed:

        What|Removed |Added
          CC|        |acehr...@yahoo.com


--- Comment #6 from Ali Cehreli  2009-12-31 16:04:07 PST ---
(In reply to comment #3)
> there is little interest in writing the code itself in unicode.
> There's a growing consensus that code should be written in ascii,
> for a long list of reasons.

Thank you very much for allowing us to program in UTF-8. There is a yet-to-grow
Turkish D community out there who take tremendous joy in being able to program
in Turkish.

I may be in the minority here, but UTF-8 identifiers have been the most
important feature for me in considering D.

Ali
