support for unicode in identifiers

2014-06-01 Thread Vlad Levenfeld via Digitalmars-d-learn
I was pretty happy to find that I could use mu and sigma when 
writing statistical routines, but I've found that for more 
obscure non-ASCII characters the support is hit or miss. For 
example, none of the subscripts are valid characters, but I can 
use superscript n as well as dot-notation for derivatives.
I'm using dmd 2.065. What's the story behind the scenes? Is there 
a rationale behind the supported/unsupported or is it 
happenstance? Is there anywhere I can find a list of supported 
characters?


Re: support for unicode in identifiers

2014-06-01 Thread Chris Nicholson-Sauls via Digitalmars-d-learn

On Sunday, 1 June 2014 at 22:26:42 UTC, Vlad Levenfeld wrote:


The allowed characters are those defined as "universal alphas" 
in Annex D of ISO/IEC 9899:1999 (the C99 standard).  It's a 
pretty long list, but almost entirely "alphas"; I'm actually 
surprised you got superscripts and some other things to work.


As I understand it, the intention was a) to be like C99 and b) to 
allow things like using "stærð" rather than "staerdh."  I'm not 
sure usage like yours was even thought about, although I'd 
concede that it seems reasonable.
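A minimal Python sketch of the kind of range-table lookup described above, assuming the lexer checks each non-ASCII code point against a sorted list of inclusive ranges. The names UNIVERSAL_ALPHA_RANGES and is_universal_alpha are hypothetical, and the table is a heavily abridged excerpt of C99 Annex D (a few Latin and Greek ranges plus the single entry U+207F). It is not dmd's actual table, so check the standard before relying on any particular range.

```python
from bisect import bisect_right

# Hypothetical, abridged excerpt of C99 Annex D ranges (inclusive),
# sorted by start code point. Not the full list and not dmd's table.
UNIVERSAL_ALPHA_RANGES = [
    (0x00D8, 0x00F6),  # Latin-1 letters, includes æ and ð
    (0x00F8, 0x01F5),  # more Latin letters
    (0x038E, 0x03A1),  # Greek letters
    (0x03A3, 0x03CE),  # Greek letters, includes μ (U+03BC) and σ (U+03C3)
    (0x1E00, 0x1E9B),  # Latin Extended Additional, includes ẋ (U+1E8B)
    (0x207F, 0x207F),  # SUPERSCRIPT LATIN SMALL LETTER N
]

# Range start points, used for the binary search.
_STARTS = [lo for lo, _ in UNIVERSAL_ALPHA_RANGES]


def is_universal_alpha(ch: str) -> bool:
    """Return True if the code point of ch lies inside one of the ranges above.

    ASCII letters are assumed to be handled separately by the lexer,
    so they are deliberately absent from this table.
    """
    cp = ord(ch)
    # Index of the last range whose start is <= cp, or -1 if none.
    i = bisect_right(_STARTS, cp) - 1
    if i < 0:
        return False
    lo, hi = UNIVERSAL_ALPHA_RANGES[i]
    return lo <= cp <= hi


# mu, sigma, superscript n, x with dot above, eth, subscript one
for ch in "\u03bc\u03c3\u207f\u1e8b\u00f0\u2081":
    print(f"U+{ord(ch):04X} {ch}: {is_universal_alpha(ch)}")
```

With this excerpt, μ, σ, ⁿ (U+207F) and ẋ (U+1E8B) pass while the subscript ₁ (U+2081) does not. That would be consistent with the behaviour described in the first post if the dotted letters used there are precomposed characters such as ẋ.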


Re: support for unicode in identifiers

2014-06-01 Thread Vlad Levenfeld via Digitalmars-d-learn
With Unicode support (especially with UFCS) I can really code 
more in the way I think. I never gave it much thought until I 
worked with D, but now that I have, I find it a bit weird to 
work with epsilons and deltas on paper and with "eps" and "del" 
or something similar on the screen. And what's a more 
descriptive variable name than the symbol used for it in the 
canonical representation?


So, this may be a very naive question, but since dmd is open 
source, is there somewhere the list of supported symbols can be 
extended? (Hopefully something trivial to change, like a big 
array literal tucked away somewhere.) I'm currently looking 
through the files labeled 'lexer' and 'utf' and the like on 
GitHub, but nothing has jumped out at me yet.


Re: support for unicode in identifiers

2014-06-01 Thread Vlad Levenfeld via Digitalmars-d-learn

Ah! Found it in utf.h as ALPHA_TABLE.
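If ALPHA_TABLE is a sorted table of code-point ranges searched by binary search (an assumption, not verified against the dmd source), extending it would mean adding entries without breaking that ordering. The Python sketch below illustrates the idea under that assumption: alpha_ranges, add_range and in_table are hypothetical names, the table is a tiny abridged excerpt rather than dmd's real data, and the added range, SUBSCRIPT ZERO through SUBSCRIPT NINE (U+2080 to U+2089), is only an example of the subscripts asked about in the first post.

```python
from bisect import bisect_right

# Hypothetical, abridged range table (inclusive, sorted by start).
# An excerpt for illustration only, not dmd's ALPHA_TABLE.
alpha_ranges = [
    (0x00D8, 0x00F6),
    (0x03A3, 0x03CE),
    (0x207F, 0x207F),
]


def add_range(table, lo, hi):
    """Hypothetical helper: insert the inclusive range [lo, hi] so the
    table stays sorted and non-overlapping, which binary search relies on."""
    if lo > hi:
        raise ValueError("empty range")
    # Position where the new range belongs, by start code point.
    i = bisect_right([start for start, _ in table], lo)
    # Reject overlap with the neighbouring ranges.
    if i > 0 and table[i - 1][1] >= lo:
        raise ValueError("overlaps the preceding range")
    if i < len(table) and table[i][0] <= hi:
        raise ValueError("overlaps the following range")
    table.insert(i, (lo, hi))


def in_table(table, ch):
    """Binary-search the sorted table for the code point of ch."""
    cp = ord(ch)
    i = bisect_right([start for start, _ in table], cp) - 1
    return i >= 0 and table[i][0] <= cp <= table[i][1]


SUBSCRIPT_ONE = "\u2081"
print(in_table(alpha_ranges, SUBSCRIPT_ONE))  # False: not in the excerpt yet
add_range(alpha_ranges, 0x2080, 0x2089)       # SUBSCRIPT ZERO .. SUBSCRIPT NINE
print(in_table(alpha_ranges, SUBSCRIPT_ONE))  # True after the new range is added
```

Keeping the overlap check in the helper means a mistyped range fails loudly instead of silently corrupting the lookup order.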