On Mon, Sep 14, 2020 at 09:49:13PM -0400, James Blachly via Digitalmars-d-learn wrote:
> I wish to write a function including ∂x and ∂y (these are trivial to
> type with appropriate keyboard shortcuts - alt+d on Mac), but without
> a unicode byte order mark at the beginning of the file, the lexer
> rejects the tokens.
>
> It is not apparently easy to insert such marks (AFAICT no common tool
> does this specifically), while other languages work fine (i.e., accept
> unicode in their source) without it.
>
> Is there a downside to at least presuming UTF-8?
Tested it locally, with and without a BOM; the lexer rejects ∂ as a valid token either way. I suspect the reason has nothing to do with BOMs, but with the fact that ∂ is not classified as alphabetic (see std.uni.isAlpha, which returns false for ∂). The following code, which contains Cyrillic letters, compiles just fine without a BOM (std.uni.isAlpha('Ш') returns true):

	import std.stdio;

	void main() {
		int Ш = 1;
		writeln(Ш);
	}

As the docs for std.uni.isAlpha state, it tests for the general Unicode category 'Alphabetic'. Probably identifiers are restricted to characters of this category plus digits and '_' (and maybe one or two others, perhaps '$'? Don't remember now).

T

-- 
People say I'm indecisive, but I'm not sure about that. -- YHL, CONLANG
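For anyone who wants to verify the classification claim directly, here is a minimal sketch (assuming a D compiler with Phobos available) that prints what std.uni.isAlpha says about the two characters in question:

```d
import std.stdio;
import std.uni;

void main()
{
    // U+2202 PARTIAL DIFFERENTIAL is a math symbol (category Sm),
    // not Alphabetic, so isAlpha should report false.
    writeln(isAlpha('∂'));

    // U+0428 CYRILLIC CAPITAL LETTER SHA is a letter (category Lu),
    // so isAlpha should report true.
    writeln(isAlpha('Ш'));
}
```

If isAlpha('∂') is indeed false, that would explain the lexer rejecting ∂ in identifiers regardless of whether the file has a BOM.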