On Wednesday, August 01, 2012 19:50:10 Philippe Sigaud wrote:
> On Wed, Aug 1, 2012 at 5:45 PM, Jonathan M Davis <jmdavisp...@gmx.com> wrote:
> > "ウェブサイト"
> > "\u30A6\u30A7\u30D6\u30B5\u30A4\u30C8"
> >
> > The encoding of the source file is irrelevant.
>
> Do you mean I can do:
>
>     string field = "ウェブサイト";
>
> ?
>
> Geez, just tested it, it works. Even writeln(field) correctly outputs
> the Japanese chars, and dmd doesn't choke on it.
> Bang, back to state 0: I don't get how D strings work.
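For reference, the experiment quoted above reduces to a complete program (a sketch on my part, assuming the source file itself is saved as UTF-8 and compiled with dmd):

```d
import std.stdio;

void main()
{
    // The Unicode literal sits directly in the source text.
    string field = "ウェブサイト";
    writeln(field); // prints the Japanese characters as-is
}
```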
From http://dlang.org/lex.html

D source text can be in one of the following formats:

* ASCII
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE

So, yes, you can stick Unicode characters directly in D code.

Though I wonder about the correctness of the spec here. It claims that if there's no BOM, then the source is ASCII, but unless vim inserts BOM markers into all of my .d files, none of them have BOM markers, and yet I can put Unicode in a .d file just fine with vim. I should probably study up on BOMs.

In any case, the source is read in whatever encoding it's in. String literals then all become UTF-8 in the final object code unless they're marked as specifically being another type via a postfix letter, or they're inferred as being another type (e.g. when you assign a string literal to a dstring). Regardless, what ends up in the final object code is based on the types that the type system assigns to the strings, not on the encoding of the source code.

So, a lexer shouldn't care about the encoding of the source beyond what it takes to convert it to a format it can deal with (and potentially being written in a way which makes handling a particular encoding more efficient). The values of literals and the like are completely unaffected regardless.

- Jonathan M Davis
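P.S. A minimal sketch of the literal-type behavior described above (my own example, assuming dmd's usual semantics for string literal postfixes and type inference):

```d
void main()
{
    // The same literal, stored in three encodings depending on the
    // *type* the type system assigns, not on the source encoding:
    string  s = "ウェブサイト";   // UTF-8 (the default)
    wstring w = "ウェブサイト"w;  // UTF-16, via the 'w' postfix
    dstring d = "ウェブサイト";   // UTF-32, inferred from the target type

    // The 'd' postfix makes the literal itself a dstring:
    static assert(is(typeof("ウェブサイト"d) == dstring));

    assert(d.length == 6);  // six code points
    assert(s.length == 18); // UTF-8 code units: 3 bytes per katakana
}
```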