My thanks to everyone who responded. I learned something new, which is always a good thing, plus my program now works correctly!

Take care,

Jim

----- Original Message -----
From: "Jonathan M Davis" <jmdavisp...@gmx.com>
To: "digitalmars.D.learn" <digitalmars-d-learn@puremagic.com>
Sent: Thursday, October 20, 2011 8:19 PM
Subject: Re: char and string with umlauts


On Thursday, October 20, 2011 09:48 Jim Danley wrote:
I have been a programmer for many years and started using D about one year
back. Suddenly, I find myself in unfamiliar territory. I need to use
Finnish umlauts in chars and strings, but they are not part of my usual
American ASCII character set.

Can anyone point me in the right direction? I am getting "Invalid UTF-8
sequence" errors.

I'd have to see code to really say much about what you're doing. But char is a UTF-8 code unit, wchar is a UTF-16 code unit, and dchar is a UTF-32 code unit. For UTF-8 and UTF-16, it can take multiple code units to make a single code point, and a code point is typically what you would consider to be a character. (It's actually possible for one code point to alter another, e.g. add an accent or superscript to it, so a true character would be what is called a grapheme, but for the most part, you don't need to worry about that; at the moment, D doesn't do anything special to support graphemes.)

So, when you're operating on characters in D, you want to operate on dchars, not chars or wchars, because a single char or wchar is not necessarily a complete character. That's why range-based functions treat all strings as ranges of dchar, even if they're arrays of char or wchar (e.g. front returns a dchar, not a char or wchar). It's also why, when iterating over a string with foreach, you want to specify the iteration type, e.g.

foreach(dchar c; str)

not

foreach(c; str)

Without an explicit type, foreach iterates over the individual code units
(chars, for a string), which really isn't what you want. Basically, you
pretty much never want to operate on an individual char or wchar. Always
make sure that you operate on dchars when operating on individual characters.
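
To make the difference concrete, here is a minimal sketch (not taken from the original poster's program, which wasn't shown) that counts code units and code points in a Finnish word. The helper names countCodeUnits and countCodePoints are made up for illustration, and the umlauts are written as \u00E4 escapes so the example doesn't depend on how the source file is encoded.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: counts UTF-8 code units by iterating as char.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Hypothetical helper: counts code points by asking foreach to decode to dchar.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // "hyvää": each 'ä' (U+00E4) takes two UTF-8 code units.
    string word = "hyv\u00E4\u00E4";

    writeln(countCodeUnits(word));  // 7
    writeln(countCodePoints(word)); // 5

    // .length is the array length in code units; walkLength treats the
    // string as a range of dchar and so counts code points.
    writeln(word.length);           // 7
    writeln(word.walkLength);       // 5
}
```

If the umlauts are showing up as "Invalid UTF-8 sequence" errors, one thing worth checking (an assumption on my part, since no code was posted) is whether the source file or input data is really saved as UTF-8 rather than Latin-1.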

- Jonathan M Davis

