On 2011-01-17 15:49:26 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

> On 1/17/11 2:29 PM, Michel Fortin wrote:
>> The problem I see currently is that you rely on dchar being the element
>> type. That should be an implementation detail, not something client code
>> can see or rely on.
>
> But at some point you must be able to talk about individual characters in a text. It can't be something that client code doesn't see!!!

It seems that it can. NSString only exposes individual UTF-16 code units directly (or semi-directly via an accessor method), even though searching and comparing is grapheme-aware. I'm not saying it's a good design, but it certainly can work in practice.

In any case, I didn't mean to say the client code shouldn't be aware of the characters in a string. I meant that the client shouldn't assume the algorithm works at the same layer as ElementType!(string) for a given string type. Even if ElementType!(string) is dchar, the default function you get if you don't use any of toCodeUnit, toDchar, or toGrapheme can work at the dchar or grapheme level if it makes more sense that way.
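
To make the three layers concrete, here is a minimal sketch in D. It assumes the range adaptors byCodeUnit, byDchar, and byGrapheme from present-day Phobos (std.utf and std.uni) as stand-ins for the toCodeUnit, toDchar, and toGrapheme named above; the count* helper names are made up for illustration only.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers: count the elements of one string at each layer.

// UTF-8 code units (what indexing a char array sees).
size_t countCodeUnits(string s) { return s.byCodeUnit.walkLength; }

// Code points (what iterating by dchar sees).
size_t countCodePoints(string s) { return s.byDchar.walkLength; }

// Graphemes (user-perceived characters).
size_t countGraphemes(string s) { return s.byGrapheme.walkLength; }
```

For the string "e\u0301" (an 'e' followed by a combining acute accent), the three helpers return 3, 2, and 1 respectively, so an algorithm's answer depends on the layer it works at.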

In other words, the client says: "I have two strings, compare them!" The client didn't specify if they should be compared by char, wchar, dchar, or by normalized grapheme; so we do what's sensible. That's what I call the 'default' string functions, those you get when you don't ask for anything specific. They should have a signature making them able to work at the grapheme level, even though they might not for practical reasons (performance). This way if it becomes more important or practical to support graphemes, it's easy to evolve to them.
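
The difference between a code-unit comparison and a "sensible" default can be sketched as follows. This is only an illustration under assumptions: it uses std.uni.normalize with NFC from present-day Phobos as one possible way to compare by normalized form, and both function names are hypothetical, not part of any proposed API.

```d
import std.uni;

// Hypothetical: compares the raw UTF-8 code units, nothing more.
bool equalByCodeUnit(string a, string b)
{
    return a == b;
}

// Hypothetical "default" comparison: bring both strings to NFC first,
// so canonically equivalent sequences compare equal.
bool equalNormalized(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}
```

With a precomposed "\u00E9" on one side and "e\u0301" on the other, equalByCodeUnit returns false while equalNormalized returns true. Normalizing on every call has a cost, which is the practical (performance) concern mentioned above.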


> SuperDuperText txt;
> auto c = giveMeTheFirstCharacter(txt);
>
> What is the type of c? That is visible to the client!

That depends on how you implement the giveMeTheFirstCharacter function. :-)

More seriously, you have four choices:

1. code unit
2. code point
3. grapheme
4. require the client to state explicitly which kind of 'character' he wants; 'character' being an overloaded word, it's reasonable to ask for disambiguation.
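
As a rough sketch of what option 4 could look like in D, the client passes the level explicitly and the return type follows from it. The CharLevel enum and the firstCharacter template are hypothetical names invented for this example; byCodeUnit, byGrapheme, and Grapheme come from present-day Phobos, and the input is assumed to be non-empty.

```d
import std.range.primitives : front;
import std.uni : Grapheme, byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical: the three meanings of "character" a client can ask for.
enum CharLevel { codeUnit, codePoint, grapheme }

// Hypothetical: returns the first "character" of a non-empty string
// at the requested level. The return type depends on the level.
auto firstCharacter(CharLevel level)(string s)
{
    static if (level == CharLevel.codeUnit)
        return s.byCodeUnit.front;   // a UTF-8 code unit
    else static if (level == CharLevel.codePoint)
        return s.front;              // a decoded code point (dchar)
    else
        return s.byGrapheme.front;   // a Grapheme (one or more code points)
}
```

For "e\u0301x", the grapheme-level call returns a Grapheme holding two code points, the code-point call returns 'e', and the code-unit call returns the single byte for 'e'. Options 1 to 3 amount to picking one of these branches as the default.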

You and Walter can't come to understand each other between 1 and 2, regarding foreach and ranges. To keep things consistent with what I said above I'd tend to say 4, but that's weird for something that looks like an array. My second choice would be 1 for consistency, 3 for correctness, and 2 for practicality.

Given something is going to be inconsistent either way, I'd say any of the above is acceptable. But please make sure you and Walter agree on the default element type for ranges and foreach.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
