Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Sat, 15 Jan 2011 20:51:00 -0800

On 2011-01-15 20:49:00 -0500, Jonathan M Davis <jmdavisp...@gmx.com> said:

On Saturday 15 January 2011 04:24:33 Michel Fortin wrote:

I have my idea.

I think it'd be a good idea to improve upon Andrei's first idea --
which was to treat char[], wchar[], and dchar[] all as ranges of dchar
elements -- by changing the element type to be the same as the string.
For instance, iterating on a char[] would give you slices of char[],
each having one grapheme.
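
A minimal sketch of what "slices of char[], each having one grapheme" could look like. It assumes a present-day Phobos in which std.uni provides graphemeStride (as far as I know, this was not available when the thread was written). The helper name graphemeSlices is hypothetical and only serves the illustration.

```d
import std.stdio : writeln;
import std.uni : graphemeStride;

/// Hypothetical helper: cut a string into slices of the same string type,
/// each slice covering exactly one grapheme.
string[] graphemeSlices(string s)
{
    string[] result;
    size_t i = 0;
    while (i < s.length)
    {
        // Number of code units in the grapheme that starts at index i.
        immutable n = graphemeStride(s, i);
        result ~= s[i .. i + n];
        i += n;
    }
    return result;
}

void main()
{
    // The final "e" is followed by U+0301 (combining acute accent),
    // so the last slice holds two code points but one grapheme.
    auto parts = graphemeSlices("expose\u0301");
    assert(parts.length == 6);
    assert(parts[5] == "e\u0301");
    writeln(parts);
}
```

The element type here is string, the same as the range being iterated, which is the property the proposal is about.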

The second component would be to make the string equality operator (==)
for strings compare them in their normalized form, so that ("e" with
combining acute accent) == (pre-combined "é"). I think this would make
D support for Unicode much more intuitive.
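
To make the normalized comparison concrete, here is a small sketch written as a free function rather than as a change to the built-in == operator. It assumes std.uni.normalize from a present-day Phobos; the name normalizedEqual is hypothetical, and the choice of NFC as the normalization form is an assumption made for the example.

```d
import std.uni : normalize, NFC;

/// Hypothetical stand-in for the proposed semantics of ==:
/// compare two strings after bringing both to the same normal form.
bool normalizedEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precombined = "\u00E9";  // "é" as a single code point
    string combining   = "e\u0301"; // "e" plus combining acute accent

    // The built-in comparison looks at code units, so these differ.
    assert(precombined != combining);

    // After normalization they compare equal.
    assert(normalizedEqual(precombined, combining));
}
```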

This implies some semantic changes, mainly that everywhere you write a
"character" you must use double quotes (string "a") instead of single
quotes (code point 'a'), but from the user's point of view that's pretty
much all there is to change.

There'll still be plenty of room for specialized algorithms, but their
purpose would be limited to optimization. Correctness would be taken
care of by the basic range interface, and foreach should follow suit
and iterate by grapheme by default.

I wrote this example (or something similar) earlier in this thread:

        foreach (grapheme; "exposé")
                if (grapheme == "é")
                        break;

In this example, even if one of these two strings uses the pre-combined
form of "é" and the other uses a combining acute accent, the equality
would still hold since foreach iterates on full graphemes and ==
compares using normalization.

I think that that would cause definite problems. Having the element type of the range be the same type as the range seems like it could cause a lot of problems in std.algorithm and the like, and it's _definitely_ going to confuse programmers. I'd expect it to be highly bug-prone. They _need_ to be separate types.

I remember that someone already complained about this issue because he had a tree of ranges, and Andrei said he would take a look at this problem eventually. Perhaps now would be a good time.

Now, given that dchar can't actually work completely as an element type, you'd either need the string type to be a new type or the element type to be a new type. So, either the string type has char[], wchar[], or dchar[] for its element type, or char[], wchar[], and dchar[] have something like uchar as their element type, where uchar is a struct which contains a char[], wchar[], or dchar[] which holds a single grapheme.
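
A rough sketch of the separate-element-type alternative, under stated assumptions: the names UChar and ByUChar are hypothetical, the struct wraps a string slice only (not wchar[] or dchar[]), and it relies on std.uni.graphemeStride and std.uni.normalize from a present-day Phobos. It is not code from this thread.

```d
import std.uni : graphemeStride, normalize, NFC;

/// Hypothetical element type: a struct holding a slice with one grapheme.
struct UChar
{
    string data;

    /// Equality compares normalized forms (NFC is an assumption here).
    bool opEquals(const UChar other) const
    {
        return normalize!NFC(data) == normalize!NFC(other.data);
    }
}

/// Hypothetical input range over a string that yields UChar elements.
struct ByUChar
{
    string rest;

    bool empty() { return rest.length == 0; }

    UChar front() { return UChar(rest[0 .. graphemeStride(rest, 0)]); }

    void popFront() { rest = rest[graphemeStride(rest, 0) .. $]; }
}

void main()
{
    bool found = false;

    // The source string uses "e" + U+0301; the literal being searched for
    // uses the pre-combined U+00E9. They still match.
    foreach (g; ByUChar("expose\u0301"))
    {
        if (g == UChar("\u00E9"))
        {
            found = true;
            break;
        }
    }
    assert(found);
}
```

With this design the range type (string) and its element type (UChar) stay distinct, at the cost of wrapping every literal you compare against.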

Having a new type for grapheme would work too. My preference still goes to reusing the string type because it makes the semantics simpler to understand, especially when comparing graphemes with literals.



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
