Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Tue, 11 Jan 2011 19:20:22 -0800

On 2011-01-11 20:28:26 -0500, Steven Wawryk <stev...@acres.com.au> said:

Sorry if I'm jumping in here without the appropriate background, but I don't understand why jumping through these hoops is necessary. Please let me know if I'm missing anything.
Many problems can be solved by another layer of indirection. Isn't a string essentially a bidirectional range of code points built on top of a random access range of code units?

Actually, displaying a UTF-8/UTF-16 string involves a range of glyphs layered over a range of graphemes layered over a range of code points layered over a range of code units. Glyphs represent the visual characters you can get from a font; they often map one-to-one with graphemes, but not always (ligatures, for instance). Graphemes are what people generally reason about when they see text (the so-called "user-perceived characters"); they often map one-to-one with code points, but not always (combining marks, for instance). Code points are a list of standardized codes representing various elements of a string, and code units basically encode the code points.
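
As a rough illustration of the lower three layers, here is a sketch in Python (chosen only because its standard library is enough; it is not D code). The helper name `count_layers` is made up for this example, and the grapheme count is only an approximation: it treats every code point with a non-zero canonical combining class as part of the preceding grapheme, which is much cruder than real Unicode grapheme segmentation.

```python
import unicodedata

def count_layers(text):
    """Return (UTF-8 code units, UTF-16 code units, code points, approx. graphemes)."""
    utf8_units = len(text.encode("utf-8"))
    utf16_units = len(text.encode("utf-16-le")) // 2  # two bytes per UTF-16 code unit
    code_points = len(text)  # a Python 3 str is a sequence of code points
    # Approximation (assumption of this sketch): a code point with a non-zero
    # combining class attaches to the previous base character.
    graphemes = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return utf8_units, utf16_units, code_points, graphemes

# 'e' followed by COMBINING ACUTE ACCENT: one user-perceived character
print(count_layers("e\u0301"))      # (3, 2, 2, 1)
# MUSICAL SYMBOL G CLEF, a single code point outside the BMP
print(count_layers("\U0001D11E"))   # (4, 2, 1, 1)
```

The same string gives a different length at each layer, which is why the layer has to be chosen per task.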

If you're writing an XML, JSON or whatever else parser, you'll probably care about code points. If you're advancing the insertion point in a text field or counting the number of user-perceived characters, you'll probably want to deal with graphemes. For searching for a substring inside a string, or comparing strings, you'll probably want to deal with either graphemes or collation elements (collation elements are layered on top of code points). To print a string, you'll need to map graphemes to the glyphs from a particular font.

Reducing string operations to code point manipulations will only work as long as all your graphemes, collation elements, or glyphs map one-to-one with code points.
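
A small Python sketch of where the one-to-one assumption breaks. The function names are hypothetical, and NFC normalization is used here only as a simple stand-in: it fixes this particular case but is not a substitute for grapheme-aware or collation-aware comparison in general.

```python
import unicodedata

precomposed = "caf\u00e9"    # ends with LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "cafe\u0301"    # ends with 'e' + COMBINING ACUTE ACCENT (two code points)

def equal_by_code_points(a, b):
    """Compare the raw code point sequences."""
    return a == b

def equal_after_nfc(a, b):
    """Compare after canonical composition (NFC); a stand-in, not full collation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Both strings display the same text, yet the code point comparison says they differ.
print(equal_by_code_points(precomposed, decomposed))  # False
print(equal_after_nfc(precomposed, decomposed))       # True

# A code point search for a plain 'e' misses the precomposed form but "finds"
# the base letter inside the decomposed accented character.
print(precomposed.find("e"))   # -1
print(decomposed.find("e"))    # 3
```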

It seems to me that each abstraction separately already fits within the existing D range framework and all the difficulties arise as a consequence of trying to lump them into a single abstraction.

It's true that each of these abstractions can fit within the existing range framework.

Why not choose which of these abstractions is most appropriate in a given situation instead of trying to shoe-horn both concepts into a single abstraction, and provide for easy conversion between them? When character representation is the primary requirement then make it a bidirectional range of code points. When storage representation and random access is required then make it a random access range of code units.

I think you're right. The need for a new concept isn't that great, and it gets complicated really fast.
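
To make the layering concrete, here is a minimal Python sketch of a bidirectional range of code points built on top of a random access sequence of UTF-8 code units. The class name `CodePointRange` is invented for this example; the method names only mimic D's range primitives (`empty`, `front`, `popFront`, `back`, `popBack`). It assumes the input is valid UTF-8 and does no error handling.

```python
class CodePointRange:
    """Bidirectional view of code points over random-access UTF-8 code units.

    Hypothetical sketch: assumes `units` is a valid UTF-8 bytes object.
    """

    def __init__(self, units):
        self.units = units       # random access range of code units
        self.lo = 0              # index of the first remaining code unit
        self.hi = len(units)     # one past the last remaining code unit

    def empty(self):
        return self.lo >= self.hi

    def _front_len(self):
        # The lead byte tells how many code units the code point occupies.
        b = self.units[self.lo]
        if b < 0x80:
            return 1
        if b < 0xE0:
            return 2
        if b < 0xF0:
            return 3
        return 4

    def _back_start(self):
        # Walk backwards over continuation bytes (10xxxxxx) to the lead byte.
        i = self.hi - 1
        while (self.units[i] & 0xC0) == 0x80:
            i -= 1
        return i

    def front(self):
        return self.units[self.lo:self.lo + self._front_len()].decode("utf-8")

    def pop_front(self):
        self.lo += self._front_len()

    def back(self):
        return self.units[self._back_start():self.hi].decode("utf-8")

    def pop_back(self):
        self.hi = self._back_start()


units = "a\u00e9\u20ac".encode("utf-8")   # 1 + 2 + 3 = 6 code units
r = CodePointRange(units)
print(len(units), r.front(), r.back())
```

Random access stays on the code unit layer (`units[i]`), while the code point layer offers only front and back access, since finding the n-th code point means decoding everything before it.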



--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
