Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Ali Çehreli Wed, 12 Jan 2011 15:05:29 -0800

spir wrote:
> On 01/12/2011 08:28 PM, Don wrote:
>> I think the only problem that we really have is that "char[]" and
>> "dchar[]" imply that code points are always the appropriate level of
>> abstraction.
>
> I'd like to know when it happens that codepoint is the appropriate level
> of abstraction.


When working on a document that describes code points... :)

> * If pieces of text are not manipulated, meaning just used in the
> application, or just transferred via the application as is (from file /
> input / literal to any kind of output), then any kind of encoding just
> works. One can even concatenate, provided all pieces use the same
> encoding. --> _lower_ level than codepoint is OK.
> * But any kind of manipulation (indexing, slicing, compare,

Compare according to which alphabet's ordering? Surely not Unicode's... (I may be alone in this, but ordering is tied to an alphabet or writing system, not to a locale.)


I try to solve that issue with my trileri library:

  http://code.google.com/p/trileri/source/browse/#svn%2Ftrunk%2Ftr

Warning: the code is in Turkish and is not aware of the concept of collation at all; it has its own simplistic view of text, where every character is an entity that can be lower/upper cased to a single character.
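
A minimal Python sketch of what "ordering tied to an alphabet" and the one-to-one casing view mean in practice. The names TR_LOWER, tr_key, tr_upper and tr_lower are hypothetical and are not taken from trileri. Plain code point order puts 'ç' and 'ı' after 'z', and the built-in str.upper maps 'i' to 'I' rather than to the Turkish 'İ':

```python
# Hypothetical helpers (not the trileri API): order and case-map words by the
# Turkish alphabet instead of by Unicode code point values.
TR_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"

# Position of each letter in the alphabet; both cases share a position.
RANK = {ch: i for i, ch in enumerate(TR_LOWER)}
RANK.update({ch: i for i, ch in enumerate(TR_UPPER)})

# One-to-one case tables: every letter has exactly one counterpart.
TO_UPPER = dict(zip(TR_LOWER, TR_UPPER))
TO_LOWER = dict(zip(TR_UPPER, TR_LOWER))


def tr_key(word):
    """Sort key: alphabet positions; characters outside the alphabet sort last."""
    return [RANK.get(ch, len(TR_LOWER)) for ch in word]


def tr_upper(word):
    return "".join(TO_UPPER.get(ch, ch) for ch in word)


def tr_lower(word):
    return "".join(TO_LOWER.get(ch, ch) for ch in word)


words = ["çay", "dere", "ceviz", "ılık", "iğne", "zil"]
print(sorted(words))              # ['ceviz', 'dere', 'iğne', 'zil', 'çay', 'ılık']
print(sorted(words, key=tr_key))  # ['ceviz', 'çay', 'dere', 'ılık', 'iğne', 'zil']
print("istanbul".upper(), tr_upper("istanbul"))  # ISTANBUL İSTANBUL
print("IRMAK".lower(), tr_lower("IRMAK"))        # irmak ırmak
```

The sketch is deliberately as simplistic as described above: a single comparison level, no accent-insensitive matching, and no letters spelled with more than one code point.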


> search, count,
> replace, not to speak about regex/parsing) requires operating at the
> _higher_ level of characters (in the common sense).

I don't know this about Unicode: should e and ´ (acute accent) be always collated? If so, wouldn't it be impossible to put those two in that order, say, in a text book? (Perhaps Unicode defines a way to stop collation.)
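
For what it's worth, Unicode has two distinct acute accents: U+0301 COMBINING ACUTE ACCENT attaches to the preceding letter, while U+00B4 ACUTE ACCENT is a spacing character that does not. A small Python sketch (standard unicodedata module only) shows that the combining form gives a second spelling of é, which normalization (NFC) folds into the precomposed one, while the spacing accent is left alone:

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT: displayed as é
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
spacing = "e\u00b4"      # 'e' + spacing ACUTE ACCENT: displayed as e´

# Code point comparison sees two different strings for the same é.
print(decomposed == precomposed)                        # False
print(len(decomposed), len(precomposed), len(spacing))  # 2 1 2

# NFC makes both spellings of é identical ...
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
# ... but does not touch the spacing accent, which is not a combining mark.
print(unicodedata.normalize("NFC", spacing) == spacing)  # True
print(unicodedata.combining("\u0301"), unicodedata.combining("\u00b4"))  # 230 0
```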


> Just like with
> historic character sets in which codes used to represent characters (not
> lower-level thingies as in UCS). Else, one reads, compares, changes
> meaningless bits of text.
>
> As I see it now, we need 2 types:

I think we need more than 2 types...

> * One plain string similar to good old ones (bytestring would do the
> job, since most Unicode is UTF-8 encoded) for the first kind of use
> above. With optional validity check when it's supposed to be Unicode text.

Agreed. D gives us three UTF encodings, but I am not sure that there is only one abstraction above them.
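
To illustrate the three encodings and the optional validity check, here is a Python sketch (not D) in which the same five code points take a different number of code units in UTF-8, UTF-16 and UTF-32, the encodings of D's char[], wchar[] and dchar[] respectively. The helper is_valid_utf8 is a hypothetical name:

```python
text = "ağaç\U0001D11E"  # four Turkish letters plus U+1D11E MUSICAL SYMBOL G CLEF

# The same five code points need a different number of code units per encoding.
utf8 = text.encode("utf-8")       # like D's char[]:  1 to 4 bytes per code point
utf16 = text.encode("utf-16-le")  # like D's wchar[]: 1 or 2 16-bit units
utf32 = text.encode("utf-32-le")  # like D's dchar[]: exactly 1 32-bit unit

print(len(text), len(utf8), len(utf16) // 2, len(utf32) // 4)  # 5 10 6 5


def is_valid_utf8(data: bytes) -> bool:
    """Optional validity check for a plain byte string that should hold UTF-8."""
    try:
        data.decode("utf-8")  # the default "strict" error handler raises on bad input
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(utf8))         # True
print(is_valid_utf8(b"\xc3\x28"))  # False: lead byte not followed by a continuation byte
print(is_valid_utf8(utf8[:2]))     # False: the slice cuts 'ğ' in half
```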


> * One higher-level type abstracting from codepoint (not code unit)
> issues, restoring the necessary properties: (1) each character is one
> element in the sequence (2) each character is always represented the
> same way.

I think VLERange should solve only the variable-length-encoding issue. It should not get into higher abstractions.
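
A rough Python sketch of that narrower job, assuming only UTF-8 input: a hypothetical Utf8Range class, loosely mirroring D's front/popFront/back/popBack range primitives, that yields whole code points from either end of a byte buffer. Stepping backwards works by skipping continuation bytes (10xxxxxx), but reaching the n-th code point still means walking from one end, which is why such a range sits between bidirectional and random access. It illustrates the variable-length decoding only, not the actual VLERange proposal:

```python
def seq_len(lead):
    """Length of a UTF-8 sequence, read from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def is_continuation(b):
    return b & 0xC0 == 0x80


class Utf8Range:
    """Hypothetical bidirectional view: code points from either end, no O(1) indexing."""

    def __init__(self, data):
        self.data = data
        self.lo = 0
        self.hi = len(data)

    def empty(self):
        return self.lo >= self.hi

    def front(self):
        n = seq_len(self.data[self.lo])
        return self.data[self.lo:self.lo + n].decode("utf-8")

    def pop_front(self):
        self.lo += seq_len(self.data[self.lo])

    def _back_start(self):
        i = self.hi - 1
        while is_continuation(self.data[i]):  # step back over continuation bytes
            i -= 1
        return i

    def back(self):
        return self.data[self._back_start():self.hi].decode("utf-8")

    def pop_back(self):
        self.hi = self._back_start()


r = Utf8Range("ağaç".encode("utf-8"))  # 6 bytes, 4 code points
print(r.front(), r.back())  # a ç
r.pop_front()
r.pop_back()
print(r.front(), r.back())  # ğ a
```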

Ali
