On 01/18/2011 04:48 AM, Michel Fortin wrote:
On 2011-01-17 17:54:04 -0500, Michel Fortin <michel.for...@michelf.com>
said:

More seriously, you have four choices:

1. code unit
2. code point
3. grapheme
4. require the client to state explicitly which kind of 'character' he
wants; 'character' being an overloaded word, it's reasonable to ask
for disambiguation.

This makes me think of what I did with my XML parser after you made code
points the element type for strings. Basically, the parser now uses
'front' and 'popFront' whenever it needs to get the next code point, but
most of the time it uses 'frontUnit' and 'popFrontUnit' instead (which I
had to add) when testing for or skipping an ASCII character is
sufficient. This way I avoid a lot of unnecessary decoding of code points.

For this to work, the same range must let you skip either a unit or a
code point. If I were using a separate range with a call to toDchar or
toCodeUnit (or toGrapheme if I needed to check graphemes), it wouldn't
have helped much because the new range would essentially become a new
slice independent of the original, so you can't interleave "I want to
advance by one unit" with "I want to advance by one code point".
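
To make the interleaving concrete, here is a minimal sketch, not the actual parser code. The struct name UnitAndPointRange is hypothetical. 'frontUnit' and 'popFrontUnit' are named after the additions described above, but their bodies here are only a guess at one possible implementation, and the sketch assumes well-formed UTF-8 input. Both levels operate on the same underlying slice, so a unit-level pop and a code-point-level pop can follow one another freely.

```d
import std.utf : decode, stride;

// Hypothetical wrapper: one underlying slice, two levels of access.
struct UnitAndPointRange
{
    string s;

    @property bool empty() { return s.length == 0; }

    // Code-unit level: no decoding at all.
    @property char frontUnit() { return s[0]; }
    void popFrontUnit() { s = s[1 .. $]; }

    // Code-point level: decodes the UTF-8 sequence at the front.
    @property dchar front()
    {
        size_t i = 0;
        return decode(s, i);
    }
    void popFront() { s = s[stride(s, 0) .. $]; }
}

void main()
{
    // UTF-8 units: 20 20 C3 A9 74
    auto r = UnitAndPointRange("  \u00E9t");

    // Skipping ASCII spaces only needs unit-level tests.
    while (!r.empty && r.frontUnit == ' ')
        r.popFrontUnit();

    // Same range, now read at code-point level.
    assert(r.front == '\u00E9');
    r.popFront();               // advances by two code units

    // And back to unit level again.
    assert(r.frontUnit == 't');
}
```

With a separate range obtained from a conversion call, the position reached by the unit-level loop would not carry over to the code-point-level reads, which is the problem described above.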

So perhaps the best interface for strings would be to provide multiple
range-like interfaces that you can use at the level you want.

I'm not sure if this is a good idea, but I thought I should at least
share my experience.

This looks like a very interesting approach. And clear.
I guess range synchronisation would be based on an internal lowest-level (code unit) index. Then you also need internal validity-checking and/or offsetting routines for when a higher-level range is used after a lower-level one has been used (e.g. to ensure the range sits at the start of a code point after a code-unit popFront, or else to throw an error). Also, how do you avoid duplicating many operational functions (e.g. finding a given slice) for each level?
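
As a rough illustration of the kind of check I mean, here is a sketch for UTF-8 only. The function names isContinuation, checkAtCodePointStart and resync are made up for this example and are not part of any existing library. It shows the two policies mentioned above: throw when a code-point-level operation starts in the middle of a sequence, or offset forward to the next boundary.

```d
import std.exception : enforce;

// True when `u` is a UTF-8 continuation byte (bit pattern 10xxxxxx),
// i.e. a unit that cannot start a code point.
bool isContinuation(char u)
{
    return (u & 0xC0) == 0x80;
}

// Strict policy: refuse code-point-level work from mid-sequence.
void checkAtCodePointStart(string s)
{
    enforce(s.length == 0 || !isContinuation(s[0]),
            "range does not start on a code point boundary");
}

// Lenient policy: skip forward to the next code point boundary.
string resync(string s)
{
    while (s.length != 0 && isContinuation(s[0]))
        s = s[1 .. $];
    return s;
}

void main()
{
    string s = "\u00E9t";       // UTF-8 units: C3 A9 74
    string mid = s[1 .. $];     // state after one unit-level pop: A9 74

    assert(isContinuation(mid[0]));
    assert(resync(mid) == "t");

    bool threw = false;
    try
    {
        checkAtCodePointStart(mid);
    }
    catch (Exception)
    {
        threw = true;
    }
    assert(threw);
}
```

Either check only costs one comparison on the front unit, so it could run at the start of each code-point-level operation without undoing the savings from unit-level skipping.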

Denis
_________________
vita es estrany
spir.wikidot.com
