Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Sat, 15 Jan 2011 07:26:12 -0800

On 2011-01-15 09:09:17 -0500, foobar <f...@bar.com> said:

Lutger Blijdestijn Wrote:

Michel Fortin wrote:

On 2011-01-15 05:03:20 -0500, Lutger Blijdestijn
<lutger.blijdest...@gmail.com> said:

...


Is it still possible to solve this problem or are we stuck with
specialized string algorithms? Would it work if VleRange of string was a
bidirectional range with string slices of graphemes as the ElementType
and indexing with code units? Often used string algorithms could be
specialized for performance, but if not, generic algorithms would still
work.


I have my idea.

I think it'd be a good idea to improve upon Andrei's first idea --
which was to treat char[], wchar[], and dchar[] all as ranges of dchar
elements -- by changing the element type to be the same as the string.
For instance, iterating on a char[] would give you slices of char[],
each holding one grapheme.

...

Yes, this is exactly what I meant, but you are much clearer. I hope this can
be made to work!


My two cents are against this kind of design.

The "correct" approach IMO is a 'universal text' type which is a _container_ of said text. This type would provide ranges for the various abstraction levels. E.g.

text.codeUnits to iterate by codeUnits

Nothing prevents that in the design I proposed. Andrei's design already implements "str".byDchar(), which would work for code points. I'd suggest changing the API to by!char(), by!wchar(), and by!dchar() for when you deal with whatever kind of code unit or code point you want. This would be mostly symmetric to what you can already do with foreach:


        foreach (char c; "hello") {}
        foreach (wchar c; "hello") {}
        foreach (dchar c; "hello") {}
        // same as:
        foreach (c; "hello".by!char()) {}
        foreach (c; "hello".by!wchar()) {}
        foreach (c; "hello".by!dchar()) {}
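
For comparison, later versions of Phobos provide `std.utf.byUTF`, which is close in spirit to the `by!T` idea. It is not the API proposed in this message. A minimal sketch, assuming a reasonably recent Phobos; the helper name `countAs` is made up for illustration:

```d
import std.utf : byUTF;

/// Counts the elements produced when `s` is iterated as a range of C.
/// `countAs` is a hypothetical helper, not part of Phobos.
size_t countAs(C)(string s)
{
    size_t n = 0;
    // byUTF!C lazily transcodes the UTF-8 input to elements of type C:
    // char and wchar give code units, dchar gives code points.
    foreach (c; s.byUTF!C)
        ++n;
    return n;
}
```

With an input containing a character outside the Basic Multilingual Plane, such as "a\U0001D11E", the three instantiations see different element counts, because that character takes four UTF-8 code units, two UTF-16 code units, and one code point.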

Here's a (perhaps contrived) example:
Let's say I want to find the combining marks in some text.
For instance, Hebrew uses combining marks for vowels (among other things) and they are optional in the language (there's a "full" form with vowels and a "missing" form without them). I have a Hebrew text in the "full" form and I want to strip it and convert it to the "missing" form.
How would I accomplish this with your design?

All you need is a range that takes a string as input and gives you code points in a decomposed form (NFD), then you use std.algorithm.filter on it:


        // original string
        auto str = "...";

        // create normalized decomposed string as a lazy range of dchar (NFD)
        auto decomposed = decompose(str);

        // filter to remove your favorite combining code point
        // (use the hex code you want)
        auto filtered = filter!"a != 0xFABA"(decomposed);

        // turn it back in composed form (NFC), optional
        auto recomposed = compose(filtered);

        // convert back to a string (could also be wstring or dstring)
        string result = array(recomposed.by!char());

This last line is the one doing everything. All the rest just chains ranges together to do on-the-fly decomposition, filtering, and recomposition; the last line uses that chain of ranges to fill the array.
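
The same pipeline can be approximated with functions that exist in later versions of `std.uni`. This is only a sketch under stated assumptions: it uses the eager `normalize` rather than the lazy `decompose`/`compose` ranges described above, and it drops every code point with a nonzero canonical combining class rather than one chosen code point. The function name `stripCombiningMarks` is made up for illustration:

```d
import std.algorithm.iteration : filter;
import std.array : array;
import std.conv : to;
import std.uni : NFC, NFD, combiningClass, normalize;

/// Removes combining marks from `text`.
/// Hypothetical helper; eager, unlike the lazy chain described in the message.
string stripCombiningMarks(string text)
{
    // Canonical decomposition (NFD): base characters followed by their marks.
    auto decomposed = normalize!NFD(text);

    // Keep only code points whose canonical combining class is zero.
    // Assumption: this is broader than filtering a single code point.
    auto filtered = decomposed.filter!(c => combiningClass(c) == 0);

    // Collect the code points, convert to UTF-8, and recompose (NFC).
    return normalize!NFC(filtered.array.to!string);
}
```

Because Hebrew vowel points have nonzero combining classes, they are removed along with any other such marks in the input, including Latin accents, so a real implementation would probably want a narrower filter.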

A more naive implementation not taking advantage of code points but instead using a replacement table would also work:


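        string str = "...";
        string result;
        string[string] replacements = ["é":"e"]; // change this for what you want
        // assumes iterating a string yields grapheme slices, as proposed above
        foreach (grapheme; str) {
                // `in` yields a pointer to the value, or null if the key is absent
                auto replacement = grapheme in replacements;
                if (replacement)
                        result ~= *replacement;
                else
                        result ~= grapheme;
        }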
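
The loop above depends on the proposed behaviour where iterating a string yields one grapheme slice at a time. A rough equivalent that works without that change can be sketched with `std.uni.graphemeStride`, which is available in later versions of Phobos. The function name `replaceGraphemes` is made up for illustration:

```d
import std.uni : graphemeStride;

/// Replaces each grapheme of `str` that appears as a key in `replacements`.
/// Hypothetical helper, not part of Phobos.
string replaceGraphemes(string str, string[string] replacements)
{
    string result;
    for (size_t i = 0; i < str.length; )
    {
        // Number of code units in the grapheme cluster starting at index i.
        immutable len = graphemeStride(str, i);
        // A slice of the original string holding exactly one grapheme.
        auto grapheme = str[i .. i + len];

        if (auto replacement = grapheme in replacements)
            result ~= *replacement;
        else
            result ~= grapheme;

        i += len;
    }
    return result;
}
```

The lookup compares code units exactly, so a precomposed "é" (U+00E9) and a decomposed "e" followed by U+0301 count as different keys unless the input and the table are normalized to the same form first.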
        

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
