On 16-May-02, G. Scott Jones wrote:

> Hi, Carl,

> The idea did look promising, even for the "multi-letter graphemes"
> (like the Czech "ch"), but then I believe we run into a limitation
> of 'parse. The longer phrase rule needs to come before the shorter
> one, so that:

> rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]

> will not correctly sort:

> >> pattern-sort ["c" "ch" "h"] rule-4
> == ["ch" "c" "h"]
> ;should be "c" "h" "ch"

> At least one other person has mused over the desire to have a
> pattern sort (in this case for the GNU/Linux sort; look near the
> bottom):

> http://budling.nytud.hu/~szigetva/etcetera/converters/README

> In this case, the pattern has a bit more information:
> a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs

> where "a" can be told to sort the same as "a with acute", both of
> these sort before "b" ... and "zs" sorts after "z"

> Breaking apart this information might allow a parse rule to set up
> the sequence to allow the longer phrase rules to come before the
> shorter ones. At least I think it would work.
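
The "breaking apart" step can be sketched roughly as follows. This is in Python rather than REBOL so it can be run on its own, and it is only an illustration under assumptions: the function names parse_collation and match_order are hypothetical, '<' is taken to separate rank groups, '=' is taken to join graphemes that sort as equals, and only the start of the quoted pattern is used.

```python
def parse_collation(spec):
    """Map each grapheme in a spec like 'a=á<b<c<cs' to its rank.

    Assumption: '<' separates rank groups and '=' joins graphemes
    that should sort as equals.
    """
    ranks = {}
    for rank, group in enumerate(spec.split("<"), start=1):
        for grapheme in group.split("="):
            ranks[grapheme] = rank
    return ranks


def match_order(ranks):
    """List graphemes longest first, so 'cs' is tried before 'c'."""
    return sorted(ranks, key=len, reverse=True)


ranks = parse_collation("a=á<b<c<cs<d<e=é")
print(ranks["a"] == ranks["á"])  # True: both share rank 1
print(match_order(ranks)[0])     # cs
```

The point of keeping the two steps separate is that the sort rank comes from the position in the pattern, while the matching order comes from the grapheme length, so neither has to compromise for the other.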

My first thoughts are that it'd work too, but then we're talking about
my coding here. (;

Anyway, only the order of the parse rules should need to change.
This is what's currently generated...

>> probe rule-4
[some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5) |
    "C" (r 6) | "h" (r 7) | "H" (r 8) | "ch" (r 9) | "Ch" (r 10) |
    skip (r 11)]]

Moving the "ch"s to the front of the rule gives us this...

rule-5: [some ["ch" (r 9) | "Ch" (r 10) | "a" (r 1) |
    "A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5) |
    "C" (r 6) | "h" (r 7) | "H" (r 8) | skip (r 11)
]]

Using that fixes your error above...

>> pattern-sort ["c" "ch" "h"]  rule-5                  
== ["c" "h" "ch"]

though it screws up string sorting big-time...

>> pattern-sort "cchh"  rule-5        
== "bch"

(:  Anyway, I'll see if I can get it to behave, and I'll try out the
speed improvements I thought of as well.
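
For anyone who wants to poke at the ordering issue outside REBOL, here is a minimal Python sketch of the same idea. It is not the pattern-sort code from this thread: PATTERN, RANK, UNKNOWN, BY_LENGTH and sort_key are hypothetical names, the rank of each grapheme is assumed to be its position in the rule-4 pattern (as the (r N) actions suggest), and anything not in the pattern is assumed to get rank 11, like the skip alternative.

```python
PATTERN = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]  # rule-4 order

# Rank is the 1-based position in the pattern, mirroring (r 1) .. (r 10).
RANK = {g: i for i, g in enumerate(PATTERN, start=1)}
UNKNOWN = len(PATTERN) + 1  # stands in for "skip (r 11)"

# Longer graphemes are tried first; this is the reordering behind rule-5.
BY_LENGTH = sorted(PATTERN, key=len, reverse=True)


def sort_key(text):
    """Turn text into a list of ranks using longest-match-first scanning."""
    key, pos = [], 0
    while pos < len(text):
        for g in BY_LENGTH:
            if text.startswith(g, pos):
                key.append(RANK[g])
                pos += len(g)
                break
        else:
            # No grapheme matched here: consume one character.
            key.append(UNKNOWN)
            pos += 1
    return key


print(sorted(["c", "ch", "h"], key=sort_key))  # ['c', 'h', 'ch']
```

Here "ch" scans to the single rank 9 instead of the pair 5, 7, which is why it lands after "h". This only builds comparison keys; it does not rebuild a sorted string from its graphemes, so it says nothing about the "cchh" result above.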

-- 
Carl Read
