On 16-May-02, G. Scott Jones wrote:

> Hi, Carl,

> The idea did look promising, even for the "multi-letter graphemes"
> (like the Czech "ch"), but then I believe we run into a limitation
> of 'parse. The longer phrase rule needs to come before the shorter
> one, so that:

> rule-4: pattern-rule ["a" "A" "b" "B" "c" "C" "h" "H" "ch" "Ch"]

> will not correctly sort:

> >> pattern-sort ["c" "ch" "h"] rule-4
> == ["ch" "c" "h"]
> ;should be "c" "h" "ch"

> At least one other person has mused over the desire to have a
> pattern sort (in this case under the gnu Linux sort) (look near the
> bottom):

> http://budling.nytud.hu/~szigetva/etcetera/converters/README

> In this case, the pattern has a bit more information:

> a=á<b<c<cs<d<e=é<f<g<gy<h<i=í...<z<zs

> where "a" can be told to sort the same as "a with acute", both of
> these sort before "b" ... and "zs" sorts after "z"

> Breaking apart this information might allow a parse rule to set-up
> the sequence to allow the longer phrase rules to come before the
> shorter ones. At least I think it would work.

My first thoughts are that it'd work too, but then we're talking
about my coding here. (;

Anyway, only the order of the parse rules should need to be changed.
I.e., this is what's currently generated...

    >> probe rule-4
    [some ["a" (r 1) | "A" (r 2) | "b" (r 3) | "B" (r 4) | "c" (r 5)
    | "C" (r 6) | "h" (r 7) | "H" (r 8) | "ch" (r 9) | "Ch" (r 10)
    | skip (r 11)]]

Moving the "ch"s to the front of the rule gives us this...

    rule-5: [some ["ch" (r 9) | "Ch" (r 10) | "a" (r 1) | "A" (r 2)
    | "b" (r 3) | "B" (r 4) | "c" (r 5) | "C" (r 6) | "h" (r 7)
    | "H" (r 8) | skip (r 11)]]

Using that fixes your error above...

    >> pattern-sort ["c" "ch" "h"] rule-5
    == ["c" "h" "ch"]

though it screws up string sorting big-time...

    >> pattern-sort "cchh" rule-5
    == "bch"

(: Anyway, I'll see if I can get it to behave, and I'll try out the
speed improvements I thought of as well.
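
For anyone who wants to play with the ordering idea outside of 'parse, here is a rough sketch of it in Python rather than REBOL. It is not the pattern-sort code from this thread: the names make_tokenizer and pattern_sort are made up for the illustration, and the "unknown" rank just mimics the trailing skip (r 11) alternative. The point it tries to show is that each pattern keeps the rank it had in the original list, while the matching order is changed so that longer patterns are tried first.

```python
def make_tokenizer(patterns):
    """Return a function mapping a string to its list of ranks.

    Hypothetical helper: a pattern's rank is its 1-based position in
    the original list, as with (r 1) ... (r 10) in the generated rule.
    """
    rank = {p: i + 1 for i, p in enumerate(patterns)}
    # Try longer patterns first so "ch" is matched before "c".
    # sorted() is stable, so equal-length patterns keep their order.
    by_length = sorted(patterns, key=len, reverse=True)
    unknown = len(patterns) + 1  # plays the role of `skip (r 11)`

    def ranks(text):
        out, i = [], 0
        while i < len(text):
            for p in by_length:
                if text.startswith(p, i):
                    out.append(rank[p])
                    i += len(p)
                    break
            else:
                # No pattern matched: consume one character.
                out.append(unknown)
                i += 1
        return out

    return ranks


def pattern_sort(items, patterns):
    """Sort strings by their rank lists (assumed comparison scheme)."""
    return sorted(items, key=make_tokenizer(patterns))


rule_4 = ["a", "A", "b", "B", "c", "C", "h", "H", "ch", "Ch"]
print(pattern_sort(["c", "ch", "h"], rule_4))  # ['c', 'h', 'ch']
```

Here the caller still writes the patterns in collation order, and only the matching order is derived from the lengths, so nothing has to be moved to the front of the list by hand.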
--
Carl Read