On Wed, Apr 27, 2011 at 5:40 PM, Shiro Kawai <[email protected]> wrote:
> Ok, I'll rehash the argument and make it more of a proposal.
>
> The draft's wording of char-numeric? is confusing, because Unicode
> doesn't define a "Numeric" property explicitly, the way it defines
> the "Alphabetic" or "Uppercase" properties. So I propose to change it.
>
> There are a few possible resolutions.
>
> (1) Define char-numeric? to return #t if the character's Numeric_Type
> property value is other than 'None'. This seems a natural
> interpretation of the current wording. However, I think it is
> practically useless, since it *can't* be used to separate numbers from
> the rest of a string. Characters whose Numeric_Type isn't 'None'
> include ordinary alphabetic characters (category Lo) that happen to
> have meanings related to numbers. For example, '幺' (U+5E7A) has
> Numeric_Type = 'Numeric', since the character means small or young, so
> it can sometimes mean 1 in some specific contexts (in Japanese,
> probably the only place it means '1' is in some mahjong terms). So,
> when I'm scanning a string and find that char-numeric? returns #t for
> a character, and that character happens to be '幺' (U+5E7A), what do
> I do? It is probably part of another word, so I should treat it as an
> alphabetic character. And even if I want to make use of it, I need a
> separate database to look up which number '幺' represents.
>
> (2) Drop char-numeric?, and add char-numeric-type and
> char-numeric-value. The former returns the value of the Numeric_Type
> property, and the latter returns the value of the Numeric_Value
> property. This would be the way to provide access to a character's
> Unicode "Numeric" properties.
>
> (3) Define char-numeric? to return #t only for 0, 1, 2, 3, 4, 5, 6, 7,
> 8 and 9. This retains compatibility with R5RS, and we can still use
> char-numeric? to parse numbers, and safely use (- (char->integer c)
> (char->integer #\0)) to obtain the digit value the character
> represents. (Note: R5RS programs that use char-numeric? to parse
> numbers will break if we adopt the current draft's definition of
> char-numeric?).

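To make the difference between options (1), (2) and (3) concrete, here is a minimal sketch. It is in Python only because its standard unicodedata module exposes numeric values directly. The function names are hypothetical, and the sketch assumes that unicodedata.numeric has a value exactly for the characters whose Numeric_Type is not 'None', which is an approximation of the Unicode property rather than a definition of it.

```python
import unicodedata


def numeric_by_type(ch):
    """Option (1), approximately: the character has some numeric value."""
    # unicodedata.numeric returns the given default when no value is assigned.
    return unicodedata.numeric(ch, None) is not None


def numeric_value(ch):
    """Roughly option (2)'s char-numeric-value: a float, or None."""
    return unicodedata.numeric(ch, None)


def ascii_digit(ch):
    """Option (3): true only for the ten characters 0 through 9."""
    return "0" <= ch <= "9"


def ascii_digit_value(ch):
    """Same arithmetic as (- (char->integer c) (char->integer #\\0))."""
    return ord(ch) - ord("0")


if __name__ == "__main__":
    # U+5E7A is the ideograph from the example above; U+0663 is
    # ARABIC-INDIC DIGIT THREE, a decimal digit outside ASCII.
    for ch in ["7", "\u0663", "\u5e7a", "a"]:
        print(repr(ch), unicodedata.category(ch),
              numeric_by_type(ch), numeric_value(ch), ascii_digit(ch))
```

Under option (1) the ideograph U+5E7A counts as numeric even though its general category is Lo, which is the parsing problem described above. Under option (3) only the ASCII digits qualify, so the subtraction trick stays safe.
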
I'll have more to say about this when I get back from my vacation,
but I'll make a quick comment now.

We're unlikely to remove char-numeric?, since that would break R5RS
compatibility. We could add char-numeric-type and char-numeric-value
alongside it, though.

At my work we recently had a case of an application written for
English which detected numbers in text by looking for ASCII '0'..'9'.
It turns out that we probably want this to apply to all digits in all
scripts. That would include the standard ideographic numbers as well
as the old accounting numbers (壱 U+58F1, etc.), but probably not
characters whose Unihan numeric value is only a kOtherNumeric one (as
with 幺). So a middle ground between (1) and (3) may be desirable.

--
Alex
_______________________________________________
Scheme-reports mailing list
[email protected]
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports
