Martin v. Löwis wrote:
> Scott David Daniels wrote:
>> In reading over the source for CPython's PyUnicode_EncodeDecimal,
>> I see a dance to handle characters which are neither dec-equiv nor
>> in Latin-1.  Does anyone know about the intent of such a conversion?
>
> To support this:
>
> >>> int(u"\N{DEVANAGARI DIGIT SEVEN}")
> 7

OK, that much I have handled.  I am fiddling with direct-to-number
conversions and wondering about cases like

    >>> int(u"\N{DEVANAGARI DIGIT SEVEN}" + XXX + u"\N{DEVANAGARI DIGIT SEVEN}")

where XXX does not pass the digit test, but must either:
  (A) be dropped, giving a result of 77,
  (B) get translated (e.g. to u'234'), giving 72347, or
  (C) get translated (to u'2' + YYY + u'4'), where YYY will require
      further handling ...

I don't really understand how the "ignore" or "something_else" cases
can be triggered from Python source [where they come from].  Are they
only there for C-program access?

> In the "ignore" case, no output is produced at all, for the unencodable
> character; this is the same way that '?' would be treated (it is
> also unencodable).

If I understand you correctly, I can consider the digit stream to stop
as soon as I hit a non-digit (except for handling bases 11-36).

> In the something_else case, a user-defined exception handler could
> treat the error in any way it liked, e.g. encoding all letters
> u'A' to digit '0'. This might be different from the way this error
> handler would treat '?'.

--Scott David Daniels
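
To make the choices discussed above concrete, here is a minimal pure-Python
sketch.  It is not how CPython's PyUnicode_EncodeDecimal is implemented; the
function names and the on_error policy values ("strict", "ignore", "stop",
or a callable) are hypothetical and exist only for illustration.  The one
library call relied on is unicodedata.decimal(ch, default).

```python
import unicodedata


def to_ascii_digits(text, on_error="strict"):
    """Translate Unicode decimal digits in text to ASCII digits.

    on_error is a hypothetical policy, loosely modelled on codec
    error-handler names:
      "strict"  - raise ValueError on the first non-digit
      "ignore"  - drop non-digits (option A)
      "stop"    - stop reading at the first non-digit
      callable  - called with the offending character; its return value
                  is translated in turn (options B and C)
    """
    out = []
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(str(digit))
        elif on_error == "ignore":
            continue
        elif on_error == "stop":
            break
        elif callable(on_error):
            # The replacement is translated strictly, so a handler that
            # returns more non-digits fails instead of looping forever.
            out.append(to_ascii_digits(on_error(ch), "strict"))
        else:
            raise ValueError("not a decimal digit: %r" % ch)
    return "".join(out)


def digits_to_int(text, on_error="strict"):
    """Convert text to an int under the chosen (hypothetical) policy."""
    return int(to_ascii_digits(text, on_error))


seven = u"\N{DEVANAGARI DIGIT SEVEN}"
print(digits_to_int(seven))                                # plain digit
print(digits_to_int(seven + u"x" + seven, "ignore"))       # option A
print(digits_to_int(seven + u"x" + seven, lambda ch: u"234"))  # option B
print(digits_to_int(seven + u"x" + seven, "stop"))         # stop at non-digit
```

Handling the replacement strictly is a design choice for the sketch: option
(C), where the replacement itself needs further handling, would need a rule
for when the recursion ends.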