Re: [Python-Dev] Python and the Unicode Character Database
Terry Reedy wrote:
> On 11/30/2010 10:05 AM, Alexander Belopolsky wrote:
> My general answers to the questions you have raised are as follows:
>
> 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta period can be used to (hopefully) iron out any bugs introduced by a new UCD version.

The UCD is versioned just like Python is, so if the Unicode Consortium decides to ship a 5.2.1 version of the UCD, we can add that to Python 2.7.x, since Python 2.7 started out with 5.2.0.

> 2. The language specification should not be UCD version specific. Martin pointed out that the definition of identifiers was intentionally written to not be, by referring to 'current version' or some such. On the other hand, the UCD version used should be programmatically discoverable, perhaps as an attribute of sys or str.

It already is and has been for a while, e.g. Python 2.5:

    >>> import unicodedata
    >>> unicodedata.unidata_version
    '4.1.0'

> 3. The UCD should not change in bugfix releases. New chars are new features. Adding them in bugfix releases will introduce gratuitous incompatibilities between releases. People who want the latest Unicode should either upgrade to the latest Python version or patch an older version (but not expect core support for any problems that creates).

See above. Patch level revisions of the UCD are fine for patch level releases of Python, since those patch level revisions of the UCD fix bugs just like we do in Python.

Note that each new UCD major.minor version is a new standard on its own, so it's perfectly ok to stick with one such standard version per Python version.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 01 2010) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ...
http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
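As a small illustration of the "programmatically discoverable" point, code that depends on a particular UCD release can check `unicodedata.unidata_version` at run time rather than the Python version. This is a sketch: the helper names `ucd_version` and `require_ucd` are made up for illustration. `unicodedata.ucd_3_2_0` is the frozen Unicode 3.2.0 database that the module also ships for applications such as IDNA.

```python
import unicodedata


def ucd_version(db=unicodedata):
    """Return the UCD version of a unicodedata-like object as an int tuple.

    Hypothetical helper, not part of the standard library.
    """
    return tuple(int(part) for part in db.unidata_version.split("."))


def require_ucd(minimum):
    """Raise RuntimeError if the interpreter's UCD is older than `minimum`.

    Hypothetical helper, not part of the standard library.
    """
    current = ucd_version()
    if current < minimum:
        raise RuntimeError(
            f"need UCD {'.'.join(map(str, minimum))}, "
            f"have {unicodedata.unidata_version}"
        )
    return current


if __name__ == "__main__":
    print("current UCD:", unicodedata.unidata_version)
    print("frozen UCD:", unicodedata.ucd_3_2_0.unidata_version)
    # For example, code relying on the Unicode 6.0 category changes:
    print(require_ucd((6, 0, 0)))
```

Comparing integer tuples rather than the raw strings avoids treating "10.0.0" as older than "6.0.0".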
Re: [Python-Dev] Python and the Unicode Character Database
Martin v. Löwis wrote:
> On 30.11.2010 21:24, Ben Finney wrote:
> > haiyang kang corn...@gmail.com writes:
> > > I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1
> >
> > That's a straw man, though. The string need not be a literal in the program; it can be input to the program.
> >
> >     num = float(input_from_the_external_world)
> >
> > Does that change your assessment of whether non-ASCII digits are used?
>
> I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. You would need a number of key strokes to enter each individual ideograph, plus you have to press the keys for keyboard layout switching to enter the Latin decimal separator (which you normally wouldn't use along with the Han numerals).

That's a somewhat limited view, IMHO. Numbers are not always entered using a computer keyboard: you have tools like cash registers, special numeric keypads, scanners, OCR, etc. for external entry, and you also have other programs producing such output, e.g. MS Office if configured that way.

The argument with the decimal point doesn't work well either, since it's obvious that float() and int() do not support localized input. E.g. in Germany we write 3,141 instead of 3.141:

    >>> float('3,141')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for float(): 3,141

No surprise there. The localization of the input data, e.g. removal of thousands separators and conversion of decimal marks to the dot, has to be done by the application, just like you have to now for German floating point number literals. The locale module already has locale.atof() and locale.atoi() for just this purpose.
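locale.atof() performs this conversion according to the current LC_NUMERIC settings, so the process first has to switch to a suitable locale (for example "de_DE.UTF-8", if one is installed). Where that is impractical, the application can normalize the string itself. Here is a minimal sketch, assuming German conventions ('.' for grouping, ',' as the decimal mark). parse_german_number is a hypothetical helper, not a standard-library function.

```python
def parse_german_number(text):
    """Convert a German-formatted number such as '1.234,5' to float.

    Hypothetical helper. Assumes '.' is only ever a thousands separator
    and ',' the decimal mark, so float() then sees the dot-decimal form
    it understands.
    """
    normalized = text.strip().replace(".", "").replace(",", ".")
    return float(normalized)


print(parse_german_number("3,141"))    # 3.141
print(parse_german_number("1.234,5"))  # 1234.5
```

Because the digits themselves are left alone, this also works for input using any decimal digits that float() already accepts.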
FYI, here's a list of decimal digits supported by Python 2.7:

http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt:

0030..0039; Decimal # Nd [10] DIGIT ZERO..DIGIT NINE
0660..0669; Decimal # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9; Decimal # Nd [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
07C0..07C9; Decimal # Nd [10] NKO DIGIT ZERO..NKO DIGIT NINE
0966..096F; Decimal # Nd [10] DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
09E6..09EF; Decimal # Nd [10] BENGALI DIGIT ZERO..BENGALI DIGIT NINE
0A66..0A6F; Decimal # Nd [10] GURMUKHI DIGIT ZERO..GURMUKHI DIGIT NINE
0AE6..0AEF; Decimal # Nd [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0B66..0B6F; Decimal # Nd [10] ORIYA DIGIT ZERO..ORIYA DIGIT NINE
0BE6..0BEF; Decimal # Nd [10] TAMIL DIGIT ZERO..TAMIL DIGIT NINE
0C66..0C6F; Decimal # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
0CE6..0CEF; Decimal # Nd [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0D66..0D6F; Decimal # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0E50..0E59; Decimal # Nd [10] THAI DIGIT ZERO..THAI DIGIT NINE
0ED0..0ED9; Decimal # Nd [10] LAO DIGIT ZERO..LAO DIGIT NINE
0F20..0F29; Decimal # Nd [10] TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE
1040..1049; Decimal # Nd [10] MYANMAR DIGIT ZERO..MYANMAR DIGIT NINE
1090..1099; Decimal # Nd [10] MYANMAR SHAN DIGIT ZERO..MYANMAR SHAN DIGIT NINE
17E0..17E9; Decimal # Nd [10] KHMER DIGIT ZERO..KHMER DIGIT NINE
1810..1819; Decimal # Nd [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
1946..194F; Decimal # Nd [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE
19D0..19DA; Decimal # Nd [11] NEW TAI LUE DIGIT ZERO..NEW TAI LUE THAM DIGIT ONE
1A80..1A89; Decimal # Nd [10] TAI THAM HORA DIGIT ZERO..TAI THAM HORA DIGIT NINE
1A90..1A99; Decimal # Nd [10] TAI THAM THAM DIGIT ZERO..TAI THAM THAM DIGIT NINE
1B50..1B59; Decimal # Nd [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE
1BB0..1BB9; Decimal # Nd [10] SUNDANESE DIGIT ZERO..SUNDANESE DIGIT NINE
1C40..1C49; Decimal # Nd [10] LEPCHA DIGIT ZERO..LEPCHA DIGIT NINE
1C50..1C59; Decimal # Nd [10] OL CHIKI DIGIT ZERO..OL CHIKI DIGIT NINE
A620..A629; Decimal # Nd [10] VAI DIGIT ZERO..VAI DIGIT NINE
A8D0..A8D9; Decimal # Nd [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
A900..A909; Decimal # Nd [10] KAYAH LI DIGIT ZERO..KAYAH LI DIGIT NINE
A9D0..A9D9; Decimal # Nd [10] JAVANESE DIGIT ZERO..JAVANESE DIGIT NINE
AA50..AA59; Decimal # Nd [10] CHAM DIGIT ZERO..CHAM DIGIT NINE
ABF0..ABF9; Decimal # Nd [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE
FF10..FF19; Decimal # Nd [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
104A0..104A9 ; Decimal # Nd [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
1D7CE..1D7FF ; Decimal # Nd [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

The Chinese and Japanese ideographs are not supported because of the way they are defined in the Unihan database. I'm currently investigating how we could support them as well.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source
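As a rough cross-check of the list above, the sketch below asks the running interpreter's own unicodedata module which code points have a decimal digit value, and groups them into contiguous ranges. The output reflects whatever UCD version that interpreter ships (see unicodedata.unidata_version), not necessarily 5.2.0, so expect it to differ from the 2.7 list. The helper name decimal_digit_ranges is made up for illustration. The sketch also shows the ideograph case: U+4E00 (一, "one") has a numeric value but no decimal digit value, which is why int() rejects it.

```python
import sys
import unicodedata


def decimal_digit_ranges():
    """Return (first, last) code point pairs for runs of decimal digits.

    Hypothetical helper. A character counts here if unicodedata.decimal()
    gives it a value; CPython's int() and float() map such characters to
    their ASCII digit before parsing.
    """
    ranges = []
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.decimal(chr(cp), None) is None:
            continue
        if prev is not None and cp == prev + 1:
            prev = cp  # extend the current run
            continue
        if start is not None:
            ranges.append((start, prev))  # close the previous run
        start = prev = cp  # open a new run
    if start is not None:
        ranges.append((start, prev))
    return ranges


if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for first, last in decimal_digit_ranges():
        print(f"{first:04X}..{last:04X} {unicodedata.name(chr(first), '?')}")

    # Han ideographs carry a numeric value, but no decimal digit value.
    ideograph = "\u4e00"  # CJK UNIFIED IDEOGRAPH-4E00, "one"
    print(ideograph.isnumeric(), ideograph.isdecimal())
    try:
        int(ideograph)
    except ValueError as exc:
        print("int() rejects it:", exc)
```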
Re: [Python-Dev] Python and the Unicode Character Database
Terry Reedy wrote:
> On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:
> > I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...) for example.
>
> I do not think that anyone, at least not me, has argued for anything other than 0-9 digits (or 0-f for hex) in literals in program code. The only issue is whether non-programmer *users* should be able to use their native digits in applications in response to input prompts.

Me neither. This is solely about Python being able to parse numeric input in the float(), int() and complex() constructors.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 01 2010)
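To make the distinction concrete, here is a minimal check, assuming a current CPython 3 interpreter, that the int() and float() constructors accept non-ASCII decimal digits inside strings. The escapes spell Arabic-Indic digits (U+0660..U+0669) and fullwidth digits (U+FF10..U+FF19), so the source reads the same regardless of text direction. The constant names are only for illustration.

```python
# Arabic-Indic digits one..four (U+0661..U+0664).
ARABIC_INDIC = "\u0661\u0662\u0663\u0664"  # "1234"
# Same digits with an ASCII dot, then five and six (U+0665, U+0666).
ARABIC_INDIC_FLOAT = ARABIC_INDIC + ".\u0665\u0666"  # "1234.56"
# Fullwidth digits one..three (U+FF11..U+FF13).
FULLWIDTH = "\uff11\uff12\uff13"  # "123"

# The constructors map each Unicode decimal digit to its ASCII value.
print(int(ARABIC_INDIC))          # 1234
print(float(ARABIC_INDIC_FLOAT))  # 1234.56
print(int(FULLWIDTH))             # 123
```

Note that only the digits are mapped. The decimal point still has to be the ASCII dot, which is the limitation discussed further down the thread.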
Re: [Python-Dev] Python and the Unicode Character Database
Martin v. Löwis wrote:
> On 30.11.2010 23:43, Terry Reedy wrote:
> > On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:
> > > I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese ASCII numerals or Arabic cursive numerals in for i in range(...) for example.
> >
> > I do not think that anyone, at least not me, has argued for anything other than 0-9 digits (or 0-f for hex) in literals in program code. The only issue is whether non-programmer *users* should be able to use their native digits in applications in response to input prompts.
>
> And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant to enable.

By that argument, English speakers wanting to enter integers using Arabic numerals can't either!

I'd like to use grouping for large literals, if only I could think of a half-decent syntax, and if only Python supported it. This fails on both counts:

    x = 123_456_789_012_345

The lack of grouping and the lack of a native decimal point doesn't mean that the feature doesn't work -- it merely means the feature requires some compromise before it can be used. In the same way, if I wanted to enter a number using non-Arabic digits, it works provided I compromise by using the Anglo-American decimal point instead of the European comma or the native decimal point I might prefer.

The lack of support for non-dot decimal points is arguably a bug that should be fixed, not a reason to remove functionality.

-- Steven
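The application can handle Steven's compromise itself, in the same way Marc-Andre described for German input: map the native separators to what float() expects before calling it. The sketch below does this for Arabic text, assuming the input uses U+066B ARABIC DECIMAL SEPARATOR and U+066C ARABIC THOUSANDS SEPARATOR. The function name to_float_arabic is hypothetical, and a real application would take its separators from locale data rather than hard-coding them.

```python
ARABIC_DECIMAL_SEPARATOR = "\u066b"
ARABIC_THOUSANDS_SEPARATOR = "\u066c"


def to_float_arabic(text):
    """Parse a number written with Arabic-Indic digits and Arabic separators.

    Hypothetical helper. float() already maps the digits themselves, so
    only the separators need translating: drop the grouping mark and turn
    the native decimal separator into the ASCII dot.
    """
    normalized = (
        text.strip()
        .replace(ARABIC_THOUSANDS_SEPARATOR, "")
        .replace(ARABIC_DECIMAL_SEPARATOR, ".")
    )
    return float(normalized)


# "1,234.56" written with Arabic-Indic digits and native separators.
sample = "\u0661\u066c\u0662\u0663\u0664\u066b\u0665\u0666"
print(to_float_arabic(sample))  # 1234.56
```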
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
Nick Coghlan wrote:
> For the directory-as-module-not-package idea ... you would need to be very careful with it, since all the files would be sharing a common globals() namespace.

One of the things I like about Python's module system is that once I know which module a name was imported from, I also know which file to look in for its definition. If a module can be spread over several files, that feature would be lost.

-- Greg
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Wed, Dec 1, 2010 at 8:22 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
> Nick Coghlan wrote:
> > For the directory-as-module-not-package idea ... you would need to be very careful with it, since all the files would be sharing a common globals() namespace.
>
> One of the things I like about Python's module system is that once I know which module a name was imported from, I also know which file to look in for its definition. If a module can be spread over several files, that feature would be lost.

There are many potential problems with the idea, I just chose to mention one of the ones that could easily make the affected code *break* :)

Cheers,
Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
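A toy loader shows the breakage Nick mentions. This is a hypothetical sketch: there is no such loader in the import system, and load_directory_as_module and the two "file" sources are invented for illustration. The loader executes each file's source into a single module dictionary in sorted filename order. A later file then silently rebinds a name that an earlier file depends on. The sketch also illustrates Greg's concern, because the surviving function's code object reports b.py.

```python
import types

# Hypothetical contents of two files in one directory-as-module.
FILE_SOURCES = {
    "a.py": (
        "def helper():\n"
        "    return 'from a'\n"
        "\n"
        "def use_a():\n"
        "    return helper()\n"
    ),
    "b.py": (
        "def helper():\n"
        "    return 'from b'\n"
    ),
}


def load_directory_as_module(name, sources):
    """Execute every file's source into ONE shared globals() namespace."""
    module = types.ModuleType(name)
    for filename in sorted(sources):
        code = compile(sources[filename], filename, "exec")
        exec(code, module.__dict__)  # all files share module.__dict__
    return module


mod = load_directory_as_module("combined", FILE_SOURCES)
print(mod.use_a())                          # from b
print(mod.helper.__code__.co_filename)      # b.py
```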
Re: [Python-Dev] Python and the Unicode Character Database
On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org wrote:
> Sure you can. In Python program text, all keywords will be ASCII

Yes, yes, sure, but not the contents of variables,

> I see no reason not to make a similar promise for numeric literals.

Wait, what, literals? The example was

    float('١٢٣٤.٥٦')

which doesn't have any numeric literals in it at all. Does that work? Nope, it's a syntax error. Too bad, that would have been cool, but whatever.

Why would this be a problem:

    >>> T1234 = float('١٢٣٤.٥٦')
    >>> T1234
    1234.56

But this OK?

    >>> T١٢٣٤ = float('1234.56')
    >>> T١٢٣٤
    1234.56

I don't see that.

Should we bother to implement ١٢٣٤.٥٦ as a literal equivalent to 1234.56? Well, not unless somebody asks for it, or it turns out to be easy. :-) But that's another question.
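The asymmetry Lennart points out comes from how identifiers are defined. Decimal digits of any script may appear after the first character of an identifier. A token that starts with a non-ASCII digit, however, is neither a valid identifier nor a numeric literal. Here is a small check of that, assuming a Python 3 interpreter. The escapes are Arabic-Indic digits, and the names are only for illustration.

```python
ARABIC_1234 = "\u0661\u0662\u0663\u0664"  # Arabic-Indic digits for "1234"
NAME = "T" + ARABIC_1234

# Non-ASCII decimal digits may continue an identifier, but not start one.
print(NAME.isidentifier())         # True
print(ARABIC_1234.isidentifier())  # False

# Hence the same characters cannot form a numeric literal in source code.
try:
    compile(ARABIC_1234 + ".\u0665\u0666", "<example>", "eval")
except SyntaxError as exc:
    print("literal rejected:", exc.msg)

# Binding a name that contains them works like any other assignment...
namespace = {}
exec(NAME + " = float('1234.56')", namespace)
print(namespace[NAME])  # 1234.56

# ...and the float() constructor parses them from a string.
print(float(ARABIC_1234 + ".\u0665\u0666"))  # 1234.56
```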
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On 12/01/2010 04:39 AM, Nick Coghlan wrote:
> On Wed, Dec 1, 2010 at 8:22 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
> > Nick Coghlan wrote:
> > > For the directory-as-module-not-package idea ... you would need to be very careful with it, since all the files would be sharing a common globals() namespace.
> >
> > One of the things I like about Python's module system is that once I know which module a name was imported from, I also know which file to look in for its definition. If a module can be spread over several files, that feature would be lost.
>
> There are many potential problems with the idea, I just chose to mention one of the ones that could easily make the affected code *break* :)

Right. It would require additional pieces as well.

Ron :-)
Re: [Python-Dev] Python and the Unicode Character Database
On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote:
..
> With Python 3.1:
>
>     >>> exec('\u0CF1 = 1')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "<string>", line 1
>         ೱ = 1
>         ^
>     SyntaxError: invalid character in identifier
>
> but with Python 3.2a4:
>
>     >>> exec('\u0CF1 = 1')
>     >>> eval('\u0CF1')
>     1
>
> Such changes are not new, but I agree that they should probably be highlighted in the "What's new in Python x.x".

As of today, What’s New In Python 3.2 [1] does not even mention the unicodedata upgrade to 6.0.0. Here are the features from the unicode.org summary [2] that I think should be reflected in Python's What's New document:

* adds 2,088 characters, including over 1,000 additional symbols—chief among them the additional emoji symbols, which are especially important for mobile phones;
* corrects character properties for existing characters including
  - a general category change to two Kannada characters (U+0CF1, U+0CF2), which has the effect of making them newly eligible for inclusion in identifiers;
  - a general category change to one New Tai Lue numeric character (U+19DA), which would have the effect of disqualifying it from inclusion in identifiers unless grandfathering measures are in place for the defining identifier syntax.

The above may be too verbose for inclusion in What’s New In Python 3.2, but I think we should add a possibly shorter summary with a link to unicode.org for details.

PS: Yes, I think everyone should know about the Python 3.2 killer feature: ('\N{CAT FACE WITH WRY SMILE}'!

[1] http://docs.python.org/dev/whatsnew/3.2.html
[2] http://www.unicode.org/versions/Unicode6.0.0/
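Because these category changes decide identifier eligibility, the answer for a given code point depends on the UCD the interpreter was built with. A small sketch reports what the running interpreter says about the code points named in the summary. report_identifier_status is a hypothetical helper. The answers depend on unicodedata.unidata_version, so the sketch claims no particular output. Continuation is tested by prefixing an ASCII letter, which is where any grandfathering of U+19DA in the interpreter's UCD would show up.

```python
import unicodedata

# Code points mentioned in the Unicode 6.0.0 summary quoted above.
CODE_POINTS = [0x0CF1, 0x0CF2, 0x19DA]


def report_identifier_status(code_points):
    """Map each code point to its category and identifier eligibility.

    Hypothetical helper; results reflect this interpreter's UCD version.
    """
    report = {}
    for cp in code_points:
        ch = chr(cp)
        report[f"U+{cp:04X}"] = {
            "name": unicodedata.name(ch, "<unnamed>"),
            "category": unicodedata.category(ch),
            "can_start_identifier": ch.isidentifier(),
            # Continuation is tested by prefixing an ASCII letter.
            "can_continue_identifier": ("x" + ch).isidentifier(),
        }
    return report


if __name__ == "__main__":
    print("UCD version:", unicodedata.unidata_version)
    for label, info in report_identifier_status(CODE_POINTS).items():
        print(label, info)
```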
Re: [Python-Dev] Python and the Unicode Character Database
On 12/1/2010 12:55 PM, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote:
> ..
> > With Python 3.1:
> >
> >     >>> exec('\u0CF1 = 1')
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >       File "<string>", line 1
> >         ೱ = 1
> >         ^
> >     SyntaxError: invalid character in identifier
> >
> > but with Python 3.2a4:
> >
> >     >>> exec('\u0CF1 = 1')
> >     >>> eval('\u0CF1')
> >     1
> >
> > Such changes are not new, but I agree that they should probably be highlighted in the "What's new in Python x.x".
>
> As of today, What’s New In Python 3.2 [1] does not even mention the unicodedata upgrade to 6.0.0. Here are the features from the unicode.org summary [2] that I think should be reflected in Python's What's New document:
>
> * adds 2,088 characters, including over 1,000 additional symbols—chief among them the additional emoji symbols, which are especially important for mobile phones;
> * corrects character properties for existing characters including
>   - a general category change to two Kannada characters (U+0CF1, U+0CF2), which has the effect of making them newly eligible for inclusion in identifiers;
>   - a general category change to one New Tai Lue numeric character (U+19DA), which would have the effect of disqualifying it from inclusion in identifiers unless grandfathering measures are in place for the defining identifier syntax.
>
> The above may be too verbose for inclusion in What’s New In Python 3.2,

I think those 11 lines are pretty good. Put them in ('\N{CAT FACE WITH WRY SMILE}'! Plus give a link to the Unicode site (issue numbers are implicit links).

-- Terry Jan Reedy
Re: [Python-Dev] Porting Ideas
On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:
> Hello everyone. My name is Prashant. I and my friend Zubin recently ported 'Configobj'. It would be great if somebody can suggest about any utilities or scripts that are being widely used and need to be ported.

http://onpython3yet.com/ might be helpful to you. It orders the projects on PyPI with the most dependencies which are not yet ported to 3.x. Note that there are a number of false positives, e.g., the first result -- NumPy, since people don't seem to keep their classifiers up-to-date.
Re: [Python-Dev] Porting Ideas
> http://onpython3yet.com/ might be helpful to you. It orders the projects on PyPI with the most dependencies which are not yet ported to 3.x. Note that there are a number of false positives, e.g., the first result -- NumPy, since people don't seem to keep their classifiers up-to-date.

That could be a nice list. But it has quite disturbing content, as Python, the programming language, is stated as not being ported to 3.0. That does not really provoke trust.

Harald

-- GHUM GmbH Harald Armin Massa Spielberger Straße 49 70435 Stuttgart 0173/9409607 Amtsgericht Stuttgart, HRB 734971 - persuadere. et programmare
Re: [Python-Dev] Porting Ideas
On Wed, 1 Dec 2010 13:02:00 -0600 Brian Curtin brian.cur...@gmail.com wrote:
> On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:
> > Hello everyone. My name is Prashant. I and my friend Zubin recently ported 'Configobj'. It would be great if somebody can suggest about any utilities or scripts that are being widely used and need to be ported.
>
> http://onpython3yet.com/ might be helpful to you. It orders the projects on PyPI with the most dependencies which are not yet ported to 3.x.

I don't know who did that page but it seems like there's some FUD there. simplejson, ctypes, pysqlite and others are available in the 3.x stdlib. Mercurial is a command-line tool and doesn't need to be ported to be used for Python 3 projects. setuptools is supplanted by distribute, which should be Python 3 compatible. And I'm not sure what this package called Python is (“a high-level object-oriented programming language”? like Java?), but I'm pretty sure I've heard there's a Python 3 compatible version.

(granted, it's probably less FUD than stupid automation)

Regards

Antoine.
Re: [Python-Dev] Porting Ideas
On Wed, Dec 1, 2010 at 13:17, Antoine Pitrou solip...@pitrou.net wrote:
> On Wed, 1 Dec 2010 13:02:00 -0600 Brian Curtin brian.cur...@gmail.com wrote:
> > On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:
> > > Hello everyone. My name is Prashant. I and my friend Zubin recently ported 'Configobj'. It would be great if somebody can suggest about any utilities or scripts that are being widely used and need to be ported.
> >
> > http://onpython3yet.com/ might be helpful to you. It orders the projects on PyPI with the most dependencies which are not yet ported to 3.x.
>
> I don't know who did that page but it seems like there's some FUD there. simplejson, ctypes, pysqlite and others are available in the 3.x stdlib.

It grabs the info from their PyPI pages, which are probably not kept up-to-date. This was brought up at a local user group meeting and I think it can be a useful tool, but as you can see it requires good input data, which isn't always the case for some packages.

Package authors: if you spent time making your project work on 3.x -- let the world know, update your classifiers.
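The false positives follow from what such a tool can see, which is only the Trove classifiers a project declares on PyPI. Below is a rough sketch of that kind of check. It assumes the tool looks for a "Programming Language :: Python :: 3" classifier, which is a guess since the thread doesn't describe onpython3yet.com's actual logic. claims_python3 is a made-up name.

```python
PY3_PREFIX = "Programming Language :: Python :: 3"


def claims_python3(classifiers):
    """True if any declared classifier advertises Python 3 support.

    Hypothetical check. Matches the bare "... :: 3" classifier, versioned
    ones such as "... :: 3.1", and variants like "... :: 3 :: Only".
    """
    return any(
        c == PY3_PREFIX
        or c.startswith(PY3_PREFIX + ".")
        or c.startswith(PY3_PREFIX + " ::")
        for c in classifiers
    )


# A project that works on 3.x but never updated its metadata...
stale = [
    "Programming Language :: Python",
    "Programming Language :: Python :: 2.6",
]
# ...versus the same project after its author added the classifier.
updated = stale + ["Programming Language :: Python :: 3"]

print(claims_python3(stale))    # False: reported as "not ported"
print(claims_python3(updated))  # True
```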
[Python-Dev] Deprecating undocumented, unused functions in difflib.
difflib.SequenceMatcher objects currently get two feature attributes:

    self.isbjunk = junk.__contains__
    self.isbpopular = popular.__contains__

Tim Peters agrees that the junk and popular sets should be directly exposed and documented as part of the API, thereby making the functions redundant. The two functions are not currently documented (and should not be now). A Google Code Search for 'isbjunk' and 'isbpopular' only returns hits in difflib.py itself (and its predecessor, ndiff.py).

It would be easiest to just remove the two lines above. Or should I define functions with _xxx names that issue a deprecation warning and attach them as attributes to each object? (Defining instance methods would not be the same.) There is only one internal use of one of the two functions, which is easily edited.

-- Terry Jan Reedy
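The second option, keeping per-instance attributes but making them warn, might look like the sketch below. It is not difflib's code: Matcher, bjunk, bpopular and _deprecated_alias are hypothetical names standing in for whatever the exposed sets end up being called. Binding the wrappers in __init__ mirrors the two existing lines, so they remain per-object attributes rather than methods.

```python
import warnings


def _deprecated_alias(func, old_name, replacement):
    """Wrap func so each call emits a DeprecationWarning first."""
    def wrapper(item):
        warnings.warn(
            f"{old_name}() is deprecated; use {replacement} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return func(item)
    wrapper.__name__ = old_name
    return wrapper


class Matcher:
    """Hypothetical stand-in for SequenceMatcher's junk bookkeeping."""

    def __init__(self, junk=(), popular=()):
        # The sets themselves become the documented interface.
        self.bjunk = set(junk)
        self.bpopular = set(popular)
        # The old per-instance attributes survive for now, but warn.
        self.isbjunk = _deprecated_alias(
            self.bjunk.__contains__, "isbjunk", "'x in matcher.bjunk'")
        self.isbpopular = _deprecated_alias(
            self.bpopular.__contains__, "isbpopular", "'x in matcher.bpopular'")


if __name__ == "__main__":
    m = Matcher(junk=" ", popular="e")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        print(m.isbjunk(" "))  # True, plus a DeprecationWarning
    print(caught[0].message)
    print(" " in m.bjunk)      # True, the replacement spelling
```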
Re: [Python-Dev] [Preview] Comments and change proposals on documentation
I think it looks great! If you are looking for some suggestions to make it a little more elegant:

1. If I delete a comment that has no children, it should remove it completely (currently, it just replaces it with [deleted]). If there are children, I think it is doing the right thing.
2. When I post a comment, it should automatically vote that comment up. I wouldn't have posted it if I didn't like it.
3. As far as text formatting, I personally think there should be some highlighting support for code spans/blocks (IMO that should match the IDLE colors).

Also, I seemed to manage to trigger a visible system warning in my badly formatted comment on math.fabs(x) :)

-Daniel
Re: [Python-Dev] Python and the Unicode Character Database
> > And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant to enable.
>
> By that argument, English speakers wanting to enter integers using Arabic numerals can't either!

That's correct, and the key point here for the argument. It's just not *meant* to support localized number forms, but deliberately constrains them to a formal grammar which users using it must be aware of in order to use it.

> I'd like to use grouping for large literals, if only I could think of a half-decent syntax, and if only Python supported it. This fails on both counts:
>
>     x = 123_456_789_012_345

Here you are confusing issues, though: this fragment uses the syntax of the Python programming language. Whether or not the syntax of the float() constructor arguments matches that syntax is also a subject of the debate. I take it that you speak in favor of the float syntax also being used for the float() constructor.

> The lack of grouping and the lack of a native decimal point doesn't mean that the feature doesn't work -- it merely means the feature requires some compromise before it can be used.

No, it means that the Python programming language syntax for floating point numbers just doesn't take local notation into account *at all*. This is not a flaw - it just means that this feature is non-existent. Now, for the float() constructor, some people in this thread have claimed that it *is* aimed at people who want to enter numbers in their local spellings. I claim that this feature either doesn't work, or is absent also.

> In the same way, if I wanted to enter a number using non-Arabic digits, it works provided I compromise by using the Anglo-American decimal point instead of the European comma or the native decimal point I might prefer.

Why would you want that, if what you really wanted could not be done? There certainly *is* a way to convert strings into floats, and there would be a way if that restricted itself to the digits 0..9. So it can't be the mere desire to convert strings to float that makes you ask for non-ASCII digits.

> The lack of support for non-dot decimal points is arguably a bug that should be fixed, not a reason to remove functionality.

I keep repeating my two concerns:
a) if that was a feature, it is not specified at all in the documentation. In fact, the documentation was recently clarified to deny existence of that feature.
b) fixing it will be much more difficult than you apparently think.

Regards,
Martin
Re: [Python-Dev] Python and the Unicode Character Database
> > I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that.
>
> Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources. Just because it wasn't typed by the person at the keyboard using this program doesn't stop it being input to the program.

I think haiyang kang claimed exactly that - it won't ever be input to a program. I trust him on that - and so should you, unless you have sufficient experience with the Chinese language and writing system.

> Note that I'm not saying this is common. Nor am I saying it's a desirable situation. I'm saying it is a feasible use case, to be dismissed only if there is strong evidence that it's not used by existing Python code.

And indeed, for the Chinese numerals, we have such strong evidence.

Regards,
Martin
Re: [Python-Dev] Python and the Unicode Character Database
> As of today, What’s New In Python 3.2 [1] does not even mention the unicodedata upgrade to 6.0.0.

One reason was that I was instructed not to change What's New a few years ago.

Regards,
Martin
Re: [Python-Dev] Porting Ideas
On 01.12.2010 20:02, Brian Curtin wrote:
> On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:
> > Hello everyone. My name is Prashant. I and my friend Zubin recently ported 'Configobj'. It would be great if somebody can suggest about any utilities or scripts that are being widely used and need to be ported.
>
> http://onpython3yet.com/ might be helpful to you.

Another such list is at http://www.python.org/3kpoll

Regards,
Martin
Re: [Python-Dev] Python and the Unicode Character Database
Martin v. Löwis wrote:
> > > I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that.
> >
> > Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources. Just because it wasn't typed by the person at the keyboard using this program doesn't stop it being input to the program.
>
> I think haiyang kang claimed exactly that - it won't ever be input to a program. I trust him on that - and so should you, unless you have sufficient experience with the Chinese language and writing system.
>
> > Note that I'm not saying this is common. Nor am I saying it's a desirable situation. I'm saying it is a feasible use case, to be dismissed only if there is strong evidence that it's not used by existing Python code.
>
> And indeed, for the Chinese numerals, we have such strong evidence.

With full respect to haiyang kang, hearsay from one person can hardly be described as strong evidence -- particularly, as Alexander Belopolsky pointed out, the use-case described isn't currently supported by Python.

Given that what haiyang kang describes *can't* be done, the fact that people don't do it is hardly surprising -- nor is it a good reason for taking away functionality that does exist.

-- Steven
Re: [Python-Dev] Python and the Unicode Character Database
On Wed, Dec 1, 2010 at 5:36 PM, Martin v. Löwis mar...@v.loewis.de wrote:
..
> > Note that I'm not saying this is common. Nor am I saying it's a desirable situation. I'm saying it is a feasible use case, to be dismissed only if there is strong evidence that it's not used by existing Python code.
>
> And indeed, for the Chinese numerals, we have such strong evidence.

Indeed: in the over 10 years that Python's int() has accepted Arabic-Indic numerals, nobody has complained that it *did not* accept Chinese.
Re: [Python-Dev] Python and the Unicode Character Database
Martin v. Löwis wrote: And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant to enable. By that argument, English speakers wanting to enter integers using Arabic numerals can't either! That's correct, and the key point here for the argument. It's just not *meant* to support localized number forms, but deliberately constrains them to a formal grammar which users using it must be aware of in order to use it. You're *agreeing* that English speakers can't enter integers using Arabic numerals? What do you think I'm doing when I do this? int(1234) 1234 Ah wait... did you think I meant Arabic numerals in the sense of digits used by Arabs in Arabia? I meant Arabic numerals as opposed to Roman numerals. Sorry for the confusion. Your argument was that even though Python's int() supports many non-ASCII digits, the lack of grouping means that it doesn't actually work. If that argument were correct, then it applies equally to ASCII digits as well. It's clearly nonsense to say that int(1234) doesn't work just because of the lack of grouping. It's equally nonsense to say that int(١٢٣٤) doesn't work because of the lack of grouping. [...] I take it that you speak in favor of the float syntax also being used for the float() constructor. I'm sorry, I don't understand what you mean here. I've repeatedly said that the syntax for numeric literals should remain constrained to the ASCII digits, as it currently is. n = ١٢٣٤ gives a SyntaxError, and I don't want to see that change. But I've also argued that the float constructor currently accepts non-ASCII strings: n = int(١٢٣٤) we should continue to support the existing behaviour. 
None of the arguments against it seem convincing to me, particularly since the opponents of the current behaviour admit that there is a use-case for it, but they just want it to move elsewhere, such as the locale module. We've even heard from one person -- I forget who, sorry -- who claimed that C++ has the same behaviour, and if you want ASCII-only digits, you have to explicitly ask for it. For what it's worth, Microsoft warns developers not to assume users will enter numeric data using ASCII digits: Number representation can also use non-ASCII native digits, so your application may encounter characters other than 0-9 as inputs. Avoid filtering on U+0030 through U+0039 to prevent frustration for users who are trying to enter data using non-ASCII digits. http://msdn.microsoft.com/en-us/magazine/cc163506.aspx There was a similar discussion going on in Perl-land recently: http://www.nntp.perl.org/group/perl.perl5.porters/2010/07/msg162400.html although, being Perl, the discussion was dominated by concerns about regexes and implicit conversions, rather than an explicit call to float() or int() as we are discussing here. [...] In the same way, if I wanted to enter a number using non-Arabic digits, it works provided I compromise by using the Anglo-American decimal point instead of the European comma or the native decimal point I might prefer. Why would you want that, if what you really wanted could not be done? There certainly *is* a way to convert strings into floats, and there would be a way if that restricted itself to the digits 0..9. So it can't be the mere desire to convert strings to float that make you ask for non-ASCII digits. Why do Europeans use programming languages that force them to use a dot instead of a comma for the decimal place? Why do I misspell string.centre as string.center? Because if you want to get something done, you use the tools you have and not the tools you'd like to have.
-- Steven
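The Microsoft advice Steven quotes (don't filter on U+0030 through U+0039) matches a choice Python programmers make when they validate input. The sketch below assumes Python 3 `re` semantics. A `[0-9]` class accepts only ASCII digits. In a str pattern, `\d` matches any Unicode decimal digit unless the `re.ASCII` flag is given. The pattern and helper names are hypothetical.

```python
import re

ASCII_ONLY = re.compile(r'[0-9]+')           # filters on U+0030..U+0039 only
ANY_DECIMAL = re.compile(r'\d+')             # str pattern: any Unicode decimal digit (Nd)
ASCII_FLAG = re.compile(r'\d+', re.ASCII)    # \d restricted to [0-9] by the flag

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None

for text in ['1234', '١٢٣٤', '४२']:
    print(text, matches(ASCII_ONLY, text), matches(ANY_DECIMAL, text),
          matches(ASCII_FLAG, text), text.isdecimal())
# 1234 True True True True
# ١٢٣٤ False True False True
# ४२ False True False True
```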
Re: [Python-Dev] Python and the Unicode Character Database
Lennart Regebro writes: On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org wrote: Sure you can. In Python program text, all keywords will be ASCII Yes, yes, sure, but not the contents of variables, Irrelevant, you're not converting these to a string representation. If you're generating numerals for internal use, I don't see why you would want to do arithmetic on them; conversion is a YAGNI. This is only interesting to allow naive users to input in a comfortable way. As yet there is no evidence that there are *any* such naive users, 1.3 billion of possibles are shut out, and at least two cultures which use non-ASCII numerals every day, representing 1.3 billion naive users (the coincidence of numbers is no coincidence), have reported that nobody in their right mind would *input* the numbers that way, and at least for Japanese, the use cases are not really numeric anyway. I see no reason not to make a similar promise for numeric literals. Wait what, literals? Sorry, my bad. Why would this be a problem: T1234 = float('.~~') T1234 1234.56 But this is OK? T = float('1234.56') T 1234.56 (Sorry, the Arabic is going to get munged, my mailer is beta and somebody screwed up.) Because the characters in the identifier are uninterpreted and have no syntactic content other than their identity. They're arbitrary. That's not true of numerics. Because that works, but print(T1234) doesn't (it prints ASCII). You can't round-trip, but users will want/expect that. Because that works but this doesn't: T1000 = float('一.◯◯◯') Violates TOOWTDI. If you're proposing to fix the numeric parsers, I still don't like it but I could go to -0 on it. However as Alexander points out and MAL admits, it's apparently not so easy to do that.
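Stephen's round-trip objection can be shown with a short sketch, assuming Python 3. float() turns Arabic-Indic digits into a value, but str() of that value always comes back in ASCII, so recovering the original script needs an explicit translation step. `to_arabic_indic` is a hypothetical helper. It keeps the ASCII '.' rather than the Arabic decimal separator U+066B, which a real localisation would probably want. The last lines show that a Han numeral such as 一 has no decimal-digit value, so float() rejects Stephen's example string.

```python
import unicodedata

# Map ASCII 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ASCII_TO_ARABIC_INDIC = str.maketrans(
    '0123456789', ''.join(chr(0x0660 + i) for i in range(10)))

def to_arabic_indic(text):
    """Hypothetical helper: rewrite ASCII digits as Arabic-Indic digits."""
    return text.translate(ASCII_TO_ARABIC_INDIC)

value = float('١٢٣٤.٥٦')
print(value)                          # 1234.56 (always ASCII on output)
print(to_arabic_indic(str(value)))    # ١٢٣٤.٥٦

print(unicodedata.decimal('一', None))  # None: not a decimal digit
try:
    float('一.◯◯◯')
except ValueError:
    print('float() rejects Han numerals')
```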
Re: [Python-Dev] Python and the Unicode Character Database
On Wed, Dec 1, 2010 at 7:17 PM, Steven D'Aprano st...@pearwood.info wrote: .. we should continue to support the existing behaviour. None of the arguments against it seem convincing to me, particularly since the opponents of the current behaviour admit that there is a use-case for it, but they just want it to move elsewhere, such as the locale module. I don't remember who made this argument, but I think you misunderstood it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are written. So far nobody has even claimed to know conclusively that Arabic-Indic digits are always written left-to-right. unicodedata.bidirectional('٤') 'AN' is not very helpful because it means any Arabic-Indic digit according to unicode.org. (To me, a special category hints that it may be written in either direction and the proper interpretation may depend on context.) I have not seen a real use case reported in this thread and for theoretical use cases, the current implementation is either outright wrong or does not solve the problem completely. Given that a function that replaces all Unicode digits in a string with 0-9 can be written in 3 lines of Python code, it is very unlikely that anyone would prefer to rely on undocumented behavior of Python builtins instead of having explicit control over parsing of their data.
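Alexander's "3 lines of Python" remark can be made concrete. Here is one possible version, assuming Python 3 and that only characters with a Unicode decimal-digit value should be mapped. `unicodedata.decimal(ch, default)` returns the digit's value, or `default` when the character has none, so every other character passes through unchanged. The function name is hypothetical. It handles neither grouping nor bidi issues; it only gives the caller an explicit, ASCII-only string to hand to int() or float().

```python
import unicodedata

def ascii_digits(text):
    """Replace every Unicode decimal digit in text with the ASCII digit of the same value."""
    return ''.join(str(unicodedata.decimal(ch, ch)) for ch in text)

print(ascii_digits('١٢٣٤.٥٦'))     # 1234.56
print(ascii_digits('४२ apples'))   # 42 apples
```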
Re: [Python-Dev] Python and the Unicode Character Database
Steven D'Aprano writes: With full respect to haiyang kang, hear-say from one person can hardly be described as strong evidence That's *disrespectful* nonsense. What Haiyang reported was not hearsay, it's direct observation of what he sees around him and personal experience, plus extrapolation. Look up hearsay, please. Furthermore, he provided good *objective* reason (excessive cost, to which I can also testify, in several different input methods for Japanese) why numbers simply would not be input that way. What's left is copy/paste via the mouse. I assure you, every day I see dozens of Japanese copy/pasting *only* ASCII numerals, and the sales figures for Microsoft Excel (not to mention the download numbers for Open Office) strongly suggest that 30 million Japanese salarymen are similarly dedicated to ASCII. (That's not hearsay either, that's direct observation and extrapolation, which is more than the "we need float to translate Arabic" supporters can offer.) I have seen only *one* use case: it's a toy for sophisticated programmers who want to think of themselves as broadminded. We've seen several examples of that in this thread, so I can't deny that is a real use case. Please, give us just *one* more real use case that isn't "somebody might."
Re: [Python-Dev] Porting Ideas
On 01/12/2010 19:17, Antoine Pitrou wrote: On Wed, 1 Dec 2010 13:02:00 -0600 Brian Curtin brian.cur...@gmail.com wrote: On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote: Hello everyone. My name is Prashant. I and my friend Zubin recently ported 'Configobj'. It would be great if somebody can suggest about any utilities or scripts that are being widely used and need to be ported. http://onpython3yet.com/ might be helpful to you. It orders the projects on PyPI with the most dependencies which are not yet ported to 3.x. I don't know who did that page but it seems like there's some FUD there. simplejson, ctypes, pysqlite and others are available in the 3.x stdlib. Mercurial is a command-line tool and doesn't need to be ported to be used for Python 3 projects. setuptools is supplanted by distribute which should be Python 3 compatible. And I'm not sure what this package called Python is (“a high-level object-oriented programming language”? like Java?), but I'm pretty sure I've heard there's a Python 3 compatible version. (granted, it's probably less FUD than stupid automation) From what I can tell it simply looks at dependencies and availability of those dependencies with a Python 3 trove classification. Some manual filtering may well be useful. It is well *possible* that there are packages with a runtime dependency on libraries in mercurial however. Those would need mercurial porting to Python 3 if they are to run on Python 3. If they simply shell out to mercurial that wouldn't be the case. Michael Regards Antoine. -- http://www.voidspace.org.uk/ READ CAREFULLY.
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
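Michael says the page seems to check each dependency for a Python 3 trove classifier. That check amounts to something like the sketch below. This is a guess at the logic, not the site's actual code. The function names and the input format (a mapping from project name to its classifier strings) are assumptions, and the project names are hypothetical. A purely mechanical check like this also flags a dependency that has no metadata at all. That is how stdlib-absorbed packages or command-line tools can show up as "unported", which is why Michael suggests some manual filtering.

```python
PY3 = 'Programming Language :: Python :: 3'

def declares_python3(classifiers):
    """True if any trove classifier claims Python 3 support (3, 3.x or '3 :: Only')."""
    return any(
        c == PY3 or c.startswith(PY3 + '.') or c.startswith(PY3 + ' ::')
        for c in classifiers
    )

def unported(dependencies, classifiers_by_project):
    """Dependencies whose metadata does not declare Python 3 support."""
    return [
        name for name in dependencies
        if not declares_python3(classifiers_by_project.get(name, ()))
    ]

# Hypothetical metadata, for illustration only.
metadata = {
    'pkg_a': ['Programming Language :: Python :: 2.6'],
    'pkg_b': ['Programming Language :: Python :: 3.1'],
}
print(unported(['pkg_a', 'pkg_b', 'pkg_c'], metadata))   # ['pkg_a', 'pkg_c']
```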
Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.
On Thu, Dec 2, 2010 at 6:23 AM, Terry Reedy tjre...@udel.edu wrote: It would be easiest to just remove the two lines above. Or should I define functions _xxx names that issue a deprecation warning and attach them as attributes to each object? (Defining instance methods would not be the same). Given that functions are converted to bound methods only on retrieval from an instance, why wouldn't it be the same? But yes, if you want to get rid of them, then deprecation for 3.2 and removal in 3.3 is the way to go. Alternatively, not deprecating them at all and just leaving them undocumented with a comment in the source to say they have been deliberately omitted from the docs would also be fine. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
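Here is a minimal sketch of the "deprecate in 3.2, remove in 3.3" approach Nick describes. It uses a hypothetical stand-in class, not the real difflib.SequenceMatcher. The old name becomes an ordinary method that issues a DeprecationWarning and delegates to the newly exposed set. `stacklevel=2` makes the warning point at the caller's line rather than at the shim itself.

```python
import warnings

class Matcher:
    """Hypothetical stand-in for a class whose helper is being retired."""

    def __init__(self, junk=()):
        self.bjunk = set(junk)   # the newly documented public set

    def isbjunk(self, item):
        "Undocumented, deprecated method that will disappear in a future release. Do not use!"
        warnings.warn("isbjunk() is deprecated; use 'item in matcher.bjunk' instead",
                      DeprecationWarning, stacklevel=2)
        return item in self.bjunk

m = Matcher(junk=' ')
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')       # make sure the warning is recorded
    print(m.isbjunk(' '))                 # True
print(caught[0].category.__name__)        # DeprecationWarning
```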
Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.
On 01/12/2010 20:23, Terry Reedy wrote: Difflib.SequenceMatcher objects currently get two feature attributes: self.isbjunk = junk.__contains__ self.isbpopular = popular.__contains__ Tim Peters agrees that the junk and popular sets should be directly exposed and documented as part of the api, thereby making the functions redundant. The two functions are not currently documented (and should not be now). A google codesearch of 'isbjunk' and 'isbpopular' only returns hits in difflib.py itself (and its predecessor, ndiff.py). It would be easiest to just remove the two lines above. Or should I define functions _xxx names that issue a deprecation warning and attach them as attributes to each object? (Defining instance methods would not be the same). There is only one internal use of one of the two functions which is easily edited. I would still be tempted to go through a single release of deprecation. You can add a test that the names are gone if the version of Python is 3.3. When the tests start failing, the code and the tests can be ripped out. All the best, Michael
Re: [Python-Dev] Python and the Unicode Character Database
Stephen J. Turnbull step...@xemacs.org writes: Furthermore, he provided good *objective* reason (excessive cost, to which I can also testify, in several different input methods for Japanese) why numbers simply would not be input that way. What's left is copy/paste via the mouse. For direct entry by an interactive user, yes. Why are some people in this discussion thinking only of direct entry by an interactive user? Input to a program comes from various sources other than direct entry by the interactive user, as has been pointed out many times. Please, give us just *one* more real use case that isn't "somebody might." Input from an existing text file, as I said earlier. Or any other way of text data making its way into a Python program. Direct entry at the console is a red herring. -- “First things first, but not necessarily in that order.” (The Doctor, _Doctor Who_) Ben Finney
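Ben's point is that numbers often arrive from files rather than from the keyboard. A small sketch shows this, assuming Python 3 and a UTF-8 text source. The source is simulated with io.StringIO so the example is self-contained, and its contents are illustrative. Each line goes straight to int(), which accepts the Arabic-Indic digits and ignores the trailing newline. For a real file, `open(path, encoding='utf-8')` would take the place of the StringIO.

```python
import io

def sum_lines(stream):
    """Sum one integer per non-blank line; int() ignores surrounding whitespace."""
    return sum(int(line) for line in stream if line.strip())

# Stand-in for a text file produced elsewhere (contents are illustrative).
data = io.StringIO('١٢٣\n٤٥\n\n6\n')
print(sum_lines(data))   # 174
```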
Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.
On 12/1/2010 8:22 PM, Michael Foord wrote: I would still be tempted to go through a single release of deprecation. You can add a test that the names are gone if the version of Python is 3.3. When the tests start failing the code and the tests can be ripped out. I was wondering how people remember... It would be nice if there were instead a central place to 'deposit' simple future patches that just consist of removals -- Terry Jan Reedy
Re: [Python-Dev] Porting Ideas
On 12/1/2010 8:17 PM, Michael Foord wrote: It is well *possible* that there are packages with a runtime dependency on libraries in mercurial however. Those would need mercurial porting to Python 3 if they are to run on Python 3. If they simply shell out to mercurial that wouldn't be the case. It would be nice if all the Python-coded tools needed to work on Python3 ran on Python3, so one did not have to install 2.x just for that purpose ;-). Does Sphinx run on PY3 yet? -- Terry Jan Reedy
Re: [Python-Dev] Porting Ideas
On Wed, Dec 1, 2010 at 9:53 PM, Terry Reedy tjre...@udel.edu wrote: .. Does Sphinx run on PY3 yet? It does, but see issue10224 for details. http://bugs.python.org/issue10224
Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.
On 12/1/2010 8:22 PM, Nick Coghlan wrote: On Thu, Dec 2, 2010 at 6:23 AM, Terry Reedy tjre...@udel.edu wrote: It would be easiest to just remove the two lines above. Or should I define functions _xxx names that issue a deprecation warning and attach them as attributes to each object? (Defining instance methods would not be the same). Given that functions are converted to bound methods only on retrieval from an instance, why wouldn't it be the same? The two SequenceMatcher instance attributes are bound functions of the two sets, not of the instance. But you are right in the sense that the usage would be the same. Since, as of a week ago, the sets were implemented as dicts, any code depending on the class of the underlying instance is already broken. So I will go with S-M methods and add a doc string like "Undocumented, deprecated method that will disappear in 3.3. Do not use!" to show up in a help() listing. But yes, if you want to get rid of them, then deprecation for 3.2 and removal in 3.3 is the way to go. OK. Alternatively, not deprecating them at all and just leaving them undocumented with a comment in the source to say they have been deliberately omitted from the docs would also be fine. Too messy and too useless ;-). -- Terry Jan Reedy
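Terry's distinction between the current attributes and the planned methods can be seen directly. `junk.__contains__` is a method bound to the set itself. A function defined on the class is bound to the instance when it is looked up. Callers use both the same way, which is Terry's "the usage would be the same". The class and the `isbjunk_attr` name below are hypothetical stand-ins that put both styles side by side, not difflib's code.

```python
class Matcher:
    def __init__(self, junk):
        self.bjunk = set(junk)
        self.isbjunk_attr = self.bjunk.__contains__   # old style: bound to the set

    def isbjunk(self, item):                          # new style: bound to the instance
        return item in self.bjunk

m = Matcher('ab')
print(m.isbjunk_attr.__self__ is m.bjunk)   # True
print(m.isbjunk.__self__ is m)              # True
print(m.isbjunk_attr('a'), m.isbjunk('a'))  # True True
```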
Re: [Python-Dev] [Python-checkins] r86924 - python/branches/py3k/Doc/library/random.rst
On Thu, Dec 2, 2010 at 12:41 PM, raymond.hettinger python-check...@python.org wrote:
+A more general approach is to arrange the weights in a cumulative probability
+distribution with :func:`itertools.accumulate`, and then locate the random value
+with :func:`bisect.bisect`::
+
+    >>> choices, weights = zip(*weighted_choices)
+    >>> cumdist = list(itertools.accumulate(weights))
+    >>> x = random.random() * cumdist[-1]
+    >>> choices[bisect.bisect(cumdist, x)]
+    'Blue'
Neat example, although it would be easier to follow if you broke that last line into two pieces:
    >>> random_index = bisect.bisect(cumdist, x)
    >>> choices[random_index]
    'Blue'
It took me a moment to remember how bisect.bisect worked, but it would have been instant if the return value was assigned to an appropriately named variable. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
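Below is a self-contained version of the documentation example with Nick's suggested split applied. The colour/weight pairs are illustrative. The `rand` parameter is an addition so the selection can be tested deterministically. random.random() returns a value in [0.0, 1.0), so x always stays below cumdist[-1] and the index never runs past the end. Later Python versions (3.6+) also provide random.choices(), which accepts weights directly.

```python
import bisect
import itertools
import random

def weighted_choice(weighted_choices, rand=random.random):
    """Pick one item with probability proportional to its weight."""
    choices, weights = zip(*weighted_choices)
    cumdist = list(itertools.accumulate(weights))   # running totals, e.g. [3, 5, 6, 10]
    x = rand() * cumdist[-1]                        # uniform point in [0, total)
    random_index = bisect.bisect(cumdist, x)        # first bucket whose running total exceeds x
    return choices[random_index]

weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
print(weighted_choice(weighted_choices))
```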
Re: [Python-Dev] Python and the Unicode Character Database
On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are written. So far nobody has even claimed to know conclusively that Arabic-Indic digits are always written left-to-right. Both my personal observations when travelling from Turkey to India and Wikipedia say yes. When representing a number in Arabic, the lowest-valued position is placed on the right, so the order of positions is the same as in left-to-right scripts. https://secure.wikimedia.org/wikipedia/en/wiki/Arabic_language#Numerals -- Terry Jan Reedy
Re: [Python-Dev] Python and the Unicode Character Database
On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy tjre...@udel.edu wrote: On 12/1/2010 7:44 PM, Alexander Belopolsky wrote: it. The argument was that if there was a use case for parsing Eastern Arabic numerals, it would be better served by a module written by someone who speaks one of the Arabic languages and knows the details of how Eastern Arabic numerals are written. So far nobody has even claimed to know conclusively that Arabic-Indic digits are always written left-to-right. Both my personal observations when travelling from Turkey to India and Wikipedia say yes. When representing a number in Arabic, the lowest-valued position is placed on the right, so the order of positions is the same as in left-to-right scripts. https://secure.wikimedia.org/wikipedia/en/wiki/Arabic_language#Numerals This matches my limited research on this topic as well. However, I am not sure that when these codes are embedded in Arabic text, their logical order always matches their display order. It seems to me that it can go either way depending on the surrounding text and/or presence of explicit formatting codes. Also, I don't understand why Eastern Arabic-Indic digits have the same Bidi-Class as European digits, but Arabic-Indic digits, Arabic decimal and thousands separators have Bidi-Class AN. http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types
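The Bidi_Class values Alexander refers to can be inspected with unicodedata.bidirectional(). Here is a quick sketch, assuming a Python 3.6+ build (for the f-string). The Arabic-Indic digits U+0660..U+0669 and the Arabic decimal and thousands separators are AN (Arabic Number). The Extended Arabic-Indic digits U+06F0..U+06F9, which Unicode uses for the Persian/Urdu ("Eastern") forms, share EN (European Number) with the ASCII digits.

```python
import unicodedata

samples = {
    'DIGIT FOUR': '4',
    'ARABIC-INDIC DIGIT FOUR': '\u0664',
    'EXTENDED ARABIC-INDIC DIGIT FOUR': '\u06f4',
    'ARABIC DECIMAL SEPARATOR': '\u066b',
    'ARABIC THOUSANDS SEPARATOR': '\u066c',
}

for name, ch in samples.items():
    # Sanity-check the character against its official name, then show its Bidi_Class.
    assert unicodedata.name(ch) == name
    print(f'U+{ord(ch):04X} {name}: {unicodedata.bidirectional(ch)}')
# U+0034 DIGIT FOUR: EN
# U+0664 ARABIC-INDIC DIGIT FOUR: AN
# U+06F4 EXTENDED ARABIC-INDIC DIGIT FOUR: EN
# U+066B ARABIC DECIMAL SEPARATOR: AN
# U+066C ARABIC THOUSANDS SEPARATOR: AN
```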
Re: [Python-Dev] ICU
On Tue, Nov 30, 2010 at 3:13 PM, Antoine Pitrou solip...@pitrou.net wrote: Oh, about ICU: Actually, I remember you saying that locale should ideally be replaced with a wrapper around the ICU library. By that, I stand - however, I have given up the hope that this will happen anytime soon. Perhaps this could be made a GSOC topic. Incidentally, this may also address another of Python's Achilles' heels: the timezone support. http://icu-project.org/download/icutzu.html
Re: [Python-Dev] Porting Ideas
On Wed, Dec 01, 2010 at 10:06:24PM -0500, Alexander Belopolsky wrote: On Wed, Dec 1, 2010 at 9:53 PM, Terry Reedy tjre...@udel.edu wrote: .. Does Sphinx run on PY3 yet? It does, but see issue10224 for details. http://bugs.python.org/issue10224 Also, docutils has an unported module. /me needs to write a bug report for that as he really doesn't have the time he thought he did to perform the port. -Toshio
Re: [Python-Dev] Python and the Unicode Character Database
Ben Finney writes: Input from an existing text file, as I said earlier. Or any other way of text data making its way into a Python program. Direct entry at the console is a red herring. I don't think it is. Not at all. Here's why: '''print "%d" % some_integer''' doesn't now, and never will (unless Kristan gets his Python 2.8 <wink>), produce Arabic or Han numerals. Not in any language I know of, not in Microsoft Excel, and definitely not in Python 2. *Somebody* typed that text at some point. If it's Han, that somebody had *way* too much time on his hands, not a working accountant nor a graduate assistant in a research lab for sure. How about old archived texts, copied and recopied? At least for Japanese, old archival (text) data will *all* be in ASCII, because the earliest implementations of Japanese language text used JIS X 0201 (or its predecessor), which doesn't have Han digits (and kana digits don't exist even if you write with a brush and ink AFAIK). Ditto Arabic, I would imagine; ISO 8859/6 (aka Latin/Arabic) does not contain the Arabic digits that have been presented here earlier AFAICT. Note that there's plenty of space for them in that code table (eg, 0xB0-0xB9 is empty). Apparently nobody *ever* thought it was useful to have them! So, which culture, using which script and in which application, inputs numeric data in other than ASCII digits? Or would want to, if only somebody would tell them they can do it in Python? Hearsay will do, for starters.
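Two of Stephen's factual points can be checked from Python, assuming Python 3 and its bundled iso8859_6 codec. First, %-formatting of an integer always yields ASCII digits. Second, the Latin/Arabic code page has no Arabic-Indic digits: encoding one fails, ASCII digits encode fine, and the bytes 0xB0-0xB9 are unassigned. Both helper names are hypothetical.

```python
def encodable(text, encoding):
    """Hypothetical helper: True if text encodes without error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

def decodable(data, encoding):
    """Hypothetical helper: True if data decodes without error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

print('%d' % 1234)                                        # 1234 (ASCII digits)
print(encodable('1234', 'iso8859_6'))                     # True
print(encodable('\u0661\u0662\u0663', 'iso8859_6'))       # False: no Arabic-Indic digits
print(decodable(bytes(range(0xB0, 0xBA)), 'iso8859_6'))   # False: 0xB0-0xB9 unassigned
```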