Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Wed, Dec 1, 2010 at 3:59 PM, Ron Adam wrote: > Yes, it's realising that it is a *lot* more *complicated*, that gets me. > Flawed isn't the right word, it's rather a feeling things could have been > simpler if perhaps some things were done differently. *That* feeling I can understand. The import system has steadily acquired features over time, with each addition constrained by backwards compatibility concerns with all the past additions, including the exotic hacks people were using to fake features that were added more cleanly later. For the directory-as-module-not-package idea, you could probably implement a PEP 302 importer/loader that did that (independent of the stdlib). It would have the advantage of avoiding a lot of the pickle compatibility problems that a "flat package" like the new unittest layout can cause. However, you would need to be very careful with it, since all the files would be sharing a common globals() namespace. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On 11/30/2010 07:19 PM, Nick Coghlan wrote: On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam wrote: * It almost seems like the concept of a sub-module (in a package) is flawed. I'm not sure I can explain what causes me to feel that way at the moment though. It isn't flawed, it is just a *lot* more complicated than most people realise (cf. PEP 302). Yes, it's realising that it is a *lot* more *complicated*, that gets me. Flawed isn't the right word, it's rather a feeling things could have been simpler if perhaps some things were done differently. Here is the gist of ideas I got from these feelings. (Food for thought and YMMV and all that.) Python doesn't have a nice way to define a collection of modules that isn't also a package. So we have packages used to organise modules, and packages inside other packages. A collection of modules wouldn't require importing a package before importing a module in it. Another idea is, to have a way to split a large module into files, and have it still *be* a module, and not a package. And also be able to tell what is what, by looking at the directory structure. The train of thought these things came from is, how can we get back to having the directory tree have enough info in it so it's clear what is what? And how can we avoid some of the *interdependent* nesting? In this case, the signature of find_module (returning an already open file) is unfortunate, but probably necessary given the way the import internals currently work. As Brett says, returning a loader would be preferable, but the builtin import machinery doesn't have proper loaders defined (and won't until we manage to get to the point where importlib *is* the import machinery). I'll be looking forward to the new loaders. :-) Cheers, Ron ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] What is PyBuffer_SizeFromFormat?
On Wed, Dec 1, 2010 at 12:30 PM, INADA Naoki wrote: > PyBuffer_SizeFromFormat is documented and defined in abstract.h. > But I can't find an implementation of the function. > Do I overlook anything? PEP 3118 describes what it is *meant* to do. Looks like it might be yet another thing that was missed in the implementation of that PEP though :P Regards, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] What is PyBuffer_SizeFromFormat?
PyBuffer_SizeFromFormat is documented and defined in abstract.h. But I can't find an implementation of the function. Do I overlook anything? -- INADA Naoki
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam wrote: > * It almost seems like the concept of a sub-module (in a package) is flawed. > I'm not sure I can explain what causes me to feel that way at the moment > though. It isn't flawed, it is just a *lot* more complicated than most people realise (cf. PEP 302). In this case, the signature of find_module (returning an already open file) is unfortunate, but probably necessary given the way the import internals currently work. As Brett says, returning a loader would be preferable, but the builtin import machinery doesn't have proper loaders defined (and won't until we manage to get to the point where importlib *is* the import machinery). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] I/O ABCs
The documentation for the collections Abstract Base Classes (ABCs) [1] contains a table listing all of the collections ABCs, their parent classes, their abstract methods, and the methods they provide. This table makes it very easy to figure out which methods I must override when I derive from one of the ABCs, as well as which methods will be provided for me.

I'm working on a similar table for the I/O ABCs (http://bugs.python.org/issue10589). The existing documentation [2] describes the methods of each class but doesn't describe which methods provide a meaningful implementation and which methods a user should override. If I want to inherit from one of the I/O ABCs, I have to go poking into Lib/_pyio.py to figure out which methods I need to override.

While starting to examine the I/O ABCs, I discovered that there are some inconsistencies. For example, RawIOBase provides .read() if the subclass provides .readinto(). BufferedIOBase does the opposite; it provides .readinto() if the subclass provides .read() [3].

I would like to fix some of these inconsistencies. However, this will be a backwards-incompatible change. A Google Code Search suggests that the ABCs are currently only used within the standard library [4]. Just to be clear, the changes would NOT impact code that merely uses I/O objects; they would only impact code that implements I/O by subclassing one of the I/O ABCs and depending on features that are currently undocumented.

Does anyone have any categorical objections?

[1]: http://docs.python.org/py3k/library/collections.html#abcs-abstract-base-classes
[2]: http://docs.python.org/py3k/library/io.html#class-hierarchy
[3]: Possibly hurting performance by forcing .readinto() to perform the extra allocations, copies, and deallocations required by .read().
[4]: http://www.google.com/codesearch?hl=en&sa=N&q=BufferedIOBase++lang:python&ct=rr&cs_r=lang:python

-- Daniel Stutzbach
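To make the asymmetry described in this message concrete, here is a minimal sketch of a RawIOBase subclass that implements only readinto() and relies on the inherited read(). The class name BytesRaw and its in-memory data are made up for illustration; the sketch reflects how CPython's io module behaves today, which, as the message points out, was not spelled out in the documentation at the time.

```python
import io

class BytesRaw(io.RawIOBase):
    """Hypothetical raw stream over an in-memory bytes object."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as many bytes as fit into the caller-supplied buffer.
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

raw = BytesRaw(b"hello world")
first = raw.read(5)   # inherited read() is built on top of readinto()
rest = raw.read()     # read() without a size reads until end of stream
```

A BufferedIOBase subclass would be written the other way round: it would define read() and inherit readinto().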
Re: [Python-Dev] Python and the Unicode Character Database
"Martin v. Löwis" writes: > Am 30.11.2010 21:24, schrieb Ben Finney: > > The string need not be a literal in the program; it can be input to > > the program. > > > > num = float(input_from_the_external_world) > > > > Does that change your assessment of whether non-ASCII digits are > > used? > > I think the OP (haiyang kang) already indicated that he finds it quite > unlikely that anybody would possibly want to enter that. Who's talking about *entering* it into the program at a keyboard directly, though? Input to a program can come from all kinds of crazy sources. Just because it wasn't typed by the person at the keyboard using this program doesn't stop it being input to the program. A concrete example, but certainly not the only possible case: non-ASCII digit characters representing integers, stored as text in a file. Note that I'm not saying this is common. Nor am I saying it's a desirable situation. I'm saying it is a feasible use case, to be dismissed only if there is strong evidence that it's not used by existing Python code. -- \ “When a well-packaged web of lies has been sold to the masses | `\over generations, the truth will seem utterly preposterous and | _o__)its speaker a raving lunatic.” —Dresden James | Ben Finney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
On 11/30/2010 10:05 AM, Alexander Belopolsky wrote: My general answers to the questions you have raised are as follows: 1. Each new feature release should use the latest version of the UCD as of the first beta release (or perhaps a week or so before). New chars are new features and the beta period can be used to (hopefully) iron out any bugs introduced by a new UCD version. 2. The language specification should not be UCD version specific. Martin pointed out that the definition of identifiers was intentionally written to not be, by referring to 'current version' or some such. On the other hand, the UCD version used should be programmatically discoverable, perhaps as an attribute of sys or str. 3. The UCD should not change in bugfix releases. New chars are new features. Adding them in bugfix releases will introduce gratuitous incompatibilities between releases. People who want the latest Unicode should either upgrade to the latest Python version or patch an older version (but not expect core support for any problems that creates). Given that 2.7 will be maintained for 5 years and arguably Unicode Consortium takes backward compatibility very seriously, wouldn't it make sense to consider a backport at some point? I am sure we will soon see a bug report that the following does not work in 2.7: :-) ord('\N{CAT FACE WITH WRY SMILE}') 128572 3 (cont). 2.7 is no different in that regard. It is feature frozen just like all other x.y releases. And that is the answer to any such report. If that code became valid in 2.7.2, for instance, it would still not work in 2.7 and 2.7.1. Not working is not a bug; working is a new feature introduced after 2.7 was released. - How specific should library reference manual be in defining methods affected by UCD such as str.upper()? It should specify what this actually does in Unicode terminology (probably in addition to a layman's rephrase of that) I opened an issue for this: http://bugs.python.org/issue10587 1,2 (cont). Good idea in general. 
I was more concerned about wide and narrow unicode CPython builds. Is it a bug that '\U'.isalpha() may disagree even when the two implementations are based on the same version of UCD? 4. While the difference between narrow/wide builds of (CPython) x.y (which should have one constant UCD) cannot be completely masked, I appreciate and generally agree with your efforts to minimize them. In some cases, there will be a conflict/tradeoff between eliminating this difference versus that. -- Terry Jan Reedy
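On point 2 above: the UCD version is in fact discoverable through the unicodedata module, whose unidata_version attribute holds it as a string. The sketch below shows one way a program might use it; the helper names ucd_version and is_known are hypothetical and only illustrate the idea.

```python
import unicodedata

def ucd_version():
    """Return the UCD version bundled with this interpreter as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))

def is_known(char):
    """True if the interpreter's UCD assigns a character name to `char`."""
    return unicodedata.name(char, None) is not None

# CAT FACE WITH WRY SMILE (U+1F63C) was added in Unicode 6.0, so on older
# interpreters is_known() would return False for it.
cat_known = is_known("\U0001F63C")
```

Checking is_known() for a specific character is usually more robust than comparing version tuples, since it tests the property the program actually depends on.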
Re: [Python-Dev] Python and the Unicode Character Database
On 30.11.2010 23:43, Terry Reedy wrote: > On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: > >> I see no reason not to make a similar promise for numeric literals. I >> see no good reason to allow compatibility full-width Japanese "ASCII" >> numerals or Arabic cursive numerals in "for i in range(...)" for >> example. > > I do not think that anyone, at least not me, has argued for anything > other than 0-9 digits (or 0-f for hex) in literals in program code. The > only issue is whether non-programmer *users* should be able to use their > native digits in applications in response to input prompts. And here, my observation stands: if they wanted to, they currently couldn't - at least not for real numbers (and also not for integers if they want to use grouping). So the presumed application of this feature doesn't actually work, despite the presence of the feature it was supposedly meant to enable. Regards, Martin
Re: [Python-Dev] Python and the Unicode Character Database
On 30.11.2010 21:24, Ben Finney wrote: > haiyang kang writes: > >> I think it is a little ugly to have code like this: num = >> float("一.一"), expected result is: num = 1.1 > > That's a straw man, though. The string need not be a literal in the > program; it can be input to the program. > > num = float(input_from_the_external_world) > > Does that change your assessment of whether non-ASCII digits are used? I think the OP (haiyang kang) already indicated that he finds it quite unlikely that anybody would possibly want to enter that. You would need a number of key strokes to enter each individual ideograph, plus you have to press the keys for keyboard layout switching to enter the Latin decimal separator (which you normally wouldn't use along with the Han numerals). Regards, Martin
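For readers following the digits discussion, the sketch below shows what int() and float() accept in current CPython 3, as far as I can tell: characters that the UCD classifies as decimal digits (category Nd) are converted, while Han numerals such as 一 carry only a numeric value and are rejected. The helper name describe is made up for illustration.

```python
import unicodedata

def describe(ch):
    """Return (decimal, numeric) values for `ch`, with None where undefined."""
    return unicodedata.decimal(ch, None), unicodedata.numeric(ch, None)

# ARABIC-INDIC DIGIT ONE..THREE are decimal digits, so int() accepts them.
arabic_indic = int("١٢٣")

# The CJK ideograph for "one" has a numeric value but is not a decimal digit,
# so float() refuses the string from the example in this thread.
try:
    han = float("一.一")
except ValueError:
    han = None
```

So the float("一.一") example from earlier in the thread does not yield 1.1; only the Nd digits take part in the conversion.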
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On 11/30/2010 01:41 PM, Brett Cannon wrote: On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote: On 11/29/2010 01:22 PM, Brett Cannon wrote: On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault wrote: On 25 novembre 11:22, Ron Adam wrote: On 11/25/2010 08:30 AM, Emile Anclin wrote: hello, working on Pylint, we have a lot of voluntary corrupted files to test Pylint behavior; for instance $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py # -*- coding: IBO-8859-1 -*- """ check correct unknown encoding declaration """ __revision__ = '' and we try to find that module : find_module('func_unknown_encoding', None). But python3 raises SyntaxError in that case ; it didn't raise SyntaxError on python2 nor does so on our func_nonascii_noencoding and func_wrong_encoding modules (with obvious names) Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) [GCC 4.3.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >from imp import find_module find_module('func_unknown_encoding', None) Traceback (most recent call last): File "", line 1, in SyntaxError: encoding problem: with BOM I don't think there is a clear reason by design. Also try importing the same modules directly and noting the differences in the errors you get. IMO the point is that we can consider as a bug the fact that find_module tries to somewhat read the content of the file, no? Though it seems to only doing this for encoding detection or like since find_module doesn't choke on a module containing another kind of syntax error. So the question is, should we deal with this in pylint/astng, or can we expect this to be fixed at some point? Considering these semantics changed between Python 2 and 3 w/o a discernable benefit (I would consider it a negative as finding a module should not be impacted by syntactic correctness; the full act of importing should be the only thing that cares about that), I would consider it a bug that should be filed. 
The output of imp.find_module() returns an open file io object, and its output feeds directly into imp.load_module(). imp.find_module('pydoc') (<_io.TextIOWrapper name=4 encoding='utf-8'>, '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) So I think the imp.find_module() is supposed to be used when you *do* want to do the full act of importing and not for just finding out if or where module xyz exists. Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module. The part that I'm looking at, is what would find_module return if the encoding is bad or not found for the encoding? <_io.TextIOWrapper name=4 encoding='bad_encoding'> Maybe we could have some library introspection function in the inspect module for just looking in the library rather than loading modules. But I think those would have the same issues, as packages need to be loaded in order to find sub modules.* * It almost seems like the concept of a sub-module (in a package) is flawed. I'm not sure I can explain what causes me to feel that way at the moment though. Ron
Re: [Python-Dev] Python and the Unicode Character Database
On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote: I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example. I do not think that anyone, at least not me, has argued for anything other than 0-9 digits (or 0-f for hex) in literals in program code. The only issue is whether non-programmer *users* should be able to use their native digits in applications in response to input prompts. -- Terry Jan Reedy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 291 versus Python 3
On Nov 30, 2010, at 12:11 PM, Brett Cannon wrote: >I will channel Neal: "I decline and/or do not want to respond". =) PEP 291 updated. -Barry
Re: [Python-Dev] Python and the Unicode Character Database
haiyang kang writes: > I think it is a little ugly to have code like this: num = > float("一.一"), expected result is: num = 1.1 That's a straw man, though. The string need not be a literal in the program; it can be input to the program. num = float(input_from_the_external_world) Does that change your assessment of whether non-ASCII digits are used? -- \“The greatest tragedy in mankind's entire history may be the | `\ hijacking of morality by religion.” —Arthur C. Clarke, 1991 | _o__) | Ben Finney ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] ICU
Oh, about ICU: > > Actually, I remember you saying that locale should ideally be replaced > > with a wrapper around the ICU library. > > By that, I stand - however, I have given up the hope that this will > happen anytime soon. Perhaps this could be made a GSOC topic. Regards Antoine.
Re: [Python-Dev] PEP 291 versus Python 3
On Tue, Nov 30, 2010 at 07:35, Barry Warsaw wrote: > On Nov 30, 2010, at 01:09 PM, Michael Foord wrote: > >>PEP 291 is very old and should probably be retired. I don't think anyone is >>maintaining standard libraries in py3k that are also compatible with Python >>2.anything. (At least not in a single codebase.) > > I agree. Same here; I have purposefully ignored compatibility requirements because I always found those promises to be extremely annoying and somewhat painful to enforce. > I think we should change the status of PEP 291 to Final, and add a > few words to make it clear it applies only to Python 2. Since Neal owns the > PEP, he should get first crack at doing the update, but I volunteer to make > those changes if he declines (or does not respond). > I will channel Neal: "I decline and/or do not want to respond". =) > We may eventually need a similar document for Python 3, but it should be a new > PEP. I hope not. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
Le mardi 30 novembre 2010 à 20:55 +0100, "Martin v. Löwis" a écrit : > Wrt. to local number parsing, I think that the locale module would be > way better than the nonsense that Python currently does. In the locale > module, somebody at least has thought about what specifically > constitutes a number. The current not-ASCII-but-not-local-either > approach is just useless. It depends what you need. If you parse integers it's probably good enough. And it's better to have a trustable standard (unicode) than a myriad of ad-hoc, possibly buggy or incomplete, often unavailable, cultural specifications drafted by OS vendors who have no business (and no expertise) in drafting them. At least you can build more sophisticated routines on the simple information given to you by the unicode database. You cannot build anything solid on the C locale functions (and even then you are limited by various issues inherent in the locale semantics, such as the fact that it relies on process-wide state, which would only be ok, at best, for single-user applications). There's a reason that e.g. Babel (*) reimplements locale-like functionality from scratch. (*) http://pypi.python.org/pypi/Babel/ Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
> Because we all know how locale is a pile of cr*p, both in specification > and in implementations. Our unit tests for it are a clear proof of that. I wouldn't use expletives, but rather claim that the locale module is highly platform-dependent. > Actually, I remember you saying that locale should ideally be replaced > with a wrapper around the ICU library. By that, I stand - however, I have given up the hope that this will happen anytime soon. Wrt. to local number parsing, I think that the locale module would be way better than the nonsense that Python currently does. In the locale module, somebody at least has thought about what specifically constitutes a number. The current not-ASCII-but-not-local-either approach is just useless. Maintaining a reasonable implementation is a burden, so deferring to the C library is more attractive than having to maintain an unreasonable implementation. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault wrote: > On 29 novembre 14:21, Ron Adam wrote: >> On 11/29/2010 01:22 PM, Brett Cannon wrote: >> >Considering these semantics changed between Python 2 and 3 w/o a >> >discernable benefit (I would consider it a negative as finding a >> >module should not be impacted by syntactic correctness; the full act >> >of importing should be the only thing that cares about that), I would >> >consider it a bug that should be filed. >> >> The output of imp.find_module() returns an open file io object, and >> it's output feeds directly into to imp.load_module(). >> >> >>> imp.find_module('pydoc') >> (<_io.TextIOWrapper name=4 encoding='utf-8'>, >> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) >> >> So I think the imp.find_module() is suppose to be used when you *do* >> want to do the full act of importing and not for just finding out if >> or where module xyz exists. > > in python 2, find_module was usable for such usage, and this is a needed api > for a tool like pylint. Is there another way to do so with python 3? At the moment, no. Best option would be to create an importlib.find_module function which returns a loader if the module is found, else returns None. The loader can have its get_source method called to read the source code (w/o verification). I have this planned for Python 3.3 but not 3.2 with us so close to 3.2b1. > -- > Sylvain Thénault LOGILAB, Paris (France) > Formations Python, Debian, Méth. 
Agiles: http://www.logilab.fr/formations > Développement logiciel sur mesure: http://www.logilab.fr/services > CubicWeb, the semantic web framework: http://www.cubicweb.org
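The importlib.find_module function proposed in this message was a plan, not an existing API at the time. In later releases (Python 3.4 and newer) importlib.util.find_spec and importlib.machinery.PathFinder.find_spec cover a comparable need: they locate a module without reading or compiling its source. The sketch below is written against those later APIs, not against anything available in 3.2; the helper name locate_module and the temporary file layout are assumptions for illustration, with the file content modelled on the Pylint test case quoted in the thread.

```python
import os
import tempfile
from importlib.machinery import PathFinder

def locate_module(name, search_dir):
    """Return the file path of `name` under `search_dir`, or None.

    Nothing is imported and the source file is never decoded.
    """
    spec = PathFinder.find_spec(name, [search_dir])
    return None if spec is None else spec.origin

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "func_unknown_encoding.py")
    with open(path, "w", encoding="ascii") as f:
        # Deliberately bogus encoding declaration, as in the Pylint test file.
        f.write("# -*- coding: IBO-8859-1 -*-\n__revision__ = ''\n")
    found = locate_module("func_unknown_encoding", tmp)
    missing = locate_module("no_such_module", tmp)
```

Because the finder only inspects the directory listing, the unknown encoding is not noticed until something actually tries to load or decode the source.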
Re: [Python-Dev] Python and the Unicode Character Database
Le mardi 30 novembre 2010 à 20:40 +0100, "Martin v. Löwis" a écrit : > Am 30.11.2010 20:23, schrieb Antoine Pitrou: > > Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit : > >>> Would moving this functionality to the locale module make the issues any > >>> easier to fix? > >> > >> You could delegate it to the C library, so: yes. > > > > I hope you don't suggest delegating it to the C locale functions. > > Do you? > > Yes, I do. Why do you hope I don't? Because we all know how locale is a pile of cr*p, both in specification and in implementations. Our unit tests for it are a clear proof of that. Actually, I remember you saying that locale should ideally be replaced with a wrapper around the ICU library. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Mon, Nov 29, 2010 at 12:21, Ron Adam wrote: > > > On 11/29/2010 01:22 PM, Brett Cannon wrote: >> >> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault >> wrote: >>> >>> On 25 novembre 11:22, Ron Adam wrote: On 11/25/2010 08:30 AM, Emile Anclin wrote: > > hello, > > working on Pylint, we have a lot of voluntary corrupted files to test > Pylint behavior; for instance > > $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py > # -*- coding: IBO-8859-1 -*- > """ check correct unknown encoding declaration > """ > > __revision__ = '' > > > and we try to find that module : > find_module('func_unknown_encoding', None). But python3 raises > SyntaxError > in that case ; it didn't raise SyntaxError on python2 nor does so on > our > func_nonascii_noencoding and func_wrong_encoding modules (with obvious > names) > > Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>> >>> >from imp import find_module find_module('func_unknown_encoding', None) > > Traceback (most recent call last): > File "", line 1, in > SyntaxError: encoding problem: with BOM I don't think there is a clear reason by design. Also try importing the same modules directly and noting the differences in the errors you get. >>> >>> IMO the point is that we can consider as a bug the fact that find_module >>> tries to somewhat read the content of the file, no? Though it seems to >>> only >>> doing this for encoding detection or like since find_module doesn't choke >>> on >>> a module containing another kind of syntax error. >>> >>> So the question is, should we deal with this in pylint/astng, or can we >>> expect >>> this to be fixed at some point? 
>> >> Considering these semantics changed between Python 2 and 3 w/o a >> discernable benefit (I would consider it a negative as finding a >> module should not be impacted by syntactic correctness; the full act >> of importing should be the only thing that cares about that), I would >> consider it a bug that should be filed. > > The output of imp.find_module() returns an open file io object, and it's > output feeds directly into to imp.load_module(). > imp.find_module('pydoc') > (<_io.TextIOWrapper name=4 encoding='utf-8'>, > '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1)) > > So I think the imp.find_module() is suppose to be used when you *do* want to > do the full act of importing and not for just finding out if or where module > xyz exists. Going with your line of argument, why can't imp.load_module be the call that figures out there is a syntax error? If you look at this from the perspective of PEP 302, finding a module has absolutely nothing to do with the validity of the found source, just that something was found somewhere which (hopefully) contains code that represents the module. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
On 30.11.2010 20:23, Antoine Pitrou wrote: > On Tuesday 30 November 2010 at 20:16 +0100, "Martin v. Löwis" wrote: >>> Would moving this functionality to the locale module make the issues any >>> easier to fix? >> >> You could delegate it to the C library, so: yes. > > I hope you don't suggest delegating it to the C locale functions. > Do you? Yes, I do. Why do you hope I don't? Regards, Martin
Re: [Python-Dev] Python and the Unicode Character Database
On Tuesday 30 November 2010 at 20:16 +0100, "Martin v. Löwis" wrote: > > Would moving this functionality to the locale module make the issues any > > easier to fix? > > You could delegate it to the C library, so: yes. I hope you don't suggest delegating it to the C locale functions. Do you?
Re: [Python-Dev] Python and the Unicode Character Database
> Would moving this functionality to the locale module make the issues any > easier to fix? You could delegate it to the C library, so: yes. Regards, Martin
Re: [Python-Dev] Python and the Unicode Character Database
Am 30.11.2010 09:15, schrieb Hagen Fürstenau: >>> During PEP 3003 discussion, it was suggested to handle it on a case by >>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >>> 3003. >> >> It's covered by "As the standard library is not directly tied to the >> language definition it is not covered by this moratorium." > > How is this restricted to the stdlib if it defines the set of valid > identifiers? The language does not change. The language specification says Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module. That remains unchanged. It was a deliberate design decision of PEP 3131 to not codify a fixed set of characters that can be used in identifiers. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou wrote: .. >> I am not sure this belongs to the locale module, however. It seems to >> me, something like 'unicodealgo' for unicode algorithms would be more >> appropriate. > > It could simply be in unicodedata if you split the implementation into a > core C part and some Python bits. > Splitting unicodedata may not be a bad idea. There are many more pieces in UCD than covered by unicodedata. [1] Hardcoding them all into unicodedata module is hard to justify, but some are quite useful. For example, PropertyValueAliases.txt is quite useful for those like myself who cannot remember what Pd or Zl category names stand for. SpecialCasing.txt is required for proper casing, but is not currently included in Python. I would not want to change str.upper or str.title because of this, but providing the raw info to someone who wants to implement proper case mappings may not be a bad idea. Blocks.txt is certainly useful for any language-dependent processing. On the other hand, I think we should keep Unicode data and Unicode algorithms separate. And the latter may not even belong to the Python stdlib. [1] http://unicode.org/Public/UNIDATA/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
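The remark about hard-to-remember category codes such as Pd and Zl can be illustrated with a small lookup. This is only a sketch: `CATEGORY_ALIASES` is a hand-copied, partial excerpt of the long names listed in PropertyValueAliases.txt, not something the `unicodedata` module provides, and `describe` is a hypothetical helper introduced for the example.

```python
import unicodedata

# Hand-copied, partial excerpt of General_Category long names.
# Assumption: this table is NOT part of the stdlib; it mirrors a few
# entries from PropertyValueAliases.txt for illustration only.
CATEGORY_ALIASES = {
    "Pd": "Dash_Punctuation",
    "Zl": "Line_Separator",
    "Nd": "Decimal_Number",
    "Lu": "Uppercase_Letter",
}


def describe(ch):
    """Return (short code, long name or None) for a single character."""
    code = unicodedata.category(ch)
    return code, CATEGORY_ALIASES.get(code)


print(describe("-"))        # HYPHEN-MINUS
print(describe("\u2028"))   # LINE SEPARATOR
print(describe("٣"))        # ARABIC-INDIC DIGIT THREE
```

A complete table would have to be generated from the UCD files themselves, which is the kind of "raw info" the message suggests exposing.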
Re: [Python-Dev] Python and the Unicode Character Database
> Sure, if we code it in Python, supporting it will be much easier:
>
> def normalize_digits(s):
>     digits = {m.group(1) for m in re.finditer('(\d)', s)}
>     trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
>     return s.translate(trtab)
>
> >>> normalize_digits('١٢٣٤.٥٦')
> '1234.56'
>
> I am not sure this belongs to the locale module, however. It seems to
> me, something like 'unicodealgo' for unicode algorithms would be more
> appropriate.

It could simply be in unicodedata if you split the implementation into a core C part and some Python bits.

Regards

Antoine.
Re: [Python-Dev] Python and the Unicode Character Database
On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord wrote:
..
>> If you think non-ASCII digits are not difficult to support, please
>> contribute to the following tracker issues:
>>
>
> Would moving this functionality to the locale module make the issues any
> easier to fix?
>

Sure, if we code it in Python, supporting it will be much easier:

    import re
    import unicodedata

    def normalize_digits(s):
        # Collect the distinct decimal digit characters that occur in s.
        digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
        # Map each of them to its ASCII equivalent.
        trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
        return s.translate(trtab)

    >>> normalize_digits('١٢٣٤.٥٦')
    '1234.56'

I am not sure this belongs to the locale module, however. It seems to me, something like 'unicodealgo' for unicode algorithms would be more appropriate.
Re: [Python-Dev] Python and the Unicode Character Database
On 30/11/2010 16:40, Alexander Belopolsky wrote:
> [snip...]
> And of course,
>
> >>> unicodedata.digit('\U0001D7CE')
> 0
>
> but
>
> >>> int('\U0001D7CE')
> ..
> UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..
>
> on a narrow Unicode build. (Note the character reported in the error message!)
>
> If you think non-ASCII digits are not difficult to support, please
> contribute to the following tracker issues:

Would moving this functionality to the locale module make the issues any easier to fix?

Michael

> http://bugs.python.org/issue10581 (Review and document string format accepted in numeric data type constructors)
> http://bugs.python.org/issue10557 (Malformed error message from float())
> http://bugs.python.org/issue10435 (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)
> http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented)
> http://bugs.python.org/issue6632 (Include more fullwidth chars in the decimal codec)
>
> and back to the issue of user confusion
>
> http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)

--
http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (“BOGUS AGREEMENTS”) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
Re: [Python-Dev] Python and the Unicode Character Database
On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky wrote:
..
>> Still, if it's not detrimental and it's not difficult to support,
>> then why do you care?
>
> It is difficult to support. A fix for issue10557 would be much
> simpler if we did not support non-European digits. I now added a
> patch that handles non-ascii digits, so you can see what's involved.
> Note that when Unicode Consortium inevitably adds more Nd characters
> to the non-BMP planes, we will have to add surrogate pairs' support to
> this code.
>

It turns out that this did in fact happen:

# Newly assigned in Unicode 3.1.0 (March, 2001)
..
1D7CE..1D7FF ; 3.1 # [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

See http://unicode.org/Public/UNIDATA/DerivedAge.txt

And of course,

>>> unicodedata.digit('\U0001D7CE')
0

but

>>> int('\U0001D7CE')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build. (Note the character reported in the error message!)

If you think non-ASCII digits are not difficult to support, please contribute to the following tracker issues:

http://bugs.python.org/issue10581 (Review and document string format accepted in numeric data type constructors)
http://bugs.python.org/issue10557 (Malformed error message from float())
http://bugs.python.org/issue10435 (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)
http://bugs.python.org/issue8646 (PyUnicode_EncodeDecimal is undocumented)
http://bugs.python.org/issue6632 (Include more fullwidth chars in the decimal codec)

and back to the issue of user confusion

http://bugs.python.org/issue652104 [closed/invalid] (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)
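The '\ud835' in the error message above is the first half of a surrogate pair: a narrow build stores a non-BMP character as two UTF-16 code units, and the decimal codec trips over the first one. The following is a minimal sketch of that arithmetic; `utf16_code_units` is a hypothetical helper written for this illustration, not a CPython API. (Python 3.3 and later no longer have narrow builds, so the sketch computes the code units by hand.)

```python
import unicodedata


def utf16_code_units(ch):
    """Return the UTF-16 code units needed to encode one code point."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP characters fit in a single code unit.
        return [cp]
    # Non-BMP characters are split into a high and a low surrogate.
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]


zero = "\U0001D7CE"  # MATHEMATICAL BOLD DIGIT ZERO
print(unicodedata.digit(zero))
print([hex(unit) for unit in utf16_code_units(zero)])
```

The first code unit printed is the value that shows up in the UnicodeEncodeError, which is why the message names a character the user never typed.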
Re: [Python-Dev] Python and the Unicode Character Database
Alexander Belopolsky wrote: > On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: > >> But you should be able to write: > >> > >> text = input("Enter a number using your preferred digits: ") > >> num = float(text) > >> > >> without caring whether the user enters 一.一 or 1.1 or something else. > > > > yes. from logical point of view, this can happen. ... > > Please stop discussing a non-feature. Python's float *does not* > accept ' 一.一'. This was reported as a bug and closed as invalid. That seems irrelevant to me. One of the main topics of this thread is whether actual native speakers would be happy with ascii-only input for float(). haiyang kang confirmed that this is the case. I hope that more local speakers will contribute their views. Stefan Krah ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 291 versus Python 3
On Nov 30, 2010, at 01:09 PM, Michael Foord wrote: >PEP 291 is very old and should probably be retired. I don't think anyone is >maintaining standard libraries in py3k that are also compatible with Python >2.anything. (At least not in a single codebase.) I agree. I think we should change the status of PEP 291 to Final, and add a few words to make it clear it applies only to Python 2. Since Neal owns the PEP, he should get first crack at doing the update, but I volunteer to make those changes if he declines (or does not respond). We may eventually need a similar document for Python 3, but it should be a new PEP. -Barry signature.asc Description: PGP signature ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang wrote: >> But you should be able to write: >> >> text = input("Enter a number using your preferred digits: ") >> num = float(text) >> >> without caring whether the user enters 一.一 or 1.1 or something else. > > yes. from logical point of view, this can happen. ... Please stop discussing a non-feature. Python's float *does not* accept ' 一.一'. This was reported as a bug and closed as invalid. See "makeunicodedata.py does not support Unihan digit data" http://bugs.python.org/issue10575 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis" wrote: >> - Should Python documentation refer to the specific version of Unicode >> that it supports? > > You mean, mention it somewhere? Sure (although it would be nice if the > documentation generator would automatically extract it from the source, > just as it extracts the Python version number). > > Of course, such mentioning should explain that this is specific to > CPython, and not an aspect of Python-the-language. > >> Current documentation refers to old versions. Should version be >> updated or removed to imply the latest? > > What specific reference are you referring to? > I found two places: A reference to Unicode 3.0 (!) in the Data Model section and a reference to 5.2.0 in unicodedata docs. See http://mail.python.org/pipermail/docs/2010-November/002074.html >> - How UCD updates should be handled during the language moratorium? > > It's clearly not affected. > This is not what Guido said last year: """ > One question: > > There are currently number of patch waiting on the tracker for > additional Unicode feature support and it's also likely that we'll > want to upgrade to a more recent Unicode version within the > next few years. > > How would such indirect changes be seen under the moratorium ? That would fall under the Case-by-Case Exemptions section. "Within the next few years" sounds like it might well wait until the moratorium is ended though. :-) """ http://mail.python.org/pipermail/python-dev/2009-November/093666.html I don't see it as a big deal, but technically speaking, with Unicode 6.0 changing properties of two characters to become identifiers Python language definition is affected. For example, an alternative implementation based on 5.2.0 will not accept a valid CPython program that uses one of these characters. >> During PEP 3003 discussion, it was suggested to handle it on a case by >> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP >> 3003. 
>
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."
>

See above. Also, it has been suggested that semantics of built-ins cannot change. (If that was so, it would put int('١٢٣٤') debate to rest at least for the time being.:-)

>> Should this upgrade be backported to 2.7?
>
> No, it's a new feature.
>

Given that 2.7 will be maintained for 5 years and arguably Unicode Consortium takes backward compatibility very seriously, wouldn't it make sense to consider a backport at some point? I am sure we will soon see a bug report that the following does not work in 2.7: :-)

>>> ord('\N{CAT FACE WITH WRY SMILE}')
128572

>> - How specific should library reference manual be in defining methods
>> affected by UCD such as str.upper()?
>
> It should specify what this actually does in Unicode terminology
> (probably in addition to a layman's rephrase of that)
>

I opened an issue for this: http://bugs.python.org/issue10587

>> .. For example, if '\U'.isalpha() returns true
>> in one implementation, can it return false in another?
>
> Implementations are free to use any version of the UCD.

I was more concerned about wide and narrow unicode CPython builds. Is it a bug that '\U'.isalpha() may disagree even when the two implementations are based on the same version of UCD?

Thanks for your answers.
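Whether a given character name resolves depends on the UCD version bundled with the interpreter, which is the crux of the backport question. A minimal sketch of how a program could check this at run time follows; `codepoint_or_none` is a hypothetical helper, while `unicodedata.unidata_version` and `unicodedata.lookup` are the stdlib calls it relies on.

```python
import unicodedata


def codepoint_or_none(name):
    """Return the code point for a character name, or None if the
    UCD bundled with this interpreter does not know the name."""
    try:
        return ord(unicodedata.lookup(name))
    except KeyError:
        return None


print("UCD version:", unicodedata.unidata_version)
print(codepoint_or_none("CAT FACE WITH WRY SMILE"))
print(codepoint_or_none("NO SUCH CHARACTER NAME"))
```

Using `lookup` instead of a `'\N{...}'` literal keeps the failure at run time; a literal with an unknown name would be rejected when the source is compiled.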
Re: [Python-Dev] Python and the Unicode Character Database
> But you should be able to write: > > text = input("Enter a number using your preferred digits: ") > num = float(text) > > without caring whether the user enters 一.一 or 1.1 or something else. yes. from logical point of view, this can happen. But i really doubt that if really there are users who would like to input number like that, means that they first use google pinyin method to input 一, then change to english input method to input . , then change to google pinyin again for the other 一; or maybe you mean they input the whole 一.一 words with google pinyin input method. To input 1, users only need to type one time keyboard, but to input 一, they need to type three times (yi SPACE). Of course, users can also input something accidentally, but we just need to give them some kind reminders. At least coders in my around will restrain their system users to input numbers with ASCII, and seems that users are still happy with the ASCII type numbers :). br, khy ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Module size
On Tue, Nov 30, 2010 at 09:41, Antoine Pitrou wrote: > That said, I don't think the size is very important. For any non-trivial > Python application, the size of unicodedata will be negligible compared > to the size of Python objects. That depends very much on the platform and the application. For our embedded use of Python, static data size (like the text segment of a shared object) is far dearer than the heap space used by Python objects, which is why we've had to excise both the UCD and the CJK codecs in our builds. -- Tim Lesher ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Module size
On Tuesday, 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote:
> On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
> > On Mon, 29 Nov 2010 22:46:33 -0500
> > Alexander Belopolsky wrote:
> >>
> >> In practical terms, UCD comes at a price. The unicodedata module size
> >> is over 700K on my machine. This is almost half the size of the
> >> python executable and by far the largest extension module. (only CJK
> >> encodings come close.) Making builtins depend on the largest
> >> extension module for operation does not strike me as sound design.
> >
> > Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> > only on Objects/unicodectype.c.
>
> My mistake. That was a late night post. I wonder why unicodedata.so
> is so big then.
>
> It must be character names:
>
> $ python -v
> >>> '\N{DIGIT ONE}'
> dlopen("/.../unicodedata.so", 2);
> import unicodedata # dynamically loaded from /.../unicodedata.so
> '1'

From a quick peek using hexdump, character names seem to only account for 1/4 of the module size.

That said, I don't think the size is very important. For any non-trivial Python application, the size of unicodedata will be negligible compared to the size of Python objects.

Regards

Antoine.
Re: [Python-Dev] Module size
On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
> On Mon, 29 Nov 2010 22:46:33 -0500
> Alexander Belopolsky wrote:
>>
>> In practical terms, UCD comes at a price. The unicodedata module size
>> is over 700K on my machine. This is almost half the size of the
>> python executable and by far the largest extension module. (only CJK
>> encodings come close.) Making builtins depend on the largest
>> extension module for operation does not strike me as sound design.
>
> Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> only on Objects/unicodectype.c.

My mistake. That was a late night post. I wonder why unicodedata.so is so big then.

It must be character names:

$ python -v
>>> '\N{DIGIT ONE}'
dlopen("/.../unicodedata.so", 2);
import unicodedata # dynamically loaded from /.../unicodedata.so
'1'
Re: [Python-Dev] Python and the Unicode Character Database
On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters 一.一 or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current behavior do it without checking what it actually is:

>>> float('一.一')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature. It does not fit user's expectations. Even the original author of the decimal "codec" expected the above to work. [1]

> Python can already do this, and has been able to for many years:
> >>> int(u'٣')
> 3

but you can do this without support from int() as well:

>>> import unicodedata
>>> unicodedata.digit('٣')
3

and for Unihan numbers, you can do

>>> unicodedata.numeric('一')
1.0

and

>>> unicodedata.numeric('ⅷ')
8.0

and if you are so inclined,

>>> [unicodedata.numeric(c) for c in "ↂ ↁ ⅗ ⅞ 𐄳".split()]
[10000.0, 5000.0, 0.6, 0.875, 90000.0]

Do you want to see all these supported by float()?

[1] "makeunicodedata.py does not support Unihan digit data" http://bugs.python.org/issue10575
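The examples above rely on the UCD distinguishing three numeric properties: decimal, digit and numeric. A minimal sketch of the difference, using only the `unicodedata` functions with their optional default argument (`classify` is a hypothetical helper named for this illustration):

```python
import unicodedata


def classify(ch):
    """Report which numeric properties a single character carries.

    Each entry is None when the character lacks that property.
    """
    return {
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


for ch in "3٣一ⅷ⅗":
    print(repr(ch), classify(ch))
```

Characters such as 一 or ⅷ carry only a numeric value, which suggests why a constructor built around decimal digits does not accept them.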
[Python-Dev] Module size
On Mon, 29 Nov 2010 22:46:33 -0500
Alexander Belopolsky wrote:
>
> In practical terms, UCD comes at a price. The unicodedata module size
> is over 700K on my machine. This is almost half the size of the
> python executable and by far the largest extension module. (only CJK
> encodings come close.) Making builtins depend on the largest
> extension module for operation does not strike me as sound design.

Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend only on Objects/unicodectype.c.

$ size Objects/unicode*.o
   text    data     bss     dec     hex filename
  60398       0       0   60398    ebee Objects/unicodectype.o
 130440   13559    2208  146207   23b1f Objects/unicodeobject.o

Antoine.
Re: [Python-Dev] Python and the Unicode Character Database
On Wed, 01 Dec 2010 00:23:22 +1100 Steven D'Aprano wrote: > > But I think there is a good case for allowing the constructors int, > float and complex to continue to accept numeric *strings* with non-ASCII > digits. The code already exists, there's probably people out there who > rely on it, and in the absence of any convincing demonstration that the > existing behaviour is causing widespread difficulty, we should leave > well-enough alone. +1 > It seems to me that there's no need to move this functionality into locale. Not only, but moving it into locale won't make it easier to maintain anyway. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python and the Unicode Character Database
Stephen J. Turnbull wrote: Lennart Regebro writes: > *I* think it is more important. In python 3, you can never ever assume > anything is ASCII any more. Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8) for the forseeable future. I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example. I agree with you that numeric *literals* should be restricted to the ASCII digits. I don't think anyone here is arguing differently -- if they are, they should speak up and try to make the case for allowing numeric literals in arbitrary scripts. Python doesn't currently allow non-ASCII numeric literals, and even if such a change were desirable, it would run up against the moratorium. So let's just forget the specter of code like: x = math.sqrt(١٢٣٤.٥٦ ** 一.一) It ain't gonna happen :) But I think there is a good case for allowing the constructors int, float and complex to continue to accept numeric *strings* with non-ASCII digits. The code already exists, there's probably people out there who rely on it, and in the absence of any convincing demonstration that the existing behaviour is causing widespread difficulty, we should leave well-enough alone. Various people have suggested that there should be a function in the locale module that handles numeric string input in non-ASCII digits. This is a de facto admission that there are use-cases for taking user input like the string '٣' and turning it into the int 3. Python can already do this, and has been able to for many years: [st...@sylar ~]$ python2.4 Python 2.4.6 (#1, Mar 30 2009, 10:08:01) [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> int(u'٣') 3 It seems to me that there's no need to move this functionality into locale. 
-- Steven
Re: [Python-Dev] PEP 291 versus Python 3
On 30/11/2010 06:33, Éric Araujo wrote: Good morning python-dev, PEP 291 (Backward Compatibility for Standard Library) does not seem to take Python 3 into account. Is this PEP only relevant for the 2.7 branch?* If it’s supposed to apply to 3.x too, despite the view that 3.0 was a clean break, what does it mean to have a module that is developed in the py3k branch and should retain compatibility with 2.3 or 1.5.2? PEP 291 is very old and should probably be retired. I don't think anyone is maintaining standard libraries in py3k that are also compatible with Python 2.anything. (At least not in a single codebase.) For Python 2.7 that may not be true, but for Python 3 I think we can start with a clean slate on compatibility. * Tarek’s interpretation: “The 2.x needs to stay 2.3 compatible so we should keep the 3.x as similar as possible for bugfixes.” In the particular case of distutils (should be compatible with 2.3), we (including I) have been lax. Our tests for example use modern unittest features like skips, which makes them not runnable on old Pythons. They can be run on old Pythons with unittest2. This is what distutils2 is doing. I am very uncomfortable with code that seems to run fine but which tests (however few) cannot be run, so I think I’ll have to trade the skips for old-style “return” statements. The other way of solving that is to change the compat policy. This is only an issue for distutils in Python 2.7 right? Maintaining the compat policy for that will be a short-lived pain, and distutils itself is getting only infrequent bugfixes *anyway*, right? I defer to Tarek on that particular decision. All the best, Michael If I remember correctly, the rationale for code compat in distutils is that people may copy distutils from Python x.y to their install of x.y-n; I don’t know if this is still an active practice, and if it is, I don’t know if it should be supported, considering that distutils2 (compatible with 2.4+ and available from PyPI) is coming. 
Regards
Re: [Python-Dev] Python and the Unicode Character Database
haiyang kang wrote: hi, I agree with this. I never seen any man in China using chinese number literals (at least two kinds:一, 壹, same meaning with 1) in Python program, except UI output. They can do some mappings when want to output these non-ascii numbers. Example: if 1: print "一" I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 I don't expect that anyone would sensibly write code like that, except for testing. You wouldn't write num = float("1.1") instead of just num = 1.1 either. But you should be able to write: text = input("Enter a number using your preferred digits: ") num = float(text) without caring whether the user enters 一.一 or 1.1 or something else. -- Steven ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On Monday 29 November 2010 20:22:22 Brett Cannon wrote:
>
> Considering these semantics changed between Python 2 and 3 w/o a
> discernable benefit (I would consider it a negative as finding a
> module should not be impacted by syntactic correctness; the full act
> of importing should be the only thing that cares about that), I would
> consider it a bug that should be filed.

ok, here it is: http://bugs.python.org/issue10588

Since I did not understand all of it, I just quoted Brett Cannon in the ticket.

--
Emile Anclin
http://www.logilab.fr/ http://www.logilab.org/
Scientific computing & knowledge management
Re: [Python-Dev] PEP 291 versus Python 3
On Tue, Nov 30, 2010 at 7:33 AM, Éric Araujo wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account. Is this PEP only relevant for the 2.7
> branch?* If it’s supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?
>
> * Tarek’s interpretation: “The 2.x needs to stay 2.3 compatible
> so we should keep the 3.x as similar as possible for bugfixes.”
>
> In the particular case of distutils (should be compatible with 2.3), we
> (including I) have been lax. Our tests for example use modern unittest
> features like skips, which makes them not runnable on old Pythons. I am
> very uncomfortable with code that seems to run fine but which tests
> (however few) cannot be run, so I think I’ll have to trade the skips for
> old-style “return” statements.

You shouldn't be uncomfortable with the current state of distutils, nor try to improve its tests (or any other nasty stuff you'll find in that code).

Distutils is dead code. All we have to do is the bare minimum maintenance. Everything else is a waste of time.

> The other way of solving that is to
> change the compat policy. If I remember correctly, the rationale for
> code compat in distutils is that people may copy distutils from Python
> x.y to their install of x.y-n; I don’t know if this is still an active
> practice, and if it is, I don’t know if it should be supported,
> considering that distutils2 (compatible with 2.4+ and available from
> PyPI) is coming.

Again, don't worry about these rules in Distutils now. The only rule that now applies to Distutils is that we do only bug fixing, and we should not waste our precious time doing other stuff in there. Plain python tests are fine for what we want to do and simplify our forward ports and backports.

One thing we should do though, is fix those bugs in Distutils2 first when they exist there too.

I really appreciate all the hard work you are doing in triaging the issues and bug fixing, by the way!

Tarek
Re: [Python-Dev] Python and the Unicode Character Database
hi, I agree with this. I never seen any man in China using chinese number literals (at least two kinds:一, 壹, same meaning with 1) in Python program, except UI output. They can do some mappings when want to output these non-ascii numbers. Example: if 1: print "一" I think it is a little ugly to have code like this: num = float("一.一"), expected result is: num = 1.1 br, khy On Tue, Nov 30, 2010 at 4:23 PM, Stephen J. Turnbull wrote: > Lennart Regebro writes: > > > *I* think it is more important. In python 3, you can never ever assume > > anything is ASCII any more. > > Sure you can. In Python program text, all keywords will be ASCII > (English, even, though it may be en_NL.UTF-8) for the forseeable > future. > > I see no reason not to make a similar promise for numeric literals. I > see no good reason to allow compatibility full-width Japanese "ASCII" > numerals or Arabic cursive numerals in "for i in range(...)" for > example. > > As soon as somebody gives an example of a culture, however minor, that > uses computers but actively prefers to use non-ASCII numerals to > express numbers in an IT context, I'll review my thinking. But at the > moment it's 101% YAGNI. > ___ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/cornsea%40gmail.com > ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On 29 November 14:21, Ron Adam wrote:
> On 11/29/2010 01:22 PM, Brett Cannon wrote:
> > Considering these semantics changed between Python 2 and 3 w/o a
> > discernible benefit (I would consider it a negative, as finding a
> > module should not be impacted by syntactic correctness; the full act
> > of importing should be the only thing that cares about that), I would
> > consider it a bug that should be filed.
>
> imp.find_module() returns an open file io object, and
> its output feeds directly into imp.load_module().
>
> >>> imp.find_module('pydoc')
> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>
> So I think imp.find_module() is supposed to be used when you *do*
> want to do the full act of importing, and not for just finding out if
> or where module xyz exists.

In Python 2, find_module was usable for that purpose, and this is a needed API for a tool like pylint. Is there another way to do so with Python 3?

-- 
Sylvain Thénault, LOGILAB, Paris (France)
Python, Debian, and Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org
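For readers on later Python versions: importlib.util.find_spec() was added in Python 3.4, after this thread, and can locate a module without opening or compiling its source. The sketch below assumes such a version; locate_module and the file name broken_mod_example.py are made up for illustration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def locate_module(name):
    """Return where a top-level module lives without importing it.

    Returns None if the module cannot be found. For dotted names,
    find_spec() imports the parent package first, so this sketch is
    only side-effect free for top-level names.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is a file path for source modules; it can be 'built-in'
    # or None for other kinds of modules.
    return spec.origin


# Demonstration: a module containing a syntax error can still be located,
# because nothing is compiled or executed while searching.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "broken_mod_example.py"), "w", encoding="utf-8") as f:
        f.write("def oops(:\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        found = locate_module("broken_mod_example")
    finally:
        sys.path.remove(tmp)

missing = locate_module("no_such_module_example_xyz")
```

A tool in pylint's position could use the returned path to read and analyse the file itself, and report the syntax error on its own terms.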
Re: [Python-Dev] Python and the Unicode Character Database
Lennart Regebro writes:

> *I* think it is more important. In python 3, you can never ever assume
> anything is ASCII any more.

Sure you can. In Python program text, all keywords will be ASCII (English, even, though it may be en_NL.UTF-8) for the foreseeable future.

I see no reason not to make a similar promise for numeric literals. I see no good reason to allow compatibility full-width Japanese "ASCII" numerals or Arabic cursive numerals in "for i in range(...)" for example.

As soon as somebody gives an example of a culture, however minor, that uses computers but actively prefers to use non-ASCII numerals to express numbers in an IT context, I'll review my thinking. But at the moment it's 101% YAGNI.
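The distinction between program text and the builtins can be checked directly. This is a small sketch, assuming a Python 3 interpreter; accepts_as_source is a made-up helper name, and the full-width digits are written as escapes so the example itself stays ASCII.

```python
def accepts_as_source(literal):
    """Return True if `literal` compiles as the right-hand side of an assignment."""
    try:
        compile("x = " + literal, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Full-width digits 1, 2, 3 (U+FF11..U+FF13).
FULLWIDTH_123 = "\uff11\uff12\uff13"

ascii_ok = accepts_as_source("123")
fullwidth_ok = accepts_as_source(FULLWIDTH_123)

# The int() builtin, by contrast, accepts any Unicode decimal digits.
converted = int(FULLWIDTH_123)
```

So numeric literals in source are ASCII-only, while int() and float() are more permissive about what they accept as a string argument.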
Re: [Python-Dev] Python and the Unicode Character Database
>> During PEP 3003 discussion, it was suggested to handle it on a case by
>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>> 3003.
>
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."

How is this restricted to the stdlib if it defines the set of valid identifiers?

- Hagen
Re: [Python-Dev] Python and the Unicode Character Database
On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky wrote:
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Why? I can see this is a problem if one character that was allowed earlier no longer is. That breaks backwards compatibility. This doesn't.

> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

*I* think it is more important. In Python 3, you can never ever assume anything is ASCII any more. ASCII is practically dead and buried as far as Python goes, unless you explicitly encode to it.

> def deposit(self, amountstr):
>     self.balance += float(amountstr)
>     audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?.??

That log reasonably should be in UTF-8 or something else, in which case this is not a problem. And that's ignoring that it makes way more sense to log the numerical amount.

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64
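The behaviour under discussion can be reproduced in a few lines. This is only a sketch: the amounts are invented, the Arabic-Indic digits are written as escapes to avoid bidirectional rendering problems, and the "log lines" are plain strings rather than a real audit log.

```python
# Arabic-Indic digits for 1234.56, with an ASCII decimal point.
amount = "\u0661\u0662\u0663\u0664.\u0665\u0666"
value = float(amount)  # float() accepts any Unicode decimal digits

# A smaller amount, 1.23, for the log comparison.
small = "\u0661.\u0662\u0663"

# Writing the raw input string to an ASCII-only log loses the digits.
ascii_line = ("Deposited: " + small).encode("ascii", "replace").decode("ascii")

# Writing it as UTF-8 round-trips without loss.
utf8_line = ("Deposited: " + small).encode("utf-8").decode("utf-8")

# Logging the parsed number sidesteps the encoding question entirely.
numeric_line = "Deposited: %.2f" % float(small)
```

Here ascii_line reproduces the "?.??" seen by the auditor, while utf8_line and numeric_line correspond to the two remedies suggested in the reply.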