Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 3:59 PM, Ron Adam  wrote:
> Yes, it's realising that it is a *lot* more *complicated*, that gets me.
> Flawed isn't the right word, it's rather a feeling things could have been
> simpler if perhaps some things were done differently.

*That* feeling I can understand. The import system has steadily
acquired features over time, with each addition constrained by
backwards compatibility concerns with all the past additions,
including the exotic hacks people were using to fake features that
were added more cleanly later.

For the directory-as-module-not-package idea, you could probably
implement a PEP 302 importer/loader that did that (independent of the
stdlib). It would have the advantage of avoiding a lot of the pickle
compatibility problems that a "flat package" like the new unittest
layout can cause. However, you would need to be very careful with it,
since all the files would be sharing a common globals() namespace.
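A rough sketch of what such an importer could look like with today's importlib (which grew out of PEP 302). The ".pymod" directory suffix and both class names are invented for illustration; nothing in the thread specifies them, and this is not an existing stdlib feature.

```python
import importlib.abc
import importlib.util
from pathlib import Path


class DirectoryModuleLoader(importlib.abc.Loader):
    """Execute every .py file in a directory inside ONE module namespace."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Sorted order keeps execution deterministic. All parts share
        # module.__dict__, so later files can see and clobber earlier names.
        for part in sorted(self.directory.glob("*.py")):
            source = part.read_text(encoding="utf-8")
            exec(compile(source, str(part), "exec"), module.__dict__)


class DirectoryModuleFinder(importlib.abc.MetaPathFinder):
    """Map 'name' to '<root>/name.pymod/' (the suffix is a made-up convention)."""

    SUFFIX = ".pymod"

    def __init__(self, root):
        self.root = Path(root)

    def find_spec(self, fullname, path=None, target=None):
        candidate = self.root / (fullname + self.SUFFIX)
        if not candidate.is_dir():
            return None  # let the other finders on sys.meta_path try
        loader = DirectoryModuleLoader(candidate)
        return importlib.util.spec_from_loader(
            fullname, loader, origin=str(candidate)
        )
```

Once an instance is placed on sys.meta_path, a plain "import name" yields a module with no __path__, so it is not a package. The shared namespace is exactly the hazard mentioned above: a name defined in one file silently replaces the same name from an earlier file.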

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Ron Adam



On 11/30/2010 07:19 PM, Nick Coghlan wrote:
> On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam  wrote:
>> * It almost seems like the concept of a sub-module (in a package) is flawed.
>>   I'm not sure I can explain what causes me to feel that way at the moment
>> though.
>
> It isn't flawed, it is just a *lot* more complicated than most people
> realise (cf. PEP 302).


Yes, it's realising that it is a *lot* more *complicated*, that gets me. 
Flawed isn't the right word, it's rather a feeling things could have been 
simpler if perhaps some things were done differently.


Here is the gist of ideas I got from these feelings.  (Food for thought and 
YMMV and all that.)


Python doesn't have a nice way to define a collection of modules that isn't 
also a package.  So we have packages used to organise modules, and packages 
inside other packages.  A collection of modules wouldn't require importing 
a package before importing a module in it.


Another idea is, to have a way to split a large module into files, and have 
it still *be* a module, and not a package.  And also be able to tell what 
is what, by looking at the directory structure.


The train of thought these things came from is, how can we get back to 
having the directory tree have enough info in it so it's clear what is 
what?  And how can we avoid some of the *interdependent* nesting?




> In this case, the signature of find_module (returning an already open
> file) is unfortunate, but probably necessary given the way the import
> internals currently work. As Brett says, returning a loader would be
> preferable, but the builtin import machinery doesn't have proper
> loaders defined (and won't until we manage to get to the point where
> importlib *is* the import machinery).


I'll be looking forward to the new loaders. :-)

Cheers,

Ron



Re: [Python-Dev] What is PyBuffer_SizeFromFormat?

2010-11-30 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 12:30 PM, INADA Naoki  wrote:
> PyBuffer_SizeFromFormat is documented and defined in abstract.h.
> But I can't find an implementation of the function.
> Do I overlook anything?

PEP 3118 describes what it is *meant* to do. Looks like it might be
yet another thing that was missed in the implementation of that PEP
though :P
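For readers unfamiliar with the idea: PEP 3118 format strings extend the struct module's syntax, so the Python-level analogue of "size implied by a format" is struct.calcsize. The helper name below is hypothetical, and this is only an illustration of the calculation, not the C API function under discussion; PEP 3118's extensions beyond struct syntax are not handled.

```python
import struct


def itemsize_from_format(fmt):
    """Return the size in bytes implied by a struct-style format string."""
    return struct.calcsize(fmt)


# The "<" prefix selects standard sizes, so results do not depend on the platform.
print(itemsize_from_format("<i"))   # 4: one 32-bit int
print(itemsize_from_format("<d"))   # 8: one double
print(itemsize_from_format("<3h"))  # 6: three 16-bit ints
```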

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] What is PyBuffer_SizeFromFormat?

2010-11-30 Thread INADA Naoki
PyBuffer_SizeFromFormat is documented and defined in abstract.h.
But I can't find an implementation of the function.
Do I overlook anything?

-- 
INADA Naoki  


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 8:48 AM, Ron Adam  wrote:
> * It almost seems like the concept of a sub-module (in a package) is flawed.
>  I'm not sure I can explain what causes me to feel that way at the moment
> though.

It isn't flawed, it is just a *lot* more complicated than most people
realise (cf. PEP 302).

In this case, the signature of find_module (returning an already open
file) is unfortunate, but probably necessary given the way the import
internals currently work. As Brett says, returning a loader would be
preferable, but the builtin import machinery doesn't have proper
loaders defined (and won't until we manage to get to the point where
importlib *is* the import machinery).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


[Python-Dev] I/O ABCs

2010-11-30 Thread Daniel Stutzbach
The documentation for the collections Abstract Base Classes (ABCs) [1]
contains a table listing all of the collections ABCs, their parent classes,
their abstract methods, and the methods they provide.  This table makes it
very easy to figure out which methods I must override when I derive from one
of the ABCs, as well as which methods will be provided for me.

I'm working on a similar table for the I/O ABCs (
http://bugs.python.org/issue10589).  The existing documentation [2]
describes the methods of each class but doesn't describe which methods
provide a meaningful implementation and which methods a user should
override.  If I want to inherit from one of the I/O ABCs, I have to go
poking into Lib/_pyio.py to figure out which methods I need to override.

While starting to examine the I/O ABCs, I discovered that there are
some inconsistencies.  For example, RawIOBase provides .read() if the
subclass provides .readinto().  BufferedIOBase does the opposite; it
provides .readinto() if the subclass provides .read() [3].
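The asymmetry can be seen with two toy subclasses. The class names and the data they serve are made up for this sketch; only the inherited read()/readinto() behaviour comes from the io module.

```python
import io


class CountingRaw(io.RawIOBase):
    """Raw stream yielding bytes 0, 1, 2, ...; implements ONLY readinto()."""

    def __init__(self, limit):
        super().__init__()
        self._next = 0
        self._limit = limit

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), self._limit - self._next)
        b[:n] = bytes(range(self._next, self._next + n))
        self._next += n
        return n


class FixedBuffered(io.BufferedIOBase):
    """Buffered stream over a bytes object; implements ONLY read()."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


# RawIOBase supplies read() on top of our readinto().
print(CountingRaw(10).read(4))          # b'\x00\x01\x02\x03'

# BufferedIOBase supplies readinto() on top of our read().
buf = bytearray(3)
print(FixedBuffered(b"hello").readinto(buf), bytes(buf))  # 3 b'hel'
```

In the second case the inherited readinto() has to call read(), get a fresh bytes object, and copy it into the caller's buffer, which is the extra allocation and copy that footnote [3] worries about.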

I would like to fix some of these inconsistencies.  However, this will be a
backwards-incompatible change.  A Google Code Search suggests that the ABCs
are currently only used within the standard library [4].  Just to be clear,
the changes would NOT impact code that merely uses I/O objects; they would
only impact code that implements I/O by subclassing one of the I/O ABCs and
depending on features that are currently undocumented.

Does anyone have any categorical objections?

[1]:
http://docs.python.org/py3k/library/collections.html#abcs-abstract-base-classes
 [2]: http://docs.python.org/py3k/library/io.html#class-hierarchy
[3]: Possibly hurting performance by forcing .readinto() to perform the
extra allocations, copies, and deallocations required by .read().
[4]:
http://www.google.com/codesearch?hl=en&sa=N&q=BufferedIOBase++lang:python&ct=rr&cs_r=lang:python

-- 
Daniel Stutzbach


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
"Martin v. Löwis"  writes:

> Am 30.11.2010 21:24, schrieb Ben Finney:
> > The string need not be a literal in the program; it can be input to
> > the program.
> > 
> > num = float(input_from_the_external_world)
> > 
> > Does that change your assessment of whether non-ASCII digits are
> > used?
>
> I think the OP (haiyang kang) already indicated that he finds it quite
> unlikely that anybody would possibly want to enter that.

Who's talking about *entering* it into the program at a keyboard
directly, though? Input to a program can come from all kinds of crazy
sources. Just because it wasn't typed by the person at the keyboard
using this program doesn't stop it being input to the program.

A concrete example, but certainly not the only possible case: non-ASCII
digit characters representing integers, stored as text in a file.

Note that I'm not saying this is common. Nor am I saying it's a
desirable situation. I'm saying it is a feasible use case, to be
dismissed only if there is strong evidence that it's not used by
existing Python code.
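To make the behaviour under discussion concrete, here is a small sketch of what int() and float() do with non-ASCII digits in Python 3. The sample strings and the helper name are chosen for illustration; the Han ideograph is the one from haiyang kang's example.

```python
ARABIC_INDIC_123 = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE
HAN_ONE = "\u4e00"                        # CJK ideograph for "one"


def try_float(text):
    """Return float(text), or None if Python rejects the string."""
    try:
        return float(text)
    except ValueError:
        return None


# Characters in the Unicode "decimal digit" category are accepted.
print(int(ARABIC_INDIC_123))                # 123
print(try_float(ARABIC_INDIC_123 + ".5"))   # 123.5

# The Han numeral is "numeric" but not a decimal digit, so it is rejected.
print(HAN_ONE.isdecimal(), HAN_ONE.isnumeric())  # False True
print(try_float(HAN_ONE + "." + HAN_ONE))        # None
```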

-- 
 \   “When a well-packaged web of lies has been sold to the masses |
  `\over generations, the truth will seem utterly preposterous and |
_o__)its speaker a raving lunatic.” —Dresden James |
Ben Finney



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy

On 11/30/2010 10:05 AM, Alexander Belopolsky wrote:

My general answers to the questions you have raised are as follows:

1. Each new feature release should use the latest version of the UCD as 
of the first beta release (or perhaps a week or so before). New chars 
are new features and the beta period can be used to (hopefully) iron out 
any bugs introduced by a new UCD version.


2. The language specification should not be UCD version specific. Martin 
pointed out that the definition of identifiers was intentionally written 
to not be, by referring to 'current version' or some such. On the other 
hand, the UCD version used should be programmatically discoverable, 
perhaps as an attribute of sys or str.


3. The UCD should not change in bugfix releases. New chars are new 
features. Adding them in bugfix releases will introduce gratuitous 
incompatibilities between releases. People who want the latest Unicode 
should either upgrade to the latest Python version or patch an older 
version (but not expect core support for any problems that creates).
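On discoverability: the unicodedata module does expose the UCD version it was built from, as unicodedata.unidata_version (on the module, not on sys or str as floated in point 2). A short sketch; the helper name is hypothetical, and the sample character is one that was added in Unicode 6.0.

```python
import unicodedata

# Version string of the Unicode Character Database compiled into this Python.
print(unicodedata.unidata_version)


def code_point_or_none(name):
    """Look a character up by name; None if this UCD version lacks it."""
    try:
        return ord(unicodedata.lookup(name))
    except KeyError:
        return None


# Resolves only when the bundled UCD is 6.0 or newer.
print(code_point_or_none("CAT FACE WITH WRY SMILE"))
```

Code that depends on newer characters can use this kind of check instead of assuming a particular Python release.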



> Given that 2.7 will be maintained for 5 years and arguably Unicode
> Consortium takes backward compatibility very seriously, wouldn't it
> make sense to consider a backport at some point?
>
> I am sure we will soon see a bug report that the following does not
> work in 2.7: :-)
>
> >>> ord('\N{CAT FACE WITH WRY SMILE}')
> 128572


3 (cont). 2.7 is no different in that regard. It is feature frozen just 
like all other x.y releases. And that is the answer to any such report. 
If that code became valid in 2.7.2, for instance, it would still not 
work in 2.7 and 2.7.1. Not working is not a bug; working is a new 
feature introduced after 2.7 was released.



>>> - How specific should library reference manual be in defining methods
>>> affected by UCD such as str.upper()?
>>
>> It should specify what this actually does in Unicode terminology
>> (probably in addition to a layman's rephrase of that)
>
> I opened an issue for this:
>
> http://bugs.python.org/issue10587


1,2 (cont). Good idea in general.


> I was more concerned about wide and narrow unicode CPython builds.  Is
> it a bug that   '\U'.isalpha() may disagree even when the two
> implementations are based on the same version of UCD?



4. While the difference between narrow/wide builds of (CPython) x.y 
(which should have one constant UCD) cannot be completely masked, I 
appreciate and generally agree with your efforts to minimize it. In 
some cases, there will be a conflict/tradeoff between eliminating this 
difference versus that.


--
Terry Jan Reedy



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 23:43, schrieb Terry Reedy:
> On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:
> 
>> I see no reason not to make a similar promise for numeric literals.  I
>> see no good reason to allow compatibility full-width Japanese "ASCII"
>> numerals or Arabic cursive numerals in "for i in range(...)" for
>> example.
> 
> I do not think that anyone, at least not me, has argued for anything
> other than 0-9 digits (or 0-f for hex) in literals in program code. The
> only issue is whether non-programmer *users* should be able to use their
> native digits in applications in response to input prompts.

And here, my observation stands: if they wanted to, they currently
couldn't - at least not for real numbers (and also not for integers
if they want to use grouping). So the presumed application of this
feature doesn't actually work, despite the presence of the feature it
was supposedly meant to enable.

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 21:24, schrieb Ben Finney:
> haiyang kang  writes:
> 
>>   I think it is a little ugly to have code like this: num =
>> float("一.一"), expected result is: num = 1.1
> 
> That's a straw man, though. The string need not be a literal in the
> program; it can be input to the program.
> 
> num = float(input_from_the_external_world)
> 
> Does that change your assessment of whether non-ASCII digits are used?

I think the OP (haiyang kang) already indicated that he finds it quite
unlikely that anybody would possibly want to enter that. You would need
a number of key strokes to enter each individual ideograph, plus you
have to press the keys for keyboard layout switching to enter the Latin
decimal separator (which you normally wouldn't use along with the Han
numerals).

Regards,
Martin


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Ron Adam



On 11/30/2010 01:41 PM, Brett Cannon wrote:
> On Mon, Nov 29, 2010 at 12:21, Ron Adam  wrote:
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>>> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault wrote:
>>>> On 25 novembre 11:22, Ron Adam wrote:
>>>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>>> hello,
>>>>>>
>>>>>> working on Pylint, we have a lot of voluntary corrupted files to test
>>>>>> Pylint behavior; for instance
>>>>>>
>>>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>>>> # -*- coding: IBO-8859-1 -*-
>>>>>> """ check correct unknown encoding declaration
>>>>>> """
>>>>>>
>>>>>> __revision__ = ''
>>>>>>
>>>>>> and we try to find that module :
>>>>>> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>>>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on our
>>>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>>>>>> names)
>>>>>>
>>>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>>>> [GCC 4.3.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> >>> from imp import find_module
>>>>>> >>> find_module('func_unknown_encoding', None)
>>>>>> Traceback (most recent call last):
>>>>>>   File "<stdin>", line 1, in <module>
>>>>>> SyntaxError: encoding problem: with BOM
>>>>>
>>>>> I don't think there is a clear reason by design.  Also try importing
>>>>> the same modules directly and noting the differences in the errors
>>>>> you get.
>>>>
>>>> IMO the point is that we can consider as a bug the fact that find_module
>>>> tries to somewhat read the content of the file, no? Though it seems to only
>>>> doing this for encoding detection or like since find_module doesn't choke on
>>>> a module containing another kind of syntax error.
>>>>
>>>> So the question is, should we deal with this in pylint/astng, or can we
>>>> expect this to be fixed at some point?
>>>
>>> Considering these semantics changed between Python 2 and 3 w/o a
>>> discernable benefit (I would consider it a negative as finding a
>>> module should not be impacted by syntactic correctness; the full act
>>> of importing should be the only thing that cares about that), I would
>>> consider it a bug that should be filed.
>>
>> The output of imp.find_module() returns an open file io object, and it's
>> output feeds directly into to imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think the imp.find_module() is suppose to be used when you *do* want to
>> do the full act of importing and not for just finding out if or where module
>> xyz exists.
>
> Going with your line of argument, why can't imp.load_module be the
> call that figures out there is a syntax error? If you look at this
> from the perspective of PEP 302, finding a module has absolutely
> nothing to do with the validity of the found source, just that
> something was found somewhere which (hopefully) contains code that
> represents the module.


The part that I'm looking at, is what would find_module return if the 
encoding is bad or not found for the encoding?


   <_io.TextIOWrapper name=4 encoding='bad_encoding'>


Maybe we could have some library introspection function in the inspect for 
just looking in the library rather than loading modules.  But I think those 
would have the same issues, as packages need to be loaded in order to find 
sub modules.*


* It almost seems like the concept of a sub-module (in a package) is 
flawed.  I'm not sure I can explain what causes me to feel that way at the 
moment though.


Ron



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Terry Reedy

On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:

> I see no reason not to make a similar promise for numeric literals.  I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.


I do not think that anyone, at least not me, has argued for anything 
other than 0-9 digits (or 0-f for hex) in literals in program code. The 
only issue is whether non-programmer *users* should be able to use their 
native digits in applications in response to input prompts.


--
Terry Jan Reedy



Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Barry Warsaw
On Nov 30, 2010, at 12:11 PM, Brett Cannon wrote:

>I will channel Neal: "I decline and/or do not want to respond". =)

PEP 291 updated.
-Barry




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Ben Finney
haiyang kang  writes:

>   I think it is a little ugly to have code like this: num =
> float("一.一"), expected result is: num = 1.1

That's a straw man, though. The string need not be a literal in the
program; it can be input to the program.

num = float(input_from_the_external_world)

Does that change your assessment of whether non-ASCII digits are used?

-- 
 \“The greatest tragedy in mankind's entire history may be the |
  `\   hijacking of morality by religion.” —Arthur C. Clarke, 1991 |
_o__)  |
Ben Finney



Re: [Python-Dev] ICU

2010-11-30 Thread Antoine Pitrou

Oh, about ICU:

> > Actually, I remember you saying that locale should ideally be replaced
> > with a wrapper around the ICU library.
> 
> By that, I stand - however, I have given up the hope that this will
> happen anytime soon.

Perhaps this could be made a GSOC topic.

Regards

Antoine.




Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Brett Cannon
On Tue, Nov 30, 2010 at 07:35, Barry Warsaw  wrote:
> On Nov 30, 2010, at 01:09 PM, Michael Foord wrote:
>
>>PEP 291 is very old and should probably be retired. I don't think anyone is
>>maintaining standard libraries in py3k that are also compatible with Python
>>2.anything. (At least not in a single codebase.)
>
> I agree.

Same here; I have purposefully ignored compatibility requirements
because I always found those promises to be extremely annoying and
somewhat painful to enforce.

>  I think we should change the status of PEP 291 to Final, and add a
> few words to make it clear it applies only to Python 2.  Since Neal owns the
> PEP, he should get first crack at doing the update, but I volunteer to make
> those changes if he declines (or does not respond).
>

I will channel Neal: "I decline and/or do not want to respond". =)

> We may eventually need a similar document for Python 3, but it should be a new
> PEP.

I hope not.


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:55 +0100, "Martin v. Löwis" a écrit :
> Wrt. to local number parsing, I think that the locale module would be
> way better than the nonsense that Python currently does. In the locale
> module, somebody at least has thought about what specifically
> constitutes a number. The current not-ASCII-but-not-local-either
> approach is just useless.

It depends what you need. If you parse integers it's probably good
enough. And it's better to have a trustable standard (unicode) than a
myriad of ad-hoc, possibly buggy or incomplete, often unavailable,
cultural specifications drafted by OS vendors who have no business (and
no expertise) in drafting them.

At least you can build more sophisticated routines on the simple
information given to you by the unicode database. You cannot build
anything solid on the C locale functions (and even then you are limited
by various issues inherent in the locale semantics, such as the fact
that it relies on process-wide state, which would only be ok, at best,
for single-user applications). There's a reason that e.g. Babel (*)
reimplements locale-like functionality from scratch.

(*) http://pypi.python.org/pypi/Babel/
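As an illustration of building on the Unicode database rather than on the C locale, here is a minimal number parser that uses unicodedata.decimal() for the digits. The separator sets and the function name are assumptions picked for the example, not anything specified in this thread, and a real library such as Babel handles far more cases.

```python
import unicodedata

# Assumed separator sets, for illustration only.
GROUPING = {",", "\u066c"}        # comma, ARABIC THOUSANDS SEPARATOR
DECIMAL_POINT = {".", "\u066b"}   # full stop, ARABIC DECIMAL SEPARATOR


def parse_number(text):
    """Parse a number written with any script's decimal digits."""
    ascii_chars = []
    for ch in text.strip():
        if ch in GROUPING:
            continue                      # drop grouping marks
        if ch in DECIMAL_POINT:
            ascii_chars.append(".")
        elif ch in "+-":
            ascii_chars.append(ch)
        else:
            value = unicodedata.decimal(ch, None)
            if value is None:
                raise ValueError("not a decimal digit: %r" % ch)
            ascii_chars.append(str(value))
    return float("".join(ascii_chars))


# "1,234.5" written with Arabic-Indic digits and Arabic separators.
print(parse_number("\u0661\u066c\u0662\u0663\u0664\u066b\u0665"))  # 1234.5
```

No process-wide state is involved, so the same function can serve requests for different users in one process, which is the limitation of the locale approach noted above.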

Regards

Antoine.




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
> Because we all know how locale is a pile of cr*p, both in specification
> and in implementations. Our unit tests for it are a clear proof of that.

I wouldn't use expletives, but rather claim that the locale module is
highly platform-dependent.

> Actually, I remember you saying that locale should ideally be replaced
> with a wrapper around the ICU library.

By that, I stand - however, I have given up the hope that this will
happen anytime soon.

Wrt. to local number parsing, I think that the locale module would be
way better than the nonsense that Python currently does. In the locale
module, somebody at least has thought about what specifically
constitutes a number. The current not-ASCII-but-not-local-either
approach is just useless.

Maintaining a reasonable implementation is a burden, so deferring
to the C library is more attractive than having to maintain an
unreasonable implementation.

Regards,
Martin


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Brett Cannon
On Tue, Nov 30, 2010 at 00:34, Sylvain Thénault
 wrote:
> On 29 novembre 14:21, Ron Adam wrote:
>> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>> >Considering these semantics changed between Python 2 and 3 w/o a
>> >discernable benefit (I would consider it a negative as finding a
>> >module should not be impacted by syntactic correctness; the full act
>> >of importing should be the only thing that cares about that), I would
>> >consider it a bug that should be filed.
>>
>> The output of imp.find_module() returns an open file io object, and
>> it's output feeds directly into to imp.load_module().
>>
>> >>> imp.find_module('pydoc')
>> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
>> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>>
>> So I think the imp.find_module() is suppose to be used when you *do*
>> want to do the full act of importing and not for just finding out if
>> or where module xyz exists.
>
> in python 2, find_module was usable for such usage, and this is a needed api
> for a tool like pylint. Is there another way to do so with python 3?

At the moment, no. Best option would be to create an
importlib.find_module function which returns a loader if the module is
found, else returns None. The loader can have its get_source method
called to read the source code (w/o verification). I have this planned
for Python 3.3 but not 3.2 with us so close to 3.2b1.
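For readers on current Python versions: the API that eventually filled this role is spec-based rather than the importlib.find_module sketched here. The following is a hedged illustration using importlib.machinery.PathFinder.find_spec; the helper names and the temporary file are invented for the demo, and the file content mirrors the Pylint test case from earlier in the thread.

```python
import importlib.machinery
import tempfile
from pathlib import Path


def locate_module(name, search_dirs):
    """Return the import spec for a top-level module, or None if not found.

    Only the file system is consulted; the source is neither decoded
    nor compiled, so a bad encoding declaration cannot raise here.
    """
    return importlib.machinery.PathFinder.find_spec(
        name, [str(d) for d in search_dirs]
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        bad = Path(tmp, "func_unknown_encoding.py")
        bad.write_bytes(b'# -*- coding: IBO-8859-1 -*-\n__revision__ = ""\n')
        spec = locate_module("func_unknown_encoding", [tmp])
        # get_data() returns raw bytes without trying to decode them.
        raw = spec.loader.get_data(spec.origin)
        return Path(spec.origin).name, raw.splitlines()[0]


print(demo())
```

A tool like Pylint can then decide for itself how to handle the undecodable source, since locating the file and validating it are separate steps.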



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:40 +0100, "Martin v. Löwis" a écrit :
> Am 30.11.2010 20:23, schrieb Antoine Pitrou:
> > Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit :
> >>> Would moving this functionality to the locale module make the issues any
> >>> easier to fix?
> >>
> >> You could delegate it to the C library, so: yes.
> > 
> > I hope you don't suggest delegating it to the C locale functions.
> > Do you?
> 
> Yes, I do. Why do you hope I don't?

Because we all know how locale is a pile of cr*p, both in specification
and in implementations. Our unit tests for it are a clear proof of that.

Actually, I remember you saying that locale should ideally be replaced
with a wrapper around the ICU library.

Regards

Antoine.




Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Brett Cannon
On Mon, Nov 29, 2010 at 12:21, Ron Adam  wrote:
>
>
> On 11/29/2010 01:22 PM, Brett Cannon wrote:
>>
>> On Mon, Nov 29, 2010 at 03:53, Sylvain Thénault
>>   wrote:
>>>
>>> On 25 novembre 11:22, Ron Adam wrote:
>>>> On 11/25/2010 08:30 AM, Emile Anclin wrote:
>>>>> hello,
>>>>>
>>>>> working on Pylint, we have a lot of voluntary corrupted files to test
>>>>> Pylint behavior; for instance
>>>>>
>>>>> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
>>>>> # -*- coding: IBO-8859-1 -*-
>>>>> """ check correct unknown encoding declaration
>>>>> """
>>>>>
>>>>> __revision__ = ''
>>>>>
>>>>> and we try to find that module :
>>>>> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
>>>>> in that case ; it didn't raise SyntaxError on python2 nor does so on our
>>>>> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
>>>>> names)
>>>>>
>>>>> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
>>>>> [GCC 4.3.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> >>> from imp import find_module
>>>>> >>> find_module('func_unknown_encoding', None)
>>>>> Traceback (most recent call last):
>>>>>   File "<stdin>", line 1, in <module>
>>>>> SyntaxError: encoding problem: with BOM
>>>>
>>>> I don't think there is a clear reason by design.  Also try importing
>>>> the same modules directly and noting the differences in the errors
>>>> you get.
>>>
>>> IMO the point is that we can consider as a bug the fact that find_module
>>> tries to somewhat read the content of the file, no? Though it seems to
>>> only
>>> doing this for encoding detection or like since find_module doesn't choke
>>> on
>>> a module containing another kind of syntax error.
>>>
>>> So the question is, should we deal with this in pylint/astng, or can we
>>> expect
>>> this to be fixed at some point?
>>
>> Considering these semantics changed between Python 2 and 3 w/o a
>> discernable benefit (I would consider it a negative as finding a
>> module should not be impacted by syntactic correctness; the full act
>> of importing should be the only thing that cares about that), I would
>> consider it a bug that should be filed.
>
> The output of imp.find_module() returns an open file io object, and it's
> output feeds directly into to imp.load_module().
>
> >>> imp.find_module('pydoc')
> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
>
> So I think the imp.find_module() is suppose to be used when you *do* want to
> do the full act of importing and not for just finding out if or where module
> xyz exists.

Going with your line of argument, why can't imp.load_module be the
call that figures out there is a syntax error? If you look at this
from the perspective of PEP 302, finding a module has absolutely
nothing to do with the validity of the found source, just that
something was found somewhere which (hopefully) contains code that
represents the module.


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
Am 30.11.2010 20:23, schrieb Antoine Pitrou:
> Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit :
>>> Would moving this functionality to the locale module make the issues any
>>> easier to fix?
>>
>> You could delegate it to the C library, so: yes.
> 
> I hope you don't suggest delegating it to the C locale functions.
> Do you?

Yes, I do. Why do you hope I don't?

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
Le mardi 30 novembre 2010 à 20:16 +0100, "Martin v. Löwis" a écrit :
> > Would moving this functionality to the locale module make the issues any
> > easier to fix?
> 
> You could delegate it to the C library, so: yes.

I hope you don't suggest delegating it to the C locale functions.
Do you?




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
> Would moving this functionality to the locale module make the issues any
> easier to fix?

You could delegate it to the C library, so: yes.

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Martin v. Löwis
On 30.11.2010 09:15, Hagen Fürstenau wrote:
>>> During PEP 3003 discussion, it was suggested to handle it on a case by
>>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>>> 3003.
>>
>> It's covered by "As the standard library is not directly tied to the
>> language definition it is not covered by this moratorium."
> 
> How is this restricted to the stdlib if it defines the set of valid
> identifiers?

The language does not change. The language specification says:

"Python 3.0 introduces additional characters from outside the ASCII range
(see PEP 3131). For these characters, the classification uses the
version of the Unicode Character Database as included in the unicodedata
module."

That remains unchanged. It was a deliberate design decision of PEP 3131
to not codify a fixed set of characters that can be used in identifiers.
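
As a rough illustration of that design decision, the sketch below asks the
running interpreter which UCD version it carries and which candidate names it
accepts as identifiers. The candidate strings are my own examples, not taken
from the thread; the answers for code points added in newer Unicode versions
may differ between interpreters built against different UCD versions.

```python
import unicodedata

# The UCD version this interpreter was built with; it differs between releases.
print("UCD version:", unicodedata.unidata_version)

# Example names chosen for illustration only.
candidates = ["x", "größe", "変数", "1abc", "a-b"]

# str.isidentifier() applies the PEP 3131 rules using the bundled database.
results = {name: name.isidentifier() for name in candidates}

for name, ok in results.items():
    print(repr(name), "valid identifier" if ok else "not an identifier")
```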

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 1:29 PM, Antoine Pitrou  wrote:
..
>> I am not sure this belongs to the locale module, however.  It seems to
>> me, something like 'unicodealgo' for unicode algorithms would be more
>> appropriate.
>
> It could simply be in unicodedata if you split the implementation into a
> core C part and some Python bits.
>

Splitting unicodedata may not be a bad idea.  There are many more
pieces in UCD than covered by unicodedata. [1]  Hardcoding them all
into unicodedata module is hard to justify, but some are quite useful.
 For example, PropertyValueAliases.txt is quite useful for those like
myself who cannot remember what Pd or Zl category names stand for.
SpecialCasing.txt is required for proper casing, but is not currently
included in Python.  I would not want to change str.upper or str.title
because of this, but providing the raw info to someone who wants to
implement proper case mappings may not be a bad idea.  Blocks.txt is
certainly useful for any language-dependent processing.

On the other hand, I think we should keep Unicode data and Unicode
algorithms separate.  And the latter may not even belong to the Python
stdlib.

[1] http://unicode.org/Public/UNIDATA/
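
For readers who, like me, keep forgetting the two-letter category codes, a
small sketch of what such a lookup could look like. The CATEGORY_ALIASES table
is a hand-copied subset of the long names, not something unicodedata provides,
and category_name is a hypothetical helper.

```python
import unicodedata

# Hand-copied subset of the General_Category long names; not the full table.
CATEGORY_ALIASES = {
    "Lu": "Uppercase_Letter",
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Zl": "Line_Separator",
}


def category_name(ch):
    """Return the long category name if known, else the two-letter code."""
    code = unicodedata.category(ch)
    return CATEGORY_ALIASES.get(code, code)


for ch in ["A", "٣", "-", "\u2028", "a"]:
    print(repr(ch), unicodedata.category(ch), category_name(ch))
```

A fuller version would presumably load the aliases from the UCD file itself
instead of hardcoding them.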


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou

> Sure, if we code it in Python, supporting it will be much easier:
> 
> import re
> import unicodedata
> 
> def normalize_digits(s):
>     # Map every distinct decimal digit found in s to its ASCII equivalent.
>     digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
>     trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
>     return s.translate(trtab)
> 
> >>> normalize_digits('١٢٣٤.٥٦')
> '1234.56'
> 
> I am not sure this belongs to the locale module, however.  It seems to
> me, something like 'unicodealgo' for unicode algorithms would be more
> appropriate.

It could simply be in unicodedata if you split the implementation into a
core C part and some Python bits.

Regards

Antoine.




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 12:40 PM, Michael Foord
 wrote:
..
>> If you think non-ASCII digits are not difficult to support, please
>> contribute to the following tracker issues:
>>
>
> Would moving this functionality to the locale module make the issues any
> easier to fix?
>

Sure, if we code it in Python, supporting it will be much easier:

import re
import unicodedata

def normalize_digits(s):
    # Map every distinct decimal digit found in s to its ASCII equivalent.
    digits = {m.group(1) for m in re.finditer(r'(\d)', s)}
    trtab = {ord(d): str(unicodedata.digit(d)) for d in digits}
    return s.translate(trtab)

>>> normalize_digits('١٢٣٤.٥٦')
'1234.56'

I am not sure this belongs to the locale module, however.  It seems to
me, something like 'unicodealgo' for unicode algorithms would be more
appropriate.


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Michael Foord

On 30/11/2010 16:40, Alexander Belopolsky wrote:

[snip...]
> And of course,
>
> >>> unicodedata.digit('\U0001D7CE')
> 0
>
> but
>
> >>> int('\U0001D7CE')
> ..
> UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..
>
> on a narrow Unicode build.  (Note the character reported in the error message!)
>
> If you think non-ASCII digits are not difficult to support, please
> contribute to the following tracker issues:

Would moving this functionality to the locale module make the issues any
easier to fix?

Michael

> http://bugs.python.org/issue10581
> (Review and document string format accepted in numeric data type constructors)
>
> http://bugs.python.org/issue10557
> (Malformed error message from float())
>
> http://bugs.python.org/issue10435
> (Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)
>
> http://bugs.python.org/issue8646
> (PyUnicode_EncodeDecimal is undocumented)
>
> http://bugs.python.org/issue6632
> (Include more fullwidth chars in the decimal codec)
>
> and back to the issue of user confusion
>
> http://bugs.python.org/issue652104 [closed/invalid]
> (int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)



--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 2:38 PM, Alexander Belopolsky
 wrote:
..
>> Still, if it's not detrimental and it's not difficult to support,
>> then why do you care?
>
> It is difficult to support.  A fix for issue10557 would be much
> simpler if we did not support non-European digits.  I now added a
> patch that handles non-ascii digits, so you can see what's involved.
> Note that when Unicode Consortium inevitably adds more Nd characters
> to the non-BMP planes, we will have to add surrogate pairs' support to
> this code.
>

It turns out that this did in fact happen:

# Newly assigned in Unicode 3.1.0 (March, 2001)
..
1D7CE..1D7FF  ; 3.1 #  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE

See http://unicode.org/Public/UNIDATA/DerivedAge.txt

And of course,

>>> unicodedata.digit('\U0001D7CE')
0

but

>>> int('\U0001D7CE')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\ud835' ..

on a narrow Unicode build.  (Note the character reported in the error message!)
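
To spell out why the error message names '\ud835' rather than the character
that was actually passed in: a narrow build stores a non-BMP character as two
UTF-16 code units, and the codec trips over the first of them. The arithmetic
can be sketched as follows; utf16_surrogates is a hypothetical helper written
for this note, not part of any library.

```python
def utf16_surrogates(ch):
    """Return the (high, low) UTF-16 surrogate pair for a non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("character is inside the BMP; no surrogates needed")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = utf16_surrogates('\U0001D7CE')
print(hex(high), hex(low))
```

The high surrogate is the code unit that shows up in the traceback above.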


If you think non-ASCII digits are not difficult to support, please
contribute to the following tracker issues:

http://bugs.python.org/issue10581
(Review and document string format accepted in numeric data type constructors)

http://bugs.python.org/issue10557
(Malformed error message from float())

http://bugs.python.org/issue10435
(Document unicode C-API in reST - Specifically, PyUnicode_EncodeDecimal)

http://bugs.python.org/issue8646
(PyUnicode_EncodeDecimal is undocumented)

http://bugs.python.org/issue6632
(Include more fullwidth chars in the decimal codec)

and back to the issue of user confusion

http://bugs.python.org/issue652104 [closed/invalid]
(int(u"\u1234") raises UnicodeEncodeError by Guido van Rossum)


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stefan Krah
Alexander Belopolsky  wrote:
> On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang  wrote:
> >> But you should be able to write:
> >>
> >> text = input("Enter a number using your preferred digits: ")
> >> num = float(text)
> >>
> >> without caring whether the user enters 一.一 or 1.1 or something else.
> >
> > yes. from logical point of view, this can happen. ...
> 
> Please stop discussing a non-feature.  Python's float *does not*
> accept ' 一.一'.  This was reported as a bug and closed as invalid.

That seems irrelevant to me. One of the main topics of this thread is
whether actual native speakers would be happy with ascii-only input for
float().

haiyang kang confirmed that this is the case. I hope that more
native speakers will contribute their views.


Stefan Krah




Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Barry Warsaw
On Nov 30, 2010, at 01:09 PM, Michael Foord wrote:

>PEP 291 is very old and should probably be retired. I don't think anyone is
>maintaining standard libraries in py3k that are also compatible with Python
>2.anything. (At least not in a single codebase.)

I agree.  I think we should change the status of PEP 291 to Final, and add a
few words to make it clear it applies only to Python 2.  Since Neal owns the
PEP, he should get first crack at doing the update, but I volunteer to make
those changes if he declines (or does not respond).

We may eventually need a similar document for Python 3, but it should be a new
PEP.

-Barry




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 9:56 AM, haiyang kang  wrote:
>> But you should be able to write:
>>
>> text = input("Enter a number using your preferred digits: ")
>> num = float(text)
>>
>> without caring whether the user enters 一.一 or 1.1 or something else.
>
> yes. from logical point of view, this can happen. ...

Please stop discussing a non-feature.  Python's float *does not*
accept ' 一.一'.  This was reported as a bug and closed as invalid.

See "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Mon, Nov 29, 2010 at 4:13 PM, "Martin v. Löwis"  wrote:
>> - Should Python documentation refer to the specific version of Unicode
>> that it supports?
>
> You mean, mention it somewhere? Sure (although it would be nice if the
> documentation generator would automatically extract it from the source,
> just as it extracts the Python version number).
>
> Of course, such mentioning should explain that this is specific to
> CPython, and not an aspect of Python-the-language.
>
>> Current documentation refers to old versions.  Should version be
>> updated or removed to imply the latest?
>
> What specific reference are you referring to?
>
I found two places: A reference to Unicode 3.0 (!) in the Data Model
section and a reference to 5.2.0 in unicodedata docs.

See http://mail.python.org/pipermail/docs/2010-November/002074.html

>> - How UCD updates should be handled during the language moratorium?
>
> It's clearly not affected.
>

This is not what Guido said last year:
"""
> One question:
>
> There are currently a number of patches waiting on the tracker for
> additional Unicode feature support and it's also likely that we'll
> want to upgrade to a more recent Unicode version within the
> next few years.
>
> How would such indirect changes be seen under the moratorium ?

That would fall under the Case-by-Case Exemptions section. "Within the
next few years" sounds like it might well wait until the moratorium is
ended though. :-)
"""

http://mail.python.org/pipermail/python-dev/2009-November/093666.html

I don't see it as a big deal, but technically speaking, because Unicode
6.0 changes the properties of two characters so that they become valid
in identifiers, the Python language definition is affected.  For
example, an alternative implementation based on 5.2.0 will not accept a
valid CPython program that uses one of these characters.

>> During PEP 3003 discussion, it was suggested to handle it on a case by
>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>> 3003.
>
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."
>

See above.  Also, it has been suggested that semantics of built-ins
cannot change.  (If that was so, it would put int('١٢٣٤') debate to
rest at least for the time being.:-)

>>  Should this upgrade be backported to 2.7?
>
> No, it's a new feature.
>
Given that 2.7 will be maintained for 5 years and arguably Unicode
Consortium takes backward compatibility very seriously, wouldn't it
make sense to consider a backport at some point?

I am sure we will soon see a bug report that the following does not
work in 2.7: :-)
>>> ord('\N{CAT FACE WITH WRY SMILE}')
128572
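
One way to see how much the answer depends on the database version, without
installing a second interpreter, is the frozen Unicode 3.2.0 snapshot that
unicodedata exposes as ucd_3_2_0. This is only a sketch of the idea; the
character is the one from the example above, and the comparison stands in for
"an implementation built on an older UCD".

```python
import unicodedata

ch = "\U0001F63C"  # CAT FACE WITH WRY SMILE, code point 128572

# General category according to the database bundled with this interpreter.
current = unicodedata.category(ch)

# The same question asked of the frozen Unicode 3.2.0 snapshot, where this
# code point had not been assigned yet.
legacy = unicodedata.ucd_3_2_0.category(ch)

print(ord(ch), current, legacy)
```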


>> - How specific should library reference manual be in defining methods
>> affected by UCD such as str.upper()?
>
> It should specify what this actually does in Unicode terminology
> (probably in addition to a layman's rephrase of that)
>

I opened an issue for this:

http://bugs.python.org/issue10587

>> .. For example, if '\U'.isalpha() returns true
>> in one implementation, can it return false in another?
>
> Implementations are free to use any version of the UCD.

I was more concerned about wide and narrow Unicode CPython builds.  Is
it a bug that '\U'.isalpha() may disagree even when the two
implementations are based on the same version of the UCD?


Thanks for your answers.


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters 一.一 or 1.1 or something else.

Yes, from a logical point of view, this can happen.

But I really doubt that there are users who would want to input numbers
like that. It would mean that they first use the Google Pinyin input
method to type 一, then switch to the English input method to type ".",
then switch back to Google Pinyin for the other 一; or maybe you mean
they type the whole of 一.一 with the Google Pinyin input method.

To input 1, users only need one keystroke, but to input 一 they need
three (y, i, SPACE).

Of course, users can also input something by accident, but then we just
need to give them a friendly reminder.

At least the coders around me restrict their users to entering numbers
in ASCII, and the users still seem happy with ASCII numbers :).
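
A minimal sketch of that kind of reminder, assuming a Python recent enough to
have str.isascii(). The function name parse_ascii_number and the wording of
the message are hypothetical.

```python
def parse_ascii_number(text):
    """Parse user input as a float, accepting ASCII input only."""
    cleaned = text.strip()
    if not cleaned.isascii():
        # Reject early with a friendly hint instead of a codec error.
        raise ValueError("please enter the number using the digits 0-9")
    return float(cleaned)


for sample in [" 1.1 ", "一.一"]:
    try:
        print(repr(sample), "->", parse_ascii_number(sample))
    except ValueError as exc:
        print(repr(sample), "-> rejected:", exc)
```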

br,
khy


Re: [Python-Dev] Module size

2010-11-30 Thread Tim Lesher
On Tue, Nov 30, 2010 at 09:41, Antoine Pitrou  wrote:
> That said, I don't think the size is very important. For any non-trivial
> Python application, the size of unicodedata will be negligible compared
> to the size of Python objects.

That depends very much on the platform and the application.  For our
embedded use of Python, static data size (like the text segment of a
shared object) is far dearer than the heap space used by Python
objects, which is why we've had to excise both the UCD and the CJK
codecs in our builds.
-- 
Tim Lesher 


Re: [Python-Dev] Module size

2010-11-30 Thread Antoine Pitrou
On Tuesday, 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote:
> On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou  wrote:
> > On Mon, 29 Nov 2010 22:46:33 -0500
> > Alexander Belopolsky  wrote:
> >>
> >> In practical terms, UCD comes at a price.  The unicodedata module size
> >> is over 700K on my machine.  This is almost half the size of the
> >> python executable and by far the largest extension module. (only CJK
> >> encodings come close.)  Making builtins depend on the largest
> >> extension module for operation does not strike me as sound design.
> >
> > Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> > only on Objects/unicodectype.c.
> 
> My mistake. That was a late night post.  I wonder why unicodedata.so
> is so big then.
> 
> It must be character names:
> 
> $ python -v
> >>> '\N{DIGIT ONE}'
> dlopen("/.../unicodedata.so", 2);
> import unicodedata # dynamically loaded from /.../unicodedata.so
> '1'

From a quick peek using hexdump, character names seem to only account
for 1/4 of the module size.
That said, I don't think the size is very important. For any non-trivial
Python application, the size of unicodedata will be negligible compared
to the size of Python objects.

Regards

Antoine.




Re: [Python-Dev] Module size

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou  wrote:
> On Mon, 29 Nov 2010 22:46:33 -0500
> Alexander Belopolsky  wrote:
>>
>> In practical terms, UCD comes at a price.  The unicodedata module size
>> is over 700K on my machine.  This is almost half the size of the
>> python executable and by far the largest extension module. (only CJK
>> encodings come close.)  Making builtins depend on the largest
>> extension module for operation does not strike me as sound design.
>
> Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
> only on Objects/unicodectype.c.

My mistake. That was a late night post.  I wonder why unicodedata.so
is so big then.

It must be character names:

$ python -v
>>> '\N{DIGIT ONE}'
dlopen("/.../unicodedata.so", 2);
import unicodedata # dynamically loaded from /.../unicodedata.so
'1'
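
For anyone who wants to check the size on their own machine, a rough sketch.
It assumes unicodedata is built as a shared extension; on builds where it is
compiled into the interpreter there is no separate file to measure, so the
hypothetical helper module_size returns None in that case.

```python
import os
import unicodedata


def module_size(module):
    """Return the on-disk size of a module's file in bytes, or None if it has none."""
    path = getattr(module, "__file__", None)
    if path is None or not os.path.isfile(path):
        return None  # statically linked or otherwise not a separate file
    return os.path.getsize(path)


size = module_size(unicodedata)
if size is None:
    print("unicodedata: no separate file (built in)")
else:
    print(f"unicodedata: {size / 1024:.0f} KiB")
```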


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano  wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters 一.一 or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current
behavior do it without checking what it actually is:

>>> float('一.一')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature.  It does not fit
users' expectations.  Even the original author of the decimal "codec"
expected the above to work. [1]

> Python can already do this, and has been able to for many years:
> >>> int(u'٣')
> 3

but you can do this without support from int() as well:

>>> import unicodedata
>>> unicodedata.digit('٣')
3

and for Unihan numbers, you can do
>>> unicodedata.numeric('一')
1.0

and

>>> unicodedata.numeric('ⅷ')
8.0

and if you are so inclined,

>>> [unicodedata.numeric(c) for c in "ↂ ↁ ⅗ ⅞ 𐄳".split()]
[1.0, 5000.0, 0.6, 0.875, 9.0]

Do you want to see all these supported by float()?

[1] "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575
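
The three lookups used above answer different questions, which is part of why
"just support everything in float()" is less obvious than it sounds. A small
sketch that puts them side by side; numeric_properties is a hypothetical
helper, and the superscript two is my own extra example.

```python
import unicodedata


def numeric_properties(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


for ch in ["٣", "²", "ⅷ", "一"]:
    print(repr(ch), numeric_properties(ch))
```

Only characters with a decimal value are the ones int() and float() could
plausibly treat as positional digits; the rest have a numeric value but no
place in decimal notation.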


[Python-Dev] Module size

2010-11-30 Thread Antoine Pitrou
On Mon, 29 Nov 2010 22:46:33 -0500
Alexander Belopolsky  wrote:
> 
> In practical terms, UCD comes at a price.  The unicodedata module size
> is over 700K on my machine.  This is almost half the size of the
> python executable and by far the largest extension module. (only CJK
> encodings come close.)  Making builtins depend on the largest
> extension module for operation does not strike me as sound design.

Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend
only on Objects/unicodectype.c.

$ size Objects/unicode*.o
   text    data     bss     dec     hex filename
  60398       0       0   60398    ebee Objects/unicodectype.o
 130440   13559    2208  146207   23b1f Objects/unicodeobject.o


Antoine.




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Antoine Pitrou
On Wed, 01 Dec 2010 00:23:22 +1100
Steven D'Aprano  wrote:
> 
> But I think there is a good case for allowing the constructors int, 
> float and complex to continue to accept numeric *strings* with non-ASCII 
>   digits. The code already exists, there's probably people out there who 
> rely on it, and in the absence of any convincing demonstration that the 
> existing behaviour is causing widespread difficulty, we should leave 
> well-enough alone.

+1

> It seems to me that there's no need to move this functionality into locale.

Not only that, but moving it into locale won't make it easier to maintain
anyway.

Regards

Antoine.




Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano

Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
>  > *I* think it is more important. In python 3, you can never ever assume
>  > anything is ASCII any more.
>
> Sure you can.  In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8) for the foreseeable
> future.
>
> I see no reason not to make a similar promise for numeric literals.  I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.

I agree with you that numeric *literals* should be restricted to the 
ASCII digits. I don't think anyone here is arguing differently -- if 
they are, they should speak up and try to make the case for allowing 
numeric literals in arbitrary scripts. Python doesn't currently allow 
non-ASCII numeric literals, and even if such a change were desirable, it 
would run up against the moratorium. So let's just forget the specter of 
code like:


x = math.sqrt(١٢٣٤.٥٦ ** 一.一)

It ain't gonna happen :)


But I think there is a good case for allowing the constructors int, 
float and complex to continue to accept numeric *strings* with non-ASCII 
 digits. The code already exists, there's probably people out there who 
rely on it, and in the absence of any convincing demonstration that the 
existing behaviour is causing widespread difficulty, we should leave 
well-enough alone.


Various people have suggested that there should be a function in the 
locale module that handles numeric string input in non-ASCII digits. 
This is a de facto admission that there are use-cases for taking user 
input like the string '٣' and turning it into the int 3. Python can 
already do this, and has been able to for many years:


[st...@sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'٣')
3

It seems to me that there's no need to move this functionality into locale.


--
Steven



Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Michael Foord

On 30/11/2010 06:33, Éric Araujo wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account.  Is this PEP only relevant for the 2.7
> branch?*  If it’s supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?

PEP 291 is very old and should probably be retired. I don't think anyone
is maintaining standard libraries in py3k that are also compatible with
Python 2.anything. (At least not in a single codebase.)

For Python 2.7 that may not be true, but for Python 3 I think we can
start with a clean slate on compatibility.

> * Tarek’s interpretation: “The 2.x needs to stay 2.3 compatible
>    so we should keep the 3.x as similar as possible for bugfixes.”
>
> In the particular case of distutils (should be compatible with 2.3), we
> (including I) have been lax.  Our tests for example use modern unittest
> features like skips, which makes them not runnable on old Pythons.

They can be run on old Pythons with unittest2. This is what distutils2
is doing.

> I am
> very uncomfortable with code that seems to run fine but which tests
> (however few) cannot be run, so I think I’ll have to trade the skips for
> old-style “return” statements.  The other way of solving that is to
> change the compat policy.

This is only an issue for distutils in Python 2.7 right? Maintaining the
compat policy for that will be a short-lived pain, and distutils itself
is getting only infrequent bugfixes *anyway*, right? I defer to Tarek on
that particular decision.

All the best,

Michael

> If I remember correctly, the rationale for
> code compat in distutils is that people may copy distutils from Python
> x.y to their install of x.y-n; I don’t know if this is still an active
> practice, and if it is, I don’t know if it should be supported,
> considering that distutils2 (compatible with 2.4+ and available from
> PyPI) is coming.
>
> Regards



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Steven D'Aprano

haiyang kang wrote:
> hi,
>
>   I agree with this.
>
>   I never seen any man in China using chinese number literals (at
> least two kinds:一, 壹, same meaning with 1)
>   in Python program, except UI output.
>
>   They can do some mappings when want to output these non-ascii numbers.
>   Example: if 1: print "一"
>
>   I think it is a little ugly to have code like this: num =
> float("一.一"), expected result is: num = 1.1


I don't expect that anyone would sensibly write code like that, except 
for testing. You wouldn't write num = float("1.1") instead of just

num = 1.1 either.

But you should be able to write:

text = input("Enter a number using your preferred digits: ")
num = float(text)

without caring whether the user enters 一.一 or 1.1 or something else.


--
Steven


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Emile Anclin
On Monday 29 November 2010 20:22:22 Brett Cannon wrote:
> 
> Considering these semantics changed between Python 2 and 3 w/o a
> discernable benefit (I would consider it a negative as finding a
> module should not be impacted by syntactic correctness; the full act
> of importing should be the only thing that cares about that), I would
> consider it a bug that should be filed.

ok, here it is :

http://bugs.python.org/issue10588

Since I did not understand all of it, I just quoted Brett Cannon
in the ticket.

-- 

Emile Anclin 
http://www.logilab.fr/   http://www.logilab.org/ 
Informatique scientifique & et gestion de connaissances


Re: [Python-Dev] PEP 291 versus Python 3

2010-11-30 Thread Tarek Ziadé
On Tue, Nov 30, 2010 at 7:33 AM, Éric Araujo  wrote:
> Good morning python-dev,
>
> PEP 291 (Backward Compatibility for Standard Library) does not seem to
> take Python 3 into account.  Is this PEP only relevant for the 2.7
> branch?*  If it’s supposed to apply to 3.x too, despite the view that
> 3.0 was a clean break, what does it mean to have a module that is
> developed in the py3k branch and should retain compatibility with 2.3 or
> 1.5.2?
>
> * Tarek’s interpretation: “The 2.x needs to stay 2.3 compatible
>  so we should keep the 3.x as similar as possible for bugfixes.”
>
> In the particular case of distutils (should be compatible with 2.3), we
> (including I) have been lax.  Our tests for example use modern unittest
> features like skips, which makes them not runnable on old Pythons.  I am
> very uncomfortable with code that seems to run fine but which tests
> (however few) cannot be run, so I think I’ll have to trade the skips for
> old-style “return” statements.

You shouldn't be uncomfortable with the current state of distutils, and
you shouldn't try to improve its tests (or any other nasty stuff you'll
find in that code).

Distutils is dead code. All we have to do is the bare minimum
maintenance. Everything else is a waste of time.

> The other way of solving that is to
> change the compat policy.  If I remember correctly, the rationale for
> code compat in distutils is that people may copy distutils from Python
> x.y to their install of x.y-n; I don’t know if this is still an active
> practice, and if it is, I don’t know if it should be supported,
> considering that distutils2 (compatible with 2.4+ and available from
> PyPI) is coming.

Again, don't worry about these rules in Distutils now. The only rule
that now applies to Distutils is that we do only bug fixing, and we
should not waste our precious time doing other stuff in there. Plain
Python tests are fine for what we want to do, and they simplify our
forward ports and backports.  One thing we should do, though, is fix
those bugs in Distutils2 first when they exist there too.

I really appreciate all the hard work you are doing in triaging the
issues and bug fixing, by the way!

Tarek


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread haiyang kang
hi,

  I agree with this.

  I have never seen anyone in China use Chinese number literals (there
are at least two kinds: 一 and 壹, both meaning 1)
  in a Python program, except for UI output.

  They can do a mapping when they want to output these non-ASCII numbers.
  Example: if 1: print "一"
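
  A minimal sketch of such an output mapping, assuming a plain digit-by-digit transliteration is enough. The names CJK_DIGITS and to_cjk_digits are invented for this example, and the result is not a proper Chinese numeral reading (no 十 or 百):

```python
# Hypothetical translation table: ASCII digits -> CJK numerals, for display only.
CJK_DIGITS = str.maketrans("0123456789", "〇一二三四五六七八九")


def to_cjk_digits(number):
    """Return str(number) with each ASCII digit replaced by a CJK numeral."""
    return str(number).translate(CJK_DIGITS)


print(to_cjk_digits(1.1))
print(to_cjk_digits(2010))
```

  The program keeps working with ordinary int and float values; only the text shown to the user is converted.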

  I think it is a little ugly to have code like this: num =
float("一.一"), expected result is: num = 1.1

br,
khy

On Tue, Nov 30, 2010 at 4:23 PM, Stephen J. Turnbull  wrote:
> Lennart Regebro writes:
>
>  > *I* think it is more important. In python 3, you can never ever assume
>  > anything is ASCII any more.
>
> Sure you can.  In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8) for the foreseeable
> future.
>
> I see no reason not to make a similar promise for numeric literals.  I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.
>
> As soon as somebody gives an example of a culture, however minor, that
> uses computers but actively prefers to use non-ASCII numerals to
> express numbers in an IT context, I'll review my thinking.  But at the
> moment it's 101% YAGNI.


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-30 Thread Sylvain Thénault
On 29 novembre 14:21, Ron Adam wrote:
> On 11/29/2010 01:22 PM, Brett Cannon wrote:
> >Considering these semantics changed between Python 2 and 3 w/o a
> >discernable benefit (I would consider it a negative as finding a
> >module should not be impacted by syntactic correctness; the full act
> >of importing should be the only thing that cares about that), I would
> >consider it a bug that should be filed.
> 
> imp.find_module() returns an open file io object, and
> its output feeds directly into imp.load_module().
> 
> >>> imp.find_module('pydoc')
> (<_io.TextIOWrapper name=4 encoding='utf-8'>,
> '/usr/local/lib/python3.2/pydoc.py', ('.py', 'U', 1))
> 
> So I think imp.find_module() is supposed to be used when you *do*
> want to do the full act of importing and not for just finding out if
> or where module xyz exists.

In Python 2, find_module was usable for that purpose, and it is a needed API
for a tool like pylint. Is there another way to do this with Python 3?
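
For readers on later Python versions, here is one possible way to locate a module without executing or parsing it. It relies on importlib.util.find_spec, which was added in Python 3.4 and so was not available when this thread was written. The helper name locate and the module name broken_mod are made up for the example.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def locate(name):
    """Return the file path of a top-level module without running it, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module whose source is deliberately not valid Python.
    with open(os.path.join(tmp, "broken_mod.py"), "w") as f:
        f.write("def oops(:\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Finding the module does not compile it, so no SyntaxError here.
        found = locate("broken_mod")
    finally:
        sys.path.remove(tmp)

print(found)
print(locate("no_such_module_xyz_123"))
```

Note that for a dotted name such as pkg.mod, find_spec imports the parent package first, so this sketch only covers the top-level case.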
-- 
Sylvain Thénault   LOGILAB, Paris (France)
Formations Python, Debian, Méth. Agiles: http://www.logilab.fr/formations
Développement logiciel sur mesure:   http://www.logilab.fr/services
CubicWeb, the semantic web framework:http://www.cubicweb.org



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Stephen J. Turnbull
Lennart Regebro writes:

 > *I* think it is more important. In python 3, you can never ever assume
 > anything is ASCII any more.

Sure you can.  In Python program text, all keywords will be ASCII
(English, even, though it may be en_NL.UTF-8) for the foreseeable
future.

I see no reason not to make a similar promise for numeric literals.  I
see no good reason to allow compatibility full-width Japanese "ASCII"
numerals or Arabic cursive numerals in "for i in range(...)" for
example.

As soon as somebody gives an example of a culture, however minor, that
uses computers but actively prefers to use non-ASCII numerals to
express numbers in an IT context, I'll review my thinking.  But at the
moment it's 101% YAGNI.
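
A short sketch of the distinction being drawn here, as I understand CPython 3 to behave: numeric literals in program text must use ASCII digits, while the int() builtin accepts decimal digits from other scripts at run time. The helper name compiles is invented for the example.

```python
def compiles(source):
    """Return True if source is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


fullwidth_twelve = "\uff11\uff12"  # fullwidth digits for 12

print(compiles("12"))              # ASCII literal in program text
print(compiles(fullwidth_twelve))  # same number written with fullwidth digits
print(int(fullwidth_twelve))       # run-time parsing of a string
```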


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Hagen Fürstenau
>> During PEP 3003 discussion, it was suggested to handle it on a case by
>> case basis, but I don't see discussion of the upgrade to 6.0.0 in PEP
>> 3003.
> 
> It's covered by "As the standard library is not directly tied to the
> language definition it is not covered by this moratorium."

How is this restricted to the stdlib if it defines the set of valid
identifiers?

- Hagen



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-30 Thread Lennart Regebro
On Sun, Nov 28, 2010 at 21:24, Alexander Belopolsky
 wrote:
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Why? I can see this is a problem if one character that earlier was
allowed no longer is. That breaks backwards compatibility. This
doesn't.

> >>> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

*I* think it is more important. In python 3, you can never ever assume
anything is ASCII any more. ASCII is practically dead and buried as far
as Python goes, unless you explicitly encode to it.

> def deposit(self, amountstr):
>   self.balance += float(amountstr)
>   audit_log("Deposited: " + amountstr)
>
> Auditor:
>
> $ cat numbered-account.log
> Deposited: ?.??

That log reasonably should be in UTF-8 or something else, in which
case this is not a problem. And that's ignoring that it makes way more
sense to log the numerical amount.
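
A minimal sketch of that suggestion, reusing the deposit example quoted above. The Account class and the in-memory log are assumptions made for illustration, and float is used for the balance only because the quoted example does so.

```python
import io


class Account:
    """Hypothetical account that logs the parsed amount, not the raw input."""

    def __init__(self, audit_log):
        self.balance = 0.0
        self.audit_log = audit_log

    def deposit(self, amountstr):
        amount = float(amountstr)
        self.balance += amount
        # repr() of a float is always ASCII, whatever digits the user typed.
        self.audit_log.write("Deposited: %r\n" % amount)


raw = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # Arabic-Indic digits for 1234.56
log = io.StringIO()
account = Account(log)
account.deposit(raw)

print(log.getvalue(), end="")
# Forcing the raw input through ASCII is what produces the question marks.
print(raw.encode("ascii", "replace"))
```

Writing the log as UTF-8 would also preserve the raw text; logging the parsed number sidesteps the encoding question entirely.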

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python3porting.com/
+33 661 58 14 64