Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote:
 On 11/30/2010 10:05 AM, Alexander Belopolsky wrote:
 
 My general answers to the questions you have raised are as follows:
 
 1. Each new feature release should use the latest version of the UCD as
 of the first beta release (or perhaps a week or so before). New chars
 are new features and the beta period can be used to (hopefully) iron out
 any bugs introduced by a new UCD version.

The UCD is versioned just like Python is, so if the Unicode Consortium
decides to ship a 5.2.1 version of the UCD, we can add that to Python 2.7.x,
since Python 2.7 started out with 5.2.0.

 2. The language specification should not be UCD version specific. Martin
 pointed out that the definition of identifiers was intentionally written
 to not be, by referring to 'current version' or some such. On the other
 hand, the UCD version used should be programmatically discoverable,
 perhaps as an attribute of sys or str.

It already is and has been for a while, e.g.

Python 2.5:
>>> import unicodedata
>>> unicodedata.unidata_version
'4.1.0'
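
The same check can be written as a small script. This is only a sketch: the
value printed depends on the interpreter it runs on ('4.1.0' is what Python 2.5
reports), and the split into three integers assumes the usual
major.minor.patch form of the version string.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
version = unicodedata.unidata_version

# Assumption: the string has the usual "major.minor.patch" shape.
major, minor, patch = (int(part) for part in version.split("."))
print("UCD version:", version)
```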

 3. The UCD should not change in bugfix releases. New chars are new
 features. Adding them in bugfix releases will introduce gratuitous
 incompatibilities between releases. People who want the latest Unicode
 should either upgrade to the latest Python version or patch an older
 version (but not expect core support for any problems that creates).

See above. Patch level revisions of the UCD are fine for patch level
releases of Python, since those patch level revisions of the UCD fix
bugs just like we do in Python.

Note that each new UCD major.minor version is a new standard on its
own, so it's perfectly ok to stick with one such standard version
per Python version.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 01 2010)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Am 30.11.2010 21:24, schrieb Ben Finney:
 haiyang kang corn...@gmail.com writes:

   I think it is a little ugly to have code like this: num =
 float(一.一), expected result is: num = 1.1

 That's a straw man, though. The string need not be a literal in the
 program; it can be input to the program.

 num = float(input_from_the_external_world)

 Does that change your assessment of whether non-ASCII digits are used?
 
 I think the OP (haiyang kang) already indicated that he finds it quite
 unlikely that anybody would possibly want to enter that. You would need
 a number of key strokes to enter each individual ideograph, plus you
 have to press the keys for keyboard layout switching to enter the Latin
 decimal separator (which you normally wouldn't use along with the Han
 numerals).

That's a somewhat limited view, IMHO. Numbers are not always entered
using a computer keyboard; you have tools like cash registers, special
numeric keypads, scanners, OCR, etc. for external entry, and you also
have other programs producing such output, e.g. MS Office if configured
that way.

The argument with the decimal point doesn't work well either, since
it's obvious that float() and int() do not support localized input.

E.g. in Germany we write 3,141 instead of 3.141:

>>> float('3,141')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for float(): 3,141

No surprise there. The localization of the input data, e.g. removal
of thousands separators and conversion of decimal marks to the dot,
has to be done by the application, just as it has to be done now for
German floating point number literals.

The locale module already has locale.atof() and locale.atoi() for
just this purpose.
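
As an illustration of doing that normalization in the application, here is a
minimal sketch. The helper name parse_localized_float and its two parameters
are hypothetical and not part of any library. locale.atof() is the stdlib
route, but it depends on a suitable locale being installed and selected with
locale.setlocale(), which varies by platform, so the sketch avoids it.

```python
def parse_localized_float(text, decimal_mark=",", group_sep="."):
    """Convert e.g. German '1.234,5' to a float (hypothetical helper)."""
    # Drop thousands separators first, then turn the decimal mark into a dot,
    # so that the result matches the grammar float() expects.
    normalized = text.replace(group_sep, "").replace(decimal_mark, ".")
    return float(normalized)


print(parse_localized_float("3,141"))    # 3.141
print(parse_localized_float("1.234,5"))  # 1234.5
```

The defaults follow the German convention from the example above; other
conventions would pass their own separators.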

FYI, here's a list of decimal digits supported by Python 2.7:

http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt:

0030..0039; Decimal # Nd  [10] DIGIT ZERO..DIGIT NINE
0660..0669; Decimal # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9; Decimal # Nd  [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
07C0..07C9; Decimal # Nd  [10] NKO DIGIT ZERO..NKO DIGIT NINE
0966..096F; Decimal # Nd  [10] DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
09E6..09EF; Decimal # Nd  [10] BENGALI DIGIT ZERO..BENGALI DIGIT NINE
0A66..0A6F; Decimal # Nd  [10] GURMUKHI DIGIT ZERO..GURMUKHI DIGIT NINE
0AE6..0AEF; Decimal # Nd  [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0B66..0B6F; Decimal # Nd  [10] ORIYA DIGIT ZERO..ORIYA DIGIT NINE
0BE6..0BEF; Decimal # Nd  [10] TAMIL DIGIT ZERO..TAMIL DIGIT NINE
0C66..0C6F; Decimal # Nd  [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
0CE6..0CEF; Decimal # Nd  [10] KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0D66..0D6F; Decimal # Nd  [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0E50..0E59; Decimal # Nd  [10] THAI DIGIT ZERO..THAI DIGIT NINE
0ED0..0ED9; Decimal # Nd  [10] LAO DIGIT ZERO..LAO DIGIT NINE
0F20..0F29; Decimal # Nd  [10] TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE
1040..1049; Decimal # Nd  [10] MYANMAR DIGIT ZERO..MYANMAR DIGIT NINE
1090..1099; Decimal # Nd  [10] MYANMAR SHAN DIGIT ZERO..MYANMAR SHAN DIGIT NINE
17E0..17E9; Decimal # Nd  [10] KHMER DIGIT ZERO..KHMER DIGIT NINE
1810..1819; Decimal # Nd  [10] MONGOLIAN DIGIT ZERO..MONGOLIAN DIGIT NINE
1946..194F; Decimal # Nd  [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE
19D0..19DA; Decimal # Nd  [11] NEW TAI LUE DIGIT ZERO..NEW TAI LUE THAM DIGIT ONE
1A80..1A89; Decimal # Nd  [10] TAI THAM HORA DIGIT ZERO..TAI THAM HORA DIGIT NINE
1A90..1A99; Decimal # Nd  [10] TAI THAM THAM DIGIT ZERO..TAI THAM THAM DIGIT NINE
1B50..1B59; Decimal # Nd  [10] BALINESE DIGIT ZERO..BALINESE DIGIT NINE
1BB0..1BB9; Decimal # Nd  [10] SUNDANESE DIGIT ZERO..SUNDANESE DIGIT NINE
1C40..1C49; Decimal # Nd  [10] LEPCHA DIGIT ZERO..LEPCHA DIGIT NINE
1C50..1C59; Decimal # Nd  [10] OL CHIKI DIGIT ZERO..OL CHIKI DIGIT NINE
A620..A629; Decimal # Nd  [10] VAI DIGIT ZERO..VAI DIGIT NINE
A8D0..A8D9; Decimal # Nd  [10] SAURASHTRA DIGIT ZERO..SAURASHTRA DIGIT NINE
A900..A909; Decimal # Nd  [10] KAYAH LI DIGIT ZERO..KAYAH LI DIGIT NINE
A9D0..A9D9; Decimal # Nd  [10] JAVANESE DIGIT ZERO..JAVANESE DIGIT NINE
AA50..AA59; Decimal # Nd  [10] CHAM DIGIT ZERO..CHAM DIGIT NINE
ABF0..ABF9; Decimal # Nd  [10] MEETEI MAYEK DIGIT ZERO..MEETEI MAYEK DIGIT NINE
FF10..FF19; Decimal # Nd  [10] FULLWIDTH DIGIT ZERO..FULLWIDTH DIGIT NINE
104A0..104A9  ; Decimal # Nd  [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE
1D7CE..1D7FF  ; Decimal # Nd  [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE


The Chinese and Japanese ideographs are not supported because of the
way they are defined in the Unihan database. I'm currently
investigating how we could support them as well.
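
The difference between the listed decimal digits and the Han ideographs can be
checked from Python itself. The following is a sketch assuming Python 3 string
semantics (on 2.7 the strings would need the u'' prefix); the code points were
picked only for illustration.

```python
import unicodedata

# Characters of category Nd (Numeric_Type=Decimal) are accepted by int()/float().
arabic_indic = int("\u0661\u0662\u0663")   # ARABIC-INDIC DIGIT ONE, TWO, THREE
fullwidth = float("\uff11.\uff15")         # FULLWIDTH DIGIT ONE, '.', FIVE
print(arabic_indic, fullwidth)

# U+4E00 (the CJK ideograph for "one") carries a numeric value in the UCD,
# but it is a letter (Lo), not a decimal digit, so float() rejects it.
han_one = "\u4e00"
print(unicodedata.category(han_one))
print(unicodedata.numeric(han_one))
try:
    float(han_one + "." + han_one)
    han_accepted = True
except ValueError:
    han_accepted = False
print("float() accepts Han numerals:", han_accepted)
```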

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread M.-A. Lemburg
Terry Reedy wrote:
 On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:
 
 I see no reason not to make a similar promise for numeric literals.  I
 see no good reason to allow compatibility full-width Japanese ASCII
 numerals or Arabic cursive numerals in for i in range(...) for
 example.
 
 I do not think that anyone, at least not me, has argued for anything
 other than 0-9 digits (or 0-f for hex) in literals in program code. The
 only issue is whether non-programmer *users* should be able to use their
 native digits in applications in response to input prompts.

Me neither. This is solely about Python being able to parse numeric
input in the float(), int() and complex() constructors.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano

Martin v. Löwis wrote:

Am 30.11.2010 23:43, schrieb Terry Reedy:

On 11/30/2010 3:23 AM, Stephen J. Turnbull wrote:


I see no reason not to make a similar promise for numeric literals.  I
see no good reason to allow compatibility full-width Japanese ASCII
numerals or Arabic cursive numerals in for i in range(...) for
example.

I do not think that anyone, at least not me, has argued for anything
other than 0-9 digits (or 0-f for hex) in literals in program code. The
only issue is whether non-programmer *users* should be able to use their
native digits in applications in response to input prompts.


And here, my observation stands: if they wanted to, they currently
couldn't - at least not for real numbers (and also not for integers
if they want to use grouping). So the presumed application of this
feature doesn't actually work, despite the presence of the feature it
was supposedly meant to enable.


By that argument, English speakers wanting to enter integers using 
Arabic numerals can't either! I'd like to use grouping for large 
literals, if only I could think of a half-decent syntax, and if only 
Python supported it. This fails on both counts:


x = 123_456_789_012_345

The lack of grouping and the lack of a native decimal point doesn't mean 
that the feature doesn't work -- it merely means the feature requires 
some compromise before it can be used.


In the same way, if I wanted to enter a number using non-Arabic digits, 
it works provided I compromise by using the Anglo-American decimal point 
instead of the European comma or the native decimal point I might prefer.


The lack of support for non-dot decimal points is arguably a bug that 
should be fixed, not a reason to remove functionality.



--
Steven



Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-12-01 Thread Greg Ewing

Nick Coghlan wrote:


For the directory-as-module-not-package idea ...

 you would need to be very careful with it,

since all the files would be sharing a common globals() namespace.


One of the things I like about Python's module system
is that once I know which module a name was imported
from, I also know which file to look in for its
definition. If a module can be spread over several
files, that feature would be lost.
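
The one-module-one-file property can be observed with the inspect module. This
is a small sketch using textwrap.dedent purely as an arbitrary stdlib example;
any function defined in a plain module would do.

```python
import inspect
import textwrap

func = textwrap.dedent

# The module a name was defined in ...
module_name = func.__module__
# ... determines the single source file that holds its definition.
source_file = inspect.getsourcefile(func)

print(module_name)
print(source_file)
```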

--
Greg



Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-12-01 Thread Nick Coghlan
On Wed, Dec 1, 2010 at 8:22 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Nick Coghlan wrote:

 For the directory-as-module-not-package idea ...
 you would need to be very careful with it,
 since all the files would be sharing a common globals() namespace.

 One of the things I like about Python's module system
 is that once I know which module a name was imported
 from, I also know which file to look in for its
 definition. If a module can be spread over several
 files, that feature would be lost.

There are many potential problems with the idea, I just chose to
mention one of the ones that could easily make the affected code
*break* :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Lennart Regebro
On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org wrote:
 Sure you can.  In Python program text, all keywords will be ASCII

Yes, yes, sure, but not the contents of variables,

 I see no reason not to make a similar promise for numeric literals.

Wait what, literals? The example was

>>> float('١٢٣٤.٥٦')

Which doesn't have any numeric literals in it at all. Does that work as a
literal? Nope, it's a syntax error. Too bad, that would have been cool, but whatever.

Why would this be a problem:

>>> T1234 = float('١٢٣٤.٥٦')
>>> T1234
1234.56

But this OK?

>>> T١٢٣٤ = float('1234.56')
>>> T١٢٣٤
1234.56

I don't see that.


Should we bother to implement ١٢٣٤.٥٦ as a literal equivalent to
1234.56? Well, not unless somebody asks for it, or it turns out to be
easy. :-) But that's another question.
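
The three cases (digits in a string passed to float(), digits in an identifier,
digits as a numeric literal) can be checked side by side. This is a sketch
assuming Python 3; the \u escapes spell the Arabic-Indic digits one to six so
that the example does not depend on how the characters render.

```python
# Arabic-Indic digits: U+0661..U+0664 are "1234", U+0665 U+0666 are "56".
digits_1234 = "\u0661\u0662\u0663\u0664"
digits_56 = "\u0665\u0666"

# 1. Non-ASCII digits inside a string handed to float(): accepted.
value = float(digits_1234 + "." + digits_56)

# 2. Non-ASCII digits inside an identifier: accepted by the compiler.
namespace = {}
exec("T" + digits_1234 + " = 1234.56", namespace)
identifier_ok = ("T" + digits_1234) in namespace

# 3. Non-ASCII digits as a numeric literal in source code: rejected.
try:
    compile("x = " + digits_1234, "<example>", "exec")
    literal_ok = True
except SyntaxError:
    literal_ok = False

print(value, identifier_ok, literal_ok)
```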


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-12-01 Thread Ron Adam



On 12/01/2010 04:39 AM, Nick Coghlan wrote:

On Wed, Dec 1, 2010 at 8:22 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:

Nick Coghlan wrote:


For the directory-as-module-not-package idea ...
you would need to be very careful with it,
since all the files would be sharing a common globals() namespace.


One of the things I like about Python's module system
is that once I know which module a name was imported
from, I also know which file to look in for its
definition. If a module can be spread over several
files, that feature would be lost.



There are many potential problems with the idea, I just chose to
mention one of the ones that could easily make the affected code
*break* :)


Right.  It would require additional pieces as well.

Ron :-)



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote:
..
 With Python 3.1:

 >>> exec('\u0CF1 = 1')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "<string>", line 1
     ೱ = 1
       ^
 SyntaxError: invalid character in identifier

 but with Python 3.2a4:

 >>> exec('\u0CF1 = 1')
 >>> eval('\u0CF1')
 1

 Such changes are not new, but I agree that they should probably
 be highlighted in the What's new in Python x.x.


As of today, What’s New In Python 3.2 [1] does not even mention the
unicodedata upgrade to 6.0.0.  Here are the features from the
unicode.org summary [2] that I think should be reflected in Python's
What's New document:


* adds 2,088 characters, including over 1,000 additional symbols—chief
among them the additional emoji symbols, which are especially
important for mobile phones;

* corrects character properties for existing characters including
 - a general category change to two Kannada characters (U+0CF1,
U+0CF2), which has the effect of making them newly eligible for
inclusion in identifiers;

 - a general category change to one New Tai Lue numeric character
(U+19DA), which would have the effect of disqualifying it from
inclusion in identifiers unless grandfathering measures are in place
for the defining identifier syntax.


The above may be too verbose for inclusion in What’s New In Python
3.2, but I think we should add a possibly shorter summary with a link
to unicode.org for details.
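
The property changes listed above can be inspected directly. This is a sketch
assuming an interpreter whose unicodedata is based on UCD 6.0.0 or later (older
interpreters report different categories and do not know the emoji name).
Whether U+19DA is still accepted inside an identifier depends on the
grandfathering rules of the UCD version in use, so the sketch only prints that
result.

```python
import unicodedata

print("UCD version:", unicodedata.unidata_version)

# U+0CF1 and U+0CF2 became letters (eligible for identifiers) in Unicode 6.0;
# U+19DA changed from a decimal digit (Nd) to another kind of number (No).
for ch in ("\u0cf1", "\u0cf2", "\u19da"):
    print(
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        unicodedata.name(ch),
        ("a" + ch).isidentifier(),
    )

# One of the emoji added in Unicode 6.0, looked up by name.
cat = "\N{CAT FACE WITH WRY SMILE}"
print(unicodedata.name(cat))
```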

PS: Yes, I think everyone should know about the Python 3.2 killer
feature: '\N{CAT FACE WITH WRY SMILE}'!

[1] http://docs.python.org/dev/whatsnew/3.2.html
[2] http://www.unicode.org/versions/Unicode6.0.0/


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy

On 12/1/2010 12:55 PM, Alexander Belopolsky wrote:

On Sun, Nov 28, 2010 at 5:48 PM, M.-A. Lemburg m...@egenix.com wrote:
..

With Python 3.1:

>>> exec('\u0CF1 = 1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ೱ = 1
      ^
SyntaxError: invalid character in identifier

but with Python 3.2a4:

>>> exec('\u0CF1 = 1')
>>> eval('\u0CF1')
1


Such changes are not new, but I agree that they should probably
be highlighted in the What's new in Python x.x.



As of today, What’s New In Python 3.2 [1] does not even mention the
unicodedata upgrade to 6.0.0.  Here are the features from the
unicode.org summary [2] that I think should be reflected in Python's
What's New document:


* adds 2,088 characters, including over 1,000 additional symbols—chief
among them the additional emoji symbols, which are especially
important for mobile phones;

* corrects character properties for existing characters including
  - a general category change to two Kannada characters (U+0CF1,
U+0CF2), which has the effect of making them newly eligible for
inclusion in identifiers;

  - a general category change to one New Tai Lue numeric character
(U+19DA), which would have the effect of disqualifying it from
inclusion in identifiers unless grandfathering measures are in place
for the defining identifier syntax.




The above may be too verbose for inclusion in What’s New In Python
3.2,


I think those 11 lines are pretty good. Put them in
'\N{CAT FACE WITH WRY SMILE}'!

Plus give a link to Unicode site (Issue numbers are implicit links).

--
Terry Jan Reedy




Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Brian Curtin
On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:

 Hello everyone. My name is Prashant. I and my friend Zubin recently
 ported 'Configobj'. It would be great if somebody can suggest about
 any utilities or scripts that are being widely used and need to be
 ported.


http://onpython3yet.com/ might be helpful to you. It orders the projects on
PyPI with the most dependencies which are not yet ported to 3.x.

Note that there are a number of false positives, e.g., the first result --
NumPy, since people don't seem to keep their classifiers up-to-date.


Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Massa, Harald Armin

 http://onpython3yet.com/ might be helpful to you. It orders the projects
 on PyPI with the most dependencies which are not yet ported to 3.x.

 Note that there are a number of false positives, e.g., the first result --
 NumPy, since people don't seem to keep their classifiers up-to-date.

 That could be a nice list. But the content is quite disturbing, as Python,
the programming language, is listed as not being ported to 3.0. That does not
really inspire trust.


Harald


-- 
GHUM GmbH
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607

Amtsgericht Stuttgart, HRB 734971
-
persuadere.
et programmare


Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Antoine Pitrou
On Wed, 1 Dec 2010 13:02:00 -0600
Brian Curtin brian.cur...@gmail.com wrote:
 On Wed, Dec 1, 2010 at 12:51, Prashant Kumar 
 contactprashan...@gmail.com wrote:
 
  Hello everyone. My name is Prashant. I and my friend Zubin recently
  ported 'Configobj'. It would be great if somebody can suggest about
  any utilities or scripts that are being widely used and need to be
  ported.
 
 http://onpython3yet.com/ might be helpful to you. It orders the projects on
 PyPI with the most dependencies which are not yet ported to 3.x.

I don't know who did that page but it seems like there's some FUD there.

simplejson, ctypes, pysqlite and others are available in the 3.x
stdlib. Mercurial is a command-line tool and doesn't need to be ported
to be used for Python 3 projects. setuptools is supplanted by
distribute, which should be Python 3 compatible.

And I'm not sure what this package called Python is (“a high-level
object-oriented programming language”? like Java?), but I'm pretty sure
I've heard there's a Python 3 compatible version.

(granted, it's probably less FUD than stupid automation)

Regards

Antoine.




Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Brian Curtin
On Wed, Dec 1, 2010 at 13:17, Antoine Pitrou solip...@pitrou.net wrote:

 On Wed, 1 Dec 2010 13:02:00 -0600
 Brian Curtin brian.cur...@gmail.com wrote:
  On Wed, Dec 1, 2010 at 12:51, Prashant Kumar 
 contactprashan...@gmail.com wrote:
 
   Hello everyone. My name is Prashant. I and my friend Zubin recently
   ported 'Configobj'. It would be great if somebody can suggest about
   any utilities or scripts that are being widely used and need to be
   ported.
 
  http://onpython3yet.com/ might be helpful to you. It orders the projects on
  PyPI with the most dependencies which are not yet ported to 3.x.

 I don't know who did that page but it seems like there's some FUD there.

 simplejson, ctypes, pysqlite and others are available in the 3.x
 stdlib.


It grabs the info from their PyPI pages, which are probably not kept
up-to-date.

This was brought up at a local user group meeting and I think it can be a
useful tool, but as you can see it requires good input data which isn't
always the case for some packages.


Package authors: if you spent time making your project work on 3.x -- let
the world know, update your classifiers.


[Python-Dev] Deprecating undocumented, unused functions in difflib.

2010-12-01 Thread Terry Reedy

difflib.SequenceMatcher objects currently get two feature attributes:
self.isbjunk = junk.__contains__
self.isbpopular = popular.__contains__

Tim Peters agrees that the junk and popular sets should be directly 
exposed and documented as part of the api, thereby making the functions 
redundant. The two functions are not currently documented (and should 
not be now). A google codesearch of 'isbjunk' and 'isbpopular' only 
returns hits in difflib.py itself (and its predecessor, ndiff.py).


It would be easiest to just remove the two lines above.
Or should I define functions with _xxx names that issue a deprecation warning 
and attach them as attributes to each object? (Defining instance methods 
would not be the same).


There is only one internal use of one of the two functions which is 
easily edited.
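
To illustrate why the functions become redundant once the sets are exposed,
here is a sketch. It assumes an interpreter in which the sets are public under
the names bjunk and bpopular, the names used in current difflib documentation;
the message above does not name them. The junk rule and sample strings are
arbitrary.

```python
from difflib import SequenceMatcher

# Treat spaces as junk; the two strings are arbitrary sample data.
matcher = SequenceMatcher(lambda ch: ch == " ", "a b c", "a  b  c")

# Plain set membership does the job of the isbjunk()/isbpopular() functions.
space_is_junk = " " in matcher.bjunk
letter_is_junk = "a" in matcher.bjunk

print(space_is_junk, letter_is_junk)
# For a sequence this short the popularity heuristic does not kick in.
print(matcher.bpopular)
```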


--
Terry Jan Reedy



Re: [Python-Dev] [Preview] Comments and change proposals on documentation

2010-12-01 Thread Daniel da Silva
I think it looks great!

If you are looking for some suggestions to make it a little more elegant:
1. If I delete a comment that has no children, it should remove it
completely (currently, it just replaces it with [deleted]). If there are
children, I think it is doing the right thing.
2. When I post a comment, it should automatically vote that comment up. I
wouldn't have posted it if I didn't like it.
3. As far as text formatting goes, I personally think there should be some
highlighting support for code spans/blocks (IMO that should match the IDLE
colors).

Also, I seemed to manage to trigger a visible system warning in my badly
formatted comment on math.fabs(x), :)


-Daniel


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
 And here, my observation stands: if they wanted to, they currently
 couldn't - at least not for real numbers (and also not for integers
 if they want to use grouping). So the presumed application of this
 feature doesn't actually work, despite the presence of the feature it
 was supposedly meant to enable.
 
 By that argument, English speakers wanting to enter integers using
 Arabic numerals can't either!

That's correct, and the key point here for the argument. It's just not
*meant* to support localized number forms, but deliberately constrains
them to a formal grammar which users using it must be aware of in order
to use it.

 I'd like to use grouping for large
 literals, if only I could think of a half-decent syntax, and if only
 Python supported it. This fails on both counts:
 
 x = 123_456_789_012_345

Here you are confusing issues, though: this fragment uses the syntax of
the Python programming language. Whether or not the syntax of the
float() constructor arguments matches that syntax is also a subject of
the debate.

I take it that you speak in favor of the float syntax also being used
for the float() constructor.

 The lack of grouping and the lack of a native decimal point doesn't mean
 that the feature doesn't work -- it merely means the feature requires
 some compromise before it can be used.

No, it means that the Python programming language syntax for floating
point numbers just doesn't take local notation into account *at all*.
This is not a flaw - it just means that this feature is non-existent.

Now, for the float() constructor, some people in this thread have
claimed that it *is* aimed at people who want to enter numbers in their
local spellings. I claim that this feature either doesn't work, or is
absent also.

 In the same way, if I wanted to enter a number using non-Arabic digits,
 it works provided I compromise by using the Anglo-American decimal point
 instead of the European comma or the native decimal point I might prefer.

Why would you want that if what you really wanted could not be
done? There certainly *is* a way to convert strings into floats,
and there would be a way if that restricted itself to the digits 0..9.
So it can't be the mere desire to convert strings to floats that makes
you ask for non-ASCII digits.

 The lack of support for non-dot decimal points is arguably a bug that
 should be fixed, not a reason to remove functionality.

I keep repeating my two concerns:
a) if that was a feature, it is not specified at all in the
   documentation. In fact, the documentation was recently clarified
   to deny existence of that feature.
b) fixing it will be much more difficult than you apparently think.

Regards,
Martin



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
 I think the OP (haiyang kang) already indicated that he finds it quite
 unlikely that anybody would possibly want to enter that.
 
 Who's talking about *entering* it into the program at a keyboard
 directly, though? Input to a program can come from all kinds of crazy
 sources. Just because it wasn't typed by the person at the keyboard
 using this program doesn't stop it being input to the program.

I think haiyang kang claimed exactly that - it won't ever be input to a
program. I trust him on that - and so should you, unless you have
sufficient experience with the Chinese language and writing system.

 Note that I'm not saying this is common. Nor am I saying it's a
 desirable situation. I'm saying it is a feasible use case, to be
 dismissed only if there is strong evidence that it's not used by
 existing Python code.

And indeed, for the Chinese numerals, we have such strong evidence.

Regards,
Martin



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Martin v. Löwis
 As of today, What’s New In Python 3.2 [1] does not even mention the
 unicodedata upgrade to 6.0.0.

One reason was that I was instructed not to change What's New a few
years ago.

Regards,
Martin


Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Martin v. Löwis
Am 01.12.2010 20:02, schrieb Brian Curtin:
 On Wed, Dec 1, 2010 at 12:51, Prashant Kumar
 contactprashan...@gmail.com wrote:
 
 Hello everyone. My name is Prashant. I and my friend Zubin recently
 ported 'Configobj'. It would be great if somebody can suggest about
 any utilities or scripts that are being widely used and need to be
 ported.
 
 
 http://onpython3yet.com/ might be helpful to you.

Another such list is at

http://www.python.org/3kpoll

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano

Martin v. Löwis wrote:

I think the OP (haiyang kang) already indicated that he finds it quite
unlikely that anybody would possibly want to enter that.

Who's talking about *entering* it into the program at a keyboard
directly, though? Input to a program can come from all kinds of crazy
sources. Just because it wasn't typed by the person at the keyboard
using this program doesn't stop it being input to the program.


I think haiyang kang claimed exactly that - it won't ever be input to a
program. I trust him on that - and so should you, unless you have
sufficient experience with the Chinese language and writing system.


Note that I'm not saying this is common. Nor am I saying it's a
desirable situation. I'm saying it is a feasible use case, to be
dismissed only if there is strong evidence that it's not used by
existing Python code.


And indeed, for the Chinese numerals, we have such strong evidence.


With full respect to haiyang kang, hear-say from one person can hardly 
be described as strong evidence -- particularly, as Alexander 
Belopolsky pointed out, the use-case described isn't currently supported 
by Python. Given that what haiyang kang describes *can't* be done, the 
fact that people don't do it is hardly surprising -- nor is it a good 
reason for taking away functionality that does exist.




--
Steven



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 5:36 PM, Martin v. Löwis mar...@v.loewis.de wrote:
..
 Note that I'm not saying this is common. Nor am I saying it's a
 desirable situation. I'm saying it is a feasible use case, to be
 dismissed only if there is strong evidence that it's not used by
 existing Python code.

 And indeed, for the Chinese numerals, we have such strong evidence.


Indeed: in the more than 10 years that Python's int() has accepted
Arabic-Indic numerals, nobody has complained that it *did not* accept Chinese.


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Steven D'Aprano

Martin v. Löwis wrote:

And here, my observation stands: if they wanted to, they currently
couldn't - at least not for real numbers (and also not for integers
if they want to use grouping). So the presumed application of this
feature doesn't actually work, despite the presence of the feature it
was supposedly meant to enable.

By that argument, English speakers wanting to enter integers using
Arabic numerals can't either!


That's correct, and the key point here for the argument. It's just not
*meant* to support localized number forms, but deliberately constrains
them to a formal grammar which users using it must be aware of in order
to use it.


You're *agreeing* that English speakers can't enter integers using 
Arabic numerals? What do you think I'm doing when I do this?


>>> int("1234")
1234

Ah wait... did you think I meant Arabic numerals in the sense of digits 
used by Arabs in Arabia? I meant Arabic numerals as opposed to Roman 
numerals. Sorry for the confusion.


Your argument was that even though Python's int() supports many 
non-ASCII digits, the lack of grouping means that it doesn't actually 
work. If that argument were correct, then it applies equally to ASCII 
digits as well.


It's clearly nonsense to say that int("1234") doesn't work just 
because of the lack of grouping. It's equally nonsense to say that

int("١٢٣٤") doesn't work because of the lack of grouping.


[...]

I take it that you speak in favor of the float syntax also being used
for the float() constructor.


I'm sorry, I don't understand what you mean here. I've repeatedly said 
that the syntax for numeric literals should remain constrained to the 
ASCII digits, as it currently is.


n = ١٢٣٤

gives a SyntaxError, and I don't want to see that change.

But I've also argued that, since constructors such as int() and float() 
currently accept non-ASCII digit strings:


>>> n = int("١٢٣٤")

we should continue to support the existing behaviour. None of the 
arguments against it seem convincing to me, particularly since the 
opponents of the current behaviour admit that there is a use-case for 
it, but they just want it to move elsewhere, such as the locale module.
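The distinction between numeric literals and constructor arguments can be sketched as follows. This is only an illustration, assuming a Python 3 interpreter; the helper name parses_as_source is hypothetical, and the digits are written as escapes (U+0661 to U+0664, Arabic-Indic one to four) so that mail software cannot reorder them.

```python
# Arabic-Indic digits one to four, written as escapes to avoid display reordering.
ARABIC_INDIC_1234 = "\u0661\u0662\u0663\u0664"


def parses_as_source(source):
    """Return True if `source` compiles as Python code (hypothetical helper)."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# The constructors accept the non-ASCII digits at run time.
print(int(ARABIC_INDIC_1234))
print(float(ARABIC_INDIC_1234 + ".5"))

# The same digits written as a numeric literal are rejected by the compiler.
print(parses_as_source("n = 1234"))
print(parses_as_source("n = " + ARABIC_INDIC_1234))
```

The constructor calls and the literal are handled by different code paths, which is why the two behaviours can be discussed separately.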


We've even heard from one person -- I forget who, sorry -- who claimed 
that C++ has the same behaviour, and if you want ASCII-only digits, you 
have to explicitly ask for it.


For what it's worth, Microsoft warns developers not to assume users will 
enter numeric data using ASCII digits:


Number representation can also use non-ASCII native digits, so your 
application may encounter characters other than 0-9 as inputs. Avoid 
filtering on U+0030 through U+0039 to prevent frustration for users who 
are trying to enter data using non-ASCII digits.


http://msdn.microsoft.com/en-us/magazine/cc163506.aspx


There was a similar discussion going on in Perl-land recently:

http://www.nntp.perl.org/group/perl.perl5.porters/2010/07/msg162400.html

although, being Perl, the discussion was dominated by concerns about 
regexes and implicit conversions, rather than an explicit call to 
float() or int() as we are discussing here.



[...]

In the same way, if I wanted to enter a number using non-Arabic digits,
it works provided I compromise by using the Anglo-American decimal point
instead of the European comma or the native decimal point I might prefer.


Why would you want that if what you really wanted could not be
done? There certainly *is* a way to convert strings into floats,
and there would be a way if that restricted itself to the digits 0..9.
So it can't be the mere desire to convert strings to floats that makes
you ask for non-ASCII digits.


Why do Europeans use programming languages that force them to use a dot 
instead of a comma for the decimal place? Why do I misspell 
string.centre as string.center? Because if you want to get something 
done, you use the tools you have and not the tools you'd like to have.





--
Steven


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Lennart Regebro writes:
  On Tue, Nov 30, 2010 at 09:23, Stephen J. Turnbull step...@xemacs.org 
  wrote:
   Sure you can.  In Python program text, all keywords will be ASCII
  
  Yes, yes, sure, but not the contents of variables,

Irrelevant, you're not converting these to a string representation.
If you're generating numerals for internal use, I don't see why you
would want to do arithmetic on them; conversion is a YAGNI.  This is
only interesting to allow naive users to input in a comfortable way.

As yet there is no evidence that there are *any* such naive users; 1.3
billion possibles are shut out, and at least two cultures which
use non-ASCII numerals every day, representing 1.3 billion naive users
(the coincidence of numbers is no coincidence), have reported that
nobody in their right mind would *input* the numbers that way,
and at least for Japanese, the use cases are not really numeric anyway.

   I see no reason not to make a similar promise for numeric literals.
  
  Wait what, literals?

Sorry, my bad.

  Why would this be a problem:
  
   T1234 = float('.~~')
   T1234
  1234.56
  
  But this OK?
  
   T = float('1234.56')
   T
  1234.56

(Sorry, the Arabic is going to get munged, my mailer is beta and
somebody screwed up.)

Because the characters in the identifier are uninterpreted and have no
syntactic content other than their identity.  They're arbitrary.
That's not true of numerics.

Because that works, but

print(T1234)

doesn't (it prints ASCII).  You can't round-trip, but users will
want/expect that.

Because that works but this doesn't:

T1000 = float('一.◯◯◯')

Violates TOOWTDI.
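The round-trip and Han-digit objections can be reproduced with a short sketch. It assumes a Python 3 interpreter; try_float is a hypothetical helper, and the strings are written as escapes so they survive mail transport (U+0661 to U+0666 are Arabic-Indic digits, U+4E00 and U+25EF are the two characters of the Han example).

```python
ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # 1234.56 in Arabic-Indic digits
HAN = "\u4e00.\u25ef\u25ef\u25ef"                        # the Han example from this message


def try_float(text):
    """Return float(text), or None if the parser rejects it (hypothetical helper)."""
    try:
        return float(text)
    except ValueError:
        return None


value = try_float(ARABIC_INDIC)
round_tripped = str(value)

print(value)                          # parsed successfully
print(round_tripped == ARABIC_INDIC)  # output uses ASCII digits, so no round trip
print(try_float(HAN))                 # the Han form is not parsed at all
```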

If you're proposing to fix the numeric parsers, I still don't like it
but I could go to -0 on it.  However as Alexander points out and MAL
admits, it's apparently not so easy to do that.


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 7:17 PM, Steven D'Aprano st...@pearwood.info wrote:
..
 we should continue to support the existing behaviour. None of the arguments
 against it seem convincing to me, particularly since the opponents of the
 current behaviour admit that there is a use-case for it, but they just want
 it to move elsewhere, such as the locale module.


I don't remember who made this argument, but I think you misunderstood
it.  The argument was that if there was a use case for parsing Eastern
Arabic numerals, it would be better served by a module written by
someone who speaks one of the Arabic languages and knows the details
of how  Eastern Arabic numerals are written.  So far nobody has even
claimed to know conclusively that Arabic-Indic digits are always
written left-to-right.

>>> unicodedata.bidirectional('٤')
'AN'

is not very helpful because it means "any Arabic-Indic digit"
according to unicode.org.  (To me, a special category hints that it
may be written in either direction and the proper interpretation may
depend on context.)   I have not seen a real use case reported in this
thread and for theoretical use cases, the current implementation is
either outright wrong or does not solve the problem completely. Given
that a function that replaces all Unicode digits in a string with 0-9
can be written in 3 lines of Python code, it is very unlikely that
anyone would prefer to rely on undocumented behavior of Python
builtins instead of having explicit control over parsing of their
data.
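One possible shape for such a function is sketched below. It is an illustration rather than anything proposed in the thread: the name to_ascii_digits is hypothetical, and it relies on unicodedata.decimal(), which returns the decimal value of a character or the supplied default when the character has none.

```python
import unicodedata


def to_ascii_digits(text):
    """Replace each Unicode decimal digit in `text` with its ASCII equivalent."""
    converted = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-decimal characters
        converted.append(ch if value is None else str(value))
    return "".join(converted)


arabic_indic = "\u0661\u0662\u0663\u0664"  # Arabic-Indic 1234
devanagari = "\u0967\u0968\u0969"          # Devanagari 123

print(to_ascii_digits(arabic_indic))
print(to_ascii_digits("total: " + devanagari))
print(int(to_ascii_digits(arabic_indic)))
```

With explicit normalization like this, the caller decides which characters count as digits before int() or float() ever sees the string.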


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Steven D'Aprano writes:

  With full respect to haiyang kang, hear-say from one person can hardly 
  be described as strong evidence

That's *disrespectful* nonsense.  What Haiyang reported was not
hearsay, it's direct observation of what he sees around him and
personal experience, plus extrapolation.  Look up "hearsay", please.

Furthermore, he provided good *objective* reason (excessive cost, to
which I can also testify, in several different input methods for
Japanese) why numbers simply would not be input that way.

What's left is copy/paste via the mouse.  I assure you, every day I
see dozens of Japanese copy/pasting *only* ASCII numerals, and the
sales figures for Microsoft Excel (not to mention the download numbers
for Open Office) strongly suggest that 30 million Japanese salarymen
are similarly dedicated to ASCII.  (That's not "hearsay" either,
that's direct observation and extrapolation, which is more than the
"we need float to translate Arabic" supporters can offer.)

I have seen only *one* use case: it's a toy for sophisticated
programmers who want to think of themselves as broadminded.  We've
seen several examples of that in this thread, so I can't deny that is
a real use case.

Please, give us just *one* more real use case that isn't "somebody
might".


Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Michael Foord

On 01/12/2010 19:17, Antoine Pitrou wrote:

On Wed, 1 Dec 2010 13:02:00 -0600
Brian Curtin brian.cur...@gmail.com wrote:

On Wed, Dec 1, 2010 at 12:51, Prashant Kumar contactprashan...@gmail.com wrote:


Hello everyone. My name is Prashant. I and my friend Zubin recently
ported 'Configobj'. It would be great if somebody can suggest about
any utilities or scripts that are being widely used and need to be
ported.

http://onpython3yet.com/ might be helpful to you. It orders the projects on
PyPI with the most dependencies which are not yet ported to 3.x.

I don't know who did that page but it seems like there's some FUD there.

simplejson, ctypes, pysqlite and others are available in the 3.x
stdlib. Mercurial is a command-line tool and doesn't need to be ported
to be used for Python 3 projects. setuptools is supplanted by
distribute, which should be Python 3 compatible.

And I'm not sure what this package called "Python" is (“a high-level
object-oriented programming language”? like Java?), but I'm pretty sure
I've heard there's a Python 3 compatible version.

(granted, it's probably less FUD than stupid automation)


From what I can tell it simply looks at dependencies and availability 
of those dependencies with a Python 3 trove classification. Some 
manual filtering may well be useful.


It is well *possible* that there are packages with a runtime dependency 
on libraries in mercurial however. Those would need mercurial porting to 
Python 3 if they are to run on Python 3. If they simply shell out to 
mercurial that wouldn't be the case.


Michael


Regards

Antoine.





--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.

2010-12-01 Thread Nick Coghlan
On Thu, Dec 2, 2010 at 6:23 AM, Terry Reedy tjre...@udel.edu wrote:
 It would be easiest to just remove the two lines above.
 Or should I define functions with _xxx names that issue a deprecation warning and
 attach them as attributes to each object? (Defining instance methods would
 not be the same).

Given that functions are converted to bound methods only on retrieval
from an instance, why wouldn't it be the same?

But yes, if you want to get rid of them, then deprecation for 3.2 and
removal in 3.3 is the way to go.

Alternatively, not deprecating them at all and just leaving them
undocumented with a comment in the source to say they have been
deliberately omitted from the docs would also be fine.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.

2010-12-01 Thread Michael Foord

On 01/12/2010 20:23, Terry Reedy wrote:

difflib.SequenceMatcher objects currently get two feature attributes:
self.isbjunk = junk.__contains__
self.isbpopular = popular.__contains__

Tim Peters agrees that the junk and popular sets should be directly 
exposed and documented as part of the api, thereby making the 
functions redundant. The two functions are not currently documented 
(and should not be now). A google codesearch of 'isbjunk' and 
'isbpopular' only returns hits in difflib.py itself (and its 
predecessor, ndiff.py).


It would be easiest to just remove the two lines above.
Or should I define functions with _xxx names that issue a deprecation 
warning and attach them as attributes to each object? (Defining 
instance methods would not be the same).


There is only one internal use of one of the two functions which is 
easily edited.


I would still be tempted to go through a single release of deprecation. 
You can add a test that the names are gone if the version of Python is 
3.3. When the tests start failing the code and the tests can be ripped out.
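A version-gated check of this kind might look like the sketch below. The function name check_removed and its parameters are hypothetical; the idea is only that the check stays silent during the deprecation period and starts failing once the interpreter reaches the removal version while the names still exist.

```python
import sys


def check_removed(obj, names, removal_version, current_version=None):
    """Raise AssertionError if any of `names` survives at or after `removal_version`.

    Hypothetical helper: `removal_version` is a (major, minor) tuple, and
    `current_version` defaults to the running interpreter's version.
    """
    if current_version is None:
        current_version = sys.version_info[:2]
    if current_version < removal_version:
        return  # still in the deprecation period; nothing to enforce yet
    leftover = [name for name in names if hasattr(obj, name)]
    if leftover:
        raise AssertionError("remove deprecated names: " + ", ".join(leftover))
```

Called from a test as check_removed(matcher, ["isbjunk", "isbpopular"], (3, 3)), it would act as the reminder: the failure itself tells whoever bumps the version which code and tests to rip out.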


All the best,

Michael

--

http://www.voidspace.org.uk/




Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Ben Finney
Stephen J. Turnbull step...@xemacs.org writes:

 Furthermore, he provided good *objective* reason (excessive cost, to
 which I can also testify, in several different input methods for
 Japanese) why numbers simply would not be input that way.

 What's left is copy/paste via the mouse.

For direct entry by an interactive user, yes. Why are some people in
this discussion thinking only of direct entry by an interactive user?

Input to a program comes from various sources other than direct entry by
the interactive user, as has been pointed out many times.

 Please, give us just *one* more real use case that isn't "somebody
 might".

Input from an existing text file, as I said earlier. Or any other way of
text data making its way into a Python program.

Direct entry at the console is a red herring.

-- 
 \   “First things first, but not necessarily in that order.” —The |
  `\  Doctor, _Doctor Who_ |
_o__)  |
Ben Finney



Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.

2010-12-01 Thread Terry Reedy

On 12/1/2010 8:22 PM, Michael Foord wrote:


I would still be tempted to go through a single release of deprecation.
You can add a test that the names are gone if the version of Python is
3.3. When the tests start failing the code and the tests can be ripped out.


I was wondering how people remember...
It would be nice if there were instead a central place to 'deposit' 
simple future patches that just consist of removals.


--
Terry Jan Reedy



Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Terry Reedy

On 12/1/2010 8:17 PM, Michael Foord wrote:


It is well *possible* that there are packages with a runtime dependency
on libraries in mercurial however. Those would need mercurial porting to
Python 3 if they are to run on Python 3. If they simply shell out to
mercurial that wouldn't be the case.


It would be nice if all the Python-coded tools needed to work on Python3 
ran on Python3, so one did not have to install 2.x just for that purpose 
;-).


Does Sphinx run on PY3 yet?

--
Terry Jan Reedy



Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 9:53 PM, Terry Reedy tjre...@udel.edu wrote:
..
 Does Sphinx run on PY3 yet?

It does, but see issue10224 for details.

 http://bugs.python.org/issue10224


Re: [Python-Dev] Deprecating undocumented, unused functions in difflib.

2010-12-01 Thread Terry Reedy

On 12/1/2010 8:22 PM, Nick Coghlan wrote:

On Thu, Dec 2, 2010 at 6:23 AM, Terry Reedy tjre...@udel.edu wrote:

It would be easiest to just remove the two lines above.
Or should I define functions with _xxx names that issue a deprecation warning and
attach them as attributes to each object? (Defining instance methods would
not be the same).


Given that functions are converted to bound methods only on retrieval
from an instance, why wouldn't it be the same?


The two SequenceMatcher instance attributes are bound functions of the 
two sets, not of the instance. But you are right in the sense that the 
usage would be the same. Since, as of a week ago, the sets were 
implemented as dicts, any code depending on the class of the underlying 
instance is already broken. So I will go with S-M methods and add a doc 
string like "Undocumented, deprecated method that will disappear in 3.3. 
Do not use!" to show up in a help() listing.
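Roughly, the plan amounts to something like the following sketch. MatcherSketch is a hypothetical stand-in, not the real difflib.SequenceMatcher, and the attribute names bjunk and bpopular for the exposed sets are assumptions made for illustration.

```python
import warnings


class MatcherSketch:
    """Hypothetical stand-in for difflib.SequenceMatcher; not the real class."""

    def __init__(self, junk=(), popular=()):
        # Assumed names for the directly exposed sets.
        self.bjunk = set(junk)
        self.bpopular = set(popular)

    def isbjunk(self, item):
        "Undocumented, deprecated method that will disappear in 3.3. Do not use!"
        warnings.warn("isbjunk() is deprecated; use 'item in matcher.bjunk'",
                      DeprecationWarning, stacklevel=2)
        return item in self.bjunk

    def isbpopular(self, item):
        "Undocumented, deprecated method that will disappear in 3.3. Do not use!"
        warnings.warn("isbpopular() is deprecated; use 'item in matcher.bpopular'",
                      DeprecationWarning, stacklevel=2)
        return item in self.bpopular
```

Existing callers keep working for one release, see a DeprecationWarning pointing at their own call site (because of stacklevel=2), and can switch to a plain membership test on the set.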



But yes, if you want to get rid of them, then deprecation for 3.2 and
removal in 3.3 is the way to go.


OK.


Alternatively, not deprecating them at all and just leaving them
undocumented with a comment in the source to say they have been
deliberately omitted from the docs would also be fine.


Too messy and too useless ;-).

--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] r86924 - python/branches/py3k/Doc/library/random.rst

2010-12-01 Thread Nick Coghlan
On Thu, Dec 2, 2010 at 12:41 PM, raymond.hettinger
python-check...@python.org wrote:
 +A more general approach is to arrange the weights in a cumulative probability
 +distribution with :func:`itertools.accumulate`, and then locate the random value
 +with :func:`bisect.bisect`::
 +
 +    >>> choices, weights = zip(*weighted_choices)
 +    >>> cumdist = list(itertools.accumulate(weights))
 +    >>> x = random.random() * cumdist[-1]
 +    >>> choices[bisect.bisect(cumdist, x)]
 +    'Blue'

Neat example, although it would be easier to follow if you broke that
last line into two pieces:

>>> random_index = bisect.bisect(cumdist, x)
>>> choices[random_index]
'Blue'

It took me a moment to remember how bisect.bisect worked, but it would
have been instant if the return value was assigned to an appropriately
named variable.
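Put together as a self-contained function, the suggestion might read as in the sketch below. The sample weighted_choices list and the rng parameter are assumptions added for illustration; only the accumulate/bisect steps come from the checkin being discussed.

```python
import bisect
import itertools
import random

# Assumed sample data: (choice, weight) pairs.
weighted_choices = [("Red", 3), ("Blue", 2), ("Yellow", 1), ("Green", 4)]


def weighted_choice(pairs, rng=random):
    """Pick one choice with probability proportional to its weight."""
    choices, weights = zip(*pairs)
    cumdist = list(itertools.accumulate(weights))  # e.g. [3, 5, 6, 10]
    x = rng.random() * cumdist[-1]                 # point in [0, total weight)
    random_index = bisect.bisect(cumdist, x)       # first bucket whose bound exceeds x
    return choices[random_index]


print(weighted_choice(weighted_choices))
```

Naming the intermediate value random_index makes it clear that bisect.bisect returns a position in cumdist, which is then used to index choices.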

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Terry Reedy

On 12/1/2010 7:44 PM, Alexander Belopolsky wrote:


it.  The argument was that if there was a use case for parsing Eastern
Arabic numerals, it would be better served by a module written by
someone who speaks one of the Arabic languages and knows the details
of how  Eastern Arabic numerals are written.  So far nobody has even
claimed to know conclusively that Arabic-Indic digits are always
written left-to-right.


Both my personal observations when travelling from Turkey to India and 
Wikipedia say yes. When representing a number in Arabic, the 
lowest-valued position is placed on the right, so the order of positions 
is the same as in left-to-right scripts.

https://secure.wikimedia.org/wikipedia/en/wiki/Arabic_language#Numerals

--
Terry Jan Reedy



Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Alexander Belopolsky
On Wed, Dec 1, 2010 at 10:11 PM, Terry Reedy tjre...@udel.edu wrote:
 On 12/1/2010 7:44 PM, Alexander Belopolsky wrote:

 it.  The argument was that if there was a use case for parsing Eastern
 Arabic numerals, it would be better served by a module written by
 someone who speaks one of the Arabic languages and knows the details
 of how  Eastern Arabic numerals are written.  So far nobody has even
 claimed to know conclusively that Arabic-Indic digits are always
 written left-to-right.

 Both my personal observations when travelling from Turkey to India and
 Wikipedia say yes. When representing a number in Arabic, the lowest-valued
 position is placed on the right, so the order of positions is the same as in
 left-to-right scripts.
 https://secure.wikimedia.org/wikipedia/en/wiki/Arabic_language#Numerals

This matches my limited research on this topic as well.  However, I am
not sure that when these codes are embedded in Arabic text, their
logical order always matches their display order.  It seems to me that
it can go either way depending on the surrounding text and/or presence
of explicit formatting codes.  Also, I don't understand why Eastern
Arabic-Indic digits have the same Bidi-Class as European digits, but
Arabic-Indic digits, Arabic decimal and thousands separators have
Bidi-Class AN.

http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types
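The Bidi classes in question can be inspected directly with unicodedata. The sketch below only reports what the interpreter's bundled UCD says; the selection of sample characters and the helper name bidi_classes are illustrative choices.

```python
import unicodedata

SAMPLES = [
    "4",        # DIGIT FOUR (European)
    "\u0664",   # ARABIC-INDIC DIGIT FOUR
    "\u06f4",   # EXTENDED ARABIC-INDIC DIGIT FOUR (Eastern Arabic-Indic)
    "\u066b",   # ARABIC DECIMAL SEPARATOR
    "\u066c",   # ARABIC THOUSANDS SEPARATOR
]


def bidi_classes(chars):
    """Map each character's Unicode name to its Bidi class."""
    return {unicodedata.name(ch): unicodedata.bidirectional(ch) for ch in chars}


for name, bidi in bidi_classes(SAMPLES).items():
    print(bidi, name)
```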


Re: [Python-Dev] ICU

2010-12-01 Thread Alexander Belopolsky
On Tue, Nov 30, 2010 at 3:13 PM, Antoine Pitrou solip...@pitrou.net wrote:

 Oh, about ICU:

  Actually, I remember you saying that locale should ideally be replaced
  with a wrapper around the ICU library.

 By that, I stand - however, I have given up the hope that this will
 happen anytime soon.

 Perhaps this could be made a GSOC topic.


Incidentally, this may also address another Python's Achilles' heel:
the timezone support.

http://icu-project.org/download/icutzu.html


Re: [Python-Dev] Porting Ideas

2010-12-01 Thread Toshio Kuratomi
On Wed, Dec 01, 2010 at 10:06:24PM -0500, Alexander Belopolsky wrote:
 On Wed, Dec 1, 2010 at 9:53 PM, Terry Reedy tjre...@udel.edu wrote:
 ..
  Does Sphinx run on PY3 yet?
 
 It does, but see issue10224 for details.
 
  http://bugs.python.org/issue10224

Also, docutils has an unported module.

/me needs to write a bug report for that as he really doesn't have the time
he thought he did to perform the port.

-Toshio




Re: [Python-Dev] Python and the Unicode Character Database

2010-12-01 Thread Stephen J. Turnbull
Ben Finney writes:

  Input from an existing text file, as I said earlier. Or any other way of
  text data making its way into a Python program.

  Direct entry at the console is a red herring.

I don't think it is.  Not at all.  Here's why: '''print "%d" %
some_integer''' doesn't now, and never will (unless Kristan gets his
Python 2.8 <wink>), produce Arabic or Han numerals.  Not in any
language I know of, not in Microsoft Excel, and definitely not in
Python 2.  *Somebody* typed that text at some point.  If it's Han,
that somebody had *way* too much time on his hands, not a working
accountant nor a graduate assistant in a research lab for sure.

How about old archived texts, copied and recopied?  At least for
Japanese, old archival (text) data will *all* be in ASCII, because the
earliest implementations of Japanese language text used JIS X 0201 (or
its predecessor), which doesn't have Han digits (and kana digits don't
exist even if you write with a brush and ink AFAIK).  Ditto Arabic, I
would imagine; ISO 8859/6 (aka Latin/Arabic) does not contain the
Arabic digits that have been presented here earlier AFAICT.  Note that
there's plenty of space for them in that code table (eg, 0xB0-0xB9 is
empty).  Apparently nobody *ever* thought it was useful to have them!
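The claim about ISO 8859-6 can be checked against Python's own codec, as in this sketch. It assumes the codec shipped with Python 3 under the name "iso8859-6"; the helper names encodable and decodable are hypothetical.

```python
def encodable(text, encoding="iso8859-6"):
    """Return True if `text` can be encoded in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodable(raw, encoding="iso8859-6"):
    """Return True if the bytes `raw` decode in `encoding` (hypothetical helper)."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(encodable("1234"))                      # ASCII digits
print(encodable("\u0627"))                    # ARABIC LETTER ALEF
print(encodable("\u0661\u0662\u0663\u0664"))  # Arabic-Indic digits
print([decodable(bytes([b])) for b in range(0xB0, 0xBA)])  # the unused slots
```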

So, which culture, using which script and in which application, inputs
numeric data in other than ASCII digits?  Or would want to, if only
somebody would tell them they can do it in Python?  Hearsay will do,
for starters.