[issue12741] Add function similar to shutil.move that does not overwrite

2011-08-14 Thread David Townshend

David Townshend aquavita...@gmail.com added the comment:

A bit of research has shown that the proposed implementation will not work 
either, so my next suggestion is something along the lines of

def move2(src, dst):
    try:
        os.link(src, dst)
    except OSError as err:
        # handle error appropriately, raise shutil.Error if dst exists,
        # or use shutil.copy2 if dst is on a different filesystem.
        pass
    else:
        os.unlink(src)
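Spelling that comment out, one way the handling could look (a sketch only; the specific errno checks and the copy-then-unlink fallback policy are my assumptions, not part of the proposal):

```python
import errno
import os
import shutil


def move2(src, dst):
    """Move src to dst, refusing to overwrite an existing dst (sketch)."""
    try:
        os.link(src, dst)  # atomic when src and dst share a filesystem
    except OSError as err:
        if err.errno == errno.EEXIST:
            raise shutil.Error("Destination path %r already exists" % dst)
        if err.errno == errno.EXDEV:
            # dst is on a different filesystem: fall back to copy + remove
            if os.path.exists(dst):
                raise shutil.Error("Destination path %r already exists" % dst)
            shutil.copy2(src, dst)
            os.unlink(src)
            return
        raise
    else:
        os.unlink(src)
```

Note the race window in the EXDEV branch: unlike os.link, the exists-then-copy sequence is not atomic.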

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti ezio.melo...@gmail.com added the comment:

 It is simply a design error to pretend that the number of characters
 is the number of code units instead of code points.  A terrible and
 ugly one, but it does not mean you are UCS-2.

 If you are referring to the value returned by len(unicode_string), it
 is the number of code units.  This is a matter of practicality beats
 purity.  Returning the number of code units is O(1) (num_of_bytes/2).
 To calculate the number of characters it's instead necessary to scan
 all the string looking for surrogates and then count any surrogate
 pair as 1 character.  It was therefore decided that it was not worth
 slowing down the common case just to be 100% accurate in the
 uncommon case.

If speed is more important than correctness, I can make any algorithm
infinitely fast.  Given the choice between correct and quick, I will 
take correct every single time.

Plus your strings are immutable! You know how long they are and they 
never change.  Correctness comes at a negligible cost.  

It was a bad choice to return the wrong answer.
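The disagreement here can be made concrete. On a narrow build len() counted UTF-16 code units; counting code points requires scanning for surrogate pairs. A sketch of the difference (written for a modern Python, where str is code-point based):

```python
s = "a\U0001D49Cz"  # 'a', U+1D49C MATHEMATICAL SCRIPT CAPITAL A, 'z'

# UTF-16 code units: the non-BMP character needs a surrogate pair
units = len(s.encode("utf-16-le")) // 2  # 4 code units

# code points: what len() returns on wide builds (and all of 3.3+)
points = len(s)                          # 3 code points

# recovering the code-point count from the unit count means scanning
# the string and subtracting one per surrogate pair
pairs = sum(1 for ch in s if ord(ch) > 0xFFFF)
assert units - pairs == points
```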

 That said it would be nice to have an API (maybe in unicodedata or as
 new str methods?) able to return the number of code units, code
 points, graphemes, etc, but I'm not sure that it should be the default
 behavior of len().

Always code points, never code units.  I even use a class whose length
method returns the grapheme count, because even code points aren't good
enough.  Yes of course graphemes have to be counted.  Big deal.   How 
would you like it if you said to move three to the left in vim and 
it *didn't* count each graphemes as one position?  Madness.
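The stdlib still has no grapheme-cluster API; a crude approximation that counts only base characters and ignores combining marks (nowhere near full UAX #29 segmentation, just an illustration) looks like:

```python
import unicodedata


def approx_graphemes(s):
    """Rough grapheme count: code points that are not combining marks."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


s = "e\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT: two code points
assert len(s) == 2
assert approx_graphemes(s) == 1  # but one user-perceived character
```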

 The ugly terrible design error is digusting and wrong, just as much
 in Python as in Java, and perhaps moreso because of the idiocy of
 narrow builds even existing.

 Again, wide builds use twice as much the space than narrow ones, but
 one the other hand you can have fast and correct behavior with e.g.
 len().  If people don't care about/don't need to use non-BMP chars and
 would rather use less space, they can do so.  Until we agree that the
 difference in space used/speed is no longer relevant and/or that non-
 BMP characters become common enough to prefer the correct behavior
 over the fast-but-inaccurate one, we will probably keep both.

Which is why I always put loud warnings in my Unicode-related Python
programs that they do not work right on Unicode if running under
a narrow build.  I almost feel I should just exit.

 I haven't checked its UTF-16 codecs, but Python's UTF-8 codec is
 broken in a bunch of ways.  You should be raising an exception in
 all kinds of places and you aren't.

 I am aware of some problems of the UTF-8 codec on Python 2.  It used
 to follow RFC 2279 until last year and now it's been updated to follow
 RFC 3629.

Unicode says you can't put surrogates or noncharacters in a UTF-anything 
stream.  It's a bug to do so and pretend it's a UTF-whatever.

Perl has an encoding form, which it does not call UTF-8, that you 
can use the UTF-8 algorithm on for any code point, including noncharacters
and surrogates and even non-Unicode code points far above 0x10_FFFF, up
to in fact 0xFFFF_FFFF_FFFF_FFFF on 64-bit machines.  It's the internal
format we use in memory.  But we don't call it real UTF-8, either.

It sounds like this is the kind of thing that would be useful to you.

 However, for backward compatibility, it still encodes/decodes
 surrogate pairs.  This broken behavior has been kept because on Python
 2, you can encode every code point with UTF-8, and decode it back
 without errors:

No, that's not UTF-8 then.  By definition.  See the Unicode Standard.

 x = [unichr(c).encode('utf-8') for c in range(0x110000)]


 and breaking this invariant would probably make more harm than good.

Why?  Create something called utf8-extended or utf8-lax or utf8-nonstrict
or something.  But you really can't call it UTF-8 and do that.  

We actually equate UTF-8 and utf8-strict.  Our internal extended
UTF-8 is something else.  It seems like you're still doing the old
relaxed version we used to have until 2003 or so.  It seems useful
to be able to have both flavors, the strict and the relaxed one,
and to call them different things.  

Perl defaults to the relaxed one, which gives warnings not exceptions,
if you do things like setting PERL_UNICODE to S or SD and such for the
default I/O encoding.  If you actually use UTF-8 as the encoding on the
stream, though, you get the version that gives exceptions instead.  

UTF-8 = utf8-strict   strictly by the standard, raises exceptions otherwise
utf8                  loosely only, emits warnings on encoding illegal things
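Python 3 draws the same strict/relaxed line, but through codec error handlers rather than two codec names; a sketch of both flavors applied to a lone surrogate ('surrogatepass' being the relaxed, opt-in one):

```python
lone = "\ud800"  # a lone high surrogate, not a valid Unicode scalar value

# strict UTF-8 (the default) refuses to encode it
try:
    lone.encode("utf-8")
    raise AssertionError("strict UTF-8 should have refused")
except UnicodeEncodeError:
    pass

# the relaxed flavor is opt-in and is not called plain "UTF-8"
raw = lone.encode("utf-8", "surrogatepass")
assert raw == b"\xed\xa0\x80"                       # UTF-8-style bytes for U+D800
assert raw.decode("utf-8", "surrogatepass") == lone  # round-trips
```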

We currently only emit warnings or raise exceptions on I/O, not on chr
operations and such.  We used to raise exceptions on things like
chr(0xD800), but that was a mistake caused by misunderstanding the
in-memory requirements being 

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

 If speed is more important than correctness, I can make any algorithm
 infinitely fast.  Given the choice between correct and quick, I will 
 take correct every single time.

It's a trade-off.  Using non-BMP chars is fairly unusual (many real-world 
applications hardly use non-ASCII chars).  Slowing everything down just to 
allow non-BMP chars on narrow builds is not a good idea IMHO.  Wide builds can 
be used if one really wants len() and other methods to work properly with 
non-BMP chars.

 Plus your strings are immutable! You know how long they are and they 
 never change.  Correctness comes at a negligible cost. 

Sure, we can cache the len, but we still have to compute it at least once.  
Also it's not just len(), but many other operations like slicing that are 
affected.

 Unicode says you can't put surrogates or noncharacters in a 
 UTF-anything stream.  It's a bug to do so and pretend it's a 
 UTF-whatever.

The UTF-8 codec described by RFC 2279 didn't say so, so, since our codec was 
following RFC 2279, it was producing valid UTF-8.  With RFC 3629 a number of 
things changed in a non-backward compatible way.  Therefore we couldn't just 
change the behavior of the UTF-8 codec nor rename it to something else in 
Python 2.  We had to wait till Python 3 in order to fix it.

 Perl has an encoding form, which it does not call UTF-8, that you
 can use the UTF-8 algorithm on for any code point, including 
 noncharacters and surrogates and even non-Unicode code points far
 above 0x10_FFFF, up to in fact 0xFFFF_FFFF_FFFF_FFFF on 64-bit 
 machines.  It's the internal format we use in memory.  But we don't
 call it real UTF-8, either.

This sounds like RFC 2279 UTF-8.  It allowed up to 6 bytes (following the same 
encoding scheme) and had no restrictions about surrogates (at the time I think 
only BMP chars existed, so there were no surrogates and the Unicode consortium 
hadn't yet decided that the limit was 0x10FFFF).

 It sounds like this is the kind of thing that would be useful to you.

I believe this is what the surrogateescape error handler does (up to 0x10FFFF).
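For reference, surrogateescape smuggles undecodable bytes through a str as lone surrogates in the U+DC80..U+DCFF range, and round-trips them back out on encoding:

```python
data = b"abc\xff\xfe"  # not valid UTF-8

# each bad byte 0xNN becomes the lone surrogate U+DCNN
s = data.decode("utf-8", "surrogateescape")
assert s == "abc\udcff\udcfe"

# encoding with the same handler restores the original bytes exactly
assert s.encode("utf-8", "surrogateescape") == data
```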

 Why?  Create something called utf8-extended or utf8-lax or 
 utf8-nonstrict or something.  But you really can't call it UTF-8 and 
 do that. 

That's what we did in Python 3, but on Python 2 it is too late to fix it, 
especially in a point release.  (Just to clarify, I don't think any of these 
things will be fixed in 2.7.  There won't be any 2.8, and major changes 
(especially backwards-incompatible ones) are unlikely to happen in a point 
release (e.g. 2.7.3), so it's better to focus on Python 3.  Minor bug fixes can 
still be done even in 2.7 though.)

 Perl defaults to the relaxed one, which gives warnings not exceptions,
 if you do things like setting PERLUNICODE to S or SD and such for the
 default I/I encoding.  If you actually use UTF-8 as the encoding on 
 the stream, though, you get the version that gives exceptions 
 instead.

In Python we don't usually use warnings for this kind of thing (we also don't 
have anything like Perl's use strict).

 I don't imagine most of the Python devel team knows Perl very well,
 and maybe not even Java or ICU.  So I get the idea that there isn't 
 as much awareness of Unicode in your team as there tends to be in
 those others.

I would say there are at least 5-10 Unicode experts in our team.  It might be 
true though that we don't always follow closely what other languages and the 
Unicode consortium do, but if people report problems we are willing to fix them 
(so thanks for reporting them!).

 From my point of view, learning from other people's mistakes is a way
 to get ahead without incurring all the learning-bumps oneself, so if
 there's a way to do that for you, that could be to your benefit, and 
 I'm very happy to share some of our blunders so you can avoid them
 yourselves.

While I really appreciate the fact that you are sharing with us your 
experience, the solution found and applied in Perl might not always be the best 
one for Python (but it's still good to learn from others' mistakes).
For example I don't think removing the 0x10FFFF upper limit is going to happen 
-- even if it might be useful for other things.
Also regular expressions are not part of the core and are not used that often, 
so I consider problems with narrow/wide builds, codecs and the unicode type 
much more important than problems with the re/regex module (they should be 
fixed too, but have lower priority IMHO).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12743] C API marshalling doc contains XXX

2011-08-14 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Would you just remove the "XXX" string, or the entire comment? "XXX" is 
typically used to indicate that something needs to be done, and the comment 
makes a clear statement as to what it is that needs to be done.

--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12743
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread Ned Deily

Ned Deily n...@acm.org added the comment:

Chances are that you used the python.org 2.7.2 64-bit/32-bit installer but you 
did not install the latest ActiveState Tcl, currently 8.5.10, as documented 
here:

http://www.python.org/download/mac/tcltk/

On OS X 10.6, there should have been a warning message about this in the IDLE 
shell window. The Apple-supplied Tcl/Tk 8.5 in both Mac OS X 10.6 and 10.7 have 
known problems as described in the web page above.  Please try with the latest 
ActiveState Tcl installed and reopen this issue if that does not resolve the 
problems you see.

--
assignee: ronaldoussoren -> ned.deily
resolution:  -> works for me
stage:  -> committed/rejected
status: open -> pending

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread hy

hy hoyeung...@gmail.com added the comment:

Thanks, but the problem is not completely solved.
I followed your instruction and I can now use the mouse to click the menu to 
copy and paste without problems.
But it still halts when using the keyboard to do so.
Is there a complete solution?

--
resolution: works for me -> wont fix
status: pending -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Jeremy Kloth

Changes by Jeremy Kloth jeremy.kl...@gmail.com:


--
nosy: +jkloth

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen

New submission from Tom Christiansen tchr...@perl.com:

On neither narrow nor wide builds does this UTF8-encoded bit run without 
raising an exception: 

    if re.search("[𝒜-𝒵]", "𝒞", re.UNICODE): 
        print("match 1 passed")
    else:
        print("match 2 failed")

The best you can possibly do is to use both a wide build *and* symbolic 
literals, in which case it will pass. But remove either or both of those 
conditions and you fail.  This is too restrictive for full Unicode use. 

There should never be any situation where [a-z] fails to match c when a < c < z, 
and neither a nor z is something special in a character class.  There is, or 
perhaps should be, no difference at all between [a-z] and [𝒜-𝒵], just as 
there is, or at least should be, no difference between c and 𝒞. You can’t 
have second-class citizens like this that can't be used.

And no, this one is *not* fixed by Matthew Barnett's regex library. There is 
some dumb UCS-2 assumption lurking deep in Python somewhere that makes this 
break, even on wide builds, which is incomprehensible to me.
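For the record, this complaint was ultimately addressed by PEP 393 (Python 3.3+), which made str code-point based on every build; the equivalent of the failing test now passes everywhere:

```python
import re

# [𝒜-𝒵] is U+1D49C..U+1D4B5, and 𝒞 (U+1D49E) falls inside that range
m = re.search("[\U0001D49C-\U0001D4B5]", "\U0001D49E")
assert m is not None
assert m.group() == "\U0001D49E"
```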

--
components: Regular Expressions
files: bigrange.py
messages: 142058
nosy: Arfrever, ezio.melotti, jkloth, mrabarnett, pitrou, r.david.murray, 
tchrist, terry.reedy
priority: normal
severity: normal
status: open
title: lib re cannot match non-BMP ranges (all versions, all builds)
type: behavior
versions: Python 3.2
Added file: http://bugs.python.org/file22897/bigrange.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

On a wide 2.7 and 3.3 all three tests pass.

On a narrow 3.2 I get 
match 1 passed
Traceback (most recent call last):
  File "/home/wolf/dev/py/3.2/Lib/functools.py", line 176, in wrapper
    result = cache[key]
KeyError: (<class 'str'>, '[𝒜-𝒵]', 32)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bigrange.py", line 16, in <module>
    if re.search("[𝒜-𝒵]", "𝒞", flags): 
  File "/home/wolf/dev/py/3.2/Lib/re.py", line 158, in search
    return _compile(pattern, flags).search(string)
  File "/home/wolf/dev/py/3.2/Lib/re.py", line 255, in _compile
    return _compile_typed(type(pattern), pattern, flags)
  File "/home/wolf/dev/py/3.2/Lib/functools.py", line 180, in wrapper
    result = user_function(*args, **kwds)
  File "/home/wolf/dev/py/3.2/Lib/re.py", line 267, in _compile_typed
    return sre_compile.compile(pattern, flags)
  File "/home/wolf/dev/py/3.2/Lib/sre_compile.py", line 491, in compile
    p = sre_parse.parse(p, flags)
  File "/home/wolf/dev/py/3.2/Lib/sre_parse.py", line 692, in parse
    p = _parse_sub(source, pattern, 0)
  File "/home/wolf/dev/py/3.2/Lib/sre_parse.py", line 315, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/wolf/dev/py/3.2/Lib/sre_parse.py", line 461, in _parse
    raise error("bad character range")
sre_constants.error: bad character range

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

On wide 3.2 it passes too, so the failure is limited to narrow builds (are you 
sure that it fails on wide builds for you?).

On a narrow 2.7 I get a slightly different error though:

match 1 passed
Traceback (most recent call last):
  File "bigrange.py", line 16, in <module>
    if re.search("[𝒜-𝒵]", "𝒞", flags): 
  File "/home/wolf/dev/py/2.7/Lib/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/home/wolf/dev/py/2.7/Lib/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

I haven't looked at the code, but I think that the re module is just trying to 
calculate the range between the low surrogate of 𝒜 and the high surrogate of 𝒵.
If this is the case, this is the usual bug that narrow builds have.

Also note that re.search(u"[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL 
SCRIPT CAPITAL Z}]".encode('utf-8'), u"\N{MATHEMATICAL SCRIPT CAPITAL 
C}".encode('utf-8'), re.UNICODE)
works, but it returns a wrong result.
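The surrogate arithmetic makes the narrow-build failure easy to see: encoded as UTF-16, [𝒜-𝒵] becomes the five units \ud835 \udc9c - \ud835 \udcb5, so the parser sees a "range" from \udc9c down to \ud835, which runs backwards. A sketch of the arithmetic:

```python
def utf16_surrogates(cp):
    """Split a non-BMP code point into its UTF-16 surrogate pair."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)


hi_a, lo_a = utf16_surrogates(0x1D49C)  # 𝒜
hi_z, lo_z = utf16_surrogates(0x1D4B5)  # 𝒵

assert (hi_a, lo_a) == (0xD835, 0xDC9C)
assert (hi_z, lo_z) == (0xD835, 0xDCB5)

# a narrow build parsed [𝒜-𝒵] as the code units [\ud835 \udc9c - \ud835 \udcb5]:
# the attempted range runs from lo_a down to hi_z, i.e. backwards
assert lo_a > hi_z  # 0xDC9C > 0xD835, hence "bad character range"
```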

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread Ned Deily

Ned Deily n...@acm.org added the comment:

That is encouraging.  This is almost certainly a problem with Tk.  The Cocoa 
Tcl/Tk 8.5 used by Apple and ActiveState has been known to have issues with 
composite characters.  There are a couple of IDLE things to ask about first.  
Have you made any Custom Key Bindings for IDLE?  Or added any IDLE extensions?  
Both of these would show up in your ~/.idlerc directory.

On to Tk-related questions:  Which OS X keyboard layout are you using?  Are you 
using any Input Methods?  (Both of these options are shown in System 
Preferences.)  What keystrokes are used for the menu shortcuts that cause the 
hang?  And, by hang, you mean that menu item changes color indicating that it 
is selected but IDLE freezes at that point?

If you have the time and feel comfortable doing so, it would be helpful to know 
if the same problems are displayed using the older Carbon Tcl/Tk 8.4.  You 
could temporarily move your current 2.7 installation out of the way by doing 
this in a Terminal shell:

cd /Library/Frameworks/Python.framework/Versions
sudo mv 2.7 2.7-SAVED
cd /Applications
sudo mv Python\ 2.7 Python\ 2.7-SAVED

and then downloading and installing the 32-bit-only (10.3+) 2.7.2 installer 
from python.org.  It is not necessary to install an ActiveState Tcl/Tk 8.4 for 
this.  Note that if you have migrated to OS X 10.7 already, you probably will 
not want to stay with this version because it is not easy with Xcode 4 to 
install third-party Python packages that require building C extension modules.  
You can restore your previous Python by:

cd /Library/Frameworks/Python.framework/Versions
sudo mv 2.7-SAVED 2.7
cd /Applications
sudo mv Python\ 2.7-SAVED Python\ 2.7

--
resolution: wont fix -> 
stage: committed/rejected -> 

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

The error on 3.2 comes from the lru_cache, here's a minimal testcase to 
reproduce it:
>>> from functools import lru_cache
>>> @lru_cache()
... def func(arg): raise ValueError()
... 
>>> func(3)
Traceback (most recent call last):
  File "/home/wolf/dev/py/3.2/Lib/functools.py", line 176, in wrapper
    result = cache[key]
KeyError: (3,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wolf/dev/py/3.2/Lib/functools.py", line 180, in wrapper
    result = user_function(*args, **kwds)
  File "<stdin>", line 2, in func
ValueError


Raymond, is this expected or should I open another issue?

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote
   on Sun, 14 Aug 2011 07:15:09 -0000:

 Unicode says you can't put surrogates or noncharacters in a
 UTF-anything stream.  It's a bug to do so and pretend it's a
 UTF-whatever.

 The UTF-8 codec described by RFC 2279 didn't say so, so, since our
 codec was following RFC 2279, it was producing valid UTF-8.  With RFC
 3629 a number of things changed in a non-backward compatible way.
 Therefore we couldn't just change the behavior of the UTF-8 codec nor
 rename it to something else in Python 2.  We had to wait till Python 3
 in order to fix it.

I'm a bit confused on this.  You no longer fix bugs in Python 2?

I've dug out the references that state that you are not allowed to do things 
the way you are doing them.  This is from the published Unicode Standard 
version 6.0.0, chapter 3, Conformance.  It is a very important chapter.

http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

Python is in violation of that published Standard by interpreting noncharacter 
code points as abstract characters and tolerating them in character encoding 
forms like UTF-8 or UTF-16.  This explains that conformant processes are 
forbidden from doing this.

Code Points Unassigned to Abstract Characters

 C1 A process shall not interpret a high-surrogate code point or a 
    low-surrogate code point as an abstract character.
    · The high-surrogate and low-surrogate code points are designated for 
      surrogate code units in the UTF-16 character encoding form. They are 
      unassigned to any abstract character.

==> C2 A process shall not interpret a noncharacter code point as an abstract 
    character.
    · The noncharacter code points may be used internally, such as for 
      sentinel values or delimiters, but should not be exchanged publicly.

 C3 A process shall not interpret an unassigned code point as an abstract 
    character.
    · This clause does not preclude the assignment of certain generic 
      semantics to unassigned code points (for example, rendering with a 
      glyph to indicate the position within a character block) that allow for 
      graceful behavior in the presence of code points that are outside a 
      supported subset.
    · Unassigned code points may have default property values. (See D26.)
    · Code points whose use has not yet been designated may be assigned to 
      abstract characters in future versions of the standard. Because of this 
      fact, due care in the handling of generic semantics for such code 
      points is likely to provide better robustness for implementations that 
      may encounter data based on future versions of the standard.

Next we have exactly how something you call UTF-{8,16,32} must be formed.
*This* is the Standard against which these things are measured; it is not 
the RFC.

You are of course perfectly free to say you conform to this and that RFC, but 
you must not say you conform to the Unicode Standard when you don't.  These 
are different things.  I feel it does users a grave disservice to ignore the 
Unicode Standard in this, and sheer casuistry to rely on an RFC definition 
while ignoring the Unicode Standard whence it originated, because this borders 
on being intentionally misleading.

Character Encoding Forms

 C8 When a process interprets a code unit sequence which purports to be in a 
    Unicode character encoding form, it shall interpret that code unit 
    sequence according to the corresponding code point sequence.
==> · The specification of the code unit sequences for UTF-8 is given in D92.
    · The specification of the code unit sequences for UTF-16 is given in D91.
    · The specification of the code unit sequences for UTF-32 is given in D90.

 C9 When a process generates a code unit sequence which purports to be in a 
    Unicode character encoding form, it shall not emit ill-formed code unit 
    sequences.
    · The definition of each Unicode character encoding form specifies the 
      ill-formed code unit sequences in the character encoding form. For 
      example, the definition of UTF-8 (D92) specifies that code unit 
      sequences such as C0 AF are ill-formed.

==> C10 When a process interprets a code unit sequence which purports to be 
    in a Unicode character encoding form, it shall treat ill-formed code unit 
    sequences as an error condition and shall not interpret such sequences as 
    characters.
    · For example, in UTF-8 every code unit of the form 110xxxxx₂ must be 
      followed by a code unit of the form 10xxxxxx₂. A sequence such as 
      110xxxxx₂ 0xxxxxxx₂ is ill-formed and must never be generated. When 
      faced with this ill-formed code unit sequence while transforming or 
      interpreting text, a conformant process must treat the first code unit 
      110xxxxx₂ as an 

[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti ezio.melo...@gmail.com added the comment:

On wide 3.2 it passes too, so the failure is limited to narrow builds (are
you sure that it fails on wide builds for you?).

You're right: my wide build is not Python3, just Python2.  In fact,
it's even worse, because it's the stock build on Linux, which seems
on this machine to be 2.6 not 2.7.

I have private builds that are 2.7 and 3.2, but those are both narrow.
I do not have a 3.3 build.  Should I?

I'm remembering why I removed Python2 from my Unicode talk, because
of how it made me pull my hair out.  People at the talk wanted to know
what I meant, but I didn't have time to go into it.  I think this
gets added to the hairpulling list.

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote
   on Sun, 14 Aug 2011 07:15:09 -0000: 

 For example I don't think removing the 0x10FFFF upper limit is going to
 happen -- even if it might be useful for other things. 

I agree entirely.  That's why I appended a triple exclamation point to where I
said I certainly do not expect this.  It can only work fully on UTF-8ish systems
and up to 32 bits on UTF-32, and it is most emphatically *not* Unicode.  Yes,
there are things you can do with it, but it risks serious misunderstanding and
even nonconformance if not done very carefully.  The Standard does not forbid
such things internally, but you are not allowed to pass them around in
noninternal streams claiming they are real UTF streams.

 Also regular expressions are not part of the core and are not used
 that often, so I consider problems with narrow/wide builds, codecs and
 the unicode type much more important than problems with the re/regex
 module (they should be fixed too, but have lower priority IMHO).

One advantage of having an external library is the ability to update
it asynchronously.  Another is the possibility of swapping it out altogether.
Perl only gained that ability, which Python has always had, some four
years ago with its 5.10 release.  To my knowledge, the only thing people
tend to use this for is to get Russ Cox's re2 library, which has very
different performance characteristics and guarantees that allow it to 
be used in potential starvation denial-of-service situations that the
normal Perl, Python, Java, etc regex engine cannot be safely used for.

-tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

 You're right: my wide build is not Python3, just Python2.

And is it failing?  Here the tests pass on the wide builds, on both Python 2 
and 3.

 In fact, it's even worse, because it's the stock build on Linux, 
 which seems on this machine to be 2.6 not 2.7.

What is worse?  FWIW on my system the default `python` is a 2.7 wide. `python3` 
is a 3.2 wide.

 I have private builds that are 2.7 and 3.2, but those are both narrow.
 I do not have a 3.3 build.  Should I?

3.3 is the version in development, not released yet.  If you have an HG clone 
of Python you can make a wide build of 3.x with ./configure --with-wide-unicode 
and of 2.7 using ./configure --enable-unicode=ucs4.

 I'm remembering why I removed Python2 from my Unicode talk, because
 of how it made me pull my hair out.  People at the talk wanted to know
 what I meant, but I didn't have time to go into it.  I think this
 gets added to the hairpulling list.

I'm not sure what you are referring to here.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 I have private builds that are 2.7 and 3.2, but those are both narrow.
 I do not have a 3.3 build.  Should I?

I don't know if you *should*. But you can make one easily by passing
--with-wide-unicode to ./configure.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

  The UTF-8 codec described by RFC 2279 didn't say so, so, since our
  codec was following RFC 2279, it was producing valid UTF-8.  With RFC
  3629 a number of things changed in a non-backward compatible way.
  Therefore we couldn't just change the behavior of the UTF-8 codec nor
  rename it to something else in Python 2.  We had to wait till Python 3
  in order to fix it.
 
 I'm a bit confused on this.  You no longer fix bugs in Python 2?

In general, we try not to introduce changes that have a high probability
of breaking existing code, especially when what is being fixed is a
minor issue which almost nobody complains about.

This is even truer for stable branches, and Python 2 is very much a
stable branch now (no more feature releases after 2.7).

 That's why I say that you are out of conformance by having encoders and
 decoders of UTF streams tolerate noncharacters.  You are not allowed to call
 something a UTF and do non-UTF things with it, because this is in violation
 of conformance requirement C2.

Perhaps, but it is not Python's fault if the IETF and the Unicode
consortium have disagreed on what UTF-8 should be. I'm not sure what
people called UTF-8 when support for it was first introduced in
Python, but you can't blame us for maintaining a consistent behaviour
across releases.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

 I'm a bit confused on this.  You no longer fix bugs in Python 2?

We do, but it's unlikely that we will introduce major changes in behavior.
Even if we had to get rid of narrow builds and/or fix len(), we would probably 
only do it in the next 3.x version (i.e. 3.3), and not in the next bug fix 
release of 3.2 (i.e. 3.2.2).

 That's why I say that you are out of conformance by having encoders and
 decoders of UTF streams tolerate noncharacters.  You are not allowed
 to call something a UTF and do non-UTF things with it, because this
 is in violation of conformance requirement C2.

This IMHO should be fixed, but it's another issue.

 If you have not reread its Chapter 3 of late in its entirety, you
 probably want to do so.  There is quite a bit of material there that
 is fundamental to any process that claims to be conformant with
 the Unicode Standard.

I am familiar with Chapter 3, but admittedly I only read the parts that 
were relevant to the bugs I was fixing.  I never went through it checking that 
everything in Python matches the described behavior.
Thanks for pointing out the parts where Python doesn't follow the specs.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12266] str.capitalize contradicts oneself

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Attached patch + tests.

--
keywords: +patch
Added file: http://bugs.python.org/file22898/issue12266.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12266
___



[issue12611] 2to3 crashes when converting doctest using reduce()

2011-08-14 Thread Catalin Iacob

Catalin Iacob iacobcata...@gmail.com added the comment:

I looked at this and understood why it's happening. I don't know exactly how to 
fix it though, so here's what I found out.

When a doctest appears in a docstring at line n in a file, 
RefactoringTool.parse_block will return a tree corresponding to n - 1 newline 
characters followed by the code in the doctest. That tree is refactored by 
RefactoringTool.refactor_tree which usually returns n - 1 newline characters 
followed by the refactored doctest. However, for the reduce fixer, the tree 
returned by refactor_tree starts with from functools import reduce followed by 
n - 1 newline characters and then the doctest reduce line. The failing assert 
happens when stripping those newlines because they are expected to be at the 
beginning of the output while in reality they're after the import line.

So the problem is a mismatch between the expectations of the doctest machinery 
(refactoring code that starts with some newlines results in code that starts 
with the same number of newlines) and the reduce fixer which adds an import, 
imports are added at the beginning of the file, therefore something appears 
before the newlines. Other fixers could exhibit the same problem.

--
nosy: +catalin.iacob

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12611
___



[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Stefan Krah

Stefan Krah stefan-use...@bytereef.org added the comment:

I like random tests in the stdlib, otherwise the same thing gets tested
over and over again. `make buildbottest` prints the seed, and you can do
it for a single test as well:

 $ ./python -m test -r test_heapq
Using random seed 5857004
[1/1] test_heapq
1 test OK.


It looks like the choice is between s.nmembers and len(s). I thought
about len(s), but since Struct.pack() returns a bytes object, this
might be confusing.

Struct.arity may be another option. This also reflects that pack()
will be an n-ary function for the given format string (and that
Struct is a packing object, not really a struct itself).


Still, probably I'm +0.5 on 'nmembers' compared to the other options.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

 It looks like the choice is between s.nmembers and len(s). I thought
 about len(s), but since Struct.pack() returns a bytes object, this
 might be confusing.

I agree there's a risk of confusion between len() -> number-of-elements and
size() -> number-of-bytes.
We have a similar confusion with the memoryview object and in retrospect
it's often quite misleading.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote on Sun, 14 Aug 2011 17:15:52 -: 

 You're right: my wide build is not Python3, just Python2.

 And is it failing?  Here the tests pass on the wide builds, on both Python 2 
 and 3.

Perhaps I am doing something wrong?

linux% python --version
Python 2.6.2

linux% python -c 'import sys; print sys.maxunicode'
1114111

linux% cat -n bigrange.py
 1  #!/usr/bin/env python
 2  # -*- coding: UTF-8 -*-
 3  
 4  from __future__ import print_function
 5  from __future__ import unicode_literals
 6  
 7  import re
 8  
 9  flags = re.UNICODE
10  
11  if re.search("[a-z]", "c", flags):
12      print("match 1 passed")
13  else:
14      print("match 1 failed")
15  
16  if re.search("[𝒜-𝒵]", "𝒞", flags):
17      print("match 2 passed")
18  else:
19      print("match 2 failed")
20  
21  if re.search("[\U0001D49C-\U0001D4B5]", "\U0001D49E", flags):
22      print("match 3 passed")
23  else:
24      print("match 3 failed")
25  
26  if re.search("[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL SCRIPT CAPITAL Z}]",
27          "\N{MATHEMATICAL SCRIPT CAPITAL C}", flags):
28      print("match 4 passed")
29  else:
30      print("match 4 failed")

linux% python bigrange.py
match 1 passed
Traceback (most recent call last):
  File bigrange.py, line 16, in module
if re.search("[𝒜-𝒵]", "𝒞", flags): 
  File /usr/lib64/python2.6/re.py, line 142, in search
return _compile(pattern, flags).search(string)
  File /usr/lib64/python2.6/re.py, line 245, in _compile
raise error, v # invalid expression
sre_constants.error: bad character range

 In fact, it's even worse, because it's the stock build on Linux, 
 which seems on this machine to be 2.6 not 2.7.

 What is worse?  FWIW on my system the default `python` is a 2.7 wide. 
 `python3` is a 3.2 wide.

I meant that it was running 2.6 not 2.7.  

 I have private builds that are 2.7 and 3.2, but those are both narrow.
 I do not have a 3.3 build.  Should I?

 3.3 is the version in development, not released yet.  If you have an
 HG clone of Python you can make a wide build of 3.x with ./configure
 --with-wide-unicode andof 2.7 using ./configure --enable-
 unicode=ucs4.

And Antoine Pitrou pit...@free.fr wrote:

 I have private builds that are 2.7 and 3.2, but those are both narrow.
 I do not have a 3.3 build.  Should I?

 I don't know if you *should*. But you can make one easily by passing
 --with-wide-unicode to ./configure.

Oh good.  I need to read configure --help more carefully next time.
I have to do some Lucene work this afternoon, so I can let several builds
chug along.  

Is there a way to easily have these co-exist on the same system?  I'm sure
I have to rebuild all C extensions for the new builds, but I wonder what to
about (for example) /usr/local/lib/python3.2 being able to be only one of
narrow or wide.  Probably I just to go reading the configure stuff better
for alternate paths.  Unsure.  

Variant Perl builds can coexist on the same system with some directories
shared and others not, but I often find other systems aren't quite that
flexible, usually requiring their own dedicated trees.  Manpaths can get
tricky, too.

 I'm remembering why I removed Python2 from my Unicode talk, because
 of how it made me pull my hair out.  People at the talk wanted to know
 what I meant, but I didn't have time to go into it.  I think this
 gets added to the hairpulling list.

 I'm not sure what you are referring to here.

There seem to be many more things to get wrong with Unicode in v2 than in v3.

I don't know how much of this is just my slowness at ramping up the learning
curve, how much is due to historical defaults that don't work well for 
Unicode, and how much is 

Python2:

    re.search(u"[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL SCRIPT CAPITAL Z}]".encode('utf-8'), 
       u"\N{MATHEMATICAL SCRIPT CAPITAL C}".encode('utf-8'), re.UNICODE)

Python3:

    re.search("[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL SCRIPT CAPITAL Z}]",
       "\N{MATHEMATICAL SCRIPT CAPITAL C}", re.UNICODE)

The Python2 version is *much* noisier.  

(1) You have to keep remembering to u"..." everything because neither
    # -*- coding: UTF-8 -*-
nor even
    from __future__ import unicode_literals
suffices.  

(2) You have to manually encode every string, which is utterly bizarre to me.

(3) Plus you then have to turn around and tell re, "Hey, by the way, you know those
Unicode strings I just passed you?  Those are Unicode strings, you know."
Like it couldn't tell that already by realizing it got Unicode not byte 
strings.  So weird.

It's a very awkward model.  Compare Perl's

   "\N{MATHEMATICAL SCRIPT CAPITAL C}" =~ /[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL SCRIPT CAPITAL Z}]/

That's the kind of thing I'm used to.
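For reference, the Python 3 form above runs cleanly on any wide build (and on every build from 3.3 onward, once PEP 393 removed the narrow/wide split):

```python
import re

# Non-BMP character range: U+1D49C (𝒜) through U+1D4B5 (𝒵);
# \N{...} is resolved in the string literal before re ever sees it.
m = re.search("[\N{MATHEMATICAL SCRIPT CAPITAL A}-\N{MATHEMATICAL SCRIPT CAPITAL Z}]",
              "\N{MATHEMATICAL SCRIPT CAPITAL C}")
assert m is not None
assert m.group() == "\U0001D49E"
```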

[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Ezio Melotti rep...@bugs.python.org wrote
   on Sun, 14 Aug 2011 17:46:55 -: 

 I'm a bit confused on this.  You no longer fix bugs in Python 2?

 We do, but it's unlikely that we will introduce major changes in behavior.

 Even if we had to get rid of narrow builds and/or fix len(), we would
 probably only do it in the next 3.x version (i.e. 3.3), and not in the
 next bug fix release of 3.2 (i.e. 3.2.2).

Antoine Pitrou rep...@bugs.python.org wrote
   on Sun, 14 Aug 2011 17:36:42 -:

 This is even truer for stable branches, and Python 2 is very much a
 stable branch now (no more feature releases after 2.7).

Does that mean you now go to 2.7.1, 2.7.2, etc?

I had thought that 2.6 was going to be the last, but then 2.7
came out.  I think I remember Guido said something about there 
never being a 2.10, so I wasn't too surprised to see 2.7.  

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

On a narrow build, \N{MATHEMATICAL SCRIPT CAPITAL A} is stored as 2 code 
units, and neither re nor regex recombine them when compiling a regex or 
looking for a match.

regex supports \xNN, \uNNNN, \UNNNNNNNN and \N{XYZ} itself, so they can be 
used in a raw string literal, but it doesn't recombine code units.

I could add recombination to regex at some point if time has passed and no 
further progress has been made in the language's support for Unicode.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

 Perhaps I am doing something wrong?

That's weird, I tried on a wide Python 2.6.6 too and it works even there.  
Maybe a bug that got fixed between 2.6.2 and 2.6.6?  Or maybe something else?

 Is there a way to easily have these co-exist on the same system?

Here I have different HG clones, one for each release (2.7, 3.2, 3.3), and I 
run ./configure (--with-wide-unicode) && make -j2.  Then I just run ./python 
from there without installing it in the system.
You might do the same or look at make altinstall.  If you run make install 
it will install it as the default Python, so that's probably not what you want.  
Another option is to use virtualenv.

 The Python2 version is *much* noisier.  

Yes, Python 3 fixed many of these things and it's a much cleaner language.

 (1) You have to keep remembering to u"..." everything because neither
     # -*- coding: UTF-8 -*-
 nor even
     from __future__ import unicode_literals
 suffices.  

Before Unicode, Python only had plain (byte) strings; when Unicode strings were 
introduced the u"..." syntax was chosen to distinguish them.  On Python 3, 
"..." is a Unicode string, whereas b"..." is used for bytes.
# -*- coding: UTF-8 -*- is only about the encoding used to save the file, and 
doesn't affect other things.  Also this is the default on Python 3 so it's not 
necessary anymore (it's ASCII (or iso-8859-1?) on Python 2).
from __future__ import unicode_literals allows you to use "..." and b"..." 
instead of u"..." and "..." on Python 2.  In my example I used u"..." to be 
explicit and because I was running from the terminal without using 
unicode_literals.

 (2) You have to manually encode every string, which is utterly
 bizarre to me.

re works with both bytes and Unicode strings, on both Python 2 and Python 3.  I 
was encoding them to see if it was able to handle the range when it was in a 
UTF-8 encoded string, rather than a Unicode string.  Even if it didn't fail 
with an exception, it failed with a wrong result (and that's even worse).

 (3) Plus you then have to turn around and tell re, "Hey, by the way, you
 know those Unicode strings I just passed you?  Those are Unicode 
 strings, you know."
 Like it couldn't tell that already by realizing it got Unicode not
 byte strings.  So weird.

The re.UNICODE flag affects the behavior of e.g. \w and \d; it's not telling 
re that we are passing Unicode strings rather than bytes.  By default on Python 
2 those only match ASCII letters and digits.  This is also fixed on Python 3, 
where by default they match non-ASCII letters and digits (unless you pass 
re.ASCII).
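A quick illustration of those defaults, runnable on Python 3:

```python
import re

# Python 3: \w is Unicode-aware by default, so accented letters match.
assert re.findall(r"\w+", "héllo") == ["héllo"]

# re.ASCII restores the old Python 2 default of ASCII-only \w,
# so the "é" splits the word in two.
assert re.findall(r"\w+", "héllo", re.ASCII) == ["h", "llo"]
```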

 *  Requiring explicitly coded callouts to a library are at best 
 tedious and annoying.  ICU4J's UCharacter and JDK7's Character 
 classes both have
 String  getName(int codePoint)

FWIW we have unicodedata.lookup('SNOWMAN')
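For example, the lookup works by name in both directions:

```python
import unicodedata

# Name -> character, and character -> name:
assert unicodedata.lookup("SNOWMAN") == "\u2603"
assert unicodedata.name("\u2603") == "SNOWMAN"

# Works for non-BMP characters too:
assert unicodedata.lookup("MATHEMATICAL SCRIPT CAPITAL C") == "\U0001D49E"
```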

 One question: If one really must use code point numbers in strings, 
 does Python have any clean uniform way to enter them besides having
 to choose the clunky \u vs \U thing?

Nope.  OTOH it doesn't happen too often that you need those (especially the \U 
version), so I'm not sure that it's worth adding something else just to save a 
few chars (also \x{12345} is only one char less than \U00012345).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

2.7 is the last 2.x.  There won't be any 2.8 (also I never heard that 2.6 was 
supposed to be the last).
We already have 2.7.2, and we will continue with 2.7.3, 2.7.4, etc for a few 
more years.  Eventually 2.7 will only get security fixes and the development 
will be focused on 3.x only.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

BTW, you can find more information about the one-dir-per-clone setup (and other 
useful info) here: 
http://docs.python.org/devguide/committing.html#using-several-working-copies

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue10744] ctypes arrays have incorrect buffer information (PEP-3118)

2011-08-14 Thread Stefan Krah

Stefan Krah stefan-use...@bytereef.org added the comment:

Thanks for the patch. I agree with the interpretation of the format string.
One thing is unclear though: Using this interpretation the multi-dimensional 
array notation in format strings only seems useful for pointers to arrays.

The PEP isn't so clear on that, would you agree?


I'm not done reviewing the patch, just a couple of nitpicks:

  - We need a function declaration of _ctypes_alloc_format_string_with_shape()
in ctypes.h.

  - prefix_len = 32*(ndim+1) + 3: This is surely sufficient, but (ndim+1)
is not obvious to me. I think we need (20 + 1) * ndim + 3.

  - I'd use %zd for Py_ssize_t (I know that in other parts of the
code %ld is used, too).
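A rough sanity check of the suggested bound (an illustrative sketch, assuming a 64-bit Py_ssize_t, so each dimension prints in at most 20 characters including a sign):

```python
def prefix_len_bound(ndim):
    # 20 chars per dimension plus 1 comma each, plus "(", ")" and the NUL byte.
    return (20 + 1) * ndim + 3

# Worst case: every dimension is the largest 64-bit signed value (19 digits).
for ndim in range(1, 10):
    shape = (2**63 - 1,) * ndim
    prefix = "(" + ",".join(map(str, shape)) + ")"
    assert len(prefix) + 1 <= prefix_len_bound(ndim)  # +1 for the NUL
```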

--
assignee: theller -> 
stage:  -> patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10744
___



[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Stefan Krah

Stefan Krah stefan-use...@bytereef.org added the comment:

Just to throw in a new name: Struct.nitems would also be possible.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

Attached patch for this issue.

This only happens on MSVC x64 (I actually tired to repro on Arch Linux x64 
before starting work on it and it didn't repro).

What happens is that MSVC on x64 always passes structures larger than 8 bytes 
by reference. See here: 
http://msdn.microsoft.com/en-us/library/ms235286(v=vs.90).aspx

Now this was accounted for in callproc.c, line 1143 in development branch with 
this:

if (atypes[i]->type == FFI_TYPE_STRUCT
#ifdef _WIN64
    && atypes[i]->size <= sizeof(void *)
#endif
    )
    avalues[i] = (void *)args[i].value.p;
else
    avalues[i] = (void *)&args[i].value;

This fix wasn't made in libffi_msvc/ffi.c though. Here, regardless of whether 
we have an x64 or x86 build, if z >= sizeof(int) we will hit the else branch in 
libffi_msvc/ffi.c at line 114 and do:

  else
{
  memcpy(argp, *p_argv, z);
}
  p_argv++;
  argp += z;

In our case, we copy 28 bytes as arguments (the size of our structure) but in fact 
for x64 we only need 8, as the structure is passed by reference so the argument is 
just a pointer. My patch adjusts z before hitting the if statement on x64, which 
causes a correct copy of just the pointer.
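The size adjustment described above can be modeled in a few lines (an illustrative sketch of the calling convention, not the actual patch; effective_arg_size is a made-up name):

```python
POINTER_SIZE = 8  # sizeof(void *) on x64

def effective_arg_size(z, is_struct, win64):
    """Bytes actually copied into the ffi argument area for one argument."""
    if win64 and is_struct and z > POINTER_SIZE:
        # MSVC x64 passes structs larger than 8 bytes by reference:
        # only the pointer is copied into the argument area.
        return POINTER_SIZE
    return z

assert effective_arg_size(28, True, True) == 8    # the 28-byte struct above
assert effective_arg_size(8, True, True) == 8     # fits in a register, copied whole
assert effective_arg_size(28, True, False) == 28  # x86: copied whole onto the stack
```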

--
nosy: +vladris
Added file: http://bugs.python.org/file22899/issue11835_patch.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11835
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Tom, I appreciate your taking the time to help us improve our Unicode story. I 
agree that the compromises made a decade ago need to be revisited and revised.

I think it will help if you better understand our development process. Our 
current *intent* is that 'Python x.y' be a fixed language and that 'CPython 
x.y.0', '.1', '.2', etc be increasingly (and strictly -- no regressions) better 
implementations of Python x.y. (Of course, the distribution and installation 
names and up-to-now dropping of '.0' may confuse the distinction, but it is 
real.) As a consequence, correct Python x.y code that runs correctly on the 
CPython x.y.z implementation should run correctly on x.y.(z+1).

For the purpose of this tracker, a behavior issue ('bug') is a discrepancy 
between the documented intent of a supported Python x.y and the behavior of the 
most recent CPython x.y.z implementation thereof. A feature request is a design 
issue, a request for a change in the language definition (and in the 
corresponding .0 implementation). Most people (including you, obviously) that 
file feature requests regard them as addressing design bugs. But still, 
language definition bugs are different from implementation bugs.

Of course, this all assumes that the documents are correct and unambiguous. But 
accomplishing that can be as difficult as correct code. Obvious mistakes are 
quickly corrected. Ambiguities in relation to uncontroversial behavior are 
resolved by more exactly specifying the actual behavior. But ambiguities about 
behavior that some consider wrong, are messy. We can consult the original 
author if available, consult relevant tests if present, take votes, but some 
fairly arbitrary decision may be needed. A typical response may be to clarify 
behavior in the docs for the current x.y release and consider behavior changes 
for the next x.(y+1) release.

So the answer to your question, Do we fix bugs?, is that we fix doc and 
implementation behavior bugs in the next micro x.y.z behavior bug-fix release 
and language design bugs in the next minor x.y language release. But note that 
language changes merely have to be improvements for Python in the future 
without necessarily worrying about whether a design decision made years ago was or is a 'bug'.

The purpose of me discussing or questioning the 'type' of some of your issues 
is to *facilitate* change by getting the issue on the right track, in relation 
to our development process, as soon as possible.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Stefan Krah

Changes by Stefan Krah stefan-use...@bytereef.org:


--
nosy: +amaury.forgeotdarc, belopolsky

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11835
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

This is off-topic, but there was discussion on whether or not to have a 2.7. 
The decision was to focus on back-porting things that would make the eventual 
transition to 3.x easier.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12740] Add struct.Struct.nmemb

2011-08-14 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

In general, I think we can prevent confusion about the meaning of __len__ by 
sticking to the general rule:  len(object)==len(list(obj)) for anything that 
produces an iterable result.  In the case of struct, that would be the length 
of the tuple returned by struct.unpack() or the number of values consumed by 
struct.pack().

This choice is similar to what was done for collections.Counter where 
len(Counter(a=10, b=20)) returns 2 (the number of dict keys) rather than 30 
(the number of elements in the Bag-like container).  A similar choice was 
made for structseq objects when len(ss) == len(list(ss)) despite there being 
other non-positional names that are retrievable.

It's true that we get greater clarity by spelling out the specific meaning in 
the context of structs, as in s.num_members or some such, but we start to lose 
the advantages of polymorphism and ease of learning/remembering that comes with 
having consistent API choices.  For any one API such as structs, it probably 
makes sense to use s.num_members, but for the standard library as a whole, it 
is probably better to try to make len(obj) have a consistent meaning rather 
than having many different names for the size of the returned tuple.
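To illustrate the two sizes with today's API (len(s) itself does not exist on struct.Struct; the member count below is recovered via unpack()):

```python
import struct

s = struct.Struct("<hhl")  # two shorts and a long, standard sizes
packed = s.pack(1, 2, 3)

assert s.size == 8                 # number of *bytes* in the packed result
assert len(packed) == s.size
assert len(s.unpack(packed)) == 3  # number of *members*: what len(s) would mean
```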

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12740
___



[issue12748] IDLE halts on osx when copy and paste

2011-08-14 Thread hy

hy hoyeung...@gmail.com added the comment:

Thank you. I kinda know what happens now.

First, I didn't make any change to IDLE after installing it.

Second, I'm using Dvorak-QWERTY. Normally the keyboard layout changes to QWERTY 
when I press the Cmd key so that I can type in Dvorak and use the shortcuts in 
QWERTY. But in IDLE it's not the same case. I find that the halt problem only 
occurs when I copy. So I tried cut and paste. It happens that I can use both 
Cmd+x and Cmd+b (x in Dvorak layout) to cut and both Cmd+v and Cmd+. (v in 
Dvorak layout) to paste. So if I press Cmd+c, I'm inputting both Cmd+c and 
Cmd+j at the same time. And I think that's the reason why it halts. 

By hang, it's exactly what you described.

Also, i tried Tcl/Tk 8.4, the same problem happens.

It's weird since I don't have this problem in Windows when I use a third-party 
dvorak-qwerty input method.

I temporarily changed to Dvorak now to avoid this problem, although it is a 
little bit inconvenient since all the shortcuts have changed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12748
___



[issue12672] Some problems in documentation extending/newtypes.html

2011-08-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

I agree that the sentence is a bit confusing and the 'object method' ambiguous. 
I suspect that the sentence was written years ago. In current Python, [].append 
is a bound method of class 'builtin_function_or_method'. I *suspect* that the 
intended contrast, and certainly the important one, is that between C 
functions, which get added to PyTypeObject structures, and their Python object 
wrappers that are visible from Python, but which must not be put in the type 
structure. The varieties of wrappers are irrelevant in this context and for the 
purpose of avoiding that mistake. So I would rewrite the sentence as:

These C functions are called “type methods” to distinguish them from Python 
wrapper objects, such as ``list.append`` or ``[].append``, visible in Python 
code.

Looking further down,
"Now if you go and look up the definition of PyTypeObject in object.h you'll 
see that it has many more fields that the definition above.",
needs 'that' changed to 'than', and I would insert "following tp_doc" after 
'fields'.

--
nosy: +terry.reedy

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12672
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16. They 
support non-BMP chars but only partially, because, BY DESIGN*, indexing and len 
are by code units, not codepoints. They are documented as being UCS-2 because 
that is what M-A Lemburg, the original designer and writer of Python's unicode 
type and the unicode-capable re module, wants them to be called. The link to 
msg142037, which is one of 50+ in the thread (and many or most others disagree), 
pretty well explains his viewpoint. The positive side is that we deliver more 
than we promise. The negative side is that by not promising what perhaps we 
should, we allow ourselves not to deliver what perhaps we should.

*While I think this design decision may have been OK a decade ago for a first 
implementation of an *optional* text type, I do not think it so for the future 
for revised implementations of what is now *the* text type. I think narrow 
builds can and should be revised and upgraded to index, slice, and measure by 
codepoints. Here is my current idea:

If the code unit stream contains any non-BMP characters (ie, surrogate pair of 
16-bit code units), construct a sequence of *indexes* of such characters 
(pairs). The fixed length of the string in codepoints is n-k, where n is the 
number of code units (the current length) and k is the length of the auxiliary 
sequence and the number of pairs. For indexing, look up the character index in 
the list of indexes by binary search and increment the codepoint index by the 
index of the index found to get the corresponding code unit index. (I have 
omitted the details needed avoid off-by-1 errors.)

This would make indexing O(log(k)) when there are surrogates. If that is really 
a problem because k is a substantial fraction of a 'large' n, then one should 
use a wide build. By using a separate internal class, there would be no time or 
space penalty for all-BMP text. I will work on a prototype in Python.
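A minimal sketch of that idea (my own illustration, operating on a plain list of 16-bit code units; a real prototype would live inside the string type):

```python
import bisect

class CodepointIndex:
    """Map codepoint indexes to code-unit indexes on a 'narrow' string."""

    def __init__(self, units):
        self.units = units
        # Auxiliary sequence: code-unit index of each high surrogate,
        # i.e. the start of each surrogate pair (one non-BMP character).
        self.pairs = [i for i, u in enumerate(units) if 0xD800 <= u <= 0xDBFF]

    def __len__(self):
        # Length in codepoints: n code units minus k pairs.
        return len(self.units) - len(self.pairs)

    def unit_index(self, i):
        # Each pair starting before the answer shifts it right by one unit.
        # Iterate to a fixed point; bisect keeps each step O(log k).
        u = i
        while True:
            candidate = i + bisect.bisect_left(self.pairs, u)
            if candidate == u:
                return u
            u = candidate

# "A" + U+1D49C (stored as the surrogate pair D835 DC9C) + "B"
s = CodepointIndex([0x0041, 0xD835, 0xDC9C, 0x0042])
assert len(s) == 3
assert [s.unit_index(i) for i in range(3)] == [0, 1, 3]
```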

PS: The OSCON link in msg142036 currently gives me 404 not found

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12738] Bug in multiprocessing.JoinableQueue() implementation on Ubuntu 11.04

2011-08-14 Thread Michael Hall

Michael Hall michaelhal...@gmail.com added the comment:

I tried switching from joining on the work_queue to just joining on the 
individual child processes, and it seems to work now. Weird. Anyway, it'd be 
nice to see the JoinableQueue fixed, but it's not pressing any more.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12738
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

Have a look here: http://98.245.80.27/tcpc/OSCON2011/gbu/index.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue11835] python (x64) ctypes incorrectly pass structures parameter

2011-08-14 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

Changing type to behavior as it doesn't crash on 3.3. I believe the issue was 
opened against 2.6, and Santoso changed it to 2.7 and up, where there is no crash.

Another data point: there is a similar fix in the current version of libffi here: 
https://github.com/atgreen/libffi/blob/master/.pc/win64-struct-args/src/x86/ffi.c

Since at the moment we are not integrating a new libffi, I believe my fix should 
do (the libffi fix is slightly different, but I'm matching what we have in 
callproc.c, which is not part of libffi).

--
type: crash - behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11835
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Terry J. Reedy rep...@bugs.python.org wrote
   on Mon, 15 Aug 2011 00:26:53 -: 

 PS: The OSCON link in msg142036 currently gives me 404 not found

Sorry, I wrote 

 http://training.perl.com/OSCON/index.html

but meant 

 http://training.perl.com/OSCON2011/index.html

I'll fix it on the server in a short spell.

I am trying to keep the document up to date as I learn more, so it
isn't precisely the talk I gave in Portland.

 Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16.

So I'm finding.  Perhaps that's why I keep getting confused. I do have a
pretty firm notion of what UCS-2 and UTF-16 are, and so I sometimes get
self-contradictory results. Can you think of anywhere that Python acts like
UCS-2 and not UTF-16?  I'm not sure I have found one, although the regex
thing might count.

Thank you guys for being so helpful and understanding.

 They support non-BMP chars but only partially, because, BY DESIGN*,
 indexing and len are by code units, not codepoints. 

That's what Java did, too, and for the same reason.  Because they had
a UCS-2 implementation for Unicode 1.1 so when Unicode 2.0 came out
and they learned that they would need more than 16 bits, they piggybacked
UTF-16 onto the top of it instead of going for UTF-8 or UTF-32, and they're
still paying that price, and to my mind, heavily and continually.

Do you use Java?  It is very like Python in many of its 16-bit character issues.
Most of the length and indexing type functions address things by code unit
only, not codepoint.  But they would never claim to be UCS-2.

Oh, I realize why they did it.  For one thing, they had bytecode out there
that they had to support.  For another, they had some pretty low-level APIs
that didn't have enough flexibility of abstraction, so old source had to keep
working as before, even though this penalized the future.  Forever, kinda.

While I wish they had done better, and kinda think they could have, it
isn't my place to say.  I wasn't there (well, not paying attention) when
this was all happening, because I was so underwhelmed by how annoyingly
overhyped it was.  A billion dollars of marketing can't be wrong, you know?
I know that smart people looked at it, seriously.  I just find the cure
they devised to be more in the problem set than the solution set.

I like how Python works on wide builds, especially with Python3. I was
pretty surprised that the symbolic names weren't working right on the
earlier version of the 2.6 wide build I tried them on.

I now have both wide and narrow builds installed of both 2.7 and 3.2,
so that shouldn't happen again.

 They are documented as being UCS-2 because that is what M-A Lemburg,
 the original designer and writer of Python's unicode type and the unicode-
 capable re module, wants them to be called. The link to msg142037,
 which is one of 50+ in the thread (and many or most others disagree),
 pretty well explains his viewpoint.

Count me as one of those many/most others who disagree. :)

 The positive side is that we deliver more than we promise. The
 negative side is that not promising what perhaps we should allows us
 not to deliver what perhaps we should.

It is always better to deliver more than you say than to deliver less.

 * While I think this design decision may have been OK a decade ago for
   a first implementation of an *optional* text type, I do not think it
   so for the future for revised implementations of what is now *the*
   text type. I think narrow builds can and should be revised and
   upgraded to index, slice, and measure by codepoints. 

Yes, I think so, too.  If you look at the growth curve of UTF-8 alone,
it has followed a mathematically exponential growth curve in the 
first decade of this century.  I suspect that will turn into an S
curve with asymptotic shoulders any time now.  I haven't looked
at it lately, so maybe it already has.  I know that huge corpora I work
with at work are all absolutely 100% Unicode now.  Thank XML for that.

 Here is my current idea:

 If the code unit stream contains any non-BMP characters (ie, surrogate
 pair of 16-bit code units), construct a sequence of *indexes* of such
 characters (pairs). The fixed length of the string in codepoints is
 n-k, where n is the number of code units (the current length) and k is
 the length of the auxiliary sequence and the number of pairs. For
 indexing, look up the character index in the list of indexes by binary
 search and increment the codepoint index by the index of the index
 found to get the corresponding code unit index. (I have omitted the
 details needed to avoid off-by-1 errors.)

 This would make indexing O(log(k)) when there are surrogates. If that
 is really a problem because k is a substantial fraction of a 'large'
 n, then one should use a wide build. By using a separate internal
 class, there would be no time or space penalty for all-BMP text. I
 will work on a prototype in Python.

[issue12693] test.support.transient_internet prints to stderr when verbose is false

2011-08-14 Thread Brett Cannon

Brett Cannon br...@python.org added the comment:

The line from the source I am talking about is 
http://hg.python.org/cpython/file/49e9e34da512/Lib/test/support.py#l943 . And 
as for the output:

$ ./python.exe -m test -uall test_ssl

[1/1] test_ssl
Resource 'ipv6.google.com' is not available
1 test OK.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12693
___



[issue12750] datetime.datetime timezone problems

2011-08-14 Thread Daniel O'Connor

New submission from Daniel O'Connor dar...@dons.net.au:

It isn't possible to add a timezone to a naive datetime object, which means 
that if you are getting them from some place you can't directly control, there 
is no way to set the TZ.

E.g. pywws' DataStore returns naive datetimes which are in UTC. There is no way 
to set this, and hence strftime seems to think they are in local time.

I can sort of see why you would disallow changing a TZ once set, but it doesn't 
make sense to prevent this for naive DTs.

Also, utcnow() returns a naive DT whereas it would seem to be more sensible to 
return it with a UTC TZ.
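For what it's worth, on Python 3.2+ a zone can be attached to a naive datetime with replace() without converting the clock time; on 2.7 (the version this report targets) the same trick needs a hand-written tzinfo subclass, since there is no bundled concrete timezone class. A quick sketch of the 3.2+ form:

```python
from datetime import datetime, timezone

# A naive datetime known to hold UTC wall-clock time, e.g. as returned
# by datetime.utcnow() or by a store such as pywws' DataStore.
naive = datetime(2011, 8, 14, 12, 0, 0)
assert naive.tzinfo is None

# replace() attaches the zone without shifting the clock time, which
# is exactly what is wanted when the naive value is already UTC.
aware = naive.replace(tzinfo=timezone.utc)
assert aware.utcoffset().total_seconds() == 0

# Once aware, %z in strftime behaves sensibly.
assert aware.strftime('%Y-%m-%d %H:%M %z') == '2011-08-14 12:00 +0000'
```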

--
components: Library (Lib)
messages: 142095
nosy: Daniel.O'Connor
priority: normal
severity: normal
status: open
title: datetime.datetime timezone problems
type: feature request
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12750
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

I wrote:

 Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16.

 So I'm finding.  Perhaps that's why I keep getting confused. I do have a 
 pretty firm
 notion of what UCS-2 and UTF-16 are, and so I get sometimes 
 self-contradictory results.
 Can you think of anywhere that Python acts like UCS-2 and not UTF-16?  I'm 
 not sure I
 have found one, although the regex thing might count.

I just thought of one.  The casemapping functions don't work right on
Deseret, which is a non-BMP case-changing script.  That's one I submitted
as a bug, because I figure that if the UTF-8 decoder can decode the non-BMP
code points into paired UTF-16 surrogates, then the casing functions had
jolly well better be able to deal with them.  If the UTF-8 decoder knows it
is only going to UCS-2, then it should have raised an exception on my non-BMP
source. Since it went to UTF-16, the rest of the language should have behaved 
accordingly.
Java does this right, BTW, despite its UTF-16ness.
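To make the Deseret case concrete, here is a sketch of the expected behavior on a build where casemapping handles non-BMP codepoints (the failing narrow-build behavior is the bug being reported):

```python
# Deseret is a bicameral script outside the BMP, so it exercises
# casemapping above U+FFFF.  U+10428 DESERET SMALL LETTER LONG I
# should round-trip with U+10400 DESERET CAPITAL LETTER LONG I.
small = '\U00010428'
capital = '\U00010400'
assert small.upper() == capital
assert capital.lower() == small
```

On a narrow build the same character is stored as a surrogate pair, and per Tom's report the casing functions leave it unchanged.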

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

 It is always better to deliver more than you say than to deliver less.

Except when promising too little is a copout.

 Everyone always talks about how important they're sure O(1) access must be,

I thought that too until your challenge. But now that you mention it, indexing 
is probably not the bottleneck in most document processing. We are optimizing 
without measuring! We all know that is bad.

If done transparently, non-O(1) indexing should only be done when it is 
*needed*. And if it is a bottleneck, switch to a wide build -- or get a newer, 
faster machine.

I first used Python 1.3 on a 10-megahertz DOS machine. I just got a multicore 
3+ gigahertz machine. Tradeoffs have changed and just as we use cycles (and 
space) for nice graphical interfaces, we should use some for global text 
support. In the same pair of machines, core memory jumped from 2 megabytes to 
24 gigabytes. (And the new machine cost perhaps as much in adjusted dollars.) 
Of course, better unicode support should come standard with the OS and not have 
to be re-invented by every language and app.

Having promised to actually 'work on a prototype in Python', I decided to do so 
before playing. I wrote the following test:

tucs2 = 'A\U0001043cBC\U0001042f\U00010445DE\U00010428H'
tutf16 = UTF16(tucs2)
tlist = ['A', '\U0001043c','B','C','\U0001042f','\U00010445',
 'D','E','\U00010428','H']
tlis2 = [tutf16[i] for i in range(len(tlist))]
assert tlist == tlis2

and in a couple hours wrote and debugged the class to make it pass (and added a 
couple of length tests). See the uploaded file.

Adding an __iter__ method to iterate by characters (with hi chars returned as 
wrapped length-1 surrogate pairs) instead of code units would be trivial. 
Adding the code to __getitem__ to handle slices should not be too hard. Slices 
containing hi characters should be wrapped. The cpdex array would make that 
possible without looking at the whole slice.

The same idea could be used to index by graphemes. For European text that uses 
codepoints for pre-combined (accented) characters as much as possible, the 
overhead should not be too much. 

This may not be the best issue to attach this to, but I believe that improving 
the narrow build would allow fixing of the re/regex problems reported here.

--
Added file: http://bugs.python.org/file22900/utf16.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12672] Some problems in documentation extending/newtypes.html

2011-08-14 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
stage:  - patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12672
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-14 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Keep in mind that we should be able to access and use lone surrogates too, 
therefore:
s = '\ud800'  # should be valid
len(s)  # should this raise an error? (or return 0.5 ;)?
s[0]  # error here too?
list(s)  # here too?

p = s + '\udc00'
len(p)  # 1?
p[0]  # '\U00010000' ?
p[1]  # IndexError?
list(p + 'a')  # ['\ud800\udc00', 'a']?

We can still decide that strings with lone surrogates work only with a limited 
number of methods/functions but:
1) it's not backward compatible;
2) it's not very consistent

Another thing I noticed is that (at least on wide builds) surrogate pairs are 
not joined on the fly:
>>> p
'\ud800\udc00'
>>> len(p)
2
>>> p.encode('utf-16').decode('utf-16')
'\U00010000'
>>> len(_)
1
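A related wrinkle, sketched here on a modern CPython (not part of the original message): the strict codecs refuse lone surrogates, and the 'surrogatepass' error handler is what lets them round-trip at all, while adjacent high+low surrogates in a str are still not joined on the fly:

```python
s = '\ud800'  # a lone high surrogate is storable in a str

# The strict UTF-8 codec rejects it...
try:
    s.encode('utf-8')
except UnicodeEncodeError:
    pass
else:
    raise AssertionError('expected UnicodeEncodeError')

# ...but 'surrogatepass' writes the raw three-byte sequence and
# restores the lone surrogate on decode.
b = s.encode('utf-8', 'surrogatepass')
assert b == b'\xed\xa0\x80'
assert b.decode('utf-8', 'surrogatepass') == s

# A high+low pair is *not* joined in the str itself; it only becomes
# one codepoint after a round-trip through a UTF-16 byte stream.
p = '\ud800' + '\udc00'
assert len(p) == 2
assert p.encode('utf-16-le', 'surrogatepass').decode('utf-16-le') == '\U00010000'
```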

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___