[issue12831] 2to3 and integer division

2011-08-26 Thread Mark Dickinson

Mark Dickinson dicki...@gmail.com added the comment:

> / 2 is an integer division, so it should be // 2 in Python 3.

No, I don't think that's right: 2to3 has no way of knowing that the programmer 
intended an integer division here (self.maxstars could be a float).

Instead, you should always use '//' in Python 2 code where an integer division 
is intended.
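
A minimal sketch of the distinction, assuming Python 3 semantics; `maxstars` 
is only a stand-in for the self.maxstars attribute from the report:

```python
# `maxstars` stands in for self.maxstars. Its runtime type decides what `/`
# means in Python 2, which is why 2to3 cannot safely rewrite the operator.
maxstars = 4

true_half = maxstars / 2     # true division on Python 3: always a float
floor_half = maxstars // 2   # floor division: an int on both Python 2 and 3

print(true_half, floor_half)

# With a float operand, `//` still floors but keeps the float type.
float_half = 5.0 // 2
print(float_half)
```

Writing '//' wherever flooring is intended keeps the result the same before 
and after conversion.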

--
nosy: +mark.dickinson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12831
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12831] 2to3 and integer division

2011-08-26 Thread Mark Dickinson

Changes by Mark Dickinson dicki...@gmail.com:


--
nosy: +benjamin.peterson
resolution:  -> invalid
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12831
___



[issue12844] Support more than 255 arguments

2011-08-26 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

The approach looks fine to me. Would you like to work on a patch?

--
nosy: +loewis

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12844
___



[issue12808] Coverage of codecs.py

2011-08-26 Thread Tennessee Leeuwenburg

Tennessee Leeuwenburg tleeuwenb...@gmail.com added the comment:

Here is a stab at updated documentation. If further changes to the 
documentation are recommended, I would suggest that a core committer go ahead 
and make them. I'm more than happy to keep taking stabs at it, but ultimately 
I probably don't understand the purpose of these classes as well as some of 
the rest of you, and I don't feel best placed to decide exactly how this 
should read.

--
Added file: http://bugs.python.org/file23049/codecs.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12808
___



[issue12831] 2to3 and integer division

2011-08-26 Thread Alexander Rødseth

Alexander Rødseth rods...@gmail.com added the comment:

Even though it's hard to cover every case, it should be possible in quite a few 
cases:

self.maxstars = 4
half = self.maxstars / 2

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12831
___



[issue12845] PEP-3118: C-contiguity with zero strides

2011-08-26 Thread Stefan Krah

New submission from Stefan Krah stefan-use...@bytereef.org:

Numpy and PyBuffer_IsContiguous() have different ideas of
C-contiguity if there is a zero in strides (this is allowed,
I asked Pauli Virtanen).


>>> from numpy import *
>>> nd = ndarray(shape=[10], strides=[0])
>>> nd.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> 
>>> from _testbuffer import ndarray as pyarray
>>> from _testbuffer import PyBUF_FULL_RO
>>> x = pyarray(nd, getbuf=PyBUF_FULL_RO)
>>> x.c_contiguous
False

--
assignee: skrah
components: Interpreter Core
messages: 143005
nosy: skrah
priority: normal
severity: normal
status: open
title: PEP-3118: C-contiguity with zero strides
type: behavior
versions: Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12845
___



[issue12831] 2to3 and integer division

2011-08-26 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

Running python with the -3 command line option will warn about Python 3.x 
incompatibilities that 2to3 cannot trivially fix.

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12831
___



[issue12820] Tests for Lib/xml/dom/minicompat.py

2011-08-26 Thread John Chandler

John Chandler therealmetal...@gmail.com added the comment:

Cool, thanks for the feedback! :-)

I'll make the appropriate changes to the tests and add some coverage for 
defproperty as soon as I can.

John

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12820
___



[issue12846] unicodedata.normalize turkish letter problem

2011-08-26 Thread Cem YILDIZ

New submission from Cem YILDIZ c...@fizy.com:

unicodedata.normalize cannot convert the Turkish letter ı into i:


import unicodedata
s = u"üfürükçü ağaç ve ıslıkçı çeşme"
print unicodedata.normalize('NFKD', s).encode('ascii','ignore')

 ufurukcu agac ve slkc cesme

but the result should be
 ufurukcu agac ve islikci cesme
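
A possible explanation and workaround, sketched for Python 3 (an assumption; 
this report is against 2.6). U+0131 (ı) has no decomposition in the Unicode 
character database, so NFKD leaves it untouched and the ASCII encoder then 
drops it. The helper name to_ascii is hypothetical:

```python
import unicodedata

def to_ascii(text):
    """Strip diacritics, mapping dotless i to 'i' by hand first."""
    # U+0131 has no canonical or compatibility decomposition,
    # so normalize() cannot turn it into a plain 'i'.
    text = text.replace("\u0131", "i")
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks are non-ASCII, so 'ignore' discards them.
    return decomposed.encode("ascii", "ignore").decode("ascii")

s = "üfürükçü ağaç ve ıslıkçı çeşme"
print(repr(unicodedata.decomposition("\u0131")))  # empty: nothing to decompose
print(to_ascii(s))
```

Whether mapping ı to i is appropriate depends on the application; it is a 
transliteration choice, not something normalization defines.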

--
components: Unicode
messages: 143008
nosy: fizymania
priority: normal
severity: normal
status: open
title: unicodedata.normalize turkish letter problem
versions: Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12846
___



[issue12846] unicodedata.normalize turkish letter problem

2011-08-26 Thread Cem YILDIZ

Changes by Cem YILDIZ c...@fizy.com:


--
type:  -> behavior

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12846
___



[issue12846] unicodedata.normalize turkish letter problem

2011-08-26 Thread Cem YILDIZ

Cem YILDIZ c...@fizy.com added the comment:

unicodedata.normalize cannot convert the Turkish letter ı into i:


import unicodedata
s = u"üfürükçü ağaç ve ıslıkçı çeşme"

print unicodedata.normalize('NFKD', s).encode('ascii','ignore')

 ufurukcu agac ve slkc cesme

but the result should be
 ufurukcu agac ve islikci cesme

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12846
___



[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.

2011-08-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 96f0ccb9716d by Éric Araujo in branch '3.2':
Fix type information in distutils API reference (#9302).
http://hg.python.org/cpython/rev/96f0ccb9716d

New changeset a410b857efe3 by Éric Araujo in branch 'default':
Merge from 3.2 (#9302 fix and other changes)
http://hg.python.org/cpython/rev/a410b857efe3

New changeset 59b3f845f7a3 by Éric Araujo in branch 'default':
Synchronize packaging docs with distutils’ (includes fix for #9302)
http://hg.python.org/cpython/rev/59b3f845f7a3

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9302
___



[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.

2011-08-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 78b26e7720c0 by Éric Araujo in branch '2.7':
Fix type information in distutils API reference (#9302).
http://hg.python.org/cpython/rev/78b26e7720c0

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9302
___



[issue12678] test_packaging and test_distutils failures under Windows

2011-08-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 8ad1670c0f1f by Éric Araujo in branch '2.7':
Try to fix test_distutils on Windows (#12678)
http://hg.python.org/cpython/rev/8ad1670c0f1f

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12678
___



[issue11360] In documentation of getopt, advertise argparse instead of optparse

2011-08-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 40f7a6e71930 by Éric Araujo in branch '3.2':
Remove outdated pointer to optparse (fixes #11360).
http://hg.python.org/cpython/rev/40f7a6e71930

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11360
___



[issue11360] In documentation of getopt, advertise argparse instead of optparse

2011-08-26 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 6d3c645fa52f by Éric Araujo in branch '2.7':
Remove outdated pointer to optparse (fixes #11360).
http://hg.python.org/cpython/rev/6d3c645fa52f

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11360
___



[issue12833] raw_input misbehaves when readline is imported

2011-08-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Maybe you need to call sys.stdin.flush() before raw_input?

In any case, 2.6 is in security-fix-only mode, so we need to reproduce this 
with current versions: 2.7, 3.2 or 3.3.

--
components: +IO, Interpreter Core -Library (Lib)
nosy: +eric.araujo, pitrou
stage:  -> test needed
versions:  -Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12833
___



[issue12842] Docs: first parameter of tp_richcompare() always has the correct type

2011-08-26 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
keywords: +needs review
stage:  -> patch review
versions:  -Python 3.1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12842
___



[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.

2011-08-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Improved and committed, thanks again!

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9302
___



[issue12759] (?P=) input for Tools/scripts/redemo.py raises unnhandled exception

2011-08-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

I can reproduce in 3.3 (the file has been moved to Tools/demo/redemo.py).  The 
Tk application does not crash but there is a traceback.  Would you like to work 
on a patch?  If so, there are good guidelines in the devguide.

--
keywords: +easy
nosy: +eric.araujo
stage:  -> needs patch
title: (?P=) input for Tools/scripts/redemo.py throw an exception -> (?P=) 
input for Tools/scripts/redemo.py raises unnhandled exception
versions: +Python 2.7, Python 3.2, Python 3.3 -Python 2.6

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12759
___



[issue12806] argparse: Hybrid help text formatter

2011-08-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

Steven: What do you think?

GraylinKim: You can open a feature request for message preview on the 
metatracker (see “Report Tracker Problem” in the sidebar).

--
nosy: +bethard, eric.araujo
type:  -> feature request
versions: +Python 3.3 -Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12806
___



[issue12768] docstrings for the threading module

2011-08-26 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

I have made a review on Rietveld.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12768
___



[issue12195] Little documentation of annotations

2011-08-26 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
nosy: +eric.araujo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12195
___



[issue12742] Add support for CESU-8 encoding

2011-08-26 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

Can you provide some examples?
The page you linked says "It should be used exclusively for internal 
processing and never for external data exchange.", so I'm not sure why these 
APIs would want to use it.

--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12742
___



[issue12195] Little documentation of annotations

2011-08-26 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

> some simple examples showing the syntax would go a long way.

Sorry, there are just too many ways to go and we are intentionally not stating 
which way is preferred.  I've seen many variants: a:[Integral] for a list of 
integers, a:(int,str) for a 2-tuple of an int and a string, a:(str,file,None) 
for something that is a string or a file or None, a:'light_years' to indicate 
units of measure, a:range_check(10.5, 20.1) for range validation, and some 
variants for converters, adapters, factory functions, documentation aids, etc.

If you want to advance the state of the art, perhaps write a blog post on what 
you consider to be a best practice.  If a consensus emerges, we 
will follow.
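
To make a few of those variants concrete, here is a rough sketch. It only 
illustrates the syntax and is not a recommended practice; range_check and the 
function names are made up for the example:

```python
from numbers import Integral

def range_check(low, high):
    """Hypothetical validator factory, named after the variant above."""
    def check(value):
        return low <= value <= high
    return check

def total(a: [Integral]) -> int:          # a list of integers
    return sum(a)

def describe(a: (int, str)) -> str:       # a 2-tuple of an int and a string
    return "%d: %s" % a

def travel(a: 'light_years') -> float:    # a unit of measure
    return a * 9.4607e12                  # approximate kilometres per light year

def heat(a: range_check(10.5, 20.1)):     # range validation metadata
    return a

# Annotations are only stored; Python itself attaches no meaning to them,
# so heat(99) is not rejected unless some tool chooses to call the checker.
print(total.__annotations__)
print(heat.__annotations__['a'](15.0))
```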

--
resolution:  -> rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12195
___



[issue12768] docstrings for the threading module

2011-08-26 Thread Eli Bendersky

Eli Bendersky eli...@gmail.com added the comment:

Éric, yeah I received an email. Hopefully Graeme did too.

It's a shame that a new review isn't announced in the tracker instead.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12768
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Brian, Tim, I'd feel more comfortable if any of you confirmed this isn't a 
stupid proposal on my part :)

--
components: +Interpreter Core
stage: needs patch -> patch review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Brian Curtin

Brian Curtin br...@python.org added the comment:

I could see how they'd use EINVAL, but to me ENOTDIR makes more sense here. 
However, I'm not sure if anyone is depending on this (or what they could depend 
on it for).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12833] raw_input misbehaves when readline is imported

2011-08-26 Thread Idan Kamara

Idan Kamara idank...@gmail.com added the comment:

Reproduced on 2.7.

(flushing stdin/out doesn't help)

--
versions: +Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12833
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> I could see how they'd use EINVAL, but to me ENOTDIR makes more sense
> here. However, I'm not sure if anyone is depending on this (or what
> they could depend on it for).

Right now I'm not sure, but if PEP 3151 is accepted it will make much
more sense to get a NotADirectoryError.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Brian Curtin

Brian Curtin br...@python.org added the comment:

With that PEP likely to be accepted, I say go ahead with the change for that 
benefit.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Tim Golden

Tim Golden m...@timgolden.me.uk added the comment:

Obviously someone's code would break if it were relying on the Unix 
errno only in a Windows-only situation to determine the situation of 
opening a directory which isn't one. But that combination of events 
doesn't seem terribly likely.

Speaking for myself, since the exception is a WindowsError with the 
winerror attribute prominent, [Error 267] I'd be far more likely to be 
trapping that.

I say go ahead
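
A sketch of code that would be unaffected by the remapping, assuming Python 3 
where WindowsError is an alias of OSError. The helper name 
is_not_a_directory is hypothetical:

```python
import errno

ERROR_DIRECTORY = 267  # Windows error code: "The directory name is invalid"

def is_not_a_directory(exc):
    """Return True if an OSError means 'not a directory'.

    Checks the Windows-specific code first, then the POSIX errno.
    """
    # `winerror` only exists on Windows, hence getattr with a default.
    if getattr(exc, "winerror", None) == ERROR_DIRECTORY:
        return True
    return exc.errno == errno.ENOTDIR

print(is_not_a_directory(OSError(errno.ENOTDIR, "Not a directory")))
print(is_not_a_directory(OSError(errno.EINVAL, "Invalid argument")))
```

Code that tests only exc.errno == errno.EINVAL for this case is the kind 
that would change behaviour.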

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Note that this file is not written by hand. It's generated by PC/generrmap.c, 
which uses the _dosmaperr() function provided by the msvcrt.

If we want to modify it, this should be clearly marked somewhere.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

If you have a copy of Visual Studio, you can see the code of _dosmaperr() in 
VC/crt/src/dosmap.c.
Otherwise the Google query inurl:dosmap.c returns some online copies of this 
file.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue9262] IDLE: Use tabbed shell and edit windows

2011-08-26 Thread Roger Serwy

Roger Serwy roger.se...@gmail.com added the comment:

Attached is an extension which provides tabbed windows for IDLE. It supports 
drag-and-drop reordering and separate windows. 

The implementation relies on monkey-patching a few subroutines and duck-typing 
for the toplevel window. The extension emulates each tab as if it were its own 
toplevel object. 

There can be flickering when switching tabs due to swapping the toplevel menu 
bar. This seems to be a limitation of Tk.

--
Added file: http://bugs.python.org/file23050/TabExtension.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9262
___



[issue11913] sdist should allow for README.rst

2011-08-26 Thread resc

resc thomat...@gmail.com added the comment:

Just wanted to note that this confuses other people too...

http://stackoverflow.com/questions/4384796/readme-extension-for-python-projects

Is this something that could be changed in 'distribute'?

--
nosy: +Thomas.Smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11913
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Wow.  A very educational discussion.  We will be referencing this issue for 
many years to come.

As long as the buck stops with me, I feel strongly that *today* changing 
indexing from O(1) to O(log N) is a bad idea, partly for technical reasons, 
partly because the Python culture isn't ready.  In 5 or 10 years we need to 
revisit this, and it wouldn't hurt if in the mean time we started seriously 
thinking about how to change our APIs so that O(1) indexing is not relied upon 
so much.  This may include rewriting tutorials to nudge users in the direction 
of using different idioms for text processing.

In the meantime, I think our best option is to switch CPython to the PEP 393 
string implementation.  Despite its disadvantages (I understand the spoiler 
issue) it is generally no worse than a wide build, and there is working code 
today that we can optimize before 3.3 is released.

For Python implementations where this is not an option (I'm thinking Jython and 
IronPython, both of which are closely tied to a system string type that behaves 
like UTF-16) I hope that at least the regular expression behavior can be fixed 
so that "." matches a surrogate pair.  (Possibly they already behave that way, 
if they use a native regex library.)

In all cases, for future Python versions, we should tighten the codecs to 
reject data that the Unicode standard considers invalid (and we should offer 
separate non-strict codecs for situations where such invalid data needs to be 
processed).

I wish we could fix the codecs and the regex "." issue on narrow builds for 
Python versions before 3.3 (esp. 3.2 and 2.7), but I fear that this is 
considered too backwards incompatible (though for each specific fix we should 
consider this carefully).
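
As a small illustration of the code point versus code unit distinction, 
assuming Python 3.4 or later (PEP 393 strings and re.fullmatch):

```python
import re

s = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

code_points = len(s)                           # one code point
utf16_units = len(s.encode("utf-16-le")) // 2  # two UTF-16 code units

print(code_points, utf16_units)

# With code-point strings, "." consumes the whole character rather than
# one half of a surrogate pair.
match = re.fullmatch(".", s)
print(match is not None)
```

On a narrow build the same string would have length 2, which is the behaviour 
UTF-16 based implementations inherit from their system string type.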

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12728] Python re lib fails case insensitive matches on Unicode data

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

This bug could do with a little less attitude.  That said, I think it is a bug 
and should be fixed, at the very least for Python 3.3.  As always, it is a 
matter of much debate to what extent bugs can be fixed in previous Python 
versions (specifically, 2.7 and 3.2) without breaking more code than it fixes, 
and I don't want to jump the gun on that issue.  Let's first see what it takes 
to fix this for 3.3.

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12728
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Sounds like a fair feature request for Python 3.3, as long as the intention is 
that users must import some module from the standard library and use functions 
defined in that module.  The operations and methods defined for str instances 
(e.g. ==, <, etc.) should not change their behavior.

Is there an existing 3rd party library that we could adopt (even if it isn't 
perfect yet)?
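
For reference, a short sketch of the behavior that would stay unchanged, 
assuming Python 3: the built-in operators compare strings code point by code 
point, which is not the same as any language's collation order.

```python
words = ["zebra", "\u00e9clair", "eclair", "Apple"]

# Built-in comparison orders strings by code point:
# "A" (U+0041) < "e" (U+0065) < "z" (U+007A) < "é" (U+00E9).
by_code_point = sorted(words)
print(by_code_point)

# Equality is also code point by code point: a precomposed "é" differs from
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
same = "\u00e9" == "e\u0301"
print(same)
```

A collation module would offer a sort key or comparison function alongside 
these operators and would not replace them.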

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

I presume this applies to builtin str methods like .lower(), right?  I think it 
is a good thing to do for Python 3.3.

We'd need to define what should happen in edge cases, e.g. when (against all 
odds) a string happens to contain a lone surrogate or some other code point or 
sequence of code points that the Unicode standard considers illegal.  I think 
it should not fail but just leave those code points alone.

Does this require us to import more data files from the Unicode standard?  By 
itself that doesn't scare me.

Would this also affect .islower() and friends?

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

We should at least get this fixed in 3.3.  Then we can discuss the benefits of 
backporting the fixes to 2.7 and 3.2 (though it sounds to me like the backports 
will fix more than they will break, since it is pretty much impossible to do 
the right thing in those versions today).

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12749
___



[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7.  
(There is already no guarantee that len(s) == len(s.title()), right?)

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12737
___



[issue12746] normalization is affected by unicode width

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Yeah, we should fix this.  At least in 3.3, but (without knowing what exactly 
is involved) I think backporting to 2.7 and 3.2 makes sense too.

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12746
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Really?  The re module cannot be salvaged and we should add regex but keep the 
(buggy) re?  That does not make a lot of sense to me.  I think it should just 
be fixed in the re module.  Or the re module should be *replaced* by the code 
from the regex module (but renamed to re, and with certain backwards 
compatibilities restored, probably).  But I really hope the re module (really: 
the _sre extension module) can be fixed.  We should also make a habit in our 
docs of citing specific versions of the Unicode standard, and specific TR 
numbers and versions where they apply.  (And hopefully we can supply URLs to 
the Unicode consortium's canonical copies of those documents.)

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12731
___



[issue12733] Request for grapheme support in Python re lib

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Again, I would be disappointed if the re (_sre) module could not be fixed.  It 
is a reasonable feature request.

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12733
___



[issue12734] Request for property support in Python re lib

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

+1 on adding the feature to 3.3 in whichever way makes sense.

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12734
___



[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

+1 on the feature request.

--
nosy: +gvanrossum

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12753
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

> Sounds like a fair feature request for Python 3.3, as long as the
> intention is that users must import some module from the standard
> library and use functions defined in that module.  The operations and
> methods defined for str instances (e.g. ==, <, etc.) should not change
> their behavior.

> Is there an existing 3rd party library that we could adopt (even if it isn't 
> perfect yet)?

I *think* you could use ICU's.  

I'm pretty sure the Parrot people use ICU libraries.

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

I know I sound like NIH, but I'm always reluctant to add a big 3rd party lib 
like ICU to the permanent dependencies of all future Python distros.  If people 
want to use ICU they already can.  OTOH I don't have a better idea. :-(

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Fri, 26 Aug 2011 21:16:57 -: 

> Yeah, this should be fixed in 3.3 and probably backported to 3.2
> and 2.7.  (There is already no guarantee that len(s) ==
> len(s.title()), right?)

Well, *I* don't know of any such guarantee, 
but I don't know Python very well.

In general, Unicode makes very few guarantees about casing.  Under full
casemapping, which is the only way to do the silly Turkish stuff amongst
quite a bit else, any of the three casemappings can change the length of
the string.

Other things you can't rely on are round tripping and single paths.  By
roundtripping, just look at the two lowercase sigmas and think about how
you can't get back to one of them if you uppercase them both.  By single
paths, I mean that code that does some sort of conversion where it first
lowercases everything and then titlecases the first letter can produce
something different from titlecasing just the original first letter and
then lowercasing the rest of them.  That's because tc(x) and tc(lc(x)) can
be different.
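
The three points above (length changes, failed round trips, and tc(x) differing from tc(lc(x))) can be checked with a short sketch. It assumes a Python whose str methods apply the full case mappings (3.3 or later). Apart from the two lowercase sigmas, the sample characters are chosen here for illustration and do not come from the message.

```python
# Assumes Python 3.3+, where str.upper()/lower()/title() use full case mappings.

# 1. Casing can change the length of a string.
sharp_s = "\u00df"                 # LATIN SMALL LETTER SHARP S
upper_sharp_s = sharp_s.upper()    # the full uppercase mapping is two letters

# 2. Round-tripping fails: both lowercase sigmas uppercase to the same letter,
#    so lowercasing again cannot recover the final sigma.
medial_sigma = "\u03c3"            # GREEK SMALL LETTER SIGMA
final_sigma = "\u03c2"             # GREEK SMALL LETTER FINAL SIGMA
round_trip = final_sigma.upper().lower()

# 3. tc(x) and tc(lc(x)) can differ.
dotted_i = "\u0130"                # LATIN CAPITAL LETTER I WITH DOT ABOVE
tc_direct = dotted_i.title()
tc_via_lower = dotted_i.lower().title()

print(len(sharp_s), len(upper_sharp_s))
print(medial_sigma.upper() == final_sigma.upper(), round_trip == final_sigma)
print(tc_direct == tc_via_lower)
```

Code that assumes len(s) == len(s.title()), or that lowercasing first is harmless, would likely go wrong on inputs like these.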

--tom

--
title: str.title()  is overzealous by upcasing combining marks inappropriately 
- str.title() is overzealous by upcasing combining marks inappropriately

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12737
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Raymond Hettinger

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

I would like to be involved in the design of the API for a UCA module and its 
routines for loading Unicode Collation Element Tables (not making the mistake 
of using global state like the locale module does).

--
nosy: +rhettinger

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

> I would like to be involved in the design of the API for a UCA module
> and its routines for loading Unicode Collation Element Tables (not
> making the mistake of using global state like the locale module does).

Is this the problem where a locale is global to a process (or thread)?

The way I'm used to using the UCA module in Perl, that's never a problem,
because it's completely object-oriented.  There's no global state.  You 
instantiate a collator object with all the state it needs, like

collation_level
upper_before_lower
backwards_levels
normalization
override_CJK
override_Hangul
katakana_before_hiragana
variable
locale
preprocess

And then you use that object for all your collation needs, including
not just sorting but also string comparison and even searches.

For example, you could instantiate a first collator object with its level
set to one, meaning just compare base alphanumerics not diacritics or case
or nonletters, and a second with the defaults so that it uses all four
levels or a different normalization.  I have on occasion had more than one
collator object around at once each with its own locale, like if I want to
compare different locales' comparisons.
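
As a rough illustration of the object-oriented design described above, here is a minimal sketch of a collator that carries all of its own state. ToyCollator and its level parameter are hypothetical names invented for this example, and the key it builds is a crude approximation (strip combining marks, then casefold), not the UCA and not the Perl module's behaviour. The only point is that two differently configured collators can coexist without touching any global setting.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCollator:
    """Hypothetical stand-in for a collator object; NOT a UCA implementation."""
    level: int = 3  # 1: base letters only, 2: plus accents, 3: plus case

    def sort_key(self, text):
        # Decompose so that accents become separate combining marks.
        decomposed = unicodedata.normalize("NFD", text)
        base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        # One tuple element per level; lower levels ignore the later elements.
        keys = (base.casefold(), decomposed.casefold(), decomposed)
        return keys[: self.level]

    def compare(self, a, b):
        ka, kb = self.sort_key(a), self.sort_key(b)
        return (ka > kb) - (ka < kb)


# Two collators alive at once, each with its own configuration.
loose = ToyCollator(level=1)
strict = ToyCollator()

accented = "r\u00e9sum\u00e9"
print(loose.compare(accented, "RESUME"))
print(strict.compare(accented, "RESUME"))
print(ascii(sorted([accented, "RESUME", "resume"], key=strict.sort_key)))
```

The same object serves for sorting (via sort_key) and for direct comparison (via compare), which mirrors the "one object for all collation needs" usage described in the message.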

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

I should probably mention the importance in the design of a UCA module of
being able to specify which UCA version number you want it to behave like
in case you plan to override some of the DUCET entries.  That way if you
run under a later UCA with different DUCET weights, your own tailorings will
still make sense.  If you don't do this, your collation tailorings can break 
in a new release of the UCA.

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12735] request full Unicode collation support in std python library

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Fri, 26 Aug 2011 21:55:03 -: 

> I know I sound like NIH, but I'm always reluctant to add a big 3rd
> party lib like ICU to the permanent dependencies of all future Python
> distros.  If people want to use ICU they already can.  OTOH I don't
> have a better idea. :-(

I know exactly what you mean.  I would not want to push that on anyone,
being dependent on a gigantic 3rd-party module.  I just tried to answer
the question.  The only two full UCA implementations I know of are ICU's
and Perl's, which does not use ICU (since we're UTF-8, etc).

I just wish Python had Unicode collation, is all.

--tom

PS: (I haven't had good luck with the ICU bindings in 3.2.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12735
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Fri, 26 Aug 2011 21:11:24 -: 

> Guido van Rossum gu...@python.org added the comment:

> I presume this applies to builtin str methods like .lower(), right?  I
> think it is a good thing to do for Python 3.3.

Yes, the full casemaps are for upper, title, and lowercase.  There is 
also a full casefold and turkic case fold (which is full), but you
don't have a casefold function so I guess that doesn't matter.

> We'd need to define what should happen in edge cases, e.g. when
> (against all odds) a string happens to contain a lone surrogate or
> some other code point or sequence of code points that the Unicode
> standard considers illegal.  I think it should not fail but just leave
> those code points alone.

Well, it's a funny thing.  There are properties given for all
Unicode code points, even noncharacter code points.  This
includes the casing properties, oddly enough.

From UnicodeData.txt, which has a few surrogate entries; notice
no casing is given:

D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
DB80;<Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
DBFF;<Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
DC00;<Low Surrogate, First>;Cs;0;L;;;;;N;;;;;
DFFF;<Low Surrogate, Last>;Cs;0;L;;;;;N;;;;;

And in SpecialCasing.txt, which does not have surrogates but does have
a default clause:

# This file is a supplement to the UnicodeData file.
# It contains additional information about the casing of Unicode characters.
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and independent of context and language.
# For more information, see the discussion of Case Mappings in the Unicode Standard.
#
# All code points not listed in this file that do not have a simple case mappings
# in UnicodeData.txt map to themselves.

And in CaseFolding.txt, which also does not have surrogates but again does 
have a default clause:

# The data supports both implementations that require simple case foldings
# (where string lengths don't change), and implementations that allow full case folding
# (where string lengths may grow). Note that where they can be supported, the
# full case foldings are superior: for example, they allow MASSE and Maße to match.
#
# All code points not listed in this file map to themselves.

Taken all together, it follows that the surrogates have case{map,fold}s
back to themselves, since they have no case{map,fold}s listed.

It's ok to have arbitrary code points in memory, including surrogates and
the 66 noncharacters.  It just isn't legal to have them in a UTF stream
for open interchange, whatever that means.  
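
The conclusion above, that surrogates map to themselves under casing while still being unfit for interchange, can be observed from Python itself. This is a small sketch assuming Python 3.3 or later (str.casefold() does not exist before that); it only inspects a single lone surrogate and says nothing about the noncharacters.

```python
# A lone high surrogate is a perfectly valid element of an in-memory str.
lone = "\ud800"

# Every case mapping and the case folding leave it untouched.
unchanged = (
    lone.upper() == lone
    and lone.lower() == lone
    and lone.title() == lone
    and lone.casefold() == lone
)
print(unchanged)

# Writing it to a UTF-8 stream is a different matter: the strict codec refuses.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)
```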

> Does this require us to import more data files from the Unicode
> standard?  By itself that doesn't scare me.

One way or the other, yes, notably the SpecialCasing file for
casemapping and the CaseFolding file for casefolding (which you
should do anyway to fix re.I).  But you can and should process the
new files into some tighter format optimized for your own lookups.

Oddly, Java doesn't provide String methods that do full casing on
titlecase, even though they do so on lowercase and uppercase.  On
titlecase they only expose the simple casemaps via the Character class,
which are the ones from UnicodeData.  They recognize that this is a flaw,
but it was too late to fix it for Java 7.

> Would this also affect .islower() and friends?

Well, it shouldn't, but .islower() and friends are already mistaken.
They seem to be checking for GC=Ll and such, but they need to be
checking the Unicode binary property Lowercase and such.  Watch:

test 37 for string Ⅷ
wanted ⅷ to be lowercase of Ⅷ but python disagrees
wanted Ⅷ to be titlecase of Ⅷ but python disagrees
wanted Ⅷ to be uppercase of Ⅷ but python disagrees
test 37 failed 3 subtests

test 39 for string Ⓚ
wanted ⓚ to be lowercase of Ⓚ but python disagrees
wanted Ⓚ to be titlecase of Ⓚ but python disagrees
wanted Ⓚ to be uppercase of Ⓚ but python disagrees
test 39 failed 3 subtests

That's because the Roman numerals are GC=Nl but still have
case and change case.  Similarly for the circled letters which
are GC=So but have case and change case.  Plus there's U+0345,
the iota subscript, which is GC=Mn but has case and changes case.
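
The mismatch between general category and casing behaviour can be seen with the standard unicodedata module. The sketch below looks at the three kinds of character named above; the dictionary labels and the report variable are just names chosen for this example. It deliberately does not call islower() or isupper(), since what those return depends on the Python version.

```python
import unicodedata

samples = {
    "roman numeral eight": "\u2167",   # U+2167, uppercase Roman numeral
    "circled K": "\u24c0",             # U+24C0, circled Latin capital K
    "iota subscript": "\u0345",        # U+0345, combining Greek ypogegrammeni
}

report = {}
for label, ch in samples.items():
    general_category = unicodedata.category(ch)
    # A character "changes case" if either mapping yields something different.
    changes_case = ch.lower() != ch or ch.upper() != ch
    report[label] = (general_category, changes_case)
    print(label, general_category, changes_case)
```

None of the three has a letter-case general category (Ll, Lu, or Lt), yet each one has a case mapping, which is why a check based on GC alone gives the wrong answer for them.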

I don't remember whether I've sent in my full test suite or not.  
If I haven't yet, I should attach it to the bug report.

--tom

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-26 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Here’s my casing test suite; I thought I sent it in but the mux file here isn’t 
the full thing.

 It does several things, including letting you run it with regex vs re.  It 
also checks for the islower, etc functions. It has both simple and full (and 
turkic) maps and folds in it, but is configured to only check the simple 
versions for now.  The islower and isupper etc functions seem to be checking 
the wrong Unicode property.

Yes, it has my quaint Unixisms in it, because it needs to run with UTF-8 
output, or you can't read what's going on.

--
Added file: http://bugs.python.org/file23051/casing-tests.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-26 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

I wasn't aware this is an auto-generated file. I can add a comment but looking 
at it, it seems we auto-generate this file just to save a call to _dosmaperr. I 
would refactor the whole function to call _dosmaperr first then if result is 
still EINVAL, tweak with custom switch case. The way I see it, this looks like 
premature optimization since OS error shouldn't be on a hot code path, meaning 
an application should be able to live with an extra CRT function call on such 
exceptions. I'm willing to implement this if  there are no objections. 
Something like:

_dosmaperr(err);    /* CRT helper; assumed to set errno as a side effect */
if (errno == EINVAL)
{
    switch (err)
    {
        /* Our tweaks */
    }
}

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-26 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

My proposal is better than log(N) in 2 respects.

1) There need only be a time penalty when there are non-BMP chars and indexing 
currently gives the 'wrong' answer and therefore when a time-penalty should be 
acceptable. Lookup for normal all-BMP strings could remain the same.

2) The penalty is log(K), where K is the number of non-BMP chars. In theory, 
O(logK) is as 'bad' as O(logN), for any fixed ratio K/N. In practice, the 
difference should be noticeable when there are just a few (say .01%) 
extended-range chars.

I am aware that this is an idea for the future, not now.
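
One way to picture a log(K) lookup is sketched below. The details of the proposal are not spelled out in this message, so everything here is an assumption: IndexedUTF16 is a hypothetical class that stores UTF-16 code units plus a sorted list of the code point indexes of the non-BMP characters, and uses a binary search over that list to translate a code point index into a code unit offset.

```python
from bisect import bisect_left


class IndexedUTF16:
    """Hypothetical sketch: UTF-16 storage with code point indexing."""

    def __init__(self, text):
        self.units = []    # UTF-16 code units
        self.astral = []   # sorted code point indexes of non-BMP characters
        for cp_index, ch in enumerate(text):
            cp = ord(ch)
            if cp > 0xFFFF:
                self.astral.append(cp_index)
                cp -= 0x10000
                self.units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
            else:
                self.units.append(cp)

    def __len__(self):
        # Each non-BMP character occupies two units but counts once.
        return len(self.units) - len(self.astral)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        if not self.astral:
            return chr(self.units[i])         # all-BMP string: no penalty
        k = bisect_left(self.astral, i)       # non-BMP chars before index i
        offset = i + k                        # each one shifted us by a unit
        unit = self.units[offset]
        if k < len(self.astral) and self.astral[k] == i:
            low = self.units[offset + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)


sample = "a\U0001043cb\U0001f600c"
indexed = IndexedUTF16(sample)
print(len(indexed), [hex(ord(indexed[i])) for i in range(len(indexed))])
```

The all-BMP branch corresponds to point 1 above (no extra cost), and the bisect call is the log(K) step from point 2.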
---

Fixing string iteration on narrow builds to produce code points the same
as with wide builds is easy and costs O(1) per code point (character), which is 
the same as the current cost. Then

>>> from unicodedata import name
>>> name('\U0001043c')
'DESERET SMALL LETTER DEE'
>>> for c in 'a\U0001043c': name(c)
'LATIN SMALL LETTER A'
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    for c in 'a\U0001043c': name(c)
ValueError: no such name

would work like it does on wide builds instead of failing.

I admit that it would be strange to have default iteration produce different 
items than default indexing (and indeed, str currently iterates by sequential 
indexing). But keeping them in sync means that buggy iteration is another cost 
of O(1) indexing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-26 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

To me, making (default) iteration deviate from indexing is anathema.

However, there is nothing wrong with providing a library function that
takes a string and returns an iterator that iterates over code points,
joining surrogate pairs as needed. You could even have one that
iterates over characters (I think Tom calls them graphemes), if that
is well-defined and useful.
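
A library function of the kind described above could look roughly like the sketch below. The name iter_code_points is hypothetical, not an existing standard library function. It walks a string and joins each high/low surrogate pair into one code point, leaving lone surrogates and ordinary characters as they are; default indexing and iteration of str are not touched.

```python
def iter_code_points(s):
    """Yield code points from s, joining surrogate pairs (hypothetical helper)."""
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        is_high = "\ud800" <= ch <= "\udbff"
        if is_high and i + 1 < n and "\udc00" <= s[i + 1] <= "\udfff":
            high, low = ord(ch), ord(s[i + 1])
            # Standard UTF-16 arithmetic for decoding a surrogate pair.
            yield chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            yield ch
            i += 1


# What a narrow build would hold for 'a' followed by U+1043C.
narrow_style = "a\ud801\udc3c"
print([hex(ord(c)) for c in iter_code_points(narrow_style)])
```

A grapheme-level iterator would need the Unicode text segmentation rules and is outside what this sketch attempts.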

--
title: Python lib re cannot handle Unicode properly due to  narrow/wide bug 
- Python lib re cannot handle Unicode properly due to narrow/wide bug

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-26 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

PEP-393 will take care of iterating by code points.
Where would you have other iterators go? The string module?
Something else I have not thought of? Or something new?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___