[issue12831] 2to3 and integer division
Mark Dickinson dicki...@gmail.com added the comment:

> / 2 is an integer division, so it should be // 2 in Python 3.

No, I don't think that's right: 2to3 has no way of knowing that the programmer intended an integer division here (self.maxstars could be a float). Instead, you should always use '//' in Python 2 code where an integer division is intended.

-- nosy: +mark.dickinson
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12831 ___
___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
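The point about 2to3 not knowing the operand types can be illustrated with a small sketch. It runs under Python 3 semantics; the variable names are made up for illustration and are not taken from the reporter's code.

```python
# Python 3: "/" is always true division, "//" is always floor division.
# The right translation of a Python 2 "/" depends on the operand types,
# which a source-to-source tool cannot see.
maxstars_int = 4
maxstars_float = 4.5

true_div_int = maxstars_int / 2        # float result, even for int operands
floor_div_int = maxstars_int // 2      # int result
true_div_float = maxstars_float / 2    # 2.25: what a float-using author wants
floor_div_float = maxstars_float // 2  # 2.0: would silently change the value

print(true_div_int, floor_div_int, true_div_float, floor_div_float)
```

Rewriting every `/` as `//` would therefore be wrong whenever the value happens to be a float, which is why the advice is to write `//` explicitly in the Python 2 source.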
[issue12831] 2to3 and integer division
Changes by Mark Dickinson dicki...@gmail.com: -- nosy: +benjamin.peterson resolution: -> invalid status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12831 ___
[issue12844] Support more than 255 arguments
Martin v. Löwis mar...@v.loewis.de added the comment: The approach looks fine to me. Would you like to work on a patch? -- nosy: +loewis ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12844 ___
[issue12808] Coverage of codecs.py
Tennessee Leeuwenburg tleeuwenb...@gmail.com added the comment: Here is a stab at updated documentation. I would suggest that if further changes are recommended to the documentation, that a core committer go ahead and make them. I'm absolutely more than happy to keep taking stabs at it, but ultimately I probably don't understand the purpose of these classes as well as some of the rest of you, and I don't feel best placed to decide exactly how this should read -- Added file: http://bugs.python.org/file23049/codecs.diff ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12808 ___
[issue12831] 2to3 and integer division
Alexander Rødseth rods...@gmail.com added the comment:

Even though it's hard to cover every case, it should be possible in quite a few cases:

    self.maxstars = 4
    half = self.maxstars / 2

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12831 ___
[issue12845] PEP-3118: C-contiguity with zero strides
New submission from Stefan Krah stefan-use...@bytereef.org:

Numpy and PyBuffer_IsContiguous() have different ideas of C-contiguity if there is a zero in strides (this is allowed, I asked Pauli Virtanen).

>>> from numpy import *
>>> nd = ndarray(shape=[10], strides=[0])
>>> nd.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> from _testbuffer import ndarray as pyarray
>>> from _testbuffer import PyBUF_FULL_RO
>>> x = pyarray(nd, getbuf=PyBUF_FULL_RO)
>>> x.c_contiguous
False

-- assignee: skrah components: Interpreter Core messages: 143005 nosy: skrah priority: normal severity: normal status: open title: PEP-3118: C-contiguity with zero strides type: behavior versions: Python 3.3
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12845 ___
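To make the disagreement concrete, here is a minimal pure-Python sketch of one strict reading of C-contiguity: every stride must equal the byte distance of a packed row-major layout. It is an illustration only, not a copy of CPython's or NumPy's implementation, and the function name is hypothetical.

```python
def is_c_contiguous_strict(shape, strides, itemsize):
    """Return True if strides match a packed row-major layout exactly.

    Hypothetical helper: walks the axes from last to first and checks that
    each stride equals the number of bytes one step along that axis covers.
    """
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if stride != expected:
            return False
        expected *= dim
    return True


# The case from the report: ten items, stride 0 (itemsize 8 is an assumption).
zero_stride = is_c_contiguous_strict([10], [0], 8)
# An ordinary packed 2x3 array of 8-byte items.
packed = is_c_contiguous_strict([2, 3], [24, 8], 8)
print(zero_stride, packed)
```

Under this strict rule the zero-stride array is not contiguous, which agrees with the `False` shown for `x.c_contiguous` above; a more permissive rule is needed to arrive at the `C_CONTIGUOUS : True` flag.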
[issue12831] 2to3 and integer division
Raymond Hettinger raymond.hettin...@gmail.com added the comment: Running python with the -3 command line option will warn about Python 3.x incompatibilities that 2to3 cannot trivially fix. -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12831 ___
[issue12820] Tests for Lib/xml/dom/minicompat.py
John Chandler therealmetal...@gmail.com added the comment: Cool, thanks for the feedback! :-) I'll make the appropriate changes to the tests and add some coverage for defproperty as soon as I can. John -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12820 ___
[issue12846] unicodedata.normalize turkish letter problem
New submission from Cem YILDIZ c...@fizy.com:

unicodedata.normalize cannot convert the Turkish letter ı into i:

    import unicodedata
    s = u"üfürükçü ağaç ve ıslıkçı çeşme"
    print(shoehorn_unicode_into_ascii(s))
    print unicodedata.normalize('NFKD', s).encode('ascii','ignore')

    ufurukcu agac ve slkc cesme

but the result should be

    ufurukcu agac ve islikci cesme

-- components: Unicode messages: 143008 nosy: fizymania priority: normal severity: normal status: open title: unicodedata.normalize turkish letter problem versions: Python 2.6
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12846 ___
[issue12846] unicodedata.normalize turkish letter problem
Changes by Cem YILDIZ c...@fizy.com: -- type: -> behavior ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12846 ___
[issue12846] unicodedata.normalize turkish letter problem
Cem YILDIZ c...@fizy.com added the comment:

unicodedata.normalize cannot convert the Turkish letter ı into i:

    import unicodedata
    s = u"üfürükçü ağaç ve ıslıkçı çeşme"
    print unicodedata.normalize('NFKD', s).encode('ascii','ignore')

    ufurukcu agac ve slkc cesme

but the result should be

    ufurukcu agac ve islikci cesme

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12846 ___
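The behaviour in the report follows from the Unicode data itself: U+0131 (dotless i) has no decomposition mapping, so NFKD leaves it unchanged and the ASCII encoder then drops it. The sketch below is written for Python 3 rather than the reporter's 2.6; the helper names and the translation table are assumptions added for illustration, not part of `unicodedata`.

```python
import unicodedata


def to_ascii(text):
    """Strip diacritics via NFKD, then drop anything that is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical workaround: map dotless i to plain "i" before normalizing.
DOTLESS_I_TABLE = {0x0131: "i"}


def to_ascii_turkish(text):
    return to_ascii(text.translate(DOTLESS_I_TABLE))


s = "üfürükçü ağaç ve ıslıkçı çeşme"
plain = to_ascii(s)
patched = to_ascii_turkish(s)
print(plain)
print(patched)
```

Whether "ı" should become "i" at all is an application decision, which is why the mapping sits in an explicit table here instead of being expected from normalization.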
[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.
Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 96f0ccb9716d by Éric Araujo in branch '3.2': Fix type information in distutils API reference (#9302). http://hg.python.org/cpython/rev/96f0ccb9716d

New changeset a410b857efe3 by Éric Araujo in branch 'default': Merge from 3.2 (#9302 fix and other changes) http://hg.python.org/cpython/rev/a410b857efe3

New changeset 59b3f845f7a3 by Éric Araujo in branch 'default': Synchronize packaging docs with distutils’ (includes fix for #9302) http://hg.python.org/cpython/rev/59b3f845f7a3

-- nosy: +python-dev
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9302 ___
[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 78b26e7720c0 by Éric Araujo in branch '2.7': Fix type information in distutils API reference (#9302). http://hg.python.org/cpython/rev/78b26e7720c0 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9302 ___
[issue12678] test_packaging and test_distutils failures under Windows
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 8ad1670c0f1f by Éric Araujo in branch '2.7': Try to fix test_distutils on Windows (#12678) http://hg.python.org/cpython/rev/8ad1670c0f1f -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12678 ___
[issue11360] In documentation of getopt, advertise argparse instead of optparse
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 40f7a6e71930 by Éric Araujo in branch '3.2': Remove outdated pointer to optparse (fixes #11360). http://hg.python.org/cpython/rev/40f7a6e71930 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11360 ___
[issue11360] In documentation of getopt, advertise argparse instead of optparse
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 6d3c645fa52f by Éric Araujo in branch '2.7': Remove outdated pointer to optparse (fixes #11360). http://hg.python.org/cpython/rev/6d3c645fa52f -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11360 ___
[issue12833] raw_input misbehaves when readline is imported
Éric Araujo mer...@netwok.org added the comment: Maybe you need to call sys.stdin.flush() before raw_input? In any case, 2.6 is in security mode, so we need to reproduce this with current versions: 2.7, 3.2 or 3.3. -- components: +IO, Interpreter Core -Library (Lib) nosy: +eric.araujo, pitrou stage: -> test needed versions: -Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12833 ___
[issue12842] Docs: first parameter of tp_richcompare() always has the correct type
Changes by Éric Araujo mer...@netwok.org: -- keywords: +needs review stage: -> patch review versions: -Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12842 ___
[issue9302] distutils API Reference: setup() and Extension parameters' description not correct.
Éric Araujo mer...@netwok.org added the comment: Improved and committed, thanks again! -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9302 ___
[issue12759] (?P=) input for Tools/scripts/redemo.py raises unnhandled exception
Éric Araujo mer...@netwok.org added the comment: I can reproduce in 3.3 (the file has been moved to Tools/demo/redemo.py). The Tk application does not crash but there is a traceback. Would you like to work on a patch? If so, there are good guidelines in the devguide. -- keywords: +easy nosy: +eric.araujo stage: -> needs patch title: (?P=) input for Tools/scripts/redemo.py throw an exception -> (?P=) input for Tools/scripts/redemo.py raises unnhandled exception versions: +Python 2.7, Python 3.2, Python 3.3 -Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12759 ___
[issue12806] argparse: Hybrid help text formatter
Éric Araujo mer...@netwok.org added the comment: Steven: What do you think? GraylinKim: You can open a feature request for message preview on the metatracker (see “Report Tracker Problem” in the sidebar). -- nosy: +bethard, eric.araujo type: -> feature request versions: +Python 3.3 -Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12806 ___
[issue12768] docstrings for the threading module
Éric Araujo mer...@netwok.org added the comment: I have made a review on Rietveld. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12768 ___
[issue12195] Little documentation of annotations
Changes by Éric Araujo mer...@netwok.org: -- nosy: +eric.araujo ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12195 ___
[issue12742] Add support for CESU-8 encoding
Ezio Melotti ezio.melo...@gmail.com added the comment: Can you provide some example? The page you linked says "It should be used exclusively for internal processing and never for external data exchange.", so I'm not sure why these APIs would want to use it. -- nosy: +ezio.melotti ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12742 ___
[issue12195] Little documentation of annotations
Raymond Hettinger raymond.hettin...@gmail.com added the comment:

> some simple examples showing the syntax would go a long way.

Sorry, there are just too many ways to go and we are intentionally not stating which way is preferred. I've seen many variants: a:[Integral] for a list of integers, a:(int,str) for a 2-tuple of an int and a string, a:(str,file,None) for something that is a string or a file or None, a:'light_years' to indicate units of measure, a:range_check(10.5, 20.1) for range validation, and some variants for converters, adapters, factory functions, documentation aids, etc.

If you want to advance the state of the art, perhaps write a blog post on what you consider to be a best practice. If a consensus emerges, we will follow.

-- resolution: -> rejected status: open -> closed
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12195 ___
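For readers unfamiliar with the syntax, a few of the variants listed above can be written out as follows. This only shows how annotations are spelled and stored; it does not endorse any of the styles, and the function names are invented for the example.

```python
from numbers import Integral


def total(a: [Integral]) -> Integral:
    """Annotation as a one-element list: 'a list of integers'."""
    return sum(a)


def label(pair: (int, str)) -> str:
    """Annotation as a tuple: 'a 2-tuple of an int and a string'."""
    number, text = pair
    return "%d: %s" % (number, text)


def describe(distance: 'light_years') -> str:
    """Annotation as a plain string naming a unit of measure."""
    return "%s light years" % distance


# Python stores annotations without interpreting them.
print(total.__annotations__)
print(label.__annotations__)
print(describe.__annotations__)
```

Nothing here is checked at call time; the interpreter only records the expressions, which is what leaves room for so many competing conventions.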
[issue12768] docstrings for the threading module
Eli Bendersky eli...@gmail.com added the comment: Éric, yeah I received an email. Hopefully Graeme did too. It's a shame a new review isn't notified in the tracker instead. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12768 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment: Brian, Tim, I'd feel more comfortable if any of you confirmed this isn't a stupid proposal on my part :) -- components: +Interpreter Core stage: needs patch -> patch review ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Brian Curtin br...@python.org added the comment: I could see how they'd use EINVAL, but to me ENOTDIR makes more sense here. However, I'm not sure if anyone is depending on this (or what they could depend on it for). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12833] raw_input misbehaves when readline is imported
Idan Kamara idank...@gmail.com added the comment: Reproduced on 2.7. (flushing stdin/out doesn't help) -- versions: +Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12833 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment:

> I could see how they'd use EINVAL, but to me ENOTDIR makes more sense here. However, I'm not sure if anyone is depending on this (or what they could depend on it for).

Right now I'm not sure, but if PEP 3151 is accepted it will make much more sense to get a NotADirectoryError.

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
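In Python versions that implement PEP 3151 (3.3 and later), the benefit described above looks roughly like the sketch below: code catches `NotADirectoryError` instead of comparing `errno` values. The helper name is hypothetical, and the example uses a temporary file only to have a path that exists but is not a directory.

```python
import errno
import os
import tempfile


def listdir_errno(path):
    """Return None on success, or the errno reported when path is not a directory."""
    try:
        os.listdir(path)
    except NotADirectoryError as exc:
        # With PEP 3151 the exception class is selected from errno.ENOTDIR.
        return exc.errno
    return None


with tempfile.NamedTemporaryFile() as tmp:
    not_a_dir_errno = listdir_errno(tmp.name)

print(not_a_dir_errno == errno.ENOTDIR)
```

Because the subclass is chosen from the errno, a Windows error code that maps to EINVAL would surface as a plain OSError, which is the practical argument for mapping error 267 to ENOTDIR.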
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Brian Curtin br...@python.org added the comment: With that PEP likely to be accepted, I say go ahead with the change for that benefit. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Tim Golden m...@timgolden.me.uk added the comment: Obviously someone's code would break if it were relying on the Unix errno only in a Windows-only situation to determine the situation of opening a directory which isn't one. But that combination of events doesn't seem terribly likely. Speaking for myself, since the exception is a WindowsError with the winerror attribute prominent, [Error 267] I'd be far more likely to be trapping that. I say go ahead -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Note that this file is not written by hand. It's generated by PC/generrmap.c, which uses the _dosmaperr() function provided by the msvcrt. If we want to modify it, this should be clearly marked somewhere. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: If you have a copy of Visual Studio, you can see the code of _dosmaperr() in VC/crt/src/dosmap.c. Otherwise the Google query inurl:dosmap.c returns some online copies of this file. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue9262] IDLE: Use tabbed shell and edit windows
Roger Serwy roger.se...@gmail.com added the comment: Attached is an extension which provides tabbed windows for IDLE. It supports drag-and-drop reordering and separate windows. The implementation relies on monkey-patching a few subroutines and duck-typing for the toplevel window. The extension emulates each tab as if it were its own toplevel object. There can be flickering when switching tabs due to swapping the toplevel menu bar. This seems to be a limitation of Tk. -- Added file: http://bugs.python.org/file23050/TabExtension.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9262 ___
[issue11913] sdist should allow for README.rst
resc thomat...@gmail.com added the comment: Just wanted to note that this confuses other people too... http://stackoverflow.com/questions/4384796/readme-extension-for-python-projects Is this something that could be changed in 'distribute'? -- nosy: +Thomas.Smith ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue11913 ___
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Guido van Rossum gu...@python.org added the comment:

Wow. A very educational discussion. We will be referencing this issue for many years to come.

As long as the buck stops with me, I feel strongly that *today* changing indexing from O(1) to O(log N) is a bad idea, partly for technical reasons, partly because the Python culture isn't ready. In 5 or 10 years we need to revisit this, and it wouldn't hurt if in the mean time we started seriously thinking about how to change our APIs so that O(1) indexing is not relied upon so much. This may include rewriting tutorials to nudge users in the direction of using different idioms for text processing.

In the meantime, I think our best option is to switch CPython to the PEP 393 string implementation. Despite its disadvantages (I understand the spoiler issue) it is generally no worse than a wide build, and there is working code today that we can optimize before 3.3 is released.

For Python implementations where this is not an option (I'm thinking Jython and IronPython, both of which are closely tied to a system string type that behaves like UTF-16) I hope that at least the regular expression behavior can be fixed so that "." matches a surrogate pair. (Possibly they already behave that way, if they use a native regex library.)

In all cases, for future Python versions, we should tighten the codecs to reject data that the Unicode standard considers invalid (and we should offer separate non-strict codecs for situations where such invalid data needs to be processed).

I wish we could fix the codecs and the regex "." issue on narrow builds for Python versions before 3.3 (esp. 3.2 and 2.7), but I fear that this is considered too backwards incompatible (though for each specific fix we should consider this carefully).

-- nosy: +gvanrossum
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729 ___
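As a small illustration of what the PEP 393 representation gives users, the sketch below assumes CPython 3.4 or later (for `re.fullmatch`). It contrasts code-point indexing with the UTF-16 view that narrow builds and UTF-16-based implementations expose; the variable names are chosen for the example only.

```python
import re

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
s = "a\U0001D11Eb"

code_points = len(s)                            # counts code points
middle = s[1]                                   # O(1) index, a whole character
utf16_units = len(s.encode("utf-16-le")) // 2   # UTF-16 needs a surrogate pair
dot_match = re.fullmatch(".", "\U0001D11E")     # "." consumes the whole character

print(code_points, utf16_units, dot_match is not None)
```

On a UTF-16-like string type the same text has four units, so indexing position 1 would return half of a surrogate pair; that is the situation the regular expression fix for "." is meant to paper over.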
[issue12728] Python re lib fails case insensitive matches on Unicode data
Guido van Rossum gu...@python.org added the comment: This bug could do with a little less attitude. That said, I think it is a bug and should be fixed, at the very least for Python 3.3. As always, it is a matter of much debate to what extent bugs can be fixed in previous Python versions (specifically, 2.7 and 3.2) without breaking more code than it fixes, and I don't want to jump the gun on that issue. Let's first see what it takes to fix this for 3.3. -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12728 ___
[issue12735] request full Unicode collation support in std python library
Guido van Rossum gu...@python.org added the comment: Sounds like a fair feature request for Python 3.3, as long as the intention is that users must import some module from the standard library and use functions defined in that module. The operations and methods defined for str instances (e.g. ==, <, etc.) should not change their behavior. Is there an existing 3rd party library that we could adopt (even if it isn't perfect yet)? -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735 ___
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Guido van Rossum gu...@python.org added the comment: I presume this applies to builtin str methods like .lower(), right? I think it is a good thing to do for Python 3.3. We'd need to define what should happen in edge cases, e.g. when (against all odds) a string happens to contain a lone surrogate or some other code point or sequence of code points that the Unicode standard considers illegal. I think it should not fail but just leave those code points alone. Does this require us to import more data files from the Unicode standard? By itself that doesn't scare me. Would this also affect .islower() and friends? -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736 ___
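For reference, the sketch below shows how these cases behave on a Python with full casemapping in the str methods (3.3 and later, as far as I know): a one-to-many mapping, and a lone surrogate that is left alone instead of raising. It illustrates the observable behaviour only, not how the mapping tables are stored.

```python
# Full casemapping can map one code point to several.
sharp_s_upper = "ß".upper()       # simple casemapping would leave "ß" as is
word_upper = "Straße".upper()

# A lone surrogate is not valid text, but case conversion passes it through.
lone_surrogate = "\ud800"
lone_lowered = lone_surrogate.lower()

# The predicates still answer for the original, unmapped string.
sharp_s_is_lower = "ß".islower()

print(sharp_s_upper, word_upper, sharp_s_is_lower)
```

The length of `word_upper` differs from that of the input, so code that assumes case conversion preserves length needs checking once full casemapping is in place.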
[issue12749] lib re cannot match non-BMP ranges (all versions, all builds)
Guido van Rossum gu...@python.org added the comment: We should at least get this fixed in 3.3. Then we can discuss the benefits of backporting the fixes to 2.7 and 3.2 (though it sounds to me like the backports will fix more than they will break, since it is pretty much impossible to do the right thing in those versions today). -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12749 ___
[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
Guido van Rossum gu...@python.org added the comment: Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7. (There is already no guarantee that len(s) == len(s.title()), right?) -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12737 ___
[issue12746] normalization is affected by unicode width
Guido van Rossum gu...@python.org added the comment: Yeah, we should fix this. At least in 3.3, but (without knowing what exactly is involved) I think backporting to 2.7 and 3.2 makes sense too. -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12746 ___
[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Guido van Rossum gu...@python.org added the comment: Really? The re module cannot be salvaged and we should add regex but keep the (buggy) re? That does not make a lot of sense to me. I think it should just be fixed in the re module. Or the re module should be *replaced* by the code from the regex module (but renamed to re, and with certain backwards compatibilities restored, probably). But I really hope the re module (really: the _sre extension module) can be fixed. We should also make a habit in our docs of citing specific versions of the Unicode standard, and specific TR numbers and versions where they apply. (And hopefully we can supply URLs to the Unicode consortium's canonical copies of those documents.) -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12731 ___
[issue12733] Request for grapheme support in Python re lib
Guido van Rossum gu...@python.org added the comment: Again, I would be disappointed if the re (_sre) module could not be fixed. It is a reasonable feature request. -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12733 ___
[issue12734] Request for property support in Python re lib
Guido van Rossum gu...@python.org added the comment: +1 on adding the feature to 3.3 in whichever way makes sense. -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12734 ___
[issue12753] \N{...} neglects formal aliases and named sequences from Unicode charnames namespace
Guido van Rossum gu...@python.org added the comment: +1 on the feature request. -- nosy: +gvanrossum ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12753 ___
[issue12735] request full Unicode collation support in std python library
Tom Christiansen tchr...@perl.com added the comment:

> Sounds like a fair feature request for Python 3.3, as long as the intention is that users must import some module from the standard library and use functions defined in that module. The operations and methods defined for str instances (e.g. ==, <, etc.) should not change their behavior.

> Is there an existing 3rd party library that we could adopt (even if it isn't perfect yet)?

I *think* you could use ICU's. I'm pretty sure the Parrot people use ICU libraries.

--tom

-- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735 ___
[issue12735] request full Unicode collation support in std python library
Guido van Rossum gu...@python.org added the comment: I know I sound like NIH, but I'm always reluctant to add a big 3rd party lib like ICU to the permanent dependencies of all future Python distros. If people want to use ICU they already can. OTOH I don't have a better idea. :-( -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735 ___
[issue12737] str.title() is overzealous by upcasing combining marks inappropriately
Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:16:57 -:

> Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7. (There is already no guarantee that len(s) == len(s.title()), right?)

Well, *I* don't know of any such guarantee, but I don't know Python very well.

In general, Unicode makes very few guarantees about casing. Under full casemapping, which is the only way to do the silly Turkish stuff amongst quite a bit else, any of the three casemappings can change the length of the string.

Other things you can't rely on are round tripping and single paths. By roundtripping, just look at the two lowercase sigmas and think about how you can't get back to one of them if you uppercase them both. By single paths, I mean that code that does some sort of conversion where it first lowercases everything and then titlecases the first letter can produce something different from titlecasing just the original first letter and then lowercasing the rest of them. That's because tc(x) and tc(lc(x)) can be different.

--tom

-- title: str.title() is overzealous by upcasing combining marks inappropriately -> str.title() is overzealous by upcasing combining marks inappropriately
___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12737 ___
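Two of the points above, length changes and the lost round trip for the sigmas, can be checked directly on a Python whose str methods use full casemapping (3.3 and later, as far as I know). The sketch is only an illustration of those two claims; it does not attempt the tc(x) versus tc(lc(x)) case.

```python
# Length change: U+FB02 LATIN SMALL LIGATURE FL uppercases to two letters.
ligature = "\ufb02"
ligature_upper = ligature.upper()

# Round trip: both lowercase sigmas share one uppercase form.
final_sigma = "\u03c2"    # ς, used at the end of a word
medial_sigma = "\u03c3"   # σ
capital_sigma = "\u03a3"  # Σ

both_upper_same = final_sigma.upper() == medial_sigma.upper() == capital_sigma
# Lowercasing the isolated capital cannot know which sigma it came from.
round_trip = final_sigma.upper().lower()

print(len(ligature), len(ligature_upper), both_upper_same, round_trip == final_sigma)
```

So `x.upper().lower() == x` is not an invariant even for text that was entirely lowercase to begin with.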
[issue12735] request full Unicode collation support in std python library
Raymond Hettinger raymond.hettin...@gmail.com added the comment: I would like to be involved in the design of the API for a UCA module and its routines for loading Unicode Collation Element Tables (not making the mistake of using global state like the locale module does). -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735 ___
[issue12735] request full Unicode collation support in std python library
Tom Christiansen tchr...@perl.com added the comment:

Raymond Hettinger raymond.hettin...@gmail.com added the comment:

> I would like to be involved in the design of the API for a UCA module and its routines for loading Unicode Collation Element Tables (not making the mistake of using global state like the locale module does).

Is this the problem where a locale is global to a process (or thread)?

The way I'm used to using the UCA module in Perl, that's never a problem, because it's completely object-oriented. There's no global state. You instantiate a collator object with all the state it needs, like

    collation_level
    upper_before_lower
    backwards_levels
    normalization
    override_CJK
    override_Hangul
    katakana_before_hiragana
    variable
    locale
    preprocess

And then you use that object for all your collation needs, including not just sorting but also string comparison and even searches.

For example, you could instantiate a first collator object with its level set to one, meaning just compare base alphanumerics, not diacritics or case or nonletters, and a second with the defaults so that it uses all four levels or a different normalization.

I have on occasion had more than one collator object around at once, each with its own locale, like if I want to compare different locales' comparisons.

--tom

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735
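The object-oriented shape described in this message can be sketched in Python. This is only an illustration of "all state lives in the collator object": the class name ToyCollator, its level parameter, and its key scheme are hypothetical, and the comparison it performs is a crude approximation built on unicodedata, not the Unicode Collation Algorithm.

```python
import unicodedata


class ToyCollator:
    """Hypothetical stand-in for a collator object; NOT an implementation of the UCA."""

    def __init__(self, level=3):
        # 1: base letters only, 2: also accents, 3: also case.
        # All configuration is held per instance; nothing is global.
        self.level = level

    def key(self, text):
        decomposed = unicodedata.normalize("NFD", text)
        # Drop combining marks to get the base letters.
        base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        parts = [base.casefold()]
        if self.level >= 2:
            parts.append(decomposed.casefold())
        if self.level >= 3:
            parts.append(decomposed)
        return tuple(parts)

    def compare(self, a, b):
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)

    def sort(self, items):
        return sorted(items, key=self.key)


primary = ToyCollator(level=1)   # ignores diacritics and case
tertiary = ToyCollator()         # default: distinguishes both
print(primary.compare("r\u00e9sum\u00e9", "RESUME"))
print(tertiary.compare("r\u00e9sum\u00e9", "RESUME"))
```

Because each instance carries its own settings, two collators with different levels (or, in a real module, different locales) can be used side by side, which is what a process-wide locale setting makes awkward.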
[issue12735] request full Unicode collation support in std python library
Tom Christiansen tchr...@perl.com added the comment:

I should probably mention the importance in the design of a UCA module of being able to specify which UCA version number you want it to behave like in case you plan to override some of the DUCET entries. That way if you run under a later UCA with different DUCET weights, your own tailorings will still make sense. If you don't do this, your collation tailorings can break in a new release of the UCA.

--tom

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735
[issue12735] request full Unicode collation support in std python library
Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:55:03 -:

> I know I sound like NIH, but I'm always reluctant to add a big 3rd party lib like ICU to the permanent dependencies of all future Python distros. If people want to use ICU they already can. OTOH I don't have a better idea. :-(

I know exactly what you mean. I would not want to push that on anyone, being dependent on a gigantic 3rd-party module. I just tried to answer the question.

The only two full UCA implementations I know of are ICU's and Perl's, which does not use ICU (since we're UTF-8, etc). I just wish Python had Unicode collation, is all.

--tom

PS: (I haven't had good luck with the ICU bindings in 3.2.)

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12735
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:11:24 -:

> Guido van Rossum gu...@python.org added the comment:
> I presume this applies to builtin str methods like .lower(), right? I think it is a good thing to do for Python 3.3.

Yes, the full casemaps are for upper, title, and lowercase. There is also a full casefold and turkic case fold (which is full), but you don't have a casefold function so I guess that doesn't matter.

> We'd need to define what should happen in edge cases, e.g. when (against all odds) a string happens to contain a lone surrogate or some other code point or sequence of code points that the Unicode standard considers illegal. I think it should not fail but just leave those code points alone.

Well, it's a funny thing. There are properties given for all Unicode code points, even noncharacter code points. This includes the casing properties, oddly enough.

From UnicodeData.txt, which has a few surrogate entries; notice no casing is given:

    D800;<Non Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
    DB7F;<Non Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
    DB80;<Private Use High Surrogate, First>;Cs;0;L;;;;;N;;;;;
    DBFF;<Private Use High Surrogate, Last>;Cs;0;L;;;;;N;;;;;
    DC00;<Low Surrogate, First>;Cs;0;L;;;;;N;;;;;
    DFFF;<Low Surrogate, Last>;Cs;0;L;;;;;N;;;;;

And in SpecialCasing.txt, which does not have surrogates but does have a default clause:

    # This file is a supplement to the UnicodeData file.
    # It contains additional information about the casing of Unicode characters.
    # (For compatibility, the UnicodeData.txt file only contains case mappings for
    # characters where they are 1-1, and independent of context and language.
    # For more information, see the discussion of Case Mappings in the Unicode Standard.
    #
    # All code points not listed in this file that do not have a simple case mappings
    # in UnicodeData.txt map to themselves.
And in CaseFolding.txt, which also does not have surrogates but again does have a default clause:

    # The data supports both implementations that require simple case foldings
    # (where string lengths don't change), and implementations that allow full case folding
    # (where string lengths may grow). Note that where they can be supported, the
    # full case foldings are superior: for example, they allow MASSE and Maße to match.
    #
    # All code points not listed in this file map to themselves.

Taken all together, it follows that the surrogates have case{map,fold}s back to themselves, since they have no case{map,fold}s listed.

It's ok to have arbitrary code points in memory, including surrogates and the 66 noncharacters. It just isn't legal to have them in a UTF stream for open interchange, whatever that means.

> Does this require us to import more data files from the Unicode standard? By itself that doesn't scare me.

One way or the other, yes, notably the SpecialCasing file for casemapping and the CaseFolding file for casefolding (which you should do anyway to fix re.I). But you can and should process the new files into some tighter format optimized for your own lookups.

Oddly, Java doesn't provide String methods that do full casing for titlecase, even though they do so for lowercase and uppercase. For titlecase they only expose the simple casemaps via the Character class, which are the ones from UnicodeData. They recognize that this is a flaw, but it was too late to fix it for Java 7.

> Would this also affect .islower() and friends?

Well, it shouldn't, but .islower() and friends are already mistaken. They seem to be checking for GC=Ll and such, but they need to be checking the Unicode binary property Lowercase and such.
Watch:

    test 37 for string Ⅷ
        wanted ⅷ to be lowercase of Ⅷ but python disagrees
        wanted Ⅷ to be titlecase of Ⅷ but python disagrees
        wanted Ⅷ to be uppercase of Ⅷ but python disagrees
    test 37 failed 3 subtests
    test 39 for string Ⓚ
        wanted ⓚ to be lowercase of Ⓚ but python disagrees
        wanted Ⓚ to be titlecase of Ⓚ but python disagrees
        wanted Ⓚ to be uppercase of Ⓚ but python disagrees
    test 39 failed 3 subtests

That's because the Roman numerals are GC=Nl but still have case and change case. Similarly for the circled letters, which are GC=So but have case and change case. Plus there's U+0345, the iota subscript, which is GC=Mn but has case and changes case.

I don't remember whether I've sent in my full test suite or not. If I haven't yet, I should attach it to the bug report.

--tom

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736
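The conclusion drawn in this message from the default clauses, that code points with no listed mapping case-map and case-fold to themselves, can be observed from Python. This is a small sketch, assuming Python 3.3 or later (str.casefold() was added in 3.3 and did not exist when this message was written); the variable names are illustrative only.

```python
# Assumes Python 3.3+ (full case mappings and str.casefold()).

# A lone high surrogate is legal in an in-memory str; it has no case mapping
# listed in the data files, so every casing operation leaves it alone.
lone_surrogate = "\ud800"
unchanged = (
    lone_surrogate.upper() == lone_surrogate
    and lone_surrogate.lower() == lone_surrogate
    and lone_surrogate.title() == lone_surrogate
    and lone_surrogate.casefold() == lone_surrogate
)
print(unchanged)

# Full case folding lets MASSE and Masse-with-sharp-s match, as the
# CaseFolding.txt header says; plain lowercasing does not.
with_sharp_s = "Ma\u00dfe"
print("MASSE".casefold() == with_sharp_s.casefold())
print("MASSE".lower() == with_sharp_s.lower())
```

The booleans are printed rather than the strings themselves because a lone surrogate cannot be encoded to a UTF-8 output stream, which matches the in-memory versus interchange distinction made above.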
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Tom Christiansen tchr...@perl.com added the comment:

Here’s my casing test suite; I thought I sent it in but the mux file here isn’t the full thing. It does several things, including letting you run it with regex vs re. It also checks for the islower, etc functions. It has both simple and full (and turkic) maps and folds in it, but is configured to only check the simple versions for now. The islower and isupper etc functions seem to be checking the wrong Unicode property.

Yes, it has my quaint Unixisms in it, because it needs to run with UTF-8 output, or you can't read what's going on.

-- Added file: http://bugs.python.org/file23051/casing-tests.py
Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736
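The mismatch between General Category and casing behaviour that the test suite exercises can be reproduced with the three characters named in this thread. This is a short sketch using only the standard unicodedata module; the samples dict and the changes_case flag are illustrative names, not part of the attached casing-tests.py.

```python
import unicodedata

# Characters named in the thread, with the General Category the thread reports.
samples = {
    "\u2167": "Nl",   # ROMAN NUMERAL EIGHT
    "\u24c0": "So",   # CIRCLED LATIN CAPITAL LETTER K
    "\u0345": "Mn",   # COMBINING GREEK YPOGEGRAMMENI (iota subscript)
}

for ch, expected_gc in samples.items():
    gc = unicodedata.category(ch)
    # None of these is GC=Lu, Ll or Lt, yet each has a case mapping.
    changes_case = ch.lower() != ch or ch.upper() != ch
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: GC={gc}, changes case: {changes_case}")
```

A predicate that only tests for GC=Ll or GC=Lu cannot classify any of these correctly, which is the argument for checking the binary Lowercase and Uppercase properties instead.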
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Vlad Riscutia riscutiav...@gmail.com added the comment:

I wasn't aware this is an auto-generated file. I can add a comment, but looking at it, it seems we auto-generate this file just to save a call to _dosmaperr. I would refactor the whole function to call _dosmaperr first and then, if the result is still EINVAL, tweak it with a custom switch case. The way I see it, this looks like premature optimization, since OS errors shouldn't be on a hot code path, meaning an application should be able to live with an extra CRT function call on such exceptions.

I'm willing to implement this if there are no objections. Something like:

    errno = _dosmaperr(err);
    if (EINVAL == errno) {
        switch (err) {
            // Our tweaks
        }
    }

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802
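The control flow proposed in this message (generic mapping first, custom tweaks only when the result is still EINVAL) can be modelled in Python to make the fallback order concrete. This is a sketch under stated assumptions: GENERIC_MAP is a hypothetical stand-in for the CRT's table with only two entries, and generic_map and winerror_to_errno are illustrative names, not CPython functions.

```python
import errno

# Hypothetical stand-in for the CRT's generic table; only two entries shown.
GENERIC_MAP = {
    2: errno.ENOENT,    # ERROR_FILE_NOT_FOUND
    5: errno.EACCES,    # ERROR_ACCESS_DENIED
}

# Tweaks consulted only when the generic mapping falls back to EINVAL.
TWEAKS = {
    267: errno.ENOTDIR,  # ERROR_DIRECTORY, the code this issue is about
}


def generic_map(winerror):
    """Model of the generic step: unknown codes become EINVAL."""
    return GENERIC_MAP.get(winerror, errno.EINVAL)


def winerror_to_errno(winerror):
    result = generic_map(winerror)
    if result == errno.EINVAL:
        # Second chance: apply the hand-maintained corrections.
        result = TWEAKS.get(winerror, result)
    return result


print(errno.errorcode[winerror_to_errno(267)])
```

The design keeps the hand-written table small: it only has to list the codes where the generic answer is unhelpful, at the cost of one extra lookup on an error path.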
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Terry J. Reedy tjre...@udel.edu added the comment:

My proposal is better than log(N) in 2 respects.

1) There need only be a time penalty when there are non-BMP chars and indexing currently gives the 'wrong' answer, and therefore when a time penalty should be acceptable. Lookup for normal all-BMP strings could remain the same.

2) The penalty is log(K), where K is the number of non-BMP chars. In theory, O(logK) is as 'bad' as O(logN), for any fixed ratio K/N. In practice, the difference should be noticeable when there are just a few (say .01%) extended-range chars.

I am aware that this is an idea for the future, not now.

---

Fixing string iteration on narrow builds to produce code points the same as with wide builds is easy and costs O(1) per code point (character), which is the same as the current cost. Then

    >>> from unicodedata import name
    >>> name('\U0001043c')
    'DESERET SMALL LETTER DEE'
    >>> for c in 'a\U0001043c': name(c)

    'LATIN SMALL LETTER A'
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        for c in 'a\U0001043c': name(c)
    ValueError: no such name

would work like it does on wide builds instead of failing.

I admit that it would be strange to have default iteration produce different items than default indexing (and indeed, str currently iterates by sequential indexing). But keeping them in sync means that buggy iteration is another cost of O(1) indexing.

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729
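The O(log K) lookup described in this message can be illustrated with a toy model of a UTF-16 string that records where its non-BMP characters sit. This is one possible reading of the proposal, not code from the tracker: the class name NarrowString and its attributes are hypothetical, and it assumes it is run on a Python where iterating a str yields whole code points (3.3 or later).

```python
from bisect import bisect_left


class NarrowString:
    """Toy model of a UTF-16 ('narrow build') string indexed by code point."""

    def __init__(self, text):
        self.units = []    # UTF-16 code units
        self.astral = []   # code point indexes of non-BMP characters, ascending
        for index, ch in enumerate(text):
            cp = ord(ch)
            if cp > 0xFFFF:
                self.astral.append(index)
                cp -= 0x10000
                self.units.append(0xD800 + (cp >> 10))
                self.units.append(0xDC00 + (cp & 0x3FF))
            else:
                self.units.append(cp)

    def __len__(self):
        # Each non-BMP character occupies two units but counts once.
        return len(self.units) - len(self.astral)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        if not self.astral:
            return chr(self.units[index])        # all-BMP fast path, O(1)
        # Each earlier non-BMP character shifts the unit offset by one;
        # counting them is a binary search over K entries.
        pos = bisect_left(self.astral, index)
        offset = index + pos
        if pos < len(self.astral) and self.astral[pos] == index:
            high, low = self.units[offset], self.units[offset + 1]
            return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
        return chr(self.units[offset])


s = NarrowString("a\U0001043cb")
print(len(s.units), len(s))
```

Strings with no non-BMP characters never reach the bisect call, which corresponds to point 1 of the message; the search cost grows with the number of recorded positions, which corresponds to point 2.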
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Guido van Rossum gu...@python.org added the comment:

To me, making (default) iteration deviate from indexing is anathema.

However, there is nothing wrong with providing a library function that takes a string and returns an iterator that iterates over code points, joining surrogate pairs as needed. You could even have one that iterates over characters (I think Tom calls them graphemes), if that is well-defined and useful.

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729
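A library function of the kind suggested in this message might look like the sketch below. The name iter_code_points is hypothetical, and the example simulates a narrow-build string by writing the surrogate pair explicitly, since on Python 3.3 and later a str never stores non-BMP characters as pairs on its own.

```python
from unicodedata import name


def iter_code_points(text):
    """Yield one string per code point, joining surrogate pairs as needed."""
    pending_high = None
    for unit in text:
        if pending_high is not None:
            if "\udc00" <= unit <= "\udfff":
                high, low = ord(pending_high), ord(unit)
                yield chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
                pending_high = None
                continue
            # The previous high surrogate was unpaired: pass it through as-is.
            yield pending_high
            pending_high = None
        if "\ud800" <= unit <= "\udbff":
            pending_high = unit
        else:
            yield unit
    if pending_high is not None:
        yield pending_high


# The string from the earlier example, spelled as a narrow build stores it.
narrow_style = "a\ud801\udc3c"
print([name(cp) for cp in iter_code_points(narrow_style)])
```

Keeping this as an explicit function leaves default iteration and indexing in agreement with each other, which is the constraint stated above; unpaired surrogates are passed through unchanged rather than raising.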
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Terry J. Reedy tjre...@udel.edu added the comment:

PEP-393 will take care of iterating by code points. Where would you have other iterators go? The string module? Something else I have not thought of? Or something new?

-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729