[issue12768] docstrings for the threading module
Graeme Cross gjcr...@gmail.com added the comment: I will check that the patch works with 3.2; if not, I'll redo the patch for 3.2. I will also incorporate the review changes from Ezio and Eric. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12768 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12833] raw_input misbehaves when readline is imported
Nadeem Vawda nadeem.va...@gmail.com added the comment: Reproduced on 3.3 head. Looking at the documentation of the C readline library, it needs to know the length of the prompt in order to display properly, so this seems to be an acknowledged limitation of the underlying library rather than a bug on our side. Still, this behavior is surprising and undesirable. I would suggest adding a note to the docs for the readline module, directing users to write:

    input("foo ")

instead of:

    sys.stdout.write("foo ")
    input()

-- nosy: +nadeem.vawda
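The two call patterns contrasted above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the claim that readline sees the prompt only in the first form is taken from the comment above rather than verified here.

```python
import sys


def ask_with_prompt(prompt):
    """Suggested form: hand the prompt to input() itself."""
    return input(prompt)


def ask_with_separate_write(prompt):
    """Form reported as misbehaving once readline is imported:
    the prompt is written separately, so input() gets no prompt."""
    sys.stdout.write(prompt)
    sys.stdout.flush()
    return input()
```

Both helpers return the same text; the difference is only in who prints the prompt, which is what matters for line editing.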
[issue12833] raw_input misbehaves when readline is imported
Idan Kamara idank...@gmail.com added the comment: You're right, as this little C program verifies:

    #include <stdio.h>
    #include <stdlib.h>
    #include <readline/readline.h>

    int main() {
        /* Print the prompt ourselves, then call readline with an empty prompt. */
        printf("foo ");
        char* buf = readline("");
        free(buf);
        return 0;
    }

Passing ' ' seems to be a suitable workaround for those who can't pass the text directly to raw_input though (as is the case when you have special classes that handle output).
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Amaury Forgeot d'Arc amaur...@gmail.com added the comment: Unfortunately, it won't work. _dosmaperr() is not exported by msvcrt.dll, it is only available when you link against the static version of the C runtime.
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Sat, 27 Aug 2011 03:26:21 -:

> To me, making (default) iteration deviate from indexing is anathema.

So long as there's a way to iterate through a string some way other than by code unit, that's fine. However, the Java way of 16-bit code units is so annoying because there often aren't code point APIs, and so you get a lot of niggling errors creeping in. This is part of why I strongly prefer wide builds, so that code point and code unit are the same thing again.

> However, there is nothing wrong with providing a library function that takes a string and returns an iterator that iterates over code points, joining surrogate pairs as needed. You could even have one that iterates over characters (I think Tom calls them graphemes), if that is well-defined and useful.

"Character" can sometimes be a confusing term when it means something different to us programmers than it does to users. "Code point" to mean the integer is a lot clearer to us but to no one else. At work I often just give in and go along with the crowd and say "character" for the number that sits in a char or wchar_t or Character variable, even though of course that's a code point. I only rebel when they start calling code units "characters", which (inexperienced) Java people tend to do, because that leads to surrogate splitting and related errors.

By "grapheme" I mean something the user perceives as a single character. In full Unicodese, this is an "extended grapheme cluster". These are code point sequences that start with a grapheme base and have zero or more grapheme extenders following it. For our purposes, that's *mostly* like saying you have a non-Mark followed by any number of Mark code points, the main exception being that a CR followed by a LF also counts as a single grapheme in Unicode.
If you are in an editor and wanted to swap two characters, the one under the user's cursor and the one next to it, you have to deal with graphemes not individual code points, or else you'd get the wrong answer. Imagine swapping the last two characters of the first string below, or the first two characters of the second one:

    contrôlée    contro\x{302}le\x{301}e
    élève        e\x{301}le\x{300}ve

While you can sometimes fake a correct answer by considering things in NFC not NFD, that doesn't work in the general case, as there are only a few compatibility glyphs for round-tripping for legacy encodings (like ISO 8859-1) compared with infinitely many combinations of combining marks. Particularly in mathematics and in phonetics, you often end up using marks on characters for which no pre-combined variant glyph exists. Here's the IPA for a couple of Spanish words with their tight (phonetic, not phonemic) transcriptions:

    anécdota    [a̠ˈne̞ɣ̞ð̞o̞t̪a̠]
    rincón      [rĩŋˈkõ̞n]

NFD:

    ane\x{301}cdota    [a\x{320}\x{2C8}ne\x{31E}\x{263}\x{31E}\x{F0}\x{31E}o\x{31E}t\x{32A}a\x{320}]
    rinco\x{301}n      [ri\x{303}\x{14B}\x{2C8}ko\x{31E}\x{303}n]

NFC:

    an\x{E9}cdota      [a\x{320}\x{2C8}ne\x{31E}\x{263}\x{31E}\x{F0}\x{31E}o\x{31E}t\x{32A}a\x{320}]
    rinc\x{F3}n        [r\x{129}\x{14B}\x{2C8}k\x{F5}\x{31E}n]

So combining marks don't just go away in NFC, and you really do have to deal with them. Notice that to get the tabs right (your favorite subject :), you have to deal with print widths, which is another place that you get into trouble if you only count code points.

BTW, did you know that the stress mark used in the phonetics above is actually a (modifier) letter in Unicode, not punctuation?

    # uniprops -a 2c8
    U+02C8 ‹ˈ› \N{MODIFIER LETTER VERTICAL LINE}
        \w \pL \p{L_} \p{Lm}
        All Any Alnum Alpha Alphabetic Assigned InSpacingModifierLetters
        Case_Ignorable CI Common Zyyy Dia Diacritic L Lm Gr_Base Grapheme_Base
        Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Modifier_Letter
        Print Spacing_Modifier_Letters Word XID_Continue XIDC XID_Start XIDS
        X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
        Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON
        Block=Spacing_Modifier_Letters Canonical_Combining_Class=0
        Canonical_Combining_Class=Not_Reordered CCC=NR
        Canonical_Combining_Class=NR Script=Common Decomposition_Type=None
        DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
        Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA
        Hangul_Syllable_Type=Not_Applicable HST=NA
        Joining_Group=No_Joining_Group JG=NoJoiningGroup
        Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=BB
        Line_Break=Break_Before LB=BB Numeric_Type=None NT=None
        Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0
        Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1
        Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1
        Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2
        Present_In=6.0 IN=6.0 SC=Zyyy
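The "non-Mark followed by any number of Marks, plus CR LF" approximation described in this message can be sketched in Python with the standard unicodedata module. The helper name approx_graphemes is hypothetical, and this is deliberately not a full implementation of extended grapheme clusters; it only follows the simplification stated above.

```python
import unicodedata


def approx_graphemes(text):
    """Split text into approximate graphemes.

    Simplification (as in the message above): a cluster is one code point
    followed by any number of Mark code points (categories Mn, Mc, Me),
    and CR followed by LF counts as one cluster. The full Unicode rules
    for extended grapheme clusters are not implemented.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        is_crlf = ch == "\n" and clusters and clusters[-1] == "\r"
        if clusters and (is_mark or is_crlf):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "contrôlée" in NFD: the accents become separate combining marks.
nfd = unicodedata.normalize("NFD", "contr\u00f4l\u00e9e")
print(len(nfd))                    # number of code points
print(len(approx_graphemes(nfd)))  # number of user-perceived characters

# Swap the last two graphemes rather than the last two code points.
parts = approx_graphemes(nfd)
parts[-1], parts[-2] = parts[-2], parts[-1]
swapped = "".join(parts)

# The stress mark U+02C8 is a modifier letter, not punctuation.
print(unicodedata.category("\u02c8"))
```

Working on clusters keeps each accent attached to its base letter during the swap, which is the point made about editors above.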
[issue12847] crash with negative PUT in pickle
New submission from Antoine Pitrou pit...@free.fr: This doesn't happen on 2.x cPickle, where PUT keys are simply treated as strings.

    >>> import pickle, pickletools
    >>> s = b'Va\np-1\n.'
    >>> pickletools.dis(s)
        0: V    UNICODE    'a'
        3: p    PUT        -1
        7: .    STOP
    highest protocol among opcodes = 0
    >>> pickle.loads(s)
    Erreur de segmentation    (i.e. segmentation fault)

-- messages: 143062 nosy: pitrou priority: normal severity: normal status: open title: crash with negative PUT in pickle type: crash versions: Python 3.2, Python 3.3
[issue12847] crash with negative PUT in pickle
Antoine Pitrou pit...@free.fr added the comment: Same with LONG_BINPUT on a 32-bit build:

    >>> s = b'\x80\x03X\x01\x00\x00\x00ar\xff\xff\xff\xff.'
    >>> pickletools.dis(s)
        0: \x80 PROTO       3
        2: X    BINUNICODE  'a'
        8: r    LONG_BINPUT -1
       13: .    STOP
    highest protocol among opcodes = 2
    >>> pickle.loads(s)
    Erreur de segmentation    (i.e. segmentation fault)
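A pickle like the protocol-0 one reported above can be inspected without unpickling it, using pickletools.genops from the standard library. The sketch below is an illustration only; the helper name negative_memo_keys is hypothetical and is not part of any patch on this issue.

```python
import pickletools

# Protocol-0 pickle from the report: UNICODE 'a', PUT -1, STOP.
data = b"Va\np-1\n."


def negative_memo_keys(pickle_bytes):
    """Return (opcode name, argument, position) for every PUT-style opcode
    whose memo key is negative. The pickle is only scanned, never loaded."""
    put_ops = {"PUT", "BINPUT", "LONG_BINPUT"}
    return [
        (op.name, arg, pos)
        for op, arg, pos in pickletools.genops(pickle_bytes)
        if op.name in put_ops and isinstance(arg, int) and arg < 0
    ]


print(negative_memo_keys(data))
```

Scanning first is a cautious way to look at suspicious input on interpreters that may still be affected by the crash.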
[issue11564] pickle not 64-bit ready
Antoine Pitrou pit...@free.fr added the comment: Here is a new patch against 3.2. I can't say it works for sure, but it should be much better. It also adds a couple more tests. There seems to be a separate issue where pure-Python pickle.py considers 32-bit lengths signed where the C impl considers them unsigned... -- Added file: http://bugs.python.org/file23052/pickle64-4.patch
[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
New submission from Antoine Pitrou pit...@free.fr: In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's "i" typecode which means "signed"... and therefore fails reading the data. Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't support unsigned types. (seen from http://bugs.python.org/issue11564) -- components: Library (Lib) messages: 143065 nosy: alexandre.vassalotti, pitrou priority: normal severity: normal status: open title: pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned type: behavior versions: Python 3.2, Python 3.3
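The signed/unsigned disagreement described above is easy to see on a single 4-byte length field. The sketch below uses the struct module as a stand-in for illustration; it is not the code pickle.py or _pickle.c actually runs, and the variable names are made up.

```python
import struct

# A 4-byte little-endian length field holding 2**31, one past the
# largest value a signed 32-bit integer can represent.
raw = struct.pack("<I", 2**31)

as_signed, = struct.unpack("<i", raw)    # signed interpretation
as_unsigned, = struct.unpack("<I", raw)  # unsigned interpretation

print(as_signed)
print(as_unsigned)
```

A reader that takes the signed view ends up with a negative length for the same bytes, which is why it cannot read the data.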
[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset b06f011a3529 by Nick Coghlan in branch 'default': Fix #12835: prevent use of the unencrypted sendmsg/recvmsg APIs on SSL wrapped sockets (Patch by David Watson) http://hg.python.org/cpython/rev/b06f011a3529 -- nosy: +python-dev
[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake
Changes by Nick Coghlan ncogh...@gmail.com: -- resolution: -> fixed stage: -> committed/rejected status: open -> closed
[issue9923] mailcap module may not work on non-POSIX platforms if MAILCAPS env variable is set
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 7b83d2c1aad9 by Nick Coghlan in branch 'default': Fix #9923: mailcap now uses the OS path separator for the MAILCAP envvar. Not backported, since it could break cases where people worked around the old POSIX-specific behaviour on non-POSIX platforms. http://hg.python.org/cpython/rev/7b83d2c1aad9 -- nosy: +python-dev
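Why the separator matters can be shown with a small sketch. The helper split_path_list below is hypothetical (it is not the mailcap module's code); it only illustrates splitting a MAILCAPS-style value on os.pathsep instead of a hard-coded colon.

```python
import os


def split_path_list(value, sep=os.pathsep):
    """Split an environment-variable style list of paths on the given
    separator (the platform's os.pathsep by default), dropping empties."""
    return [part for part in value.split(sep) if part]


# POSIX-style value: colon-separated.
print(split_path_list("/etc/mailcap:/usr/etc/mailcap", sep=":"))

# Windows-style value: semicolon-separated, with colons in drive letters.
windows_value = r"C:\mailcap;D:\etc\mailcap"
print(split_path_list(windows_value, sep=";"))

# Splitting the Windows-style value on ":" cuts the drive letters off.
print(split_path_list(windows_value, sep=":"))
```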
[issue12174] Multiprocessing logging levels unclear
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: Although the reference docs don't list the numeric values of logging levels, this happened during reorganising of the docs. The table has moved to the HOWTO: http://docs.python.org/howto/logging.html#logging-levels

That said, I don't understand the need for special logging levels in the multiprocessing package. From the section following the one linked to above:

    "Defining your own levels is possible, but should not be necessary, as the existing levels have been chosen on the basis of practical experience. However, if you are convinced that you need custom levels, great care should be exercised when doing this, and it is possibly *a very bad idea to define custom levels if you are developing a library*. That’s because if multiple library authors all define their own custom levels, there is a chance that the logging output from such multiple libraries used together will be difficult for the using developer to control and/or interpret, because a given numeric value might mean different things for different libraries."

-- nosy: +vinay.sajip
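For reference, the numeric values of the standard levels can be read straight from the logging module. This is a small illustrative snippet, not part of the documentation change discussed here; the variable name standard_levels is made up.

```python
import logging

# Map each standard level name to its numeric value.
standard_levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
}

for name, value in standard_levels.items():
    print(f"{name:<8} {value}")
```

Any custom level a library defines has to fit somewhere among these numbers, which is where the clash between libraries described in the quoted passage can arise.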
[issue9923] mailcap module may not work on non-POSIX platforms if MAILCAPS env variable is set
Nick Coghlan ncogh...@gmail.com added the comment: As noted in the commit message, I didn't backport this, since it didn't seem worth risking breaking even the unlikely case that someone actually *was* using the MAILCAP environment variable on Windows. -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed versions: -Python 2.7, Python 3.2
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Vlad Riscutia riscutiav...@gmail.com added the comment: Oh, got it. Interesting. Then should I just add a comment somewhere or should we resolve this as Won't Fix?
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment: We could add a special case to generrmap.c (but how can I compile and execute this file? it doesn't seem to be part of the project files).
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Fri, 26 Aug 2011 21:11:24 -:

> Would this also affect .islower() and friends?

SHORT VERSION: (7 lines)

I don't believe so, but the relationship between lower() and islower() is not as clear to me as I would have thought, and more importantly, the code and the documentation for Python's islower() etc currently seem to disagree. For future releases, I recommend fixing the code, but if compatibility is an issue, then perhaps for previous releases still in maintenance mode fixing only the documentation would possibly be good enough--your call.

===

MEDIUM VERSION: (87 lines)

I was initially confused with Python's islower() family because of the way they are defined to operate on full strings. They don't check that everything is lowercase even though they say they do.

http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range

    str.lower()
        Return a copy of the string with all the cased characters [4] converted to lowercase.

    str.islower()
        Return true if all cased characters [4] in the string are lowercase and there is at least one cased character, false otherwise.

    [4] (1, 2, 3, 4) Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase).

This is strange in several ways. Of lesser importance is that strings can be considered lowercase even if they don't match

    ^\p{lowercase}+$

Another is that the result of calling str.lower() may not be .islower(). I'm not sure what these are particularly for, since I myself would just use a regex to get finer-grained control. (I suppose it is because re doesn't give access to the needed Unicode properties that this approach never gained any traction in the Python community.)

However, the worst of this is that the documentation defines both cased characters and lowercase characters *differently* from how Unicode defines those very same terms. This was quite confusing.

Unicode distinguishes Cased code points from Cased_*Letter* code points. Python is using the Cased_Letter property but calling it Cased. Cased is a proper superset of Cased_Letter. From the DerivedCoreProperties file in the Unicode Character Database:

    # Derived Property: Cased (Cased)
    # As defined by Unicode Standard Definition D120
    # C has the Lowercase or Uppercase property or has a General_Category value of Titlecase_Letter.

In the same way, the Lowercase and Uppercase properties are not the same as the Lowercase_*Letter* and Uppercase_*Letter* properties. Rather, the former are respectively proper supersets of the latter.

    # Derived Property: Lowercase
    # Generated from: Ll + Other_Lowercase
    [...]
    # Derived Property: Uppercase
    # Generated from: Lu + Other_Uppercase

In all these, you almost always want the superset versions not the restricted subset versions you are using. If it were in the regex engine, the user could select either.

Java used to miss all these, too. But in 1.7, they updated their character methods to use the properties that they'd all along said they were using:

http://download.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)

    public static boolean isLowerCase(char ch)
        Determines if the specified character is a lowercase character. A character is lowercase if its general category type, provided by Character.getType(ch), is LOWERCASE_LETTER, or it has contributory property Other_Lowercase as defined by the Unicode Standard.

        Note: This method cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the isLowerCase(int) method.

(And yes, that's where Java uses "character" to mean code unit not code point, alas. No wonder people get confused.)

I'm pretty sure that Python needs to either update its documentation to match its code, update its code to match its documentation, or both. Java chose to update the code to match the documentation, and this is the course I would recommend if at all possible.

If you say you are checking for cased code points, then you should use the Unicode definition of cased code points not your own, and if you say you are checking for lowercase code points, then you should use the Unicode definition not your own. Both of these require access to contributory properties from the UCD and not just general categories alone.

--tom

===

LONG VERSION: (222 lines)

Essential tools I use for inspecting Unicode code points and their properties include
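Two of the observations in this message, that islower() only looks at cased characters and that the result of lower() need not satisfy islower(), can be checked with a short snippet. The sample strings are illustrative choices, not taken from the message.

```python
import unicodedata

# Sample strings: a lowercase word pair, the empty string, digits only,
# and the titlecase letter U+01C5.
samples = ["a b", "", "123", "\u01c5"]

for s in samples:
    categories = [unicodedata.category(ch) for ch in s]
    print(repr(s), s.islower(), s.lower().islower(), categories)
```

The general categories printed alongside show which characters the documented definition would count as cased (Lu, Ll, Lt) and which it would ignore.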
[issue10015] Creating a multiprocess.pool.ThreadPool from a child thread blows up.
Changes by Vinay Sajip vinay_sa...@yahoo.co.uk: -- title: Creating a multiproccess.pool.ThreadPool from a child thread blows up. -> Creating a multiprocess.pool.ThreadPool from a child thread blows up.
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment: Ok, apparently I can use errmap.mak, except that I get the following error:

    Z:\default\PC>nmake errmap.mak

    Microsoft (R) Program Maintenance Utility Version 9.00.21022.08
    Copyright (C) Microsoft Corporation. All rights reserved.

            cl generrmap.c
    Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64
    Copyright (C) Microsoft Corporation. All rights reserved.

    generrmap.c
    generrmap.c(1) : fatal error C1034: stdio.h: no include path set
    NMAKE : fatal error U1077: 'C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\amd64\cl.EXE' : return code '0x2'
    Stop.
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment: Ok, running vcvarsamd64.bat seems to do the trick.
[issue8426] multiprocessing.Queue fails to get() very large objects
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: I think it's just a documentation issue. The problem with documenting limits is that they are system-specific and, even if the current limits that Charles-François has mentioned are documented, these could become outdated. Perhaps a suggestion could be added to the documentation:

    "Avoid sending very large amounts of data via queues, as you could come up against system-dependent limits according to the operating system and whether pipes or sockets are used. You could consider an alternative strategy, such as writing large data blocks to temporary files and sending just the temporary file names via queues, relying on the consumer to delete the temporary files after processing."

-- nosy: +vinay.sajip
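The temporary-file strategy suggested above might look roughly like this. It is a sketch under stated assumptions: the helper names put_large and get_large are hypothetical, and a plain queue.Queue stands in for multiprocessing.Queue so the example stays in one process; only the file name travels through the queue.

```python
import os
import queue
import tempfile


def put_large(q, payload):
    """Producer side: write the payload to a temporary file and enqueue
    only the file name."""
    fd, path = tempfile.mkstemp(suffix=".blob")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    q.put(path)


def get_large(q):
    """Consumer side: dequeue a file name, read the payload back, and
    delete the temporary file after processing."""
    path = q.get()
    try:
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)


q = queue.Queue()  # stand-in for multiprocessing.Queue in this sketch
put_large(q, b"x" * 1_000_000)
data = get_large(q)
print(len(data))
```

The consumer owns cleanup here, as the suggested wording proposes; a real design would also need to decide who removes files if the consumer dies first.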
[issue11990] redirected output - stdout writes newline as \n in windows
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: So is this now just a documentation issue, about the changed behaviour of pipes in 3.2? -- nosy: +vinay.sajip
[issue8296] multiprocessing.Pool hangs when issuing KeyboardInterrupt
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: Closing, as Andrey Vlasovskikh has agreed that this is a duplicate of #9205. -- nosy: +vinay.sajip resolution: -> duplicate status: open -> closed
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Antoine Pitrou pit...@free.fr added the comment: Here is a new patch. -- Added file: http://bugs.python.org/file23053/winenotdir.patch
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Vlad Riscutia riscutiav...@gmail.com added the comment: Attached updated patch which extends generrmap.c to allow for easy addition of other error mappings. Also regenerated errmap.h and unittest. -- Added file: http://bugs.python.org/file23054/issue12802_2.diff
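From the Python side, the mapping under discussion decides which errno a caller sees when a regular file is used as a directory. The snippet below is a rough way to observe that; the helper name listdir_errno is hypothetical, and the ENOTDIR result is what POSIX systems report (on Windows the outcome depends on the mapping this issue is about).

```python
import errno
import os
import tempfile


def listdir_errno(path):
    """Try to list `path` as a directory and return the resulting errno,
    or None if the call succeeds."""
    try:
        os.listdir(path)
    except OSError as exc:
        return exc.errno
    return None


# Use a regular temporary file where a directory is expected.
with tempfile.NamedTemporaryFile() as handle:
    code = listdir_errno(handle.name)

print(code, errno.errorcode.get(code))
```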
[issue8296] multiprocessing.Pool hangs when issuing KeyboardInterrupt
Antoine Pitrou pit...@free.fr added the comment: Note that #9205 fixed concurrent.futures, but not multiprocessing.Pool which is a different kettle of fish. -- nosy: +pitrou
[issue8426] multiprocessing.Queue fails to get() very large objects
Charles-François Natali neolo...@free.fr added the comment:

> Avoid sending very large amounts of data via queues, as you could come up against system-dependent limits according to the operating system and whether pipes or sockets are used. You could consider an alternative strategy, such as writing large data blocks to temporary files and sending just the temporary file names via queues, relying on the consumer to delete the temporary files after processing.

There's a misunderstanding here: there is absolutely no limit on the size of objects that can be put through a queue (apart from the host's memory and the 32-bit limit). The problem is really that you can't just put an arbitrary bunch of data to a queue, and then join it before making sure other processes will *eventually* pop all the data from the queue. I.e., you can't do:

    q = Queue()
    for i in range(100):
        q.put(big_obj)
    q.join()
    for i in range(1000):
        q.get()

That's because join() will wait until the feeder thread has managed to write all the items to the underlying pipe/Unix socket, and this might hang if the underlying pipe/socket is full (which will happen after one has put around 128K without having popped any item).

This is documented in http://docs.python.org/library/multiprocessing.html#multiprocessing-programming :

    Joining processes that use queues

    Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread() method of the queue to avoid this behaviour.)

    This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.

I find this wording really clear, but if someone comes up with a better - i.e. less technical - wording, go ahead.
[issue6560] socket sendmsg(), recvmsg() methods
Nick Coghlan ncogh...@gmail.com added the comment: Putting this back to open until we decide what to do about the OS X test failures. It sounds like it could really do with some more poking and prodding to figure out whether or not it poses a potential security risk or is just a relatively cosmetic problem with the API, so I'm reluctant to just skip the failing tests at this point. -- status: closed -> open
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Guido van Rossum gu...@python.org added the comment: Thank you very much. We should fix the behavior in 3.3 for sure. I'm thinking that we may be able to backport the behavior fix to 2.7 and 3.2 as well, since it just makes the behavior generally better (and for most folks it won't matter anyway). I'm not sure where the somewhat odd rules for .islower() come from, I think in part from the desire to have "".islower() be False but "a b".islower() to be True. Intuitively, this means that .islower() means "both there is at least one lower case character and there are no upper case characters", but not "all characters are lowercase". I forget what we do w.r.t. titlecase, but the intuitive meaning should not change. Although personally I don't have much of an intuition for what titlecase means (and why it's important), perhaps because I'm not familiar with any language where there is a third case for some letters.
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Tom Christiansen tchr...@perl.com added the comment: Guido van Rossum rep...@bugs.python.org wrote on Sat, 27 Aug 2011 16:15:33 -:

> Although personally I don't have much of an intuition for what titlecase means (and why it's important), perhaps because I'm not familiar with any language where there is a third case for some letters.

Neither am I. Even in old-style English with ae and oe, one wrote ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or *Aesir. Similarly with ŒNOLOGY / Œnology / œnology, never *Oenology. (BTW, in French you really shouldn't split up the œ into oe, nor in Old English, Old Norse, or Icelandic the æ in ae; although in contemporary English, it's usually ok to do so.)

I believe that almost but not quite all the sticky situations with Unicode casing involve compatibility characters for clean round-trips with legacy encodings. Exceptions include the German sharp s (both of them now) and the two Greek lowercase sigmas. Thank goodness we don't use the long s in English anymore. What is it with s's, anyway? :)

Most of the titlecase letters are in Greek, with a few in Armenian. I know no Armenian (their letters all look the same to me :), and the folks I talked to about the Greek are skeptical.

The German sharp s is a red herring, because you can never have it as the first letter (although it needn't be the last, as in Rußland). That's no more possible than having the old legacy ff ligature appear at the beginning of an English word.

In any event, there are only 129 total code points that are problematic in terms of their case, where by problematic I mean one or more of:

    - titlecase differs from uppercase
    - foldcase differs from lowercase
    - any of fold/lower/title/uppercase yields more than one code point

Of all these, it's the (now two!) sharp s's and the Turkic i that are the most annoying. It's really quite a lot of trouble to go through for so few code points of so little (perceived) use.
But I suppose you never know what new ones they'll uncover, either.

Here are those 129 case-problematicals arranged in UCA order. Some of these have normalization forms that decompose into graphemes with four code points (not shown). There are a few other oddities, like the Kelvin sign and other singletons, but these are most of the trouble. They're all in the BMP; I guess we learned our lesson. :)

--tom

  1: U+0345 ○ͅ COMBINING GREEK YPOGEGRAMMENI
      fc=ι U+3B9   lc=○ͅ U+345   tc=Ι U+399   uc=Ι U+399
  2: U+1E9A ẚ LATIN SMALL LETTER A WITH RIGHT HALF RING
      fc=aʾ U+61.2BE   lc=ẚ U+1E9A   tc=Aʾ U+41.2BE   uc=Aʾ U+41.2BE
  3: U+01F3 ǳ LATIN SMALL LETTER DZ
      fc=ǳ U+1F3   lc=ǳ U+1F3   tc=ǲ U+1F2   uc=Ǳ U+1F1
  4: U+01F2 ǲ LATIN CAPITAL LETTER D WITH SMALL LETTER Z
      fc=ǳ U+1F3   lc=ǳ U+1F3   tc=ǲ U+1F2   uc=Ǳ U+1F1
  5: U+01F1 Ǳ LATIN CAPITAL LETTER DZ
      fc=ǳ U+1F3   lc=ǳ U+1F3   tc=ǲ U+1F2   uc=Ǳ U+1F1
  6: U+01C6 ǆ LATIN SMALL LETTER DZ WITH CARON
      fc=ǆ U+1C6   lc=ǆ U+1C6   tc=ǅ U+1C5   uc=Ǆ U+1C4
  7: U+01C5 ǅ LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
      fc=ǆ U+1C6   lc=ǆ U+1C6   tc=ǅ U+1C5   uc=Ǆ U+1C4
  8: U+01C4 Ǆ LATIN CAPITAL LETTER DZ WITH CARON
      fc=ǆ U+1C6   lc=ǆ U+1C6   tc=ǅ U+1C5   uc=Ǆ U+1C4
  9: U+FB00 ﬀ LATIN SMALL LIGATURE FF
      fc=ff U+66.66   lc=ﬀ U+FB00   tc=Ff U+46.66   uc=FF U+46.46
 10: U+FB03 ﬃ LATIN SMALL LIGATURE FFI
      fc=ffi U+66.66.69   lc=ﬃ U+FB03   tc=Ffi U+46.66.69   uc=FFI U+46.46.49
 11: U+FB04 ﬄ LATIN SMALL LIGATURE FFL
      fc=ffl U+66.66.6C   lc=ﬄ U+FB04   tc=Ffl U+46.66.6C   uc=FFL U+46.46.4C
 12: U+FB01 ﬁ LATIN SMALL LIGATURE FI
      fc=fi U+66.69   lc=ﬁ U+FB01   tc=Fi U+46.69   uc=FI U+46.49
 13: U+FB02 ﬂ LATIN SMALL LIGATURE FL
      fc=fl U+66.6C   lc=ﬂ U+FB02   tc=Fl U+46.6C   uc=FL U+46.4C
 14: U+1E96 ẖ LATIN SMALL LETTER H WITH LINE BELOW
      fc=ẖ U+68.331   lc=ẖ U+1E96   tc=H̱ U+48.331   uc=H̱ U+48.331
 15: U+0130 İ LATIN CAPITAL LETTER I WITH DOT ABOVE
      fc=i̇ U+69.307   lc=i̇ U+69.307   tc=İ U+130   uc=İ U+130
 16: U+01F0 ǰ LATIN SMALL LETTER J WITH CARON
      fc=ǰ U+6A.30C   lc=ǰ U+1F0   tc=J̌ U+4A.30C   uc=J̌ U+4A.30C
 17: U+01C9 ǉ LATIN SMALL LETTER LJ
      fc=ǉ U+1C9   lc=ǉ U+1C9   tc=ǈ U+1C8   uc=Ǉ U+1C7
 18: U+01C8 ǈ LATIN CAPITAL LETTER L WITH SMALL LETTER J
      fc=ǉ U+1C9   lc=ǉ U+1C9   tc=ǈ U+1C8   uc=Ǉ U+1C7
 19: U+01C7 Ǉ LATIN CAPITAL LETTER LJ
      fc=ǉ U+1C9   lc=ǉ U+1C9   tc=ǈ U+1C8   uc=Ǉ U+1C7
 20: U+01CC ǌ LATIN SMALL LETTER NJ
      fc=ǌ U+1CC   lc=ǌ U+1CC   tc=ǋ U+1CB   uc=Ǌ U+1CA
 21: U+01CB ǋ LATIN CAPITAL LETTER N WITH SMALL LETTER J
      fc=ǌ U+1CC   lc=ǌ U+1CC   tc=ǋ U+1CB   uc=Ǌ U+1CA
 22: U+01CA Ǌ LATIN CAPITAL LETTER NJ
      fc=ǌ U+1CC   lc=ǌ U+1CC   tc=ǋ U+1CB   uc=Ǌ
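The three criteria listed in the message above (titlecase differs from uppercase, foldcase differs from lowercase, or any of the four mappings yields more than one code point) can be checked mechanically. What follows is only a minimal sketch in Python, assuming Python 3.3 or later, where str.casefold() exists and the str case methods apply the full mappings. The name case_problematic is made up for illustration. The number of code points it flags depends on the Unicode database bundled with the interpreter, so it need not come out at 129.

```python
import sys


def case_problematic(cp):
    """Return True if the code point meets any of the three criteria."""
    ch = chr(cp)
    lower, upper, title, fold = ch.lower(), ch.upper(), ch.title(), ch.casefold()
    return (
        title != upper      # titlecase differs from uppercase
        or fold != lower    # foldcase differs from lowercase
        # any of the four mappings expands to more than one code point
        or any(len(mapped) > 1 for mapped in (lower, upper, title, fold))
    )


# Scan every code point the interpreter knows about.
problematic = [cp for cp in range(sys.maxunicode + 1) if case_problematic(cp)]

# Show a few hits; ascii() keeps the output printable on any terminal.
for cp in problematic[:5]:
    print("U+%04X" % cp, ascii(chr(cp)))
print(len(problematic), "code points flagged")
```

The scan uses the interpreter's own default mappings, so locale-specific behaviour such as the Turkic i rules is not covered.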
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Matthew Barnett pyt...@mrabarnett.plus.com added the comment: There are some oddities in Unicode case-folding. Under full case-folding, both \N{LATIN CAPITAL LETTER SHARP S} and \N{LATIN SMALL LETTER SHARP S} fold to ss, which means that those codepoints match each other. However, under simple case-folding, they fold to themselves, which means that those codepoints _don't_ match each other. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736 ___
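The full-folding half of this is easy to see from Python. This is a small sketch, assuming Python 3.3 or later for str.casefold(), which applies full case folding. The str type does not appear to offer a method for simple case folding, so that half is not shown.

```python
capital_sharp_s = "\N{LATIN CAPITAL LETTER SHARP S}"  # U+1E9E
small_sharp_s = "\N{LATIN SMALL LETTER SHARP S}"      # U+00DF

# Full case folding maps both sharp s characters to the two letters "ss" ...
print(capital_sharp_s.casefold(), small_sharp_s.casefold())

# ... so a caseless comparison built on casefold() treats them as equal.
print(capital_sharp_s.casefold() == small_sharp_s.casefold())
```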
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Antoine Pitrou pit...@free.fr added the comment: Neither am I. Even in old-style English with ae and oe, one wrote ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or *Aesir. Similarly with ŒNOLOGY / Œnology / œnology, never *Oenology. Trying to disprove you a bit: http://ecx.images-amazon.com/images/I/51G6CH9XFFL._SL500_AA300_.jpg http://ecx.images-amazon.com/images/I/51k7TmosPdL._SL500_AA300_.jpg http://ecx.images-amazon.com/images/I/518UzMeLFCL._SL500_AA300_.jpg but classical typographies seem to write either the uppercase Œ or the lowercase œ. That said, I wonder why Unicode even includes ligatures like ff. Sounds like mission creep to me (and horrible annoyances for people like us). -- nosy: +pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736 ___
[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL
Vlad Riscutia riscutiav...@gmail.com added the comment: Ah, I see Antoine already attached a patch. I was 3 minutes late :) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12802 ___
[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug
Terry J. Reedy tjre...@udel.edu added the comment: Python makes it easy to transform a sequence with a generator as long as no look-ahead is needed. utf16.UTF16.__iter__ is a typical example. Whenever a surrogate is found, grab the matching one. However, grapheme clustering does require look-ahead, which is a bit trickier. Assume s is a sanitized sequence of code points with unicode database entries. Ignoring line endings, the following should work (I tested it with a toy definition of mark()):

    def graphemes(s):
        sit = iter(s)
        try:
            graph = [next(sit)]
        except StopIteration:
            return  # empty input: nothing to yield
        for cp in sit:
            if mark(cp):
                # a mark extends the current grapheme
                graph.append(cp)
            else:
                # anything else starts a new grapheme; emit the finished one
                yield combine(graph)
                graph = [cp]
        yield combine(graph)

I tested this with several inputs, using

    def mark(cp):
        return cp == '.'

    def combine(l):
        return ''.join(l)

Python's object orientation makes formatting easy for the user. Assume someone does the hard work of writing (once ;-) a GCString class with a .__format__ method that interprets the format mini-language for graphemes, using a generalized version of your 'simply horrible' code. This might be done by adapting str.__format__ to use the grapheme iterator above. Then users should be able to write

    '{:6.6}'.format(GCString("a̠ˈne̞ɣ̞ð̞o̞t̪a̠"))
    a̠ˈne̞ɣ̞ð̞

(Note: Thunderbird properly displays characters with the marks beneath even though FireFox does not do so above or in its display of your message.) -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12729 ___
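A rough, self-contained version of the idea in the message above, offered as a sketch only. Assumptions: mark() is approximated by a nonzero canonical combining class from unicodedata, which is much cruder than the real grapheme cluster rules of UAX #29; GCString here is a hypothetical str subclass, not an existing class; and its __format__ understands only "[width][.precision]", padding on the right with spaces. It needs Python 3.4 or later for re.fullmatch().

```python
import re
import unicodedata


def graphemes(s):
    """Yield each base character together with the combining marks after it."""
    cluster = ""
    for cp in s:
        # Assumption: a "mark" is any character with a nonzero combining class.
        if cluster and unicodedata.combining(cp):
            cluster += cp
        else:
            if cluster:
                yield cluster
            cluster = cp
    if cluster:
        yield cluster


class GCString(str):
    """Hypothetical str subclass whose width and precision count graphemes."""

    def __format__(self, spec):
        # Only "[width][.precision]" is handled; fill, alignment and type are not.
        m = re.fullmatch(r"(\d+)?(?:\.(\d+))?", spec)
        if m is None:
            raise ValueError("unsupported format spec: %r" % spec)
        width, precision = m.groups()
        clusters = list(graphemes(self))
        if precision is not None:
            # Truncate by graphemes, not by code points.
            clusters = clusters[:int(precision)]
        # Pad on the right so the result is at least `width` graphemes wide.
        padding = max(0, int(width or 0) - len(clusters))
        return "".join(clusters) + " " * padding


# The sample string from the message, spelled with escapes to survive any mailer.
sample = "a\u0320\u02c8ne\u031e\u0263\u031e\u00f0\u031eo\u031et\u032aa\u0320"
print(ascii("{:6.6}".format(GCString(sample))))
```

Counting by graphemes means the precision of 6 keeps six visible units here rather than six code points, which would cut the string off after four.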
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Ezio Melotti ezio.melo...@gmail.com added the comment: FTR, with the latest Python 3.2/3.3 (narrow) I get:

    Total failures:   58 / 500 ( 12%)
    Total successes: 442 / 500 ( 88%)

and with the latest Python 3.2/3.3 (wide) I get:

    Total failures:   52 / 500 ( 10%)
    Total successes: 448 / 500 ( 90%)

-- Added file: http://bugs.python.org/file23055/casing-results.txt ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue12736 ___