[issue12768] docstrings for the threading module

2011-08-27 Thread Graeme Cross

Graeme Cross gjcr...@gmail.com added the comment:

I will check that the patch works with 3.2; if not, I'll redo the patch for 3.2.
I will also incorporate the review changes from Ezio and Eric.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12768
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12833] raw_input misbehaves when readline is imported

2011-08-27 Thread Nadeem Vawda

Nadeem Vawda nadeem.va...@gmail.com added the comment:

Reproduced on 3.3 head. Looking at the documentation of the C readline
library, it needs to know the length of the prompt in order to display
properly, so this seems to be an acknowledged limitation of the underlying
library rather than a bug on our side.

Still, this behavior is surprising and undesirable. I would suggest adding
a note to the docs for the readline module, directing users to write:

input("foo ")

instead of:

sys.stdout.write("foo ")
input()

--
nosy: +nadeem.vawda

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12833
___



[issue12833] raw_input misbehaves when readline is imported

2011-08-27 Thread Idan Kamara

Idan Kamara idank...@gmail.com added the comment:

You're right, as this little C program verifies:

#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>

int main() {
   printf("foo ");
   /* readline() is not told about the text printed above */
   char* buf = readline("");
   free(buf);

   return 0;
}

Passing ' ' seems to be a suitable workaround, though, for those who can't pass 
the text directly to raw_input (as is the case when you have special classes 
that handle output).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12833
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

Unfortunately, it won't work: _dosmaperr() is not exported by msvcrt.dll; it is 
only available when you link against the static version of the C runtime.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-27 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Sat, 27 Aug 2011 03:26:21 -: 

> To me, making (default) iteration deviate from indexing is anathema.

So long as there's a way to iterate through a string some other way
than by code unit, that's fine.  However, the Java way of 16-bit code
units is so annoying because there often aren't code point APIs, and 
so you get a lot of niggling errors creeping in.  This is part of why
I strongly prefer wide builds, so that code point and code unit are the
same thing again.

> However, there is nothing wrong with providing a library function that
> takes a string and returns an iterator that iterates over code points,
> joining surrogate pairs as needed. You could even have one that
> iterates over characters (I think Tom calls them graphemes), if that
> is well-defined and useful.
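
A minimal sketch of such a library function, in Python. It works on a sequence of 16-bit code units rather than on a str, since that is where surrogate pairs show up. The names utf16_units and iter_code_points are made up for illustration and are not an existing API.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a tuple of ints (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    return struct.unpack("<%dH" % (len(raw) // 2), raw)


def iter_code_points(units):
    """Yield code points from 16-bit code units, joining surrogate pairs."""
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by a low surrogate: one supplementary code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP code unit, or an unpaired surrogate passed through unchanged.
            yield unit
            i += 1


if __name__ == "__main__":
    text = "a\U0001D49Cb"
    print([hex(u) for u in utf16_units(text)])
    print([hex(cp) for cp in iter_code_points(utf16_units(text))])
```

Indexing by code unit and iterating by code point then stay separate operations, which seems to be the split being proposed.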

"Character" can sometimes be a confusing term when it means something
different to us programmers than it does to users.  "Code point" to mean the
integer is a lot clearer to us but to no one else.  At work I often just
give in and go along with the crowd and say "character" for the number that
sits in a char or wchar_t or Character variable, even though of course
that's a code point.  I only rebel when they start calling code units 
characters, which (inexperienced) Java people tend to do, because that
leads to surrogate splitting and related errors.

By grapheme I mean something the user perceives as a single character.  In
full Unicodese, this is an extended grapheme cluster.  These are code point
sequences that start with a grapheme base and have zero or more grapheme
extenders following it.  For our purposes, that's *mostly* like saying you
have a non-Mark followed by any number of Mark code points, the main
exception being that a CR followed by a LF also counts as a single grapheme
in Unicode.

If you are in an editor and want to swap two characters, the one 
under the user's cursor and the one next to it, you have to deal with
graphemes not individual code points, or else you'd get the wrong answer.
Imagine swapping the last two characters of the first string below,
or the first two characters of second one:

contrôlée    contro\x{302}le\x{301}e
élève        e\x{301}le\x{300}ve
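
A rough Python sketch of that swap, using the approximation described above (a non-Mark followed by any number of Marks, plus CR LF as one unit). It is not a full extended grapheme cluster implementation, and simple_graphemes and swap_first_two are hypothetical names.

```python
import unicodedata


def simple_graphemes(text):
    """Split text into approximate graphemes: base + following Marks, CR LF kept together."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        completes_crlf = ch == "\n" and bool(clusters) and clusters[-1] == "\r"
        if clusters and (is_mark or completes_crlf):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def swap_first_two(text):
    """Swap the first two approximate graphemes rather than the first two code points."""
    parts = simple_graphemes(text)
    if len(parts) >= 2:
        parts[0], parts[1] = parts[1], parts[0]
    return "".join(parts)


if __name__ == "__main__":
    word = "e\u0301le\u0300ve"  # the decomposed spelling of the second example
    print(simple_graphemes(word))
    print(swap_first_two(word))
```

Swapping code points instead would move the acute accent onto the "l", which is the wrong answer being warned about.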

While you can sometimes fake a correct answer by considering things
in NFC not NFD, that doesn't work in the general case, as there
are only a few compatibility glyphs for round-tripping for legacy
encodings (like ISO 8859-1) compared with infinitely many combinations
of combining marks.  Particularly in mathematics and in phonetics, 
you often end up using marks on characters for which no pre-combined
variant glyph exists.  Here's the IPA for a couple of Spanish words
with their tight (phonetic, not phonemic) transcriptions:

anécdota  [a̠ˈne̞ɣ̞ð̞o̞t̪a̠]
rincón  [rĩŋˈkõ̞n]

NFD:
ane\x{301}cdota
[a\x{320}\x{2C8}ne\x{31E}\x{263}\x{31E}\x{F0}\x{31E}o\x{31E}t\x{32A}a\x{320}]
rinco\x{301}n  [ri\x{303}\x{14B}\x{2C8}ko\x{31E}\x{303}n]

NFC:
an\x{E9}cdota
[a\x{320}\x{2C8}ne\x{31E}\x{263}\x{31E}\x{F0}\x{31E}o\x{31E}t\x{32A}a\x{320}]
rinc\x{F3}n  [r\x{129}\x{14B}\x{2C8}k\x{F5}\x{31E}n]

So combining marks don't just go away in NFC, and you really do have to
deal with them.  Notice that to get the tabs right (your favorite subject :),
you have to deal with print widths, which is another place that you get
into trouble if you only count code points.
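
The same point can be checked with Python's unicodedata module. This is only a small sketch; the variable names are illustrative.

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


def nfd(text):
    return unicodedata.normalize("NFD", text)


# e + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields one code point.
composed = nfc("e\u0301")

# e + COMBINING DOWN TACK BELOW has no precomposed form, so the mark survives NFC.
still_marked = nfc("e\u031e")

# o + DOWN TACK BELOW + TILDE: NFC composes the tilde and keeps the down tack.
partly_composed = nfc("o\u031e\u0303")

if __name__ == "__main__":
    for label, value in [
        ("composed", composed),
        ("still_marked", still_marked),
        ("partly_composed", partly_composed),
    ]:
        print(label, ["U+%04X" % ord(c) for c in value])
```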

BTW, did you know that the stress mark used in the phonetics above
is actually a (modifier) letter in Unicode, not punctuation?

# uniprops -a 2c8
U+02C8 ‹ˈ› \N{MODIFIER LETTER VERTICAL LINE}
\w \pL \p{L_} \p{Lm}
All Any Alnum Alpha Alphabetic Assigned InSpacingModifierLetters 
Case_Ignorable CI Common Zyyy Dia Diacritic L Lm Gr_Base Grapheme_Base Graph 
GrBase ID_Continue IDC ID_Start IDS Letter L_ Modifier_Letter Print 
Spacing_Modifier_Letters Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum 
X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
Age=1.1 Bidi_Class=ON Bidi_Class=Other_Neutral BC=ON 
Block=Spacing_Modifier_Letters Canonical_Combining_Class=0 
Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR 
Script=Common Decomposition_Type=None DT=None East_Asian_Width=Neutral 
Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX 
Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA 
Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U 
Joining_Type=U Line_Break=BB Line_Break=Break_Before LB=BB Numeric_Type=None 
NT=None Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 
Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1 
Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 
Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 
Present_In=6.0 IN=6.0 SC=Zyyy 

[issue12847] crash with negative PUT in pickle

2011-08-27 Thread Antoine Pitrou

New submission from Antoine Pitrou pit...@free.fr:

This doesn't happen on 2.x cPickle, where PUT keys are simply treated as 
strings.

>>> import pickle, pickletools
>>> s = b'Va\np-1\n.'
>>> pickletools.dis(s)
    0: V    UNICODE    'a'
    3: p    PUT        -1
    7: .    STOP
highest protocol among opcodes = 0
>>> pickle.loads(s)
Segmentation fault
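
As a side note, a pickle like this can be inspected without loading it. The sketch below walks the opcodes with pickletools.genops() and reports PUT-style opcodes whose decoded key is negative. negative_put_positions is a hypothetical helper, and how pickletools decodes the LONG_BINPUT argument (signed or unsigned) may differ between Python versions, so this is only expected to flag the text-mode PUT case reliably.

```python
import pickle
import pickletools

PUT_OPCODES = {"PUT", "BINPUT", "LONG_BINPUT"}


def negative_put_positions(data):
    """Return byte offsets of PUT-style opcodes whose decoded memo key is negative."""
    positions = []
    # genops() only parses the opcode stream; it does not unpickle anything.
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in PUT_OPCODES and arg < 0:
            positions.append(pos)
    return positions


if __name__ == "__main__":
    print(negative_put_positions(b"Va\np-1\n."))
    print(negative_put_positions(pickle.dumps("a", protocol=0)))
```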

--
messages: 143062
nosy: pitrou
priority: normal
severity: normal
status: open
title: crash with negative PUT in pickle
type: crash
versions: Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12847
___



[issue12847] crash with negative PUT in pickle

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Same with LONG_BINPUT on a 32-bit build:

>>> s = b'\x80\x03X\x01\x00\x00\x00ar\xff\xff\xff\xff.'
>>> pickletools.dis(s)
    0: \x80 PROTO      3
    2: X    BINUNICODE 'a'
    8: r    LONG_BINPUT -1
   13: .    STOP
highest protocol among opcodes = 2
>>> pickle.loads(s)
Segmentation fault

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12847
___



[issue11564] pickle not 64-bit ready

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Here is a new patch against 3.2. I can't say it works for sure, but it should 
be much better. It also adds a couple more tests.
There seems to be a separate issue where pure-Python pickle.py considers 32-bit 
lengths signed where the C impl considers them unsigned...

--
Added file: http://bugs.python.org/file23052/pickle64-4.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11564
___



[issue12848] pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned

2011-08-27 Thread Antoine Pitrou

New submission from Antoine Pitrou pit...@free.fr:

In several opcodes (BINBYTES, BINUNICODE... what else?), _pickle.c happily 
accepts 32-bit lengths of more than 2**31, while pickle.py uses marshal's "i" 
typecode which means "signed"... and therefore fails reading the data.
Apparently, pickle.py uses marshal for speed reasons, but marshal doesn't 
support unsigned types.

(seen from http://bugs.python.org/issue11564)
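
The signed/unsigned difference can be illustrated with the struct module. This is a sketch of the general effect only, not of what pickle.py or _pickle.c actually do internally.

```python
import struct

# A 4-byte little-endian length field holding a value above 2**31.
raw = struct.pack("<I", 2**31 + 5)

# Read as unsigned, the length comes back intact.
unsigned = struct.unpack("<I", raw)[0]

# Read as signed, the same four bytes come back as a negative number.
signed = struct.unpack("<i", raw)[0]

if __name__ == "__main__":
    print(unsigned, signed)
```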

--
components: Library (Lib)
messages: 143065
nosy: alexandre.vassalotti, pitrou
priority: normal
severity: normal
status: open
title: pickle.py treats 32bit lengths as signed, but _pickle.c as unsigned
type: behavior
versions: Python 3.2, Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12848
___



[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake

2011-08-27 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset b06f011a3529 by Nick Coghlan in branch 'default':
Fix #12835: prevent use of the unencrypted sendmsg/recvmsg APIs on SSL wrapped 
sockets (Patch by David Watson)
http://hg.python.org/cpython/rev/b06f011a3529

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12835
___



[issue12835] Missing SSLSocket.sendmsg() wrapper allows programs to send unencrypted data by mistake

2011-08-27 Thread Nick Coghlan

Changes by Nick Coghlan ncogh...@gmail.com:


--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12835
___



[issue9923] mailcap module may not work on non-POSIX platforms if MAILCAPS env variable is set

2011-08-27 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 7b83d2c1aad9 by Nick Coghlan in branch 'default':
Fix #9923: mailcap now uses the OS path separator for the MAILCAP envvar. Not 
backported, since it could break cases where people worked around the old 
POSIX-specific behaviour on non-POSIX platforms.
http://hg.python.org/cpython/rev/7b83d2c1aad9
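
The idea behind the change can be sketched as follows. mailcap_paths is a hypothetical helper written for illustration, not the code from the changeset.

```python
import os


def mailcap_paths(env=os.environ, pathsep=os.pathsep):
    """Split a MAILCAPS-style variable on the platform's path separator."""
    value = env.get("MAILCAPS")
    if not value:
        return []
    # os.pathsep is ':' on POSIX and ';' on Windows, so drive letters survive.
    return value.split(pathsep)


if __name__ == "__main__":
    print(mailcap_paths({"MAILCAPS": r"C:\etc\mailcap;D:\mailcap"}, pathsep=";"))
```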

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9923
___



[issue12174] Multiprocessing logging levels unclear

2011-08-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

Although the reference docs don't list the numeric values of logging levels, 
this happened during reorganising of the docs. The table has moved to the HOWTO:

http://docs.python.org/howto/logging.html#logging-levels
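
For reference, the numeric values can also be read straight from the logging module. A short sketch (the LEVELS name is just for this example):

```python
import logging

LEVEL_NAMES = ("NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")

# Map each standard level name to its numeric value as defined by the module.
LEVELS = {name: getattr(logging, name) for name in LEVEL_NAMES}

if __name__ == "__main__":
    for name, value in LEVELS.items():
        print("%-8s %d" % (name, value))
```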

That said, I don't understand the need for special logging levels in the 
multiprocessing package. From the section following the one linked to above:

Defining your own levels is possible, but should not be necessary, as the 
existing levels have been chosen on the basis of practical experience. However, 
if you are convinced that you need custom levels, great care should be 
exercised when doing this, and it is possibly *a very bad idea to define custom 
levels if you are developing a library*. That’s because if multiple library 
authors all define their own custom levels, there is a chance that the logging 
output from such multiple libraries used together will be difficult for the 
using developer to control and/or interpret, because a given numeric value 
might mean different things for different libraries.

--
nosy: +vinay.sajip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12174
___



[issue9923] mailcap module may not work on non-POSIX platforms if MAILCAPS env variable is set

2011-08-27 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

As noted in the commit message, I didn't backport this, since it didn't seem 
worth risking breaking even the unlikely case that someone actually *was* using 
the MAILCAPS environment variable on Windows.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions:  -Python 2.7, Python 3.2

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9923
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

Oh, got it. Interesting. Then should I just add a comment somewhere or should 
we resolve this as Won't Fix?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

We could add a special case to generrmap.c (but how can I compile and execute 
this file? it doesn't seem to be part of the project files).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Fri, 26 Aug 2011 21:11:24 -: 

> Would this also affect .islower() and friends?

SHORT VERSION:  (7 lines)

I don't believe so, but the relationship between lower() and islower()
is not as clear to me as I would have thought, and more importantly,
the code and the documentation for Python's islower() etc currently seem
to disagree.  For future releases, I recommend fixing the code, but if
compatibility is an issue, then perhaps for previous releases still in
maintenance mode fixing only the documentation would possibly be good
enough--your call.

===

MEDIUM VERSION: (87 lines)

I was initially confused with Python's islower() family because of the way
they are defined to operate on full strings.  They don't check that
everything is lowercase even though they say they do.

   
http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range

str.lower()

Return a copy of the string with all the cased characters [4]
converted to lowercase.

str.islower()

Return true if all cased characters [4] in the string are lowercase 
and there is at least one cased character, false otherwise.

[4] (1, 2, 3, 4) Cased characters are those with general category
property being one of “Lu” (Letter, uppercase), “Ll” (Letter,
lowercase), or “Lt” (Letter, titlecase).

This is strange in several ways.  Of lesser importance is that
strings can be considered lowercase even if they don't match

^\p{lowercase}+$

Another is that the result of calling str.lower() may not be .islower().
I'm not sure what these are particularly for, since I myself would just use
a regex to get finer-grained control.  (I suppose it's because re doesn't
give access to the needed Unicode properties that this approach never
gained any traction in the Python community.)

However, the worst of this is that the documentation defines both cased
characters and lowercase characters *differently* from how Unicode
defines those very same terms.  This was quite confusing.

Unicode distinguishes Cased code points from Cased_*Letter* code points.
Python is using the Cased_Letter property but calling it Cased.  Cased is 
a proper superset of Cased_Letter.  From the DerivedCoreProperties file in
the Unicode Character Database:

# Derived Property:   Cased (Cased)
#  As defined by Unicode Standard Definition D120
#  C has the Lowercase or Uppercase property or has a General_Category 
#  value of Titlecase_Letter.

In the same way, the Lowercase and Uppercase properties are not the same as
the Lowercase_*Letter* and Uppercase_*Letter* properties.  Rather, the former
are respectively proper supersets of the latter.  

# Derived Property: Lowercase
#  Generated from: Ll + Other_Lowercase

[...]

# Derived Property: Uppercase
#  Generated from: Lu + Other_Uppercase

In all these, you almost always want the superset versions not the
restricted subset versions you are using.  If it were in the regex engine,
the user could select either.
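
To make the subset/superset distinction concrete, here is a small Python sketch that looks up two code points which the Unicode data lists under Other_Lowercase but whose general category is not Ll. The describe helper is hypothetical. What str.islower() reports for them depends on the Python version, so it is printed rather than assumed.

```python
import unicodedata


def describe(ch):
    """Return (name, general category) for a single code point."""
    return unicodedata.name(ch), unicodedata.category(ch)


SAMPLES = ["a", "\u2170", "\u02b0"]

if __name__ == "__main__":
    for ch in SAMPLES:
        name, category = describe(ch)
        # The islower() result is shown only for comparison with the category.
        print("U+%04X %-28s %s islower=%s" % (ord(ch), name, category, ch.islower()))
```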

Java used to miss all these, too.  But in 1.7, they updated their character
methods to use the properties that they'd all along said they were using:

   
http://download.oracle.com/javase/7/docs/api/java/lang/Character.html#isLowerCase(char)

public static boolean isLowerCase(char ch)
Determines if the specified character is a lowercase character. 

 A character is lowercase if its general category type, provided by
 Character.getType(ch), is LOWERCASE_LETTER, or it has contributory
 property Other_Lowercase as defined by the Unicode Standard.

Note: This method cannot handle supplementary characters.  To
  support all Unicode characters, including supplementary
  characters, use the isLowerCase(int) method.

(And yes, that's where Java uses "character" to mean code unit 
 not code point, alas.  No wonder people get confused.)

I'm pretty sure that Python needs to either update its documentation to
match its code, update its code to match its documentation, or both.  Java
chose to update the code to match the documentation, and this is the course
I would recommend if at all possible.  If you say you are checking for
cased code points, then you should use the Unicode definition of cased code
points not your own, and if you say you are checking for lowercase code
points, then you should use the Unicode definition not your own.  Both of
these require access to contributory properties from the UCD and not 
just general categories alone.

--tom

===

LONG VERSION: (222 lines)

Essential tools I use for inspecting Unicode code points and their 
properties include


[issue10015] Creating a multiprocess.pool.ThreadPool from a child thread blows up.

2011-08-27 Thread Vinay Sajip

Changes by Vinay Sajip vinay_sa...@yahoo.co.uk:


--
title: Creating a multiproccess.pool.ThreadPool from a child thread blows up. 
-> Creating a multiprocess.pool.ThreadPool from a child thread blows up.

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10015
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Ok, apparently I can use errmap.mak, except that I get the following error:

Z:\default\PC>nmake errmap.mak

Microsoft (R) Program Maintenance Utility Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

cl  generrmap.c
Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

generrmap.c
generrmap.c(1) : fatal error C1034: stdio.h: no include path set
NMAKE : fatal error U1077: 'C:\Program Files (x86)\Microsoft Visual Studio 9.0\
VC\bin\amd64\cl.EXE' : return code '0x2'
Stop.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Ok, running vcvarsamd64.bat seems to do the trick.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue8426] multiprocessing.Queue fails to get() very large objects

2011-08-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

I think it's just a documentation issue. The problem with documenting limits is 
that they are system-specific and, even if the current limits that 
Charles-François has mentioned are documented, these could become outdated. 
Perhaps a suggestion could be added to the documentation:

Avoid sending very large amounts of data via queues, as you could come up 
against system-dependent limits according to the operating system and whether 
pipes or sockets are used. You could consider an alternative strategy, such as 
writing large data blocks to temporary files and sending just the temporary 
file names via queues, relying on the consumer to delete the temporary files 
after processing.
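
A minimal sketch of that alternative strategy. It uses queue.Queue as a stand-in for a multiprocessing queue so that it stays self-contained (both offer put() and get()), and send_large / receive_large are hypothetical names.

```python
import os
import queue
import tempfile


def send_large(q, data):
    """Write data to a temporary file and put only its path on the queue."""
    fd, path = tempfile.mkstemp(suffix=".blob")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    q.put(path)


def receive_large(q):
    """Take a path from the queue, read the data back, and delete the file."""
    path = q.get()
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        # The consumer is responsible for cleaning up the temporary file.
        os.remove(path)


if __name__ == "__main__":
    q = queue.Queue()
    send_large(q, b"x" * 10_000_000)
    print(len(receive_large(q)))
```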

--
nosy: +vinay.sajip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8426
___



[issue11990] redirected output - stdout writes newline as \n in windows

2011-08-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

So is this now just a documentation issue, about the changed behaviour of pipes 
in 3.2?

--
nosy: +vinay.sajip

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11990
___



[issue8296] multiprocessing.Pool hangs when issuing KeyboardInterrupt

2011-08-27 Thread Vinay Sajip

Vinay Sajip vinay_sa...@yahoo.co.uk added the comment:

Closing, as Andrey Vlasovskikh has agreed that this is a duplicate of #9205.

--
nosy: +vinay.sajip
resolution:  -> duplicate
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8296
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Here is a new patch.

--
Added file: http://bugs.python.org/file23053/winenotdir.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

Attached updated patch which extends generrmap.c to allow for easy addition of 
other error mappings.

Also regenerated errmap.h and unittest.

--
Added file: http://bugs.python.org/file23054/issue12802_2.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue8296] multiprocessing.Pool hangs when issuing KeyboardInterrupt

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Note that #9205 fixed concurrent.futures, but not multiprocessing.Pool which is 
a different kettle of fish.

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8296
___



[issue8426] multiprocessing.Queue fails to get() very large objects

2011-08-27 Thread Charles-François Natali

Charles-François Natali neolo...@free.fr added the comment:

> Avoid sending very large amounts of data via queues, as you could come up 
> against system-dependent limits according to the operating system and whether 
> pipes or sockets are used. You could consider an alternative strategy, such 
> as writing large data blocks to temporary files and sending just the 
> temporary file names via queues, relying on the consumer to delete the 
> temporary files after processing.

There's a misunderstanding here: there is absolutely no limit on the
size of objects that can be put through a queue (apart from the host's
memory and the 32-bit limit). The problem is really that you can't
just put an arbitrary bunch of data to a queue, and then join it before
making sure other processes will *eventually* pop all the data from
the queue.
I.e., you can't do:

q = Queue()
for i in range(100):
    q.put(big_obj)  # big_obj: any large object
q.join()

for i in range(1000):
    q.get()

That's because join() will wait until the feeder thread has managed to
write all the items to the underlying pipe/Unix socket, and this might
hang if the underlying pipe/socket is full (which will happen after
one has put around 128K without having popped any item).

That's what's documented in
http://docs.python.org/library/multiprocessing.html#multiprocessing-programming
:

Joining processes that use queues

Bear in mind that a process that has put items in a queue will wait
before terminating until all the buffered items are fed by the
“feeder” thread to the underlying pipe. (The child process can call
the Queue.cancel_join_thread() method of the queue to avoid this
behaviour.)

This means that whenever you use a queue you need to make sure that
all items which have been put on the queue will eventually be removed
before the process is joined. Otherwise you cannot be sure that
processes which have put items on the queue will terminate. Remember
also that non-daemonic processes will be joined automatically.
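
A sketch of the safe ordering, in which the consumer drains the queue before joining the producer. It assumes a POSIX platform where the "fork" start method is available (chosen here only to keep the example self-contained), and producer / drain_then_join are hypothetical names.

```python
import multiprocessing as mp


def producer(q, count, size):
    """Child process: put `count` large byte strings on the queue."""
    for _ in range(count):
        q.put(b"x" * size)


def drain_then_join(count=20, size=256 * 1024):
    """Start a producer, consume everything it sends, and only then join it."""
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    q = ctx.Queue()
    child = ctx.Process(target=producer, args=(q, count, size))
    child.start()
    # Pop every item first, so the child's feeder thread can flush the pipe.
    received = [q.get() for _ in range(count)]
    # Joining is safe now: nothing is left buffered in the child.
    child.join()
    return len(received), child.exitcode


if __name__ == "__main__":
    print(drain_then_join())
```

Calling child.join() before the loop of q.get() calls is the ordering that can hang once the underlying pipe fills up.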


I find this wording really clear, but if someone comes up with a
better - i.e. less technical - wording, go ahead.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8426
___



[issue6560] socket sendmsg(), recvmsg() methods

2011-08-27 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

Putting this back to open until we decide what to do about the OS X test 
failures. It sounds like it could really do with some more poking and prodding 
to figure out whether or not it poses a potential security risk or is just a 
relatively cosmetic problem with the API, so I'm reluctant to just skip the 
failing tests at this point.

--
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6560
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Guido van Rossum

Guido van Rossum gu...@python.org added the comment:

Thank you very much. We should fix the behavior in 3.3 for sure. I'm
thinking that we may be able to backport the behavior fix to 2.7 and
3.2 as well, since it just makes the behavior generally better (and
for most folks it won't matter anyway).

I'm not sure where the somewhat odd rules for .islower() come from; I
think in part from the desire to have "".islower() be False but
"a b".islower() to be True. Intuitively, this means that .islower() means
both "there is at least one lower case character" and "there are no
upper case characters", but not "all characters are lowercase". I
forget what we do w.r.t. titlecase, but the intuitive meaning should
not change. Although personally I don't have much of an intuition for
what titlecase means (and why it's important), perhaps because I'm not
familiar with any language where there is a third case for some
letters.
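
The intuitive rule can be written down and compared with str.islower() directly. This is a sketch assuming current CPython behaviour, and islower_rule is a made-up name.

```python
def islower_rule(text):
    """Restate the rule: some lowercase character, and no upper/titlecase one."""
    has_lower = any(ch.islower() for ch in text)
    has_upper_or_title = any(ch.isupper() or ch.istitle() for ch in text)
    return has_lower and not has_upper_or_title


# The last sample starts with U+01C5, a titlecase letter.
SAMPLES = ["", "a b", "a B", "123", "abc1", "\u01c5a"]

if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), s.islower(), islower_rule(s))
```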

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

Guido van Rossum rep...@bugs.python.org wrote
   on Sat, 27 Aug 2011 16:15:33 -: 

> Although personally I don't have much of an intuition for what
> titlecase means (and why it's important), perhaps because I'm not
> familiar with any language where there is a third case for some
> letters.

Neither am I.  Even in old-style English with ae and oe, one wrote
ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or
*Aesir.  Similarly with ŒNOLOGY / Œnology / œnology, never *Oenology.

(BTW, in French you really shouldn't split up the œ into oe, 
  nor in Old English, Old Norse, or Icelandic the æ in ae;
  although in contemporary English, it's usually ok to do so.)

I believe that almost but not quite all the sticky situations with
Unicode casing involve compatibility characters for clean round-trips
with legacy encodings.  Exceptions include the German sharp s (both of 
them now) and the two Greek lowercase sigmas.  Thank goodness we don't
use the long s in English anymore.  What is it with s's, anyway? :)

Most of the titlecase letters are in Greek, with a few in Armenian.
I know no Armenian (their letters all look the same to me :), and the
folks I talked to about the Greek are skeptical.  The German sharp s is
a red herring, because you can never have it as the first letter
(although it needn't be the last, as in Rußland).  That's no more
possible than having the old legacy ff ligature appear at the beginning
of an English word.

In any event, there are only 129 total code points that are
problematic in terms of their case, where by problematic 
I mean one or more of:

   --- titlecase differs from uppercase
   --- foldcase  differs from lowercase
   --- any of fold/lower/title/uppercase yields more than one code point

Of all these, it's the (now two!) sharp s's and the Turkic i that are the
most annoying.  It's really quite a lot of trouble to go through for so few
code points of so little (perceived) use.  But I suppose you never know what
new ones they'll uncover, either.

Here are those 129 case-problematicals arranged in UCA order.  Some of these
have normalization forms that decompose into graphemes with four code points
(not shown).  There are a few other oddities, like the Kelvin sign and other
singletons, but these are most of the trouble.  They're all in the BMP; I
guess we learned our lesson. :)
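
For anyone who wants to reproduce a listing like this from Python, the three
criteria above can be sketched as follows. This is only an illustration and
rests on some assumptions: it needs Python 3.3 or later (for str.casefold()),
it takes str.title() on a single character as the titlecase form, and the
function names are made up for the example. The resulting count depends on
the Unicode version bundled with the interpreter, so it need not come out
at 129.

```python
def case_forms(ch):
    """Return the (fold, lower, title, upper) forms of one character."""
    return ch.casefold(), ch.lower(), ch.title(), ch.upper()


def is_case_problematic(ch):
    """Apply the three criteria from the list above to one code point."""
    fc, lc, tc, uc = case_forms(ch)
    if tc != uc:        # titlecase differs from uppercase
        return True
    if fc != lc:        # foldcase differs from lowercase
        return True
    # any of fold/lower/title/uppercase yields more than one code point
    return any(len(form) > 1 for form in (fc, lc, tc, uc))


def problematic_code_points(limit=0x110000):
    """Scan code points below `limit` and collect the problematic ones."""
    return [cp for cp in range(limit) if is_case_problematic(chr(cp))]
```

Calling problematic_code_points(0x10000) restricts the scan to the BMP;
case_forms() gives the four forms that appear as fc/lc/tc/uc in the table
below.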

--tom

  1: U+0345 ○ͅ  COMBINING  GREEK YPOGEGRAMMENI
   fc=ι  U+3B9 lc=○ͅ  U+345 tc=Ι  U+399 uc=Ι  U+399 
  2: U+1E9A ẚ  LATIN SMALL LETTER A WITH RIGHT HALF RING
   fc=aʾ  U+61.2BE lc=ẚ  U+1E9A tc=Aʾ  U+41.2BE uc=Aʾ  U+41.2BE 
  3: U+01F3 dz  LATIN SMALL LETTER DZ
   fc=dz  U+1F3 lc=dz  U+1F3 tc=Dz  U+1F2 uc=DZ  U+1F1 
  4: U+01F2 Dz  LATIN CAPITAL LETTER D WITH SMALL LETTER Z
   fc=dz  U+1F3 lc=dz  U+1F3 tc=Dz  U+1F2 uc=DZ  U+1F1 
  5: U+01F1 DZ  LATIN CAPITAL LETTER DZ
   fc=dz  U+1F3 lc=dz  U+1F3 tc=Dz  U+1F2 uc=DZ  U+1F1 
  6: U+01C6 dž  LATIN SMALL LETTER DZ WITH CARON
   fc=dž  U+1C6 lc=dž  U+1C6 tc=Dž  U+1C5 uc=DŽ  U+1C4 
  7: U+01C5 Dž  LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
   fc=dž  U+1C6 lc=dž  U+1C6 tc=Dž  U+1C5 uc=DŽ  U+1C4 
  8: U+01C4 DŽ  LATIN CAPITAL LETTER DZ WITH CARON
   fc=dž  U+1C6 lc=dž  U+1C6 tc=Dž  U+1C5 uc=DŽ  U+1C4 
  9: U+FB00 ff  LATIN SMALL LIGATURE FF
   fc=ff  U+66.66 lc=ff  U+FB00 tc=Ff  U+46.66 uc=FF  U+46.46 
 10: U+FB03 ffi  LATIN SMALL LIGATURE FFI
   fc=ffi  U+66.66.69 lc=ffi  U+FB03 tc=Ffi  U+46.66.69 uc=FFI  U+46.46.49 
 11: U+FB04 ffl  LATIN SMALL LIGATURE FFL
   fc=ffl  U+66.66.6C lc=ffl  U+FB04 tc=Ffl  U+46.66.6C uc=FFL  U+46.46.4C 
 12: U+FB01 fi  LATIN SMALL LIGATURE FI
   fc=fi  U+66.69 lc=fi  U+FB01 tc=Fi  U+46.69 uc=FI  U+46.49 
 13: U+FB02 fl  LATIN SMALL LIGATURE FL
   fc=fl  U+66.6C lc=fl  U+FB02 tc=Fl  U+46.6C uc=FL  U+46.4C 
 14: U+1E96 ẖ  LATIN SMALL LETTER H WITH LINE BELOW
   fc=ẖ  U+68.331 lc=ẖ  U+1E96 tc=H̱  U+48.331 uc=H̱  U+48.331 
 15: U+0130 İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
   fc=i̇  U+69.307 lc=i̇  U+69.307 tc=İ  U+130 uc=İ  U+130 
 16: U+01F0 ǰ  LATIN SMALL LETTER J WITH CARON
   fc=ǰ  U+6A.30C lc=ǰ  U+1F0 tc=J̌  U+4A.30C uc=J̌  U+4A.30C 
 17: U+01C9 lj  LATIN SMALL LETTER LJ
   fc=lj  U+1C9 lc=lj  U+1C9 tc=Lj  U+1C8 uc=LJ  U+1C7 
 18: U+01C8 Lj  LATIN CAPITAL LETTER L WITH SMALL LETTER J
   fc=lj  U+1C9 lc=lj  U+1C9 tc=Lj  U+1C8 uc=LJ  U+1C7 
 19: U+01C7 LJ  LATIN CAPITAL LETTER LJ
   fc=lj  U+1C9 lc=lj  U+1C9 tc=Lj  U+1C8 uc=LJ  U+1C7 
 20: U+01CC nj  LATIN SMALL LETTER NJ
   fc=nj  U+1CC lc=nj  U+1CC tc=Nj  U+1CB uc=NJ  U+1CA 
 21: U+01CB Nj  LATIN CAPITAL LETTER N WITH SMALL LETTER J
   fc=nj  U+1CC lc=nj  U+1CC tc=Nj  U+1CB uc=NJ  U+1CA 
 22: U+01CA NJ  LATIN CAPITAL LETTER NJ
   fc=nj  U+1CC lc=nj  U+1CC tc=Nj  U+1CB uc=NJ  

[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Matthew Barnett

Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

There are some oddities in Unicode case-folding.

Under full case-folding, both \N{LATIN CAPITAL LETTER SHARP S} and \N{LATIN 
SMALL LETTER SHARP S} fold to "ss", which means that those codepoints match 
each other.

However, under simple case-folding, they fold to themselves, which means that 
those codepoints _don't_ match each other.
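
The full case-folding half of this is easy to check. The sketch below assumes
Python 3.3 or later, where str.casefold() applies full case folding;
caseless_match() is a name invented for the example, not an existing API.

```python
def caseless_match(a, b):
    """Compare two strings under full case folding."""
    return a.casefold() == b.casefold()


capital_sharp_s = "\N{LATIN CAPITAL LETTER SHARP S}"
small_sharp_s = "\N{LATIN SMALL LETTER SHARP S}"

# Both characters fold to the two-character string "ss",
# so a caseless comparison treats them as equal.
print(repr(capital_sharp_s.casefold()), repr(small_sharp_s.casefold()))
print(caseless_match(capital_sharp_s, small_sharp_s))
```

The same helper also matches "STRASSE" against "straße", which is the usual
motivation for folding to more than one code point.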

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Neither am I.  Even in old-style English with ae and oe, one wrote
> ÆGYPT and ÆSIR all caps but Ægypt and Æsir in titlecase, not *Aegypt or
> *Aesir.  Similarly with ŒNOLOGY / Œnology / œnology, never *Oenology.

Trying to disprove you a bit:
http://ecx.images-amazon.com/images/I/51G6CH9XFFL._SL500_AA300_.jpg
http://ecx.images-amazon.com/images/I/51k7TmosPdL._SL500_AA300_.jpg
http://ecx.images-amazon.com/images/I/518UzMeLFCL._SL500_AA300_.jpg

but classical typographies seem to write either the uppercase Œ or the 
lowercase œ.

That said, I wonder why Unicode even includes ligatures like ff. Sounds like 
mission creep to me (and horrible annoyances for people like us).

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___



[issue12802] Windows error code 267 should be mapped to ENOTDIR, not EINVAL

2011-08-27 Thread Vlad Riscutia

Vlad Riscutia riscutiav...@gmail.com added the comment:

Ah, I see Antoine already attached a patch. I was 3 minutes late :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12802
___



[issue12729] Python lib re cannot handle Unicode properly due to narrow/wide bug

2011-08-27 Thread Terry J. Reedy

Terry J. Reedy tjre...@udel.edu added the comment:

Python makes it easy to transform a sequence with a generator as long as no 
look-ahead is needed. utf16.UTF16.__iter__ is a typical example. Whenever a 
surrogate is found, grab the matching one.
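
As an illustration of the no-look-ahead case, pairing surrogates could look
roughly like the generator below. This is not the utf16.UTF16.__iter__ code
itself, just a sketch of the same idea; the name code_points() and the choice
to take UTF-16 code units as plain integers are assumptions made for the
example.

```python
def code_points(units):
    """Yield code points from an iterable of UTF-16 code units (ints)."""
    it = iter(units)
    for unit in it:
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: grab the matching low surrogate right away.
            low = next(it, None)
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            raise ValueError("unpaired low surrogate")
        else:
            yield unit
```

Because the second half of a pair always follows the first immediately, the
generator only ever consumes one extra item and never has to push one back.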

However, grapheme clustering does require look-ahead, which is a bit trickier. 
Assume s is a sanitized sequence of code points with unicode database entries. 
Ignoring line endings the following should work (I tested it with a toy 
definition of mark()):

def graphemes(s):
    # Group each base code point with the marks that follow it.
    sit = iter(s)
    try:
        graph = [next(sit)]
    except StopIteration:
        graph = []

    for cp in sit:
        if mark(cp):
            graph.append(cp)      # a mark extends the current grapheme
        else:
            yield combine(graph)  # a new base ends the previous grapheme
            graph = [cp]

    yield combine(graph)

I tested this with several inputs, using
def mark(cp): return cp == '.'
def combine(l): return ''.join(l)

Python's object orientation makes formatting easy for the user. Assume someone 
does the hard work of writing (once ;-) a GCString class with a .__format__ 
method that interprets the format mini-language for graphemes, using a 
generalized version of your 'simply horrible' code. This might be done by 
adapting str.__format__ to use the grapheme iterator above. Then users should 
be able to write

>>> '{:6.6}'.format(GCString("a̠ˈne̞ɣ̞ð̞o̞t̪a̠"))
a̠ˈne̞ɣ̞ð̞
(Note: Thunderbird properly displays characters with the marks beneath even 
though FireFox does not do so above or in its display of your message.)
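
To make the idea concrete, here is a very rough sketch of what such a class
might look like. Everything in it is an assumption for illustration: GCString
does not exist anywhere, the grapheme splitter simply attaches any code point
whose general category starts with "M" to the preceding base (a crude stand-in
for real grapheme cluster rules), and __format__ understands only an optional
width and precision, left-aligned and padded with spaces.

```python
import re
import unicodedata


def graphemes(s):
    """Split s into base-plus-marks clusters (a rough approximation)."""
    cluster = ""
    for cp in s:
        if cluster and unicodedata.category(cp).startswith("M"):
            cluster += cp          # a mark extends the current cluster
        else:
            if cluster:
                yield cluster
            cluster = cp           # a new base starts a new cluster
    if cluster:
        yield cluster


class GCString:
    """Hypothetical string wrapper that counts in graphemes."""

    def __init__(self, text):
        self.clusters = list(graphemes(text))

    def __str__(self):
        return "".join(self.clusters)

    def __format__(self, spec):
        # Only "[width][.precision]" is handled in this sketch.
        m = re.fullmatch(r"(\d+)?(?:\.(\d+))?", spec)
        if m is None:
            raise ValueError("unsupported format spec: %r" % spec)
        width, precision = m.group(1), m.group(2)
        clusters = self.clusters
        if precision is not None:
            clusters = clusters[:int(precision)]   # truncate by graphemes
        padding = 0
        if width is not None:
            padding = max(0, int(width) - len(clusters))
        return "".join(clusters) + " " * padding
```

With this, '{:6.6}'.format(GCString(...)) truncates and pads by grapheme
count, not by code point count. A real implementation would need the full
mini-language (fill, alignment and so on) and proper cluster boundaries.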

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12729
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2011-08-27 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

FTR, with the latest Python 3.2/3.3 (narrow) I get:
   Total failures:   58 / 500 ( 12%)
   Total successes: 442 / 500 ( 88%)
and with the latest Python 3.2/3.3 (wide) I get:
   Total failures:   52 / 500 ( 10%)
   Total successes: 448 / 500 ( 90%)

--
Added file: http://bugs.python.org/file23055/casing-results.txt

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12736
___