Re: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)

2010-11-24 Thread Antoine Pitrou
On Tuesday, 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote:
> On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:
> 
> > On Tue, 23 Nov 2010 00:07:09 -0500
> > Glyph Lefkowitz  wrote:
> >> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
> >> [email protected]> wrote:
> >> 
> >>> Hello. Does this affect python? Thank you.
> >>> 
> >>> http://www.openssl.org/news/secadv_20101116.txt
> >>> 
> >> 
> >> No.
> > 
> > Well, actually it does, but Python links against the system OpenSSL on
> > most platforms (except Windows), so it's up to the OS vendor to apply
> > the patch.
> 
> 
> It does?  If so, I must have misunderstood the vulnerability.  Can you
> explain how it affects Python?

If I believe the link above:
“Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
uses OpenSSL's internal caching mechanism. Servers that are
multi-process and/or disable internal session caching are NOT affected.”

So, you just have to create a multithreaded TLS server which doesn't
disable server-side session caching (it is enabled by default according
to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html )

Regards

Antoine.


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Centos 5.5 freeze during test_concurrent_futures

2010-11-24 Thread Antoine Pitrou

Hi,

> py3k built from trunk on Centos 5.5 freezes during regrtest on 
> test_concurrent_futures with "Fatal Python error: Invalid thread state for 
> this thread". As in a typical concurrent problem, subsequent calls freeze in 
> different test cases, but the freeze itself is always reproducible and always 
> during this test.

Well, could you run this under gdb and report the stacks for the
various threads when the process crashes?
(when compiled --with-pydebug, if possible)
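gdb shows the C-level stacks Antoine asks for. For a complementary Python-level view, here is a minimal sketch (not from the thread) that dumps the current stack of every live Python thread. It relies on `sys._current_frames()`, a CPython-specific function whose underscore marks it as an implementation detail, and the helper name `dump_all_thread_stacks` is hypothetical. On Python 3.3 and later, `faulthandler.dump_traceback(all_threads=True)` gives a similar dump without custom code.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks() -> str:
    """Return a text dump of the current Python stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps each thread ident to that thread's topmost frame.
    for ident, frame in sys._current_frames().items():
        chunks.append(f"--- Thread {names.get(ident, '?')} (ident {ident}) ---\n")
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, name="worker")
    worker.start()
    print(dump_all_thread_stacks())
    stop.set()
    worker.join()
```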

Thank you

Antoine.




Re: [Python-Dev] http.server - reference to bug #427345

2010-11-24 Thread Antoine Pitrou
On Tue, 23 Nov 2010 22:35:10 -0600
Brian Curtin  wrote:
> On Tue, Nov 23, 2010 at 22:28, Glenn Linderman
> 
> > wrote:
> 
> >  Where might I find the bug #427345 that is referred to in a comment inside
> > http.server ?  Here is a code excerpt:
> >
> > # throw away additional data [see bug #427345]
> > while select.select([self.rfile._sock], [], [], 0)[0]:
> >     if not self.rfile._sock.recv(1):
> >         break
> >
> 
> http://bugs.python.org/issue427345
> 
> http://bugs.python.org/ has a box on the left-hand side where you can enter
> issue numbers.

And of course you can also reverse-engineer the clever URL scheme used
by Roundup bug entries ;)
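The excerpt Glenn quotes uses a common idiom for discarding bytes that are already waiting on a socket without blocking: `select()` with a timeout of 0 only polls, and an empty `recv()` result means the peer closed the connection. Below is a self-contained sketch of the same idea on a `socket.socketpair()`. The helper name `drain_pending` is hypothetical, and it reads in 4096-byte chunks rather than one byte at a time as http.server does.

```python
import select
import socket


def drain_pending(sock: socket.socket) -> int:
    """Discard bytes already waiting on sock without blocking; return the count."""
    discarded = 0
    # A zero timeout turns select() into a pure poll: it never waits.
    while select.select([sock], [], [], 0)[0]:
        chunk = sock.recv(4096)
        if not chunk:  # readable but empty: the peer closed the connection
            break
        discarded += len(chunk)
    return discarded


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"leftover request body")
    print(drain_pending(b))
    a.close()
    b.close()
```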

Regards

Antoine.




Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Stephen J. Turnbull
James Y Knight writes:

 > a) You seem to be hung up implementation details of emacs.

Hung up?  No.  It's the program whose text model I know best, and even
if its design could theoretically be a lot better for this purpose, I
can't say I've seen a real program whose model is obviously better for
the purpose of a language for implementing text editors.[1]  So it's not
obvious to me that its model can be ruled out on a priori grounds.  If
not, it would be nice if your new language could implement it
efficiently without contorted programming.

 > But yes, positions should be stored as a byte offset into the
 > utf8 string. NOT as number of codepoints since the beginning of
 > the string. Probably you want it to be somewhat opaque, so that
 > you actually have to specify whether you wanted to go to +1
 > byte, codepoint, or grapheme.

Well, first of all, +1 byte should not be available to a text
iterator, at least not with the same iterator/position object that
implements character and/or grapheme movement.  (You seem to have
thought about this issue a lot, but mixing bytes with text units makes
me wonder how much practical implementation you've done.)

Second, incrementing to grapheme boundaries is relatively easy to do
efficiently, just as incrementing to a UTF-8 character boundary is
easy to do.  We already do the latter, the former is pragmatically
harder, but not a conceptual stretch.  That's not the question.  The
question is how do we identify an arbitrary position in the text?
Sometimes it's nice to have a numerical measure of size or location.
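Stephen's point that stepping to the next UTF-8 character boundary is cheap can be made concrete. UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so advancing a byte offset by one code point only means skipping them. A minimal sketch, assuming well-formed UTF-8 input; both function names are hypothetical.

```python
def next_char_offset(buf: bytes, pos: int) -> int:
    """Return the byte offset of the code point after the one starting at pos.

    Assumes buf is well-formed UTF-8 and pos sits on a character boundary.
    """
    pos += 1
    # Continuation bytes look like 0b10xxxxxx; skip them to reach the next lead byte.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def char_offsets(buf: bytes) -> list[int]:
    """Byte offsets at which each code point of buf starts."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos = next_char_offset(buf, pos)
    return offsets


data = "a\u00e9\u5bff\U0001F363".encode("utf-8")  # 1-, 2-, 3- and 4-byte characters
print(char_offsets(data))  # [0, 1, 3, 6]
```

Incremental movement is cheap, but answering "where is character N?" still needs a linear scan (or an auxiliary index), which is the numeric-position question Stephen raises.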

It is not obvious that grapheme count is going to be the natural way
to determine position in a text.  Eg, for languages with
variable metric characters, character counts as a way of lining up
table columns is going the way of Tyrannosaurus.  In the Han-using
languages, yes, column counts within lines are going to be important
forever, because the characters are literally square for most
practical purposes ... but they don't use composing characters (all
the Japanese kana are precomposed, for example), so position by
grapheme is going to be very close to position by character, and fine
positioning will be done either by mouse or by incrementing the last
few characters.  Nor do I think operations like "advance 1,000,000
characters" will have less meaning than "advance 1,000,000 graphemes."
Both of them are just a way of saying "go way far away", end up in
about the same place, and where there's a bias, it will be pretty
consistent in a statistical sense for any given natural language (and
therefore, for 99% of users).

 > But once you [the language implementor] are providing correct
 > abstractions for grapheme movement, it's just as easy to also
 > provide an abstraction for codepoint movement, and make your
 > low-level implementation of the iterator object be a byte-offset
 > into a UTF8 buffer.

Sure, that's fine for something that just iterates over the text.  But
if you actually need to remember positions, or regions, to jump to
later or to communicate to other code that manipulates them, doing
this stuff the straightforward way (just copying the whole iterator
object to hang on to its state) becomes expensive.  You end up
proliferating types that all do the same kind of thing.  Judicious use
of inheritance helps, but getting the fundamental abstraction right is
hard.  Or at least, Emacs hasn't found it in 20 years of trying.

OTOH, all that stuff "just works" and just works efficiently, up to
the grapheme vs. character issue, with an array.

About that issue, to go back to tired old Emacs, *all* of the things I
can think of that I might want to do by grapheme (display, insert,
delete, move a few places) do fit the "increment until done" model.
These things already work quite well for the variable-width buffer
that "multilingual" Emacsen use, whether the old Mule encoding or
UTF-8.  So I can see how the UTF-8 model with appropriate iterators
for characters and graphemes can work well for lots of applications
and use cases.

But Emacs already has opaque "markers", yet nevertheless the use of
integer character positions in strings and buffers has survived.  That
*may* have to do with mutability, and the "all the world is a buffer"
design, as Glyph suggested, but I think it more likely that markers
are very expensive to create and use compared to integers.  Perhaps an
editor of power similar to Emacs could be implemented with string
operations on lines, or the like, and these issues would go away.  But
it's not obvious to me.

Footnotes: 
[1]  Yes, I know that not all programs are text editors.  So shoot
me.  It's still the text manipulation program I know best, and it's
not obvious to me that it's the unique class that would need these
features.

Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Stephen J. Turnbull
James Y Knight writes:

 > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly
 > superior [...] because it is an ASCII superset, and thus more
 > easily compatible with other software. That also makes it most
 > commonly used for internet communication.

Sure, UTF-8 is very nice as a protocol for communicating text.  So
what?  If your application involves shoveling octets real fast, don't
convert and shovel those octets.  If your application involves
significant text processing, well, conversion can almost always be
done as fast as you can do I/O so it doesn't cost wallclock time, and
generally doesn't require a huge percentage of CPU time compared to
the actual text processing.  It's just a specialization of
serialization, that we do all the time for more complex data
structures.

So wire protocols are not a killer argument for or against any
particular internal representation of text.

 > (So, there's a huge advantage for using it internally as well right
 > there: no transcoding necessary for writing your HTML output).

I don't know your use cases but for mine, transcoding (whether in Lisp
or Python or C) is invariably the least of my worries.  *Especially*
transcoding to UTF-8, which is the default codec for me, and I *never*
mix bytes and text, so having not bothered to set the codec, I don't
bother to transcode explicitly.

 > If you really want a fixed-width encoding, you have to go to
 > UTF-32

Not really.  I never bothered implementing the codec, because I
haven't yet seen a non-BMP Unicode character in the wild (I still see
a lot of non-Unicode characters, but hey, that's the price you pay for
living in the land that invented sushi, sake, and anime).  For most
use cases, those are going to be rare, where by "rare" I mean "you
aren't going to see 6400 *different* non-BMP characters."[1]  So
instead of having the codec produce UTF-16, you have it produce (Holy
CEF, Batman!) "pure" UCS-2 with the non-BMP characters registered on
demand and encoded in the BMP private area.  Python, of course, will
never know the difference, and your language won't need to care, either.
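Stephen's "pure UCS-2" trick can be sketched in a few lines. A registry assigns each non-BMP character it meets the next free code point in the BMP Private Use Area (U+E000 to U+F8FF, 6400 code points), so every character in the internal string fits in one 16-bit unit. This is only an illustration under the assumption that the application does not otherwise use the Private Use Area. `PUARegistry` and its methods are hypothetical names, not an existing codec.

```python
class PUARegistry:
    """Map non-BMP characters to BMP Private Use Area code points on demand."""

    PUA_START, PUA_END = 0xE000, 0xF8FF  # 6400 code points

    def __init__(self) -> None:
        self._to_pua: dict[str, str] = {}
        self._from_pua: dict[str, str] = {}

    def _register(self, ch: str) -> str:
        if ch not in self._to_pua:
            cp = self.PUA_START + len(self._to_pua)
            if cp > self.PUA_END:
                raise ValueError("Private Use Area exhausted")
            self._to_pua[ch] = chr(cp)
            self._from_pua[chr(cp)] = ch
        return self._to_pua[ch]

    def to_ucs2(self, text: str) -> str:
        """Replace every character above U+FFFF with its PUA stand-in."""
        return "".join(self._register(c) if ord(c) > 0xFFFF else c for c in text)

    def from_ucs2(self, text: str) -> str:
        """Undo to_ucs2(), restoring the original non-BMP characters."""
        return "".join(self._from_pua.get(c, c) for c in text)


reg = PUARegistry()
s = "sushi \U0001F363 x2: \U0001F363\U0001F363"
internal = reg.to_ucs2(s)
assert all(ord(c) <= 0xFFFF for c in internal)  # everything now fits in the BMP
assert reg.from_ucs2(internal) == s             # and the mapping round-trips
```

As the footnote below points out, the catch is collisions: if the input already contains Private Use Area characters (such as vendor logos), they must be reserved up front or they become ambiguous on the way back.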

 > But that's all a side issue: even if you do choose UTF-16 as your
 > underlying encoding, you *still* need to provide iterators that
 > work by "byte" (only now bytes are 16-bits), by codepoint,

Nope, see above.  Codepoints can be bytes and vice versa.  The needed
codec is no harder to use than any other codec, and only slightly less
efficient than the normal UTF-8 codec unless you're basically
restricted to a rather uncommon script (and even then there are
optimizations).

 > and by grapheme.

Sure, but as I point out elsewhere, the use cases where grapheme
movement is distinguished from character movement I can come up with
are all iterative, and I don't need array behavior for both anyway.
So since I *can* have a character array in Unicode, and I *can't* have
a grapheme array (except maybe by a scheme like the above), I'll go
for the character array.

Unless maybe you convince me I don't need it, but I'm yet to be
convinced.

 > away with...just so long as you don't mind that you sometimes end
 > up splitting a string in the middle of a codepoint and causing a
 > unicode error!

I *do* mind, but I like Python anyway.


Footnotes: 
[1]  OK, in practice a lot of the private space will be taken by
existing system characters, such as the Apple logo (absolutely
essential for writing email on Mac, at least in Japan).  Whose
use-case is going to see 1000 different non-BMP characters in a
session?  I do know a couple of Buddhist dictionary editors, but
aside from them, I can't think of anybody.  Lara Croft, maybe.


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Antoine Pitrou
On Wed, 24 Nov 2010 18:51:49 +0900
"Stephen J. Turnbull"  wrote:
> James Y Knight writes:
> 
>  > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly
>  > superior [...] because it is an ASCII superset, and thus more
>  > easily compatible with other software. That also makes it most
>  > commonly used for internet communication.
> 
> Sure, UTF-8 is very nice as a protocol for communicating text.  So
> what?  If your application involves shoveling octets real fast, don't
> convert and shovel those octets.  If your application involves
> significant text processing, well, conversion can almost always be
> done as fast as you can do I/O so it doesn't cost wallclock time, and
> generally doesn't require a huge percentage of CPU time compared to
> the actual text processing.  It's just a specialization of
> serialization, that we do all the time for more complex data
> structures.
> 
> So wire protocols are not a killer argument for or against any
> particular internal representation of text.

Agreed. Decoding and encoding utf-8 is so fast that it should be
dwarfed by any actual processing done on the text.
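One rough way to check this claim on a given machine is to time a UTF-8 round trip with `timeit`. The sketch below does that for an arbitrary mixed-script sample. The sample text, the repeat count and the helper name are assumptions, and no particular throughput figure is implied.

```python
import timeit


def utf8_roundtrip_mb_per_s(text: str, number: int = 20) -> float:
    """Rough UTF-8 encode+decode throughput, in MB of encoded data per second."""
    data = text.encode("utf-8")
    if data.decode("utf-8") != text:  # sanity check: the round trip is lossless
        raise ValueError("round trip changed the text")
    seconds = timeit.timeit(lambda: text.encode("utf-8").decode("utf-8"), number=number)
    return len(data) * number / seconds / 1e6


if __name__ == "__main__":
    sample = "Python-Dev \u5bff\u53f8 na\u00efve " * 100_000  # arbitrary mixed-script sample
    print(f"{utf8_roundtrip_mb_per_s(sample):.0f} MB/s")
```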

Regards

Antoine.




Re: [Python-Dev] r86726 - python/branches/release27-maint/Objects/setobject.c

2010-11-24 Thread Antoine Pitrou
On Wed, 24 Nov 2010 11:39:23 +0100 (CET)
armin.rigo  wrote:
> Author: armin.rigo
> Date: Wed Nov 24 11:39:23 2010
> New Revision: 86726
> 
> Log:
> A no-op change.  It looks like this call was not meant to be a recursive
> call, but just call the helper (which the recursive call ends up doing).

Since it's allegedly a no-op change, it doesn't come with a test, and
2.7.1 is in rc phase, is it really the right time to do it? What is the
motivation for it?

Thanks

Antoine.


> 
> 
> Modified:
>    python/branches/release27-maint/Objects/setobject.c
> 
> Modified: python/branches/release27-maint/Objects/setobject.c
> ==
> --- python/branches/release27-maint/Objects/setobject.c   (original)
> +++ python/branches/release27-maint/Objects/setobject.c   Wed Nov 24 11:39:23 2010
> @@ -1858,7 +1858,7 @@
>          tmpkey = make_new_set(&PyFrozenSet_Type, key);
>          if (tmpkey == NULL)
>              return -1;
> -        rv = set_contains(so, tmpkey);
> +        rv = set_contains_key(so, tmpkey);
>          Py_DECREF(tmpkey);
>      }
>      return rv;



Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Michael Foord

On 23/11/2010 14:16, Nick Coghlan wrote:

> On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord
>  wrote:
>> PEP 354 was rejected for two primary reasons - lack of interest and nowhere
>> obvious to put it. Would it be *so bad* if an enum type lived in its own
>> module? There is certainly more interest now, and if we are to use something
>> like this in the standard library it *has* to be in the standard library
>> (unless every module implements their own private _Constant class).
>>
>> Time to revisit the PEP?
>
> If you (or anyone else) wanted to revisit the PEP, then I would advise
> trawling through the standard library looking for constants that could
> be sensibly converted to enum values.


Based on a non-exhaustive search, Python standard library modules 
currently using integers for constants:


* re - has flags (OR'able constants) defined in sre_constants, each flag 
has two names (e.g. re.IGNORECASE and re.I)
* os has SEEK_SET, SEEK_CUR, SEEK_END - *plus* those implemented in 
posix / nt
* doctest has its own flag system, but is really just using integer 
flags / constants (quite a few of them)

* token has a tonne of constants (autogenerated)
* socket exports a bunch of constants defined in _socket
* gzip has flags: FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT

* errno (builtin module)

EALREADY, EINPROGRESS, EWOULDBLOCK, ECONNRESET, EINVAL,
ENOTCONN, ESHUTDOWN, EINTR, EISCONN, EBADF, ECONNABORTED

* opcode has HAVE_ARGUMENT, EXTENDED_ARG. In fact pretty much the whole 
of opcode is about defining and exposing named constants

* msilib uses flag constants
* multiprocessing.pool - RUN, CLOSE, TERMINATE
* multiprocessing.util - NOTSET, SUBDEBUG, DEBUG, INFO, SUBWARNING
* xml.dom and xml.dom.Node (in __init__.py) have a bunch of constants
* xml.dom.NodeFilter.NodeFilter holds a bunch of constants (some of them 
flags)

* xmlrpc.client has a bunch of error constants
* calendar uses constants to represent weekdays, plus one for the EPOCH 
that is best left alone
* http.client has a tonne of constants - recognisable as ports / error 
codes though
* dis has flags in COMPILER_FLAG_NAMES, which are then set as locals in 
inspect

* io defines SEEK_SET, SEEK_CUR, SEEK_END (same as os)

Where constants are implemented in C but exported via a Python module 
(the constants exported by os and socket for example) they could be 
wrapped. Where they are exported directly by a C extension or builtin 
module (e.g. errno) they are probably best left.


Raymond feels that having an enum / constant type would be Javaesque and 
unused. If we used it in the standard library the unused fear at least 
would be unwarranted. The change would be largely transparent to 
developers, except they get better debugging info. Twisted is also 
looking for an enum / constant type:


http://twistedmatrix.com/trac/ticket/4671

Because we would need to subclass from int for backwards compatibility,
it couldn't replace float / string constants (unless the base class
were set dynamically, which I don't propose). Hopefully it
would still be sufficient to allow Twisted to use it. (Although they do 
so love reimplementing parts of the standard library - usually better 
than the standard library it has to be said.)


All the best,

Michael

There are a tonne of constants that are used as numbers (MAX_LINE_LENGTH 
appears in a few places) and aren't just arbitrary constants. There are 
also some other interesting ones:


* pty has STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, CHILD
* poplib has POP3_PORT, POP3_SSL_PORT - recognisable as port numbers, 
should be left as ints

* datetime.py has MINYEAR and MAXYEAR
* colorsys has float constants
* tty uses constants for termios list indexes (used as numbers I guess)
* curses.ascii has a whole bunch of integer constants referring to ascii 
characters
* Several modules - decimal, concurrent.futures, uuid (and now inspect) 
already use strings




> A decision would also need to be made as to whether or not to subclass
> int, or just provide __index__ (the former has the advantage of being
> able to drop cleanly into OS level APIs that expect a numerical
> constant).
>
> Whether enums should provide arbitrary name-value mappings (ala C
> enums) or were restricted to sequential indices starting from zero
> would be another question best addressed by a code survey of at least
> the stdlib.
>
> And getgeneratorstate() doesn't count as a use case, since the
> ordering isn't needed and using string literals instead of integers
> will cover the debugging aspect :)
>
> Cheers,
> Nick.




--

http://www.voidspace.org.uk/


Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Nick Coghlan
On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
 wrote:
> Based on a non-exhaustive search, Python standard library modules currently
> using integers for constants:

Thanks for that review. I think following up on the "NamedConstant"
idea may make more sense than pursuing enums in their own right. That
way we could get the debugging benefits on the Python side regardless
of any type constraints on the value (e.g. needing to be an integer in
order to interface to C code), without needing to design an enum API
that suited all purposes.
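The earlier question of whether to subclass int or only provide `__index__` can be illustrated with two tiny classes. `NamedInt` and `IndexConstant` are names invented for this sketch, not a proposed stdlib API. The int subclass drops straight into APIs that require a real integer, such as `seek()`. The `__index__` version works wherever an index is accepted, but it is not an int.

```python
import io
import operator
import os


class NamedInt(int):
    """An int subclass that keeps its name for debugging (hypothetical)."""

    def __new__(cls, name: str, value: int):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self) -> str:
        return f"{self.name}({int(self)})"


class IndexConstant:
    """A non-int object usable wherever an index is expected, via __index__."""

    __slots__ = ("name", "value")

    def __init__(self, name: str, value: int) -> None:
        self.name = name
        self.value = value

    def __index__(self) -> int:
        return self.value

    def __repr__(self) -> str:
        return f"{self.name}({self.value})"


SEEK_END = NamedInt("SEEK_END", os.SEEK_END)
print(repr(SEEK_END))                        # SEEK_END(2)
print(SEEK_END == os.SEEK_END)               # True: still an int
print(io.BytesIO(b"abc").seek(0, SEEK_END))  # 3: accepted by an int-only API

SECOND = IndexConstant("SECOND", 1)
print(["a", "b", "c"][SECOND], operator.index(SECOND))  # b 1
```

The trade-off shows up as soon as arithmetic is involved: `SEEK_END + 1` works (and returns a plain int), while `SECOND + 1` raises TypeError because `IndexConstant` only implements `__index__`.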

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)

2010-11-24 Thread exarkun

On 08:02 am, [email protected] wrote:

>On Tuesday, 23 November 2010 at 20:56 -0500, Glyph Lefkowitz wrote:
>>On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:
>>
>>> On Tue, 23 Nov 2010 00:07:09 -0500
>>> Glyph Lefkowitz  wrote:
>>>> On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello. Does this affect python? Thank you.
>>>>>
>>>>> http://www.openssl.org/news/secadv_20101116.txt
>>>>>
>>>>
>>>> No.
>>>
>>> Well, actually it does, but Python links against the system OpenSSL on
>>> most platforms (except Windows), so it's up to the OS vendor to apply
>>> the patch.
>>
>>It does?  If so, I must have misunderstood the vulnerability.  Can you
>>explain how it affects Python?
>
>If I believe the link above:
>“Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>uses OpenSSL's internal caching mechanism. Servers that are
>multi-process and/or disable internal session caching are NOT affected.”
>
>So, you just have to create a multithreaded TLS server which doesn't
>disable server-side session caching (it is enabled by default according
>to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html )


Hm.  The session cache is enabled by default, but nothing will ever use 
it unless the server specifies a session id using 
SSL_set_session_id_context or SSL_CTX_set_session_id_context.  Python 
doesn't expose these, so I don't think any Python SSL server can set 
them.


The vulnerability announcement isn't 100% clear on this, but I took a 
look at the patch which fixes the issue and it /appears/ as though if a 
client never tries to re-use a session then you will be safe from this 
bug.  However, perhaps this only means that only malicious clients 
(which send a session id even when they can't actually have one) will be 
able to trigger the bug.


Or I may misunderstand how SSL sessions work in OpenSSL entirely.  The 
documentation for them is on par with that for most of the rest of 
OpenSSL.


Jean-Paul


Re: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)

2010-11-24 Thread Antoine Pitrou
On Wed, 24 Nov 2010 15:01:06 -
[email protected] wrote:
> >
> >If I believe the link above:
> >“Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
> >uses OpenSSL's internal caching mechanism. Servers that are
> >multi-process and/or disable internal session caching are NOT
> >affected.”
> >
> >So, you just have to create a multithreaded TLS server which doesn't
> >disable server-side session caching (it is enabled by default according
> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html 
> >)
> 
> Hm.  The session cache is enabled by default, but nothing will ever use 
> it unless the server specifies a session id using 
> SSL_set_session_id_context or SSL_CTX_set_session_id_context.  Python 
> doesn't expose these, so I don't think any Python SSL server can set 
> them.

Well, Python calls SSL_CTX_set_session_id_context() implicitly, starting
from 3.2 (precisely so that the session cache gets used). The
"documentation" I've found about the "session id context" seems to
suggest that a process-wide constant is enough.

(and you can verify that caching occurs using the new
SSLContext.session_stats() method)
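Following Antoine's hint, here is a minimal sketch of reading the session cache counters from a server-side context. It assumes a modern Python: `ssl.PROTOCOL_TLS_SERVER` only exists from 3.6 on, so a 3.2-era server would have used another protocol constant. A fresh context naturally reports zero hits. On a running multithreaded server, a growing `hits` count shows that OpenSSL's internal session cache, the mechanism named in the advisory, is actually in use.

```python
import ssl

# A server-side context. Per Antoine's message, from 3.2 on Python sets a
# session id context implicitly, so OpenSSL's internal session cache can be used.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

stats = ctx.session_stats()
# These counters come straight from OpenSSL's session cache statistics.
for key in ("accept", "hits", "misses", "cache_full"):
    print(key, stats[key])
```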

> Or I may misunderstand how SSL sessions work in OpenSSL entirely.  The 
> documentation for them is on par with that for most of the rest of 
> OpenSSL.

Agreed.

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Steven D'Aprano

Nick Coghlan wrote:

> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
>  wrote:
>> Based on a non-exhaustive search, Python standard library modules currently
>> using integers for constants:
>
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right.


Pardon me if I've missed something in this thread, but when you say 
"NamedConstant", do you mean actual constants that can only be bound 
once but not re-bound? If so, +1. If not, what do you mean?


I thought PEP 3115 could be used to implement such constants, but I 
can't get it to work...


class readonlydict(dict):
    def __setitem__(self, key, value):
        if key in self:
            raise TypeError("can't rebind constant")
        dict.__setitem__(self, key, value)
    # Need to also handle updates, del, pop, etc.

class MetaConstant(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        return readonlydict()
    def __new__(cls, name, bases, classdict):
        assert type(classdict) is readonlydict
        return type.__new__(cls, name, bases, classdict)

class Constant(metaclass=MetaConstant):
    a = 1
    b = 2
    c = 3


What I expect is that Constant.a should return 1, and Constant.a=2 
should raise TypeError, but what I get is a normal class __dict__.


>>> Constant.a
1
>>> Constant.a = 2
>>> Constant.a
2


--
Steven



Re: [Python-Dev] OpenSSL Vulnerability (openssl-1.0.0a)

2010-11-24 Thread exarkun

On 03:11 pm, [email protected] wrote:

>On Wed, 24 Nov 2010 15:01:06 -
>[email protected] wrote:
>> >
>> >If I believe the link above:
>> >“Any OpenSSL based TLS server is vulnerable if it is multi-threaded and
>> >uses OpenSSL's internal caching mechanism. Servers that are
>> >multi-process and/or disable internal session caching are NOT
>> >affected.”
>> >
>> >So, you just have to create a multithreaded TLS server which doesn't
>> >disable server-side session caching (it is enabled by default according
>> >to http://www.openssl.org/docs/ssl/SSL_CTX_set_session_cache_mode.html
>> >)
>>
>> Hm.  The session cache is enabled by default, but nothing will ever use
>> it unless the server specifies a session id using
>> SSL_set_session_id_context or SSL_CTX_set_session_id_context.  Python
>> doesn't expose these, so I don't think any Python SSL server can set
>> them.
>
>Well, Python calls SSL_CTX_set_session_id_context() implicitly, starting
>from 3.2 (precisely so that the session cache gets used). The
>"documentation" I've found about the "session id context" seems to
>suggest that a process-wide constant is enough.


Ah.  Okay, then Python 3.2 would be vulnerable.  Good thing it isn't 
released yet. ;)


Jean-Paul


Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Benjamin Peterson
2010/11/24 Steven D'Aprano :
> Nick Coghlan wrote:
>>
>> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord
>>  wrote:
>>>
>>> Based on a non-exhaustive search, Python standard library modules
>>> currently
>>> using integers for constants:
>>
>> Thanks for that review. I think following up on the "NamedConstant"
>> idea may make more sense than pursuing enums in their own right.
>
> Pardon me if I've missed something in this thread, but when you say
> "NamedConstant", do you mean actual constants that can only be bound once
> but not re-bound? If so, +1. If not, what do you mean?
>
> I thought PEP 3115 could be used to implement such constants, but I can't
> get it to work...
>
> class readonlydict(dict):
>    def __setitem__(self, key, value):
>        if key in self:
>            raise TypeError("can't rebind constant")
>        dict.__setitem__(self, key, value)
>    # Need to also handle updates, del, pop, etc.
>
> class MetaConstant(type):
>    @classmethod
>    def __prepare__(metacls, name, bases):
>        return readonlydict()
>    def __new__(cls, name, bases, classdict):
>        assert type(classdict) is readonlydict
>        return type.__new__(cls, name, bases, classdict)
>
> class Constant(metaclass=MetaConstant):
>    a = 1
>    b = 2
>    c = 3
>
>
> What I expect is that Constant.a should return 1, and Constant.a=2 should
> raise TypeError, but what I get is a normal class __dict__.

The construction namespace can be customized, but class.__dict__ must
always be a real dict.
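Building on Benjamin's point, one way to get the behaviour Steven expected is to keep a `__prepare__` namespace for the class body (which does catch a name bound twice while the body runs) and enforce read-only access afterwards in the metaclass's `__setattr__`, because the finished class `__dict__` is a plain dict. A minimal sketch; `_NoRebindNamespace` and `ConstantMeta` are hypothetical names.

```python
class _NoRebindNamespace(dict):
    """Class-body namespace that rejects a second assignment to the same name."""

    def __setitem__(self, key, value):
        if key in self:
            raise TypeError(f"can't rebind constant {key!r}")
        super().__setitem__(key, value)


class ConstantMeta(type):
    # PEP 3115 hook: only affects the namespace used while the class body runs.
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        return _NoRebindNamespace()

    # After creation the class __dict__ is a real dict, so attribute
    # assignment on the class itself is intercepted here instead.
    def __setattr__(cls, name, value):
        if name in cls.__dict__:
            raise TypeError(f"can't rebind constant {name!r}")
        super().__setattr__(name, value)

    def __delattr__(cls, name):
        raise TypeError(f"can't delete constant {name!r}")


class Constant(metaclass=ConstantMeta):
    a = 1
    b = 2
    c = 3


print(Constant.a)  # 1
try:
    Constant.a = 2
except TypeError as exc:
    print(exc)     # can't rebind constant 'a'
```

This only guards the class object. Calling `type.__setattr__(Constant, "a", 2)` directly still bypasses the check, so it protects against accidents, not against determined code.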



-- 
Regards,
Benjamin


[Python-Dev] Fwd: constant/enum type in stdlib

2010-11-24 Thread Joao S. O. Bueno
Hi --

If I may add my two cents: this paste has a sample implementation
of the proposed features I have found most interesting so far:
1) inherit from int
2) display the constant's name on 'repr'
3) optionally populate a module with the constants
4) Optionally provide a starting value for the enum
5) Optionally provide a mapping with the values


http://pastebin.com/6f1u35qJ

(implementation is in python 2)


Todo here:
6) Make them "read only"
7) Make the base type optional, with "int" as default - but also being able
to create "constants" inheriting from other objects
8) more ideas?

I am willing to play along this sample code as discussion goes on if
there is any feedback.
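Features 3 to 5 in the list above (populating a module, a starting value, an explicit mapping) can be sketched in a few lines. This is not the pastebin code, which is not reproduced here. `NamedInt`, `make_constants` and its parameters are hypothetical names, and the constant names are borrowed from Michael's survey purely for illustration.

```python
import types


class NamedInt(int):
    """int subclass whose repr shows the constant's name (feature 2)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return f"<{self.name}={int(self)}>"


def make_constants(names=(), start=0, mapping=None, namespace=None):
    """Build NamedInt constants (hypothetical helper).

    names     -- names numbered sequentially from `start` (feature 4)
    mapping   -- explicit name -> value pairs (feature 5)
    namespace -- optional dict to populate, e.g. vars(module) (feature 3)
    """
    consts = {name: NamedInt(name, start + i) for i, name in enumerate(names)}
    for name, value in (mapping or {}).items():
        consts[name] = NamedInt(name, value)
    if namespace is not None:
        namespace.update(consts)
    return consts


mod = types.ModuleType("pool_states")  # stand-in for a real module
make_constants(["RUN", "CLOSE", "TERMINATE"], namespace=vars(mod))
print(repr(mod.RUN), repr(mod.TERMINATE))  # <RUN=0> <TERMINATE=2>

flags = make_constants(mapping={"FTEXT": 1, "FHCRC": 2})
print(flags["FHCRC"] | flags["FTEXT"])  # 3 (bitwise ops return plain ints)
```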



 js
 -><-


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Alexander Belopolsky
On Tue, Nov 23, 2010 at 2:18 PM, Amaury Forgeot d'Arc
 wrote:
..
>> Given the apparent difficulty of writing even basic text processing
>> algorithms in presence of surrogate pairs, I wonder how wise it is to
>> expose Python users to them.
>
> This was already discussed two years ago:
>
> http://mail.python.org/pipermail/python-dev/2008-July/080900.html
>

Thanks for the link.   Let me summarize that discussion as I read it.

The discussion starts with a reference to Guido's 2001 post which concluded with

"""
... if we had wanted to use a
variable-length internal representation, we should have picked UTF-8
way back, like Perl did.  Moving to a UTF-16-based internal
representation now will give us all the problems of the Perl choice
without any of the benefits.
""" [1]

and proposes to move to UCS-4 completely for Python 3.0.  Note that
this is not the option that I would like to discuss here.   I don't
propose to discuss abandoning narrow builds.  Instead, I would like to
discuss the costs and benefits associated with using variable width
CES as an internal representation.  This is where the 2008 discussion
moved.  The OP did not realize that narrow builds supported UTF-16 and,
like me, was surprised that application developers should be aware of
surrogates if they want to use narrow builds.  It was also suggested
that Python itself is likely to have many bugs that can be triggered
by non-BMP characters on narrow builds.  Guido's response was:

"""
I'd also prefer to receive bug reports about breakages actually
encountered in the wild than purely theoretical issues
"""

I don't think this is a good position to take.  Programs that expect
one code unit where Python may produce two are likely to have security
holes.  Even when programmers carefully sanitize their input, they are
likely to do it at the code point level based on Unicode category, and
the BMP boundary does not mean anything special for their applications.
I think anyone who wants to write a robust application has two
choices in practice:  (a) use a wide Unicode build; (b) restrict all
text to the BMP.  Supporting surrogates at the application level is likely
to be prohibitively expensive.

It was later suggested that the main benefit of "UTF-16" builds is
that they can easily interface with system libraries that are "UTF-16"
based.  However, how likely are these libraries to be bug-free when it
comes to non-BMP characters?  History teaches us that it is not very
likely.

Daniel Arbuckle presented arguments against imposing the burden of
dealing with surrogates on application writers. [2]

The recurrent theme on the thread was that non-BMP characters are rare
and those who need them can afford the extra development cost
associated with the surrogates.  This point was very eloquently
articulated by Guido:

"""
Who are the many here? Who are the few? I'd venture that (at least for
the foreseeable future, say, until China will finally have taken over
the role of the US as the de-facto dominant super power :-) the many
are people whose app will never see a Unicode character outside the
BMP, or who do such minimal string processing that their code doesn't
care whether it's handling UTF-16-encoded data.
""" [3]

This argument can also be used to support the position that narrow
builds should not support non-BMP characters.

Later the discussion started resembling this thread when it went into
a scholastic dispute over fine points in Unicode Standard terminology.
:-)

Then BDFL vetoed len(u"\U00012345") returning 1 on narrow builds. [4]
I would be against that as well.  I don't see len("\U00012345") == 2
as a big problem because application developers can simply avoid using
\U literals if they don't want to support non-BMP characters.  On the
other hand, an option to warn users about non-BMP literals on a narrow
build may be useful but it is easy to implement in lint-like tools.
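The lint-like check mentioned above could be sketched roughly as follows. This is a minimal illustration, assuming Python 3.8+ (where every string literal parses as an ast.Constant); find_non_bmp_literals is a hypothetical helper name, not part of any existing tool. It reports the position of each str literal containing a code point above U+FFFF, i.e. one that a narrow build would store as a surrogate pair.

```python
import ast


def find_non_bmp_literals(source):
    """Return sorted (line, column) pairs of str literals containing non-BMP code points.

    Hypothetical lint helper: on a narrow build each such code point would be
    stored as a surrogate pair, so len() of the literal would count it twice.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # In Python 3.8+ every string literal is an ast.Constant with a str value.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if any(ord(ch) > 0xFFFF for ch in node.value):
                hits.append((node.lineno, node.col_offset))
    return sorted(hits)


if __name__ == '__main__':
    sample = 'x = "\\U00012345"\ny = "plain BMP text"\n'
    print(find_non_bmp_literals(sample))  # [(1, 4)]
```

Walking the AST rather than grepping the source means escapes such as \U00012345 are already decoded when the check runs.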

There were multiple suggestions for standard library additions to help
application writers to deal with surrogate pairs, but as far as I can
tell, nothing has been done in this area in the following two years.
I don't think there is a recipe on how to fix legacy
character-by-character processing loop such as

    for c in string:
        ...

to make it iterate over code points consistently in wide and narrow
builds.  (Note that I am not asking for a grapheme iterator here.
This is clearly an application level feature.)
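One possible recipe, sketched here as a hypothetical helper (iter_code_points is not a stdlib function): walk the string by index and join each high/low surrogate pair into a single character. It assumes Python 3.3 or later, where chr() accepts any code point, so the narrow-build representation is simulated by writing the surrogates out explicitly. Lone surrogates are passed through unchanged.

```python
def iter_code_points(s):
    """Yield the code points of s, joining UTF-16 surrogate pairs.

    Text without surrogates (wide builds, modern Python) is yielded one
    character at a time; a high surrogate followed by a low surrogate is
    combined into the single non-BMP character it encodes.
    """
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if '\ud800' <= c <= '\udbff' and i + 1 < n and '\udc00' <= s[i + 1] <= '\udfff':
            hi, lo = ord(c), ord(s[i + 1])
            # Standard UTF-16 decoding of a surrogate pair.
            yield chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        else:
            yield c  # BMP character or lone surrogate
            i += 1
```

The legacy loop then becomes `for c in iter_code_points(string): ...`, at the cost of an explicit wrapper call at every such site.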


> So yes, wrap() and center() should be fixed.

I opened issue 10521 for that. [5]  I am fully prepared to see it
dismissed as "theoretical" and closed as "won't fix", or left to linger
indefinitely.  Fixing it would most likely involve writing a second
version of the pad() utility function specifically for narrow builds.

All the examples I've seen of surrogate handling in Python's C code
came with hand-coded #ifndef Py_UNICODE_WIDE fragments; there are no
user-friendly macros or APIs that would abstract it away.

A quick grep for maxunicode in the standard library revealed only o

Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Michael Foord

On 24/11/2010 14:08, Nick Coghlan wrote:
> On Wed, Nov 24, 2010 at 10:30 PM, Michael Foord wrote:
>
>> Based on a non-exhaustive search, Python standard library modules currently
>> using integers for constants:
>
> Thanks for that review. I think following up on the "NamedConstant"
> idea may make more sense than pursuing enums in their own right. That
> way we could get the debugging benefits on the Python side regardless
> of any type constraints on the value (e.g. needing to be an integer in
> order to interface to C code), without needing to design an enum API
> that suited all purposes.


Can you explain what you see as the difference?

I'm not particularly interested in type validation but I like the fact 
that typical enum APIs allow you to group constants: the generated 
constant class acts as a namespace for all the defined constants.


Are you just suggesting something along the lines of:

class NamedConstant(int):
    def __new__(cls, name, val):
        return int.__new__(cls, val)

    def __init__(self, name, val):
        self._name = name

    def __repr__(self):
        return '<NamedConstant %s>' % self._name

FOO = NamedConstant('FOO', 3)

In general the fewer features the better, but I'd like a few more
features than that. :-)
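To illustrate the grouping point (and the "read only" wish from earlier in the thread), here is a hypothetical sketch; ConstantGroup and this NamedConstant variant are illustrative names only, not a proposed stdlib API. The group acts as a namespace whose attributes are NamedConstant ints, and it refuses reassignment or deletion.

```python
class NamedConstant(int):
    """An int that remembers its name for debugging (a variant of the sketch above)."""

    def __new__(cls, name, value):
        self = int.__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return '<NamedConstant %s>' % self._name


class ConstantGroup(object):
    """Hypothetical read-only namespace grouping related NamedConstants."""

    def __init__(self, group_name, **values):
        # object.__setattr__ bypasses our own __setattr__, which blocks writes.
        object.__setattr__(self, '_group_name', group_name)
        for name, value in values.items():
            object.__setattr__(self, name, NamedConstant(name, value))

    def __setattr__(self, name, value):
        raise AttributeError('%s constants are read-only' % self._group_name)

    def __delattr__(self, name):
        raise AttributeError('%s constants are read-only' % self._group_name)


Color = ConstantGroup('Color', RED=1, GREEN=2)
print(repr(Color.RED), Color.RED + 1)  # <NamedConstant RED> 2
```

Because each constant is still an int, it can be passed straight to C code; only the repr and the grouping change.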


All the best,

Michael



Cheers,
Nick.




--

http://www.voidspace.org.uk/




Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread M.-A. Lemburg
Alexander Belopolsky wrote:
> To conclude, I feel that rather than trying to fully support non-BMP
> characters as surrogate pairs in narrow builds, we should make it
> easier for application developers to avoid them. 

I don't understand what you're after here. Programmers can easily
avoid them by not using them :-)

> If abandoning
> internal use of UTF-16 is not an option, I think we should at least
> add an option for decoders that currently produce surrogate pairs to
> treat non-BMP characters as errors and handle them according to user's
> choice.

But what do you gain by doing this ? You'd lose the round-trip
safety of those codecs and that's not a good thing.

Note that most text processing APIs in Python work based on code
units, which in most cases represent single code points, but in
some cases can also represent surrogates (both on UCS-2 and on
UCS-4 builds).

E.g. str.center(n) centers the string in a padded string that
is composed of n code units. Whether that operation will result
in a text that's centered visually on output is a completely
different story. The original string could contain surrogates,
it could also contain combining code points, so the visual
presentation of the result may very well not be centered at
all; it may not even appear as having the length n to the user.
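The combining-character case can be seen directly. A small sketch, assuming Python 3.3+ (one code point per unit, so no surrogates are involved); counting characters whose canonical combining class is zero is only a rough stand-in for display width and ignores, for example, East Asian wide characters.

```python
import unicodedata

s = 'e\u0301'              # 'e' + COMBINING ACUTE ACCENT: two code points, one glyph
padded = s.center(5, '*')  # pads to 5 code points (code units on a narrow build)

# Rough visible width: skip characters with a nonzero canonical combining class.
visible = sum(1 for ch in padded if unicodedata.combining(ch) == 0)

print(len(padded), visible)  # 5 4
```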

Since we're not going to change the semantics of those APIs,
it is OK to not support padding with non-BMP code points on
UCS-2 builds.

Supporting such cases would only cause problems:

* if the methods would pad with surrogates, the resulting
  string would no longer have length n, breaking the
  assumption that len(str.center(n)) == n

* if the methods would pad with half the number of surrogates
  to make sure that len(str.center(n)) == n, the resulting
  output to e.g. a terminal would be further off than what
  you already have with surrogates and combining code points
  in the original string.

More on codecs supporting surrogates:

  http://mail.python.org/pipermail/python-dev/2008-July/080915.html

Perhaps it's time to reconsider a project I once started
but that never got off the ground:

  http://mail.python.org/pipermail/python-dev/2008-July/080911.html

Here's the pre-PEP:

  http://mail.python.org/pipermail/python-dev/2001-July/015938.html

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Brett Cannon
On Tue, Nov 23, 2010 at 15:07, Terry Reedy  wrote:
>
>
> On 11/23/2010 5:43 PM, Éric Araujo wrote:
>>>
>>> Modified: python/branches/py3k/Misc/ACKS
>>>
>>> ==
>>> --- python/branches/py3k/Misc/ACKS      (original)
>>> +++ python/branches/py3k/Misc/ACKS      Tue Nov 23 21:32:47 2010
>>> @@ -1,4 +1,4 @@
>>> -Acknowledgements
>>> +Acknowledgements
>>
>> This change introduced a so-called UTF-8 BOM in the file.  Is
>> TortoiseSvn the culprit or a text editor?
>
> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
> If the latter is OK, perhaps *.py gets filtered better than misc. text
> files. I believe I have the config as specified in dev/faq.

Adding the BOM will be an editor thing, not a svn thing. Doing a
Google search for [ms notepad bom] shows that Notepad did the
"helpful", invisible edit.

-Brett


>
> [miscellany]
> enable-auto-props = yes
>
> [auto-props]
> * = svn:eol-style=native
> *.c = svn:keywords=Id
> *.h = svn:keywords=Id
> *.py = svn:keywords=Id
> *.txt = svn:keywords=Author Date Id Revision
>
> Terry


Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Terry Reedy

On 11/24/2010 2:04 PM, Brett Cannon wrote:
> On Tue, Nov 23, 2010 at 15:07, Terry Reedy  wrote:
>
>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
>> If the latter is OK, perhaps *.py gets filtered better than misc. text
>> files. I believe I have the config as specified in dev/faq.
>
> Adding the BOM will be an editor thing, not a svn thing. Doing a
> Google search for [ms notepad bom] shows that Notepad did the
> "helpful", invisible edit.


So I presume it did the same with IOBinding.py. Does *.py get filtered
in a way that could be extended to no-extension files? Do *.txt files
get BOM filtered off? Should all text files in repository have some 
extension (default .txt)?


More to the point, can better filtering be added to the new hg 
repository? Or can a local Windows hg setup have such filtering on local 
commits before pushing?


I know now that I could always edit with IDLE's editor, but it is a lot 
easier to right click and select edit than it is to run thru the 
directory tree in an open dialog. And of course, since the pseudo-BOM 
addition is undocumented within notepad itself, and probably other 
editors, it is easy to not know.


--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Georg Brandl
Am 24.11.2010 20:25, schrieb Terry Reedy:
> On 11/24/2010 2:04 PM, Brett Cannon wrote:
>> On Tue, Nov 23, 2010 at 15:07, Terry Reedy  wrote:
> 
>>> I used Notepad to edit the file, TortoiseSvn to commit, the same as I did
>>> for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
>>> If the latter is OK, perhaps *.py gets filtered better than misc. text
>>> files. I believe I have the config as specified in dev/faq.
>>
>> Adding the BOM will be an editor thing, not a svn thing. Doing a
>> Google search for [ms notepad bom] shows that Notepad did the
>> "helpful", invisible edit.
> 
> So I presume it did the same with IOBinding.py. Does *.py get filtered
> in a way that could be extended to no-extension files? Do *.txt files
> get BOM filtered off? Should all text files in repository have some 
> extension (default .txt)?
> 
> More to the point, can better filtering be added to the new hg 
> repository? Or can a local Windows hg setup have such filtering on local 
> commits before pushing?

Of course it can; it's just a matter of writing the respective hooks.
What we *can* do in any case is to check for UTF-8 "BOMs" server-side
in the whitespace checking hook.
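A rough sketch of the kind of check such a hook could run, assuming it is handed the paths of the changed files. It does not use Mercurial's hook API, and has_utf8_bom, strip_utf8_bom and main are hypothetical names. codecs.BOM_UTF8 is the three-byte signature EF BB BF that Notepad prepends when saving as UTF-8.

```python
import codecs
import sys


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 signature EF BB BF."""
    with open(path, 'rb') as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Remove a leading UTF-8 signature in place; return True if one was removed."""
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, 'wb') as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


def main(paths):
    """Report files that start with a UTF-8 BOM; return a non-zero status if any do."""
    offenders = [p for p in paths if has_utf8_bom(p)]
    for p in offenders:
        sys.stderr.write('UTF-8 BOM found in %s\n' % p)
    return 1 if offenders else 0
```

A wrapper script might call it as `sys.exit(main(sys.argv[1:]))`; the same check works locally before pushing, whatever file extension is involved.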

> I know now that I could always edit with IDLE's editor, but it is a lot 
> easier to right click and select edit than it is to run thru the 
> directory tree in an open dialog. And of course, since the pseudo-BOM 
> addition is undocumented within notepad itself, and probably other 
> editors, it is easy to not know.

It should show up as an invisible change in the first line of a file when you
look at a "svn diff".  (It is a very good practice to look at a diff before
committing anyway.)

Georg




Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Alexander Belopolsky
On Wed, Nov 24, 2010 at 1:50 PM, M.-A. Lemburg  wrote:
..
>> add an option for decoders that currently produce surrogate pairs to
>> treat non-BMP characters as errors and handle them according to user's
>> choice.
>
> But what do you gain by doing this ? You'd lose the round-trip
> safety of those codecs and that's not a good thing.
>

Any non-trivial text processing is likely to be broken in the presence of
surrogates.  Producing them on input just trades a known issue for
an unknown one.  Processing surrogate pairs in Python code is hard.
Software that has to support non-BMP characters will most likely be
written for a wide build and contain subtle bugs when run under a
narrow build.  Note that my latest proposal does not abolish
surrogates outright.  Users who want them can still use something like
the "surrogateescape" error handler for non-BMP characters.
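A minimal sketch of the "treat non-BMP characters as errors" option, written as a hypothetical wrapper (decode_bmp_only is not an existing codec or error handler) that decodes first and then applies a strict/replace/ignore policy to code points above U+FFFF. It raises ValueError rather than UnicodeDecodeError so that character indexes do not have to be mapped back to byte offsets.

```python
def decode_bmp_only(data, encoding='utf-8', errors='strict'):
    """Decode bytes, then apply an error policy to non-BMP code points.

    errors='strict'  -> raise ValueError naming the first offending character
    errors='replace' -> substitute U+FFFD REPLACEMENT CHARACTER
    errors='ignore'  -> drop the character
    """
    if errors not in ('strict', 'replace', 'ignore'):
        raise ValueError('unknown errors value: %r' % (errors,))
    out = []
    for index, ch in enumerate(data.decode(encoding)):
        if ord(ch) <= 0xFFFF:
            out.append(ch)
        elif errors == 'replace':
            out.append('\ufffd')
        elif errors == 'strict':
            raise ValueError('non-BMP character U+%06X at index %d' % (ord(ch), index))
        # errors == 'ignore': the character is simply skipped
    return ''.join(out)
```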

> Since we're not going to change the semantics of those APIs,
> it is OK to not support padding with non-BMP code points on
> UCS-2 builds.
>

Well, I think more users are willing to accept slightly misaligned
text in their web-app logs than those willing to cope with

Traceback (most recent call last):
  ...
TypeError: The fill character must be exactly one character long

there.

Yes, allowing non-trusted users to specify fill character is unlikely,
but it is quite likely that naive slicing or iteration over string
units would result in

Traceback (most recent call last):
  ...
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in
position 0: surrogates not allowed
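The failure mode above can be reproduced on a current Python (3.3+, which has no narrow builds) by spelling out U+12345 as the surrogate pair a narrow build would store. This is only a demonstration; 'surrogatepass' is an existing error handler of the UTF-8/UTF-16 codecs, used here to rejoin the pair.

```python
# Narrow-build storage of U+12345: high surrogate + low surrogate.
narrow = '\ud808\udf45'
print(len(narrow))  # 2, as len() reported on a narrow build

head = narrow[:1]   # naive slicing keeps only the high surrogate
try:
    head.encode('utf-8')
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)
print(message)

# Round-tripping the complete pair through UTF-16 recovers the real character.
rejoined = narrow.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
print(rejoined == '\U00012345')  # True
```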

> Supporting such cases would only cause problems:
>
> * if the methods would pad with surrogates, the resulting
>  string would no longer have length n; breaking the
>  assumption that len(str.center(n)) == n
>

I agree, but how is this different from breaking the assumption that
len(chr(i)) == 1?

> * if the methods would pad with half the number of surrogates
>  to make sure that len(str.center(n)) == n, the resulting
>  output to e.g. a terminal would be further off than what
>  you already have with surrogates and combining code points
>  in the original string.
>

I agree again.  As I suggested on the tracker, supporting non-BMP
characters in narrow builds should mean that library functions given
input with the same UCS-4 encoding produce output with the same
UCS-4 encoding.


> Perhaps it's time to reconsider a project I once started
> but that never got off the ground:
>
>  http://mail.python.org/pipermail/python-dev/2008-July/080911.html
>
> Here's the pre-PEP:
>
>  http://mail.python.org/pipermail/python-dev/2001-July/015938.html

I agree again, but I feel that exposing code units rather than code
points at the Python string level takes us back to 2.x days of mixing
bytes and strings.

Let me quote Guido circa 2001 again:

"""
... if we had wanted to use a
variable-length internal representation, we should have picked UTF-8
way back, like Perl did.  Moving to a UTF-16-based internal
representation now will give us all the problems of the Perl choice
without any of the benefits.
"""

I don't understand what changed since 2001 that made this argument
invalid.   I note that an opinion has been raised on this thread that
if we want compressed internal representation for strings, we should
use UTF-8.  I tend to agree, but UTF-8 has been repeatedly rejected as
too hard to implement.  What makes UTF-16 easier than UTF-8?  Only the
fact that you can ignore bugs longer, in my view.


[Python-Dev] [Preview] Comments and change proposals on documentation

2010-11-24 Thread Georg Brandl
Hi,

at , you can look at a version of the 3.2
docs that has the upcoming commenting feature.  JavaScript is mandatory.
I've switched on anonymous comments for testing, but normally comments
from anonymous users, at least, will be moderated.  Be sure to test the
"propose a change" feature too.  Login currently allows OpenID exclusively.

Credits go to Jacob Mason, whose GSOC project is responsible for almost all
of what you see there.  [1]

Please test on a smaller page, such as ,
there is currently a speed issue with larger pages.  (Helpful tips from
JS experts are welcome.)

Other things I have to do before this can go live:

* reuse existing logins from either wiki or tracker?
* (re)Captcha integration for anonymous comments
* easier moderation (currently emails are sent on new comments)
* facility for (semi)automatic applying of proposals (once Hg is live, this
  should be easy to do due to the separation between commit and merge)
* allow commenting on code blocks (figure out where to place the "bubble")

Any feedback is appreciated (I'd suggest mailing it to doc-SIG only, to avoid
cluttering up python-dev).

Have fun,
Georg

[1] The source for the webapp is at
, but most of the
functionality is implemented in Sphinx trunk.



[Python-Dev] collect2: library libpython2.6 not found while building extensions (--enable-shared)

2010-11-24 Thread Anurag Chourasia
All,

When I configure python to enable shared libraries, none of the extensions
are getting built during the make step due to this error.

building 'cStringIO' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I.
-IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so
*collect2: library libpython2.6 not found*

building 'cPickle' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I.
-IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so
*collect2: library libpython2.6 not found*

This is on AIX 5.3, GCC 4.2, Python 2.6.6

I can confirm that there is a libpython2.6.a file in the top level directory
from where I am doing the configure/make etc

Here are the options supplied to the configure command

./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I
/opt/freeware/include -I /opt/freeware/include/readline -I
/opt/freeware/include/ncurses"

Please guide me in getting past this error.

Thanks for your help on this.

Regards,
Anurag


Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Martin v. Löwis
> So I presume it did the same with IOBinding.py.

No. This file contains only ASCII characters, so notepad has decided
to not add the BOM.

Regards,
Martin


Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread average
Is immutability a general need that should have a general solution?  By
generalizing the idea to lists/tuples, set/frozenset, dicts, and strings
(for example), it seems one could simplify the container classes, eliminate
code complexity, and perhaps improve resource utilization.

mark


Re: [Python-Dev] r86731 - in python/branches/py3k: Lib/distutils/command/install.py Lib/distutils/sysconfig.py Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in

2010-11-24 Thread Antoine Pitrou
On Wed, 24 Nov 2010 20:43:47 +0100 (CET)
barry.warsaw  wrote:
> Author: barry.warsaw
> Date: Wed Nov 24 20:43:47 2010
> New Revision: 86731
> 
> Log:
> Final patch for issue 9807.

This seems to have broken compilation under Windows:

Build started: Project: ssl, Configuration: Debug|Win32
Performing Makefile project actions
Traceback (most recent call last):
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in 
    main()
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main
    known_paths = addusersitepackages(known_paths)
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages
    user_site = getusersitepackages()
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var
    return get_config_vars().get(name)
  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars
    _CONFIG_VARS['abiflags'] = sys.abiflags
AttributeError: 'module' object has no attribute 'abiflags'

Regards

Antoine.




Re: [Python-Dev] r86731 - in python/branches/py3k: Lib/distutils/command/install.py Lib/distutils/sysconfig.py Lib/sysconfig.py Makefile.pre.in Misc/python.pc.in configure configure.in

2010-11-24 Thread Barry Warsaw
On Nov 25, 2010, at 12:41 AM, Antoine Pitrou wrote:

>On Wed, 24 Nov 2010 20:43:47 +0100 (CET)
>barry.warsaw  wrote:
>> Author: barry.warsaw
>> Date: Wed Nov 24 20:43:47 2010
>> New Revision: 86731
>> 
>> Log:
>> Final patch for issue 9807.
>
>This seems to have broken compilation under Windows:
>
>Build started: Project: ssl, Configuration: Debug|Win32
>Performing Makefile project actions
>Traceback (most recent call last):
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 519, in 
>    main()
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 507, in main
>    known_paths = addusersitepackages(known_paths)
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 253, in addusersitepackages
>    user_site = getusersitepackages()
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 228, in getusersitepackages
>    user_base = getuserbase() # this will also set USER_BASE
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\site.py", line 218, in getuserbase
>    USER_BASE = get_config_var('userbase')
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 586, in get_config_var
>    return get_config_vars().get(name)
>  File "d:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\sysconfig.py", line 478, in get_config_vars
>    _CONFIG_VARS['abiflags'] = sys.abiflags
>AttributeError: 'module' object has no attribute 'abiflags'

As discussed on IRC, _CONFIG_VARS['abiflags'] = '' if sys.abiflags is not
defined.  Amaury is going to test that.
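A sketch of the fallback described above, not the committed patch: getattr() with a default yields '' wherever sys.abiflags is missing, as it was on the Windows buildbot. get_abiflags and its sys_module parameter are hypothetical, added only so the fallback can be exercised without patching sys.

```python
import sys


def get_abiflags(sys_module=sys):
    """Return sys.abiflags, or '' on platforms that do not define it."""
    return getattr(sys_module, 'abiflags', '')


config_vars = {}
config_vars['abiflags'] = get_abiflags()  # instead of reading sys.abiflags directly
```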

-Barry




Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Greg Ewing

On 24/11/10 13:22, James Y Knight wrote:

> Instead, provide bidirectional iterators which can traverse the string by byte,
> codepoint, or by grapheme

Maybe it would be a good idea to add some iterators like this
to Python. (Or has the time machine beaten me there?)

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Stephen J. Turnbull
Alexander Belopolsky writes:

 > Any non-trivial text processing is likely to be broken in presence of
 > surrogates.

If you're worried about this, write a UCS-2-producing codec that
rejects surrogates or stuffs them into the private zone of the BMP.
Maybe such a codec should be default, but so far nobody seems to want
one enough; they want UTF-16 even though they know it's wrong.

One of the things that makes the 16-bit code unit attractive to me is
that the options for working around the variable-width nature of
UTF-16 (without actually implementing conformance to UTF-16 in
internal operations!) are many.  If you use octets as code units, you
don't have such options: you have to do it right.

 > Processing surrogate pairs in python code is hard.

Sure, but as James Knight and MAL point out, so is processing compose
characters, and those errors will go undetected in your proposals,
even with a strict UCS-2 definition.  What can you do?  Banning
composing characters isn't going to fly!

 > Yes, allowing non-trusted users to specify fill character is unlikely,
 > but it is quite likely that naive slicing or iteration over string
 > units would result in
 > 
 > Traceback (most recent call last):

Naive slicing yes, but naive iteration (ie, iteration that consumes
the whole string, or up to a known character, rather than up to a
specified position) is highly unlikely to result in such a traceback.
It is precisely that property (non-BMP characters get passed through
unchanged, or ignored) that makes extension to non-BMP code points
attractive.

 > I agree again, but I feel that exposing code units rather than code
 > points at the Python string level takes us back to 2.x days of mixing
 > bytes and strings.

It does, but there's a difference.  With bytes as UTF-8, only ASCII
values have defined semantics in Unicode.  The rest have semantics
that are context-dependent, and they are frequent in any non-English
processing and many English use cases (math symbols, correctly-
oriented punctuation).  With 16-bit code units, all values have well-
defined semantics in Unicode, and non-characters are going to be
extremely rare in the vast majority of use cases.  IOW, you can think
of Python as a UCS-2 device processing characters, and let surrounding
UTF-16 processors deal with the errors.

 > Let me quote Guido circa 2001 again:
 > 
 > """
 > ... if we had wanted to use a
 > variable-length internal representation, we should have picked UTF-8
 > way back, like Perl did.  Moving to a UTF-16-based internal
 > representation now will give us all the problems of the Perl choice
 > without any of the benefits.
 > """
 > 
 > I don't understand what changed since 2001 that made this argument
 > invalid.

Nothing.  The internal representation of Python is UCS-2, not UTF-16.
People who want to think otherwise are kidding themselves.  The
presence of surrogates is not sufficient to call something UTF-16.
Preserving the Unicode code points through any builtin operations is a
necessary condition, and Python doesn't do that.  *However*, in my
opinion, it's not a big deal to allow surrogates in UCS-2 à la ISO
10646-1:1996.  That lets people who want a quick and dirty way to
handle BMP text that *might* (but usually won't) contain some non-BMP
characters go a long way fast.  "Although practicality beats purity."

 > I note that an opinion has been raised on this thread that
 > if we want compressed internal representation for strings, we should
 > use UTF-8.  I tend to agree, but UTF-8 has been repeatedly rejected as
 > too hard to implement.  What makes UTF-16 easier than UTF-8?  Only the
 > fact that you can ignore bugs longer, in my view.

That's mostly true.  My guess is that we can probably ignore those
bugs for as long as it takes someone to write the higher-level
libraries that James suggests and MAL has actually proposed and
started a PEP for.


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Greg Ewing

On 24/11/10 22:03, Stephen J. Turnbull wrote:

> But
> if you actually need to remember positions, or regions, to jump to
> later or to communicate to other code that manipulates them, doing
> this stuff the straightforward way (just copying the whole iterator
> object to hang on to its state) becomes expensive.

If the internal representation of a text pointer (I won't call it
an iterator because that means something else in Python) is a byte
offset or something similar, it shouldn't take up any more space
than a Python int, which is what you'd be using anyway if you
represented text positions by grapheme indexes or whatever.

If you want the text pointer to also remember which string it
points into, it'll be a bit bigger, but again, no bigger than
you would need to get the same functionality using a grapheme
index plus a reference to the original string. Probably smaller,
because it would all be encapsulated in one object.
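
Greg's point can be sketched as a small object that holds a reference to a UTF-8 buffer plus an integer byte offset. Saving a position is then just a matter of copying those two fields. `TextPointer` and its methods are hypothetical names used for illustration, and the sketch assumes the buffer holds valid UTF-8.

```python
class TextPointer:
    """Hypothetical text pointer: a UTF-8 buffer reference plus a byte offset."""
    __slots__ = ("buf", "offset")

    def __init__(self, buf, offset=0):
        self.buf = buf
        self.offset = offset

    def _char_len(self):
        # The lead byte of a UTF-8 sequence encodes the sequence length.
        lead = self.buf[self.offset]
        if lead < 0x80:
            return 1
        if lead >> 5 == 0b110:
            return 2
        if lead >> 4 == 0b1110:
            return 3
        if lead >> 3 == 0b11110:
            return 4
        raise ValueError("offset is not at the start of a UTF-8 sequence")

    def at_end(self):
        return self.offset >= len(self.buf)

    def char(self):
        """Return the character the pointer currently points at."""
        n = self._char_len()
        return self.buf[self.offset:self.offset + n].decode("utf-8")

    def advance(self):
        """Move forward by one code point."""
        self.offset += self._char_len()

    def copy(self):
        # Remembering a position copies one reference and one int.
        return TextPointer(self.buf, self.offset)


p = TextPointer("héllo".encode("utf-8"))
p.advance()
p.advance()          # past 'h' and the two-byte 'é'
saved = p.copy()
print(saved.offset, saved.char())
```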

So I don't really see what you're arguing for here. How do
*you* think positions in unicode strings should be represented?

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Greg Ewing

On 25/11/10 06:37, Alexander Belopolsky wrote:
> I don't think there is a recipe on how to fix legacy
> character-by-character processing loop such as
>
>     for c in string:
>         ...
>
> to make it iterate over code points consistently in wide and narrow
> builds.

A couple of possibilities:

1) Make things so that 'for c in string' does actually
iterate over characters rather than code units. This could
break existing code, though.

2) Provide some things like

     for c in string.chars():
         ...

     for c in string.graphemes():
         ...

where chars() and graphemes() return appropriate iterators.
(Or possibly iterable views, but that would raise the
expectation that the views could also be randomly indexed
by char or grapheme, which we probably wouldn't want to
support.)
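
Here is a rough sketch of option 2, written as free functions rather than `str` methods. `chars` and `graphemes` are hypothetical names. `chars` takes a sequence of UTF-16 code units and joins surrogate pairs; it simulates narrow-build storage, because current CPython has no narrow builds. `graphemes` uses a crude "base character plus combining marks" rule based on `unicodedata.combining`. That rule only approximates the grapheme-cluster rules defined in Unicode UAX #29.

```python
import unicodedata


def chars(units):
    """Yield code points from a sequence of UTF-16 code units (hypothetical helper)."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine a high/low surrogate pair into one non-BMP code point.
            yield chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP character, or a lone surrogate passed through unchanged.
            yield chr(u)
            i += 1


def graphemes(code_points):
    """Crude approximation: a base character followed by any combining marks."""
    cluster = ""
    for c in code_points:
        if cluster and unicodedata.combining(c):
            cluster += c
        else:
            if cluster:
                yield cluster
            cluster = c
    if cluster:
        yield cluster


# 'e' + COMBINING ACUTE ACCENT + U+1D11E stored as a surrogate pair
units = [0x65, 0x0301, 0xD834, 0xDD1E]
print(list(chars(units)))
print(list(graphemes(chars(units))))
```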

--
Greg


Re: [Python-Dev] constant/enum type in stdlib

2010-11-24 Thread Greg Ewing

On 25/11/10 12:38, average wrote:
> Is immutability a general need that should have general solution?

I don't think it really generalizes. Tuples are not just frozen
lists, for example -- they have a different internal structure
that's more efficient to create and access.
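
Comparing memory footprint in CPython is one way to see that tuples are not merely frozen lists. The comparison is CPython-specific, because `sys.getsizeof` reports implementation details. Exact byte counts vary by version and platform, so only the relative comparison is shown. A tuple stores its item pointers inline with no spare capacity. A list points to a separately allocated array that can be over-allocated.

```python
import sys

items = [1, 2, 3]
as_tuple = tuple(items)
as_list = list(items)

# CPython detail: the tuple has no separate item array and no spare
# capacity, so it is smaller than a list holding the same references.
print(sys.getsizeof(as_tuple), sys.getsizeof(as_list))

# Tuples of hashable items are also hashable; lists are not.
print(hash(as_tuple) == hash((1, 2, 3)))
try:
    hash(as_list)
except TypeError as exc:
    print("list is unhashable:", exc)
```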

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Stephen J. Turnbull
Greg Ewing writes:
 > On 24/11/10 22:03, Stephen J. Turnbull wrote:
 > > But
 > > if you actually need to remember positions, or regions, to jump to
 > > later or to communicate to other code that manipulates them, doing
 > > this stuff the straightforward way (just copying the whole iterator
 > > object to hang on to its state) becomes expensive.
 > 
 > If the internal representation of a text pointer (I won't call it
 > an iterator because that means something else in Python) is a byte
 > offset or something similar, it shouldn't take up any more space
 > than a Python int, which is what you'd be using anyway if you
 > represented text positions by grapheme indexes or whatever.

That's not necessarily true.  E.g., in Emacs ("there you go again"),
Lisp integers are not only immediate (saving one pointer), but the
type is encoded in the lower bits, so that there is no need for a type
pointer -- the representation is smaller than the opaque marker type.
Altogether, that saves up to 8 of 12 bytes on a 32-bit platform, or 16
of 24 bytes on a 64-bit platform.
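
The low-bit tagging described here can be sketched with plain integers standing in for machine words. The two-bit tag layout below is a hypothetical illustration and is not Emacs's actual encoding. Python integers never overflow, so the sketch also ignores the reduced value range that tagging causes on a real fixed-width word.

```python
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
# Hypothetical tag values for this sketch only.
TAG_FIXNUM = 0b00
TAG_POINTER = 0b01


def box_fixnum(n):
    """Pack an integer and its type tag into a single word."""
    return (n << TAG_BITS) | TAG_FIXNUM


def is_fixnum(word):
    """The type is read from the low bits; no separate type pointer is needed."""
    return (word & TAG_MASK) == TAG_FIXNUM


def unbox_fixnum(word):
    if not is_fixnum(word):
        raise TypeError("word is not tagged as a fixnum")
    # Arithmetic right shift restores the value, including negatives.
    return word >> TAG_BITS


word = box_fixnum(1234)  # e.g. a buffer position stored in one word
print(is_fixnum(word), unbox_fixnum(word))
```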

In Python it's true that markers can use the same data structure as
integers and simply provide different methods, and it's arguable that
Python's design is better.  But if you use bytes internally, then you
have problems.  Do you expose that byte value to the user?  Can users
(programmers using the language and end users) specify positions in
terms of byte values?  If so, what do you do if the user specifies a
byte value that points into a multibyte character?  What if the user
wants to specify position by number of characters?  Can you translate
efficiently?
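
These questions about byte positions can be made concrete with two small helpers over UTF-8 data. The first checks whether a byte offset falls inside a multibyte character; UTF-8 continuation bytes have the bit pattern `10xxxxxx`. The second translates a character count into a byte offset. The function names and the "snap back to the character start" policy are hypothetical choices for illustration. The translation is a linear scan, which is exactly the efficiency concern raised above.

```python
def is_char_boundary(buf, offset):
    """True if offset is at the start of a character (or at either end)."""
    if offset == 0 or offset == len(buf):
        return True
    # UTF-8 continuation bytes look like 0b10xxxxxx.
    return (buf[offset] & 0xC0) != 0x80


def char_to_byte_offset(buf, char_index):
    """Translate a character index to a byte offset: O(n) linear scan."""
    count = 0
    for i, b in enumerate(buf):
        if (b & 0xC0) != 0x80:  # lead byte: a new character starts here
            if count == char_index:
                return i
            count += 1
    if count == char_index:
        return len(buf)
    raise IndexError("character index out of range")


def snap_to_boundary(buf, offset):
    """One possible policy for an offset that lands mid-character: move back."""
    while not is_char_boundary(buf, offset):
        offset -= 1
    return offset


data = "naïve".encode("utf-8")
print(is_char_boundary(data, 3))     # byte 3 is the second byte of 'ï'
print(char_to_byte_offset(data, 3))  # byte offset of 'v'
print(snap_to_boundary(data, 3))     # start of 'ï'
```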

As I say elsewhere, it's possible that there really never is a need to
efficiently specify an absolute position in a large text as a
character (grapheme, whatever) count.  But I think it would be hard to
implement an efficient text-processing *language*, e.g., a Python module
for *full conformance* in handling Unicode, on top of UTF-8.  Any time
you have an algorithm that requires efficient access to arbitrary text
positions, you'll spend all your skull sweat fighting the
representation.  At least, that's been my experience with Emacsen.
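
One common mitigation for random access on UTF-8 is a sparse index. It records the byte offset of every `step`-th character, so a lookup scans at most `step - 1` characters from the nearest checkpoint. `Utf8Index` is a hypothetical sketch; neither Emacs nor Python provides it. The sketch also shows the fragility mentioned above: any edit before a checkpoint invalidates the index, which must then be rebuilt or patched.

```python
class Utf8Index:
    """Hypothetical sparse index: byte offset of every `step`-th character."""

    def __init__(self, buf, step=64):
        self.buf = buf
        self.step = step
        self.checkpoints = []
        count = 0
        for i, b in enumerate(buf):
            if (b & 0xC0) != 0x80:  # lead byte of a character
                if count % step == 0:
                    self.checkpoints.append(i)
                count += 1
        self.length = count  # number of characters

    def byte_offset(self, char_index):
        """Map a character index to a byte offset via the nearest checkpoint."""
        if not 0 <= char_index <= self.length:
            raise IndexError("character index out of range")
        if char_index == self.length:
            return len(self.buf)
        block, remainder = divmod(char_index, self.step)
        i = self.checkpoints[block]
        # Walk forward `remainder` characters from the checkpoint.
        while remainder:
            i += 1
            if (self.buf[i] & 0xC0) != 0x80:
                remainder -= 1
        return i


text = "aé" * 100
index = Utf8Index(text.encode("utf-8"), step=16)
print(index.byte_offset(101))
```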

 > So I don't really see what you're arguing for here. How do
 > *you* think positions in unicode strings should be represented?

I think what users should see is character positions, and they should
be able to specify them numerically as well as via an opaque marker
object.  I don't care whether that position is represented as bytes or
characters internally, except that the experience of Emacsen is that
representation as byte positions is both inefficient and fragile.  The
representation as character positions is more robust but slightly less
efficient.
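
This minimal sketch, loosely modelled on Emacs markers, follows that description. Users work with character positions as plain ints. They can also hold opaque marker objects that stay attached to the same text as edits happen. `Buffer`, `Marker`, and their methods are hypothetical. Only insertion is handled, and a marker exactly at the insertion point stays before the inserted text. Python 3.3+ `str` indices are code-point indices, which stand in for "character" positions here.

```python
class Marker:
    """Opaque position handle that follows its text through edits."""
    __slots__ = ("pos",)

    def __init__(self, pos):
        self.pos = pos


class Buffer:
    """Hypothetical editable text with character positions and markers."""

    def __init__(self, text=""):
        self.text = text
        self._markers = []

    def make_marker(self, pos):
        if not 0 <= pos <= len(self.text):
            raise IndexError("position out of range")
        marker = Marker(pos)
        self._markers.append(marker)
        return marker

    def insert(self, pos, s):
        """Insert s at character position pos and adjust markers after it."""
        self.text = self.text[:pos] + s + self.text[pos:]
        for marker in self._markers:
            # A marker exactly at pos stays before the inserted text.
            if marker.pos > pos:
                marker.pos += len(s)


buf = Buffer("hello world")
word = buf.make_marker(6)  # just before "world"
buf.insert(0, ">> ")
print(word.pos, buf.text[word.pos:])
```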


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Alexander Belopolsky
On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull  wrote:
..
>  > I note that an opinion has been raised on this thread that
>  > if we want compressed internal representation for strings, we should
>  > use UTF-8.  I tend to agree, but UTF-8 has been repeatedly rejected as
>  > too hard to implement.  What makes UTF-16 easier than UTF-8?  Only the
>  > fact that you can ignore bugs longer, in my view.
>
> That's mostly true.  My guess is that we can probably ignore those
> bugs for as long as it takes someone to write the higher-level
> libraries that James suggests and MAL has actually proposed and
> started a PEP for.
>

As far as I can tell, that PEP generated a grand total of one comment in
nine years.  This may or may not be indicative of how far away we are
from seeing it implemented.  :-)

As for the UTF-8 vs. UCS-2/4 debate, I have an idea that may be even
more far-fetched.  Once upon a time, Python Unicode strings supported
the buffer protocol and would lazily fill an internal buffer with bytes
in the default encoding.  In 3.x the default encoding has been fixed as
UTF-8 and buffer protocol support was removed from strings, but the
internal buffer caching the (now UTF-8) encoded representation remained.
Maybe we can now implement the defenc logic in reverse.  Recall that
strings are stored as UCS-2/4 sequences, but once a buffer is requested
in 2.x Python code, or a char* is obtained via
_PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer
is filled with UTF-8 bytes and defenc is set to point to that buffer.
So the idea is for strings to store their data as a UTF-8 buffer
pointed to by defenc upon construction.  If an application uses string
indexing, UTF-8-only strings will lazily fill their UCS-2/4 buffer.
Proper, Unicode-aware algorithms such as grapheme, word or line
iteration, or simple operations such as concatenation, search or
substitution, would operate directly on defenc buffers.  Presumably,
over time fewer and fewer applications would use code unit indexing
that requires the UCS-2/4 buffer, and eventually Python strings could
stop supporting indexing altogether, just as they stopped supporting
the buffer protocol in 3.x.
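
The "defenc in reverse" idea can be sketched in pure Python. UTF-8 stays the primary storage, and search, length and concatenation work on the bytes. An indexable form is built only when something actually indexes the string. `LazyStr` is a hypothetical class. A Python `str` decoded on demand stands in for the lazily filled UCS-2/4 buffer, and the sketch does not model CPython's actual `defenc` field.

```python
class LazyStr:
    """Hypothetical string: UTF-8 storage, index buffer built on demand."""
    __slots__ = ("_utf8", "_decoded")

    def __init__(self, s):
        self._utf8 = s.encode("utf-8")
        self._decoded = None  # stands in for the lazily filled UCS-2/4 buffer

    def __len__(self):
        # Count lead bytes: a code point count without decoding.
        return sum(1 for b in self._utf8 if (b & 0xC0) != 0x80)

    def __contains__(self, sub):
        # Valid UTF-8 is self-synchronizing, so a byte-level match of a
        # valid UTF-8 needle always lands on character boundaries.
        return sub.encode("utf-8") in self._utf8

    def __add__(self, other):
        # Concatenation is plain byte concatenation; no decoding needed.
        result = LazyStr("")
        result._utf8 = self._utf8 + other._utf8
        return result

    def __getitem__(self, index):
        # Code-point indexing is the only operation that forces the buffer.
        if self._decoded is None:
            self._decoded = self._utf8.decode("utf-8")
        return self._decoded[index]

    def has_index_buffer(self):
        return self._decoded is not None


s = LazyStr("héllo")
print("ll" in s, len(s), s.has_index_buffer())
print(s[1], s.has_index_buffer())
```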


Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Terry Reedy

On 11/24/2010 3:04 PM, Georg Brandl wrote:
>> Adding the BOM will be an editor thing, not a svn thing. Doing a
>
> It should show up as an invisible change in the first line of a file when you
> look at a "svn diff".  (It is a very good practice to look at a diff before
> committing anyway.)

It does show up, and yes, I agree. That should be in dev/faq if it isn't already.
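
You can catch the invisible first-line change before committing by checking whether a file starts with the UTF-8 BOM (`codecs.BOM_UTF8`, the bytes `EF BB BF`). The helper names below are hypothetical. Python's built-in `utf-8-sig` codec also strips a leading BOM, if one is present, when decoding.

```python
import codecs


def starts_with_utf8_bom(data):
    """True if the bytes begin with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if starts_with_utf8_bom(data) else data


def file_has_utf8_bom(path):
    """Check only the first three bytes of a file on disk."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(codecs.BOM_UTF8)))


sample = codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\n"
print(starts_with_utf8_bom(sample), strip_utf8_bom(sample))
print(sample.decode("utf-8-sig"))
```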

--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-24 Thread Terry Reedy

On 11/24/2010 5:13 PM, "Martin v. Löwis" wrote:
>> So I presume it did the same with IOBinding.py.
>
> No. This file contains only ASCII characters, so notepad has decided
> to not add the BOM.

Or it somehow got removed from the .py file. I tried with another .py
file (and reverted!) and the diff showed the invisible change to the
first line that Georg predicted.


--
Terry Jan Reedy




Re: [Python-Dev] len(chr(i)) = 2?

2010-11-24 Thread Terry Reedy

On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:
> Any non-trivial text processing is likely to be broken in the presence of
> surrogates.  Producing them on input is just trading a known issue for
> an unknown one.  Processing surrogate pairs in Python code is hard.
> Software that has to support non-BMP characters will most likely be
> written for a wide build and contain subtle bugs when run under a
> narrow build.  Note that my latest proposal does not abolish
> surrogates outright.  Users who want them can still use something like
> the "surrogateescape" error handler for non-BMP characters.

It seems to me that what you are asking for is an alternate, optional
utf-8-bmp codec that would raise an error, in either direction, for
non-BMP chars. Then, as you suggest, if one is not prepared for
surrogates, they are not allowed.
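
The suggestion can be sketched as a pair of functions that behave like UTF-8 but reject code points above U+FFFF in either direction. The `utf-8-bmp` name and both function names are hypothetical. Registering them as a real codec via `codecs.register` is possible but is not shown here.

```python
def encode_utf8_bmp(s):
    """Encode like UTF-8, but refuse code points outside the BMP."""
    for i, ch in enumerate(s):
        if ord(ch) > 0xFFFF:
            raise UnicodeEncodeError("utf-8-bmp", s, i, i + 1, "non-BMP character")
    # Plain strict UTF-8 already rejects lone surrogates.
    return s.encode("utf-8")


def decode_utf8_bmp(data):
    """Decode UTF-8, but refuse 4-byte sequences (code points above U+FFFF)."""
    text = data.decode("utf-8")
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            # Byte offset of the offending character; non-BMP is always 4 bytes.
            start = len(text[:i].encode("utf-8"))
            raise UnicodeDecodeError("utf-8-bmp", data, start, start + 4, "non-BMP character")
    return text


print(decode_utf8_bmp(encode_utf8_bmp("naïve")))
try:
    encode_utf8_bmp("clef: \U0001D11E")
except UnicodeEncodeError as exc:
    print(exc.reason, exc.start)
```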


--
Terry Jan Reedy
