Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-28 Thread Vinay Sajip
Victor Stinner victor.stinner at haypocalc.com writes:

 It's difficult for a user to choose between open() and
 codecs.open().

Is it? How about the following decision process?

If writing code for Python 3.x only, use open().

If writing code which has to work under both Python 2.x and 3.x, use
codecs.open().
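
A minimal sketch of the decision process above, assuming UTF-8 text and a scratch file whose name is made up for the illustration. Both spellings read the same text back; only the first one is also available on Python 2.x. Some recent Python 3 releases may flag codecs.open() as deprecated, so the sketch silences DeprecationWarning around those calls.

```python
import codecs
import os
import tempfile
import warnings

TEXT = u"na\u00efve caf\u00e9\n"   # u"" literals work on 2.x and on 3.3+

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with warnings.catch_warnings():
    # Some recent Python 3 versions warn that codecs.open() is deprecated.
    warnings.simplefilter("ignore", DeprecationWarning)

    # Portable spelling: the same calls work under Python 2.x and 3.x.
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write(TEXT)
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()

# Python 3 only spelling: the builtin open() takes the encoding itself.
# newline="" turns off newline translation, matching codecs.open().
with open(path, "r", encoding="utf-8", newline="") as f:
    via_open = f.read()

print(via_codecs == via_open == TEXT)
```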

BTW I have written code using StreamReader and StreamWriter in the past,
though it may not have been published on the Internet. Python is used a
lot by companies for internal systems. Such code is seldom published on the
Internet, so it seems that there's no real way of knowing how much
StreamReader/StreamWriter are used.

When looking at porting projects to Python 3.x, I've always adopted a single
code-base approach for 2.x and 3.x, as I feel it's the path of least ongoing
maintenance and hence (in my experience) the path of least resistance to
providing 3.x support. Though of course I've no objection to implementing their
functionality in the most efficient way possible (which may well be
TextIOWrapper), IMO deprecating StreamReader/StreamWriter will make 2.x/3.x
portability harder to achieve, and so seems a step too far.

Regards,

Vinay Sajip

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread M.-A. Lemburg
Victor Stinner wrote:
 On Wednesday, 25 May 2011 at 15:43 +0200, M.-A. Lemburg wrote:
 For UTF-16 it would e.g. make sense to always read data in blocks
 with even sizes, removing the trial-and-error decoding and extra
 buffering currently done by the base classes. For UTF-32, the
 blocks should have size % 4 == 0.

 For UTF-8 (and other variable length encodings) it would make
 sense looking at the end of the (bytes) data read from the
 stream to see whether a complete code point was read or not,
 rather than simply running the decoder on the complete data
 set, only to find that a few bytes at the end are missing.
 
 I think that the readahead algorithm is much faster than trying to
 avoid partial input, and it's not a problem to have partial input if you
 use an incremental decoder.

Depends on where you're coming from. For non-seekable streams
such as sockets or pipes, readahead is not going to work.

For seekable streams, I agree that readahead is the better strategy.

And of course, it also makes sense to use incremental decoders
for these encodings.
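
To illustrate the point about partial input, here is a small sketch using the standard codecs.getincrementaldecoder() API. The sample text and the 2-byte chunk size are arbitrary choices made so that some multi-byte characters get split across chunks.

```python
import codecs

data = u"caf\u00e9 \u20ac".encode("utf-8")   # "é" takes 2 bytes, "€" takes 3

# Feed the bytes in 2-byte chunks, which cuts some characters in half.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = []
for i in range(0, len(data), 2):
    chunk = data[i:i + 2]
    # Incomplete trailing bytes stay buffered inside the decoder and
    # are completed by the next call.
    pieces.append(decoder.decode(chunk))
# final=True makes the decoder raise if bytes are still pending.
pieces.append(decoder.decode(b"", final=True))

text = u"".join(pieces)
print(text == data.decode("utf-8"))
```

A chunk that ends in the middle of a character simply yields fewer characters (possibly none) for that call, so the caller never has to inspect the tail of the byte buffer itself.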

 For single character encodings, it would make sense to prefetch
 data in big chunks and skip all the trial and error decoding
 implemented by the base classes to address the above problem
 with variable length encodings.
 
 TextIOWrapper implements this optimization using its readahead
 algorithm.

It does yes, but the above was an optimization specific
to single character encodings, not all encodings and
TextIOWrapper doesn't know anything about specific characteristics
of the underlying encodings (except perhaps a few special
cases).

 That's somewhat unfair: TextIOWrapper is implemented in C,
 whereas the StreamReader/Writer subclasses used by the
 codecs are written in Python.

 A fair comparison would use the Python implementation of
 TextIOWrapper.
 
 Do you mean that you would like to reimplement codecs in C? 

As use of Unicode codecs increases in Python applications,
this would certainly be an approach to consider, yes.

Looking at the current situation, it is better to use
TextIOWrapper as it provides better performance, but since
TextIOWrapper cannot (per design) provide per-codec optimizations,
this is likely to change once the codecs that benefit a lot from
such specific optimizations are rewritten in C.

 It is not
 relevant to compare codecs and _pyio, because codecs reuses
 BufferedReader (of the io module, not of the _pyio module), and io is
 the main I/O module of Python 3.

They both use whatever stream you pass in as parameter,
so your TextIOWrapper benchmark will also use the BufferedReader
of the io module.

The point here is to compare Python to Python, not Python
to C.

 But well, as you want, here is a benchmark comparing:
     _pyio.TextIOWrapper(io.open(filename, 'rb'), encoding)
 and
     codecs.open(filename, encoding=encoding)
 
 The only change from my previous bench.py script is the test_io()
 function:
 
     def test_io(test_func, chunk_size):
         with open(FILENAME, 'rb') as buffered:
             f = _pyio.TextIOWrapper(buffered, ENCODING)
             test_file(f, test_func, chunk_size)
             f.close()

Thanks for running those tests.

 (1) Decode Objects/unicodeobject.c (317336 characters) from utf-8
 
 test_io.readline(): 1193.4 ms
 test_codecs.readline(): 1267.9 ms
 - codecs 6% slower than io
 
 test_io.read(1): 21696.4 ms
 test_codecs.read(1): 36027.2 ms
 - codecs 66% slower than io
 
 test_io.read(100): 3080.7 ms
 test_codecs.read(100): 3901.7 ms
 - codecs 27% slower than io

This shows that StreamReader/Writer could benefit quite
a bit from using incremental encoders/decoders.

 test_io.read(): 3991.0 ms
 test_codecs.read(): 1736.9 ms
 - codecs 130% FASTER than io

No surprise here. It's also a very common use case
to read the whole file in one go and the bigger
the file, the more impact this has.
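
The percentages in these listings appear to be the ratio of the two timings minus one, with the slower timing as the numerator. That is inferred from the numbers themselves rather than stated in the thread, and the helper name compare() below is made up for the illustration.

```python
def compare(io_ms, codecs_ms):
    """Describe codecs relative to io, given two timings in milliseconds."""
    if codecs_ms >= io_ms:
        percent = (codecs_ms / io_ms - 1.0) * 100.0
        return "codecs %.0f%% slower than io" % percent
    percent = (io_ms / codecs_ms - 1.0) * 100.0
    return "codecs %.0f%% FASTER than io" % percent

print(compare(21696.4, 36027.2))   # read(1) timings for unicodeobject.c
print(compare(3991.0, 1736.9))     # read() timings for unicodeobject.c
```

Read this way, "130% FASTER" means io took about 2.3 times as long as codecs for the same work.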

 (2) Decode README (6613 characters) from ascii
 
 test_io.readline(): 678.1 ms
 test_codecs.readline(): 760.5 ms
 - codecs 12% slower than io
 
 test_io.read(1): 13533.2 ms
 test_codecs.read(1): 21900.0 ms
 - codecs 62% slower than io
 
 test_io.read(100): 2663.1 ms
 test_codecs.read(100): 3270.1 ms
 - codecs 23% slower than io
 
 test_io.read(): 6769.1 ms
 test_codecs.read(): 3919.6 ms
 - codecs 73% FASTER than io

See above.

 (3) Decode Lib/test/cjkencodings/gb18030.txt (501 characters) from
 gb18030
 
 test_io.readline(): 38.9 ms
 test_codecs.readline(): 15.1 ms
 - codecs 157% FASTER than io
 
 test_io.read(1): 369.8 ms
 test_codecs.read(1): 302.2 ms
 - codecs 22% FASTER than io
 
 test_io.read(100): 258.2 ms
 test_codecs.read(100): 155.1 ms
 - codecs 67% FASTER than io
 
 test_io.read(): 1803.2 ms
 test_codecs.read(): 1002.9 ms
 - codecs 80% FASTER than io

These results are interesting since gb18030 is a shift
encoding which keeps state in the encoded data stream, so
the strategy chosen by TextIOWrapper doesn't work out that
well.

It hints at what I mentioned above: per codec optimizations
are going to be relevant once 

Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Victor Stinner
On Friday, 27 May 2011 at 10:17:29, M.-A. Lemburg wrote:
  I think that the readahead algorithm is much faster than trying to
  avoid partial input, and it's not a problem to have partial input if you
  use an incremental decoder.
 
 Depends on where you're coming from. For non-seekable streams
 such as sockets or pipes, readahead is not going to work.

I don't see how StreamReader/StreamWriter can do a better job than 
TextIOWrapper for non-seekable streams.

  TextIOWrapper implements this optimization using its readahead
  algorithm.
 
 It does yes, but the above was an optimization specific
 to single character encodings, not all encodings and
 TextIOWrapper doesn't know anything about specific characteristics
 of the underlying encodings (except perhaps a few special
 cases).

Please give me numbers: how fast are your suggested optimizations? Are they 
faster than readahead? All of your arguments are based on theoretical claims.

  Do you mean that you would like to reimplement codecs in C?
 
 As use of Unicode codecs increases in Python applications,
 this would certainly be an approach to consider, yes.

I am not sure that StreamReader is/can be faster than TextIOWrapper if it is 
reimplemented in C (see the updated benchmark below, codecs vs _pyio).

  test_io.read(): 3991.0 ms
  test_codecs.read(): 1736.9 ms
  - codecs 130% FASTER than io
 
 No surprise here. It's also a very common use case
 to read the whole file in one go and the bigger
 the file, the more impact this has.

Oh, I now understand why codecs is always faster than _pyio (or even io) here: it's 
because of IncrementalNewlineDecoder. To be fair, read(-1) should be 
tested without IncrementalNewlineDecoder, e.g. with newline='\n'.

newline='' cannot be used for the read(-1) test: even though newline='' 
indicates that we don't want to translate newlines, read(-1) still uses the 
IncrementalNewlineDecoder (which is slower than not calling it at all). We may 
optimize this specific case in TextIOWrapper.
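
For readers unfamiliar with the newline parameter, the sketch below shows its documented effect on what TextIOWrapper returns when reading. It says nothing about which internal decoder is involved, which is the performance point made above; the sample bytes are arbitrary.

```python
import io

raw = b"one\r\ntwo\rthree\n"

def wrap(newline):
    # Wrap an in-memory byte stream; newline is passed straight through.
    return io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)

translated = wrap(None).read()   # universal newlines, translated to "\n"
untouched = wrap("").read()      # universal newlines, but data left as-is
only_lf = wrap("\n").read()      # no translation at all

# The modes also differ in where readlines() splits the text.
lines_universal = wrap("").readlines()   # splits at "\r\n", "\r" and "\n"
lines_lf = wrap("\n").readlines()        # splits at "\n" only

print(repr(translated))
print(lines_universal, lines_lf)
```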

  (3) Decode Lib/test/cjkencodings/gb18030.txt (501 characters) from
  gb18030
  
  test_io.readline(): 38.9 ms
  test_codecs.readline(): 15.1 ms
  - codecs 157% FASTER than io
  
  test_io.read(1): 369.8 ms
  test_codecs.read(1): 302.2 ms
  - codecs 22% FASTER than io
  
  test_io.read(100): 258.2 ms
  test_codecs.read(100): 155.1 ms
  - codecs 67% FASTER than io
  
  test_io.read(): 1803.2 ms
  test_codecs.read(): 1002.9 ms
  - codecs 80% FASTER than io
 
 These results are interesting since gb18030 is a shift
 encoding which keeps state in the encoded data stream, so
 the strategy chosen by TextIOWrapper doesn't work out that
 well.

In the 4 tests, TextIOWrapper only calls the decoder *once*, on the whole 
content of the file. The file size is 864 bytes, which is smaller than the 
TextIOWrapper chunk size (2048 bytes).

The StreamReader of the gb18030 codec is implemented in C, not in Python (using 
multibytecodec.c). So to be fair, the test on this encoding should be done 
using io, not _pyio.

Moreover, the multibytecodec module doesn't support universal newlines! It 
only supports '\n' newlines. So to be fairer, the test should use the '\n' 
newline.

It's one more reason to use TextIOWrapper instead of StreamReader: it has the same 
behaviour (universal newlines) for all encodings. Or is it yet another bug in 
StreamReader?

 I am still -1 on deprecating the StreamReader/Writer parts of
 the codec APIs. I've given numerous reasons on why these are
 useful, what their intention is, why they were added to Python 1.6.

codecs.open() now uses TextIOWrapper, so there is no good reason to keep 
StreamReader or StreamWriter. You did not give me any use case where 
StreamReader or StreamWriter should be used instead of TextIOWrapper. You only 
listed theoretical optimizations.

You have until the release of Python 3.3 to prove that StreamReader and/or 
StreamWriter can be faster than TextIOWrapper. If you can prove it using a 
patch and a benchmark, I will be ok to revert my commit.

 Since such a deprecation would change an important documented API,
 please write a PEP outlining your reasoning, including my comments,
 use cases and possibilities for optimizations.

Ok, I will write a PEP explaining why StreamReader and StreamWriter are 
deprecated.

---

I wrote a new benchmarking script which tries to compare more closely codecs 
to io/_pyio (change the newline value and use io for gb18030). It should be a 
little bit more reliable because each test now runs 5 times (taking the 
smallest time), but it's not really reliable... The script is attached to this 
mail.



(1) Decode Objects/unicodeobject.c (317334 characters) from utf-8

_pyio.readline(): 1078.4 ms (8 loops, newline: '')
codecs.readline(): 983.0 ms (8 loops, newline: '')
- codecs 10% FASTER than _pyio

_pyio.read(1): 3503.5 ms (2 loops, newline: '')
codecs.read(1): 6626.7 ms (2 loops, newline: '')
- codecs 89% slower than _pyio

_pyio.read(100): 2076.2 

Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Benjamin Peterson
2011/5/27 Victor Stinner victor.stin...@haypocalc.com:
 You have until the release of Python 3.3 to prove that StreamReader and/or
 StreamWriter can be faster than TextIOWrapper. If you can prove it using a
 patch and a benchmark, I will be ok to revert my commit.

Please don't hold commits over someone's head.



-- 
Regards,
Benjamin


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread M.-A. Lemburg
Victor Stinner wrote:
 On Friday, 27 May 2011 at 10:17:29, M.-A. Lemburg wrote:
 I am still -1 on deprecating the StreamReader/Writer parts of
 the codec APIs. I've given numerous reasons on why these are
 useful, what their intention is, why they were added to Python 1.6.
 
 codecs.open() now uses TextIOWrapper, so there is no good reason to keep 
 StreamReader or StreamWriter. You did not give me any use case where 
 StreamReader or StreamWriter should be used instead of TextIOWrapper. You 
 only listed theoretical optimizations.
 
 You have until the release of Python 3.3 to prove that StreamReader and/or 
 StreamWriter can be faster than TextIOWrapper. If you can prove it using a 
 patch and a benchmark, I will be ok to revert my commit.

Victor, please revert the change. It has *not* been approved !

If we'd go by your reasoning for deprecating and eventually
removing parts of the stdlib or Python's subsystems, we'll end
up with a barebone version of Python. That's not what we want
and it's not what our users want.

I have tried to explain the design decisions and reasons for
those codec APIs at great length. You've pretty much used up
my patience. If you are not going to revert the patch, I will.

 Since such a deprecation would change an important documented API,
 please write a PEP outlining your reasoning, including my comments,
 use cases and possibilities for optimizations.
 
 Ok, I will write a PEP explaining why StreamReader and StreamWriter are 
 deprecated.

Wrong order: first write a PEP, then discuss, then get approval,
then patch.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 27 2011)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2011-05-23: Released eGenix mx Base 3.2.0  http://python.egenix.com/
2011-05-25: Released mxODBC 3.1.1  http://python.egenix.com/
2011-06-20: EuroPython 2011, Florence, Italy   24 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Nick Coghlan
On Fri, May 27, 2011 at 11:42 PM, M.-A. Lemburg m...@egenix.com wrote:

 Wrong order: first write a PEP, then discuss, then get approval,
 then patch.

Indeed.

If another committer says please revert and better justify this
change then we revert it. We don't get into commit wars.

Something does need to be done to resolve the duplication of
functionality between the io and codecs modules, but it is *far* from
clear that deprecating chunks of the longer standing API is the right
way to go about it. This is especially true given Guido's explicit
direction following the issues with the PyCObject removal in 3.2 that
we be *very* conservative about introducing additional
incompatibilities between Python 2 and Python 3.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Victor Stinner
On Friday, 27 May 2011 at 15:33:07, Benjamin Peterson wrote:
 2011/5/27 Victor Stinner victor.stin...@haypocalc.com:
  You have until the release of Python 3.3 to prove that StreamReader
  and/or StreamWriter can be faster than TextIOWrapper. If you can prove
  it using a patch and a benchmark, I will be ok to revert my commit.
 
 Please don't hold commits over someone's head.

Tell me if I am wrong, but only Marc-Andre is against deprecating StreamReader 
and StreamWriter. Walter and Antoine are in favor of using TextIOWrapper 
instead of StreamReader/StreamWriter.

Several people would like to be able to call codecs.open() in Python 2 and 3, 
so I kept the function with its API unchanged, and I documented that open() 
should be preferred (but I did not deprecate codecs.open).

Victor


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Benjamin Peterson
2011/5/27 Victor Stinner victor.stin...@haypocalc.com:
 On Friday, 27 May 2011 at 15:33:07, Benjamin Peterson wrote:
 2011/5/27 Victor Stinner victor.stin...@haypocalc.com:
  You have until the release of Python 3.3 to prove that StreamReader
  and/or StreamWriter can be faster than TextIOWrapper. If you can prove
  it using a patch and a benchmark, I will be ok to revert my commit.

 Please don't hold commits over someone's head.

 Tell me if I am wrong, but only Marc-Andre is against deprecating StreamReader
 and StreamWriter. Walter and Antoine are in favor of using TextIOWrapper
 instead of StreamReader/StreamWriter.

I am too. There does, however, seem to be significant disagreement,
and it shouldn't be a race to see who can commit first.


 Several people would like to be able to call codecs.open() in Python 2 and 3,
 so I kept the function with its API unchanged, and I documented that open()
 should be preferred (but I did not deprecate codecs.open).




-- 
Regards,
Benjamin


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Victor Stinner
On Friday, 27 May 2011 at 16:01:14, Nick Coghlan wrote:
 On Fri, May 27, 2011 at 11:42 PM, M.-A. Lemburg m...@egenix.com wrote:
  Wrong order: first write a PEP, then discuss, then get approval,
  then patch.
 
 Indeed.
 
 If another committer says please revert and better justify this
 change then we revert it. We don't get into commit wars.

I reverted my controversial commit.

 Something does need to be done to resolve the duplication of
 functionality between the io and codecs modules, but it is *far* from
 clear that deprecating chunks of the longer standing API is the right
 way to go about it.

Yes, StreamReader & friends have been present in Python since Python 2.0.

 This is especially true given Guido's explicit
 direction following the issues with the PyCObject removal in 3.2 that
 we be *very* conservative about introducing additional
 incompatibilities between Python 2 and Python 3.

I did search for usage of these classes on the Internet, and except for projects 
implementing their own codecs (and so implementing their own 
StreamReader/StreamWriter classes, even if they don't use them), I only found 
one project using StreamReader directly: Pygments (*). I searched quickly, so 
don't trust these results :-) StreamReader & friends are used indirectly 
through codecs.open(). My patch changes codecs.open() to make it reuse open 
(io.TextIOWrapper), so the deprecation of StreamReader would not be noticed by 
most users.
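
The idea of making codecs.open() reuse io.TextIOWrapper could look roughly like the sketch below. This is not the patch discussed in this thread: open_text() is a hypothetical name, and the handling of the "b" flag and of newline="" are assumptions about what such a wrapper would need in order to stay close to the existing behaviour.

```python
import io
import os
import tempfile

def open_text(filename, mode="r", encoding=None, errors="strict", buffering=-1):
    """Hypothetical codecs.open() look-alike that delegates to io.open()."""
    if encoding is None:
        # Without an encoding, codecs.open() is just a plain open().
        return io.open(filename, mode, buffering)
    # TextIOWrapper needs a text mode, so drop any "b" flag.
    text_mode = mode.replace("b", "")
    # newline="" leaves line endings untranslated, as a codec stream would.
    return io.open(filename, text_mode, buffering,
                   encoding=encoding, errors=errors, newline="")

# Round trip through a scratch file (the path is illustrative only).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open_text(path, "w", encoding="utf-8") as f:
    f.write(u"h\u00e9llo\r\n")
with open_text(path, "rb", encoding="utf-8") as f:
    round_trip = f.read()

print(repr(round_trip))
```

A wrapper like this returns a TextIOWrapper rather than a StreamReaderWriter, so code that depends on the exact return type or on StreamReader-specific methods would still notice the difference.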

I think that there are many more users of PyCObject than direct users of 
the StreamReader API (that is, not through codecs.open()).

(*) I also found Sphinx, but I was wrong: it doesn't use StreamReader, it just 
has a full copy of the UTF-8-SIG codec which has a StreamReader class. I don't 
think that the class is used.

Victor


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Victor Stinner
On Friday, 27 May 2011 at 15:42:10, M.-A. Lemburg wrote:
 If we'd go by your reasoning for deprecating and eventually
 removing parts of the stdlib or Python's subsystems, we'll end
 up with a barebone version of Python. That's not what we want
 and it's not what our users want.

I don't want to deprecate the whole stdlib, just duplicated old APIs, to follow 
the "import this" mantra:

There should be one-- and preferably only one --obvious way to do it.

It's difficult for a user to choose between open() and codecs.open().

Victor


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread M.-A. Lemburg
Victor Stinner wrote:
 On Friday, 27 May 2011 at 15:42:10, M.-A. Lemburg wrote:
 If we'd go by your reasoning for deprecating and eventually
 removing parts of the stdlib or Python's subsystems, we'll end
 up with a barebone version of Python. That's not what we want
 and it's not what our users want.
 
 I don't want to deprecate the whole stdlib, just duplicated old APIs, to follow 
 the "import this" mantra:
 
 There should be one-- and preferably only one --obvious way to do it.

What people tend to miss in this mantra is the last part: obvious.
It doesn't say: there should only be one way to do it. There can
be many ways, but there should preferably be only one *obvious* way.

Using codecs.open() is not obvious in Python 3, since the standard
open() already provides a way to access an encoded stream. Using
a builtin is the obvious way to go.

It is obvious in Python 2, where the standard open() doesn't provide a
way to define an encoding, so the user has to explicitly look for this
kind of API and then find it in the obvious (to some extent)
codecs module, since that's where encodings happen in Python 2.

Having multiple ways to do things is the most natural thing
on earth and it's good that way.

Python does not and should not force people into doing things
in one dictated right way. It should, however, provide
natural choices and obvious hints to find a good solution.
And that's what the Zen mantra is all about.

 It's difficult for a user to choose between open() and codecs.open().

As I mentioned on the ticket and in my replies: I'm not against
changing codecs.open() to use a variant that is based on TextIOWrapper,
provided there are no user noticeable compatibility issues.

Thanks for reverting the patch.

Have a nice weekend,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Terry Reedy

On 5/27/2011 11:08 AM, Victor Stinner wrote:


Tell me if I am wrong, but only Marc-Andre is against deprecating StreamReader


While I am, in general, in favor of removing some duplication, I was and 
am against doing this change precipitously. So I was for the reversion 
(noted), at least temporarily. Given the disagreement, I think there 
should be a PEP with pro and con arguments.


--
Terry Jan Reedy



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-27 Thread Nick Coghlan
On Sat, May 28, 2011 at 6:30 AM, Terry Reedy tjre...@udel.edu wrote:
 On 5/27/2011 11:08 AM, Victor Stinner wrote:

 Tell me if I am wrong, but only Marc-Andre is against deprecating
 StreamReader

 While I am, in general, in favor of removing some duplication, I was and am
 against doing this change precipitously. So I was for the reversion (noted),
 at least temporarily. Given the disagreement, I think there should be a PEP
 with pro and con arguments.

Indeed.

I'm also against any deprecation in this area, since that just means
needless work for anyone who *does* use these APIs (even if those
people are few and far between). If we can refactor to remove the
duplication of functionality, that's a *much* better solution.

If we can carry optparse style argument parsing and 2.x style string
formatting, we can carry a couple of legacy codec interface
definitions.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-25 Thread M.-A. Lemburg
Walter Dörwald wrote:
 On 24.05.11 12:58, Victor Stinner wrote:
 On Tuesday, 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
 Message written by Walter Dörwald on 2011-05-24, at 12:16:

 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.

 This could be partially fixed by implementing generic
 StreamReader/StreamWriter classes that reuse the incremental codecs, but
 I don't think that's worth it.

 Why not?

 We have already an implementation of this idea, it is called
 io.TextIOWrapper.
 
 Exactly.
 
 From another post by Victor:
 
 As I wrote, codecs.open() is useful in Python 2. But I don't know any
 program or library using directly StreamReader or StreamWriter.
 
 So: implementing this is a lot of work, duplicates existing
 functionality and is mostly unused.

You are missing the point: we have StreamReader and StreamWriter APIs
on codecs to allow each codec to implement more efficient ways of
encoding and decoding streams.

Examples of such optimizations are reading the stream in
chunks that can be decoded in one piece, or writing to the stream
in a way that doesn't generate encoding state problems on the
receiving end by ending transmission half-way through a
shift block.
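
The encoding state problem mentioned here can be seen with a stateful codec such as ISO-2022-JP. The sketch below uses the standard incremental encoder API and is only an illustration of the issue, not an example of a StreamWriter optimization; the sample character is arbitrary.

```python
import codecs

encoder = codecs.getincrementalencoder("iso2022_jp")()

# Encoding a kana first emits an escape sequence that switches the
# stream into a double-byte character set.
first = encoder.encode(u"\u3042")
# final=True flushes the encoder, which gives a stateful codec the
# chance to switch the stream back to its initial state.
last = encoder.encode(u"", final=True)

# Encoding in one step yields the same bytes as the two pieces together.
whole = u"\u3042".encode("iso2022_jp")
print(first + last == whole)
```

If the writer stopped after `first`, the receiving end would be left in the shifted state, which is the half-way transmission the paragraph above warns about.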

Of course, you won't find many direct uses of these APIs, since
most of the time, applications will simply use codecs.open() to
automatically benefit from these optimizations.
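
For reference, direct use of the stream API looks roughly like this. codecs.getreader() and io.TextIOWrapper are standard library calls; the in-memory stream and the sample text are only there to keep the sketch self-contained.

```python
import codecs
import io

sample = u"first line\nsecond l\u00efne\n"
payload = sample.encode("utf-8")

# codecs.getreader() returns the codec's StreamReader class; an instance
# wraps any object whose read() method returns bytes.
Reader = codecs.getreader("utf-8")
reader = Reader(io.BytesIO(payload))
first = reader.readline()
rest = reader.read()

# The io module equivalent of the same wrapping.
wrapper = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8", newline="")
same = wrapper.read()

print(first + rest == same)
```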

OTOH, TextIOWrapper doesn't know anything about specific encodings
and thus does not allow for such optimizations to be implemented
by codecs.

We don't have many such specialized implementations in the stdlib,
but this doesn't mean that there's no use for them. It
just means that developers and users are simply unaware of the
possibilities opened by these stateful stream APIs.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-25 Thread Victor Stinner
On Wednesday, 25 May 2011 at 11:38 +0200, M.-A. Lemburg wrote:
 You are missing the point: we have StreamReader and StreamWriter APIs
 on codecs to allow each codec to implement more efficient ways of
 encoding and decoding streams.
 
 Examples of such optimizations are reading the stream in
 chunks that can be decoded in one piece, or writing to the stream
 in a way that doesn't generate encoding state problems on the
 receiving end by ending transmission half-way through a
 shift block.
 
 ...
 
 We don't have many such specialized implementations in the stdlib,
 but this doesn't mean that there's no use for them. It
 just means that developers and users are simply unaware of the
 possibilities opened by these stateful stream APIs.

Does at least one codec implement such an optimization in its
StreamReader or StreamWriter class? And can't we implement such
optimizations in incremental encoders and decoders (or in TextIOWrapper)?

I checked all multibyte codecs (UTF and CJK codecs) and I don't see any
such optimization. UTF codecs handle the BOM, but don't have anything
looking like an optimization. CJK codecs use multibytecodec,
MultibyteStreamReader and MultibyteStreamWriter, which don't look to be
optimized. But maybe I missed something?

TextIOWrapper has an advanced buffering algorithm to prefetch (readahead)
some bytes at each read to speed up small reads. Such an algorithm is
difficult to implement, but it's done and it works.
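
The effect of this prefetching can be observed, without relying on timings, by counting how often TextIOWrapper asks the underlying byte stream for data. CountingBytesIO is a made-up helper for this sketch, and the exact number of calls depends on the implementation's internal chunk size, so only the order of magnitude is meaningful.

```python
import io

class CountingBytesIO(io.BytesIO):
    """In-memory byte stream that counts how often it is asked for data."""

    def __init__(self, data):
        super().__init__(data)
        self.calls = 0

    def read(self, size=-1):
        self.calls += 1
        return super().read(size)

    def read1(self, size=-1):
        self.calls += 1
        return super().read1(size)

raw = CountingBytesIO(b"x" * 1000)
text = io.TextIOWrapper(raw, encoding="ascii")

# Read the text one character at a time, as in the read(1) benchmark.
chars = 0
while text.read(1):
    chars += 1

print(chars, raw.calls)
```

The thousand one-character reads are served from the decoded buffer, so the byte stream is consulted far fewer times than there are characters.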

--

Ok, let's stop talking about theoretical optimizations, and let's do a
benchmark to compare the codecs and io modules on reading files!

I tested Python 3.3 (70370:178d367c9733) compiled in release mode (gcc
-O3) on a Pentium4 @ 3 GHz with 2 GB of memory. I manually tuned the
number of loops to ensure that the fastest test takes at least one
second. I only ran my benchmark once. See the attached bench.py file.


(1) Decode Objects/unicodeobject.c (317336 characters) from utf-8

test_io.readline(): 89.6 ms
test_codecs.readline(): 1272.8 ms
- codecs 1320% slower than io

test_io.read(1): 1728.9 ms
test_codecs.read(1): 36395.0 ms
- codecs 2005% slower than io

test_io.read(100): 460.7 ms
test_codecs.read(100): 3897.0 ms
- codecs 746% slower than io

test_io.read(-1): 1911.7 ms
test_codecs.read(-1): 1740.7 ms
- codecs 10% FASTER than io


(2) Decode README (6613 characters) from ascii

test_io.readline(): 109.9 ms
test_codecs.readline(): 1023.8 ms
- codecs 832% slower than io

test_io.read(1): 1560.4 ms
test_codecs.read(1): 29402.6 ms
- codecs 1784% slower than io

test_io.read(100): 866.9 ms
test_codecs.read(100): 3699.5 ms
- codecs 327% slower than io

test_io.read(-1): 5140.2 ms
test_codecs.read(-1): 4817.9 ms
- codecs 7% FASTER than io


(3) Decode Lib/test/cjkencodings/gb18030.txt (501 characters) from
gb18030

test_io.readline(): 1193.7 ms
test_codecs.readline(): 1474.3 ms
- codecs 24% slower than io

test_io.read(1): 3847.7 ms
test_codecs.read(1): 27103.9 ms
- codecs 604% slower than io

test_io.read(100): 12839.5 ms
test_codecs.read(100): 13444.2 ms
- codecs 5% slower than io

test_io.read(-1): 2183.3 ms
test_codecs.read(-1): 1906.1 ms
- codecs 15% FASTER than io


The readahead code really does help read(1): io is between 7 and 21
times faster than codecs. But it also helps a more common use case,
readline(): io is between 1.2 and 14 times faster than codecs.

codecs is always faster (between 1.07 and 1.15 times faster than io) at
reading the whole content of a file using read(-1). Something should maybe
be optimized in TextIOWrapper.read() ;-) But the gain is minor if you
compare it to the gain on read(1) and readline()!
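
For reference, the "N% slower" lines and the "times faster" figures are two views of the same ratio. A small sketch (the helper names are illustrative, they are not part of bench.py) that recomputes both from the UTF-8 timings of benchmark (1):

```python
def speed_factor(io_ms, codecs_ms):
    """How many times longer codecs takes than io."""
    return codecs_ms / io_ms


def percent_slower(io_ms, codecs_ms):
    """The same ratio expressed as 'codecs is N% slower than io'."""
    return (codecs_ms / io_ms - 1) * 100


# Timings copied from benchmark (1): readline() and read(1) on UTF-8.
readline_factor = speed_factor(89.6, 1272.8)
readline_percent = percent_slower(89.6, 1272.8)
read1_factor = speed_factor(1728.9, 36395.0)
read1_percent = percent_slower(1728.9, 36395.0)

print("readline(): %.1fx, %d%% slower" % (readline_factor, readline_percent))
print("read(1): %.1fx, %d%% slower" % (read1_factor, read1_percent))
```

A result of "1320% slower" therefore corresponds to a factor of about 14, not 13.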

Please check my bench.py script and redo the benchmark on your own
computer!

Victor

import codecs
import sys
import time

# Pick one input file; the last assignment wins (README as written here).
FILENAME = "Objects/unicodeobject.c"; FILESIZE = 317336; ENCODING = 'utf-8'; LOOPS=10
FILENAME = "Lib/test/cjkencodings/gb18030.txt"; FILESIZE = 501; ENCODING = 'gb18030'; LOOPS=200

FILENAME = "README"; FILESIZE = 6613; ENCODING = 'ascii'; LOOPS=400

def bench(loops, func, *args):
    # args is (test_func, chunk_size); it is forwarded unchanged to func
    test_func, chunk_size = args
    t0 = time.time()
    for loop in range(loops):
        func(*args)
    dt = time.time() - t0
    text = "%s.%s" % (func.__name__, test_func)
    if chunk_size is not None:
        text += "(%s)" % chunk_size
    else:
        text += "()"
    print("%s: %.1f ms" % (text, dt * 1000))
    return dt

def test_file(f, test_func, chunk_size):
    # Read the whole file through the given method and check the total size
    size = 0
    func = getattr(f, test_func)
    while True:
        if chunk_size is not None:
            c = func(chunk_size)
        else:
            c = func()
        if not c:
            break
        size += len(c)
    assert size == FILESIZE, "%s != %s" % (size, FILESIZE)

def test_io(test_func, chunk_size):
    with open(FILENAME, encoding=ENCODING) as f:
        test_file(f, test_func, chunk_size)

def test_codecs(test_func, chunk_size):
    with codecs.open(FILENAME, 'r', encoding=ENCODING) as f:
        test_file(f, test_func, chunk_size)

print("Python %s" % sys.version)
print("Decode %s (%s characters) from %s" %

Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-25 Thread M.-A. Lemburg
Victor Stinner wrote:
 On Wednesday 25 May 2011 at 11:38 +0200, M.-A. Lemburg wrote:
 You are missing the point: we have StreamReader and StreamWriter APIs
 on codecs to allow each codecs to implement more efficient ways of
 encoding and decoding streams.

 Examples of such optimizations are reading the stream in
 chunks that can be decoded in one piece, or writing to the stream
 in a way that doesn't generate encoding state problems on the
 receiving end by ending transmission half-way through a
 shift block.

 ...

 We don't have many such specialized implementations in the stdlib,
 but this doesn't mean that there's no use for them. It
 just means that developers and users are simply unaware of the
 possibilities opened by these stateful stream APIs.
 
 Does at least one codec implement such an optimization in its
 StreamReader or StreamWriter class? And can't we implement such
 optimizations in incremental encoders and decoders (or in TextIOWrapper)?

I don't see how, since you need control over the file API methods
in order to implement such optimizations. OTOH, adding lots of
special cases to TextIOWrapper isn't a good idea either, since these
optimizations would then only trigger for a small number of
codecs and completely leave out 3rd party codecs.

 I checked all multibyte codecs (UTF and CJK codecs) and I don't see any
 such optimization. UTF codecs handle the BOM, but don't have anything
 that looks like an optimization. CJK codecs use multibytecodec,
 MultibyteStreamReader and MultibyteStreamWriter, which don't appear to be
 optimized. But maybe I missed something?

No, you haven't missed such per-codec optimizations. The base classes
implement general purpose support for reading from streams in
chunks, but the support isn't optimized per codec.

For UTF-16 it would e.g. make sense to always read data in blocks
with even sizes, removing the trial-and-error decoding and extra
buffering currently done by the base classes. For UTF-32, the
blocks should have size % 4 == 0.

For UTF-8 (and other variable length encodings) it would make
sense to look at the end of the (bytes) data read from the
stream to see whether a complete code point was read or not,
rather than simply running the decoder on the complete data
set, only to find that a few bytes at the end are missing.
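
A rough sketch of the UTF-8 tail check described above, assuming well-formed input. The function name is hypothetical; nothing like it exists in the codecs module, and a real StreamReader would still have to handle invalid bytes through the decoder.

```python
def incomplete_utf8_tail(data):
    """Return how many trailing bytes of `data` begin a UTF-8 sequence
    that is not complete yet (0 if `data` ends on a code point boundary)."""
    # A UTF-8 sequence is at most 4 bytes long, so look back at most 4 bytes.
    for back in range(1, min(4, len(data)) + 1):
        byte = data[-back]
        if byte & 0xC0 == 0x80:
            continue  # continuation byte (0b10xxxxxx): keep looking for the lead byte
        if byte < 0x80:
            return 0  # ASCII byte: nothing is pending
        # Lead byte: work out the sequence length it announces.
        if byte & 0xE0 == 0xC0:
            need = 2
        elif byte & 0xF0 == 0xE0:
            need = 3
        elif byte & 0xF8 == 0xF0:
            need = 4
        else:
            return 0  # invalid lead byte: leave it to the decoder to report
        return back if back < need else 0
    return 0


euro = "\u20ac".encode("utf-8")      # a three-byte sequence
chunk = b"price: " + euro[:2]        # the read stopped inside the character
pending = incomplete_utf8_tail(chunk)
complete_part = chunk[:len(chunk) - pending].decode("utf-8")
```

Only the complete prefix is decoded; the pending bytes would be kept and prepended to the next block read from the stream.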

For single character encodings, it would make sense to prefetch
data in big chunks and skip all the trial and error decoding
implemented by the base classes to address the above problem
with variable length encodings.

Finally, all this could be implemented in C, reducing the
Python call overhead dramatically.

 TextIOWrapper has an advanced buffering algorithm that prefetches (reads
 ahead) some bytes at each read to speed up small reads. Such an algorithm
 is difficult to implement, but it's done and it works.
 
 --
 
 Ok, let's stop talking about theoretical optimizations, and let's do a
 benchmark to compare the codecs and io modules on reading files!

That's somewhat unfair: TextIOWrapper is implemented in C,
whereas the StreamReader/Writer subclasses used by the
codecs are written in Python.

A fair comparison would use the Python implementation of
TextIOWrapper.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 25 2011)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2011-05-23: Released eGenix mx Base 3.2.0  http://python.egenix.com/
2011-05-25: Released mxODBC 3.1.1  http://python.egenix.com/
2011-06-20: EuroPython 2011, Florence, Italy   26 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-25 Thread Victor Stinner
On Wednesday 25 May 2011 at 15:43 +0200, M.-A. Lemburg wrote:
 For UTF-16 it would e.g. make sense to always read data in blocks
 with even sizes, removing the trial-and-error decoding and extra
 buffering currently done by the base classes. For UTF-32, the
 blocks should have size % 4 == 0.

 For UTF-8 (and other variable length encodings) it would make
 sense looking at the end of the (bytes) data read from the
 stream to see whether a complete code point was read or not,
 rather than simply running the decoder on the complete data
 set, only to find that a few bytes at the end are missing.

I think that the readahead algorithm is much faster than trying to
avoid partial input, and partial input is not a problem if you
use an incremental decoder.
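
To illustrate the point about partial input, here is a minimal sketch using the standard codecs.getincrementaldecoder() API. The 3-byte chunk size is chosen only so that one chunk ends in the middle of a multi-byte character.

```python
import codecs

text_in = "h\u00e9llo \u20ac"
raw = text_in.encode("utf-8")  # 10 bytes

# Cut the bytes into 3-byte chunks; one chunk ends inside the euro sign.
chunks = [raw[i:i + 3] for i in range(0, len(raw), 3)]

decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(chunk) for chunk in chunks]
# final=True makes the decoder raise if undecoded bytes are left over.
pieces.append(decoder.decode(b"", final=True))
text_out = "".join(pieces)
```

The decoder keeps the incomplete trailing bytes internally and emits the character once the rest arrives, so the caller never has to align reads on character boundaries.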

 For single character encodings, it would make sense to prefetch
 data in big chunks and skip all the trial and error decoding
 implemented by the base classes to address the above problem
 with variable length encodings.

TextIOWrapper implements this optimization using its readahead
algorithm.

 That's somewhat unfair: TextIOWrapper is implemented in C,
 whereas the StreamReader/Writer subclasses used by the
 codecs are written in Python.
 
 A fair comparison would use the Python implementation of
 TextIOWrapper.

Do you mean that you would like to reimplement codecs in C? It is not
relevant to compare codecs and _pyio, because codecs reuses
BufferedReader (of the io module, not of the _pyio module), and io is
the main I/O module of Python 3.

But well, as you wish, here is a benchmark comparing:
    _pyio.TextIOWrapper(io.open(filename, 'rb'), encoding)
and
    codecs.open(filename, encoding)

The only change from my previous bench.py script is the test_io()
function:

# requires "import _pyio" at the top of bench.py
def test_io(test_func, chunk_size):
    with open(FILENAME, 'rb') as buffered:
        f = _pyio.TextIOWrapper(buffered, ENCODING)
        test_file(f, test_func, chunk_size)
        f.close()


(1) Decode Objects/unicodeobject.c (317336 characters) from utf-8

test_io.readline(): 1193.4 ms
test_codecs.readline(): 1267.9 ms
- codecs 6% slower than io

test_io.read(1): 21696.4 ms
test_codecs.read(1): 36027.2 ms
- codecs 66% slower than io

test_io.read(100): 3080.7 ms
test_codecs.read(100): 3901.7 ms
- codecs 27% slower than io

test_io.read(): 3991.0 ms
test_codecs.read(): 1736.9 ms
- codecs 130% FASTER than io


(2) Decode README (6613 characters) from ascii

test_io.readline(): 678.1 ms
test_codecs.readline(): 760.5 ms
- codecs 12% slower than io

test_io.read(1): 13533.2 ms
test_codecs.read(1): 21900.0 ms
- codecs 62% slower than io

test_io.read(100): 2663.1 ms
test_codecs.read(100): 3270.1 ms
- codecs 23% slower than io

test_io.read(): 6769.1 ms
test_codecs.read(): 3919.6 ms
- codecs 73% FASTER than io


(3) Decode Lib/test/cjkencodings/gb18030.txt (501 characters) from
gb18030

test_io.readline(): 38.9 ms
test_codecs.readline(): 15.1 ms
- codecs 157% FASTER than io

test_io.read(1): 369.8 ms
test_codecs.read(1): 302.2 ms
- codecs 22% FASTER than io

test_io.read(100): 258.2 ms
test_codecs.read(100): 155.1 ms
- codecs 67% FASTER than io

test_io.read(): 1803.2 ms
test_codecs.read(): 1002.9 ms
- codecs 80% FASTER than io


_pyio.TextIOWrapper is faster than codecs.StreamReader for readline(),
read(1) and read(100), with ASCII and UTF-8. It is slower for gb18030.

As in the io vs codecs benchmark, codecs.StreamReader is always faster
than _pyio.TextIOWrapper for read().

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-25 Thread Victor Stinner
On Wednesday 25 May 2011 at 13:10 +0200, Victor Stinner wrote:
 codecs is always faster (between 1.07 and 1.15 times faster than io) to
 read the whole content of file using read(-1). Something should maybe be
 optimized in TextIOWrapper.read() ;-)

Oh, I understand: maybe it's because the universal newline mode of
TextIOWrapper was enabled. If you disable it using open(..., newline='\n'),
io and codecs run at the same speed when reading the whole content of the
file (f.read()).
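
A small sketch of what the newline argument changes, using an in-memory stream instead of a real file (the sample bytes are made up for the illustration):

```python
import io

raw = b"one\r\ntwo\rthree\n"


def read_all(newline):
    """Decode `raw` through TextIOWrapper with the given newline mode."""
    wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline)
    return wrapper.read()


universal = read_all(None)    # default: "\r\n" and "\r" are translated to "\n"
untranslated = read_all("\n")  # no translation: the text is returned unchanged
keep_all = read_all("")        # no translation either, all line endings recognised
```

With the default mode every read has to scan for and rewrite line endings, which is the extra work suspected above.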

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 15:24 +1000, Nick Coghlan wrote:
 On Tue, May 24, 2011 at 10:08 AM, Victor Stinner
 victor.stin...@haypocalc.com wrote:
  It's trivial to replace a call to codecs.open() by a call to open(),
  because the two APIs are very close. The main difference is that
  codecs.open() doesn't support universal newlines, so you have to use
  open(..., newline='') to keep the same behaviour (keep newlines
  unchanged). This task can be done by 2to3. But I suppose that most
  people will be happy with the universal newline mode.
 
 Is there any reason that codecs.open() can't become a thin wrapper
 around builtin open in 3.3?

Yes, it's trivial to implement codecs.open using:

def open(filename, mode='rb', encoding=None, errors='strict',
         buffering=1):
    return builtins.open(filename, mode, buffering,
                         encoding, errors, newline='')
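
One detail such a wrapper would have to handle: the builtin open() rejects an encoding or a newline argument in binary mode, while codecs.open() callers commonly pass 'rb'. A hedged sketch of a variant that copes with this; the name open_text and its defaults are hypothetical, it is not the proposed implementation:

```python
import builtins


def open_text(filename, mode="r", encoding="utf-8", errors="strict",
              buffering=-1):
    # The builtin needs a text mode when an encoding is given,
    # so drop any "b" that a codecs.open() caller may have passed.
    text_mode = mode.replace("b", "")
    # newline="" keeps line endings untouched, as codecs.open() does.
    return builtins.open(filename, text_mode, buffering,
                         encoding, errors, newline="")
```

Used as `open_text(path, "rb", "ascii")`, it returns an ordinary TextIOWrapper whose read() preserves "\r\n" sequences.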

But do we really need two ways to open a file? An extract from "import
this":
"There should be one-- and preferably only one --obvious way to do it."

Another example: Python 3.2 has subprocess.Popen, os.popen and
platform.popen to open a subprocess. platform.popen is now deprecated in
Python 3.3. Well, it's already better than Python 2.5 which has
os.popen(), os.popen2(), os.popen3(), os.popen4(), os.spawnl(),
os.spawnle(), os.spawnlp(), os.spawnlpe(), os.spawnv(), os.spawnve(),
os.spawnvp(), os.spawnvpe(), subprocess.Popen, platform.popen and maybe
others :-)

 How API compatible is TextIOWrapper with StreamReader/StreamWriter?

It's fully compatible.

 How hard would it be to change them to be adapters over the main IO
 machinery rather than independent classes?

I don't understand your proposal. We don't need StreamReader and
StreamWriter to open a stream as a text file, only incremental decoders
and encoders. Why do you want to keep them?

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread M.-A. Lemburg
Victor Stinner wrote:
 Hi,
 
 In Python 2, codecs.open() is the best way to read and/or write files
 using Unicode. But in Python 3, open() is preferred with its fast io
 module. I would like to deprecate codecs.open() because it can be
 replaced by open() and io.TextIOWrapper. I would like your opinion and
 that's why I'm writing this email.

I think you should have moved this part of your email
further up, since it explains the reason why this idea was
rejected for now:

 I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
 want to deprecate codecs.open() & friends because they want to be able
 to write code working on Python 2 and on Python 3 without any change. I
 don't think it's realistic: nontrivial programs require at least the six
 module, and most likely the 2to3 program. The six module can have its
 own codecs.open function if codecs.open is removed from Python 3.4.

And now for something completely different:

 codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
 classes of the codecs module don't support universal newlines, still
 have some issues with stateful codecs (like UTF-16/32 BOMs), and each
 codec has to implement a StreamReader and a StreamWriter class.
 
 StreamReader and StreamWriter are stateless codecs (no reset() or
 setstate() method), and so it's not possible to write a generic fix for
 all child classes in the codecs module. Each stateful codec has to
 handle special cases like seek() problems. For example, UTF-16 codec
 duplicates some IncrementalEncoder/IncrementalDecoder code into its
 StreamWriter/StreamReader class.

Please read PEP 100 regarding StreamReader and StreamWriter.
Those codecs parts were explicitly designed to be stateful,
unlike the stateless encoder/decoder methods.

Please read my reply on the ticket:


StreamReader and StreamWriter classes provide the base codec
implementations for stateful interaction with streams. They
define the interface and provide a working implementation for
those codecs that choose not to implement their own variants.

Each codec can, however, implement variants which are optimized
for the specific encoding or intercept certain stream methods
to add functionality or improve the encoding/decoding
performance.

Both are essential parts of the codec interface.

TextIOWrapper and StreamReaderWriter are merely wrappers
around streams that make use of the codecs. They don't
provide any codec logic themselves. That's the conceptual
difference.


 The io module is well tested, supports non-seekable streams, handles
 correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of
 newlines including an universal newline mode. TextIOWrapper reuses
 incremental encoders and decoders, so BOM issues were fixed only once,
 in TextIOWrapper.
 
 It's trivial to replace a call to codecs.open() by a call to open(),
 because the two APIs are very close. The main difference is that
 codecs.open() doesn't support universal newlines, so you have to use
 open(..., newline='') to keep the same behaviour (keep newlines
 unchanged). This task can be done by 2to3. But I suppose that most
 people will be happy with the universal newline mode.
 
 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.

This is a misunderstanding of the concepts behind the two.

StreamReader and StreamWriters are implemented by the codecs,
they are part of the API that each codec has to provide in order
to register in the Python codecs system. Their purpose is
to provide a stateful interface and work efficiently and
directly on streams rather than buffers.

Here's my reply from the ticket regarding using incremental
encoders/decoders for the StreamReader/Writer parts of the
codec set of APIs:


The point about having them use incremental codecs for encoding and
decoding is a good one and would need to be investigated. If possible,
we could use incremental encoders/decoders for the standard
StreamReader/Writer base classes or add new
IncrementalStreamReader/Writer classes which then use the
IncrementalEncoder/Decoder by default.

Please open a new ticket for this.


 StreamReader, StreamWriter, StreamReaderEncoder and EncodedFile are not
 used in the Python 3 standard library. I tried removing them: except for
 the tests in test_codecs which test them directly, the full test suite
 passes.

 Read the issue for more information: http://bugs.python.org/issue8796

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Vinay Sajip
Victor Stinner victor.stinner at haypocalc.com writes:

 I opened an issue for this idea. Brett and Marc-Andre Lemburg don't
 want to deprecate codecs.open() & friends because they want to be able
 to write code working on Python 2 and on Python 3 without any change. I
 don't think it's realistic: nontrivial programs require at least the six
 module, and most likely the 2to3 program. The six module can have its
 own codecs.open function if codecs.open is removed from Python 3.4.

What's non-trivial? Both pip and virtualenv (widely used programs) were ported
to Python 3 using a single codebase for 2.x and 3.x, because it seemed to
involve the least ongoing maintenance burden. Though these particular programs
don't use codecs.open, I don't see much value in making it harder to write
programs which can run under both 2.x and 3.x; that's not going to speed
adoption of 3.x.

I find 2to3 very useful indeed for showing where changes may need to be made for
2.x/3.x portability, but do not use it as an automatic conversion tool. The six
module is very useful, too, but some projects won't necessarily want to add it
as an additional dependency, and will instead reimplement just the parts they
need from that bag of tricks.

So I would also want to keep codecs.open() and friends, at least for now -
though it seems to make sense to implement them as wrappers (as Nick
suggested).

Regards,

Vinay Sajip




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 08:16 +, Vinay Sajip wrote:
 So I would also want to keep codecs.open() and friends, at least for now

Well, I would agree to keep codecs.open() (if we patch it to reuse
TextIOWrapper and add a note to say that it is kept for backward
compatibility and that open() should be preferred in Python 3), but
deprecate StreamReader, StreamWriter and EncodedFile.

As I wrote, codecs.open() is useful in Python 2. But I don't know of any
program or library directly using StreamReader or StreamWriter.

I found some projects (ex: twisted-mail, feeds2imap, pyflag, pygsm, ...)
implementing their own Python codec (cool!) and their codec has their
StreamReader and StreamWriter class, but I don't think that these
classes are used.

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
 Please read PEP 100 regarding StreamReader and StreamWriter.
 Those codecs parts were explicitly designed to be stateful,
 unlike the stateless encoder/decoder methods.

Yes, it is possible to implement stateful StreamReader and StreamWriter
classes and we have such codecs (I gave the example of UTF-16), but the
state is not exposed (getstate / setstate), and so it's not possible to
write generic code to handle the codec state in the base StreamReader
and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0)
for example.
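
As an illustration of the state in question, here is a sketch with the UTF-16 incremental encoder, whose only state is whether the BOM still has to be written. It relies on the standard getincrementalencoder() and setstate() calls; the variable names are illustrative.

```python
import codecs

encoder = codecs.getincrementalencoder("utf-16")()
first = encoder.encode("a")   # the first call emits the BOM plus the character
second = encoder.encode("b")  # later calls emit only the character

# A writer appending to (or seeking inside) an existing stream must not
# emit a second BOM: setstate(0) tells the encoder the BOM is already there.
appender = codecs.getincrementalencoder("utf-16")()
appender.setstate(0)
appended = appender.encode("a")
```

The StreamWriter classes keep the equivalent flag privately, which is why generic code in the base classes cannot reset or restore it.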

 Each codec can, however, implement variants which are optimized
 for the specific encoding or intercept certain stream methods
 to add functionality or improve the encoding/decoding
 performance.

Can you give me some examples?

 TextIOWrapper and StreamReaderWriter are merely wrappers
 around streams that make use of the codecs. They don't
 provide any codec logic themselves. That's the conceptual
 difference.
 ...
 StreamReader and StreamWriters ... work efficiently and
 directly on streams rather than buffers.

StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
have a file-like API: tell(), seek(), read(),  readline(), write(), etc.
The implementation is maybe different, but the API is just the same, and
so the usecases are just the same.

I don't see in which case I should use StreamReader or StreamWriter
instead of TextIOWrapper. I thought that TextIOWrapper was specific to
files on disk, but TextIOWrapper is already used for other purposes like
sockets.
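
A short sketch of the overlap, reading the same in-memory bytes through both classes (newline='' is passed to TextIOWrapper only so that both treat line endings identically):

```python
import codecs
import io

raw = "first line\nsecond line\n".encode("utf-8")

# codecs.getreader() returns the codec's StreamReader class.
stream_reader = codecs.getreader("utf-8")(io.BytesIO(raw))
text_wrapper = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")

results = {}
for name, f in (("StreamReader", stream_reader), ("TextIOWrapper", text_wrapper)):
    # Same calls on both objects: one line, then the rest of the stream.
    results[name] = (f.readline(), f.read())
```

Both entries of `results` end up identical, which is the sense in which the use cases coincide.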

 Here's my reply from the ticket regarding using incremental
 encoders/decoders for the StreamReader/Writer parts of the
 codec set of APIs:
 
 
 The point about having them use incremental codecs for encoding and
 decoding is a good one and would need to be investigated. If possible,
 we could use incremental encoders/decoders for the standard
 StreamReader/Writer base classes or add new
 IncrementalStreamReader/Writer classes which then use the
 IncrementalEncoder/Decoder by default.

Why do you want to write a duplicate feature? TextIOWrapper is already
here, it's working and widely used.

I am working on codec issues (like CJK encodings, see #12100, #12057,
#12016) and I would like to remove StreamReader and StreamWriter to have
*less* code to maintain.

If you want to add more code, will you be available to maintain it? It
looks like you are busy, some people (not me ;-)) are still
waiting for .transform()/.untransform()!

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread M.-A. Lemburg
Victor Stinner wrote:
 On Tuesday 24 May 2011 at 10:03 +0200, M.-A. Lemburg wrote:
 Please read PEP 100 regarding StreamReader and StreamWriter.
 Those codecs parts were explicitly designed to be stateful,
 unlike the stateless encoder/decoder methods.
 
 Yes, it is possible to implement stateful StreamReader and StreamWriter
 classes and we have such codecs (I gave the example of UTF-16), but the
 state is not exposed (getstate / setstate), and so it's not possible to
 write generic code to handle the codec state in the base StreamReader
 and StreamWriter classes. io.TextIOWrapper requires encoder.setstate(0)
 for example.

So instead of always suggesting to deprecate everything,
how about you come up with a proposal to add meaningful
new methods to those base classes ?

 Each codec can, however, implement variants which are optimized
 for the specific encoding or intercept certain stream methods
 to add functionality or improve the encoding/decoding
 performance.
 
 Can you give me some examples?

See the UTF-16 codec in the stdlib for example. This uses
some of the available possibilities to interpret the BOM mark
and then switches the encoder/decoder methods accordingly.
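
A minimal sketch of that behaviour as seen from the outside: the same "utf-16" StreamReader decodes both byte orders because it inspects the BOM on the first read. The sample text is arbitrary.

```python
import codecs
import io

text = "BOM test"
decoded = {}
for name, bom, encoding in (
    ("little-endian", codecs.BOM_UTF16_LE, "utf-16-le"),
    ("big-endian", codecs.BOM_UTF16_BE, "utf-16-be"),
):
    raw = bom + text.encode(encoding)
    # The reader is created for plain "utf-16"; the byte order is
    # discovered from the BOM when the first data is decoded.
    reader = codecs.getreader("utf-16")(io.BytesIO(raw))
    decoded[name] = reader.read()
```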

A lot more could be done for other variable length encoding
codecs, e.g. UTF-8, since these often have problems near
the end of a read due to missing bytes.

The base class implementation provides a general purpose
implementation to cover the case, but it's not efficient,
since it doesn't know anything about the encoding
characteristics.

Such an implementation would have to be done per codec
and that's why we have per codec StreamReader/Writer
APIs.

 TextIOWrapper and StreamReaderWriter are merely wrappers
 around streams that make use of the codecs. They don't
 provide any codec logic themselves. That's the conceptual
 difference.
 ...
 StreamReader and StreamWriters ... work efficiently and
 directly on streams rather than buffers.
 
 StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
 have a file-like API: tell(), seek(), read(),  readline(), write(), etc.
 The implementation is maybe different, but the API is just the same, and
 so the usecases are just the same.
 
 I don't see in which case I should use StreamReader or StreamWriter
 instead of TextIOWrapper. I thought that TextIOWrapper was specific to
 files on disk, but TextIOWrapper is already used for other purposes like
 sockets.

I have no idea why TextIOWrapper was added to the stdlib
instead of making StreamReaderWriter more capable,
since StreamReaderWriter had already been available in Python
since Python 1.6 (and this is being used by codecs.open()).

Perhaps we should deprecate TextIOWrapper instead and
replace it with codecs.StreamReaderWriter ? ;-)

Seriously, I don't see use of TextIOWrapper as an argument
for removing StreamReader/Writer parts of the codecs API.

 Here's my reply from the ticket regarding using incremental
 encoders/decoders for the StreamReader/Writer parts of the
 codec set of APIs:

 
 The point about having them use incremental codecs for encoding and
 decoding is a good one and would need to be investigated. If possible,
 we could use incremental encoders/decoders for the standard
 StreamReader/Writer base classes or add new
 IncrementalStreamReader/Writer classes which then use the
 IncrementalEncoder/Decoder by default.
 
 Why do you want to write a duplicate feature? TextIOWrapper is already
 here, it's working and widely used.

See above and please also try to understand why we have per-codec
implementations for streams. I'm tired of repeating myself.

I would much prefer to see the codec-specific functionality
in TextIOWrapper added back to the codecs where it
belongs.

 I am working on codec issues (like CJK encodings, see #12100, #12057,
 #12016) and I would like to remove StreamReader and StreamWriter to have
 *less* code to maintain.

 If you want to add more code, will you be available to maintain it? It
 looks like you are busy, some people (not me ;-)) are still
 waiting for .transform()/.untransform()!

I dropped the ball on the idea after the strong wave of
comments against those methods. People will simply have
to use codecs.encode() and codecs.decode().

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Nick Coghlan
On Tue, May 24, 2011 at 6:58 PM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 StreamReader, StreamWriter, TextIOWrapper and StreamReaderWriter all
 have a file-like API: tell(), seek(), read(),  readline(), write(), etc.
 The implementation is maybe different, but the API is just the same, and
 so the usecases are just the same.

 I don't see in which case I should use StreamReader or StreamWriter
 instead of TextIOWrapper. I thought that TextIOWrapper was specific to
 files on disk, but TextIOWrapper is already used for other purposes like
 sockets.

Back up a step here. It's important to remember that the codecs module
*long* predates the existence of the Python 3 I/O model and the io
module in particular.

Just as PEP 302 defines how module importers should be written, PEP
100 defines how text codecs should be written (i.e. in terms of
StreamReader and StreamWriter).

PEP 3116 then defines how such codecs can be used as part of the
overall I/O stack as redesigned for Python 3.

Now, there may be an opportunity here to rationalise things a bit and
re-use the *new* io module interfaces as the basis for an updated
codec API PEP, but we shouldn't be hasty in deprecating an old API
that is about how to write codecs just because it is similar to a
shiny new one that is about how to process I/O data.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Walter Dörwald
On 24.05.11 02:08, Victor Stinner wrote:

 [...]
 codecs.open() and StreamReader, StreamWriter and StreamReaderWriter
 classes of the codecs module don't support universal newlines, still
 have some issues with stateful codecs (like UTF-16/32 BOMs), and each
 codec has to implement a StreamReader and a StreamWriter class.
 
 StreamReader and StreamWriter are stateless codecs (no reset() or
 setstate() method),

They *are* stateful, they just don't expose their state to the public.

 and so it's not possible to write a generic fix for
 all child classes in the codecs module. Each stateful codec has to
 handle special cases like seek() problems.

Yes, which in theory makes it possible to implement shortcuts for
certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
character position by 4 to get the byte position). However AFAICR none
of the readers/writers does that.
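
A sketch of the shortcut for the BOM-less UTF-32-LE codec (the helper name is made up; as said above, no stdlib reader or writer actually does this):

```python
text = "na\u00efve \U0001F600 text"
raw = text.encode("utf-32-le")  # fixed width: 4 bytes per code point, no BOM


def byte_offset(char_index):
    """Byte position of the character at `char_index` in UTF-32-LE/BE data."""
    return 4 * char_index


# Every character can be located without decoding what comes before it.
located = [
    raw[byte_offset(i):byte_offset(i) + 4].decode("utf-32-le")
    for i in range(len(text))
]
```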

 For example, UTF-16 codec
 duplicates some IncrementalEncoder/IncrementalDecoder code into its
 StreamWriter/StreamReader class.

Actually it's the other way round: When I implemented the incremental
codecs, I copied code from the StreamReader/StreamWriter classes.

 The io module is well tested, supports non-seekable streams, handles
 correctly corner-cases (like UTF-16/32 BOMs) and supports any kind of
 newlines including an universal newline mode. TextIOWrapper reuses
 incremental encoders and decoders, so BOM issues were fixed only once,
 in TextIOWrapper.
 
 It's trivial to replace a call to codecs.open() by a call to open(),
 because the two APIs are very close. The main difference is that
 codecs.open() doesn't support universal newlines, so you have to use
 open(..., newline='') to keep the same behaviour (keep newlines
 unchanged). This task can be done by 2to3. But I suppose that most
 people will be happy with the universal newline mode.
 
 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.

This could be partially fixed by implementing generic
StreamReader/StreamWriter classes that reuse the incremental codecs, but
I don't think that's worth it.
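
For illustration only, a generic reader of the kind mentioned above might look roughly like this. The class name `IncrementalStreamReader` and its `chunk_size` parameter are assumptions for the sketch; only `codecs.getincrementaldecoder()` is the real stdlib API. It is not how the codecs module is implemented.

```python
import codecs
import io

class IncrementalStreamReader:
    """Hypothetical generic reader built on a codec's incremental decoder."""

    def __init__(self, stream, encoding, errors="strict", chunk_size=4096):
        self.stream = stream
        # One decoder instance carries all codec state (BOM seen, partial bytes).
        self.decoder = codecs.getincrementaldecoder(encoding)(errors)
        self.chunk_size = chunk_size

    def read(self):
        """Decode everything that is left in the underlying byte stream."""
        pieces = []
        while True:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                # final=True flushes the decoder and reports truncated input.
                pieces.append(self.decoder.decode(b"", final=True))
                break
            pieces.append(self.decoder.decode(chunk))
        return "".join(pieces)

    def reset(self):
        """Forget any buffered state, e.g. after seeking the byte stream."""
        self.decoder.reset()

data = "h\u00e9llo w\u00f6rld".encode("utf-16")   # starts with a BOM
# A chunk size of 3 deliberately splits 2-byte code units across reads.
reader = IncrementalStreamReader(io.BytesIO(data), "utf-16", chunk_size=3)
decoded = reader.read()
```

Because the state lives in the decoder object, no per-codec subclass is needed, which is essentially what io.TextIOWrapper already does.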

 [...] 

Servus,
   Walter


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 20:25:11 +1000
Nick Coghlan ncogh...@gmail.com wrote:
 
 Just as PEP 302 defines how module importers should be written, PEP
 100 defines how text codecs should be written (i.e. in terms of
 StreamReader and StreamWriter).
 
 PEP 3116 then defines how such codecs can be used as part of the
 overall I/O stack as redesigned for Python 3.

The I/O stack doesn't use StreamReader and StreamWriter. That's the
whole point. Stream* have been made useless by the new I/O stack.

 Now, there may be an opportunity here to rationalise things a bit and
 re-use the *new* io module interfaces as the basis for an updated
 codec API PEP, but we shouldn't be hasty in deprecating an old API
 that is about how to write codecs just because it is similar to a
 shiny new one that is about how to process I/O data.

OK, can you explain the difference to us, concretely?

Thanks

Antoine.




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Łukasz Langa

On 2011-05-24, at 12:16, Walter Dörwald wrote:

 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.
 
 This could be partially fixed by implementing generic
 StreamReader/StreamWriter classes that reuse the incremental codecs, but
 I don't think that's worth it.

Why not?

-- 
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday, 24 May 2011 at 08:16, Vinay Sajip wrote:
  I opened an issue for this idea. Brett and Marc-André Lemburg don't
  want to deprecate codecs.open() & friends because they want to be able
  to write code working on Python 2 and on Python 3 without any change. I
  don't think it's realistic: nontrivial programs require at least the six
  module, and most likely the 2to3 program. The six module can have its
  codecs.open function if codecs.open is removed from Python 3.4.
 
 What's non-trivial? Both pip and virtualenv (widely used programs) were
 ported to Python 3 using a single codebase for 2.x and 3.x, because it
 seemed to involve the least ongoing maintenance burden. Though these
 particular programs don't use codecs.open, I don't see much value in
 making it harder to write programs which can run under both 2.x and 3.x;
 that's not going to speed adoption of 3.x.

pip has a pip.backwardcompat module which is very similar to six. If
codecs.open() is deprecated or removed, it will be trivial to add a
wrapper for codecs.open() or open() to six and pip.backwardcompat.
virtualenv.py also starts with a thin compatibility layer.

But yes, each program using a compatibility layer/module will have to be
updated.
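
A rough sketch of such a compatibility wrapper, assuming a single code base that must run on both 2.x and 3.x. The name `open_text` is hypothetical; it is not taken from six or pip.backwardcompat.

```python
import codecs
import os
import sys
import tempfile

if sys.version_info[0] >= 3:
    def open_text(filename, mode="r", encoding="utf-8", errors="strict"):
        # Python 3: builtin open(); newline="" mirrors codecs.open(),
        # which leaves line endings untouched.
        return open(filename, mode, encoding=encoding, errors=errors,
                    newline="")
else:
    def open_text(filename, mode="r", encoding="utf-8", errors="strict"):
        # Python 2: codecs.open() is the Unicode-aware way to open a file.
        return codecs.open(filename, mode, encoding, errors)

path = os.path.join(tempfile.mkdtemp(), "compat.txt")
with open_text(path, "w") as f:
    f.write(u"caf\u00e9\r\n")
with open_text(path) as f:
    content = f.read()
```

Callers only ever see `open_text()`, so whether codecs.open() still exists becomes a detail of the compatibility module.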

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Antoine Pitrou
On Tue, 24 May 2011 12:16:49 +0200
Walter Dörwald wal...@livinglogic.de wrote:
 
  and so it's not possible to write a generic fix for
  all child classes in the codecs module. Each stateful codec has to
  handle special cases like seek() problems.
 
 Yes, which in theory makes it possible to implement shortcuts for
 certain codecs (e.g. the UTF-32-BE/LE codecs could simply multiply the
 character position by 4 to get the byte position). However AFAICR none
 of the readers/writers does that.

And in practice, TextIOWrapper.tell() does a similar optimization in
a generic way. I'm linking to the Python implementation for readability:
http://hg.python.org/cpython/file/5c716437a83a/Lib/_pyio.py#l1741

TextIOWrapper.seek() is straightforward due to the structure of the
integer cookie returned by TextIOWrapper.tell().
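
A small sketch of how the cookie is used from the caller's side, assuming UTF-16 data in an in-memory stream. The cookie is treated as opaque: its numeric value is an implementation detail and is deliberately not inspected here.

```python
import io

raw = io.BytesIO("h\u00e9llo\nw\u00f6rld\n".encode("utf-16"))
text = io.TextIOWrapper(raw, encoding="utf-16", newline="")

first_line = text.readline()
cookie = text.tell()        # opaque integer, not a character index
rest = text.read()

text.seek(cookie)           # restores byte position and decoder state
rest_again = text.read()
```

Seeking to the cookie yields the same text as before, even though the stream starts with a BOM and the decoder is stateful.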

In practice, TextIOWrapper gets much more love than
Stream{Reader,Writer} because it's an essential part of the new I/O
stack. As Victor said, problems which Stream* have had for years are
solved neatly in TextIOWrapper.

Therefore, leaving Stream{Reader,Writer} in is not a matter of choice
and freedom given to users. It's giving people the misleading
possibility of using non-optimized, poorly debugged, less featureful
implementations of the same basic idea (a Unicode stream abstraction).

Regards

Antoine.




Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Victor Stinner
On Tuesday, 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
 On 2011-05-24, at 12:16, Walter Dörwald wrote:
 
  I don't see which usecase is not covered by TextIOWrapper. But I know
  some cases which are not supported by StreamReader/StreamWriter.
  
  This could be partially fixed by implementing generic
  StreamReader/StreamWriter classes that reuse the incremental codecs, but
  I don't think that's worth it.
 
 Why not?

We already have an implementation of this idea; it is called
io.TextIOWrapper.

Victor



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Walter Dörwald
On 24.05.11 12:58, Victor Stinner wrote:
 On Tuesday, 24 May 2011 at 12:42 +0200, Łukasz Langa wrote:
 On 2011-05-24, at 12:16, Walter Dörwald wrote:

 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.

 This could be partially fixed by implementing generic
 StreamReader/StreamWriter classes that reuse the incremental codecs, but
 I don't think that's worth it.

 Why not?
 
 We already have an implementation of this idea; it is called
 io.TextIOWrapper.

Exactly.

From another post by Victor:

 As I wrote, codecs.open() is useful in Python 2. But I don't know any
 program or library directly using StreamReader or StreamWriter.

So: implementing this is a lot of work, duplicates existing
functionality and is mostly unused.

Servus,
   Walter






Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Terry Reedy

On 5/24/2011 6:14 AM, M.-A. Lemburg wrote:


 I have no idea why TextIOWrapper was added to the stdlib
 instead of making StreamReaderWriter more capable,
 since StreamReaderWriter had already been available in Python
 since Python 1.6 (and this is being used by codecs.open()).


As I understand it, you (and others) wrote codecs long ago and recently 
other people wrote the new i/o stack, which sometimes uses codecs, and 
when they needed to add a few details, they 'naturally' added them to 
the module they were working on and understood (and planned to rewrite 
in C) rather than to the older module that they maybe did not completely 
understand and which is only in Python.


Then Victor comes along to do maintenance on some of the Asian codecs and
discovers that he needs to make changes in two (or more?) places rather
than one, which he naturally finds unsatisfactory.



 Perhaps we should deprecate TextIOWrapper instead and
 replace it with codecs.StreamReaderWriter ? ;-)


I think we should separate two issues: removing internal implementation 
duplication and removing external api duplication. I should think that 
the former should not be too controversial. The latter, I know, is more 
contentious. One problem is that stdlib changes that perhaps 'should' 
have been made in 3.0/1 could not be discovered until the moratorium and 
greater focus on the stdlib.


--
Terry Jan Reedy



Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-24 Thread Martin (gzlist)
On 24/05/2011, Victor Stinner victor.stin...@haypocalc.com wrote:

 In Python 2, codecs.open() is the best way to read and/or write files
 using Unicode. But in Python 3, open() is preferred with its fast io
 module. I would like to deprecate codecs.open() because it can be
 replaced by open() and io.TextIOWrapper. I would like your opinion and
 that's why I'm writing this email.

There are some modules that try to stay compatible with Python 2 and 3
without a source translation step. Removing the codecs classes would
mean they'd have to add a few more compatibility hacks, but it could be
done.

As an aside, I'm still not sure how the io module should be used.
For example, a simple task I've used StreamWriter classes for is wrapping
stdout. If stdout.encoding can't represent a character, using the
'replace' error handler means you can write any unicode string without
raising a UnicodeEncodeError.

With the io module, it seems you need to construct a new TextIOWrapper
object, passing the attributes of the old one as parameters, and as
soon as someone passes something that's not a TextIOWrapper (say, a
StringIO object) your code breaks. Is the intention that code dealing
with streams needs to be covered in isinstance checks in Python 3?
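
One possible shape for this, sketched under assumptions: the helper name `with_replace_errors` is made up, and it uses duck typing on the `buffer` attribute rather than an isinstance check. A BytesIO-backed wrapper stands in for stdout so the result can be inspected.

```python
import io

def with_replace_errors(stream):
    """Hypothetical helper: a text stream that never raises UnicodeEncodeError."""
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # e.g. io.StringIO: there is no byte layer, so nothing can fail to encode.
        return stream
    stream.flush()
    # Caveat: both wrappers now share one buffer, and whichever is closed or
    # garbage-collected first closes it; keep the original object alive.
    return io.TextIOWrapper(
        buffer,
        encoding=stream.encoding,
        errors="replace",
        line_buffering=stream.line_buffering,
    )

raw = io.BytesIO()
ascii_out = io.TextIOWrapper(raw, encoding="ascii")   # stand-in for sys.stdout
safe_out = with_replace_errors(ascii_out)
safe_out.write("snowman: \u2603")
safe_out.flush()
written = raw.getvalue()

string_out = io.StringIO()
same_object = with_replace_errors(string_out) is string_out
```

Later Python versions (3.7 and up) added `TextIOWrapper.reconfigure(errors=...)`, which changes the error handler in place and avoids the second wrapper.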

Martin


Re: [Python-Dev] Deprecate codecs.open() and StreamWriter/StreamReader

2011-05-23 Thread Nick Coghlan
On Tue, May 24, 2011 at 10:08 AM, Victor Stinner
victor.stin...@haypocalc.com wrote:
 It's trivial to replace a call to codecs.open() by a call to open(),
 because the two APIs are very close. The main difference is that
 codecs.open() doesn't support universal newlines, so you have to use
 open(..., newline='') to keep the same behaviour (keep newlines
 unchanged). This task can be done by 2to3. But I suppose that most
 people will be happy with the universal newline mode.

Is there any reason that codecs.open() can't become a thin wrapper
around builtin open in 3.3?
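
A sketch of what such a thin wrapper could look like, under stated assumptions: the name `codecs_style_open` and the `buffering=-1` default are illustrative, and this is not the actual implementation of codecs.open(). It does not handle `buffering=0`, which the builtin rejects in text mode.

```python
import os
import tempfile

def codecs_style_open(filename, mode="r", encoding=None, errors="strict",
                      buffering=-1):
    """Hypothetical stand-in for codecs.open() built on the builtin open()."""
    if encoding is None:
        # Without an encoding, codecs.open() also just returns a plain file.
        return open(filename, mode, buffering)
    # codecs.open() never translates newlines, so ask open() not to either.
    text_mode = mode.replace("b", "")
    return open(filename, text_mode, buffering, encoding=encoding,
                errors=errors, newline="")

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with codecs_style_open(path, "w", encoding="utf-8") as f:
    f.write("a\r\nb\n")
with codecs_style_open(path, "r", encoding="utf-8") as f:
    data = f.read()
```

The object returned is a TextIOWrapper rather than a StreamReaderWriter, so code that relies on attributes specific to the codecs classes would still notice the difference.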

 I don't see which usecase is not covered by TextIOWrapper. But I know
 some cases which are not supported by StreamReader/StreamWriter.

How API-compatible is TextIOWrapper with StreamReader/StreamWriter?
How hard would it be to change them to be adapters over the main IO
machinery rather than independent classes?

Rather than deprecating them, that seems like a more profitable
direction to take them.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia