[issue4757] reject unicode in zlib

2010-01-09 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> The patch was committed to py3k and 3.1. Thank you!

r76836 and r76838

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4757
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4757] reject unicode in zlib

2009-12-14 Thread flox

flox la...@yahoo.fr added the comment:

Definitely, zlib.compress should raise a TypeError (like bz2 does).

>>> import bz2, zlib
>>> bz2.compress('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument 1 must be bytes or buffer, not str
>>> zlib.compress('abc')
b'x\x9cKLJ\x06\x00\x02M\x01'

Can someone review the patch and merge it?

--
nosy: +flox
versions: +Python 3.2 -Python 3.0




[issue4757] reject unicode in zlib

2009-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The patch lacks a test that TypeError is raised on unicode input,
otherwise it's fine.

--




[issue4757] reject unicode in zlib

2009-12-14 Thread flox

flox la...@yahoo.fr added the comment:

Patch from haypo updated for r76830.

Additional tests for TypeError, and to check that bytearray objects are
accepted.
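
For readers without the diff at hand, tests of that kind might look roughly like the sketch below. The class and method names are hypothetical and this is not the code from the attached patch; it only assumes a Python 3 build in which zlib already rejects str.

```python
import unittest
import zlib


class ZlibInputTypeTest(unittest.TestCase):
    """Hypothetical test case, not taken from the attached patch."""

    def test_str_is_rejected(self):
        # Text must be encoded explicitly before it reaches zlib.
        with self.assertRaises(TypeError):
            zlib.compress('abc')
        with self.assertRaises(TypeError):
            zlib.decompress('abc')

    def test_bytearray_is_accepted(self):
        # bytearray is a bytes-like object, so both directions should work.
        data = bytearray(b'abc' * 10)
        compressed = zlib.compress(data)
        self.assertEqual(zlib.decompress(bytearray(compressed)), bytes(data))
```

Saved to a file, it can be run with "python -m unittest" followed by the file name.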

--
Added file: http://bugs.python.org/file1/issue4757_zlib_bytes.diff




[issue4757] reject unicode in zlib

2009-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The patch produces a number of errors in test_tarfile, test_distutils,
test_gzip and test_xmlrpc.

--




[issue4757] reject unicode in zlib

2009-12-14 Thread flox

flox la...@yahoo.fr added the comment:

Fixed.

And some bytearray tests improved in test_zlib.

--
Added file: http://bugs.python.org/file15556/issue4757_zlib_bytes_v2.diff




[issue4757] reject unicode in zlib

2009-12-14 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
assignee:  -> pitrou
resolution:  -> accepted




[issue4757] reject unicode in zlib

2009-12-14 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

The patch was committed to py3k and 3.1. Thank you!

--
resolution: accepted -> fixed
status: open -> closed




[issue4757] reject unicode in zlib

2009-12-14 Thread flox

Changes by flox la...@yahoo.fr:


Removed file: http://bugs.python.org/file1/issue4757_zlib_bytes.diff




[issue4757] reject unicode in zlib

2009-01-05 Thread Marc-Andre Lemburg

Marc-Andre Lemburg m...@egenix.com added the comment:

On 2009-01-04 23:51, STINNER Victor wrote:
> STINNER Victor victor.stin...@haypocalc.com added the comment:
>
>> The fact that Python 2.x also accepts Unicode ASCII strings
>> where strings are normally expected is intended to help with
>> the migration to Unicode
>
> I hate this behaviour. It doesn't help migration, it's the opposite!
> Sometimes it works (ASCII), and sometimes it fails (just one non-ASCII
> character). And then we will read "Unicode sucks!" because people don't
> understand the error.

Well, that's your opinion.

The feature was added to get people to work with Unicode at all, since
otherwise we would have had to do all the Unicode porting we're doing
now for Python 3 at the time Unicode was introduced - which was in
Python 1.6, eight years ago.

At the time the Python community was a lot smaller and there wasn't
all that much interest in Unicode anyway - the Unicode support I wrote
for Python 1.6 was partially financed by HP which needed it for an
application they had written in Python.

See the introduction in PEP 100 for the motivation behind the design
decisions:

http://www.python.org/dev/peps/pep-0100/

>> In Python 3.x, it's probably better to use bytes throughout the
>> API.
>
> I propose to reject unicode in Python 3.x and display a warning for Python
> 2.x. A warning to prepare the migration... not to Unicode, but to Python3 ;-)

Fair enough.




[issue4757] reject unicode in zlib

2009-01-05 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

>> I propose to reject unicode in Python 3.x and display a warning for
>> Python 2.x. A warning to prepare the migration... not to Unicode, but to
>> Python3 ;-)
>
> Fair enough.

The patch for Python 3.x is already attached to this issue. We might only 
apply this one and leave Python 2.x unchanged. Can someone review the patch?




[issue4757] reject unicode in zlib

2009-01-04 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> The fact that Python 2.x also accepts Unicode ASCII strings
> where strings are normally expected is intended to help with
> the migration to Unicode

I hate this behaviour. It doesn't help migration, it's the opposite! Sometimes
it works (ASCII), and sometimes it fails (just one non-ASCII character). And
then we will read "Unicode sucks!" because people don't understand the
error.
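
The works-sometimes pattern can be imitated in Python 3 syntax. This is only a sketch: implicit_ascii is a hypothetical helper, and it assumes that the implicit Python 2 conversion behaves like an explicit .encode('ascii'), which is what the 'ascii' codec error in the original report suggests.

```python
def implicit_ascii(text):
    """Hypothetical stand-in for Python 2's implicit unicode -> str coercion."""
    return text.encode('ascii')


# ASCII-only text converts silently, so the problem stays hidden.
print(implicit_ascii('abc'))

# A single non-ASCII character is enough to make the same call fail.
try:
    implicit_ascii('abc\xe9')
except UnicodeEncodeError as exc:
    print('failed:', exc)
```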

> In Python 3.x, it's probably better to use bytes throughout the
> API.

I propose to reject unicode in Python 3.x and display a warning for Python 
2.x. A warning to prepare the migration... not to Unicode, but to Python3 ;-)




[issue4757] reject unicode in zlib

2009-01-04 Thread Lukas Lueg

Lukas Lueg knabberknusperh...@yahoo.de added the comment:

The current behaviour may help the majority by letting them ignore the
issue, but it causes weird errors for others. We tell people that Python
distinguishes between text and data, but actually treat it all the same
through implicit encoding.

Modules that only operate on bytes should reject Unicode objects in
Python 3; it's a matter of 3 lines to display a warning in Python 2.
Modules that usually operate on text but have single functions
that operate on bytes should display a warning but not enforce explicit
encoding.

Also see #4821 and #4818, where unicode already gets rejected by the
openssl-driven classes but is silently accepted by the built-in ones.
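
A warn-but-accept check of that size could look like the sketch below. It is written in Python 3 syntax, with str standing in for Python 2's unicode type; compress_with_warning is a hypothetical wrapper, and the warning category and the ASCII fallback are assumptions rather than anything proposed in a patch.

```python
import warnings
import zlib


def compress_with_warning(data, level=zlib.Z_DEFAULT_COMPRESSION):
    """Hypothetical wrapper: warn on text input instead of rejecting it."""
    # The three-line check: detect text, warn, then fall back to ASCII.
    if isinstance(data, str):
        warnings.warn("zlib.compress() expects bytes; encode text explicitly",
                      DeprecationWarning, stacklevel=2)
        data = data.encode('ascii')
    return zlib.compress(data, level)
```

Bytes input passes through without a warning, so only callers that still hand in text are notified.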




[issue4757] reject unicode in zlib

2008-12-27 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

Python 2.x allows compressing any byte string (str) or ASCII unicode
string (unicode):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.compress('abc')
'x\x9cKLJ\x06\x00\x02M\x01'
>>> zlib.compress(u'abc')
'x\x9cKLJ\x06\x00\x02M\x01'
>>> zlib.compress(u'abc\xe9')
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' ...

I'm not sure that this behaviour was really wanted, because the
decompress operation is not symmetric (the result type is always byte
string):

$ python
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> import zlib
>>> zlib.decompress('x\x9cKLJ\x06\x00\x02M\x01')
'abc'

---

Python 3.0 accepts any string: bytes or characters. But decompress
always produces a bytes string:

$ ./python
Python 3.1a0 (py3k:67926M, Dec 26 2008, 23:59:07)
>>> import zlib
>>> zlib.compress(b'abc')
b'x\x9cKLJ\x06\x00\x02M\x01'
>>> zlib.compress('abc')
b'x\x9cKLJ\x06\x00\x02M\x01'
>>> zlib.compress('abc\xe9')
b'x\x9cKLJ\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.compress('abc\xe9'.encode('utf-8'))
b'x\x9cKLJ\xbc\x12\x00\x06\xca\x02\x93'
>>> zlib.decompress(b'x\x9cKLJ\xbc\x12\x00\x06\xca\x02\x93')
b'abc\xc3\xa9'

The strangest operation is the decompression of a unicode string:

$ ./python
>>> zlib.decompress('x\x9cKLJ\xbc\x12\x00\x06\xca\x02\x93')
...
zlib.error: Error -3 while decompressing data: incorrect header check

---

I propose changing the zlib API to reject unicode strings and to require
explicit conversion to/from bytes. Functions/methods:
 - compress(bytes, ...)
 - decompress(bytes, ...)
 - compress object.compress(bytes, ...)
 - decompress object.decompress(bytes, ...)
 - crc32(bytes, value=0)
 - adler32(bytes, value=1)

Note: binascii.crc32() already rejects unicode string.

The behaviour may be kept in Python 3.0.x and only changed in Python 3.1.
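
Under the proposed API, callers convert text themselves before calling any of the functions listed above. The following is a rough sketch of that usage, not taken from the patch; UTF-8 is an arbitrary choice made by the caller, and the variable names are illustrative only.

```python
import zlib

# Explicit text -> bytes conversion; the codec is the caller's choice.
payload = 'abc\xe9'.encode('utf-8')

# One-shot functions take bytes and return bytes.
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed).decode('utf-8')

# The compress and decompress objects follow the same rule.
compressor = zlib.compressobj()
streamed = compressor.compress(payload) + compressor.flush()
decompressor = zlib.decompressobj()
streamed_back = decompressor.decompress(streamed) + decompressor.flush()

# Checksums take bytes plus an optional starting value.
crc = zlib.crc32(payload, 0)
adler = zlib.adler32(payload, 1)

print(restored == 'abc\xe9', streamed_back == payload)
```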

--
components: Extension Modules
files: zlib_bytes.patch
keywords: patch
messages: 78356
nosy: haypo
severity: normal
status: open
title: reject unicode in zlib
type: behavior
versions: Python 3.0, Python 3.1
Added file: http://bugs.python.org/file12472/zlib_bytes.patch




[issue4757] reject unicode in zlib

2008-12-27 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

See also issue #4738 (better threads support in zlib).

--
nosy: +ebfe, pitrou




[issue4757] reject unicode in zlib

2008-12-27 Thread ebfe

ebfe knabberknusperh...@yahoo.de added the comment:

I don't think Python 2.x should be changed - but 3.0 or 3.1 should be:

 - Characters don't mean a thing in zlib-land; all operations are based
on bytes and their (implicit) default encoding. This behaviour is hidden
and somewhat violates the rule of least surprise.
 - type(zlib.decompress(zlib.compress('abc'))) == bytes anyway
 - Changing from s* to y* forces the programmer to use .encode() on their
strings (e.g. zlib.compress('abc'.encode())), which very clearly shows
what's happening. If you want to compress and decompress Python 3
strings, both sides *must* share the same character encoding; think of
zlib.compress('hôńè') and str(zlib.decompress(x)) with different locales.
 - Other modules (hashlib comes to mind...) already reject Unicode
objects for the same reason.
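
The encoding point can be shown with a short sketch. The two codecs are arbitrary examples standing in for "different locales", not something either side of the discussion prescribed.

```python
import zlib

text = 'hôńè'

# The sender picks a codec explicitly before compressing.
compressed = zlib.compress(text.encode('utf-8'))
raw = zlib.decompress(compressed)

# Decoding with the same codec restores the text.
same_codec = raw.decode('utf-8')

# Decoding with a different codec succeeds silently but yields other text.
other_codec = raw.decode('latin-1')

print(same_codec == text)
print(other_codec == text)
```

zlib itself only ever sees and returns bytes here; whether the round trip restores the text depends entirely on both sides agreeing on the codec.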
