[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Jason Orendorff
Instead of byte literals, how about a classmethod bytes.from_hex(), which works like this:

  # two equivalent things
  expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

  expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227, 131, 79, 229, 201, 46, 106])

It's just a nicety; the former fits my brain a little better.  This would work fine both in 2.5 and in 3.0.

I thought about unicode.encode('hex'), but obviously it will continue
to return a str in 2.x, not bytes.  Also the pseudo-encodings
('hex', 'rot13', 'zip', 'uu', etc.) generally scare me.  And now
that bytes and text are going to be two very different types, they're
even weirder than before.  Consider:

  text.encode('utf-8') ==> bytes
  text.encode('rot13') ==> text
  bytes.encode('zip') ==> bytes
  bytes.encode('uu') ==> text (?)

This state of affairs seems kind of crazy to me.
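
For concreteness, the 2.x behaviour underlying that weirdness looks roughly
like this (an illustrative interactive session, with the return types noted
in comments):

  >>> u'abc'.encode('utf-8')     # unicode -> str: a real character encoding
  'abc'
  >>> 'abc'.encode('base64')     # str -> str: a bytes transform
  'YWJj\n'
  >>> '616263'.decode('hex')     # str -> str: another bytes transform
  'abc'
  >>> 'abc'.decode('utf-8')      # str -> unicode
  u'abc'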

Actually users trying to figure out Unicode would probably be better served if bytes.encode() and text.decode() did not exist.

-j



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Guido van Rossum
On 2/15/06, Jason Orendorff <[EMAIL PROTECTED]> wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
>
># two equivalent things
>expected_md5_hash =
> bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
> 131, 79, 229, 201, 46, 106])
>
>  It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.

Yes, this looks nice.

>  I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me.  And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
>
>text.encode('utf-8') ==> bytes
>text.encode('rot13') ==> text
>bytes.encode('zip') ==> bytes
>bytes.encode('uu') ==> text (?)
>
>  This state of affairs seems kind of crazy to me.
>
>  Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

Yeah, the pseudogeneralizations seem to be a mistake -- they are
almost universally frowned upon. I'll happily send them to their
grave in Py3k.

It would be better if the signature of text.encode() always returned a
bytes object. But why deny the bytes object a decode() method if text
objects have an encode() method?

I'd say there are two "symmetric" API flavors possible (t and b are
text and bytes objects, respectively, where text is a string type,
either str or unicode; enc is an encoding name):

- b.decode(enc) -> t; t.encode(enc) -> b
- b = bytes(t, enc); t = text(b, enc)

I'm not sure why one flavor would be preferred over the other,
although having both would probably be a mistake.
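
Spelled out, the two flavors would look something like this (purely a
sketch of the hypothetical Py3k API; none of these names exist yet):

  # flavor 1: paired methods
  b = t.encode('utf-8')     # text -> bytes
  t = b.decode('utf-8')     # bytes -> text

  # flavor 2: constructors taking an explicit encoding
  b = bytes(t, 'utf-8')
  t = text(b, 'utf-8')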

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Thomas Heller
Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
> 
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

I hope this will also be equivalent:
  expected_md5_hash = bytes.from_hex('5c 53 50 24 ca c5 19 91 53 e3 83 4f e5 c9 2e 6a')

Thomas



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Martin v. Löwis
Jason Orendorff wrote:
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')

This looks good, although it duplicates

expected_md5_hash = binascii.unhexlify('5c535024cac5199153e3834fe5c92e6a')
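
(For reference, the existing spelling already yields an 8-bit str today;
a quick 2.x session:)

>>> import binascii
>>> binascii.unhexlify('5c535024cac5199153e3834fe5c92e6a')
'\\SP$\xca\xc5\x19\x91S\xe3\x83O\xe5\xc9.j'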

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread M.-A. Lemburg
Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(), which
> works like this:
> 
>   # two equivalent things
>   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
> 131, 79, 229, 201, 46, 106])
> 
> It's just a nicety; the former fits my brain a little better.  This would
> work fine both in 2.5 and in 3.0.
> 
> I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me. 

Those are not pseudo-encodings, they are regular codecs.

It's a common misunderstanding to see codecs as serving only
the purpose of converting between Unicode and strings.

The codec system is deliberately designed to be general enough
to also work with many other types, e.g. it is easily possible to
write a codec that converts between the hex literal string you
have above and a list of ordinals:

""" Hex string codec

Converts between a list of ordinals and a two byte hex literal
string.

Usage:
>>> codecs.encode([1,2,3], 'hexstring')
'010203'
>>> codecs.decode(_, 'hexstring')
[1, 2, 3]

(c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self, input, errors='strict'):

        """ Convert hex literal string to hex ordinal list.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if not size % 2 == 0:
            raise TypeError('input string has uneven length')
        return (
            [int(input[(i<<1):(i<<1)+2], 16)
             for i in range(size >> 1)],
            size)

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

def getregentry():
    return (Codec().encode, Codec().decode, StreamReader, StreamWriter)
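
For completeness, hooking it up and using it would look roughly like this
(a sketch; the module name 'hexstring_codec' and the search function are
only illustrative and not part of any proposal):

import codecs
import hexstring_codec   # hypothetical module holding the Codec above

def search(name):
    # Only answer for the one codec name we know about.
    if name == 'hexstring':
        return hexstring_codec.getregentry()
    return None

codecs.register(search)

print codecs.encode([0x5c, 0x53, 0x50, 0x24], 'hexstring')  # -> '5c535024'
print codecs.decode('5c535024', 'hexstring')                # -> [92, 83, 80, 36]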

> And now that bytes and text are
> going to be two very different types, they're even weirder than before.
> Consider:
> 
>   text.encode('utf-8') ==> bytes
>   text.encode('rot13') ==> text
>   bytes.encode('zip') ==> bytes
>   bytes.encode('uu') ==> text (?)
> 
> This state of affairs seems kind of crazy to me.

Really ?

It all depends on what you use the codecs for. Access
through the .encode() and .decode() methods is not the
only way you can make use of them.

To get full access to the codecs, you'll have to use
the codecs module.

> Actually users trying to figure out Unicode would probably be better served
> if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods
are merely interfaces to the registered codecs. Whether they
make sense for a certain codec depends on the codec, not the
methods that interface to it, and again, codecs do not
only exist to convert between Unicode and strings.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Barry Warsaw
On Wed, 2006-02-15 at 14:01 -0500, Jason Orendorff wrote:
> Instead of byte literals, how about a classmethod bytes.from_hex(),
> which works like this:
> 
>   # two equivalent things
>   expected_md5_hash =
> bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
>   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83,
> 227, 131, 79, 229, 201, 46, 106])

Kind of like binascii.unhexlify() but returning a bytes object.

-Barry





Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Guido van Rossum
On 2/15/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Jason Orendorff wrote:
> > Also the pseudo-encodings ('hex', 'rot13',
> > 'zip', 'uu', etc.) generally scare me.
>
> Those are not pseudo-encodings, they are regular codecs.
>
> It's a common misunderstanding that codecs are only seen as serving
> the purpose of converting between Unicode and strings.
>
> The codec system is deliberately designed to be general enough
> to also work with many other types, e.g. it is easily possible to
> write a codec that convert between the hex literal sequence you
> have above to a list of ordinals:

It's fine that the codec system supports this. However it's
questionable that these encodings are invoked using the standard
encode() and decode() APIs; and it will be more questionable once
encode() returns a bytes object. Methods that return different types
depending on the value of an argument are generally a bad idea. (Hence
the movement to have separate opentext and openbinary or openbytes
functions.)
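
A concrete 2.x illustration of that value-dependence (a sketch):

>>> '616263'.decode('hex')      # returns a str
'abc'
>>> '616263'.decode('utf-8')    # same method, different argument value: unicode
u'616263'
>>> type('616263'.decode('hex')), type('616263'.decode('utf-8'))
(<type 'str'>, <type 'unicode'>)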

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Greg Ewing
Jason Orendorff wrote:

> Also the pseudo-encodings ('hex', 
> 'rot13', 'zip', 'uu', etc.) generally scare me.

I think these will have to cease being implemented as
encodings in 3.0. They should really never have been
in the first place.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread Josiah Carlson

Greg Ewing <[EMAIL PROTECTED]> wrote:
> Jason Orendorff wrote:
> 
> > Also the pseudo-encodings ('hex', 
> > 'rot13', 'zip', 'uu', etc.) generally scare me.
> 
> I think these will have to cease being implemented as
> encodings in 3.0. They should really never have been
> in the first place.

I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
and likely a few others that the two of you may be arguing against
should stay as encodings, because strictly speaking, they are defined as
encodings of data.  They may not be encodings of _unicode_ data, but
that doesn't mean that they aren't useful encodings for other kinds of
data, some text, some binary, ...

 - Josiah



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-16 Thread Greg Ewing
Josiah Carlson wrote:

> They may not be encodings of _unicode_ data,

But if they're not encodings of unicode data, what
business do they have being available through
someunicodestring.encode(...)?

Greg


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-16 Thread Josiah Carlson

Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> 
> > They may not be encodings of _unicode_ data,
> 
> But if they're not encodings of unicode data, what
> business do they have being available through
> someunicodestring.encode(...)?

I had always presumed that bytes objects are going to be able to be a
source for encode AND decode, like current non-unicode strings are able
to be today.  In that sense, if I have a bytes object which is an
encoding of rot13, hex, uu, etc., or I have a bytes object which I would
like to be in one of those encodings, I should be able to do b.encode(...)
or b.decode(...), given that 'b' is a bytes object.

Are 'encodings' going to become a mechanism to encode and decode
_unicode_ strings, rather than a mechanism to encode and decode _text
and data_ strings?  That would seem like a backwards step to me, as the
email package would need to provide its own base-64 encode/decode API
and implementation, and similarly for any other package which uses any
one of the encodings already available.

 - Josiah



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Martin v. Löwis
Josiah Carlson wrote:
> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
> and likely a few others that the two of you may be arguing against
> should stay as encodings, because strictly speaking, they are defined as
> encodings of data.  They may not be encodings of _unicode_ data, but
> that doesn't mean that they aren't useful encodings for other kinds of
> data, some text, some binary, ...

To support them, the bytes type would have to gain a .encode method,
and I'm -1 on supporting bytes.encode, or string.decode.

Why is

s.encode("uu")

any better than

binascii.b2a_uu(s)

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Jason Orendorff
On 2/15/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > Actually users trying to figure out Unicode would probably be better
> > served if bytes.encode() and text.decode() did not exist.
> [...]
> It would be better if the signature of text.encode() always returned a
> bytes object. But why deny the bytes object a decode() method if text
> objects have an encode() method?

I agree, text.encode() and bytes.decode() are both swell.  It's the
other two that bother me.

> I'd say there are two "symmetric" API flavors possible (t and b are
> text and bytes objects, respectively, where text is a string type,
> either str or unicode; enc is an encoding name):
>
> - b.decode(enc) -> t; t.encode(enc) -> b
> - b = bytes(t, enc); t = text(b, enc)
>
> I'm not sure why one flavor would be preferred over the other,
> although having both would probably be a mistake.

I prefer the constructor flavor; the word "bytes" feels more concrete than
"encode".  But I worry about constructors being too overloaded.

>>> text(b, enc)  # decode
>>> text(mydict)  # repr
>>> text(b)   # uh... decode with default encoding?

-j



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Bob Ippolito

On Feb 16, 2006, at 9:20 PM, Josiah Carlson wrote:

>
> Greg Ewing <[EMAIL PROTECTED]> wrote:
>>
>> Josiah Carlson wrote:
>>
>>> They may not be encodings of _unicode_ data,
>>
>> But if they're not encodings of unicode data, what
>> business do they have being available through
>> someunicodestring.encode(...)?
>
> I had always presumed that bytes objects are going to be able to be a
> source for encode AND decode, like current non-unicode strings are able
> to be today.  In that sense, if I have a bytes object which is an
> encoding of rot13, hex, uu, etc., or I have a bytes object which I would
> like to be in one of those encodings, I should be able to do b.encode(...)
> or b.decode(...), given that 'b' is a bytes object.
>
> Are 'encodings' going to become a mechanism to encode and decode
> _unicode_ strings, rather than a mechanism to encode and decode _text
> and data_ strings?  That would seem like a backwards step to me, as the
> email package would need to provide its own base-64 encode/decode API
> and implementation, and similarly for any other package which uses any
> one of the encodings already available.

It would be VERY useful to separate the two concepts.  bytes<->bytes
transforms should be one function pair, and bytes<->text transforms
should be another.  The current situation is totally insane:

str.decode(codec) -> str or unicode or UnicodeDecodeError or
ZlibError or TypeError... who knows what else
str.encode(codec) -> str or unicode or UnicodeDecodeError or
TypeError... probably other exceptions

Granted, unicode.encode(codec) and unicode.decode(codec) are actually
somewhat sane in that the return type is always a str and the
exceptions are either UnicodeEncodeError or UnicodeDecodeError.

I think that rot13 is the only conceptually text<->text transform
(though the current implementation is really bytes<->bytes);
everything else is either bytes<->text or bytes<->bytes.
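
In 2.x terms that looks roughly like this:

>>> u'abc'.encode('rot13')           # conceptually text<->text ...
'nop'
>>> type(u'abc'.encode('rot13'))     # ... but the codec hands back a str
<type 'str'>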

-bob



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> Josiah Carlson wrote:
>> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
>> and likely a few others that the two of you may be arguing against
>> should stay as encodings, because strictly speaking, they are defined as
>> encodings of data.  They may not be encodings of _unicode_ data, but
>> that doesn't mean that they aren't useful encodings for other kinds of
>> data, some text, some binary, ...
> 
> To support them, the bytes type would have to gain a .encode method,
> and I'm -1 on supporting bytes.encode, or string.decode.
> 
> Why is
> 
> s.encode("uu")
> 
> any better than
> 
> binascii.b2a_uu(s)

The .encode() and .decode() methods are merely convenience
interfaces to the registered codecs (with some extra logic to
make sure that only a pre-defined set of return types are allowed).
It's up to the user to use them for e.g. UU-encoding or not.

The reason we have codecs for UU, zip and the others is that
you can use their StreamWriters/Readers in stackable streams.
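
A small sketch of what that stacking looks like in 2.x (illustrative only;
a single write keeps the example simple):

import codecs
from StringIO import StringIO

raw = StringIO()
# Each writer encodes what is written to it and passes the result on to
# the stream below it: zlib-compress first, then base64-encode.
b64_writer = codecs.getwriter('base64_codec')(raw)
zip_writer = codecs.getwriter('zlib_codec')(b64_writer)

data = 'spam and eggs ' * 50
zip_writer.write(data)

packed = raw.getvalue()                               # base64 text of compressed data
print packed.decode('base64').decode('zlib') == data  # True: round-trips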

Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Bengt Richter
On Fri, 17 Feb 2006 00:33:49 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:

>Josiah Carlson wrote:
>> I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
>> and likely a few others that the two of you may be arguing against
>> should stay as encodings, because strictly speaking, they are defined as
>> encodings of data.  They may not be encodings of _unicode_ data, but
>> that doesn't mean that they aren't useful encodings for other kinds of
>> data, some text, some binary, ...
>
>To support them, the bytes type would have to gain a .encode method,
>and I'm -1 on supporting bytes.encode, or string.decode.
>
>Why is
>
>s.encode("uu")
>
>any better than
>
>binascii.b2a_uu(s)
>
One aspect is that dotted notation method calling is serially composable,
whereas function calls nest, and you have to find and read from the innermost,
which gets hard quickly unless you use multiline formatting. But even then
you can't read top to bottom as processing order.

If we had a general serial composition syntax for function calls
something like unix piping (which is a big part of the power of unix shells IMO)
we could make the choice of appropriate composition semantics better.

Decorators already compose functions in a limited way, but the processing
order would read like Forth, horizontally. Maybe '->' ? How about

foo(x, y) -> bar() -> baz(z)

as sugar for

baz.__get__(bar.__get__(foo(x, y))())(z)

? (Hope I got that right ;-)

I.e., you'd have self-like args to receive results from upstream. E.g.,

 >>> def foo(x, y): return 'foo(%s, %s)'%(x,y)
 ...
 >>> def bar(stream): return 'bar(%s)'%stream
 ...
 >>> def baz(stream, z): return 'baz(%s, %s)'%(stream,z)
 ...
 >>> x = 'ex'; y='wye'; z='zed'
 >>> baz.__get__(bar.__get__(foo(x, y))())(z)
 'baz(bar(foo(ex, wye)), zed)'

Regards,
Bengt Richter



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> Just because some codecs don't fit into the string.decode()
> or bytes.encode() scenario doesn't mean that these codecs are
> useless or that the methods should be banned.

No. The reason to ban string.decode and bytes.encode is that
it confuses users.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> M.-A. Lemburg wrote:
> > Just because some codecs don't fit into the string.decode()
> > or bytes.encode() scenario doesn't mean that these codecs are
> > useless or that the methods should be banned.
> 
> No. The reason to ban string.decode and bytes.encode is that
> it confuses users.

How are users confused?  bytes.encode CAN only produce bytes.  Though
string.decode (or bytes.decode) MAY produce strings (or bytes) or
unicode, depending on the codec, I think it is quite reasonable to
expect that users will understand that string.decode('utf-8') is
different than string.decode('base-64'), and that they may produce
different output.  In a similar fashion, dict.get(1) may produce
different results than dict.get(2) for some dictionaries.  If some users
can't understand this (passing different arguments to a function may
produce different output), then I think that some users are broken
beyond repair.

 - Josiah



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Bengt Richter
On Fri, 17 Feb 2006 21:35:25 +0100, "Martin v. Löwis"
<[EMAIL PROTECTED]> wrote:

>M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
>
>No. The reason to ban string.decode and bytes.encode is that
>it confuses users.
Well, that's because of semantic overloading (assuming you mean
string as characters and bytes as binary bytes).

The trouble is that encoding and decoding have to have bytes to represent
the coded info, whichever direction they go. Characters per se aren't coded
info, so string.decode doesn't make sense without faking it with
string.encode().decode(), and bytes.encode() likewise first has to
have a hidden .decode to become a string that makes sense to encode.
And the hidden stuff restricts to ascii, for further grief :-(

So yes, please ban string.decode and bytes.encode.

And maybe introduce bytes.recode for bytes->bytes transforms?
(strings don't have any codes to recode).

Regards,
Bengt Richter



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Martin v. Löwis
Josiah Carlson wrote:
> How are users confused?

Users do

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
ordinal not in range(128)

because they want to convert the string "to Unicode", and they have
found a text telling them that .encode("utf-8") is a reasonable
method.

What it *should* tell them is

py> "Martin v. Löwis".encode("utf-8")
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'str' object has no attribute 'encode'

> bytes.encode CAN only produce bytes.

I don't understand MAL's design, but I believe in that design,
bytes.encode could produce anything (say, a list). A codec
can convert anything to anything else.

> If some users
> can't understand this (passing different arguments to a function may
> produce different output),

It's worse than that. The return *type* depends on the *value* of
the argument. I think there is little precedent for that: normally,
the return values depend on the argument values, and, in a polymorphic
function, the return type might depend on the argument types (e.g.
the arithmetic operations). Also, the return type may depend on the
number of arguments (e.g. by requesting a return type in a keyword
argument).

> then I think that some users are broken beyond repair.

Hmm. I'm speechless.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Ian Bicking
Martin v. Löwis wrote:
> Users do
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> 
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.
> 
> What it *should* tell them is
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I think it would be even better if they got "ValueError: utf8 can only 
encode unicode objects".  AttributeError is not much more clear than the 
UnicodeDecodeError.

That str.encode(unicode_encoding) implicitly decodes strings seems like 
a flaw in the unicode encodings, quite separate from the existence of 
str.encode.  I for one really like s.encode('zlib').encode('base64') -- 
and if the zlib encoding raised an error when it was passed a unicode 
object (instead of implicitly encoding the string with the ascii 
encoding) that would be fine.
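
For what it's worth, the round trip reads quite naturally today -- a
small 2.x sketch:

>>> s = 'spam and eggs ' * 20
>>> packed = s.encode('zlib').encode('base64')   # str -> str -> str
>>> packed.decode('base64').decode('zlib') == s
True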

The pipe-like nature of .encode and .decode works very nicely for 
certain transformations, applicable to both unicode and byte objects. 
Let's not throw the baby out with the bath water.


-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Josiah Carlson

"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> 
> Josiah Carlson wrote:
> > How are users confused?
> 
> Users do
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> 
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.

Removing functionality because some users read bad instructions
somewhere is a bit like kicking your kitten because your puppy peed on
the floor.  You are punishing the wrong group, for something that
shouldn't result in punishment: it should result in education.

Users are always going to get bad instructions, and removing utility
because some users fail to think before they act, or complain when their
lack of thinking doesn't work, will give us a language where we are
removing features because *new* users have no idea what they are doing.


> What it *should* tell them is
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I disagree.  I think the original error was correct, and we should be
educating users to prefix their literals with a 'u' if they want unicode,
or they should get their data from a unicode source (wxPython with
unicode, StreamReader, etc.)


> > bytes.encode CAN only produce bytes.
> 
> I don't understand MAL's design, but I believe in that design,
> bytes.encode could produce anything (say, a list). A codec
> can convert anything to anything else.

That seems to me to be a little overkill...

In any case, I personally find data.encode('base-64') and
edata.decode('base-64') to be more convenient than binascii.b2a_base64(data)
and binascii.a2b_base64(edata).  Ditto for hexlify/unhexlify, etc.
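
For short chunks the two spellings even produce identical output -- a quick
2.x sketch (note that binascii.b2a_base64() handles one chunk at a time,
while the codec goes through base64.encodestring()):

>>> import binascii
>>> data = 'Python-Dev'
>>> data.encode('base-64')
'UHl0aG9uLURldg==\n'
>>> binascii.b2a_base64(data)
'UHl0aG9uLURldg==\n'
>>> binascii.a2b_base64(data.encode('base-64')) == data
True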


> > If some users
> > can't understand this (passing different arguments to a function may
> > produce different output),
> 
> It's worse than that. The return *type* depends on the *value* of
> the argument. I think there is little precedent for that: normally,
> the return values depend on the argument values, and, in a polymorphic
> function, the return type might depend on the argument types (e.g.
> the arithmetic operations). Also, the return type may depend on the
> number of arguments (e.g. by requesting a return type in a keyword
> argument).

You need only look at dictionaries, where different values passed into
a function call may very well return results of different types, yet
there has never been any restriction mapping a given dictionary to and
from a single type.

Many dict-like interfaces for configuration files do this, things like
config.get('remote_host') and config.get('autoconnect') not being
uncommon.


 - Josiah



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Martin v. Löwis
Ian Bicking wrote:
> That str.encode(unicode_encoding) implicitly decodes strings seems like
> a flaw in the unicode encodings, quite separate from the existence of
> str.encode.  I for one really like s.encode('zlib').encode('base64') --
> and if the zlib encoding raised an error when it was passed a unicode
> object (instead of implicitly encoding the string with the ascii
> encoding) that would be fine.
> 
> The pipe-like nature of .encode and .decode works very nicely for
> certain transformations, applicable to both unicode and byte objects.
> Let's not throw the baby out with the bath water.

The way you use it, it's a matter of notation only: why
is

zlib(base64(s))

any worse? I think it's better: it doesn't use string literals to
denote function names.

If there is a point to this overgeneralized codec idea, it is
the streaming aspect: that you don't need to process all data
at once, but can feed data sequentially. Of course, you are
not using this in your example.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Ian Bicking
Josiah Carlson wrote:
>>>If some users
>>>can't understand this (passing different arguments to a function may
>>>produce different output),
>>
>>It's worse than that. The return *type* depends on the *value* of
>>the argument. I think there is little precedent for that: normally,
>>the return values depend on the argument values, and, in a polymorphic
>>function, the return type might depend on the argument types (e.g.
>>the arithmetic operations). Also, the return type may depend on the
>>number of arguments (e.g. by requesting a return type in a keyword
>>argument).
> 
> 
> You only need to look to dictionaries where different values passed into
> a function call may very well return results of different types, yet
> there have been no restrictions on mapping to and from single types per
> dictionary.
> 
> Many dict-like interfaces for configuration files do this, things like
> config.get('remote_host') and config.get('autoconnect') not being
> uncommon.

I think there is *some* justification, if you don't understand up front 
that the codec you refer to (using a string) is just a way of avoiding 
an import (thankfully -- dynamically importing unicode codecs is 
obviously infeasible).  Now, if you understand the argument refers to 
some algorithm, it's not so bad.

The other aspect is that there should be something consistent about the 
return types -- the Python type is not what we generally rely on, 
though.  In this case they are all "data".  Unicode and bytes are both 
data, and you could probably argue lists of ints is data too (but an 
arbitrary list definitely isn't data).  On the outer end of data might 
be an ElementTree structure (but that's getting fishy).  An open file 
object is not data.  A tuple probably isn't data.

-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Ian Bicking
Martin v. Löwis wrote:
> Ian Bicking wrote:
> 
>>That str.encode(unicode_encoding) implicitly decodes strings seems like
>>a flaw in the unicode encodings, quite separate from the existence of
>>str.encode.  I for one really like s.encode('zlib').encode('base64') --
>>and if the zlib encoding raised an error when it was passed a unicode
>>object (instead of implicitly encoding the string with the ascii
>>encoding) that would be fine.
>>
>>The pipe-like nature of .encode and .decode works very nicely for
>>certain transformations, applicable to both unicode and byte objects.
>>Let's not throw the baby out with the bath water.
> 
> 
> The way you use it, it's a matter of notation only: why
> is
> 
> zlib(base64(s))
> 
> any worse? I think it's better: it doesn't use string literals to
> denote function names.

Maybe it isn't worse, but the real alternative is:

   import zlib
   import base64

   base64.b64encode(zlib.compress(s))

Encodings cover up eclectic interfaces, where those interfaces fit a 
basic pattern -- data in, data out.

-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Martin v. Löwis
Ian Bicking wrote:
> Maybe it isn't worse, but the real alternative is:
> 
>   import zlib
>   import base64
> 
>   base64.b64encode(zlib.compress(s))
> 
> Encodings cover up eclectic interfaces, where those interfaces fit a
> basic pattern -- data in, data out.

So should I write

3.1415.encode("sin")

or would that be

3.1415.decode("sin")

What about

"http://www.python.org".decode("URL")

It's "data in, data out", after all. Who needs functions?

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Ian Bicking
Martin v. Löwis wrote:
>>Maybe it isn't worse, but the real alternative is:
>>
>>  import zlib
>>  import base64
>>
>>  base64.b64encode(zlib.compress(s))
>>
>>Encodings cover up eclectic interfaces, where those interfaces fit a
>>basic pattern -- data in, data out.
> 
> 
> So should I write
> 
> 3.1415.encode("sin")
> 
> or would that be
> 
> 3.1415.decode("sin")

The ambiguity shows that "sin" is clearly not an encoding.  Doesn't read 
right anyway.

[0.3, 0.35, ...].encode('fourier') would be sensible though.  Except of 
course lists don't have an encode method; but that's just a convenience 
of strings and unicode because those objects are always data, where 
lists are only sometimes data.  If extended indefinitely, the namespace 
issue is notable.  But it's not going to be extended indefinitely, so 
that's just a theoretical problem.

> What about
> 
> "http://www.python.org".decode("URL")

you mean 'a%20b'.decode('url') == 'a b'?  That's not what you meant, but 
nevertheless that would be an excellent encoding ;)


-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Bob Ippolito

On Feb 17, 2006, at 4:20 PM, Martin v. Löwis wrote:

> Ian Bicking wrote:
>> Maybe it isn't worse, but the real alternative is:
>>
>>   import zlib
>>   import base64
>>
>>   base64.b64encode(zlib.compress(s))
>>
>> Encodings cover up eclectic interfaces, where those interfaces fit a
>> basic pattern -- data in, data out.
>
> So should I write
>
> 3.1415.encode("sin")
>
> or would that be
>
> 3.1415.decode("sin")
>
> What about
>
> "http://www.python.org".decode("URL")
>
> It's "data in, data out", after all. Who needs functions?

Well, 3.1415.decode("sin") is of course NaN, because 3.1415.encode("sinh")
is not defined for numbers outside of [-1, 1] :)

-bob



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread Aahz
On Fri, Feb 17, 2006, "Martin v. L?wis" wrote:
> Josiah Carlson wrote:
>>
>> How are users confused?
> 
> Users do
> 
> py> "Martin v. L?wis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> 
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.

The problem is that they don't understand that "Martin v. L?wis" is not
Unicode -- once all strings are Unicode, this is guaranteed to work.
While it's not absolutely true, my experience of watching Unicode
confusion is that the simplest approach for newbies is: encode FROM
Unicode, decode TO Unicode.  Most people when they start playing with
Unicode think of it as just another text encoding rather than suddenly
replacing "the universe" as the most base form of text.
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
Aahz wrote:
> The problem is that they don't understand that "Martin v. L?wis" is not
> Unicode -- once all strings are Unicode, this is guaranteed to work.

This specific call, yes. I don't think the problem will go away as long
as both encode and decode are available for both strings and byte
arrays.

> While it's not absolutely true, my experience of watching Unicode
> confusion is that the simplest approach for newbies is: encode FROM
> Unicode, decode TO Unicode.

I think this is what should be in-grained into the library, also. It
shouldn't try to give additional meaning to these terms.

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
>> How are users confused?
> 
> Users do
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
> ordinal not in range(128)
> 
> because they want to convert the string "to Unicode", and they have
> found a text telling them that .encode("utf-8") is a reasonable
> method.
> 
> What it *should* tell them is
> 
> py> "Martin v. Löwis".encode("utf-8")
> Traceback (most recent call last):
>   File "", line 1, in ?
> AttributeError: 'str' object has no attribute 'encode'

I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode() methods *do* check the return
types of the codecs and only allow strings or Unicode
on return (no lists, instances, tuples or anything else).

You seem to ignore this fact.

If we were to follow your idea, we should remove .encode()
and .decode() altogether and refer users to the codecs.encode()
and codecs.decode() functions. However, I doubt that users
will like this idea.

>> bytes.encode CAN only produce bytes.
> 
> I don't understand MAL's design, but I believe in that design,
> bytes.encode could produce anything (say, a list). A codec
> can convert anything to anything else.

True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Thomas Wouters
On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:

> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() methods *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> 
> You seem to ignore this fact.

Actually, I think the problem is that while we all agree the
bytestring/unicode methods are a useful way to convert from bytestring to
unicode and back again, we disagree on their *general* usefulness. Sure, the
codecs mechanism is powerful, and even more so because codecs can determine
their own return type. But it still smells and feels like a Perl attitude,
for the reasons already explained numerous times, as well:

 - The return value for the non-unicode encodings depends on the value of
   the encoding argument.

 - The general case, by and large, especially in non-powerusers, is to
   encode unicode to bytestrings and to decode bytestrings to unicode. And
   that is a hard enough task for many of the non-powerusers. Being able to
   use the encode/decode methods for other tasks isn't helping them.

That is why I disagree with the hypergeneralization of the encode/decode
methods, regardless of the fact that it is a natural expansion of the
implementation of codecs. Sure, it looks 'right' and 'natural' when you look
at the implementation. It sure doesn't look natural, to me and to many
others, when you look at the task of encoding and decoding
bytestrings/unicode.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
> 
> No. The reason to ban string.decode and bytes.encode is that
> it confuses users.

Instead of starting to ban everything that can potentially
confuse a few users, we should educate those users and tell
them what these methods mean and how they should be used.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Thomas Wouters wrote:
> On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
> 
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() methods *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>>
>> You seem to ignore this fact.
> 
> Actually, I think the problem is that while we all agree the
> bytestring/unicode methods are a useful way to convert from bytestring to
> unicode and back again, we disagree on their *general* usefulness. Sure, the
> codecs mechanism is powerful, and even more so because codecs can determine
> their own return type. But it still smells and feels like a Perl attitude,
> for the reasons already explained numerous times, as well:

It's by no means a Perl attitude.

The main reason is symmetry and the fact that strings and Unicode
should be as similar as possible in order to simplify the task of
moving from one to the other.

>  - The return value for the non-unicode encodings depends on the value of
>the encoding argument.

Not really: you'll always get a basestring instance.

>  - The general case, by and large, especially in non-powerusers, is to
>encode unicode to bytestrings and to decode bytestrings to unicode. And
>that is a hard enough task for many of the non-powerusers. Being able to
>use the encode/decode methods for other tasks isn't helping them.

Agreed.

Still, I believe that this is an educational problem. There are
a couple of gotchas users will have to be aware of (and this is
unrelated to the methods in question):

* "encoding" always refers to transforming original data into
  a derived form

* "decoding" always refers to transforming a derived form of
  data back into its original form

* for Unicode codecs the original form is Unicode, the derived
  form is, in most cases, a string

As a result, if you want to use a Unicode codec such as utf-8,
you encode Unicode into a utf-8 string and decode a utf-8 string
into Unicode.

Encoding a string is only possible if the string itself is
original data, e.g. some data that is supposed to be transformed
into a base64 encoded form.

Decoding Unicode is only possible if the Unicode string itself
represents a derived form, e.g. a sequence of hex literals.
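
Put in 2.x terms, a brief sketch of those rules:

>>> u'Martin v. L\xf6wis'.encode('utf-8')     # original Unicode -> derived bytes
'Martin v. L\xc3\xb6wis'
>>> 'Martin v. L\xc3\xb6wis'.decode('utf-8')  # derived bytes -> original Unicode
u'Martin v. L\xf6wis'
>>> 'raw data'.encode('base64')               # a string that *is* original data
'cmF3IGRhdGE=\n'
>>> '5c53'.decode('hex')                      # a string that is itself a derived form
'\\S'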

> That is why I disagree with the hypergeneralization of the encode/decode
> methods, regardless of the fact that it is a natural expansion of the
> implementation of codecs. Sure, it looks 'right' and 'natural' when you look
> at the implementation. It sure doesn't look natural, to me and to many
> others, when you look at the task of encoding and decoding
> bytestrings/unicode.

That's because you only look at one specific task.

Codecs also unify the various interfaces to common encodings
such as base64, uu or zip which are not Unicode related.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
M.-A. Lemburg wrote:
> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() methods *do* check the return
> types of the codecs and only allow strings or Unicode
> on return (no lists, instances, tuples or anything else).
> 
> You seem to ignore this fact.

I'm not ignoring the fact that you have explained this
many times. I just fail to understand your explanations.

For example, you said at some point that codecs are not
restricted to Unicode. However, I don't recall any
explanation what the restriction *is*, if any restriction
exists. No such restriction seems to be documented.

> True. However, note that the .encode()/.decode() methods on
> strings and Unicode narrow down the possible return types.
> The corresponding .bytes methods should only allow bytes and
> Unicode.

I forgot that: what is the rationale for that restriction?

Regards,
Martin


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .decode() methods *do* check the return
>> types of the codecs and only allow strings or Unicode
>> on return (no lists, instances, tuples or anything else).
>>
>> You seem to ignore this fact.
> 
> I'm not ignoring the fact that you have explained this
> many times. I just fail to understand your explanations.

Feel free to ask questions.

> For example, you said at some point that codecs are not
> restricted to Unicode. However, I don't recall any
> explanation what the restriction *is*, if any restriction
> exists. No such restriction seems to be documented.

The codecs are not restricted with respect to the data types
they work on. It's up to each codec to define which data
types are valid, which it takes on input and which it
returns.

>> True. However, note that the .encode()/.decode() methods on
>> strings and Unicode narrow down the possible return types.
>> The corresponding .bytes methods should only allow bytes and
>> Unicode.
> 
> I forgot that: what is the rationale for that restriction?

To assure that only those types can be returned from those
methods, i.e. instances of basestring, which in turn permits
type inference for those methods.

The codecs functions encode() and decode() don't have these
restrictions, and thus provide a generic interface to the
codec's encode and decode functions. It's up to the caller
to restrict the allowed encodings and, as a result, the
possible input/output types.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Martin v. Löwis
M.-A. Lemburg wrote:
>>>True. However, note that the .encode()/.decode() methods on
>>>strings and Unicode narrow down the possible return types.
>>>The corresponding .bytes methods should only allow bytes and
>>>Unicode.
>>
>>I forgot that: what is the rationale for that restriction?
> 
> 
> To assure that only those types can be returned from those
> methods, i.e. instances of basestring, which in turn permits
> type inference for those methods.

Hmm. So it is for type inference.
Where is that documented?

This looks pretty inconsistent. Either codecs can give arbitrary
return types, in which case .encode/.decode should also be allowed
to give arbitrary return types, or codecs should be restricted.
What's the point of first allowing a wide interface, and then
narrowing it?

Also, if type inference is the goal, what is the point in allowing
two result types?

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>>>> True. However, note that the .encode()/.decode() methods on
>>>> strings and Unicode narrow down the possible return types.
>>>> The corresponding .bytes methods should only allow bytes and
>>>> Unicode.
>>> I forgot that: what is the rationale for that restriction?
>>
>> To assure that only those types can be returned from those
>> methods, i.e. instances of basestring, which in turn permits
>> type inference for those methods.
> 
> Hmm. So it is for type inference.
> Where is that documented?

Somewhere in the python-dev mailing list archives ;-)

Seriously, we should probably add this to the documentation.

> This looks pretty inconsistent. Either codecs can give arbitrary
> return types, in which case .encode/.decode should also be allowed
> to give arbitrary return types, or codecs should be restricted.

No.

As I've said before: the .encode() and .decode() methods
are convenience methods to interface to codecs which take
string/Unicode on input and create string/Unicode output.

> What's the point of first allowing a wide interface, and then
> narrowing it?

The codec interface is an abstract interface. It is flexible
enough to allow codecs to define possible input and output
types while being strict about the method names and signatures.

Much like the file interface in Python, the copy protocol
or the pickle interface.

> Also, if type inference is the goal, what is the point in allowing
> two result types?

I'm not sure I understand the question: type inference is about
being able to infer the types of (among other things) function
return objects. This is what the restriction guarantees - much
like int() guarantees that you get either an integer or a long.
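
In other words, callers can rely on the narrowed type without knowing
which codec was used. A minimal sketch (get_input() and enc are just
placeholder names):

    result = get_input().encode(enc)        # enc chosen at runtime
    assert isinstance(result, basestring)   # guaranteed: str or unicode

    n = int(raw_input())                    # likewise: always int or long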

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread Thomas Wouters
On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:

> It's by no means a Perl attitude.

In your eyes, perhaps. It certainly feels that way to me (or I wouldn't have
said it :). Perl happens to be full of general constructs that were added
because they were easy to add, or they were useful in edge cases. The
encode/decode methods remind me of that, even though I fully understand the
reasoning behind it, and the elegance of the implementation.

> The main reason is symmetry and the fact that strings and Unicode
> should be as similar as possible in order to simplify the task of
> moving from one to the other.

Yes, and this is a design choice I don't agree with. They're different
types. They do everything similarly, except when they are mixed together
(unicode takes precedence, in general, decoding the bytestring using the
default encoding). Going from one to the other isn't symmetric, though. I
understand that you disagree; the disagreement is on the fundamental choice
of allowing 'encode' and 'decode' to do *more* than going from and to
unicode. I regret that decision, not the decision to make encode and decode
symmetric (which makes sense, after the decision to overgeneralize
encode/decode is made.)

> >  - The return value for the non-unicode encodings depends on the value of
> >the encoding argument.

> Not really: you'll always get a basestring instance.

Which is not a particularly useful distinction, since in any real world
application, you have to be careful not to mix unicode with (non-ascii)
bytestrings. The only way to reliably deal with unicode is to have it
well-contained (when migrating an application from using bytestrings to
using unicode) or to use unicode everywhere, decoding/encoding at
entrypoints. Containment is hard to achieve.

> Still, I believe that this is an educational problem. There are
> a couple of gotchas users will have to be aware of (and this is
> unrelated to the methods in question):
> 
> * "encoding" always refers to transforming original data into
>   a derived form
> 
> * "decoding" always refers to transforming a derived form of
>   data back into its original form
> 
> * for Unicode codecs the original form is Unicode, the derived
>   form is, in most cases, a string
> 
> As a result, if you want to use a Unicode codec such as utf-8,
> you encode Unicode into a utf-8 string and decode a utf-8 string
> into Unicode.
> 
> Encoding a string is only possible if the string itself is
> original data, e.g. some data that is supposed to be transformed
> into a base64 encoded form.
> 
> Decoding Unicode is only possible if the Unicode string itself
> represents a derived form, e.g. a sequence of hex literals.

Most of these gotchas would not have been gotchas had encode/decode only
been usable for unicode encodings.

> > That is why I disagree with the hypergeneralization of the encode/decode
> > methods
[..]
> That's because you only look at one specific task.

> Codecs also unify the various interfaces to common encodings
> such as base64, uu or zip which are not Unicode related.

No, I think you misunderstand. I object to the hypergeneralization of the
*encode/decode methods*, not the codec system. I would have been fine with
another set of methods for non-unicode transformations. Although I would
have been even more fine if they got their encoding not as a string, but as,
say, a module object, or something imported from a module.
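
Purely hypothetically, something along these lines (encodings.utf_8
does exist in 2.x; the encode_with() helper is made up to illustrate
the idea):

    from encodings import utf_8

    def encode_with(text, codec_module):
        # pass the codec itself rather than a name string
        return codec_module.encode(text)[0]

    encode_with(u'abc', utf_8)    # -> 'abc'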

Not that I think any of this matters; we have what we have and I'll have to
live with it ;)

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-20 Thread Bengt Richter
On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters <[EMAIL PROTECTED]> wrote:

>On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:
>
[...]
>> >  - The return value for the non-unicode encodings depends on the value of
>> >the encoding argument.
>
>> Not really: you'll always get a basestring instance.
>
But actually basestring is a weird graft of semantic apples and empty bags, IMO.
unicode is essentially an abstract character vector type, and str is an
abstract binary octet vector type that has nothing to do with characters
except by inferential association with an encoding.

>Which is not a particularly useful distinction, since in any real world
>application, you have to be careful not to mix unicode with (non-ascii)
>bytestrings. The only way to reliably deal with unicode is to have it
>well-contained (when migrating an application from using bytestrings to
>using unicode) or to use unicode everywhere, decoding/encoding at
>entrypoints. Containment is hard to achieve.
>
>> Still, I believe that this is an educational problem. There are
>> a couple of gotchas users will have to be aware of (and this is
>> unrelated to the methods in question):
>> 
>> * "encoding" always refers to transforming original data into
>>   a derived form
ISTM encoding separates type information from the source and sets it aside
as the identity of the encoding, and renders the data in a composite of
more primitive types, octets being the most primitive short of bits.

>> 
>> * "decoding" always refers to transforming a derived form of
>>   data back into its original form
Decoding of a composite of primitives requires additional separate information
(namely identification of the encoding) to create a higher composite type.
>> 
>> * for Unicode codecs the original form is Unicode, the derived
>>   form is, in most cases, a string
You mean a str instance, right? Where the original character-vector type
is gone. That's not a string in the sense of a character string.
>> 
>> As a result, if you want to use a Unicode codec such as utf-8,
>> you encode Unicode into a utf-8 string and decode a utf-8 string
>> into Unicode.
s/string/str instance/
>> 
>> Encoding a string is only possible if the string itself is
>> original data, e.g. some data that is supposed to be transformed
>> into a base64 encoded form.
Note what base64 really is for. Its essence is to create a _character_
sequence which can succeed in being encoded as ascii. The concept of base64
going str->str is really a mental shortcut for
s_str.decode('base64').encode('ascii'), where 3 octets are decoded as code
for 4 characters, modulo padding logic.

>> 
>> Decoding Unicode is only possible if the Unicode string itself
>> represents a derived form, e.g. a sequence of hex literals.
Again, it's an abbreviation, e.g.

    print u'4cf6776973'.encode('hex_chars_to_octets').decode('latin-1')

should print Löwis.
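
With the codecs that exist in 2.x today, roughly the same round trip
can be spelled as follows (assuming a terminal that can display the
result):

    >>> print u'4cf6776973'.encode('ascii').decode('hex').decode('latin-1')
    Löwis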

>
>Most of these gotchas would not have been gotchas had encode/decode only
>been usable for unicode encodings.
>
>> > That is why I disagree with the hypergeneralization of the encode/decode
>> > methods
>[..]
>> That's because you only look at one specific task.
>
>> Codecs also unify the various interfaces to common encodings
>> such as base64, uu or zip which are not Unicode related.
I think the trouble is that these view the transformations as octets->octets,
whereas IMO decoding should always result in a container type that knows what
it is semantically, without association with external use-this-codec
information. IOW,

octets.decode('zip') -> archive
archive.encode('bzip') -> octets

You could even subclass octets to make an archive type that knows it's an
octet vector representing a decoded zip, so it can have an encode method
that could (specifying 'zip' again) encode itself back to the original zip,
or an alternate method to encode itself as something else, which you
couldn't do from plain octets without specifying both transformations at
once. (Hence the .recode idea, but I don't think that is as pure.) The
constructor for the container type could also be used, like
Archive(octets, 'zip'), analogous to unicode('abc', 'ascii').

IOW 
octets + decoding info -> container type instance
container type instance + encoding info -> octets
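
A purely hypothetical sketch of that container idea in 2.x terms (the
Archive type below is made up; 'zip' is the existing zlib codec):

    class Archive(str):
        """Octet vector that remembers it was decoded from zip data."""
        def __new__(cls, octets, coding):
            # decode up front, remember how to get back
            self = str.__new__(cls, octets.decode(coding))
            self._coding = coding
            return self
        def encode(self, coding=None):
            # re-encode to octets, by default with the original coding
            return str(self).encode(coding or self._coding)

    packed = 'payload'.encode('zip')     # plain octets
    arch = Archive(packed, 'zip')        # octets + decoding info -> container
    assert arch.encode() == packed       # container + encoding info -> octets
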
>
>No, I think you misunderstand. I object to the hypergeneralization of the
>*encode/decode methods*, not the codec system. I would have been fine with
>another set of methods for non-unicode transformations. Although I would
>have been even more fine if they got their encoding not as a string, but as,
>say, a module object, or something imported from a module.
>
>Not that I think any of this matters; we have what we have and I'll have to
>live with it ;)
Probably.
BTW, you may notice I'm saying octet instead of bytes. I have another post
on that, arguing that the basic binary information type should be octet,
since binary files are made of octets that have no intrinsic numerical or
character significance. See other post i

Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-20 Thread Ron Adam
Bengt Richter wrote:
> On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters <[EMAIL PROTECTED]> wrote:

> Note what base64 really is for. Its essence is to create a _character_
> sequence which can succeed in being encoded as ascii. The concept of
> base64 going str->str is really a mental shortcut for
> s_str.decode('base64').encode('ascii'), where 3 octets are decoded as
> code for 4 characters, modulo padding logic.

Wouldn't it be...

obj.encode('base64').encode('ascii')

This would probably also work...

obj.encode('base64').decode('ascii')  ->  ascii alphabet in unicode

Where the underlying sequence might be ...

obj -> bytes -> bytes:base64 -> base64 ascii character set

The point is to have the data in a safe-to-transmit form that can
survive being encoded and decoded into different forms along the
transmission path and still be restored at the final destination.

base64 ascii character set -> bytes:base64 -> original bytes -> obj
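
A quick 2.x sketch of that round trip (output written from memory):

    >>> payload = 'binary \x00 stuff'
    >>> wire = payload.encode('base64')      # octets -> ascii-safe text (a str)
    >>> wire
    'YmluYXJ5IAAgc3R1ZmY=\n'
    >>> wire.decode('base64') == payload     # restored at the destination
    True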




* a related note, derived from this and your other post in this thread.

If the str type constructor had an encoding argument like the unicode type
does, along with a str.encoded_with attribute, then it might be possible to
deprecate the .decode() and .encode() methods and remove them from Py3k
entirely, or use them as data coders/decoders instead of character
encoders.

It could also create a clear separation between character encodings and 
data coding.  The following should raise an exception:

   str(data_str, 'rot13')

Rot13 isn't a character encoding, but a data coding method.

   data_str.encode('rot13')   # could be ok

But this wouldn't...

   new_str = data_str.encode('latin_1')   # could cause an exception

We'd have to use...

   new_str = str(data_str, 'latin_1')  # New string sub type...
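
A purely hypothetical sketch of what such a type might look like (the
name estr and the encoded_with attribute are made up, following the
proposal above):

    class estr(str):
        """A str that remembers which character encoding produced it."""
        def __new__(cls, obj, encoding='ascii'):
            if isinstance(obj, unicode):
                obj = obj.encode(encoding)
            self = str.__new__(cls, obj)
            self.encoded_with = encoding
            return self

    s = estr(u'L\xf6wis', 'latin_1')
    s.encoded_with                   # 'latin_1'
    unicode(s, s.encoded_with)       # back to the original unicode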


Cheers,
Ronald Adam

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com