Re: encode and decode builtins

2014-11-16 Thread dieter
Garrett Berg  writes:
> ...
> However, there are times that I do not care what data I am working with,
> and I find myself writing something like:
>
> if isinstance(data, bytes): data = data.decode()

Apparently, below this code, you do care that "data" contains
"str" (not "bytes") -- otherwise, you could simply drop this line.

Note, in addition, that the code above might only work if "data" actually
contains ASCII-only bytes (at least in Python 2.x, it would fail
otherwise). To safely decode bytes into unicode, you need to precisely
know the encoding -- something which is not consistent with
"I do not care about what data I am working with".
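For instance, an explicit version might look like this (a sketch; the "utf-8" default is an assumption -- substitute whatever encoding your data source actually guarantees):

```python
def decode_known(data, encoding="utf-8"):
    # Decode bytes to str only with a known encoding; pass str through.
    # A wrong encoding raises UnicodeDecodeError instead of silently
    # producing garbage text.
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data

print(decode_known(b"caf\xc3\xa9"))   # the UTF-8 bytes for "café"
print(decode_known("already text"))
```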

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: encode and decode builtins

2014-11-16 Thread Ned Batchelder

On 11/16/14 2:39 AM, Garrett Berg wrote:

> I made the switch to python 3 about two months ago, and I have to say I
> love everything about it, /especially/ the change to using only bytes
> and str (no more unicode! or... everything is unicode!) As someone who
> works with embedded devices, it is great to know what data I am working
> with.


I am glad that you are excited about Python 3.  But I'm a little 
surprised to hear your characterization of the changes it brought.  Both 
Python 2 and Python 3 are the same in that they have two types for 
representing strings: one for byte strings, and one for Unicode strings.


The difference is that Python 2 called them str and unicode, with "" 
being a byte string; Python 3 calls them bytes and str, with "" being a 
unicode string.  Also, Python 2 happily converted between them 
implicitly, while Python 3 does not.
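In Python 3 terms, that difference is easy to see (a quick illustration):

```python
# Python 3: "" is a unicode string (str); b"" is a byte string (bytes).
assert type("") is str
assert type(b"") is bytes

# And there is no implicit conversion between them:
try:
    "a" + b"b"
except TypeError as exc:
    print("mixing str and bytes:", exc)
```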




> However, there are times that I do not care what data I am working with,
> and I find myself writing something like:
>
> if isinstance(data, bytes): data = data.decode()



This goes against a fundamental tenet of both Python 2 and 3: you should 
know what data you have, and deal with it properly.



> This is tedious and breaks the pythonic method of not caring about what
> your input is. If I expect that my input can always be decoded into
> valid data, then why do I have to write this?
>
> Instead, I would like to propose to add *encode* and *decode* as
> builtins. I have written simple code to demonstrate my desire:
>
> https://gist.github.com/cloudformdesign/d8065a32cdd76d1b3230


If you find these functions useful, by all means use them in your code. 
 BTW: looks to me like you have infinite recursion on lines 9 and 20, 
so that must be a simple oversight.




> There may be a few edge cases I am missing, which would all the more
> prove my point -- we need a function like this!


You are free to have a function like that.  Getting them added to the 
standard library is extremely unlikely.




> Basically, if I expect my data to be a string I can just write:
>
> data = decode(data)
>
> Which would accomplish two goals: explicitly stating what I expect of
> my data, and doing so concisely and cleanly.






--
Ned Batchelder, http://nedbatchelder.com



Re: encode and decode builtins

2014-11-15 Thread Ben Finney
Garrett Berg  writes:

> I made the switch to python 3 about two months ago, and I have to say
> I love everything about it, *especially* the change to using only
> bytes and str (no more unicode! or... everything is unicode!) As
> someone who works with embedded devices, it is great to know what data
> I am working with.

Thanks! It is great to hear from people directly benefiting from this
clear distinction.

> However, there are times that I do not care what data I am working
> with, and I find myself writing something like:
>
> if isinstance(data, bytes): data = data.decode()

Why are you in a position where ‘data’ is not known to be bytes? If you
want ‘unicode’ objects, shouldn't the API guarantee to provide them?

> This is tedious and breaks the pythonic method of not caring about
> what your input is.

I wouldn't call that Pythonic. Rather, in the face of ambiguity (“is
this text or bytes?”), Pythonic code refuses the temptation to guess:
you need to clarify what you have as early as possible in the process.
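One way to apply that: convert at the boundary, exactly once (a sketch; the UTF-8 assumption stands in for whatever encoding the protocol actually specifies):

```python
def handle_request(raw: bytes) -> str:
    # Boundary: turn bytes into text exactly once; everything past
    # this point deals only in str -- no isinstance checks needed.
    text = raw.decode("utf-8")
    return text.strip().upper()

print(handle_request(b"  hello  "))  # HELLO
```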

> If I expect that my input can always be decoded into valid data, then
> why do I have to write this?

I don't know. Why do you have to?

-- 
 \ “God was invented to explain mystery. God is always invented to |
  `\ explain those things that you do not understand.” —Richard P. |
_o__) Feynman, 1988 |
Ben Finney



encode and decode builtins

2014-11-15 Thread Garrett Berg
I made the switch to python 3 about two months ago, and I have to say I
love everything about it, *especially* the change to using only bytes and
str (no more unicode! or... everything is unicode!) As someone who works
with embedded devices, it is great to know what data I am working with.

However, there are times that I do not care what data I am working with,
and I find myself writing something like:

if isinstance(data, bytes): data = data.decode()


This is tedious and breaks the pythonic method of not caring about what
your input is. If I expect that my input can always be decoded into valid
data, then why do I have to write this?

Instead, I would like to propose to add *encode* and *decode* as builtins.
I have written simple code to demonstrate my desire:

https://gist.github.com/cloudformdesign/d8065a32cdd76d1b3230

There may be a few edge cases I am missing, which would all the more prove
my point -- we need a function like this!

Basically, if I expect my data to be a string I can just write:

data = decode(data)

Which would accomplish two goals: explicitly stating what I expect of my
data, and doing so concisely and cleanly.
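For anyone who wants these helpers today without new builtins, a minimal non-recursive sketch (the UTF-8 default is an assumption, matching what bytes.decode uses):

```python
def decode(data, encoding="utf-8"):
    """Return str: decode bytes, pass str through unchanged."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    if isinstance(data, str):
        return data
    raise TypeError("expected bytes or str, got %s" % type(data).__name__)


def encode(data, encoding="utf-8"):
    """Return bytes: encode str, pass bytes through unchanged."""
    if isinstance(data, str):
        return data.encode(encoding)
    if isinstance(data, bytes):
        return data
    raise TypeError("expected bytes or str, got %s" % type(data).__name__)
```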