On 01/18/2014 05:48 AM, Nick Coghlan wrote:
On 18 Jan 2014 11:52, "Ethan Furman" wrote:

I'll admit to being somewhat on the fence about %a.

It seems there are two possibilities with %a:

  1) have it be ascii(repr(obj))

  2) have it be str(obj).encode('ascii', 'strict')
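
To make the difference between the two concrete, here is a rough sketch. The names `option_1` and `option_2` are only labels for this illustration, not proposed API, and option 1 is taken literally as written above, so it yields an ASCII-only str that would still need encoding to bytes:

```python
def option_1(obj):
    # Literal reading of possibility 1: ascii() applied to the repr.
    # The result is a str containing only ASCII characters.
    return ascii(repr(obj))


def option_2(obj):
    # Possibility 2: strict ASCII encoding of str(obj).
    # Raises UnicodeEncodeError if any character is non-ASCII.
    return str(obj).encode('ascii', 'strict')


text = 'caf\u00e9'

print(option_1('abc'))
print(option_2('abc'))
print(option_1(text))

try:
    option_2(text)
except UnicodeEncodeError as exc:
    print('option 2 refuses non-ASCII text:', exc)
```

So possibility 1 never fails but escapes (and quotes) its input, while possibility 2 passes pure-ASCII text through unchanged and raises on anything else.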

This gets very close to crossing the line into implicit encoding of text
again. Binary interpolation is being added back for the specific use case
of working with ASCII compatible segments in binary formats, and it's at
best arguable that supporting %a will help with that use case.

Agreed.


However, without it, there may be a greater temptation to inappropriately
define __bytes__ just to support binary interpolation, rather than because
a type truly has an appropriate translation directly to bytes.

True.


By allowing %a, we avoid that temptation. This is also potentially useful
specifically in the case of binary logging formats and as a quick way to
request backslash escaping of non-ASCII characters in text.

Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I
think it will head off a fair bit of potential misuse of __bytes__.

So, if %a is added it would act like:

---------
  b"%a" % some_obj
---------
  tmp = str(some_obj)
  res = b''
  for ch in tmp:
      if ord(ch) < 256:
          res += bytes([ord(ch)])
      else:
          res += unicode_escape(ch)
---------

where 'unicode_escape' would yield something like "\u0440" ?
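
For reference, a runnable version of the sketch above might look like the following. Both `percent_a_sketch` and `unicode_escape` are hypothetical names used only for illustration; here `unicode_escape` is assumed to be a thin wrapper around the existing 'backslashreplace' error handler, which is one possible choice and not a settled part of the proposal:

```python
def unicode_escape(ch):
    # Hypothetical helper: emit the backslash escape for a single
    # character as ASCII bytes, using the stdlib error handler.
    return ch.encode('ascii', 'backslashreplace')


def percent_a_sketch(some_obj):
    # Mirrors the loop above: code points below 256 become a single
    # byte, everything else becomes an ASCII backslash escape.
    tmp = str(some_obj)
    res = b''
    for ch in tmp:
        if ord(ch) < 256:
            res += bytes([ord(ch)])
        else:
            res += unicode_escape(ch)
    return res


print(percent_a_sketch('a\u0440'))
```

With this helper, U+0440 comes out as the six ASCII bytes of the escape sequence. Note that with the `< 256` cutoff, characters in the range 128-255 are emitted as raw single bytes, so the result is not pure ASCII.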

--
~Ethan~
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev