On Thu, Jan 9, 2014 at 5:00 PM, Chris Barker <chris.bar...@noaa.gov> wrote:

> On Thu, Jan 9, 2014 at 1:45 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
>
>> > latin-1 guaranteed to work with any binary data, and round-trip
>> > accurately?
>>
>> Yes, it is.
>>
>> > and will surrogateescape work for arbitrary binary data?
>>
>> Yes, it will.
>>
>
> Then maybe this is really a documentation issue, after all.
>
> I know I learned something.
>

I think the other issue is that everyone is talking about keeping the data
from the file in a single object. If you slice it up into pieces and decode
the parts as necessary, this also solves the issue. So if you had an HTTP
header you could do::

  # Split on the first blank line only, so the body is left untouched.
  raw_header, body = data.split(b'\r\n\r\n', 1)
  header = raw_header.decode('ascii')  # Or whatever HTTP headers are encoded in.
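
To make that a bit more concrete, here is a rough, self-contained sketch of
the same idea. The sample response bytes and the names status_line and fields
are made up for illustration, and ASCII is only my assumption for the header
encoding:

```python
# Hypothetical raw HTTP response; the body is deliberately not valid UTF-8.
data = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Length: 4\r\n"
    b"\r\n"
    b"\xff\xfe\x00\x01"
)

# partition() splits at the first blank line only, so a body that happens
# to contain b"\r\n\r\n" is left intact.
raw_header, sep, body = data.partition(b"\r\n\r\n")

# Only the header block is decoded; the body stays as bytes.
header = raw_header.decode("ascii")  # assumed encoding
status_line, *field_lines = header.split("\r\n")
fields = dict(line.split(": ", 1) for line in field_lines)
```

The point is that header and fields are ordinary str objects you can work
with as text, while body never goes through a codec at all.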

Now that might not easily solve the issue of ASCII text interspersed with
binary data (such as Kristján's "phone number in the middle of stuff"
example), but it does deal with the cases where the data splits cleanly. And
if the numbers were separated by clean markers, then this would probably
still work.
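
For the "clean markers" case, something along these lines might work. The
record layout and the b"|" delimiter are purely hypothetical, and this
assumes the delimiter never occurs inside a field:

```python
# Hypothetical record: two binary blobs around an ASCII phone number,
# separated by b"|" (assumed never to appear inside a field).
record = b"\x89\x00\xfe|555-0100|\xde\xad\xbe\xef"

prefix, raw_phone, suffix = record.split(b"|")

# Decode only the field known to be text; the blobs stay as bytes.
phone = raw_phone.decode("ascii")

# Writing the record back out is the reverse: encode the text field
# and join the pieces with the same delimiter.
rebuilt = b"|".join([prefix, phone.encode("ascii"), suffix])
```

If the binary fields could contain the marker byte, you would need lengths or
escaping instead, so this only covers the well-delimited case.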
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev