Re: [Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Juraj Sukop Fri, 10 Jan 2014 15:41:57 -0800

On Fri, Jan 10, 2014 at 10:52 PM, Chris Barker <[email protected]> wrote:

> On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop <[email protected]> wrote:
>
>> As you may know, PDF operates over bytes and an integer or floating-point
>> number is written down as-is, for example "100" or "1.23".
>>
>
> Just to be clear here -- is PDF specifically bytes+ascii?
>
> Or could there be some-other-encoding unicode in there?
>

From the specs: "At the most fundamental level, a PDF file is a sequence of
8-bit bytes." But it is also possible to represent a PDF using only printable
ASCII and whitespace by using escapes and "filters". Then there are also
"text strings", which may be encoded in UTF-16.
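To illustrate how such a "text string" still fits into an ASCII-only file: as far as I know, the PDF spec allows text strings either in PDFDocEncoding or in UTF-16BE prefixed with the byte-order mark FE FF, and any string may be written as a hex string `<...>`. A minimal sketch under that assumption (`pdf_text_string` is a hypothetical helper, not part of any library):

```python
def pdf_text_string(s: str) -> bytes:
    """Encode s as a PDF hex string in UTF-16BE with a leading FE FF byte-order mark."""
    # Assumption: UTF-16BE text strings in PDF start with the byte-order mark FE FF.
    raw = b"\xfe\xff" + s.encode("utf-16-be")
    # Writing it as a hex string <...> keeps the file content plain ASCII.
    return b"<" + raw.hex().upper().encode("ascii") + b">"


print(pdf_text_string("Hi"))  # b'<FEFF00480069>'
```

Even though the string carries UTF-16 data, what ends up in the file is a handful of ASCII bytes.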

In practice this means that PDF objects are expressed in ASCII, "stream"
objects such as images and fonts may have a binary part, and I have never
seen those UTF-16 strings.
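A rough sketch of what that mix looks like when the object is built purely from bytes: the object header, dictionary and keywords are ASCII, while the stream payload is copied through untouched. `stream_object` is a hypothetical helper, and generation number 0 plus a bare `/Length` entry are simplifying assumptions:

```python
def stream_object(obj_num: int, payload: bytes) -> bytes:
    """Build an indirect PDF stream object around an arbitrary binary payload."""
    # The object header and dictionary are pure ASCII, so build them as str and encode.
    header = "{} 0 obj\n<< /Length {} >>\nstream\n".format(obj_num, len(payload))
    # The payload is appended untouched: it is never decoded as text.
    # The EOL before "endstream" is not counted in /Length.
    return header.encode("ascii") + payload + b"\nendstream\nendobj\n"


obj = stream_object(5, bytes([0x00, 0xFF, 0x80]))
print(obj)  # b'5 0 obj\n<< /Length 3 >>\nstream\n\x00\xff\x80\nendstream\nendobj\n'
```

Note that even here the length has to go through `str` and `.encode("ascii")`, which is exactly the round trip that bytes formatting of numbers would avoid.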

> u"stream\n%s\nendstream\nendobj"%binary_data.decode('latin-1')
>
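For reference, the quoted workaround does round-trip: Latin-1 maps each byte value N to code point U+00NN, so any binary data survives decode/encode unchanged. The catch is that it only holds while every character in the final str stays below U+0100. A small sketch (the sample data is made up):

```python
binary_data = bytes(range(256))  # every possible byte value

# Latin-1 maps byte N to code point U+00NN, so this decode cannot fail or lose data.
text = u"stream\n%s\nendstream\nendobj" % binary_data.decode("latin-1")

# Encoding back with Latin-1 restores the exact original bytes...
pdf_bytes = text.encode("latin-1")
assert pdf_bytes == b"stream\n" + binary_data + b"\nendstream\nendobj"

# ...but only while every character stays below U+0100.
try:
    (text + "\u2013").encode("latin-1")
except UnicodeEncodeError:
    print("non-Latin-1 text cannot be written this way")
```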

The argument for dropping "%f" et al. has been that if something is text,
then it should be Unicode. Conversely, if it is not text, then it should
not be Unicode.
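Following that reasoning, the numbers from the beginning of the thread ("100", "1.23") would go straight into bytes, never passing through a Unicode document. A sketch assuming Python 3.5 or later, where bytes support %-formatting including %d and %f (PEP 461), with the str-then-encode equivalent for older Python 3 versions:

```python
width, ratio = 100, 1.23

# Python 3.5+: bytes support %-formatting, including %d and %f (PEP 461).
line = b"%d %.2f" % (width, ratio)

# Works on any Python 3: format as str, then encode explicitly as ASCII.
line_compat = ("%d %.2f" % (width, ratio)).encode("ascii")

print(line)  # b'100 1.23'
```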

_______________________________________________
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
