Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

Glenn Linderman Mon, 13 Jan 2014 12:45:57 -0800

On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote:

Glenn Linderman writes:


  > On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote:
  >> Glenn Linderman writes:
  >>> the proposals to embed binary in Unicode by abusing Latin-1
  >>> encoding.

  >> Those aren't "proposals", they are currently feasible
  >> techniques in Python 3 for *some* use cases. The question is why
  >> infecting Python 3 with the byte/character confoundance virus is
  >> preferable to such techniques, especially if their (serious!)
  >> deficiencies are removed by creating a new type such as
  >> asciistr.

  > "smuggled binary" (great term borrowed from a different
  > subthread) muddies the waters of what you are dealing with.

Not really.  The "mud" is one or more of the serious deficiencies.  It
can be removed, I believe (and Nick apparently does, too).  "asciistr"
is one way to try that.

Yes really. Use of smuggled binary means the str containing it can no
longer be treated completely as a str. That is "muddier" than having a
str that is only a str.
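
A minimal sketch of that point, assuming a made-up record (a short ASCII
header followed by two raw binary bytes) that is smuggled into a str via
Latin-1. The record layout and the variable names are illustrative only;
nothing here is taken from PEP 460 itself.

```python
# Hypothetical record: ASCII text followed by two raw binary bytes.
raw = b"status=ok;" + bytes([0xE9, 0xFF])

# Smuggle the whole record into a str. Latin-1 maps every byte to one
# code point, so the round trip is lossless as long as nothing is changed.
smuggled = raw.decode("latin-1")
roundtrip_ok = smuggled.encode("latin-1") == raw

# An ordinary whole-string text operation also rewrites the binary part:
# U+00E9 becomes U+00C9, and U+00FF becomes U+0178, which is outside Latin-1.
shouted = smuggled.upper()

try:
    shouted.encode("latin-1")
    survived = True
except UnicodeEncodeError:
    survived = False

print(roundtrip_ok, survived)
```

The untouched str round-trips, but after upper() the binary tail can no
longer even be encoded back to bytes, so the caller has to remember which
slice is text and which is not.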

  > When the mixture of text and binary is done as encoded text in
  > binary, then it is obvious that only limited text processing can be
  > performed,

Hardly.  After all, that's how all text processing was done for
decades.  Still is, in some programs, especially C programs.

I disagree, and so do you... text processing must be limited to the text
subsets of the text that includes smuggled binary... that is limited...
you can't just apply text searches, scans, and transformations over the
complete str, when it contains smuggled binary. You know that, but must
have not considered it a limitation, because you know you can do any
text processing on the text parts. But it is a limitation to have to
keep track of it, and apply the text processing only to the parts that
are text. Yes, it has been done that way, and the limitations of doing
it that way led to the plethora of encodings each of which was intended
to be sufficient for some problem domain, but most of which were only
sufficient for a smaller problem domain than intended, especially as
communications became more global in nature.

  > And there are no extra, confusing Latin-1 encode/decode operations
  > required.

The "extra" encode/decode operations are mostly (perhaps all) due to
examples that started from bytes and end with bytes.  Of course if you
assume that API and propose to do the operations using Unicode, you'll
get "extra" decode/encode operations.

No, the "extra" encode/decode are from the requirement that smuggled
binary use latin-1, and other binary flavors are not always latin-1.
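
One reading of this, sketched under the assumption that the text portion
of the data is UTF-8 rather than Latin-1 (the sample data and names are
hypothetical): once the whole blob has been decoded as Latin-1, the text
portion is mojibake until it is encoded back to bytes and decoded again
with its real encoding.

```python
# Hypothetical blob: UTF-8 encoded text followed by two binary bytes.
raw = "naïve".encode("utf-8") + b"\x00\x9f"

# Smuggling requires Latin-1 so that every byte survives the trip.
smuggled = raw.decode("latin-1")

# The text portion is now mojibake: one str character per UTF-8 byte.
text_part = smuggled[:-2]

# To get real text, undo the Latin-1 step and decode with the actual
# encoding. This is the "extra" encode/decode pair.
real_text = text_part.encode("latin-1").decode("utf-8")

print(repr(text_part), repr(real_text))
```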


  > From a higher-level perspective, I think it would be great to have
  > a module, perhaps called "boundary" (let's call it that for now),
  > that allow some definition syntax (augmented BNF? augmented ABNF?)
  > to explain the format of a binary blob.

We have struct, for one.  I'm not sure why you want more than that.  I
suppose you could go all the way to ASN.1.

struct is insufficient to capture a whole file format, with optional
parts, although it suffices for fragments.
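
As a rough illustration, assume a made-up format: a fixed header (a
flags byte and a 2-byte payload length), an optional 4-byte timestamp
that is present only when a flag bit is set, then the payload. The
format, the flag value, and the parse() helper are all hypothetical.
struct describes each fixed fragment, but the "optional" rule has to
live in hand-written code around it.

```python
import struct

HEADER = struct.Struct(">BH")     # flags (1 byte), payload length (2 bytes)
TIMESTAMP = struct.Struct(">I")   # optional field, 4 bytes
HAS_TIMESTAMP = 0x01              # hypothetical flag bit


def parse(blob):
    """Parse one record of the made-up format described above."""
    flags, length = HEADER.unpack_from(blob, 0)
    offset = HEADER.size

    # struct has no way to say "this field may be absent", so the
    # conditional is ordinary Python logic.
    timestamp = None
    if flags & HAS_TIMESTAMP:
        (timestamp,) = TIMESTAMP.unpack_from(blob, offset)
        offset += TIMESTAMP.size

    payload = blob[offset:offset + length]
    return flags, timestamp, payload


with_ts = HEADER.pack(HAS_TIMESTAMP, 2) + TIMESTAMP.pack(7) + b"hi"
without_ts = HEADER.pack(0, 2) + b"hi"
print(parse(with_ts))
print(parse(without_ts))
```

A declarative description of the whole layout, including which parts are
optional, is what a module like the proposed "boundary" would have to
provide beyond struct.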
