On 1/13/2014 6:43 AM, Stephen J. Turnbull wrote:
Glenn Linderman writes:

  > On 1/12/2014 4:08 PM, Stephen J. Turnbull wrote:
  >> Glenn Linderman writes:
  >>> the proposals to embed binary in Unicode by abusing Latin-1
  >>> encoding.

  >> Those aren't "proposals", they are currently feasible
  >> techniques in Python 3 for *some* use cases. The question is why
  >> infecting Python 3 with the byte/character confoundance virus is
  >> preferable to such techniques, especially if their (serious!)
  >> deficiencies are removed by creating a new type such as
  >> asciistr.

  > "smuggled binary" (great term borrowed from a different
  > subthread) muddies the waters of what you are dealing with.

Not really.  The "mud" is one or more of the serious deficiencies.  It
can be removed, I believe (and Nick apparently does, too).  "asciistr"
is one way to try that.

Yes really. Use of smuggled binary means the str containing it can no longer be treated completely as a str. That is "muddier" than having a str that is only a str.
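
A minimal Python 3 sketch of the point, using a made-up message (an ASCII header followed by three arbitrary payload bytes, chosen only for illustration). The Latin-1 round trip on its own is lossless. A str-wide operation such as str.upper(), however, also rewrites the smuggled bytes, and it can even produce code points that no longer fit back into Latin-1.

```python
# Made-up message for illustration: ASCII header, blank line, binary payload.
raw = b"Subject: hello\r\n\r\n" + bytes([0x61, 0xE9, 0x00])

# Latin-1 maps each byte 0-255 to the code point of the same value,
# so decoding never fails and encoding restores the original bytes.
smuggled = raw.decode("latin-1")
assert smuggled.encode("latin-1") == raw

# A str-wide operation treats the payload as text too:
# 0x61 ('a') becomes 'A' (0x41) and 0xE9 ('é') becomes 'É' (0xC9).
shouted = smuggled.upper()
print(shouted.encode("latin-1"))  # b'SUBJECT: HELLO\r\n\r\nA\xc9\x00'

# Worse, byte 0xFF ('ÿ') upper-cases to U+0178 ('Ÿ'), which is outside
# Latin-1, so the "binary" can no longer be encoded back at all.
try:
    "\xff".upper().encode("latin-1")
    round_trip_ok = True
except UnicodeEncodeError:
    round_trip_ok = False
print(round_trip_ok)  # False
```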

  > When the mixture of text and binary is done as encoded text in
  > binary, then it is obvious that only limited text processing can be
  > performed,

Hardly.  After all, that's how all text processing was done for
decades.  Still is, in some programs, especially C programs.

I disagree, and so do you. Text processing must be limited to the text portions of a str that contains smuggled binary, and that is a limitation: you can't simply apply text searches, scans, and transformations over the complete str when it contains smuggled binary. You know that, but you must not have considered it a limitation, because you know you can do any text processing on the text parts. But having to keep track of which parts are text, and to apply text processing only to those parts, is a limitation. Yes, it has been done that way, and the limitations of doing it that way led to the plethora of encodings, each of which was intended to be sufficient for some problem domain, but most of which turned out to be sufficient only for a smaller problem domain than intended, especially as communications became more global in nature.
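
To make "keeping track of it" concrete, here is a hedged sketch. The "\r\n\r\n" boundary and the helper names (split_smuggled, find_in_text, upper_text_part) are hypothetical, invented for this example. Every text operation must first locate the text/binary boundary, because a search over the whole str can match inside the binary part.

```python
SEPARATOR = "\r\n\r\n"  # hypothetical text/binary boundary for this sketch


def split_smuggled(smuggled: str) -> tuple[str, str]:
    """Split a Latin-1-smuggled message into (text_part, binary_part)."""
    head, sep, body = smuggled.partition(SEPARATOR)
    if not sep:
        raise ValueError("no text/binary boundary found")
    return head, body


def find_in_text(smuggled: str, needle: str) -> int:
    """Search only the text part, so matches in the binary are ignored."""
    head, _ = split_smuggled(smuggled)
    return head.find(needle)


def upper_text_part(smuggled: str) -> str:
    """Upper-case the text part and leave the smuggled binary untouched."""
    head, body = split_smuggled(smuggled)
    return head.upper() + SEPARATOR + body


raw = b"Subject: hi\r\n\r\n\x00hi\xe9"
smuggled = raw.decode("latin-1")

print(smuggled.rfind("hi"))          # 16: a whole-str search lands in the binary
print(find_in_text(smuggled, "hi"))  # 9: the match in the header text
print(upper_text_part(smuggled).encode("latin-1"))
```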


  > And there are no extra, confusing Latin-1 encode/decode operations
  > required.

The "extra" encode/decode operations are mostly (perhaps all) due to
examples that started from bytes and end with bytes.  Of course if you
assume that API and propose to do the operations using Unicode, you'll
get "extra" decode/encode operations.

No, the "extra" encode/decode operations come from the requirement that smuggled binary use Latin-1, while the text embedded in other binary flavors is not always Latin-1.
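
A small illustration, assuming a hypothetical record whose value is UTF-8 text embedded in binary data. Smuggling it as Latin-1 yields mojibake, and getting real text back costs an extra encode/decode round trip that working on the bytes directly does not need.

```python
# Hypothetical record: an ASCII key, "=", then a UTF-8 encoded value.
raw = b"name=" + "café".encode("utf-8")

# Smuggle everything as Latin-1, then split on the text delimiter.
smuggled = raw.decode("latin-1")
key, _, value = smuggled.partition("=")
print(value)  # 'cafÃ©': the UTF-8 bytes C3 A9 read as two Latin-1 characters

# Recovering real text needs an extra round trip through Latin-1,
# because the value's actual encoding is UTF-8.
text = value.encode("latin-1").decode("utf-8")
print(text)  # café

# Working on the bytes directly needs only the one decode that is required.
direct = raw.partition(b"=")[2].decode("utf-8")
print(direct == text)  # True
```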


  > From a higher-level perspective, I think it would be great to have
  > a module, perhaps called "boundary" (let's call it that for now),
  > that allows some definition syntax (augmented BNF? augmented ABNF?)
  > to explain the format of a binary blob.

We have struct, for one.  I'm not sure why you want more than that.  I
suppose you could go all the way to ASN.1.

struct is insufficient to describe a whole file format with optional parts, although it suffices for fixed-layout fragments.
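
For example, take a made-up format (the "BLOB" magic, field layout, and flag bit below are invented for illustration). struct can describe the fixed header and the optional field individually. The decision whether the optional part is present, and the offset bookkeeping that follows from it, still has to be written by hand in Python; this is a sketch of that gap, not a proposal for the "boundary" module itself.

```python
import struct

# Hypothetical format, invented for illustration:
#   4s  magic b"BLOB"
#   B   version
#   B   flags (bit 0 set => an optional 4-byte timestamp follows)
#   H   payload length, big-endian
#   [I  timestamp]  present only when flags & 1
#   payload bytes
HEADER = struct.Struct(">4sBBH")
TIMESTAMP = struct.Struct(">I")


def parse_blob(data: bytes) -> dict:
    magic, version, flags, length = HEADER.unpack_from(data, 0)
    if magic != b"BLOB":
        raise ValueError("bad magic")
    offset = HEADER.size
    timestamp = None
    # struct has no notion of an optional field; the branch lives in Python.
    if flags & 1:
        (timestamp,) = TIMESTAMP.unpack_from(data, offset)
        offset += TIMESTAMP.size
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return {"version": version, "timestamp": timestamp, "payload": payload}


with_ts = HEADER.pack(b"BLOB", 1, 1, 3) + TIMESTAMP.pack(1389624180) + b"abc"
without_ts = HEADER.pack(b"BLOB", 1, 0, 2) + b"xy"
print(parse_blob(with_ts))
print(parse_blob(without_ts))
```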

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev