Nick Coghlan wrote:
> so the latter would be less of an attractive nuisance when writing code that needs to handle arbitrary binary formats and can't assume ASCII compatibility.
Hang on a moment. What do you mean by code that "handles arbitrary binary formats"? As far as I can see, the proposed features are for code that handles *particular* binary formats. Ones with well-defined fields that are specified to contain ASCII-encoded text. It's the programmer's responsibility to make sure that the fields he's treating as ASCII really do contain ASCII, just as it's his responsibility to make sure he reads and writes a text file using the correct encoding.

Now, it's possible that if you were working from an incomplete spec and some examples, you might be led to believe that a particular field was ASCII when in fact it was some ASCII superset such as latin1 or utf8. In that case, if you parsed it assuming ASCII, you would get into trouble of some sort with bytes greater than 127. However, the proposed formatting operations are concerned only with *generating* binary data, not parsing it.

Under Guido's proposed semantics, all of the ASCII formatting operations are guaranteed to produce valid ASCII, regardless of what types or values are thrown at them. So as long as the field's true encoding is something ASCII-compatible, you will always generate valid data.
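To make that guarantee concrete, here is a small sketch using bytes %-interpolation as it later shipped in Python 3.5 (PEP 461); the Content-Length framing is just a hypothetical example format, not anything from the proposal itself:

```python
# Numeric conversions on bytes always emit pure ASCII, whatever the
# value being formatted -- the hypothetical header format here is
# only for illustration.
header = b"Content-Length: %d\r\n" % 1234
assert header == b"Content-Length: 1234\r\n"
assert all(byte < 128 for byte in header)  # guaranteed pure ASCII

# The same holds for arbitrary integer values and other conversions:
assert all(byte < 128 for byte in b"%x" % (2**64 - 1))
```

Because every conversion output is ASCII, splicing it into a field of any ASCII-compatible encoding (latin1, utf8, ...) cannot corrupt the surrounding data.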
> Because I *want to use* the PEP 460 binary interpolation API, but wouldn't be able to use Guido's more lenient proposal, as it is a bug magnet in the presence of arbitrary binary data.
Where exactly is this "arbitrary binary data" that you keep talking about? The only place that arbitrary bytes come into the picture is through b"%s" % b"...", and that's defined to just pass the bytes straight through. I don't see how that could attract any bugs that weren't already present in the data being interpolated.
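For illustration, here is that pass-through behaviour with a genuinely arbitrary payload, using the bytes interpolation that later shipped in Python 3.5 (PEP 461); the STX/ETX framing bytes are a hypothetical record format:

```python
# A payload containing every possible byte value -- nothing ASCII
# about it. b"%s" with a bytes operand passes it through untouched.
payload = bytes(range(256))
record = b"\x02%s\x03" % payload   # hypothetical STX ... ETX framing
assert record[1:-1] == payload     # payload comes out byte-for-byte identical
assert len(record) == 258
```

Whatever bugs the payload carries, they were there before the interpolation; the formatting step adds nothing.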
> The LHS may or may not be tainted with assumptions about ASCII compatibility, which means it effectively *is* tainted with such assumptions, which means code that needs to handle arbitrary binary data can't use it and is left without a binary interpolation feature.
If I understand correctly, what concerns you here is that you can't tell by looking at b"%s" % x whether it encodes anything as ASCII without knowing the type of x. I'm not sure how serious a problem that would be. Most of the time I think it will be fairly obvious from the purpose of the code what the type of x is *intended* to be. If it's not actually that type, then clearly there's a bug somewhere.

Of all such possible bugs, the one most likely to arise due to a confusion in the programmer's mind between text and bytes would be for x to be a string when it was meant to be bytes or vice versa. Due to the still-very-strong separation between text and bytes in Py3, this is unlikely to happen without something else blowing up first. Even if it does happen, it won't result in a data-dependent failure.

If b"%s" % 'hello' were defined to interpolate 'hello'.encode('ascii'), then there *would* be cause for concern. But this is not what Guido proposes -- instead he proposes interpolating ascii('hello') == "'hello'". This is almost certainly *never* what the file spec calls for, so you'll find out about it very soon one way or another.

Effectively this means that b"%s" % x where x is a string is useless, so I'd much prefer it to just raise an exception in that case to make the failure immediately obvious. But either way, you're not going to end up with a latent failure waiting for some non-ASCII data to come along before you notice it.

To summarise, I think the idea of binary format strings being too "tainted" for a program that does not want to use ASCII formatting to rely on is mostly FUD.

-- 
Greg

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
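As a concrete point of comparison, the bytes interpolation that eventually shipped in Python 3.5 (PEP 461) matches the raise-an-exception preference expressed above: a str operand to b"%s" fails immediately, and the ascii()-style conversion is only available when asked for explicitly via %a. A quick sketch of that behaviour:

```python
# A str operand raises TypeError at once, rather than silently
# interpolating ascii('hello') == b"'hello'" -- no latent,
# data-dependent failure is possible.
try:
    b"%s" % "hello"
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for a str operand")

# The repr-style conversion exists, but only under its own explicit
# conversion code, %a:
assert b"%a" % "hello" == b"'hello'"
```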