On 2 September 2016 at 03:35, Steve Dower <steve.do...@python.org> wrote:
> I'd need to test to be sure, but writing an incomplete code point should
> just truncate to before that point. It may currently raise OSError if that
> truncated to zero length, as I believe that's not currently distinguished
> from an error. What behavior would you propose?

For "correct" behaviour, you should retain the unwritten bytes, and
write them as part of the next call (essentially making the API
stateful, in the same way that incremental codecs work). I'm pretty
sure that this could cause actual problems, for example I think invoke
(https://github.com/pyinvoke/invoke) gets byte streams from
subprocesses and dumps them direct to stdout in blocks (so could
easily end up splitting multibyte sequences). It's arguable that it
should be decoding the bytes from the subprocess and then re-encoding
them, but that gets us into "guess the encoding used by the
subprocess" territory.

The problem is that we're not going to simply drop some bad data in
the common case - it's not so much the dropping of the start of an
incomplete code point that bothers me, as the encoding error you hit
at the start of the *next* block of data you send. So people will get
random, unexplained, encoding errors.

I don't see an easy answer here other than a stateful API.

> Reads of less than four bytes fail instantly, as in the worst case we need
> four bytes to represent one Unicode character. This is an unfortunate
> reality of trying to limit it to one system call - you'll never get a full
> buffer from a single read, as there is no simple mapping between
> length-as-utf8 and length-as-utf16 for an arbitrary string.

And here - "read a single byte" is a not uncommon way of getting some
data. Once again see invoke:

https://github.com/pyinvoke/invoke/blob/master/invoke/platform.py#L147
used at
https://github.com/pyinvoke/invoke/blob/master/invoke/runners.py#L548

I'm not saying that there's an easy answer here, but this *will* break
code. And actually, it's in violation of the documentation: see
https://docs.python.org/3/library/io.html#io.RawIOBase.read

"""
read(size=-1)

Read up to size bytes from the object and return them. As a
convenience, if size is unspecified or -1, readall() is called.
Otherwise, only one system call is ever made.
Fewer than size bytes may be returned if the operating system call
returns fewer than size bytes. If 0 bytes are returned, and size was
not 0, this indicates end of file. If the object is in non-blocking
mode and no bytes are available, None is returned.
"""

You're not allowed to return 0 bytes if the requested size was not 0,
and you're not at EOF.

Having said all this, I'm strongly +1 on the idea of this PEP, it
would be fantastic to resolve the above issues and get this in.

Paul
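
To make the "stateful API" point about writes concrete, here is a
rough sketch of what retaining the unwritten bytes could look like. It
is only an illustration: StatefulUtf8Writer is a made-up name, not
part of the PEP or of the io module, and it assumes the data is UTF-8
and that the sink accepts str. It leans on the standard library's
incremental UTF-8 decoder to hold an incomplete trailing sequence
until the next call.

```python
import codecs
import io


class StatefulUtf8Writer:
    """Hypothetical wrapper (illustration only, not the PEP's code).

    Keeps an incomplete trailing UTF-8 sequence between write() calls
    instead of dropping it or raising an encoding error.
    """

    def __init__(self, text_sink):
        self._sink = text_sink  # any object with a write(str) method
        decoder_cls = codecs.getincrementaldecoder("utf-8")
        self._decoder = decoder_cls(errors="strict")

    def write(self, data):
        # The incremental decoder holds back an incomplete trailing
        # sequence and prepends it to the next chunk it is given.
        text = self._decoder.decode(data)
        if text:
            self._sink.write(text)
        # Report every byte as accepted, including bytes still pending.
        return len(data)

    def close(self):
        # final=True raises UnicodeDecodeError if bytes are still pending.
        self._decoder.decode(b"", final=True)


def demo():
    payload = "na\u00efve \u20ac".encode("utf-8")
    # Split inside the two-byte sequence and inside the three-byte one,
    # the way a block-oriented copy from a subprocess might.
    chunks = [payload[:3], payload[3:8], payload[8:]]

    # Decoding each block on its own fails, both on the truncated tail
    # and at the start of the *next* block.
    for chunk in chunks[:2]:
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError as exc:
            print("stateless decode failed:", exc.reason)

    sink = io.StringIO()
    writer = StatefulUtf8Writer(sink)
    for chunk in chunks:
        writer.write(chunk)
    writer.close()
    return sink.getvalue()


if __name__ == "__main__":
    print(demo())
```

The choice to return len(data) even when some bytes are only buffered
is itself an assumption; whether a raw write() may claim bytes it has
not yet passed to the console is exactly the sort of thing that would
need deciding.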