Paul Moore (p.f.moore at gmail.com) on Fri Sep 2 05:23:04 EDT 2016 wrote:
>
> On 2 September 2016 at 03:35, Steve Dower <steve.dower at python.org> wrote:
> > I'd need to test to be sure, but writing an incomplete code point should
> > just truncate to before that point. It may currently raise OSError if that
> > truncated to zero length, as I believe that's not currently distinguished
> > from an error. What behavior would you propose?
>
> For "correct" behaviour, you should retain the unwritten bytes, and
> write them as part of the next call (essentially making the API
> stateful, in the same way that incremental codecs work). I'm pretty
> sure that this could cause actual problems, for example I think invoke
> (https://github.com/pyinvoke/invoke) gets byte streams from
> subprocesses and dumps them direct to stdout in blocks (so could
> easily end up splitting multibyte sequences). It's arguable that it
> should be decoding the bytes from the subprocess and then re-encoding
> them, but that gets us into "guess the encoding used by the
> subprocess" territory.
>
> The problem is that we're not going to simply drop some bad data in
> the common case - it's not so much the dropping of the start of an
> incomplete code point that bothers me, as the encoding error you hit
> at the start of the *next* block of data you send. So people will get
> random, unexplained, encoding errors.
>
> I don't see an easy answer here other than a stateful API.

Isn't the buffered IO wrapper for this?
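
To make the "stateful API" idea concrete, here is a minimal sketch of a writer that holds back an incomplete trailing UTF-8 sequence and completes it on the next call, using the standard incremental decoder. The class name StatefulUtf8Writer and the write_text callable are hypothetical stand-ins for illustration; this is not how the console code under discussion is implemented.

```python
import codecs


class StatefulUtf8Writer:
    """Hypothetical wrapper that carries an incomplete UTF-8 tail between writes."""

    def __init__(self, write_text):
        # write_text is an assumed callable taking str (a stand-in for the
        # real console write).
        self._write_text = write_text
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def write(self, data):
        # The incremental decoder keeps any incomplete trailing sequence
        # internally and prepends it to the next chunk.
        text = self._decoder.decode(data)
        if text:
            self._write_text(text)
        # All bytes were accepted, even if some are still pending.
        return len(data)


chunks = []
writer = StatefulUtf8Writer(chunks.append)
euro = "\u20ac".encode("utf-8")  # three bytes: e2 82 ac

# Split the euro sign across two writes, as a block-based copier might.
writer.write(b"a" + euro[:1])
writer.write(euro[1:] + b"b")

# Without state, the first block alone cannot be decoded.
try:
    (b"a" + euro[:1]).decode("utf-8")
    stateless_failed = False
except UnicodeDecodeError:
    stateless_failed = True
```

With the state kept in the decoder, the second write emits the completed character instead of hitting an encoding error at the start of the next block.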

> > Reads of less than four bytes fail instantly, as in the worst case we need
> > four bytes to represent one Unicode character. This is an unfortunate
> > reality of trying to limit it to one system call - you'll never get a full
> > buffer from a single read, as there is no simple mapping between
> > length-as-utf8 and length-as-utf16 for an arbitrary string.
>
> And here - "read a single byte" is a not uncommon way of getting some
> data. Once again see invoke:
> https://github.com/pyinvoke/invoke/blob/master/invoke/platform.py#L147
>
> used at
> https://github.com/pyinvoke/invoke/blob/master/invoke/runners.py#L548
>
> I'm not saying that there's an easy answer here, but this *will* break
> code. And actually, it's in violation of the documentation: see
> https://docs.python.org/3/library/io.html#io.RawIOBase.read
>
> """
> read(size=-1)
>
> Read up to size bytes from the object and return them. As a
> convenience, if size is unspecified or -1, readall() is called.
> Otherwise, only one system call is ever made. Fewer than size bytes
> may be returned if the operating system call returns fewer than size
> bytes.
>
> If 0 bytes are returned, and size was not 0, this indicates end of
> file. If the object is in non-blocking mode and no bytes are
> available, None is returned.
> """
>
> You're not allowed to return 0 bytes if the requested size was not 0,
> and you're not at EOF.

That's why it should rather be signaled by an exception. Even when one
doesn't transcode UTF-16 to UTF-8, reading just one byte is still
impossible, and I would argue that returning 0 bytes is also incorrect in
that case. I raise ValueError in win_unicode_console.

Adam Bartoš
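
A rough sketch of the two points above, under stated assumptions: a raw stream that raises ValueError when asked for fewer than four bytes (rather than returning 0 bytes, which would read as EOF), and a buffered wrapper that still serves a one-byte read. MinFourRaw is a hypothetical class for illustration only; it is not the actual console implementation, nor the code in win_unicode_console.

```python
import io


class MinFourRaw(io.RawIOBase):
    """Hypothetical raw stream that refuses buffers smaller than 4 bytes."""

    def __init__(self, text):
        super().__init__()
        self._pending = text.encode("utf-8")

    def readable(self):
        return True

    def readinto(self, b):
        if len(b) < 4:
            # Signal the limitation explicitly instead of returning 0,
            # which callers would interpret as end of file.
            raise ValueError("buffer must hold at least 4 bytes")
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


# A direct one-byte read on the raw stream raises.
raw = MinFourRaw("h\u00e9llo")
try:
    raw.read(1)
except ValueError as exc:
    raw_error = exc
else:
    raw_error = None

# Through a buffered wrapper, the same one-byte read succeeds, because the
# wrapper asks the raw stream to fill its own, larger internal buffer.
reader = io.BufferedReader(MinFourRaw("h\u00e9llo"), buffer_size=16)
first = reader.read(1)
```

This only shows the small-read case. A larger read that partially fills the internal buffer may still hand the raw stream a short remaining slice, so the sketch does not establish that the raw stream never sees a buffer under four bytes.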
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev