Re: [Python-Dev] TextIO seek and tell cookies
Yeah, that should work. The implementation is something like a byte offset to the start of a line plus a character count, plus some misc flags. I found this implementation in the 2.6 code (the last version where it was pure Python code):

    def _pack_cookie(self, position, dec_flags=0,
                     bytes_to_feed=0, need_eof=0, chars_to_skip=0):
        # The meaning of a tell() cookie is: seek to position, set the
        # decoder flags to dec_flags, read bytes_to_feed bytes, feed them
        # into the decoder with need_eof as the EOF flag, then skip
        # chars_to_skip characters of the decoded result.  For most simple
        # decoders, tell() will often just give a byte offset in the file.
        return (position | (dec_flags<<64) | (bytes_to_feed<<128) |
                (chars_to_skip<<192) | bool(need_eof)<<256)

    def _unpack_cookie(self, bigint):
        rest, position = divmod(bigint, 1<<64)
        rest, dec_flags = divmod(rest, 1<<64)
        rest, bytes_to_feed = divmod(rest, 1<<64)
        need_eof, chars_to_skip = divmod(rest, 1<<64)
        return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip

On Mon, Sep 26, 2016 at 3:43 PM, Greg Ewing wrote:
> Ben Leslie wrote:
>>
>> But the idea of transmitting these offsets outside of a running
>> process is not something that I had anticipated. It got me thinking:
>> is there a guarantee that these opaque values returned from tell() are
>> stable across different versions of Python?
>
> Are they even guaranteed to work on a different file
> object in the same process? I.e. if you read some stuff
> from a file, do tell() on it, then close it, open it
> again and seek() with that token, are you guaranteed to
> end up at the same place in the file?

--
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
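The two helpers quoted above can be exercised outside their class. The sketch below is a standalone copy of that pure-Python layout (with `self` dropped and the names shortened to `pack_cookie`/`unpack_cookie` for illustration). It shows that with every flag zero the cookie is simply the byte offset, while any decoder state turns it into a very large integer whose low 64 bits still hold the offset. This is the old pure-Python layout only; the C implementation need not use the same bit positions.

```python
# Standalone copies of the pure-Python helpers quoted above, with `self`
# removed so they can be called as plain functions.
def pack_cookie(position, dec_flags=0, bytes_to_feed=0, need_eof=0, chars_to_skip=0):
    # Each field gets its own 64-bit slot; need_eof ends up at bit 256.
    return (position | (dec_flags << 64) | (bytes_to_feed << 128)
            | (chars_to_skip << 192) | (bool(need_eof) << 256))


def unpack_cookie(bigint):
    # Peel off one 64-bit field at a time, lowest slot first.
    rest, position = divmod(bigint, 1 << 64)
    rest, dec_flags = divmod(rest, 1 << 64)
    rest, bytes_to_feed = divmod(rest, 1 << 64)
    need_eof, chars_to_skip = divmod(rest, 1 << 64)
    return position, dec_flags, bytes_to_feed, need_eof, chars_to_skip


# With every flag zero, the cookie is just the byte offset.
simple = pack_cookie(1234)
print(simple)  # 1234

# Some decoder state on top of the same byte offset: a 194-bit integer
# whose low 64 bits still hold the offset.
stateful = pack_cookie(1234, dec_flags=1, bytes_to_feed=3, chars_to_skip=2)
print(stateful.bit_length())        # 194
print(stateful & ((1 << 64) - 1))   # 1234
print(unpack_cookie(stateful))      # (1234, 1, 3, 0, 2)
```

Note that `unpack_cookie` returns `need_eof` before `chars_to_skip`, matching the quoted original rather than the parameter order of `pack_cookie`.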
Re: [Python-Dev] TextIO seek and tell cookies
Ben Leslie wrote:
> But the idea of transmitting these offsets outside of a running
> process is not something that I had anticipated. It got me thinking:
> is there a guarantee that these opaque values returned from tell() are
> stable across different versions of Python?

Are they even guaranteed to work on a different file object in the same process? I.e. if you read some stuff from a file, do tell() on it, then close it, open it again and seek() with that token, are you guaranteed to end up at the same place in the file?

--
Greg
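Greg's question can be probed empirically, at least for one benign case. The sketch below (file contents are arbitrary) takes a cookie from one file object and hands it to a freshly opened one in the same process. For a UTF-8 file read line by line, CPython's decoder sits in a clean state at the line boundary, so the cookie is just the byte offset and the second file object lands on the next line. This illustrates the simple case only; it is not a documented guarantee.

```python
import os
import tempfile

# Write a small UTF-8 file; "\n" is written as-is on every platform.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("h\u00e9llo\nworld\nagain\n")

# First file object: read one line and take a tell() cookie.
with open(path, encoding="utf-8") as f:
    first = f.readline()
    cookie = f.tell()

# Second, independent file object: seek() with the same cookie.
with open(path, encoding="utf-8") as g:
    g.seek(cookie)
    second = g.readline()

os.remove(path)
print(cookie, repr(second))  # 7 'world\n'
```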
Re: [Python-Dev] TextIO seek and tell cookies
On Mon, Sep 26, 2016, at 05:30, Ben Leslie wrote:
> I think the case of JSON or SQL database is even more important though.
>
> tell/seek can return 129-bit integers (maybe even more? my maths might
> be off here).
>
> The very large integers that can be returned by tell() will break
> serialization to JSON, and storing in a SQL database (at least for
> most database types).
>
> What is the value of comparing these to plain integers? Unless you
> happen to know the magic encoding it isn't going to be very useful I
> think?

I assume the value is that in the circumstances in which all of the flags and other bits are zero, they can be used as offsets in precisely the way that you used them. It may also be possible that in some cases where they are not zero, doing arithmetic with them is still "safe", since the real offset is still in the low-order bits. I don't know if those circumstances are predictable enough for it to be worthwhile.

Changing it would obviously break code that does this (unless, perhaps, it were changed to be a class with arithmetic operators); the question is whether such code "deserves" to be broken.

In my own tests, even a UTF-8-sig file with DOS line endings "worked". Does anyone have information about what circumstances can reliably cause tell() to return values that are *not* simple integers? Maybe it has something to do with working with stateful encodings like iso-2022 or UTF-7? What was the situation that caused your problem?
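The guess about stateful encodings can be checked with a small experiment. The helper below (its name and the sample text are made up for this sketch) reads one character and calls tell(). With UTF-8 the cookie comes out as the byte offset. With UTF-7 the first character sits inside a base64-encoded run, so the decoder cannot restart at a byte boundary there, and CPython records extra state (bytes to feed, characters to skip) above the 64-bit offset, making the cookie enormous. In both cases passing the cookie back to seek() restores the position. Exact cookie values are implementation details and may differ between versions.

```python
import os
import tempfile


def cookie_after_read(text, encoding, nchars):
    """Write text in the given encoding, read nchars characters, then
    return (cookie, rest, rest_after_seek)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as raw:
            raw.write(text.encode(encoding))
        with open(path, encoding=encoding, newline="") as f:
            f.read(nchars)
            cookie = f.tell()
            rest = f.read()
            # Passing the cookie back to seek() is the supported use.
            f.seek(cookie)
            rest_after_seek = f.read()
        return cookie, rest, rest_after_seek
    finally:
        os.remove(path)


text = "\u00e9\u00e9\u00e9\nabc\n"
utf8 = cookie_after_read(text, "utf-8", 1)
utf7 = cookie_after_read(text, "utf-7", 1)

print(utf8[0])             # 2: a plain byte offset
print(utf7[0] >= 2 ** 64)  # True: decoder state packed above the offset
```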
Re: [Python-Dev] TextIO seek and tell cookies
It was pointed out in private email that technically JSON can represent very large integers, even if ECMAScript itself can't.

But the idea of transmitting these offsets outside of a running process is not something that I had anticipated. It got me thinking: is there a guarantee that these opaque values returned from tell() are stable across different versions of Python? My reading of "opaque" is that they could be subject to change, but that possibly isn't the intent.

It seems that, since sizeof(int) and sizeof(Py_off_t) can differ between builds of Python, the opaque value returned is necessarily going to differ between builds, even of the same Python version.

It seems like it would be prudent to discourage the sharing of these opaque cookies (such as via a database or interchange formats), as you'd have to be very sure that they would be interpreted correctly in any receiving instance.

Cheers,

Ben

On 26 September 2016 at 02:30, Ben Leslie wrote:
> I think the case of JSON or SQL database is even more important though.
>
> tell/seek can return 129-bit integers (maybe even more? my maths might
> be off here).
>
> The very large integers that can be returned by tell() will break
> serialization to JSON, and storing in a SQL database (at least for
> most database types).
>
> What is the value of comparing these to plain integers? Unless you
> happen to know the magic encoding it isn't going to be very useful I
> think?
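The JSON point can be made concrete. In the sketch below, `cookie` is a hypothetical value with high bits set (roughly the size of the cookies discussed here), not one taken from a real file. Python's own json module round-trips it exactly, but a consumer that decodes JSON numbers into IEEE-754 doubles, which is what JavaScript's JSON.parse does by default, would no longer be able to distinguish it from its neighbours.

```python
import json

# A hypothetical cookie-sized value with high bits set (129 bits).
cookie = (10 << 96) | (1 << 128)

# Python's json module writes and reads arbitrary-size integers exactly.
encoded = json.dumps({"pos": cookie})
decoded = json.loads(encoded)["pos"]
print(decoded == cookie)  # True

# A consumer that decodes JSON numbers as IEEE-754 doubles cannot tell
# neighbouring values of this size apart.
print(float(cookie) == float(cookie + 1))  # True
```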
Re: [Python-Dev] TextIO seek and tell cookies
I think the case of JSON or SQL database is even more important though.

tell/seek can return 129-bit integers (maybe even more? my maths might be off here).

The very large integers that can be returned by tell() will break serialization to JSON, and storing in a SQL database (at least for most database types).

What is the value of comparing these to plain integers? Unless you happen to know the magic encoding it isn't going to be very useful, I think?

Cheers,

Ben

On 25 September 2016 at 21:18, Guido van Rossum wrote:
> Be careful though, comparing these to plain integers should probably
> be allowed, and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
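The SQL concern is easy to reproduce with the standard sqlite3 module, since SQLite stores integers as signed 64-bit values. In this sketch `cookie` is again a hypothetical 129-bit value, and the table and column names are made up. Binding the raw int is rejected; storing the decimal string in a TEXT column (not an INTEGER column, whose affinity would convert an oversized numeric string to a lossy REAL) round-trips exactly.

```python
import sqlite3

# A hypothetical cookie-sized value with high bits set (129 bits).
cookie = (10 << 96) | (1 << 128)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (name TEXT, cookie TEXT)")

# SQLite integers are signed 64-bit, so binding the raw int is rejected.
# Caught broadly: this sketch does not rely on the exact exception type.
try:
    conn.execute("INSERT INTO positions VALUES (?, ?)", ("raw", cookie))
    raw_ok = True
except (OverflowError, sqlite3.Error):
    raw_ok = False

# Workaround: store the decimal string in a TEXT column and convert back.
conn.execute("INSERT INTO positions VALUES (?, ?)", ("text", str(cookie)))
(stored,) = conn.execute(
    "SELECT cookie FROM positions WHERE name = 'text'").fetchone()
restored = int(stored)
conn.close()

print(raw_ok, restored == cookie)  # False True
```

This only moves the problem, of course: the stored string is still an opaque cookie that is only meaningful to a compatible Python build.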
Re: [Python-Dev] TextIO seek and tell cookies
On 25 September 2016 at 17:21, MRAB wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>> This would ensure the return value is never mis-used and is probably
>> also faster using bytes objects than converting to/from an integer.
>>
> why would it be faster? It's an integer internally.

It isn't an integer internally though; it is a cookie:

    typedef struct {
        Py_off_t start_pos;
        int dec_flags;
        int bytes_to_feed;
        int chars_to_skip;
        char need_eof;
    } cookie_type;

The memory view of this structure is then converted to a long. Surely converting to a PyLong is more work than converting to bytes?

In any case, performance really isn't the motivation here.

Cheers,

Ben
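To see what the `cookie_type` struct above turns into, the sketch below models it with the struct module. It assumes one particular build: fields copied back-to-back with no padding, little-endian, an 8-byte Py_off_t and 4-byte ints (21 bytes in total). Under those assumptions, a state with `bytes_to_feed=10` and `chars_to_skip=1` gives a 129-bit integer, in line with Ben's estimate, and a different number from the 64-bit-slot pure-Python layout in Guido's message. The `build_cookie` helper is hypothetical; it returns both the integer form (current behaviour as described in the thread) and the raw bytes (Ben's proposal).

```python
import struct

# Assumed layout for cookie_type: start_pos, dec_flags, bytes_to_feed,
# chars_to_skip, need_eof; little-endian, no padding, 8-byte Py_off_t.
COOKIE_FORMAT = "<qiiib"
COOKIE_LEN = struct.calcsize(COOKIE_FORMAT)  # 21 bytes


def build_cookie(start_pos, dec_flags, bytes_to_feed, chars_to_skip, need_eof):
    raw = struct.pack(COOKIE_FORMAT, start_pos, dec_flags,
                      bytes_to_feed, chars_to_skip, need_eof)
    # Integer form: reinterpret the bytes as an unsigned little-endian int.
    # Bytes form: hand `raw` back unchanged as an opaque token.
    return int.from_bytes(raw, "little"), raw


as_int, as_bytes = build_cookie(0, 0, 10, 1, 0)
print(COOKIE_LEN, as_int.bit_length())  # 21 129

# The same state under the pure-Python 64-bit-slot layout is another number.
pure_python = (10 << 128) | (1 << 192)
print(as_int == pure_python)  # False

# Under the assumed layout, the integer decodes back to the same fields.
print(struct.unpack(COOKIE_FORMAT, as_int.to_bytes(COOKIE_LEN, "little")))
# (0, 0, 10, 1, 0)
```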
Re: [Python-Dev] TextIO seek and tell cookies
On 25 September 2016 at 21:18, Guido van Rossum wrote:
> Be careful though, comparing these to plain integers should probably
> be allowed,

There's a good reason why it's "opaque"... why would you want to make it less opaque?

And I'm curious why Python didn't adopt the fgetpos/fsetpos style that makes the data structure completely opaque (fpos_t). IIRC, this was added to C when the ANSI standard was first written, to allow cross-platform compatibility in cases where ftell/fseek was difficult (or impossible) to fully implement. Maybe those reasons don't matter any more (e.g., dealing with record-oriented or keyed file systems)...

> and we also should make sure that things like
> serialization via JSON or storing in an SQL database don't break. I
> personally think it's one of those "learn not to touch the stove"
> cases and there's limited value in making this API idiot proof.
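For comparison, an fpos_t-style API can be approximated in pure Python on top of the existing tell()/seek(). Everything in this sketch (`FilePosition`, `getpos`, `setpos`) is hypothetical and only illustrates the idea: the handle wraps the cookie and defines no arithmetic operators, so the only useful thing to do with it is pass it back.

```python
import io


class FilePosition:
    """Hypothetical fpos_t-style handle wrapping a tell() cookie.

    It defines no arithmetic operators, so the only useful thing to do
    with it is hand it back to setpos().
    """
    __slots__ = ("_cookie",)

    def __init__(self, cookie):
        self._cookie = cookie

    def __repr__(self):
        return "<FilePosition>"


def getpos(f):
    # Rough analogue of C's fgetpos().
    return FilePosition(f.tell())


def setpos(f, pos):
    # Rough analogue of C's fsetpos(): only accepts handles from getpos().
    if not isinstance(pos, FilePosition):
        raise TypeError("setpos() needs a FilePosition from getpos()")
    f.seek(pos._cookie)


stream = io.TextIOWrapper(io.BytesIO(b"first\nsecond\n"), encoding="utf-8")
stream.readline()
mark = getpos(stream)
stream.read()             # move to the end
setpos(stream, mark)
line = stream.readline()  # 'second\n'

# Arithmetic on the handle is simply unsupported.
try:
    mark + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
```

The cost of this design is exactly what Guido raises: such a handle can no longer be compared with, or serialized as, a plain integer.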
Re: [Python-Dev] TextIO seek and tell cookies
Be careful though, comparing these to plain integers should probably be allowed, and we also should make sure that things like serialization via JSON or storing in an SQL database don't break. I personally think it's one of those "learn not to touch the stove" cases and there's limited value in making this API idiot proof.

On Sun, Sep 25, 2016 at 9:05 PM, Nick Coghlan wrote:
> It could make sense to use a subclass of int that emitted deprecation
> warnings for integer arithmetic, and then eventually disallowed it
> entirely.
>
> Cheers,
> Nick.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] TextIO seek and tell cookies
On 26 September 2016 at 10:21, MRAB wrote:
> On 2016-09-26 00:21, Ben Leslie wrote:
>> Are there any downsides to this? I've made some progress developing a
>> patch to change this functionality. Is it worth polishing and
>> submitting?
>>
> An alternative might be a subclass of int.

It could make sense to use a subclass of int that emitted deprecation warnings for integer arithmetic, and then eventually disallowed it entirely.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
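A minimal sketch of that int-subclass idea follows. The class name `TellCookie` and the warning text are hypothetical; nothing like this exists in the io module. Because it is still an int, comparisons with plain integers and JSON serialization keep working (which addresses the concerns raised elsewhere in the thread), while addition and subtraction emit a DeprecationWarning.

```python
import json
import warnings


class TellCookie(int):
    """Hypothetical int subclass for tell() results.

    Comparisons and serialization behave like a plain int, but addition
    and subtraction emit a DeprecationWarning.
    """

    def _warn(self):
        # stacklevel=3 attributes the warning to the caller's arithmetic.
        warnings.warn("arithmetic on tell() cookies is deprecated",
                      DeprecationWarning, stacklevel=3)

    def __add__(self, other):
        self._warn()
        return int(self) + other

    __radd__ = __add__

    def __sub__(self, other):
        self._warn()
        return int(self) - other

    def __rsub__(self, other):
        self._warn()
        return other - int(self)


cookie = TellCookie(1234)
print(cookie == 1234, cookie < 2000)  # True True
print(json.dumps(cookie))             # 1234

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    moved = cookie + 10

print(moved, type(moved).__name__)    # 1244 int
print(caught[0].category.__name__)    # DeprecationWarning
```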
Re: [Python-Dev] TextIO seek and tell cookies
On 2016-09-26 00:21, Ben Leslie wrote:
> Hi all,
>
> I recently shot myself in the foot by assuming that TextIO.tell
> returned integers rather than opaque cookies. Specifically I was
> adding an offset to the value returned by TextIO.tell. In retrospect
> this doesn't make sense.
>
> Now, I don't want to drive change simply because I failed to read the
> documentation carefully, but I think the current API is very easy to
> misuse. Most of the time TextIO.tell returns a cookie that is actually
> an integer and adding an offset to it and seek-ing works fine.
>
> The only indication you get that you are mis-using the API is that
> sometimes tell returns a cookie that when you add an integer offset to
> it will cause seek() to fail with an OverflowError.
>
> Would it be possible to change the API to return something more
> opaque? E.g.: rather than converting the C cookie structure to a long,
> could it instead be converted to a bytes() object.
>
> (I.e.: Change textiowrapper_build_cookie to use
> PyBytes_FromStringAndSize rather than _PyLong_FromByteArray and
> equivalent for textiowrapper_parse_cookie).
>
> This would ensure the return value is never mis-used and is probably
> also faster using bytes objects than converting to/from an integer.
>
why would it be faster? It's an integer internally.

> Are there any downsides to this? I've made some progress developing a
> patch to change this functionality. Is it worth polishing and
> submitting?
>
An alternative might be a subclass of int.
[Python-Dev] TextIO seek and tell cookies
Hi all,

I recently shot myself in the foot by assuming that TextIO.tell returned integers rather than opaque cookies. Specifically, I was adding an offset to the value returned by TextIO.tell. In retrospect this doesn't make sense.

Now, I don't want to drive change simply because I failed to read the documentation carefully, but I think the current API is very easy to misuse. Most of the time TextIO.tell returns a cookie that is actually an integer, and adding an offset to it and seek-ing works fine.

The only indication you get that you are mis-using the API is that sometimes tell returns a cookie that, when you add an integer offset to it, causes seek() to fail with an OverflowError.

Would it be possible to change the API to return something more opaque? E.g. rather than converting the C cookie structure to a long, could it instead be converted to a bytes() object?

(I.e.: change textiowrapper_build_cookie to use PyBytes_FromStringAndSize rather than _PyLong_FromByteArray, and make the equivalent change for textiowrapper_parse_cookie.)

This would ensure the return value is never mis-used, and is probably also faster, since it uses bytes objects rather than converting to/from an integer.

Are there any downsides to this? I've made some progress developing a patch to change this functionality. Is it worth polishing and submitting?

Cheers,

Ben
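The misuse described here (adding a character count to a cookie) has a safe counterpart under the documented contract: pass the cookie back to seek() unchanged, then move forward with read(). The sketch below uses an in-memory UTF-7 stream, an arbitrary choice that forces a non-trivial cookie; it illustrates the usage pattern and is not part of the proposed patch.

```python
import io

# In-memory UTF-7 text, so tell() has to encode decoder state in the cookie.
data = "\u00e9\u00e9\u00e9\nabc\n".encode("utf-7")
f = io.TextIOWrapper(io.BytesIO(data), encoding="utf-7", newline="")

f.read(1)
cookie = f.tell()

# Fragile: treating the cookie as a number, e.g. f.seek(cookie + 2).
# Nothing guarantees where that lands, or that it is accepted at all.

# Safer: seek() back to the unmodified cookie, then move forward by reading.
f.seek(cookie)
skipped = f.read(2)
rest = f.read()
print(repr(skipped), repr(rest))  # 'éé' '\nabc\n'
```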