Re: [Python-ideas] Ideas for improving the struct module
On Fri, Jan 20, 2017 at 12:30 AM, Steven D'Aprano wrote:
> Does it require a PEP just to add one more
> format code? (Maybe it will, if the format code requires a complete
> re-write of the entire module.)

Yes, I think a PEP would be useful in this case. The proposed change *would* entail some fairly substantial changes to the design of the module (I encourage you to take a look at the source to appreciate what's involved), and if we're going to that level of effort it's probably worth stepping back and seeing whether those changes are compatible with other proposed directions for the struct module, and whether it makes sense to do more than add that one format code. That level of change probably isn't worth it "just to add one more format code", but might be worth it if it allows other possible expansions of the struct module functionality.

There are also performance considerations to look at, behaviour of alignment to consider, and other details.

--
Mark
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Ideas for improving the struct module
On 19Jan2017 16:04, Yury Selivanov wrote:
> This is a neat idea, but this will only work for parsing framed
> binary protocols. For example, if your protocol prefixes all packets
> with a length field, you can write an efficient read buffer and
> use your proposal to decode all of the message's fields in one shot.
> Which is good.
>
> Not all protocols use framing though. For instance, your proposal
> won't help to write Thrift or Postgres protocol parsers.

Sure, but a lot of things fit the proposal. Seems a win: both simple and useful.

> Overall, I'm not sure that this is worth the hassle. With proposal:
>
>     data, = struct.unpack('!H$', buf)
>     buf = buf[2+len(data):]
>
> with the current struct module:
>
>     length, = struct.unpack_from('!H', buf)
>     data = buf[2:2+length]
>     buf = buf[2+length:]
>
> Another thing: struct.calcsize won't work with structs that use
> variable length fields.

True, but it would be enough for it to raise an exception of some kind. It won't break any in-play code, and it will prevent accidents for users of the new variable-size formats.

We've all got things we wish struct might cover (I have a few, but strangely the top of the list is nonsemantic: I wish it let me put meaningless whitespace inside the format for readability).

+1 on the proposal from me.

Oh: subject to one proviso: reading a struct will need to return how many bytes of input data were scanned, not merely return the decoded values.

Cheers,
Cameron Simpson
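The proviso about reporting how many bytes were scanned can be sketched with today's struct module. This is only an illustration: `unpack_prefixed` is a hypothetical helper name, not part of struct or of the proposal, and it assumes a single length prefix followed directly by the payload.

```python
import struct

def unpack_prefixed(prefix_fmt, buf, offset=0):
    """Hypothetical helper: decode a length prefix, then that many bytes.

    Returns (data, consumed), where consumed counts both the prefix
    and the payload, so the caller can advance past the whole field.
    """
    prefix_size = struct.calcsize(prefix_fmt)
    (length,) = struct.unpack_from(prefix_fmt, buf, offset)
    start = offset + prefix_size
    data = bytes(buf[start:start + length])
    if len(data) != length:
        raise ValueError(
            "truncated input: wanted %d bytes, got %d" % (length, len(data)))
    return data, prefix_size + length

buf = b'\x00\x05hello\x00\x03abc'
data, consumed = unpack_prefixed('!H', buf)
rest = buf[consumed:]
```

Returning the consumed count alongside the value means the caller never has to recompute `2 + len(data)` by hand.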
Re: [Python-ideas] Ideas for improving the struct module
On 19Jan2017 12:08, Elizabeth Myers wrote:
> I also didn't mention that when you are unpacking iteratively (e.g., you
> have multiple strings), the code becomes a bit more hairy:
>
>     test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
>     offset = 0
>     while offset < len(test_bytes):
>         length = struct.unpack_from('!H', test_bytes, offset)[0]
>         offset += 2
>         string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
>         offset += length
>
> It actually gets a lot worse when you have to unpack a set of strings in
> a context-sensitive manner. You have to be sure to update the offset
> constantly so you can always unpack strings appropriately. Yuck!

Whenever I'm doing iterative stuff like this, either variable length binary or lexical stuff, I always end up with a bunch of functions which can be called like this:

    datalen, offset = get_bs(chunk, offset=offset)

The notable thing here is just that they return the data and the new offset, which makes updating the offset impossible to forget, and also makes the calling code more succinct. See the internal call to get_bs() in this decoder for a length-encoded field:

    def get_bsdata(chunk, offset=0):
        ''' Fetch a length-prefixed data chunk.
            Decodes an unsigned value from a bytes at the specified `offset`
            (default 0), and collects that many following bytes.
            Return those following bytes and the new offset.
        '''
        ##is_bytes(chunk)
        offset0 = offset
        datalen, offset = get_bs(chunk, offset=offset)
        data = chunk[offset:offset+datalen]
        ##is_bytes(data)
        if len(data) != datalen:
            raise ValueError("bsdata(chunk, offset=%d): insufficient data: expected %d bytes, got %d bytes"
                             % (offset0, datalen, len(data)))
        offset += datalen
        return data, offset

Cheers,
Cameron Simpson
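A runnable sketch of the "return the value and the new offset" pattern described above. The message does not show `get_bs()` itself, so the version here is an assumption: a stand-in that reads a fixed 16-bit big-endian length. `iter_bsdata` is likewise a hypothetical name added for the demonstration.

```python
import struct

def get_bs(chunk, offset=0):
    """Stand-in decoder (an assumption, not the original get_bs):
    read an unsigned 16-bit big-endian value, return it and the new offset."""
    (value,) = struct.unpack_from('!H', chunk, offset)
    return value, offset + 2

def get_bsdata(chunk, offset=0):
    """Fetch a length-prefixed data chunk; return (data, new_offset)."""
    offset0 = offset
    datalen, offset = get_bs(chunk, offset=offset)
    data = chunk[offset:offset + datalen]
    if len(data) != datalen:
        raise ValueError(
            "get_bsdata(chunk, offset=%d): insufficient data: "
            "expected %d bytes, got %d bytes" % (offset0, datalen, len(data)))
    return data, offset + datalen

def iter_bsdata(chunk):
    """Hypothetical driver: yield every length-prefixed field in chunk."""
    offset = 0
    while offset < len(chunk):
        data, offset = get_bsdata(chunk, offset)
        yield data

strings = list(iter_bsdata(b'\x00\x05hello\x00\x07goodbye\x00\x04test'))
```

Because each decoder hands back the advanced offset, the loop body never does its own offset arithmetic.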
Re: [Python-ideas] Ideas for improving the struct module
Nevertheless the C meaning *is* the etymology of the module name. :-)

--Guido (mobile)

On Jan 19, 2017 16:54, "Chris Angelico" wrote:
> On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano wrote:
> > On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:
> >
> >> To be fair, the name "struct" implies a C-style structure, which
> >> _does_ have a fixed size, or at least fixed offsets for its members
> >
> > Ah, the old "everyone thinks in C terms" fallacy raises its ugly head
> > again :-)
> >
> > The name doesn't imply any such thing to me, or those who haven't been
> > raised on C. It implies the word "structure", which has no implication
> > of being fixed-width.
>
> Fair point. Objection retracted - and it was only minor anyway. This
> would be a handy feature to add. +1.
>
> ChrisA
Re: [Python-ideas] Ideas for improving the struct module
On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano wrote:
> On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:
>
>> To be fair, the name "struct" implies a C-style structure, which
>> _does_ have a fixed size, or at least fixed offsets for its members
>
> Ah, the old "everyone thinks in C terms" fallacy raises its ugly head
> again :-)
>
> The name doesn't imply any such thing to me, or those who haven't been
> raised on C. It implies the word "structure", which has no implication
> of being fixed-width.

Fair point. Objection retracted - and it was only minor anyway. This would be a handy feature to add. +1.

ChrisA
Re: [Python-ideas] Ideas for improving the struct module
On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:
> To be fair, the name "struct" implies a C-style structure, which
> _does_ have a fixed size, or at least fixed offsets for its members

Ah, the old "everyone thinks in C terms" fallacy raises its ugly head again :-)

The name doesn't imply any such thing to me, or to those who haven't been raised on C. It implies the word "structure", which has no implication of being fixed-width.

The docs for the struct module describe it as:

    struct — Interpret bytes as packed binary data

which applies equally to the fixed- and variable-width case. The fact that we can sensibly talk about "fixed-width" and "variable-width" structs without confusion shows that the concept is bigger than the C data-type. (Even if the most common use will probably remain C-style fixed-width structs.)

Python is not C, and we shouldn't be limited by what C does. If we wanted C, we would use C.

--
Steve
Re: [Python-ideas] Ideas for improving the struct module
On Thu, Jan 19, 2017 at 08:31:03AM, Mark Dickinson wrote:
> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
> > [...] struct already supports
> > variable-width formats.
>
> Unfortunately, that's not really true: the Pascal strings it supports
> are in some sense variable length, but are stored in a fixed-width
> field. The internals of the struct module rely on each field starting
> at a fixed offset, computable directly from the format string. I don't
> think variable-length fields would be a good fit for the current
> design of the struct module.

I know nothing and care even less (is caring a negative amount possible?) about the internal implementation of the struct module. Since Elizabeth is volunteering to do the work to make it work, will it be accepted? Subject to the usual code quality reviews, contributor agreement, etc.

Are there objections to the *idea* of adding support for null terminated strings to the struct module? Does it require a PEP just to add one more format code? (Maybe it will, if the format code requires a complete re-write of the entire module.)

It seems to me that if Elizabeth is willing to do the work, and somebody is willing to review it, this would be a welcome addition to the module.

It would require at least one API change: struct.calcsize won't work for formats containing null-terminated strings. But that's a minor matter.

--
Steve
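The quoted point about Pascal strings is easy to check against the existing module. A small sketch using only the documented `p` format code; the values and the `name`/`number` field names are made up for illustration:

```python
import struct

# A '10p' field always occupies 10 bytes: one length byte plus up to
# nine data bytes, padded with NULs, whatever the string's real length.
packed_short = struct.pack('10p', b'hi')
packed_long = struct.pack('10p', b'hello')
field_size = struct.calcsize('10p')

(value,) = struct.unpack('10p', packed_short)

# Because the field width is fixed, the offset of the next field is
# known from the format string alone.
record = struct.pack('!10pH', b'hi', 513)
record_size = struct.calcsize('!10pH')
name, number = struct.unpack('!10pH', record)
```

The string's length varies, but the storage does not, which is why calcsize can still answer for `p` and could not for a truly variable-length code.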
Re: [Python-ideas] Ideas for improving the struct module
There is now an issue for this: http://bugs.python.org/issue29328

--
~Ethan~
Re: [Python-ideas] Ideas for improving the struct module
This is a neat idea, but this will only work for parsing framed binary protocols. For example, if your protocol prefixes all packets with a length field, you can write an efficient read buffer and use your proposal to decode all of the message's fields in one shot. Which is good.

Not all protocols use framing though. For instance, your proposal won't help to write Thrift or Postgres protocol parsers.

Overall, I'm not sure that this is worth the hassle. With proposal:

    data, = struct.unpack('!H$', buf)
    buf = buf[2+len(data):]

with the current struct module:

    length, = struct.unpack_from('!H', buf)
    data = buf[2:2+length]
    buf = buf[2+length:]

Another thing: struct.calcsize won't work with structs that use variable length fields.

Yury

On 2017-01-18 5:24 AM, Elizabeth Myers wrote:
> Hello,
>
> I've noticed a lot of binary protocols require variable length
> bytestrings (with or without a null terminator), but it is not easy to
> unpack these in Python without first reading the desired length, or
> reading bytes until a null terminator is reached.
>
> I've noticed the netstruct library
> (https://github.com/stendec/netstruct) has a format specifier, $, which
> assumes the previous type to pack/unpack is the string's length. This is
> an interesting idea in and of itself, but doesn't handle the
> null-terminated string case. I know $ is similar to pascal strings, but
> sometimes you need more than 255 characters :p.
>
> For null-terminated strings, it may be simpler to have a specifier for
> those. I propose 0, but this point can be bikeshedded over endlessly if
> desired ;) (I thought about using n/N but they're taken :P).
>
> It's worth noting that (maybe one of?) Perl's equivalent to the struct
> module, whose name escapes me atm, has a module which can handle this
> case. I can't remember if it handled variable length or zero-terminated
> though; maybe it did both. Perl is more or less my 10th language. :p
>
> This pain point is an annoyance imo, and fixing it (or something like
> it) would greatly simplify a lot of code.
>
> I'd be happy to take a look at implementing it if the idea is received
> sufficiently warmly.
>
> --
> Elizabeth
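For the null-terminated case, here is a minimal sketch of what has to be done by hand today. It does not use the proposed `0` format code (which does not exist in struct); `read_cstring` is a hypothetical helper name and the sample buffer is made up.

```python
def read_cstring(buf, offset=0):
    """Hypothetical helper: read a NUL-terminated bytestring.

    Returns (data, new_offset), with new_offset pointing just past the
    terminator. bytes.index raises ValueError if no terminator is found.
    """
    end = buf.index(b'\x00', offset)
    return buf[offset:end], end + 1

buf = b'user\x00postgres\x00'
key, offset = read_cstring(buf)
value, offset = read_cstring(buf, offset)
```

Note that the field's size is only known after scanning the data, which is why a format containing such a field cannot have a fixed calcsize.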
Re: [Python-ideas] Ideas for improving the struct module
On 2017-01-19 12:47, Elizabeth Myers wrote:
> On 19/01/17 05:58, Rhodri James wrote:
>> On 19/01/17 08:31, Mark Dickinson wrote:
>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>>>> [...] struct already supports
>>>> variable-width formats.
>>>
>>> Unfortunately, that's not really true: the Pascal strings it supports
>>> are in some sense variable length, but are stored in a fixed-width
>>> field. The internals of the struct module rely on each field starting
>>> at a fixed offset, computable directly from the format string. I don't
>>> think variable-length fields would be a good fit for the current
>>> design of the struct module.
>>>
>>> For the OP's use-case, I'd suggest a library that sits on top of the
>>> struct module, rather than an expansion to the struct module itself.
>>
>> Unfortunately as the OP explained, this makes the struct module a poor
>> fit for protocol decoding, even as a base layer for something. It's one
>> of the things I use python for quite frequently, and I always end up
>> rolling my own and discarding struct entirely.
>
> Yes, for variable-length fields the struct module is worse than useless:
> it actually reduces clarity a little. Consider:
>
>     test_bytes = b'\x00\x00\x00\x0chello world!'
>
> With this, you can do:
>
>     length = int.from_bytes(test_bytes[:4], 'big')
>     string = test_bytes[4:length]

Shouldn't that be:

    string = test_bytes[4:4+length]

> or you can do:
>
>     length = struct.unpack_from('!I', test_bytes)[0]
>     string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>
> Which looks more readable without consulting the docs? ;)

Which is more likely to be correct? :-)

> Building anything on top of the struct library like this would lead to
> worse-looking code for minimal gains in efficiency. To quote Jamie
> Zawinski, it is like building a bookshelf out of mashed potatoes as it
> stands.
>
> If we had an extension similar to netstruct:
>
>     length, string = struct.unpack('!I$', test_bytes)
>
> MUCH improved readability, and also less verbose. :)
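The slicing correction can be checked directly on the sample bytes from the thread. The variable names `wrong`, `right` and `via_struct` are only for this demonstration:

```python
import struct

test_bytes = b'\x00\x00\x00\x0chello world!'
length = int.from_bytes(test_bytes[:4], 'big')

# The end index of a slice is absolute, so the 4-byte prefix must be
# added to the length to reach the end of the payload.
wrong = test_bytes[4:length]
right = test_bytes[4:4 + length]

# The struct-based version takes an offset and a count, so it does not
# have this particular pitfall.
via_struct = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
```

Here `wrong` silently comes back four bytes short, which is the kind of slip that returning the new offset from a helper avoids.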
Re: [Python-ideas] Ideas for improving the struct module
Construct has radical API changes and should remain apart. It feels to me like a straw man to introduce a large library to the discussion as justification for it being too specialized. This proposal to me seems much more modest: add another format character (or two) to the existing set of a dozen or so that will be packed/unpacked just like the others. It also has demonstrable use in various formats/protocols.

On Thu, Jan 19, 2017 at 12:50 PM, Nathaniel Smith wrote:
> I haven't had a chance to use it myself yet, but I've heard good things
> about
>
> https://construct.readthedocs.io/en/latest/
>
> It's certainly far more comprehensive than struct for this and other
> problems.
>
> As usual, there's some tension between adding stuff to the stdlib versus
> using more specialized third-party packages. The existence of packages like
> construct doesn't automatically mean that we should stop improving the
> stdlib, but OTOH not every useful thing can or should be in the stdlib.
>
> Personally, I find myself parsing uleb128-prefixed strings more often than
> u4-prefixed strings.
Re: [Python-ideas] Ideas for improving the struct module
I haven't had a chance to use it myself yet, but I've heard good things about

https://construct.readthedocs.io/en/latest/

It's certainly far more comprehensive than struct for this and other problems.

As usual, there's some tension between adding stuff to the stdlib versus using more specialized third-party packages. The existence of packages like construct doesn't automatically mean that we should stop improving the stdlib, but OTOH not every useful thing can or should be in the stdlib.

Personally, I find myself parsing uleb128-prefixed strings more often than u4-prefixed strings.

On Jan 19, 2017 10:42 AM, "Nick Timkovich" wrote:
> ctypes.Structure is *literally* the interface to the C struct that as
> Chris mentions has fixed offsets for all members. I don't think that should
> (can?) be altered.
>
> In file formats (beyond net protocols) the string size + variable length
> string motif comes up often and I am frequently re-implementing the
> two-line read-an-int + read-{}.format-bytes.
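To make the uleb128 remark concrete, here is a sketch of reading a ULEB128-prefixed string by hand. ULEB128 stores an unsigned integer seven bits per byte, least significant group first, with the high bit set on every byte except the last. The function names are hypothetical and nothing here is part of struct.

```python
def read_uleb128(buf, offset=0):
    """Decode an unsigned LEB128 integer; return (value, new_offset).

    Raises IndexError if the buffer ends before the final byte.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # high bit clear: this was the last byte
            return result, offset
        shift += 7

def read_uleb128_string(buf, offset=0):
    """Hypothetical helper: ULEB128 length, then that many bytes."""
    length, offset = read_uleb128(buf, offset)
    data = buf[offset:offset + length]
    if len(data) != length:
        raise ValueError("truncated input")
    return data, offset + length

short_field = read_uleb128_string(b'\x05hello')
long_field = read_uleb128_string(b'\xc8\x01' + b'x' * 200)
```

The prefix itself is variable width here, so a single fixed format code for "the length field" would not cover this case.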
Re: [Python-ideas] Ideas for improving the struct module
ctypes.Structure is *literally* the interface to the C struct that as Chris mentions has fixed offsets for all members. I don't think that should (can?) be altered.

In file formats (beyond net protocols) the string size + variable length string motif comes up often and I am frequently re-implementing the two-line read-an-int + read-{}.format-bytes.

On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno wrote:
> I am for upgrading struct to these, if possible.
>
> But besides my +1, I am writing in to remind folks that there is another
> "struct" model in the stdlib:
>
> ctypes.Structure
>
> For reading a lot of records with the same structure it is much more handy
> than struct, since it gives one a suitable Python object on instantiation.
>
> However, it also can't handle variable length fields automatically.
>
> But maybe the improvement could be made on that side, or in another package
> altogether that works more like it than the current "struct".
Re: [Python-ideas] Ideas for improving the struct module
I am in favor of upgrading struct with these features, if possible.

Besides my +1, I am writing in to remind folks that there is another
"struct" model in the stdlib: ctypes.Structure. For reading a lot of
records with the same structure it is much handier than struct, since
it gives you a suitable Python object on instantiation.

However, it can't handle variable-length fields automatically either.

But maybe the improvement could be made on that side, or in another
package altogether that works more like ctypes.Structure than like the
current "struct".

On 19 January 2017 at 16:08, Elizabeth Myers wrote:
> On 19/01/17 06:47, Elizabeth Myers wrote:
>> On 19/01/17 05:58, Rhodri James wrote:
>>> On 19/01/17 08:31, Mark Dickinson wrote:
>>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>>>>> [...] struct already supports
>>>>> variable-width formats.
>>>>
>>>> Unfortunately, that's not really true: the Pascal strings it
>>>> supports are in some sense variable length, but are stored in a
>>>> fixed-width field. The internals of the struct module rely on
>>>> each field starting at a fixed offset, computable directly from
>>>> the format string. I don't think variable-length fields would be
>>>> a good fit for the current design of the struct module.
>>>>
>>>> For the OP's use case, I'd suggest a library that sits on top of
>>>> the struct module, rather than an expansion to the struct module
>>>> itself.
>>>
>>> Unfortunately as the OP explained, this makes the struct module a poor
>>> fit for protocol decoding, even as a base layer for something. It's one
>>> of the things I use Python for quite frequently, and I always end up
>>> rolling my own and discarding struct entirely.
>>
>> Yes, for variable-length fields the struct module is worse than useless:
>> it actually reduces clarity a little. Consider:
>>
>> >>> test_bytes = b'\x00\x00\x00\x0chello world!'
>>
>> With this, you can do:
>>
>> >>> length = int.from_bytes(test_bytes[:4], 'big')
>> >>> string = test_bytes[4:4 + length]
>>
>> or you can do:
>>
>> >>> length = struct.unpack_from('!I', test_bytes)[0]
>> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>>
>> Which looks more readable without consulting the docs? ;)
>>
>> Building anything on top of the struct library like this would lead to
>> worse-looking code for minimal gains in efficiency. To quote Jamie
>> Zawinski, it is like building a bookshelf out of mashed potatoes as it
>> stands.
>>
>> If we had an extension similar to netstruct:
>>
>> >>> length, string = struct.unpack('!I$', test_bytes)
>>
>> MUCH improved readability, and also less verbose. :)
>
> I also didn't mention that when you are unpacking iteratively (e.g., you
> have multiple strings), the code becomes a bit more hairy:
>
> >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
> >>> offset = 0
> >>> while offset < len(test_bytes):
> ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
> ...     offset += 2
> ...     string = struct.unpack_from('{}s'.format(length), test_bytes,
> ...                                 offset)[0]
> ...     offset += length
>
> It actually gets a lot worse when you have to unpack a set of strings in
> a context-sensitive manner. You have to be sure to update the offset
> constantly so you can always unpack strings appropriately. Yuck!
>
> It's worth mentioning that a few years ago, a coworker and I found
> ourselves needing variable-length strings in the context of a binary
> protocol (DHCP), and wound up abandoning the struct module entirely
> because it was unsuitable. My co-worker said the same thing I did: "it's
> like building a bookshelf out of mashed potatoes."
>
> I do understand it might require a major rewrite of, or major changes
> to, the struct module, but in the long run, I think it's worth it
> (especially because the struct module is not all that big in scope). As
> it stands, the struct module simply is not suited for protocols where
> you have variable-length strings, and in my experience, that is the vast
> majority of modern binary protocols on the Internet.
>
> --
> Elizabeth
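To make the ctypes.Structure suggestion above a little more concrete,
here is a minimal sketch of reading several fixed-size records with it.
The record layout (the class name Record and the fields kind, flags and
value) is invented purely for illustration and does not come from any
real protocol; the fields are chosen so that no padding is involved.

```python
import ctypes


class Record(ctypes.BigEndianStructure):
    # Hypothetical layout: two 16-bit fields followed by one 32-bit field,
    # all in network (big-endian) byte order. 8 bytes per record.
    _fields_ = [
        ("kind", ctypes.c_uint16),
        ("flags", ctypes.c_uint16),
        ("value", ctypes.c_uint32),
    ]


# Two records back to back in a single buffer.
raw = bytes.fromhex("0001 0002 0000002a" "0003 0004 00000100")

size = ctypes.sizeof(Record)

# from_buffer_copy() copies `size` bytes starting at the given offset and
# returns a Record instance whose fields are ordinary attributes.
records = [
    Record.from_buffer_copy(raw, offset)
    for offset in range(0, len(raw), size)
]

for record in records:
    print(record.kind, record.flags, record.value)
```

As noted above, this only covers fixed layouts: every field must have a
size known when the class is defined, so a length-prefixed string still
has to be handled by hand.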
Re: [Python-ideas] Ideas for improving the struct module
On Fri, Jan 20, 2017 at 5:08 AM, Elizabeth Myers wrote:
> I do understand it might require a major rewrite of, or major changes
> to, the struct module, but in the long run, I think it's worth it
> (especially because the struct module is not all that big in scope). As
> it stands, the struct module simply is not suited for protocols where
> you have variable-length strings, and in my experience, that is the vast
> majority of modern binary protocols on the Internet.
>

To be fair, the name "struct" implies a C-style structure, which
_does_ have a fixed size, or at least fixed offsets for its members
(the last member can be variable-sized). A quick search of PyPI shows
up a struct-variant specifically designed for network protocols:

https://pypi.python.org/pypi/netstruct/1.1.2

It even uses the dollar sign as you describe. So perhaps what you're
looking for is this module coming into the stdlib?

ChrisA
Re: [Python-ideas] Ideas for improving the struct module
On 19/01/17 06:47, Elizabeth Myers wrote:
> On 19/01/17 05:58, Rhodri James wrote:
>> On 19/01/17 08:31, Mark Dickinson wrote:
>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>>>> [...] struct already supports
>>>> variable-width formats.
>>>
>>> Unfortunately, that's not really true: the Pascal strings it supports
>>> are in some sense variable length, but are stored in a fixed-width
>>> field. The internals of the struct module rely on each field starting
>>> at a fixed offset, computable directly from the format string. I don't
>>> think variable-length fields would be a good fit for the current
>>> design of the struct module.
>>>
>>> For the OP's use case, I'd suggest a library that sits on top of the
>>> struct module, rather than an expansion to the struct module itself.
>>
>> Unfortunately as the OP explained, this makes the struct module a poor
>> fit for protocol decoding, even as a base layer for something. It's one
>> of the things I use Python for quite frequently, and I always end up
>> rolling my own and discarding struct entirely.
>>
>
> Yes, for variable-length fields the struct module is worse than useless:
> it actually reduces clarity a little. Consider:
>
> >>> test_bytes = b'\x00\x00\x00\x0chello world!'
>
> With this, you can do:
>
> >>> length = int.from_bytes(test_bytes[:4], 'big')
> >>> string = test_bytes[4:4 + length]
>
> or you can do:
>
> >>> length = struct.unpack_from('!I', test_bytes)[0]
> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>
> Which looks more readable without consulting the docs? ;)
>
> Building anything on top of the struct library like this would lead to
> worse-looking code for minimal gains in efficiency. To quote Jamie
> Zawinski, it is like building a bookshelf out of mashed potatoes as it
> stands.
>
> If we had an extension similar to netstruct:
>
> >>> length, string = struct.unpack('!I$', test_bytes)
>
> MUCH improved readability, and also less verbose. :)

I also didn't mention that when you are unpacking iteratively (e.g., you
have multiple strings), the code becomes a bit more hairy:

>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
>>> offset = 0
>>> while offset < len(test_bytes):
...     length = struct.unpack_from('!H', test_bytes, offset)[0]
...     offset += 2
...     string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
...     offset += length

It actually gets a lot worse when you have to unpack a set of strings in
a context-sensitive manner. You have to be sure to update the offset
constantly so you can always unpack strings appropriately. Yuck!

It's worth mentioning that a few years ago, a coworker and I found
ourselves needing variable-length strings in the context of a binary
protocol (DHCP), and wound up abandoning the struct module entirely
because it was unsuitable. My co-worker said the same thing I did: "it's
like building a bookshelf out of mashed potatoes."

I do understand it might require a major rewrite of, or major changes
to, the struct module, but in the long run, I think it's worth it
(especially because the struct module is not all that big in scope). As
it stands, the struct module simply is not suited for protocols where
you have variable-length strings, and in my experience, that is the vast
majority of modern binary protocols on the Internet.

--
Elizabeth
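For what it's worth, the offset bookkeeping in the loop above can be
hidden behind a small generator built on today's struct module. This is
only a sketch: the name iter_prefixed and its prefix_fmt parameter are
made up here, and it assumes every string is preceded by a single
integer length field.

```python
import struct


def iter_prefixed(buf, prefix_fmt="!H"):
    """Yield each length-prefixed byte string found in buf.

    Hypothetical helper, not part of the struct module. prefix_fmt must
    describe exactly one integer field holding the length of the string
    that follows it.
    """
    prefix = struct.Struct(prefix_fmt)
    offset = 0
    while offset < len(buf):
        # Raises struct.error if the prefix itself is cut short.
        (length,) = prefix.unpack_from(buf, offset)
        offset += prefix.size
        end = offset + length
        if end > len(buf):
            raise ValueError("string at offset {} is truncated".format(offset))
        yield buf[offset:end]
        offset = end


test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
strings = list(iter_prefixed(test_bytes))
print(strings)
```

This keeps the offset arithmetic in one place, but it does not help with
the context-sensitive case, where the format of the next field depends
on values already decoded.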
Re: [Python-ideas] Ideas for improving the struct module
On 19/01/17 05:58, Rhodri James wrote:
> On 19/01/17 08:31, Mark Dickinson wrote:
>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>>> [...] struct already supports
>>> variable-width formats.
>>
>> Unfortunately, that's not really true: the Pascal strings it supports
>> are in some sense variable length, but are stored in a fixed-width
>> field. The internals of the struct module rely on each field starting
>> at a fixed offset, computable directly from the format string. I don't
>> think variable-length fields would be a good fit for the current
>> design of the struct module.
>>
>> For the OP's use case, I'd suggest a library that sits on top of the
>> struct module, rather than an expansion to the struct module itself.
>
> Unfortunately as the OP explained, this makes the struct module a poor
> fit for protocol decoding, even as a base layer for something. It's one
> of the things I use Python for quite frequently, and I always end up
> rolling my own and discarding struct entirely.
>

Yes, for variable-length fields the struct module is worse than useless:
it actually reduces clarity a little. Consider:

>>> test_bytes = b'\x00\x00\x00\x0chello world!'

With this, you can do:

>>> length = int.from_bytes(test_bytes[:4], 'big')
>>> string = test_bytes[4:4 + length]

or you can do:

>>> length = struct.unpack_from('!I', test_bytes)[0]
>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]

Which looks more readable without consulting the docs? ;)

Building anything on top of the struct library like this would lead to
worse-looking code for minimal gains in efficiency. To quote Jamie
Zawinski, it is like building a bookshelf out of mashed potatoes as it
stands.

If we had an extension similar to netstruct:

>>> length, string = struct.unpack('!I$', test_bytes)

MUCH improved readability, and also less verbose. :)
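As a rough illustration of what the proposed '!I$' format would have to
do, here is a sketch that emulates it with the existing struct API. The
function name unpack_prefixed is hypothetical, and the real semantics of
a '$' code would be whatever a PEP ends up specifying; this only covers
the single "length prefix followed by that many bytes" case.

```python
import struct


def unpack_prefixed(buf, prefix_fmt="!I"):
    """Return (length, string) for one length-prefixed string in buf.

    Hypothetical stand-in for struct.unpack('!I$', buf); prefix_fmt must
    describe exactly one integer field.
    """
    prefix_size = struct.calcsize(prefix_fmt)
    (length,) = struct.unpack_from(prefix_fmt, buf)
    # The string starts right after the prefix and runs for `length` bytes.
    end = prefix_size + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the declared length")
    return length, buf[prefix_size:end]


test_bytes = b'\x00\x00\x00\x0chello world!'
length, string = unpack_prefixed(test_bytes)
print(length, string)
```

Note that the end of the slice is the prefix size plus the length, not
the length alone, which is an easy slip to make when writing the slicing
by hand.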
Re: [Python-ideas] Ideas for improving the struct module
On 19/01/17 08:31, Mark Dickinson wrote:
> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>> [...] struct already supports
>> variable-width formats.
>
> Unfortunately, that's not really true: the Pascal strings it supports
> are in some sense variable length, but are stored in a fixed-width
> field. The internals of the struct module rely on each field starting
> at a fixed offset, computable directly from the format string. I don't
> think variable-length fields would be a good fit for the current
> design of the struct module.
>
> For the OP's use case, I'd suggest a library that sits on top of the
> struct module, rather than an expansion to the struct module itself.

Unfortunately as the OP explained, this makes the struct module a poor
fit for protocol decoding, even as a base layer for something. It's one
of the things I use Python for quite frequently, and I always end up
rolling my own and discarding struct entirely.

--
Rhodri James *-* Kynesim Ltd
Re: [Python-ideas] Ideas for improving the struct module
On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
> [...] struct already supports
> variable-width formats.

Unfortunately, that's not really true: the Pascal strings it supports
are in some sense variable length, but are stored in a fixed-width
field. The internals of the struct module rely on each field starting
at a fixed offset, computable directly from the format string. I don't
think variable-length fields would be a good fit for the current
design of the struct module.

For the OP's use case, I'd suggest a library that sits on top of the
struct module, rather than an expansion to the struct module itself.

--
Mark
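A short illustration of the point about Pascal strings, using only the
documented 'p' format code: the field always occupies the number of
bytes given in the format, whatever the length of the string stored in
it. The sample values are arbitrary.

```python
import struct

# '10p' is a 10-byte field: one length byte plus up to 9 bytes of data.
packed_short = struct.pack('10p', b'hi')
packed_long = struct.pack('10p', b'a much longer string')

# Both results are exactly 10 bytes; the short string is padded with
# null bytes and the long one is truncated to fit.
print(len(packed_short), len(packed_long))

# The size depends only on the format string, never on the data.
print(struct.calcsize('10p'))

# Unpacking returns just the bytes covered by the stored length byte.
(short_value,) = struct.unpack('10p', packed_short)
(long_value,) = struct.unpack('10p', packed_long)
print(short_value, long_value)
```

So the string inside the field varies in length, but the offset of
whatever follows it does not, which is the fixed-offset property
described above.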