Construct has a radically different API and should remain a separate package. Introducing a large library into the discussion as evidence that this proposal is too specialized feels like a straw man to me.
This proposal seems much more modest to me: add another format character (or two) to the existing set of a dozen or so, packed and unpacked just like the others. It also has demonstrable use in various formats and protocols.

On Thu, Jan 19, 2017 at 12:50 PM, Nathaniel Smith <n...@pobox.com> wrote:

> I haven't had a chance to use it myself yet, but I've heard good things
> about
>
> https://construct.readthedocs.io/en/latest/
>
> It's certainly far more comprehensive than struct for this and other
> problems.
>
> As usual, there's some tension between adding stuff to the stdlib versus
> using more specialized third-party packages. The existence of packages like
> construct doesn't automatically mean that we should stop improving the
> stdlib, but OTOH not every useful thing can or should be in the stdlib.
>
> Personally, I find myself parsing uleb128-prefixed strings more often than
> u4-prefixed strings.
>
> On Jan 19, 2017 10:42 AM, "Nick Timkovich" <prometheus...@gmail.com>
> wrote:
>
>> ctypes.Structure is *literally* the interface to the C struct that as
>> Chris mentions has fixed offsets for all members. I don't think that should
>> (can?) be altered.
>>
>> In file formats (beyond net protocols) the string size + variable length
>> string motif comes up often and I am frequently re-implementing the
>> two-line read-an-int + read-{}.format-bytes.
>>
>> On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno <jsbu...@python.org.br>
>> wrote:
>>
>>> I am for upgrading struct to these, if possible.
>>>
>>> But besides my +1, I am writing in to remind folks that there is another
>>> "struct" model in the stdlib:
>>>
>>> ctypes.Structure -
>>>
>>> For reading a lot of records with the same structure it is much more
>>> handy than struct, since it gives one a suitable Python object on
>>> instantiation.
>>>
>>> However, it also can't handle variable length fields automatically.
>>>
>>> But maybe the improvement could be made on that side, or in another package
>>> altogether that works more like it than the current "struct".
>>>
>>> On 19 January 2017 at 16:08, Elizabeth Myers <elizab...@interlinked.me>
>>> wrote:
>>> > On 19/01/17 06:47, Elizabeth Myers wrote:
>>> >> On 19/01/17 05:58, Rhodri James wrote:
>>> >>> On 19/01/17 08:31, Mark Dickinson wrote:
>>> >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano <st...@pearwood.info>
>>> >>>> wrote:
>>> >>>>> [...] struct already supports
>>> >>>>> variable-width formats.
>>> >>>>
>>> >>>> Unfortunately, that's not really true: the Pascal strings it supports
>>> >>>> are in some sense variable length, but are stored in a fixed-width
>>> >>>> field. The internals of the struct module rely on each field starting
>>> >>>> at a fixed offset, computable directly from the format string. I don't
>>> >>>> think variable-length fields would be a good fit for the current
>>> >>>> design of the struct module.
>>> >>>>
>>> >>>> For the OP's use-case, I'd suggest a library that sits on top of the
>>> >>>> struct module, rather than an expansion to the struct module itself.
>>> >>>
>>> >>> Unfortunately as the OP explained, this makes the struct module a poor
>>> >>> fit for protocol decoding, even as a base layer for something. It's one
>>> >>> of the things I use python for quite frequently, and I always end up
>>> >>> rolling my own and discarding struct entirely.
>>> >>>
>>> >>
>>> >> Yes, for variable-length fields the struct module is worse than useless:
>>> >> it actually reduces clarity a little. Consider:
>>> >>
>>> >> >>> test_bytes = b'\x00\x00\x00\x0chello world!'
>>> >>
>>> >> With this, you can do:
>>> >>
>>> >> >>> length = int.from_bytes(test_bytes[:4], 'big')
>>> >> >>> string = test_bytes[4:4 + length]
>>> >>
>>> >> or you can do:
>>> >>
>>> >> >>> length = struct.unpack_from('!I', test_bytes)[0]
>>> >> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>>> >>
>>> >> Which looks more readable without consulting the docs? ;)
>>> >>
>>> >> Building anything on top of the struct library like this would lead to
>>> >> worse-looking code for minimal gains in efficiency. To quote Jamie
>>> >> Zawinski, it is like building a bookshelf out of mashed potatoes as it
>>> >> stands.
>>> >>
>>> >> If we had an extension similar to netstruct:
>>> >>
>>> >> >>> length, string = struct.unpack('!I$', test_bytes)
>>> >>
>>> >> MUCH improved readability, and also less verbose. :)
>>> >
>>> > I also didn't mention that when you are unpacking iteratively (e.g., you
>>> > have multiple strings), the code becomes a bit more hairy:
>>> >
>>> > >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
>>> > >>> offset = 0
>>> > >>> while offset < len(test_bytes):
>>> > ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
>>> > ...     offset += 2
>>> > ...     string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
>>> > ...     offset += length
>>> >
>>> > It actually gets a lot worse when you have to unpack a set of strings in
>>> > a context-sensitive manner. You have to be sure to update the offset
>>> > constantly so you can always unpack strings appropriately. Yuck!
>>> >
>>> > It's worth mentioning that a few years ago, a coworker and I found
>>> > ourselves needing variable length strings in the context of a binary
>>> > protocol (DHCP), and wound up abandoning the struct module entirely
>>> > because it was unsuitable. My co-worker said the same thing I did: "it's
>>> > like building a bookshelf out of mashed potatoes."
>>> >
>>> > I do understand it might require a possible major rewrite or major
>>> > changes to the struct module, but in the long run, I think it's worth it
>>> > (especially because the struct module is not all that big in scope). As
>>> > it stands, the struct module simply is not suited for protocols where
>>> > you have variable-length strings, and in my experience, that is the vast
>>> > majority of modern binary protocols on the Internet.
>>> >
>>> > --
>>> > Elizabeth
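
For reference, here is a minimal sketch of the read-an-int + read-that-many-bytes workaround discussed in the thread, wrapped in a generator so the offset bookkeeping lives in one place. The name iter_prefixed and its prefix_format parameter are hypothetical, made up for illustration; this is not part of the struct module or of the proposal, only one way to write the pattern with today's stdlib. It assumes the buffer is nothing but back-to-back length-prefixed strings.

```python
import struct


def iter_prefixed(data, prefix_format="!H"):
    """Yield each length-prefixed byte string found in data.

    prefix_format (hypothetical parameter) is a struct format describing
    the single integer that precedes every string.
    """
    prefix = struct.Struct(prefix_format)
    offset = 0
    while offset < len(data):
        # Read the fixed-width length prefix at the current offset.
        (length,) = prefix.unpack_from(data, offset)
        offset += prefix.size
        end = offset + length
        if end > len(data):
            raise ValueError("length prefix runs past the end of the buffer")
        # The payload is just a slice; struct is only needed for the prefix.
        yield data[offset:end]
        offset = end


test_bytes = b"\x00\x05hello\x00\x07goodbye\x00\x04test"
strings = list(iter_prefixed(test_bytes))
print(strings)
```

Passing '!I' as prefix_format handles the four-byte prefix from the earlier example in the same way. A dedicated format character would make a helper like this unnecessary, which is the point of the proposal.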
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/