Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Mark Dickinson
On Fri, Jan 20, 2017 at 12:30 AM, Steven D'Aprano  wrote:
> Does it require a PEP just to add one more
> format code? (Maybe it will, if the format code requires a complete
> re-write of the entire module.)

Yes, I think a PEP would be useful in this case. The proposed change
*would* entail some fairly substantial changes to the design of the
module (I encourage you to take a look at the source to appreciate
what's involved), and if we're going to that level of effort it's
probably worth stepping back and seeing whether those changes are
compatible with other proposed directions for the struct module, and
whether it makes sense to do more than add that one format code. That
level of change probably isn't worth it "just to add one more format
code", but might be worth it if it allows other possible expansions of
the struct module functionality. There are also performance
considerations to look at, behaviour of alignment to consider, and
other details.
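
As one illustration of the alignment question, here is a minimal sketch using only the existing struct API. It assumes nothing about the proposed format codes; the variable names are illustrative. The same two fields occupy a different number of bytes depending on the byte-order/alignment prefix:

```python
import struct

# Standard sizes, no padding: a 1-byte char followed by a 4-byte unsigned int.
packed_size = struct.calcsize('<bI')

# Native mode ('@', the default) may insert padding so the int is aligned.
# The exact total is platform-dependent, so it is not hard-coded here.
native_size = struct.calcsize('@bI')

print(packed_size, native_size)
```

Any variable-length field placed before an aligned field would need a rule for where that padding goes, which is presumably part of what a PEP would have to settle.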

-- 
Mark
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Cameron Simpson

On 19Jan2017 16:04, Yury Selivanov  wrote:

> This is a neat idea, but this will only work for parsing framed
> binary protocols.  For example, if your protocol prefixes all packets
> with a length field, you can write an efficient read buffer and
> use your proposal to decode all of a message's fields in one shot.
> Which is good.
>
> Not all protocols use framing though.  For instance, your proposal
> won't help to write Thrift or Postgres protocol parsers.


Sure, but a lot of things fit the proposal. Seems a win: both simple and 
useful.



> Overall, I'm not sure that this is worth the hassle.  With the proposal:
>
>   data, = struct.unpack('!H$', buf)
>   buf = buf[2+len(data):]
>
> with the current struct module:
>
>   len, = struct.unpack_from('!H', buf)
>   data = buf[2:2+len]
>   buf = buf[2+len:]
>
> Another thing: struct.calcsize won't work with structs that use
> variable length fields.


True, but it would be enough for it to raise an exception of some kind. It
won't break any code already in play, and it will prevent accidents for users
of the new variable-size formats.


We've all got things we wish struct might cover (I have a few, but strangely 
the top of the list is nonsemantic: I wish it let me put meaningless whitespace 
inside the format for readability).


+1 on the proposal from me.

Oh: subject to one proviso: reading a struct will need to return how many bytes 
of input data were scanned, not merely returning the decoded values.
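
A minimal sketch of that proviso using today's struct module. The helper name unpack_with_size is hypothetical, and it only covers fixed-size formats, where struct.calcsize already gives the number of bytes consumed:

```python
import struct

def unpack_with_size(fmt, buf, offset=0):
    """Hypothetical helper: unpack `fmt` at `offset`, return (values, new_offset)."""
    values = struct.unpack_from(fmt, buf, offset)
    # For fixed-size formats the bytes consumed are known from the format alone.
    return values, offset + struct.calcsize(fmt)

buf = b'\x00\x05hello'
(length,), offset = unpack_with_size('!H', buf)
data = buf[offset:offset + length]
```

With a variable-length code the consumed size would depend on the data, so the unpacker itself would have to report it.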


Cheers,
Cameron Simpson 


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Cameron Simpson

On 19Jan2017 12:08, Elizabeth Myers  wrote:

> I also didn't mention that when you are unpacking iteratively (e.g., you
> have multiple strings), the code becomes a bit more hairy:
>
> >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
> >>> offset = 0
> >>> while offset < len(test_bytes):
> ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
> ...     offset += 2
> ...     string = struct.unpack_from('{}s'.format(length), test_bytes,
> ...                                 offset)[0]
> ...     offset += length
>
> It actually gets a lot worse when you have to unpack a set of strings in
> a context-sensitive manner. You have to be sure to update the offset
> constantly so you can always unpack strings appropriately. Yuck!


Whenever I'm doing iterative stuff like this, either variable length binary or 
lexical stuff, I always end up with a bunch of functions which can be called 
like this:


 datalen, offset = get_bs(chunk, offset=offset)

The notable thing here is just that they return the data and the new offset,
which makes updating the offset impossible to forget, and also makes the
calling code more succinct. See, for example, the internal call to get_bs() in
this decoder for a length encoded field:

 def get_bsdata(chunk, offset=0):
   ''' Fetch a length-prefixed data chunk.
   Decodes an unsigned value from a bytes at the specified `offset`
   (default 0), and collects that many following bytes.
   Return those following bytes and the new offset.
   '''
   ##is_bytes(chunk)
   offset0 = offset
   datalen, offset = get_bs(chunk, offset=offset)
   data = chunk[offset:offset+datalen]
   ##is_bytes(data)
   if len(data) != datalen:
     raise ValueError(
       "bsdata(chunk, offset=%d): insufficient data: expected %d bytes, got %d bytes"
       % (offset0, datalen, len(data)))
   offset += datalen
   return data, offset
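
To run the pattern end to end, here is a sketch with a stand-in for get_bs(), which is not shown above. The stand-in is an assumption: it simply reads a 2-byte big-endian unsigned length, which may differ from the real encoding. With it, the iterative loop from the quoted message collapses to one call per string:

```python
import struct

def get_bs(chunk, offset=0):
    """Stand-in decoder (assumption): a 2-byte big-endian unsigned length."""
    (value,) = struct.unpack_from('!H', chunk, offset)
    return value, offset + 2

def get_bsdata(chunk, offset=0):
    """Fetch a length-prefixed data chunk; return (data, new_offset)."""
    offset0 = offset
    datalen, offset = get_bs(chunk, offset=offset)
    data = chunk[offset:offset + datalen]
    if len(data) != datalen:
        raise ValueError(
            "bsdata(chunk, offset=%d): insufficient data: expected %d bytes, got %d bytes"
            % (offset0, datalen, len(data)))
    return data, offset + datalen

test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
offset = 0
strings = []
while offset < len(test_bytes):
    # The new offset always comes back with the data, so it cannot be forgotten.
    data, offset = get_bsdata(test_bytes, offset)
    strings.append(data)
```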

Cheers,
Cameron Simpson 


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Guido van Rossum
Nevertheless the C meaning *is* the etymology of the module name. :-)

--Guido (mobile)

On Jan 19, 2017 16:54, "Chris Angelico"  wrote:

> On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano 
> wrote:
> > On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:
> >
> >> To be fair, the name "struct" implies a C-style structure, which
> >> _does_ have a fixed size, or at least fixed offsets for its members
> >
> >
> > Ah, the old "everyone thinks in C terms" fallacy raises its ugly head
> > again :-)
> >
> > The name doesn't imply any such thing to me, or those who haven't been
> > raised on C. It implies the word "structure", which has no implication
> > of being fixed-width.
>
> Fair point. Objection retracted - and it was only minor anyway. This
> would be a handy feature to add. +1.
>
> ChrisA

Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Chris Angelico
On Fri, Jan 20, 2017 at 11:38 AM, Steven D'Aprano  wrote:
> On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:
>
>> To be fair, the name "struct" implies a C-style structure, which
>> _does_ have a fixed size, or at least fixed offsets for its members
>
>
> Ah, the old "everyone thinks in C terms" fallacy raises its ugly head
> again :-)
>
> The name doesn't imply any such thing to me, or those who haven't been
> raised on C. It implies the word "structure", which has no implication
> of being fixed-width.

Fair point. Objection retracted - and it was only minor anyway. This
would be a handy feature to add. +1.

ChrisA


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Steven D'Aprano
On Fri, Jan 20, 2017 at 05:16:28AM +1100, Chris Angelico wrote:

> To be fair, the name "struct" implies a C-style structure, which
> _does_ have a fixed size, or at least fixed offsets for its members


Ah, the old "everyone thinks in C terms" fallacy raises its ugly head
again :-)

The name doesn't imply any such thing to me, or those who haven't been 
raised on C. It implies the word "structure", which has no implication 
of being fixed-width.

The docs for the struct module describe it as:

struct — Interpret bytes as packed binary data

which applies equally to the fixed- and variable-width case. The fact 
that we can sensibly talk about "fixed-width" and "variable-width" 
structs without confusion, shows that the concept is bigger than the C 
data-type. (Even if the most common use will probably remain C-style 
fixed-width structs.)

Python is not C, and we shouldn't be limited by what C does. If we 
wanted C, we would use C.


-- 
Steve

Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Steven D'Aprano
On Thu, Jan 19, 2017 at 08:31:03AM +, Mark Dickinson wrote:
> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano  wrote:
> > [...] struct already supports
> > variable-width formats.
> 
> Unfortunately, that's not really true: the Pascal strings it supports
> are in some sense variable length, but are stored in a fixed-width
> field. The internals of the struct module rely on each field starting
> at a fixed offset, computable directly from the format string. I don't
> think variable-length fields would be a good fit for the current
> design of the struct module.
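
The fixed-width behaviour of Pascal strings can be seen with the existing 'p' format code. This is a small sketch; the count 10 and the variable names are arbitrary choices for illustration:

```python
import struct

# '10p' is a Pascal string stored in a field that is always 10 bytes wide:
# one length byte, then up to 9 bytes of data, padded with NUL bytes.
packed = struct.pack('10p', b'hi')
(value,) = struct.unpack('10p', packed)
field_width = struct.calcsize('10p')
```

The payload length varies, but the field still occupies 10 bytes, so the offset of whatever follows it is known from the format string alone.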

I know nothing and care even less (is caring a negative amount 
possible?) about the internal implementation of the struct module. Since 
Elizabeth is volunteering to do the work to make it work, will it be 
accepted?

Subject to the usual code quality reviews, contributor agreement, etc.

Are there objections to the *idea* of adding support for null terminated 
strings to the struct module? Does it require a PEP just to add one more 
format code? (Maybe it will, if the format code requires a complete 
re-write of the entire module.)

It seems to me that if Elizabeth is willing to do the work, and somebody 
to review it, this would be a welcome addition to the module. It would 
require at least one API change: struct.calcsize won't work for formats 
containing null-terminated strings. But that's a minor matter.


-- 
Steve


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Ethan Furman

There is now an issue for this:

  http://bugs.python.org/issue29328

--
~Ethan~


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Yury Selivanov

This is a neat idea, but this will only work for parsing framed
binary protocols.  For example, if your protocol prefixes all packets
with a length field, you can write an efficient read buffer and
use your proposal to decode all of a message's fields in one shot.
Which is good.

Not all protocols use framing though.  For instance, your proposal
won't help to write Thrift or Postgres protocol parsers.

Overall, I'm not sure that this is worth the hassle.  With proposal:

   data, = struct.unpack('!H$', buf)
   buf = buf[2+len(data):]

with the current struct module:

   len, = struct.unpack_from('!H', buf)
   data = buf[2:2+len]
   buf = buf[2+len:]

Another thing: struct.calcsize won't work with structs that use
variable length fields.
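
For comparison, a runnable sketch of the "current struct module" variant. The helper name read_prefixed is hypothetical. It uses struct.unpack_from, because struct.unpack requires the buffer length to match the format size exactly, and it avoids shadowing the built-in len:

```python
import struct

def read_prefixed(buf):
    """Split one '!H'-prefixed chunk off the front of `buf`; return (data, rest)."""
    (length,) = struct.unpack_from('!H', buf)  # tolerates trailing bytes
    data = buf[2:2 + length]
    return data, buf[2 + length:]

buf = b'\x00\x05hello\x00\x04test'
first, buf = read_prefixed(buf)
second, buf = read_prefixed(buf)
```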

Yury


On 2017-01-18 5:24 AM, Elizabeth Myers wrote:

Hello,

I've noticed a lot of binary protocols require variable length
bytestrings (with or without a null terminator), but it is not easy to
unpack these in Python without first reading the desired length, or
reading bytes until a null terminator is reached.

I've noticed the netstruct library
(https://github.com/stendec/netstruct) has a format specifier, $, which
assumes the previous type to pack/unpack is the string's length. This is
an interesting idea in and of itself, but doesn't handle the null-terminated
string case. I know $ is similar to pascal strings, but sometimes you
need more than 255 characters :p.

For null-terminated strings, it may be simpler to have a specifier for
those. I propose 0, but this point can be bikeshedded over endlessly if
desired ;) (I thought about using n/N but they're :P).
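
For reference, this is roughly what has to be written by hand today for the null-terminated case. It is only a sketch: the helper name unpack_cstring is hypothetical and is not part of struct or netstruct.

```python
def unpack_cstring(buf, offset=0):
    """Hypothetical helper: read bytes up to the next NUL; return (data, new_offset)."""
    end = buf.index(b'\x00', offset)   # raises ValueError if no terminator is found
    return buf[offset:end], end + 1    # the new offset skips the terminator itself

buf = b'hello\x00world\x00'
first, offset = unpack_cstring(buf)
second, offset = unpack_cstring(buf, offset)
```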

It's worth noting that (maybe one of?) Perl's equivalent to the struct
module, whose name escapes me atm, has a module which can handle this
case. I can't remember if it handled variable length or zero-terminated
though; maybe it did both. Perl is more or less my 10th language. :p

This pain point is an annoyance imo, and fixing it (with this or something
like it) would greatly simplify a lot of code. I'd be happy to take a look
at implementing it if the idea is received sufficiently warmly.

--
Elizabeth


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread MRAB

On 2017-01-19 12:47, Elizabeth Myers wrote:

> On 19/01/17 05:58, Rhodri James wrote:
>> On 19/01/17 08:31, Mark Dickinson wrote:
>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>>>> [...] struct already supports
>>>> variable-width formats.
>>>
>>> Unfortunately, that's not really true: the Pascal strings it supports
>>> are in some sense variable length, but are stored in a fixed-width
>>> field. The internals of the struct module rely on each field starting
>>> at a fixed offset, computable directly from the format string. I don't
>>> think variable-length fields would be a good fit for the current
>>> design of the struct module.
>>>
>>> For the OPs use-case, I'd suggest a library that sits on top of the
>>> struct module, rather than an expansion to the struct module itself.
>>
>> Unfortunately as the OP explained, this makes the struct module a poor
>> fit for protocol decoding, even as a base layer for something.  It's one
>> of the things I use python for quite frequently, and I always end up
>> rolling my own and discarding struct entirely.
>
> Yes, for variable-length fields the struct module is worse than useless:
> it actually reduces clarity a little. Consider:
>
> >>> test_bytes = b'\x00\x00\x00\x0chello world!'
>
> With this, you can do:
>
> >>> length = int.from_bytes(test_bytes[:4], 'big')
> >>> string = test_bytes[4:length]



Shouldn't that be:

string = test_bytes[4:4+length]
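
To make the difference concrete, a small sketch using the same test_bytes; the names truncated and correct are illustrative only:

```python
test_bytes = b'\x00\x00\x00\x0chello world!'
length = int.from_bytes(test_bytes[:4], 'big')

truncated = test_bytes[4:length]      # end index ignores the 4-byte prefix
correct = test_bytes[4:4 + length]    # end index is prefix size plus length
```

The first slice silently drops the last four bytes of the payload rather than raising an error.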


> or you can do:
>
> >>> length = struct.unpack_from('!I', test_bytes)[0]
> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>
> Which looks more readable without consulting the docs? ;)


Which is more likely to be correct? :-)


> Building anything on top of the struct library like this would lead to
> worse-looking code for minimal gains in efficiency. To quote Jamie
> Zawinski, it is like building a bookshelf out of mashed potatoes as it
> stands.
>
> If we had an extension similar to netstruct:
>
> >>> length, string = struct.unpack('!I$', test_bytes)
>
> MUCH improved readability, and also less verbose. :)





Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Nick Timkovich
Construct has radical API changes and should remain apart. It feels to me
like a straw-man to introduce a large library to the discussion as
justification for it being too-specialized.

This proposal to me seems much more modest: add another format character
(or two) to the existing set of a dozen or so that will be packed/unpacked
just like the others. It also has demonstrable use in various
formats/protocols.

On Thu, Jan 19, 2017 at 12:50 PM, Nathaniel Smith  wrote:

> I haven't had a chance to use it myself yet, but I've heard good things
> about
>
> https://construct.readthedocs.io/en/latest/
>
> It's certainly far more comprehensive than struct for this and other
> problems.
>
> As usual, there's some tension between adding stuff to the stdlib versus
> using more specialized third-party packages. The existence of packages like
> construct doesn't automatically mean that we should stop improving the
> stdlib, but OTOH not every useful thing can or should be in the stdlib.
>
> Personally, I find myself parsing uleb128-prefixed strings more often than
> u4-prefixed strings.

Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Nathaniel Smith
I haven't had a chance to use it myself yet, but I've heard good things
about

https://construct.readthedocs.io/en/latest/

It's certainly far more comprehensive than struct for this and other
problems.

As usual, there's some tension between adding stuff to the stdlib versus
using more specialized third-party packages. The existence of packages like
construct doesn't automatically mean that we should stop improving the
stdlib, but OTOH not every useful thing can or should be in the stdlib.

Personally, I find myself parsing uleb128-prefixed strings more often than
u4-prefixed strings.
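
For readers unfamiliar with the encoding, a minimal sketch of reading a uleb128-prefixed string by hand. The function names are illustrative and do not come from struct or construct:

```python
def decode_uleb128(buf, offset=0):
    """Decode one unsigned LEB128 integer; return (value, new_offset)."""
    value = 0
    shift = 0
    while True:
        byte = buf[offset]            # IndexError if the input is truncated
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:          # a clear high bit marks the last byte
            return value, offset
        shift += 7

def read_uleb128_string(buf, offset=0):
    """Read a uleb128 length, then that many bytes; return (data, new_offset)."""
    length, offset = decode_uleb128(buf, offset)
    data = buf[offset:offset + length]
    if len(data) != length:
        raise ValueError("truncated string")
    return data, offset + length

value, used = decode_uleb128(b'\xe5\x8e\x26')
text, end = read_uleb128_string(b'\x05hello')
```

Here the length prefix is itself variable-width, so a single "previous field is the length" format code would not cover it.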

On Jan 19, 2017 10:42 AM, "Nick Timkovich"  wrote:

> ctypes.Structure is *literally* the interface to the C struct that as
> Chris mentions has fixed offsets for all members. I don't think that should
> (can?) be altered.
>
> In file formats (beyond net protocols) the string size + variable length
> string motif comes up often and I am frequently re-implementing the
> two-line read-an-int + read-{}.format-bytes.

Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Nick Timkovich
ctypes.Structure is *literally* the interface to the C struct that as Chris
mentions has fixed offsets for all members. I don't think that should
(can?) be altered.

In file formats (beyond net protocols) the string size + variable length
string motif comes up often and I am frequently re-implementing the
two-line read-an-int + read-{}.format-bytes.
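
A sketch of that motif for a file-like object. The 4-byte big-endian prefix and the name read_prefixed are assumptions made for the example, and it calls stream.read(length) directly rather than building a '{}s' format:

```python
import io
import struct

def read_prefixed(stream):
    """Read a 4-byte big-endian length, then that many bytes, from a binary stream."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack('!I', header)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated payload")
    return data

stream = io.BytesIO(b'\x00\x00\x00\x0chello world!')
payload = read_prefixed(stream)
```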

On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno 
wrote:

> I am for upgrading struct to these, if possible.
>
> But besides my +1, I am writing in to remind folks that there is another
> "struct" model in the stdlib:
>
> ctypes.Structure  -
>
> For reading a lot of records with the same structure it is much more handy
> than
> struct, since it gives one a suitable Python object on instantiation.
>
> However, it also can't handle variable length fields automatically.
>
> But maybe the improvement could be made on that side, or in another package
> altogether that works more like it than the current "struct".

Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Joao S. O. Bueno
I am for upgrading struct to these, if possible.

But besides my +1, I am writing in to remind folks that there is another
"struct" model in the stdlib:

ctypes.Structure  -

For reading a lot of records with the same structure it is much more handy than
struct, since it gives one a suitable Python object on instantiation.
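
A minimal sketch of what that looks like. The Header record and its field names are hypothetical, chosen only to show a fixed-layout structure being filled from bytes:

```python
import ctypes

class Header(ctypes.BigEndianStructure):
    # Hypothetical fixed-layout record: two 16-bit big-endian fields.
    _fields_ = [
        ('kind', ctypes.c_uint16),
        ('length', ctypes.c_uint16),
    ]

raw = b'\x00\x01\x00\x05'
header = Header.from_buffer_copy(raw)   # fields become plain attributes
```

Every field has a size fixed at class-definition time, so a payload of header.length bytes would still have to be sliced out separately.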

However, it also can't handle variable length fields automatically.

But maybe the improvement could be made on that side, or in another package
altogether that works more like it than the current "struct".



On 19 January 2017 at 16:08, Elizabeth Myers  wrote:
> On 19/01/17 06:47, Elizabeth Myers wrote:
>> On 19/01/17 05:58, Rhodri James wrote:
>>> On 19/01/17 08:31, Mark Dickinson wrote:
>>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano
>>>> wrote:
>>>>> [...] struct already supports
>>>>> variable-width formats.
>>>>
>>>> Unfortunately, that's not really true: the Pascal strings it supports
>>>> are in some sense variable length, but are stored in a fixed-width
>>>> field. The internals of the struct module rely on each field starting
>>>> at a fixed offset, computable directly from the format string. I don't
>>>> think variable-length fields would be a good fit for the current
>>>> design of the struct module.
>>>>
>>>> For the OPs use-case, I'd suggest a library that sits on top of the
>>>> struct module, rather than an expansion to the struct module itself.
>>>
>>> Unfortunately as the OP explained, this makes the struct module a poor
>>> fit for protocol decoding, even as a base layer for something.  It's one
>>> of the things I use python for quite frequently, and I always end up
>>> rolling my own and discarding struct entirely.
>>>
>>
>> Yes, for variable-length fields the struct module is worse than useless:
>> it actually reduces clarity a little. Consider:
>>
> test_bytes = b'\x00\x00\x00\x0chello world!'
>>
>> With this, you can do:
>>
> length = int.from_bytes(test_bytes[:4], 'big')
> string = test_bytes[4:length]
>>
>> or you can do:
>>
>> >>> length = struct.unpack_from('!I', test_bytes)[0]
>> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
>>
>> Which looks more readable without consulting the docs? ;)
>>
>> Building anything on top of the struct library like this would lead to
>> worse-looking code for minimal gains in efficiency. To quote Jamie
>> Zawinski, it is like building a bookshelf out of mashed potatoes as it
>> stands.
>>
>> If we had an extension similar to netstruct:
>>
>> >>> length, string = struct.unpack('!I$', test_bytes)
>>
>> MUCH improved readability, and also less verbose. :)
>
> I also didn't mention that when you are unpacking iteratively (e.g., you
> have multiple strings), the code becomes a bit more hairy:
>
> >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
> >>> offset = 0
> >>> while offset < len(test_bytes):
> ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
> ...     offset += 2
> ...     string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
> ...     offset += length
>
> It actually gets a lot worse when you have to unpack a set of strings in
> a context-sensitive manner. You have to be sure to update the offset
> constantly so you can always unpack strings appropriately. Yuck!
>
> It's worth mentioning that a few years ago, a coworker and I found
> ourselves needing variable length strings in the context of a binary
> protocol (DHCP), and wound up abandoning the struct module entirely
> because it was unsuitable. My co-worker said the same thing I did: "it's
> like building a bookshelf out of mashed potatoes."
>
> I do understand it might require a possible major rewrite or major
> changes to the struct module, but in the long run, I think it's worth it
> (especially because the struct module is not all that big in scope). As
> it stands, the struct module simply is not suited for protocols where
> you have variable-length strings, and in my experience, that is the vast
> majority of modern binary protocols on the Internet.
>
> --
> Elizabeth


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Chris Angelico
On Fri, Jan 20, 2017 at 5:08 AM, Elizabeth Myers
 wrote:
> I do understand it might require a possible major rewrite or major
> changes to the struct module, but in the long run, I think it's worth it
> (especially because the struct module is not all that big in scope). As
> it stands, the struct module simply is not suited for protocols where
> you have variable-length strings, and in my experience, that is the vast
> majority of modern binary protocols on the Internet.
>

To be fair, the name "struct" implies a C-style structure, which
_does_ have a fixed size, or at least fixed offsets for its members
(the last member can be variable-sized). A quick search of PyPI shows
up a struct-variant specifically designed for network protocols:

https://pypi.python.org/pypi/netstruct/1.1.2

It even uses the dollar sign as you describe. So perhaps what you're
looking for is this module coming into the stdlib?

ChrisA


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Elizabeth Myers
On 19/01/17 06:47, Elizabeth Myers wrote:
> On 19/01/17 05:58, Rhodri James wrote:
>> On 19/01/17 08:31, Mark Dickinson wrote:
>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano 
>>> wrote:
>>>> [...] struct already supports
>>>> variable-width formats.
>>>
>>> Unfortunately, that's not really true: the Pascal strings it supports
>>> are in some sense variable length, but are stored in a fixed-width
>>> field. The internals of the struct module rely on each field starting
>>> at a fixed offset, computable directly from the format string. I don't
>>> think variable-length fields would be a good fit for the current
>>> design of the struct module.
>>>
>>> For the OP's use-case, I'd suggest a library that sits on top of the
>>> struct module, rather than an expansion to the struct module itself.
>>
>> Unfortunately as the OP explained, this makes the struct module a poor
>> fit for protocol decoding, even as a base layer for something.  It's one
>> of the things I use python for quite frequently, and I always end up
>> rolling my own and discarding struct entirely.
>>
> 
> Yes, for variable-length fields the struct module is worse than useless:
> it actually reduces clarity a little. Consider:
> 
> >>> test_bytes = b'\x00\x00\x00\x0chello world!'
> 
> With this, you can do:
> 
> >>> length = int.from_bytes(test_bytes[:4], 'big')
> >>> string = test_bytes[4:4 + length]
> 
> or you can do:
> 
> >>> length = struct.unpack_from('!I', test_bytes)[0]
> >>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
> 
> Which looks more readable without consulting the docs? ;)
> 
> Building anything on top of the struct library like this would lead to
> worse-looking code for minimal gains in efficiency. To quote Jamie
> Zawinski, it is like building a bookshelf out of mashed potatoes as it
> stands.
> 
> If we had an extension similar to netstruct:
> 
> >>> length, string = struct.unpack('!I$', test_bytes)
> 
> MUCH improved readability, and also less verbose. :)

I also didn't mention that when you are unpacking iteratively (e.g., you
have multiple strings), the code becomes a bit more hairy:

>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
>>> offset = 0
>>> while offset < len(test_bytes):
...     length = struct.unpack_from('!H', test_bytes, offset)[0]
...     offset += 2
...     string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
...     offset += length

It actually gets a lot worse when you have to unpack a set of strings in
a context-sensitive manner. You have to be sure to update the offset
constantly so you can always unpack strings appropriately. Yuck!
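
The offset bookkeeping described above can at least be hidden in one place
with the current module. A rough sketch, assuming a made-up helper name
`iter_prefixed` (nothing like it exists in struct) and a 16-bit big-endian
length prefix by default:

```python
import struct

def iter_prefixed(buf, prefix="!H"):
    """Yield each length-prefixed byte string in buf (hypothetical helper)."""
    header = struct.Struct(prefix)
    offset = 0
    while offset < len(buf):
        # Read the length prefix, then step past it.
        (length,) = header.unpack_from(buf, offset)
        offset += header.size
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated string at offset {}".format(offset))
        yield buf[offset:end]
        offset = end

test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
strings = list(iter_prefixed(test_bytes))
print(strings)
```

This only covers the simple case where every item is a prefixed string; the
context-sensitive case still needs the caller to track offsets by hand.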

It's worth mentioning that a few years ago, a coworker and I found
ourselves needing variable length strings in the context of a binary
protocol (DHCP), and wound up abandoning the struct module entirely
because it was unsuitable. My co-worker said the same thing I did: "it's
like building a bookshelf out of mashed potatoes."

I do understand it might require a possible major rewrite or major
changes to the struct module, but in the long run, I think it's worth it
(especially because the struct module is not all that big in scope). As
it stands, the struct module simply is not suited for protocols where
you have variable-length strings, and in my experience, that is the vast
majority of modern binary protocols on the Internet.

--
Elizabeth


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Elizabeth Myers
On 19/01/17 05:58, Rhodri James wrote:
> On 19/01/17 08:31, Mark Dickinson wrote:
>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano 
>> wrote:
>>> [...] struct already supports
>>> variable-width formats.
>>
>> Unfortunately, that's not really true: the Pascal strings it supports
>> are in some sense variable length, but are stored in a fixed-width
>> field. The internals of the struct module rely on each field starting
>> at a fixed offset, computable directly from the format string. I don't
>> think variable-length fields would be a good fit for the current
>> design of the struct module.
>>
>> For the OP's use-case, I'd suggest a library that sits on top of the
>> struct module, rather than an expansion to the struct module itself.
> 
> Unfortunately as the OP explained, this makes the struct module a poor
> fit for protocol decoding, even as a base layer for something.  It's one
> of the things I use python for quite frequently, and I always end up
> rolling my own and discarding struct entirely.
> 

Yes, for variable-length fields the struct module is worse than useless:
it actually reduces clarity a little. Consider:

>>> test_bytes = b'\x00\x00\x00\x0chello world!'

With this, you can do:

>>> length = int.from_bytes(test_bytes[:4], 'big')
>>> string = test_bytes[4:4 + length]

or you can do:

>>> length = struct.unpack_from('!I', test_bytes)[0]
>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]

Which looks more readable without consulting the docs? ;)

Building anything on top of the struct library like this would lead to
worse-looking code for minimal gains in efficiency. To quote Jamie
Zawinski, it is like building a bookshelf out of mashed potatoes as it
stands.

If we had an extension similar to netstruct:

>>> length, string = struct.unpack('!I$', test_bytes)

MUCH improved readability, and also less verbose. :)
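
To make the intended semantics concrete, here is a sketch of how a trailing
'$' could be emulated on top of the existing module. The function name
`unpack_with_tail` is invented for illustration, and the behaviour (return
the length as well as the string, '$' allowed only as the last code) is an
assumption taken from the one-liner above, not a description of netstruct
or of any actual struct API.

```python
import struct

def unpack_with_tail(fmt, buf, offset=0):
    """Unpack fmt; a trailing '$' means 'bytes whose length is the
    preceding integer field'. Hypothetical helper, not part of struct."""
    if not fmt.endswith("$"):
        return struct.unpack_from(fmt, buf, offset)
    head = struct.Struct(fmt[:-1])        # fixed-size part of the format
    fields = head.unpack_from(buf, offset)
    length = fields[-1]                   # last fixed field is the length
    start = offset + head.size
    end = start + length
    if end > len(buf):
        raise ValueError("buffer too short for declared length")
    return fields + (buf[start:end],)

test_bytes = b'\x00\x00\x00\x0chello world!'
length, string = unpack_with_tail('!I$', test_bytes)
print(length, string)
```

A real implementation inside struct would also have to deal with '$' in the
middle of a format, packing, and calcsize(), which is where the design
questions start.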


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Rhodri James

On 19/01/17 08:31, Mark Dickinson wrote:
> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano wrote:
>> [...] struct already supports
>> variable-width formats.
>
> Unfortunately, that's not really true: the Pascal strings it supports
> are in some sense variable length, but are stored in a fixed-width
> field. The internals of the struct module rely on each field starting
> at a fixed offset, computable directly from the format string. I don't
> think variable-length fields would be a good fit for the current
> design of the struct module.
>
> For the OP's use-case, I'd suggest a library that sits on top of the
> struct module, rather than an expansion to the struct module itself.


Unfortunately as the OP explained, this makes the struct module a poor 
fit for protocol decoding, even as a base layer for something.  It's one 
of the things I use python for quite frequently, and I always end up 
rolling my own and discarding struct entirely.


--
Rhodri James *-* Kynesim Ltd


Re: [Python-ideas] Ideas for improving the struct module

2017-01-19 Thread Mark Dickinson
On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano  wrote:
> [...] struct already supports
> variable-width formats.

Unfortunately, that's not really true: the Pascal strings it supports
are in some sense variable length, but are stored in a fixed-width
field. The internals of the struct module rely on each field starting
at a fixed offset, computable directly from the format string. I don't
think variable-length fields would be a good fit for the current
design of the struct module.

For the OP's use-case, I'd suggest a library that sits on top of the
struct module, rather than an expansion to the struct module itself.
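
A small illustration of the fixed-width point, using only existing struct
calls; the '10p' width and the sample values are arbitrary choices for the
example:

```python
import struct

# '10p' is a Pascal string in a fixed 10-byte field:
# one length byte followed by up to nine payload bytes, NUL-padded.
short = struct.pack('10p', b'hi')
longer = struct.pack('10p', b'hello')
print(len(short), len(longer), struct.calcsize('10p'))

# Because the field width never changes, the offset of the next field
# is known from the format string alone (here: byte 10).
packed = struct.pack('!10pH', b'hi', 7)
(value,) = struct.unpack_from('!H', packed, 10)
text, same_value = struct.unpack('!10pH', packed)
print(text, value, same_value)
```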

-- 
Mark