[netsniff-ng] Re: [PATCH 05/15] trafgen: proto: Increment proto field at runtime

2016-08-09 Thread Vadim Kochan
On Tue, Aug 9, 2016 at 9:57 AM, Tobias Klauser  wrote:
> On 2016-08-08 at 21:04:20 +0200, Vadim Kochan  wrote:
>> On Sat, Aug 06, 2016 at 01:36:26AM +0300, Vadim Kochan wrote:
>> > On Wed, Aug 03, 2016 at 04:27:14PM +0200, Tobias Klauser wrote:
>> > > On 2016-07-26 at 21:35:10 +0200, Vadim Kochan  wrote:
>> > > > Extended 'struct packet_dyn' with proto fields which has
>> > > > dynamically changing values at runtime.
>> > > >
>> > > > Implement incrementing of proto field at runtime with min & max
>> > > > parameters, by default if the 'min' parameter is not specified
>> > > > then original value is used. For fields which len is greater
>> > > > than 4 - last 4 bytes are incremented as unsigned int value.
>> > > >
>> > > > Added 'field_changed' callback for proto header which
>> > > > may be used for check if csum updating is needed. This callback
>> > > > is called after field was changed at runtime.
>> > > >
>> > > > Added 'packet_update' callback to let proto header know
>> > > > when to apply final proto header changes at runtime (csum update).
>> > >
>> > > The documentation of these callbacks would also make sense where they're
>> > > defined.
>> > >
>> > > > Signed-off-by: Vadim Kochan 
>> > > > ---
>> > > >  trafgen.c   |  9 ++
>> > > >  trafgen_conf.h  |  7 
>> > > >  trafgen_proto.c | 99 
>> > > > +
>> > > >  trafgen_proto.h | 26 +++
>> > > >  4 files changed, 141 insertions(+)
>> > > >
>> > > > diff --git a/trafgen.c b/trafgen.c
>> > > > index b76b5d7..553dfa5 100644
>> > > > --- a/trafgen.c
>> > > > +++ b/trafgen.c
>> > > >  void proto_packet_finish(void)
>> > > >  {
>> > > > struct proto_hdr **headers = _packet()->headers[0];
>> > > > @@ -433,3 +446,89 @@ void proto_packet_finish(void)
>> > > > p->packet_finish(p);
>> > > > }
>> > > >  }
>> > > > +
>> > > > +static inline unsigned int field_inc(struct proto_field *field)
>> > > > +{
>> > > > +   uint32_t val;
>> > > > +
>> > > > +   val = field->func.val - field->func.min;
>> > > > +   val = (val + field->func.inc) % field->func.max;
>> > >
>> > > Shouldn't this be
>> > >
>> > >   val = (val + field->func.inc) % (field->func.max - field->func.min + 1)
>> > >
>> > > to be consistent with apply_counter()?
>>
>> I simplified it and now it really works well:
>>
>>   #define max_int32(a, b)
>>  \
>>   ({ 
>>  \
>>   int32_t _a = (int32_t) (a);
>>  \
>>   int32_t _b = (int32_t) (b);
>>  \
>>   _a - ((_a - _b) & ((_a - _b) >> (sizeof(int32_t) * 8 - 
>> 1)));\
>
> The line above definitely needs an explanatory comment. How and why does
> this work? Why is it implemented like this instead of the "obvious"
> max() solution?

I was searching for fast max() alg, but I don't have understanding of
it, I just tested it,
I will get rid of it and use max(x).

>
>>   })
>>
>>   static inline unsigned int field_inc(struct proto_field *field)
>>   {
>>  uint32_t min = field->func.min;
>>  uint32_t max = field->func.max;
>>  uint32_t val = field->func.val;
>>  uint32_t inc = field->func.inc;
>>  uint32_t next;
>>
>>  next = (val + inc) % (max + 1);
>>  field->func.val = max_int32(next, min);
>>
>>  return val;
>>   }
>>
>> so max_int32(a,b) should be fast enough w/o branching.
>
> Have you profiled this against other possible implementations,
> especially the max_int32() implementation above? If your profiling doesn't
> show significant improvements of this max_int32() variant I'd rather
> stay with the "obvious" one, i.e. a > b ? a : b

No, I did not, I just thought that branching will slow execution anyway ...

>
>> > Sure, I tried this approach while implementing 1st version but when I
>> > used the following case:
>> >
>> > trafgen/trafgen -o lo -n 10 --cpu 1 '{ eth(type=0x800), fill(0xff, 
>> > 10), dinc(5, 20, 5) }'
>> >
>> > then interval between 5 & 20 changes very differently. But in my version
>> > it repeats from 5 till 20 (yes here is a little difference that initial
>> > value is incremented immideately). Also semantic of proto dinc is
>> > dinc(step, min, max), and I will change it to looks like low-level one -
>> > dinc(min, max, step).
>> >
>> > >
>> > > Also, I think you should probably get rid of as many pointer
>> > > dereferences as possible in these runtime functions, i.e. store max and
>> > > min in temporary variables.
>> >
>> > OK, makes sense.
>> >
>> > >
>> > > > +   field->func.val = val + field->func.min;
>> > > > +
>> > > > +   

[netsniff-ng] Re: [PATCH 05/15] trafgen: proto: Increment proto field at runtime

2016-08-09 Thread Tobias Klauser
On 2016-08-08 at 21:04:20 +0200, Vadim Kochan  wrote:
> On Sat, Aug 06, 2016 at 01:36:26AM +0300, Vadim Kochan wrote:
> > On Wed, Aug 03, 2016 at 04:27:14PM +0200, Tobias Klauser wrote:
> > > On 2016-07-26 at 21:35:10 +0200, Vadim Kochan  wrote:
> > > > Extended 'struct packet_dyn' with proto fields which has
> > > > dynamically changing values at runtime.
> > > > 
> > > > Implement incrementing of proto field at runtime with min & max
> > > > parameters, by default if the 'min' parameter is not specified
> > > > then original value is used. For fields which len is greater
> > > > than 4 - last 4 bytes are incremented as unsigned int value.
> > > > 
> > > > Added 'field_changed' callback for proto header which
> > > > may be used for check if csum updating is needed. This callback
> > > > is called after field was changed at runtime.
> > > > 
> > > > Added 'packet_update' callback to let proto header know
> > > > when to apply final proto header changes at runtime (csum update).
> > > 
> > > The documentation of these callbacks would also make sense where they're
> > > defined.
> > > 
> > > > Signed-off-by: Vadim Kochan 
> > > > ---
> > > >  trafgen.c   |  9 ++
> > > >  trafgen_conf.h  |  7 
> > > >  trafgen_proto.c | 99 
> > > > +
> > > >  trafgen_proto.h | 26 +++
> > > >  4 files changed, 141 insertions(+)
> > > > 
> > > > diff --git a/trafgen.c b/trafgen.c
> > > > index b76b5d7..553dfa5 100644
> > > > --- a/trafgen.c
> > > > +++ b/trafgen.c
> > > >  void proto_packet_finish(void)
> > > >  {
> > > > struct proto_hdr **headers = _packet()->headers[0];
> > > > @@ -433,3 +446,89 @@ void proto_packet_finish(void)
> > > > p->packet_finish(p);
> > > > }
> > > >  }
> > > > +
> > > > +static inline unsigned int field_inc(struct proto_field *field)
> > > > +{
> > > > +   uint32_t val;
> > > > +
> > > > +   val = field->func.val - field->func.min;
> > > > +   val = (val + field->func.inc) % field->func.max;
> > > 
> > > Shouldn't this be
> > > 
> > >   val = (val + field->func.inc) % (field->func.max - field->func.min + 1)
> > > 
> > > to be consistent with apply_counter()?
> 
> I simplified it and now it really works well:
> 
>   #define max_int32(a, b) 
> \
>   ({  
> \
>   int32_t _a = (int32_t) (a); 
> \
>   int32_t _b = (int32_t) (b); 
> \
>   _a - ((_a - _b) & ((_a - _b) >> (sizeof(int32_t) * 8 - 
> 1)));\

The line above definitely needs an explanatory comment. How and why does
this work? Why is it implemented like this instead of the "obvious"
max() solution?

>   })
> 
>   static inline unsigned int field_inc(struct proto_field *field)
>   {
>  uint32_t min = field->func.min;
>  uint32_t max = field->func.max;
>  uint32_t val = field->func.val;
>  uint32_t inc = field->func.inc;
>  uint32_t next;
> 
>  next = (val + inc) % (max + 1);
>  field->func.val = max_int32(next, min);
> 
>  return val;
>   }
> 
> so max_int32(a,b) should be fast enough w/o branching.

Have you profiled this against other possible implementations,
especially the max_int32() implementation above? If your profiling doesn't
show significant improvements of this max_int32() variant I'd rather
stay with the "obvious" one, i.e. a > b ? a : b

> > Sure, I tried this approach while implementing 1st version but when I
> > used the following case:
> > 
> > trafgen/trafgen -o lo -n 10 --cpu 1 '{ eth(type=0x800), fill(0xff, 10), 
> > dinc(5, 20, 5) }'
> > 
> > then interval between 5 & 20 changes very differently. But in my version
> > it repeats from 5 till 20 (yes here is a little difference that initial
> > value is incremented immideately). Also semantic of proto dinc is
> > dinc(step, min, max), and I will change it to looks like low-level one -
> > dinc(min, max, step).
> > 
> > > 
> > > Also, I think you should probably get rid of as many pointer
> > > dereferences as possible in these runtime functions, i.e. store max and
> > > min in temporary variables.
> > 
> > OK, makes sense.
> > 
> > > 
> > > > +   field->func.val = val + field->func.min;
> > > > +
> > > > +   return field->func.val;
> > > > +}
> > > > +
> > > > +static void field_inc_func(struct proto_field *field)
> > > > +{
> > > > +   if (field->len == 1) {
> > > > +   uint8_t val;
> > > > +
> > > > +   val = field_inc(field);
> > > > +   proto_field_set_u8(field->hdr, field->id, val);
> > > 
> > > Assignment on declaration please. Or even better:
> 

[netsniff-ng] Re: [PATCH 05/15] trafgen: proto: Increment proto field at runtime

2016-08-08 Thread Vadim Kochan
On Sat, Aug 06, 2016 at 01:36:26AM +0300, Vadim Kochan wrote:
> On Wed, Aug 03, 2016 at 04:27:14PM +0200, Tobias Klauser wrote:
> > On 2016-07-26 at 21:35:10 +0200, Vadim Kochan  wrote:
> > > Extended 'struct packet_dyn' with proto fields which has
> > > dynamically changing values at runtime.
> > > 
> > > Implement incrementing of proto field at runtime with min & max
> > > parameters, by default if the 'min' parameter is not specified
> > > then original value is used. For fields which len is greater
> > > than 4 - last 4 bytes are incremented as unsigned int value.
> > > 
> > > Added 'field_changed' callback for proto header which
> > > may be used for check if csum updating is needed. This callback
> > > is called after field was changed at runtime.
> > > 
> > > Added 'packet_update' callback to let proto header know
> > > when to apply final proto header changes at runtime (csum update).
> > 
> > The documentation of these callbacks would also make sense where they're
> > defined.
> > 
> > > Signed-off-by: Vadim Kochan 
> > > ---
> > >  trafgen.c   |  9 ++
> > >  trafgen_conf.h  |  7 
> > >  trafgen_proto.c | 99 
> > > +
> > >  trafgen_proto.h | 26 +++
> > >  4 files changed, 141 insertions(+)
> > > 
> > > diff --git a/trafgen.c b/trafgen.c
> > > index b76b5d7..553dfa5 100644
> > > --- a/trafgen.c
> > > +++ b/trafgen.c
> > >  void proto_packet_finish(void)
> > >  {
> > >   struct proto_hdr **headers = _packet()->headers[0];
> > > @@ -433,3 +446,89 @@ void proto_packet_finish(void)
> > >   p->packet_finish(p);
> > >   }
> > >  }
> > > +
> > > +static inline unsigned int field_inc(struct proto_field *field)
> > > +{
> > > + uint32_t val;
> > > +
> > > + val = field->func.val - field->func.min;
> > > + val = (val + field->func.inc) % field->func.max;
> > 
> > Shouldn't this be
> > 
> > val = (val + field->func.inc) % (field->func.max - field->func.min + 1)
> > 
> > to be consistent with apply_counter()?

I simplified it and now it really works well:

#define max_int32(a, b) 
\
({  
\
int32_t _a = (int32_t) (a); 
\
int32_t _b = (int32_t) (b); 
\
_a - ((_a - _b) & ((_a - _b) >> (sizeof(int32_t) * 8 - 
1)));\
})

static inline unsigned int field_inc(struct proto_field *field)
{
   uint32_t min = field->func.min;
   uint32_t max = field->func.max;
   uint32_t val = field->func.val;
   uint32_t inc = field->func.inc;
   uint32_t next;

   next = (val + inc) % (max + 1);
   field->func.val = max_int32(next, min);

   return val;
}

so max_int32(a,b) should be fast enough w/o branching.

> 
> Sure, I tried this approach while implementing 1st version but when I
> used the following case:
> 
>   trafgen/trafgen -o lo -n 10 --cpu 1 '{ eth(type=0x800), fill(0xff, 10), 
> dinc(5, 20, 5) }'
> 
> then interval between 5 & 20 changes very differently. But in my version
> it repeats from 5 till 20 (yes here is a little difference that initial
> value is incremented immideately). Also semantic of proto dinc is
> dinc(step, min, max), and I will change it to looks like low-level one -
> dinc(min, max, step).
> 
> > 
> > Also, I think you should probably get rid of as many pointer
> > dereferences as possible in these runtime functions, i.e. store max and
> > min in temporary variables.
> 
> OK, makes sense.
> 
> > 
> > > + field->func.val = val + field->func.min;
> > > +
> > > + return field->func.val;
> > > +}
> > > +
> > > +static void field_inc_func(struct proto_field *field)
> > > +{
> > > + if (field->len == 1) {
> > > + uint8_t val;
> > > +
> > > + val = field_inc(field);
> > > + proto_field_set_u8(field->hdr, field->id, val);
> > 
> > Assignment on declaration please. Or even better:
> > 
> > proto_field_set_u8(field->hdr, field->id, field_inc(field));
> 
> OK
> 
> > 
> > > + } else if (field->len == 2) {
> > > + uint16_t val;
> > > +
> > > + val = field_inc(field);
> > > + proto_field_set_be16(field->hdr, field->id, val);
> > 
> > Same.
> OK
> 
> > 
> > > + } else if (field->len == 4) {
> > > + uint32_t val;
> > > +
> > > + val = field_inc(field);
> > > + proto_field_set_be32(field->hdr, field->id, val);
> > 
> > Same.
> OK
> 
> > 
> > > + } else if (field->len > 4) {
> > > + uint8_t *bytes = __proto_field_get_bytes(field);
> > > + uint32_t val;
> > > +
> > > + bytes += field->len - 4;
> > > + val = field_inc(field);
> > > +
> > > +

[netsniff-ng] Re: [PATCH 05/15] trafgen: proto: Increment proto field at runtime

2016-08-03 Thread Tobias Klauser
On 2016-07-26 at 21:35:10 +0200, Vadim Kochan  wrote:
> Extended 'struct packet_dyn' with proto fields which has
> dynamically changing values at runtime.
> 
> Implement incrementing of proto field at runtime with min & max
> parameters, by default if the 'min' parameter is not specified
> then original value is used. For fields which len is greater
> than 4 - last 4 bytes are incremented as unsigned int value.
> 
> Added 'field_changed' callback for proto header which
> may be used for check if csum updating is needed. This callback
> is called after field was changed at runtime.
> 
> Added 'packet_update' callback to let proto header know
> when to apply final proto header changes at runtime (csum update).

The documentation of these callbacks would also make sense where they're
defined.

> Signed-off-by: Vadim Kochan 
> ---
>  trafgen.c   |  9 ++
>  trafgen_conf.h  |  7 
>  trafgen_proto.c | 99 
> +
>  trafgen_proto.h | 26 +++
>  4 files changed, 141 insertions(+)
> 
> diff --git a/trafgen.c b/trafgen.c
> index b76b5d7..553dfa5 100644
> --- a/trafgen.c
> +++ b/trafgen.c
> @@ -619,6 +619,15 @@ static inline void packet_apply_dyn_elements(int idx)
>   apply_randomizer(idx);
>   apply_csum16(idx);
>   }
> +
> + if (packet_dyn_has_fields(_dyn[idx])) {
> + uint32_t i;
> +
> + for (i = 0; i < packet_dyn[idx].flen; i++)
> + proto_field_dyn_apply(packet_dyn[idx].fields[i]);
> +
> + proto_packet_update(idx);
> + }
>  }
>  
>  static void xmit_slowpath_or_die(struct ctx *ctx, unsigned int cpu, unsigned 
> long orig_num)
> diff --git a/trafgen_conf.h b/trafgen_conf.h
> index 934f8fe..7f3616c 100644
> --- a/trafgen_conf.h
> +++ b/trafgen_conf.h
> @@ -49,6 +49,8 @@ struct packet_dyn {
>   size_t rlen;
>   struct csum16 *csum;
>   size_t slen;
> + struct proto_field **fields;
> + uint32_t flen;
>  };
>  
>  static inline bool packet_dyn_has_elems(struct packet_dyn *p)
> @@ -61,6 +63,11 @@ static inline bool packet_dyn_has_only_csums(struct 
> packet_dyn *p)
>   return (p->clen == 0 && p->rlen == 0 && p->slen);
>  }
>  
> +static inline bool packet_dyn_has_fields(struct packet_dyn *p)
> +{
> + return !!p->flen;
> +}
> +
>  extern void compile_packets_str(char *str, bool verbose, unsigned int cpu);
>  extern void compile_packets(char *file, bool verbose, unsigned int cpu,
>   bool invoke_cpp, char *const cpp_argv[]);
> diff --git a/trafgen_proto.c b/trafgen_proto.c
> index ce389ce..069aa00 100644
> --- a/trafgen_proto.c
> +++ b/trafgen_proto.c
> @@ -419,6 +419,19 @@ void protos_init(const char *dev)
>   p->ctx = 
>  }
>  
> +void proto_packet_update(uint32_t idx)
> +{
> + struct packet *pkt = packet_get(idx);
> + ssize_t i;
> +
> + for (i = pkt->headers_count - 1; i >= 0; i--) {
> + struct proto_hdr *hdr = pkt->headers[i];
> +
> + if (hdr->packet_update)
> + hdr->packet_update(hdr);
> + }
> +}
> +
>  void proto_packet_finish(void)
>  {
>   struct proto_hdr **headers = _packet()->headers[0];
> @@ -433,3 +446,89 @@ void proto_packet_finish(void)
>   p->packet_finish(p);
>   }
>  }
> +
> +static inline unsigned int field_inc(struct proto_field *field)
> +{
> + uint32_t val;
> +
> + val = field->func.val - field->func.min;
> + val = (val + field->func.inc) % field->func.max;

Shouldn't this be

val = (val + field->func.inc) % (field->func.max - field->func.min + 1)

to be consistent with apply_counter()?

Also, I think you should probably get rid of as many pointer
dereferences as possible in these runtime functions, i.e. store max and
min in temporary variables.

> + field->func.val = val + field->func.min;
> +
> + return field->func.val;
> +}
> +
> +static void field_inc_func(struct proto_field *field)
> +{
> + if (field->len == 1) {
> + uint8_t val;
> +
> + val = field_inc(field);
> + proto_field_set_u8(field->hdr, field->id, val);

Assignment on declaration please. Or even better:

proto_field_set_u8(field->hdr, field->id, field_inc(field));

> + } else if (field->len == 2) {
> + uint16_t val;
> +
> + val = field_inc(field);
> + proto_field_set_be16(field->hdr, field->id, val);

Same.

> + } else if (field->len == 4) {
> + uint32_t val;
> +
> + val = field_inc(field);
> + proto_field_set_be32(field->hdr, field->id, val);

Same.

> + } else if (field->len > 4) {
> + uint8_t *bytes = __proto_field_get_bytes(field);
> + uint32_t val;
> +
> + bytes += field->len - 4;
> + val = field_inc(field);
> +
> + *(uint32_t *)bytes = bswap_32(val);

This