RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-05 Thread David Laight
From: Alexey Dobriyan
> Sent: 02 December 2016 01:22
> net_generic() function is both a) inline and b) used ~600 times.
> 
> It has the following code inside
> 
>   ...
>   ptr = ng->ptr[id - 1];
>   ...
> 
> "id" is never compile time constant so compiler is forced to subtract 1.
> And those decrements or LEA [r32 - 1] instructions add up.
> 
> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> is not initialized and 0. This is quite pointless idea (nothing will
> work or immediate interference with first registered subsystem) in
> general but it hints what needs to be done for code size reduction.
> 
> Namely, overlaying allocation of pointer array and fixed part of
> structure in the beginning and using usual base-0 addressing.
> 
> Ids are just cookies, their exact values do not matter, so lets start
> with 3 on x86_64.
...
>  struct net_generic {
> - struct {
> - unsigned int len;
> - struct rcu_head rcu;
> - } s;
> -
> - void *ptr[0];
> + union {
> + struct {
> + unsigned int len;
> + struct rcu_head rcu;
> + } s;
> +
> + void *ptr[0];
> + };
>  };

That union is an accident waiting to happen.
What might work is to offset the Ids by
(offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
The subtract from the offset will then counter the structure offset
- which is what you are trying to achieve.

David.


Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-05 Thread Alexey Dobriyan
On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 02 December 2016 01:22
>> net_generic() function is both a) inline and b) used ~600 times.
>>
>> It has the following code inside
>>
>>   ...
>>   ptr = ng->ptr[id - 1];
>>   ...
>>
>> "id" is never compile time constant so compiler is forced to subtract 1.
>> And those decrements or LEA [r32 - 1] instructions add up.
>>
>> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> is not initialized and 0. This is quite pointless idea (nothing will
>> work or immediate interference with first registered subsystem) in
>> general but it hints what needs to be done for code size reduction.
>>
>> Namely, overlaying allocation of pointer array and fixed part of
>> structure in the beginning and using usual base-0 addressing.
>>
>> Ids are just cookies, their exact values do not matter, so lets start
>> with 3 on x86_64.
> ...
>>  struct net_generic {
>> - struct {
>> - unsigned int len;
>> - struct rcu_head rcu;
>> - } s;
>> -
>> - void *ptr[0];
>> + union {
>> + struct {
>> + unsigned int len;
>> + struct rcu_head rcu;
>> + } s;
>> +
>> + void *ptr[0];
>> + };
>>  };
>
> That union is an accident waiting to happen.

I kind of disagree. Module authors should not be given matches,
but it is hard to screw up if net_generic() is all you're given.

> What might work is to offset the Ids by
> (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
> The subtract from the offset will then counter the structure offset
> - which is what you are trying to achieve.

If you suggest this layout

struct net_generic {
struct {
} s;
void *ptr[0];
};

then is it not optimal because offset of "ptr" needs to be somewhere in code
either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.

Here is test program

struct ng1 {
union {
struct {
unsigned int len;
} s;
void *ptr[0];
};
};
struct ng2 {
struct {
unsigned int len;
} s;
void *ptr[0];
};
struct net {
int x;
struct ng1 *gen1;
struct ng2 *gen2;
};
void *ng1(const struct net *net, unsigned int id)
{
return net->gen1->ptr[id];
}
void *ng2(const struct net *net, unsigned int id)
{
return net->gen2->ptr[id];
}


 :
   0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
   4:   89 f6   movesi,esi
   6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
   a:   c3  ret


0010 :
  10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
  14:   89 f6   movesi,esi
  16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
  1b:   c3  ret

I can send more type checking if you can tolerate "_" identifier

struct {
unsigned int _;
} pernet_ops_id_t;


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-07 Thread David Laight
From: Alexey Dobriyan
> Sent: 05 December 2016 14:48
> On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
> > From: Alexey Dobriyan
> >> Sent: 02 December 2016 01:22
> >> net_generic() function is both a) inline and b) used ~600 times.
> >>
> >> It has the following code inside
> >>
> >>   ...
> >>   ptr = ng->ptr[id - 1];
> >>   ...
> >>
> >> "id" is never compile time constant so compiler is forced to subtract 1.
> >> And those decrements or LEA [r32 - 1] instructions add up.
> >>
> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> >> is not initialized and 0. This is quite pointless idea (nothing will
> >> work or immediate interference with first registered subsystem) in
> >> general but it hints what needs to be done for code size reduction.
> >>
> >> Namely, overlaying allocation of pointer array and fixed part of
> >> structure in the beginning and using usual base-0 addressing.
> >>
> >> Ids are just cookies, their exact values do not matter, so lets start
> >> with 3 on x86_64.
> > ...
> >>  struct net_generic {
> >> - struct {
> >> - unsigned int len;
> >> - struct rcu_head rcu;
> >> - } s;
> >> -
> >> - void *ptr[0];
> >> + union {
> >> + struct {
> >> + unsigned int len;
> >> + struct rcu_head rcu;
> >> + } s;
> >> +
> >> + void *ptr[0];
> >> + };
> >>  };
> >
> > That union is an accident waiting to happen.
> 
> I kind of disagree. Module authors should not be given matches,
> but it is hard to screw up if net_generic() is all you're given.
> 
> > What might work is to offset the Ids by
> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
> > The subtract from the offset will then counter the structure offset
> > - which is what you are trying to achieve.
> 
> If you suggest this layout
> 
> struct net_generic {
> struct {
> } s;
> void *ptr[0];
> };
> 
> then is it not optimal because offset of "ptr" needs to be somewhere in code
> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
> 
> Here is test program
> 
> struct ng1 {
> union {
> struct {
> unsigned int len;
> } s;
> void *ptr[0];
> };
> };
> struct ng2 {
> struct {
> unsigned int len;
> } s;
> void *ptr[0];
> };
> struct net {
> int x;
> struct ng1 *gen1;
> struct ng2 *gen2;
> };
> void *ng1(const struct net *net, unsigned int id)
> {
> return net->gen1->ptr[id];
> }
> void *ng2(const struct net *net, unsigned int id)
> {
> return net->gen2->ptr[id];
> }
> 
> 
>  :
>0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
>4:   89 f6   movesi,esi
>6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
>a:   c3  ret
> 
> 
> 0010 :
>   10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
>   14:   89 f6   movesi,esi
>   16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
>   1b:   c3  ret

On x86 that will make ~0 difference since the offset (in that sequence)
doesn't require an extra instruction.

However if you offset the 'id' values so that only
values 2 up are valid the code becomes:
return net->gen2->ptr[id - 2];
which will be exactly the same code as:
return net->gen1->ptr[id];
but it is much more obvious that 'id' values must be >= 2.

The '2' should be generated from the structure offset, but with my method
is doesn't actually matter if it is wrong.

David



Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-13 Thread Alexey Dobriyan
On Wed, Dec 7, 2016 at 1:49 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 05 December 2016 14:48
>> On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
>> > From: Alexey Dobriyan
>> >> Sent: 02 December 2016 01:22
>> >> net_generic() function is both a) inline and b) used ~600 times.
>> >>
>> >> It has the following code inside
>> >>
>> >>   ...
>> >>   ptr = ng->ptr[id - 1];
>> >>   ...
>> >>
>> >> "id" is never compile time constant so compiler is forced to subtract 1.
>> >> And those decrements or LEA [r32 - 1] instructions add up.
>> >>
>> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> >> is not initialized and 0. This is quite pointless idea (nothing will
>> >> work or immediate interference with first registered subsystem) in
>> >> general but it hints what needs to be done for code size reduction.
>> >>
>> >> Namely, overlaying allocation of pointer array and fixed part of
>> >> structure in the beginning and using usual base-0 addressing.
>> >>
>> >> Ids are just cookies, their exact values do not matter, so lets start
>> >> with 3 on x86_64.
>> > ...
>> >>  struct net_generic {
>> >> - struct {
>> >> - unsigned int len;
>> >> - struct rcu_head rcu;
>> >> - } s;
>> >> -
>> >> - void *ptr[0];
>> >> + union {
>> >> + struct {
>> >> + unsigned int len;
>> >> + struct rcu_head rcu;
>> >> + } s;
>> >> +
>> >> + void *ptr[0];
>> >> + };
>> >>  };
>> >
>> > That union is an accident waiting to happen.
>>
>> I kind of disagree. Module authors should not be given matches,
>> but it is hard to screw up if net_generic() is all you're given.
>>
>> > What might work is to offset the Ids by
>> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
>> > The subtract from the offset will then counter the structure offset
>> > - which is what you are trying to achieve.
>>
>> If you suggest this layout
>>
>> struct net_generic {
>> struct {
>> } s;
>> void *ptr[0];
>> };
>>
>> then is it not optimal because offset of "ptr" needs to be somewhere in code
>> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
>>
>> Here is test program
>>
>> struct ng1 {
>> union {
>> struct {
>> unsigned int len;
>> } s;
>> void *ptr[0];
>> };
>> };
>> struct ng2 {
>> struct {
>> unsigned int len;
>> } s;
>> void *ptr[0];
>> };
>> struct net {
>> int x;
>> struct ng1 *gen1;
>> struct ng2 *gen2;
>> };
>> void *ng1(const struct net *net, unsigned int id)
>> {
>> return net->gen1->ptr[id];
>> }
>> void *ng2(const struct net *net, unsigned int id)
>> {
>> return net->gen2->ptr[id];
>> }
>>
>>
>>  :
>>0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
>>4:   89 f6   movesi,esi
>>6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
>>a:   c3  ret
>>
>>
>> 0010 :
>>   10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
>>   14:   89 f6   movesi,esi
>>   16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
>>   1b:   c3  ret
>
> On x86 that will make ~0 difference since the offset (in that sequence)
> doesn't require an extra instruction.

Well, the point of the patch is to save .text, so might as well save
as much as possible. Any form other than "ptr[id]" is going
to be either bigger or bigger and slower and "ptr" should be the first field.

> However if you offset the 'id' values so that only
> values 2 up are valid the code becomes:
> return net->gen2->ptr[id - 2];
> which will be exactly the same code as:
> return net->gen1->ptr[id];
> but it is much more obvious that 'id' values must be >= 2.
>
> The '2' should be generated from the structure offset, but with my method
> is doesn't actually matter if it is wrong.


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-13 Thread David Laight
From: Alexey Dobriyan
> Sent: 13 December 2016 14:23
...
> Well, the point of the patch is to save .text, so might as well save
> as much as possible. Any form other than "ptr[id]" is going
> to be either bigger or bigger and slower and "ptr" should be the first field.

You've not read and understood the next bit:

> > However if you offset the 'id' values so that only
> > values 2 up are valid the code becomes:
> > return net->gen2->ptr[id - 2];
> > which will be exactly the same code as:
> > return net->gen1->ptr[id];
> > but it is much more obvious that 'id' values must be >= 2.
> >
> > The '2' should be generated from the structure offset, but with my method
> > is doesn't actually matter if it is wrong.

If you have foo->bar[id - const] then the compiler has to add the
offset of 'bar' and subtract for 'const'.
If the numbers match no add or subtract is needed.

It is much cleaner to do this by explicitly removing the offset on the
accesses than using a union.

David



Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-14 Thread Alexey Dobriyan
On Tue, Dec 13, 2016 at 5:42 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 13 December 2016 14:23
> ...
>> Well, the point of the patch is to save .text, so might as well save
>> as much as possible. Any form other than "ptr[id]" is going
>> to be either bigger or bigger and slower and "ptr" should be the first field.
>
> You've not read and understood the next bit:
>
>> > However if you offset the 'id' values so that only
>> > values 2 up are valid the code becomes:
>> > return net->gen2->ptr[id - 2];
>> > which will be exactly the same code as:
>> > return net->gen1->ptr[id];
>> > but it is much more obvious that 'id' values must be >= 2.
>> >
>> > The '2' should be generated from the structure offset, but with my method
>> > is doesn't actually matter if it is wrong.
>
> If you have foo->bar[id - const] then the compiler has to add the
> offset of 'bar' and subtract for 'const'.
> If the numbers match no add or subtract is needed.
>
> It is much cleaner to do this by explicitly removing the offset on the
> accesses than using a union.

Surprisingly, the trick only works if array index is cast to "unsigned long"
before subtracting.

Code becomes

...
ptr = ng->ptr[(unsigned long)id - 3];
...

I'll post a patch when net-next reopens.


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-14 Thread David Laight
From: Alexey Dobriyan 
> Sent: 14 December 2016 13:20
...
> > If you have foo->bar[id - const] then the compiler has to add the
> > offset of 'bar' and subtract for 'const'.
> > If the numbers match no add or subtract is needed.
> >
> > It is much cleaner to do this by explicitly removing the offset on the
> > accesses than using a union.
> 
> Surprisingly, the trick only works if array index is cast to "unsigned long"
> before subtracting.
> 
> Code becomes
> 
> ...
> ptr = ng->ptr[(unsigned long)id - 3];
> ...

The compiler may also be able to optimise it away if 'id' is 'int'
rather than 'unsigned int'.

Oh, if you need casts like that use an accessor function.

David