RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-14 Thread David Laight
From: Alexey Dobriyan 
> Sent: 14 December 2016 13:20
...
> > If you have foo->bar[id - const] then the compiler has to add the
> > offset of 'bar' and subtract for 'const'.
> > If the numbers match no add or subtract is needed.
> >
> > It is much cleaner to do this by explicitly removing the offset on the
> > accesses than using a union.
> 
> Surprisingly, the trick only works if array index is cast to "unsigned long"
> before subtracting.
> 
> Code becomes
> 
> ...
> ptr = ng->ptr[(unsigned long)id - 3];
> ...

The compiler may also be able to optimise it away if 'id' is 'int'
rather than 'unsigned int'.

Oh, if you need casts like that use an accessor function.

David




Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-14 Thread Alexey Dobriyan
On Tue, Dec 13, 2016 at 5:42 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 13 December 2016 14:23
> ...
>> Well, the point of the patch is to save .text, so might as well save
>> as much as possible. Any form other than "ptr[id]" is going
>> to be either bigger or bigger and slower and "ptr" should be the first field.
>
> You've not read and understood the next bit:
>
>> > However if you offset the 'id' values so that only
>> > values 2 up are valid the code becomes:
>> > return net->gen2->ptr[id - 2];
>> > which will be exactly the same code as:
>> > return net->gen1->ptr[id];
>> > but it is much more obvious that 'id' values must be >= 2.
>> >
>> > The '2' should be generated from the structure offset, but with my method
>> > is doesn't actually matter if it is wrong.
>
> If you have foo->bar[id - const] then the compiler has to add the
> offset of 'bar' and subtract for 'const'.
> If the numbers match no add or subtract is needed.
>
> It is much cleaner to do this by explicitly removing the offset on the
> accesses than using a union.

Surprisingly, the trick only works if array index is cast to "unsigned long"
before subtracting.

Code becomes

...
ptr = ng->ptr[(unsigned long)id - 3];
...

I'll post a patch when net-next reopens.


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-13 Thread David Laight
From: Alexey Dobriyan
> Sent: 13 December 2016 14:23
...
> Well, the point of the patch is to save .text, so might as well save
> as much as possible. Any form other than "ptr[id]" is going
> to be either bigger or bigger and slower and "ptr" should be the first field.

You've not read and understood the next bit:

> > However if you offset the 'id' values so that only
> > values 2 up are valid the code becomes:
> > return net->gen2->ptr[id - 2];
> > which will be exactly the same code as:
> > return net->gen1->ptr[id];
> > but it is much more obvious that 'id' values must be >= 2.
> >
> > The '2' should be generated from the structure offset, but with my method
> > is doesn't actually matter if it is wrong.

If you have foo->bar[id - const] then the compiler has to add the
offset of 'bar' and subtract for 'const'.
If the numbers match no add or subtract is needed.

It is much cleaner to do this by explicitly removing the offset on the
accesses than using a union.

David



Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-13 Thread Alexey Dobriyan
On Wed, Dec 7, 2016 at 1:49 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 05 December 2016 14:48
>> On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
>> > From: Alexey Dobriyan
>> >> Sent: 02 December 2016 01:22
>> >> net_generic() function is both a) inline and b) used ~600 times.
>> >>
>> >> It has the following code inside
>> >>
>> >>   ...
>> >>   ptr = ng->ptr[id - 1];
>> >>   ...
>> >>
>> >> "id" is never compile time constant so compiler is forced to subtract 1.
>> >> And those decrements or LEA [r32 - 1] instructions add up.
>> >>
>> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> >> is not initialized and 0. This is quite pointless idea (nothing will
>> >> work or immediate interference with first registered subsystem) in
>> >> general but it hints what needs to be done for code size reduction.
>> >>
>> >> Namely, overlaying allocation of pointer array and fixed part of
>> >> structure in the beginning and using usual base-0 addressing.
>> >>
>> >> Ids are just cookies, their exact values do not matter, so lets start
>> >> with 3 on x86_64.
>> > ...
>> >>  struct net_generic {
>> >> - struct {
>> >> - unsigned int len;
>> >> - struct rcu_head rcu;
>> >> - } s;
>> >> -
>> >> - void *ptr[0];
>> >> + union {
>> >> + struct {
>> >> + unsigned int len;
>> >> + struct rcu_head rcu;
>> >> + } s;
>> >> +
>> >> + void *ptr[0];
>> >> + };
>> >>  };
>> >
>> > That union is an accident waiting to happen.
>>
>> I kind of disagree. Module authors should not be given matches,
>> but it is hard to screw up if net_generic() is all you're given.
>>
>> > What might work is to offset the Ids by
>> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
>> > The subtract from the offset will then counter the structure offset
>> > - which is what you are trying to achieve.
>>
>> If you suggest this layout
>>
>> struct net_generic {
>> struct {
>> } s;
>> void *ptr[0];
>> };
>>
>> then is it not optimal because offset of "ptr" needs to be somewhere in code
>> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
>>
>> Here is test program
>>
>> struct ng1 {
>> union {
>> struct {
>> unsigned int len;
>> } s;
>> void *ptr[0];
>> };
>> };
>> struct ng2 {
>> struct {
>> unsigned int len;
>> } s;
>> void *ptr[0];
>> };
>> struct net {
>> int x;
>> struct ng1 *gen1;
>> struct ng2 *gen2;
>> };
>> void *ng1(const struct net *net, unsigned int id)
>> {
>> return net->gen1->ptr[id];
>> }
>> void *ng2(const struct net *net, unsigned int id)
>> {
>> return net->gen2->ptr[id];
>> }
>>
>>
>>  :
>>0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
>>4:   89 f6   movesi,esi
>>6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
>>a:   c3  ret
>>
>>
>> 0010 :
>>   10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
>>   14:   89 f6   movesi,esi
>>   16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
>>   1b:   c3  ret
>
> On x86 that will make ~0 difference since the offset (in that sequence)
> doesn't require an extra instruction.

Well, the point of the patch is to save .text, so might as well save
as much as possible. Any form other than "ptr[id]" is going
to be either bigger or bigger and slower and "ptr" should be the first field.

> However if you offset the 'id' values so that only
> values 2 up are valid the code becomes:
> return net->gen2->ptr[id - 2];
> which will be exactly the same code as:
> return net->gen1->ptr[id];
> but it is much more obvious that 'id' values must be >= 2.
>
> The '2' should be generated from the structure offset, but with my method
> is doesn't actually matter if it is wrong.


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-07 Thread David Laight
From: Alexey Dobriyan
> Sent: 05 December 2016 14:48
> On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
> > From: Alexey Dobriyan
> >> Sent: 02 December 2016 01:22
> >> net_generic() function is both a) inline and b) used ~600 times.
> >>
> >> It has the following code inside
> >>
> >>   ...
> >>   ptr = ng->ptr[id - 1];
> >>   ...
> >>
> >> "id" is never compile time constant so compiler is forced to subtract 1.
> >> And those decrements or LEA [r32 - 1] instructions add up.
> >>
> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> >> is not initialized and 0. This is quite pointless idea (nothing will
> >> work or immediate interference with first registered subsystem) in
> >> general but it hints what needs to be done for code size reduction.
> >>
> >> Namely, overlaying allocation of pointer array and fixed part of
> >> structure in the beginning and using usual base-0 addressing.
> >>
> >> Ids are just cookies, their exact values do not matter, so lets start
> >> with 3 on x86_64.
> > ...
> >>  struct net_generic {
> >> - struct {
> >> - unsigned int len;
> >> - struct rcu_head rcu;
> >> - } s;
> >> -
> >> - void *ptr[0];
> >> + union {
> >> + struct {
> >> + unsigned int len;
> >> + struct rcu_head rcu;
> >> + } s;
> >> +
> >> + void *ptr[0];
> >> + };
> >>  };
> >
> > That union is an accident waiting to happen.
> 
> I kind of disagree. Module authors should not be given matches,
> but it is hard to screw up if net_generic() is all you're given.
> 
> > What might work is to offset the Ids by
> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
> > The subtract from the offset will then counter the structure offset
> > - which is what you are trying to achieve.
> 
> If you suggest this layout
> 
> struct net_generic {
> struct {
> } s;
> void *ptr[0];
> };
> 
> then is it not optimal because offset of "ptr" needs to be somewhere in code
> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
> 
> Here is test program
> 
> struct ng1 {
> union {
> struct {
> unsigned int len;
> } s;
> void *ptr[0];
> };
> };
> struct ng2 {
> struct {
> unsigned int len;
> } s;
> void *ptr[0];
> };
> struct net {
> int x;
> struct ng1 *gen1;
> struct ng2 *gen2;
> };
> void *ng1(const struct net *net, unsigned int id)
> {
> return net->gen1->ptr[id];
> }
> void *ng2(const struct net *net, unsigned int id)
> {
> return net->gen2->ptr[id];
> }
> 
> 
>  :
>0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
>4:   89 f6   movesi,esi
>6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
>a:   c3  ret
> 
> 
> 0010 :
>   10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
>   14:   89 f6   movesi,esi
>   16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
>   1b:   c3  ret

On x86 that will make ~0 difference since the offset (in that sequence)
doesn't require an extra instruction.

However if you offset the 'id' values so that only
values 2 up are valid the code becomes:
return net->gen2->ptr[id - 2];
which will be exactly the same code as:
return net->gen1->ptr[id];
but it is much more obvious that 'id' values must be >= 2.

The '2' should be generated from the structure offset, but with my method
is doesn't actually matter if it is wrong.

David



Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-05 Thread Alexey Dobriyan
On Mon, Dec 5, 2016 at 3:49 PM, David Laight  wrote:
> From: Alexey Dobriyan
>> Sent: 02 December 2016 01:22
>> net_generic() function is both a) inline and b) used ~600 times.
>>
>> It has the following code inside
>>
>>   ...
>>   ptr = ng->ptr[id - 1];
>>   ...
>>
>> "id" is never compile time constant so compiler is forced to subtract 1.
>> And those decrements or LEA [r32 - 1] instructions add up.
>>
>> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> is not initialized and 0. This is quite pointless idea (nothing will
>> work or immediate interference with first registered subsystem) in
>> general but it hints what needs to be done for code size reduction.
>>
>> Namely, overlaying allocation of pointer array and fixed part of
>> structure in the beginning and using usual base-0 addressing.
>>
>> Ids are just cookies, their exact values do not matter, so lets start
>> with 3 on x86_64.
> ...
>>  struct net_generic {
>> - struct {
>> - unsigned int len;
>> - struct rcu_head rcu;
>> - } s;
>> -
>> - void *ptr[0];
>> + union {
>> + struct {
>> + unsigned int len;
>> + struct rcu_head rcu;
>> + } s;
>> +
>> + void *ptr[0];
>> + };
>>  };
>
> That union is an accident waiting to happen.

I kind of disagree. Module authors should not be given matches,
but it is hard to screw up if net_generic() is all you're given.

> What might work is to offset the Ids by
> (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
> The subtract from the offset will then counter the structure offset
> - which is what you are trying to achieve.

If you suggest this layout

struct net_generic {
struct {
} s;
void *ptr[0];
};

then is it not optimal because offset of "ptr" needs to be somewhere in code
either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.

Here is test program

struct ng1 {
union {
struct {
unsigned int len;
} s;
void *ptr[0];
};
};
struct ng2 {
struct {
unsigned int len;
} s;
void *ptr[0];
};
struct net {
int x;
struct ng1 *gen1;
struct ng2 *gen2;
};
void *ng1(const struct net *net, unsigned int id)
{
return net->gen1->ptr[id];
}
void *ng2(const struct net *net, unsigned int id)
{
return net->gen2->ptr[id];
}


 :
   0:   48 8b 47 08 movrax,QWORD PTR [rdi+0x8]
   4:   89 f6   movesi,esi
   6:   48 8b 04 f0 movrax,QWORD PTR [rax+rsi*8]
   a:   c3  ret


0010 :
  10:   48 8b 47 10 movrax,QWORD PTR [rdi+0x10]
  14:   89 f6   movesi,esi
  16:   48 8b 44 f0 [[[08]]]  movrax,QWORD PTR [rax+rsi*8+0x8]
  1b:   c3  ret

I can send more type checking if you can tolerate "_" identifier

struct {
unsigned int _;
} pernet_ops_id_t;


RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-05 Thread David Laight
From: Alexey Dobriyan
> Sent: 02 December 2016 01:22
> net_generic() function is both a) inline and b) used ~600 times.
> 
> It has the following code inside
> 
>   ...
>   ptr = ng->ptr[id - 1];
>   ...
> 
> "id" is never compile time constant so compiler is forced to subtract 1.
> And those decrements or LEA [r32 - 1] instructions add up.
> 
> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> is not initialized and 0. This is quite pointless idea (nothing will
> work or immediate interference with first registered subsystem) in
> general but it hints what needs to be done for code size reduction.
> 
> Namely, overlaying allocation of pointer array and fixed part of
> structure in the beginning and using usual base-0 addressing.
> 
> Ids are just cookies, their exact values do not matter, so lets start
> with 3 on x86_64.
...
>  struct net_generic {
> - struct {
> - unsigned int len;
> - struct rcu_head rcu;
> - } s;
> -
> - void *ptr[0];
> + union {
> + struct {
> + unsigned int len;
> + struct rcu_head rcu;
> + } s;
> +
> + void *ptr[0];
> + };
>  };

That union is an accident waiting to happen.
What might work is to offset the Ids by
(offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
The subtract from the offset will then counter the structure offset
- which is what you are trying to achieve.

David.


[PATCH 3/3] netns: fix net_generic() "id - 1" bloat

2016-12-01 Thread Alexey Dobriyan
net_generic() function is both a) inline and b) used ~600 times.

It has the following code inside

...
ptr = ng->ptr[id - 1];
...

"id" is never compile time constant so compiler is forced to subtract 1.
And those decrements or LEA [r32 - 1] instructions add up.

We also start id'ing from 1 to catch bugs where pernet sybsystem id
is not initialized and 0. This is quite pointless idea (nothing will
work or immediate interference with first registered subsystem) in
general but it hints what needs to be done for code size reduction.

Namely, overlaying allocation of pointer array and fixed part of
structure in the beginning and using usual base-0 addressing.

Ids are just cookies, their exact values do not matter, so lets start
with 3 on x86_64.

Code size savings (oh boy): -4.2 KB

As usual, ignore the initial compiler stupidity part of the table.

add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208)
function old new   delta
tipc_nametbl_insert_publ12501270 +20
nlmclnt_lookup_host  686 703 +17
nfsd4_encode_fattr  59305941 +11
nfs_get_client  10501061 +11
register_pernet_operations   333 342  +9
tcf_mirred_init  843 849  +6
tcf_bpf_init11431149  +6
gss_setup_upcall 990 994  +4
idmap_name_to_id 432 434  +2
ops_init 274 275  +1
nfsd_inject_forget_client259 260  +1
nfs4_alloc_client612 613  +1
tunnel_key_walker164 163  -1

...

tipc_bcbase_select_primary   392 360 -32
mac80211_hwsim_new_radio28082767 -41
ipip6_tunnel_ioctl  22282186 -42
tipc_bcast_rcv   715 672 -43
tipc_link_build_proto_msg   11401089 -51
nfsd4_lock  38513796 -55
tipc_mon_rcv1012 956 -56
Total: Before=156643951, After=156639743, chg -0.00%


Signed-off-by: Alexey Dobriyan 
---

 include/net/netns/generic.h |   16 +---
 net/core/net_namespace.c|   20 
 2 files changed, 21 insertions(+), 15 deletions(-)

--- a/include/net/netns/generic.h
+++ b/include/net/netns/generic.h
@@ -25,12 +25,14 @@
  */
 
 struct net_generic {
-   struct {
-   unsigned int len;
-   struct rcu_head rcu;
-   } s;
-
-   void *ptr[0];
+   union {
+   struct {
+   unsigned int len;
+   struct rcu_head rcu;
+   } s;
+
+   void *ptr[0];
+   };
 };
 
 static inline void *net_generic(const struct net *net, unsigned int id)
@@ -40,7 +42,7 @@ static inline void *net_generic(const struct net *net, 
unsigned int id)
 
rcu_read_lock();
ng = rcu_dereference(net->gen);
-   ptr = ng->ptr[id - 1];
+   ptr = ng->ptr[id];
rcu_read_unlock();
 
return ptr;
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -39,6 +39,9 @@ EXPORT_SYMBOL(init_net);
 
 static bool init_net_initialized;
 
+#define MIN_PERNET_OPS_ID  \
+   ((sizeof(struct net_generic) + sizeof(void *) - 1) / sizeof(void *))
+
 #define INITIAL_NET_GEN_PTRS   13 /* +1 for len +2 for rcu_head */
 
 static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS;
@@ -46,7 +49,7 @@ static unsigned int max_gen_ptrs = INITIAL_NET_GEN_PTRS;
 static struct net_generic *net_alloc_generic(void)
 {
struct net_generic *ng;
-   size_t generic_size = offsetof(struct net_generic, ptr[max_gen_ptrs]);
+   unsigned int generic_size = offsetof(struct net_generic, 
ptr[max_gen_ptrs]);
 
ng = kzalloc(generic_size, GFP_KERNEL);
if (ng)
@@ -60,12 +63,12 @@ static int net_assign_generic(struct net *net, unsigned int 
id, void *data)
struct net_generic *ng, *old_ng;
 
BUG_ON(!mutex_is_locked(&net_mutex));
-   BUG_ON(id == 0);
+   BUG_ON(id < MIN_PERNET_OPS_ID);
 
old_ng = rcu_dereference_protected(net->gen,
   lockdep_is_held(&net_mutex));
-   if (old_ng->s.len >= id) {
-   old_ng->ptr[id - 1] = data;
+   if (old_ng->s.len > id) {
+   old_ng->ptr[id] = data;
return 0;
}
 
@@ -84,8 +87,9 @@ static int net_assign_generic(struct net *net, unsigned int 
id, vo