Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

2008-02-20 Thread Ilpo Järvinen
On Wed, 20 Feb 2008, Jan Engelhardt wrote:

> 
> On Feb 20 2008 17:27, Patrick McHardy wrote:
> >> Striking. How can this even happen? A callsite which calls
> >> 
> >>  dev_alloc_skb(n)
> >> 
> >> is just equivalent to
> >> 
> >>  __dev_alloc_skb(n, GFP_ATOMIC);
> >> 
> >> which means there's like 4 (or 8 if it's long) bytes more on the
> >> stack. For a worst case, count in another 8 bytes for push and pop or mov
> >> on the stack. But that still does not add up to 23 kb.

I think you misunderstood the results. When I uninlined dev_alloc_skb(),
it _alone_ was uninlined, which basically means that __dev_alloc_skb(),
which is inline as well, ends up included inside that uninlined function.

When both were inlined, their code was duplicated at every call site, and
uninlining dev_alloc_skb alone removes that duplication for both(!) of
them in every place where dev_alloc_skb is called. Because direct
__dev_alloc_skb call sites are few, most of the benefit shows up already
with dev_alloc_skb uninlining alone. On the other hand, if
__dev_alloc_skb is uninlined instead, the size reasoning you used above
applies to the dev_alloc_skb call sites, and that is definitely less
than 23 kB.
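
To make the nesting concrete, here is a minimal user-space sketch. It is
not kernel code: struct fake_skb, the FAKE_* constants and all fake_*
function names are hypothetical stand-ins, and the "extra work" is merely
modeled as recording some headroom. The only point is where the body of
the inner inline helper ends up being expanded.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, NOT the kernel definitions. */
#define FAKE_GFP_ATOMIC   0x20u
#define FAKE_NET_SKB_PAD  16u

struct fake_skb {
    unsigned int headroom;
    unsigned int len;
    unsigned int gfp;
};

/* Stand-in for __dev_alloc_skb(): inline, and does some "extra work". */
static inline struct fake_skb *fake__dev_alloc_skb(unsigned int length,
                                                   unsigned int gfp)
{
    struct fake_skb *skb = malloc(sizeof(*skb) + length + FAKE_NET_SKB_PAD);

    if (skb) {
        skb->headroom = FAKE_NET_SKB_PAD;   /* the modeled "extra work" */
        skb->len = length;
        skb->gfp = gfp;
    }
    return skb;
}

/*
 * Before: the wrapper is inline too, so each caller of the wrapper
 * may receive its own copy of the helper body above.
 */
static inline struct fake_skb *fake_dev_alloc_skb_inlined(unsigned int length)
{
    return fake__dev_alloc_skb(length, FAKE_GFP_ATOMIC);
}

/* After: a header would only carry this declaration ... */
struct fake_skb *fake_dev_alloc_skb(unsigned int length);

/*
 * ... and one .c file holds the single out-of-line definition. The
 * inline helper is expanded here once; callers just emit a call.
 */
__attribute__((noinline))
struct fake_skb *fake_dev_alloc_skb(unsigned int length)
{
    return fake__dev_alloc_skb(length, FAKE_GFP_ATOMIC);
}
```

Both wrappers behave identically from the caller's point of view; only
the amount of code emitted per call site differs, and how much differs
depends on the compiler and optimization settings.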

> > __dev_alloc_skb() is also an inline function which performs
> > some extra work. Which raises the question - if dev_alloc_skb()
> > is uninlined, shouldn't __dev_alloc_skb() be uninline as well?

Of course that could be done as well. However, I wouldn't be too keen to
deepen the call chain with both of them, i.e., the uninlined
dev_alloc_skb would contain just the few bytes that perform the call to
__dev_alloc_skb, which has the somewhat larger body due to that "extra
work". IMHO the best solution would duplicate the "extra work" into both
of them at the binary level (obviously not at the source level), e.g., by
adding a static inline ___dev_alloc_skb() to the .h which is inlined into
both of the variants. I'm not too sure whether marking __dev_alloc_skb()
inline in the .c file alone is enough to get __dev_alloc_skb inlined into
dev_alloc_skb (with all gcc versions and relevant optimization settings).
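
A rough user-space sketch of that layout, under stated assumptions: all
fake_* names, struct fake_skb and the FAKE_* constants are hypothetical
stand-ins, and the GCC always_inline attribute is used here as one
possible way to avoid depending on the compiler's inlining heuristics.
It is not something the patch series itself is known to do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins, NOT the kernel definitions. */
#define FAKE_GFP_ATOMIC   0x20u
#define FAKE_NET_SKB_PAD  16u

struct fake_skb {
    unsigned int headroom;
    unsigned int len;
    unsigned int gfp;
};

/*
 * Would live in the .h: the shared body (stand-in for the proposed
 * ___dev_alloc_skb()). always_inline asks GCC/Clang to expand it into
 * every caller regardless of the usual inlining heuristics.
 */
static inline __attribute__((always_inline)) struct fake_skb *
fake___dev_alloc_skb(unsigned int length, unsigned int gfp)
{
    struct fake_skb *skb = malloc(sizeof(*skb) + length + FAKE_NET_SKB_PAD);

    if (skb) {
        skb->headroom = FAKE_NET_SKB_PAD;   /* the modeled "extra work" */
        skb->len = length;
        skb->gfp = gfp;
    }
    return skb;
}

/* Declarations the .h would export for the two out-of-line variants. */
struct fake_skb *fake__dev_alloc_skb(unsigned int length, unsigned int gfp);
struct fake_skb *fake_dev_alloc_skb(unsigned int length);

/*
 * Would live in the .c: each variant contains its own copy of the
 * shared body, so neither one has to call the other.
 */
struct fake_skb *fake__dev_alloc_skb(unsigned int length, unsigned int gfp)
{
    return fake___dev_alloc_skb(length, gfp);
}

struct fake_skb *fake_dev_alloc_skb(unsigned int length)
{
    return fake___dev_alloc_skb(length, FAKE_GFP_ATOMIC);
}
```

The cost is two copies of the body in the binary instead of one; the
gain is that a dev_alloc_skb caller pays for a single call rather than
a call into a thin wrapper followed by a second call.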

> I'd like to see the results when {__dev_alloc_skb is externed
> and dev_alloc_skb remains inlined}.

The results are right under your nose already... ;-)
See the list in the series introduction:

  http://marc.info/?l=linux-netdev&m=120351526210711&w=2

IMHO the more interesting number (which I currently don't have) is the
_remaining_ benefit of uninlining __dev_alloc_skb after dev_alloc_skb
has already been uninlined.


-- 
 i.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

2008-02-20 Thread Jan Engelhardt

On Feb 20 2008 17:27, Patrick McHardy wrote:
>> Striking. How can this even happen? A callsite which calls
>> 
>>  dev_alloc_skb(n)
>> 
>> is just equivalent to
>> 
>>  __dev_alloc_skb(n, GFP_ATOMIC);
>> 
>> which means there's like 4 (or 8 if it's long) bytes more on the
>> stack. For a worst case, count in another 8 bytes for push and pop or mov on
>> the stack. But that still does not add up to 23 kb.
>
> __dev_alloc_skb() is also an inline function which performs
> some extra work. Which raises the question - if dev_alloc_skb()
> is uninlined, shouldn't __dev_alloc_skb() be uninline as well?
>
I'd like to see the results when {__dev_alloc_skb is externed
and dev_alloc_skb remains inlined}.


Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

2008-02-20 Thread Patrick McHardy

Jan Engelhardt wrote:
> On Feb 20 2008 15:47, Ilpo Järvinen wrote:
>>
>> -23668  392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
>>
>> -static inline struct sk_buff *dev_alloc_skb(unsigned int length)
>> -{
>> -   return __dev_alloc_skb(length, GFP_ATOMIC);
>> -}
>> +extern struct sk_buff *dev_alloc_skb(unsigned int length);
>
> Striking. How can this even happen? A callsite which calls
>
>  dev_alloc_skb(n)
>
> is just equivalent to
>
>  __dev_alloc_skb(n, GFP_ATOMIC);
>
> which means there's like 4 (or 8 if it's long) bytes more on the
> stack. For a worst case, count in another 8 bytes for push and pop or mov on
> the stack. But that still does not add up to 23 kb.

__dev_alloc_skb() is also an inline function which performs
some extra work. Which raises the question - if dev_alloc_skb()
is uninlined, shouldn't __dev_alloc_skb() be uninline as well?


Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot

2008-02-20 Thread Jan Engelhardt

On Feb 20 2008 15:47, Ilpo Järvinen wrote:
>
>-23668  392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
>
>-static inline struct sk_buff *dev_alloc_skb(unsigned int length)
>-{
>-  return __dev_alloc_skb(length, GFP_ATOMIC);
>-}
>+extern struct sk_buff *dev_alloc_skb(unsigned int length);

Striking. How can this even happen? A callsite which calls

dev_alloc_skb(n)

is just equivalent to

__dev_alloc_skb(n, GFP_ATOMIC);

which means there's like 4 (or 8 if it's long) bytes more on the
stack. For a worst case, count in another 8 bytes for push and pop or mov on
the stack. But that still does not add up to 23 kb.

