Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Wed, 20 Feb 2008, Jan Engelhardt wrote: > > On Feb 20 2008 17:27, Patrick McHardy wrote: > >> Striking. How can this even happen? A callsite which calls > >> > >> dev_alloc_skb(n) > >> > >> is just equivalent to > >> > >> __dev_alloc_skb(n, GFP_ATOMIC); > >> > >> which means there's like 4 (or 8 if it's long) bytes more on the > >> stack. For a worst case, count in another 8 bytes for push and pop or mov > >> on > >> the stack. But that still does not add up to 23 kb. I think you misunderstood the results, if I uninlined dev_alloc_skb(), it _alone_ was uninlined which basically means that __dev_alloc_skb() that is inline as well is included inside that uninlined function. When both were inlined, they add up to everywhere, and uninlining dev_alloc_skb alone mitigates that for both(!) of them in every place where dev_alloc_skb is being called. Because __dev_alloc_skb call sites are few, most benefits show up already with dev_alloc_skb uninlining alone. On the other hand, if __dev_alloc_skb is uninlined, the size reasoning you used above applies to dev_alloc_skb callsites, and that is definately less than 23kB. > > __dev_alloc_skb() is also an inline function which performs > > some extra work. Which raises the question - if dev_alloc_skb() > > is uninlined, shouldn't __dev_alloc_skb() be uninline as well? Of course that could be done as well, however, I wouldn't be too keen to deepen callchain by both of them, ie., uninlined dev_alloc_skb would just contain few bytes which perform the call to __dev_alloc_skb which has the bit larger content due to that "extra work". IMHO the best solution would duplicate the "extra work" to both of them on binary level (obviously not on the source level), e.g., by adding static inline ___dev_alloc_skb() to .h which is inlined to both of the variants. I'm not too sure if inline to __dev_alloc_skb() alone is enough in .c file to result in inlining of __dev_alloc_skb to dev_alloc_skb (with all gcc versions and relevant optimization settings). > I'd like to see the results when {__dev_alloc_skb is externed > and dev_alloc_skb remains inlined}. The results are right under your nose already... ;-) See from the list of the series introduction: http://marc.info/?l=linux-netdev=120351526210711=2 IMHO more interesting number (which I currently don't have) is the _remaining_ benefits of uninlining __dev_alloc_skb after dev_alloc_skb was first uninlined. -- i. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Feb 20 2008 17:27, Patrick McHardy wrote: >> Striking. How can this even happen? A callsite which calls >> >> dev_alloc_skb(n) >> >> is just equivalent to >> >> __dev_alloc_skb(n, GFP_ATOMIC); >> >> which means there's like 4 (or 8 if it's long) bytes more on the >> stack. For a worst case, count in another 8 bytes for push and pop or mov on >> the stack. But that still does not add up to 23 kb. > > __dev_alloc_skb() is also an inline function which performs > some extra work. Which raises the question - if dev_alloc_skb() > is uninlined, shouldn't __dev_alloc_skb() be uninline as well? > I'd like to see the results when {__dev_alloc_skb is externed and dev_alloc_skb remains inlined}. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
Jan Engelhardt wrote: On Feb 20 2008 15:47, Ilpo Järvinen wrote: -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb -static inline struct sk_buff *dev_alloc_skb(unsigned int length) -{ - return __dev_alloc_skb(length, GFP_ATOMIC); -} +extern struct sk_buff *dev_alloc_skb(unsigned int length); Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. __dev_alloc_skb() is also an inline function which performs some extra work. Which raises the question - if dev_alloc_skb() is uninlined, shouldn't __dev_alloc_skb() be uninline as well? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Feb 20 2008 15:47, Ilpo Järvinen wrote: > >-23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb > >-static inline struct sk_buff *dev_alloc_skb(unsigned int length) >-{ >- return __dev_alloc_skb(length, GFP_ATOMIC); >-} >+extern struct sk_buff *dev_alloc_skb(unsigned int length); Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Feb 20 2008 17:27, Patrick McHardy wrote: Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. __dev_alloc_skb() is also an inline function which performs some extra work. Which raises the question - if dev_alloc_skb() is uninlined, shouldn't __dev_alloc_skb() be uninline as well? I'd like to see the results when {__dev_alloc_skb is externed and dev_alloc_skb remains inlined}. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
Jan Engelhardt wrote: On Feb 20 2008 15:47, Ilpo Järvinen wrote: -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb -static inline struct sk_buff *dev_alloc_skb(unsigned int length) -{ - return __dev_alloc_skb(length, GFP_ATOMIC); -} +extern struct sk_buff *dev_alloc_skb(unsigned int length); Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. __dev_alloc_skb() is also an inline function which performs some extra work. Which raises the question - if dev_alloc_skb() is uninlined, shouldn't __dev_alloc_skb() be uninline as well? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Feb 20 2008 15:47, Ilpo Järvinen wrote: -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb -static inline struct sk_buff *dev_alloc_skb(unsigned int length) -{ - return __dev_alloc_skb(length, GFP_ATOMIC); -} +extern struct sk_buff *dev_alloc_skb(unsigned int length); Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 3/8] [NET]: uninline dev_alloc_skb, de-bloats a lot
On Wed, 20 Feb 2008, Jan Engelhardt wrote: On Feb 20 2008 17:27, Patrick McHardy wrote: Striking. How can this even happen? A callsite which calls dev_alloc_skb(n) is just equivalent to __dev_alloc_skb(n, GFP_ATOMIC); which means there's like 4 (or 8 if it's long) bytes more on the stack. For a worst case, count in another 8 bytes for push and pop or mov on the stack. But that still does not add up to 23 kb. I think you misunderstood the results, if I uninlined dev_alloc_skb(), it _alone_ was uninlined which basically means that __dev_alloc_skb() that is inline as well is included inside that uninlined function. When both were inlined, they add up to everywhere, and uninlining dev_alloc_skb alone mitigates that for both(!) of them in every place where dev_alloc_skb is being called. Because __dev_alloc_skb call sites are few, most benefits show up already with dev_alloc_skb uninlining alone. On the other hand, if __dev_alloc_skb is uninlined, the size reasoning you used above applies to dev_alloc_skb callsites, and that is definately less than 23kB. __dev_alloc_skb() is also an inline function which performs some extra work. Which raises the question - if dev_alloc_skb() is uninlined, shouldn't __dev_alloc_skb() be uninline as well? Of course that could be done as well, however, I wouldn't be too keen to deepen callchain by both of them, ie., uninlined dev_alloc_skb would just contain few bytes which perform the call to __dev_alloc_skb which has the bit larger content due to that extra work. IMHO the best solution would duplicate the extra work to both of them on binary level (obviously not on the source level), e.g., by adding static inline ___dev_alloc_skb() to .h which is inlined to both of the variants. I'm not too sure if inline to __dev_alloc_skb() alone is enough in .c file to result in inlining of __dev_alloc_skb to dev_alloc_skb (with all gcc versions and relevant optimization settings). I'd like to see the results when {__dev_alloc_skb is externed and dev_alloc_skb remains inlined}. The results are right under your nose already... ;-) See from the list of the series introduction: http://marc.info/?l=linux-netdevm=120351526210711w=2 IMHO more interesting number (which I currently don't have) is the _remaining_ benefits of uninlining __dev_alloc_skb after dev_alloc_skb was first uninlined. -- i. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/