Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote: > On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > > I don't understand what led Andi Kleen to also move .text.hot and > > > .text.unlikely together with .text [2], but this may have > > > been a related issue. > > > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > > > The goal was just to move .hot and .unlikely all together, so that > > they are clustered and use the minimum amount of cache. On x86 doesn't > > matter where they are exactly, as long as each is together. > > If they are not explicitely listed then the linker interleaves > > them with the normal text, which defeats the purpose. > > I still don't see it, my reading of your patch is that you did > the opposite, by changing the description that puts all .text.hot > in front of .text, and all .text.unlikely after exit.text into > one that mixes them with .text. What am I missing here? No it doesn't mix .unlikely with .text, .unlikely is all in one place. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote: > On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > > I don't understand what led Andi Kleen to also move .text.hot and > > > .text.unlikely together with .text [2], but this may have > > > been a related issue. > > > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > > > The goal was just to move .hot and .unlikely all together, so that > > they are clustered and use the minimum amount of cache. On x86 doesn't > > matter where they are exactly, as long as each is together. > > If they are not explicitely listed then the linker interleaves > > them with the normal text, which defeats the purpose. > > I still don't see it, my reading of your patch is that you did > the opposite, by changing the description that puts all .text.hot > in front of .text, and all .text.unlikely after exit.text into > one that mixes them with .text. What am I missing here? .text.hot is actually not used, the critical part is .text.unlikely which was not listed and was interleaved before the patch. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote: > > I don't understand what led Andi Kleen to also move .text.hot and > > .text.unlikely together with .text [2], but this may have > > been a related issue. > > > > [2] https://lkml.org/lkml/2015/7/19/377 > > The goal was just to move .hot and .unlikely all together, so that > they are clustered and use the minimum amount of cache. On x86 doesn't > matter where they are exactly, as long as each is together. > If they are not explicitely listed then the linker interleaves > them with the normal text, which defeats the purpose. I still don't see it, my reading of your patch is that you did the opposite, by changing the description that puts all .text.hot in front of .text, and all .text.unlikely after exit.text into one that mixes them with .text. What am I missing here? Arnd
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Tuesday, August 9, 2016 9:20:16 AM CEST Alan Modra wrote: > On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote: > > I have reverted that patch now, so ARM uses ".fixup" again like every > > other architecture does, and now "*(.fixup) *(.text .text.*)" works > > correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails > > the same way that I saw before: > > That is really odd. The linker isn't supposed to treat those script > snippets differently. First match for .fixup wins. Sorry for my mistake. I checked again and cannot reproduce what I thought I saw earlier. "*(.fixup) *(.text .text.*)" fails as would be expected. Arnd
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
> I don't understand what led Andi Kleen to also move .text.hot and > .text.unlikely together with .text [2], but this may have > been a related issue. The goal was just to move .hot and .unlikely all together, so that they are clustered and use the minimum amount of cache. On x86 doesn't matter where they are exactly, as long as each is together. If they are not explicitely listed then the linker interleaves them with the normal text, which defeats the purpose. -Andi
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote: > I have reverted that patch now, so ARM uses ".fixup" again like every > other architecture does, and now "*(.fixup) *(.text .text.*)" works > correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails > the same way that I saw before: That is really odd. The linker isn't supposed to treat those script snippets differently. First match for .fixup wins. $ cat > fixup1.s <<\EOF .global _start .text _start: .dc.a .L2 .L1: .section ".fixup","ax",%progbits .L2: .dc.a .L1 EOF $ cat > fixup2.s <<\EOF .section ".text.xyz","ax",%progbits .dc.a .L2 .L1: .section ".fixup","ax",%progbits .L2: .dc.a .L1 EOF $ cat > fixup.lnk <<\EOF SECTIONS { .text : { *(.fixup) *(.text .fixup .text.*) } } EOF $ as -o fixup1.o fixup1.s $ as -o fixup2.o fixup2.s $ ld -o fixup -T fixup.lnk -Map fixup.map fixup1.o fixup2.o $ cat fixup.map Memory Configuration Name Origin Length Attributes *default*0x 0x Linker script and memory map .text 0x 0x10 *(.fixup) .fixup 0x0x4 fixup1.o .fixup 0x00040x4 fixup2.o *(.text .fixup .text.*) .text 0x00080x4 fixup1.o 0x0008_start .text 0x000c0x0 fixup2.o .text.xyz 0x000c0x4 fixup2.o [snip] -- Alan Modra Australia Development Lab, IBM
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Monday, August 8, 2016 9:19:47 AM CEST Alan Modra wrote: > On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote: > > On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > > > > > If it can, then Nicholas' patch should be: > > > > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) > > > *(.text .text.*) > > > > > > If you can't put .text.fixup too far away then you may as well just use > > > > > > *(.text .text.*) > > > > I tried this version: > > > > diff --git a/include/asm-generic/vmlinux.lds.h > > b/include/asm-generic/vmlinux.lds.h > > index b1f8828e9eac..fc210dacac9a 100644 > > --- a/include/asm-generic/vmlinux.lds.h > > +++ b/include/asm-generic/vmlinux.lds.h > > @@ -438,7 +438,9 @@ > > * during second ld run in second ld pass when generating System.map */ > > #define TEXT_TEXT\ > > ALIGN_FUNCTION(); \ > > - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > > + *(.text.hot .text.hot.*)\ > > + *(.text.unlikely .text.fixup .text.unlikely.*) \ > > + *(.text .text.*)\ > > *(.ref.text)\ > > MEM_KEEP(init.text) \ > > MEM_KEEP(exit.text) \ > > > > but that failed to link an allyesconfig kernel because of references > > from .fixup to .text.*. Trying your version now: > > Well then, that proves you can't put .text.fixup too far aways from > the associated input section. > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) > > Which means this is guaranteed to fail when you test it properly using > gcc's profiling options, in order to generate .text.hot* and/or > .text.unlikely* sections. I've investigated further and it seems that "*(.text.fixup) *(.text .text.*)" fails just because we list .text.fixup twice. The .text.fixup section was originally[1] introduced to work around the same link error that it is causing now: if we use recursive linking, merging .text and .text.fixup helps avoid the problems of sections that are >32MB before the final link. I have reverted that patch now, so ARM uses ".fixup" again like every other architecture does, and now "*(.fixup) *(.text .text.*)" works correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails the same way that I saw before: drivers/scsi/sg.o:(.fixup+0x4): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0xc): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x14): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x1c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x24): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x2c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x34): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x3c): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' drivers/scsi/sg.o:(.fixup+0x44): relocation truncated to fit: R_ARM_THM_JUMP24 against `.text.sg_ioctl' I don't understand what led Andi Kleen to also move .text.hot and .text.unlikely together with .text [2], but this may have been a related issue. > It seems to me the right thing to do would be to change kernel asm to > generate .text.foo.fixup for any .text.foo section. A gas feature > available with binutils-2.26 enabled by --sectname-subst might help > with implementing that. I think this what Nico wanted to use anyway to eliminate more functions from the output. Arnd [1] http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8321 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8322 [2] https://lkml.org/lkml/2015/7/19/377
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Mon, Aug 08, 2016 at 01:29:03PM +1000, Nicholas Piggin wrote: > On Sat, 6 Aug 2016 22:14:23 +0200 > Sam Ravnborgwrote: > > > On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > > > Introduce LINKER_DCE option for architectures to select if they want > > > to build with -ffunction-sections, -fdata-sections, and link with > > > --gc-sections. > > > > Can you please try to come up with a less cryptic name. > > "DCE" may make sense for you today. > > Bot the naive reader will benefit from the longer and > > more explcit form. > > Yes that's a good idea. The name sucks. > > We don't seem to consistently have a prefix for build configuration > options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION? For cc related options we sort of have: KCONFIG_CC IIRC. So for LD related options we could use the shorter CONFIG_LD_ prefix. Sam
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Mon, 8 Aug 2016 00:12:37 -0400 (EDT) Nicolas Pitrewrote: > On Mon, 8 Aug 2016, Nicholas Piggin wrote: > > > On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) > > Nicolas Pitre wrote: > > > > > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > > > > > Introduce LINKER_DCE option for architectures to select if they want > > > > to build with -ffunction-sections, -fdata-sections, and link with > > > > --gc-sections. It requires some work (documented) to ensure all > > > > unreferenced entrypoints are live, and requires toolchain and > > > > build verification, so it is made a per-arch option for now. > > > > > > > > On a random powerpc64le build, this yelds a significant size saving, > > > > it boots and runs fine, but there is a lot I haven't tested as yet, > > > > so these savings may be reduced if there are bugs in the link. > > > > > > > > text databssdec filename > > > > 11169741 11807441923176 14273661 vmlinux > > > > 10445269 10041271919707 13369103 vmlinux.dce > > > > > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > > > > > Signed-off-by: Nicholas Piggin > > > > > > I played with that too. However this needs distinct sections for > > > exception tables and the like otherwise the backward references from the > > > final exception table to those functions responsible for those exception > > > entries has the effect of pulling in all those functions even if their > > > entry point is never referenced, making --gc-sections less effective. > > > I managed to fix this only with a change to gas (accepted upstream). > > > > > > But once that is solved, you then have the missing forward reference > > > problem i.e. nothing actually references those individual exception > > > entry sections and ld happily drops them all. Having a KEEP() on each of > > > them is unworkable and defeats the purpose anyway. That requires a > > > dummy reloc to trick ld into pulling in those sections when the parent > > > section is also pulled in. > > > > Right, although we don't *need* those things just for enabling > > --gc-sections, do we? It may not be 100% optimal, but it's enough > > to avoid the regression when switching to --whole-archive build > > option. > > Oh absolutely. > > > Your results are impressive, and I don't want to stand in the way of > > either LTO or improving accuracy of --gc-sections. But both are things > > that can be built on top of this patch, I think. > > Indeed. Those patches are certainly welcome. They represent half of the > job already. I just wanted to provide some insight about the whole > picture in case someone else notices those flaws I have identified. Okay thanks, I appreciate you taking a look. I wanted to be sure I wasn't missing some bug here. Smaller kernel is nice for large systems because it means smaller icache/dcache footprint and fewer branch trampolines, so I'm always happy to see that effort. I will certainly help test LTO or some of these other gc-sections improvements on powerpc. Thanks, Nick
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Mon, 8 Aug 2016, Nicholas Piggin wrote: > On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) > Nicolas Pitrewrote: > > > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > > > Introduce LINKER_DCE option for architectures to select if they want > > > to build with -ffunction-sections, -fdata-sections, and link with > > > --gc-sections. It requires some work (documented) to ensure all > > > unreferenced entrypoints are live, and requires toolchain and > > > build verification, so it is made a per-arch option for now. > > > > > > On a random powerpc64le build, this yelds a significant size saving, > > > it boots and runs fine, but there is a lot I haven't tested as yet, > > > so these savings may be reduced if there are bugs in the link. > > > > > > text databssdec filename > > > 11169741 11807441923176 14273661 vmlinux > > > 10445269 10041271919707 13369103 vmlinux.dce > > > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > > > Signed-off-by: Nicholas Piggin > > > > I played with that too. However this needs distinct sections for > > exception tables and the like otherwise the backward references from the > > final exception table to those functions responsible for those exception > > entries has the effect of pulling in all those functions even if their > > entry point is never referenced, making --gc-sections less effective. > > I managed to fix this only with a change to gas (accepted upstream). > > > > But once that is solved, you then have the missing forward reference > > problem i.e. nothing actually references those individual exception > > entry sections and ld happily drops them all. Having a KEEP() on each of > > them is unworkable and defeats the purpose anyway. That requires a > > dummy reloc to trick ld into pulling in those sections when the parent > > section is also pulled in. > > Right, although we don't *need* those things just for enabling > --gc-sections, do we? It may not be 100% optimal, but it's enough > to avoid the regression when switching to --whole-archive build > option. Oh absolutely. > Your results are impressive, and I don't want to stand in the way of > either LTO or improving accuracy of --gc-sections. But both are things > that can be built on top of this patch, I think. Indeed. Those patches are certainly welcome. They represent half of the job already. I just wanted to provide some insight about the whole picture in case someone else notices those flaws I have identified. Nicolas
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Sun, 7 Aug 2016 01:33:45 -0400 (EDT) Nicolas Pitrewrote: > On Fri, 5 Aug 2016, Nicholas Piggin wrote: > > > Introduce LINKER_DCE option for architectures to select if they want > > to build with -ffunction-sections, -fdata-sections, and link with > > --gc-sections. It requires some work (documented) to ensure all > > unreferenced entrypoints are live, and requires toolchain and > > build verification, so it is made a per-arch option for now. > > > > On a random powerpc64le build, this yelds a significant size saving, > > it boots and runs fine, but there is a lot I haven't tested as yet, > > so these savings may be reduced if there are bugs in the link. > > > > text databssdec filename > > 11169741 11807441923176 14273661 vmlinux > > 10445269 10041271919707 13369103 vmlinux.dce > > > > ~700K text, ~170K data, 6% removed from kernel image size. > > > > Signed-off-by: Nicholas Piggin > > I played with that too. However this needs distinct sections for > exception tables and the like otherwise the backward references from the > final exception table to those functions responsible for those exception > entries has the effect of pulling in all those functions even if their > entry point is never referenced, making --gc-sections less effective. > I managed to fix this only with a change to gas (accepted upstream). > > But once that is solved, you then have the missing forward reference > problem i.e. nothing actually references those individual exception > entry sections and ld happily drops them all. Having a KEEP() on each of > them is unworkable and defeats the purpose anyway. That requires a > dummy reloc to trick ld into pulling in those sections when the parent > section is also pulled in. Right, although we don't *need* those things just for enabling --gc-sections, do we? It may not be 100% optimal, but it's enough to avoid the regression when switching to --whole-archive build option. > Please see attached a subset of the slides I presented at ELC and Linaro > Connect last year to illustrate those issues. > > Also attached a sample patch partially implementing those changes. > > In short I'm very glad to see that this might steer interest across > multiple architectures. I felt like this was becoming much more > intrusive than I expected and that maybe LTO was a better bet after all. > But LTO has its evils too and I'm willing to look at gc-sections again > if there is interest from others as well. Your results are impressive, and I don't want to stand in the way of either LTO or improving accuracy of --gc-sections. But both are things that can be built on top of this patch, I think. We don't need to do the entire intrusive changes all at once. Thanks, Nick
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Sat, 6 Aug 2016 22:14:23 +0200 Sam Ravnborgwrote: > On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > > Introduce LINKER_DCE option for architectures to select if they want > > to build with -ffunction-sections, -fdata-sections, and link with > > --gc-sections. > > Can you please try to come up with a less cryptic name. > "DCE" may make sense for you today. > Bot the naive reader will benefit from the longer and > more explcit form. Yes that's a good idea. The name sucks. We don't seem to consistently have a prefix for build configuration options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION? Thanks, Nick
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote: > On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > > > If it can, then Nicholas' patch should be: > > > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text > > .text.*) > > > > If you can't put .text.fixup too far away then you may as well just use > > > > *(.text .text.*) > > I tried this version: > > diff --git a/include/asm-generic/vmlinux.lds.h > b/include/asm-generic/vmlinux.lds.h > index b1f8828e9eac..fc210dacac9a 100644 > --- a/include/asm-generic/vmlinux.lds.h > +++ b/include/asm-generic/vmlinux.lds.h > @@ -438,7 +438,9 @@ > * during second ld run in second ld pass when generating System.map */ > #define TEXT_TEXT\ > ALIGN_FUNCTION(); \ > - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > + *(.text.hot .text.hot.*)\ > + *(.text.unlikely .text.fixup .text.unlikely.*) \ > + *(.text .text.*)\ > *(.ref.text)\ > MEM_KEEP(init.text) \ > MEM_KEEP(exit.text) \ > > but that failed to link an allyesconfig kernel because of references > from .fixup to .text.*. Trying your version now: Well then, that proves you can't put .text.fixup too far aways from the associated input section. > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) Which means this is guaranteed to fail when you test it properly using gcc's profiling options, in order to generate .text.hot* and/or .text.unlikely* sections. It seems to me the right thing to do would be to change kernel asm to generate .text.foo.fixup for any .text.foo section. A gas feature available with binutils-2.26 enabled by --sectname-subst might help with implementing that. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote: > > If it can, then Nicholas' patch should be: > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text > .text.*) > > If you can't put .text.fixup too far away then you may as well just use > > *(.text .text.*) I tried this version: diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index b1f8828e9eac..fc210dacac9a 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -438,7 +438,9 @@ * during second ld run in second ld pass when generating System.map */ #define TEXT_TEXT \ ALIGN_FUNCTION(); \ - *(.text.hot .text .text.fixup .text.unlikely .text.*) \ + *(.text.hot .text.hot.*)\ + *(.text.unlikely .text.fixup .text.unlikely.*) \ + *(.text .text.*)\ *(.ref.text)\ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ but that failed to link an allyesconfig kernel because of references from .fixup to .text.*. Trying your version now: *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) Arnd
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On So, Aug 07 2016, Alan Modrawrote: > At the risk of being told you (kernel people) have already considerd > this I thought I should mention that the above isn't ideal. (Nor is > gcc's choice of .text.hot for hot sections, which clashes with > --function-sections for a function called "hot" but that's another > story.) Or --function-sections should use a unique prefix, like .text.functions.* Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > #define TEXT_TEXT\ > ALIGN_FUNCTION(); \ > - *(.text.hot .text .text.fixup .text.unlikely) \ > + *(.text.hot .text .text.fixup .text.unlikely .text.*) \ > *(.ref.text)\ > MEM_KEEP(init.text) \ > MEM_KEEP(exit.text) \ At the risk of being told you (kernel people) have already considerd this I thought I should mention that the above isn't ideal. (Nor is gcc's choice of .text.hot for hot sections, which clashes with --function-sections for a function called "hot" but that's another story.) You'd really like all the hot sections and cold sections to be together, for better cache locality. So the line ought to have been *(.text.hot) *(.text) *(.text.fixup) *(.text.unlikely) That would put all .text.hot sections together. Similarly for .text.unlikely. The trap of course is that this only works if .text.fixup from one object file can be placed relatively far away from .text in the same object file. If it can, then Nicholas' patch should be: *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*) If you can't put .text.fixup too far away then you may as well just use *(.text .text.*) -- Alan Modra Australia Development Lab, IBM
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Fri, 5 Aug 2016, Nicholas Piggin wrote: > Introduce LINKER_DCE option for architectures to select if they want > to build with -ffunction-sections, -fdata-sections, and link with > --gc-sections. It requires some work (documented) to ensure all > unreferenced entrypoints are live, and requires toolchain and > build verification, so it is made a per-arch option for now. > > On a random powerpc64le build, this yelds a significant size saving, > it boots and runs fine, but there is a lot I haven't tested as yet, > so these savings may be reduced if there are bugs in the link. > > text databssdec filename > 11169741 11807441923176 14273661 vmlinux > 10445269 10041271919707 13369103 vmlinux.dce > > ~700K text, ~170K data, 6% removed from kernel image size. > > Signed-off-by: Nicholas PigginI played with that too. However this needs distinct sections for exception tables and the like otherwise the backward references from the final exception table to those functions responsible for those exception entries has the effect of pulling in all those functions even if their entry point is never referenced, making --gc-sections less effective. I managed to fix this only with a change to gas (accepted upstream). But once that is solved, you then have the missing forward reference problem i.e. nothing actually references those individual exception entry sections and ld happily drops them all. Having a KEEP() on each of them is unworkable and defeats the purpose anyway. That requires a dummy reloc to trick ld into pulling in those sections when the parent section is also pulled in. Please see attached a subset of the slides I presented at ELC and Linaro Connect last year to illustrate those issues. Also attached a sample patch partially implementing those changes. In short I'm very glad to see that this might steer interest across multiple architectures. I felt like this was becoming much more intrusive than I expected and that maybe LTO was a better bet after all. But LTO has its evils too and I'm willing to look at gc-sections again if there is interest from others as well. Nicolas gc_slides.html.gz Description: application/gzip commit 1d7ec46257dc546bc7b87439788514fc4650a2b1 Author: Nicolas Pitre Date: Mon Oct 26 10:16:14 2015 -0400 ARM: pushlinkedsection introduction Signed-off-by: Nicolas Pitre diff --git a/Makefile b/Makefile index d5b3739119..75541414cb 100644 --- a/Makefile +++ b/Makefile @@ -775,6 +775,10 @@ ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y) KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO endif +# Named subsections +KBUILD_AFLAGS += -Wa,--sectname-subst +KBUILD_CFLAGS += -Wa,--sectname-subst + include scripts/Makefile.kasan include scripts/Makefile.extrawarn diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index b2bc8e1147..70161c9bfa 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -88,6 +88,17 @@ #endif /* + * Special .pushsection wrapper with explicit dependency to prevent + * garbage collection of the specified section. This is needed when no + * explicit symbol references are made to this section. + */ + .macro .pushlinkedsection name:vararg + .reloc . - 1, R_ARM_NONE, 9909f + .pushsection \name +9909: + .endm + +/* * Enable and disable interrupts */ #if __LINUX_ARM_ARCH__ >= 6 @@ -239,7 +250,7 @@ #define USER(x...) \ : x; \ - .pushsection __ex_table,"a";\ + .pushlinkedsection __ex_table.%S,"a"; \ .align 3; \ .long b,9001f;\ .popsection @@ -253,7 +264,7 @@ * ALT_SMP( W(instr) ... ) */ #define ALT_UP(instr...) \ - .pushsection ".alt.smp.init", "a" ;\ + .pushlinkedsection ".alt.smp.init.%S", "a" ;\ .long 9998b ;\ 9997: instr ;\ .if . - 9997b == 2 ;\ @@ -265,7 +276,7 @@ .popsection #define ALT_UP_B(label)\ .equup_b_offset, label - 9998b ;\ - .pushsection ".alt.smp.init", "a" ;\ + .pushlinkedsection ".alt.smp.init.%S", "a" ;\ .long 9998b ;\ W(b). + up_b_offset ;\ .popsection @@ -375,7 +386,7 @@ THUMB( orr \reg , \reg , #PSR_T_BIT) .error "Unsupported inc macro argument" .endif - .pushsection __ex_table,"a" +
Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination
On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote: > Introduce LINKER_DCE option for architectures to select if they want > to build with -ffunction-sections, -fdata-sections, and link with > --gc-sections. Can you please try to come up with a less cryptic name. "DCE" may make sense for you today. Bot the naive reader will benefit from the longer and more explcit form. It requires some work (documented) to ensure all > unreferenced entrypoints are live, and requires toolchain and > build verification, so it is made a per-arch option for now. > > On a random powerpc64le build, this yelds a significant size saving, > it boots and runs fine, but there is a lot I haven't tested as yet, > so these savings may be reduced if there are bugs in the link. > > text databssdec filename > 11169741 11807441923176 14273661 vmlinux > 10445269 10041271919707 13369103 vmlinux.dce > > ~700K text, ~170K data, 6% removed from kernel image size. > > Signed-off-by: Nicholas Piggin> --- > Makefile | 10 > arch/Kconfig | 13 ++ > include/asm-generic/vmlinux.lds.h | 52 > ++- > include/linux/compiler.h | 18 ++ > include/linux/export.h| 30 +++--- > include/linux/init.h | 38 ++-- > init/Makefile | 2 ++ > 7 files changed, 100 insertions(+), 63 deletions(-) > > diff --git a/Makefile b/Makefile > index b409076..d5ef31a 100644 > --- a/Makefile > +++ b/Makefile > @@ -618,6 +618,11 @@ include arch/$(SRCARCH)/Makefile > > KBUILD_CFLAGS+= $(call cc-option,-fno-delete-null-pointer-checks,) > > +ifdef CONFIG_LINKER_DCE > +KBUILD_CFLAGS+= $(call cc-option,-ffunction-sections,) > +KBUILD_CFLAGS+= $(call cc-option,-fdata-sections,) > +endif > + > ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE > KBUILD_CFLAGS+= -Os $(call cc-disable-warning,maybe-uninitialized,) > else > @@ -819,6 +824,11 @@ LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\ > KBUILD_LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID) > LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID) > > +ifdef CONFIG_LINKER_DCE > +# LDFLAGS_MODULE += $(call ld-option, --gc-sections,) > +LDFLAGS_vmlinux += $(call ld-option, --gc-sections,) > +endif Something you missed to clean up Sam