Last week I wrote a long post with suggestions on how to improve memory allocation and utilization, and asked for feedback. This time I would like to briefly update those interested on how we could drastically reduce the kernel size (loader-stripped.elf) by 3MB (~33%).

A month ago there was a post about ideas to reduce kernel size and some findings I made using bloaty. One of the things Nadav noted was the large (and unexplained) size of the .rodata section - 2.47MB. Since then I have spent some time digging into it, trying to find anything in our code that defines large static data (strings, numbers, etc.) that would go into .rodata. No success until I looked closer at these lines in the Makefile and noticed the *--whole-archive* option:

    $(out)/loader.elf: $(stage1_targets) arch/$(arch)/loader.ld $(out)/bootfs.o
            $(call quiet, $(LD) -o $@ --defsym=OSV_KERNEL_BASE=$(kernel_base) \
                -Bdynamic --export-dynamic --eh-frame-hdr --enable-new-dtags \
                $(^:%.ld=-T %.ld) \
                --whole-archive \
                  $(libstdc++.a) $(libgcc.a) $(libgcc_eh.a) \
                  $(boost-libs) \
                --no-whole-archive, \
                LINK loader.elf)

It turns out that the *--whole-archive* option forces the linker to link everything (in reality, I am guessing, only the sections we have in our linker script loader.ld) from those 5 libraries, whether our kernel code uses it or not. Once I disabled this option, the kernel size dropped by 3MB. Many images/apps I tested (native-example, java, python) work just fine, but others fail with missing symbol errors.

Here are some statistics about the .rodata section with whole-archive enabled and disabled:

....
     26.0%  2.47Mi  .rodata  2.47Mi  27.7%
        93.3%  2.30Mi  [section .rodata]  2.30Mi  93.3%
         2.7%  67.3Ki  musl/src/locale/iconv.c  67.3Ki  2.7%
         1.6%  41.1Ki  [407 Others]  41.1Ki  1.6%
         0.4%  10.1Ki  bsd/sys/crypto/rijndael/rijndael-alg-fst.c  10.1Ki  0.4%
         0.3%  6.58Ki  libc/crypt/encrypt.c  6.58Ki  0.3%
         0.3%  6.58Ki  musl/src/crypt/crypt_des.c  6.58Ki  0.3%
         0.2%  4.34Ki  musl/src/crypt/crypt_blowfish.c  4.34Ki  0.2%
         0.2%  4.00Ki  musl/src/math/exp2.c  4.00Ki  0.2%
         0.1%  3.09Ki  musl/src/ctype/iswpunct.c  3.09Ki  0.1%
         0.1%  2.91Ki  musl/src/ctype/iswalpha.c  2.91Ki  0.1%
         0.1%  2.91Ki  musl/src/ctype/wcwidth.c  2.91Ki  0.1%
         0.1%  2.77Ki  musl/src/math/__rem_pio2_large.c  2.77Ki  0.1%
         0.1%  2.43Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c  2.43Ki  0.1%
         0.1%  2.16Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/zmod/inflate.c  2.16Ki  0.1%
         0.1%  2.02Ki  external/x64/acpica/source/components/parser/psopcode.c  2.02Ki  0.1%
         0.1%  2.00Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/zmod/opensolaris_crc32.c  2.00Ki  0.1%
         0.1%  2.00Ki  musl/src/math/exp2l.c  2.00Ki  0.1%
         0.1%  1.85Ki  musl/src/errno/strerror.c  1.85Ki  0.1%
         0.1%  1.75Ki  external/x64/acpica/source/components/namespace/nspredef.c  1.75Ki  0.1%
         0.1%  1.51Ki  musl/src/ctype/__ctype_tolower_loc.c  1.51Ki  0.1%
         0.1%  1.51Ki  musl/src/ctype/__ctype_toupper_loc.c  1.51Ki  0.1%
....
      7.8%  531Ki  .rodata  531Ki  8.6%
        68.3%  363Ki  [section .rodata]  363Ki  68.3%
        12.6%  67.3Ki  musl/src/locale/iconv.c  67.3Ki  12.6%
         7.7%  41.1Ki  [407 Others]  41.1Ki  7.7%
         1.9%  10.1Ki  bsd/sys/crypto/rijndael/rijndael-alg-fst.c  10.1Ki  1.9%
         1.2%  6.58Ki  libc/crypt/encrypt.c  6.58Ki  1.2%
         1.2%  6.58Ki  musl/src/crypt/crypt_des.c  6.58Ki  1.2%
         0.8%  4.34Ki  musl/src/crypt/crypt_blowfish.c  4.34Ki  0.8%
         0.8%  4.00Ki  musl/src/math/exp2.c  4.00Ki  0.8%
         0.6%  3.09Ki  musl/src/ctype/iswpunct.c  3.09Ki  0.6%
         0.5%  2.91Ki  musl/src/ctype/iswalpha.c  2.91Ki  0.5%
         0.5%  2.91Ki  musl/src/ctype/wcwidth.c  2.91Ki  0.5%
         0.5%  2.77Ki  musl/src/math/__rem_pio2_large.c  2.77Ki  0.5%
         0.5%  2.43Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c  2.43Ki  0.5%
         0.4%  2.16Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/zmod/inflate.c  2.16Ki  0.4%
         0.4%  2.02Ki  external/x64/acpica/source/components/parser/psopcode.c  2.02Ki  0.4%
         0.4%  2.00Ki  bsd/sys/cddl/contrib/opensolaris/uts/common/zmod/opensolaris_crc32.c  2.00Ki  0.4%
         0.4%  2.00Ki  musl/src/math/exp2l.c  2.00Ki  0.4%
         0.3%  1.85Ki  musl/src/errno/strerror.c  1.85Ki  0.3%
         0.3%  1.75Ki  external/x64/acpica/source/components/namespace/nspredef.c  1.75Ki  0.3%
         0.3%  1.51Ki  musl/src/ctype/__ctype_tolower_loc.c  1.51Ki  0.3%
         0.3%  1.51Ki  musl/src/ctype/__ctype_toupper_loc.c  1.51Ki  0.3%

I do not understand how bloaty works, but I think this line is key to the interpretation:

        93.3%  2.30Mi  [section .rodata]  2.30Mi  93.3%
vs
        68.3%  363Ki  [section .rodata]  363Ki  68.3%

which I guess specifies how much of .rodata comes from the 5 libraries we link against.

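A quick arithmetic check supports that guess. The sketch below only reuses figures quoted in this message: the two [section .rodata] rows and the .rodata size bloaty reports for libgcc.a alone (1.94Mi, from the per-library report further down). The helper name to_kib is made up for illustration, and reading [section .rodata] as "bytes bloaty could not attribute to a compile unit" is an assumption, not something confirmed here.

```python
# Sizes copied from the bloaty reports quoted in this message.
KIB_PER_MIB = 1024

def to_kib(value, unit):
    """Convert a bloaty-style size given in Ki or Mi to KiB (hypothetical helper)."""
    if unit == "Mi":
        return value * KIB_PER_MIB
    if unit == "Ki":
        return value
    raise ValueError("unexpected unit: " + unit)

with_whole_archive = to_kib(2.30, "Mi")    # [section .rodata], --whole-archive enabled
without_whole_archive = to_kib(363, "Ki")  # [section .rodata], --whole-archive disabled
libgcc_rodata = to_kib(1.94, "Mi")         # .rodata of libgcc.a on its own

# How much the unattributed part of .rodata shrinks when the option is removed.
saved = with_whole_archive - without_whole_archive

print("[section .rodata] shrinks by %.2f Mi" % (saved / KIB_PER_MIB))
print("libgcc.a .rodata alone is    %.2f Mi" % (libgcc_rodata / KIB_PER_MIB))
```

The two numbers agree to within bloaty's rounding, which is consistent with (but does not prove) the idea that nearly all of the extra .rodata is libgcc.a content pulled in by --whole-archive.
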
Some bloaty statistics about the 5 libraries we use:

    ../bloaty/bloaty -d sections \
      /usr/lib/gcc/x86_64-linux-gnu/5/libstdc++.a \
      /usr/lib/gcc/x86_64-linux-gnu/5/libgcc.a \
      /usr/lib/gcc/x86_64-linux-gnu/5/libgcc_eh.a \
      /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu//libboost_program_options.a \
      /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu//libboost_system.a

         VM SIZE                      FILE SIZE
     --------------                --------------
      50.3%  1.94Mi  .rodata  1.94Mi  20.4%
      25.6%  1013Ki  [11335 Others]  1.71Mi  18.0%
       0.0%       0  [ELF Headers]  1.65Mi  17.4%
       0.0%       0  .shstrtab  934Ki  9.6%
       0.0%       0  .symtab  798Ki  8.2%
      16.2%   638Ki  .text  638Ki  6.6%
       0.0%       0  .strtab  632Ki  6.5%
       0.0%       0  [AR Symbol Table]  466Ki  4.8%
       5.8%   229Ki  .eh_frame  229Ki  2.4%
       0.0%       0  .rela.text  190Ki  2.0%
       0.0%       0  .rela.eh_frame  158Ki  1.6%
       0.0%       0  .group  55.7Ki  0.6%
       0.0%       0  [Unmapped]  44.5Ki  0.5%
       0.9%  35.5Ki  .data  35.5Ki  0.4%
       0.0%       0  .rela.rodata  28.6Ki  0.3%
       0.0%       0  [AR Headers]  25.5Ki  0.3%
       0.4%  14.1Ki  .gcc_except_table  14.1Ki  0.1%
       0.3%  13.1Ki  .text._ZN5boost15program_options17parse_config_fileIcEENS0_20basic_parsed_option  13.1Ki  0.1%
       0.3%  11.9Ki  .text._ZN5boost15program_options17parse_config_fileIwEENS0_20basic_parsed_option  11.9Ki  0.1%
       0.0%       0  .rela.text._ZNSt6locale5_ImplC2Em  11.4Ki  0.1%
       0.3%  10.3Ki  .rodata.str1.8  10.3Ki  0.1%
     100.0%  3.86Mi  TOTAL  9.50Mi  100.0%

The most striking of those is libgcc.a, which has almost 2MB (!!!)
of rodata:

    ../bloaty/bloaty -d sections /usr/lib/gcc/x86_64-linux-gnu/5/libgcc.a

         VM SIZE                      FILE SIZE
     --------------                --------------
      78.5%  1.94Mi  .rodata  1.94Mi  67.0%
      18.8%   473Ki  .text  473Ki  16.0%
       0.0%       0  [ELF Headers]  179Ki  6.1%
       0.0%       0  .rela.text  101Ki  3.4%
       0.0%       0  .symtab  67.8Ki  2.3%
       1.4%  35.5Ki  .data  35.5Ki  1.2%
       1.2%  30.9Ki  .eh_frame  30.9Ki  1.0%
       0.0%       0  .strtab  22.7Ki  0.8%
       0.0%       0  .shstrtab  20.8Ki  0.7%
       0.0%       0  [AR Headers]  13.5Ki  0.5%
       0.0%       0  [AR Symbol Table]  11.8Ki  0.4%
       0.0%       0  .rela.eh_frame  11.2Ki  0.4%
       0.0%       0  .rela.rodata  2.60Ki  0.1%
       0.0%       0  [Unmapped]  2.19Ki  0.1%
       0.0%  1.16Ki  .text.startup  1.16Ki  0.0%
       0.0%       0  .rela.text.startup  792  0.0%
       0.0%     368  .rodata.cst16  368  0.0%
       0.0%     243  [11 Others]  347  0.0%
       0.0%     248  .rodata.cst8  248  0.0%
       0.0%     208  .tbss  0  0.0%
       0.0%     168  .bss  0  0.0%
     100.0%  2.46Mi  TOTAL  2.89Mi  100.0%

Here are examples of failures when whole-archive was disabled:

1) golang

/go.so: failed looking up symbol _ZNSaIcEC1Ev (std::allocator<char>::allocator())

[backtrace]
0x0000000000343d29 <elf::object::symbol(unsigned int, bool)+825>
0x0000000000343e7b <elf::object::resolve_pltgot(unsigned int)+139>
0x0000000000344065 <elf_resolve_pltgot+69>
0x000000000038b16f <???+3715439>
0x00002000001ffe4f <???+2096719>
0x00000000004198ec <osv::application::run_main()+60>
0x000000000020c298 <osv::application::main()+152>
0x0000000000419a98 <???+4299416>
0x000000000044ad85 <???+4500869>
0x00000000003e90d6 <thread_main_c+38>
0x000000000038c4b2 <???+3720370>

2) tst-async.so

TEST tst-async.so
OSv v0.51.0-37-g186779b
eth0: 192.168.122.15
/usr/lib/libboost_unit_test_framework.so.1.55.0: failed looking up symbol _ZTISt19basic_ostringstreamIcSt11char_traitsIcESaIcEE (typeinfo for std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >)

[backtrace]
0x0000000000343d29 <elf::object::symbol(unsigned int, bool)+825>
0x000000000038ff06 <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+166>
0x000000000033eb54 <elf::object::relocate_rela()+148>
0x00000000003416e7 <elf::object::relocate()+199>
0x0000000000345162 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1602>
0x00000000003443b8 <elf::object::load_needed(std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+520>
0x0000000000345156 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1590>
0x00000000003459aa <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+330>
0x0000000000418e81 <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<
0x00000000004195c7 <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>
0x000000000041982a <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+90>
0x00000000002131d9 <do_main_thread(void*)+2601>
0x000000000044ad85 <???+4500869>
0x00000000003e90d6 <thread_main_c+38>
0x000000000038c4b2 <???+3720370>
Test tst-async.so FAILED

I wonder if these are simply "missing symbol" scenarios that could be addressed by somehow forcing those symbols to be linked into loader.elf.

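One possible way to force individual symbols in is GNU ld's -u SYMBOL (long form --undefined=SYMBOL) option, which enters the symbol as undefined so that the archive member defining it gets pulled in. As a rough sketch of how that could be tried, the snippet below collects the mangled names from "failed looking up symbol" lines like the two above and prints the matching flags. The function names are made up for illustration, and whether this is enough for loader.elf has not been tested here; since each run stops at the first missing symbol, it would likely take several iterations per application.

```python
import re
from typing import List

# Matches the loader error lines quoted above, e.g.
#   "/go.so: failed looking up symbol _ZNSaIcEC1Ev (std::allocator<char>::allocator())"
MISSING_SYMBOL = re.compile(r"failed looking up symbol (\S+)")

def missing_symbols(log_text: str) -> List[str]:
    """Return the unique mangled symbol names, in order of first appearance."""
    seen = {}
    for match in MISSING_SYMBOL.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)

def undefined_flags(symbols: List[str]) -> str:
    """Format the symbols as GNU ld --undefined options (hypothetical helper)."""
    return " ".join("--undefined=" + s for s in symbols)

# The two error lines from the failures reported in this message.
log = (
    "/go.so: failed looking up symbol _ZNSaIcEC1Ev "
    "(std::allocator<char>::allocator())\n"
    "/usr/lib/libboost_unit_test_framework.so.1.55.0: failed looking up symbol "
    "_ZTISt19basic_ostringstreamIcSt11char_traitsIcESaIcEE "
    "(typeinfo for std::basic_ostringstream<char, std::char_traits<char>, "
    "std::allocator<char> >)\n"
)

symbols = missing_symbols(log)
flags = undefined_flags(symbols)
print(flags)
```

The printed flags would go on the $(LD) command line in place of the --whole-archive / --no-whole-archive pair, with the libraries still listed after them.
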
Relatedly, I found this commit by Avi from 5 years ago that introduced the --whole-archive option for good reasons - https://github.com/cloudius-systems/osv/commit/c9e61d4a45d88d8c8e79cd52fbcd38b91b291d5e. But I wonder if there is a better way to solve this problem without using whole-archive (btw, the huge rodata is in libgcc.a, not libstdc++.a). I found this article, but I am not sure whether its -u<symbol> workaround addresses a different problem - http://www.lysium.de/blog/index.php?/archives/222-Lost-static-objects-in-static-libraries-with-GNU-linker-ld.html.

In either case, I wanted to run it by the gcc/linker gurus on this mailing list to see if they can think of other ways to mitigate the possible problems (and what these problems might be) of not using --whole-archive. Certainly it would be nice to make the kernel smaller by 3MB by simply removing 1 line from the Makefile :-)

I am also attaching 2 full bloaty reports, as they also show statistics for other sections when we disable whole-archive.

Finally, I found this interesting presentation about ways to reduce code size - https://elinux.org/images/2/2d/ELC2010-gc-sections_Denys_Vlasenko.pdf

Regards,
Waldek
Attachments: bloaty_no_whole_archive, bloaty_whole_archive