On 05/12/2015 09:44 AM, Ingo Molnar wrote:
>
> * Denys Vlasenko <dvlas...@redhat.com> wrote:
>
>> With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline
>> very small functions we expect to be inlined. In particular,
>> with this config: http://busybox.net/~vda/kernel_config
>> there are more than a thousand copies of tiny spinlock-related functions:
>
> That's an x86-64 allyesconfig AFAICS, right?

Close, but I disabled the options that are clearly "heavy debugging" stuff.
IOW: many developers run their work machines with lock debugging etc.,
but few would constantly use something that slows the kernel down
by a factor of 3! So CONFIG_KASAN is off. CONFIG_STAGING is also off.
And a few others I forgot.

I'm using this config to see which inlines should be deinlined.
For that, I need to cover all call sites of each inline.
Thus, I need ~allyesconfig.

The discovery that the opposite problem also exists
(wrongly *un*inlined functions) was accidental.

> It's not mysterious, but an effect of -Os plus allowing GCC to do
> inlining heuristics:
>
>   CONFIG_CC_OPTIMIZE_FOR_SIZE=y
>   CONFIG_OPTIMIZE_INLINING=y
>
> Does the problem go away if you unset of these config options?

With CONFIG_CC_OPTIMIZE_FOR_SIZE off, the problem greatly diminishes
but is not eliminated. Testing allyesconfig would take too long,
so I just took defconfig.

On a defconfig kernel, the following functions with less than 16 bytes
of machine code are auto-deinlined:

#Calls_ Size(hex)_______ Name____________________
      7 000000000000000b t hweight_long
      5 000000000000000f t init_once
      4 000000000000000d t cpumask_set_cpu
      4 000000000000000b t udp_lib_close
      4 0000000000000006 t udp_lib_hash
      3 000000000000000a t nofill
      3 0000000000000006 t sg_set_page.part.7
      2 000000000000000f t udplite_sk_init
      2 000000000000000f t ct_seq_next
      2 000000000000000e t encode_cookie
      2 000000000000000d t ktime_get_real
      2 000000000000000b t spin_lock
      2 000000000000000b t device_create_release
      2 000000000000000b t cpu_smt_flags
      2 000000000000000b t cpu_core_flags
      2 0000000000000009 t default_write_file
      2 0000000000000008 t __initcall_pl_driver_init6
      2 0000000000000008 t __initcall_nf_defrag_init6
      2 0000000000000008 t __initcall_hid_init6
      2 0000000000000008 t __initcall_ch_driver_init6
      2 0000000000000008 t default_read_file
      2 0000000000000006 t wiphy_to_rdev.part.4
      2 0000000000000006 t s_stop
      2 0000000000000006 t sg_set_page.part.3
      2 0000000000000006 t generic_print_tuple
      2 0000000000000006 t exp_seq_stop
      2 0000000000000006 t ct_seq_stop
      2 0000000000000006 t ct_cpu_seq_stop

In particular, one of the functions from my patches, spin_lock(),
has been auto-deinlined:

ffffffff8108adb0 <spin_lock>:
ffffffff8108adb0:       55                      push   %rbp
ffffffff8108adb1:       48 89 e5                mov    %rsp,%rbp
ffffffff8108adb4:       e8 37 db 81 00          callq  ffffffff818a88f0 <_raw_spin_lock>
ffffffff8108adb9:       5d                      pop    %rbp
ffffffff8108adba:       c3                      retq

> Furthermore, what is the size win on x86 defconfig with these options
> set?

CONFIG_OPTIMIZE_INLINING=y is in defconfig.

Size difference for CC_OPTIMIZE_FOR_SIZE:

    text     data      bss       dec     hex filename
12335864  1746152  1081344  15163360  e75fe0 vmlinux.CC_OPTIMIZE_FOR_SIZE=y
10373764  1684200  1077248  13135212  c86d6c vmlinux.CC_OPTIMIZE_FOR_SIZE=n

The text size differs by about 19% (1962100 bytes, measured against
the smaller of the two).

-- 
vda
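
For anyone who wants to recheck the size arithmetic: below is a minimal Python sketch that uses only the figures from the two `size` lines quoted in this mail. The names SIZES and percent_change are made up for this illustration and are not part of any kernel tooling. It reports the text-size gap both as a share of the larger image and as a share of the smaller one, because the two percentages differ noticeably.

```python
# Figures copied from the `size` output in this mail: (text, data, bss, dec).
# The dictionary keys are the filename suffixes exactly as printed there.
SIZES = {
    "CC_OPTIMIZE_FOR_SIZE=y": (12335864, 1746152, 1081344, 15163360),
    "CC_OPTIMIZE_FOR_SIZE=n": (10373764, 1684200, 1077248, 13135212),
}


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed in percent of old."""
    return (new - old) / old * 100.0


# Sanity check: text + data + bss should add up to the "dec" column.
for name, (text, data, bss, dec) in SIZES.items():
    assert text + data + bss == dec, name

text_y = SIZES["CC_OPTIMIZE_FOR_SIZE=y"][0]
text_n = SIZES["CC_OPTIMIZE_FOR_SIZE=n"][0]

diff_bytes = text_y - text_n
shrink_vs_larger = percent_change(text_y, text_n)   # gap relative to the larger image
growth_vs_smaller = percent_change(text_n, text_y)  # gap relative to the smaller image

print(f"text difference: {diff_bytes} bytes")
print(f"relative to larger image:  {shrink_vs_larger:.1f}%")
print(f"relative to smaller image: {growth_vs_smaller:.1f}%")
```

Measured against the larger image the gap comes out at roughly 15.9%; measured against the smaller one it is roughly 18.9%, which is where a figure of "about 19%" comes from.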