On Sat, Aug 06, 2016 at 08:16:27PM -0700, Andi Kleen wrote:
> Alexey Dobriyan <adobri...@gmail.com> writes:
> > -
> > +	seq_printf(m, "State:\t%s", get_task_state(p));
> > +
> > +	seq_puts(m, "\nTgid:\t");
>
> The only different should be the format string.
>
> Scanning the format string really shouldn't be that expensive?!?
Surprise, it is (see my reply to Al).

What seq_put_decimal_ull() did is the equivalent of

	seq << "foo";
	seq << bar;
	seq << '\n';

No precisions, no widths, no padding, no upper- and lowercasing.

> It would be better if you could find out why that is slow and optimize
> it. Then you would benefit every seq_printf user, not just this
> special case.
>
> Perhaps it could benefit from some of the bit masking tricks to
> scan the string with wider tests than a word.

And then what? Parsing the format string will still be there.

This is the first line of the profile of the first function (format_decode):

       │    static noinline_for_stack
       │    int format_decode(const char *fmt, struct printf_spec *spec)
       │    {
 10.38 │      push   %rbp	<===
  1.07 │      mov    %rsp,%rbp
  1.09 │      push   %r12
  4.51 │      mov    %rsi,%r12
  1.40 │      push   %rbx
  1.86 │      mov    %rdi,%rbx
       │      sub    $0x8,%rsp

It is so bloated that gcc needs to be asked to not screw up the stack size.
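
To make the "seq << foo" point concrete, here is a rough userspace sketch of the two styles. It is not the kernel code: struct buf and the buf_*() helpers are hypothetical stand-ins for seq_file, seq_puts() and seq_put_decimal_ull(), and overflow is simply asserted on. The first two helpers append bytes directly, so no format string is ever parsed; the third produces the same text through snprintf(), which has to decode "%s%llu\n" on every call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a seq_file buffer; not the kernel struct. */
struct buf {
	char data[256];
	size_t len;
};

/* Append a literal string: a length check and a copy, nothing else. */
static void buf_puts(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	assert(b->len + n < sizeof(b->data)); /* sketch: overflow is a bug */
	memcpy(b->data + b->len, s, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/* Append an unsigned decimal: no width, precision, padding or case. */
static void buf_put_decimal_ull(struct buf *b, unsigned long long v)
{
	char tmp[20]; /* a 64-bit value has at most 20 decimal digits */
	size_t n = 0;

	/* Generate digits least significant first. */
	do {
		tmp[n++] = (char)('0' + v % 10);
		v /= 10;
	} while (v);

	assert(b->len + n < sizeof(b->data));
	/* Copy them out in reverse so the most significant digit is first. */
	while (n)
		b->data[b->len++] = tmp[--n];
	b->data[b->len] = '\0';
}

/* Same output via a format string, which must be parsed on each call. */
static void buf_printf_ull(struct buf *b, const char *name,
			   unsigned long long v)
{
	size_t room = sizeof(b->data) - b->len;
	int n = snprintf(b->data + b->len, room, "%s%llu\n", name, v);

	assert(n >= 0 && (size_t)n < room);
	b->len += (size_t)n;
}
```

Calling buf_puts(&b, "Tgid:\t"), buf_put_decimal_ull(&b, pid) and buf_puts(&b, "\n") in sequence yields the same bytes as a single buf_printf_ull(&b, "Tgid:\t", pid); the difference is only in how much work is done to get there.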