On 2015/4/24 21:49, Yunlong Song wrote:
> On 2015/4/24 21:31, Yunlong Song wrote:
>> [Profiling Background]
>> Now we are profiling the performance of ext4 and f2fs on an eMMC card with 
>> iozone,
>> we find a case that ext4 is better than f2fs in random write under the test 
>> of
>> "iozone -s 262144 -r 64 -i 0 -i 2". We want to analyze the I/O delay of the 
>> two
>> file systems. We have got a conclusion that 1% of sys_write takes up 60% 
>> time of
>> the overall sys_write (262144/64=4096). We want to find out the call stack 
>> during
>> this specific 1% sys_write. Our idea is to record the stack in a certain 
>> time period
>> and since the specific 1% case takes up 60% time, the total number of 
>> records of its
>> stack should also takes up 60% of the total records, then we can recognize 
>> those stacks
>> and figure out what the I/O stack of f2fs is doing in the 1% case.
>>
>> [Profiling Problem]
>>
>> Although perf can record the events (with call stack) of a specified pid, 
>> e.g. using
>> "perf record -g iozone -s 262144 -r 64 -i 0 -i 2". But we find iozone is 
>> interrupted
>> and the CPU is scheduled to other process. As a result, perf will not record 
>> any events
>> of iozone until iozone's context is recovered and the CPU is scheduled to 
>> continue
>> processing the sys_write of iozone. 
> 
>> This obeys our initial idea which is described in [Profiling Background],
> 
> This "disobeys" our initial idea which is described in [Profiling Background],
> 
>  since we cannot recognize the call stacks of the specific 1% case
>> by using the ratio of the record number.
>>
>> [Alternative Solution without Perf]
>> We can use /proc/#pid/stack to record the stack in a certain period (e.g. 
>> 1ms) of iozone,
>> no matter whether iozone is interrupted or not. However, we have not taken a 
>> deep sight
>> into this, since we want to use perf to do this kind of thing.
>>
>> [Question about Perf]
>> So we have a question that "How does perf still record the stack of a 
>> specified pid even
>> when that process is interrupted and CPU is scheduled to other process?"
>>
> 
> 

Here is another example:

$ cat test.c
void func_loop(void)
{
        int i, sum = 0;
        for (i = 0; i < 100000; i++)
                sum += 1;
}

void func_sleep(void)
{
        sleep(1);
}

int main(void)
{
        while (1) {
                func_loop();
                func_sleep();
        }
        return 0;
}

$ gcc test.c -o test -O0
$ ./test
$ ps aux | grep test
Yunlong+ 133798  0.0  0.0   4056   348 pts/7    S+   21:52   0:00 ./test
$ perf record -g -p 133798 sleep 5
$ perf report -g --stdio
# To display the perf.data header info, please use --header/--header-only 
options.
#
# Samples: 16  of event 'cycles'
# Event count (approx.): 2439331
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  
...............................
#
    79.46%    76.48%  test     test               [.] func_loop
               |
               ---func_loop
                  |
                  |--97.07%-- main
                  |          __libc_start_main
                   --2.93%-- [...]

    77.14%     0.00%  test     libc-2.19.so       [.] __libc_start_main
               |
               ---__libc_start_main

    77.14%     0.00%  test     test               [.] main
               |
               ---main
                  __libc_start_main

    20.54%     0.00%  test     [unknown]          [.] 0000000000000000
               |
               ---0

    20.54%     0.00%  test     libc-2.19.so       [.] __GI___libc_nanosleep
               |
               ---__GI___libc_nanosleep
                  0

    20.54%     0.00%  test     [unknown]          [k] 0x00007f2fa07ef6b0
               |
               ---0x7f2fa07ef6b0
                  __GI___libc_nanosleep
                  0

    20.54%     0.00%  test     [kernel.kallsyms]  [k] system_call_fastpath
               |
               ---system_call_fastpath
                  0x7f2fa07ef6b0
                  __GI___libc_nanosleep

As shown above, the func_loop is regarded as the main operation and its call 
stack
is showed on the top of perf report. However, we know that func_sleep is really 
the
main operation from the view of time duration indeed. And if we want to its 
actual
call stack in the real time, we can see the result as following:

$ while true ; do cat /proc/133798/stack  ; sleep 0.1 ; done
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8107aed3>] hrtimer_nanosleep+0xa3/0x160
[<ffffffff8107afed>] SyS_nanosleep+0x5d/0x80
[<ffffffff81519329>] system_call_fastpath+0x16/0x1b
[<00007f2fa07ef6b0>] 0x7f2fa07ef6b0
[<ffffffffffffffff>] 0xffffffffffffffff

As shown above, func_sleep is regarded as the main operation from the view of 
time and
its call stack is printed in real time.

-- 
Thanks,
Yunlong Song

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to