Re: [PATCH] perf record: Delete file if a failure occurs writing the perf data file

David Ahern Mon, 11 Nov 2013 06:45:51 -0800

On 11/11/13, 2:37 AM, Ingo Molnar wrote:


* David Ahern <[email protected]> wrote:

If perf fails to write data to the data file (e.g., ENOSPC error) it fails
with the message:
   failed to write perf data, error: No space left on device

and stops - killing the workload too. The file is in an unknown state.
Trying to read it (e.g., perf report) fails with a SIGBUS error.


Ouch - guys please first investigate that SIGBUS, we should not behave
unexpectedly on _any_ (read: random) perf.data file contents. The SIGBUS
likely suggests that the parsing isn't robust enough.


I think we know why the SIGBUS is happening. From 'man mmap':


       SIGBUS Attempted access to a portion of the buffer that
       does not correspond  to  the  file (for  example, beyond
       the end of the file, ...

With regard to perf-record, on a write() failure the header is not
updated. From a recent change we try to proceed even though the data
size is 0 - parsing the events we can. We finally hit upon an event that
is only partially in the file (e.g., header, but no data for event).
Trying to read the event data leads to the SIGBUS:
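To illustrate the kind of robustness check being asked for, here is a minimal sketch (not perf's actual code) of validating that a record lies entirely within the real file size before touching its payload. The struct ev_header and the helper event_fits() are hypothetical names; the struct only mirrors the type/misc/size layout of a perf event header, and file_size is assumed to come from fstat() rather than from the length of the mapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event header: type, misc, total size. */
struct ev_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* size of the whole record, header included */
};

/*
 * Return true only if a complete record starting at 'offset' lies within
 * the first 'file_size' bytes of 'buf'. Bytes past file_size may be mapped
 * but are not backed by the file, so they must never be dereferenced.
 */
static bool event_fits(const unsigned char *buf, size_t file_size, size_t offset)
{
	struct ev_header hdr;

	if (offset > file_size || file_size - offset < sizeof(hdr))
		return false;		/* the header itself is truncated */

	memcpy(&hdr, buf + offset, sizeof(hdr));

	if (hdr.size < sizeof(hdr))
		return false;		/* corrupt size field */

	return hdr.size <= file_size - offset;	/* payload must be present too */
}
```

A reader loop could call event_fits() before parsing each record and stop with an error at the first record that fails, which covers the "header, but no data for event" case described above.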


Running perf-report in gdb:

WARNING: The /tmp/mnt/perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?


Program received signal SIGBUS, Bus error.
perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80, data=0x7fffffffd260)
    at util/evsel.c:1242
1242            u16 max_size = event->header.size;
(gdb) bt
#0  perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80, data=0x7fffffffd260)
    at util/evsel.c:1242
#1  0x000000000047c9ce in flush_sample_queue (s=0x94e2b0, tool=0x7fffffffde80)
    at util/session.c:542
#2  0x000000000047e2d4 in __perf_session__process_events (session=0x94e2b0,
    data_offset=<optimized out>, data_size=<optimized out>, file_size=1048576, tool=0x7fffffffde80)
    at util/session.c:1388
#3  0x000000000042993c in __cmd_report (rep=0x7fffffffde80) at builtin-report.c:509
#4  cmd_report (argc=0, argv=0x7fffffffe370, prefix=<optimized out>) at builtin-report.c:967
#5  0x000000000041b063 in run_builtin (p=0x7cdf28, argc=4, argv=0x7fffffffe370) at perf.c:319
#6  0x000000000041a8e3 in handle_internal_command (argv=0x7fffffffe370, argc=4) at perf.c:376
#7  run_argv (argv=0x7fffffffe180, argcp=0x7fffffffe18c) at perf.c:420
#8  main (argc=4, argv=0x7fffffffe370) at perf.c:521

Fix by deleting the file on a failure.


That only works around the issue - if the same data file is produced by
some other method (or maliciously) then perf report will still SIGBUS ...

We could handle SIGBUS in the analysis commands too. See the suggestion
I had for handling the output failure using the mmap output option which
uses longjmp.


David