On 11/11/13, 2:37 AM, Ingo Molnar wrote:
* David Ahern <dsah...@gmail.com> wrote:
If perf fails to write data to the data file (e.g., ENOSPC error) it fails
with the message:
failed to write perf data, error: No space left on device
and stops, killing the workload too. The file is left in an unknown state.
Trying to read it (e.g., perf report) fails with a SIGBUS error.
Ouch - guys please first investigate that SIGBUS, we should not behave
unexpectedly on _any_ (read: random) perf.data file contents. The SIGBUS
likely suggests that the parsing isn't robust enough.
I think we know why the SIGBUS is happening. From 'man mmap':
SIGBUS Attempted access to a portion of the buffer that
does not correspond to the file (for example, beyond
the end of the file, ...
With regard to perf-record, on a write() failure the header is not
updated. Since a recent change we try to proceed even though the data
size is 0, parsing the events we can. We eventually hit an event that
is only partially in the file (e.g., header, but no data for the event).
Trying to read the event data leads to the SIGBUS:
Running perf-report in gdb:
WARNING: The /tmp/mnt/perf.data file's data size field is 0 which is
unexpected.
Was the 'perf record' command properly terminated?
Program received signal SIGBUS, Bus error.
perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80,
    data=0x7fffffffd260)
    at util/evsel.c:1242
1242        u16 max_size = event->header.size;
(gdb) bt
#0  perf_evsel__parse_sample (evsel=0x94eec0, event=0x7ffff7ed9d80,
    data=0x7fffffffd260)
    at util/evsel.c:1242
#1  0x000000000047c9ce in flush_sample_queue (s=0x94e2b0,
    tool=0x7fffffffde80)
    at util/session.c:542
#2  0x000000000047e2d4 in __perf_session__process_events (session=0x94e2b0,
    data_offset=<optimized out>, data_size=<optimized out>,
    file_size=1048576, tool=0x7fffffffde80)
    at util/session.c:1388
#3  0x000000000042993c in __cmd_report (rep=0x7fffffffde80) at
    builtin-report.c:509
#4  cmd_report (argc=0, argv=0x7fffffffe370, prefix=<optimized out>) at
    builtin-report.c:967
#5  0x000000000041b063 in run_builtin (p=0x7cdf28, argc=4,
    argv=0x7fffffffe370) at perf.c:319
#6  0x000000000041a8e3 in handle_internal_command (argv=0x7fffffffe370,
    argc=4) at perf.c:376
#7  run_argv (argv=0x7fffffffe180, argcp=0x7fffffffe18c) at perf.c:420
#8  main (argc=4, argv=0x7fffffffe370) at perf.c:521
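
The trace shows a read of an event that is not fully backed by the file.
One way to avoid touching such memory is to compare each event's extent
against the file size before dereferencing it. What follows is only a
rough sketch of that idea, not perf code: struct sketch_event_header and
sketch_event_fits() are hypothetical names, and the header layout is an
assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an event header; only 'size' matters here. */
struct sketch_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total event size in bytes, header included */
};

/*
 * Return 1 if the event starting at 'offset' lies entirely within the
 * first 'file_size' bytes of 'buf', 0 otherwise.
 */
static int sketch_event_fits(const unsigned char *buf, uint64_t file_size,
			     uint64_t offset)
{
	struct sketch_event_header hdr;

	assert(buf != NULL);

	/* The header itself must be inside the file. */
	if (offset > file_size || file_size - offset < sizeof(hdr))
		return 0;

	/* Copy the header out so we never rely on alignment of 'buf'. */
	memcpy(&hdr, buf + offset, sizeof(hdr));

	/* A size smaller than the header cannot be a valid event. */
	if (hdr.size < sizeof(hdr))
		return 0;

	/* The payload must also be inside the file. */
	return file_size - offset >= hdr.size;
}
```

A caller would stop processing (and warn) at the first offset for which
the check returns 0, instead of handing the event to the sample parser.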
Fix by deleting the file on a failure.
That only works around the issue - if the same data file is produced by
some other method (or maliciously) then perf report will still SIGBUS ...
We could handle SIGBUS in the analysis commands too. See the suggestion
I had for handling the output failure using the mmap output option, which
uses longjmp.
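
To make the SIGBUS-handling idea concrete, here is a minimal sketch,
assuming sigsetjmp/siglongjmp around the access to the mapping. It is
not taken from perf: every sketch_* name is hypothetical, the temporary
file under /tmp is an assumption, and a real tool would install the
handler once rather than per access.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sketch_bus_env;

/* Jump back to the guarded access instead of dying on SIGBUS. */
static void sketch_on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sketch_bus_env, 1);
}

/* Volatile reads so the compiler cannot drop the access to 'src'. */
static void sketch_copy_bytes(unsigned char *dst,
			      const volatile unsigned char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = src[i];
}

/* Returns 0 on success, -1 if reading 'src' raised SIGBUS. */
static int sketch_guarded_copy(void *dst, const void *src, size_t len)
{
	struct sigaction sa, old;
	int ret = 0;

	assert(dst != NULL && src != NULL);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sketch_on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return -1;

	/* savemask=1 so the signal mask is restored after the jump. */
	if (sigsetjmp(sketch_bus_env, 1) == 0)
		sketch_copy_bytes(dst, src, len);
	else
		ret = -1;

	sigaction(SIGBUS, &old, NULL);
	return ret;
}

/*
 * Map two pages of a one-page temporary file and read one byte from each.
 * Returns 0 if the first read works and the second is caught as SIGBUS.
 */
static int sketch_demo(void)
{
	char path[] = "/tmp/sketch-sigbus-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	unsigned char byte, *map;
	int fd, in_file, past_eof;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	in_file = sketch_guarded_copy(&byte, map, 1);
	past_eof = sketch_guarded_copy(&byte, map + page, 1);
	munmap(map, 2 * page);
	return (in_file == 0 && past_eof == -1) ? 0 : -1;
}
```

This turns the crash into an error return the caller can report, but it
does not validate the data; a truncated event that happens to sit inside
a mapped page would still need a size check.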
David