Em Thu, Jan 29, 2015 at 03:55:22PM -0800, Sukadev Bhattiprolu escreveu:
> Arnaldo,
> 
> On one of our systems we are seeing an intermittent SIGSEGV with
> 
>       perf trace sleep 1
> 
> and I have question about the 'draining' flag below:
> | 
> | From 46fb3c21d20415dd2693570c33d0ea6eb8745e04 Mon Sep 17 00:00:00 2001
> | From: Arnaldo Carvalho de Melo <[email protected]>
> | Date: Mon, 22 Sep 2014 14:39:48 -0300
> | Subject: [PATCH 1/1] perf trace: Filter out POLLHUP'ed file descriptors
> | 
> | So that we don't continue polling on vanished file descriptors, i.e.
> | file descriptors for events monitoring threads that exited.
> | 
> | I.e. the following 'trace' command now exits as expected, instead
> | of staying in an eternal loop:
> | 
> |       $ sleep 5s &
> |       $ trace -p `pidof sleep`
> | 
> | Reported-by: Jiri Olsa <[email protected]>
> | Cc: Adrian Hunter <[email protected]>
> | Cc: David Ahern <[email protected]>
> | Cc: Don Zickus <[email protected]>
> | Cc: Frederic Weisbecker <[email protected]>
> | Cc: Jiri Olsa <[email protected]>
> | Cc: Mike Galbraith <[email protected]>
> | Cc: Namhyung Kim <[email protected]>
> | Cc: Paul Mackerras <[email protected]>
> | Cc: Peter Zijlstra <[email protected]>
> | Cc: Stephane Eranian <[email protected]>
> | Link: http://lkml.kernel.org/n/[email protected]
> | Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
> | ---
> |  tools/perf/builtin-trace.c | 7 ++++++-
> |  1 file changed, 6 insertions(+), 1 deletion(-)
> | 
> | diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> | index b8fedf3..fe39dc6 100644
> | --- a/tools/perf/builtin-trace.c
> | +++ b/tools/perf/builtin-trace.c
> | @@ -2044,6 +2044,7 @@ static int trace__run(struct trace *trace, int argc, 
> const char **argv)
> |     int err = -1, i;
> |     unsigned long before;
> |     const bool forks = argc > 0;
> | +   bool draining = false;
> |     char sbuf[STRERR_BUFSIZE];
> |  
> |     trace->live = true;
> | @@ -2171,8 +2172,12 @@ next_event:
> |     if (trace->nr_events == before) {
> |             int timeout = done ? 100 : -1;
> |  
> | -           if (perf_evlist__poll(evlist, timeout) > 0)
> | +           if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
> | +                   if (perf_evlist__filter_pollfd(evlist, POLLERR | 
> POLLHUP) == 0)
> | +                           draining = true;
> | +
> 
> If an fd gets into POLLHUP state, perf_evlist__filter_pollfd() removes
> ("puts") the mmap for the fd. We are seeing that sometimes (frequently)
> _all_ fds are in the POLLHUP state and hence their mmap->base are set
> to NULL.
> 
> 
> |                     goto again;
> 
> Now with this goto, we go back and call perf_evlist__mmap_read() which
> tries to access the freed mmaps.
> 
> Should there be another check to before reading the mmap again ?

Possibly, checking, but a similar algorithm should be in place for
'record', do you see any problems there? I.e. with 'perf record sleep
1'?

- Arnaldo
 
> I must add that I don't get the SIGSEGV on recent perf-core and the
> system where we get the crash, first runs into the following
> errors that we are still looking into (maybe related to "ppc64le"
> architecture).
> 
>       Problems reading syscall 45 information
>       Problems reading syscall 5 information
>       Problems reading syscall 5 information
>       Problems reading syscall 108 information
>       Problems reading syscall 108 information
>       Problems reading syscall 90 information
>       Problems reading syscall 90 information
>       Problems reading syscall 6 information
> 
> Unlike the SIGSEGV, these errors occur always.
> 
> | +           }
> |     } else {
> |             goto again;
> |     }
> | -- 
> | 1.8.3.1
> | 
> 
> Following hack seems to fix the SIGSEGV, but then we completely ignore
> 'draining' flag.
> 
> 
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index fb12645..ac25e16 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -2173,8 +2173,10 @@ next_event:
>                 int timeout = done ? 100 : -1;
>  
>                 if (!draining && perf_evlist__poll(evlist, timeout) > 0) {
> -                       if (perf_evlist__filter_pollfd(evlist, POLLERR | 
> POLLHUP) == 0)
> +                       if (perf_evlist__filter_pollfd(evlist, POLLERR | 
> POLLHUP) == 0) {
>                                 draining = true;
> +                               goto out_disable;
> +                       }
>  
>                         goto again;
>                 }
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to