Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-14 Thread Sven Schnelle
Hi Steve,

On Wed, May 13, 2020 at 03:30:33PM -0400, Steven Rostedt wrote:
> On Wed, 13 May 2020 18:15:57 +0200
> Sven Schnelle  wrote:
> 
> > Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
> > The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 
> > cpus
> > in total). I've tried disabling preemption, but this didn't help.
> > 
> > It's always this check that causes the loop:
> > 
> > if (iter->head >= rb_page_size(iter->head_page)) {
> > rb_inc_iter(iter);
> > goto again;
> > }
> > 
> > On the first loop iter->head is some value > 0 and rb_page_size returns
> > 0, afterwards it gets twice to this check with both values 0. The third
> > time the warning is triggered. Maybe that information helps.
> 
> I figured out what was causing this, and that's just that the writer and
> the iterator could end up almost "in sync" where the writer writes to each
> of the pages the iterator is trying to read, and this can trigger the three
> failures of "zero commits" per page.
> 
> I had a way to detect this, but then realized that it may be possible for
> an active writer to possibly trigger the other failures to get an event,
> that I just decided to force it to try three times, and simply return NULL
> if an active writer is messing with the iterator. The iterator is a "best
> effort" to read the buffer if there's an active writer. The consumer read
> (trace_pipe) is made for that.
> 
> This patch should solve you issues.
> 
> (care to give a Tested-by: if it works for you?)

Well, as there's no longer a RB_WARN_ON that indeed fixes the issue :-)

Tested-by: Sven Schnelle 

Thanks!

Sven


Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-13 Thread Steven Rostedt
On Wed, 13 May 2020 18:15:57 +0200
Sven Schnelle  wrote:

> Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
> The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
> in total). I've tried disabling preemption, but this didn't help.
> 
> It's always this check that causes the loop:
> 
> if (iter->head >= rb_page_size(iter->head_page)) {
>   rb_inc_iter(iter);
>   goto again;
> }
> 
> On the first loop iter->head is some value > 0 and rb_page_size returns
> 0, afterwards it gets twice to this check with both values 0. The third
> time the warning is triggered. Maybe that information helps.

I figured out what was causing this, and that's just that the writer and
the iterator could end up almost "in sync" where the writer writes to each
of the pages the iterator is trying to read, and this can trigger the three
failures of "zero commits" per page.

I had a way to detect this, but then realized that it may be possible for
an active writer to possibly trigger the other failures to get an event,
that I just decided to force it to try three times, and simply return NULL
if an active writer is messing with the iterator. The iterator is a "best
effort" to read the buffer if there's an active writer. The consumer read
(trace_pipe) is made for that.

This patch should solve you issues.

(care to give a Tested-by: if it works for you?)

-- Steve

>From 29b36cd00ddcae278430e70e55dd64855e9a54b3 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" 
Date: Wed, 13 May 2020 15:18:01 -0400
Subject: [PATCH] ring-buffer: Don't deactivate the ring buffer on failed
 iterator reads

If the function tracer is running and the trace file is read (which uses the
ring buffer iterator), the iterator can get in sync with the writes, and
caues it to fail to find a page with content it can read three times. This
causes a warning and deactivation of the ring buffer code.

Looking at the other cases of failure to get an event, it appears that
there's a chance that the writer could cause them too. Since the iterator is
a "best effort" to read the ring buffer if there's an active writer (the
consumer reader is made for this case "see trace_pipe"), if it fails to get
an event after three tries, simply give up and return NULL. Don't warn, nor
disable the ring buffer on this failure.

Link: https://lore.kernel.org/r/20200429090508.GG5770@shao2-debian

Reported-by: kernel test robot 
Fixes: ff84c50cfb4b ("ring-buffer: Do not die if rb_iter_peek() fails more than 
thrice")
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ring_buffer.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6f0b42ceeb00..448d5f528764 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4034,7 +4034,6 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
int nr_loops = 0;
-   bool failed = false;
 
if (ts)
*ts = 0;
@@ -4056,19 +4055,14 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
return NULL;
 
/*
-* We repeat when a time extend is encountered or we hit
-* the end of the page. Since the time extend is always attached
-* to a data event, we should never loop more than three times.
-* Once for going to next page, once on time extend, and
-* finally once to get the event.
-* We should never hit the following condition more than thrice,
-* unless the buffer is very small, and there's a writer
-* that is causing the reader to fail getting an event.
+* As the writer can mess with what the iterator is trying
+* to read, just give up if we fail to get an event after
+* three tries. The iterator is not as reliable when reading
+* the ring buffer with an active write as the consumer is.
+* Do not warn if the three failures is reached.
 */
-   if (++nr_loops > 3) {
-   RB_WARN_ON(cpu_buffer, !failed);
+   if (++nr_loops > 3)
return NULL;
-   }
 
if (rb_per_cpu_empty(cpu_buffer))
return NULL;
@@ -4079,10 +4073,8 @@ rb_iter_peek(struct ring_buffer_iter *iter, u64 *ts)
}
 
event = rb_iter_head_event(iter);
-   if (!event) {
-   failed = true;
+   if (!event)
goto again;
-   }
 
switch (event->type_len) {
case RINGBUF_TYPE_PADDING:
-- 
2.20.1




Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-13 Thread Steven Rostedt
On Wed, 13 May 2020 18:15:57 +0200
Sven Schnelle  wrote:

> Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
> The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
> in total). I've tried disabling preemption, but this didn't help.
> 
> It's always this check that causes the loop:
> 
> if (iter->head >= rb_page_size(iter->head_page)) {
>   rb_inc_iter(iter);
>   goto again;
> }
> 
> On the first loop iter->head is some value > 0 and rb_page_size returns
> 0, afterwards it gets twice to this check with both values 0. The third
> time the warning is triggered. Maybe that information helps.

Letting it run long enough, I was able to trigger it.

I think I know what's wrong with it. I'll put in some debugging to see if
my thoughts are accurate.

Thanks for bringing this back to my attention.

-- Steve


Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-13 Thread Sven Schnelle
Hi Steve,

On Wed, May 13, 2020 at 09:29:22AM -0400, Steven Rostedt wrote:
> On Wed, 13 May 2020 11:19:06 +0200
> Sven Schnelle  wrote:
> 
> > Did you had a chance to look into this? I can easily reproduce this both on 
> > x86
> > and s390 by doing:
> > 
> > cd /sys/kernel/tracing
> > cat /dev/zero >/dev/null & # generate some load
> > echo function >current_tracer
> > # wait a few seconds to fill the buffer
> > cat trace
> > 
> > Usually it will print the warn after a few seconds.
> > 
> > I haven't digged through all the ring buffer code yet, so i thought i might 
> > ask
> > whether you have an idea what's going on.
> 
> Can you send me the config for where you can reproduce it on x86?
> 
> The iterator now doesn't stop the ring buffer when it iterates, and is
> doing so over a live buffer (but should be able to handle it). It's
> triggering something I thought wasn't suppose to happen (which must be
> happening).
> 
> Perhaps with your config I'd be able to reproduce it.

Thanks for looking into this. I've attached my /proc/config.gz to this Mail.
The x86 system is my Laptop which is a Thinkpad X280 with 4 HT CPUs (so 8 cpus
in total). I've tried disabling preemption, but this didn't help.

It's always this check that causes the loop:

if (iter->head >= rb_page_size(iter->head_page)) {
rb_inc_iter(iter);
goto again;
}

On the first loop iter->head is some value > 0 and rb_page_size returns
0, afterwards it gets twice to this check with both values 0. The third
time the warning is triggered. Maybe that information helps.

Thanks,
Sven


config.gz
Description: application/gzip


Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-13 Thread Steven Rostedt
On Wed, 13 May 2020 11:19:06 +0200
Sven Schnelle  wrote:

> Did you had a chance to look into this? I can easily reproduce this both on 
> x86
> and s390 by doing:
> 
> cd /sys/kernel/tracing
> cat /dev/zero >/dev/null & # generate some load
> echo function >current_tracer
> # wait a few seconds to fill the buffer
> cat trace
> 
> Usually it will print the warn after a few seconds.
> 
> I haven't digged through all the ring buffer code yet, so i thought i might 
> ask
> whether you have an idea what's going on.

Can you send me the config for where you can reproduce it on x86?

The iterator now doesn't stop the ring buffer when it iterates, and is
doing so over a live buffer (but should be able to handle it). It's
triggering something I thought wasn't suppose to happen (which must be
happening).

Perhaps with your config I'd be able to reproduce it.

-- Steve


Re: [tracing] 06e0a548ba: WARNING:at_kernel/trace/ring_buffer.c:#ring_buffer_iter_peek

2020-05-13 Thread Sven Schnelle
Hi Steve,

On Wed, Apr 29, 2020 at 05:05:09PM +0800, kernel test robot wrote:
> 
> kern  :warn  : [  886.763510] WARNING: CPU: 70 PID: 22584 at 
> kernel/trace/ring_buffer.c:4067 ring_buffer_iter_peek+0x13c/0x1d0
> kern  :warn  : [  886.776216] Modules linked in: test_firmware intel_rapl_msr 
> intel_rapl_common skx_edac nfit libnvdimm btrfs x86_pkg_temp_thermal 
> intel_powerclamp blake2b_generic xor zstd_decompress zstd_compress coretemp 
> ast kvm_intel kvm raid6_pq drm_vram_helper drm_ttm_helper libcrc32c irqbypass 
> ttm crct10dif_pclmul crc32_pclmul drm_kms_helper snd_pcm crc32c_intel 
> ghash_clmulni_intel ahci snd_timer syscopyarea libahci aesni_intel ipmi_ssif 
> sysfillrect snd crypto_simd sysimgblt cryptd nvme fb_sys_fops mei_me 
> soundcore ipmi_si glue_helper pcspkr drm nvme_core ioatdma ipmi_devintf 
> t10_pi libata mei i2c_i801 joydev lpc_ich dca ipmi_msghandler wmi acpi_pad 
> acpi_power_meter ip_tables
> kern  :warn  : [  886.838112] CPU: 70 PID: 22584 Comm: cat Not tainted 
> 5.6.0-rc4-00017-g06e0a548bad0f #1
> kern  :warn  : [  886.846573] RIP: 0010:ring_buffer_iter_peek+0x13c/0x1d0
> kern  :warn  : [  886.852355] Code: 41 5e 41 5f c3 83 38 1d 0f 85 98 00 00 00 
> 48 89 df e8 78 cb ff ff e9 32 ff ff ff 80 3c 24 00 75 c8 49 8b 44 24 10 f0 ff 
> 40 08 <0f> 0b 4c 89 f6 4c 89 ef e8 37 8c 8f 00 48 83 c4 08 31 c0 5b 5d 41
> kern  :warn  : [  886.872239] RSP: 0018:c900213e7d78 EFLAGS: 00010002
> kern  :warn  : [  886.878030] RAX: 889883ea7740 RBX: 88a3fbd93720 
> RCX: 88a42101b700
> kern  :warn  : [  886.885731] RDX: 88a415c02e00 RSI: 88a42101b440 
> RDI: 88a3fbd93720
> kern  :warn  : [  886.893432] RBP:  R08: 00ce959ae6ab 
> R09: 88a42101bb80
> kern  :warn  : [  886.901129] R10: 88a3e99e2300 R11: 0178 
> R12: 88a415c02e00
> kern  :warn  : [  886.908831] R13: 88a415c02e18 R14: 0296 
> R15: c900213e7df8
> kern  :warn  : [  886.916521] FS:  7f12d51d2640() 
> GS:88a44f78() knlGS:
> kern  :warn  : [  886.925179] CS:  0010 DS:  ES:  CR0: 
> 80050033
> kern  :warn  : [  886.931495] CR2: 7f12d4e36000 CR3: 0023e901e006 
> CR4: 007606e0
> kern  :warn  : [  886.939194] DR0:  DR1:  
> DR2: 
> kern  :warn  : [  886.946896] DR3:  DR6: fffe0ff0 
> DR7: 0400
> kern  :warn  : [  886.954588] PKRU: 5554
> kern  :warn  : [  886.957858] Call Trace:
> kern  :warn  : [  886.960866]  peek_next_entry+0x22/0x70
> kern  :warn  : [  886.965165]  __find_next_entry+0xaf/0x180
> kern  :warn  : [  886.969714]  trace_find_next_entry_inc+0x1e/0x80
> kern  :warn  : [  886.974863]  s_next+0x49/0x70
> kern  :warn  : [  886.978355]  seq_read+0x23f/0x400
> kern  :warn  : [  886.982193]  vfs_read+0x9b/0x160
> kern  :warn  : [  886.985937]  ksys_read+0xa1/0xe0
> kern  :warn  : [  886.989677]  do_syscall_64+0x5b/0x1f0
> kern  :warn  : [  886.993853]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> kern  :warn  : [  886.999408] RIP: 0033:0x7f12d50f559e
> kern  :warn  : [  887.003479] Code: c0 e9 c6 fe ff ff 50 48 8d 3d 36 fc 09 00 
> e8 e9 e7 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 
> 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
> kern  :warn  : [  887.023260] RSP: 002b:7ffdab31acd8 EFLAGS: 0246 
> ORIG_RAX: 
> kern  :warn  : [  887.031350] RAX: ffda RBX: 0002 
> RCX: 7f12d50f559e
> kern  :warn  : [  887.039006] RDX: 0002 RSI: 7f12d4e36000 
> RDI: 0005
> kern  :warn  : [  887.046657] RBP: 7f12d4e36000 R08: 7f12d4e35010 
> R09: 
> kern  :warn  : [  887.054305] R10: fc4d R11: 0246 
> R12: 7f12d4e36000
> kern  :warn  : [  887.061958] R13: 0005 R14: 0002 
> R15: 0f93
> kern  :warn  : [  887.069608] ---[ end trace 507492ef8332a5b4 ]---
> 

Did you had a chance to look into this? I can easily reproduce this both on x86
and s390 by doing:

cd /sys/kernel/tracing
cat /dev/zero >/dev/null & # generate some load
echo function >current_tracer
# wait a few seconds to fill the buffer
cat trace

Usually it will print the warn after a few seconds.

I haven't digged through all the ring buffer code yet, so i thought i might ask
whether you have an idea what's going on.

Thanks
Sven