Processed: Re: Bug#982122: redis: experimental package OOMs s390x buildds
Processing commands for cont...@bugs.debian.org:

> forwarded 982122 https://github.com/redis/redis/issues/9369
Bug #982122 [src:redis] redis: experimental package OOMs s390x buildds
Set Bug forwarded-to-address to 'https://github.com/redis/redis/issues/9369'.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
982122: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=982122
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Bug#982122: redis: experimental package OOMs s390x buildds
forwarded 982122 https://github.com/redis/redis/issues/9369
thanks

Hi Julien,

> https://people.debian.org/~jcristau/redis_6.2.5-2_s390x-2021-08-11T16:17:34Z

This was very useful and, in conjunction with your suggestion of
potentially reproducing it on a porterbox, I have been able to
reproduce this on zelenka.debian.org, which gave me enough information
to file it upstream:

  https://github.com/redis/redis/issues/9369

As I mention there, it is currently unknown whether the changes in the
rather suspect commit introduced the underlying problem or merely
introduced a test that exposes it.

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
Bug#982122: redis: experimental package OOMs s390x buildds
On Thu, Aug 12, 2021 at 11:13:59AM +0100, Chris Lamb wrote:
> Hey Julien,
>
> > sorry about my tone yesterday, and thanks for working on this, that's
> > great to hear.
>
> Really, no worries at all...
>
> Still, I'm somewhat at a loss to debug this. In the first instance, can
> one of you throw over the s390x log for the latest upload? As alluded
> to in my previous mail, that should have some more debugging information
> and buildd.debian.org is telling me "no log" at the moment.
>
https://people.debian.org/~jcristau/redis_6.2.5-2_s390x-2021-08-11T16:17:34Z

> Failing that, I would be happy to disable the testsuite for the time
> being, limiting this to s390x on the problematic experimental branch.
> Let me know your thoughts on this.
>
That'd be kind of unfortunate, obviously. I guess the issue isn't
limited to a specific test? Have you maybe tried reaching out to the
porters on the debian-s390 list? Do you know if it's reproducible on
the porter box?

Cheers,
Julien
Bug#982122: redis: experimental package OOMs s390x buildds
Hey Julien,

> sorry about my tone yesterday, and thanks for working on this, that's
> great to hear.

Really, no worries at all...

Still, I'm somewhat at a loss to debug this. In the first instance, can
one of you throw over the s390x log for the latest upload? As alluded
to in my previous mail, that should have some more debugging information
and buildd.debian.org is telling me "no log" at the moment.

Failing that, I would be happy to disable the testsuite for the time
being, limiting this to s390x on the problematic experimental branch.
Let me know your thoughts on this.

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
Bug#982122: redis: experimental package OOMs s390x buildds
On Wed, Aug 11, 2021 at 10:38:03PM -, Chris Lamb wrote:
> Julien Cristau wrote:
>
> > It'd be appreciated if you could make fixing this a priority, and
> > refrained from uploading further versions until then.
>
> Sure. Just to say though, your message was rather unfortunate to
> receive given this latest upload was, in part, an attempt to resolve
> this very issue.
>
Hi Chris,

sorry about my tone yesterday, and thanks for working on this, that's
great to hear.

Cheers,
Julien
Bug#982122: redis: experimental package OOMs s390x buildds
Julien Cristau wrote:

> It'd be appreciated if you could make fixing this a priority, and
> refrained from uploading further versions until then.

Sure. Just to say though, your message was rather unfortunate to
receive given this latest upload was, in part, an attempt to resolve
this very issue.

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
Bug#982122: redis: experimental package OOMs s390x buildds
On Sat, Feb 06, 2021 at 04:58:09PM +, Adam D. Barratt wrote:
> Source: redis
> Version: 5:6.2~rc3-1
> Severity: serious
> Tags: ftbfs
>
> Hi,
>
> Both s390x buildds hit OOM conditions while attempting to build redis
> 6.2 in experimental.
>
> The log from zani ends with:
>
> [33/63 done]: integration/rdb (10 seconds)
> Testing integration/corrupt-dump
> [ok]: corrupt payload: #7445 - with sanitize
> [...]
> [ok]: corrupt payload: fuzzer findings - hash convert asserts on RESTORE with
> shallow sanitization
> [ok]: corrupt payload: OOM in rdbGenericLoadStringObject
> [TIMEOUT]: clients state report follows.
> sock2aa3bc1aa00 => (SPAWNED SERVER) pid:45952
> Killing still running Redis server 41748
>
Today's redis upload to experimental OOMed on the s390x buildd again.
It'd be appreciated if you could make fixing this a priority, and
refrained from uploading further versions until then.

Thanks,
Julien
Bug#982122: redis: experimental package OOMs s390x buildds
Hi Chris,

On Mon, 2021-02-15 at 18:28 +, Chris Lamb wrote:
> > Ah, indeed, the failure mode means that the log never made it to
> > buildd.d.o.
>
> Curious, not heard of that failure mode; is there someplace I can
> learn about that? No worries if not.

I'm not sure if it's documented, but in this case I think enough of the
system was unresponsive or killed to make the connection back to
buildd.d.o fail.

> > I've attached a copy of the log from zani.
>
> Ah, thanks. Unfortunately, it does not point us straight to the
> solution. I note that you titled this bug "package OOMs"; I point
> this out because the "OOM" text in the log is actually the name of
> the test. As in, here is tests/integration/corrupt-dump.tcl:
> [...]
> Do we have confirmation somewhere that the build is actually OOMing,
> rather than it just timing out on a test that was designed to test
> *for* an OOM condition? This OOM-related bug *should* be fixed by
> virtue of them adding the test to begin with (!) but if we can show
> that it is still OOMing, I suspect that upstream will be able to
> address it quickly.
I don't know how much context would be needed, but the machine
definitely OOMed:

Feb 3 20:45:22 zani/zani kernel: redis-server invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Feb 3 20:45:22 zani/zani kernel: redis-server cpuset=/ mems_allowed=0
Feb 3 20:45:22 zani/zani kernel: CPU: 0 PID: 45952 Comm: redis-server Not tainted 4.19.0-14-s390x #1 Debian 4.19.171-2
Feb 3 20:45:22 zani/zani kernel: Hardware name: IBM 8561 LT1 400 (z/VM 7.1.0)
Feb 3 20:45:22 zani/zani kernel: Call Trace:
Feb 3 20:45:22 zani/zani kernel: ([<00113f2a>] show_stack+0x5a/0x78)
Feb 3 20:45:22 zani/zani kernel: [<00802d1a>] dump_stack+0x8a/0xb8
Feb 3 20:45:22 zani/zani kernel: [<00800962>] dump_header+0x82/0x2c0
Feb 3 20:45:22 zani/zani kernel: [<002b46fe>] oom_kill_process+0xde/0x380
Feb 3 20:45:22 zani/zani kernel: [<002b550c>] out_of_memory+0x24c/0x3b8
Feb 3 21:07:50 zani/zani kernel: [<002bd032>] __alloc_pages_nodemask+0x10b2/0x1160
Feb 3 21:07:50 zani/zani kernel: [<0012b0c6>] page_table_alloc+0x15e/0x2c8
Feb 3 21:07:50 zani/zani kernel: [<002f8b76>] __pte_alloc+0x2e/0xf8
Feb 3 21:07:50 zani/zani kernel: [<002ff258>] __handle_mm_fault+0xfc0/0x11c0
Feb 3 21:07:50 zani/zani kernel: [<002ff584>] handle_mm_fault+0x12c/0x298
Feb 3 21:07:50 zani/zani kernel: [<00123a12>] do_dat_exception+0x182/0x440
Feb 3 21:07:50 zani/zani kernel: [<0080d9d4>] pgm_check_handler+0x190/0x1e4
...
Feb 3 21:07:50 zani/zani kernel: sshd invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=2, oom_score_adj=-1000
Feb 3 21:07:50 zani/zani kernel: sshd cpuset=/ mems_allowed=0
Feb 3 21:07:50 zani/zani kernel: CPU: 0 PID: 1463 Comm: sshd Not tainted 4.19.0-14-s390x #1 Debian 4.19.171-2
Feb 3 21:07:50 zani/zani kernel: Hardware name: IBM 8561 LT1 400 (z/VM 7.1.0)
Feb 3 21:07:50 zani/zani kernel: Call Trace:
Feb 3 21:07:50 zani/zani kernel: ([<00113f2a>] show_stack+0x5a/0x78)
Feb 3 21:07:50 zani/zani kernel: [<00802d1a>] dump_stack+0x8a/0xb8
Feb 3 21:07:50 zani/zani kernel: [<00800962>] dump_header+0x82/0x2c0
Feb 3 21:07:50 zani/zani kernel: [<002b46fe>] oom_kill_process+0xde/0x380
Feb 3 21:07:50 zani/zani kernel: [<002b550c>] out_of_memory+0x24c/0x3b8
Feb 3 21:07:50 zani/zani kernel: [<002bd032>] __alloc_pages_nodemask+0x10b2/0x1160
Feb 3 21:07:50 zani/zani kernel: [<0013e414>] copy_process.part.4+0x24c/0x1fb0
Feb 3 21:07:50 zani/zani kernel: [<00140550>] _do_fork+0xf0/0x430
Feb 3 21:07:50 zani/zani kernel: [<001409ce>] sys_clone+0x3e/0x50
Feb 3 21:07:50 zani/zani kernel: [<0080d630>] system_call+0xd8/0x2bc
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 45952 (redis-server), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: sshd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
...
Feb 3 21:07:50 zani/zani kernel: munin-node invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 36654 (schroot), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 34994 (sbuild), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 1508 (syslog-ng), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 1863 (samhain), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
...
Feb 3 21:07:50 zani/zani kernel: dpkg-buildpackage invoked oom-killer: gfp_mask=0x6200ca(GFP_
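
For anyone wanting to confirm an OOM from a long kernel log without reading every stack trace, the messages quoted above can be condensed mechanically. What follows is only a rough sketch in Python, assuming the syslog line format shown in this message; the names `summarize_oom` and `SAMPLE` are made up for illustration and are not part of any Debian or Redis tooling.

```python
import re
from collections import Counter

# Patterns modelled on the kernel log lines quoted in this thread (assumption:
# other kernel versions may word these messages differently).
INVOKED = re.compile(r"kernel: (?P<comm>\S+) invoked oom-killer:")
REAPED = re.compile(r"oom_reaper: reaped process (?P<pid>\d+) \((?P<comm>[^)]+)\)")


def summarize_oom(lines):
    """Return (invokers, reaped) for an iterable of kernel log lines.

    invokers: Counter mapping process name -> number of oom-killer invocations
    reaped:   list of (pid, process name) tuples, in log order
    """
    invokers = Counter()
    reaped = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            invokers[match.group("comm")] += 1
            continue
        match = REAPED.search(line)
        if match:
            reaped.append((int(match.group("pid")), match.group("comm")))
    return invokers, reaped


# A few lines copied from the log above, used as illustrative input.
SAMPLE = """\
Feb 3 20:45:22 zani/zani kernel: redis-server invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Feb 3 21:07:50 zani/zani kernel: sshd invoked oom-killer: gfp_mask=0x7080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null), order=2, oom_score_adj=-1000
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 45952 (redis-server), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Feb 3 21:07:50 zani/zani kernel: oom_reaper: reaped process 34994 (sbuild), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
"""

if __name__ == "__main__":
    invokers, reaped = summarize_oom(SAMPLE.splitlines())
    for comm, count in invokers.items():
        print(f"{comm} invoked the OOM killer {count} time(s)")
    for pid, comm in reaped:
        print(f"reaped pid {pid} ({comm})")
```

Any "invoked oom-killer" hit is kernel-level evidence of memory exhaustion, independent of test names in the build log that merely contain the string "OOM".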
Bug#982122: redis: experimental package OOMs s390x buildds
Hi Adam,

> Ah, indeed, the failure mode means that the log never made it to
> buildd.d.o.

Curious, not heard of that failure mode; is there someplace I can
learn about that? No worries if not.

> I've attached a copy of the log from zani.

Ah, thanks. Unfortunately, it does not point us straight to the
solution. I note that you titled this bug "package OOMs"; I point
this out because the "OOM" text in the log is actually the name of
the test. As in, here is tests/integration/corrupt-dump.tcl:

  test {corrupt payload: OOM in rdbGenericLoadStringObject} {
      start_server [list overrides [list loglevel verbose use-exit-on-panic yes crash-memcheck-enabled no] ] {
          r config set sanitize-dump-payload no
          catch { r RESTORE x 0 "\x0A\x81\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFF […]
          assert_match "*Bad data format*" $err
          r ping
      }
  }

Do we have confirmation somewhere that the build is actually OOMing,
rather than it just timing out on a test that was designed to test
*for* an OOM condition? This OOM-related bug *should* be fixed by
virtue of them adding the test to begin with (!) but if we can show
that it is still OOMing, I suspect that upstream will be able to
address it quickly.

If it helps, this test was added in this commit:

  https://github.com/antirez/redis/commit/7ca00d694d44be13a3ff9ff1c96b49222ac9463b

... which was in:

  $ git tag --contains 7ca00d694d44be13a3ff9ff1c96b49222ac9463b
  6.2-rc1
  6.2-rc2
  6.2-rc3

Not sure if previous s390x builds were failing, which might be another
route to fixing this.

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
Bug#982122: redis: experimental package OOMs s390x buildds
Hi Adam,

> Both s390x buildds hit OOM conditions while attempting to build redis
> 6.2 in experimental.
>
> The log from zani ends with:
>
> [..]

Thanks. I can't seem to find the full log anywhere though; can you help?
I might need that before I can raise it with upstream.

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      la...@debian.org 🍥 chris-lamb.co.uk
       `-
Bug#982122: redis: experimental package OOMs s390x buildds
Source: redis
Version: 5:6.2~rc3-1
Severity: serious
Tags: ftbfs

Hi,

Both s390x buildds hit OOM conditions while attempting to build redis
6.2 in experimental.

The log from zani ends with:

[33/63 done]: integration/rdb (10 seconds)
Testing integration/corrupt-dump
[ok]: corrupt payload: #7445 - with sanitize
[...]
[ok]: corrupt payload: fuzzer findings - hash convert asserts on RESTORE with shallow sanitization
[ok]: corrupt payload: OOM in rdbGenericLoadStringObject
[TIMEOUT]: clients state report follows.
sock2aa3bc1aa00 => (SPAWNED SERVER) pid:45952
Killing still running Redis server 41748

Regards,
Adam
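
When triaging a log tail like the one in this report, it can help to extract the test unit that was running, the last test that reported `[ok]`, and the pid of any spawned server named after a `[TIMEOUT]`. The snippet below is a hedged sketch in Python, assuming the test-suite output format shown above; `tail_state` and `SAMPLE` are hypothetical names chosen for illustration. Because the suite may run several clients in parallel, the last `[ok]` line is only a hint about where the run stalled, not proof of which test is at fault.

```python
import re

# Patterns modelled on the build log excerpt in this report (assumption:
# the Redis test suite may format its output differently in other versions).
TESTING = re.compile(r"^Testing (?P<unit>\S+)$")
OK = re.compile(r"^\[ok\]: (?P<name>.+)$")
SPAWNED = re.compile(r"\(SPAWNED SERVER\) pid:(?P<pid>\d+)")


def tail_state(lines):
    """Summarize the end of a test-suite log.

    Returns a dict with the most recent test unit, the last test that
    reported [ok], whether a [TIMEOUT] was seen, and any spawned-server pids.
    """
    state = {"unit": None, "last_ok": None, "timed_out": False, "spawned_pids": []}
    for raw in lines:
        line = raw.strip()
        match = TESTING.match(line)
        if match:
            state["unit"] = match.group("unit")
            continue
        match = OK.match(line)
        if match:
            state["last_ok"] = match.group("name")
            continue
        if line.startswith("[TIMEOUT]"):
            state["timed_out"] = True
            continue
        match = SPAWNED.search(line)
        if match:
            state["spawned_pids"].append(int(match.group("pid")))
    return state


# Lines copied from the log excerpt above, used as illustrative input.
SAMPLE = """\
[33/63 done]: integration/rdb (10 seconds)
Testing integration/corrupt-dump
[ok]: corrupt payload: #7445 - with sanitize
[ok]: corrupt payload: OOM in rdbGenericLoadStringObject
[TIMEOUT]: clients state report follows.
sock2aa3bc1aa00 => (SPAWNED SERVER) pid:45952
Killing still running Redis server 41748
"""

if __name__ == "__main__":
    for key, value in tail_state(SAMPLE.splitlines()).items():
        print(f"{key}: {value}")
```

The spawned-server pid reported this way can then be compared against kernel messages from the build machine, if those are available.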