[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-12-20 22:55 EDT---
Could the Ubuntu team check if this is still an issue with the 4.8 kernel?

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1573062

Title: memory_stress_ng failing for Power architecture for 16.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1573062/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-17 21:43 EDT---
I tried the kernel at http://people.canonical.com/~kamal/lp1573062/lp1573062.1/ and it worked fine for me.

--- Comment From balb...@au1.ibm.com 2016-07-19 01:04 EDT---
Looks like I got a failure with the run on http://people.canonical.com/~kamal/lp1573062/lp1573062.1/

But with my diff + 4.4.0 source from apt-source, I can always get the following command to succeed:

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

I've tried three times with my diff (all successes) and twice with the kernel @ ~kamal (one failure and one success). I've not tried the longer 7-hour run.

--- Comment From balb...@au1.ibm.com 2016-07-19 01:37 EDT---
In the kern.log posted, it looks like the problem has moved to

rwsem_wake+0xcc/0x110
up_write+0x78/0x90
unlink_anon_vmas+0x15c/0x2c0

A bunch of threads are stuck in rwsem_wake, spinning on sem->wait_lock. I can see a whole bunch of exiting stress-ng-mmapf tasks stuck on this lock, spinning. I'll double-check this. Can we get a build with lockdep enabled? I am unable to reproduce this issue at my end with the diff applied on my machine at the moment.

--- Comment From balb...@au1.ibm.com 2016-07-19 19:51 EDT---
I am cloning the sources to debug further.

--- Comment From balb...@au1.ibm.com 2016-07-19 23:52 EDT---
I cloned the kernel from https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial/log/?h=lp1573062 and built with the machine config specified from /boot/config. I also verified the diff matches my changes. I ran

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

twice. Both times, the test did the right thing. Could someone verify:

(a) Does the smaller subset work fine?
(b) Does the larger test fail? If so, can we get a run with lockdep?

I was just testing the command line above, and I could see a difference with those patches.

--- Comment From balb...@au1.ibm.com 2016-07-21 20:14 EDT---
No, the diff matches. Sorry for the confusion, but here is what I said: "I also verified the diff matches my changes".

In summary, here is what I did:

1. Cloned the sources
2. Built locally on my machine
3. Ran stress-ng with the recommended parameters
4. The test succeeded, got back the console

I did four runs and got back the console each time. However, with the provided binaries, step 3 (stress-ng) failed for me once in two runs.

--- Comment From balb...@au1.ibm.com 2016-07-25 08:08 EDT---
Strange, I am able to reproduce the issue with the provided binaries, but not when I build the kernel myself. I am not doing a deb build, just a make -j64 with the config from /boot for 4.4.0-28. The problem could be at my end, but I am a little concerned. I also noticed that the run succeeds if I interact with the system while it is in progress, frequently checking whether the console is active (enters and control-o-h). I am going to see if I can get a repro again and debug further.

--- Comment From balb...@au1.ibm.com 2016-07-25 09:09 EDT---
In the meantime, any updates on the bisect? I was hoping we could do both things (RCA and bisect) in parallel.

Thanks,
Balbir

--- Comment From balb...@au1.ibm.com 2016-07-25 23:37 EDT---
I've been working off the assumption that the bug was fixed in mainline :) I tried a few runs, including 4.5 (4.5.0-040500-generic_4.5.0-040500.201605161244), and it worked for me as well (comment #25). I presume I should stick to comment #92 and assume that the bug is still present in mainline.

--- Comment From balb...@au1.ibm.com 2016-07-31 21:17 EDT---
Does this succeed on your system? Could you please try three runs?

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

--- Comment From balb...@au1.ibm.com 2016-08-09 21:29 EDT---
Could the team please try the patch I posted at http://marc.info/?l=linux-mm&m=147071635030062&w=2? It is under discussion at the moment. I've tried it a few times at my end on top of the xenial git tree, on top of the oom reaper changes. More testing is in progress at my end.

-- 
https://bugs.launchpad.net/bugs/1573062

Title: memory_stress_ng failing for IBM Power S812LC(TN71-BP012) for 16.04
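
The comments above repeatedly ask for the same stress-ng command to be run two or three times and the outcomes compared. A minimal sketch of a helper that does the counting is shown here. The function name run_trials and its output format are hypothetical and are not part of stress-ng or of the test suite discussed in this bug. It treats any non-zero exit status as a failure, and it can only report results if the machine stays responsive; a hard hang of the whole system would still have to be observed from the console.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): run a command COUNT times
# and report how many runs exited with status 0 and how many did not.
# Usage: run_trials COUNT COMMAND [ARGS...]
run_trials() {
    count=$1
    shift
    pass=0
    fail=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Any non-zero exit status (including a kill by timeout) is a failure.
        if "$@" >/dev/null 2>&1; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "runs=$count pass=$pass fail=$fail"
    # Return success only if every run passed.
    [ "$fail" -eq 0 ]
}
```

With end_time and runtime set as in the comments above, three runs could then be requested as: run_trials 3 timeout -s 9 "$end_time" stress-ng --aggressive --verify --timeout "$runtime" --brk 0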
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-13 23:04 EDT--- I also added af8e15cc85a253155fdcea707588bf6ddfc0be2e to my diff, just FYI
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-12 04:14 EDT--- I backported the oom-reaper changes from v4.5 and I've had good runs so far (2 runs with machine returning to console) I took aac453635549699c13a84ea1456d5b0e574ef855 + next 7 patches and removed unsupported bits. I also took the changes for schedule_timeout_idle() + memcontrol changes I pointed out earlier.
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-11 00:59 EDT--- Can you please provide links to the sources as well, just to do a quick diff against the 4.5 working git? Have we made further progress on bisect?
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-10 23:00 EDT--- No luck with the new build shared (just 1 run, I'll try more runs). More debugging in progress as well.
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-06 20:56 EDT---
From what I can see, the following is the root cause of the issue: cgroup_threadgroup_rwsem almost serializes accesses on the system.

1. stress-ng-brk has cgroup_threadgroup_rwsem held in read mode via copy_process() and does a schedule_timeout() from __alloc_pages_nodemask() which never seems to return from schedule_timeout()

3fff96ceeab0 0 2799 2701 0x00040002
[ 4401.831972] Call Trace:
[ 4401.831973] [c00ee71433c0] [c00ee7143400] 0xc00ee7143400 (unreliable)
[ 4401.831975] [c00ee7143590] [c0017c64] __switch_to+0x204/0x360
[ 4401.831977] [c00ee71435e0] [c0bb917c] __schedule+0x40c/0xe70
[ 4401.831979] [c00ee71436a0] [c0bb9c34] schedule+0x54/0xd0
[ 4401.831981] [c00ee71436d0] [c0bc0524] schedule_timeout+0x384/0x4f0
[ 4401.831983] [c00ee7143800] [c027de1c] __alloc_pages_nodemask+0xd0c/0xf40
[ 4401.831985] [c00ee7143a10] [c02e8d40] alloc_pages_current+0xc0/0x240
[ 4401.831988] [c00ee7143a70] [c0056b6c] page_table_alloc+0xcc/0x1e0
[ 4401.831989] [c00ee7143ac0] [c02b5824] __pte_alloc+0x54/0x1e0
[ 4401.831991] [c00ee7143b10] [c02b8584] copy_page_range+0x754/0x8f0
[ 4401.831993] [c00ee7143c40] [c00bcee4] copy_process.isra.6+0x1834/0x1ab0
[ 4401.831995] [c00ee7143d60] [c00bd33c] _do_fork+0xac/0x980
[ 4401.831997] [c00ee7143e30] [c000946c] ppc_clone+0x8/0xc
[ 4401.861569] cfs_rq[23]:/user.slice
[ 4401.861570] .exec_clock: 1725230.642232
[ 4401.861571] .MIN_vruntime : 0.01
[ 4401.861572] .min_vruntime : 1154678.434341
[ 4401.861573] .max_vruntime : 0.01
[ 4401.861573] .spread: 0.00
[ 4401.861574] .spread0 : -97866589.605918
[ 4401.861575] .nr_spread_over: 11
[ 4401.861575] .nr_running: 0
[ 4401.862187]stress-ng-brk 2799 1154678.007061854611 120 688670.967816 1148995.803148 2318289.407734 0 0 /user.slice

2. Since cgroup_threadgroup_rwsem is grabbed, we are unable to make any processes exit

[ 4177.396262] Showing all locks held in the system:
[ 4177.396263] 4 locks held by systemd/1:
[ 4177.396268] #0: (sb_writers#9){.+.+.+}, at: [] __sb_start_write+0x100/0x130
[ 4177.396272] #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x7c/0x1f0
[ 4177.396275] #2: (cgroup_mutex){+.+.+.}, at: [] cgroup_kn_lock_live+0x14c/0x280
[ 4177.396278] #3: (&cgroup_threadgroup_rwsem){++}, at: [] percpu_down_write+0x50/0x180

I think at #3, we are waiting for all readers to exit cgroup_threadgroup_rwsem; this further blocks exiting threads

[ 4177.396548] #0: (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396548] 1 lock held by kworker/dying/1348:
[ 4177.396551] #0: (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396552] 1 lock held by kworker/dying/1919:
[ 4177.396555] #0: (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396555] 1 lock held by kworker/19:2/1930:

A similar deadlock was seen and solved in 4.5 (see https://lkml.org/lkml/2016/4/17/56). More debugging in progress.

--- Comment From balb...@au1.ibm.com 2016-07-07 09:52 EDT---
After debugging, the following seems to work fine for me: apply the fixes mentioned at https://lkml.org/lkml/2016/4/17/56 and disable the block-cgroup controller. The block cgroup controller has no specific changes to fix any deadlocks that I am aware of, so it needs more testing and root cause analysis. I expected the can_attach callback to potentially cause this, but it does not seem to be the case.
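The lock analysis in the 2016-07-06 comment amounts to grouping the "Showing all locks held" report by lock name and seeing how many tasks pile up on cgroup_threadgroup_rwsem. The following is a rough sketch of that grouping, not the tooling actually used in the bug. The function name tasks_by_lock and both regular expressions are illustrative assumptions; the sample lines are copied from the excerpt in the comment. As far as I know, lockdep can also list a lock for a task that is still waiting to acquire it, so the result shows which tasks are involved with a lock, not necessarily which one owns it.

```python
import re
from collections import defaultdict

# "4 locks held by systemd/1:" -> task "systemd/1"
HOLDER_RE = re.compile(r"\d+ locks? held by (?P<task>.+):\s*$")
# "#3: (&cgroup_threadgroup_rwsem){++}, at: ..." -> lock "&cgroup_threadgroup_rwsem"
LOCK_RE = re.compile(r"#\d+:\s+\((?P<lock>[^)]+)\)")

def tasks_by_lock(lines):
    """Map each lock name to the tasks listed under it in a lockdep report."""
    by_lock = defaultdict(list)
    task = None
    for line in lines:
        holder = HOLDER_RE.search(line)
        if holder:
            task = holder.group("task")
            continue
        lock = LOCK_RE.search(line)
        if lock and task is not None:
            # Drop the leading '&' lockdep prints for some lock classes.
            by_lock[lock.group("lock").lstrip("&")].append(task)
    return by_lock

# Lines copied from the report excerpt quoted in the bug comment.
SAMPLE_REPORT = [
    "[ 4177.396262] Showing all locks held in the system:",
    "[ 4177.396263] 4 locks held by systemd/1:",
    "[ 4177.396268] #0: (sb_writers#9){.+.+.+}, at: [] __sb_start_write+0x100/0x130",
    "[ 4177.396272] #1: (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x7c/0x1f0",
    "[ 4177.396275] #2: (cgroup_mutex){+.+.+.}, at: [] cgroup_kn_lock_live+0x14c/0x280",
    "[ 4177.396278] #3: (&cgroup_threadgroup_rwsem){++}, at: [] percpu_down_write+0x50/0x180",
    "[ 4177.396548] 1 lock held by kworker/dying/1348:",
    "[ 4177.396551] #0: (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0",
    "[ 4177.396552] 1 lock held by kworker/dying/1919:",
    "[ 4177.396555] #0: (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0",
]

for lock_name, tasks in tasks_by_lock(SAMPLE_REPORT).items():
    print(lock_name, len(tasks), tasks)
```

On a full report, a long task list under cgroup_threadgroup_rwsem with entries arriving via exit_signals would match the picture described in the comment: one writer in percpu_down_write and many exiting tasks queued behind it.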
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From balb...@au1.ibm.com 2016-07-01 02:36 EDT---
What are the criteria for forward progress of the stress test? I did a quick check for what processes are OOM'd.

In new kernel
2 apport
1 cron
1 dhclient
1 gmain
2 in:imklog
1 (journald)
1 kworker/u160:4
1 rs:main
1 stress-ng
972 stress-ng-bighe
157 stress-ng-brk
10 swapper/1
2 swapper/16
17 swapper/2
39 swapper/40
15 swapper/41
1 swapper/65
3 systemd
3 systemd-cgroups
10 systemd-journal
1 systemd-logind

In the 14.04 kernel
1 dhclient
1 in:imklog
1 in:imuxsock
3 irqbalance
1 jbd2/sda2-8
1 kworker/u160:1
1 stress-ng
226 stress-ng-brk
32 swapper/16
3 swapper/23
2 swapper/31
3 swapper/32
22 swapper/46
5 swapper/55
33 swapper/56
3 swapper/63
6 swapper/64
18 swapper/7
6 swapper/72
7 swapper/8

We changed the OOM killer in 4.6 (see https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aac453635549699c13a84ea1456d5b0e574ef855). Looks like we have good behaviour with 4.6, which could be a result of the change. I am yet to look at the source of memstress_ng, but if the processes selected for OOM impact the result of the test, we could have a probable explanation. It will also be interesting to continue the bisect and see where we end up.
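The per-process tallies in the 2016-07-01 comment look like the output of a sort | uniq -c pass over the kernel log, but the comment does not say which log lines were counted. A minimal sketch of one way to produce such tallies follows. It assumes the usual kernel message shapes "<comm> invoked oom-killer: ..." and "Killed process <pid> (<comm>) ...", and counts the two separately because it is not clear from the bug which one the numbers reflect. The name tally_oom and the SAMPLE_LOG lines are made up for illustration and are not taken from the bug's kern.log.

```python
import re
from collections import Counter

# Task that triggered the OOM killer: "[ts] <comm> invoked oom-killer: ..."
INVOKED_RE = re.compile(r"\]\s*(?P<comm>.+?) invoked oom-killer")
# Task that was actually killed: "Killed process <pid> (<comm>) ..."
KILLED_RE = re.compile(r"Killed process \d+ \((?P<comm>[^)]*)\)")

def tally_oom(lines):
    """Return (invoked, killed) Counters keyed by process name."""
    invoked, killed = Counter(), Counter()
    for line in lines:
        match = INVOKED_RE.search(line)
        if match:
            invoked[match.group("comm")] += 1
            continue
        match = KILLED_RE.search(line)
        if match:
            killed[match.group("comm")] += 1
    return invoked, killed

# Synthetic lines for illustration only (assumed format, invented values).
SAMPLE_LOG = [
    "[  100.000001] stress-ng-brk invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0",
    "[  100.000002] Out of memory: Kill process 4242 (stress-ng-brk) score 9 or sacrifice child",
    "[  100.000003] Killed process 4242 (stress-ng-brk) total-vm:0kB, anon-rss:0kB, file-rss:0kB",
    "[  101.000001] swapper/16 invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0",
]

invoked, killed = tally_oom(SAMPLE_LOG)
for name, count in sorted(invoked.items()):
    print(count, name)
```

Running tally_oom over the kern.log from each kernel and comparing the two Counters would give the same kind of side-by-side view as the lists in the comment.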
[Bug 1573062] Comment bridged from LTC Bugzilla
--- Comment From heji...@cn.ibm.com 2016-05-27 03:40 EDT--- Hi, where could I get the src/binary of memory_stress_ng? I will try to reproduce it on local Power servers.