------- Comment From balb...@au1.ibm.com 2016-07-17 21:43 EDT-------
I tried the kernel at http://people.canonical.com/~kamal/lp1573062/lp1573062.1/ 
and it worked fine for me

------- Comment From balb...@au1.ibm.com 2016-07-19 01:04 EDT-------
Looks like I got a failure with the run on 
http://people.canonical.com/~kamal/lp1573062/lp1573062.1/

But with my diff applied to the 4.4.0 source (from apt-get source), I
can always get the following command to succeed:

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

I've run it three times with my diff (all succeeded) and twice with the
kernel from ~kamal (one failure, one success). I've not tried the longer
7-hour run.
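
For anyone repeating these runs, here is a minimal sketch of a wrapper
that runs a command several times and counts the outcomes. The
run_repeated function and its output format are hypothetical, not part
of stress-ng or of this report, and the values of $end_time and $runtime
are not given in this thread. It only looks at exit status, so a hard
lockup of the machine (the failure mode in this bug) would show up as
the script never printing its summary.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and count exit statuses.
# Usage: run_repeated <runs> <command> [args...]
run_repeated() {
    runs=$1
    shift
    pass=0
    fail=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # A run passes only if the command exits with status 0.
        if "$@"; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "pass=$pass fail=$fail"
}

# Example invocation (end_time and runtime must be set by the caller):
# run_repeated 3 timeout -s 9 "$end_time" \
#     stress-ng --aggressive --verify --timeout "$runtime" --brk 0
```

A run that timeout has to kill counts as a failure here, because timeout
then exits with a non-zero status.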

------- Comment From balb...@au1.ibm.com 2016-07-19 01:37 EDT-------
In the kern.log posted, it looks like the problem has moved to

rwsem_wake+0xcc/0x110
up_write+0x78/0x90
unlink_anon_vmas+0x15c/0x2c0

A bunch of threads are stuck in rwsem_wake, spinning on
sem->wait_lock. I can see many exiting stress-ng-mmapf tasks spinning
on this lock. I'll double-check this. Can we get a build with lockdep
enabled? At the moment I am unable to reproduce this issue on my machine
with the diff applied.
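
For reference, a sketch of what "lockdep enabled" usually means in terms
of kernel config. This is an assumption about the intended options, not
the config used for any build in this thread. In a 4.4 tree
CONFIG_PROVE_LOCKING selects CONFIG_LOCKDEP and the spinlock and mutex
debug options, so the fragment is short.

```
# Assumed debug options for a lockdep build (not taken from this bug)
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
```

Lockdep adds runtime overhead, so a stress run on such a kernel will
likely behave differently in timing from the stock build.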

------- Comment From balb...@au1.ibm.com 2016-07-19 19:51 EDT-------
I am cloning the sources to debug further

------- Comment From balb...@au1.ibm.com 2016-07-19 23:52 EDT-------
I cloned the kernel from 
https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial/log/?h=lp1573062
and built it with the machine config from /boot/config. I also verified
that the diff matches my changes.

I ran

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

twice

Both times the test completed as expected. Could someone verify:

(a) Does the smaller subset work fine?
(b) Does the larger test fail? If so, can we get a run with lockdep
enabled?

I was only testing the command line above, and I could see a
difference with those patches.

------- Comment From balb...@au1.ibm.com 2016-07-21 20:14 EDT-------
No, the diff matches. Sorry for the confusion; here is what I said:

"I also verified the diff matches my changes"

In summary, here is what I did:

1. Cloned the sources
2. Built locally on my machine
3. Ran stress-ng with the recommended parameters
4. The test succeeded and I got the console back

I did four runs and got the console back each time.

However, with the provided binaries, step 3 (stress-ng) failed for me
once in two runs.

------- Comment From balb...@au1.ibm.com 2016-07-25 08:08 EDT-------
Strange, I am able to reproduce the issue with the provided binaries, but
not when I build the kernel myself. I am not doing a deb build, just a
make -j64 with the config from /boot for 4.4.0-28. The problem could be
at my end, but I am a little concerned.
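
A rough sketch of the non-deb build described above, written as a dry
run that only prints the commands. The print_build_steps helper, the
linux directory name, the exact config file name under /boot, and the
olddefconfig step are assumptions; the repository URL and branch come
from the cgit link earlier in this thread.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the build steps, executes nothing.
# Usage: print_build_steps <branch> <config-file> <jobs>
print_build_steps() {
    branch=$1
    config=$2
    jobs=$3
    cat <<EOF
git clone -b $branch https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial linux
cp $config linux/.config
make -C linux olddefconfig
make -C linux -j$jobs
EOF
}

# Example (the config file name is an assumed placeholder):
# print_build_steps lp1573062 /boot/config-4.4.0-28-generic 64
```

Comparing the resulting .config and compiler version against those of
the packaged kernel is one way to narrow down why the two builds behave
differently.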

I also noticed that the run succeeds if I interact with the system while
it is in progress, for example by frequently checking whether the
console is still active (pressing Enter and control-o-h). I am going to
see if I can reproduce it again and debug further.

------- Comment From balb...@au1.ibm.com 2016-07-25 09:09 EDT-------
In the meantime, any updates on the bisect? I was hoping we could do both
things (RCA and bisect) in parallel.

Thanks,
Balbir

------- Comment From balb...@au1.ibm.com 2016-07-25 23:37 EDT-------
I've been working off the assumption that the bug was fixed in mainline :)

I tried a few runs, including 4.5
(4.5.0-040500-generic_4.5.0-040500.201605161244), and it worked for me as
well (comment #25). I presume I should go by comment #92 and assume
that the bug is still present in mainline.

------- Comment From balb...@au1.ibm.com 2016-07-31 21:17 EDT-------
Does this succeed on your system? Could you please try three runs?

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1573062

Title:
  memory_stress_ng failing for IBM Power S812LC(TN71-BP012) for 16.04

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Yakkety:
  In Progress

Bug description:
  memory_stress_ng, as part of server certification, is failing for IBM
  Power S812LC(TN71-BP012) in bare metal mode. Failing in this case
  means the test locks up the server in an unrecoverable state which
  only a reboot will fix.

  I will be attaching screen and kern logs for the failures and a
  successful run on 14.04 on the same server.
