[Bug 1573062] Comment bridged from LTC Bugzilla

2016-12-20 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-12-20 22:55 EDT---
Could the Ubuntu team check if this is still an issue with the 4.8 kernel?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1573062

Title:
  memory_stress_ng failing for Power architecture for 16.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1573062/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-08-09 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-17 21:43 EDT---
I tried the kernel at http://people.canonical.com/~kamal/lp1573062/lp1573062.1/ 
and it worked fine for me

--- Comment From balb...@au1.ibm.com 2016-07-19 01:04 EDT---
Looks like I got a failure with the run on 
http://people.canonical.com/~kamal/lp1573062/lp1573062.1/

But with my diff + 4.4.0 source from apt-source, I can always get the
following command to succeed:

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

I've tried three times with my diff (all succeeded) and twice with the
kernel @ ~kamal (one failure, one success). I have not tried the longer
7-hour run.

--- Comment From balb...@au1.ibm.com 2016-07-19 01:37 EDT---
In the posted kern.log, it looks like the problem has moved to:

rwsem_wake+0xcc/0x110
up_write+0x78/0x90
unlink_anon_vmas+0x15c/0x2c0

A bunch of threads are stuck in rwsem_wake, spinning on
sem->wait_lock. I can see a whole bunch of exiting stress-ng-mmapf
processes stuck spinning on this lock. I'll double-check this. Can we get
a build with lockdep enabled? At the moment I am unable to reproduce this
issue on my machine with the diff applied.
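Before sending out a build for a lockdep run, it may help to confirm the
build actually has the lock-debugging options. A minimal sketch, assuming
CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LOCK_ALLOC are the options of
interest and that the build's .config (or its installed /boot/config-*
file) is available; the helper name has_lockdep is hypothetical.

```shell
# Hypothetical helper: check that a kernel config file enables the
# lockdep-related options assumed to matter here. Prints the first
# missing option and returns 1, or prints a confirmation and returns 0.
has_lockdep() {
  for opt in CONFIG_PROVE_LOCKING CONFIG_DEBUG_LOCK_ALLOC; do
    # Match the exact "CONFIG_FOO=y" line; "# CONFIG_FOO is not set" fails
    if ! grep -q "^${opt}=y\$" "$1"; then
      echo "missing: $opt"
      return 1
    fi
  done
  echo "lockdep options enabled"
}
```

For example: has_lockdep /boot/config-$(uname -r). In a kernel source
tree, ./scripts/config --enable PROVE_LOCKING followed by make olddefconfig
is one way to turn the option on; as far as I know PROVE_LOCKING selects
DEBUG_LOCK_ALLOC and LOCKDEP itself.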

--- Comment From balb...@au1.ibm.com 2016-07-19 19:51 EDT---
I am cloning the sources to debug further

--- Comment From balb...@au1.ibm.com 2016-07-19 23:52 EDT---
I cloned the kernel from
https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial/log/?h=lp1573062
and built it with the machine config from /boot/config. I also verified
that the diff matches my changes.

I ran

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

twice.

Both times, the test behaved correctly. Could someone verify whether

(a) the smaller subset works fine, and
(b) the larger test fails? If so, can we get a run with lockdep enabled?

I was only testing the command line above, and I could see a
difference with those patches.

--- Comment From balb...@au1.ibm.com 2016-07-21 20:14 EDT---
No, the diff matches; sorry for the confusion. Here is what I said:

"I also verified the diff matches my changes"

In summary, here is what I did:

1. Cloned the sources
2. Built locally on my machine
3. Ran stress-ng with the recommended parameters
4. The test succeeded and I got the console back

I did four runs and got the console back each time.

However, with the provided binaries, step 3 (stress-ng) failed for me
once in two runs.

--- Comment From balb...@au1.ibm.com 2016-07-25 08:08 EDT---
Strange: I am able to reproduce the issue with the provided binaries, but
not when I build the kernel myself. I am not doing a deb build, just a
make -j64 with the config from /boot for 4.4.0-28. The problem could be at
my end, but I am a little concerned.
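One way to narrow down the difference between the provided binaries and a
local make build is to compare the two kernel configs directly. A minimal
sketch, assuming both .config files are at hand (e.g. the /boot/config-*
file shipped with the provided kernel and the .config in the local build
tree); config_diff is a hypothetical helper, and it only compares
configuration, not compiler or build-environment differences. The kernel
tree's scripts/diffconfig does a similar job.

```shell
# Hypothetical helper: print kernel config lines present in only one of
# two .config files, including "# CONFIG_FOO is not set" lines.
# Lines only in the first file are printed as-is; lines only in the
# second file are prefixed with a tab (standard comm -3 layout).
config_diff() {
  a=$(mktemp)
  b=$(mktemp)
  # Use the C locale so sort and comm agree on ordering
  grep -E '^(CONFIG_|# CONFIG_)' "$1" | LC_ALL=C sort > "$a"
  grep -E '^(CONFIG_|# CONFIG_)' "$2" | LC_ALL=C sort > "$b"
  LC_ALL=C comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

For example: config_diff /boot/config-$(uname -r) .config, run from the
local build tree.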

I also noticed that the runs succeed if I interact with the system while
they are in progress (frequently checking whether the console is active
by pressing Enter and control-o-h). I am going to see if I can reproduce
it again and debug further.

--- Comment From balb...@au1.ibm.com 2016-07-25 09:09 EDT---
In the meantime, any updates on the bisect? I was hoping we could do both
things (root-cause analysis and bisect) in parallel.

Thanks,
Balbir

--- Comment From balb...@au1.ibm.com 2016-07-25 23:37 EDT---
I've been working off the assumption that the bug was fixed in mainline :)

I tried a few runs, including 4.5
(4.5.0-040500-generic_4.5.0-040500.201605161244), and it worked for me as
well (comment #25). I presume I should go by comment #92 and assume
that the bug is still present in mainline.

--- Comment From balb...@au1.ibm.com 2016-07-31 21:17 EDT---
Does this succeed on your system? Could you please try three runs?

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0
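Repeating the command and tallying the outcomes makes the pass/fail counts
reported in this thread easier to compare across kernels. A minimal
sketch; run_repeated is a hypothetical helper, and $end_time and $runtime
are assumed to be set by the test harness as in the command above (their
values are not given in this thread).

```shell
# Hypothetical helper: run a command N times and report how many runs
# exited with status 0 and how many exited non-zero.
# Usage: run_repeated N command [args...]
run_repeated() {
  n=$1
  shift
  pass=0
  fail=0
  i=1
  while [ "$i" -le "$n" ]; do
    if "$@"; then
      pass=$((pass + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "$pass passed, $fail failed"
}
```

For example:

run_repeated 3 timeout -s 9 "$end_time" stress-ng --aggressive --verify --timeout "$runtime" --brk 0

Any non-zero exit (including a timeout kill) is counted as a failure. A
hard hang of the kind described in this bug will stall the loop as well,
so the tally is only meaningful for runs that return.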

--- Comment From balb...@au1.ibm.com 2016-08-09 21:29 EDT---
Could the team please try the patch I posted at
http://marc.info/?l=linux-mm&m=147071635030062&w=2? It is under discussion
at the moment. I've tried it a few times at my end on top of the xenial
git tree with the OOM reaper changes applied. More testing is in progress
at my end.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1573062

Title:
  memory_stress_ng failing for IBM Power S812LC(TN71-BP012) for 16.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1573062/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-18 Thread bugproxy

[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-17 Thread bugproxy

[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-13 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-13 23:04 EDT---
I also added af8e15cc85a253155fdcea707588bf6ddfc0be2e to my diff, just FYI


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-12 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-12 04:14 EDT---
I backported the oom-reaper changes from v4.5 and have had good runs so far
(2 runs, with the machine returning to the console).

I took aac453635549699c13a84ea1456d5b0e574ef855 plus the next 7 patches and
removed the unsupported bits. I also took the schedule_timeout_idle()
changes and the memcontrol changes I pointed out earlier.
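One way to collect such a series is sketched below. This is a guess at the workflow. The comment does not say how the "next 7 patches" were chosen, so `series_after` (a hypothetical helper) simply takes the commits that follow the starting one on the given paths in `git rev-list` order, oldest first. It assumes a linear stretch of history.

```sh
#!/bin/sh
# Print START plus the next N commits (up to END) that touch the given
# paths, oldest first. With no paths, every commit in the range counts.
series_after() {
    start="$1"; n="$2"; end="$3"
    shift 3
    git rev-list --reverse "${start}^..${end}" -- "$@" | head -n "$((n + 1))"
}

# Hypothetical usage in a 4.4 tree that has mainline fetched as "upstream":
#   git cherry-pick -x $(series_after \
#       aac453635549699c13a84ea1456d5b0e574ef855 7 upstream/master mm/)
```

`-x` records the upstream commit ID in each backported commit message, which makes a later diff against the working upstream tree easier to follow.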


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-10 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-11 00:59 EDT---
Can you please also provide links to the sources, so I can do a quick diff
against the working 4.5 git tree?

Have we made further progress on the bisect?


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-10 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-10 23:00 EDT---
No luck with the new build that was shared (just 1 run so far; I'll try more
runs). More debugging is in progress as well.


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-07 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-06 20:56 EDT---
From what I can see, the following is the root cause of the issue:

cgroup_threadgroup_rwsem almost serializes accesses on the system.

1. stress-ng-brk holds cgroup_threadgroup_rwsem in read mode via
copy_process() and calls schedule_timeout() from
__alloc_pages_nodemask(); that schedule_timeout() never seems to
return:

3fff96ceeab0 0  2799   2701 0x00040002
[ 4401.831972] Call Trace:
[ 4401.831973] [c00ee71433c0] [c00ee7143400] 0xc00ee7143400 (unreliable)
[ 4401.831975] [c00ee7143590] [c0017c64] __switch_to+0x204/0x360
[ 4401.831977] [c00ee71435e0] [c0bb917c] __schedule+0x40c/0xe70
[ 4401.831979] [c00ee71436a0] [c0bb9c34] schedule+0x54/0xd0
[ 4401.831981] [c00ee71436d0] [c0bc0524] schedule_timeout+0x384/0x4f0
[ 4401.831983] [c00ee7143800] [c027de1c] __alloc_pages_nodemask+0xd0c/0xf40
[ 4401.831985] [c00ee7143a10] [c02e8d40] alloc_pages_current+0xc0/0x240
[ 4401.831988] [c00ee7143a70] [c0056b6c] page_table_alloc+0xcc/0x1e0
[ 4401.831989] [c00ee7143ac0] [c02b5824] __pte_alloc+0x54/0x1e0
[ 4401.831991] [c00ee7143b10] [c02b8584] copy_page_range+0x754/0x8f0
[ 4401.831993] [c00ee7143c40] [c00bcee4] copy_process.isra.6+0x1834/0x1ab0
[ 4401.831995] [c00ee7143d60] [c00bd33c] _do_fork+0xac/0x980
[ 4401.831997] [c00ee7143e30] [c000946c] ppc_clone+0x8/0xc

[ 4401.861569] cfs_rq[23]:/user.slice
[ 4401.861570]   .exec_clock: 1725230.642232
[ 4401.861571]   .MIN_vruntime  : 0.01
[ 4401.861572]   .min_vruntime  : 1154678.434341
[ 4401.861573]   .max_vruntime  : 0.01
[ 4401.861573]   .spread: 0.00
[ 4401.861574]   .spread0   : -97866589.605918
[ 4401.861575]   .nr_spread_over: 11
[ 4401.861575]   .nr_running: 0

[ 4401.862187]stress-ng-brk  2799   1154678.007061854611   120 688670.967816   1148995.803148   2318289.407734 0 0 /user.slice

2. Since cgroup_threadgroup_rwsem is held, no process is able to
exit:

[ 4177.396262] Showing all locks held in the system:
[ 4177.396263] 4 locks held by systemd/1:
[ 4177.396268]  #0:  (sb_writers#9){.+.+.+}, at: [] __sb_start_write+0x100/0x130
[ 4177.396272]  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x7c/0x1f0
[ 4177.396275]  #2:  (cgroup_mutex){+.+.+.}, at: [] cgroup_kn_lock_live+0x14c/0x280
[ 4177.396278]  #3:  (&cgroup_threadgroup_rwsem){++}, at: [] percpu_down_write+0x50/0x180

I think that at #3, systemd is waiting for all readers to leave
cgroup_threadgroup_rwsem, and this in turn blocks exiting threads,
which take the same lock in read mode from exit_signals():

[ 4177.396548]  #0:  (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396548] 1 lock held by kworker/dying/1348:
[ 4177.396551]  #0:  (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396552] 1 lock held by kworker/dying/1919:
[ 4177.396555]  #0:  (&cgroup_threadgroup_rwsem){++}, at: [] exit_signals+0x50/0x1a0
[ 4177.396555] 1 lock held by kworker/19:2/1930:
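A small filter over a lockdep "Showing all locks held in the system" dump can show at a glance which tasks are sitting on cgroup_threadgroup_rwsem, and where. This is a sketch that assumes the dump's original one-entry-per-line layout (before mail wrapping); `lock_users` is a hypothetical name. Lockdep records a lock when acquisition starts, so tasks still blocked on the lock (such as systemd in percpu_down_write() above) are listed alongside the actual holders.

```sh
#!/bin/sh
# Print "task<TAB>function" for every entry of lock $1 in lockdep dump $2.
lock_users() {
    lock="$1"
    log="$2"
    awk -v lock="$lock" '
        / held by / {
            # e.g. "... 4 locks held by systemd/1:" -> task "systemd/1"
            task = $NF
            sub(/:$/, "", task)
            next
        }
        index($0, "(" lock ")") {
            # Last field is where the lock was taken, e.g. exit_signals+0x50/0x1a0
            print task "\t" $NF
        }
    ' "$log"
}
```

For example, `lock_users '&cgroup_threadgroup_rwsem' kern.log` would list systemd/1 at percpu_down_write() followed by the exiting kworkers at exit_signals().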

A similar deadlock was seen and solved in 4.5 (see
https://lkml.org/lkml/2016/4/17/56).

More debugging is in progress.

--- Comment From balb...@au1.ibm.com 2016-07-07 09:52 EDT---
After debugging, the following seems to work for me: apply the fixes
mentioned at https://lkml.org/lkml/2016/4/17/56 and disable the
block cgroup controller.

The block cgroup controller has no specific changes to fix any deadlocks
that I am aware of, so it needs more testing and root-cause analysis. I
suspected that the can_attach callback might cause this, but that does
not seem to be the case.
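The comment does not say how the block cgroup controller was disabled. One option is to add `cgroup_disable=blkio` to the kernel command line, for example via GRUB_CMDLINE_LINUX in /etc/default/grub, followed by `update-grub` and a reboot. This assumes the kernel honours the documented `cgroup_disable=` boot parameter with the legacy controller name `blkio`. The hypothetical helper below then reads the last ("enabled") column of /proc/cgroups to confirm the result.

```sh
#!/bin/sh
# Print the "enabled" column of /proc/cgroups for a controller:
# 1 = enabled, 0 = disabled (e.g. by cgroup_disable=), "absent" = not listed.
controller_enabled() {
    name="$1"
    file="${2:-/proc/cgroups}"
    awk -v n="$name" '
        $1 == n { print $4; found = 1 }
        END { if (!found) print "absent" }
    ' "$file"
}
```

After rebooting with the parameter, `controller_enabled blkio` should print 0.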


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-07-06 Thread bugproxy
--- Comment From balb...@au1.ibm.com 2016-07-01 02:36 EDT---
What is the criteria for forward progress of the stress?

I did a quick check for what processes are OOM'd

In new kernel

2 apport
1 cron
1 dhclient
1 gmain
2 in:imklog
1 (journald)
1 kworker/u160:4
1 rs:main
1 stress-ng
972 stress-ng-bighe
157 stress-ng-brk
10 swapper/1
2 swapper/16
17 swapper/2
39 swapper/40
15 swapper/41
1 swapper/65
3 systemd
3 systemd-cgroups
10 systemd-journal
1 systemd-logind

In the 14.04 kernel:

1 dhclient
1 in:imklog
1 in:imuxsock
3 irqbalance
1 jbd2/sda2-8
1 kworker/u160:1
1 stress-ng
226 stress-ng-brk
32 swapper/16
3 swapper/23
2 swapper/31
3 swapper/32
22 swapper/46
5 swapper/55
33 swapper/56
3 swapper/63
6 swapper/64
18 swapper/7
6 swapper/72
7 swapper/8
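Tallies like the two lists above can be pulled out of a kernel log with standard tools. This is a sketch, not the method actually used. It assumes the counts come from "`<task> invoked oom-killer:`" lines, which would explain why idle swapper/N tasks appear, since they cannot themselves be killed. `oom_invokers` is a hypothetical name. Only the first word of each task name is kept, which matches entries such as `rs:main`. Sorting in the C locale may order entries like `(journald)` differently from the lists above.

```sh
#!/bin/sh
# Count "<task> invoked oom-killer:" lines per task name in a kernel log.
# Output is "count name" per line, sorted by name in the C locale.
oom_invokers() {
    log="${1:-/var/log/kern.log}"
    # Drop the syslog prefix and "[ 4401.831972]" timestamp (everything up to
    # the "] " before the task name), keep the text before
    # " invoked oom-killer:", then keep only its first word.
    sed -n 's/^.*] \(.*\) invoked oom-killer:.*/\1/p' "$log" |
        awk '{ print $1 }' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $1, $2 }'
}
```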

We changed the OOM killer in 4.6 (see
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aac453635549699c13a84ea1456d5b0e574ef855).
It looks like we have good behaviour with 4.6, which could be a result of
that change. I have yet to look at the source of memstress_ng, but if the
processes selected for OOM kill affect the result of the test, that would be
a probable explanation. It would also be interesting to continue the bisect
and see where we end up.
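Because 4.6 behaves well and 4.4 does not, this bisect would be hunting for the commit that *fixed* the problem, which inverts git's usual good/bad sense. The sketch below uses `git bisect` with custom terms (Git 2.7 or later). `find_fix` and the check command are hypothetical. For kernels, each step normally means building and booting a new kernel, so in practice the steps would be marked by hand with `git bisect fixed` / `git bisect broken` rather than with `git bisect run`.

```sh
#!/bin/sh
# Find the first "fixed" commit between a known-broken and a known-fixed rev.
# CHECK is a shell command that exits 0 when the problem does NOT reproduce.
find_fix() {
    broken="$1"; fixed="$2"; check="$3"
    # With custom terms, the first rev given is the new term, then the old one.
    git bisect start --term-old=broken --term-new=fixed "$fixed" "$broken" \
        >/dev/null 2>&1
    # bisect run treats exit 0 as the OLD term ("broken"), so invert CHECK;
    # a check killed by a signal (status > 128) also counts as broken.
    git bisect run sh -c "! $check" >/dev/null 2>&1
    git rev-parse refs/bisect/fixed
    git bisect reset >/dev/null 2>&1
}
```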


[Bug 1573062] Comment bridged from LTC Bugzilla

2016-05-27 Thread bugproxy
--- Comment From heji...@cn.ibm.com 2016-05-27 03:40 EDT---
Hi, where can I get the source/binary of memory_stress_ng? I will try to
reproduce it on local Power servers.
