[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-11-11 Thread kng ops
I want to report that the CPU time bug is still present in the following
kernel

2.6.32-317-ec2 on AWS instance m1.large

with CPU

model name  : Intel(R) Xeon(R) CPU   E5507  @ 2.27GHz

I logged the output of ps every minute and can now show the bug
happening. As you can see, the CPU time, which stood at 35790:43 shortly
before, gets corrupted: it jumps to 17179869:11 and then reaches
29322904:44 one minute later.

www-data  8269 31.3  9.2 853516 725496 ?   Sl   Aug23 35789:36 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
www-data  8269 31.3  9.2 866128 731640 ?   Sl   Aug23 35790:09 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
www-data  8269 31.3  9.3 868704 738800 ?   Sl   Aug23 35790:43 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
www-data  8269 31.3  9.4 872964 742660 ?   Sl   Aug23 35791:17 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
www-data  8269 120093018  9.4 885060 748036 ?  Sl   Aug23 17179869:11 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
www-data  8269 10652  9.3 868748 739144 ?  Sl   Aug23 29322904:44 
/usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ...
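
The jump in the TIME column can also be flagged mechanically. The
following is a minimal sketch and not part of the original report: it
assumes the MINUTES:SECONDS format shown above, one sample per minute,
and 2 virtual CPUs for an m1.large. The function names are hypothetical.

```python
# TIME values copied from the ps log above, one per minute.
SAMPLES = ["35789:36", "35790:09", "35790:43", "35791:17",
           "17179869:11", "29322904:44"]


def cputime_seconds(field):
    """Convert a ps TIME field of the form MINUTES:SECONDS to seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + int(seconds)


def find_jumps(samples, interval_s=60, ncpus=2):
    """Yield (previous, current) pairs whose CPU time growth is impossible.

    Between two samples a process can gain at most interval_s * ncpus
    seconds of CPU time, and it can never lose any. Both defaults are
    assumptions: one-minute sampling and 2 virtual CPUs.
    """
    limit = interval_s * ncpus
    for prev, cur in zip(samples, samples[1:]):
        delta = cputime_seconds(cur) - cputime_seconds(prev)
        if delta < 0 or delta > limit:
            yield prev, cur


for prev, cur in find_jumps(SAMPLES):
    print(f"impossible CPU time change: {prev} -> {cur}")
```

On the log above this reports the two transitions starting at 35791:17
and at 17179869:11. Real sampling intervals drift a little, so a
production check would probably want some slack on the limit.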

It took us around 3 months of running this process at around 30% load to
reach this condition.

Because of this problem the machine currently reports a load of 0, when
we know it should report between 20% and 40% depending on the time of
day.

Please let me know if I need to open a specific ticket for this issue.

Today we're going to upgrade to 2.6.32-319-ec2.

thanks,

Paolo Negri

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/708920

Title:
  Strange 'fork/clone' blocking behavior under high cpu usage on EC2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-11-11 Thread Stefan Bader
Reading back through the previous comments, I think this should be
handled as a separate issue. The original issue prevented processes from
being forked, whereas this one seems to be related to load/system time
accounting and is not as fatal as the main issue of this report. Could
you please open a new ticket for this and let me know the bug number?
Thanks



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-11-11 Thread kng ops
New ticket opened at:

https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/889012



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-07-26 Thread Stefan Bader
Based on the latest comments (unfortunately this bug was not referenced
when committing the change), I am closing this bug as fixed. If anybody
thinks it is still a problem, feel free to re-open it (or open a new
report). Thanks.

** Changed in: linux-ec2 (Ubuntu)
   Status: New => Fix Released



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-13 Thread Don Spidell
FWIW, I upgraded to 2.6.32-316-ec2 #31 on Ubuntu 10.04 running on an
E5507 and I have not run into this issue on any of the boxes I'm running
since the upgrade.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-08 Thread Jan Gerle
Hi Stefan,

We experienced the kernel lockups under heavy load on Ubuntu 10.04
running the standard 2.6.32-314-ec2 kernel from ami-3202f25b
(us-east-1). The underlying hardware is an Intel(R) Xeon(R) CPU E5507 @
2.27GHz w/ 4MB cache.

I have installed
linux-image-2.6.32-317-ec2_2.6.32-317.31~preview2_amd64 from a .deb and
now the lockups are gone.

Cheers,
Jan



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-08 Thread Stefan Bader
Jan, hopefully we will have the fix for this in the next official update. It
is currently waiting to clear proposed but was delayed by findings in some of
the changes used by all kernels.
But thanks for testing the preview kernel. It is built from the bigger set of
changes which I would like to merge in the future, and positive feedback helps
to argue for that change.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-08 Thread Eduardo Ferreira
Ah, good to know (#62). Next time we'll use the new instances.
By the way, it has been running non-stop since last week.
Great job!



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-02 Thread Eduardo Ferreira
Hey, Stefan,

We're having instances hanging throughout the day
(https://forums.aws.amazon.com/thread.jspa?threadID=68692) and it looks
like it may be related to this.

Looking at your comment (#53) I wanted to apply the changes you
mentioned, but I need some help on how to install them. I'm familiar
with apt-get install, etc, but I'm not sure what I need to do to use the
packages you prepared.

Would you let me know how I can do that?

Thank you.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-02 Thread Eduardo Ferreira
Never mind, I found out how to do that, i.e. launch new instances using the AMI
(ami-40db2229) with the matching AKI (aki-427d952b).
It's all running now... hopefully it won't hang anymore.
Thank you!



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-06-02 Thread Scott Moser
Eduardo,
  Just fyi, any recent (this year) lucid image will already use the pv-grub 
kernels by default.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-17 Thread Stefan Bader
Hi Robert,

Thanks for the update; that is really good to hear. The bug you mentioned
might be one of the possible reasons. There have also been changes that
make the timer and IPI interrupts per-CPU and switch the rest to use
fasteoi instead of level (moving around masking code as well). Having
IRQs use the level-based handler was claimed to potentially cause
missed interrupts, which may or may not have some impact in this case.
The other changes to process time accounting probably only affect the
runtime output, which was showing impossible values (larger than
uptime), but who knows. So yeah, likely one of the above. :)



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-17 Thread Chetan Sarva
Robert,

Can you confirm what CPU is running on that node? The Librato guys have
apparently narrowed it down to only happening on the Intel E5507. See
this blog post for more:

https://silverline.librato.com/blog/main/EC2_Users_Should_be_Cautious_When_Booting_Ubuntu_10_04_AMIs

I checked one of our problem nodes that's still up and sure enough, it's
running on an E5507. We have another node using the same CPU but with
the same test kernel that was posted by Stefan above. So far, it looks
ok, but we haven't put too much load on it yet and it's only been about
4 days.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-17 Thread Stefan Bader
Which would be exactly the bug Robert was pointing to. Hrm, it sounds
more and more sensible to pick that fix up sooner rather than later, as
the full changeset is harder to get approved.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-16 Thread Robert Newson
Hi Stefan,

Just an update. That same node has run flawlessly under high load since
my last post, no lockups. What are your thoughts on the claim that this
bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/727459 is the
cause of the problem?

B.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-05 Thread Robert Newson
Stefan,

I've been testing your kernels and stability seems much improved. It's
only been a few days, but the server has been under quite a lot of load
without an issue. With the stock ec2 kernel this was definitely not the
case.

Cautious optimism, but I'd love to isolate which of the patches was the
fix. Do you have a list of the patches you incorporated?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-05-05 Thread Stefan Bader
Robert, that really sounds like good news. Answering the question of what
exactly improved things could get a bit difficult. It also depends a bit on
exactly which stock ec2 kernel you are comparing against. I had been doing a
first round of changes which Jeremy has been testing (from what he wrote I was
assuming that the fork/clone issue was gone but some process time accounting
was still going wrong).
With this second, bigger round of changes, there were changes to the interrupt
handling and also some timer and accounting changes which could have had an
effect. But also quite a bit more. If you want to look at the changes yourself,
I try to keep the delta up to date, as long as it is not in the stock kernel, at
git://kernel.ubuntu.com/smb/ubuntu-lucid.git ec2-next. The reason this is quite
big is a bit of a long story. In short, the ec2 code is based on a patch-set
that is rebased, and we don't know exactly against what. So individual patches
can change without proper explanation why. I tried to document things as much
as I was able; still, it is not always nice.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-04-27 Thread Stefan Bader
This is a general request for some testing help for Ubuntu 10.04 (Lucid) on 
ec2. If there are other problem reports about strange process cpu times or 
hangs without any message it would be good to pass this on.
Trying to pull in a wide range of upstream fixes, I have prepared linux-ec2 
packages that include all the fixes to Xen code which could be found in other 
places. This is quite a big change, so while I have successfully booted the 
test kernels on various ec2 instances, this usually does not really provide 
much insight about what happens when facing more or less real workloads. So 
anybody who is willing and able to help testing would be highly welcome.

The kernel packages are at: http://people.canonical.com/~smb/lucid-ec2-ng/
To use those, http://uec-images.ubuntu.com/lucid/current/ has current AMIs and 
using --kernel with a matching AKI from 
https://lists.ubuntu.com/archives/ubuntu-cloud/2010-December/000466.html will 
use pv-grub for booting. So the new kernel can just be installed via dpkg and 
activated by rebooting.

Please report back anything that seems to be worse or better than
before. Thanks!



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-04-12 Thread Mike Malone
Since no one developed a reproducible test case it is, unfortunately,
difficult to say whether this bug is resolved. We moved to a 2.6.35
series kernel and stopped seeing this particular problem (although we
have seen other problems that are eerily similar). All of the
information to reproduce the behavior is in the comments above, but it
does take a lot of time and patience to trigger it. Either that, or
hundreds of instances running with production load.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-04-07 Thread Stefan Bader
The main problem here is the question whether there still is a problem.
There has been a first round of cleanup patches going into the ec2
kernels with 2.6.32-314.27. The last update/response I got was from
Jeremy and he seemed at least not to see the blocking behavior with
those. So as long as there are no confirmations that this particular
problem was still observed with kernels after 314.27, I would tend to
close this ticket as resolved.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-04-06 Thread Jordan Sissel
This ticket has been idle for a month, anyone have a resolution yet? :(



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-03-05 Thread Jeremy Deininger
Stefan,
The test machine did not exhibit the fork/clone behavior in this bug. The
wrong cpu times were pretty bad though, and perhaps unrelated to this bug. The
experiment lasted 3 days on a loaded large instance with 4096 KB cache. That
*should* have been long enough, but with this bug it's tough to tell.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-03-03 Thread Stefan Bader
Hi Jeremy, I have not heard back, so I am wondering whether the
hang/crashing occurred after all?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-22 Thread Jeremy Deininger
Stefan,
I have some boxes that seem to reproduce this behaviour rather frequently.
This is great news since this bug is so hard to reproduce.

I just launched with the new testing kernels you provided (aki-9ab546f3
x86_64) on a server that has a 4096 KB cache (the badly behaving size).
Unfortunately I'm already seeing signs that this kernel has the CPU
reporting bug, where CPU times are in the thousands of days. Usually
this means the box will be crashing in the next 1-2 days. I will post
back when I can confirm whether the fork bug still exists in these
kernels. The CPU time bug still exists. It was speculated that these
were unrelated. This machine was just launched and within minutes it
displayed wrong cpu times.

process list snip from top:

26614 root  20   0  570m 448m 4972 S    0  2.9   0:39.41 ruby
26671 root  20   0  539m 419m 4948 S    0  2.7   0:35.79 ruby
28441 root  20   0  528m 409m 4212 S    0  2.7 24207187w ruby
26310 root  20   0  523m 405m 5040 S    3  2.6   0:51.46 ruby
29320 root  20   0  523m 405m 4948 S    0  2.6 91970855d ruby



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-15 Thread Andrew Frolov
Same as Alex - we were having this issue with our production servers on the
2.6.32-309-ec2 kernel.
After upgrading to Maverick (2.6.35-24-virtual) the issue is gone. So far
(5 days) we haven't hit this bug.
I would also like to know which kernel version no longer has this issue.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-15 Thread Stefan Bader
Well maybe the kernel I posted. Kind of hard to say without anybody
testing it. ;-) The problem is that Lucid and Maverick kernels for EC2
are very different approaches. Maverick already has enough Xen code in
the mainline kernel that only the configuration is different to the
normal kernel. Lucid required a substantial number of patches on top of
the code that was upstream and the approach used there has the risk of
getting a skew between changes to the generic kernel code and the xen
code. In the test kernels, at least that skew should be fixed and maybe
that was causing this issue.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-14 Thread Alec Thomas
We are now seeing this on a whole bunch of production machines all
running 10.04, 2.6.32-312-ec2. Has anyone confirmed a kernel (old or
new) that doesn't have this bug?

We've just upgraded one machine to 10.10 to see if it fixes the issue.

We haven't tried Stefan's kernels, but may if 10.10 doesn't show
improvement.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-13 Thread Jeremy Deininger
Hi guys,

I've been following this bug very closely and I just had it happen to me
on an instance running Maverick. The kernel version was 2.6.35-22-virtual
and I realize a newer -25- is out. I will see if I can repro with that as
well. This was a classic example of the bug: it happened during a large
apt-get install sequence and stopped allowing forking of new processes.
Has anyone else seen this in Maverick (10.10)?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-10 Thread Stefan Bader
While there is not yet a patch that I would suspect of being the one
fixing this issue, a larger number of changes had been made upstream but
missed the Lucid ec2 kernel because it uses Xen-specific copies of x86
files. Some of them will have no effect, as compile options cause the
code not to be included, but some were in areas which may have an effect
on this issue.

I have prepared some test kernel packages (and also some test AKIs)
which contain the sync with all current stable updates. It would be
awesome if someone could check that kernel and see whether it already
removes the problem (or at least does not introduce new ones). I booted
an i386 and an amd64 instance with it and could not see immediate
problems.

The debian packages (for pv-grub) would be at
http://people.canonical.com/~smb/lp708920/, the aki's are:

us-west-1       aki-4faefe0a    x86_64
us-east-1       aki-9ab546f3    x86_64
eu-west-1       aki-bdf2c6c9    x86_64
ap-southeast-1  aki-8689f7d4    x86_64

us-west-1       aki-49aefe0c    i386
us-east-1       aki-6cb44705    i386
eu-west-1       aki-a7f2c6d3    i386
ap-southeast-1  aki-8c89f7de    i386



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-04 Thread Stefan Bader
I tried the approach from comment #30 and also comment #37 with
ami-fa01f193 in us-east. This runs a 2.6.32-312.24 kernel currently in
proposed and was running on hw that showed 6M cache in cpuinfo. Neither
method caused the scheduler bug for me.

I will have a look at the source code and see whether something can be
found there. This unfortunately is a bit bulky.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-03 Thread Andy Whitcroft
The commit mentioned in comment #39 is included in the
Ubuntu-2.6.32-311.23 EC2 kernel.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-02 Thread Matt Wilson
Mike,

You bring up a good point about CFS' need for good process time
accounting. I think that this upstream patch may fix a lot of problems:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8a22b9996b001c88f2bfb54c6de6a05fc39e177a

This patch is in 2.6.34.7 but may not be in the 10.04 kernel.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-01 Thread Mike Malone
Hey Matt,

I ran it once while the system was idle:
https://gist.github.com/fb35566354afc442bf2d

And then again in a tight loop with a 50ms sleep while I locked the
system up, in the hopes of catching something interesting just before or
during the period when the system was locked:
https://gist.github.com/ce0887dcdc125afcca94



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-01 Thread Matt Wilson
I've done a lot of looking at this today. It feels like the problem may
lie in the process scheduler. When I pin the CPU burning process to CPU0
(through taskset -pc 0 $pid_printed_by_a_out), and pin a bash shell
also to CPU0, I see failure of the bash process to wake after sleeping
(i.e., it's runnable, but CFS isn't giving it time). I've seen the bash
process start to be scheduled after around 3 minutes, and I've also seen
it just sit there.
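
The sleeping half of this experiment can be approximated without a
shell. The following is a rough sketch under stated assumptions, not the
procedure used in this comment: it pins the current process to a single
CPU with os.sched_setaffinity (Linux only, the same effect as taskset
-pc) and measures how late each sleep returns. The function names and
the 50 ms / 20 round defaults are hypothetical, and the CPU-burning
process still has to be started and pinned to the same CPU separately.

```python
import os
import time


def pin_to_first_allowed_cpu(pid=0):
    """Pin `pid` (0 means this process) to one CPU, like `taskset -pc`.

    Returns the chosen CPU number, or None where the call is unavailable.
    The lowest CPU the process is already allowed on is used instead of a
    hard-coded 0, which may be outside the allowed set in a container.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpu = min(os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {cpu})
    return cpu


def wake_overshoot(sleep_s=0.05, rounds=20):
    """Return, per round, how many seconds later than requested a sleep ended."""
    late = []
    for _ in range(rounds):
        start = time.monotonic()
        time.sleep(sleep_s)
        late.append(time.monotonic() - start - sleep_s)
    return late


if __name__ == "__main__":
    cpu = pin_to_first_allowed_cpu()
    worst = max(wake_overshoot())
    print(f"pinned to CPU {cpu}; worst wake-up overshoot {worst * 1000:.1f} ms")
```

On a healthy scheduler the overshoot should stay small even with a
burner on the same CPU. An overshoot of seconds or minutes would
correspond to the stalled bash process described above.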

Every time I've seen a scheduler debug trace (triggered via
echo w > /proc/sysrq-trigger), there have been other runnable processes
on the spinning CPU that don't seem to be getting scheduled at all.

I've not been able to reproduce this problem on the kernel used in the
Amazon Linux AMI (currently 2.6.34.7). This is in line with other users'
observations (http://twitter.com/#!/synack/status/30415380321140737).

I think that Canonical might need to look into what (if any) changes
they've made to CFS in the 10.04 kernel tree. It's also possible that
improvements have been made in CFS between 2.6.32 and 2.6.34 that
account for better performance.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-02-01 Thread Mike Malone
Matt,

That's definitely in line with what we're seeing. Not to beat a dead
horse, and I am not a kernel hacker (so I apologize for any naivety),
etc... but that's basically what led me to believe the
CLOCK_PROCESS_CPUTIME_ID timers, top issues, and other CPU time
monitoring may be relevant here.

Based on my rather high level understanding of CFS -- the big idea being
that processes are scheduled based on their run time --  it seems
plausible that a process that appears to be using no CPU time would
continue to be prioritized by the scheduler. I'll leave any further
analysis to someone who knows how these systems work.
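
The intuition in the paragraph above can be illustrated with a toy model. This is a deliberately simplified sketch, not kernel code: it assumes a scheduler that always picks the task with the smallest accumulated run time, and the struct toy_task, its accounting_broken flag, and toy_schedule are all invented for the example. If one task's run time is never charged, it stays at the front forever and the other task starves.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task {
    unsigned long vruntime;  /* accumulated (virtual) run time */
    unsigned long picks;     /* number of time slices received */
    int accounting_broken;   /* hypothetical: run time is never charged */
};

/* Index of the task with the smallest vruntime (lowest index wins ties). */
static size_t pick_min_vruntime(const struct toy_task *t, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (t[i].vruntime < t[best].vruntime)
            best = i;
    return best;
}

/* Hand out `ticks` slices of length `slice`, one at a time. */
void toy_schedule(struct toy_task *t, size_t n, unsigned ticks,
                  unsigned long slice) {
    assert(n > 0);
    for (unsigned k = 0; k < ticks; k++) {
        size_t i = pick_min_vruntime(t, n);
        t[i].picks++;
        if (!t[i].accounting_broken)
            t[i].vruntime += slice;  /* normal case: charge the run time */
    }
}
```

With two healthy tasks the slices alternate; with accounting_broken set on one of them, that task receives every slice. Whether anything like this happens in the 2.6.32 EC2 kernel is exactly the open question in this thread.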

We have done a bit more testing on the 2.6.35 series kernels, with
Ubuntu 10.10 (Maverick). So far we have not been able to reproduce this
issue there. If we do I'll report back, but for now we are moving
forward with migrating affected systems to Maverick. We're happy to
assist with efforts to track this further, but I think we've documented
most of what we've experienced.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Alec Thomas
I am running Eclipse on EC2 over NX and this bug surfaces quite quickly,
in the form of total system hangs for 0.5 to 5 minutes at a time.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Matt Wilson
Alec,

Do any hung task kernel stack traces get emitted during your hangs?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Mike Malone
Starting irqbalance (without rebooting) on a node in a state where
fork() will hang does not help.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Jordan Sissel
I see this in a particularly annoying way.

apt-get when run from another tool (like puppet) will have /dev/null for
stdin. apt-get foolishly select(2)s on stdin which results in 100% cpu
usage (stdin is always ready when it is /dev/null). This 100% cpu
qualifies for the fork-blocking, and since apt-get forks dpkg (which
again forks dpkg) during installation, dpkg never actually forks, so
apt-get stalls the entire system for hours if it ever even comes back.
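
The "stdin is always ready" part is easy to check in isolation. The following is a minimal sketch, not apt-get's actual code; readable_within and dev_null_ready_count are hypothetical helper names. It opens /dev/null and asks select(2) whether the descriptor is readable, which on Linux is expected to succeed immediately every time, so a loop that waits on it never blocks.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns 1 if fd is reported readable within timeout_ms,
 * 0 on timeout, -1 on error. */
int readable_within(int fd, long timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    int rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 && FD_ISSET(fd, &rfds);
}

/* Polls /dev/null `rounds` times and counts how often it was "ready".
 * Returns -1 if /dev/null cannot be opened. */
int dev_null_ready_count(int rounds) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    int ready = 0;
    for (int i = 0; i < rounds; i++)
        if (readable_within(fd, 1000) == 1)
            ready++;
    close(fd);
    return ready;
}
```

A program that treats "readable" as "there is input to handle" and then reads zero bytes will go straight back into select(), which is the busy loop described above.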

I found a workaround for *only* the apt-get or similar situations in
that running the process through nice(1) seems to allow fork to unwedge.
This is obviously impractical for low-latency services like databases,
etc, but can help for others.
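
For tools that launch such commands themselves, the same workaround can be applied in code before exec'ing the child. This is a sketch under the assumption that raising the nice value is all the nice wrapper contributes here; lower_own_priority is a made-up name, and -100 is just an out-of-range error marker chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Raise the calling process's nice value by `inc` (i.e. lower its
 * priority). Returns the resulting nice value, or -100 on error. */
int lower_own_priority(int inc) {
    assert(inc >= 0);
    errno = 0;
    /* nice() may legitimately return -1, so errno is the error signal. */
    if (nice(inc) == -1 && errno != 0)
        return -100;
    errno = 0;
    int now = getpriority(PRIO_PROCESS, 0);
    return (now == -1 && errno != 0) ? -100 : now;
}
```

A wrapper would call lower_own_priority(10) in the child after fork() and before execvp() of the real command. It does not address the underlying problem and only applies where reduced priority is acceptable.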



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Matt Wilson
Jordan,

Do you see this behavior at boot, or only after your instance has been
up and running for a while?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Alec Thomas
@Matt,

No, I don't see anything particularly unusual in the logs at all :\

Also, as suggested by Gavin, I rebooted my EC2 instance until I got a
machine with CPU model 23 and have not had a repeat of the issue.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Matt Wilson
Alec,

Do you have instance IDs from your hanging instances?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Alec Thomas
The first one I rebuilt, as I thought it might have been related to a
dud EBS volume, but the current one is i-55a92d39. Mail me at
a...@swapoff.org if you want to take this offline.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-31 Thread Matt Wilson
If anyone has a machine that they can get into the hanging state (with
fork() blocking), can you run echo w > /proc/sysrq-trigger as root
and post the results?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-30 Thread Matt Wilson
Hi Mike,

Let's focus on the fork() hangs in this bug. It's true that the two
could be related, but the symptoms don't quite line up.

You say you can reproduce the behavior on 2.3.32-311. Do you have a
procedure for getting an instance into the broken state, so you can then
cause fork() hangs with spinning CPUs?

Also, is irqbalance running on your instances? If not, does the fork()
hanging behavior change if irqbalance is running on the instance
(started at boot)?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-30 Thread Mike Malone
Matt,

Still no way to reproduce deterministically. I've just been running
variations of the test I posted above, writing to /dev/null and setting
timer signals. At some point the tests/system start hanging.

irqbalance is not running on any of the test instances that are hanging
(it appears to be started automatically on the Maverick ami, but that
node is happy). I believe we've run it in the past on some nodes. I'll
check if there are any that are hanging that also have irqbalance
started at boot. I know that I personally started irqbalance on some
nodes after boot to see if that would fix things (to no avail), but it
wasn't started at boot.

If we don't have any running nodes with irqbalance I'll have to reboot a
node and start over, which may take some time.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-30 Thread Matt Wilson
If you have an instance in a state where fork() will hang if you spin a
CPU, it would be a good experiment to see if irqbalance helps at all.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
I am able to reproduce this behavior on an instance running kernel
version 2.3.32-311 using ami-f8f40591. As with the older kernel, new
instances don't immediately exhibit symptoms.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
Fat fingered that kernel version. Should be 2.6.32-311-ec2.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-29 Thread Mike Malone
Also, potentially related, the 2.6.32 kernels seem to sometimes drop
timer signals for CLOCK_PROCESS_CPUTIME_ID and continue to report
unusual process CPU times in system monitoring tools like top and via
the proc filesystem. I can reproduce the signal behavior with this
program: https://gist.github.com/14585329e013d1bf5134 -- some runs are
uneventful and proceed as expected, others hang forever without ever
receiving a signal.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Stefan Bader
Can someone confirm whether this is an issue with a recent version of the
kernel? I tried the sample program with 2.6.32-311-ec2 but could not
reproduce the problem.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Scott Moser
ami-fd4aa494 is one of our official images: us-east-1
ubuntu-lucid-10.04-amd64-server-20100427.1

That said, it's very old. The build there (unless changed) is running kernel
linux-image-2.6.32-305-ec2 2.6.32-305.9
The most recent lucid amd64 is:
  ami-f8f40591 ubuntu-lucid-10.04-amd64-server-20101228
running kernel linux-image-2.6.32-311-ec2 2.6.32-311.23

You can see the extensive list of changes between the two kernel versions at
http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-ec2/linux-ec2_2.6.32-311.23/changelog

You should be able to try the new kernel in one of these ways:
a.) Specify it at launch time:
   ec2-run-instances --kernel aki-74f4051d ami-fd4aa494 ...
   Then run apt-get update && apt-get upgrade to get the kernel modules
   that match the running kernel.
   This should work, as the lucid kernel has networking and disk built in.
b.) Try the newer AMI 'ami-f8f40591'.
c.) Install grub-legacy-ec2, stop the instance, and start it with the pv-grub
   aki aki-427d952b. From then on, you could reboot into a different kernel.

Determining if we can reproduce this on a recent kernel will just give
us much more information to go from, and avoid chasing ghosts.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Mike Malone
We are running ami-fd4aa494 with 2.6.32-305-ec2 in us-east. I'll see
what I can do about setting up a couple nodes with the more recent
2.6.32 kernel build and report back.

We've already started running a few Maverick instances with
2.6.35-24-virtual, and so far they appear to be more stable.
Unfortunately, the issue is not easy to reproduce initially (as
evidenced by the efforts in this thread). Recently restarted instances
appear to be more stable than those that have been running for a while
under load. It looks like some heisenbug gets the system into a sideways
state, and once that happens you can lock things up pretty
deterministically with something as trivial as a tight loop. So it's
possible that Maverick will go sideways too at some point and we simply
haven't seen it yet. Hard to say for sure without knowing what the
trigger is.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Paul Lathrop
We've booted up a couple instances using the new kernel and are
attempting to break them. As Mike pointed out, it is hard for us to
prove a negative; this is something that we can reliably reproduce once
we get the machine to do it once, but before that it is inconsistent.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-28 Thread Mike Malone
We're still working on a way to repro this on a new instance. Meanwhile,
we're moving forward with testing on the newer 2.6.32 kernel and on
Maverick's 2.6.35 kernel.

One observation we have made is that if we run the libctest in a loop
(`while :; do ./libctest; done`) on 2.6.32 it will eventually hang the
process (apparently forever, I've left it as long as 4 hours). This
behavior is reproducible on a fresh instance and happens on both
2.6.32-305 and 2.6.32-311. We are NOT able to repro on 2.6.35. Could be
unrelated, or expected behavior, but from my review of the libctest code
what we're seeing does appear pathological.
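
When looping a test binary like this, it helps to have the loop itself notice a hang instead of relying on someone checking back hours later. The following is a rough sketch of such a harness, not something used in the testing above; run_with_deadline and its return-code convention are invented for the example. Note that on an affected system the harness's own fork() could presumably block too, and a child stuck in uninterruptible sleep may not die on SIGKILL.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run argv[0] with arguments argv and a wall-clock deadline.
 * Returns the exit status (0..255) if the child exits in time,
 * -1 on error or abnormal termination, -2 if it was killed on deadline. */
int run_with_deadline(char *const argv[], long deadline_ms) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }

    const struct timespec poll_interval = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    for (long waited = 0; waited <= deadline_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&poll_interval, NULL);
    }

    kill(pid, SIGKILL);     /* deadline passed: treat the run as hung */
    waitpid(pid, NULL, 0);  /* reap the child */
    return -2;
}
```

A driver would call run_with_deadline() repeatedly with an argv of {"./libctest", NULL} and a generous deadline, and stop with a log message the first time it returns -2.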



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson




[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
This is a transcription of the test program from the youtube video:


#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/wait.h>

int main(int argc, char **argv) {
  int children = 0;
  int status;
  int i = 0;

  if (argc < 2) {
    printf("Usage: %s children\n", argv[0]);
    return 1;
  }

  children = atoi(argv[1]);
  for (i = 0; i < children; i++) {
    printf("Starting child %d\n", i);
    if (fork() == 0) {
      fprintf(stderr, "child%d up %d\n", i, (int)getpid());
      while (1) {
        /* Nothing, just spin */
      }
    } else {
      // parent continues
    }
  }

  /* The children never exit, so this blocks until one is killed. */
  wait(&status);
  return 0;
}


** Visibility changed to: Private



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
On a system in this condition, sometimes hung task traces are seen:


kernel: [65098.694112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [65098.694117] cron D 880001885380 0 21248 569 0x
kernel: [65098.694121] 880772e25d20 0282 0001 880772e25ca0
kernel: [65098.694124] 0002 880772e25ce8 8802e100c678 880772e25fd8
kernel: [65098.694126] 8802e100c2c0 8802e100c2c0 8802e100c2c0 880772e25fd8
kernel: [65098.694128] Call Trace:
kernel: [65098.694137] [810e312a] ? cache_alloc_refill+0x6a/0x270
kernel: [65098.694141] [814a199d] schedule_timeout+0x1dd/0x2c0
kernel: [65098.694145] [814a0b02] wait_for_common+0xf2/0x1d0
kernel: [65098.694150] [81030260] ? default_wake_function+0x0/0x10
kernel: [65098.694152] [814a0c78] wait_for_completion+0x18/0x20
kernel: [65098.694156] [8103ba19] do_fork+0x149/0x430
kernel: [65098.694160] [81109f04] ? mntput_no_expire+0x24/0x110
kernel: [65098.694164] [810534e0] ? set_one_prio+0x70/0xd0
kernel: [65098.694168] [8100fe60] sys_vfork+0x20/0x30
kernel: [65098.694171] [81009f73] stub_vfork+0x13/0x20
kernel: [65098.694174] [81009ba8] ? system_call_fastpath+0x16/0x1b



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
It seems that this reproduction case only happens after the system has
been used for some unknown amount of time. At that point, fork() hangs
can be triggered at will. If the instance is rebooted, the test case no
longer causes hangs.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Robbie Williamson
** Visibility changed to: Public



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
Attaching /proc/slabinfo from a system that can be used to cause fork()
hangs.

** Attachment added: "/proc/slabinfo from a sick instance"
   https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+attachment/1811279/+files/slabinfo.txt



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Mikhail P
Quick note: it is not just Lucid; seeing similar behavior in Karmic. In
fact, Matt, the node we looked at today was a Karmic node.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Mikhail P
Argh never mind, had my alphabet backwards. It was a 10.04 LTS aka Lucid
node.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
Some discussions on this are at http://twitter.com/#!/mjmalone

Video posted by
http://twitter.com/#!/jordansissel/status/30421571315175425



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
The following kernel stack was captured via echo t > /proc/sysrq-trigger
on a system in the fork()-hang state. The code for libctest is here:
https://gist.github.com/2d2b78987ea451c2edd6

6[853486.204130] libctest  R  running task0 13658   1417 0x
4[853486.204132]    88088b03a980 8800e3489ea8
4[853486.204134]  810909f7 88088b03aa48 88088b03a980 8800e3489ec8
4[853486.204137]  81109f04 880889a27c80 8002 8804d4470680
4[853486.204139] Call Trace:
4[853486.204141]  [810909f7] ? __call_rcu+0x77/0x1c0
4[853486.204143]  [81109f04] ? mntput_no_expire+0x24/0x110
4[853486.204145]  [811078ab] ? alloc_fd+0x4b/0x160
4[853486.204148]  [810f79f0] ? putname+0x30/0x50
4[853486.204150]  [810eb526] ? do_sys_open+0x106/0x160
4[853486.204152]  [810eb5ab] ? sys_open+0x1b/0x20
4[853486.204154]  [81009ba8] ? system_call_fastpath+0x16/0x1b
4[853486.204157]  [81009b40] ? system_call+0x0/0x52
4[853486.204159]  [81009b40] ? system_call+0x0/0x52



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Gavin
I work at a company that has had very similar issues.

One of the suspected theories (because this seems to strike us even
within the JVM where we aren't forking) is that it's a problem between
the hypervisor and the hardware.

What kind of hardware were you running when you found this problem? We
found that there are at least three varieties of m1.xlarge CPU
architectures (cat /proc/cpuinfo). Easiest way to distinguish is by
checking the cpu cache size. We've seen 1M cache, 4M and 6M. The problem
you're detailing has been consistently reproduced for us on the 1M cache
variety of instances.
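
For anyone who wants to tag instances automatically, the cache size check can be scripted. This is a small sketch, assuming the "cache size : N KB" line format that x86 Linux kernels print in /proc/cpuinfo; parse_cache_kb and first_cpu_cache_kb are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/cpuinfo line. Returns the cache size in KB,
 * or -1 if the line is not a "cache size" line. */
long parse_cache_kb(const char *line) {
    assert(line != NULL);
    long kb;
    if (strncmp(line, "cache size", 10) != 0)
        return -1;
    const char *colon = strchr(line, ':');
    if (colon == NULL || sscanf(colon + 1, "%ld KB", &kb) != 1)
        return -1;
    return kb;
}

/* Returns the cache size of the first CPU listed in `path`
 * (normally "/proc/cpuinfo"), or -1 if none is found. */
long first_cpu_cache_kb(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[256];
    long kb = -1;
    while (kb < 0 && fgets(buf, sizeof buf, f) != NULL)
        kb = parse_cache_kb(buf);
    fclose(f);
    return kb;
}
```

first_cpu_cache_kb("/proc/cpuinfo") could then be logged at boot next to the instance ID, so that hangs can later be correlated with the CPU variety.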

Is this true for you, too?



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Gavin
Bad architecture:

vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Xeon(R) CPU   E5507  @ 2.27GHz
stepping    : 5
cpu MHz     : 2266.746
cache size  : 4096 KB

Good architecture:

vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU   E5430  @ 2.66GHz
stepping    : 10
cpu MHz     : 2660.000
cache size  : 6144 KB



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Gavin
Correction, we see the problem with the 4M cache CPUs.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
Gavin,

Can you reproduce the issue at will? I'm struggling to find a way to
reproduce the issue on a freshly booted instance.



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Mike Malone
The node we were working on this morning was:

vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Xeon(R) CPU   X5550  @ 2.67GHz
stepping    : 5
cpu MHz     : 2666.760
cache size  : 8192 KB



[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2

2011-01-27 Thread Matt Wilson
Mike, can you click on "affects me" for this bug?
