[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I want to report that the CPU time bug is still present in the following kernel 2.6.32-317-ec2 on AWS instance m1.large with CPU model name : Intel(R) Xeon(R) CPU E5507 @ 2.27GHz I logged the output of ps every minute and I can now show the bug happening, as you can see the CPU time originally at 35790:43 does get corrupted and the next minute jumps at 17179869:11 to reach 29322904:44 1 minute later www-data 8269 31.3 9.2 853516 725496 ? Sl Aug23 35789:36 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... www-data 8269 31.3 9.2 866128 731640 ? Sl Aug23 35790:09 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... www-data 8269 31.3 9.3 868704 738800 ? Sl Aug23 35790:43 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... www-data 8269 31.3 9.4 872964 742660 ? Sl Aug23 35791:17 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... www-data 8269 120093018 9.4 885060 748036 ? Sl Aug23 17179869:11 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... www-data 8269 10652 9.3 868748 739144 ? Sl Aug23 29322904:44 /usr/local/lib/erlang/erts-5.8.4/bin/beam.smp -P ... It took us around 3 months running this process at around 30% load to reach this condition Because of this problem the machine currently reports a load of 0 when instead we know it should report between 20 and 40% depending by the moment of the day. Please let me know If I need to open a specific ticket for this issue. Today we're going to upgrade to 2.6.32-319-ec2 thanks, Paolo Negri -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Reading back through the previous comments I think this should be handled a a separate issue. The original issue caused processes not to be forked while I understand this seems to be rather load/system time related but not as fatal as the main issue of this report. Could you please open a new ticket for this? And let me know the bug number. Thanks -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
new ticket opened at https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/889012 -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Based on the latest comments (unfortunately this bug was not referenced when committing the change), I am closing this bug as fixed. If anybody thinks it is still a problem, feel free to re-open it (or a new report). Thanks. ** Changed in: linux-ec2 (Ubuntu) Status: New = Fix Released -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
FWIW, I upgraded to 2.6.32-316-ec2 #31 on Ubuntu 10.04 running on an E5507 and I have not run into this issue on any of the boxes I'm running since the upgrade. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to the bug report. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi Stefan, we experienced the kernel lockups under heavy load on Ubuntu 10.04 running the standard 2.6.32-314-ec2 kernel from the ami-3202f25b (us- east-1). The underlying hardware is a Intel(R) Xeon(R) CPU E5507 @ 2.27GHz w/ 4MB cache. I have installed the linux- image-2.6.32-317-ec2_2.6.32-317.31~preview2_amd64 from .deb and now the lockups are gone. Cheers, Jan -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Jan, hopefully we got the fix for this in the next official update. It currently is waiting to clear proposed but was delayed by findings in some of the changes used by all kernels. But thanks for testing the preview kernel. That is build from the bigger set of changes which I would like to merge in the future. And positive feedback helps to argue for that change. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Ah, good to know (#62). Next time we'll use the new instances. By the way, it's running non-stop since last week. Great job! -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hey, Stefan, We're having instances hanging throughout the day (https://forums.aws.amazon.com/thread.jspa?threadID=68692) and it looks like it may be related to this. Looking at your comment (#53) I wanted to apply the changes you mentioned, but I need some help on how to install them. I'm familiar with apt-get install, etc, but I'm not sure what I need to do to use the packages you prepared. Would you let me know how I can do that? Thank you. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Nevermind, I found out how to do that, i.e. launch new instances using AMI (ami-40db2229) with matching AKI (aki-427d952b). It's all running now... hopeful that it won't hang anymore. Thank you! -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Eduardo, Just fyi, any recent (this year) lucid image will already use the pv-grub kernels by default. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi Robert, thanks for the update and really good to hear. The bug you mentioned might be one of the possible reasons. There also have been changes to change timer and ipi interrupts to be per-cpu and the rest to use fasteoi instead of level (moving around masking code as well). Having irqs use the level based handler was claimed to potentially cause missing interrupts which may or may not have some impact in this case. The other changes to process time accounting probably only have impact on runtime output which was showing impossible values (larger than uptime) but who knows. So yeah, one of the above likely. :) -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Robert, Can you confirm what CPU is running on that node? The Librato guys have apparently narrowed it down to only happening on the Intel E5507. See this blog post for more: https://silverline.librato.com/blog/main/EC2_Users_Should_be_Cautious_When_Booting_Ubuntu_10_04_AMIs I checked one of our problem nodes that's still up and sure enough, it's running on an E5507. We have another node using the same CPU but with the same test kernel that was posted by Stefan above. So far, it looks ok, but we haven't put too much load on it yet and it's only been about 4 days. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Which would be exactly the bug Robert was pointing to. Hrm, it sounds more and more sensible to pick fix up sooner than later as the full changeset is harder to get approved. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi Stefan, Just an update. That same node has run flawlessly under high load since my last post, no lockups. What are your thoughts on the claim that this bug https://bugs.launchpad.net/ubuntu/+source/linux/+bug/727459 is the cause of the problem? B. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Stefan, I've been testing your kernels and stability seems much improved. It's only been a few days but the server has been under quite a lot of load without an issue, with the stock ec2 kernel this was definitely not the case. Cautious optimism, but I'd love to isolate which of the patches was the fix. do you have a list of the patches you incorporated? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Robert, that really sounds like good news. To answer the question what exactly improved things could get a bit difficult. It also depends a bit what exactly the last stock ec2 kernel was you are comparing against. I had been doing a first round of changes which Jeremy has been testing (from what he wrote I was assuming that the fork/clone issue was gone but still some process time accounting going wrong). With this second, bigger round of changes, there were changes to the interrupt handling and also some timer, and accounting changes which could have have an effect. But also quite a bit more. If you want to look at the changes yourself, I try to keep the delta up to date as long as it is not in the stock kernel at git://kernel.ubuntu.com/smb/ubuntu-lucid.git ec2-next. The reason this quite big is a bit of a long story. In short the ec2 code is based on a patch-set that is rebased. And we don't know exactly against what. So individual patches can change without proper explanation why. I tried to document things as much as I was able, still it is not always nice. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
This is a general request for some testing help for Ubuntu 10.04 (Lucid) on ec2. If there are other problem reports about strange process cpu times or hangs without any message it would be good to pass this on. Trying to pull in a wide range of upstream fixes, I have prepared linux-ec2 packages that include all the fixes to Xen code which could be found in other places. This is quite a big change, so while I have successfully booted the test kernels on various ec2 instances, this usually does not really provide much insight about what happens when facing more or less real workloads. So anybody who is willing and able to help testing would be highly welcome. The kernel packages are at: http://people.canonical.com/~smb/lucid-ec2-ng/ To use those, http://uec-images.ubuntu.com/lucid/current/ has current AMIs and using --kernel with a matching AKI from https://lists.ubuntu.com/archives/ubuntu-cloud/2010-December/000466.html will use pv-grub for booting. So the new kernel can just be installed via dpkg and activated by rebooting. Please report back anything that seems to be worse or better than before. Thanks! -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Since no one developed a reproducible test case it is, unfortunately, difficult to say whether this bug is resolved. We moved to a 2.6.35 series kernel and stopped seeing this particular problem (although we have seen other problems that are eerily similar). All of the information to reproduce the behavior is in the comments above, but it does take a lot of time and patience to trigger it. Either that, or hundreds of instances running with production load. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
The main problem here is the question whether there still is a problem. There has been a first round of cleanup patches going into the ec2 kernels with 2.6.32-314.27. The last update/response I got was from Jeremy and he seemed at least not to see the blocking behavior with those. So as long as there are no confirmations that this particular problem was still observed with kernels after 314.27, I would tend to close this ticket as resolved. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
This ticket has been idle for a month, anyone have a resolution yet? :( -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Stefan, The test machine did not exhibit the fork/clone behavior in this bug. The wrong cpu times were pretty bad though and perhaps unrelated to this bug. The experiment lasted 3 days on a loaded 4096meg cache large instance. That *should have been long enough but with this bug it's tough to tell. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi Jeremy, I have not heard back, so I am wondering whether the hang/crashing occurred after all? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Stefan, I have some boxes that seem to reproduce this behaviour rather frequently. This is great news since this bug is so hard to reproduce. I just launched with the new testing kernels you provided (aki-9ab546f3 x86_64) on a server that has 4096meg cache (the bad behaving size). Unfortunately I'm already seeing signs that this kernel has the CPU reporting bug where CPU times are in the thousands of days. Usually this means the box will be crashing in the next 1-2 days.. Will post back when I confirm that the fork bug still exists in these kernels. The CPUtime bug still exists. It was speculated that these were unrelated.. This machine was just launched and within minutes it displayed wrong cpu times. process list snip from top: 26614 root 20 0 570m 448m 4972 S0 2.9 0:39.41 ruby 26671 root 20 0 539m 419m 4948 S0 2.7 0:35.79 ruby 28441 root 20 0 528m 409m 4212 S0 2.7 24207187w ruby 26310 root 20 0 523m 405m 5040 S3 2.6 0:51.46 ruby 29320 root 20 0 523m 405m 4948 S0 2.6 91970855d ruby -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Same as Alex - we were having this issue with our production servers on 2.6.32-309-ec2 kernel. After upgrading to Maverick (2.6.35-24-virtual) issue is gone. So far (5 days) we haven't met this bug. I would also know about kernel version that doesn't have this issue anymore. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Well maybe the kernel I posted. Kind of hard to say without anybody testing it. ;-) The problem is that Lucid and Maverick kernels for EC2 are very different approaches. Maverick already has enough Xen code in the mainline kernel that only the configuration is different to the normal kernel. Lucid required a substantial number of patches on top of the code that was upstream and the approach used there has the risk of getting a skew between changes to the generic kernel code and the xen code. In the test kernels, at least that skew should be fixed and maybe that was causing this issue. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
We are now seeing this on a whole bunch of production machines all running 10.04, 2.6.32-312-ec2. Has anyone confirmed a kernel (old or new) that doesn't have this bug? We've just upgraded one machine to 10.10 to see if it fixes the issue. We haven't tried Stefan's kernels, but may if 10.10 doesn't show improvement. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi guys, I've been following this bug very closely and I just had it happen to me on an Instance running Maverick. Kernel version was 2.6.35-22-virtual and I realize a newer -25- is out. Will see if I can repro with that as well. This was a classic example of the bug, it happened during a large apt-get install sequence and stopped allowing forking of new processes. Anyone else seen this in Maverick (10.10)? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
While there is not yet a patch that I would suspect of being the one fixing this issue, there were a larger number of changes that had been done upstream but missed the Lucid ec2 kernel because it is using xen specific copies of x86 files. Some of them will have no effect as compile options cause the code not to be included but some were in areas which may have effect on this issue. I have prepared some test kernel packages (and also some test aki's) which contain the sync with all current stable updates. It would be awesome if someone could check that kernel and see whether it removes the problem already (at least not introduces new ones). I booted a i386 and amd64 instance with it and could not see immediate problems. The debian packages (for pv-grub) would be at http://people.canonical.com/~smb/lp708920/, the aki's are: us-west-1 aki-4faefe0ax86_64 us-east-1 aki-9ab546f3x86_64 eu-west-1 aki-bdf2c6c9x86_64 ap-southeast-1 aki-8689f7d4x86_64 us-west-1 aki-49aefe0ci386 us-east-1 aki-6cb44705i386 eu-west-1 aki-a7f2c6d3i386 ap-southeast-1 aki-8c89f7dei386 -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I tried the approach from comment #30 and also comment #37 with ami- fa01f193 in us-east. This runs a 2.6.32-312.24 kernel currently in proposed and was running on hw that showed 6M cache in cpuinfo. Both methods did not cause the scheduler bug for me. I will have a look at the source code and see whether something can be found there. This unfortunately is a bit bulky. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
The commit mentioned in comment #39 is included in the Ubuntu-2.6.32-311.23 EC2 kernel. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Mike, You bring up a good point about CFS' need for good process time accounting. I think that this upstream patch may fix a lot of problems: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8a22b9996b001c88f2bfb54c6de6a05fc39e177a This patch is in 2.6.34.7, may not be in the 10.04 kernel. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hey Matt, I ran it once while the system was idle: https://gist.github.com/fb35566354afc442bf2d And then again in a tight loop with a 50ms sleep while I locked the system up, in the hopes of catching something interesting just before or during the period when the system was locked: https://gist.github.com/ce0887dcdc125afcca94 -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I've done a lot of looking at this today. It feels like the problem may lie in the process scheduler. When I pin the CPU burning process to CPU0 (through taskset -pc 0 $pid_printed_by_a_out), and pin a bash shell also to CPU0, I see failure of the bash process to wake after sleeping (i.e., it's runnable, but CFS isn't giving it time). I've seen the bash process start to be scheduled after around 3 minutes, and I've also seen it just sit there. Every time I've seen a scheduler debug trace (triggered via echo w /proc/sysrq-trigger), there have been other runnable processes on the spinning CPU that don't seem to be getting scheduled at all. I've not been able to reproduce this problem on the kernel used in the Amazon Linux AMI (currently 2.6.34.7). This is in line with other user's observations (http://twitter.com/#!/synack/status/30415380321140737). I think that Canonical might need to look into what (if any) changes they've made to CFS in the 10.04 kernel tree. It's also possible that improvements have been made in CFS between 2.6.32 and 2.6.34 that account for better performance. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Matt, That's definitely in line with what we're seeing. Not to beat a dead horse, and I am not a kernel hacker (so I apologize for any naivety), etc... but that's basically what led me to believe the CLOCK_PROCESS_CPUTIME_ID timers, top issues, and other CPU time monitoring may be relevant here. Based on my rather high level understanding of CFS -- the big idea being that processes are scheduled based on their run time -- it seems plausible that a process that appears to be using no CPU time would continue to be prioritized by the scheduler. I'll leave any further analysis to someone who knows how these systems work. We have done a bit more testing on the 2.6.35 series kernels, with Ubuntu 10.10 (Maverick). So far we have not been able to reproduce this issue there. If we do I'll report back, but for now we are moving forward with migrating effected systems to Maverick. We're happy to assist with efforts to track this further, but I think we've documented most of what we've experienced. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I am running Eclipse on EC2 over NX and this bug surfaces quite quickly, in the form of total system hangs for .5 to 5 minutes at a time. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Alec, Do any hung task kernel stack traces get emitted during your hangs? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Starting irqbalance (without rebooting) on a node in a state where fork() will hang does not help. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I see this in a particularly annoying way. apt-get when run from another tool (like puppet) will have /dev/null for stdin. apt-get foolishly select(2)s on stdin which results in 100% cpu usage (stdin is always ready when it is /dev/null). This 100% cpu qualifies for the fork-blocking, and since apt-get forks dpkg (which again forks dpkg) during installation, dpkg never actually forks, so apt-get stalls the entire system for hours if it ever even comes back. I found a work around for *only* the apt-get or similar situations in that running the process through nice(8) seems to allow fork to unwedge. This is obviously impractical for low-latency services like databases, etc, but can help for others. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Jordan, Do you see this behavior at boot, or only after your instance has been up and running for a while? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
@Matt, No I don't see anything particularly unusual in the logs at all :\ Also, as suggested by Gavin, I rebooted my EC2 instance until I got a machine with CPU model 23 and have not had a repeat of the issue. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Alec, Do you have instance IDs from your hanging instances? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
The first one I rebuilt, as I thought it might have been related to a dud EBS volume, but the current one is i-55a92d39. Mail me at a...@swapoff.org if you want to take this offline. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
If anyone has a machine that they can get into the hanging state (with fork() blocking), can you run run echo w /proc/sysrq-trigger as root and post the results? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Hi Mike, Let's focus on the fork() hangs in this bug. It's true that the two could be related, but the symptoms don't quite line up. You say you can reproduce the behavior on 2.3.32-311. Do you have a procedure for getting an instance into the broken state, so you can then cause fork() hangs with spinning CPUs? Also, is irqbalance running on your instances? If not, does the fork() hanging behavior change if irqbalance is running on the instance (started at boot)? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Matt, Still no way to reproduce deterministically. I've just been running variations of the test I posted above, writing to /dev/null and setting timer signals. At some point the tests/system start hanging. irqbalance is not running on any of the test instances that are hanging (it appears to be started automatically on the Maverick ami, but that node is happy). I believe we've run it in the past on some nodes. I'll check if we there are any that are hanging that also have irqbalance started at boot. I know that I personally started irqbalance on some nodes after boot to see if that would fix things (to no avail), but it wasn't started at boot. If we don't have any running nodes with irqbalance I'll have to reboot a node and start over, which may take some time. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
If you have an instance in a state where fork() will hang if you spin a CPU, it would be a good experiment to see if irqbalance helps at all. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I am able to reproduce this behavior on an instance running kernel version 2.3.32-311 using ami-f8f40591. As with the older kernel, new instances don't immediately exhibit symptoms. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Fat fingered that kernel version. Should be 2.6.32-311-ec2. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Also, potentially related, the 2.6.32 kernels seem to sometimes drop timer signals for CLOCK_PROCESS_CPUTIME_ID and continue to report unusual process CPU times in system monitoring tools like top and via the proc filesystem. I can reproduce the signal behavior with this program: https://gist.github.com/14585329e013d1bf5134 -- some runs are uneventful and proceed as expected, others hang forever without ever receiving a signal. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Can someone confirm whether this is an issue with a recent version of he kernel. Unsuccessfully tried the sample program with 2.6.32-311-ec2. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
ami-fd4aa494 is one of our official images: us-east-1 ubuntu- lucid-10.04-amd64-server-20100427.1 That said, its very old. The build there (unless changed) is running kernel linux-image-2.6.32-305-ec2 2.6.32-305.9 The most recent lucid amd64 is: ami-f8f40591 ubuntu-lucid-10.04-amd64-server-20101228 running kernel linux-image-2.6.32-311-ec2 2.6.32-311.23 You can see the extensive list of changes between the two kernel versions at http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-ec2/linux-ec2_2.6.32-311.23/changelog You should be able to try the new kernel by a.) specify it at launch time ec2-run-instances --kernel aki-74f4051d ami-fd4aa494 ... Then, apt-get update apt-get upgrade to get yourself the kernel modules to match the running kernel. This should work, as the lucid kernel has networking and disk builtin b.) try the newer ami 'ami-f8f40591' c.) install grub-legacy-ec2, stop instance, start instance with pv-grub aki aki-427d952b. From then on, you could reboot into a different kernel. Determining if we can reproduce this on a recent kernel will just give us much more information to go from, and avoid chasing ghosts. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
We are running ami-fd4aa494 with 2.6.32-305-ec2 in us-east. I'll see what I can do about setting up a couple nodes with the more recent 2.6.32 kernel build and report back. We've already started running a few Maverick instances with 2.6.35-24-virtual, and so far they appear to be more stable. Unfortunately, the issue is not easy to reproduce initially (as evidenced by the efforts in this thread). Recently restarted instances appear to be more stable than those that have been running for a while under load. It looks like some heisenbug gets the system into a sideways state, and once that happens you can lock things up pretty deterministically with something as trivial as a tight loop. So it's possible that Maverick will go sideways too at some point and we simply haven't seen it yet. Hard to say for sure without knowing what the trigger is. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
We've booted up a couple instances using the new kernel and are attempting to break them. As Mike pointed out, it is hard for us to prove a negative; this is something that we can reliably reproduce once we get the machine to do it once, but before that it is inconsistent. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
We're still working on a way to repro this on a new instance. Meanwhile, we're moving forward with testing on the newer 2.6.32 kernel and on Maverick's 2.6.35 kernel. One observation we have made is that if we run the libctest in a loop (`while :; do ./libctest; done`) on 2.6.32 it will eventually hang the process (apparently forever, I've left it as long as 4 hours). This behavior is reproducible on a fresh instance and happens on both 2.6.32-305 and 2.6.32-311. We are NOT able to repro on 2.6.35. Could be unrelated, or expected behavior, but from my review of the libctest code what we're seeing does appear pathological. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
-- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
This is a transcription of the test program from the youtube video: #include unistd.h #include stdlib.h #include stdio.h int main(int argc, char **argv) { int children = 0; int status; int i = 0; if (argc 2) { printf(Usage: %s children\n, argv[0]); return 1; } children = atoi(argv[1]); for (i = 0; i children; i++) { printf(Starting child %d\n, i); if (fork() == 0) { fprintf(stderr, child%d up %d\n, i, getpid()); while (1) { /* Nothing, just spin */ } } else { // parent continues } } wait(status); return 0; } ** Visibility changed to: Private -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
On a system in this condition, sometimes hung task traces are seen: kernel: [65098.694112] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. kernel: [65098.694117] cron D 880001885380 0 21248 569 0x kernel: [65098.694121] 880772e25d20 0282 0001 880772e25ca0 kernel: [65098.694124] 0002 880772e25ce8 8802e100c678 880772e25fd8 kernel: [65098.694126] 8802e100c2c0 8802e100c2c0 8802e100c2c0 880772e25fd8 kernel: [65098.694128] Call Trace: kernel: [65098.694137] [810e312a] ? cache_alloc_refill+0x6a/0x270 kernel: [65098.694141] [814a199d] schedule_timeout+0x1dd/0x2c0 kernel: [65098.694145] [814a0b02] wait_for_common+0xf2/0x1d0 kernel: [65098.694150] [81030260] ? default_wake_function+0x0/0x10 kernel: [65098.694152] [814a0c78] wait_for_completion+0x18/0x20 kernel: [65098.694156] [8103ba19] do_fork+0x149/0x430 kernel: [65098.694160] [81109f04] ? mntput_no_expire+0x24/0x110 kernel: [65098.694164] [810534e0] ? set_one_prio+0x70/0xd0 kernel: [65098.694168] [8100fe60] sys_vfork+0x20/0x30 kernel: [65098.694171] [81009f73] stub_vfork+0x13/0x20 kernel: [65098.694174] [81009ba8] ? system_call_fastpath+0x16/0x1b -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
It seems that this reproduction case only happens after the system has been used for some unknown amount of time. At that point, fork() hangs can be triggered at will. If the instance is rebooted, the test case no longer causes hangs. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
** Visibility changed to: Public -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Attaching /proc/slabinfo from a system that can be used to cause fork() hangs. ** Attachment added: /proc/slabinfo from a sick instance https://bugs.launchpad.net/ubuntu/+source/linux-ec2/+bug/708920/+attachment/1811279/+files/slabinfo.txt -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Quick note: it is not just Lucid; seeing similar behavior in Karmic. In fact, Matt, the node we looked at today was a Karmic node. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Argh never mind, had my alphabet backwards. It was a 10.04 LTS aka Lucid node. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Some discussions on this are at http://twitter.com/#!/mjmalone Video posted by http://twitter.com/#!/jordansissel/status/30421571315175425 -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
The following kernel stack was captured on a system in fork() hangs state via echo t /proc/sysrq-trigger. The code for libctest is here: https://gist.github.com/2d2b78987ea451c2edd6 6[853486.204130] libctest R running task0 13658 1417 0x 4[853486.204132] 88088b03a980 8800e3489ea8 4[853486.204134] 810909f7 88088b03aa48 88088b03a980 8800e3489ec8 4[853486.204137] 81109f04 880889a27c80 8002 8804d4470680 4[853486.204139] Call Trace: 4[853486.204141] [810909f7] ? __call_rcu+0x77/0x1c0 4[853486.204143] [81109f04] ? mntput_no_expire+0x24/0x110 4[853486.204145] [811078ab] ? alloc_fd+0x4b/0x160 4[853486.204148] [810f79f0] ? putname+0x30/0x50 4[853486.204150] [810eb526] ? do_sys_open+0x106/0x160 4[853486.204152] [810eb5ab] ? sys_open+0x1b/0x20 4[853486.204154] [81009ba8] ? system_call_fastpath+0x16/0x1b 4[853486.204157] [81009b40] ? system_call+0x0/0x52 4[853486.204159] [81009b40] ? system_call+0x0/0x52 -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
I work at a company that has had very similar issues. One of the suspected theories (because this seems to strike us even within the JVM where we aren't forking) is that it's a problem between the hypervisor and the hardware. What kind of hardware were you running when you found this problem? We found that there are at least three varieties of m1.xlarge CPU architectures (cat /proc/cpuinfo). Easiest way to distinguish is by checking the cpu cache size. We've seen 1M cache, 4M and 6M. The problem you're detailing has been consistently reproduced for us on the 1M cache variety of instances. Is this true for you, too? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Bad architecture: vendor_id : GenuineIntel cpu family : 6 model : 26 model name : Intel(R) Xeon(R) CPU E5507 @ 2.27GHz stepping: 5 cpu MHz : 2266.746 cache size : 4096 KB Good architecture: vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz stepping: 10 cpu MHz : 2660.000 cache size : 6144 KB -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Correction, we see the problem with the 4M cache CPUs. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Gavin, Can you reproduce the issue at will? I'm struggling to find a way to reproduce the issue on a freshly booted instance. -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
The node we were working on this morning was: vendor_id : GenuineIntel cpu family : 6 model : 26 model name : Intel(R) Xeon(R) CPU X5550 @ 2.67GHz stepping: 5 cpu MHz : 2666.760 cache size : 8192 KB -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 708920] Re: Strange 'fork/clone' blocking behavior under high cpu usage on EC2
Mike, can you click on the affects me for this bug? -- You received this bug notification because you are a member of Ubuntu Bugs, which is a direct subscriber. https://bugs.launchpad.net/bugs/708920 Title: Strange 'fork/clone' blocking behavior under high cpu usage on EC2 -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs