Re: [Bug 726461] Re: sshd on lucid causes kernel panic
I'm afraid it's not practical to test on the newest Ubuntu in this environment. I'm absolutely convinced this is sshd related because if I take SSH out of the equation, I cannot reproduce this at all, yet it does it consistently when SSH is used to execute commands on the box as part of the backup process. I've worked around it for now by switching the host that runs the backup script and it appears to be stable. I'm happy to leave it at that as it's now at least working. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/726461 Title: sshd on lucid causes kernel panic -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
> Please install linux-crashdump package and reboot the machine. This will enable core dump once the crash happens again.

I have already done that, and no core dump was stored. I have to get this fixed as it's causing us problems now. I'm going to reinstall 10.04.2 32-bit and see if we still have a problem, as that server only started having this problem after moving to 64-bit; it ran solidly on 32-bit for two years before. I can move the software I needed 64-bit for elsewhere.

Alex
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
The box messed about again yesterday. This time I was still able to Ctrl+F1, Ctrl+F2 between terminals, but it wouldn't accept any console input, nor would it respond to anything except pings on the LAN. The Num/Caps/Scroll Lock LEDs on the keyboard were flashing. On reset there doesn't seem to have been a core dump, though. This time OCFS2 was definitely not involved, as it wasn't in use at the time.
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
I'm not at work this week, but I'll look at it when I get back. Touch wood the extra RAM seems to have stopped it happening so far this week.
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
It's using the native DLM. However, it's the only node in the cluster. Our live VM environment is Ubuntu server running KVM and OCFS2 and we have 2 nodes. This box is just a backup server which takes an lvm2 snapshot of the OCFS2 filesystem, exports it over iSCSI and then as a separate cluster mounts the filesystem and copies the VM images off.

Alex
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
The OCFS2 filesystem is held on an OpenFiler box. Cluster A is our live VM system with two nodes. Cluster B is the backup system (the box in question).

Once a week, the OpenFiler box takes an LVM snapshot of the filesystem normally used by cluster A and exports it as a new iSCSI target. The backup server, which is the sole member of cluster B, then connects to the new iSCSI target and mounts the OCFS2 filesystem. It then copies the VM images from inside the OCFS2 filesystem to local disk. It then unmounts the OCFS2 filesystem and disconnects from the iSCSI target. The OpenFiler box then stops exporting the backup target and destroys the snapshot.

So yes, the backup server is using OCFS2 properly, but not in the production environment cluster, since we're dealing with historical data in the snapshot.

Alex
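For readers following the thread, the backup server's side of the weekly cycle described above can be written down as an ordered command plan. This is only a sketch, not the actual backup script: the portal address, target IQN, device node, mount point and destination directory are hypothetical placeholders, and the use of rsync for the copy is an assumption (the message only says the images are copied to local disk). The snapshot creation, export and teardown happen on the OpenFiler box and are not shown.

```python
import shlex


def build_backup_plan(portal, target_iqn, device, mountpoint, dest):
    """Return the backup server's steps as argv lists, in execution order.

    Nothing is executed here; the plan is only assembled so it can be
    reviewed or logged before being run.
    """
    return [
        # Discover the freshly exported snapshot target on the filer.
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        # Log in to the target so the snapshot appears as a block device.
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        # Mount the OCFS2 filesystem held in the snapshot.
        ["mount", "-t", "ocfs2", device, mountpoint],
        # Copy the VM images to local disk (rsync is an assumption).
        ["rsync", "-a", mountpoint + "/", dest + "/"],
        # Tear down in reverse order: unmount, then log out of the target.
        ["umount", mountpoint],
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--logout"],
    ]


def render(plan):
    """Render each step as a shell-quoted command line for review."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in plan]
```

Calling `render(build_backup_plan(...))` with site-specific values gives a list of command lines that can be inspected before anything is run, which is useful when trying to narrow down which step coincides with a lockup.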
Re: [Bug 726461] Re: sshd on lucid causes kernel panic
> Has every instance of this bug involved ssh in the stack trace?

All the ones we've had to date. I initially thought it was AppArmor-related because it mentions memory allocation etc., but having removed the packages and rebuilt the initramfs, I guess not. If it happens again from now on, I'll make sure we always get a photo.

> Can you describe the software raid stack in detail?

It's really simple: 4x 2TB drives, each with two partitions, the first being 2GB and the second the remainder. sda1 and sdb1 are RAID1 as /boot, sdc1 and sdd1 are RAID1 for swap, and sda2, sdb2, sdc2 and sdd2 are RAID5 as /. Currently the / filesystem is about 70% full.

> My first guess would be that OCFS is to blame. Would it be possible to run this server for awhile without it, or is that impossible? Can you either do without the SAN, or mount it as another fs type?

It's not really possible. The machine only mounts a snapshot of our live SAN via iSCSI once a week to copy over virtual machine images. It TENDS to lock up at those times; however, that's also by far the busiest time on the box, as it'll be running rsnapshot backups at the same time too. It's not unusual for the box to have 180% IO WAIT across the two CPUs at those times.

We've put more RAM in the box again this week to see if that solves it. I'm wondering if it's starting to swap to the point that the box is getting overloaded. Time will tell if that has any effect. It's hard because munin gives some idea of things like load/memory usage as the box loads up, but doesn't sample frequently enough to catch a spike!

Cheers

Alex
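One way to catch spikes that a coarse munin sampling interval misses is to sample `/proc/stat` far more often during the backup window. The following is a minimal sketch, assuming the standard Linux `/proc/stat` field order; the function names, the one-second interval and the 50% threshold are illustrative choices, not anything used on the machine in this report. It reports iowait as a share of all CPU time, so a figure summed across two CPUs, such as the 180% mentioned above, would correspond to roughly 90% here.

```python
import time


def parse_cpu_line(line):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted in user/nice, so sum only the first eight.
    return values[4], sum(values[:8])


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    d_iowait = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_iowait / d_total if d_total > 0 else 0.0


def read_sample(path="/proc/stat"):
    """Read one sample; the aggregate 'cpu' line is the first line of the file."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(interval=1.0, threshold=50.0):
    """Print a timestamped line whenever iowait meets or exceeds the threshold."""
    prev = read_sample()
    while True:
        time.sleep(interval)
        cur = read_sample()
        pct = iowait_percent(prev, cur)
        if pct >= threshold:
            print("%s iowait %.1f%%" % (time.strftime("%Y-%m-%d %H:%M:%S"), pct), flush=True)
        prev = cur
```

Running `watch()` with its output redirected to a file on local disk would leave a record of the last spikes seen before a hang, which could then be compared against the time of the lockup.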