Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-28 Thread Alex Harrington
I'm afraid it's not practical to test on the newest Ubuntu in this
environment.

I'm absolutely convinced this is sshd-related: if I take SSH out of
the equation I cannot reproduce it at all, yet it happens
consistently when SSH is used to execute commands on the box as part
of the backup process.

I've worked around it for now by switching the host that runs the
backup script and it appears to be stable. I'm happy to leave it at
that as it's now at least working.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/726461

Title:
  sshd on lucid causes kernel panic

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-19 Thread Alex Harrington
> Please install linux-crashdump package and reboot the machine. This will
> enable core dump once the crash happens again.

I have already done that, and no core dump was stored.
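
One thing worth ruling out when no dump appears is that the crash
kernel was never reserved or loaded. The following is a minimal sketch
of such a check, not part of linux-crashdump itself. It assumes a Linux
host that exposes /proc/cmdline and /sys/kernel/kexec_crash_loaded; the
function names are hypothetical.

```python
from pathlib import Path


def crash_dump_ready(cmdline: str, kexec_crash_loaded: str) -> bool:
    """Return True if a crash kernel is both reserved and loaded.

    cmdline            -- contents of /proc/cmdline
    kexec_crash_loaded -- contents of /sys/kernel/kexec_crash_loaded
    """
    # Memory for the crash kernel is reserved with a crashkernel= boot argument.
    has_reservation = any(
        arg.startswith("crashkernel=") for arg in cmdline.split()
    )
    # The kernel reports "1" here once a crash kernel has been loaded.
    return has_reservation and kexec_crash_loaded.strip() == "1"


def read_text(path: str) -> str:
    """Read a file, returning an empty string if it does not exist."""
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    ready = crash_dump_ready(
        read_text("/proc/cmdline"),
        read_text("/sys/kernel/kexec_crash_loaded"),
    )
    print("crash kernel ready" if ready else "crash kernel NOT ready")
```

If this reports "NOT ready" after a reboot, a panic would have nowhere
to write a dump, which would be consistent with the empty result above.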

I have to get this fixed as it's causing us problems now. I'm going to
reinstall 10.04.2 32-bit and see if we still have a problem, as that
server only developed this problem after moving to 64-bit; it ran
solidly on 32-bit for two years before.

I can move the software I needed 64-bit for elsewhere.

Alex



Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-18 Thread Alex Harrington
The box messed about again yesterday. This time I was still able to
switch between terminals with Ctrl+F1 and Ctrl+F2, but it wouldn't
accept any console input, nor would it respond to anything except
pings on the LAN.

The Num/Caps/Scroll Lock LEDs on the keyboard were flashing. After the
reset there doesn't seem to have been a core dump, though.

This time OCFS2 was definitely not involved as it wasn't in use at the
time.

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to ocfs2-tools in ubuntu.
https://bugs.launchpad.net/bugs/726461

Title:
  sshd on lucid causes kernel panic

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-10 Thread Alex Harrington
I'm not at work this week, but I'll look at it when I get back. Touch
wood, the extra RAM seems to have stopped it happening so far this
week.



Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-03 Thread Alex Harrington
It's using the native DLM. However, it's the only node in the cluster.
Our live VM environment is Ubuntu Server running KVM and OCFS2, and we
have two nodes. This box is just a backup server which takes an LVM2
snapshot of the OCFS2 filesystem, exports it over iSCSI and then, as a
separate cluster, mounts the filesystem and copies the VM images off.

Alex



Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-03 Thread Alex Harrington
The OCFS2 filesystem is held on an OpenFiler box.

Cluster A is our live VM system with two nodes.
Cluster B is the backup system (the box in question).

Once a week, the OpenFiler box takes an LVM snapshot of the filesystem
normally used by cluster A and exports it as a new iSCSI target.
The backup server, which is the sole member of cluster B, then connects
to the new iSCSI target and mounts the OCFS2 filesystem.
It then copies the VM images from inside the OCFS2 filesystem to local
disk.
It then unmounts the OCFS2 filesystem and disconnects from the iSCSI
target.
The OpenFiler box then stops exporting the backup target and destroys
the snapshot.
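
The backup-server half of that cycle can be sketched as an ordered
command plan. This is only an illustration under stated assumptions,
not the actual backup script: the target name, portal, device and paths
are hypothetical placeholders, rsync is assumed as the copy tool, and
the OpenFiler-side snapshot and export steps are left out. The plan is
only printed, never executed.

```python
import shlex


def backup_cycle(target_iqn, portal, device, mountpoint, dest):
    """Return the backup-server commands for one weekly cycle, in order."""
    return [
        # Connect to the freshly exported snapshot target.
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        # Mount the OCFS2 filesystem held in the snapshot.
        ["mount", "-t", "ocfs2", device, mountpoint],
        # Copy the VM images to local disk (rsync is an assumption).
        ["rsync", "-a", mountpoint + "/", dest + "/"],
        # Tear down in reverse order: unmount first, then log out.
        ["umount", mountpoint],
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--logout"],
    ]


def render(commands):
    """Render each command as a shell-quoted line for a dry run."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


# Hypothetical values, for illustration only.
plan = backup_cycle(
    target_iqn="iqn.2011-03.example:vm-snapshot",
    portal="192.0.2.10:3260",
    device="/dev/disk/by-path/example-snapshot-lun",
    mountpoint="/mnt/vm-snapshot",
    dest="/srv/backup/vm-images",
)
for line in render(plan):
    print(line)
```

Keeping the unmount strictly before the iSCSI logout mirrors the order
described above.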

So yes, the backup server is using OCFS2 properly, but not in the
production environment cluster since we're dealing with historical
data in the snapshot.

Alex



Re: [Bug 726461] Re: sshd on lucid causes kernel panic

2011-03-02 Thread Alex Harrington
> Has every instance of this bug involved ssh in the stack trace?

All the ones we've had to date. I initially thought it was
AppArmor-related because the trace mentions memory allocation etc.,
but having removed the packages and rebuilt the initramfs, I guess not.

If it happens again from now on, I'll make sure we always get a photo.

> Can you describe the software raid stack in detail?

It's really simple. 4x 2TB drives, each with two partitions: the first
is 2GB and the second takes the remainder. sda1 and sdb1 are RAID1 as
/boot, sdc1 and sdd1 are RAID1 for swap, and sda2, sdb2, sdc2 and sdd2
are RAID5 as /.
Currently the / filesystem is about 70% full.
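
As a rough worked example of what that layout provides, the sketch
below computes usable capacity per array. It assumes decimal units
(2TB = 2000GB) and ignores md metadata and filesystem overhead, so the
figures are approximate; the helper name is hypothetical.

```python
def raid_usable_gb(level, members, member_gb):
    """Usable capacity of an md array, ignoring metadata overhead."""
    if level == 1:
        # A mirror stores one member's worth of data.
        return member_gb
    if level == 5:
        # One member's worth of space goes to parity.
        return (members - 1) * member_gb
    raise ValueError("unsupported RAID level: %r" % level)


DRIVE_GB = 2000                    # assumed decimal size of a 2TB drive
SMALL_GB = 2                       # first partition on each drive
LARGE_GB = DRIVE_GB - SMALL_GB     # remainder of each drive

boot_gb = raid_usable_gb(1, 2, SMALL_GB)   # sda1 + sdb1
swap_gb = raid_usable_gb(1, 2, SMALL_GB)   # sdc1 + sdd1
root_gb = raid_usable_gb(5, 4, LARGE_GB)   # sda2, sdb2, sdc2, sdd2
used_gb = 0.70 * root_gb                   # "about 70% full"

print("/boot: %d GB, swap: %d GB, /: %d GB" % (boot_gb, swap_gb, root_gb))
print("approx. used on /: %.0f GB" % used_gb)
```

Under those assumptions / is roughly 6TB with a little over 4TB in use,
and swap is limited to about 2GB.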

> My first guess would be that OCFS is to blame.  Would it be possible to
> run this server for awhile without it, or is that impossible?  Can you
> either do without the SAN, or mount it as another fs type?

It's not possible really. The machine only mounts a snapshot of our
live SAN via iSCSI once a week to copy over virtual machine images. It
TENDS to lock up at those times; however, it's also by far the busiest
time on the box, as it'll be running rsnapshot backups at the same time
too. It's not unusual for the box to have 180% IO WAIT across the two
CPUs at those times.

We've put more RAM in the box again this week to see if that solves
it. I'm wondering if it's starting to swap to the point that the box
is getting overloaded. Time will tell if that has any effect. It's
hard to tell because munin gives some idea of things like load/memory
usage as the box loads up, but doesn't sample frequently enough to
catch a spike!
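
One way to catch what munin misses would be a small sampler that logs
load and swap use every few seconds. This is a minimal sketch, assuming
a Linux /proc filesystem; the five-second interval and the log path are
arbitrary placeholders, and the function names are hypothetical.

```python
import time


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def sample():
    """Take one timestamped reading of load, free memory and swap use."""
    with open("/proc/loadavg") as f:
        load1 = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    swap_used = mem["SwapTotal"] - mem["SwapFree"]
    return "%s load1=%.2f memfree_kb=%d swap_used_kb=%d" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), load1, mem["MemFree"], swap_used)


def main(interval_s=5, logfile="/var/tmp/spike.log"):
    """Append one sample per interval until interrupted."""
    while True:
        # Reopen on every write so each line is flushed to disk promptly.
        with open(logfile, "a") as log:
            log.write(sample() + "\n")
        time.sleep(interval_s)
```

Calling main() before the weekly backup window starts would leave a
log whose last lines show the state just before any lock-up.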

Cheers

Alex


