[lxc-users] Getting kdump to work on an LXC server

2014-09-11 Thread Rod Bruce
Greetings,
I have been working on a problem the last couple of days and I believe I
have come up with a solution so I thought I would share it with the list
in case anybody else runs into this or someone has a better solution.


Problem:
I have had a server running Ubuntu 14.04 hang a couple of times. I try
to run everything using standard Ubuntu packages. The server is an LXC
host with two containers running on it (but several more planned). I
wanted to get a kernel core dump if it hung again so I started
investigating kdump/kexec. I installed, configured, and tested
kdump/kexec on another server and it worked as advertised. However, when
I tried it on the LXC server it would save the core dump OK but the
server would fail to reboot or hang at some other point in the process.

I noticed that when kexec was booting the secondary kernel it was
starting up all of the services that start on a normal boot, including
LXC, and that seemed to be causing a problem. When I set the containers
to not auto boot, kdump worked as expected. However, we want the
containers to auto boot so I had to come up with a different solution.


Things I tried that did not work:

- I added the parameter KDUMP_RUNLEVEL="1" to the
/etc/default/kdump-tools file. KDUMP_RUNLEVEL="1" is something I found
mentioned on a couple of pages but it is not in any of the man pages or
Ubuntu documentation.

- I uncommented the KDUMP_CMDLINE_APPEND parameter in the
/etc/default/kdump-tools file and changed the line to
KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb 1", which tells kexec
to boot into single-user mode. This did boot to single-user mode, but
single-user mode is not adequate: it asks for a root password (for
which there is a work-around) and it does not mount extra file systems
(like /var/crash).
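
When experimenting with several values it is easy to lose track of which
run-level the kdump kernel will actually be asked for. The sketch below
reports the trailing run-level word of the active KDUMP_CMDLINE_APPEND
line. The function name kdump_append_runlevel is hypothetical, and it
assumes the value is a single double-quoted string on one line, as in the
stock Ubuntu 14.04 file.

```shell
#!/bin/sh
# Hypothetical helper: print the run-level that KDUMP_CMDLINE_APPEND ends
# with, or "none" if the line is commented out or has no run-level word.
kdump_append_runlevel() {
    file="${1:-/etc/default/kdump-tools}"
    # Last uncommented assignment, quotes stripped, final word only.
    last=$(sed -n 's/^KDUMP_CMDLINE_APPEND="\(.*\)"$/\1/p' "$file" \
        | tail -n 1 | awk '{print $NF}')
    case "$last" in
        [0-6sS]) echo "$last" ;;
        *)       echo "none" ;;
    esac
}
```

Called with no argument it reads /etc/default/kdump-tools; pass another
path to check a scratch copy.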


The solution I came up with:

I changed the default run-level from 2 to 3, set LXC to not start on
run-level 2, and configured kdump to boot to run-level 2. Historically,
run-level 2 was multi-user mode without networking and run-level 3 was
the same as 2 but with network support enabled. As far as I can tell, at
least with a standard Ubuntu 14.04 server install there is no difference
between run-levels 2 and 3.
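
One rough way to check that claim on a given machine is to list which
Upstart jobs start on each run-level and compare the two lists. This is
only a sketch: the function name jobs_starting_on_runlevel is
hypothetical, and it only catches jobs whose "start on" stanza names the
run-level on the same line, so multi-line stanzas are missed.

```shell
#!/bin/sh
# Hypothetical helper: list the Upstart job files in DIR whose
# "start on" line includes run-level LEVEL in its [..] set.
jobs_starting_on_runlevel() {
    dir="${1:?usage: jobs_starting_on_runlevel DIR LEVEL}"
    level="${2:?usage: jobs_starting_on_runlevel DIR LEVEL}"
    grep -lE "^start on .*runlevel \[[0-9]*${level}[0-9]*]" "$dir"/*.conf
}
```

For example, saving the output of jobs_starting_on_runlevel /etc/init 2
and jobs_starting_on_runlevel /etc/init 3 to two files and running diff
on them shows any job that distinguishes the two run-levels.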


Here are the details:

1. Change the default run-level from 2 to 3:

sudo sed -i "s/^env DEFAULT_RUNLEVEL=2/env DEFAULT_RUNLEVEL=3/" \
    /etc/init/rc-sysinit.conf

2. Set LXC to not start on run-level 2:

sudo sed -i "s/^start on runlevel \[2345\]/start on runlevel \[345\]/" \
    /etc/init/lxc.conf

sudo sed -i "s/^stop on starting rc RUNLEVEL=\[016\]/stop on starting rc RUNLEVEL=\[0126\]/" \
    /etc/init/lxc.conf

3. Configure kdump to boot to run-level 2:

sudo sed -i "s/^#KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb\"/KDUMP_CMDLINE_APPEND=\"irqpoll maxcpus=1 nousb 2\"/" \
    /etc/default/kdump-tools


After I made these changes I rebooted the server, ran some tests and
everything seems to be working.
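
Since all three files affect booting, it may be worth rehearsing the
edits on a scratch copy first. The sketch below wraps the same three
substitutions in a function that takes a root directory. The name
apply_runlevel_edits and the scratch-directory approach are illustrative
assumptions, and sed -i as used here assumes GNU sed, which Ubuntu ships.

```shell
#!/bin/sh
# Hypothetical helper: apply the three edits to the files under ROOT
# (for example a scratch copy of /etc placed at ROOT/etc).
apply_runlevel_edits() {
    root="${1:?usage: apply_runlevel_edits ROOT}"
    # 1. Default run-level 2 -> 3.
    sed -i 's/^env DEFAULT_RUNLEVEL=2/env DEFAULT_RUNLEVEL=3/' \
        "$root/etc/init/rc-sysinit.conf"
    # 2. LXC no longer starts on run-level 2 and stops when entering it.
    sed -i \
        -e 's/^start on runlevel \[2345\]/start on runlevel [345]/' \
        -e 's/^stop on starting rc RUNLEVEL=\[016\]/stop on starting rc RUNLEVEL=[0126]/' \
        "$root/etc/init/lxc.conf"
    # 3. The kdump kernel boots to run-level 2.
    sed -i 's/^#KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb"/KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb 2"/' \
        "$root/etc/default/kdump-tools"
}
```

Copy the three files into a temporary directory with the same relative
paths, run apply_runlevel_edits on that directory, and diff the results
against the originals before repeating the edits on the live system.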


-- 

Rod Bruce
UNIX System and Network Administrator
PALS, A Program of the
Minnesota State Colleges and Universities
rod.br...@mnsu.edu
507.389.2000

Quis custodiet ipsos custodes?
___
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users

Re: [lxc-users] Getting kdump to work on an LXC server

2014-09-12 Thread Rod Bruce
On 09/11/2014 04:07 PM, Serge Hallyn wrote:
> Quoting Rod Bruce (rod.br...@mnsu.edu):
>> Greetings,
>> I have been working on a problem the last couple of days and I believe I
>> have come up with a solution so I thought I would share it with the list
>> in case anybody else runs into this or someone has a better solution.
>>
>>
>> Problem:
>> I have had a server running Ubuntu 14.04 hang a couple of times. I try
>> to run everything using standard Ubuntu packages. The server is an LXC
>> host with two containers running on it (but several more planned). I
>> wanted to get a kernel core dump if it hung again so I started
>> investigating kdump/kexec. I installed, configured, and tested
>> kdump/kexec on another server and it worked as advertised. However, when
>> I tried it on the LXC server it would save the core dump OK but the
>> server would fail to reboot or hang at some other point in the process.
>>
>> I noticed that when kexec was booting the secondary kernel it was
>> starting up all of the services that start on a normal boot, including
>> LXC, and that seemed to be causing a problem. When I set the containers
> 
> Do you have any idea why it was causing a problem?

I do not.

> 
> Now that you are kexecing into runlevel 2, after you do that, are you
> able to start the lxc service and lxc container by hand?

When kexec kicks off the secondary kernel as part of the kdump process
it writes out the kernel core dump and, when that is finished, the
server reboots. This is the result I was working toward. With this
scenario there is not a chance to start lxc.

I will try to simulate this by kexecing manually into runlevel 2 and
then starting up lxc and report back.
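
A minimal sketch of what such a manual test might look like is below. It
only prints the kexec commands rather than running them, because
kexec -e jumps into the new kernel immediately without a clean shutdown.
The function name print_kexec_runlevel2 is hypothetical, and the
/boot/vmlinuz-* and /boot/initrd.img-* paths assume the usual Ubuntu
layout.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that would load
# the given kernel release and boot it with run-level 2 appended.
print_kexec_runlevel2() {
    release="${1:-$(uname -r)}"
    cmdline="${2:-$(cat /proc/cmdline)}"
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd.img-%s --append="%s 2"\n' \
        "$release" "$release" "$cmdline"
    printf 'kexec -e\n'
}
```

With no arguments it uses the running kernel release and the current
kernel command line; the printed commands would need to be run as root.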



-- 

Rod Bruce
UNIX System and Network Administrator
PALS, A Program of the
Minnesota State Colleges and Universities
rod.br...@mnsu.edu
507.389.2000

Quis custodiet ipsos custodes?