Re: [Autotest] kernel install issues

Lucas Meneghel Rodrigues Tue, 29 May 2012 17:30:14 -0700

On Tue, May 29, 2012 at 7:53 PM, DeFolo, Daniel <[email protected]> wrote:
> Hi,
>
> I'm assuming I'm hitting a defect, but I'm not sure if the norm is to just 
> file an issue on github or to send email first.
>
> From a simple kernel test job I'm seeing the following failure in the 
> debug/client.0.log after the kernel build (apparently from client/kernel.py 
> in _add_kernel_to_bootloader) :
>
> 21:02:46 INFO |        GOOD    build   kernel.install timestamp=1337994166   
> localtime=May 25 21:02:46
> 21:02:46 DEBUG| Persistent state client.steps now set to [([], 
> 'job.end_reboot_and_verify', [1337994166, '2.6.36-autotest::#1 SMP Fri May 25 
> 20:56:11 EDT 2012', 'build', []], {}), ([], 'step_test', 
> ('http://serverxxxx.hp.com/kernels/kernel-2.6.36.tar.bz2',), {})]
> 21:02:46 DEBUG| No kernel found for title "autotest". Assuming no entry 
> exists, and emulating boottool(.pl) behavior and being silent about it.
> 21:02:46 ERROR| grubby fatal error: unable to find a suitable template
> 21:02:47 DEBUG| Running 'touch /fastboot'
>
>
> That fatal error happens in the bootloader.add_kernel() call, but it doesn't
> abort the kernel install.  As a result, autotest proceeds to reboot the
> system, which then sits forever at the booting from disk C stage because
> there are no longer any kernels in the bootloader.  At that point we just
> have to re-install the test system.
>
> This is on a test client with RHEL 6.2 server and with autotest from the 
> following commit ID installed:
> 716554702f8bbc86738b96272df0d27ce8be889c


This version has a bug, which I fixed two commits later:

commit a998776171dcb2591d20ced4db17d29f37f1add4
Author: Lucas Meneghel Rodrigues <[email protected]>
Date:   Mon May 21 19:09:07 2012 -0300

    hosts.base_classes: Fix a bug in the cleanup_kernels code

    The heuristics used to determine which kernel versions are
    referenced by the bootloader configuration are no longer
    valid for the newer grubby-based code - now that the entries
    dict retorns a full kernel path, such as

    /boot/vmlinuz-3.3.1-3.fc17.x86_64

    We need to use the basename, then strip 'vmlinuz-', rather
    than just strip 'vmlinuz-' from the resulting string. This
    bug fixes a condition where the used kernels were being
    incorrectly determined, hence autotest would go and remove
    all kernels of a machine.

    Signed-off-by: Lucas Meneghel Rodrigues <[email protected]>
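
To illustrate the basename handling described in that commit message, here
is a minimal sketch. It is not the actual hosts.base_classes code; the
function name kernel_version_from_path is made up for this example.

```python
import os


def kernel_version_from_path(kernel_path):
    """Return the version part of a kernel image path.

    Hypothetical helper, not the autotest implementation: take the
    basename first, then strip the 'vmlinuz-' prefix from it.
    """
    name = os.path.basename(kernel_path)
    prefix = "vmlinuz-"
    if name.startswith(prefix):
        return name[len(prefix):]
    # Not a vmlinuz-style name: return the basename unchanged.
    return name


print(kernel_version_from_path("/boot/vmlinuz-3.3.1-3.fc17.x86_64"))
# prints: 3.3.1-3.fc17.x86_64
```

Stripping 'vmlinuz-' from the full path alone would leave the '/boot/'
directory in the result, so it would never match an installed kernel version.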


> I think we also saw the same issue with the following version of autotest 
> installed, but I need to test 1 more time to confirm:
> https://github.com/autotest/autotest/commit/ed05905987207e30b8ebfeb4d6e1dcf9e63d8979

I've fixed these issues and even released 0.14.1 with all the fixes.

> Older versions of autotest (e.g. 0.13.0) that were using the older version of 
> grubby and boottool are able to successfully install the exact same kernel 
> with the below being the output I see in the logs for that same step:
>
>  02/04 20:22:19 INFO |    kernel:0016| --- END kernel.install ---
> 02/04 20:22:19 INFO |       job:0211|         GOOD    build   kernel.install  
>       timestamp=1328404939   localtime=Feb 04 20:22:19
> 02/04 20:22:19 DEBUG|  base_job:0347| Persistent state client.steps now set 
> to [([], 'job.end_reboot_and_verify', [1328404939, '2.6.36-autotest::#1 SMP 
> Sat Feb 4 20:08:57 EST 2012', 'build', []], {}), ([], 'step_test', 
> ('http://serverxxxx.hp.com/kernels/kernel-2.6.36.tar.bz2',), {})]
> 02/04 20:22:19 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--remove-kernel=autotest"'
> 02/04 20:22:19 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--info=all"'
> 02/04 20:22:20 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--add-kernel=/boot/vmlinuz-autotest" 
> "--title=autotest" "--args=_dummy_" "--initrd=/boot/initrd-autotest" 
> "--position=end"'
> 02/04 20:22:20 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--update-kernel=autotest" 
> "--args=console=ttyS0"'
> 02/04 20:22:20 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--update-kernel=autotest" 
> "--args=IDENT=1328404939"'
> 02/04 20:22:20 DEBUG|base_utils:0074| Running 
> '/usr/local/autotest/tools/boottool "--update-kernel=autotest" 
> "--remove-args=_dummy_"'
> 02/04 20:22:20 DEBUG|base_utils:0074| Running 'touch /fastboot'

From 0.13.1 to 0.14.0, we changed the bootloader-related code. It's an
entirely new stack, based on bleeding-edge versions of the program
known as grubby. The reason for the change is to support grub2
*and* offload some of the logic to a mature upstream project dedicated
to doing nothing but manipulating bootloader entries.

>
> I recognize I likely don't have enough information in this message to debug 
> the actual grubby fatal error, and I'm still trying to triage this failure a 
> little bit more (to see if I can tell exactly which grubby command is failing 
> during that fatal error); however, I think in general the practice should be 
> that if autotest sees an error when adding the kernel to the bootloader that 
> it should NOT proceed to reboot the system.  That just makes it harder to 
> debug the problem and produces a system that no longer boots.

Fair enough, we might just throw an exception rather than being silent
about the failures.
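
Something along these lines, as a rough sketch only. BootloaderError and
run_bootloader_command are hypothetical names, not existing autotest code,
and the sketch assumes the bootloader tool signals failure through a
non-zero exit status, which I haven't verified for this particular error.

```python
import subprocess


class BootloaderError(Exception):
    """Hypothetical exception: a bootloader command failed."""


def run_bootloader_command(argv):
    """Run a bootloader tool command and return its stdout.

    Raises BootloaderError on a non-zero exit status, so the caller
    can stop before rebooting instead of carrying on silently.
    """
    result = subprocess.run(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    if result.returncode != 0:
        raise BootloaderError("%s failed (exit %d): %s" %
                              (argv[0], result.returncode,
                               result.stderr.strip()))
    return result.stdout
```

The kernel install step would then let the exception propagate and skip
the reboot, leaving the machine in its previous, still bootable state.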

> -Dan
> _______________________________________________
> Autotest mailing list
> [email protected]
> http://test.kernel.org/cgi-bin/mailman/listinfo/autotest



-- 
Lucas