Hi,

I've recently upgraded three QNAP devices from wheezy to jessie (testing). (By the way, I sent an upgrade report about the wheezy->jessie upgrade to the bug tracker earlier, as http://bugs.debian.org/781742.)

As I was updating from jessie circa 2015.04.02 to jessie as of today, 2015.04.05, I hit a few major issues. The updates applied today, according to /var/log/apt/history.log, were:

>Upgrade: bsdutils:armel (2.25.2-5, 2.25.2-6), perl-modules:armel (5.20.2-2,
>5.20.2-3), libcap2:armel (2.24-7, 2.24-8), libudev1:armel (215-12, 215-14),
>perl:armel (5.20.2-2, 5.20.2-3), unrar:armel (5.0.10-1, 5.2.7-0.1),
>systemd-sysv:armel (215-12, 215-14), libmount1:armel (2.25.2-5, 2.25.2-6),
>libblkid1:armel (2.25.2-5, 2.25.2-6), mount:armel (2.25.2-5, 2.25.2-6),
>perl-doc:armel (5.20.2-2, 5.20.2-3), systemd:armel (215-12, 215-14),
>libsystemd0:armel (215-12, 215-14), libcap2-bin:armel (2.24-7, 2.24-8),
>udev:armel (215-12, 215-14), util-linux:armel (2.25.2-5, 2.25.2-6),
>perl-base:armel (5.20.2-2, 5.20.2-3), libperl5.20:armel (5.20.2-2,
>5.20.2-3), util-linux-locales:armel (2.25.2-5, 2.25.2-6), libuuid1:armel
>(2.25.2-5, 2.25.2-6), libsmartcols1:armel (2.25.2-5, 2.25.2-6)

On one QNAP TS-419P+ Turbo, the upgrade went fine, including a reboot, since there was a flash-kernel trigger.

On a second, identical QNAP TS-419P+ Turbo with only a slightly different set of packages, the upgrade left the machine unable to boot. The upgrade itself went fine, but the machine did not come back up after a "shutdown -r now". More details below.

On a third machine, a QNAP TS-219P II Turbo, about 10 seconds after the upgrade completed, the kernel started spewing oopses and commands were segfaulting left and right. I had to physically pull the power, but fortunately the machine seems stable after booting. More details below.

OK, so, for the one that didn't boot up: the LCD display showed "SYSTEM BOOTING >>>" and then went blank, which is normal. The one LED was flashing red. But other than having power, the machine never appeared on the network. I even let it run for 3+ hours in case an fsck was running, but there wasn't any hard disk activity.

I pulled the power and connected one of the four SATA drives via a SATA-USB adapter to a different computer. I could see that the md raid1 and raid5 partition slices were marked clean, as were the ext4 filesystems. But there were no entries in /var/log/* after the fatal shutdown.

In the end I managed to enter recovery by building a wheezy installer TFTP image, combining old mtdblock backups (phew!) with the wheezy installer kernel+initrd, and serving it via an ad hoc dnsmasq DHCP+TFTP setup on a nearby MacBook. After entering the installer, letting it load the mdcfg parts etc., and dropping into a shell, everything looked fine. I manually mounted the root filesystem and bind-mounted /dev inside the target chroot, as well as the proc and sys filesystems. I couldn't figure out how to run update-initramfs to regenerate the initrd (apparently it really doesn't like to run inside a chroot from the installer emergency shell), but I ran flash-kernel, which re-flashed the existing kernel+initrd from /boot. One reboot later, the system came up as if nothing had happened.

Any ideas what that could have been? Unfortunately I don't have the setup for a serial console. Could it have been a bad flash on the previous flash-kernel run during the update?

And then for the other machine, the one that started oopsing all over the place. This was really worrying. I saw something in the systemd changelogs about duplicate swap mounts; is it possible that the upgrade did something weird with the active swap partition? The machine has a 2GB swap on /dev/md1, which is a 2-device raid1 array.

Some choice log entries for this case:

Apr 05 12:56:38 hostname systemd[1]: Reexecuting.
Apr 05 12:56:38 hostname systemd[1]: systemd 215 running in system mode. (+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
Apr 05 12:56:38 hostname systemd[1]: Detected architecture 'arm'.
Apr 05 12:56:39 hostname systemd[1]: Reloading.
Apr 05 12:56:39 hostname systemd[1]: Reloading.
Apr 05 12:56:40 hostname systemd[1]: Reloading.
Apr 05 12:57:19 hostname dovecot[480]: imap(xxx): Disconnected: Logged out in=351 out=1481
Apr 05 12:57:20 hostname kernel: Unable to handle kernel paging request at virtual address 05e93644
Apr 05 12:57:20 hostname kernel: pgd = de740000
Apr 05 12:57:20 hostname kernel: [05e93644] *pgd=00000000
Apr 05 12:57:20 hostname kernel: Internal error: Oops: 5 [#1] ARM
Apr 05 12:57:20 hostname kernel: Modules linked in: hmac sha1_generic sha1_arm ehci_orion ehci_hcd marvell usbcore orion_wdt usb_common mv_cesa ahci libahci sg mv643xx_eth mvmdio of_mdio libphy evdev loop gpio_keys fuse ipv6 autofs4 ext4 mbcache jbd2 raid1 md_mod sd_mod crc_t10dif crct10dif_generic crct10dif_common sata_mv libata scsi_mod
Apr 05 12:57:20 hostname kernel: CPU: 0 PID: 1498 Comm: mandb Not tainted 3.16.0-4-kirkwood #1 Debian 3.16.7-ckt7-1
Apr 05 12:57:20 hostname kernel: task: c098fa80 ti: de7c2000 task.ti: de7c2000
Apr 05 12:57:20 hostname kernel: PC is at get_vmalloc_info+0x64/0xf4
Apr 05 12:57:20 hostname kernel: LR is at meminfo_proc_show+0x5c/0x3e4
Apr 05 12:57:20 hostname kernel: pc : [<c00e0dcc>] lr : [<c0146b94>] psr: a0000013 sp : de7c3d68 ip : e0000000 fp : 00013445
Apr 05 12:57:20 hostname kernel: r10: de7c3f80 r9 : 00000400 r8 : b6b9f000
Apr 05 12:57:20 hostname kernel: r7 : 00000001 r6 : e0efe000 r5 : c061a280 r4 : c05a6e40
Apr 05 12:57:20 hostname kernel: r3 : 05e93644 r2 : e0efe000 r1 : 05e9365c r0 : de7c3e78
Apr 05 12:57:20 hostname kernel: Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Apr 05 12:57:20 hostname kernel: Control: 0005397f Table: 1e740000 DAC: 00000015
Apr 05 12:57:20 hostname kernel: Process mandb (pid: 1498, stack limit = 0xde7c21c0)
(... lots of hex bytes showing a stack dump ...)
Apr 05 12:57:20 hostname kernel: [<c00e0dcc>] (get_vmalloc_info) from [<c0146b94>] (meminfo_proc_show+0x5c/0x3e4)
Apr 05 12:57:20 hostname kernel: [<c0146b94>] (meminfo_proc_show) from [<c0110b9c>] (seq_read+0x1ac/0x3f8)
Apr 05 12:57:20 hostname kernel: [<c0110b9c>] (seq_read) from [<c013f554>] (proc_reg_read+0x78/0x8c)
Apr 05 12:57:20 hostname kernel: [<c013f554>] (proc_reg_read) from [<c00f45bc>] (vfs_read+0x90/0x174)
Apr 05 12:57:20 hostname kernel: [<c00f45bc>] (vfs_read) from [<c00f4d4c>] (SyS_read+0x44/0x84)
Apr 05 12:57:20 hostname kernel: [<c00f4d4c>] (SyS_read) from [<c0009400>] (ret_fast_syscall+0x0/0x2c)
Apr 05 12:57:20 hostname kernel: Code: e26334ff e5803004 ea000021 e595c000 (e5931000)
Apr 05 12:57:20 hostname kernel: ---[ end trace 921667d30991e9a7 ]---
Apr 05 12:57:26 hostname kernel: Unable to handle kernel paging request at virtual address b0e7f26c
Apr 05 12:57:26 hostname kernel: pgd = c0958000
Apr 05 12:57:26 hostname kernel: [b0e7f26c] *pgd=00000000
Apr 05 12:57:26 hostname kernel: Internal error: Oops: 5 [#2] ARM
Apr 05 12:57:26 hostname kernel: Modules linked in: hmac sha1_generic sha1_arm ehci_orion ehci_hcd marvell usbcore orion_wdt usb_common mv_cesa ahci libahci sg mv643xx_eth mvmdio of_mdio libphy evdev loop gpio_keys fuse ipv6 autofs4 ext4 mbcache jbd2 raid1 md_mod sd_mod crc_t10dif crct10dif_generic crct10dif_common sata_mv libata scsi_mod
Apr 05 12:57:26 hostname kernel: CPU: 0 PID: 26687 Comm: apt-get Tainted: G D 3.16.0-4-kirkwood #1 Debian 3.16.7-ckt7-1
Apr 05 12:57:26 hostname kernel: task: de683620 ti: c0856000 task.ti: c0856000
Apr 05 12:57:26 hostname kernel: PC is at __find_vmap_area+0x18/0x50
Apr 05 12:57:26 hostname kernel: LR is at remove_vm_area+0x10/0x5c
Apr 05 12:57:26 hostname kernel: pc : [<c00dec10>] lr : [<c00e0288>] psr: a0000013 sp : c0857e48 ip : dbe08944 fp : 00000000
Apr 05 12:57:26 hostname kernel: r10: 00000000 r9 : 00000000 r8 : 00000000
Apr 05 12:57:26 hostname kernel: r7 : 00000001 r6 : e1152000 r5 : 00000000 r4 : dbe08800
Apr 05 12:57:26 hostname kernel: r3 : b0e7f278 r2 : 69e2d336 r1 : 00000001 r0 : e1152000
Apr 05 12:57:26 hostname kernel: Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Apr 05 12:57:26 hostname kernel: Control: 0005397f Table: 00958000 DAC: 00000015
Apr 05 12:57:26 hostname kernel: Process apt-get (pid: 26687, stack limit = 0xc08561c0)
Apr 05 12:57:26 hostname kernel: Stack: (0xc0857e48 to 0xc0858000)
(... lots of hex bytes showing a stack dump ...)
Apr 05 12:57:26 hostname kernel: [<c00dec10>] (__find_vmap_area) from [<c00e0288>] (remove_vm_area+0x10/0x5c)
Apr 05 12:57:26 hostname kernel: [<c00e0288>] (remove_vm_area) from [<c00e0308>] (__vunmap+0x34/0xcc)
Apr 05 12:57:26 hostname kernel: [<c00e0308>] (__vunmap) from [<c025c178>] (n_tty_close+0x2c/0x38)
Apr 05 12:57:26 hostname kernel: [<c025c178>] (n_tty_close) from [<c02605dc>] (tty_ldisc_close.isra.0+0x60/0x6c)
Apr 05 12:57:26 hostname kernel: [<c02605dc>] (tty_ldisc_close.isra.0) from [<c02607b4>] (tty_ldisc_reinit+0x38/0xa4)
Apr 05 12:57:26 hostname kernel: [<c02607b4>] (tty_ldisc_reinit) from [<c0260d24>] (tty_ldisc_hangup+0x124/0x1c8)
Apr 05 12:57:26 hostname kernel: [<c0260d24>] (tty_ldisc_hangup) from [<c0259158>] (__tty_hangup+0x25c/0x38c)
Apr 05 12:57:26 hostname kernel: [<c0259158>] (__tty_hangup) from [<c0262b78>] (pty_close+0x178/0x194)
Apr 05 12:57:26 hostname kernel: [<c0262b78>] (pty_close) from [<c025a1e0>] (tty_release+0x140/0x4b8)
Apr 05 12:57:26 hostname kernel: [<c025a1e0>] (tty_release) from [<c00f5954>] (__fput+0xe4/0x1b4)
Apr 05 12:57:26 hostname kernel: [<c00f5954>] (__fput) from [<c0032274>] (task_work_run+0x90/0xac)
Apr 05 12:57:26 hostname kernel: [<c0032274>] (task_work_run) from [<c000bdf4>] (do_work_pending+0xd4/0xf4)
Apr 05 12:57:26 hostname kernel: [<c000bdf4>] (do_work_pending) from [<c000943c>] (work_pending+0xc/0x20)
Apr 05 12:57:26 hostname kernel: Code: e59f303c e5933000 e3530000 0a00000a (e513200c)
Apr 05 12:57:26 hostname kernel: ---[ end trace 921667d30991e9a8 ]---
(... etc etc several more processes crashing like this and then ...)
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c0
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c1
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c2
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c3
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c4
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c5
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c6
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c7
Apr 05 12:57:55 hostname kernel: swap_dup: Bad swap file entry 364791c1
Apr 05 12:57:57 hostname kernel: Unable to handle kernel paging request at virtual address 05e93644
Apr 05 12:57:57 hostname kernel: pgd = de740000
Apr 05 12:57:57 hostname kernel: [05e93644] *pgd=00000000
Apr 05 12:57:57 hostname kernel: Internal error: Oops: 5 [#4] ARM
(... more log lines cut ...)
Apr 05 12:57:58 hostname kernel: /build/linux-gXNuoJ/linux-3.16.7-ckt7/mm/pgtable-generic.c:33: bad pmd 84d05d82.
Apr 05 12:57:58 hostname kernel: /build/linux-gXNuoJ/linux-3.16.7-ckt7/mm/pgtable-generic.c:33: bad pmd 3eebde47.
Apr 05 12:58:00 hostname kernel: swap_free: Bad swap file entry 20c3e99f
Apr 05 12:58:10 hostname kernel: BUG: Bad page map in process cron pte:c3e99f82 pmd:1e53f831
Apr 05 12:58:10 hostname kernel: addr:b6ac2000 vm_flags:00000075 anon_vma: (null) mapping:df24939c index:0
Apr 05 12:58:10 hostname kernel: vma->vm_ops->fault: filemap_fault+0x0/0x410
Apr 05 12:58:10 hostname kernel: vma->vm_file->f_op->mmap: ext4_file_mmap+0x0/0x54 [ext4]
Apr 05 12:58:10 hostname kernel: CPU: 0 PID: 416 Comm: cron Tainted: G D 3.16.0-4-kirkwood #1 Debian 3.16.7-ckt7-1
Apr 05 12:58:10 hostname kernel: [<c001009c>] (unwind_backtrace) from [<c000c440>] (show_stack+0x18/0x1c)
Apr 05 12:58:10 hostname kernel: [<c000c440>] (show_stack) from [<c00d35b0>] (print_bad_pte+0x168/0x19c)
Apr 05 12:58:10 hostname kernel: [<c00d35b0>] (print_bad_pte) from [<c00d47c0>] (unmap_single_vma+0x4e8/0x600)
Apr 05 12:58:10 hostname kernel: [<c00d47c0>] (unmap_single_vma) from [<c00d57fc>] (unmap_vmas+0x4c/0x5c)
Apr 05 12:58:10 hostname kernel: [<c00d57fc>] (unmap_vmas) from [<c00da3c4>] (exit_mmap+0xdc/0x214)
Apr 05 12:58:10 hostname kernel: [<c00da3c4>] (exit_mmap) from [<c0017a60>] (mmput+0x50/0xdc)
Apr 05 12:58:10 hostname kernel: [<c0017a60>] (mmput) from [<c001bac0>] (do_exit+0x328/0x884)
Apr 05 12:58:10 hostname kernel: [<c001bac0>] (do_exit) from [<c000c6fc>] (die+0x2b8/0x394)
Apr 05 12:58:10 hostname kernel: [<c000c6fc>] (die) from [<c03a9970>] (__do_kernel_fault.part.11+0x5c/0x7c)
Apr 05 12:58:10 hostname kernel: [<c03a9970>] (__do_kernel_fault.part.11) from [<c0012bdc>] (do_page_fault+0x300/0x360)
Apr 05 12:58:10 hostname kernel: [<c0012bdc>] (do_page_fault) from [<c00083a0>] (do_DataAbort+0x3c/0xa0)
Apr 05 12:58:10 hostname kernel: [<c00083a0>] (do_DataAbort) from [<c000ced8>] (__dabt_svc+0x38/0x60)
Apr 05 12:58:10 hostname kernel: Exception stack(0xde5dfdd8 to 0xde5dfe20)
Apr 05 12:58:10 hostname kernel: fdc0: c0c56320 00000012
Apr 05 12:58:10 hostname kernel: fde0: 00000012 44d05000 dea544b0 00012000 de5de000 000000a8 c0c56320 c0c56320
Apr 05 12:58:10 hostname kernel: fe00: de540000 c0c56354 de5de000 de5dfe20 c0012a00 c00d6490 a0000013 ffffffff
Apr 05 12:58:10 hostname kernel: [<c000ced8>] (__dabt_svc) from [<c00d6490>] (handle_mm_fault+0xf4/0x914)
Apr 05 12:58:10 hostname kernel: [<c00d6490>] (handle_mm_fault) from [<c0012a00>] (do_page_fault+0x124/0x360)
Apr 05 12:58:10 hostname kernel: [<c0012a00>] (do_page_fault) from [<c0008440>] (do_PrefetchAbort+0x3c/0xa0)
Apr 05 12:58:10 hostname kernel: [<c0008440>] (do_PrefetchAbort) from [<c000d214>] (ret_from_exception+0x0/0x10)
(... several more of the do_DataAbort ... unwind_backtrace lines cut ....)
Apr 05 12:58:11 hostname kernel: Fixing recursive fault but reboot is needed!

And then I pretty much pulled the power.

After the system came up, I ran a few test PHP scripts to allocate a lot of memory and saw the system start eating up swap, but it didn't provoke any crashes. The machines have been running wheezy and squeeze for years on end with no problems at all.

Also, probably a freak incident, but an APC UPS that is powering 2 of these devices (not the one with the kernel oops) no longer registers over the USB cable, not on any of these machines or even on a MacBook anymore, just giving an "unable to enumerate device" error. It worked yesterday (even though the "apcaccess" CLI binary is broken on armel jessie, the web interface still worked). I guess when it rains, it pours :-/

I would be grateful for any input, and I hope this can be of help to the people preparing the jessie release as well. And sorry for the wall of text.

Thanks.
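
For anyone wanting to repeat the recovery of the non-booting machine, the steps described earlier (mount the root filesystem, bind-mount /dev, mount proc and sys, then run flash-kernel in the chroot) can be sketched roughly as the shell function below. This is only a sketch under assumptions: the function names, the dry-run switch, the root device and the mount point are hypothetical and not taken from the report, so substitute whatever md device actually holds the root filesystem. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the chroot recovery described in the report.
# Assumptions (not from the report): the function names, the DRY_RUN
# switch, and the root device / mount point passed as arguments.

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_flash ROOT_DEV TARGET
# Mount the installed system, enter it, and re-flash kernel+initrd from /boot.
recover_flash() {
    root_dev=$1
    target=$2
    run mkdir -p "$target"
    run mount "$root_dev" "$target"          # root filesystem of the installed system
    run mount -o bind /dev "$target/dev"     # device nodes needed inside the chroot
    run mount -t proc proc "$target/proc"
    run mount -t sysfs sysfs "$target/sys"
    run chroot "$target" flash-kernel        # re-flash the existing kernel+initrd
    run umount "$target/sys"
    run umount "$target/proc"
    run umount "$target/dev"
    run umount "$target"
}
```

Calling `recover_flash /dev/mdX /mnt/target` (both arguments are placeholders) prints the command sequence for review; setting `DRY_RUN=0` would actually run it, which requires root and an already assembled md array.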