Here is /var/adm/messages at time of crash if this helps:

Dec 10 17:02:27 projects2 unix: [ID 836849 kern.notice]
Dec 10 17:02:27 projects2 panic[cpu3]/thread=ffffff03e85997c0:
Dec 10 17:02:27 projects2 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff001803c340 addr=20 occurred in module "zfs" due to a NULL pointer dereference
Dec 10 17:02:27 projects2 unix: [ID 100000 kern.notice]
Dec 10 17:02:27 projects2 unix: [ID 839527 kern.notice] zpool:
Dec 10 17:02:27 projects2 unix: [ID 753105 kern.notice] #pf Page fault
Dec 10 17:02:27 projects2 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x20
Dec 10 17:02:27 projects2 unix: [ID 243837 kern.notice] pid=718, pc=0xfffffffff7a220e8, sp=0xffffff001803c438, eflags=0x10213
Dec 10 17:02:27 projects2 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Dec 10 17:02:27 projects2 unix: [ID 624947 kern.notice] cr2: 20
Dec 10 17:02:27 projects2 unix: [ID 625075 kern.notice] cr3: 337a37000
Dec 10 17:02:27 projects2 unix: [ID 625715 kern.notice] cr8: c
Dec 10 17:02:27 projects2 unix: [ID 100000 kern.notice]
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] rdi: ffffff03f66e8058 rsi: 0 rdx: 0
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] rcx: f7728503 r8: 88d7af8 r9: ffffff001803c4a0
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] rax: 7 rbx: 0 rbp: ffffff001803c480
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] r10: 7 r11: 0 r12: ffffff03f66e8058
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] r13: ffffff03f66e8058 r14: ffffff001803c5c0 r15: ffffff001803c600
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] fsb: 0 gsb: ffffff03e1067040 ds: 4b
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff7a220e8
Dec 10 17:02:27 projects2 unix: [ID 592667 kern.notice] cs: 30 rfl: 10213 rsp: ffffff001803c438
Dec 10 17:02:27 projects2 unix: [ID 266532 kern.notice] ss: 38
Dec 10 17:02:27 projects2 unix: [ID 100000 kern.notice]
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c210 unix:die+dd ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c330 unix:trap+17db ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c340 unix:cmntrap+e6 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c480 zfs:zap_leaf_lookup_closest+40 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c510 zfs:fzap_cursor_retrieve+c9 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c5a0 zfs:zap_cursor_retrieve+17d ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c780 zfs:zfs_purgedir+4c ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c7d0 zfs:zfs_rmnode+50 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c810 zfs:zfs_zinactive+b5 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c860 zfs:zfs_inactive+11a ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c8b0 genunix:fop_inactive+af ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803c8d0 genunix:vn_rele+5f ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cac0 zfs:zfs_unlinked_drain+af ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803caf0 zfs:zfsvfs_setup+102 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cb50 zfs:zfs_domount+17c ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cc70 zfs:zfs_mount+1cd ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cca0 genunix:fsop_mount+21 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803ce00 genunix:domount+b0e ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803ce80 genunix:mount+121 ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cec0 genunix:syscall_ap+8c ()
Dec 10 17:02:27 projects2 genunix: [ID 655072 kern.notice] ffffff001803cf10 unix:brand_sys_sysenter+1c9 ()
Dec 10 17:02:27 projects2 unix: [ID 100000 kern.notice]
Dec 10 17:02:27 projects2 genunix: [ID 672855 kern.notice] syncing file systems...
Dec 10 17:02:27 projects2 genunix: [ID 904073 kern.notice]  done
Dec 10 17:02:28 projects2 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Dec 10 17:02:52 projects2 genunix: [ID 100000 kern.notice]
Dec 10 17:02:52 projects2 genunix: [ID 665016 kern.notice] 100% done: 225419 pages dumped,
Dec 10 17:02:52 projects2 genunix: [ID 851671 kern.notice] dump succeeded



On 12/11/13, 6:21 AM, [email protected] wrote:
It might help if you could run mdb over the kernel crash dump files so 
developers would at least have a stack trace of what went wrong. Maybe they would 
then have more specific questions about variable values etc. and would post those - but 
the general debugging steps (see the Wiki) come first anyway.
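That mdb step can be sketched roughly as follows (paths and the dump number are assumptions; savecore normally writes to /var/crash/<hostname> on illumos/OpenIndiana):

```shell
# Extract the crash dump captured at panic time, if savecore
# has not already run at boot.
savecore /var/crash/projects2

# Open the kernel debugger against the saved dump; the ".0"
# suffix is whatever dump number savecore reported.
mdb /var/crash/projects2/unix.0 /var/crash/projects2/vmcore.0

# Inside mdb, these dcmds summarize the panic:
#   ::status    - panic string and dump summary
#   ::stack     - stack trace of the panicking thread
#   ::msgbuf    - kernel message buffer leading up to the panic
#   ::panicinfo - register state at the time of the panic
```

The `::stack` output is what developers usually ask for first; it should match the trap trace in /var/adm/messages above.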

In the meantime you can also try to inspect your pool with zdb -bscvL or similar to check 
for inconsistencies - i.e. whether the box crashed/rebooted with I/Os written out of 
order (labels or uberblocks updated before the data they point to was 
committed) because the disks/caches lied.
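A sketch of that check (the -e flag lets zdb examine a pool while it is exported; expect a full traversal of a 30 TB pool to take a long time):

```shell
# Walk and verify the pool's block tree without importing it.
#   -e  operate on an exported pool
#   -b  traverse all blocks and gather block statistics
#   -s  report I/O statistics while traversing
#   -c  verify checksums of metadata blocks
#   -v  verbose output
#   -L  skip leak detection (much faster; still surfaces corruption)
zdb -e -bscvL data
```

Checksum errors or assertion failures here would point at the same on-disk damage that panics the kernel on a writable import.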

Then you might have luck rolling back a few txgs on import, and you can model 
with zdb first whether this helps (it would start from an older txg number and 
skip the possibly corrupted last few sync cycles).
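A hedged sketch of modeling that rollback with zdb before trying it for real (the txg value is a placeholder; list the uberblocks on a member disk first to find candidate txgs):

```shell
# Show the labels and uberblock ring on one member disk, with
# each uberblock's txg number.
zdb -lu /dev/rdsk/c3t50014EE25D929FBCd0s0

# Dry-run the pool as of an older txg, read-only and without
# importing it:
#   -t <txg>  use that txg's uberblock instead of the newest one
zdb -e -t <txg> -bcsvL data

# If an older txg verifies cleanly, a rewinding import can be
# attempted. -F rewinds a few txgs automatically; some builds
# also accept a hidden -T <txg> to pick one explicitly (verify
# this exists on your build before relying on it).
zpool import -F -o readonly=on data
```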

Hth, Jim


Typos courtesy of my Samsung Mobile

-------- Original message --------
From: CJ Keist <[email protected]>
Date: 2013.12.11  5:31  (GMT+01:00)
To: Discussion list for OpenIndiana <[email protected]>
Subject: [OpenIndiana-discuss] Zpool crashes system on reboot and import

All,
      Some time back we had an issue where I lost an entire zpool file system
due to a possibly bad RAID controller card.  At that time I was strongly
encouraged to get a RAID card that supported JBOD and let ZFS
control all the disks.  Well, I did that, and unfortunately today I lost an
entire zpool that was configured with multiple raidz2 volumes. See below:

root@projects2:~# zpool status data
    pool: data
   state: ONLINE
    scan: scrub in progress since Tue Dec 10 18:11:19 2013
      211G scanned out of 30.2T at 1/s, (scan is slow, no estimated time)
      0 repaired, 0.68% done
config:

          NAME                       STATE     READ WRITE CKSUM
          data                       ONLINE       0     0     0
            raidz2-0                 ONLINE       0     0     0
              c3t50014EE25D929FBCd0  ONLINE       0     0     0
              c3t50014EE2B2E8E02Ed0  ONLINE       0     0     0
              c3t50014EE25C346397d0  ONLINE       0     0     0
              c3t50014EE206EB0DDDd0  ONLINE       0     0     0
              c3t50014EE25D932FC7d0  ONLINE       0     0     0
              c3t50014EE25C341621d0  ONLINE       0     0     0
              c3t50014EE206DE835Ed0  ONLINE       0     0     0
              c3t50014EE2083D20DAd0  ONLINE       0     0     0
              c3t50014EE2083D842Ed0  ONLINE       0     0     0
            raidz2-1                 ONLINE       0     0     0
              c3t50014EE2B2E8D8CCd0  ONLINE       0     0     0
              c3t50014EE2B18BE3A4d0  ONLINE       0     0     0
              c3t50014EE25C339C05d0  ONLINE       0     0     0
              c3t50014EE25D9307DAd0  ONLINE       0     0     0
              c3t50014EE2B2E7E5E8d0  ONLINE       0     0     0
              c3t50014EE206EB20ABd0  ONLINE       0     0     0
              c3t50014EE2B2E56CFAd0  ONLINE       0     0     0
              c3t50014EE25D92FC0Ad0  ONLINE       0     0     0
              c3t50014EE25C42CFDBd0  ONLINE       0     0     0
            raidz2-2                 ONLINE       0     0     0
              c3t50014EE25D933003d0  ONLINE       0     0     0
              c3t50014EE2B2E89EF3d0  ONLINE       0     0     0
              c3t50014EE2B2E8DC9Cd0  ONLINE       0     0     0
              c3t50014EE25C35933Ed0  ONLINE       0     0     0
              c3t50014EE2B1968F65d0  ONLINE       0     0     0
              c3t50014EE2083D6987d0  ONLINE       0     0     0
              c3t50014EE2083DDCACd0  ONLINE       0     0     0
              c3t50014EE25C42C384d0  ONLINE       0     0     0
              c3t50014EE206F2A389d0  ONLINE       0     0     0
            raidz2-3                 ONLINE       0     0     0
              c3t50014EE2B1967C56d0  ONLINE       0     0     0
              c3t50014EE2083E1931d0  ONLINE       0     0     0
              c3t50014EE2B1895807d0  ONLINE       0     0     0
              c3t50014EE25D9333E7d0  ONLINE       0     0     0
              c3t50014EE2B196397Ad0  ONLINE       0     0     0
              c3t50014EE25D930567d0  ONLINE       0     0     0
              c3t50014EE2B19D4F5Ad0  ONLINE       0     0     0
              c3t50014EE25D930525d0  ONLINE       0     0     0
              c3t50014EE2083DDCFAd0  ONLINE       0     0     0
            raidz2-4                 ONLINE       0     0     0
              c3t50014EE20721B2BBd0  ONLINE       0     0     0
              c3t50014EE2B2E8DC6Ad0  ONLINE       0     0     0
              c3t50014EE25C40CF9Fd0  ONLINE       0     0     0
              c3t50014EE25D24BC9Fd0  ONLINE       0     0     0
              c3t50014EE2B2E8DFDAd0  ONLINE       0     0     0
              c3t50014EE25C33BF64d0  ONLINE       0     0     0
              c3t50014EE25D9328C4d0  ONLINE       0     0     0
              c3t50014EE25C401FBFd0  ONLINE       0     0     0
              c3t50014EE2B1899AC5d0  ONLINE       0     0     0

errors: No known data errors

The system crashed, and when rebooted it would just core dump and
reboot again.  After booting in single-user mode I found the zpool that was
crashing the system. I exported it and was able to bring the system
back up. When I tried to import that pool, it would again crash the system.
I finally found that I could import the pool without crashing the system
if I imported it read-only:

zpool import -o readonly=on data

The output above is from the pool imported read-only.
I'm looking for any advice on ways to save this pool. As you can see, zpool
reports no errors with the pool.
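Since the pool imports cleanly read-only, one conservative option while the data is still readable is to copy everything off first (a sketch; the snapshot, dataset, and destination names below are made up):

```shell
# A readonly=on import cannot create new snapshots, but existing
# snapshots can still be sent. If a recent snapshot exists,
# replicate each dataset tree to a scratch pool or remote host:
zfs send -R data/projects@lastsnap | \
    ssh backuphost zfs receive -Fdu backup/data

# Datasets without snapshots are still readable through their
# read-only mountpoints, so plain rsync or tar also works:
rsync -a /data/projects/ backuphost:/backup/projects/
```

With a copy safe, riskier recovery attempts (txg rewind, destroy and restore) become much less scary.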

Running OI 151a8 i86pc i386 i86pc Solaris


--
C. J. Keist                     Email: [email protected]
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax:   970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'

_______________________________________________
OpenIndiana-discuss mailing list
[email protected]
http://openindiana.org/mailman/listinfo/openindiana-discuss
