Hi everyone, this is my first post to these forums, but I must first say that there is a lot of very useful information here, and I am very glad to see such a large community of contributors. I have been working with Solaris 10 for about three years now; coming from the Linux world, there is no reason to look back.
My problem is this (no, no transition material, sorry): for a while now I have been using Solaris 10 to store terabytes of data on ZFS. This file system has been great at handling large datasets (no running tools like parted to get ext* to recognize disks larger than 1.8 TB, and no worrying about losing the data because the journal gets corrupted). However, I recently ran into an issue with some storage enclosures that I was verifying before putting them into production. They are Supermicro storage enclosures that hold 24 1 TB disks (Seagate ES.2 SATA), connected via an LSI Logic 1068-based card. I had been copying data (lots of large files) from other sources to this new enclosure as a test. The enclosure was about 80% full when the machine panicked. It then proceeded to reboot and got stuck in a reboot loop, each time giving me the following core dump information:

{code}
storage01# mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp
zfs ip hook neti sctp arp usba uhci fcp fctl md lofs mpt fcip random crypto
logindmux ptm ufs nfs ]
> ::status
debugging crash dump vmcore.0 (64-bit) from storage01
operating system: 5.10 Generic_139556-08 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fffffe80010fe7a0 addr=0 occurred in
module "unix" due to a NULL pointer dereference
dump content: kernel pages only
> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 fffffffffbc25800 sched
R      3      0      0      0      0 0x00020001 ffffffff82b14a78 fsflush
R      2      0      0      0      0 0x00020001 ffffffff82b156e0 pageout
R      1      0      0      0      0 0x4a004000 ffffffff82b16348 init
R    945      1    945    945      0 0x42000000 ffffffff8b2a1c88 rcm_daemon
R    846      1    846    846     25 0x52010000 ffffffff8b2a7360 sendmail
R    847      1    847    847      0 0x52010000 ffffffff8b2a66f8 sendmail
R    841      1    841    841      0 0x42000000 ffffffff8866ce18 fmd
R    836      1    836    836      0 0x42000000 ffffffff8b2a4e28 syslogd
R    816      1    816    816      0 0x42000000 ffffffff88669010 automountd
R    818    816    816    816      0 0x42000000 ffffffff89e94c80 automountd
R    666      1    666    666      0 0x42000000 ffffffff89e94018 inetd
R    651      1    651    651      0 0x42000000 ffffffff89e96550 utmpd
R    631      1    631    631      1 0x42000000 ffffffff89e97e20 lockd
R    619      1    616    616      1 0x42000000 ffffffff8866e6e8 nfs4cbd
R    617      1    617    617      1 0x42000000 ffffffff89e996f0 statd
R    618      1    618    618      1 0x52000000 ffffffff89e98a88 nfsmapid
R    610      1    610    610      1 0x42000000 ffffffff82b118d8 rpcbind
R    593      1    593    593      0 0x42010000 ffffffff82b13e10 cron
R    525      1    525    525      0 0x42000000 ffffffff8866da80 nscd
R    504      1    504    504      0 0x42000000 ffffffff89e9a358 picld
R    487      1    487    487      1 0x42000000 ffffffff8866b548 kcfd
R    464      1    464    464      0 0x42000000 ffffffff82b10008 syseventd
R     69      1     69     69      0 0x42000000 ffffffff82b10c70 devfsadm
R      9      1      9      9      0 0x42000000 ffffffff82b12540 svc.configd
R      7      1      7      7      0 0x42000000 ffffffff82b131a8 svc.startd
R    661      7    661    661      0 0x4a004000 ffffffff8866c1b0 sh
R    954    661    661    661      0 0x4a004000 ffffffff89e971b8 zpool
R    645      7    645    645      0 0x4a014000 ffffffff8866a8e0 sac
R    648    645    645    645      0 0x4a014000 ffffffff89e958e8 ttymon
> ::msgbuf
MESSAGE
sd15 at mpt0: target d lun 0
sd15 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@d,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@d,0 (sd15) online
sd16 at mpt0: target e lun 0
sd16 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@e,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@e,0 (sd16) online
sd17 at mpt0: target f lun 0
sd17 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@f,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@f,0 (sd17) online
sd18 at mpt0: target 10 lun 0
sd18 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@10,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@10,0 (sd18) online
sd19 at mpt0: target 11 lun 0
sd19 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@11,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@11,0 (sd19) online
sd20 at mpt0: target 12 lun 0
sd20 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@12,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@12,0 (sd20) online
sd21 at mpt0: target 13 lun 0
sd21 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@13,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@13,0 (sd21) online
sd22 at mpt0: target 14 lun 0
sd22 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@14,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@14,0 (sd22) online
sd23 at mpt0: target 15 lun 0
sd23 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@15,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@15,0 (sd23) online
sd24 at mpt0: target 16 lun 0
sd24 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@16,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@16,0 (sd24) online
sd25 at mpt0: target 17 lun 0
sd25 is /p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@17,0
/p...@0,0/pci8086,2...@1/pci1000,3...@0/s...@17,0 (sd25) online
pcplusmp: pciex8086,109a (e1000g) instance #1 vector 0x35 ioapic 0xff intin 0xff is bound to cpu 1
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
PIO mode 4 selected
NOTICE: e1000g1 registered
Intel(R) PRO/1000 Network Connection, Driver Ver. 5.2.13.1
UltraDMA mode 5 selected
UltraDMA mode 5 selected
pcplusmp: asy (asy) instance 0 vector 0x4 ioapic 0x2 intin 0x4 is bound to cpu 1
ISA-device: asy0
asy0 is /isa/a...@1,3f8
pcplusmp: asy (asy) instance #1 vector 0x3 ioapic 0x2 intin 0x3 is bound to cpu 1
ISA-device: asy1
asy1 is /isa/a...@1,2f8
pcplusmp: lp (ecpp) instance 0 vector 0x7 ioapic 0x2 intin 0x7 is bound to cpu 0
ISA-device: ecpp0
ecpp0 is /isa/l...@1,378
fd0 at fdc0
fd0 is /isa/f...@1,3f0/f...@0,0
pseudo-device: ramdisk1024
> ::stack
mutex_enter+0xb()
metaslab_free+0x68()
zio_dva_free+0x1f()
zio_execute+0x60()
zio_nowait+9()
arc_free+0x10a()
dsl_dataset_block_kill+0x26b()
dmu_objset_sync+0x1b2()
dsl_pool_sync+0x13a()
spa_sync+0x158()
txg_sync_thread+0x1cf()
thread_start+8()
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     113284               442   11%
Anon                         9650                37    1%
Exec and libs                2148                 8    0%
Page cache                    134                 0    0%
Free (cachelist)             2903                11    0%
Free (freelist)            918022              3586   88%

Total                     1046141              4086
{code}

I have also tried attaching this storage array to a completely different system and importing the pool with zpool import -f; this resulted in a panic as well, with the same type of data in the core dump.

My questions at this point:

1. What went wrong? Obviously a vague question that I don't expect anyone to answer directly, but maybe someone can give me some pointers. From the stack it looks to me like the panic happens in the ZFS sync thread while freeing a block (metaslab_free via zio_dva_free hitting a NULL pointer), which would explain why it recurs every time the pool is opened. I can pull more out of the dump if it helps; see the mdb sketch below.

2. How do I get this data back? This matters because, if this were production and not just a test system, the pool would still panic the machine upon every import/reboot. The recovery sketch below is what I am planning to try.

3. Is there any way to configure this machine NOT to reboot after a panic? Occasionally it does not write a crash dump, and the console messages flash by too quickly to see what is going on. See the /etc/system sketch below.

4. What are my next steps?

I do realize that this is a Solaris 10 machine and not OpenSolaris, but I was hoping someone would be able to point me in the right direction. Thanks in advance!
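For question 1, here is the additional digging I can do in the same dump: $C gives the frame-pointer trace with call arguments, and ::panicinfo is supposed to dump the register state at the time of the trap (I have not verified that dcmd is present in the mdb shipped with Generic_139556-08, so treat it as an assumption). If anyone wants other dcmds run, let me know.

{code}
storage01# mdb unix.0 vmcore.0
> $C             (stack trace with frame pointers and call arguments)
> ::panicinfo    (CPU, thread, and register state at the time of the panic,
                  if this mdb build has the dcmd)
{code}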
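For question 2, the only approach I have found so far is to keep ZFS from opening the pool automatically at boot, so the box at least stays up, and then attempt the import by hand from the console. This is untested on my setup; /etc/zfs/zpool.cache is the standard Solaris 10 cache file location, and 'tank' below is just a placeholder for the real pool name.

{code}
(boot failsafe, or boot with the enclosure disconnected)
storage01# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
storage01# reboot
   ... the machine should now come up without touching the pool ...
storage01# zpool import           (lists pools visible on the attached devices)
storage01# zpool import -f tank   (forced import; watch the console)
{code}

If the forced import still panics, at least the panic string stays on a machine that is otherwise bootable.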
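For question 3, the halt_on_panic kernel tunable is supposed to make the system stop at the panic prompt instead of rebooting; I have not confirmed the behavior on this kernel patch level, so this is only a sketch. Checking dumpadm at the same time seems worthwhile, since some of the panics are not leaving a dump behind.

{code}
storage01# echo 'set halt_on_panic = 1' >> /etc/system
   (halt at the panic prompt instead of rebooting; takes effect next boot)
storage01# dumpadm
   (verify the dump device and savecore directory, to track down the
    cases where no crash dump appears)
{code}

Alternatively, booting with kmdb loaded (adding -k to the kernel line in the GRUB menu) should drop into the debugger on panic rather than rebooting.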