Re: zfs panic RELENG_12
On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]? I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
> #11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
>     bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758          zone->uz_release(zone->uz_arg, bucket->ub_bucket, bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0x8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xf80de4654000
> (kgdb) p bucket->ub_bucket
> $3 = 0xf801c7fd5218
> (kgdb)

Not sure if it's coincidence or not, but previously I was running with the ARC limited to ~30G of the 64G of RAM on the box. I removed that limit a few weeks ago after upgrading the box to RELENG_12 to pull in the OpenSSL changes. The panic seems to happen under disk load. I have 3 ZFS pools that are pretty busy receiving snapshots. One day a week, we write a full set to a 4th ZFS pool on some geli-attached drives via USB for offsite cold storage. The crashes happened with that extra level of disk work. gstat shows most of the 12 drives off the 2 mrsas controllers at or close to 100% busy during the 18hrs it takes to dump out the files. Trying a new cold-storage run now with the ARC limit back to vfs.zfs.arc_max=29334498304

---Mike

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
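As a side note on the tunable mentioned above, a quick sketch of the sizing arithmetic (illustrative numbers, not a recommendation): vfs.zfs.arc_max takes a byte count, so a GiB budget has to be converted before it goes into /boot/loader.conf.

```python
# Hypothetical sizing arithmetic for vfs.zfs.arc_max, which takes bytes.
# Capping the ARC at 28 GiB on a 64 GiB box (illustrative figure only):
arc_gib = 28
arc_bytes = arc_gib * 1024 ** 3

# Line to place in /boot/loader.conf.  The value posted above,
# 29334498304, works out to roughly 27.3 GiB computed the same way.
print(f"vfs.zfs.arc_max={arc_bytes}")  # vfs.zfs.arc_max=30064771072
```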
Re: zfs panic RELENG_12
On 12/22/2020 10:07 AM, Mark Johnston wrote:
> Could you go to frame 11 and print zone->uz_name and
> bucket->ub_bucket[18]? I'm wondering if the item pointer was mangled
> somehow.

Thank you for looking!

(kgdb) frame 11
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
    bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
758          zone->uz_release(zone->uz_arg, bucket->ub_bucket, bucket->ub_cnt);
(kgdb) p zone->uz_name
$1 = 0x8102118a "mbuf_jumbo_9k"
(kgdb) p bucket->ub_bucket[18]
$2 = (void *) 0xf80de4654000
(kgdb) p bucket->ub_bucket
$3 = 0xf801c7fd5218
(kgdb)

---Mike
Re: zfs panic RELENG_12
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware as it seems different?
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address = 0x0
> fault code = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer = 0x28:0xfe00bc0f8540
> frame pointer = 0x28:0xfe00bc0f8590
> code segment = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and bucket->ub_bucket[18]? I'm wondering if the item pointer was mangled somehow.
Re: zfs panic RELENG_12
Hmmm, another one. Not sure if this is hardware as it seems different?

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0x80ca0826
stack pointer = 0x28:0xfe00bc0f8540
frame pointer = 0x28:0xfe00bc0f8590
code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 33 (dom0)
trap number = 12
panic: page fault
cpuid = 11
time = 1608641071
KDB: stack backtrace:
#0 0x80a3fe85 at kdb_backtrace+0x65
#1 0x809f406b at vpanic+0x17b
#2 0x809f3ee3 at panic+0x43
#3 0x80e3fe71 at trap_fatal+0x391
#4 0x80e3fecf at trap_pfault+0x4f
#5 0x80e3f516 at trap+0x286
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
#8 0x80c9e302 at zone_drain_wait+0xa2
#9 0x80ca2bbd at uma_reclaim_locked+0x6d
#10 0x80ca2af4 at uma_reclaim+0x34
#11 0x80cc5321 at vm_pageout_worker+0x421
#12 0x80cc4ee3 at vm_pageout+0x193
#13 0x809b55be at fork_exit+0x7e
#14 0x80e1a34e at fork_trampoline+0xe
Uptime: 5d20h37m16s
Dumping 16057 out of 65398 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0x809f3c85 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3 0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4 0x809f3ee3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:807
#5 0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0) at /usr/src/sys/amd64/amd64/trap.c:921
#6 0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480, usermode=, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:739
#7 0x80e3f516 in trap (frame=0xfe00bc0f8480) at /usr/src/sys/amd64/amd64/trap.c:405
#8
#9 0x80ca0826 in slab_free_item (keg=0xf800037fa380, slab=0xf80de4656fb0, item=) at /usr/src/sys/vm/uma_core.c:3357
#10 zone_release (zone=, bucket=0xf801c7fd5218, cnt=) at /usr/src/sys/vm/uma_core.c:3404
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000, bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
#12 bucket_cache_drain (zone=0xf800037da000) at /usr/src/sys/vm/uma_core.c:915
#13 0x80c9e302 in zone_drain_wait (zone=0xf800037da000, waitok=1) at /usr/src/sys/vm/uma_core.c:1037
#14 0x80ca2bbd in zone_drain (zone=0xf800037da000) at /usr/src/sys/vm/uma_core.c:1056
#15 zone_foreach (zfunc=) at /usr/src/sys/vm/uma_core.c:1985
#16 uma_reclaim_locked (kmem_danger=) at /usr/src/sys/vm/uma_core.c:3737
#17 0x80ca2af4 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:3757
#18 0x80cc5321 in vm_pageout_lowmem () at /usr/src/sys/vm/vm_pageout.c:1890
#19 vm_pageout_worker (arg=) at /usr/src/sys/vm/vm_pageout.c:1966
#20 0x80cc4ee3 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2126
#21 0x809b55be in fork_exit (callout=0x80cc4d50 , arg=0x0, frame=0xfe00bc0f8b00) at /usr/src/sys/kern/kern_fork.c:1080
#22
(kgdb) bt full
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
        td =
#1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
        error =
        coredump =
#2 0x809f3c85 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
        once =
#3 0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
        buf = "page fault", '\000'
        other_cpus = {__bits = {2047, 0, 0, 0}}
        td = 0xf80004964740
        newpanic =
        bootopt =
#4 0x809f3ee3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:807
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0xfe00bc0f82c0, reg_save_area = 0xfe00bc0f8260}}
#5 0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0) at /usr/src/sys/amd64/amd64/trap.c:921
        softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27, ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
        code =
        type =
        ss = 40
        handled =
#6 0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480, usermode=, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:739
        td = 0xf80004964740
        p =
        eva = 0
        map =
        ftype =
        rv =
#7 0x80e3f516 in trap (frame=0xfe00bc0f8480) at /usr/src/sys/amd64/amd64/trap.c:405
        ksi = {ksi_link = {tqe_next =
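Several of the panics in this thread are easiest to compare by diffing their backtraces mechanically. A small helper sketch (not part of any FreeBSD tooling; the function name is made up for illustration) that parses the `#N 0xADDR at func+0xOFF` lines a KDB backtrace prints:

```python
import re

# Matches KDB backtrace lines such as:
#   "#7 0x80ca47d4 at bucket_cache_drain+0x134"
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+0x([0-9a-f]+)")

def parse_frames(text: str):
    """Return (frame_no, address, function, offset) tuples from a KDB trace."""
    return [(int(num), addr, func, int(off, 16))
            for num, addr, func, off in FRAME_RE.findall(text)]

trace = """
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
"""
for frame in parse_frames(trace):
    print(frame)
```

With the two traces parsed this way, comparing the function columns side by side makes it obvious whether two crashes share the same path into UMA.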
zfs panic RELENG_12
Was doing a backup via zfs send | zfs recv when the box panic'd. It's a not-so-old RELENG_12 box from last week. Any ideas if this is a hardware issue or a bug? It's r368493 from last Wednesday. I don't see any ECC errors logged, so I don't think it's hardware.

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x823a554b
stack pointer = 0x28:0xfe0343231000
frame pointer = 0x28:0xfe03432310c0
code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 87427 (zfs)
trap number = 12
panic: page fault
cpuid = 1
time = 1608065221
KDB: stack backtrace:
#0 0x80a3fa05 at kdb_backtrace+0x65
#1 0x809f3beb at vpanic+0x17b
#2 0x809f3a63 at panic+0x43
#3 0x80e400d1 at trap_fatal+0x391
#4 0x80e4012f at trap_pfault+0x4f
#5 0x80e3f776 at trap+0x286
#6 0x80e19568 at calltrap+0x8
#7 0x82393a5e at dmu_object_info+0x1e
#8 0x823983a5 at dmu_recv_stream+0x7b5
#9 0x8244b706 at zfs_ioc_recv+0xac6
#10 0x8244dd3d at zfsdev_ioctl+0x62d
#11 0x808a35e0 at devfs_ioctl+0xb0
#12 0x80f3becb at VOP_IOCTL_APV+0x7b
#13 0x80ad1b0a at vn_ioctl+0x16a
#14 0x808a3bce at devfs_ioctl_f+0x1e
#15 0x80a5d807 at kern_ioctl+0x2b7
#16 0x80a5d4aa at sys_ioctl+0xfa
#17 0x80e40c87 at amd64_syscall+0x387
Uptime: 3d14h59m52s
Dumping 17213 out of 65366 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2 0x809f3805 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#3 0x809f3c43 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4 0x809f3a63 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:807
#5 0x80e400d1 in trap_fatal (frame=0xfe0343230f40, eva=0) at /usr/src/sys/amd64/amd64/trap.c:921
#6 0x80e4012f in trap_pfault (frame=0xfe0343230f40, usermode=, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:739
#7 0x80e3f776 in trap (frame=0xfe0343230f40) at /usr/src/sys/amd64/amd64/trap.c:405
#8
#9 0x823a554b in dnode_hold_impl (os=0xf805e1d2b800, object=, flag=, slots=, tag=, dnp=0xfe03432310d8) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1370
#10 0x82393a5e in dmu_object_info (os=0xf80777890070, object=18446744071600721588, doi=0xfe03432312e0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:2615
#11 0x823983a5 in receive_read_record (ra=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2821
#12 dmu_recv_stream (drc=0xfe0343231430, fp=, voffp=, cleanup_fd=8, action_handlep=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:3203
#13 0x8244b706 in zfs_ioc_recv (zc=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4745
#14 0x8244dd3d in zfsdev_ioctl (dev=, zcmd=, arg=, flag=, td=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6956
#15 0x808a35e0 in devfs_ioctl (ap=0xfe0343231778) at /usr/src/sys/fs/devfs/devfs_vnops.c:797
#16 0x80f3becb in VOP_IOCTL_APV (vop=0x816a2fe0 , a=0xfe0343231778) at vnode_if.c:1067
#17 0x80ad1b0a in vn_ioctl (fp=0xf8001802b5a0, com=, data=0xfe0343231910, active_cred=0xf80032214300, td=0x2070) at /usr/src/sys/kern/vfs_vnops.c:1508
#18 0x808a3bce in devfs_ioctl_f (fp=0xf80777890070, com=18446744071600721588, data=0x824e34ed <.L.str+1>, cred=0x0, td=0xf8029885) at /usr/src/sys/fs/devfs/devfs_vnops.c:755
#19 0x80a5d807 in fo_ioctl (fp=0xf8001802b5a0, com=3222821403, data=0x824e34ed <.L.str+1>, active_cred=0x0, td=0xf8029885) at /usr/src/sys/sys/file.h:337
#20 kern_ioctl (td=0x2070, fd=, com=3222821403, data=0x824e34ed <.L.str+1> "zrl->zr_mtx") at /usr/src/sys/kern/sys_generic.c:805
#21 0x80a5d4aa in sys_ioctl (td=0xf8029885, uap=0xf802988503c0) at /usr/src/sys/kern/sys_generic.c:713
#22 0x80e40c87 in syscallenter (td=0xf8029885) at
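One detail in the trace above is worth a second look: the `object` argument kgdb prints in frame #10 is a huge unsigned number. Converting it to hex (a sanity check anyone can repeat) shows it decodes to a kernel-space address rather than a plausible DMU object number, which fits the mangled-pointer suspicion raised later in the thread:

```python
# The "object" argument from frame #10 (dmu_object_info) above.
obj = 18446744071600721588

# Interpreted as a 64-bit value it is a kernel text/data address,
# not a small DMU object id -- a hint the value was mangled somewhere.
print(hex(obj))  # -> 0xffffffff824dceb4
```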
[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Rodney W. Grimes changed:

           What    |Removed            |Added
----------------------------------------------------------------------------
                 CC|sta...@freebsd.org |rgri...@freebsd.org

--- Comment #1 from Rodney W. Grimes ---
Please do not put bugs on stable@, current@, hackers@, etc

--
You are receiving this mail because:
You are on the CC list for the bug.
[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Andriy Voskoboinyk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@freebsd.org
            Summary|ZFS kernel panic when       |[zfs] Panic during data
                   |access to data or scrub     |access or scrub on
                   |                            |12.0-STABLE r343904 (blkptr
                   |                            |at DVA 0 has invalid
                   |                            |OFFSET)
           Keywords|                            |panic
           Assignee|b...@freebsd.org            |f...@freebsd.org
ZFS panic, ARC compression?
Hi,

Anyone offer any suggestions about this?

kernel: panic: solaris assert: arc_decompress(buf) == 0 (0x5 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 4923
kernel: cpuid = 1
kernel: KDB: stack backtrace:
kernel: #0 0x80aadac7 at kdb_backtrace+0x67
kernel: #1 0x80a6bba6 at vpanic+0x186
kernel: #2 0x80a6ba13 at panic+0x43
kernel: #3 0x8248023c at assfail3+0x2c
kernel: #4 0x8218e2e0 at arc_read+0x9f0
kernel: #5 0x82198e5e at dbuf_read+0x69e
kernel: #6 0x821b3db4 at dnode_hold_impl+0x194
kernel: #7 0x821a11dd at dmu_bonus_hold+0x1d
kernel: #8 0x8220fb05 at zfs_zget+0x65
kernel: #9 0x82227d42 at zfs_dirent_lookup+0x162
kernel: #10 0x82227e07 at zfs_dirlook+0x77
kernel: #11 0x8223fcea at zfs_lookup+0x44a
kernel: #12 0x822403fd at zfs_freebsd_lookup+0x6d
kernel: #13 0x8104b963 at VOP_CACHEDLOOKUP_APV+0x83
kernel: #14 0x80b13816 at vfs_cache_lookup+0xd6
kernel: #15 0x8104b853 at VOP_LOOKUP_APV+0x83
kernel: #16 0x80b1d151 at lookup+0x701
kernel: #17 0x80b1c606 at namei+0x486

Roughly 24 hours earlier (during the scrub), there was:

ZFS: vdev state changed, pool_guid=11921811386284628759 vdev_guid=1644286782598989949
ZFS: vdev state changed, pool_guid=11921811386284628759 vdev_guid=17800276530669255627

% uname -a
FreeBSD xxx 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
%

% zpool status
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 15.7M in 2h37m with 1 errors on Sun Apr 1 09:44:39 2018
config:

        NAME        STATE  READ WRITE CKSUM
        zroot       ONLINE    0     0     0
          mirror-0  ONLINE    0     0     0
            ada0p4  ONLINE    0     0     0
            ada1p4  ONLINE    0     0     0

errors: 1 data errors, use '-v' for a list
%

The affected file (in a snapshot) is unimportant. This pool is a daily rsync backup and contains about 120 snapshots. No device or SMART errors were logged.

--
Bob Bishop
r...@gid.co.uk
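The assertion text itself encodes the failure mode: arc_decompress() returned 0x5 where 0 was expected, and error 5 is EIO on FreeBSD as on illumos, i.e. the ARC buffer failed to decompress. That lines up with the on-disk corruption the scrub found. A one-liner to decode it:

```python
import errno
import os

# The value from "arc_decompress(buf) == 0 (0x5 == 0x0)" in the panic.
code = 0x5
print(errno.errorcode[code], "-", os.strerror(code))  # EIO - Input/output error
```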
Re: 9.2PRERELEASE ZFS panic in lzjb_compress
Got another, very similar panic again on recent 9-STABLE (r255602); I assume the latest 9.2 release candidate is affected too. Anybody have any idea of what could be causing this, and of a workaround other than turning compression off?

Unlike the last panic I reported, this one did not occur during a zfs send/receive operation. There were just a number of processes potentially writing to disk at the same time. All hardware is healthy as far as I can tell (memory is ECC and no errors in logs; zpool status and smartctl show no problems).

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 24
cpuid = 51; apic id = 83
fault virtual address = 0xff8700a9cc65
fault virtual address = 0xff8700ab0ea9
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x8195ff47
fault code = supervisor read data, page not present
stack pointer = 0x28:0xffcf951390a0
Fatal trap 12: page fault while in kernel mode
frame pointer = 0x28:0xffcf951398f0
Fatal trap 12: page fault while in kernel mode
code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0x8195ffa4
stack pointer = 0x28:0xffcf951250a0
processor eflags =
frame pointer = 0x28:0xffcf951258f0
interrupt enabled, code segment = base 0x0, limit 0xf, type 0x1b
resume, IOPL = 0
cpuid = 28; apic id = 4c
Fatal trap 12: page fault while in kernel mode
  = DPL 0, pres 1, long 1, def32 0, gran 1
current process = 0 (zio_write_issue_hig)
processor eflags =
fault virtual address = 0xff8700aa22ac
interrupt enabled,
fault code = supervisor read data, page not present
resume, IOPL = 0
trap number = 12
instruction pointer = 0x20:0x8195ffa4
current process = 0 (zio_write_issue_hig)
panic: page fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffcf95138b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf95138bf0
panic() at panic+0x1ce/frame 0xffcf95138cf0
trap_fatal() at trap_fatal+0x290/frame 0xffcf95138d50
trap_pfault() at trap_pfault+0x211/frame 0xffcf95138de0
trap() at trap+0x344/frame 0xffcf95138fe0
calltrap() at calltrap+0x8/frame 0xffcf95138fe0
--- trap 0xc, rip = 0x8195ff47, rsp = 0xffcf951390a0, rbp = 0xffcf951398f0 ---
lzjb_compress() at lzjb_compress+0xa7/frame 0xffcf951398f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf95139920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf95139970
zio_execute() at zio_execute+0xc3/frame 0xffcf951399b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf95139a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffcf95139a20
fork_exit() at fork_exit+0x11f/frame 0xffcf95139a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf95139a70
--- trap 0, rip = 0, rsp = 0xffcf95139b30, rbp = 0 ---

0x51f47 is in lzjb_compress (/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c:74).
69			}
70			if (src > (uchar_t *)s_start + s_len - MATCH_MAX) {
71				*dst++ = *src++;
72				continue;
73			}
74			hash = (src[0] << 16) + (src[1] << 8) + src[2];
75			hash += hash >> 9;
76			hash += hash >> 5;
77			hp = &lempel[hash & (LEMPEL_SIZE - 1)];
78			offset = (intptr_t)(src - *hp) & OFFSET_MASK;

dmesg output is at http://pastebin.com/U34fwJ5f
kernel config is at http://pastebin.com/c9HKfcsz

I can provide more information if useful.
Thanks

On Fri, Jul 19, 2013 at 6:52 AM, Volodymyr Kostyrko c.kw...@gmail.com wrote:
> 19.07.2013 07:04, olivier wrote:
>> Hi, Running 9.2-PRERELEASE #19 r253313 I got the following panic
>> ...
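For anyone following along in the source, the quoted hash computation can be sketched in a few lines of Python (constants per the OpenSolaris lzjb.c the report quotes; this is an illustration of the table-index math, not the kernel code):

```python
# LZJB hashes the next 3 input bytes into an index of the Lempel history
# table; LEMPEL_SIZE is 1024 in the lzjb.c quoted above.
LEMPEL_SIZE = 1024

def lzjb_hash(src: bytes, i: int) -> int:
    """Mirror lines 74-77: fold src[i..i+2] into a history-table index."""
    h = (src[i] << 16) + (src[i + 1] << 8) + src[i + 2]
    h += h >> 9
    h += h >> 5
    return h & (LEMPEL_SIZE - 1)

# The masking keeps the index in [0, LEMPEL_SIZE), so the lempel[] access
# itself cannot fault; the reported fault addresses are on src[]/cpy[]
# byte reads, as in the "src[mlen] != cpy[mlen]" match loop.
print(lzjb_hash(b"abc", 0))
```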
Re: 9.2PRERELEASE ZFS panic in lzjb_compress
One last piece of information I just got: the problem is not specific to LZJB compression. I switched to LZ4 and get the same sort of panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 28
fault virtual address = 0xff8581c48000
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x8195f6d1
stack pointer = 0x28:0xffcf950ee850
frame pointer = 0x28:0xffcf950ee8f0
code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_hig)
trap number = 12
panic: page fault
cpuid = 8
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffcf950ee2e0
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf950ee3a0
panic() at panic+0x1ce/frame 0xffcf950ee4a0
trap_fatal() at trap_fatal+0x290/frame 0xffcf950ee500
trap_pfault() at trap_pfault+0x211/frame 0xffcf950ee590
trap() at trap+0x344/frame 0xffcf950ee790
calltrap() at calltrap+0x8/frame 0xffcf950ee790
--- trap 0xc, rip = 0x8195f6d1, rsp = 0xffcf950ee850, rbp = 0xffcf950ee8f0 ---
lz4_compress() at lz4_compress+0x81/frame 0xffcf950ee8f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf950ee920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf950ee970
zio_execute() at zio_execute+0xc3/frame 0xffcf950ee9b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf950eea00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffcf950eea20
fork_exit() at fork_exit+0x11f/frame 0xffcf950eea70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf950eea70
--- trap 0, rip = 0, rsp = 0xffcf950eeb30, rbp = 0 ---

(I am now trying without any compression.)

On Fri, Sep 20, 2013 at 11:25 AM, olivier olivier77...@gmail.com wrote:
> Got another, very similar panic again on recent 9-STABLE (r255602); I
> assume the latest 9.2 release candidate is affected too.
Re: 9.2PRERELEASE ZFS panic in lzjb_compress
19.07.2013 07:04, olivier wrote:
> Hi, Running 9.2-PRERELEASE #19 r253313 I got the following panic
> ...
> I think it's the first time I've seen this panic. It happened while
> doing a send/receive. I have two pools with lzjb compression; I don't
> know which of these pools caused the problem, but one of them was the
> source of the send/receive. I only have a textdump but I'm happy to try
> to provide more information that could help anyone look into this.
> Thanks
> Olivier

Oh, I can add to this one. I have a full core dump of the same problem caused by copying a large set of files from an lzjb-compressed pool to an lz4-compressed pool. vfs.zfs.recover was set.

#1 0x8039d954 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
#2 0x8039ddce in panic (fmt=value optimized out) at /usr/src/sys/kern/kern_shutdown.c:637
#3 0x80620a6a in trap_fatal (frame=value optimized out, eva=value optimized out) at /usr/src/sys/amd64/amd64/trap.c:879
#4 0x80620d25 in trap_pfault (frame=0x0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:700
#5 0x806204f6 in trap (frame=0xff821ca43600) at /usr/src/sys/amd64/amd64/trap.c:463
#6 0x8060a032 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#7 0x805a9367 in vm_page_alloc (object=0x80a34030, pindex=16633, req=97) at /usr/src/sys/vm/vm_page.c:1445
#8 0x8059c42e in kmem_back (map=0xfe0001e8, addr=18446743524021862400, size=16384, flags=value optimized out) at /usr/src/sys/vm/vm_kern.c:362
#9 0x8059c2ac in kmem_malloc (map=0xfe0001e8, size=16384, flags=257) at /usr/src/sys/vm/vm_kern.c:313
#10 0x80595104 in uma_large_malloc (size=value optimized out, wait=257) at /usr/src/sys/vm/uma_core.c:994
#11 0x80386b80 in malloc (size=16384, mtp=0x80ea7c40, flags=0) at /usr/src/sys/kern/kern_malloc.c:492
#12 0x80c9e13c in lz4_compress (s_start=0xff80d0b19000, d_start=0xff8159445000, s_len=131072, d_len=114688, n=-2) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c:843
#13 0x80cdde25 in zio_compress_data (c=value optimized out, src=value optimized out, dst=0xff8159445000, s_len=131072) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:109
#14 0x80cda012 in zio_write_bp_init (zio=0xfe0143a12000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1107
#15 0x80cd8ec6 in zio_execute (zio=0xfe0143a12000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1305
#16 0x803e25e6 in taskqueue_run_locked (queue=0xfe00060ca300) at /usr/src/sys/kern/subr_taskqueue.c:312
9.2PRERELEASE ZFS panic in lzjb_compress
Hi,

Running 9.2-PRERELEASE #19 r253313 I got the following panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 22; apic id = 46
fault virtual address   = 0xff827ebca30c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x81983055
stack pointer           = 0x28:0xffcf75bd60a0
frame pointer           = 0x28:0xffcf75bd68f0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (zio_write_issue_hig)
trap number             = 12
panic: page fault
cpuid = 22
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffcf75bd5b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf75bd5bf0
panic() at panic+0x1ce/frame 0xffcf75bd5cf0
trap_fatal() at trap_fatal+0x290/frame 0xffcf75bd5d50
trap_pfault() at trap_pfault+0x211/frame 0xffcf75bd5de0
trap() at trap+0x344/frame 0xffcf75bd5fe0
calltrap() at calltrap+0x8/frame 0xffcf75bd5fe0
--- trap 0xc, rip = 0x81983055, rsp = 0xffcf75bd60a0, rbp = 0xffcf75bd68f0 ---
lzjb_compress() at lzjb_compress+0x185/frame 0xffcf75bd68f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf75bd6920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf75bd6970
zio_execute() at zio_execute+0xc3/frame 0xffcf75bd69b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf75bd6a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xffcf75bd6a20
fork_exit() at fork_exit+0x11f/frame 0xffcf75bd6a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf75bd6a70
--- trap 0, rip = 0, rsp = 0xffcf75bd6b30, rbp = 0 ---

lzjb_compress+0x185 corresponds to line 85:

    80          cpy = src - offset;
    81          if (cpy >= (uchar_t *)s_start && cpy != src &&
    82              src[0] == cpy[0] && src[1] == cpy[1] && src[2] == cpy[2]) {
    83                  *copymap |= copymask;
    84                  for (mlen = MATCH_MIN; mlen < MATCH_MAX; mlen++)
    85                          if (src[mlen] != cpy[mlen])
    86                                  break;
    87                  *dst++ = ((mlen - MATCH_MIN) << (NBBY - MATCH_BITS)) |
    88                      (offset >> NBBY);
    89                  *dst++ = (uchar_t)offset;

I think it's the first time I've seen this panic. It happened while doing a send/receive. I have two pools with lzjb compression; I don't know which of these pools caused the problem, but one of them was the source of the send/receive. I only have a textdump, but I'm happy to provide any further information that could help anyone look into this.

Thanks,
Olivier
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
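[For anyone wanting to reproduce the symbol-to-source-line mapping shown above, a minimal sketch of the usual kgdb workflow; the paths are assumptions that depend on how the kernel was built, and with only a textdump the symbol file alone still allows the address arithmetic:]

```shell
# Hypothetical paths: a debug kernel plus a vmcore saved by savecore(8).
kgdb /usr/obj/usr/src/sys/GENERIC/kernel.debug /var/crash/vmcore.0

# Inside kgdb, resolve the offset reported in the backtrace:
#   (kgdb) list *(lzjb_compress+0x185)
#   (kgdb) info line *(lzjb_compress+0x185)

# Without a vmcore, the same mapping can be done against the symbol file:
addr2line -f -e /usr/obj/usr/src/sys/GENERIC/kernel.debug 0x81983055
```

This only gives a trustworthy line number when the kernel binary exactly matches the one that panicked.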
Re: ZFS Panic after freebsd-update
on 01/07/2013 21:50 Jeremy Chadwick said the following:
> The issue is that ZFS on FreeBSD is still young compared to other
> filesystems (specifically UFS). That's a fact. Nothing is perfect, but
> FFS/UFS tends to have a significantly larger number of bugs worked out
> of it to the point where people can use it without losing sleep
> (barring the SUJ stuff, don't get me started).

That's subjective.

> I have the same concerns over other things, like ext2fs and fusefs for
> that matter -- but this thread is about a ZFS-related crash, and that's
> why I'm over-focused on it.

I have the impression that you state your (negative) opinion of ZFS in every other thread about ZFS problems.

> A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
> results in a system where an admin can upgrade + boot into single-user
> and perform some tasks to test/troubleshoot; if the ZFS layer is
> broken, it doesn't mean an essentially useless box. That isn't FUD,
> that's just the stage we're at right now. I'm aware lots of people
> have working ZFS-exclusive setups; like I said, works great until it
> doesn't.

Yeah, a heterogeneous setup can have its benefits, but it can have its drawbacks too. This is true for heterogeneous vs monoculture in general. But the sword cuts both ways: what if something is broken in the UFS layer, or god forbid in the VFS layer, and you have only UFS? Besides, without naming specific classes of problems, "ZFS layer is broken" is too vague.

> So, how do you kernel guys debug a problem in this environment:
> - ZFS-only
> - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt
>   with added debugging features, etc.)
> - No swap configured
> - No serial console

I use boot environments and boot to a previous / known-good environment if I hit a loader bug, a kernel bug or a major userland problem in a new environment. I also use a mirrored setup and keep two copies of earlier boot chains. I am also not shy about using live media in case everything else fails.
Now I wonder how you deal with the same kind of UFS-only environment.

-- 
Andriy Gapon
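[The boot-environment workflow Andriy describes can be sketched with bectl(8), which ships with FreeBSD 12+; older systems used the sysutils/beadm port. The BE name below is a hypothetical example:]

```shell
# Capture the current (known-good) root as a boot environment.
bectl create pre-upgrade
bectl list                   # confirm it exists and which BE is active

# ... perform the risky upgrade, reboot, test ...

# If the new environment turns out to be broken, fall back:
bectl activate pre-upgrade
shutdown -r now
```

This is what makes a ZFS-only root recoverable without separate UFS filesystems: the loader can boot any listed environment.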
Re: ZFS Panic after freebsd-update
On Tue, Jul 02, 2013 at 08:59:56AM +0300, Andriy Gapon wrote: on 01/07/2013 21:50 Jeremy Chadwick said the following: The issue is that ZFS on FreeBSD is still young compared to other filesystems (specifically UFS). That's a fact. Nothing is perfect, but FFS/UFS tends to have a significantly larger number of bugs worked out of it to the point where people can use it without losing sleep (barring the SUJ stuff, don't get me started). That's subjective. I have the same concerns over other things, like ext2fs and fusefs for that matter -- but this thread is about a ZFS-related crash, and that's why I'm over-focused on it. I have an impression that you seem to state your (negative) opinion of ZFS in every other thread about ZFS problems. The OP in question ended his post with the line "Thoughts?", and I have given those thoughts. My thoughts/opinions/experience may differ from that of others. Diversity of thoughts/opinions/experiences is good. I'm not some kind of authoritative ZFS guru -- far from it. If I misunderstood what "Thoughts?" meant/implied, then draw and quarter me for it; my actions/words = my responsibility. I do not feel I have a negative opinion of ZFS. I still use it today on FreeBSD, donated money to Pawel when the project was originally announced (because I wanted to see something new and useful thrive on FreeBSD), and try my best to assist with issues pertaining to it where applicable. These are not the actions of someone with a negative opinion, these are the actions of someone who is supportive while simultaneously very cautious. Is ZFS better today than it was when it was introduced? By a long shot. For example, on my stable/9 system here I don't tune /boot/loader.conf any longer. But that doesn't change my viewpoint when it comes to using ZFS exclusively on a FreeBSD box.
A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only), results in a system where an admin can upgrade + boot into single-user and perform some tasks to test/troubleshoot; if the ZFS layer is broken, it doesn't mean an essentially useless box. That isn't FUD, that's just the stage we're at right now. I'm aware lots of people have working ZFS-exclusive setups; like I said, works great until it doesn't. Yeah, a heterogeneous setup can have its benefits, but it can have its drawbacks too. This is true for heterogeneous vs monoculture in general. But the sword cuts both ways: what if something is broken in UFS layer or god forbid in VFS layer and you have only UFS? Besides, without mentioning specific classes of problems ZFS layer is broken is too vague. The likelihood of something being broken in UFS is significantly lower given its established history. I have to go off of experience, both personal and professional -- in my years of dealing with FreeBSD (1997-present), I have only encountered issues with UFS a few times (I can count them on one, maybe two hands), and I'm choosing to exclude SU+J from the picture for what should be obvious reasons. With ZFS, well... just look at the mailing lists and PR count. I don't want to be a jerk about it, but you really have to look at the quantity. It doesn't mean ZFS is crap, it just means that for me, I don't think we're quite there yet. And I will gladly admit -- because you are the one who taught me this -- that every incident need be treated unique. But one can't deny that a substantial percentage (I would say majority) of -fs and -stable posts relate somehow to ZFS; I'm often thrilled when it turns out to be something else. Playing a strange devil's advocate, let me give you an interesting example: softupdates. When SU was introduced to FreeBSD back in the late 90s, there were issues and concerns -- lots. 
As such, SU was chosen to be disabled by default on root filesystems given the importance of that filesystem (re: we do not want to risk losing as much data in the case of a crash -- see the official FAQ, section 8.3). All other filesystems defaulted to SU enabled. It's been like that up until 9.x, where it now defaults to enabled. So that's what, 15 years? You could say that my example could also apply to ZFS, i.e. the reports are a part of its growth and maturity, and I'd agree. But I don't feel it's reached the point where I'm willing to risk going ZFS-only. Down the road, sure, but not now. That's just my take on it. Please make sure to also consider, politely, that a lot of people who have issues with ZFS have not been subscribed to the lists for long periods of time. They sign up/post when they have a problem. Meaning: they do not necessarily know of the history. If they did, I (again politely) believe they're likely to use a UFS+ZFS mix, or maybe a gmirror+UFS+ZFS mix (though the GPT/gmirror thing is... never mind...). So, how do you kernel guys debug a problem in this environment: - ZFS-only - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt with added debugging features, etc.) - No swap configured - No serial console
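[For reference, the soft-updates knob discussed above is a per-filesystem UFS flag inspected and toggled with tunefs(8); the device name below is a hypothetical example, and the filesystem must be unmounted (or mounted read-only) when toggling:]

```shell
# Print current UFS flags, including whether soft updates are enabled:
tunefs -p /dev/ada0p2

# Enable or disable soft updates:
tunefs -n enable /dev/ada0p2
# tunefs -n disable /dev/ada0p2

# SU+J journaling (the part excluded above) is a separate flag:
# tunefs -j enable /dev/ada0p2
```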
Re: ZFS Panic after freebsd-update
On Tue, Jul 02, 2013 at 12:57:16AM -0700, Jeremy Chadwick wrote: But in the OP's case, the situation sounds dire given the limitations -- limitations that someone (apparently not him) chose, which greatly hinder debugging/troubleshooting. Had a heterogeneous setup been chosen, the debugging/troubleshooting pains are less (IMO). When I see this, it makes me step back and ponder the decisions that led to the ZFS-only setup. As an observer (though one who has used ZFS for some time now), I might suggest that this can at least -seem- like FUD about ZFS, because the limitations don't necessarily have anything to do with ZFS. That is, a situation in which one cannot recover, nor even effectively troubleshoot, if there is a problem, will be a dire one, regardless of what the problem might be or where its source might lie.

-- 
greg byshenk - gbysh...@byshenk.net - Leiden, NL - Portland, OR USA
ZFS Panic after freebsd-update
Hello, I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4 using freebsd-update. After I rebooted to test the new kernel, I got a panic. I had to take a picture of the screen. Here's a condensed version: panic: page fault cpuid = 1 KDB: stack backtrace: #0 #1 #2 #3 #4 #5 #6 #6 #6 #6 #6 #6 FreeBSD xeon.cap-press.com 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
ZFS Panic after freebsd-update
*** Sorry for partial first message! (gmail sent after multiple returns apparently?) ***

Hello,

I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4 using freebsd-update. After I rebooted to test the new kernel, I got a panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 kdb_backtrace
#1 panic
#2 trap_fatal
#3 trap_pfault
#4 trap
#5 calltrap
#6 vdev_mirror_child_select
#7 ved_mirror_io_start
#8 zio_vdev_io_start
#9 zio_execute
#10 arc_read
#11 dbuf_read
#12 dbuf_findbp
#13 dbuf_hold_impl
#14 dbuf_hold
#15 dnode_hold_impl
#16 dnu_buf_hold
#17 zap_lockdir
Uptime: 5s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

uname -a from before (and after) the reboot:

FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

dmesg is attached. I was able to reboot to the old kernel and am up and running back on 8.2 right now.

Any thoughts?

Thanks,
Scott

Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011
    r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2266.76-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
TSC: P-state invariant
real memory  = 18253611008 (17408 MB)
avail memory = 16513347584 (15748 MB)
ACPI APIC Table: <031710 APIC1617>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
 cpu8 (AP): APIC ID: 16
 cpu9 (AP): APIC ID: 17
 cpu10 (AP): APIC ID: 18
 cpu11 (AP): APIC ID: 19
 cpu12 (AP): APIC ID: 20
 cpu13 (AP): APIC ID: 21
 cpu14 (AP): APIC ID: 22
 cpu15 (AP): APIC ID: 23
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
acpi0: <031710 XSDT1617> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, bff0 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0xAD, should be 0xAA (20101013/tbutils-354)
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
cpu8: <ACPI CPU> on acpi0
cpu9: <ACPI CPU> on acpi0
cpu10: <ACPI CPU> on acpi0
cpu11: <ACPI CPU> on acpi0
cpu12: <ACPI CPU> on acpi0
cpu13: <ACPI CPU> on acpi0
cpu14: <ACPI CPU> on acpi0
cpu15: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci10: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 3.0 on pci0
pci9: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 7.0 on pci0
pci8: <ACPI PCI bus> on pcib3
pcib4: <PCI-PCI bridge> at device 8.0 on pci0
pci7: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 9.0 on pci0
pci6: <PCI bus> on pcib5
pcib6: <PCI-PCI bridge> at device 10.0 on pci0
pci5: <PCI bus> on pcib6
pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.3 (no driver attached)
pci0: <base peripheral> at device 22.0 (no driver attached)
pci0: <base peripheral> at device 22.1 (no driver attached)
pci0: <base peripheral> at device 22.2 (no driver attached)
pci0: <base peripheral> at device 22.3 (no driver attached)
pci0: <base peripheral> at device 22.4 (no driver attached)
pci0: <base peripheral> at device 22.5 (no driver attached)
pci0: <base peripheral> at device 22.6 (no driver attached)
pci0: <base peripheral> at device 22.7 (no driver attached)
uhci0:
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: *** Sorry for partial first message! (gmail sent after multiple returns apparently?) *** Hello, I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4 using freebsd-update. After I rebooted to test the new kernel, I got a panic. I had to take a picture of the screen. Here's a condensed version: panic: page fault cpuid = 1 KDB: stack backtrace: #0 kdb_backtrace #1 panic #2 trap_fatal #3 trap_pfault #4 trap #5 calltrap #6 vdev_mirror_child_select #7 ved_mirror_io_start #8 zio_vdev_io_start #9 zio_execute #10 arc_read #11 dbuf_read #12 dbuf_findbp #13 dbuf_hold_impl #14 dbuf_hold #15 dnode_hold_impl #16 dnu_buf_hold #17 zap_lockdir Uptime: 5s Cannot dump. Device not defined or unavailable. Automatic reboot in 15 seconds - press a key on the console to abort uname -a from before (and after) the reboot: FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 dmesg is attached. I was able to reboot to the old kernel and am up and running back on 8.2 right now. Any thoughts?

Thoughts:

- All I see is an amd64 system with 16GB RAM and 4 disks driven by an ICH10 in AHCI mode.
- Output from: zpool status
- Output from: zpool get all
- Output from: zfs get all
- Output from: gpart show -p (for every disk on the system)
- Output from: cat /etc/sysctl.conf
- Output from: cat /boot/loader.conf
- Is there a reason you do not have dumpdev defined in /etc/rc.conf (or, alternately, no swap device defined in /etc/fstab, which will get used/honoured by the default dumpdev=auto)?

Taking photos of the console and manually typing backtraces in is borderline worthless.
Of course, when I see lines like this:

Trying to mount root from zfs:zroot

...this greatly diminishes any chances of live debugging on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda.

- Get yourself stable/9 and try that: https://pub.allbsd.org/FreeBSD-snapshots/
- freebsd-fs is a better place for this discussion, especially since you're running a -RELEASE build, not a -STABLE build.

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
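[A minimal sketch of the dumpdev setup being asked about; the device name is a hypothetical example, and on 8.x the rc.conf variable is set by editing the file directly:]

```shell
# /etc/rc.conf -- direct crash dumps to the swap partition:
# dumpdev="AUTO"            # use the first configured swap device, or
# dumpdev="/dev/ada0p3"     # name one explicitly (hypothetical device)

# Apply without a reboot:
dumpon /dev/ada0p3

# After a panic and reboot, savecore(8) extracts the dump for kgdb:
savecore /var/crash /dev/ada0p3
```

With this in place a panic produces a vmcore instead of "Cannot dump. Device not defined or unavailable."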
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 08:49:25AM -0700, Jeremy Chadwick wrote: - Is there a reason you do not have dumpdev defined in /etc/rc.conf (or alternately, no swap device defined in /etc/fstab (which will get used/honoured by the dumpdev=auto (the default)) ? This should have read "or alternately, ***A*** swap device defined in /etc/fstab" ...

-- 
| Jeremy Chadwick                                   j...@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
Re: ZFS Panic after freebsd-update
- Original Message - From: Jeremy Chadwick j...@koitsu.org To: Scott Sipe csco...@gmail.com Cc: freebsd-stable List freebsd-stable@freebsd.org Sent: Monday, July 01, 2013 4:49 PM Subject: Re: ZFS Panic after freebsd-update On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: *** Sorry for partial first message! (gmail sent after multiple returns apparently?) *** Hello, I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4 using freebsd-update. After I rebooted to test the new kernel, I got a panic. I had to take a picture of the screen. Here's a condensed version: panic: page fault cpuid = 1 KDB: stack backtrace: #0 kdb_backtrace #1 panic #2 trap_fatal #3 trap_pfault #4 trap #5 calltrap #6 vdev_mirror_child_select #7 ved_mirror_io_start #8 zio_vdev_io_start #9 zio_execute #10 arc_read #11 dbuf_read #12 dbuf_findbp #13 dbuf_hold_impl #14 dbuf_hold #15 dnode_hold_impl #16 dnu_buf_hold #17 zap_lockdir Uptime: 5s Cannot dump. Device not defined or unavailable. Automatic reboot in 15 seconds - press a key on the console to abort uname -a from before (and after) the reboot: FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 dmesg is attached. I was able to reboot to the old kernel and am up and running back on 8.2 right now. Any thoughts? This says you're running an 8.2-RELEASE-p3 kernel, not an 8.4-RELEASE kernel. Did the upgrade fail, or is that dmesg / uname from your old kernel? Regards, Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmas...@multiplay.co.uk.
Re: ZFS Panic after freebsd-update
On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote: On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: *** Sorry for partial first message! (gmail sent after multiple returns apparently?) *** Hello, I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. [[...]] Any thoughts? Thoughts: [[..]] Of course when I see lines like this: Trying to mount root from zfs:zroot ...this greatly diminishes any chances of live debugging on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda. I'm not sure why it amazes you that people are making widespread use of ZFS. You could make the same argument that people shouldn't use UFS2 journaling on their file systems because bugs in the implementation might make debugging journaled UFS2 file systems a serious PITA. The point is that there are VERY compelling reasons why people might want to use ZFS for root/var/tmp/usr/etc. (pooled storage; easy snapshots; etc.) and there should come a time when a given file system is generally regarded as safe. I'd say the time for ZFS came when they removed the big disclaimer from the boot messages. If ZFS is dangerous, they should reinstate the not ready for production warning. Until they do, I think it's unfair to castigate people for using ZFS universally. Isn't it a recurring theme on freebsd-current and freebsd-stable that more people need to use features so they can be debugged in realistic environments? If you're telling them, don't use that because it makes debugging harder, how are they supposed to get debugged and hence improved? :-) Cheers, Paul. 
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote: On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote: On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote: *** Sorry for partial first message! (gmail sent after multiple returns apparently?) *** Hello, I have not had much time to research this problem yet, so please let me know what further information I might be able to provide. [[...]] Any thoughts? Thoughts: [[..]] Of course when I see lines like this: Trying to mount root from zfs:zroot ...this greatly diminishes any chances of live debugging on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda. I'm not sure why it amazes you that people are making widespread use of ZFS. It's not widespread use of ZFS. It's widespread use of ZFS as their sole filesystem (specifically root/var/tmp/usr, or more specifically just root/usr). People are operating with the belief that ZFS just works, when reality shows it works until it doesn't. The mentality seems to be it's so rock solid it'll never break along with it can't happen to me. I tend to err on the side of caution, hence avoidance of ZFS for critical things like the aforementioned. It's different if you have a UFS root/var/tmp/usr and ZFS for everything else. You then have a system you can boot/use without issue even if ZFS is crapping the bed. You could make the same argument that people shouldn't use UFS2 journaling on their file systems because bugs in the implementation might make debugging journaled UFS2 file systems a serious PITA. Yup, and I do make that argument, quite regularly at that. 
There is even some evidence at this point in time that softupdates are broken: http://lists.freebsd.org/pipermail/freebsd-fs/2013-June/017424.html The point is that there are VERY compelling reasons why people might want to use ZFS for root/var/tmp/usr/etc. (pooled storage; easy snapshots; etc.) and there should come a time when a given file system is generally regarded as safe. While there may be compelling reasons, those reasons quickly get shot down when they realise they have a system they can't easily do troubleshooting with when the issue is with ZFS. I'd say the time for ZFS came when they removed the big disclaimer from the boot messages. If ZFS is dangerous, they should reinstate the not ready for production warning. Until they do, I think it's unfair to castigate people for using ZFS universally. The warning meant absolutely nothing at the time (it did not keep people away from it), and would mean nothing now if brought back. A single kernel printf() is not the right choice of action. Are we better off today than we were when ZFS was originally ported over? Yes, by far. Lots of improvements, in many great/good ways. No argument there. But there is no way I'd risk putting my root filesystem (or other key filesystems) on it -- still too new, still too many bugs, and users don't know about those problems until it's too late. Isn't it a recurring theme on freebsd-current and freebsd-stable that more people need to use features so they can be debugged in realistic environments? If you're telling them, don't use that because it makes debugging harder, how are they supposed to get debugged and hence improved? :-) 95% of FreeBSD users cannot debug kernel problems**. To debug a kernel problem, you need: a crash dump, a usable system with the exact kernel/world where the crash happened (i.e. 
you cannot crash 8.4 ZFS and boot into 8.2 and reliably debug it using that), and (most important of all) a developer who is familiar with kernel debugging *and* familiar with the bits which are crashing. Those who say what you're quoting are often the latter. Part of the need people to try this process you refer to is what stable/X is about, *without* the extra chaos of head. I'm one of those who for the past 15 years has advocated stable/X usage for a lot of reasons; I'll save the diatribe for some other time. But the OP is running -RELEASE, and chooses to run that, along with use of freebsd-update for binary updates. Their choices are limited: stick with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely. But even stable/X doesn't provide enough coverage at times (the recent fxp(4)/dhclient issue is proof of that). It's just too bad so many people have this broken mindset of what stability means on FreeBSD. ** = This number is probably more like 99%, especially when you consider what FreeNAS is catering to/trying to accomplish. -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administratorhttp://jdc.koitsu.org/ | | Making life hard for others
Re: ZFS Panic after freebsd-update
On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick j...@koitsu.org wrote: On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote: On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote: Of course when I see lines like this: Trying to mount root from zfs:zroot ...this greatly diminishes any chances of live debugging on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda. I'm not sure why it amazes you that people are making widespread use of ZFS. It's not widespread use of ZFS. It's widespread use of ZFS as their sole filesystem (specifically root/var/tmp/usr, or more specifically just root/usr). People are operating with the belief that ZFS just works, when reality shows it works until it doesn't. The mentality seems to be it's so rock solid it'll never break along with it can't happen to me. I tend to err on the side of caution, hence avoidance of ZFS for critical things like the aforementioned. It's different if you have a UFS root/var/tmp/usr and ZFS for everything else. You then have a system you can boot/use without issue even if ZFS is crapping the bed. ... 95% of FreeBSD users cannot debug kernel problems**. To debug a kernel problem, you need: a crash dump, a usable system with the exact kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and boot into 8.2 and reliably debug it using that), and (most important of all) a developer who is familiar with kernel debugging *and* familiar with the bits which are crashing. Those who say what you're quoting are often the latter. ... But the OP is running -RELEASE, and chooses to run that, along with use of freebsd-update for binary updates. Their choices are limited: stick with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely. 
So I realize that neither 8.2-RELEASE nor 8.4-RELEASE is -STABLE, but I ultimately wasn't sure where the right place to discuss 8.4 was. Beyond the FS mailing list, was there a better place for my question? I'll provide the other requested information (zfs outputs, etc.) wherever would be best. This is a production machine (has been since late 2010) and, after tweaking some ZFS settings initially, it has been totally stable. I wasn't incredibly closely involved in the initial configuration, but I've done at least one binary freebsd-update previously. Before this computer I had always done source upgrades. ZFS (and the thought of a panic like the one I saw this weekend!) made me leery of doing that. We're a small business -- we have this server, an offsite backup server, and a firewall box. I understand that issues like this are going to happen when I don't have a dedicated testing box; I just like to try to minimize them and keep them to weekends! It sounds like my best bet might be to add a new UFS disk, do a clean install of 9.1 onto that disk, and then import my existing ZFS pool? Thanks, Scott
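[A sketch of that migration path, assuming the pool is named "zroot" (the name shown in the boot messages earlier in the thread) and was created under 8.x, where on-disk versions import cleanly into 9.x:]

```shell
# On the old 8.2 system, cleanly detach the pool first if still bootable:
zpool export zroot

# After a fresh 9.1 install onto the new UFS disk, boot it and run:
zpool import             # lists importable pools found on attached disks
zpool import zroot       # import by name; add -f only if it was never exported
zpool status zroot       # verify health

# Caution: "zpool upgrade zroot" would bump the on-disk version and
# prevent the old 8.x kernel from importing the pool again -- hold off
# until the fallback kernel is no longer needed.
```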
Re: ZFS Panic after freebsd-update
on 01/07/2013 20:04 Jeremy Chadwick said the following: People are operating with the belief that ZFS "just works", when reality shows "it works until it doesn't". That reality applies to everything that a man creates with a purpose to work. I am not sure why you are so over-focused on ZFS. Please stop spreading FUD. Thank you.

-- 
Andriy Gapon
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 02:04:24PM -0400, Scott Sipe wrote: On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick j...@koitsu.org wrote: On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote: On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote: Of course when I see lines like this: Trying to mount root from zfs:zroot ...this greatly diminishes any chances of live debugging on the system. It amazes me how often I see this come up on the lists -- people who have ZFS problems but use ZFS for their root/var/tmp/usr. I wish that behaviour would stop, as it makes debugging ZFS a serious PITA. This comes up on the list almost constantly, sad panda. I'm not sure why it amazes you that people are making widespread use of ZFS. It's not widespread use of ZFS. It's widespread use of ZFS as their sole filesystem (specifically root/var/tmp/usr, or more specifically just root/usr). People are operating with the belief that ZFS just works, when reality shows it works until it doesn't. The mentality seems to be it's so rock solid it'll never break along with it can't happen to me. I tend to err on the side of caution, hence avoidance of ZFS for critical things like the aforementioned. It's different if you have a UFS root/var/tmp/usr and ZFS for everything else. You then have a system you can boot/use without issue even if ZFS is crapping the bed. ... 95% of FreeBSD users cannot debug kernel problems**. To debug a kernel problem, you need: a crash dump, a usable system with the exact kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and boot into 8.2 and reliably debug it using that), and (most important of all) a developer who is familiar with kernel debugging *and* familiar with the bits which are crashing. Those who say what you're quoting are often the latter. ... But the OP is running -RELEASE, and chooses to run that, along with use of freebsd-update for binary updates. 
Their choices are limited: stick with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely. So I realize that neither 8.2-RELEASE nor 8.4-RELEASE is stable, but I ultimately wasn't sure where the right place to go to discuss 8.4 is? For filesystem issues, freebsd-fs@ is usually the best choice, because it discusses filesystem-related things (regardless of stable vs. release, but knowing what version you have is of course mandatory). freebsd-stable@ is mainly for stable/X related discussions. Sorry to add pedanticism to an already difficult situation for you (and I sympathise, particularly since the purpose of the lists is often difficult to discern, even with their terse descriptions in mailman). Beyond the FS mailing list, was there a better place for my question? I'll provide the other requested information (zfs outputs, etc.) to wherever would be best. Nope, not as far as I know. The only other place is send-pr(1), once you have an issue that can be reproduced. Keep in mind, however, that none of these options (mailing lists, send-pr, etc.) mandate a response from anyone. You/your business (see below) should be aware that there is always the possibility no one can help solve the actual problem; as such it's important that companies have proper upgrade/migration paths, rollback plans, and so on. This is a production machine (has been since late 2010) and after tweaking some ZFS settings initially has been totally stable. I wasn't incredibly closely involved in the initial configuration, but I've done at least one binary freebsd-update previously. Well, regardless, it sounds like moving from 8.2-RELEASE to 8.4-RELEASE causes ZFS to break for you, so that would classify as a regression. What the root cause is, however, is still unknown. Point: 8.2-RELEASE came out in February 2011, and 8.4-RELEASE came out in June 2013 -- that's almost 2.5 years of changes between versions. The number of changes between these two is major -- hundreds, maybe thousands.
ZFS got worked on heavily during this time as well. I tend to tell anyone using ZFS that they should be running a stable/X (particularly stable/9) branch. I can expand on that justification if needed, as it's well-founded for a lot of reasons. Before this computer I had always done source upgrades. ZFS (and the thought of a panic like the one I saw this weekend!) made me leery of doing that. We're a small business--we have this server, an offsite backup server, and a firewall box. I understand that issues like this are going to happen when I don't have a dedicated testing box, I just like to try to minimize them and keep them to weekends! Understood. It sounds like my best bet might be to add a new UFS disk, do a clean install of 9.1 onto that disk, and then import my existing ZFS pool? I would suggest starting with this: Get stable/9 from the place I mentioned, burn an
Re: ZFS Panic after freebsd-update
On Mon, Jul 01, 2013 at 09:10:45PM +0300, Andriy Gapon wrote: on 01/07/2013 20:04 Jeremy Chadwick said the following: People are operating with the belief that ZFS just works, when reality shows it works until it doesn't That reality applies to everything that a man creates with a purpose to work. I am not sure why you are so over-focused on ZFS. Please stop spreading FUD. Thank you. The issue is that ZFS on FreeBSD is still young compared to other filesystems (specifically UFS). Nothing is perfect, but FFS/UFS tends to have a significantly larger number of bugs worked out of it, to the point where people can use it without losing sleep (barring the SUJ stuff, don't get me started). I have the same concerns over other things, like ext2fs and fusefs for that matter -- but this thread is about a ZFS-related crash, and that's why I'm over-focused on it. A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only), results in a system where an admin can upgrade + boot into single-user and perform some tasks to test/troubleshoot; if the ZFS layer is broken, it doesn't mean an essentially useless box. That isn't FUD, that's just the stage we're at right now. I'm aware lots of people have working ZFS-exclusive setups; like I said, works great until it doesn't. So, how do you kernel guys debug a problem in this environment: - ZFS-only - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt with added debugging features, etc.) - No swap configured - No serial console -- | Jeremy Chadwick j...@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB |
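For the crash-dump half of that checklist, the knobs involved are a plain config fragment; a minimal sketch (assuming a swap-backed dump device, which is exactly what a no-swap ZFS-only box lacks -- check against rc.conf(5)):

```
# /etc/rc.conf -- let savecore(8) recover the dump at the next boot
dumpdev="AUTO"          # dump to the first configured swap device
dumpdir="/var/crash"    # where savecore writes the vmcore files
```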
Re: ZFS Panic after freebsd-update
- Original Message - From: Scott Sipe csco...@gmail.com So I realize that neither 8.2-RELEASE nor 8.4-RELEASE is stable, but I ultimately wasn't sure where the right place to go to discuss 8.4 is? Beyond the FS mailing list, was there a better place for my question? I'll provide the other requested information (zfs outputs, etc.) to wherever would be best. This is a production machine (has been since late 2010) and after tweaking some ZFS settings initially has been totally stable. I wasn't incredibly closely involved in the initial configuration, but I've done at least one binary freebsd-update previously. Before this computer I had always done source upgrades. ZFS (and the thought of a panic like the one I saw this weekend!) made me leery of doing that. We're a small business--we have this server, an offsite backup server, and a firewall box. I understand that issues like this are going to happen when I don't have a dedicated testing box, I just like to try to minimize them and keep them to weekends! It sounds like my best bet might be to add a new UFS disk, do a clean install of 9.1 onto that disk, and then import my existing ZFS pool? There should be no reason why 8.4-RELEASE shouldn't work fine. Yes, ZFS is continuously improving and these fixes / enhancements first hit head / current and are then MFC'ed back to stable/9 and stable/8, but that doesn't mean the release branches should be avoided. If you can, I would try booting from an 8.4-RELEASE cdrom / iso to see if it can successfully read the pool, as this could eliminate out-of-sync kernel / world issues. Regards Steve
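Steven's live-CD test can be done without risking the on-disk state; a rough sketch of such a session, with "tank" standing in for the OP's actual pool name (check the flags against zpool(8) on the 8.4 media before relying on them):

```
# zpool import                                   (list pools this kernel can see)
# zpool import -f -N -o readonly=on -R /mnt tank
# zpool status -v tank                           (healthy under the 8.4 kernel?)
# zpool export tank                              (put everything back as it was)
```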
Re: ZFS Panic after freebsd-update
Am 01.07.2013 um 20:56 schrieb Steven Hartland kill...@multiplay.co.uk: [...] There should be no reason why 8.4-RELEASE shouldn't work fine. Yes ZFS is continuously improving and these fixes / enhancements first hit head / current and are then MFC'ed back to stable/9 stable/8, but that doesn't mean the release branches should be avoided. If you can I would try booting from a 8.4-RELEASE cdrom / iso to see if it can successfully read the pool as this could eliminate out of sync kernel / world issues. Personally, I find mfsbsd much more practical for booting up a rescue environment. Also, if 8.4 does not work for some reason - maybe try 8.3? I have quite a lot of systems running 8.3 (and even more with 9.1) but none of them do zfsroot and none of them stress ZFS very much. I've so far resisted the urge to update to 8.4.
The reason why I would be interested to run zfs-root is that sometimes you only have two hard drives and still want to do ZFS on them. Ideally, though, FreeBSD would be able to do something like SmartOS (one of the few features I kind of like about it…), where you boot from a USB image (or ideally, via (i)PXE) but use all the available space for data and (3rd-party) software. That way, you always have something to boot from, but can maximize the usage of spindles and space. A basic FreeBSD install is, I think, less than 0.5G these days - I really hate wasting two 300 (or even 600) GB SAS hard disks just for that.
Re: ZFS Panic after freebsd-update
On Jul 1, 2013, at 19:04, Jeremy Chadwick j...@koitsu.org wrote: But even stable/X doesn't provide enough coverage at times (the recent fxp(4)/dhclient issue is proof of that). It's just too bad so many people have this broken mindset of what stability means on FreeBSD. As one of the few persons who have run into that issue I feel like I should speak up here and add that this issue was fixed within a very reasonable time span after raising the matter here on freebsd-stable@. You've personally been a great help in getting that fixed, so thank you for that. Apparently there was one earlier report of the issue very late in the pre-release process, which does imply that fxp hardware is fairly rarely in use among FreeBSD users these days (which was the excuse for how the issue passed testing for 8.4/9.1 RELEASE). I don't think the release engineering team can really be blamed for not catching bugs that go unreported that far into the release cycle; they have to make a decision when to release at some point, and the later it gets into the cycle the harder it is to turn that decision around. I can completely understand that. That this happened was inconvenient, but it happens in stable. ISTR that stable doesn't mean stable in the sense that it won't crash, but rather that the APIs won't change until the next release. I wish other OS companies were as reliable; both MS and Apple let a lot more slip by and they take a lot longer to release fixes as well. Of course nobody likes when their system behaves erratically due to some error outside their control, but until that point FreeBSD has been rock-solid for me for years. And even with this issue, the system was usable. To get back to the ZFS issue... ZFS has always seen a fairly large fraction of raised issues on this list. Often those were user mistakes, ranging from putting not enough memory into the system to not assigning enough to the ZIL (once that became usable). ZFS on FreeBSD has come a long way since then. I don't think it's in quite as usable a state on, for example, Linux. Yes, people are taking a risk when using ZFS for everything. The same goes for any FS. No matter which file system you use, if it breaks you're between a rock and a hard place. Depending on how badly broken it is, you may end up not being able to access your data, and with some data that's not an option. That's what we have backups and test environments for, don't we? File system code can break. It shouldn't, and I think it's safe to say that in FreeBSD's history it has been very rare indeed, but it does happen. The problem is probably more that it's so rare that people don't take measures for the few times it does happen; like how many of us have an atomic shelter available to them? Or a rubber boat? How many nuclear incidents have there been versus how many serious file-system breakages in FreeBSD? How many of us first test an update to STABLE on an identical test system before upgrading our production servers? Jeremy, I know for a fact that you're a lot more on this list than I am and probably longer than I have been (I'm pretty sure you were around already back in the days when I started using FreeBSD 2.2.8), but in this case, as much respect as I have for you, I think you're overreacting a bit. And finally, we're having this whole discussion about how problematic FreeBSD's been (or not) recently WHILE THE OP HASN'T EVEN GOTTEN BACK TO ANSWER DETAILS ABOUT HIS ISSUE YET. Perhaps it's a bit early for that? It's entirely possible that we're looking at some hardware issue here, or a user error that triggered a corner case that wasn't handled, or something like that. P.S.: Personally, I don't use ZFS because I'm a bit of a database nut and feel like log-based filesystems aren't a good match for database write loads, but that's mostly just me being pedantic. Cheers, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: ZFS Panic after freebsd-update
On 07/01/13 09:10, Steven Hartland wrote: [...] This says you're running an 8.2-RELEASE-p3 kernel, not an 8.4-RELEASE kernel. Did the upgrade fail, or is that dmesg / uname from your old kernel? Looking at the context, he used freebsd-update to update 8.2-RELEASE to 8.4-RELEASE (the first step of which would be updating the kernel), booted with that panic, and reverted to the old kernel. It would be helpful if we had the address of stack frame #6, as well as the tuning he has done (in loader.conf), plus the actual panic message (looks like a kernel trap 12, but at a glance at the code I didn't find a candidate line where this happens). Cheers, -- Xin LI delp...@delphij.net https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die
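For readers following along, the frame address Xin LI asks for comes out of kgdb roughly like this (paths hypothetical; the kernel and its debug symbols must match the kernel that actually panicked):

```
# kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) bt            # confirm this is the same backtrace as the panic
(kgdb) frame 6       # select the frame in question
(kgdb) info frame    # prints the frame address and saved registers
```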
Help! :( ZFS panic on boot, importing pool after server crash.
Hi, I'm a bit at the end of my tether. We had a ZFS panic last night on a machine that hosts all my mail and web; it was rebooted and it now panics mounting the ZFS root filesystem. The call stack info is:

solaris assert: ss == NULL, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 109
kdb_backtrace
panic
space_map_add
space_map_load
metaslab_activate
metaslab_allocate
zio_dva_allocate
zio_execute
taskqueue_run_locked
taskqueue_thread_loop
fork_exit
fork_trampoline

I can boot from the live DVD filesystem, but I can only mount the pool read-only without getting the same kernel panic. This is with FreeBSD 9.0. The machine is remote, and I don't have access other than through a DRAC console port (so I can't cut and paste; sorry for the poor stack trace). Is anyone here in a position to advise me how I might proceed to get this machine mounting and running again in multi-user mode? Thanks so much. Joe p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root file system.
Re: Help! :( ZFS panic on boot, importing pool after server crash.
14.06.2013 12:55, Dr Josef Karthauser: Hi, I'm a bit at the end of my tether. We had a ZFS panic last night on a machine that hosts all my mail and web; it was rebooted and it now panics mounting the ZFS root filesystem. [...] Is anyone here in a position to advise me how I might proceed to get this machine mounting and running again in multi-user mode? There's no official way. p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root file system. If you are fairly sure about your devices you can:

1. Remove the second disk from the pool, or create another pool on top of it.
2. Recreate all the FS structure on the second disk. You can dump all your filesystems with something like:

zfs list -Ho name | xargs -n1 zfs get -H all | awk 'BEGIN{shard="";output=""}{if(shard!=$1 && shard!=""){output="zfs create";for(param in params)output=output" -o "param"="params[param];print output" "shard;delete params;shard=""}}$4~/local/{params[$2]=$3;shard=$1;next}$2~/type/{shard=$1}END{output="zfs create";for(param in params)output=output" -o "param"="params[param];print output" "shard}'

Be sure to rename the pool and change the first line.
3. Rsync all data to the second disk.
4. Try to boot from the second disk.

If everything worked you are free to attach the first disk to the second one to create a mirror again. -- Sphinx of black quartz, judge my vow.
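The one-liner above is dense; here is the same buffering logic unpacked and exercised against a canned, tab-separated `zfs get -H all` transcript (the dataset names and property values are invented for the demo, so this runs without any pool present):

```shell
#!/bin/sh
# Canned `zfs get -H all` output: NAME  PROPERTY  VALUE  SOURCE, tab-separated.
# Only properties whose SOURCE is "local" should survive into the output.
printf 'tank/mail\ttype\tfilesystem\t-\n'      >  props.txt
printf 'tank/mail\tcompression\tlzjb\tlocal\n' >> props.txt
printf 'tank/www\ttype\tfilesystem\t-\n'       >> props.txt
printf 'tank/www\tquota\t10G\tlocal\n'         >> props.txt

# Same logic as the one-liner: buffer locally-set properties per dataset and
# flush a "zfs create -o prop=value name" line whenever the dataset changes.
awk '
BEGIN { shard = "" }
{
    if (shard != $1 && shard != "") {
        output = "zfs create"
        for (param in params) output = output " -o " param "=" params[param]
        print output " " shard
        delete params
        shard = ""
    }
}
$4 ~ /local/ { params[$2] = $3; shard = $1; next }
$2 ~ /type/  { shard = $1 }
END {
    output = "zfs create"
    for (param in params) output = output " -o " param "=" params[param]
    print output " " shard
}' props.txt > create.txt

cat create.txt
```

Run with /bin/sh; it prints one `zfs create` line per dataset, carrying only the locally-set properties, which is exactly the "recreate the FS structure" script Volodymyr describes.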
Re: Help! :( ZFS panic on boot, importing pool after server crash.
On 14 Jun 2013, at 12:00, Volodymyr Kostyrko c.kw...@gmail.com wrote: [...] If you are fairly sure about your devices you can: 1. Remove second disk from pool or create another pool on top of it. 2. Recreate all FS structure on the second disk. You can dump all your filesystems with something like: Great. Thanks for that. Have you got a hint as to how I can get access to the root file system? It's currently set to have a legacy mount point. Which means that when I import the pool: # zpool import -o readonly=on -o altroot=/tmp/zfs -f poolname the root filesystem is missing. Then if I try and set the mount point: # zfs set mountpoint=/tmp/zfs2 poolname it just sits there; probably because the command is blocking on the R/O pool, or something. How do I temporarily remount the root filesystem so that I can get access to the files? Thanks, Joe
Re: Help! :( ZFS panic on boot, importing pool after server crash.
14.06.2013 15:51, Dr Josef Karthauser: [...] Have you got a hint as to how I can get access to the root file system? It's currently set to have a legacy mount point. [...] How do I temporarily remount the root filesystem so that I can get access to the files? mount -t zfs pool-name mountpoint Personally, when I need to work with such pools I first import the pool with the -N (nomount) option, then I mount the root fs by hand, and after that `zfs mount -a` handles everything else. -- Sphinx of black quartz, judge my vow.
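Pulled together, the sequence Volodymyr describes looks roughly like this ("poolname" as in Joe's transcript; this assumes the root dataset is the one set to the legacy mountpoint):

```
# zpool import -f -N -o readonly=on -o altroot=/tmp/zfs poolname
# mount -t zfs poolname /tmp/zfs     (mount the legacy root dataset by hand)
# zfs mount -a                       (then let ZFS mount the remaining datasets)
```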
Re: ZFS panic on a RELENG_8 NFS server
Hiroki Sato h...@freebsd.org wrote in 20110911.054601.1424617155148336027@allbsd.org: hr Hiroki Sato h...@freebsd.org wrote hr in 20110910.044841.232160047547388224@allbsd.org: hr hr hr Hiroki Sato h...@freebsd.org wrote hr hr in 20110907.094717.2272609566853905102@allbsd.org: hr hr hr hr hr During this investigation a disk had to be replaced and resilvering hr hr hr it is now in progress. A deadlock and a forced reboot after that hr hr hr make recovering of the zfs datasets take a long time (for committing hr hr hr logs, I think), so I will try to reproduce the deadlock and get a hr hr hr core dump after it finished. hr hr hr hr I think I could reproduce the symptoms. I have no idea about if hr hr these are exactly the same as occurred on my box before because the hr hr kernel was replaced with one with some debugging options, but these hr hr are reproducible at least. hr hr hr hr There are two symptoms. One is a panic. A DDB output when the panic hr hr occurred is the following: hr hr I am trying vfs.lookup_shared=0 and seeing how it goes. It seems the hr box can endure a high load which quickly caused these symptoms. There was no difference with the knob. The same panic or unresponsiveness still occurs in about 24-32 hours or so. -- Hiroki
Re: ZFS panic on a RELENG_8 NFS server
Hiroki Sato h...@freebsd.org wrote in 20110910.044841.232160047547388224@allbsd.org: hr Hiroki Sato h...@freebsd.org wrote hr in 20110907.094717.2272609566853905102@allbsd.org: hr hr hr During this investigation a disk had to be replaced and resilvering hr hr it is now in progress. A deadlock and a forced reboot after that hr hr make recovering of the zfs datasets take a long time (for committing hr hr logs, I think), so I will try to reproduce the deadlock and get a hr hr core dump after it finished. hr hr I think I could reproduce the symptoms. I have no idea about if hr these are exactly the same as occurred on my box before because the hr kernel was replaced with one with some debugging options, but these hr are reproducible at least. hr hr There are two symptoms. One is a panic. A DDB output when the panic hr occurred is the following: I am trying vfs.lookup_shared=0 and seeing how it goes. It seems the box can now endure a high load which previously caused these symptoms quickly. -- Hiroki
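For reference, the knob being tested here is a loader tunable (on later branches also a runtime sysctl); the setting presumably looked something like the following in loader.conf, with the comment being my reading of the thread rather than an official description:

```
# /boot/loader.conf
vfs.lookup_shared=0    # force exclusive vnode locks during pathname lookup
```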
ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))
Hiroki Sato h...@freebsd.org wrote in 20110907.094717.2272609566853905102@allbsd.org: hr During this investigation a disk had to be replaced and resilvering hr it is now in progress. A deadlock and a forced reboot after that hr make recovering of the zfs datasets take a long time (for committing hr logs, I think), so I will try to reproduce the deadlock and get a hr core dump after it finished. I think I could reproduce the symptoms. I have no idea whether these are exactly the same as occurred on my box before, because the kernel was replaced with one with some debugging options, but these are reproducible at least. There are two symptoms. One is a panic. A DDB output when the panic occurred is the following:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x10040
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x8065b926
stack pointer = 0x28:0xff8257b94d70
frame pointer = 0x28:0xff8257b94e10
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at witness_checkorder+0x246: movl 0x40(%r13),%ebx
db> bt
Tracing pid 992 tid 100586 td 0xff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffe6c8, rbp = 0x6 ---

The complete output can be found at: http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt The other symptom is getting stuck on ZFS access. The kernel keeps running with no panic, but any access to ZFS datasets causes the accessing program to become non-responsive. The DDB output can be found at: http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt The trigger for both was some access to a ZFS dataset from the NFS clients. Because the access pattern was complex I could not narrow down what was the culprit, but it seems timing-dependent, and simply doing rm -rf locally on the server can sometimes trigger them. The crash dump and the kernel can be found at the following URLs: panic: http://people.allbsd.org/~hrs/zfs_panic_20110909_1/ no panic but unresponsive: http://people.allbsd.org/~hrs/zfs_panic_20110909_2/ kernel: http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/ -- Hiroki
Re: ZFS panic after replacing log device
On 11/16/2010 8:41 PM, Terry Kennedy wrote: I think it is unlikely to be a hardware problem. While I haven't run any destructive testing on the ZFS pool, the fact that it can be read without error, combined with ECC throughout the system and the panic always happening on the first write, makes me think that it is a software issue in ZFS. [...] As I mentioned, I have this data replicated elsewhere, so I can experiment with the pool if it will help track down this issue. Any more news on this? -- Dan Langille - http://langille.org/
Re: ZFS panic after replacing log device
I would say it is definitely very odd that writes are a problem. Sounds like it might be a hardware problem. Is it possible to export the pool, remove the ZIL and re-import it? I myself would be pretty nervous trying that, but it would help isolate the problem? If you can risk it. I think it is unlikely to be a hardware problem. While I haven't run any destructive testing on the ZFS pool, the fact that it can be read without error, combined with ECC throughout the system and the panic always happening on the first write, makes me think that it is a software issue in ZFS. When I do: zpool export data; zpool remove data da0 I get a "No such pool: data". I then re-imported the pool and did: zpool offline data da0; zpool export data; zpool import data After doing that, I can write to the pool without a panic. But once I online the log device and do any writes, I get the panic again. As I mentioned, I have this data replicated elsewhere, so I can experiment with the pool if it will help track down this issue. Terry Kennedy http://www.tmk.com te...@tmk.com New York, NY USA
Re: ZFS panic after replacing log device
I can give a developer remote console / root access to the box if that would help. I have a couple days before I will need to nuke the pool and restore it from backups. I haven't heard from anyone that wants to look into this. I need to get the pool back into service soon. If I don't get any requests to postpone or offers to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the pool (minus the SSD, which is persona non grata). Terry Kennedy http://www.tmk.com te...@tmk.com New York, NY USA
Re: ZFS panic after replacing log device
I am no ZFS kernel-code dude or anything, but it is well known that losing the ZIL can corrupt things pretty bad with ZFS. First, thanks for writing back! I agree that this could be the problem. As I mentioned in my original post, I followed the steps recommended by zpool status - clearing the device and then doing a replace. The fix may be as simple as testing for whether the device in question is a log device and, if so, erroring out with "You can't do that". Also note that multiple scrubs pass with no errors detected - it is only writes that trigger the panic. It looks like something isn't being cleaned up in the clear / replace path. I would save a crash dump for people to look at, but unfortunately the last time a crash dump actually worked for me (on dozens of systems) was back in the FreeBSD 6.2 days. There wasn't any data corruption (the filesystem was not being written at the time the log device failed) - I have my own checksum files written by the sysutils/cfv port, and the data all matches. All in all, if I was in your situation I would give a whirl at installing OpenSolaris and going from there, being sure not to upgrade the pool version past what is supported by FreeBSD and going from there. I have the data on another server (see my prior "snapshots are not backups" discussion on freebsd-stable if interested). So, fortunately, this is not a case of data recovery. Unfortunately we all find ourselves in a bit of a pickle with ZFS right now with the Oracle acquisition of Sun. For myself, I would stick with deploying on FreeBSD, but I think it's going to be FBSD 9.1 before it's going to be truly ready for production. The problem with hardware on the leading edge is that the software often needs time to catch up. In this particular case, the ZFS pool is 32TB. I can't begin to imagine how long a UFS fsck would take on such a partition, even if it were possible to create one. It was bad enough on the previous generation of my servers (2TB UFS partitions).
Terry Kennedy http://www.tmk.com te...@tmk.com New York, NY USA
Re: ZFS panic after replacing log device
Hi Terry, I am no ZFS kernel-code dude or anything, but it is well known that losing the ZIL can corrupt things pretty bad with ZFS. You may want to skim the archives at OpenSolaris ZFS discuss zfs-disc...@opensolaris.org All in all, if I was in your situation I would give a whirl at installing OpenSolaris and going from there, being sure not to upgrade the pool version past what is supported by FreeBSD and going from there. Unfortunately we all find ourselves in a bit of a pickle with ZFS right now with the Oracle acquisition of Sun. For myself, I would stick with deploying on FreeBSD but I think its going to be FBSD 9.1 before its going to be truly ready for production. Just my 2-cents. - Mike On Nov 15, 2010, at 10:24 PM, Terry Kennedy wrote: I can give a developer remote console / root access to the box if that would help. I have a couple days before I will need to nuke the pool and restore it from backups. I haven't heard from anyone that wants to look into this. I need to get the pool back into service soon. If I don't get any requests to postpone or offers to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the pool (minus the SSD, which is persona non grata). Terry Kennedy http://www.tmk.com te...@tmk.com New York, NY USA ___ freebsd...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS panic after replacing log device
Hi, sorry for not completely digesting your original post. I would say it is definitely very odd that writes are a problem. Sounds like it might be a hardware problem. Is it possible to export the pool, remove the ZIL, and re-import it? I myself would be pretty nervous trying that, but it would help isolate the problem, if you can risk it. On Nov 15, 2010, at 11:01 PM, Terry Kennedy wrote: Also note that multiple scrubs pass with no errors detected - it is only writes that trigger the panic. It looks like something isn't being cleaned up in the clear / replace path.
ZFS panic after replacing log device
I'm posting this to the freebsd-stable and freebsd-fs mailing lists. Followups should probably happen on freebsd-fs. I have a ZFS pool configured as: zpool create data raidz da1 da2 da3 da4 da5 raidz da6 da7 da8 da9 da10 raidz da11 da12 da13 da14 da15 spare da16 log da0 where da1-16 are WD2003FYYS drives (2TB RE4) and da0 is a 256GB PCI-Express SSD (name omitted to protect the guilty). The SSD has been dropping offline randomly - it seems that one or more flash modules pop out of their sockets and need to be re-seated frequently for some reason. The most recent time it did that, I replaced the SSD with another one (for some reason, the manufacturer ties the flash modules to a particular controller, so just moving the modules results in an offline SSD and inability to manage it due to "license limits exceeded" or some such nonsense). ZFS wasn't happy with the log device being changed, and reported it as corrupted, with the suggested corrective action being to zpool clear it. I did that, and then did a zpool replace data da0 da0 and it claimed to successfully resilver it. I then did a zpool scrub and the scrub completed with no errors. So far, so good. However, any attempt to write to the array results in a near-immediate panic: panic: solaris assert: sm->sm_space + size <= sm->sm_size, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 93 cpuid = 2 (Screenshot at http://www.tmk.com/transient/zfs-panic.png in case I mis-typed something). This is repeatable across reboot / scrub / test cycles. System is 8-STABLE as of Fri Nov 5 19:08:35 EDT 2010, on-disk pool is version 4/15, same as the kernel. I know that certain operations on log devices aren't supported until pool version 19 or thereabouts, but the error messages and zpool command results gave the impression that what I was doing was supported and worked (when it didn't). 
If this is truly a "you can't do that" in pool version 15, perhaps a warning could be added so users don't get fooled into thinking it worked? I can give a developer remote console / root access to the box if that would help. I have a couple of days before I will need to nuke the pool and restore it from backups. Terry Kennedy http://www.tmk.com te...@tmk.com New York, NY USA
Re: ZFS panic on RELENG_7/i386
Quoting Dmitry Morozovsky ma...@rinet.ru (from Tue, 26 Jan 2010 01:16:28 +0300 (MSK)): On Mon, 25 Jan 2010, Dmitry Morozovsky wrote: DM PJD I had a crash during rsync to ZFS today: DM PJD DM PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC, DM DM r...@woozle:/var/crash# uname -a DM FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon Dec 14 12:40:43 DM MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE i386 DM DM I'll update to fresh sources and recheck, thanks. DM DM BTW, any thoughts on another topic I started a couple of weeks ago? Well, after updating to a fresh system, the scrub finished without errors; rsync is now running and has copied 15G out of 150. You may want to switch the checksum algorithm to fletcher4. It (fletcher4 being the default instead of fletcher2) is one of the few ZFS changes between 8-stable and 7-stable which I didn't merge. Bye, Alexander. -- Officers' club: We don't know but we've been told, our beer on tap is mighty cold. http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137
Re: ZFS panic on RELENG_7/i386
On Tue, 26 Jan 2010, Alexander Leidinger wrote: AL Well, after updating to fresh system scrub finished without errors, and AL now AL rsync is running, now copied 15G out of 150. AL AL You may want to switch the checksum algorithm to fletcher4. It (fletcher4 AL the default instead of fletcher2) is one of the few changes between 8-stable AL and 7-stable in ZFS, which I didn't merge. will do, thank you. is fletcher4 faster? -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru *** ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS panic on RELENG_7/i386
will do, thank you. is fletcher4 faster? Not necessarily. But it works much better as a checksum. See the following link for the details. http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597 --Artem
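The difference Artem alludes to is visible in the shape of the two algorithms. Below is a rough sketch simplified from the ZFS implementations (function names and the buffers are illustrative, not the kernel's): fletcher_2 runs two independent streams with only two accumulators each over 64-bit words, while fletcher_4 cascades four accumulators over 32-bit words, so higher-order terms mix in and corruption is far less likely to cancel out of every sum.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified sketch of ZFS's fletcher_2: two interleaved streams of
 * 64-bit words, each stream with only a sum and a sum-of-sums. */
void fletcher_2(const void *buf, size_t size, uint64_t cksum[4])
{
	const uint64_t *ip = buf;
	const uint64_t *ipend = ip + (size / sizeof (uint64_t));
	uint64_t a0 = 0, b0 = 0, a1 = 0, b1 = 0;

	for (; ip < ipend; ip += 2) {
		a0 += ip[0];
		a1 += ip[1];
		b0 += a0;
		b1 += a1;
	}
	cksum[0] = a0; cksum[1] = a1; cksum[2] = b0; cksum[3] = b1;
}

/* Simplified sketch of ZFS's fletcher_4: 32-bit words feeding four
 * cascaded accumulators, so every output depends on word order. */
void fletcher_4(const void *buf, size_t size, uint64_t cksum[4])
{
	const uint32_t *ip = buf;
	const uint32_t *ipend = ip + (size / sizeof (uint32_t));
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (; ip < ipend; ip++) {
		a += ip[0];
		b += a;
		c += b;
		d += c;
	}
	cksum[0] = a; cksum[1] = b; cksum[2] = c; cksum[3] = d;
}
```

Both fill the same 256-bit checksum field (which is why ZFS could swap the default without an on-disk format change); the per-byte cost is similar, which matches Artem's "not necessarily faster".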
Re: ZFS panic on RELENG_7/i386
On Tue, 26 Jan 2010, Artem Belevich wrote: AB will do, thank you. is fletcher4 faster? AB Not necessarily. But it does work as a checksum much better. See AB following link for the details. AB AB http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597 Yes, I already read some articles about fletcher checksums and related. Thanks. -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru *** ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
ZFS panic on RELENG_7/i386
Dear colleagues, I had a crash during rsync to ZFS today: (kgdb) bt #0 doadump () at pcpu.h:196 #1 0xc050c688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 #2 0xc050c965 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:574 #3 0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591 #4 0xc0910775 in zfs_freebsd_setattr (ap=0xf5baab64) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2888 #5 0xc06c6292 in VOP_SETATTR_APV (vop=0xc096e560, a=0xf5baab64) at vnode_if.c:583 #6 0xc05918e5 in setfown (td=0xc834fd80, vp=0xcac4b33c, uid=4294967294, gid=0) at vnode_if.h:315 #7 0xc05919bc in kern_lchown (td=0xc834fd80, path=0xbfbfccc8 Address 0xbfbfccc8 out of bounds, pathseg=UIO_USERSPACE, uid=-2, gid=0) at /usr/src/sys/kern/vfs_syscalls.c:2787 #8 0xc0591a4a in lchown (td=0xc834fd80, uap=0xf5baacfc) at /usr/src/sys/kern/vfs_syscalls.c:2770 #9 0xc06b10f5 in syscall (frame=0xf5baad38) at /usr/src/sys/i386/i386/trap.c:1101 #10 0xc0696b90 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262 Any other info needed? Thanks in advance! -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: ZFS panic on RELENG_7/i386
On Mon, Jan 25, 2010 at 10:04:20PM +0300, Dmitry Morozovsky wrote: Dear colleagues, I had a crash during rsync to ZFS today: Do you have recent 7-STABLE? Not sure if it was the same before MFC, probably not, because what you see is impossible in case of the source I'm looking at. At the beginning of the zfs_fuid_create() function there is a check: if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0) return (id); And IS_EPHEMERAL() is defined as follows: #define IS_EPHEMERAL(x) (0) So it will always return here. #3 0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591 -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpGXOZZRCate.pgp Description: PGP signature
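Pawel's point can be condensed into a small model: with IS_EPHEMERAL() hard-wired to 0, the guard at the top of zfs_fuid_create() is always taken, so the function can only ever hand back the id it was given. The types and names below are stand-ins for the real kernel structures, modeling only the control flow he describes:

```c
#include <stdint.h>

/* Hypothetical stand-in for the zfsvfs structure; only the one field
 * that participates in the guard is modeled. */
typedef struct {
	int z_use_fuids;
} zfsvfs_t;

/* On the FreeBSD port at the time, ephemeral IDs did not exist: */
#define IS_EPHEMERAL(x) (0)

/* Simplified model of the guard at the top of zfs_fuid_create(). */
uint64_t
model_fuid_create(zfsvfs_t *zfsvfs, uint64_t id, int fuid_idx)
{
	if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0)
		return (id);	/* always taken: IS_EPHEMERAL() is 0 */
	/* unreachable while IS_EPHEMERAL() expands to 0 */
	return (~0ULL);
}
```

Since the early return fires for every input, a panic inside the body of the real function should indeed be impossible with this source, which is why Pawel suspects Dmitry's tree predates the MFC.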
Re: ZFS panic on RELENG_7/i386
On Mon, 25 Jan 2010, Pawel Jakub Dawidek wrote: PJD On Mon, Jan 25, 2010 at 10:04:20PM +0300, Dmitry Morozovsky wrote: PJD Dear colleagues, PJD PJD I had a crash during rsync to ZFS today: PJD PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC, r...@woozle:/var/crash# uname -a FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon Dec 14 12:40:43 MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE i386 I'll update to fresh sources and recheck, thanks. BTW, any thoughts on another topic I started a couple of weeks ago? PJD probably not, because what you see is impossible in case of the source I'm PJD looking at. At the beginning of the zfs_fuid_create() function there is a PJD check: PJD PJD if (!zfsvfs->z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0) PJD return (id); PJD PJD And IS_EPHEMERAL() is defined as follows: PJD PJD #define IS_EPHEMERAL(x) (0) PJD PJD So it will always return here. PJD PJD #3 0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf PJD expression opcode 0x93 PJD ) PJD at PJD /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591 PJD PJD -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
Re: ZFS panic on RELENG_7/i386
On Mon, 25 Jan 2010, Dmitry Morozovsky wrote: DM PJD I had a crash during rsync to ZFS today: DM PJD DM PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC, DM DM r...@woozle:/var/crash# uname -a DM FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon Dec 14 12:40:43 DM MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE i386 DM DM I'll update to fresh sources and recheck, thanks. DM DM BTW, any thoughts on another topic I started a couple of weeks ago? Well, after updating to a fresh system, the scrub finished without errors; rsync is now running and has copied 15G out of 150. Thank you! -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***
ZFS panic solaris assert: sm->sm_space loses pool on RELENG-7
Sometime on Sunday our main server panicked with the following error: panic: solaris assert: sm->sm_space == space (0x5e45000 == 0x5e45600), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 361 I did some googling and found a couple of references to other people who have seen this. Both of them, however, could not recover the pool and needed to restore all the data from backups (which I am in the process of doing). Does anyone know anything more about this? Specifically, is it a known problem which is fixed in 8.0? I couldn't find a PR of any kind, but the fact that a machine can spontaneously lose all its data from a set of filesystems worries me greatly. cheers, -pete.
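For readers unfamiliar with the assert: sm_space is a running byte tally that the space map maintains incrementally, and the panic fires when it disagrees (here by 0x600 bytes) with the space recomputed while the map is loaded from disk. A toy model of that invariant, with made-up names rather than the real ZFS structures:

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_MAX_SEGS 16

/* Toy space map: a running byte tally plus the segment sizes it is
 * supposed to summarize. All names are invented for illustration. */
struct toy_space_map {
	uint64_t sm_space;		/* maintained incrementally */
	uint64_t seg_size[TOY_MAX_SEGS];
	size_t nsegs;
};

void toy_sm_add(struct toy_space_map *sm, uint64_t size)
{
	sm->seg_size[sm->nsegs++] = size;
	sm->sm_space += size;		/* tally kept in lockstep */
}

/* The invariant behind the panic: recompute the space from the
 * segments and compare it with the running tally.
 * Returns 1 when consistent, 0 when the assert would fire. */
int toy_sm_verify(const struct toy_space_map *sm)
{
	uint64_t space = 0;
	for (size_t i = 0; i < sm->nsegs; i++)
		space += sm->seg_size[i];
	return (sm->sm_space == space);
}
```

When the two bookkeeping paths disagree, ZFS deliberately panics rather than allocate from a map it cannot trust, which is why the pool becomes unusable rather than silently corrupted.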
zfs panic mounting fs after crash with RC2
Hi, Yesterday I had the opportunity to play around with my yet-to-become new fileserver a bit more. Originally I had installed 7.2-R, which I upgraded to 8.0-RC2 yesterday. After that I upgraded my zpool, consisting of 4 disks in a raidz1 constellation, to v13. Some time later I tried to use powerd, which was obviously a bad idea: it crashed the machine immediately. I will give a separate report on that later as it is probably related to the hardware, which is a bit exotic (VIA VB8001 board with 64-bit VIA Nano processor). However, the worst thing for me is that after rebooting from that crash, one of my ZFS filesystems cannot be mounted anymore. As soon as I try to mount it I get a kernel panic. I can still access the properties (I made use of canmount=noauto for the first time :-), but I cannot take a snapshot of the fs (funny enough, zfs complains that the fs is busy, while in reality it is not even mounted - so how could it be busy?). I took a picture of the kernel panic and put it here (don't know if there is any useful information in it): http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic.jpg The pool as such seems to be fine; all other filesystems in it can be mounted and used, only trying to mount tank/sys/var triggers this panic. Are there any suggestions what I could do to get my fs back? Please let me know if (and how) I can provide more debugging information. cu Gerrit
zfs panic
Hi, I got the following panic when rebooting after a crash on 7.2-REL: panic: solaris assert: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 341 This seems to be the same panic as mentioned here: http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043763.html. However, I did not see warnings about the ZIL. The crash leading to this situation was probably caused by me pushing the controller card a bit too hard (mechanically) during operation (well, so much for hot-plugging of cards :-). Since my pool was almost empty anyway and I needed the machine, I opted to recreate the pool instead of trying the patches supplied by pjd@ in the thread above. But nevertheless I would like to be prepared if this happens again (and the pool is not empty :-). Right now I am updating the system to 8.0-RC2. Will this issue go away with zpool v13 / FBSD 8.0 (as suggested above)? I could not find out from the thread above whether the suggested patches helped or if anything from this has been committed at all. Pawel or Daniel, do you remember what the final result was? cu Gerrit
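One detail worth decoding in the assert above: dmu_read() returned 0x5 instead of 0, and errno 5 is EIO on FreeBSD. So here the read of the space map object itself failed at the I/O level (consistent with a disturbed controller card), rather than an in-memory consistency check tripping. A tiny illustration; the helper name is made up:

```c
#include <errno.h>
#include <string.h>

/* decode_dmu_read_error() is a hypothetical helper: it maps the value
 * dmu_read() returned (an errno on failure, 0 on success) to a
 * human-readable string using the system error table. */
const char *decode_dmu_read_error(int err)
{
	return (err == 0) ? "success" : strerror(err);
}
```

Reading the "(0x5 == 0x0)" part of such asserts against errno.h is often the quickest way to tell a bad-device failure from a bad-bookkeeping failure.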
8.0-RC1/amd64, ZFS panic
panic: mtx_lock() of destroyed mutex @ /usr/src/sys/kern/vfs_subr.c:2467 cpuid = 1 I was doing a zfs destroy -r of a dataset. The dataset has had many snapshot receives done. # uname -a FreeBSD 8.0-RC1 FreeBSD 8.0-RC1 #1: Tue Oct 13 14:11:08 CEST 2009 root@:/usr/obj/usr/src/sys/DEBUG amd64 (kernel config: added WITNESS, etc., to have debugging information; doing some ZFS send/receive tests) It's a VMware virtual machine, and I've frozen it. Borja.
Re: zfs/panic: short after rollback
km...@freebsd.org aka Kip Macy wrote on Fri, 12 Jun 2009 13:54:40 -0700 in m2n.fbsd.stable: |show sleepchain |show thread 100263 | |On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapon a...@icyb.net.ua wrote: | | I did zfs rollback x...@yyy | And then did ls on a directory in the rolled-back fs. | panic: sleeping thread This is quite likely the same problem as the one I experience. And it is maybe also the same problem as in kern/137037 and kern/129148. It seems to show up in some different flavours, while the bottom line is this: do a rollback, and soon after (usually at the next filesystem-related action) the kernel has gone fishing. I experienced it first when doing a rollback of a mounted filesystem. It crashed right after the first try, and it did so reproducibly. (Well, more or less reproducibly - another day under similar circumstances it did not crash.) Then I started thinking, and came to the conclusion that a rollback of a mounted filesystem (with possibly open files) could easily bring a lot of things into an undefined state, and should not be something one wants to do normally. So maybe it is not supposed to work at all. Anyway, when trying this, I either get the "sleeping thread" message (as above), or a panic from _sx_xlock() (as shown in my addendum to kern/137037, and in the addendum to kern/129148). So I started to do rollbacks on unmounted filesystems (quite an excessive amount of them), and while this seemed to work at first, later on the system failures reappeared. These system failures took various shapes - I experienced immediate resets without dump, and system hangs. When deliberately trying to reproduce that (after installing a kernel with debugging info and watching the console), I also captured a panic coming from _sx_xlock() - so it seems to be the same problem as without unmounting, only that it takes a couple of rollbacks (a dozen or more) to hit. Overall, there was never any data loss or persistent damage. 
So, I consider rollback still functional and safe to use, but I consider a system no longer production-stable after doing a rollback. rgds, PMc
RE: ZFS panic in zfs_fuid_create
-Original Message- From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Andriy Gapon Sent: 28 May 2009 18:11 To: Lawrence Farr Cc: freebsd-stable@freebsd.org Subject: Re: ZFS panic in zfs_fuid_create on 27/05/2009 19:25 Lawrence Farr said the following: I updated my backup boxes to the latest and greatest ZFS code, and started getting the following panic on them all (3 machines): panic: zfs_fuid_create cpuid = 1 Uptime: 1h28m48s Cannot dump. No dump device defined. Automatic reboot in 15 seconds - press a key on the console to abort A quick google found kern/133020 with a patch from PJD that has fixed it for me. Should it be in stable or does it break something else? Hmm I wonder if you really do have UIDs or GIDs greater than 2147483647 defined on your system? Not that I could see. It's rsyncing from an EXT3 volume on a Linux server that runs as an OS X fileserver. All the permissions/owners are mapped to Linux users. There are a lot of odd characters used in the filenames, but that's all I could see that was potentially an issue. It hasn't had a problem since I put that patch in, and I was getting a few minutes into the backup before it panicked previously.
Re: ZFS panic in zfs_fuid_create
on 27/05/2009 19:25 Lawrence Farr said the following: I updated my backup boxes to the latest and greatest ZFS code, and started getting the following panic on them all (3 machines): panic: zfs_fuid_create cpuid = 1 Uptime: 1h28m48s Cannot dump. No dump device defined. Automatic reboot in 15 seconds - press a key on the console to abort A quick google found kern/133020 with a patch from PJD that has fixed it for me. Should it be in stable or does it break something else? Hmm I wonder if you really do have UIDs or GIDs greater than 2147483647 defined on your system? -- Andriy Gapon ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
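The 2147483647 in Andriy's question is INT32_MAX: in the OpenSolaris code, IDs with the sign bit set are treated as "ephemeral" and take the FUID path, which is why unusually large UIDs/GIDs are the first suspect for a zfs_fuid_create panic. Notably, the backtrace earlier in this digest shows setfown() being called with uid=4294967294, i.e. (uid_t)-2, which is above that threshold. A sketch of the convention (names are illustrative; the FreeBSD port at the time simply stubbed the real macro to 0):

```c
#include <stdint.h>

/* Sketch of the Solaris convention being probed for: an ID above
 * INT32_MAX has its sign bit set and is treated as ephemeral.
 * SKETCH_MAXUID is an invented name, not a kernel constant. */
#define SKETCH_MAXUID 2147483647u

int id_is_ephemeral(uint32_t id)
{
	return (id > SKETCH_MAXUID);	/* equivalently, bit 31 is set */
}
```

Special IDs like nobody's (uid_t)-2 land in this range once stored unsigned (4294967294), so "no large UIDs in /etc/passwd" does not rule the theory out.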
ZFS panic in zfs_fuid_create
I updated my backup boxes to the latest and greatest ZFS code, and started getting the following panic on them all (3 machines): panic: zfs_fuid_create cpuid = 1 Uptime: 1h28m48s Cannot dump. No dump device defined. Automatic reboot in 15 seconds - press a key on the console to abort A quick google found kern/133020 with a patch from PJD that has fixed it for me. Should it be in stable or does it break something else? ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
RELENG_7/i386: ZFS panic on reboot
while rebooting: (kgdb) bt #0 doadump () at pcpu.h:196 #1 0x80514298 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418 #2 0x80514575 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:574 #3 0x806a74d4 in trap_fatal (frame=0xbf5b9a24, eva=12) at /usr/src/sys/i386/i386/trap.c:939 #4 0x806a771d in trap_pfault (frame=0xbf5b9a24, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:852 #5 0x806a808a in trap (frame=0xbf5b9a24) at /usr/src/sys/i386/i386/trap.c:530 #6 0x8069016b in calltrap () at /usr/src/sys/i386/i386/exception.s:159 #7 0x80806610 in gfs_dir_create (struct_size=132, pvp=0x87b388a0, vfsp=0x87a93b40, ops=0x808817a0, entries=0x0, inode_cb=0, maxlen=256, readdir_cb=0x808636c6 zfsctl_snapdir_readdir_cb, lookup_cb=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:420 #8 0x80863420 in zfsctl_mknode_snapdir (pvp=0x87b388a0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:783 #9 0x808069e9 in gfs_dir_lookup (dvp=0x87b388a0, nm=0x8087dfae snapshot, vpp=0xbf5b9b60) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:630 #10 0x808630bc in zfsctl_root_lookup (dvp=0x87b388a0, nm=0x8087dfae snapshot, vpp=0xbf5b9b60, pnp=0x0, flags=0, rdir=0x0, cr=0x85e84000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:396 #11 0x808638fa in zfsctl_umount_snapshots (vfsp=0x87a93b40, fflags=524288, cr=0x85e84000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1063 #12 0x8086b1dc in zfs_umount (vfsp=0x87a93b40, fflag=524288, td=0x85e8ecc0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:692 #13 0x80586ea4 in dounmount (mp=0x87a93b40, flags=524288, td=0x85e8ecc0) at /usr/src/sys/kern/vfs_mount.c:1293 #14 0x8058a4e8 in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:2944 #15 0x80514005 in boot (howto=16392) at 
/usr/src/sys/kern/kern_shutdown.c:400 #16 0x8051445d in reboot (td=0x85e8ecc0, uap=0xbf5b9cfc) at /usr/src/sys/kern/kern_shutdown.c:172 #17 0x806a7a60 in syscall (frame=0xbf5b9d38) at /usr/src/sys/i386/i386/trap.c:1090 #18 0x806901d0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #19 0x0033 in ?? () Any additional info needed? Thanks! -- Sincerely, D.Marck [DM5020, MCK-RIPE, DM3-RIPN] [ FreeBSD committer: ma...@freebsd.org ] *** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru *** ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS Panic
On Tue, Feb 17, 2009 at 09:43:31PM -0800, Cy Schubert wrote: I got this panic after issuing reboot(8). FreeBSD 7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009 c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG i386 FreeBSD/i386 (bob) (ttyd0) login: Feb 17 21:22:56 bob reboot: rebooted by root Feb 17 21:22:56 bob syslogd: exiting on signal 15 Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done All buffers synced. panic: insmntque() failed: error 16 cpuid = 0 KDB: enter: panic [thread pid 1086 tid 100090 ] Stopped at kdb_enter_why+0x3a: movl $0,kdb_why db> bt Tracing pid 1086 tid 100090 td 0xc2bfd230 kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at kdb_enter_why+0x3a panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136 gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at gfs_file_create+0x86 gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at zfsctl_mknode_snapdir+0x53 gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at gfs_dir_lookup+0xd1 zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at zfsctl_root_lookup+0xdc zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at zfsctl_umount_snapshots+0x4e zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53 dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430 vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b syscall(ebf8dd38) at syscall+0x2b3 Xint0x80_syscall() at Xint0x80_syscall+0x20 --- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 0xbfbfeb7c, ebp = 0xbfbfebb8 --- db> Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates the panic. The patch below would fix the problem, unless I mis-merged it. Please note that I cannot test the patch myself, so I rely on ZFS users testing before the commit. Property changes on: . ___ Modified: svn:mergeinfo Merged /head/sys:r182781,182824,182840 Property changes on: dev/cxgb ___ Modified: svn:mergeinfo Merged /head/sys/dev/cxgb:r182781,182824,182840 Property changes on: dev/ath/ath_hal ___ Modified: svn:mergeinfo Merged /head/sys/dev/ath/ath_hal:r182781,182824,182840 Property changes on: contrib/pf ___ Modified: svn:mergeinfo Merged /head/sys/contrib/pf:r182781,182824,182840 Index: cddl/contrib/opensolaris/uts/common/fs/gfs.c === --- cddl/contrib/opensolaris/uts/common/fs/gfs.c (revision 188748) +++ cddl/contrib/opensolaris/uts/common/fs/gfs.c (working copy) @@ -358,6 +358,7 @@ fp = kmem_zalloc(size, KM_SLEEP); error = getnewvnode("zfs", vfsp, ops, &vp); ASSERT(error == 0); + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread); vp->v_data = (caddr_t)fp; /* @@ -368,7 +369,9 @@ fp->gfs_size = size; fp->gfs_type = GFS_FILE; + vp->v_vflag |= VV_FORCEINSMQ; error = insmntque(vp, vfsp); + vp->v_vflag &= ~VV_FORCEINSMQ; KASSERT(error == 0, ("insmntque() failed: error %d", error)); /* Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c === --- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c (revision 188748) +++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c (working copy) @@ -113,6 +113,7 @@ if (cdrarg != NULL) { error = getnewvnode("zfs", vfsp, zfs_vnodeops, &vp); ASSERT(error == 0); + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread); zp->z_vnode = vp; vp->v_data = (caddr_t)zp; vp->v_vnlock->lk_flags |= LK_CANRECURSE; @@ -348,7 +349,9 @@ if (vp == NULL) return (zp); + vp->v_vflag |= VV_FORCEINSMQ; error = insmntque(vp, zfsvfs->z_vfs); + vp->v_vflag &= ~VV_FORCEINSMQ; KASSERT(error == 0, ("insmntque() failed: error %d", error)); vp->v_type = IFTOVT((mode_t)zp->z_phys->zp_mode); @@ -535,8 +538,10 @@ *zpp = zp; } else { - if (ZTOV(zp) != NULL) + if (ZTOV(zp) != NULL) { ZTOV(zp)->v_count = 0; +
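The error in the panic, 16, is EBUSY: insmntque() refuses to attach a freshly created vnode to a mount that is already being torn down, which is exactly the window in which the .zfs control vnodes are created here (the backtrace runs zfs_umount -> zfsctl_umount_snapshots -> gfs_dir_create). The VV_FORCEINSMQ flag added by the patch bypasses that refusal for these internally created vnodes. A toy model of the gate; structure and flag names are simplified stand-ins, not the real kernel definitions:

```c
#include <errno.h>

#define TOY_MNTK_UNMOUNT  0x01	/* mount is being torn down */
#define TOY_VV_FORCEINSMQ 0x02	/* vnode may be inserted anyway */

struct toy_mount { int mnt_kern_flag; };
struct toy_vnode { int v_vflag; };

/* Sketch of the check inside insmntque() that produced "error 16":
 * a vnode may not join an unmounting filesystem unless forced. */
int toy_insmntque(struct toy_vnode *vp, struct toy_mount *mp)
{
	if ((mp->mnt_kern_flag & TOY_MNTK_UNMOUNT) != 0 &&
	    (vp->v_vflag & TOY_VV_FORCEINSMQ) == 0)
		return (EBUSY);		/* EBUSY == 16 on FreeBSD */
	return (0);
}
```

This also explains why forcibly unmounting the ZFS filesystems before reboot(8) avoids the panic: the .zfs vnodes then get created before the unmount gate closes.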
Re: ZFS Panic
In message 20090218162126.gq41...@deviant.kiev.zoral.com.ua, Kostik Belousov writes:

On Tue, Feb 17, 2009 at 09:43:31PM -0800, Cy Schubert wrote:
I got this panic after issuing reboot(8).

FreeBSD 7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009
c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG i386

FreeBSD/i386 (bob) (ttyd0)

login: Feb 17 21:22:56 bob reboot: rebooted by root
Feb 17 21:22:56 bob syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
panic: insmntque() failed: error 16
cpuid = 0
KDB: enter: panic
[thread pid 1086 tid 100090 ]
Stopped at kdb_enter_why+0x3a: movl $0,kdb_why
db> bt
Tracing pid 1086 tid 100090 td 0xc2bfd230
kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at kdb_enter_why+0x3a
panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at gfs_file_create+0x86
gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at zfsctl_mknode_snapdir+0x53
gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at gfs_dir_lookup+0xd1
zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at zfsctl_root_lookup+0xdc
zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at zfsctl_umount_snapshots+0x4e
zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
syscall(ebf8dd38) at syscall+0x2b3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 0xbfbfeb7c, ebp = 0xbfbfebb8 ---
db>

Forcibly unmounting ZFS filesystems prior to issuing reboot(8) mitigates the panic.

The patch below would fix the problem, unless I mis-merged it. Please note that I cannot test the patch myself, so I rely on ZFS users testing before the commit.

Property changes on: .
Modified: svn:mergeinfo
   Merged /head/sys:r182781,182824,182840

Property changes on: dev/cxgb
Modified: svn:mergeinfo
   Merged /head/sys/dev/cxgb:r182781,182824,182840

Property changes on: dev/ath/ath_hal
Modified: svn:mergeinfo
   Merged /head/sys/dev/ath/ath_hal:r182781,182824,182840

Property changes on: contrib/pf
Modified: svn:mergeinfo
   Merged /head/sys/contrib/pf:r182781,182824,182840

Index: cddl/contrib/opensolaris/uts/common/fs/gfs.c
===================================================================
--- cddl/contrib/opensolaris/uts/common/fs/gfs.c	(revision 188748)
+++ cddl/contrib/opensolaris/uts/common/fs/gfs.c	(working copy)
@@ -358,6 +358,7 @@
 	fp = kmem_zalloc(size, KM_SLEEP);
 	error = getnewvnode("zfs", vfsp, ops, &vp);
 	ASSERT(error == 0);
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
 	vp->v_data = (caddr_t)fp;

 	/*
@@ -368,7 +369,9 @@
 	fp->gfs_size = size;
 	fp->gfs_type = GFS_FILE;

+	vp->v_vflag |= VV_FORCEINSMQ;
 	error = insmntque(vp, vfsp);
+	vp->v_vflag &= ~VV_FORCEINSMQ;
 	KASSERT(error == 0, ("insmntque() failed: error %d", error));

 	/*
Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
===================================================================
--- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(revision 188748)
+++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(working copy)
@@ -113,6 +113,7 @@
 	if (cdrarg != NULL) {
 		error = getnewvnode("zfs", vfsp, zfs_vnodeops, &vp);
 		ASSERT(error == 0);
+		vn_lock(vp, LK_EXCLUSIVE |
ZFS Panic
I got this panic after issuing reboot(8).

FreeBSD 7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009
c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG i386

FreeBSD/i386 (bob) (ttyd0)

login: Feb 17 21:22:56 bob reboot: rebooted by root
Feb 17 21:22:56 bob syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
panic: insmntque() failed: error 16
cpuid = 0
KDB: enter: panic
[thread pid 1086 tid 100090 ]
Stopped at kdb_enter_why+0x3a: movl $0,kdb_why
db> bt
Tracing pid 1086 tid 100090 td 0xc2bfd230
kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at kdb_enter_why+0x3a
panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at gfs_file_create+0x86
gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at zfsctl_mknode_snapdir+0x53
gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at gfs_dir_lookup+0xd1
zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at zfsctl_root_lookup+0xdc
zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at zfsctl_umount_snapshots+0x4e
zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
syscall(ebf8dd38) at syscall+0x2b3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 0xbfbfeb7c, ebp = 0xbfbfebb8 ---
db>

Forcibly unmounting ZFS filesystems prior to issuing reboot(8) mitigates the panic.

--
Cheers,
Cy Schubert cy.schub...@komquats.com
FreeBSD UNIX: c...@freebsd.org Web: http://www.FreeBSD.org
e**(i*pi)+1=0

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS Panic
Cy Schubert wrote:
I got this panic after issuing reboot(8).

FreeBSD 7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009
c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG i386

FreeBSD/i386 (bob) (ttyd0)

login: Feb 17 21:22:56 bob reboot: rebooted by root
Feb 17 21:22:56 bob syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
panic: insmntque() failed: error 16
cpuid = 0
KDB: enter: panic
[thread pid 1086 tid 100090 ]
Stopped at kdb_enter_why+0x3a: movl $0,kdb_why
db> bt
Tracing pid 1086 tid 100090 td 0xc2bfd230
kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at kdb_enter_why+0x3a
panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at gfs_file_create+0x86
gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at zfsctl_mknode_snapdir+0x53
gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at gfs_dir_lookup+0xd1
zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at zfsctl_root_lookup+0xdc
zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at zfsctl_umount_snapshots+0x4e
zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
syscall(ebf8dd38) at syscall+0x2b3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 0xbfbfeb7c, ebp = 0xbfbfebb8 ---
db>

Forcibly unmounting ZFS filesystems prior to issuing reboot(8) mitigates the panic.

I experienced a ZFS-related panic with RELENG_7 in November last year and got a fix from k...@. I'm not quite sure whether yours and mine are the same case, but the following patch might help (for /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c and /usr/src/sys/cddl/compat/opensolaris/sys/vnode.h):

http://lists.freebsd.org/pipermail/freebsd-stable/2008-November/046752.html

Ganbold

--
I was the best I ever had.
-- Woody Allen
Re: Reliably trigger-able ZFS panic
LI Xin wrote:
Hi,

The following iozone test case on ZFS would reliably trigger a panic:

/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

It can also be (eventually) triggered by:

blogbench -c 100 -i 30 -r 50 -w 10 -W 10

and by heavy I/O load on real multithreaded applications like mysql (both iozone and blogbench are multithreaded).
Re: Reliably trigger-able ZFS panic
On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:
Hi,
The following iozone test case on ZFS would reliably trigger panic:
/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Thanks, I'll try to reproduce it.

[...]

#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, level=70109184, method=68351768, windowBits=68351600, memLevel=76231808, strategy=76231808, version=Cannot access memory at address 0x00040010) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318

Can you send me your FS configuration? zfs get all your/file/system
I see that you use compression on this dataset?

--
Pawel Jakub Dawidek http://www.wheel.pl
[EMAIL PROTECTED] http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
Re: Reliably trigger-able ZFS panic
Pawel Jakub Dawidek wrote:
On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:
Hi,
The following iozone test case on ZFS would reliably trigger panic:
/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Thanks, I'll try to reproduce it.

[...]

#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, level=70109184, method=68351768, windowBits=68351600, memLevel=76231808, strategy=76231808, version=Cannot access memory at address 0x00040010) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318

Can you send me your FS configuration? zfs get all your/file/system
I see that you use compression on this dataset?

It was all default configuration. The pool was a RAID-Z2 without hotspare disk.

The box is now running some other tests (not FreeBSD) at our Beijing Lab and we don't have remote hands in the nights, so I'm afraid that I will not be able to provide further information at this moment. Please let me know if the test run will not provoke the problem and I will ask them to see if they can spare the box in the weekend for me.

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: Reliably trigger-able ZFS panic
Tue, 04 Mar 2008 03:27:35 +0800, Xin LI [EMAIL PROTECTED]:

The kernel is FreeBSD fs12.sina.com.cn 7.0-STABLE FreeBSD 7.0-STABLE #0: Sun Mar 2 18:50:05 CST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ZFORK amd64

The `zfs get all' output is below:

fs12# zfs get all
NAME     PROPERTY       VALUE                  SOURCE
midpool  type           filesystem             -
midpool  creation       Fri Feb 29 15:01 2008  -
midpool  used           11.1M                  -
midpool  available      2.65T                  -
midpool  referenced     44.7K                  -
midpool  compressratio  1.00x                  -
midpool  mounted        yes                    -
midpool  quota          none                   default
midpool  reservation    none                   default
midpool  recordsize     128K                   default
midpool  mountpoint     /mnt/ztest             local
midpool  sharenfs       off                    default
midpool  checksum       on                     default
midpool  compression    off                    default
midpool  atime          on                     default
midpool  devices        on                     default
midpool  exec           on                     default
midpool  setuid         on                     default
midpool  readonly       off                    default
midpool  jailed         off                    default
midpool  snapdir        hidden                 default
midpool  aclmode        groupmask              default
midpool  aclinherit     secure                 default
midpool  canmount       on                     default
midpool  shareiscsi     off                    default
midpool  xattr          off                    temporary
midpool  copies         1                      default

fs12# zpool get all midpool
NAME     PROPERTY  VALUE  SOURCE
midpool  bootfs    -      default

Pawel Jakub Dawidek wrote:
On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:
Hi,
The following iozone test case on ZFS would reliably trigger panic:
/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Thanks, I'll try to reproduce it.

[...]

#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, level=70109184, method=68351768, windowBits=68351600, memLevel=76231808, strategy=76231808, version=Cannot access memory at address 0x00040010) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318

Can you send me your FS configuration? zfs get all your/file/system
I see that you use compression on this dataset?

It was all default configuration. The pool was a RAID-Z2 without hotspare disk.
The box is now running some other tests (not FreeBSD) at our Beijing Lab and we don't have remote hands in the nights, so I'm afraid that I will not be able to provide further information at this moment. Please let me know if the test run will not provoke the problem and I will ask them to see if they can spare the box in the weekend for me. Cheers,

--
The Power to Serve
Reliably trigger-able ZFS panic
Hi,

The following iozone test case on ZFS would reliably trigger a panic:

/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Unfortunately kgdb cannot reveal a useful backtrace. I have tried KDB_TRACE, but have not yet been able to investigate it further.

fs12# kgdb /boot/kernel/kernel.symbols vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd.

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x8:0x80763d16
stack pointer = 0x10:0xd94798f0
frame pointer = 0x10:0xd9479920
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 340 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
trap_fatal() at trap_fatal+0x29f
trap_pfault() at trap_pfault+0x294
trap() at trap+0x2ea
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x80763d16, rsp = 0xd94798f0, rbp = 0xd9479920 ---
dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x26
dmu_objset_sync() at dmu_objset_sync+0x12d
dsl_pool_sync() at dsl_pool_sync+0x72
spa_sync() at spa_sync+0x390
txg_sync_thread() at txg_sync_thread+0x12f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xd9479d30, rbp = 0 ---
Uptime: 25m7s
Physical memory: 4081 MB
Dumping 1139 MB: 1124 1108 1092 1076 1060 1044 1028 1012 996 980 964 948 932 916 900 884 868 852 836 820 804 788 772 756 740 724 708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356 340 324 308 292 276 260 244 228 212 196 180 164 148 132 116 100 84 68 52 36 20 4

#0 doadump () at pcpu.h:194
194 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) add-symbol-file /boot/kernel/zfs.ko.symbols
add symbol table from file /boot/kernel/zfs.ko.symbols at (y or n) y
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
(kgdb) where
#0 doadump () at pcpu.h:194
#1 0x80277aa8 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0x80277f07 in panic (fmt=Variable fmt is not available.) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0x80465a1f in trap_fatal (frame=0xc, eva=Variable eva is not available.) at /usr/src/sys/amd64/amd64/trap.c:724
#4 0x80465e04 in trap_pfault (frame=0xd9479840, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#5 0x8046677a in trap (frame=0xd9479840) at /usr/src/sys/amd64/amd64/trap.c:410
#6 0x8044babe in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#7 0x80763d16 in ?? ()
#8 0x0004 in adjust_ace_pair ()
#9 0x0004 in adjust_ace_pair ()
#10 0xd94799e0 in ?? ()
#11 0x80763e7d in ?? ()
#12 0xff0004275a80 in ?? ()
#13 0xff00045a1190 in ?? ()
#14 0x807639b0 in ?? ()
#15 0x80763f20 in ?? ()
#16 0xff00042dc800 in ?? ()
#17 0x0004 in adjust_ace_pair ()
#18 0xd9479990 in ?? ()
#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, level=70109184, method=68351768, windowBits=68351600, memLevel=76231808, strategy=76231808, version=Cannot access memory at address 0x00040010) at /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318
Previous frame inner to this frame (corrupt stack?)

--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!