Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
>
> #11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
> bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
> bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0x8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xf80de4654000
> (kgdb) p bucket->ub_bucket   
> $3 = 0xf801c7fd5218
>
> (kgdb)
>
Not sure if its coincidence or not, but previously I was running with
arc being limited to ~30G of the 64G of RAM on the box.  I removed that
limit a few weeks ago after upgrading the box to RELENG_12 to pull in
the OpenSSL changes.  The panic seems to happen under disk load. I have
3 zfs pools that are pretty busy receiving snapshots. One day a week, we
write a full set to a 4th zfs pool off some geli attached drives via USB
for offsite cold storage.  The crashes happened with that extra level of
disk work.  gstat shows most of the 12 drives off 2 mrsas controllers at
or close to 100% busy during the 18hrs it takes to dump out the files.

Trying a new cold storage run now with the arc limit back to
vfs.zfs.arc_max=29334498304

    ---Mike



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:07 AM, Mark Johnston wrote:
>
> Could you go to frame 11 and print zone->uz_name and
> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
> somehow.

Thank you for looking!

(kgdb) frame 11

#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
bucket->ub_cnt);
(kgdb) p zone->uz_name
$1 = 0x8102118a "mbuf_jumbo_9k"
(kgdb) p bucket->ub_bucket[18]
$2 = (void *) 0xf80de4654000
(kgdb) p bucket->ub_bucket   
$3 = 0xf801c7fd5218

(kgdb)

    ---Mike

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs panic RELENG_12

2020-12-22 Thread Mark Johnston
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware as it seems different ?
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x0
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer   = 0x28:0xfe00bc0f8540
> frame pointer   = 0x28:0xfe00bc0f8590
> code segment    = base 0x0, limit 0xf, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and
bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
somehow.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
Hmmm, another one. Not sure if this is hardware as it seems different ?



Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x0
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x80ca0826
stack pointer   = 0x28:0xfe00bc0f8540
frame pointer   = 0x28:0xfe00bc0f8590
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 33 (dom0)
trap number = 12
panic: page fault
cpuid = 11
time = 1608641071
KDB: stack backtrace:
#0 0x80a3fe85 at kdb_backtrace+0x65
#1 0x809f406b at vpanic+0x17b
#2 0x809f3ee3 at panic+0x43
#3 0x80e3fe71 at trap_fatal+0x391
#4 0x80e3fecf at trap_pfault+0x4f
#5 0x80e3f516 at trap+0x286
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
#8 0x80c9e302 at zone_drain_wait+0xa2
#9 0x80ca2bbd at uma_reclaim_locked+0x6d
#10 0x80ca2af4 at uma_reclaim+0x34
#11 0x80cc5321 at vm_pageout_worker+0x421
#12 0x80cc4ee3 at vm_pageout+0x193
#13 0x809b55be at fork_exit+0x7e
#14 0x80e1a34e at fork_trampoline+0xe
Uptime: 5d20h37m16s
Dumping 16057 out of 65398
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x80ca0826 in slab_free_item (keg=0xf800037fa380,
slab=0xf80de4656fb0, item=) at
/usr/src/sys/vm/uma_core.c:3357
#10 zone_release (zone=, bucket=0xf801c7fd5218,
cnt=) at /usr/src/sys/vm/uma_core.c:3404
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
#12 bucket_cache_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:915
#13 0x80c9e302 in zone_drain_wait (zone=0xf800037da000,
waitok=1) at /usr/src/sys/vm/uma_core.c:1037
#14 0x80ca2bbd in zone_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:1056
#15 zone_foreach (zfunc=) at /usr/src/sys/vm/uma_core.c:1985
#16 uma_reclaim_locked (kmem_danger=) at
/usr/src/sys/vm/uma_core.c:3737
#17 0x80ca2af4 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:3757
#18 0x80cc5321 in vm_pageout_lowmem () at
/usr/src/sys/vm/vm_pageout.c:1890
#19 vm_pageout_worker (arg=) at
/usr/src/sys/vm/vm_pageout.c:1966
#20 0x80cc4ee3 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2126
#21 0x809b55be in fork_exit (callout=0x80cc4d50
, arg=0x0, frame=0xfe00bc0f8b00) at
/usr/src/sys/kern/kern_fork.c:1080
#22 
(kgdb) bt full
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    td = 
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
    error = 
    coredump = 
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
    once = 
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
    buf = "page fault", '\000' 
    other_cpus = {__bits = {2047, 0, 0, 0}}
    td = 0xf80004964740
    newpanic = 
    bootopt = 
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
    ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xfe00bc0f82c0, reg_save_area = 0xfe00bc0f8260}}
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
    softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27,
ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
    code = 
    type = 
    ss = 40
    handled = 
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
    td = 0xf80004964740
    p = 
    eva = 0
    map = 
    ftype = 
    rv = 
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
    ksi = {ksi_link = {tqe_next = 

zfs panic RELENG_12

2020-12-15 Thread mike tancsa
Was doing a backup via zfs send | zfs recv when the box panic'd.  Its a
not so old RELENG_12 box from last week. Any ideas if this is a hardware
issue or a bug ? Its r368493 from last Wednesday. I dont see an ECC
errors logged, so dont think its hardware.

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x823a554b
stack pointer   = 0x28:0xfe0343231000
frame pointer   = 0x28:0xfe03432310c0
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 87427 (zfs)
trap number = 12
panic: page fault
cpuid = 1
time = 1608065221
KDB: stack backtrace:
#0 0x80a3fa05 at kdb_backtrace+0x65
#1 0x809f3beb at vpanic+0x17b
#2 0x809f3a63 at panic+0x43
#3 0x80e400d1 at trap_fatal+0x391
#4 0x80e4012f at trap_pfault+0x4f
#5 0x80e3f776 at trap+0x286
#6 0x80e19568 at calltrap+0x8
#7 0x82393a5e at dmu_object_info+0x1e
#8 0x823983a5 at dmu_recv_stream+0x7b5
#9 0x8244b706 at zfs_ioc_recv+0xac6
#10 0x8244dd3d at zfsdev_ioctl+0x62d
#11 0x808a35e0 at devfs_ioctl+0xb0
#12 0x80f3becb at VOP_IOCTL_APV+0x7b
#13 0x80ad1b0a at vn_ioctl+0x16a
#14 0x808a3bce at devfs_ioctl_f+0x1e
#15 0x80a5d807 at kern_ioctl+0x2b7
#16 0x80a5d4aa at sys_ioctl+0xfa
#17 0x80e40c87 at amd64_syscall+0x387
Uptime: 3d14h59m52s
Dumping 17213 out of 65366
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
    at /usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3805 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f3c43 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3a63 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e400d1 in trap_fatal (frame=0xfe0343230f40, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e4012f in trap_pfault (frame=0xfe0343230f40,
    usermode=, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f776 in trap (frame=0xfe0343230f40)
    at /usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x823a554b in dnode_hold_impl (os=0xf805e1d2b800,
    object=, flag=, slots=,
    tag=, dnp=0xfe03432310d8)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1370
#10 0x82393a5e in dmu_object_info (os=0xf80777890070,
    object=18446744071600721588, doi=0xfe03432312e0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:2615
#11 0x823983a5 in receive_read_record (ra=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2821
#12 dmu_recv_stream (drc=0xfe0343231430, fp=,
    voffp=, cleanup_fd=8, action_handlep=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:3203
#13 0x8244b706 in zfs_ioc_recv (zc=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4745
#14 0x8244dd3d in zfsdev_ioctl (dev=,
    zcmd=, arg=, flag=,
    td=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6956
#15 0x808a35e0 in devfs_ioctl (ap=0xfe0343231778)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:797
#16 0x80f3becb in VOP_IOCTL_APV (
    vop=0x816a2fe0 , a=0xfe0343231778)
    at vnode_if.c:1067
#17 0x80ad1b0a in vn_ioctl (fp=0xf8001802b5a0,
    com=, data=0xfe0343231910,
    active_cred=0xf80032214300, td=0x2070)
    at /usr/src/sys/kern/vfs_vnops.c:1508
#18 0x808a3bce in devfs_ioctl_f (fp=0xf80777890070,
    com=18446744071600721588, data=0x824e34ed <.L.str+1>, cred=0x0,
    td=0xf8029885) at /usr/src/sys/fs/devfs/devfs_vnops.c:755
#19 0x80a5d807 in fo_ioctl (fp=0xf8001802b5a0, com=3222821403,
    data=0x824e34ed <.L.str+1>, active_cred=0x0,
    td=0xf8029885) at /usr/src/sys/sys/file.h:337
#20 kern_ioctl (td=0x2070, fd=, com=3222821403,
    data=0x824e34ed <.L.str+1> "zrl->zr_mtx")
    at /usr/src/sys/kern/sys_generic.c:805
#21 0x80a5d4aa in sys_ioctl (td=0xf8029885,
    uap=0xf802988503c0) at /usr/src/sys/kern/sys_generic.c:713
#22 0x80e40c87 in syscallenter (td=0xf8029885)
    at 

[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Rodney W. Grimes  changed:

   What|Removed |Added

 CC|sta...@freebsd.org  |rgri...@freebsd.org

--- Comment #1 from Rodney W. Grimes  ---
Please do not put bugs on stable@, current@, hackers@, etc

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Andriy Voskoboinyk  changed:

   What|Removed |Added

 CC||a...@freebsd.org
Summary|ZFS kernel panic when   |[zfs] Panic during data
   |access to data or scrub |access or scrub on
   ||12.0-STABLE r343904 (blkptr
   ||at  DVA 0 has invalid
   ||OFFSET)
   Keywords||panic
   Assignee|b...@freebsd.org|f...@freebsd.org

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


ZFS panic, ARC compression?

2018-04-02 Thread Bob Bishop
Hi,

Anyone offer any suggestions about this?

kernel: panic: solaris assert: arc_decompress(buf) == 0 (0x5 == 0x0), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, line: 4923
kernel: cpuid = 1
kernel: KDB: stack backtrace:
kernel: #0 0x80aadac7 at kdb_backtrace+0x67
kernel: #1 0x80a6bba6 at vpanic+0x186
kernel: #2 0x80a6ba13 at panic+0x43
kernel: #3 0x8248023c at assfail3+0x2c  
kernel: #4 0x8218e2e0 at arc_read+0x9f0
kernel: #5 0x82198e5e at dbuf_read+0x69e
kernel: #6 0x821b3db4 at dnode_hold_impl+0x194
kernel: #7 0x821a11dd at dmu_bonus_hold+0x1d
kernel: #8 0x8220fb05 at zfs_zget+0x65
kernel: #9 0x82227d42 at zfs_dirent_lookup+0x162
kernel: #10 0x82227e07 at zfs_dirlook+0x77
kernel: #11 0x8223fcea at zfs_lookup+0x44a   
kernel: #12 0x822403fd at zfs_freebsd_lookup+0x6d
kernel: #13 0x8104b963 at VOP_CACHEDLOOKUP_APV+0x83
kernel: #14 0x80b13816 at vfs_cache_lookup+0xd6
kernel: #15 0x8104b853 at VOP_LOOKUP_APV+0x83
kernel: #16 0x80b1d151 at lookup+0x701  
kernel: #17 0x80b1c606 at namei+0x486

Roughly 24 hours earlier (during the scrub), there was:

ZFS: vdev state changed, pool_guid=11921811386284628759 
vdev_guid=1644286782598989949
ZFS: vdev state changed, pool_guid=11921811386284628759 
vdev_guid=17800276530669255627

% uname -a
FreeBSD xxx 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 
06:12:40 UTC 2017 
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
%
% zpool status
  pool: zroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 15.7M in 2h37m with 1 errors on Sun Apr  1 09:44:39 2018
config:

NAMESTATE READ WRITE CKSUM
zroot   ONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
ada0p4  ONLINE   0 0 0
ada1p4  ONLINE   0 0 0

errors: 1 data errors, use '-v' for a list
%

The affected file (in a snapshot) is unimportant.

This pool is a daily rsync backup and contains about 120 snapshots.

No device or SMART errors were logged.

--
Bob Bishop
r...@gid.co.uk




___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 9.2PRERELEASE ZFS panic in lzjb_compress

2013-09-20 Thread olivier
Got another, very similar panic again on recent 9-STABLE (r255602); I
assume the latest 9.2 release candidate is affected too. Anybody have any
idea of what could be causing this, and of a workaround other than turning
compression off?
Unlike the last panic I reported, this one did not occur during a zfs
send/receive operation. There were just a number of processes potentially
writing to disk at the same time.
All hardware is healthy as far as I can tell (memory is ECC and no errors
in logs; zpool status and smartctl show no problems).

Fatal trap 12: page fault while in kernel mode


cpuid = 4; apic id = 24
cpuid = 51; apic id = 83
fault virtual address = 0xff8700a9cc65
fault virtual address = 0xff8700ab0ea9
fault code = supervisor read data, page not present

instruction pointer = 0x20:0x8195ff47
fault code = supervisor read data, page not present
stack pointer= 0x28:0xffcf951390a0
Fatal trap 12: page fault while in kernel mode
frame pointer= 0x28:0xffcf951398f0
Fatal trap 12: page fault while in kernel mode
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
instruction pointer = 0x20:0x8195ffa4
stack pointer= 0x28:0xffcf951250a0
processor eflags = frame pointer= 0x28:0xffcf951258f0
interrupt enabled, code segment = base 0x0, limit 0xf, type 0x1b

resume, IOPL = 0
cpuid = 28; apic id = 4c
Fatal trap 12: page fault while in kernel mode
= DPL 0, pres 1, long 1, def32 0, gran 1
current process = 0 (zio_write_issue_hig)
processor eflags = fault virtual address = 0xff8700aa22ac
interrupt enabled, fault code = supervisor read data, page not present
resume, IOPL = 0
trap number = 12
instruction pointer = 0x20:0x8195ffa4
current process = 0 (zio_write_issue_hig)
panic: page fault
cpuid = 4
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
0xffcf95138b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf95138bf0
panic() at panic+0x1ce/frame 0xffcf95138cf0
trap_fatal() at trap_fatal+0x290/frame 0xffcf95138d50
trap_pfault() at trap_pfault+0x211/frame 0xffcf95138de0
trap() at trap+0x344/frame 0xffcf95138fe0
calltrap() at calltrap+0x8/frame 0xffcf95138fe0
--- trap 0xc, rip = 0x8195ff47, rsp = 0xffcf951390a0, rbp =
0xffcf951398f0 ---
lzjb_compress() at lzjb_compress+0xa7/frame 0xffcf951398f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf95139920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf95139970
zio_execute() at zio_execute+0xc3/frame 0xffcf951399b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf95139a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame
0xffcf95139a20
fork_exit() at fork_exit+0x11f/frame 0xffcf95139a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf95139a70
--- trap 0, rip = 0, rsp = 0xffcf95139b30, rbp = 0 ---


0x51f47 is in lzjb_compress
(/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lzjb.c:74).
69 }
70 if (src  (uchar_t *)s_start + s_len - MATCH_MAX) {
71 *dst++ = *src++;
72 continue;
73 }
74 hash = (src[0]  16) + (src[1]  8) + src[2];
75 hash += hash  9;
76 hash += hash  5;
77 hp = lempel[hash  (LEMPEL_SIZE - 1)];
78 offset = (intptr_t)(src - *hp)  OFFSET_MASK;

dmesg output is at http://pastebin.com/U34fwJ5f
kernel config is at http://pastebin.com/c9HKfcsz
I can provide more information if useful.
Thanks


On Fri, Jul 19, 2013 at 6:52 AM, Volodymyr Kostyrko c.kw...@gmail.comwrote:

 19.07.2013 07:04, olivier wrote:

 Hi,
 Running 9.2-PRERELEASE #19 r253313 I got the following panic

 Fatal trap 12: page fault while in kernel mode
 cpuid = 22; apic id = 46
 fault virtual address   = 0xff827ebca30c
 fault code  = supervisor read data, page not present
 instruction pointer = 0x20:0x81983055
 stack pointer   = 0x28:0xffcf75bd60a0
 frame pointer   = 0x28:0xffcf75bd68f0
 code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 0 (zio_write_issue_hig)
 trap number = 12
 panic: page fault
 cpuid = 22
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/**frame
 0xffcf75bd5b30
 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf75bd5bf0
 panic() at panic+0x1ce/frame 0xffcf75bd5cf0
 trap_fatal() at trap_fatal+0x290/frame 0xffcf75bd5d50
 trap_pfault() at trap_pfault+0x211/frame 0xffcf75bd5de0
 trap() at trap+0x344/frame 0xffcf75bd5fe0
 calltrap() at calltrap+0x8/frame 0xffcf75bd5fe0
 --- trap 0xc, rip = 0x81983055, rsp = 0xffcf75bd60a0, rbp =
 0xffcf75bd68f0 ---
 lzjb_compress() at lzjb_compress+0x185/frame 0xffcf75bd68f0
 zio_compress_data() at zio_compress_data+0x92/frame 

Re: 9.2PRERELEASE ZFS panic in lzjb_compress

2013-09-20 Thread olivier
One last piece of information I just got: the problem is not specific to
LZJB compression. I switched to LZ4 and get the same sort of panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 28
fault virtual address = 0xff8581c48000
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x8195f6d1
stack pointer= 0x28:0xffcf950ee850
frame pointer= 0x28:0xffcf950ee8f0
code segment = base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_hig)
trap number = 12
panic: page fault
cpuid = 8
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
0xffcf950ee2e0
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf950ee3a0
panic() at panic+0x1ce/frame 0xffcf950ee4a0
trap_fatal() at trap_fatal+0x290/frame 0xffcf950ee500
trap_pfault() at trap_pfault+0x211/frame 0xffcf950ee590
trap() at trap+0x344/frame 0xffcf950ee790
calltrap() at calltrap+0x8/frame 0xffcf950ee790
--- trap 0xc, rip = 0x8195f6d1, rsp = 0xffcf950ee850, rbp =
0xffcf950ee8f0 ---
lz4_compress() at lz4_compress+0x81/frame 0xffcf950ee8f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf950ee920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf950ee970
zio_execute() at zio_execute+0xc3/frame 0xffcf950ee9b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf950eea00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame
0xffcf950eea20
fork_exit() at fork_exit+0x11f/frame 0xffcf950eea70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf950eea70
--- trap 0, rip = 0, rsp = 0xffcf950eeb30, rbp = 0 ---

(I am now trying without any compression.)


On Fri, Sep 20, 2013 at 11:25 AM, olivier olivier77...@gmail.com wrote:

 Got another, very similar panic again on recent 9-STABLE (r255602); I
 assume the latest 9.2 release candidate is affected too. Anybody have any
 idea of what could be causing this, and of a workaround other than turning
 compression off?
 Unlike the last panic I reported, this one did not occur during a zfs
 send/receive operation. There were just a number of processes potentially
 writing to disk at the same time.
 All hardware is healthy as far as I can tell (memory is ECC and no errors
 in logs; zpool status and smartctl show no problems).

 Fatal trap 12: page fault while in kernel mode


 cpuid = 4; apic id = 24
 cpuid = 51; apic id = 83
 fault virtual address = 0xff8700a9cc65
 fault virtual address = 0xff8700ab0ea9
 fault code = supervisor read data, page not present

 instruction pointer = 0x20:0x8195ff47
 fault code = supervisor read data, page not present
 stack pointer= 0x28:0xffcf951390a0
 Fatal trap 12: page fault while in kernel mode
 frame pointer= 0x28:0xffcf951398f0
 Fatal trap 12: page fault while in kernel mode
 code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 instruction pointer = 0x20:0x8195ffa4
 stack pointer= 0x28:0xffcf951250a0
 processor eflags = frame pointer= 0x28:0xffcf951258f0
 interrupt enabled, code segment = base 0x0, limit 0xf, type 0x1b

 resume, IOPL = 0
 cpuid = 28; apic id = 4c
 Fatal trap 12: page fault while in kernel mode
  = DPL 0, pres 1, long 1, def32 0, gran 1
 current process = 0 (zio_write_issue_hig)
 processor eflags = fault virtual address = 0xff8700aa22ac
 interrupt enabled, fault code = supervisor read data, page not present
 resume, IOPL = 0
 trap number = 12
 instruction pointer = 0x20:0x8195ffa4
 current process = 0 (zio_write_issue_hig)
 panic: page fault
 cpuid = 4
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
 0xffcf95138b30
 kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf95138bf0
 panic() at panic+0x1ce/frame 0xffcf95138cf0
 trap_fatal() at trap_fatal+0x290/frame 0xffcf95138d50
 trap_pfault() at trap_pfault+0x211/frame 0xffcf95138de0
 trap() at trap+0x344/frame 0xffcf95138fe0
 calltrap() at calltrap+0x8/frame 0xffcf95138fe0
 --- trap 0xc, rip = 0x8195ff47, rsp = 0xffcf951390a0, rbp =
 0xffcf951398f0 ---
 lzjb_compress() at lzjb_compress+0xa7/frame 0xffcf951398f0
 zio_compress_data() at zio_compress_data+0x92/frame 0xffcf95139920
 zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf95139970
 zio_execute() at zio_execute+0xc3/frame 0xffcf951399b0
 taskqueue_run_locked() at taskqueue_run_locked+0x74/frame
 0xffcf95139a00
 taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame
 0xffcf95139a20
 fork_exit() at fork_exit+0x11f/frame 0xffcf95139a70
 fork_trampoline() at fork_trampoline+0xe/frame 0xffcf95139a70
 --- trap 0, rip = 0, rsp = 0xffcf95139b30, rbp = 0 ---


 0x51f47 is in lzjb_compress
 

Re: 9.2PRERELEASE ZFS panic in lzjb_compress

2013-07-19 Thread Volodymyr Kostyrko

19.07.2013 07:04, olivier wrote:

Hi,
Running 9.2-PRERELEASE #19 r253313 I got the following panic

Fatal trap 12: page fault while in kernel mode
cpuid = 22; apic id = 46
fault virtual address   = 0xff827ebca30c
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x81983055
stack pointer   = 0x28:0xffcf75bd60a0
frame pointer   = 0x28:0xffcf75bd68f0
code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_hig)
trap number = 12
panic: page fault
cpuid = 22
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
0xffcf75bd5b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf75bd5bf0
panic() at panic+0x1ce/frame 0xffcf75bd5cf0
trap_fatal() at trap_fatal+0x290/frame 0xffcf75bd5d50
trap_pfault() at trap_pfault+0x211/frame 0xffcf75bd5de0
trap() at trap+0x344/frame 0xffcf75bd5fe0
calltrap() at calltrap+0x8/frame 0xffcf75bd5fe0
--- trap 0xc, rip = 0x81983055, rsp = 0xffcf75bd60a0, rbp =
0xffcf75bd68f0 ---
lzjb_compress() at lzjb_compress+0x185/frame 0xffcf75bd68f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf75bd6920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf75bd6970
zio_execute() at zio_execute+0xc3/frame 0xffcf75bd69b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf75bd6a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame
0xffcf75bd6a20
fork_exit() at fork_exit+0x11f/frame 0xffcf75bd6a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf75bd6a70
--- trap 0, rip = 0, rsp = 0xffcf75bd6b30, rbp = 0 ---

lzjb_compress+0x185 corresponds to line 85 in
80 cpy = src - offset;
81 if (cpy = (uchar_t *)s_start  cpy != src 
82src[0] == cpy[0]  src[1] == cpy[1]  src[2] == cpy[2]) {
83 *copymap |= copymask;
84 for (mlen = MATCH_MIN; mlen  MATCH_MAX; mlen++)
85 if (src[mlen] != cpy[mlen])
86 break;
87 *dst++ = ((mlen - MATCH_MIN)  (NBBY - MATCH_BITS)) |
88(offset  NBBY);
89 *dst++ = (uchar_t)offset;

I think it's the first time I've seen this panic. It happened while doing a
send/receive. I have two pools with lzjb compression; I don't know which of
these pools caused the problem, but one of them was the source of the
send/receive.

I only have a textdump but I'm happy to try to provide more information
that could help anyone look into this.
Thanks
Olivier


Oh, I can add to this one. I have a full core dump of the same problem 
caused by copying large set of files from lzjb compressed pool to lz4 
compressed pool. vfs.zfs.recover was set.


#1  0x8039d954 in kern_reboot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:449
#2  0x8039ddce in panic (fmt=value optimized out)
at /usr/src/sys/kern/kern_shutdown.c:637
#3  0x80620a6a in trap_fatal (frame=value optimized out,
eva=value optimized out) at /usr/src/sys/amd64/amd64/trap.c:879
#4  0x80620d25 in trap_pfault (frame=0x0, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:700
#5  0x806204f6 in trap (frame=0xff821ca43600)
at /usr/src/sys/amd64/amd64/trap.c:463
#6  0x8060a032 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:232
#7  0x805a9367 in vm_page_alloc (object=0x80a34030,
pindex=16633, req=97) at /usr/src/sys/vm/vm_page.c:1445
#8  0x8059c42e in kmem_back (map=0xfe0001e8,
addr=18446743524021862400, size=16384, flags=value optimized out)
at /usr/src/sys/vm/vm_kern.c:362
#9  0x8059c2ac in kmem_malloc (map=0xfe0001e8, size=16384,
flags=257) at /usr/src/sys/vm/vm_kern.c:313
#10 0x80595104 in uma_large_malloc (size=value optimized out,
wait=257) at /usr/src/sys/vm/uma_core.c:994
#11 0x80386b80 in malloc (size=16384, mtp=0x80ea7c40, 
flags=0)

at /usr/src/sys/kern/kern_malloc.c:492
#12 0x80c9e13c in lz4_compress (s_start=0xff80d0b19000,
d_start=0xff8159445000, s_len=131072, d_len=114688, n=-2)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/lz4.c:843

#13 0x80cdde25 in zio_compress_data (c=value optimized out,
src=value optimized out, dst=0xff8159445000, s_len=131072)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio_compress.c:109

#14 0x80cda012 in zio_write_bp_init (zio=0xfe0143a12000)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1107

#15 0x80cd8ec6 in zio_execute (zio=0xfe0143a12000)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1305

#16 0x803e25e6 in taskqueue_run_locked (queue=0xfe00060ca300)
at /usr/src/sys/kern/subr_taskqueue.c:312

9.2PRERELEASE ZFS panic in lzjb_compress

2013-07-18 Thread olivier
Hi,
Running 9.2-PRERELEASE #19 r253313 I got the following panic

Fatal trap 12: page fault while in kernel mode
cpuid = 22; apic id = 46
fault virtual address   = 0xff827ebca30c
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x81983055
stack pointer   = 0x28:0xffcf75bd60a0
frame pointer   = 0x28:0xffcf75bd68f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_hig)
trap number = 12
panic: page fault
cpuid = 22
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
0xffcf75bd5b30
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffcf75bd5bf0
panic() at panic+0x1ce/frame 0xffcf75bd5cf0
trap_fatal() at trap_fatal+0x290/frame 0xffcf75bd5d50
trap_pfault() at trap_pfault+0x211/frame 0xffcf75bd5de0
trap() at trap+0x344/frame 0xffcf75bd5fe0
calltrap() at calltrap+0x8/frame 0xffcf75bd5fe0
--- trap 0xc, rip = 0x81983055, rsp = 0xffcf75bd60a0, rbp =
0xffcf75bd68f0 ---
lzjb_compress() at lzjb_compress+0x185/frame 0xffcf75bd68f0
zio_compress_data() at zio_compress_data+0x92/frame 0xffcf75bd6920
zio_write_bp_init() at zio_write_bp_init+0x24b/frame 0xffcf75bd6970
zio_execute() at zio_execute+0xc3/frame 0xffcf75bd69b0
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xffcf75bd6a00
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame
0xffcf75bd6a20
fork_exit() at fork_exit+0x11f/frame 0xffcf75bd6a70
fork_trampoline() at fork_trampoline+0xe/frame 0xffcf75bd6a70
--- trap 0, rip = 0, rsp = 0xffcf75bd6b30, rbp = 0 ---

lzjb_compress+0x185 corresponds to line 85 in
80 cpy = src - offset;
81 if (cpy = (uchar_t *)s_start  cpy != src 
82src[0] == cpy[0]  src[1] == cpy[1]  src[2] == cpy[2]) {
83 *copymap |= copymask;
84 for (mlen = MATCH_MIN; mlen  MATCH_MAX; mlen++)
85 if (src[mlen] != cpy[mlen])
86 break;
87 *dst++ = ((mlen - MATCH_MIN)  (NBBY - MATCH_BITS)) |
88(offset  NBBY);
89 *dst++ = (uchar_t)offset;

I think it's the first time I've seen this panic. It happened while doing a
send/receive. I have two pools with lzjb compression; I don't know which of
these pools caused the problem, but one of them was the source of the
send/receive.

I only have a textdump but I'm happy to try to provide more information
that could help anyone look into this.
Thanks
Olivier
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-02 Thread Andriy Gapon
on 01/07/2013 21:50 Jeremy Chadwick said the following:
 The issue is that ZFS on FreeBSD is still young compared to other
 filesystems (specifically UFS).

That's a fact.

 Nothing is perfect, but FFS/UFS tends
 to have a significantly larger number of bugs worked out of it to the
 point where people can use it without losing sleep (barring the SUJ
 stuff, don't get me started).

That's subjective.

 I have the same concerns over other
 things, like ext2fs and fusefs for that matter -- but this thread is
 about a ZFS-related crash, and that's why I'm over-focused on it.

I have an impression that you seem to state your (negative) opinion of ZFS in
every other thread about ZFS problems.

 A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
 results in a system where an admin can upgrade + boot into single-user
 and perform some tasks to test/troubleshoot; if the ZFS layer is
 broken, it doesn't mean an essentially useless box.  That isn't FUD,
 that's just the stage we're at right now.  I'm aware lots of people have
 working ZFS-exclusive setups; like I said, works great until it
 doesn't.

Yeah, a heterogeneous setup can have its benefits, but it can have its drawbacks
too.  This is true for heterogeneous vs monoculture in general.
But the sword cuts both ways: what if something is broken in UFS layer or god
forbid in VFS layer and you have only UFS?
Besides, without mentioning specific classes of problems ZFS layer is broken
is too vague.

 So, how do you kernel guys debug a problem in this environment:
 
 - ZFS-only
 - Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt
   with added debugging features, etc.)
 - No swap configured
 - No serial console

I use boot environments and boot to a previous / known-good environment if I hit
a loader bug, a kernel bug or a major userland problem in a new environment.
I also use a mirrored setup and keep two copies of earlier boot chains.
I am also not shy of live media in the case everything else fails.

Now I wonder how you deal with the same kind of UFS-only environment.
-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-02 Thread Jeremy Chadwick
On Tue, Jul 02, 2013 at 08:59:56AM +0300, Andriy Gapon wrote:
 on 01/07/2013 21:50 Jeremy Chadwick said the following:
  The issue is that ZFS on FreeBSD is still young compared to other
  filesystems (specifically UFS).
 
 That's a fact.
 
  Nothing is perfect, but FFS/UFS tends
  to have a significantly larger number of bugs worked out of it to the
  point where people can use it without losing sleep (barring the SUJ
  stuff, don't get me started).
 
 That's subjective.
 
  I have the same concerns over other
  things, like ext2fs and fusefs for that matter -- but this thread is
  about a ZFS-related crash, and that's why I'm over-focused on it.
 
 I have an impression that you seem to state your (negative) opinion of ZFS in
 every other thread about ZFS problems.

The OP in question ended his post with the line Thoughts?, and I have
given those thoughts.  My thoughts/opinions/experience may differ from
that of others.  Diversity of thoughts/opinions/experiences is good.
I'm not some kind of authoritative ZFS guru -- far from it.  If I
misunderstood what Thoughts? meant/implied, then draw and quarter me
for it; my actions/words = my responsibility.

I do not feel I have a negative opinion of ZFS.  I still use it today
on FreeBSD, donated money to Pawel when the project was originally
announced (because I wanted to see something new and useful thrive on
FreeBSD), and try my best to assist with issues pertaining to it where
applicable.  These are not the actions of someone with a negative
opinion, these are the actions of someone who is supportive while
simultaneously very cautious.

Is ZFS better today than it was when it was introduced?  By a long shot.
For example, on my stable/9 system here I don't tune /boot/loader.conf
any longer.  But that doesn't change my viewpoint when it comes to using
ZFS exclusively on a FreeBSD box.

  A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
  results in a system where an admin can upgrade + boot into single-user
  and perform some tasks to test/troubleshoot; if the ZFS layer is
  broken, it doesn't mean an essentially useless box.  That isn't FUD,
  that's just the stage we're at right now.  I'm aware lots of people have
  working ZFS-exclusive setups; like I said, works great until it
  doesn't.
 
 Yeah, a heterogeneous setup can have its benefits, but it can have its 
 drawbacks
 too.  This is true for heterogeneous vs monoculture in general.
 But the sword cuts both ways: what if something is broken in UFS layer or 
 god
 forbid in VFS layer and you have only UFS?
 Besides, without mentioning specific classes of problems ZFS layer is broken
 is too vague.

The likelihood of something being broken in UFS is significantly lower
given its established history.  I have to go off of experience, both
personal and professional -- in my years of dealing with FreeBSD
(1997-present), I have only encountered issues with UFS a few times (I
can count them on one, maybe two hands), and I'm choosing to exclude
SU+J from the picture for what should be obvious reasons.  With ZFS,
well... just look at the mailing lists and PR count.  I don't want to be
a jerk about it, but you really have to look at the quantity.  It
doesn't mean ZFS is crap, it just means that for me, I don't think
we're quite there yet.

And I will gladly admit -- because you are the one who taught me this --
that every incident need be treated unique.  But one can't deny that a
substantial percentage (I would say majority) of -fs and -stable posts
relate somehow to ZFS; I'm often thrilled when it turns out to be
something else.

Playing a strange devil's advocate, let me give you an interesting
example: softupdates.  When SU was introduced to FreeBSD back in the
late 90s, there were issues and concerns -- lots.  As such, SU was
chosen to be disabled by default on root filesystems given the
importance of that filesystem (re: we do not want to risk losing as
much data in the case of a crash -- see the official FAQ, section 8.3).
All other filesystems defaulted to SU enabled.  It's been like that up
until 9.x where it now defaults to enabled.  So that's what, 15 years?

You could say that my example could also apply to ZFS, i.e. the reports
are a part of its growth and maturity, and I'd agree.  But I don't feel
it's reached the point where I'm willing to risk going ZFS-only.  Down
the road, sure, but not now.  That's just my take on it.

Please make sure to also consider, politely, that a lot of people who
have issues with ZFS have not been subscribed to the lists for long
periods of time.  They sign up/post when they have a problem.  Meaning:
they do not necessarily know of the history.  If they did, I (again
politely) believe they're likely to use a UFS+ZFS mix, or maybe a
gmirror+UFS+ZFS mix (though the GPT/gmirror thing is... never mind...).

  So, how do you kernel guys debug a problem in this environment:
  
  - ZFS-only
  - Running -RELEASE (i.e. no source, thus a kernel cannot be 

Re: ZFS Panic after freebsd-update

2013-07-02 Thread Greg Byshenk
On Tue, Jul 02, 2013 at 12:57:16AM -0700, Jeremy Chadwick wrote:
 
 But in the OP's case, the situation sounds dire given the limitations --
 limitations that someone (apparently not him) chose, which greatly
 hinder debugging/troubleshooting.  Had a heterogeneous setup been
 chosen, the debugging/troubleshooting pains are less (IMO).  When I see
 this, it makes me step back and ponder the decisions that lead to the
 ZFS-only setup.

As an observer (though one who has used ZFS for some time, now),
I might suggest that this can at least -seem- like FUD about ZFS
because the limitations don't necessarily have anything to do
with ZFS. That is, a situation in which one cannot recover, nor
even effectively troubleshoot, if there is a problem, will be a
dire one, regardless of what the problem might be or where its
source might lie.

-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL - Portland, OR USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
Hello,

I have not had much time to research this problem yet, so please let me
know what further information I might be able to provide.

This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
using freebsd-update. After I rebooted to test the new kernel, I got a
panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0
#1
#2
#3
#4
#5
#6
#6
#6
#6
#6
#6
FreeBSD xeon.cap-press.com 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue
Sep 27 18:45:57 UTC 2011
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
 amd64
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
*** Sorry for partial first message! (gmail sent after multiple returns
apparently?) ***

Hello,

I have not had much time to research this problem yet, so please let me
know what further information I might be able to provide.

This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
using freebsd-update. After I rebooted to test the new kernel, I got a
panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 kdb_backtrace
#1 panic
#2 trap_fatal
#3 trap_pfault
#4 trap
#5 calltrap
#6 vdev_mirror_child_select
#7 ved_mirror_io_start
#8 zio_vdev_io_start
#9 zio_execute
#10 arc_read
#11 dbuf_read
#12 dbuf_findbp
#13 dbuf_hold_impl
#14 dbuf_hold
#15 dnode_hold_impl
#16 dnu_buf_hold
#17 zap_lockdir
Uptime: 5s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

uname -a from before (and after) the reboot:

FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57
UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
 amd64

dmesg is attached.

I was able to reboot to the old kernel and am up and running back on 8.2
right now.

Any thoughts?

Thanks,
Scott
Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57 UTC 2011
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU   E5520  @ 2.27GHz (2266.76-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x106a5  Family = 6  Model = 1a  Stepping = 5
  
Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE
  
Features2=0x9ce3bdSSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT
  AMD Features=0x28100800SYSCALL,NX,RDTSCP,LM
  AMD Features2=0x1LAHF
  TSC: P-state invariant
real memory  = 18253611008 (17408 MB)
avail memory = 16513347584 (15748 MB)
ACPI APIC Table: 031710 APIC1617
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
 cpu8 (AP): APIC ID: 16
 cpu9 (AP): APIC ID: 17
 cpu10 (AP): APIC ID: 18
 cpu11 (AP): APIC ID: 19
 cpu12 (AP): APIC ID: 20
 cpu13 (AP): APIC ID: 21
 cpu14 (AP): APIC ID: 22
 cpu15 (AP): APIC ID: 23
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
kbd1 at kbdmux0
acpi0: 031710 XSDT1617 on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, bff0 (3) failed
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
cpu0: ACPI CPU on acpi0
ACPI Warning: Incorrect checksum in table [OEMB] - 0xAD, should be 0xAA 
(20101013/tbutils-354)
cpu1: ACPI CPU on acpi0
cpu2: ACPI CPU on acpi0
cpu3: ACPI CPU on acpi0
cpu4: ACPI CPU on acpi0
cpu5: ACPI CPU on acpi0
cpu6: ACPI CPU on acpi0
cpu7: ACPI CPU on acpi0
cpu8: ACPI CPU on acpi0
cpu9: ACPI CPU on acpi0
cpu10: ACPI CPU on acpi0
cpu11: ACPI CPU on acpi0
cpu12: ACPI CPU on acpi0
cpu13: ACPI CPU on acpi0
cpu14: ACPI CPU on acpi0
cpu15: ACPI CPU on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge at device 1.0 on pci0
pci10: ACPI PCI bus on pcib1
pcib2: ACPI PCI-PCI bridge at device 3.0 on pci0
pci9: ACPI PCI bus on pcib2
pcib3: ACPI PCI-PCI bridge at device 7.0 on pci0
pci8: ACPI PCI bus on pcib3
pcib4: PCI-PCI bridge at device 8.0 on pci0
pci7: PCI bus on pcib4
pcib5: PCI-PCI bridge at device 9.0 on pci0
pci6: PCI bus on pcib5
pcib6: PCI-PCI bridge at device 10.0 on pci0
pci5: PCI bus on pcib6
pci0: base peripheral, interrupt controller at device 20.0 (no driver 
attached)
pci0: base peripheral, interrupt controller at device 20.1 (no driver 
attached)
pci0: base peripheral, interrupt controller at device 20.2 (no driver 
attached)
pci0: base peripheral, interrupt controller at device 20.3 (no driver 
attached)
pci0: base peripheral at device 22.0 (no driver attached)
pci0: base peripheral at device 22.1 (no driver attached)
pci0: base peripheral at device 22.2 (no driver attached)
pci0: base peripheral at device 22.3 (no driver attached)
pci0: base peripheral at device 22.4 (no driver attached)
pci0: base peripheral at device 22.5 (no driver attached)
pci0: base peripheral at device 22.6 (no driver attached)
pci0: base peripheral at device 22.7 (no driver attached)
uhci0: 

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick
On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:
 *** Sorry for partial first message! (gmail sent after multiple returns
 apparently?) ***
 
 Hello,
 
 I have not had much time to research this problem yet, so please let me
 know what further information I might be able to provide.
 
 This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
 using freebsd-update. After I rebooted to test the new kernel, I got a
 panic. I had to take a picture of the screen. Here's a condensed version:
 
 panic: page fault
 cpuid = 1
 KDB: stack backtrace:
 #0 kdb_backtrace
 #1 panic
 #2 trap_fatal
 #3 trap_pfault
 #4 trap
 #5 calltrap
 #6 vdev_mirror_child_select
 #7 ved_mirror_io_start
 #8 zio_vdev_io_start
 #9 zio_execute
 #10 arc_read
 #11 dbuf_read
 #12 dbuf_findbp
 #13 dbuf_hold_impl
 #14 dbuf_hold
 #15 dnode_hold_impl
 #16 dnu_buf_hold
 #17 zap_lockdir
 Uptime: 5s
 Cannot dump. Device not defined or unavailable.
 Automatic reboot in 15 seconds - press a key on the console to abort
 
 uname -a from before (and after) the reboot:
 
 FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57
 UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
  amd64
 
 dmesg is attached.
 
 I was able to reboot to the old kernel and am up and running back on 8.2
 right now.
 
 Any thoughts?

Thoughts:

- All I see is an amd64 system with 16GB RAM and 4 disks driven by an ICH10
  in AHCI mode.

- Output from: zpool status

- Output from: zpool get all

- Output from: zfs get all

- Output from: gpart show -p for every disk on the system

- Output from: cat /etc/sysctl.conf

- Output from: cat /boot/loader.conf

- Is there a reason you do not have dumpdev defined in /etc/rc.conf (or
  alternately, no swap device defined in /etc/fstab (which will get
  used/honoured by the dumpdev=auto (the default)) ?  Taking photos of
  the console and manually typing backtraces in is borderline worthless.
  Of course when I see lines like this:

  Trying to mount root from zfs:zroot

  ...this greatly diminishes any chances of live debugging on the
  system.  It amazes me how often I see this come up on the lists -- people
  who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
  that behaviour would stop, as it makes debugging ZFS a serious PITA.
  This comes up on the list almost constantly, sad panda.

- Get yourself stable/9 and try that:
  https://pub.allbsd.org/FreeBSD-snapshots/

- freebsd-fs is a better place for this discussion, especially since
  you're running a -RELEASE build, not a -STABLE build.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick
On Mon, Jul 01, 2013 at 08:49:25AM -0700, Jeremy Chadwick wrote:
 - Is there a reason you do not have dumpdev defined in /etc/rc.conf (or
   alternately, no swap device defined in /etc/fstab (which will get
   used/honoured by the dumpdev=auto (the default)) ?

This should have read or alternately, ***A*** swap device defined in
/etc/fstab ...

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Steven Hartland


- Original Message - 
From: Jeremy Chadwick j...@koitsu.org

To: Scott Sipe csco...@gmail.com
Cc: freebsd-stable List freebsd-stable@freebsd.org
Sent: Monday, July 01, 2013 4:49 PM
Subject: Re: ZFS Panic after freebsd-update



On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:

*** Sorry for partial first message! (gmail sent after multiple returns
apparently?) ***

Hello,

I have not had much time to research this problem yet, so please let me
know what further information I might be able to provide.

This weekend I attempted to upgrade a computer from 8.2-RELEASE-p3 to 8.4
using freebsd-update. After I rebooted to test the new kernel, I got a
panic. I had to take a picture of the screen. Here's a condensed version:

panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 kdb_backtrace
#1 panic
#2 trap_fatal
#3 trap_pfault
#4 trap
#5 calltrap
#6 vdev_mirror_child_select
#7 ved_mirror_io_start
#8 zio_vdev_io_start
#9 zio_execute
#10 arc_read
#11 dbuf_read
#12 dbuf_findbp
#13 dbuf_hold_impl
#14 dbuf_hold
#15 dnode_hold_impl
#16 dnu_buf_hold
#17 zap_lockdir
Uptime: 5s
Cannot dump. Device not defined or unavailable.
Automatic reboot in 15 seconds - press a key on the console to abort

uname -a from before (and after) the reboot:

FreeBSD xeon 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:45:57
UTC 2011 r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
 amd64

dmesg is attached.

I was able to reboot to the old kernel and am up and running back on 8.2
right now.

Any thoughts?


This says your running a 8.2-RELEASE-p3 kernel not an 8.4-RELEASE kernel.

Did the upgrade fail or is that dmesg / uname from your old kernel?

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Paul Mather
On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote:

 On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:
 *** Sorry for partial first message! (gmail sent after multiple returns
 apparently?) ***
 
 Hello,
 
 I have not had much time to research this problem yet, so please let me
 know what further information I might be able to provide.
 [[...]]
 Any thoughts?
 
 Thoughts:
 
 [[..]]
 Of course when I see lines like this:
 
  Trying to mount root from zfs:zroot
 
  ...this greatly diminishes any chances of live debugging on the
  system.  It amazes me how often I see this come up on the lists -- people
  who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
  that behaviour would stop, as it makes debugging ZFS a serious PITA.
  This comes up on the list almost constantly, sad panda.


I'm not sure why it amazes you that people are making widespread use of ZFS.  
You could make the same argument that people shouldn't use UFS2 journaling on 
their file systems because bugs in the implementation might make debugging 
journaled UFS2 file systems a serious PITA.  The point is that there are VERY 
compelling reasons why people might want to use ZFS for root/var/tmp/usr/etc. 
(pooled storage; easy snapshots; etc.) and there should come a time when a 
given file system is generally regarded as safe.  I'd say the time for ZFS 
came when they removed the big disclaimer from the boot messages.  If ZFS is 
dangerous, they should reinstate the not ready for production warning.  Until 
they do, I think it's unfair to castigate people for using ZFS universally.

Isn't it a recurring theme on freebsd-current and freebsd-stable that more 
people need to use features so they can be debugged in realistic environments?  
If you're telling them, don't use that because it makes debugging harder, how 
are they supposed to get debugged and hence improved? :-)

Cheers,

Paul.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick
On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
 On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote:
 
  On Mon, Jul 01, 2013 at 11:35:30AM -0400, Scott Sipe wrote:
  *** Sorry for partial first message! (gmail sent after multiple returns
  apparently?) ***
  
  Hello,
  
  I have not had much time to research this problem yet, so please let me
  know what further information I might be able to provide.
  [[...]]
  Any thoughts?
  
  Thoughts:
  
  [[..]]
  Of course when I see lines like this:
  
   Trying to mount root from zfs:zroot
  
   ...this greatly diminishes any chances of live debugging on the
   system.  It amazes me how often I see this come up on the lists -- people
   who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
   that behaviour would stop, as it makes debugging ZFS a serious PITA.
   This comes up on the list almost constantly, sad panda.
 
 
 I'm not sure why it amazes you that people are making widespread use of ZFS.

It's not widespread use of ZFS.  It's widespread use of ZFS as their
sole filesystem (specifically root/var/tmp/usr, or more specifically
just root/usr).  People are operating with the belief that ZFS just
works, when reality shows it works until it doesn't.  The mentality
seems to be it's so rock solid it'll never break along with it can't
happen to me.  I tend to err on the side of caution, hence avoidance of
ZFS for critical things like the aforementioned.

It's different if you have a UFS root/var/tmp/usr and ZFS for everything
else.  You then have a system you can boot/use without issue even if ZFS
is crapping the bed.

 You could make the same argument that people shouldn't use UFS2
 journaling on their file systems because bugs in the implementation
 might make debugging journaled UFS2 file systems a serious PITA.

Yup, and I do make that argument, quite regularly at that.  There is
even some evidence at this point in time that softupdates are broken:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-June/017424.html

 The point is that there are VERY compelling reasons why people might
 want to use ZFS for root/var/tmp/usr/etc. (pooled storage; easy
 snapshots; etc.) and there should come a time when a given file system
 is generally regarded as safe.

While there may be compelling reasons, those reasons quickly get shot
down when they realise they have a system they can't easily do
troubleshooting with when the issue is with ZFS.

 I'd say the time for ZFS came when they removed the big disclaimer
 from the boot messages.  If ZFS is dangerous, they should reinstate
 the not ready for production warning.  Until they do, I think it's
 unfair to castigate people for using ZFS universally.

The warning meant absolutely nothing at the time (it did not keep people
away from it), and would mean nothing now if brought back.  A single
kernel printf() is not the right choice of action.

Are we better off today than we were when ZFS was originally ported
over?  Yes, by far.  Lots of improvements, in many great/good ways.  No
argument there.  But there is no way I'd risk putting my root filesystem
(or other key filesystems) on it -- still too new, still too many bugs,
and users don't know about those problems until it's too late.

 Isn't it a recurring theme on freebsd-current and freebsd-stable that
 more people need to use features so they can be debugged in realistic
 environments?  If you're telling them, don't use that because it
 makes debugging harder, how are they supposed to get debugged and
 hence improved? :-)

95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
problem, you need: a crash dump, a usable system with the exact
kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
boot into 8.2 and reliably debug it using that), and (most important of
all) a developer who is familiar with kernel debugging *and* familiar
with the bits which are crashing.  Those who say what you're quoting are
often the latter.

Part of the need people to try this process you refer to is what
stable/X is about, *without* the extra chaos of head.  I'm one of those
who for the past 15 years has advocated stable/X usage for a lot of
reasons; I'll save the diatribe for some other time.

But the OP is running -RELEASE, and chooses to run that, along with use
of freebsd-update for binary updates.  Their choices are limited: stick
with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely.

But even stable/X doesn't provide enough coverage at times (the recent
fxp(4)/dhclient issue is proof of that).  It's just too bad so many
people have this broken mindset of what stability means on FreeBSD.

** = This number is probably more like 99%, especially when you consider
what FreeNAS is catering to/trying to accomplish.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others 

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Scott Sipe
On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick j...@koitsu.org wrote:

 On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
  On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote:
 
   Of course when I see lines like this:
  
Trying to mount root from zfs:zroot
  
...this greatly diminishes any chances of live debugging on the
system.  It amazes me how often I see this come up on the lists --
 people
who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
that behaviour would stop, as it makes debugging ZFS a serious PITA.
This comes up on the list almost constantly, sad panda.
 
 
  I'm not sure why it amazes you that people are making widespread use of
 ZFS.

 It's not widespread use of ZFS.  It's widespread use of ZFS as their
 sole filesystem (specifically root/var/tmp/usr, or more specifically
 just root/usr).  People are operating with the belief that ZFS just
 works, when reality shows it works until it doesn't.  The mentality
 seems to be it's so rock solid it'll never break along with it can't
 happen to me.  I tend to err on the side of caution, hence avoidance of
 ZFS for critical things like the aforementioned.

 It's different if you have a UFS root/var/tmp/usr and ZFS for everything
 else.  You then have a system you can boot/use without issue even if ZFS
 is crapping the bed.



 ...



 95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
 problem, you need: a crash dump, a usable system with the exact
 kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
 boot into 8.2 and reliably debug it using that), and (most important of
 all) a developer who is familiar with kernel debugging *and* familiar
 with the bits which are crashing.  Those who say what you're quoting are
 often the latter.



 ...



 But the OP is running -RELEASE, and chooses to run that, along with use
 of freebsd-update for binary updates.  Their choices are limited: stick
 with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely.


So I realize that neither 8.2-RELEASE or 8.4-RELEASE are stable, but I
ultimately wasn't sure where the right place to go for discuss 8.4 is?
Beyond the FS mailing list, was there a better place for my question? I'll
provide the other requested information (zfs outputs, etc) to wherever
would be best.

This is a production machine (has been since late 2010) and after tweaking
some ZFS settings initially has been totally stable. I wasn't incredibly
closely involved in the initial configuration, but I've done at least one
binary freebsd-update previously.

Before this computer I had always done source upgrades. ZFS (and the
thought of a panic like the one I saw this weekend!) made me leery of doing
that. We're a small business--we have this server, an offsite backup
server, and a firewall box. I understand that issues like this are are
going to happen when I don't have a dedicated testing box, I just like to
try to minimize them and keep them to weekends!

It sounds like my best bet might be to add a new UFS disk, do a clean
install of 9.1 onto that disk, and then import my existing ZFS pool?

Thanks,
Scott
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Andriy Gapon
on 01/07/2013 20:04 Jeremy Chadwick said the following:
 People are operating with the belief that ZFS just
 works, when reality shows it works until it doesn't

That reality applies to everything that a man creates with a purpose to work.
I am not sure why you are so over-focused on ZFS.
Please stop spreading FUD.  Thank you.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick
On Mon, Jul 01, 2013 at 02:04:24PM -0400, Scott Sipe wrote:
 On Mon, Jul 1, 2013 at 1:04 PM, Jeremy Chadwick j...@koitsu.org wrote:
 
  On Mon, Jul 01, 2013 at 12:23:45PM -0400, Paul Mather wrote:
   On Jul 1, 2013, at 11:49 AM, Jeremy Chadwick j...@koitsu.org wrote:
  
Of course when I see lines like this:
   
 Trying to mount root from zfs:zroot
   
 ...this greatly diminishes any chances of live debugging on the
 system.  It amazes me how often I see this come up on the lists --
  people
 who have ZFS problems but use ZFS for their root/var/tmp/usr.  I wish
 that behaviour would stop, as it makes debugging ZFS a serious PITA.
 This comes up on the list almost constantly, sad panda.
  
  
   I'm not sure why it amazes you that people are making widespread use of
  ZFS.
 
  It's not widespread use of ZFS.  It's widespread use of ZFS as their
  sole filesystem (specifically root/var/tmp/usr, or more specifically
  just root/usr).  People are operating with the belief that ZFS just
  works, when reality shows it works until it doesn't.  The mentality
  seems to be it's so rock solid it'll never break along with it can't
  happen to me.  I tend to err on the side of caution, hence avoidance of
  ZFS for critical things like the aforementioned.
 
  It's different if you have a UFS root/var/tmp/usr and ZFS for everything
  else.  You then have a system you can boot/use without issue even if ZFS
  is crapping the bed.
 
 
 
  ...
 
 
 
  95% of FreeBSD users cannot debug kernel problems**.  To debug a kernel
  problem, you need: a crash dump, a usable system with the exact
  kernel/world where the crash happened (i.e. you cannot crash 8.4 ZFS and
  boot into 8.2 and reliably debug it using that), and (most important of
  all) a developer who is familiar with kernel debugging *and* familiar
  with the bits which are crashing.  Those who say what you're quoting are
  often the latter.
 
 
 
  ...
 
 
 
  But the OP is running -RELEASE, and chooses to run that, along with use
  of freebsd-update for binary updates.  Their choices are limited: stick
  with 8.2, switch to stable/X, cease use of ZFS, or change OSes entirely.
 
 
 So I realize that neither 8.2-RELEASE or 8.4-RELEASE are stable, but I
 ultimately wasn't sure where the right place to go for discuss 8.4 is?

For filesystem issues, freebsd-fs@ is usually the best choice, because
it discusses filesystem-related thing (regardless of stable vs. release,
but knowing what version you have of course is mandatory).

freebsd-stable@ is mainly for stable/X related discussions.

Sorry to add pedanticism to an already difficult situation for you (and
I sympathise, particularly since the purpose of the lists is often
difficult to discern, even with their terse descriptions in mailman).

 Beyond the FS mailing list, was there a better place for my question? I'll
 provide the other requested information (zfs outputs, etc) to wherever
 would be best.

Nope, not as far as I know.  The only other place is send-pr(1), once
you have an issue that can be reproduced.

Keep in mind, however, that none of these options (mailing lists,
send-pr, etc.) mandate a response from anyone.  You/your business (see
below) should be aware that there is always the possibility no one can
help solve the actual problem; as such it's important that companies
have proper upgrade/migration paths, rollback plans, and so on.

 This is a production machine (has been since late 2010) and after tweaking
 some ZFS settings initially has been totally stable. I wasn't incredibly
 closely involved in the initial configuration, but I've done at least one
 binary freebsd-update previously.

Well regardless it sounds like moving from 8.2-RELEASE to 8.4-RELEASE
causes ZFS to break for you, so that would classify as a regression.
What the root cause is, however, is still unknown.

Point: 8.2-RELEASE came out in February 2011, and 8.4-RELEASE came out
in June 2013 -- that's almost 2.5 years of changes between versions.
The number of changes between these two is major -- hundreds, maybe
thousands.  ZFS got worked on heavily during this time as well.

I tend to tell anyone using ZFS that they should be running a stable/X
(particularly stable/9) branch.  I can expand on that justification if
needed, as it's well-founded for a lot of reasons.

 Before this computer I had always done source upgrades. ZFS (and the
 thought of a panic like the one I saw this weekend!) made me leery of doing
 that. We're a small business--we have this server, an offsite backup
 server, and a firewall box. I understand that issues like this are are
 going to happen when I don't have a dedicated testing box, I just like to
 try to minimize them and keep them to weekends!

Understood.

 It sounds like my best bet might be to add a new UFS disk, do a clean
 install of 9.1 onto that disk, and then import my existing ZFS pool?

I would suggest starting with this:

Get stable/9 from the place I mentioned, burn an 

Re: ZFS Panic after freebsd-update

2013-07-01 Thread Jeremy Chadwick
On Mon, Jul 01, 2013 at 09:10:45PM +0300, Andriy Gapon wrote:
 on 01/07/2013 20:04 Jeremy Chadwick said the following:
  People are operating with the belief that ZFS just
  works, when reality shows it works until it doesn't
 
 That reality applies to everything that a man creates with a purpose to work.
 I am not sure why you are so over-focused on ZFS.
 Please stop spreading FUD.  Thank you.

The issue is that ZFS on FreeBSD is still young compared to other
filesystems (specifically UFS).  Nothing is perfect, but FFS/UFS tends
to have a significantly larger number of bugs worked out of it to the
point where people can use it without losing sleep (barring the SUJ
stuff, don't get me started).  I have the same concerns over other
things, like ext2fs and fusefs for that matter -- but this thread is
about a ZFS-related crash, and that's why I'm over-focused on it.

A heterogeneous (UFS+ZFS) setup, rather than homogeneous (ZFS-only),
results in a system where an admin can upgrade + boot into single-user
and perform some tasks to test/troubleshoot; if the ZFS layer is
broken, it doesn't mean an essentially useless box.  That isn't FUD,
that's just the stage we're at right now.  I'm aware lots of people have
working ZFS-exclusive setups; like I said, works great until it
doesn't.

So, how do you kernel guys debug a problem in this environment:

- ZFS-only
- Running -RELEASE (i.e. no source, thus a kernel cannot be rebuilt
  with added debugging features, etc.)
- No swap configured
- No serial console

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Steven Hartland
- Original Message - 
From: Scott Sipe csco...@gmail.com

So I realize that neither 8.2-RELEASE or 8.4-RELEASE are stable, but I
ultimately wasn't sure where the right place to go for discuss 8.4 is?
Beyond the FS mailing list, was there a better place for my question? I'll
provide the other requested information (zfs outputs, etc) to wherever
would be best.

This is a production machine (has been since late 2010) and after tweaking
some ZFS settings initially has been totally stable. I wasn't incredibly
closely involved in the initial configuration, but I've done at least one
binary freebsd-update previously.

Before this computer I had always done source upgrades. ZFS (and the
thought of a panic like the one I saw this weekend!) made me leery of doing
that. We're a small business--we have this server, an offsite backup
server, and a firewall box. I understand that issues like this are are
going to happen when I don't have a dedicated testing box, I just like to
try to minimize them and keep them to weekends!

It sounds like my best bet might be to add a new UFS disk, do a clean
install of 9.1 onto that disk, and then import my existing ZFS pool?


There should be no reason why 8.4-RELEASE shouldn't work fine.

Yes ZFS is continuously improving and these fixes / enhancements first hit
head / current and are then MFC'ed back to stable/9  stable/8, but that
doesn't mean the release branches should be avoided.

If you can I would try booting from a 8.4-RELEASE cdrom / iso to see
if it can successfully read the pool as this could eliminate out of sync
kernel / world issues.

   Regards
   Steve



This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Rainer Duffner

Am 01.07.2013 um 20:56 schrieb Steven Hartland kill...@multiplay.co.uk:

 - Original Message - From: Scott Sipe csco...@gmail.com
 So I realize that neither 8.2-RELEASE or 8.4-RELEASE are stable, but I
 ultimately wasn't sure where the right place to go for discuss 8.4 is?
 Beyond the FS mailing list, was there a better place for my question? I'll
 provide the other requested information (zfs outputs, etc) to wherever
 would be best.
 This is a production machine (has been since late 2010) and after tweaking
 some ZFS settings initially has been totally stable. I wasn't incredibly
 closely involved in the initial configuration, but I've done at least one
 binary freebsd-update previously.
 Before this computer I had always done source upgrades. ZFS (and the
 thought of a panic like the one I saw this weekend!) made me leery of doing
 that. We're a small business--we have this server, an offsite backup
 server, and a firewall box. I understand that issues like this are are
 going to happen when I don't have a dedicated testing box, I just like to
 try to minimize them and keep them to weekends!
 It sounds like my best bet might be to add a new UFS disk, do a clean
 install of 9.1 onto that disk, and then import my existing ZFS pool?
 
 There should be no reason why 8.4-RELEASE shouldn't work fine.
 
 Yes ZFS is continuously improving and these fixes / enhancements first hit
 head / current and are then MFC'ed back to stable/9  stable/8, but that
 doesn't mean the release branches should be avoided.
 
 If you can I would try booting from a 8.4-RELEASE cdrom / iso to see
 if it can successfully read the pool as this could eliminate out of sync
 kernel / world issues.



Personally, I find mfsbsd much more practical for booting up a 
rescue-environment.
Also, if 8.4 does not work for some reason - maybe try 8.3?

I have quite a lot of systems running 8.3 (and even more with 9.1) but none of 
them do zfsroot and none of them stresses ZFS very much.
I've so far resisted the urge to update to 8.4.

The reason why I would be interested to run zfs-root is that sometimes, you 
only have two hard drives and still want to do ZFS on it.

Ideally, though, FreeBSD would be able to do something like SmartOS (one of the 
few features I kind of like about it…), where you boot from an USB-image (or 
ideally, via (i)PXE) but use all the available space for data and (3rd-party) 
software. That way, you always have something to boot from, but can maximize 
the usage of spindles and space.
A basic FreeBSD install is, I think, less than 0.5G these days - I really hate 
wasting two 300 (or even 600) GB SAS hard disks just for that.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Alban Hertroys
On Jul 1, 2013, at 19:04, Jeremy Chadwick j...@koitsu.org wrote:

 But even stable/X doesn't provide enough coverage at times (the recent
 fxp(4)/dhclient issue is proof of that).  It's just too bad so many
 people have this broken mindset of what stability means on FreeBSD.


As one of the few persons who have run into that issue I feel like I should 
speak up here and add that this issue was fixed within a very reasonable time 
span after raising the matter here on freebsd-stable@. You've personally been a 
great help in getting that fixed, so thank you for that.

Apparently there was one earlier report of the issue very late in the 
pre-release process, which does imply that fxp hardware is fairly rarely in use 
among FreeBSD users these days (which was the excuse for how the issue passed 
testing for 8.4/9.1 RELEASE). I don't think the release engineering team can 
really be blamed for not catching bugs that go unreported that far into the 
release cycle; they have to make a decision when to release at some point and 
the later it gets into the cycle the harder it is to turn that decision around. 
I can completely understand that.

That this happened was inconvenient, but it happens in stable. ISTR that 
stable doesn't mean stable in the sense that it won't crash, but rather that 
the API's won't change until the next release. I wish other OS companies were 
as reliable; both MS and Apple let a lot more slip by and they take a lot 
longer to release fixes as well.

Of course nobody likes when their system behaves erratically due to some error 
outside their control, but until that point FreeBSD has been rock-solid for me 
for years. And even with this issue, the system was usable.


To get back to the ZFS issue...
ZFS has always seen a fairly large fraction of raised issues on this list. 
Often those were user mistakes, ranging from putting not enough memory into the 
system to not assigning enough to the ZIL (once that became usable). ZFS on 
FreeBSD has come a long way since then. I don't think it's in quite as usable a 
state on, for example, Linux.

Yes, people are taking a risk when using ZFS for everything. The same goes for 
any FS. No matter which file system you use, if it breaks you're between a rock 
and a hard place. Depending on how badly broken it is, you may end up not being 
able to access your data and with some data that's not an option. That's what 
we have backups and test environments for, don't we?

File system code can break. It shouldn't, and I think it's safe to say that in 
FreeBSD's history it has been very rare indeed, but it does happen. The problem 
is probably more that it's so rare that people don't take measures for the few 
times it does happen; like how many of us have an atomic shelter available to 
them? Or a rubber boat? How many nuclear incidents have there been versus how 
many serious file-system breakages in FreeBSD? How many of us first test an 
update to STABLE on an identical test system before upgrading our production 
servers?

Jeremy, I know for a fact that you're a lot more on this list than I am and 
probably longer than I have been (I'm pretty sure you were around already back 
in the days when I started using FreeBSD 2.2.8), but in this case, as much 
respect as I have for you, I think you're overreacting a bit.

And finally, we're having this whole discussion about how problematic FreeBSD's 
been (or not) recently WHILE THE OP HASNT EVEN GOTTEN BACK TO ANSWER DETAILS 
ABOUT HIS ISSUE YET. Perhaps it's a bit early for that? It's entirely possible 
that we're looking at some hardware issue here or a user error that triggered a 
corner case that wasn't handled or something like that.


P.S: Personally, I don't use ZFS because I'm a bit of a database nut and feel 
like log-based file-systems aren't a good match for database write loads, but 
that's mostly just me being pedantic.

Cheers,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic after freebsd-update

2013-07-01 Thread Xin Li
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 07/01/13 09:10, Steven Hartland wrote:
[...]
 This says your running a 8.2-RELEASE-p3 kernel not an 8.4-RELEASE
 kernel.
 
 Did the upgrade fail or is that dmesg / uname from your old
 kernel?

Looking at the context, he used freebsd-update to update 8.2-RELEASE
to 8.4-RELEASE (which, the first step would be updating the kernel)
and booted with that panic, and reverted to old kernel.

It would be helpful if we have address of stack frame #6 as well as
the tuning you he have done (in loader.conf), plus the actual panic
message (looks like a kernel trap 12, but a glance at the code I
didn't find a candidate line where this happens).

Cheers,
- -- 
Xin LI delp...@delphij.nethttps://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die
-BEGIN PGP SIGNATURE-

iQEcBAEBCgAGBQJR0e91AAoJEG80Jeu8UPuz05MIAK21VdKOkVNISzrd9ZDKTpml
EjKtrOUhXreI21XyuoVxGboIjNfBxbfPxu07Tj6ocY8LwwneMot9nW5d3xtsS71A
ap9Ho3KFUKGv5RTHWO7mhbKhSXnKBl/SmyIeLx//I7vCfxQb0MWUT7bdRF56Eojj
lUz6dnLDXt6q3p3TGC17mwETHbdvdrr4ptBANAXFaY763WFSW6pLWUr5KIxZ7f7i
DqNKpShTC4LsVr6OZjq70E+1XFCM7E//ZKVbJWBNrGJd7kmk7raq7ERx8tJqcWu6
sdxWcjbG6bOlCmONcozohNsqRvpTKu1VK6JsWVBUq9Et2nY/2rKvu5lKyIvxPBg=
=NmTM
-END PGP SIGNATURE-
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Dr Josef Karthauser
Hi, I'm a bit at the end of my tether.

We had a ZFS panic last night on a machine that hosts all my mail and web; it 
was rebooted and it now panics mounting the ZFS root filesystem.

The call stack info is:

solaris assert: ss == NULL, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensource/uts/common/fs/zfs/space_map.c,
 line: 109

kdb_backtrace
panic
space_map_add
space_map_load
metaslab_activate
metaslab_allocate
zio_dva_allocate
zio_execute
taskqueue_run_locked
taskqueue_thread_loop
fork_exit
fork_trampoline

I can boot from the live DVD filesystem, but I can only mount the pool 
read-only without getting the same kernel panic.  This is with FreeBSD 9.0.

The machine is remote, and I don't have access other than through a DRAC 
console port (so I can't cut and paste; sorry for the poor stack trace).

Is anyone here in the position to advice me how I might process to get this 
machine mounting and running again in multi-user mode?

Thanks so much.
Joe

p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
file system.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Volodymyr Kostyrko

14.06.2013 12:55, Dr Josef Karthauser:

Hi, I'm a bit at the end of my tether.

We had a ZFS panic last night on a machine that hosts all my mail and web; it 
was rebooted and it now panics mounting the ZFS root filesystem.

The call stack info is:

solaris assert: ss == NULL, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensource/uts/common/fs/zfs/space_map.c,
 line: 109

kdb_backtrace
panic
space_map_add
space_map_load
metaslab_activate
metaslab_allocate
zio_dva_allocate
zio_execute
taskqueue_run_locked
taskqueue_thread_loop
fork_exit
fork_trampoline

I can boot from the live DVD filesystem, but I can only mount the pool 
read-only without getting the same kernel panic.  This is with FreeBSD 9.0.

The machine is remote, and I don't have access other than through a DRAC 
console port (so I can't cut and paste; sorry for the poor stack trace).

Is anyone here in the position to advice me how I might process to get this 
machine mounting and running again in multi-user mode?


There's no official way.


p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
file system.


If you are fairly sure about your devices you can:

1. Remove second disk from pool or create another pool on top of it.

2. Recreate all FS structure on the second disk. You can dump al your FS 
with something like:


zfs list -Ho name | xargs -n1 zfs get -H all | awk 
'BEGIN{shard=;output=}{if(shard!=$1  shard!=){output=zfs 
create;for(param in params)output=output -o param=params[param];print
output shard;delete 
params;shard=}}$4~/local/{params[$2]=$3;shard=$1;next}$2~/type/{shard=$1}END{output=zfs 
create;for(param in params)output=output -o 
param=params[param];print output shard;}'


Be sure to rename the pool and change the first line.

3. Rsync all data to the second disk.

4. Try to boot from the second disk.

If everything worked you are free to attach first disk to second one to 
create a mirror again.


--
Sphinx of black quartz, judge my vow.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Dr Josef Karthauser
On 14 Jun 2013, at 12:00, Volodymyr Kostyrko c.kw...@gmail.com wrote:

 14.06.2013 12:55, Dr Josef Karthauser:
 Hi, I'm a bit at the end of my tether.

 p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
 file system.
 
 If you are fairly sure about your devices you can:
 
 1. Remove second disk from pool or create another pool on top of it.
 
 2. Recreate all FS structure on the second disk. You can dump al your FS with 
 something like:
 

Great. Thanks for that.

Have you got a hint as to how I can get access to the root file system? It's 
currently set to have a legacy mount point.  Which means that when I import the 
pool:

# zfs import -o readonly=on -o altroot=/tmp/zfs -f poolname

the root filesystem is missing.  Then if I try and set the mount point:

#zfs set mountpoint=/tmp/zfs2 poolname

it just sits there; probably because the command is blocking on the R/O pool, 
or something.

How do I temporarily remount the root filesystem so that I can get access to 
the files?

Thanks,
Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Help! :( ZFS panic on boot, importing pool after server crash.

2013-06-14 Thread Volodymyr Kostyrko

14.06.2013 15:51, Dr Josef Karthauser:

On 14 Jun 2013, at 12:00, Volodymyr Kostyrko c.kw...@gmail.com wrote:


14.06.2013 12:55, Dr Josef Karthauser:

Hi, I'm a bit at the end of my tether.



p.s. the config, btw, is a ZFS mirror on two ad devices. It's got a ZFS root 
file system.


If you are fairly sure about your devices you can:

1. Remove second disk from pool or create another pool on top of it.

2. Recreate all FS structure on the second disk. You can dump al your FS with 
something like:



Great. Thanks for that.

Have you got a hint as to how I can get access to the root file system? It's 
currently set to have a legacy mount point.  Which means that when I import the 
pool:

# zfs import -o readonly=on -o altroot=/tmp/zfs -f poolname

the root filesystem is missing.  Then if I try and set the mount point:

#zfs set mountpoint=/tmp/zfs2 poolname

it just sits there; probably because the command is blocking on the R/O pool, 
or something.

How do I temporarily remount the root filesystem so that I can get access to 
the files?


mount -t zfs pool-name mountpoint

Personally when I need to work with such pools I first import the pool 
with -N (nomount) option, then I mount root fs by hand and after that 
goes `zfs mount -a` which handles everything else.


--
Sphinx of black quartz, judge my vow.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on a RELENG_8 NFS server

2011-09-19 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110911.054601.1424617155148336027@allbsd.org:

hr Hiroki Sato h...@freebsd.org wrote
hr   in 20110910.044841.232160047547388224@allbsd.org:
hr
hr hr Hiroki Sato h...@freebsd.org wrote
hr hr   in 20110907.094717.2272609566853905102@allbsd.org:
hr hr
hr hr hr  During this investigation an disk has to be replaced and 
resilvering
hr hr hr  it is now in progress.  A deadlock and a forced reboot after that
hr hr hr  make recovering of the zfs datasets take a long time (for 
committing
hr hr hr  logs, I think), so I will try to reproduce the deadlock and get a
hr hr hr  core dump after it finished.
hr hr
hr hr  I think I could reproduce the symptoms.  I have no idea about if
hr hr  these are exactly the same as occurred on my box before because the
hr hr  kernel was replaced with one with some debugging options, but these
hr hr  are reproducible at least.
hr hr
hr hr  There are two symptoms.  One is a panic.  A DDB output when the panic
hr hr  occurred is the following:
hr
hr  I am trying vfs.lookup_shared=0 and seeing how it goes.  It seems the
hr  box can endure a high load which quickly caused these symptoms.

 There was no difference by the knob.  The same panic or
 unresponsiveness still occurs in about 24-32 hours or so.

-- Hiroki


pgpIwsQ57ZO6Q.pgp
Description: PGP signature


Re: ZFS panic on a RELENG_8 NFS server

2011-09-10 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110910.044841.232160047547388224@allbsd.org:

hr Hiroki Sato h...@freebsd.org wrote
hr   in 20110907.094717.2272609566853905102@allbsd.org:
hr
hr hr  During this investigation an disk has to be replaced and resilvering
hr hr  it is now in progress.  A deadlock and a forced reboot after that
hr hr  make recovering of the zfs datasets take a long time (for committing
hr hr  logs, I think), so I will try to reproduce the deadlock and get a
hr hr  core dump after it finished.
hr
hr  I think I could reproduce the symptoms.  I have no idea about if
hr  these are exactly the same as occurred on my box before because the
hr  kernel was replaced with one with some debugging options, but these
hr  are reproducible at least.
hr
hr  There are two symptoms.  One is a panic.  A DDB output when the panic
hr  occurred is the following:

 I am trying vfs.lookup_shared=0 and seeing how it goes.  It seems the
 box can endure a high load which quickly caused these symptoms.

-- Hiroki


pgpfb5zUJdfPH.pgp
Description: PGP signature


ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too long (RELENG_8 from today))

2011-09-09 Thread Hiroki Sato
Hiroki Sato h...@freebsd.org wrote
  in 20110907.094717.2272609566853905102@allbsd.org:

hr  During this investigation an disk has to be replaced and resilvering
hr  it is now in progress.  A deadlock and a forced reboot after that
hr  make recovering of the zfs datasets take a long time (for committing
hr  logs, I think), so I will try to reproduce the deadlock and get a
hr  core dump after it finished.

 I think I could reproduce the symptoms.  I have no idea about if
 these are exactly the same as occurred on my box before because the
 kernel was replaced with one with some debugging options, but these
 are reproducible at least.

 There are two symptoms.  One is a panic.  A DDB output when the panic
 occurred is the following:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10040
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x8065b926
stack pointer   = 0x28:0xff8257b94d70
frame pointer   = 0x28:0xff8257b94e10
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at  witness_checkorder+0x246:   movl0x40(%r13),%ebx

db bt
Tracing pid 992 tid 100586 td 0xff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffe6c8, rbp = 0x6 ---


 The complete output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

 Another is getting stuck at ZFS access.  The kernel is running with
 no panic but any access to ZFS datasets causes a program
 non-responsive.  The DDB output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

 The trigger for the both was some access to a ZFS dataset from the
 NFS clients.  Because the access pattern was complex I could not
 narrow down what was the culprit, but it seems timing-dependent and
 simply doing rm -rf locally on the server can sometimes trigger
 them.

 The crash dump and the kernel can be found at the following URLs:

  panic:
http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

  no panic but unresponsive:
http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

  kernel:
http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-29 Thread Dan Langille

On 11/16/2010 8:41 PM, Terry Kennedy wrote:

I would say it is definitely very odd that writes are a problem.  Sounds
like it might be a hardware problem.  Is it possible to export the pool,
remove the ZIL and re-import it?  I myself would be pretty nervous trying
that, but it would help isolate the problem?  If you can risk it.


   I think it is unlikely to be a hardware problem. While I haven't run any
destructive testing on the ZFS pool, the fact that it can be read without
error, combined with ECC throughout the system and the panic always happen-
ing on the first write, makes me think that it is a software issue in ZFS.

   When I do:

zpool export data; zpool remove data da0

   I get a No such pool: data. I then re-imported the pool and did:

zpool offline data da0; zpool export data; zpool import data

   After doing that, I can write to the pool without a panic. But once I
online the log device and do any writes, I get the panic again.

   As I mentioned, I have this data replicated elsewere, so I can exper-
iment with the pool if it will help track down this issue.


Any more news on this?

--
Dan Langille - http://langille.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-16 Thread Terry Kennedy
 I would say it is definitely very odd that writes are a problem.  Sounds
 like it might be a hardware problem.  Is it possible to export the pool, 
 remove the ZIL and re-import it?  I myself would be pretty nervous trying
 that, but it would help isolate the problem?  If you can risk it.

  I think it is unlikely to be a hardware problem. While I haven't run any
destructive testing on the ZFS pool, the fact that it can be read without
error, combined with ECC throughout the system and the panic always happen-
ing on the first write, makes me think that it is a software issue in ZFS.

  When I do:

zpool export data; zpool remove data da0

  I get a No such pool: data. I then re-imported the pool and did:

zpool offline data da0; zpool export data; zpool import data

  After doing that, I can write to the pool without a panic. But once I
online the log device and do any writes, I get the panic again.

  As I mentioned, I have this data replicated elsewere, so I can exper-
iment with the pool if it will help track down this issue.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-15 Thread Terry Kennedy
 I can give a developer remote console / root access to the box if that would 
 help. I have a couple days before I will need to nuke the pool and restore it 
 from backups. 

I haven't heard from anyone that wants to look into this. I need to get the 
pool back into service soon. If I don't get any requests to postpone or offers 
to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the 
pool (minus the SSD, which is persona non grata). 

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-15 Thread Terry Kennedy
 I am no ZFS kernel-code dude or anything, but it is well known that losing
 the ZIL can corrupt things pretty bad with ZFS.

  First, thanks for writing back!

  I agree that this could be the problem. As I mentioned in my original post,
I followed the steps recommended by zpool status - clearing the device and
then doing a replace. The fix may be as simple as testing for whether the de-
vice in question is a log device and if so, erroring out with You can't do
that.

  Also note that multiple scrubs pass with no errors detected - it is only
writes that trigger the panic. It looks like something isn't being cleaned
up in the clear / replace path.

  I would save a crash dump for people to look at, but unfortunately the
last time a crash dump actually worked for me (on dozens of systems) was
back in the FreeBSD 6.2 days.

  There wasn't any data corruption (the filesystem was not being written at
the time the log device failed) - I have my own checksum files written by
the sysutils/cfv port, and the data all matches.

 All in all, if I was in your situation I would give a whirl at installing
 OpenSolaris and going from there, being sure not to upgrade the pool vers-
 ion past what is supported by FreeBSD and going from there.

  I have the data on another server (see my prior snapshots are not back-
ups discussion on freebsd-stable if interested). So, fortunately, this is
not a case of data recovery.

 Unfortunately we all find ourselves in a bit of a pickle with ZFS right 
 now with the Oracle acquisition of Sun.  For myself, I would stick with 
 deploying on FreeBSD but I think its going to be FBSD 9.1 before its go-
 ing to be truly ready for production.

  The problem with hardware on the leading edge is that the software often
needs time to catch up. In this particular case, the ZFS pool is 32TB. I
can't begin to imagine how long a UFS fsck would take on such a partition,
even if it were possible to create one. It was bad enough on the previous
generation of my servers (2TB UFS partitions).

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-15 Thread Michael DeMan
Hi Terry,

I am no ZFS kernel-code dude or anything, but it is well known that losing the 
ZIL can corrupt things pretty bad with ZFS.

You may want to skim the archives at OpenSolaris ZFS discuss 
zfs-disc...@opensolaris.org

All in all, if I was in your situation I would give a whirl at installing 
OpenSolaris and going from there, being sure not to upgrade the pool version 
past what is supported by FreeBSD and going from there.

Unfortunately we all find ourselves in a bit of a pickle with ZFS right now 
with the Oracle acquisition of Sun.  For myself, I would stick with deploying 
on FreeBSD but I think its going to be FBSD 9.1 before its going to be truly 
ready for production.

Just my 2-cents.

- Mike


On Nov 15, 2010, at 10:24 PM, Terry Kennedy wrote:

 I can give a developer remote console / root access to the box if that would 
 help. I have a couple days before I will need to nuke the pool and restore 
 it 
 from backups. 
 
 I haven't heard from anyone that wants to look into this. I need to get the 
 pool back into service soon. If I don't get any requests to postpone or 
 offers 
 to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing 
 the 
 pool (minus the SSD, which is persona non grata). 
 
Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
 ___
 freebsd...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-fs
 To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic after replacing log device

2010-11-15 Thread Michael DeMan
Hi, sorry for not completely digesting your original post.

I would say it is definitely very odd that writes are a problem.  Sounds like 
it might be a hardware problem.  Is it possible to export the pool, remove the 
ZIL and re-import it?  I myself would be pretty nervous trying that, but it 
would help isolate the problem?  If you can risk it.



On Nov 15, 2010, at 11:01 PM, Terry Kennedy wrote:

 Also note that multiple scrubs pass with no errors detected - it is only
 writes that trigger the panic. It looks like something isn't being cleaned
 up in the clear / replace path.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS panic after replacing log device

2010-11-13 Thread Terry Kennedy
I'm posting this to the freebsd-stable and freebsd-fs mailing lists. Followups
should probably happen on freebsd-fs.

I have a ZFS pool configured as:

zpool create data raidz da1 da2 da3 da4 da5 raidz da6 da7 da8 da9 da10 
raidz da11 da12 da13 da14 da15 spare da16 log da0

where da1-16 are WD2003FYYS drives (2TB RE4) and da0 is a 256GB PCI-Express
SSD (name omitted to protect the guilty).

The SSD has been dropping offline randomly - it seems that one or more flash 
modules pop out of their sockets and need to be re-seated frequently for some 
reason.

The most recent time it did that, I replaced the SSD with another one (for some 
reason, the manufacturer ties the flash modules to a particular controller, so 
just moving the modules results in an offline SSD and inability to manage it 
due to license limits exceeded or some such nonsense).

ZFS wasn't happy with the log device being changed, and reported it as 
corrupted, with the suggested corrective action being to zpool clear it. I 
did that, and then did a zpool replace data da0 da0 and it claimed to 
successfully resilver it. I then did a zpool scrub and the scrub completed 
with no errors. So far, so good.

However, any attempt to write to the array results in a near-immediate panic:

panic: solaris assert: sm-sm_spare + size = sm-sm_size, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
 
line: 93 cpuid=2

(Screenshot at http://www.tmk.com/transient/zfs-panic.png in case I mis-typed
something).

This is repeatable across reboot / scrub / test cycles. System is 8-STABLE as 
of Fri Nov  5 19:08:35 EDT 2010, on-disk pool is version 4/15, same as the 
kernel.

I know that certain operations on log devices aren't supported until pool 
version 19 or thereabouts, but the error messages and zpool command results 
gave the impression that what I was doing was supported and worked (when it 
didn't). If this is truly a you can't do that in pool version 15, perhaps a 
warning could be added so users don't get fooled into thinking it worked?

I can give a developer remote console / root access to the box if that would 
help. I have a couple days before I will need to nuke the pool and restore it 
from backups.

Terry Kennedy http://www.tmk.com
te...@tmk.com New York, NY USA
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-26 Thread Alexander Leidinger


Quoting Dmitry Morozovsky ma...@rinet.ru (from Tue, 26 Jan 2010  
01:16:28 +0300 (MSK)):



On Mon, 25 Jan 2010, Dmitry Morozovsky wrote:

DM PJD  I had a crash durinc rsync to ZFS today:
DM PJD
DM PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC,
DM
DM r...@woozle:/var/crash# uname -a
DM FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon  
Dec 14 12:40:43

DM MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE  i386
DM
DM I'll update to fresh sources and recheck, thanks.
DM
DM BTW, any thoughts of another topic I started a couple of weeks ago?

Well, after updating to fresh system scrub finished without errors, and now
rsync is running, now copied 15G out of 150.


You may want to switch the checksum algorithm to fletcher4. It  
(fletcher4 the default instead of fletcher2) is one of the few changes  
between 8-stable and 7-stable in ZFS, which I didn't merge.


Bye,
Alexander.

--
 Officers' club: We don't know but we've been told, our beer on tap is
 mighty cold.

http://www.Leidinger.netAlexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-26 Thread Dmitry Morozovsky
On Tue, 26 Jan 2010, Alexander Leidinger wrote:

AL  Well, after updating to fresh system scrub finished without errors, and
AL  now
AL  rsync is running, now copied 15G out of 150.
AL 
AL You may want to switch the checksum algorithm to fletcher4. It (fletcher4
AL the default instead of fletcher2) is one of the few changes between 8-stable
AL and 7-stable in ZFS, which I didn't merge.

will do, thank you. is fletcher4 faster?

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-26 Thread Artem Belevich
 will do, thank you. is fletcher4 faster?
Not necessarily. But it does work as a checksum much better. See
following link for the details.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597

--Artem
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-26 Thread Dmitry Morozovsky
On Tue, 26 Jan 2010, Artem Belevich wrote:

AB  will do, thank you. is fletcher4 faster?
AB Not necessarily. But it does work as a checksum much better. See
AB following link for the details.
AB 
AB http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597

Yes, I already read some articles about fletcher checksums and related.

Thanks.


-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS panic on RELENG_7/i386

2010-01-25 Thread Dmitry Morozovsky
Dear colleagues,

I had a crash durinc rsync to ZFS today:

(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0xc050c688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc050c965 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf 
expression opcode 0x93
)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591
#4  0xc0910775 in zfs_freebsd_setattr (ap=0xf5baab64)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2888
#5  0xc06c6292 in VOP_SETATTR_APV (vop=0xc096e560, a=0xf5baab64)
at vnode_if.c:583
#6  0xc05918e5 in setfown (td=0xc834fd80, vp=0xcac4b33c, uid=4294967294, gid=0)
at vnode_if.h:315
#7  0xc05919bc in kern_lchown (td=0xc834fd80, 
path=0xbfbfccc8 Address 0xbfbfccc8 out of bounds, pathseg=UIO_USERSPACE, 
uid=-2, gid=0) at /usr/src/sys/kern/vfs_syscalls.c:2787
#8  0xc0591a4a in lchown (td=0xc834fd80, uap=0xf5baacfc)
at /usr/src/sys/kern/vfs_syscalls.c:2770
#9  0xc06b10f5 in syscall (frame=0xf5baad38)
at /usr/src/sys/i386/i386/trap.c:1101
#10 0xc0696b90 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262

Any other info needed?

Thanks in advance!

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-25 Thread Pawel Jakub Dawidek
On Mon, Jan 25, 2010 at 10:04:20PM +0300, Dmitry Morozovsky wrote:
 Dear colleagues,
 
 I had a crash durinc rsync to ZFS today:

Do you have recent 7-STABLE? Not sure if it was the same before MFC,
probably not, because what you see is impossible in case of source I'm
looking at. At the begining of zfs_fuid_create() function there is a
check:

if (!zfsvfs-z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0)
return (id);

And IS_EPHEMERAL() is defined as follows:

#define IS_EPHEMERAL(x) (0)

So it will always return here.

 #3  0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf 
 expression opcode 0x93
 )
 at 
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


pgpGXOZZRCate.pgp
Description: PGP signature


Re: ZFS panic on RELENG_7/i386

2010-01-25 Thread Dmitry Morozovsky
On Mon, 25 Jan 2010, Pawel Jakub Dawidek wrote:

PJD On Mon, Jan 25, 2010 at 10:04:20PM +0300, Dmitry Morozovsky wrote:
PJD  Dear colleagues,
PJD  
PJD  I had a crash durinc rsync to ZFS today:
PJD 
PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC,

r...@woozle:/var/crash# uname -a
FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon Dec 14 12:40:43 
MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE  i386

I'll update to fresh sources and recheck, thanks.

BTW, any thoughts of another topic I started a couple of weeks ago?

PJD probably not, because what you see is impossible in case of source I'm
PJD looking at. At the begining of zfs_fuid_create() function there is a
PJD check:
PJD 
PJDif (!zfsvfs-z_use_fuids || !IS_EPHEMERAL(id) || fuid_idx != 0)
PJDreturn (id);
PJD 
PJD And IS_EPHEMERAL() is defined as follows:
PJD 
PJD#define IS_EPHEMERAL(x) (0)
PJD 
PJD So it will always return here.
PJD 
PJD  #3  0xc08e95ce in zfs_fuid_create (zfsvfs=0xc65c4800, id=Unhandled dwarf 
PJD  expression opcode 0x93
PJD  )
PJD  at 
PJD  
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_fuid.c:591
PJD 
PJD 

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic on RELENG_7/i386

2010-01-25 Thread Dmitry Morozovsky
On Mon, 25 Jan 2010, Dmitry Morozovsky wrote:

DM PJD  I had a crash durinc rsync to ZFS today:
DM PJD 
DM PJD Do you have recent 7-STABLE? Not sure if it was the same before MFC,
DM 
DM r...@woozle:/var/crash# uname -a
DM FreeBSD woozle.rinet.ru 7.2-STABLE FreeBSD 7.2-STABLE #4: Mon Dec 14 
12:40:43 
DM MSK 2009 ma...@woozle.rinet.ru:/usr/obj/usr/src/sys/WOOZLE  i386
DM 
DM I'll update to fresh sources and recheck, thanks.
DM 
DM BTW, any thoughts of another topic I started a couple of weeks ago?

Well, after updating to fresh system scrub finished without errors, and now 
rsync is running, now copied 15G out of 150.

Thank you!

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS panic solaris assert: sm-sm_space loses pool on RELENG-7

2009-11-16 Thread Pete French
Sometime on sunday our main server paniced with the following error:

panic: solaris assert: sm-sm_space == space (0x5e45000 == 0x5e45600), file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
 line: 361

I did some goolging and found a couple of refereces to other people who
have seen this. Both of them, however, could not recover the pool and
needed to restore all the data from backups (which I am in the process of).

Soes anyone know anything more about this ? Specificly if it a known
rpoblem which is fixed in 8.0 ? I couldn't find a PR of any kind, but
the fact that a machine can spontaneously loose all it's data from
a set of filesystems worries me greatly.

cheers,

-pete.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


zfs panic mounting fs after crash with RC2

2009-11-04 Thread Gerrit Kühn
Hi,

Yesterday I had the opportunity to play around with my yet-to-become new
fileserver a bit more. Originally I had installed 7.2-R, which I upgraded
to 8-0-RC2 yesterday. After that I upgraded my zpool consisting of 4 disks
in raidz1 constallation to v13.
Some time later I tried to use powerd which was obviously a bad idea: it
crashed the machine immediately. I will give a separate report on that
later as it is probably related to the hardware, which is a bit exotic (VIA
VB8001 board with 64bit Via Nano processor).
However, the worst thing for me is, that after rebooting from that crash,
one of my zfs fs cannot be mounted anymore. As soon as I try to mount it I
get a kernel panic. I can still access the properties (I made use of
canmount=noauto for the first time :-), but I cannot do a snapshot of
the fs (funny enough, zfs complains that the fs is busy, while in reality
it is not even mounted - so how could it be busy?).

I took a picture of the kernel panic and put it here (don't know if there
is any useful information in it):
http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic.jpg

The pool as such seems to be fine, all other fs in it can be mounted and
used, only trying to mount tank/sys/var triggers this panic.
Are there any suggestions what I could do to get my fs back? Please let me
know if (and how) I can provide more debugging information.


cu
  Gerrit
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


zfs panic

2009-11-02 Thread Gerrit Kühn
Hi,

I got the following panic when rebooting after a crash on 7.2-REL:

panic: solaris assert: dmu_read(os, smo-smo_object, offset, size,
entry_map) == 0 (0x5 == 0x0), file:
/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/spa
ce_map.c, line: 341

This seems to be the same panic as mentioned here:
http://lists.freebsd.org/pipermail/freebsd-stable/2008-July/043763.html.

However, I did not see warnings about the ZIL. The crash leading to this
situation was probably caused by me pushing the controller card a bit too
hard (mechanically) during operation (well, so much about hot-plugging of
cards :-).
Since my pool was almost empty anyway and I needed the machine, I opted to
recreate the pool instead of trying the patches supplied by pjd@ in the
thread above.

But nevertheless I would like to be prepared if this happens again (and
the pool is not empty :-).
Right now I am updating the system to 8.0-RC2. Will this issue go away
with zpoolv13/FBSD8.0 (as suggested above)? I could not find out from the
thread above if the suggested patches helped or if anything from this has
been commited at all. Pawel or Daniel, do you remember what the final
result was?


cu
  Gerrit
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


8.0-RC1/amd64, ZFS panic

2009-10-15 Thread Borja Marcos


panic: mtx_lock() of destroyed mutex @ /usr/src/sys/kern/vfs_subrc:2467
cpuid = 1

I was doing a zfs destroy -r of a dataset. The dataset has had many  
snapshot receives done.



# uname -a
FreeBSD  8.0-RC1 FreeBSD 8.0-RC1 #1: Tue Oct 13 14:11:08 CEST 2009  
root@:/usr/obj/usr/src/sys/DEBUG  amd64


(kernel config: added WITNESS, etc to have debugging information,  
doing some ZFS send/receive tests)


It's a VMWare virtual machine, and I've frozen it.




Borja.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: zfs/panic: short after rollback

2009-08-11 Thread Peter Much
km...@freebsd.org aka Kip Macy  schrieb
mit Datum Fri, 12 Jun 2009 13:54:40 -0700 in m2n.fbsd.stable:

|show sleepchain
|show thread 100263
|
|On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapona...@icyb.net.ua wrote:
|
| I did zfs rollback x...@yyy
| And then did ls on a directory in the rolled-back fs.

| panic: sleeping thread

This is quite likely the same problem as I experience. 
And it is maybe also the same problem as in kern/137037 and kern/129148.

It seems to show up in some different flavours, while the bottomline
is this: 
do a rollback, and soon after (usually at the next filesystem-related 
action) the kernel has gone fishing.

I experienced it first when doing a rollback of a mounted filesystem.
It crashed right after the first try, and it did so reproducible.
(Well, more or less reproducible - another day under similar
circumstances it did not crash.)

Then I started thinking, and came to the conclusion that a rollback
of a mounted filesystem (with possibly open files) could easily bring 
a lot of things into an undefined state, and should not be something 
one wants to do normally. So maybe it is not supposed to work at all.

Anyway, when trying this, I do either get the sleeping thread
message (as above), or a panic from _sx_xlock() (as shown in 
my addendum to kern/137037, and in the addendum to kern/129148).

So I started to do rollbacks on unmounted filesystems (quite an
excessive amount of them), and while this seemed to work at first, 
later on the system failures reappeared. 
These system failures took various shapes - I experienced immediate
resets without dump, and system hangs.
When deliberately trying to reproduce that (after installing a 
kernel with debugging info and watching the console), I also 
captured a panic coming from _sx_xlock() - so it seems to be the 
same problem as without unmounting, only that it takes a couple 
of rollbacks (a dozen or more) to hit.

Over all, there was never any data loss or persistent damage.
So, I consider rollback still functional and safe to use, but
I consider a system no longer production stable after doing
a rollback.

rgds,
PMc
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


RE: ZFS panic in zfs_fuid_create

2009-05-29 Thread Lawrence Farr


 -Original Message-
 From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-
 sta...@freebsd.org] On Behalf Of Andriy Gapon
 Sent: 28 May 2009 18:11
 To: Lawrence Farr
 Cc: freebsd-stable@freebsd.org
 Subject: Re: ZFS panic in zfs_fuid_create
 
 on 27/05/2009 19:25 Lawrence Farr said the following:
  I updated my backup boxes to the latest and greatest ZFS code,
  and started getting the following panic on them all (3 machines):
 
  panic: zfs_fuid_create
  cpuid = 1
  Uptime: 1h28m48s
  Cannot dump. No dump device defined.
  Automatic reboot in 15 seconds - press a key on the console to abort
 
  A quick google found kern/133020 with a patch from PJD that has fixed
  it for me. Should it be in stable or does it break something else?
 
 Hmm I wonder if you really do have UIDs or GIDs greater than 2147483647
 defined on
 your system?
 

Not that I could see. It's rsyncing from an EXT3 volume on a Linux server,
that runs as an OSX fileserver. All the permissions/owners are mapped to
Linux users. There are a lot of odd characters used in the filenames, but
that's all I could see that was potentially an issue. Hasn't had a problem
since I put that patch in, and I was getting a few minutes into the backup
before it paniced previously.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS panic in zfs_fuid_create

2009-05-28 Thread Andriy Gapon
on 27/05/2009 19:25 Lawrence Farr said the following:
 I updated my backup boxes to the latest and greatest ZFS code,
 and started getting the following panic on them all (3 machines):
 
 panic: zfs_fuid_create
 cpuid = 1
 Uptime: 1h28m48s
 Cannot dump. No dump device defined.
 Automatic reboot in 15 seconds - press a key on the console to abort
 
 A quick google found kern/133020 with a patch from PJD that has fixed
 it for me. Should it be in stable or does it break something else?

Hmm I wonder if you really do have UIDs or GIDs greater than 2147483647 defined 
on
your system?


-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


ZFS panic in zfs_fuid_create

2009-05-27 Thread Lawrence Farr
I updated my backup boxes to the latest and greatest ZFS code,
and started getting the following panic on them all (3 machines):

panic: zfs_fuid_create
cpuid = 1
Uptime: 1h28m48s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort

A quick google found kern/133020 with a patch from PJD that has fixed
it for me. Should it be in stable or does it break something else?

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


RELENG_7/i386: ZFS panic on reboot

2009-03-17 Thread Dmitry Morozovsky

while rebooting:

(kgdb) bt
#0  doadump () at pcpu.h:196
#1  0x80514298 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x80514575 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x806a74d4 in trap_fatal (frame=0xbf5b9a24, eva=12) at 
/usr/src/sys/i386/i386/trap.c:939
#4  0x806a771d in trap_pfault (frame=0xbf5b9a24, usermode=0, eva=12) at 
/usr/src/sys/i386/i386/trap.c:852
#5  0x806a808a in trap (frame=0xbf5b9a24) at /usr/src/sys/i386/i386/trap.c:530
#6  0x8069016b in calltrap () at /usr/src/sys/i386/i386/exception.s:159
#7  0x80806610 in gfs_dir_create (struct_size=132, pvp=0x87b388a0, 
vfsp=0x87a93b40, ops=0x808817a0, entries=0x0, inode_cb=0, maxlen=256, 
readdir_cb=0x808636c6 zfsctl_snapdir_readdir_cb, lookup_cb=0) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:420
#8  0x80863420 in zfsctl_mknode_snapdir (pvp=0x87b388a0) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:783
#9  0x808069e9 in gfs_dir_lookup (dvp=0x87b388a0, nm=0x8087dfae snapshot, 
vpp=0xbf5b9b60) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/gfs.c:630
#10 0x808630bc in zfsctl_root_lookup (dvp=0x87b388a0, nm=0x8087dfae snapshot, 
vpp=0xbf5b9b60, pnp=0x0, flags=0, rdir=0x0, cr=0x85e84000)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:396
#11 0x808638fa in zfsctl_umount_snapshots (vfsp=0x87a93b40, fflags=524288, 
cr=0x85e84000)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1063
#12 0x8086b1dc in zfs_umount (vfsp=0x87a93b40, fflag=524288, td=0x85e8ecc0) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:692
#13 0x80586ea4 in dounmount (mp=0x87a93b40, flags=524288, td=0x85e8ecc0) at 
/usr/src/sys/kern/vfs_mount.c:1293
#14 0x8058a4e8 in vfs_unmountall () at /usr/src/sys/kern/vfs_subr.c:2944
#15 0x80514005 in boot (howto=16392) at /usr/src/sys/kern/kern_shutdown.c:400
#16 0x8051445d in reboot (td=0x85e8ecc0, uap=0xbf5b9cfc) at 
/usr/src/sys/kern/kern_shutdown.c:172
#17 0x806a7a60 in syscall (frame=0xbf5b9d38) at 
/usr/src/sys/i386/i386/trap.c:1090
#18 0x806901d0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#19 0x0033 in ?? ()

Any additional info needed? Thanks!

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic

2009-02-18 Thread Kostik Belousov
On Tue, Feb 17, 2009 at 09:43:31PM -0800, Cy Schubert wrote:
 I got this panic after issuing reboot(8).
 
 FreeBSD  7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009 
 c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG  i386
 
 
 FreeBSD/i386 (bob) (ttyd0)
 
 login: Feb 17 21:22:56 bob reboot: rebooted by root
 Feb 17 21:22:56 bob syslogd: exiting on signal 15
 Waiting (max 60 seconds) for system process `vnlru' to stop...done
 Waiting (max 60 seconds) for system process `syncer' to stop...
 Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
 Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
 All buffers synced.
 panic: insmntque() failed: error 16
 cpuid = 0
 KDB: enter: panic
 [thread pid 1086 tid 100090 ]
 Stopped at  kdb_enter_why+0x3a: movl$0,kdb_why
 db bt
 Tracing pid 1086 tid 100090 td 0xc2bfd230
 kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at 
 kdb_enter_why+0x3a
 panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
 gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at 
 gfs_file_create+0x86
 gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
 zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at 
 zfsctl_mknode_snapdir+0x53
 gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at 
 gfs_dir_lookup+0xd1
 zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at 
 zfsctl_root_lookup+0xdc
 zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at 
 zfsctl_umount_snapshots+0x4e
 zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
 dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
 vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
 boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
 reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
 syscall(ebf8dd38) at syscall+0x2b3
 Xint0x80_syscall() at Xint0x80_syscall+0x20
 --- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 
 0xbfbfeb7c, ebp = 0xbfbfebb8 ---
 db 
 
 Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates 
 the panic.

The patch below would fix the problem, unless I mis-merged it.
Please note that I cannot test the patch myself, so I rely on ZFS
users testing before the commit.

Property changes on: .
___
Modified: svn:mergeinfo
   Merged /head/sys:r182781,182824,182840


Property changes on: dev/cxgb
___
Modified: svn:mergeinfo
   Merged /head/sys/dev/cxgb:r182781,182824,182840


Property changes on: dev/ath/ath_hal
___
Modified: svn:mergeinfo
   Merged /head/sys/dev/ath/ath_hal:r182781,182824,182840


Property changes on: contrib/pf
___
Modified: svn:mergeinfo
   Merged /head/sys/contrib/pf:r182781,182824,182840

Index: cddl/contrib/opensolaris/uts/common/fs/gfs.c
===
--- cddl/contrib/opensolaris/uts/common/fs/gfs.c(revision 188748)
+++ cddl/contrib/opensolaris/uts/common/fs/gfs.c(working copy)
@@ -358,6 +358,7 @@
fp = kmem_zalloc(size, KM_SLEEP);
error = getnewvnode(zfs, vfsp, ops, vp);
ASSERT(error == 0);
+   vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
vp-v_data = (caddr_t)fp;
 
/*
@@ -368,7 +369,9 @@
fp-gfs_size = size;
fp-gfs_type = GFS_FILE;
 
+   vp-v_vflag |= VV_FORCEINSMQ;
error = insmntque(vp, vfsp);
+   vp-v_vflag = ~VV_FORCEINSMQ;
KASSERT(error == 0, (insmntque() failed: error %d, error));
 
/*
Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
===
--- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c  (revision 
188748)
+++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c  (working copy)
@@ -113,6 +113,7 @@
if (cdrarg != NULL) {
error = getnewvnode(zfs, vfsp, zfs_vnodeops, vp);
ASSERT(error == 0);
+   vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
zp-z_vnode = vp;
vp-v_data = (caddr_t)zp;
vp-v_vnlock-lk_flags |= LK_CANRECURSE;
@@ -348,7 +349,9 @@
if (vp == NULL)
return (zp);
 
+   vp-v_vflag |= VV_FORCEINSMQ;
error = insmntque(vp, zfsvfs-z_vfs);
+   vp-v_vflag = ~VV_FORCEINSMQ;
KASSERT(error == 0, (insmntque() failed: error %d, error));
 
vp-v_type = IFTOVT((mode_t)zp-z_phys-zp_mode);
@@ -535,8 +538,10 @@
 
*zpp = zp;
} else {
-   if (ZTOV(zp) != NULL)
+   if (ZTOV(zp) != NULL) {
ZTOV(zp)-v_count = 0;
+

Re: ZFS Panic

2009-02-18 Thread Cy Schubert
In message 20090218162126.gq41...@deviant.kiev.zoral.com.ua, Kostik 
Belousov
writes:
 
 --v+Mbu5iuT/5Blw/K
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Tue, Feb 17, 2009 at 09:43:31PM -0800, Cy Schubert wrote:
  I got this panic after issuing reboot(8).
 =20
  FreeBSD  7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009  =
   =20
  c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG  i386
 =20
 =20
  FreeBSD/i386 (bob) (ttyd0)
 =20
  login: Feb 17 21:22:56 bob reboot: rebooted by root
  Feb 17 21:22:56 bob syslogd: exiting on signal 15
  Waiting (max 60 seconds) for system process `vnlru' to stop...done
  Waiting (max 60 seconds) for system process `syncer' to stop...
  Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
  Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
  All buffers synced.
  panic: insmntque() failed: error 16
  cpuid =3D 0
  KDB: enter: panic
  [thread pid 1086 tid 100090 ]
  Stopped at  kdb_enter_why+0x3a: movl$0,kdb_why
  db bt
  Tracing pid 1086 tid 100090 td 0xc2bfd230
  kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at=20
  kdb_enter_why+0x3a
  panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
  gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at=20
  gfs_file_create+0x86
  gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
  zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at=20
  zfsctl_mknode_snapdir+0x53
  gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at=20
  gfs_dir_lookup+0xd1
  zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at=20
  zfsctl_root_lookup+0xdc
  zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at=20
  zfsctl_umount_snapshots+0x4e
  zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0=
 x53
  dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
  vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
  boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
  reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
  syscall(ebf8dd38) at syscall+0x2b3
  Xint0x80_syscall() at Xint0x80_syscall+0x20
  --- syscall (55, FreeBSD ELF32, reboot), eip =3D 0x280bc947, esp =3D=20
  0xbfbfeb7c, ebp =3D 0xbfbfebb8 ---
  db=20
 =20
  Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates=
 =20
  the panic.
 
 The patch below would fix the problem, unless I mis-merged it.
 Please note that I cannot test the patch myself, so I rely on ZFS
 users testing before the commit.
 
 Property changes on: .
 ___
 Modified: svn:mergeinfo
Merged /head/sys:r182781,182824,182840
 
 
 Property changes on: dev/cxgb
 ___
 Modified: svn:mergeinfo
Merged /head/sys/dev/cxgb:r182781,182824,182840
 
 
 Property changes on: dev/ath/ath_hal
 ___
 Modified: svn:mergeinfo
Merged /head/sys/dev/ath/ath_hal:r182781,182824,182840
 
 
 Property changes on: contrib/pf
 ___
 Modified: svn:mergeinfo
Merged /head/sys/contrib/pf:r182781,182824,182840
 
 Index: cddl/contrib/opensolaris/uts/common/fs/gfs.c
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 --- cddl/contrib/opensolaris/uts/common/fs/gfs.c  (revision 188748)
 +++ cddl/contrib/opensolaris/uts/common/fs/gfs.c  (working copy)
 @@ -358,6 +358,7 @@
   fp =3D kmem_zalloc(size, KM_SLEEP);
   error =3D getnewvnode(zfs, vfsp, ops, vp);
   ASSERT(error =3D=3D 0);
 + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
   vp-v_data =3D (caddr_t)fp;
 =20
   /*
 @@ -368,7 +369,9 @@
   fp-gfs_size =3D size;
   fp-gfs_type =3D GFS_FILE;
 =20
 + vp-v_vflag |=3D VV_FORCEINSMQ;
   error =3D insmntque(vp, vfsp);
 + vp-v_vflag =3D ~VV_FORCEINSMQ;
   KASSERT(error =3D=3D 0, (insmntque() failed: error %d, error));
 =20
   /*
 Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 --- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c(revision 18874
 8)
 +++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c(working copy)
 @@ -113,6 +113,7 @@
   if (cdrarg !=3D NULL) {
   error =3D getnewvnode(zfs, vfsp, zfs_vnodeops, vp);
   ASSERT(error =3D=3D 0);
 + vn_lock(vp, LK_EXCLUSIVE | 

Re: ZFS Panic

2009-02-18 Thread Cy Schubert
In message 20090218162126.gq41...@deviant.kiev.zoral.com.ua, Kostik 
Belousov
writes:
 
 --v+Mbu5iuT/5Blw/K
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable
 
 On Tue, Feb 17, 2009 at 09:43:31PM -0800, Cy Schubert wrote:
  I got this panic after issuing reboot(8).
 =20
  FreeBSD  7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009  =
   =20
  c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG  i386
 =20
 =20
  FreeBSD/i386 (bob) (ttyd0)
 =20
  login: Feb 17 21:22:56 bob reboot: rebooted by root
  Feb 17 21:22:56 bob syslogd: exiting on signal 15
  Waiting (max 60 seconds) for system process `vnlru' to stop...done
  Waiting (max 60 seconds) for system process `syncer' to stop...
  Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
  Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
  All buffers synced.
  panic: insmntque() failed: error 16
  cpuid =3D 0
  KDB: enter: panic
  [thread pid 1086 tid 100090 ]
  Stopped at  kdb_enter_why+0x3a: movl$0,kdb_why
  db bt
  Tracing pid 1086 tid 100090 td 0xc2bfd230
  kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at=20
  kdb_enter_why+0x3a
  panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
  gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at=20
  gfs_file_create+0x86
  gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
  zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at=20
  zfsctl_mknode_snapdir+0x53
  gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at=20
  gfs_dir_lookup+0xd1
  zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at=20
  zfsctl_root_lookup+0xdc
  zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at=20
  zfsctl_umount_snapshots+0x4e
  zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0=
 x53
  dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
  vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
  boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
  reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
  syscall(ebf8dd38) at syscall+0x2b3
  Xint0x80_syscall() at Xint0x80_syscall+0x20
  --- syscall (55, FreeBSD ELF32, reboot), eip =3D 0x280bc947, esp =3D=20
  0xbfbfeb7c, ebp =3D 0xbfbfebb8 ---
  db=20
 =20
  Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates=
 =20
  the panic.
 
 The patch below would fix the problem, unless I mis-merged it.
 Please note that I cannot test the patch myself, so I rely on ZFS
 users testing before the commit.
 
 Property changes on: .
 ___
 Modified: svn:mergeinfo
Merged /head/sys:r182781,182824,182840
 
 
 Property changes on: dev/cxgb
 ___
 Modified: svn:mergeinfo
Merged /head/sys/dev/cxgb:r182781,182824,182840
 
 
 Property changes on: dev/ath/ath_hal
 ___
 Modified: svn:mergeinfo
Merged /head/sys/dev/ath/ath_hal:r182781,182824,182840
 
 
 Property changes on: contrib/pf
 ___
 Modified: svn:mergeinfo
Merged /head/sys/contrib/pf:r182781,182824,182840
 
 Index: cddl/contrib/opensolaris/uts/common/fs/gfs.c
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 --- cddl/contrib/opensolaris/uts/common/fs/gfs.c  (revision 188748)
 +++ cddl/contrib/opensolaris/uts/common/fs/gfs.c  (working copy)
 @@ -358,6 +358,7 @@
   fp =3D kmem_zalloc(size, KM_SLEEP);
   error =3D getnewvnode(zfs, vfsp, ops, vp);
   ASSERT(error =3D=3D 0);
 + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, curthread);
   vp-v_data =3D (caddr_t)fp;
 =20
   /*
 @@ -368,7 +369,9 @@
   fp-gfs_size =3D size;
   fp-gfs_type =3D GFS_FILE;
 =20
 + vp-v_vflag |=3D VV_FORCEINSMQ;
   error =3D insmntque(vp, vfsp);
 + vp-v_vflag =3D ~VV_FORCEINSMQ;
   KASSERT(error =3D=3D 0, (insmntque() failed: error %d, error));
 =20
   /*
 Index: cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
 --- cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c(revision 18874
 8)
 +++ cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c(working copy)
 @@ -113,6 +113,7 @@
   if (cdrarg !=3D NULL) {
   error =3D getnewvnode(zfs, vfsp, zfs_vnodeops, vp);
   ASSERT(error =3D=3D 0);
 + vn_lock(vp, LK_EXCLUSIVE | 

ZFS Panic

2009-02-17 Thread Cy Schubert
I got this panic after issuing reboot(8).

FreeBSD  7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009 
c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG  i386


FreeBSD/i386 (bob) (ttyd0)

login: Feb 17 21:22:56 bob reboot: rebooted by root
Feb 17 21:22:56 bob syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
panic: insmntque() failed: error 16
cpuid = 0
KDB: enter: panic
[thread pid 1086 tid 100090 ]
Stopped at  kdb_enter_why+0x3a: movl$0,kdb_why
db bt
Tracing pid 1086 tid 100090 td 0xc2bfd230
kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at 
kdb_enter_why+0x3a
panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at 
gfs_file_create+0x86
gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at 
zfsctl_mknode_snapdir+0x53
gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at 
gfs_dir_lookup+0xd1
zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at 
zfsctl_root_lookup+0xdc
zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at 
zfsctl_umount_snapshots+0x4e
zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
syscall(ebf8dd38) at syscall+0x2b3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 
0xbfbfeb7c, ebp = 0xbfbfebb8 ---
db 

Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates 
the panic.


-- 
Cheers,
Cy Schubert cy.schub...@komquats.com
FreeBSD UNIX:  c...@freebsd.org   Web:  http://www.FreeBSD.org

e**(i*pi)+1=0


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS Panic

2009-02-17 Thread Ganbold

Cy Schubert wrote:

I got this panic after issuing reboot(8).

FreeBSD  7.1-STABLE FreeBSD 7.1-STABLE #0: Tue Feb 17 19:29:23 PST 2009 
c...@cwsys:/export/obj/export/home/cy/test/test-stable7/sys/DEBUG  i386



FreeBSD/i386 (bob) (ttyd0)

login: Feb 17 21:22:56 bob reboot: rebooted by root
Feb 17 21:22:56 bob syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 2 1 1 1 1 0 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
All buffers synced.
panic: insmntque() failed: error 16
cpuid = 0
KDB: enter: panic
[thread pid 1086 tid 100090 ]
Stopped at  kdb_enter_why+0x3a: movl$0,kdb_why
db bt
Tracing pid 1086 tid 100090 td 0xc2bfd230
kdb_enter_why(c087ef4a,c087ef4a,c2b1b5b4,ebf8da58,0,...) at 
kdb_enter_why+0x3a

panic(c2b1b5b4,10,c2b24a40,ebf8da64,c38e6000,...) at panic+0x136
gfs_file_create(84,c346d8a0,c342d5a0,c2b24a40,c346d8a0,...) at 
gfs_file_create+0x86

gfs_dir_create(84,c346d8a0,c342d5a0,c2b24a40,0,...) at gfs_dir_create+0x2c
zfsctl_mknode_snapdir(c346d8a0,c2b1b54f,275,25d,c3419520,...) at 
zfsctl_mknode_snapdir+0x53
gfs_dir_lookup(c346d8a0,c2b21126,ebf8db74,c091521c,ebf8db38,...) at 
gfs_dir_lookup+0xd1
zfsctl_root_lookup(c346d8a0,c2b21126,ebf8db74,0,0,...) at 
zfsctl_root_lookup+0xdc
zfsctl_umount_snapshots(c342d5a0,8,c3acb800,c3216844,0,...) at 
zfsctl_umount_snapshots+0x4e

zfs_umount(c342d5a0,8,c2bfd230,c2bfd230,c088a687,...) at zfs_umount+0x53
dounmount(c342d5a0,8,c2bfd230,e26988ac,0,...) at dounmount+0x430
vfs_unmountall(c087ed87,0,c087edeb,128,0,...) at vfs_unmountall+0x4e
boot(c090b5d0,0,c087edeb,ab,ebf8dd2c,...) at boot+0x44f
reboot(c2bfd230,ebf8dcfc,4,c0885aef,c08c38a8,...) at reboot+0x4b
syscall(ebf8dd38) at syscall+0x2b3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (55, FreeBSD ELF32, reboot), eip = 0x280bc947, esp = 
0xbfbfeb7c, ebp = 0xbfbfebb8 ---
db 

Forceably unmounting ZFS filesystems prior to issuing reboot(8) mitigates 
the panic.
  

I have experienced ZFS related panic with RELEN_7 in November last year and
got a fix from k...@. But I'm not quite sure whether yours and mine are 
the same case, but
might help following patch (for 
/usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c and

/usr/src/sys/cddl/compat/opensolaris/sys/vnode.h):

http://lists.freebsd.org/pipermail/freebsd-stable/2008-November/046752.html

Ganbold



  



--
I was the best I ever had. -- Woody Allen
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Reliably trigger-able ZFS panic

2008-03-03 Thread Ivan Voras
LI Xin wrote:
 Hi,
 
 The following iozone test case on ZFS would reliably trigger panic:
 
 /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g
 -i 0 -i 1 -i 2 -i 8 -+p 70 -C

It can also be (eventually) triggered by blogbench -c 100 -i 30 -r 50
-w 10 -W 10 and heavy IO load on real multithreaded applications like
mysql (both iozone and blogbench are multithreaded).




signature.asc
Description: OpenPGP digital signature


Re: Reliably trigger-able ZFS panic

2008-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:
 Hi,
 
 The following iozone test case on ZFS would reliably trigger panic:
 
 /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g 
 -i 0 -i 1 -i 2 -i 8 -+p 70 -C

Thanks, I'll try to reproduce it.

[...]

 #19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, 
 level=70109184, method=68351768,
 windowBits=68351600, memLevel=76231808, strategy=76231808, 
 version=Cannot access memory at address 0x00040010
 )
 at 
 /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318

Can you send me your FS configuration? zfs get all your/file/system
I see that you use compression on this dataset?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


pgpBXXAkxJmmK.pgp
Description: PGP signature


Re: Reliably trigger-able ZFS panic

2008-03-03 Thread Xin LI
Pawel Jakub Dawidek wrote:
 On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:
 Hi,

 The following iozone test case on ZFS would reliably trigger panic:

 /usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g 
 -i 0 -i 1 -i 2 -i 8 -+p 70 -C
 
 Thanks, I'll try to reproduce it.
 
 [...]
 
 #19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, 
 level=70109184, method=68351768,
 windowBits=68351600, memLevel=76231808, strategy=76231808, 
 version=Cannot access memory at address 0x00040010
 )
 at 
 /usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318
 
 Can you send me your FS configuration? zfs get all your/file/system
 I see that you use compression on this dataset?

It was all default configuration.  The pool was a RAID-Z2 without
hotspare disk.  The box is now running some other tests (not FreeBSD) at
our Beijing Lab and we don't have remote hands in the nights, so I'm
afraid that I will not be able to provide further information at this
moment.  Please let me know if the test run will not provoke the problem
and I will ask them to see if they can spare the box in the weekend for me.

Cheers,
-- 
Xin LI [EMAIL PROTECTED]  http://www.delphij.net/
FreeBSD - The Power to Serve!
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Reliably trigger-able ZFS panic

2008-03-03 Thread Quake Lee

Tue, 04 Mar 2008 03:27:35 +0800,Xin LI [EMAIL PROTECTED]:
The kernel is
FreeBSD fs12.sina.com.cn 7.0-STABLE FreeBSD 7.0-STABLE #0: Sun Mar  2  
18:50:05 CST 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ZFORK   
amd64


the get all at below:
fs12# zfs get all
NAME PROPERTY   VALUE  SOURCE
midpool  type   filesystem -
midpool  creation   Fri Feb 29 15:01 2008  -
midpool  used   11.1M  -
midpool  available  2.65T  -
midpool  referenced 44.7K  -
midpool  compressratio  1.00x  -
midpool  mountedyes-
midpool  quota  none   default
midpool  reservationnone   default
midpool  recordsize 128K   default
midpool  mountpoint /mnt/ztest local
midpool  sharenfs   offdefault
midpool  checksum   on default
midpool  compressionoffdefault
midpool  atime  on default
midpool  deviceson default
midpool  exec   on default
midpool  setuid on default
midpool  readonly   offdefault
midpool  jailed offdefault
midpool  snapdirhidden default
midpool  aclmodegroupmask  default
midpool  aclinherit secure default
midpool  canmount   on default
midpool  shareiscsi offdefault
midpool  xattr  offtemporary
midpool  copies 1  default

fs12# zpool get all midpool
NAME PROPERTY  VALUE   SOURCE
midpool  bootfs-   default


Pawel Jakub Dawidek wrote:

On Sun, Mar 02, 2008 at 03:49:03AM -0800, LI Xin wrote:

Hi,

The following iozone test case on ZFS would reliably trigger panic:

/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g
-i 0 -i 1 -i 2 -i 8 -+p 70 -C


Thanks, I'll try to reproduce it.

[...]


#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0,
level=70109184, method=68351768,
windowBits=68351600, memLevel=76231808, strategy=76231808,
version=Cannot access memory at address 0x00040010
)
at
/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318


Can you send me your FS configuration? zfs get all your/file/system
I see that you use compression on this dataset?


It was all default configuration.  The pool was a RAID-Z2 without
hotspare disk.  The box is now running some other tests (not FreeBSD) at
our Beijing Lab and we don't have remote hands in the nights, so I'm
afraid that I will not be able to provide further information at this
moment.  Please let me know if the test run will not provoke the problem
and I will ask them to see if they can spare the box in the weekend for  
me.


Cheers,




--
The Power to Serve
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Reliably trigger-able ZFS panic

2008-03-02 Thread LI Xin

Hi,

The following iozone test case on ZFS would reliably trigger panic:

/usr/local/bin/iozone -M -e -+u -T -t 128 -S 4096 -L 64 -R -r 4k -s 30g 
-i 0 -i 1 -i 2 -i 8 -+p 70 -C


Unfortunately the kgdb can not reveal useful backtrace.  I have tried 
KDB_TRACE, but have not yet be able to further investigate it.


fs12# kgdb /boot/kernel/kernel.symbols vmcore.0
[GDB will not be able to debug user-mode threads: 
/usr/lib/libthread_db.so: Undefined symbol ps_pglobal_lookup]

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.

Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as amd64-marcel-freebsd.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x8:0x80763d16
stack pointer   = 0x10:0xd94798f0
frame pointer   = 0x10:0xd9479920
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 340 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 5
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
trap_fatal() at trap_fatal+0x29f
trap_pfault() at trap_pfault+0x294
trap() at trap+0x2ea
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0x80763d16, rsp = 0xd94798f0, rbp = 
0xd9479920 ---

dmu_objset_sync_dnodes() at dmu_objset_sync_dnodes+0x26
dmu_objset_sync() at dmu_objset_sync+0x12d
dsl_pool_sync() at dsl_pool_sync+0x72
spa_sync() at spa_sync+0x390
txg_sync_thread() at txg_sync_thread+0x12f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xd9479d30, rbp = 0 ---
Uptime: 25m7s
Physical memory: 4081 MB
Dumping 1139 MB: 1124 1108 1092 1076 1060 1044 1028 1012 996 980 964 948 
932 916 900 884 868 852 836 820 804 788 772 756 740 724 708 692 676 660 
644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 
356 340 324 308 292 276 260 244 228 212 196 180 164 148 132 116 100 84 
68 52 36 20 4


#0  doadump () at pcpu.h:194
194 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) add-symbol-file /boot/kernel/zfs.ko.symbols
add symbol table from file /boot/kernel/zfs.ko.symbols at
(y or n) y
Reading symbols from /boot/kernel/zfs.ko.symbols...done.
(kgdb) where
#0  doadump () at pcpu.h:194
#1  0x80277aa8 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:409

#2  0x80277f07 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0x80465a1f in trap_fatal (frame=0xc, eva=Variable eva is 
not available.

) at /usr/src/sys/amd64/amd64/trap.c:724
#4  0x80465e04 in trap_pfault (frame=0xd9479840, usermode=0)
at /usr/src/sys/amd64/amd64/trap.c:641
#5  0x8046677a in trap (frame=0xd9479840) at 
/usr/src/sys/amd64/amd64/trap.c:410
#6  0x8044babe in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:169

#7  0x80763d16 in ?? ()
#8  0x0004 in adjust_ace_pair ()
#9  0x0004 in adjust_ace_pair ()
#10 0xd94799e0 in ?? ()
#11 0x80763e7d in ?? ()
#12 0xff0004275a80 in ?? ()
#13 0xff00045a1190 in ?? ()
#14 0x807639b0 in ?? ()
#15 0x80763f20 in ?? ()
#16 0xff00042dc800 in ?? ()
#17 0x0004 in adjust_ace_pair ()
#18 0xd9479990 in ?? ()
#19 0xb55d in z_deflateInit2_ (strm=0xff00042dc8e0, 
level=70109184, method=68351768,
windowBits=68351600, memLevel=76231808, strategy=76231808, 
version=Cannot access memory at address 0x00040010

)
at 
/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/zmod/deflate.c:318

Previous frame inner to this frame (corrupt stack?)
--
Xin LI [EMAIL PROTECTED]http://www.delphij.net/
FreeBSD - The Power to Serve!



signature.asc
Description: OpenPGP digital signature