Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching

2017-07-18 Thread Mark Martinec

2017-07-18 01:24, Mark Johnston wrote:

Are you able to break into the debugger at this point? Try setting
debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
the loader prompt, and hit the break key, or the key sequence
<CR> ~ ctrl-b once the hang occurs. At the debugger prompt, try
"bt" and "show allpcpu" to start.

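For readers following along: the two tunables named above can be set for a single boot with the loader's "set" command, or kept in /boot/loader.conf. What follows is only a sketch of the persistent variant, not something posted in the thread. The helper name set_tunable and its file argument are made up for illustration, so that it can be tried on a scratch file first.

```sh
#!/bin/sh
# Illustrative helper (hypothetical name): add NAME="VALUE" to a
# loader.conf-style file unless NAME is already assigned there.
# usage: set_tunable FILE NAME VALUE
set_tunable() {
    file=$1 name=$2 value=$3
    [ -e "$file" ] || : > "$file"
    # Compare the variable name exactly, so dots are not treated as wildcards.
    if awk -F= -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
        echo "$name is already set in $file" >&2
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# Example (as root, on the real file):
#   set_tunable /boot/loader.conf debug.kdb.break_to_debugger 1
#   set_tunable /boot/loader.conf debug.kdb.alt_break_to_debugger 1
# One-off alternative, typed at the loader prompt:
#   set debug.kdb.break_to_debugger=1
#   set debug.kdb.alt_break_to_debugger=1
#   boot
```

The helper leaves an existing assignment untouched rather than overwriting it, so a value already chosen by the administrator is never silently changed.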

Thank you for a prompt and good suggestion! I spent an afternoon
fiddling with the machine, with mixed results. Your suggestion to
break into the debugger did not work: there was no reaction to the
break key or to <CR> ~ ctrl-b.

So I embarked on rebuilding the RC3 kernel with
  options KDB
  options DDB
  options BREAK_TO_DEBUGGER
  options ALT_BREAK_TO_DEBUGGER
  options INVARIANTS
  options INVARIANT_SUPPORT
  options WITNESS
  options WITNESS_SKIPSPIN
but then I realized that the break key is reached on this keyboard
via an alt ctrl combination, which now does break into the debugger,
but not as early as the point where the holdup occurs.
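
The option list above could be kept in a small custom kernel configuration layered on GENERIC. The sketch below is an assumption about how one might organize that, not the configuration actually used here. The function name, the DEBUGRC3 ident and the amd64 path in the comments are placeholders.

```sh
#!/bin/sh
# Illustrative helper (hypothetical name): write a kernel config that adds the
# debugging options from this message on top of GENERIC.
# usage: write_debug_kernconf FILE IDENT
write_debug_kernconf() {
    cat > "$1" <<EOF
include GENERIC
ident   $2

options KDB
options DDB
options BREAK_TO_DEBUGGER
options ALT_BREAK_TO_DEBUGGER
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
EOF
}

# Example, assuming an amd64 source tree in /usr/src and the placeholder
# name DEBUGRC3:
#   write_debug_kernconf /usr/src/sys/amd64/conf/DEBUGRC3 DEBUGRC3
#   cd /usr/src && make buildkernel KERNCONF=DEBUGRC3 \
#       && make installkernel KERNCONF=DEBUGRC3
# GENERIC may already define some of these options; config(8) may then
# warn about duplicates.
```

Removing the two WITNESS lines from the generated file would give the INVARIANTS-only variant.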

WITNESS produced some LOR (lock order reversal) warnings, but that is
probably ok. I came across a trace just before the problem area, but
it scrolls by so fast on a vt console that only the last 40 or so
lines remain on the screen (I have a photo), and they do not look
very revealing. Unfortunately this machine does not have a serial
interface.

So in my last attempt I rebuilt a kernel with INVARIANTS but
without WITNESS - and now I cannot reproduce the problem, with
or without "safe mode". What is interesting here is that now
the da0..da3 disks are attached first, and only then the ada
disks - and even within the group of disks on the same
controller their order has been shuffled. I have no idea what
caused this, but it may be what avoided the problem.

Will play some more with this tomorrow...

  Mark



On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:

Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update
upgrade method I ended up with a system which gets stuck while trying
to attach the second set of disks. This happened already after the
first phase of the upgrade procedure (installing and re-booting with
a new kernel).

The first set of disks (ada0 .. ada2) are attached successfully, also
a cd0, but then when the first of the set of four (a regular spinning
disk) on an LSI controller is to be attached, the boot procedure just
gets stuck there:
   kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
   kernel: ada1: Command Queueing enabled
   kernel: ada1: 305245MB (625142448 512 byte sectors)
   kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
   kernel: ada2:  ATA8-ACS SATA 3.x device
   kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
   kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
   kernel: ada2: Command Queueing enabled
   kernel: ada2: 114473MB (234441648 512 byte sectors)
   kernel: ada2: quirks=0x1<4K>
   kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0

(stuck here, keyboard not responding, fans raising their pitch,
  presumably CPU is spinning)

[...]
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Kernel Panic of 10.2-RELEASE

2017-07-18 Thread Stéphane Dupille via freebsd-stable
> On 18 Jul 2017, at 18:47, Daniel Genis wrote:
> 
> Hello, 

Hello,

> Take a look at this commit: https://github.com/freebsd/freebsd/commit/d99ba5c
> It might be the issue you're encountering. 

Yes, it is. Here's the corresponding PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207464

If I understand the comments correctly, we have the same issue in 10.3 as well.
So the solution is to avoid destroying snapshots, or upgrade to 11.0, or patch
the kernel myself.

Thanks.


Re: Kernel Panic of 10.2-RELEASE

2017-07-18 Thread Daniel Genis
Hello, 

Take a look at this commit: https://github.com/freebsd/freebsd/commit/d99ba5c

It might be the issue you're encountering. 

With kind regards, 

Daniel

On 18 July 2017 18:33:02 CEST, "Stéphane Dupille via freebsd-stable" wrote:
>Hello,
>
>My server is running 10.2-RELEASE (yes, I need to upgrade it, but it
>works like a charm). Today, I launched this command, as root :
># zfs destroy -r zroot@attic
>and the machine crashed :
>
>Jul 18 18:09:40 penitencier syslogd: kernel boot file is
>/boot/kernel/kernel
>Jul 18 18:09:40 penitencier kernel: vputx: negative ref count
>Jul 18 18:09:40 penitencier kernel: 0xf8023037f000: tag zfs, type
>VDIR
>Jul 18 18:09:40 penitencier kernel: usecount 0, writecount 0, refcount
>0 mountedhere 0
>Jul 18 18:09:40 penitencier kernel: flags (VI_FREE)
>Jul 18 18:09:40 penitencier kernel: VI_LOCKedlock type zfs: EXCL by
>thread 0xf8014f242940 (pid 60698, zfs, tid 100747)
>Jul 18 18:09:40 penitencier kernel: panic: vputx: negative ref cnt
>Jul 18 18:09:40 penitencier kernel: cpuid = 1
>Jul 18 18:09:40 penitencier kernel: KDB: stack backtrace:
>Jul 18 18:09:40 penitencier kernel: #0 0x80984ef0 at
>kdb_backtrace+0x60
>Jul 18 18:09:40 penitencier kernel: #1 0x80948aa6 at
>vpanic+0x126
>Jul 18 18:09:40 penitencier kernel: #2 0x80948973 at panic+0x43
>Jul 18 18:09:40 penitencier kernel: #3 0x809eb7d5 at
>vputx+0x2d5
>Jul 18 18:09:40 penitencier kernel: #4 0x809e4f59 at
>dounmount+0x689
>Jul 18 18:09:40 penitencier kernel: #5 0x81a5fdd4 at
>zfs_unmount_snap+0x114
>Jul 18 18:09:40 penitencier kernel: #6 0x81a62fc1 at
>zfs_ioc_destroy_snaps+0xc1
>Jul 18 18:09:40 penitencier kernel: #7 0x81a61ae0 at
>zfsdev_ioctl+0x5f0
>Jul 18 18:09:40 penitencier kernel: #8 0x80830019 at
>devfs_ioctl_f+0x139
>Jul 18 18:09:40 penitencier kernel: #9 0x8099cde5 at
>kern_ioctl+0x255
>Jul 18 18:09:40 penitencier kernel: #10 0x8099cae0 at
>sys_ioctl+0x140
>Jul 18 18:09:40 penitencier kernel: #11 0x80d4b3e7 at
>amd64_syscall+0x357
>Jul 18 18:09:40 penitencier kernel: #12 0x80d30acb at
>Xfast_syscall+0xfb
>Jul 18 18:09:40 penitencier kernel: Uptime: 5d6h0m11s
>
>This is all I found in the logs. I have only remote access to this
>machine, so I have no idea what was printed on the console.
>
>I use zfs on top of geom_eli.
>
>Here is a uname -v :
>FreeBSD penitencier.dalton-brothers.org 10.2-RELEASE-p9 FreeBSD
>10.2-RELEASE-p9 #0: Thu Jan 14 01:32:46 UTC 2016
>r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
>
>After rebooting, the machine works well, as far as I can see :
>root@penitencier:/var/log # zpool status
>  pool: zboot
> state: ONLINE
>  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 12 11:20:33 2014
>config:
>
>        NAME           STATE     READ WRITE CKSUM
>        zboot          ONLINE       0     0     0
>          mirror-0     ONLINE       0     0     0
>            gpt/boot0  ONLINE       0     0     0
>            gpt/boot1  ONLINE       0     0     0
>
>errors: No known data errors
>
>  pool: zroot
> state: ONLINE
>  scan: resilvered 6,56M in 0h0m with 0 errors on Tue Jul 18 18:13:23 2017
>config:
>
>        NAME           STATE     READ WRITE CKSUM
>        zroot          ONLINE       0     0     0
>          mirror-0     ONLINE       0     0     0
>            da0p4.eli  ONLINE       0     0     0
>            da1p4.eli  ONLINE       0     0     0
>
>errors: No known data errors
>
>
>(The pool was resilvered because I booted once with a wrong
>passphrase in geli for one of the two drives, so it came up with only
>one disk.)
>
>What should I do now? Launch a zpool scrub? I'm a bit afraid of making
>it panic again. Should I consider that I just got unlucky once?
>(Please don't tell me to upgrade it: I'm currently trying to install a
>new server, and I will migrate to it very soon.)
>
>Thanks.
>

Kernel Panic of 10.2-RELEASE

2017-07-18 Thread Stéphane Dupille via freebsd-stable
Hello,

My server is running 10.2-RELEASE (yes, I need to upgrade it, but it works like 
a charm). Today, I launched this command as root:
# zfs destroy -r zroot@attic
and the machine crashed:

Jul 18 18:09:40 penitencier syslogd: kernel boot file is /boot/kernel/kernel
Jul 18 18:09:40 penitencier kernel: vputx: negative ref count
Jul 18 18:09:40 penitencier kernel: 0xf8023037f000: tag zfs, type VDIR
Jul 18 18:09:40 penitencier kernel: usecount 0, writecount 0, refcount 0 
mountedhere 0
Jul 18 18:09:40 penitencier kernel: flags (VI_FREE)
Jul 18 18:09:40 penitencier kernel: VI_LOCKedlock type zfs: EXCL by thread 
0xf8014f242940 (pid 60698, zfs, tid 100747)
Jul 18 18:09:40 penitencier kernel: panic: vputx: negative ref cnt
Jul 18 18:09:40 penitencier kernel: cpuid = 1
Jul 18 18:09:40 penitencier kernel: KDB: stack backtrace:
Jul 18 18:09:40 penitencier kernel: #0 0x80984ef0 at kdb_backtrace+0x60
Jul 18 18:09:40 penitencier kernel: #1 0x80948aa6 at vpanic+0x126
Jul 18 18:09:40 penitencier kernel: #2 0x80948973 at panic+0x43
Jul 18 18:09:40 penitencier kernel: #3 0x809eb7d5 at vputx+0x2d5
Jul 18 18:09:40 penitencier kernel: #4 0x809e4f59 at dounmount+0x689
Jul 18 18:09:40 penitencier kernel: #5 0x81a5fdd4 at 
zfs_unmount_snap+0x114
Jul 18 18:09:40 penitencier kernel: #6 0x81a62fc1 at 
zfs_ioc_destroy_snaps+0xc1
Jul 18 18:09:40 penitencier kernel: #7 0x81a61ae0 at zfsdev_ioctl+0x5f0
Jul 18 18:09:40 penitencier kernel: #8 0x80830019 at devfs_ioctl_f+0x139
Jul 18 18:09:40 penitencier kernel: #9 0x8099cde5 at kern_ioctl+0x255
Jul 18 18:09:40 penitencier kernel: #10 0x8099cae0 at sys_ioctl+0x140
Jul 18 18:09:40 penitencier kernel: #11 0x80d4b3e7 at 
amd64_syscall+0x357
Jul 18 18:09:40 penitencier kernel: #12 0x80d30acb at Xfast_syscall+0xfb
Jul 18 18:09:40 penitencier kernel: Uptime: 5d6h0m11s
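
To read the call chain at a glance, the frame numbers and function names can be pulled out of the syslog lines. The filter below is a sketch using only standard tools. The function name backtrace_frames is made up, and it assumes the unquoted log format shown above, including frames wrapped onto a second line.

```sh
#!/bin/sh
# Illustrative filter (hypothetical name): read syslog text on stdin and print
# "<frame> <function>" for each KDB backtrace entry.
backtrace_frames() {
    # Join lines first so that frames wrapped after "at" are matched too.
    tr '\n' ' ' |
        grep -oE '#[0-9]+ 0x[0-9a-f]+ at +[A-Za-z0-9_]+\+0x[0-9a-f]+' |
        sed -E 's/^#([0-9]+) 0x[0-9a-f]+ at +([A-Za-z0-9_]+)\+0x[0-9a-f]+$/\1 \2/'
}

# Example: backtrace_frames < /var/log/messages
```

Read from the bottom up, the frames above run from sys_ioctl through zfs_ioc_destroy_snaps and zfs_unmount_snap into dounmount, where vputx panics.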

This is all I found in the logs. I have only remote access to this machine, so
I have no idea what was printed on the console.

I use zfs on top of geom_eli.

Here is the output of uname -v:
FreeBSD penitencier.dalton-brothers.org 10.2-RELEASE-p9 FreeBSD 10.2-RELEASE-p9 
#0: Thu Jan 14 01:32:46 UTC 2016 
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

After rebooting, the machine works well, as far as I can see:
root@penitencier:/var/log # zpool status
  pool: zboot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 12 11:20:33 2014
config:

        NAME           STATE     READ WRITE CKSUM
        zboot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/boot0  ONLINE       0     0     0
            gpt/boot1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: resilvered 6,56M in 0h0m with 0 errors on Tue Jul 18 18:13:23 2017
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            da0p4.eli  ONLINE       0     0     0
            da1p4.eli  ONLINE       0     0     0

errors: No known data errors


(The pool was resilvered because I booted once with a wrong passphrase in geli
for one of the two drives, so it came up with only one disk.)

What should I do now? Launch a zpool scrub? I'm a bit afraid of making it panic
again. Should I consider that I just got unlucky once?
(Please don't tell me to upgrade it: I'm currently trying to install a new
server, and I will migrate to it very soon.)
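
Whether or not a scrub is run, the per-device error counters in the zpool status output are the first thing to check afterwards. The filter below is a small sketch for that. zpool_error_devices is a made-up name, and it relies only on the NAME/STATE/READ/WRITE/CKSUM column layout shown in the output above.

```sh
#!/bin/sh
# Illustrative filter (hypothetical name): read `zpool status` text on stdin
# and print every pool, vdev or device whose READ, WRITE or CKSUM counter is
# not "0".
zpool_error_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # table header
        /^errors:/                    { in_cfg = 0 }         # end of table
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") { print $1 }
    '
}

# Example:
#   zpool status zroot | zpool_error_devices
```

The counters are compared as strings rather than numbers, so any value other than a literal 0 is reported. For the status output shown in this message the filter prints nothing.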

Thanks.
