Re: Setting for displaying utf8 characters on all vt consoles results in panic on 14-CURRENT and 13.0-ALPHA3

2021-02-02 Thread Toomas Soome via freebsd-stable


> On 1. Feb 2021, at 05:35, Yasuhiro Kimura  wrote:
> 
> From: Yasuhiro Kimura <y...@utahime.org>
> Subject: Setting for displaying utf8 characters on all vt consoles results in 
> panic on 14-CURRENT and 13.0-ALPHA3
> Date: Mon, 01 Feb 2021 11:41:35 +0900 (JST)
> 
>> To display UTF-8 characters on all vt consoles I made the following settings.
>> 
>> 1. Download the GNU Unifont BDF file
>>    (http://unifoundry.com/pub/unifont/unifont-13.0.05/font-builds/unifont-13.0.05.bdf.gz)
>> 2. gunzip unifont-13.0.05.bdf.gz
>> 3. vtfontcvt unifont-13.0.05.bdf unifont.fnt
>> 4. cp unifont.fnt /usr/share/vt/fonts
>> 5. Add 'allscreens_flags="-f 8x16 unifont.fnt"' to /etc/rc.conf
>> 6. Add 'hw.vga.textmode=0' to /boot/loader.conf.local
>> 7. shutdown -r now
>> 
>> On 12.2-RELEASE and 11.4-RELEASE it works as expected. But on
>> 14-CURRENT (main) and 13.0-ALPHA3 (stable/13) it results in a kernel panic.
>> 
>> Screen shot of 14-CURRENT.
>> https://www.utahime.org/FreeBSD/panic.20210201.14-CURRENT.png
>> 
>> 14-CURRENT(main):
>> yasu@rolling-vm-freebsd1[1006]% uname -a
>> FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD 
>> 14.0-CURRENT #0 main-n244517-f17fc5439f5: Mon Feb  1 10:55:51 JST 2021 
>> ro...@rolling-vm-freebsd1.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
>>   amd64
>> 
>> 13.0-ALPHA3(stable/13):
>> yasu@rolling-vm-freebsd5[1005]% uname -a
>> FreeBSD rolling-vm-freebsd5.home.utahime.org 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 
>> #0 stable/13-c256214-g40cb0344eb2: Mon Feb  1 11:30:28 JST 2021 
>> ro...@rolling-vm-freebsd5.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
>>   amd64
> 
> I submitted this problem to Bugzilla.
> 
> Bug 253147 - Setting for displaying utf8 characters on all vt consoles
> results in panic on 14-CURRENT and 13.0-ALPHA3
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253147
> 
> ---
> Yasuhiro Kimura

Should be fixed on current now.

thanks,
toomas


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Setting for displaying utf8 characters on all vt consoles results in panic on 14-CURRENT and 13.0-ALPHA3

2021-02-01 Thread Yasuhiro Kimura
From: Toomas Soome via freebsd-current 
Subject: Re: Setting for displaying utf8 characters on all vt consoles results 
in panic on 14-CURRENT and 13.0-ALPHA3
Date: Tue, 2 Feb 2021 00:35:49 +0200

> Should be fixed on current now.

Confirmed. Would you please MFC to stable/13?

Best Regards.

---
Yasuhiro Kimura


Re: Setting for displaying utf8 characters on all vt consoles results in panic on 14-CURRENT and 13.0-ALPHA3

2021-01-31 Thread Yasuhiro Kimura
From: Yasuhiro Kimura 
Subject: Setting for displaying utf8 characters on all vt consoles results in 
panic on 14-CURRENT and 13.0-ALPHA3
Date: Mon, 01 Feb 2021 11:41:35 +0900 (JST)

> To display UTF-8 characters on all vt consoles I made the following settings.
> 
> 1. Download the GNU Unifont BDF file
>    (http://unifoundry.com/pub/unifont/unifont-13.0.05/font-builds/unifont-13.0.05.bdf.gz)
> 2. gunzip unifont-13.0.05.bdf.gz
> 3. vtfontcvt unifont-13.0.05.bdf unifont.fnt
> 4. cp unifont.fnt /usr/share/vt/fonts
> 5. Add 'allscreens_flags="-f 8x16 unifont.fnt"' to /etc/rc.conf
> 6. Add 'hw.vga.textmode=0' to /boot/loader.conf.local
> 7. shutdown -r now
> 
> On 12.2-RELEASE and 11.4-RELEASE it works as expected. But on
> 14-CURRENT (main) and 13.0-ALPHA3 (stable/13) it results in a kernel panic.
> 
> Screen shot of 14-CURRENT.
> https://www.utahime.org/FreeBSD/panic.20210201.14-CURRENT.png
> 
> 14-CURRENT(main):
> yasu@rolling-vm-freebsd1[1006]% uname -a
> FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD 
> 14.0-CURRENT #0 main-n244517-f17fc5439f5: Mon Feb  1 10:55:51 JST 2021 
> ro...@rolling-vm-freebsd1.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
>   amd64
> 
> 13.0-ALPHA3(stable/13):
> yasu@rolling-vm-freebsd5[1005]% uname -a
> FreeBSD rolling-vm-freebsd5.home.utahime.org 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 
> #0 stable/13-c256214-g40cb0344eb2: Mon Feb  1 11:30:28 JST 2021 
> ro...@rolling-vm-freebsd5.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
>   amd64

I submitted this problem to Bugzilla.

Bug 253147 - Setting for displaying utf8 characters on all vt consoles
results in panic on 14-CURRENT and 13.0-ALPHA3
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253147

---
Yasuhiro Kimura


Setting for displaying utf8 characters on all vt consoles results in panic on 14-CURRENT and 13.0-ALPHA3

2021-01-31 Thread Yasuhiro Kimura
To display UTF-8 characters on all vt consoles I made the following settings.

1. Download the GNU Unifont BDF file
   (http://unifoundry.com/pub/unifont/unifont-13.0.05/font-builds/unifont-13.0.05.bdf.gz)
2. gunzip unifont-13.0.05.bdf.gz
3. vtfontcvt unifont-13.0.05.bdf unifont.fnt
4. cp unifont.fnt /usr/share/vt/fonts
5. Add 'allscreens_flags="-f 8x16 unifont.fnt"' to /etc/rc.conf
6. Add 'hw.vga.textmode=0' to /boot/loader.conf.local
7. shutdown -r now
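
The steps above could be scripted roughly as follows. This is only a sketch, not something taken from the report: the helper names (run, append_line, setup_unifont) and the DRY_RUN switch are made up for illustration, and with the default DRY_RUN=1 the commands are printed rather than executed. Running it for real needs root and, on the affected builds, would be expected to lead to the panic described here.

```shell
#!/bin/sh
# Sketch of steps 1-6 above. With DRY_RUN=1 (the default here) the
# commands are only printed; DRY_RUN=0 runs them and needs root.
DRY_RUN="${DRY_RUN:-1}"
VER="13.0.05"    # Unifont version named in the message
URL="http://unifoundry.com/pub/unifont/unifont-${VER}/font-builds/unifont-${VER}.bdf.gz"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# append_line FILE LINE: add LINE to FILE (illustrative helper).
append_line() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ append to $1: $2"
    else
        printf '%s\n' "$2" >> "$1"
    fi
}

setup_unifont() {
    run fetch "$URL" &&
    run gunzip "unifont-${VER}.bdf.gz" &&
    run vtfontcvt "unifont-${VER}.bdf" unifont.fnt &&
    run cp unifont.fnt /usr/share/vt/fonts &&
    append_line /etc/rc.conf 'allscreens_flags="-f 8x16 unifont.fnt"' &&
    append_line /boot/loader.conf.local 'hw.vga.textmode=0'
}

setup_unifont
# Step 7 (shutdown -r now) is left to be run by hand.
```

Keeping the reboot out of the script makes it easier to inspect both configuration files before the new settings take effect.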

On 12.2-RELEASE and 11.4-RELEASE it works as expected. But on
14-CURRENT (main) and 13.0-ALPHA3 (stable/13) it results in a kernel panic.

Screen shot of 14-CURRENT.
https://www.utahime.org/FreeBSD/panic.20210201.14-CURRENT.png

14-CURRENT(main):
yasu@rolling-vm-freebsd1[1006]% uname -a
FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD 14.0-CURRENT 
#0 main-n244517-f17fc5439f5: Mon Feb  1 10:55:51 JST 2021 
ro...@rolling-vm-freebsd1.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
  amd64

13.0-ALPHA3(stable/13):
yasu@rolling-vm-freebsd5[1005]% uname -a
FreeBSD rolling-vm-freebsd5.home.utahime.org 13.0-ALPHA3 FreeBSD 13.0-ALPHA3 #0 
stable/13-c256214-g40cb0344eb2: Mon Feb  1 11:30:28 JST 2021 
ro...@rolling-vm-freebsd5.home.utahime.org:/usr0/freebsd/src/obj/usr0/freebsd/src/git/amd64.amd64/sys/GENERIC
  amd64

---
Yasuhiro Kimura


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
>
> #11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
> bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
> bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0x8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xf80de4654000
> (kgdb) p bucket->ub_bucket   
> $3 = 0xf801c7fd5218
>
> (kgdb)
>
Not sure if it's a coincidence or not, but previously I was running with
the ARC limited to ~30G of the 64G of RAM on the box.  I removed that
limit a few weeks ago after upgrading the box to RELENG_12 to pull in
the OpenSSL changes.  The panic seems to happen under disk load. I have
3 zfs pools that are pretty busy receiving snapshots. One day a week, we
write a full set to a 4th zfs pool off some geli attached drives via USB
for offsite cold storage.  The crashes happened with that extra level of
disk work.  gstat shows most of the 12 drives off 2 mrsas controllers at
or close to 100% busy during the 18hrs it takes to dump out the files.

Trying a new cold storage run now with the arc limit back to
vfs.zfs.arc_max=29334498304
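
For scale, the limit above can be converted to MiB with a few lines of shell. This is a sketch: the helper name bytes_to_mib is made up for illustration, and the commented sysctl and loader.conf lines assume that vfs.zfs.arc_max can be changed at runtime on this release, which should be checked on the box itself.

```shell
#!/bin/sh
# Value quoted in the message above.
ARC_MAX_BYTES=29334498304

# bytes_to_mib N: integer conversion from bytes to MiB (illustrative helper).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

echo "arc_max = $(bytes_to_mib "$ARC_MAX_BYTES") MiB"

# On the FreeBSD host (as root) the limit might then be applied with:
#   sysctl vfs.zfs.arc_max=29334498304      # runtime; assumes the sysctl is writable
#   echo 'vfs.zfs.arc_max="29334498304"' >> /boot/loader.conf   # persistent across reboots
```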

    ---Mike





Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:07 AM, Mark Johnston wrote:
>
> Could you go to frame 11 and print zone->uz_name and
> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
> somehow.

Thank you for looking!

(kgdb) frame 11

#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
bucket->ub_cnt);
(kgdb) p zone->uz_name
$1 = 0x8102118a "mbuf_jumbo_9k"
(kgdb) p bucket->ub_bucket[18]
$2 = (void *) 0xf80de4654000
(kgdb) p bucket->ub_bucket   
$3 = 0xf801c7fd5218

(kgdb)

    ---Mike



Re: zfs panic RELENG_12

2020-12-22 Thread Mark Johnston
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware, as it seems different?
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x0
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer   = 0x28:0xfe00bc0f8540
> frame pointer   = 0x28:0xfe00bc0f8590
> code segment    = base 0x0, limit 0xf, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and
bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
somehow.


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
Hmmm, another one. Not sure if this is hardware, as it seems different?



Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x0
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x80ca0826
stack pointer   = 0x28:0xfe00bc0f8540
frame pointer   = 0x28:0xfe00bc0f8590
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 33 (dom0)
trap number = 12
panic: page fault
cpuid = 11
time = 1608641071
KDB: stack backtrace:
#0 0x80a3fe85 at kdb_backtrace+0x65
#1 0x809f406b at vpanic+0x17b
#2 0x809f3ee3 at panic+0x43
#3 0x80e3fe71 at trap_fatal+0x391
#4 0x80e3fecf at trap_pfault+0x4f
#5 0x80e3f516 at trap+0x286
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
#8 0x80c9e302 at zone_drain_wait+0xa2
#9 0x80ca2bbd at uma_reclaim_locked+0x6d
#10 0x80ca2af4 at uma_reclaim+0x34
#11 0x80cc5321 at vm_pageout_worker+0x421
#12 0x80cc4ee3 at vm_pageout+0x193
#13 0x809b55be at fork_exit+0x7e
#14 0x80e1a34e at fork_trampoline+0xe
Uptime: 5d20h37m16s
Dumping 16057 out of 65398
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x80ca0826 in slab_free_item (keg=0xf800037fa380,
slab=0xf80de4656fb0, item=) at
/usr/src/sys/vm/uma_core.c:3357
#10 zone_release (zone=, bucket=0xf801c7fd5218,
cnt=) at /usr/src/sys/vm/uma_core.c:3404
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
#12 bucket_cache_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:915
#13 0x80c9e302 in zone_drain_wait (zone=0xf800037da000,
waitok=1) at /usr/src/sys/vm/uma_core.c:1037
#14 0x80ca2bbd in zone_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:1056
#15 zone_foreach (zfunc=) at /usr/src/sys/vm/uma_core.c:1985
#16 uma_reclaim_locked (kmem_danger=) at
/usr/src/sys/vm/uma_core.c:3737
#17 0x80ca2af4 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:3757
#18 0x80cc5321 in vm_pageout_lowmem () at
/usr/src/sys/vm/vm_pageout.c:1890
#19 vm_pageout_worker (arg=) at
/usr/src/sys/vm/vm_pageout.c:1966
#20 0x80cc4ee3 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2126
#21 0x809b55be in fork_exit (callout=0x80cc4d50
, arg=0x0, frame=0xfe00bc0f8b00) at
/usr/src/sys/kern/kern_fork.c:1080
#22 
(kgdb) bt full
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    td = 
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
    error = 
    coredump = 
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
    once = 
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
    buf = "page fault", '\000' 
    other_cpus = {__bits = {2047, 0, 0, 0}}
    td = 0xf80004964740
    newpanic = 
    bootopt = 
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
    ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xfe00bc0f82c0, reg_save_area = 0xfe00bc0f8260}}
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
    softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27,
ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
    code = 
    type = 
    ss = 40
    handled = 
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
    td = 0xf80004964740
    p = 
    eva = 0
    map = 
    ftype = 
    rv = 
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/am

zfs panic RELENG_12

2020-12-15 Thread mike tancsa
Was doing a backup via zfs send | zfs recv when the box panicked.  It's a
not-so-old RELENG_12 box from last week. Any ideas if this is a hardware
issue or a bug? It's r368493 from last Wednesday. I don't see any ECC
errors logged, so I don't think it's hardware.

Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x823a554b
stack pointer   = 0x28:0xfe0343231000
frame pointer   = 0x28:0xfe03432310c0
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 87427 (zfs)
trap number = 12
panic: page fault
cpuid = 1
time = 1608065221
KDB: stack backtrace:
#0 0x80a3fa05 at kdb_backtrace+0x65
#1 0x809f3beb at vpanic+0x17b
#2 0x809f3a63 at panic+0x43
#3 0x80e400d1 at trap_fatal+0x391
#4 0x80e4012f at trap_pfault+0x4f
#5 0x80e3f776 at trap+0x286
#6 0x80e19568 at calltrap+0x8
#7 0x82393a5e at dmu_object_info+0x1e
#8 0x823983a5 at dmu_recv_stream+0x7b5
#9 0x8244b706 at zfs_ioc_recv+0xac6
#10 0x8244dd3d at zfsdev_ioctl+0x62d
#11 0x808a35e0 at devfs_ioctl+0xb0
#12 0x80f3becb at VOP_IOCTL_APV+0x7b
#13 0x80ad1b0a at vn_ioctl+0x16a
#14 0x808a3bce at devfs_ioctl_f+0x1e
#15 0x80a5d807 at kern_ioctl+0x2b7
#16 0x80a5d4aa at sys_ioctl+0xfa
#17 0x80e40c87 at amd64_syscall+0x387
Uptime: 3d14h59m52s
Dumping 17213 out of 65366
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=)
    at /usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3805 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f3c43 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3a63 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e400d1 in trap_fatal (frame=0xfe0343230f40, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e4012f in trap_pfault (frame=0xfe0343230f40,
    usermode=, signo=, ucode=)
    at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f776 in trap (frame=0xfe0343230f40)
    at /usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x823a554b in dnode_hold_impl (os=0xf805e1d2b800,
    object=, flag=, slots=,
    tag=, dnp=0xfe03432310d8)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:1370
#10 0x82393a5e in dmu_object_info (os=0xf80777890070,
    object=18446744071600721588, doi=0xfe03432312e0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:2615
#11 0x823983a5 in receive_read_record (ra=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:2821
#12 dmu_recv_stream (drc=0xfe0343231430, fp=,
    voffp=, cleanup_fd=8, action_handlep=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:3203
#13 0x8244b706 in zfs_ioc_recv (zc=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4745
#14 0x8244dd3d in zfsdev_ioctl (dev=,
    zcmd=, arg=, flag=,
    td=)
    at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6956
#15 0x808a35e0 in devfs_ioctl (ap=0xfe0343231778)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:797
#16 0x80f3becb in VOP_IOCTL_APV (
    vop=0x816a2fe0 , a=0xfe0343231778)
    at vnode_if.c:1067
#17 0x80ad1b0a in vn_ioctl (fp=0xf8001802b5a0,
    com=, data=0xfe0343231910,
    active_cred=0xf80032214300, td=0x2070)
    at /usr/src/sys/kern/vfs_vnops.c:1508
#18 0x808a3bce in devfs_ioctl_f (fp=0xf80777890070,
    com=18446744071600721588, data=0x824e34ed <.L.str+1>, cred=0x0,
    td=0xf8029885) at /usr/src/sys/fs/devfs/devfs_vnops.c:755
#19 0x80a5d807 in fo_ioctl (fp=0xf8001802b5a0, com=3222821403,
    data=0x824e34ed <.L.str+1>, active_cred=0x0,
    td=0xf8029885) at /usr/src/sys/sys/file.h:337
#20 kern_ioctl (td=0x2070, fd=, com=3222821403,
    data=0x824e34ed <.L.str+1> "zrl->zr_mtx")
    at /usr/src/sys/kern/sys_generic.c:805
#21 0x80a5d4aa in sys_ioctl (td=0xf8029885,
    uap=0xf802988503c0) at /usr/src/sys/kern/sys_generic.c:713
#22 0x80e40c87 in sy

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Wed, Dec 09, 2020 at 02:00:37PM +1100, Dewayne Geraghty wrote:

! On a jail with config:
! exec.start = "/bin/sh -x /etc/rc";
! exec.stop = "/bin/sh /etc/rc.shutdown";
! exec.clean;
! 
! test_prod  { jid=7; persist; ip4.addr =
! "10.0.7.96,10.0.5.96,127.0.5.96"; devfs_ruleset = "6";
! host.hostuuid=---0001-0302; host.hostid=000302; }
! 
! I successfully performed
! for i in `seq 10`; do jail -vc test_prod; sleep 3; jail -vr test_prod; done

But, this is not a VIMAGE jail, is it?
Old-style jails are unaffected by this issue. Only VIMAGE jails, using
epair or netgraph, might be affected. (In that case, you would not
have an "ip4.addr" configured, and rather a "vnet.interface".)
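
To make the distinction concrete, a VIMAGE jail with an epair interface might be started along these lines. This is a sketch under assumptions: the jail name, path and helper names are hypothetical, and with the default DRY_RUN=1 the commands are only printed (the epair unit is then assumed to be epair0).

```shell
#!/bin/sh
# Sketch: start a vnet (VIMAGE) jail and hand it the "b" end of an epair.
# Name and path are hypothetical; DRY_RUN=1 only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

start_vnet_jail() {
    name="$1"
    path="$2"
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ ifconfig epair create"
        epair_a="epair0a"               # assumed name for the dry run
    else
        # ifconfig prints the name of the new "a" side, e.g. epair0a
        epair_a=$(ifconfig epair create) || return 1
    fi
    epair_b="${epair_a%a}b"             # matching "b" side, e.g. epair0b
    # No ip4.addr here: the jail gets its own network stack plus the interface.
    run jail -c name="$name" path="$path" persist vnet vnet.interface="$epair_b"
}

start_vnet_jail demo /jails/demo
```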

! I think the normal use of jail.conf is to NOT explicitly use a jid in
! the definition, which may be why this may not have been picked up?
! (Maybe a clue).

This is an interesting point. When you stop a jail, it may stay for
a more or less long time in a "dying" state (visible with "jls -d"),
keeping the jid occupied. During that time, the jail cannot be
restarted with that same jid.
Some time ago, I read people complaining about this, and the advice was to
just not define the jid in the definition, so that the jail can be
restarted immediately (and will probably grab another jid).
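
When the jid is fixed, a restart script could wait for the old jail to leave the dying state first. The following is a sketch: the function name and the JLS override variable are made up for illustration, and it relies only on "jls -d -j" failing once no jail, active or dying, has that jid.

```shell
#!/bin/sh
# Sketch: poll "jls -d" until a jid is no longer listed, dying jails included.
# JLS can be overridden for testing; it defaults to the real jls(8).
JLS="${JLS:-jls}"

# wait_for_jail_gone JID [TRIES]: returns 0 once the jid is free,
# 1 if it is still listed after TRIES checks one second apart.
wait_for_jail_gone() {
    jid="$1"
    tries="${2:-30}"
    while $JLS -d -j "$jid" jid >/dev/null 2>&1; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Example: jail -r test_prod && wait_for_jail_gone 7 && jail -c test_prod
```

A jail stuck in the dying state forever makes the function return 1, which is the signal to go and look at the interface handling.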

I did not find a solid explanation for what is happening in that
"dying" state (and why it does take more or less long), even less
an approach to fix that. I found some theories circulating on the net, but
these don't really add up. So I would need to look into the source
myself - and I did postpone that indefinitely. ;)

But what I found out, with the VIMAGE jails (those that can carry
their own network interfaces), when you make a slight mistake with
managing and handling the interfaces, then the jail will stay in the
dying state forever. If you don't make a mistake, then it will finally
die within some time.
So I decided to keep the jid, so that nothing misconfigured is allowed
to linger unnoticed. (The tradeoff is obviously that
one might have to wait before restarting.)

cheerio,
PMc

P.S. 41 Celsius is fantastic! I envy You! :)


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Kristof Provost

Peter,

I’m not interested in discussing software development methodology 
here.


Please drop me from this thread. Let me know if/when you have a test 
case I can work from.


Regards,
Kristof

On 9 Dec 2020, at 11:54, Peter wrote:


On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:

! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce


Maybe I did misinterpret, but then I don't really understand it.
I would suppose, when testing a proposed fix, the fact that it
does break under the exact same conditions as before, is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way.

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it insofar that somebody who is knowledgeable of the code,
can actually reproduce it, in order to have a look at it and analyze
what causes the mis-happening.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think, engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally maybe make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?).

But talking about discouragement: I usually try to track down
defects I encounter, and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means make that get back into the code base;
but I found that even ready made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc



Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-09 Thread Peter
On Tue, Dec 08, 2020 at 07:51:07PM -0600, Kyle Evans wrote:
 
! You seem to have misinterpreted this; he doesn't want to narrow it
! down to one bug, he wants simple steps that he can follow to reproduce

Maybe I did misinterpret, but then I don't really understand it.
I would suppose, when testing a proposed fix, the fact that it
does break under the exact same conditions as before, is all the
information needed at that point. Put in simple words: that it does
not work.

! any failure, preferably steps that can actually be followed by just
! about anyone and don't require immense amounts of setup time or
! additional hardware.

Engineering does not normally work that way. 

I'll try to explain: when a bug is first encountered, it is necessary
to isolate it insofar that somebody who is knowledgeable of the code,
can actually reproduce it, in order to have a look at it and analyze
what causes the mis-happening.

If then a remedy is devised, and that does not work as expected, then
the flaw is in the analysis, and we just start over from there.

In fact, I would have expected somebody who is trying to fix such
kind of bug, to already have testing tools available and tell me
exactly which kind of data I might retrieve from the dumps.

The open question now is: am I the only one seeing these failures?
Might they be attributed to a faulty configuration or maybe hardware
issues or whatever?
We cannot know this, we can only watch out what happens at other
sites. And that is why I sent out all these backtraces - because they
appear weird and might be difficult to associate with this issue.

I don't think there is much more we can do at this point, unless we
were willing to actually look into the details.


Am I discouraging? Indeed, I think, engineering is discouraging by
its very nature, and that's the fun of it: to overcome odds and
finally maybe make things better. And when we start to forget about
that, bad things begin to happen (anybody remember Apollo 13?). 

But talking about discouragement: I usually try to track down
defects I encounter, and, if possible, do a viable root-cause
analysis. I tended to be very willing to share the outcomes and, if
a solution arises, by all means make that get back into the code base;
but I found that even ready made patches for easy matters would
linger forever in the sendbug system without anybody caring, or, in
more complex cases where I would need some feedback from the original
writer, if only to clarify the purpose of some defaults or verify
that an approach is viable, that communication is very difficult to
establish. And that is what I would call discouraging, and I for
my part have accepted to just leave the developers in their ivory
tower and tend to my own business.


cheerio,
PMc


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 9 Dec 2020, at 2:31, Peter wrote:

> On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
>
> ! > Sorry for the bad news.
> ! >
> ! You appear to be triggering two or three different bugs there.
>
> That is possible. Then there are two or three different bugs in the
> production code.
>
> In any case, my current workaround, i.e. delaying in the exec.poststop
>
> exec.poststop = "
>    sleep 6 ;
>    /usr/sbin/ngctl shutdown ${ifname1l}: ;
>    ";
>
> helps for it all and makes the system behave solid. This is true
> with and without Your patch.
>
> ! Can you reduce your netgraph use case to a small test case that can trigger
> ! the problem?
>
> I'm sorry, I fear I don't get Your point.
> Assumed there are actually two or three bugs here, You are asking me
> to reduce config so that it will trigger only one of them? Is that
> correct?

No, we need a simple case to reproduce these problems. It’s fine if 
that test case triggers multiple issues.



> Then let me put this different: assuming this is the OS for the life
> support system of the manned Jupiter mission. Then, which one of the
> bugs do You want to get fixed, and which would You prefer to keep and
> make Your oxygen supply cut off?
>
> https://www.youtube.com/watch?v=BEo2g-w545A


Happily we’re not in space.



> ! I’m not likely to be able to do anything unless I can reproduce
> ! the problem(s).
>
> I understand that.
> From Your former mail I get the impression that you prefer to rely
> on tests. I consider this a bad habit[1] and prefer logical thinking.
>
> (Background: It is not that I would be unwilling to create clean and
> precisely reproducible scenarios. But one of my problems is that
> currently I only have two machines available: the graphical one where
> I'm just typing, and the backend server with the jails that does
> practically everything.

These issues should trigger just fine in VMs. There’s no need for 
hardware pain.


Regards,
Kristof


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kyle Evans
On Tue, Dec 8, 2020 at 7:45 PM Peter  wrote:
>
>
> On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:
> > Can you reduce your netgraph use case to a small test case that can trigger
> > the problem?
>
> I'm sorry, I fear I don't get Your point.
> Assumed there are actually two or three bugs here, You are asking me
> to reduce config so that it will trigger only one of them? Is that
> correct?
>
> Then let me put this different: assuming this is the OS for the life
> support system of the manned Jupiter mission. Then, which one of the
> bugs do You want to get fixed, and which would You prefer to keep and
> make Your oxygen supply cut off?
>
> https://www.youtube.com/watch?v=BEo2g-w545A

You seem to have misinterpreted this; he doesn't want to narrow it
down to one bug, he wants simple steps that he can follow to reproduce
any failure, preferably steps that can actually be followed by just
about anyone and don't require immense amounts of setup time or
additional hardware.

Unfortunately, your tone following the misunderstanding was pretty discouraging.

Thanks,

Kyle Evans


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter

On Tue, Dec 08, 2020 at 08:02:47PM +0100, Kristof Provost wrote:

! > Sorry for the bad news.
! > 
! You appear to be triggering two or three different bugs there.

That is possible. Then there are two or three different bugs in the
production code.

In any case, my current workaround, i.e. delaying in the exec.poststop

> exec.poststop = "
>sleep 6 ;
>/usr/sbin/ngctl shutdown ${ifname1l}: ;
>";

helps in all cases and makes the system behave solidly. This is true
with and without Your patch.

! Can you reduce your netgraph use case to a small test case that can trigger
! the problem?

I'm sorry, I fear I don't get Your point.
Assuming there are actually two or three bugs here, You are asking me
to reduce config so that it will trigger only one of them? Is that
correct?

Then let me put this differently: assume this is the OS for the life
support system of the manned Jupiter mission. Then, which one of the
bugs do You want to get fixed, and which would You prefer to keep and
have cut off Your oxygen supply?

https://www.youtube.com/watch?v=BEo2g-w545A

! I’m not likely to be able to do anything unless I can reproduce
! the problem(s).

I understand that.
From Your former mail I get the impression that you prefer to rely
on tests. I consider this a bad habit[1] and prefer logical thinking.

So let's try that:
We know that there is a problem with taking down an interface from a
VIMAGE, in the way it is done by "jail -r". We know this problem can
be reliably worked around by delaying the interface takedown for a short
time.

Now with Your patch, we do not get the typical crash at interface
takedown. Instead, all of a sudden, there are strange crashes from
various other places. And, interestingly, we get these also when
STARTING a jail.

I think this is not an additional problem; it is instead valuable
information (albeit not the kind You might like to get).

Furthermore, these new crashes are always invoked by "ifconfig",
and they seem to have in common that somebody tries to obtain
information about some interface configuration and receives
bogus data. I might conclude, just from the gut without looking into
details, that either
 - Your patch manages to garble some internal interface data,
   instead of doing what it is intended to do, or
 - the original problem manages to garble internal interface data
   (leading to the usual crash), and Your patch does not manage to
   solve this, but only protects from the immediate consequence.

It might also be worth considering that, while the problem may be
easier to reproduce with epair, this effect may or may not be a
netgraph-specific one[2].

Now let's keep in mind that a successful test means EXACTLY NOTHING.
By which other means can we confirm that Your patch fully achieves
what it is intended for? (E.g. something like dumping and verifying
the respective internal tables in vivo.)

(Background: It is not that I would be unwilling to create clean and
precisely reproducible scenarios. But one of my problems is that
currently I only have two machines available: the graphical one where
I'm just typing, and the backend server with the jails that does
practically everything.
Therefore, experimenting on either of them creates considerable pain.
I'm working on that issue, trying to get a real server board for the
backend so as to free the current one for testing - but what I would
like to use, e.g. ASUS Z10PE+cores+regECC, is not something one would
easily find at yard sales - and seldom for an acceptable price.)


cheerio,
PMc

[1] Rationale: a failing test tells us that either the test or the
application has a bug (50/50 chance). A succeeding test tells us
that 1 equals 1, which we knew already before.
In fact, tests tell us *nothing at all* about the state of our
code, and specifically, 'successful' outcomes do NOT mean that
things are all correct.
The only true usefulness of tests is to protect against
re-introducing a fault that was already fixed before,
i.e. regressions.

[2] My netgraph configuration consists of bringing up some bridges
and then attaching the jails to them.

Here is the bridge starter (only respective component,
there are more of these populated, but probably not influencing
the issue):

#! /bin/sh

# PROVIDE: netgraphs
# REQUIRE: netwait
# BEFORE: NETWORKING

. /etc/rc.subr

name="netgraphs"
start_cmd="${name}_start"
stop_cmd="${name}_stop"

load_rc_config $name

netgraphs_graphs="svc"

netgraphs_svc_if1_name="nge_svc_1u"
netgraphs_svc_if1_mac="00:1d:92:01:02:01"
netgraphs_svc_if1_addr="***.***.***.***/29"

netgraphs_svc_start()
{
local _ifname
if ngctl info svcswitch: > /dev/null 2>&1; then
netgraphs_svc_stop
fi

echo "Creating SVC Switch"
ngctl -f - < /dev/null 2>&1; then
$_cmd
else
echo "netgraphs-start: object $i not found" >&2
fi
done
}

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
Here is the next funny crashdump - I obtained this one twice,
and also the sysctl_rtsock() one again.

I can reproduce this by just starting and stopping a very simple jail
that does only
exec.start = "/bin/sleep 4 &";
(And as usual, when I let it time out, nothing bad happens.)


Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 02
instruction pointer = 0x20:0x80a2ac45
stack pointer   = 0x28:0xfe0047cf2890
frame pointer   = 0x28:0xfe0047cf2890
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13557 (ifconfig)
trap number     = 9
panic: general protection fault
cpuid = 1
time = 1607469295
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0047cf25a0
vpanic() at vpanic+0x17b/frame 0xfffffe0047cf25f0
panic() at panic+0x43/frame 0xfe0047cf2650
trap_fatal() at trap_fatal+0x391/frame 0xfe0047cf26b0
trap() at trap+0x67/frame 0xfe0047cf27c0
calltrap() at calltrap+0x8/frame 0xfe0047cf27c0
--- trap 0x9, rip = 0x80a2ac45, rsp = 0xfe0047cf2890, rbp = 
0xfe0047cf2890 ---
strncmp() at strncmp+0x15/frame 0xfe0047cf2890
ifunit_ref() at ifunit_ref+0x59/frame 0xfe0047cf28d0
ifioctl() at ifioctl+0x427/frame 0xfe0047cf2990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe0047cf29f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe0047cf2ac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe0047cf2bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0047cf2bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe3b8, rbp = 0x7fffe450 ---
Uptime: 8m54s
Dumping 880 out of 3959 MB:


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 19:49, Peter wrote:

On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most easily
! seen.

Ack.

! Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application jails (with all their installed stuff) it took about
ten ups and downs, and then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 
0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---

Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 
22: Broken pipe


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 160745191

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Peter
On Tue, Dec 08, 2020 at 04:50:00PM +0100, Kristof Provost wrote:
! Yeah, the bug is not exclusive to epair but that’s where it’s most easily
! seen.

Ack.

! Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch

Great, thanks a lot.

Now I have bad news: when playing yoyo with the next-best three
application jails (with all their installed stuff) it took about
ten ups and downs, and then I got this one:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x10
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80aad73c
stack pointer   = 0x28:0xfe003f80e810
frame pointer   = 0x28:0xfe003f80e810
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15486 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607450838
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe003f80e4d0
vpanic() at vpanic+0x17b/frame 0xfe003f80e520
panic() at panic+0x43/frame 0xfe003f80e580
trap_fatal() at trap_fatal+0x391/frame 0xfe003f80e5e0
trap_pfault() at trap_pfault+0x4f/frame 0xfe003f80e630
trap() at trap+0x4cf/frame 0xfe003f80e740
calltrap() at calltrap+0x8/frame 0xfe003f80e740
--- trap 0xc, rip = 0x80aad73c, rsp = 0xfe003f80e810, rbp = 
0xfe003f80e810 ---
ng_eiface_mediastatus() at ng_eiface_mediastatus+0xc/frame 0xfe003f80e810
ifmedia_ioctl() at ifmedia_ioctl+0x174/frame 0xfe003f80e850
ifhwioctl() at ifhwioctl+0x639/frame 0xfe003f80e8d0
ifioctl() at ifioctl+0x448/frame 0xfe003f80e990
kern_ioctl() at kern_ioctl+0x275/frame 0xfe003f80e9f0
sys_ioctl() at sys_ioctl+0x101/frame 0xfe003f80eac0
amd64_syscall() at amd64_syscall+0x380/frame 0xfe003f80ebf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe003f80ebf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800475b2a, rsp = 
0x7fffe358, rbp = 0x7fffe450 ---
Uptime: 9m51s
Dumping 899 out of 3959 MB:

I decided to give it a second try, and this is what I did:

root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 8  kerb.***.org  /j/kerb
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail stop rail
Stopping jails: rail.
root@edge:/var/crash # service jail stop tele
Stopping jails: tele.
root@edge:/var/crash # service jail stop kerb
Stopping jails: kerb.
root@edge:/var/crash # jls
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
root@edge:/var/crash # jls -d
   JID  IP Address  Hostname  Path
 1  1***gate.***.org  /j/gate
 3  1***raix.***.org  /j/raix
 4  oper.***.org  /j/oper
 5  admn.***.org  /j/admn
 6  data.***.org  /j/data
 7  conn.***.org  /j/conn
 9  tele.***.org  /j/tele
10  rail.***.org  /j/rail
root@edge:/var/crash # service jail start kerb
Starting jails:Fssh_packet_write_wait: Connection to 1*** port 22: 
Broken pipe

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x0
fault code  = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer   = 0x28:0xfe00540ea658
frame pointer   = 0x28:0xfe00540ea670
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 13420 (ifconfig)
trap number = 12
panic: page fault
cpuid = 1
time = 1607451910
KDB: stack backtrace:
db_trace_self_wrapper

Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-08 Thread Kristof Provost

On 8 Dec 2020, at 0:34, Peter wrote:

Hi Kristof,
  it's great to read You!

On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:

! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703,
! 250870.

epair? No. It is purely Netgraph here.

Yeah, the bug is not exclusive to epair but that’s where it’s most 
easily seen.


! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go into
! stable/12 sometime next week, but it’d be good to know that it fixes your
! problem too before I merge it.
! In other words: can you test a recent CURRENT? It’s likely fixed there, and
! if it’s not I may be able to fix it quickly.


Oh my Gods. No offense meant, but this is not really a good time
for that. This is the most horrible upgrade I have experienced in 25 years
of FreeBSD (and it was prepared; 12.2 ran fine on the other machine).

I have an issue with mem config
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
I have an issue with a damaged filesystem, for no apparent reason
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here, which is now thankfully worked around
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

and when I then dare to have a look at my applications, they look like
sheer horror, segfaults all over, and I don't even know where to begin
with these.


Other option: can you make this fix so that I can patch it into 12.2
source and just redeploy?

Try 
http://people.freebsd.org/~kp/0001-if-Fix-panic-when-destroying-vnet-and-epair-simultan.patch


That’s currently running the regression tests that used to provoke the 
panic nearly instantly, and no panics so far.


Best regards.
Kristof


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Peter

Hi Kristof,
  it's great to read You!
  
On Mon, Dec 07, 2020 at 09:11:32PM +0100, Kristof Provost wrote:

! That smells a lot like the epair/vnet issues in bugs 238870, 234985, 244703,
! 250870.

epair? No. It is purely Netgraph here.

! I pushed a fix for that in CURRENT in r368237. It’s scheduled to go into
! stable/12 sometime next week, but it’d be good to know that it fixes your
! problem too before I merge it.
! In other words: can you test a recent CURRENT? It’s likely fixed there, and
! if it’s not I may be able to fix it quickly.


Oh my Gods. No offense meant, but this is not really a good time
for that. This is the most horrible upgrade I have experienced in 25 years
of FreeBSD (and it was prepared; 12.2 ran fine on the other machine).

I have an issue with mem config
https://forums.freebsd.org/threads/fun-with-upgrading-sysctl-unknown-oid-vm-pageout_wakeup_thresh.77955/
I have an issue with a damaged filesystem, for no apparent reason
https://forums.freebsd.org/threads/no-longer-fun-with-upgrading-file-offline.77959/

Then I have this issue here, which is now thankfully worked around
https://forums.freebsd.org/threads/panic-12-2-does-not-work-with-jails.77962/post-486365

and when I then dare to have a look at my applications, they look like
sheer horror, segfaults all over, and I don't even know where to begin
with these.


Other option: can you make this fix so that I can patch it into 12.2
source and just redeploy?

I tried to apply the changes from r368237 into my 12.2 source, that
seemed to be quite obvious, but it doesn't work; jails fail to remove
entirely:

# service jail stop rail
Stopping jails: rail.
# jexec rail
jexec: jail "rail" not found

-> it works once.

# service jail start rail
Starting jails: rail.
# service jail stop rail
Stopping jails: rail.
# jexec rail
root@rail:/ # ps ax
ps: empty file: Invalid argument

-> And here it doesn't work anymore, and leaves a skull of a jail
   one cannot get rid of.


Cheerio,
PMc


Re: Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Kristof Provost

On 7 Dec 2020, at 13:54, Peter wrote:

After a clean upgrade (from source) from 11.4 to 12.2-p1 my jails
no longer work correctly.

Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style,
and do not work properly.
All worked flawlessly for nearly a year with Rel.11.

If I start 2-3 jails, and then stop them again, there is always a
panic.
Also reproducible with GENERIC kernel.

Can this be fixed, or do I need to revert to 11.4?

The backtrace looks like this:

#4 0x810bbadf at trap_pfault+0x4f
#5 0x810bb23f at trap+0x4cf
#6 0x810933f8 at calltrap+0x8
#7 0x80cdd555 at _if_delgroup_locked+0x465
#8 0x80cdbfbe at if_detach_internal+0x24e
#9 0x80ce305c at if_vmove+0x3c
#10 0x80ce3010 at vnet_if_return+0x50
#11 0x80d0e696 at vnet_destroy+0x136
#12 0x80ba781d at prison_deref+0x27d
#13 0x80c3e38a at taskqueue_run_locked+0x14a
#14 0x80c3f799 at taskqueue_thread_loop+0xb9
#15 0x80b9fd52 at fork_exit+0x82
#16 0x8109442e at fork_trampoline+0xe

This is my typical jail config, designed and tested with Rel.11:

That smells a lot like the epair/vnet issues in bugs 238870, 234985, 
244703, 250870.
I pushed a fix for that in CURRENT in r368237. It’s scheduled to go 
into stable/12 sometime next week, but it’d be good to know that it 
fixes your problem too before I merge it.
In other words: can you test a recent CURRENT? It’s likely fixed 
there, and if it’s not I may be able to fix it quickly.


Best regards,
Kristof


Analyzing kernel panic from VIMAGE/Netgraph takedown

2020-12-07 Thread Peter


Stopping a VIMAGE+Netgraph jail in 12.2 in the same way that
worked with Rel. 11.4 crashes the kernel after 2 or 3 start/stop
iterations.

Specifically, this does not work:

  exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";

This new option from Rel.12 does not work either; it just
allows a few more iterations:

  exec.release = "/usr/sbin/ngctl shutdown ${ifname1l}:";

What seems to work is adding a delay:

  exec.poststop = "
  sleep 2 ;
  /usr/sbin/ngctl shutdown ${ifname1l}: ;
  ";

The big question now is: how long should the delay be?
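
One alternative to guessing a fixed delay would be to poll for a condition
with an upper time limit. The sketch below waits until an interface name is
visible to ifconfig on the host again. It rests on an assumption that is not
confirmed anywhere in this thread, namely that the interface returning to
the host marks the point where the ngctl shutdown is safe. The name
wait_for_if and the 30 second default are made up for illustration.

```sh
# wait_for_if NAME [TIMEOUT]: poll once per second until ifconfig can see
# interface NAME, giving up after TIMEOUT seconds (default 30).
# Hypothetical helper; returns 0 when the interface is visible, 1 on timeout.
wait_for_if() {
    _name="$1"
    _left="${2:-30}"
    while ! ifconfig "$_name" > /dev/null 2>&1; do
        if [ "$_left" -le 0 ]; then
            return 1
        fi
        sleep 1
        _left=$((_left - 1))
    done
    return 0
}
```

Such a function would have to live in a small script that exec.poststop
calls before running ngctl shutdown. Whether this actually avoids the panic
is untested here; the point is only that a bounded poll does not depend on
picking the right sleep value.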

This example ran through a test of 100 start/stop iterations. But then,
stopping a jail that had been running for a few months on a loaded
machine is an entirely different matter: in such a case the jail will
spend hours in "dying" state, while in this test the jid became
instantly free for restart.
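
A start/stop test of the kind described above could be scripted roughly as
follows. This is only a sketch: the function name cycle_jail is made up for
illustration, and it assumes the jail is defined in /etc/jail.conf so that
service(8) can start and stop it by name.

```sh
#!/bin/sh
# cycle_jail NAME [COUNT]: start and stop one jail COUNT times (default 100).
# Hypothetical helper, not part of the original report.
cycle_jail() {
    _jail="$1"
    _count="${2:-100}"
    _i=1
    while [ "$_i" -le "$_count" ]; do
        # Stop at the first failing command so the iteration number is known.
        service jail start "$_jail" || {
            echo "start failed in iteration $_i" >&2
            return 1
        }
        service jail stop "$_jail" || {
            echo "stop failed in iteration $_i" >&2
            return 1
        }
        _i=$((_i + 1))
    done
    echo "completed $_count start/stop iterations of $_jail"
}
```

It would be called as "cycle_jail rail 100". A panic ends the loop by
itself, so the error branches only catch the milder case where service(8)
reports a failure.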

In any case, as all this worked flawlessly with Rel. 11.4, there
is now something broken in the code, and it should be fixed.

PMc


Panic: 12.2 fails to use VIMAGE jails

2020-12-07 Thread Peter


After a clean upgrade (from source) from 11.4 to 12.2-p1 my jails
no longer work correctly.

Old-fashioned jails seem to work, but most are VIMAGE+NETGRAPH style,
and do not work properly.
All worked flawlessly for nearly a year with Rel.11.

If I start 2-3 jails, and then stop them again, there is always a
panic.
Also reproducible with GENERIC kernel.

Can this be fixed, or do I need to revert to 11.4?

The backtrace looks like this:

#4 0x810bbadf at trap_pfault+0x4f
#5 0x810bb23f at trap+0x4cf
#6 0x810933f8 at calltrap+0x8
#7 0x80cdd555 at _if_delgroup_locked+0x465
#8 0x80cdbfbe at if_detach_internal+0x24e
#9 0x80ce305c at if_vmove+0x3c
#10 0x80ce3010 at vnet_if_return+0x50
#11 0x80d0e696 at vnet_destroy+0x136
#12 0x80ba781d at prison_deref+0x27d
#13 0x80c3e38a at taskqueue_run_locked+0x14a
#14 0x80c3f799 at taskqueue_thread_loop+0xb9
#15 0x80b9fd52 at fork_exit+0x82
#16 0x8109442e at fork_trampoline+0xe

This is my typical jail config, designed and tested with Rel.11:

rail {
    jid = 10;
    devfs_ruleset = 11;
    host.hostname = "xxx.xxx.xxx.org";
    vnet = "new";
    sysvshm;
    $ifname1l = nge_${name}_1l;
    $ifname1l_mac = 00:1d:92:01:01:0a;
    vnet.interface = "$ifname1l";
    exec.prestart = "
        echo -e \"mkpeer eiface crhook ether\nname .:crhook $ifname1l\" \
            | /usr/sbin/ngctl -f -
        /usr/sbin/ngctl connect ${ifname1l}: svcswitch: ether link2
        ifname=`/usr/sbin/ngctl msg ${ifname1l}: getifname | \
            awk '$1 == \"Args:\" { print substr($2, 2, length($2)-2)}'`
        /sbin/ifconfig \$ifname name $ifname1l
        /sbin/ifconfig $ifname1l link $ifname1l_mac
        ";
    exec.poststart = "
        /usr/sbin/jexec $name /sbin/sysctl kern.securelevel=3 ;
        ";
    exec.poststop = "/usr/sbin/ngctl shutdown ${ifname1l}:";
}


System unresponsive upon panic

2020-09-08 Thread Sean C. Farley
Occasionally, I do something (i.e., attempt to run VirtualBox) that 
provokes a panic on my workstation.  When this happens, the system 
becomes completely unresponsive where not even a shutdown signal from 
pressing the power button works.  It is probably a kernel panic, but 
there is no dump and no reboot.  There is only a hard freeze.


Due to running GEOM_RAID, this leads to a nice long synchronization 
compared to if it did reboot, so I am trying to figure out how to fix 
that problem.  I suspect the Nvidia driver (v440.100 built from ports) 
is somehow related to this, but I am not certain.


System details:
FreeBSD 12-STABLE (r365263), but this has been happening for a while
nvidia-driver-440.100
GeForce GTX 960
UFS + Intel ICH8+ RAID1
Dump disabled or enabled matters not.

To see if it mattered if X was the active console or the type of panic 
mattered, I forced a panic from the console (ttyv0) while X was running. 
This is all it gave me before becoming unresponsive.


-
panic: kdb_sysctl_panic
cpuid = 1
time = 1599367793
KDB: stack backtrace
#0 0x80731ec5 at kdb_backtrace+0x65
#1 0x806e615b at vpanic+0x17b
#2 0x806e5fd3 at panic+0x43
#3 0x80732891 at kdb_sysctl_panic+0x61
#4 0x806f503a at sysctl_root_handler_locked+0x8a
#5 0x806f4469 at sysctl_root+0x249
#6 0x806f4ad8 at userland_sysctl+0x178
#7 0x806f491f at sys___sysctl+0x5f
#8 0x80a324c7 at amd64_syscall+0x387
#9 0x80a0995e at fast_syscall_common+0xf8
Uptime: 1m52s
-

Now, if I do the same without X running, then it spits out a stacktrace 
and reboots.


Should the system at least reboot when this happens?  Does it matter 
what options are used to build the Nvidia driver such as ACPI, which I 
have enabled?


Sean
--
s...@freebsd.org


panic: too many modules

2020-03-04 Thread Peter
Up front: I do not like loadable modules. They are nice to try
something out, but when you start to depend on some dozen loaded
modules, debugging becomes a living hell: say you hunt some spurious
misbehaviour and compare logfiles with those from four weeks ago,
you will not know exactly which modules were loaded at that time.
Compiling everything into the kernel has the advantage that the
'uname' does change on every change and so does precisely describe
the running kernel.

So I came across the cc_vegas and cc_cdg modules, and they aren't
provided to compile into the kernel straightaway. But that should not
be a big deal: just add some arbitrary new device to the KERNCONF, and
then add the required files to sys/conf/files appropriately.

Should work. But it doesn't. Right after the startup message, before
even probing devices, it says
 panic: module_register_init: module named ertt not found
and a stacktrace from kern/init_main.c:mi_startup().
But definitely the h_ertt is present in the kernel (I checked).

To have a closer look, I added VERBOSE_SYSINIT to the kernel, and -
the panic is gone, everything working as expected. Without even
activating the output from VERBOSE_SYSINIT.

Then, I moved netinet/khelp/h_ertt.c to the very end of
sys/conf/files - and this also avoids the panic and things do work.
While this change does nothing but change the sequence in which
the files are compiled (and probably linked).

I think this is not good. Everybody likes modules (although, see
above, they come with a serious tradeoff in reproducibility). But if
we now deliver components only as loadable modules because a compound
kernel is no longer able to sort them out on boot, that's a more
serious issue.
I wouldn't complain if the module simply did not work (reproducibly)
when compiled into the kernel - but this here appears to be a race,
most likely a timing race. And such a thing being possible at the
point where the kernel sorts out its own components - oops, that does
worry me indeed...

There also seems to be a desire for a *fast* system bringup. I don't
share that. I boot once a quarter, and if that takes an hour I don't
mind.
Maybe there is need for an option to give fast boot to those who want
something like a gaming console to be available immediately, and slow boot
for those who want a reliable system in 24/7 operation?

Maybe I'll take a closer look at the issue after switching to R.12
(probably not this year). Or, maybe somebody would like to point me
to some paper describing how the module fabric is supposed to
interface and by which steps the runtime linkage is achieved?

Platform: FreeBSD 11.3-RELEASE-p6, Intel(R) Core(TM) i5-3570T CPU (IvyBridge)

cheerio,
PMc


Re: panic when stopping jails

2019-12-03 Thread peter . blok
Forgot to mention that it is a very recent 12-STABLE and I don’t suspect any 
recent commits. It is just that jails are now stopped more often.


> On 3 Dec 2019, at 11:47, peter.b...@bsd4all.org wrote:
> 
> Hi,
> 
> I’m getting the following panic when stopping jails. When ifunit_ref iterates 
> over the VNET ifnets it gets a bad ifp. I’m using netgraph bridges.
> 
> Any pointers how to debug are welcome. Crash dump is available.
> 
> Peter
> 
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 3; apic id = 03
> instruction pointer   = 0x20:0x807377c5
> stack pointer = 0x28:0xfe00d1e90870
> frame pointer = 0x28:0xfe00d1e90870
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 8537 (ifconfig)
> trap number   = 9
> panic: general protection fault
> cpuid = 3
> time = 1575297301
> KDB: stack backtrace:
> #0 0x8069a8d7 at kdb_backtrace+0x67
> #1 0x8064ec6d at vpanic+0x19d
> #2 0x8064eac3 at panic+0x43
> #3 0x809e450c at trap_fatal+0x39c
> #4 0x809e395a at trap+0x6a
> #5 0x809be97c at calltrap+0x8
> #6 0x80750ff1 at ifunit_ref+0x51
> #7 0x8075328c at ifioctl+0x47c
> #8 0x806b8b2e at kern_ioctl+0x2be
> #9 0x806b87fd at sys_ioctl+0x15d
> #10 0x809e50a2 at amd64_syscall+0x362
> #11 0x809bf2b0 at fast_syscall_common+0x101



panic when stopping jails

2019-12-03 Thread peter . blok
Hi,

I’m getting the following panic when stopping jails. When ifunit_ref iterates 
over the VNET ifnets it gets a bad ifp. I’m using netgraph bridges.

Any pointers how to debug are welcome. Crash dump is available.

Peter


Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
instruction pointer = 0x20:0x807377c5
stack pointer   = 0x28:0xfe00d1e90870
frame pointer   = 0x28:0xfe00d1e90870
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 8537 (ifconfig)
trap number = 9
panic: general protection fault
cpuid = 3
time = 1575297301
KDB: stack backtrace:
#0 0x8069a8d7 at kdb_backtrace+0x67
#1 0x8064ec6d at vpanic+0x19d
#2 0x8064eac3 at panic+0x43
#3 0x809e450c at trap_fatal+0x39c
#4 0x809e395a at trap+0x6a
#5 0x809be97c at calltrap+0x8
#6 0x80750ff1 at ifunit_ref+0x51
#7 0x8075328c at ifioctl+0x47c
#8 0x806b8b2e at kern_ioctl+0x2be
#9 0x806b87fd at sys_ioctl+0x15d
#10 0x809e50a2 at amd64_syscall+0x362
#11 0x809bf2b0 at fast_syscall_common+0x101
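
To compare a trace like this one with others in the archive (for instance
the ifconfig panics in the 12.2 VIMAGE thread, which also pass through
ifioctl and ifunit_ref), it can help to reduce each trace to its list of
function names. The following is a rough sketch; bt_functions is a made-up
name, and it only understands the "#N address at function+offset" layout
shown here, not the "function() at function+offset/frame" layout.

```sh
# bt_functions: read a "#N 0xADDR at func+0xOFF" backtrace on stdin and
# print one function name per frame, with the offset stripped.
# Hypothetical helper for comparing traces by eye or with diff(1).
bt_functions() {
    awk '$1 ~ /^#[0-9]+$/ && $3 == "at" {
        sub(/\+0x[0-9a-f]+$/, "", $4)   # drop the trailing +0x... offset
        print $4
    }'
}
```

Saving two traces to files and running each through bt_functions gives two
short lists that can be compared directly. Lines that do not look like
frames are ignored.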


Re: panic: I/O to pool appears to be hung on vdev

2019-11-14 Thread Eugene Grosbein
15.11.2019 13:08, Eugene Grosbein wrote:
> Hi!
> 
> Recently I did a routine source upgrade from 11.2-STABLE/amd64 to 11.3-STABLE 
> r354667
> that went without any problem. After less than 2 days of uptime it panicked 
> and failed to reboot (hung),
> screenshot is here: http://www.grosbein.net/freebsd/zpanic.png
> 
> It did not panic with 11.2-STABLE but had some performance problems with ZFS.

I have to correct myself: it did panic the same way at least once with 11.2-STABLE 
r344922



Re: panic: I/O to pool appears to be hung on vdev

2019-11-14 Thread Eugene Grosbein
15.11.2019 13:08, Eugene Grosbein wrote:
> Hi!
> 
> Recently I did a routine source upgrade from 11.2-STABLE/amd64 to 11.3-STABLE 
> r354667
> that went without any problem. After less than 2 days of uptime it panicked 
> and failed to reboot (hung),
> screenshot is here: http://www.grosbein.net/freebsd/zpanic.png
> 
> It did not panic with 11.2-STABLE but had some performance problems with ZFS.
> 
> Hardware: Dell PowerEdge R640 with 360G RAM, mrsas(4)-supported controller 
> PERC H730/P Mini LSI MegaRAID SAS-3 3108 [Invader]
> and 7 SSD devices, two of them keep FreeBSD installation (distinct boot pool) 
> and five others
> are GELI-encrypted and combined to another (RAIDZ1) pool 'sata' mentioned on 
> screenshot.
> 
> vfs.zfs.arc_max=160g
> 
> The system runs several bhyve instances over ZVOLs. There are many snapshots 
> that are routinely
> created/destroyed so the system generally issues many TRIM requests to the underlying 
> SSDs.
> 
> After 1.5 days of uptime (before the panic) I set 
> kern.cam.da.[2-6].delete_max=262144
> changing it from the default 17179607040, hoping it would decrease the latency of 
> read-write operations
> like listing of snapshots. No other non-default settings for ZFS were made.
> 
> What does "panic: I/O to pool appears to be hung on vdev" mean, provided 
> the hardware is healthy?

I also wonder why it panicked instead of degrading the RAIDZ pool.




panic: I/O to pool appears to be hung on vdev

2019-11-14 Thread Eugene Grosbein
Hi!

Recently I did a routine source upgrade from 11.2-STABLE/amd64 to 11.3-STABLE 
r354667
that went without any problem. After less than 2 days of uptime it panicked and 
failed to reboot (hung),
screenshot is here: http://www.grosbein.net/freebsd/zpanic.png

It did not panic with 11.2-STABLE but had some performance problems with ZFS.

Hardware: Dell PowerEdge R640 with 360G RAM, mrsas(4)-supported controller PERC 
H730/P Mini LSI MegaRAID SAS-3 3108 [Invader]
and 7 SSD devices, two of them keep FreeBSD installation (distinct boot pool) 
and five others
are GELI-encrypted and combined to another (RAIDZ1) pool 'sata' mentioned on 
screenshot.

vfs.zfs.arc_max=160g

The system runs several bhyve instances over ZVOLs. There are many snapshots 
that are routinely
created/destroyed so the system generally issues many TRIM requests to the underlying 
SSDs.

After 1.5 days of uptime (before the panic) I set kern.cam.da.[2-6].delete_max=262144
changing it from the default 17179607040, hoping it would decrease the latency of 
read-write operations
like listing of snapshots. No other non-default settings for ZFS were made.

What does "panic: I/O to pool appears to be hung on vdev" mean, provided 
the hardware is healthy?
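
To put the delete_max change in perspective, the arithmetic can be checked in a few lines. This is only an illustrative Python sketch: the helper `sysctl_commands` is hypothetical and merely formats command strings without running them, and it assumes that `[2-6]` in the message is shorthand for the five devices da2 through da6 and that the value is expressed in bytes.

```python
OLD_DELETE_MAX = 17179607040  # default reported in the message (assumed bytes)
NEW_DELETE_MAX = 262144       # value set by the poster (assumed bytes)


def sysctl_commands(units, value):
    """Format one sysctl command per da(4) unit; nothing is executed."""
    return ["sysctl kern.cam.da.%d.delete_max=%d" % (u, value) for u in units]


# How many times smaller is the new limit than the default?
ratio, remainder = divmod(OLD_DELETE_MAX, NEW_DELETE_MAX)
commands = sysctl_commands(range(2, 7), NEW_DELETE_MAX)

print("old / new =", ratio, "remainder", remainder)
for cmd in commands:
    print(cmd)
```

The default is an exact multiple of the new value, so the change lowers the per-request delete limit by a very large factor (to 256 KiB if the unit is indeed bytes).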


panic: unregistered use of FPU in kernel

2019-09-26 Thread Alan Somers
The 12.1-PRERELEASE (FreeBSD-12.1-PRERELEASE-amd64-20190913-r352266.qcow2)
VM image instapanics with the error message:

panic: Unregistered use of FPU in kernel
stack trace:
...
ffs_ckhash_ch
bufdone
g_io_deliver
g_io_deliver
g_io_deliver
g_disk_done
vtblk_vq_intr
ithread_loop
fork_exit
...

I see a similar panic on HEAD snapshots, reproducible as far back as the
June-7 snapshot (the oldest one on the ftp server).  I'll continue
investigating.
-Alan


Re: 12.1-prerelease nullfs? related panic

2019-09-13 Thread Christos Chatzaras
> 
> This occurred only once and after the reboot, I found a corrupted file on my 
> nullfs-mount.  It wasn't mutilated, but showed content of another valid file 
> in the same directory (like 'cat fil1 >> vfile2')
> 
> Any ideas if this has been recently addressed since 09/09?
> 

Maybe we have the same issue, but I am not sure as the servers rebooted and I 
don't have a crash dump:

https://lists.freebsd.org/pipermail/freebsd-stable/2019-September/091503.html 


I use nullfs mounts too for 3 jails on each server.

The issue happened on 5 servers (out of 63 in total) 5-6 hours after I upgraded 
from r351639 to r352091, but I am sure it would have happened on all servers after 
some time if I hadn't rebooted them into the r351639 kernel.


12.1-prerelease nullfs? related panic

2019-09-13 Thread Harry Schmalzbauer

Hello,

got this panic today booting a test machine with kernel from 09/09/2019, 
r352054:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0x80541088
stack pointer   = 0x28:0xfe578420
frame pointer   = 0x28:0xfe578470
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 593 (limits)
trap number = 9
panic: general protection fault
cpuid = 0
time = 1568392101
KDB: stack backtrace:
#0 0x8061eec7 at kdb_backtrace+0x67
#1 0x805d323d at vpanic+0x19d
#2 0x805d3093 at panic+0x43
#3 0x80941d2c at trap_fatal+0x39c
#4 0x8094113c at trap+0x6c
#5 0x8091b4ac at calltrap+0x8
#6 0x805421c2 at null_lookup+0x162
#7 0x809c40c0 at VOP_LOOKUP_APV+0x50
#8 0x806948f1 at lookup+0x6d1
#9 0x80693dc7 at namei+0x437
#10 0x806aaad2 at kern_statat+0x72
#11 0x806ab2cf at sys_fstatat+0x2f
#12 0x809428e4 at amd64_syscall+0x364
#13 0x8091bdd0 at fast_syscall_common+0x101
Uptime: 26s

#4  0x80941d2c in trap_fatal (frame=, eva=)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/trap.c:943
#5  0x8094113c in trap (frame=0xfe578360)
    at RELENG_12/src/sys/amd64/include/counter.h:87
#6  0x8091b4ac in calltrap ()
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/exception.S:289
#7  0x80541088 in null_nodeget (mp=0xf8000524d000, lowervp=0xf8000535b000, vpp=0xfe5784a8)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/fs/nullfs/null_subr.c:117
#8  0x805421c2 in null_lookup (ap=0xfe578568)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/fs/nullfs/null_vnops.c:429
#9  0x809c40c0 in VOP_LOOKUP_APV (vop=0x80c81d98, a=0xfe578568) at vnode_if.c:126
#10 0x806948f1 in lookup (ndp=0xfe578768) at vnode_if.h:54
#11 0x80693dc7 in namei (ndp=0xfe578768)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/vfs_lookup.c:445
#12 0x806aaad2 in kern_statat (td=0xf800052ad5e0, flag=, fd=,
    path=0x8002600b0 , pathseg=UIO_USERSPACE, sbp=0xfe57, hook=0)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/vfs_syscalls.c:2300
#13 0x806ab2cf in sys_fstatat (td=, uap=0xf800052ad9a0)
    at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/vfs_syscalls.c:2277
#14 0x809428e4 in amd64_syscall (td=0xf800052ad5e0, traced=0)
    at RELENG_12/src/sys/amd64/amd64/../../kern/subr_syscall.c:135


This occurred only once and after the reboot, I found a corrupted file on 
my nullfs-mount.  It wasn't mutilated, but showed content of another 
valid file in the same directory (like 'cat fil1 >> vfile2')


Any ideas if this has been recently addressed since 09/09?

Thanks,

-harry



Re: Kernel panic in zfs code; 12-STABLE

2019-07-18 Thread Karl Denninger
On 7/18/2019 15:35, Karl Denninger wrote:
> On 7/18/2019 15:19, Eugene Grosbein wrote:
>> 19.07.2019 3:13, Karl Denninger wrote:
>>
>>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019
>>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP
>>>
>>> Note -- no patches of any sort in the ZFS code; I am NOT running any of
>>> my former patch set.
>>>
>>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8
>>>
>>> Thu Jul 18 15:02:54 CDT 2019
>>>
>>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M:
>>> Thu Jun 13 18:01:16 CDT 2019
>>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP  amd64
>>>
>>> panic: double fault
>> [skip]
>>
>>> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000)
>>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376
>>> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000)
>>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
>>> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100)
>>> at /usr/src/sys/kern/subr_taskqueue.c:467
>>> #286 0x80c3cb28 in taskqueue_thread_loop (arg=)
>>> at /usr/src/sys/kern/subr_taskqueue.c:773
>>> #287 0x80b9ab23 in fork_exit (
>>> callout=0x80c3ca90 ,
>>> arg=0xf801a0577520, frame=0xfe009d4edc00)
>>> at /usr/src/sys/kern/kern_fork.c:1063
>>> #288 0x810b367e in fork_trampoline ()
>>> at /usr/src/sys/amd64/amd64/exception.S:996
>>> #289 0x in ?? ()
>>> Current language:  auto; currently minimal
>>> (kgdb)
>> You have a "double fault" and a completely insane number of stack frames in the 
>> trace.
>> This is obviously infinite recursion resulting in kernel stack overflow and 
>> panic.
> Yes, but why and how?
>
> What's executing at the time is this command:
>
> zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP
>
> Which in turn results in the old snapshots on the target not on the
> source being deleted, then the new ones being sent.  It never gets to
> the sending part; it blows up during the delete of the OLD snapshots.
>
> The one(s) it deletes, however, it DOES delete.  When the box is
> rebooted those two snapshots on the target are indeed gone.
>
> That is, it is NOT getting "stuck" on one (which would imply there's an
> un-detected fault in the filesystem on the target in the metadata for
> that snapshot, resulting in a recursive call that blows up the stack)
> and it never gets to send the new snapshot, so whatever is going on is
> NOT on the source filesystem.  Neither source nor destination shows any
> errors on the filesystem; both pools are healthy with zero error counts.
>
> Therefore the question -- is the system queueing enough work to blow the
> stack *BUT* the work it queues is all legitimate?  If so there's a
> serious problem in the way the code now functions in that an "ordinary"
> operation can result in what amounts to kernel stack exhaustion.
>
> One note -- I haven't run this backup for the last five days, as I do it
> manually and I've been out of town.  Previously, running it on a daily
> basis completed without trouble.  This smells like a backlog of "things
> to do" when the send runs that results in the allegedly-infinite
> recursion (that isn't really infinite) that runs the stack out of space
> -- and THAT implies that the system is trying to queue a crazy amount of
> work on a recursive basis for what is a perfectly-legitimate operation
> -- which it should *NOT* do.

Update: This looks like an OLD bug that came back.

Previously the system would go absolutely insane on the first few
accesses to spinning rust during a snapshot delete and ATTEMPT to send
thousands of TRIM requests -- which spinning rust does not support.  On
a system with mixed vdevs, where some pools are rust and some are SSD,
this was a problem since you can't turn TRIM off because you REALLY want
it on those disks.

The FIX for this was to do this on the import of said pool comprised of
spinning rust:

#
# Now try to trigger TRIM so that we don't have a storm of them
#
# echo "Attempting to disable TRIM on spinning rust"

mount -t zfs $BACKUP/no-trim /mnt
dd if=/dev/random of=/mnt/kill-trim bs=128k count=2
echo "Performed 2 writes"
sleep 2
rm /mnt/kill-trim
echo "Performed delete of written file; wait"
sleep 35
umount /mnt
echo "Unmounted tempo

Re: Kernel panic in zfs code; 12-STABLE

2019-07-18 Thread Karl Denninger
On 7/18/2019 15:19, Eugene Grosbein wrote:
> 19.07.2019 3:13, Karl Denninger wrote:
>
>> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019
>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP
>>
>> Note -- no patches of any sort in the ZFS code; I am NOT running any of
>> my former patch set.
>>
>> NewFS.denninger.net dumped core - see /var/crash/vmcore.8
>>
>> Thu Jul 18 15:02:54 CDT 2019
>>
>> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M:
>> Thu Jun 13 18:01:16 CDT 2019
>> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP  amd64
>>
>> panic: double fault
> [skip]
>
>> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000)
>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376
>> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000)
>> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
>> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100)
>> at /usr/src/sys/kern/subr_taskqueue.c:467
>> #286 0x80c3cb28 in taskqueue_thread_loop (arg=)
>> at /usr/src/sys/kern/subr_taskqueue.c:773
>> #287 0x80b9ab23 in fork_exit (
>> callout=0x80c3ca90 ,
>> arg=0xf801a0577520, frame=0xfe009d4edc00)
>> at /usr/src/sys/kern/kern_fork.c:1063
>> #288 0x810b367e in fork_trampoline ()
>> at /usr/src/sys/amd64/amd64/exception.S:996
>> #289 0x in ?? ()
>> Current language:  auto; currently minimal
>> (kgdb)
> You have a "double fault" and a completely insane number of stack frames in the 
> trace.
> This is obviously infinite recursion resulting in kernel stack overflow and 
> panic.

Yes, but why and how?

What's executing at the time is this command:

zfs send -RI $i@zfs-old $i@zfs-base | zfs receive -Fudv $BACKUP

Which in turn results in the old snapshots on the target not on the
source being deleted, then the new ones being sent.  It never gets to
the sending part; it blows up during the delete of the OLD snapshots.

The one(s) it deletes, however, it DOES delete.  When the box is
rebooted those two snapshots on the target are indeed gone.

That is, it is NOT getting "stuck" on one (which would imply there's an
un-detected fault in the filesystem on the target in the metadata for
that snapshot, resulting in a recursive call that blows up the stack)
and it never gets to send the new snapshot, so whatever is going on is
NOT on the source filesystem.  Neither source nor destination shows any
errors on the filesystem; both pools are healthy with zero error counts.

Therefore the question -- is the system queueing enough work to blow the
stack *BUT* the work it queues is all legitimate?  If so there's a
serious problem in the way the code now functions in that an "ordinary"
operation can result in what amounts to kernel stack exhaustion.

One note -- I haven't run this backup for the last five days, as I do it
manually and I've been out of town.  Previously, running it on a daily
basis completed without trouble.  This smells like a backlog of "things
to do" when the send runs that results in the allegedly-infinite
recursion (that isn't really infinite) that runs the stack out of space
-- and THAT implies that the system is trying to queue a crazy amount of
work on a recursive basis for what is a perfectly-legitimate operation
-- which it should *NOT* do.

-- 
Karl Denninger
k...@denninger.net <mailto:k...@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/




Re: Kernel panic in zfs code; 12-STABLE

2019-07-18 Thread Eugene Grosbein
19.07.2019 3:13, Karl Denninger wrote:

> FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019
> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP
> 
> Note -- no patches of any sort in the ZFS code; I am NOT running any of
> my former patch set.
> 
> NewFS.denninger.net dumped core - see /var/crash/vmcore.8
> 
> Thu Jul 18 15:02:54 CDT 2019
> 
> FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M:
> Thu Jun 13 18:01:16 CDT 2019
> k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP  amd64
> 
> panic: double fault

[skip]

> #283 0x82748d91 in zio_vdev_io_done (zio=0xf8000b8b8000)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3376
> #284 0x82744eac in zio_execute (zio=0xf8000b8b8000)
> at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
> #285 0x80c3b7f4 in taskqueue_run_locked (queue=0xf801a8b35100)
> at /usr/src/sys/kern/subr_taskqueue.c:467
> #286 0x80c3cb28 in taskqueue_thread_loop (arg=)
> at /usr/src/sys/kern/subr_taskqueue.c:773
> #287 0x80b9ab23 in fork_exit (
> callout=0x80c3ca90 ,
> arg=0xf801a0577520, frame=0xfe009d4edc00)
> at /usr/src/sys/kern/kern_fork.c:1063
> #288 0x810b367e in fork_trampoline ()
> at /usr/src/sys/amd64/amd64/exception.S:996
> #289 0x in ?? ()
> Current language:  auto; currently minimal
> (kgdb)

You have a "double fault" and a completely insane number of stack frames in the 
trace.
This is obviously infinite recursion resulting in kernel stack overflow and 
panic.
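
The repetition is easy to confirm from the frame addresses in the KDB backtrace of the original report. The sketch below, in Python, is an illustration rather than an existing tool: `parse_frames` and `find_cycle` are hypothetical helpers, and the sample lines are copied from that backtrace (the addresses appear shortened in the archive, but the differences between them should be unaffected).

```python
import re

# Consecutive lines copied from the KDB backtrace in the original report.
TRACE = """\
vdev_queue_io_done() at vdev_queue_io_done+0xc8/frame 0xfe009d4e84a0
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e84e0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8530
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8590
zio_execute() at zio_execute+0xac/frame 0xfe009d4e85e0
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8630
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8670
zio_execute() at zio_execute+0xac/frame 0xfe009d4e86c0
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8720
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8770
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e87c0
"""

FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame (0x[0-9a-f]+)$")


def parse_frames(text):
    """Return a list of (function name, frame address) pairs."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), int(m.group(2), 16)))
    return frames


def find_cycle(frames):
    """Return (frames per cycle, stack bytes per cycle) for the shortest
    repeating pattern of function names, or None if nothing repeats."""
    names = [name for name, _ in frames]
    for period in range(1, len(names) // 2 + 1):
        if all(names[i] == names[i + period]
               for i in range(len(names) - period)):
            return period, frames[period][1] - frames[0][1]
    return None


cycle = find_cycle(parse_frames(TRACE))
print(cycle)
```

On this excerpt the pattern repeats every 5 frames and each repetition consumes 400 bytes (0x190) of kernel stack. Assuming, purely for illustration, a 16 KiB kernel stack, about 40 such repetitions would exhaust it; the actual stack size of the KSD-SMP kernel is not stated in the thread.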



Kernel panic in zfs code; 12-STABLE

2019-07-18 Thread Karl Denninger
FreeBSD 12.0-STABLE #2 r349024M: Thu Jun 13 18:01:16 CDT 2019
k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP

Note -- no patches of any sort in the ZFS code; I am NOT running any of
my former patch set.

NewFS.denninger.net dumped core - see /var/crash/vmcore.8

Thu Jul 18 15:02:54 CDT 2019

FreeBSD NewFS.denninger.net 12.0-STABLE FreeBSD 12.0-STABLE #2 r349024M:
Thu Jun 13 18:01:16 CDT 2019
k...@newfs.denninger.net:/usr/obj/usr/src/amd64.amd64/sys/KSD-SMP  amd64

panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal double fault
rip 0x8271eeec rsp 0xfe009d4e7f60 rbp 0xfe009d4e8450
rax 0xf801b5b68000 rdx 0xababe19 rbx 0xf801ac399000
rcx 0x6f598 rsi 0xf801b5b68740 rdi 0xf801ac2a2668
r8 0xf801ac2a2668 r9 0 r10 0xf801ac7cf250
r11 0 r12 0xf801b5b685b8 r13 0xf801b5b68000
r14 0xfe0082dfb000 r15 0xf801b5b685b8 rflags 0x10286
cs 0x20 ss 0x28 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0x8002328d0 gsbase 0x8202a100 kgsbase 0
cpuid = 11; apic id = 35
panic: double fault
cpuid = 11
time = 1563479881
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe000338edb0
vpanic() at vpanic+0x19d/frame 0xfffffe000338ee00
panic() at panic+0x43/frame 0xfe000338ee60
dblfault_handler() at dblfault_handler+0x1de/frame 0xfe000338ef30
Xdblfault() at Xdblfault+0xc3/frame 0xfe000338ef30
--- trap 0x17, rip = 0x8271eeec, rsp = 0xfe009d4e7f60, rbp =
0xfe009d4e8450 ---
vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x2c/frame
0xfe009d4e8450
vdev_queue_io_done() at vdev_queue_io_done+0xc8/frame 0xfe009d4e84a0
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e84e0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8530
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8590
zio_execute() at zio_execute+0xac/frame 0xfe009d4e85e0
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8630
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8670
zio_execute() at zio_execute+0xac/frame 0xfe009d4e86c0
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8720
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8770
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e87c0
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8800
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8850
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e88b0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8900
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8950
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8990
zio_execute() at zio_execute+0xac/frame 0xfe009d4e89e0
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8a40
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8a90
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8ae0
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8b20
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8b70
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8bd0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8c20
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8c70
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8cb0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8d00
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8d60
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8db0
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8e00
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8e40
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8e90
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e8ef0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e8f40
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e8f90
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e8fd0
zio_execute() at zio_execute+0xac/frame 0xfe009d4e9020
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e9080
zio_execute() at zio_execute+0xac/frame 0xfe009d4e90d0
vdev_queue_io_done() at vdev_queue_io_done+0x115/frame 0xfe009d4e9120
zio_vdev_io_done() at zio_vdev_io_done+0x151/frame 0xfe009d4e9160
zio_execute() at zio_execute+0xac/frame 0xfe009d4e91b0
zio_vdev_io_start() at zio_vdev_io_start+0x2a7/frame 0xfe009d4e9210
zio_execu

GENERIC 12.0-RELEASE-p7 (amd64) panic when rebooting

2019-07-15 Thread pomoke
I have a reproducible panic on my VPS just before powering off or
rebooting. Does this suggest the filesystem is unmounted before swapoff?

Message of the crash:
   Waiting (max 60 seconds) for system process `vnlru' to stop... done
   Waiting (max 60 seconds) for system process `syncer' to stop... 
   Syncing disks, vnodes remaining... 12 12 12 3 1 1 0 0 0 0 done
   Waiting (max 60 seconds) for system thread `bufdaemon' to stop...
   done
   Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to
   stop... done
   All buffers synced.
   swap_pager: I/O error - pagein failed; blkno 3291,size 4096, error 5
   panic: swap_pager_force_pagein: read from swap failed
   cpuid = 0
   time = 1559891849
   KDB: stack backtrace:
   #0 0x80be7977 at kdb_backtrace+0x67
   #1 0x80b9b563 at vpanic+0x1a3
   #2 0xffff80b9b3b3 at panic+0x43
   #3 0x80ed14bd at swapoff_one+0x80d
   #4 0x80ed15e7 at swapoff_all+0x117
   #5 0x80c4905a at bufshutdown+0x2fa
   #6 0x80b9aee8 at kern_reboot+0x228
   #7 0x80b9acb1 at sys_reboot+0x411
   #8 0x81075449 at amd64_syscall+0x369
   #9 0x8104fd1d at fast_syscall_common+0x101
   Uptime: 20h4m14s
   Automatic reboot in 15 seconds - press a key on the console to abort
   Rebooting...

/etc/fstab :
   /dev/ufs/rootfs  /   ufs rw,userquota,groupquota 0 0
   md11 none swap sw,file=/var/swap,late 0 0

swapinfo (upon reboot) :
   /dev/md11  15728640  1572864 0%
   -- 
   Darui Zhuo 




Boot time panic with 11.3-RELEASE on HP DL380 G7

2019-07-13 Thread Greg Rivers
frame pointer   = 0x28:0xfe231ad3ca50
code segment    = base 0x0, limit 0xf, type 0x1b
                = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x80b4c4b7 at kdb_backtrace+0x67
#1 0x80b054ce at vpanic+0x17e
#2 0x80b05343 at panic+0x43
#3 0x80f894f9 at trap_fatal+0x369
#4 0x80f89559 at trap_pfault+0x49
#5 0x80f88bdd at trap+0x29d
#6 0x80f68d9c at calltrap+0x8
#7 0x80b1d72f at softclock_call_cc+0x14f
#8 0x80b1dc29 at softclock+0x79
#9 0x80acb6f9 at intr_event_execute_handlers+0xe9
#10 0x80acb9d7 at ithread_loop+0xe7
#11 0x80ac8aa3 at fork_exit+0x83
#12 0x80f69d6e at fork_trampoline+0xe
Uptime: 3s
Automatic reboot in 15 seconds - press a key on the console to abort

This happens every time during device probing. 11.3-RELEASE runs fine on a 
DL380 G9. I'm not sure if this is a bug, or whether a loader.conf clock related 
tunable is now necessary on this 6 year old hardware. Thoughts or suggestions?

-- 
Greg Rivers




Re: Kernel panic on 12-STABLE-r348203 amd64

2019-06-11 Thread Mark Saad
All,
  I am going to try to reset the box to factory defaults and try to make it 
crash again today. I’ll update you with my outcome.

---
Mark Saad | nones...@longcount.org

> On Jun 11, 2019, at 1:15 AM, Kubilay Kocak  wrote:
> 
>> On 6/06/2019 5:04 am, Mark Saad wrote:
>>> On Wed, Jun 5, 2019 at 2:42 PM Mark Saad  wrote:
>>> 
>>>> On Wed, Jun 5, 2019 at 12:29 PM Mark Saad  wrote:
>>>> 
>>>> All
>>>>  I was wondering if anyone could shed some light on this boot panic I
>>>> saw yesterday. This is on a Dell R630 with Bios 2.9.1  booting
>>>> 12.0-STABLE-r348203 amd64.
>>>> I reverted this back to 12.0-RELEASE-p4 and it's fine.
>>>> 
>>>> The only custom options I had were in loader.conf
>>>> 
>>>> kern.geom.label.gptid.enable="0"
>>>> ipmi_load="YES"
>>>> boot_multicons="YES"
>>>> boot_serial="YES"
>>>> console="comconsole,vidconsole"
>>>> net.inet.tcp.tso="0"
>>>> cc_htcp_load="YES"
>>>> autoboot_delay="5"
>>>> hw.mfi.mrsas_enable="1"
>>>> hw.usb.no_pf="1" # Disable USB packet filtering
>>>> hw.usb.no_shutdown_wait="1"
>>>> hw.vga.textmode="1" # Text mode
>>>> machdep.hyperthreading_allowed="0"
>>>> 
>>>> Any ideas ?
>>>> 
>>>> Screen shot here
>>>> https://imgur.com/a/nGvHtIs
>>>> 
>>>> --
>>>> mark saad | nones...@longcount.org
>>> 
>>> Plain text version of the crash
>>> 
>>> Loading kernel...
>>> /boot/kernel/kernel text=0x168d811 data=0x1cf968+0x768c80
>>> syms=[0x8+0x1778e8+0x8   /
>>> +0x194f1d]
>>> Loading configured modules...
>>> /boot/kernel/ipmi.ko size 0x11e10 at 0x2645000
>>> loading required module 'smbus'
>>> /boot/kernel/smbus.ko size 0x2ef0 at 0x2657000
>>> /boot/entropy size=0x1000
>>> /boot/kernel/cc_httcp.ko size 0x2330 at 0x265b000
>>> ---<>---c_hmodule 'smbus'
>>> Copyright (c) 1992-2019 The FreeBSD Project.
>>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>> The Regents of the University of California. All rights reserved.
>>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>>> FreeBSD 12.0-STABLE r348693 GENERIC amd64
>>> FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on
>>> LLVM 8.0.0)
>>> panic: UMA zone "UMA Zones": Increase vm.boot_pages
>>> cpuid = 0
>>> time = 1
>>> KDB: stack backtrace:
>>> #0 0x80c16df7 at ??+0
>>> #1 0x80bcaccd at ??+0
>>> #2 0x80bcab23 at ??+0
>>> #3 0x80f0b03c at ??+0
>>> #4 0x80f08d8d at ??+0
>>> #5 0x80f0bb3d at ??+0
>>> #6 0x80f0b301 at ??+0
>>> #7 0x80f0b3d1 at ??+0
>>> #8 0x80f066c4 at ??+0
>>> #9 0x80f0543f at ??+0
>>> #10 0x80f23aef at ??+0
>>> #11 0x80f1133b at ??+0
>>> #12 0x80b619c8 at ??+0
>>> #13 0x8036a02c at ??+0
>>> Uptime: 1s
>>> 
>>> 
>>> Also, increasing vm.boot_pages to 128 in the loader works. Does anyone
>>> know why? This box has 64G of RAM.
>>> 
>>> --
>>> mark saad | nones...@longcount.org
>> So after some poking in the BIOS, this has to do with how the Dell NUMA
>> options are set. If the system is set to Cluster On Die mode, you get a
>> kernel panic.
>> With Home Snoop or Early Snoop there is no issue.
> 
> Hi Mark,
> 
> Could you report this bug (Bugzilla) if you haven't already, providing:
> 
> - exact freebsd version(s) reproducible with
> - panic/backtrace output as an attachment. Ideally with a debug kernel
> - /var/run/dmesg.boot output (as an attachment) in a verbose boot
> - if you can test a current snapshot, that would be great
> - any other system information you believe might be helpful in isolating root 
> cause(s) or potential fixes
> 
> Thanks!
> Feel free to CC me on it


Re: Kernel panic on 12-STABLE-r348203 amd64

2019-06-10 Thread Kubilay Kocak

On 6/06/2019 5:04 am, Mark Saad wrote:
> On Wed, Jun 5, 2019 at 2:42 PM Mark Saad  wrote:
>>
>> On Wed, Jun 5, 2019 at 12:29 PM Mark Saad  wrote:
>>>
>>> All
>>>  I was wondering if anyone could shed some light on this boot panic I
>>> saw yesterday. This is on a Dell R630 with Bios 2.9.1  booting
>>> 12.0-STABLE-r348203 amd64.
>>> I reverted this back to 12.0-RELEASE-p4 and it's fine.
>>>
>>> The only custom options I had were in loader.conf
>>>
>>> kern.geom.label.gptid.enable="0"
>>> ipmi_load="YES"
>>> boot_multicons="YES"
>>> boot_serial="YES"
>>> console="comconsole,vidconsole"
>>> net.inet.tcp.tso="0"
>>> cc_htcp_load="YES"
>>> autoboot_delay="5"
>>> hw.mfi.mrsas_enable="1"
>>> hw.usb.no_pf="1" # Disable USB packet filtering
>>> hw.usb.no_shutdown_wait="1"
>>> hw.vga.textmode="1" # Text mode
>>> machdep.hyperthreading_allowed="0"
>>>
>>> Any ideas ?
>>>
>>> Screen shot here
>>> https://imgur.com/a/nGvHtIs
>>>
>>> --
>>> mark saad | nones...@longcount.org
>>
>> Plain text version of the crash
>>
>> Loading kernel...
>> /boot/kernel/kernel text=0x168d811 data=0x1cf968+0x768c80
>> syms=[0x8+0x1778e8+0x8   /
>> +0x194f1d]
>> Loading configured modules...
>> /boot/kernel/ipmi.ko size 0x11e10 at 0x2645000
>> loading required module 'smbus'
>> /boot/kernel/smbus.ko size 0x2ef0 at 0x2657000
>> /boot/entropy size=0x1000
>> /boot/kernel/cc_httcp.ko size 0x2330 at 0x265b000
>> ---<>---c_hmodule 'smbus'
>> Copyright (c) 1992-2019 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>  The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 12.0-STABLE r348693 GENERIC amd64
>> FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on
>> LLVM 8.0.0)
>> panic: UMA zone "UMA Zones": Increase vm.boot_pages
>> cpuid = 0
>> time = 1
>> KDB: stack backtrace:
>> #0 0x80c16df7 at ??+0
>> #1 0x80bcaccd at ??+0
>> #2 0x80bcab23 at ??+0
>> #3 0x80f0b03c at ??+0
>> #4 0x80f08d8d at ??+0
>> #5 0x80f0bb3d at ??+0
>> #6 0x80f0b301 at ??+0
>> #7 0x80f0b3d1 at ??+0
>> #8 0x80f066c4 at ??+0
>> #9 0x80f0543f at ??+0
>> #10 0x80f23aef at ??+0
>> #11 0x80f1133b at ??+0
>> #12 0x80b619c8 at ??+0
>> #13 0x8036a02c at ??+0
>> Uptime: 1s
>>
>>
>> Also, increasing vm.boot_pages to 128 in the loader works. Does anyone
>> know why? This box has 64G of RAM.
>>
>> --
>> mark saad | nones...@longcount.org
>
> So after some poking in the BIOS, this has to do with how the Dell NUMA
> options are set. If the system is set to Cluster On Die mode, you get a
> kernel panic.
> With Home Snoop or Early Snoop there is no issue.




Hi Mark,

Could you report this bug (Bugzilla) if you haven't already, providing:

- exact FreeBSD version(s) it is reproducible with
- panic/backtrace output as an attachment, ideally from a debug kernel
- /var/run/dmesg.boot output (as an attachment) from a verbose boot
- if you can test a current snapshot, that would be great
- any other system information you believe might be helpful in isolating
  the root cause(s) or potential fixes
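
A minimal sketch of gathering the basics from the list above into one directory, assuming a POSIX shell. The OUT directory and the output file names are placeholders chosen for illustration, not anything Bugzilla requires, and the script only copies files that exist on the machine it runs on.

```sh
#!/bin/sh
# Sketch: collect version and boot information for a bug report.
# OUT is a hypothetical scratch location; override it in the environment.
OUT="${OUT:-$(mktemp -d)}"
mkdir -p "$OUT"

# Exact version string of the running system.
uname -a > "$OUT/uname.txt"

# Boot messages. The file only exists on FreeBSD, so skip it elsewhere.
# For a verbose capture, boot once with boot_verbose="YES" in
# /boot/loader.conf before copying it.
if [ -r /var/run/dmesg.boot ]; then
    cp /var/run/dmesg.boot "$OUT/dmesg.boot.txt"
fi

# Loader settings are often relevant to early-boot panics.
if [ -r /boot/loader.conf ]; then
    cp /boot/loader.conf "$OUT/loader.conf.txt"
fi

ls "$OUT"
```

The panic text itself usually has to come from a serial console capture or a screenshot, since the system never gets far enough to write it to disk.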


Thanks!
Feel free to CC me on it
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Kernel panic on 12-STABLE-r348203 amd64

2019-06-05 Thread Mark Saad
On Wed, Jun 5, 2019 at 2:42 PM Mark Saad  wrote:
>
> On Wed, Jun 5, 2019 at 12:29 PM Mark Saad  wrote:
> >
> > All
> >  I was wondering if anyone could shed some light on this boot panic I
> > saw yesterday. This is on a Dell R630 with Bios 2.9.1  booting
> > 12.0-STABLE-r348203 amd64.
> > I reverted this back to 12.0-RELEASE-p4 and its fine .
> >
> > The only custom options I had were in loader.conf
> >
> > kern.geom.label.gptid.enable="0"
> > ipmi_load="YES"
> > boot_multicons="YES"
> > boot_serial="YES"
> > console="comconsole,vidconsole"
> > net.inet.tcp.tso="0"
> > cc_htcp_load="YES"
> > autoboot_delay="5"
> > hw.mfi.mrsas_enable="1"
> > hw.usb.no_pf="1"# Disable USB packet filtering
> > hw.usb.no_shutdown_wait="1"
> > hw.vga.textmode="1" # Text mode
> > machdep.hyperthreading_allowed="0"
> >
> > Any ideas ?
> >
> > Screen shot here
> > https://imgur.com/a/nGvHtIs
> >
> > --
> > mark saad | nones...@longcount.org
>
> Plain text version of the crash
>
> Loading kernel...
> /boot/kernel/kernel text=0x168d811 data=0x1cf968+0x768c80
> syms=[0x8+0x1778e8+0x8   /
> +0x194f1d]
> Loading configured modules...
> /boot/kernel/ipmi.ko size 0x11e10 at 0x2645000
> loading required module 'smbus'
> /boot/kernel/smbus.ko size 0x2ef0 at 0x2657000
> /boot/entropy size=0x1000
> /boot/kernel/cc_httcp.ko size 0x2330 at 0x265b000
> ---<<BOOT>>---c_hmodule 'smbus'
> Copyright (c) 1992-2019 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 12.0-STABLE r348693 GENERIC amd64
> FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on
> LLVM 8.0.0)
> panic: UMA zone "UMA Zones": Increase vm.boot_pages
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> #0 0x80c16df7 at ??+0
> #1 0x80bcaccd at ??+0
> #2 0x80bcab23 at ??+0
> #3 0x80f0b03c at ??+0
> #4 0x80f08d8d at ??+0
> #5 0x80f0bb3d at ??+0
> #6 0x80f0b301 at ??+0
> #7 0x80f0b3d1 at ??+0
> #8 0x80f066c4 at ??+0
> #9 0x80f0543f at ??+0
> #10 0x80f23aef at ??+0
> #11 0x80f1133b at ??+0
> #12 0x80b619c8 at ??+0
> #13 0x8036a02c at ??+0
> Uptime: 1s
>
>
> Also increasing the vm.boot_pages to 128 in the loader works. Anyone
> know why ? This box has 64G ram.
>
> --
> mark saad | nones...@longcount.org

So after some poking in the BIOS, this has to do with how the Dell NUMA
options are set. If the system is set to Cluster On Die mode, you get a
kernel panic. With Home Snoop or Early Snoop there is no issue.


-- 
mark saad | nones...@longcount.org


Re: Kernel panic on 12-STABLE-r348203 amd64

2019-06-05 Thread Mark Saad
On Wed, Jun 5, 2019 at 12:29 PM Mark Saad  wrote:
>
> All
>  I was wondering if anyone could shed some light on this boot panic I
> saw yesterday. This is on a Dell R630 with Bios 2.9.1  booting
> 12.0-STABLE-r348203 amd64.
> I reverted this back to 12.0-RELEASE-p4 and its fine .
>
> The only custom options I had were in loader.conf
>
> kern.geom.label.gptid.enable="0"
> ipmi_load="YES"
> boot_multicons="YES"
> boot_serial="YES"
> console="comconsole,vidconsole"
> net.inet.tcp.tso="0"
> cc_htcp_load="YES"
> autoboot_delay="5"
> hw.mfi.mrsas_enable="1"
> hw.usb.no_pf="1"# Disable USB packet filtering
> hw.usb.no_shutdown_wait="1"
> hw.vga.textmode="1" # Text mode
> machdep.hyperthreading_allowed="0"
>
> Any ideas ?
>
> Screen shot here
> https://imgur.com/a/nGvHtIs
>
> --
> mark saad | nones...@longcount.org

Plain text version of the crash

Loading kernel...
/boot/kernel/kernel text=0x168d811 data=0x1cf968+0x768c80
syms=[0x8+0x1778e8+0x8   /
+0x194f1d]
Loading configured modules...
/boot/kernel/ipmi.ko size 0x11e10 at 0x2645000
loading required module 'smbus'
/boot/kernel/smbus.ko size 0x2ef0 at 0x2657000
/boot/entropy size=0x1000
/boot/kernel/cc_httcp.ko size 0x2330 at 0x265b000
---<<BOOT>>---c_hmodule 'smbus'
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-STABLE r348693 GENERIC amd64
FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on
LLVM 8.0.0)
panic: UMA zone "UMA Zones": Increase vm.boot_pages
cpuid = 0
time = 1
KDB: stack backtrace:
#0 0x80c16df7 at ??+0
#1 0x80bcaccd at ??+0
#2 0x80bcab23 at ??+0
#3 0x80f0b03c at ??+0
#4 0x80f08d8d at ??+0
#5 0x80f0bb3d at ??+0
#6 0x80f0b301 at ??+0
#7 0x80f0b3d1 at ??+0
#8 0x80f066c4 at ??+0
#9 0x80f0543f at ??+0
#10 0x80f23aef at ??+0
#11 0x80f1133b at ??+0
#12 0x80b619c8 at ??+0
#13 0x8036a02c at ??+0
Uptime: 1s


Also, increasing vm.boot_pages to 128 in the loader works. Anyone
know why? This box has 64G of RAM.
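
For anyone wanting to try the same workaround, here is a rough sketch of setting the tunable without creating duplicate entries. The tunable name and the value 128 are from the message above; the set_tunable helper and the LOADER_CONF variable are made up for this example, and by default it writes to a scratch file rather than the real /boot/loader.conf.

```sh
#!/bin/sh
# Sketch: set a loader tunable idempotently in a loader.conf-style file.
# LOADER_CONF defaults to a scratch file so nothing under /boot is touched;
# point it at /boot/loader.conf (as root) to apply the setting for real.
LOADER_CONF="${LOADER_CONF:-$(mktemp)}"

set_tunable() {
    # $1 = file, $2 = tunable name, $3 = value
    # Drop any existing assignment of the tunable, then append the new one.
    # The name is used as a regular expression, which is good enough here.
    tmp="$(mktemp)"
    grep -v "^$2=" "$1" > "$tmp" || true
    printf '%s="%s"\n' "$2" "$3" >> "$tmp"
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

set_tunable "$LOADER_CONF" vm.boot_pages 128
set_tunable "$LOADER_CONF" vm.boot_pages 128   # a second call must not duplicate the line
cat "$LOADER_CONF"
```

The setting only takes effect at the next boot, since the loader reads the file before the kernel starts.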

-- 
mark saad | nones...@longcount.org


Kernel panic on 12-STABLE-r348203 amd64

2019-06-05 Thread Mark Saad
All,
I was wondering if anyone could shed some light on this boot panic I
saw yesterday. This is on a Dell R630 with BIOS 2.9.1 booting
12.0-STABLE-r348203 amd64.
I reverted this to 12.0-RELEASE-p4 and it's fine.

The only custom options I had were in loader.conf:

kern.geom.label.gptid.enable="0"
ipmi_load="YES"
boot_multicons="YES"
boot_serial="YES"
console="comconsole,vidconsole"
net.inet.tcp.tso="0"
cc_htcp_load="YES"
autoboot_delay="5"
hw.mfi.mrsas_enable="1"
hw.usb.no_pf="1" # Disable USB packet filtering
hw.usb.no_shutdown_wait="1"
hw.vga.textmode="1" # Text mode
machdep.hyperthreading_allowed="0"

Any ideas?

Screenshot here:
https://imgur.com/a/nGvHtIs

-- 
mark saad | nones...@longcount.org


Re: efirtc causing panic (was Re: Panic booting 12-RC2 on amd64)

2019-06-03 Thread Jan Martin Mikkelsen
Hi,

This patch resolves the panic when booting without efi.rt.disabled=1 for me.

Thanks!

Jan M.


> On 31 May 2019, at 20:35, Konstantin Belousov  wrote:
> 
> On Fri, May 31, 2019 at 04:19:57PM +0200, Jan Martin Mikkelsen wrote:
>> Hi,
>> 
>> Christian has pointed me at this 
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233534 which he raised 
>> after his email. The workaround was to boot with “efi.rt.disabled=1”. 
>> 
>> I took a closer look at what is going on. The problem is that the EFI
>> rt_gettime call is faulting, and the fault is handled in efirt_support.S and
>> a failure is reported. These messages are in the kernel output:
>>
>> kernel trap 12 with interrupts disabled
>> kernel trap 12 with interrupts disabled
>> EFI rt_gettime call faulted, error 14
>> efirtc0: cannot read EFI realtime clock, error 14
>>
>> So far, so good. The problem is that later in startup the
>> "smp_targeted_tlb_shootdown: interrupts disabled" panic occurs if SMP
>> is enabled. With SMP disabled this does not occur and the system runs.
>>
>> I'm not sure whether this is a BIOS problem (seems likely) or something that
>> could be handled after dealing with the fault in efirt_support.S.
>> 
>> While looking I found the code below that looks wrong in efi_enter(), but 
>> that is not the problem in this case.
>> 
>> Just adding this to the archive in case someone else looks more closely 
>> later.
> 
> Try this.  Only compile-time tested.
> 
> diff --git a/sys/amd64/amd64/efirt_support.S b/sys/amd64/amd64/efirt_support.S
> index cd578eddcfb..b54b13b01fe 100644
> --- a/sys/amd64/amd64/efirt_support.S
> +++ b/sys/amd64/amd64/efirt_support.S
> @@ -47,6 +47,9 @@ ENTRY(efi_rt_arch_call)
>  	movq	%r13, EC_R13(%rdi)
>  	movq	%r14, EC_R14(%rdi)
>  	movq	%r15, EC_R15(%rdi)
> +	pushfq
> +	popq	%rax
> +	movq	%rax, EC_RFLAGS(%rdi)
>  	movq	PCPU(CURTHREAD), %rax
>  	movq	%rdi, TD_MD+MD_EFIRT_TMP(%rax)
>  	movq	PCPU(CURPCB), %rsi
> @@ -98,6 +101,8 @@ efi_rt_arch_call_tail:
>  	movq	EC_RBP(%rdi), %rbp
>  	movq	EC_RSP(%rdi), %rsp
>  	movq	EC_RBX(%rdi), %rbx
> +	pushq	EC_RFLAGS(%rdi)
> +	popfq
>  
>  	popq	%rbp
>  	ret
> diff --git a/sys/amd64/amd64/genassym.c b/sys/amd64/amd64/genassym.c
> index de3969734a1..2e81b823262 100644
> --- a/sys/amd64/amd64/genassym.c
> +++ b/sys/amd64/amd64/genassym.c
> @@ -272,3 +272,4 @@ ASSYM(EC_R12, offsetof(struct efirt_callinfo, ec_r12));
> ASSYM(EC_R13, offsetof(struct efirt_callinfo, ec_r13));
> ASSYM(EC_R14, offsetof(struct efirt_callinfo, ec_r14));
> ASSYM(EC_R15, offsetof(struct efirt_callinfo, ec_r15));
> +ASSYM(EC_RFLAGS, offsetof(struct efirt_callinfo, ec_rflags));
> diff --git a/sys/amd64/include/efi.h b/sys/amd64/include/efi.h
> index 082223792ac..e630a338c17 100644
> --- a/sys/amd64/include/efi.h
> +++ b/sys/amd64/include/efi.h
> @@ -72,6 +72,7 @@ struct efirt_callinfo {
>  	register_t	ec_r13;
>  	register_t	ec_r14;
>  	register_t	ec_r15;
> +	register_t	ec_rflags;
>  };
>  
>  #endif /* __AMD64_INCLUDE_EFI_H_ */



Re: efirtc causing panic (was Re: Panic booting 12-RC2 on amd64)

2019-05-31 Thread Konstantin Belousov
On Fri, May 31, 2019 at 04:19:57PM +0200, Jan Martin Mikkelsen wrote:
> Hi,
> 
> Christian has pointed me at this 
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233534 which he raised 
> after his email. The workaround was to boot with “efi.rt.disabled=1”. 
> 
> I took a closer look at what is going on. The problem is that the EFI
> rt_gettime call is faulting, and the fault is handled in efirt_support.S and
> a failure is reported. These messages are in the kernel output:
>
> kernel trap 12 with interrupts disabled
> kernel trap 12 with interrupts disabled
> EFI rt_gettime call faulted, error 14
> efirtc0: cannot read EFI realtime clock, error 14
>
> So far, so good. The problem is that later in startup the
> "smp_targeted_tlb_shootdown: interrupts disabled" panic occurs if SMP is
> enabled. With SMP disabled this does not occur and the system runs.
>
> I'm not sure whether this is a BIOS problem (seems likely) or something that
> could be handled after dealing with the fault in efirt_support.S.
> 
> While looking I found the code below that looks wrong in efi_enter(), but 
> that is not the problem in this case.
> 
> Just adding this to the archive in case someone else looks more closely later.

Try this.  Only compile-time tested.

diff --git a/sys/amd64/amd64/efirt_support.S b/sys/amd64/amd64/efirt_support.S
index cd578eddcfb..b54b13b01fe 100644
--- a/sys/amd64/amd64/efirt_support.S
+++ b/sys/amd64/amd64/efirt_support.S
@@ -47,6 +47,9 @@ ENTRY(efi_rt_arch_call)
 	movq	%r13, EC_R13(%rdi)
 	movq	%r14, EC_R14(%rdi)
 	movq	%r15, EC_R15(%rdi)
+	pushfq
+	popq	%rax
+	movq	%rax, EC_RFLAGS(%rdi)
 	movq	PCPU(CURTHREAD), %rax
 	movq	%rdi, TD_MD+MD_EFIRT_TMP(%rax)
 	movq	PCPU(CURPCB), %rsi
@@ -98,6 +101,8 @@ efi_rt_arch_call_tail:
 	movq	EC_RBP(%rdi), %rbp
 	movq	EC_RSP(%rdi), %rsp
 	movq	EC_RBX(%rdi), %rbx
+	pushq	EC_RFLAGS(%rdi)
+	popfq
 
 	popq	%rbp
 	ret
diff --git a/sys/amd64/amd64/genassym.c b/sys/amd64/amd64/genassym.c
index de3969734a1..2e81b823262 100644
--- a/sys/amd64/amd64/genassym.c
+++ b/sys/amd64/amd64/genassym.c
@@ -272,3 +272,4 @@ ASSYM(EC_R12, offsetof(struct efirt_callinfo, ec_r12));
 ASSYM(EC_R13, offsetof(struct efirt_callinfo, ec_r13));
 ASSYM(EC_R14, offsetof(struct efirt_callinfo, ec_r14));
 ASSYM(EC_R15, offsetof(struct efirt_callinfo, ec_r15));
+ASSYM(EC_RFLAGS, offsetof(struct efirt_callinfo, ec_rflags));
diff --git a/sys/amd64/include/efi.h b/sys/amd64/include/efi.h
index 082223792ac..e630a338c17 100644
--- a/sys/amd64/include/efi.h
+++ b/sys/amd64/include/efi.h
@@ -72,6 +72,7 @@ struct efirt_callinfo {
 	register_t	ec_r13;
 	register_t	ec_r14;
 	register_t	ec_r15;
+	register_t	ec_rflags;
 };
 
 #endif /* __AMD64_INCLUDE_EFI_H_ */


efirtc causing panic (was Re: Panic booting 12-RC2 on amd64)

2019-05-31 Thread Jan Martin Mikkelsen
Hi,

Christian has pointed me at
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233534, which he raised after
his email. The workaround was to boot with "efi.rt.disabled=1".

I took a closer look at what is going on. The EFI rt_gettime call is
faulting; the fault is handled in efirt_support.S and a failure is
reported. These messages are in the kernel output:

kernel trap 12 with interrupts disabled
kernel trap 12 with interrupts disabled
EFI rt_gettime call faulted, error 14
efirtc0: cannot read EFI realtime clock, error 14

So far, so good. The problem is that later in startup the
"smp_targeted_tlb_shootdown: interrupts disabled" panic occurs if SMP is
enabled. With SMP disabled this does not occur and the system runs.
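
As a rough way to check whether a captured console log shows the same pattern, something like the following could be used. It is only a sketch: the sample log reuses lines quoted in this thread, the LOG file is a scratch file created for the example, and a real capture would have to come from a serial console or a transcribed screenshot.

```sh
#!/bin/sh
# Sketch: look for the EFI RTC fault and the later TLB shootdown panic
# in a saved console log. LOG here is a scratch file filled with lines
# taken from this thread.
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
kernel trap 12 with interrupts disabled
kernel trap 12 with interrupts disabled
EFI rt_gettime call faulted, error 14
efirtc0: cannot read EFI realtime clock, error 14
panic: smp_targeted_tlb_shootdown: interrupts disabled
EOF

# grep -c prints the number of matching lines (0 when there are none).
efi_fault=$(grep -c 'EFI rt_gettime call faulted' "$LOG")
tlb_panic=$(grep -c 'panic: smp_targeted_tlb_shootdown: interrupts disabled' "$LOG")

if [ "$efi_fault" -gt 0 ] && [ "$tlb_panic" -gt 0 ]; then
    verdict="matches: EFI RTC fault followed by TLB shootdown panic"
else
    verdict="no match"
fi
echo "$verdict"
```

A match only says the symptoms look alike; it does not prove the same root cause.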

I'm not sure whether this is a BIOS problem (seems likely) or something that
could be handled after dealing with the fault in efirt_support.S.

While looking, I found the code below in efi_enter() that looks wrong, but that
is not the problem in this case.

Just adding this to the archive in case someone else looks more closely later.

Regards,

Jan M.


--- a/src/sys/dev/efidev/efirt.c	2018-11-19 15:43:47.000000000 +1100
+++ b/src/sys/dev/efidev/efirt.c	2018-11-19 15:43:47.000000000 +1100
@@ -245,6 +245,7 @@
 static int
 efi_enter(void)
 {
+	int error;
 	struct thread *td;
 	pmap_t curpmap;
 
@@ -255,7 +256,14 @@
 	PMAP_LOCK(curpmap);
 	mtx_lock(&efi_lock);
 	fpu_kern_enter(td, NULL, FPU_KERN_NOCTX);
-	return (efi_arch_enter());
+	error = efi_arch_enter();
+	if (error != 0) {
+		fpu_kern_leave(td, NULL);
+		mtx_unlock(&efi_lock);
+		PMAP_UNLOCK(curpmap);
+	}
+
+	return (error);
 }
 
 static void


> On 31 May 2019, at 12:26, Jan Martin Mikkelsen  
> wrote:
> 
> Hi,
> 
> I see exactly the same stacktrace on a Celeron J1900 based system with 
> 12.0-p5 when using a UEFI boot. With a non-UEFI boot it works fine (except vt 
> not working until the new 915kms.ko is loaded). With safe mode on it also 
> works fine.
> 
> Did you find any more information?
> 
> Regards,
> 
> Jan.
> 
>> On 25 Nov 2018, at 19:26, Christian Ullrich  wrote:
>> 
>> Hello,
>> 
>> I have a reproducible panic booting 12-RC2 and stable/12, 2cf4a7e0d8 
>> from Friday, on a Jetway JNF9HG board, Celeron N2930 CPU, booting with 
>> UEFI. The same box has no problems with stable/11 18f83cbbc9 from Thursday.
>> 
>> There is no serial console on the box right now, but the last screenful 
>> of boot output is this (from the -RC2; the panic'ing symbol is the same 
>> with the stable/12 kernel):
>> 
>> random: entropy device external interface
>> kbd1 at kbdmux0
>> netmap: loaded module
>> [ath_hal] loaded
>> module_register_init: MOD_LOAD (vesa, 0x810f8750, 0) error 19
>> random: registering fast source Intel Secure Key RNG
>> random: fast provider: "Intel Secure Key RNG"
>> nexus0
>> kernel trap 12 with interrupts disabled
>> kernel trap 12 with interrupts disabled
>> cryptosoft0: <software crypto> on motherboard
>> acpi0: <_> on motherboard
>> panic: smp_targeted_tlb_shootdown: interrupts disabled
>> cpuid = 2
>> time = 1
>> KDB: stack backtrace:
>> #0 0x80be74a7 at kdb_backtrace+0x67
>> #1 0x80b9b093 at vpanic+0x1a3
>> #2 0x80b9aee3 at panic+0x43
>> #3 0x811eda2f at smp_targeted_tlb_shootdown+0x40f
>> #4 0x811ed60d at smp_masked_invltlb+0x3d
>> #5 0x8105d5c5 at pmap_invalidate_range+0x1b5
>> #6 0x8106a429 at pmap_change_attr_locked+0x859
>> #7 0x81069804 at pmap_mapdev_internal+0x424
>> #8 0x81075ed0 at pcie_cfgregopen+0x60
>> #9 0x80451f10 at acpi_attach+0x390
>> #10 0x80bd6efc at device_attach+0x3ec
>> #11 0x80bd81dc at bus_generic_attach+0x5c
>> #12 0x80bd6efc at device_attach+0x3ec  [sic!]
>> #13 0x80bd88b8 at bus_generic_new_pass+0x118
>> #14 0x80bda577 at root_bus_configure+0x77
>> #15 0x811dbce9 at configure+0x9
>> #16 0x80b31a78 at mi_startup+0x118
>> #17 0x8034102c at btext+0x2c
>> Uptime: 1s
>> Automatic reboot in 15 seconds - press a key on the console to abort
>> 
>> If it matters, the build from svn was with CPUTYPE=slm, the -RC2 is 
>> FreeBSD-12.0-RC2-amd64-mini-memstick.img, i.e. without CPUTYPE. I have 
>> been running stable/11 with CPUTYPE=slm on this and other identical CPUs 
>> for a long time with no trouble, so I think it is unrelated.
>> 
>> I'd really like to upgrade to 12. If anyone can suggest

Re: Panic booting 12-RC2 on amd64

2019-05-31 Thread Jan Martin Mikkelsen
Hi,

I see exactly the same stacktrace on a Celeron J1900 based system with 12.0-p5 
when using a UEFI boot. With a non-UEFI boot it works fine (except vt not 
working until the new 915kms.ko is loaded). With safe mode on it also works 
fine.

Did you find any more information?

Regards,

Jan.

> On 25 Nov 2018, at 19:26, Christian Ullrich  wrote:
> 
> Hello,
> 
> I have a reproducible panic booting 12-RC2 and stable/12, 2cf4a7e0d8 
> from Friday, on a Jetway JNF9HG board, Celeron N2930 CPU, booting with 
> UEFI. The same box has no problems with stable/11 18f83cbbc9 from Thursday.
> 
> There is no serial console on the box right now, but the last screenful 
> of boot output is this (from the -RC2; the panic'ing symbol is the same 
> with the stable/12 kernel):
> 
> random: entropy device external interface
> kbd1 at kbdmux0
> netmap: loaded module
> [ath_hal] loaded
> module_register_init: MOD_LOAD (vesa, 0x810f8750, 0) error 19
> random: registering fast source Intel Secure Key RNG
> random: fast provider: "Intel Secure Key RNG"
> nexus0
> kernel trap 12 with interrupts disabled
> kernel trap 12 with interrupts disabled
> cryptosoft0: <software crypto> on motherboard
> acpi0: <_> on motherboard
> panic: smp_targeted_tlb_shootdown: interrupts disabled
> cpuid = 2
> time = 1
> KDB: stack backtrace:
> #0 0xffff80be74a7 at kdb_backtrace+0x67
> #1 0x80b9b093 at vpanic+0x1a3
> #2 0x80b9aee3 at panic+0x43
> #3 0x811eda2f at smp_targeted_tlb_shootdown+0x40f
> #4 0x811ed60d at smp_masked_invltlb+0x3d
> #5 0x8105d5c5 at pmap_invalidate_range+0x1b5
> #6 0x8106a429 at pmap_change_attr_locked+0x859
> #7 0x81069804 at pmap_mapdev_internal+0x424
> #8 0x81075ed0 at pcie_cfgregopen+0x60
> #9 0x80451f10 at acpi_attach+0x390
> #10 0x80bd6efc at device_attach+0x3ec
> #11 0x80bd81dc at bus_generic_attach+0x5c
> #12 0x80bd6efc at device_attach+0x3ec  [sic!]
> #13 0x80bd88b8 at bus_generic_new_pass+0x118
> #14 0x80bda577 at root_bus_configure+0x77
> #15 0x811dbce9 at configure+0x9
> #16 0x80b31a78 at mi_startup+0x118
> #17 0x8034102c at btext+0x2c
> Uptime: 1s
> Automatic reboot in 15 seconds - press a key on the console to abort
> 
> If it matters, the build from svn was with CPUTYPE=slm, the -RC2 is 
> FreeBSD-12.0-RC2-amd64-mini-memstick.img, i.e. without CPUTYPE. I have 
> been running stable/11 with CPUTYPE=slm on this and other identical CPUs 
> for a long time with no trouble, so I think it is unrelated.
> 
> I'd really like to upgrade to 12. If anyone can suggest something I can 
> try, I'll be happy to do experiments.
> 
> -- 
> Christian



Re: 12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-16 Thread Rob Belics
This morning, RELEASE-p5 came about. I did a freebsd-update without issue.
However, as I said, I am running amd64 and not i386 on my server. So there
must be something more involved here.


[Bug 193361] [panic] panic pagedaemon

2019-05-16 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193361

Gleb Popov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arr...@freebsd.org
         Resolution|---                         |Overcome By Events
             Status|New                         |Closed

--- Comment #3 from Gleb Popov  ---
This FreeBSD version is not supported anymore, and the bug report doesn't seem
to actually contain any useful information.

Closing it.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug 193366] [panic] panic vnlru

2019-05-16 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193366

Konstantin Belousov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|New                         |Closed
         Resolution|---                         |Overcome By Events
                 CC|                            |k...@freebsd.org

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug 193366] [panic] panic vnlru

2019-05-16 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193366

Gleb Popov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arr...@freebsd.org

--- Comment #1 from Gleb Popov  ---
This FreeBSD version is not supported anymore, and the bug report doesn't seem
to actually contain any useful information. Should we close it?

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Re: 12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-16 Thread Rob Belics
Possibly same issue on amd64 server in my VPS but my laptop updated just
fine.

vm_fault_hold: fault on nofault entry, addr: 0


Re: 12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-15 Thread Ed Maste
On Wed, 15 May 2019 at 00:03, Ed Maste  wrote:
>
> On Wed, 15 May 2019 at 15:09, wintellect Auser  
> wrote:
> >
> > Hi all,
> >
> > Wanted to make you aware of an issue I have encountered, sorry if this is
> > the wrong list.
>
> This is the right place and thank you for reporting. Looking into it.

It looks like a new update for 12.0 i386 will be needed and will be
rolled out as soon as possible.


Re: 12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-15 Thread Ed Maste
On Wed, 15 May 2019 at 15:09, wintellect Auser  wrote:
>
> Hi all,
>
> Wanted to make you aware of an issue I have encountered, sorry if this is
> the wrong list.

This is the right place and thank you for reporting. Looking into it.


Re: 12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-15 Thread Dan Langille
> Hi all,
> 
> Wanted to make you aware of an issue I have encountered, sorry if this is
> the wrong list.
> 
> I upgraded from FreeBSD 12.0-RELEASE-p3 to p4 using:
> 
> freebsd-update fetch
> freebsd-update install
> 
> and use the GENERIC kernel. Upon reboot the system kernel panics when
> attempting to mount the filesystem read-write. This also happens in
> single-user mode if selected at boot.
> 
> Selecting the kernel.old from the boot menu boots the system with 12-p3 and
> all works fine. I have uploaded a screenshot here:
> 
> https://imagebin.ca/v/4hCc2Kk5YqCX
> 
> The computer is an i386 system.

I also upgraded using "freebsd-update fetch install".

I also went from -p3 to -p4 on an i386.

My screen shot is here: https://twitter.com/DLangille/status/1128734141569208320

Hope this helps.



12.0-RELEASE-p4 kernel panic on boot

2019-05-15 Thread wintellect Auser
Hi all,

Wanted to make you aware of an issue I have encountered, sorry if this is
the wrong list.

I upgraded from FreeBSD 12.0-RELEASE-p3 to p4 using:

freebsd-update fetch
freebsd-update install

and use the GENERIC kernel. Upon reboot the system kernel panics when
attempting to mount the filesystem read-write. This also happens in
single-user mode if selected at boot.

Selecting the kernel.old from the boot menu boots the system with 12-p3 and
all works fine. I have attached a screenshot.

The computer is an i386 system.

Thanks

wintellect


12.0-RELEASE-p4 kernel panic on i386 boot

2019-05-15 Thread wintellect Auser
Hi all,

Wanted to make you aware of an issue I have encountered, sorry if this is
the wrong list.

I upgraded from FreeBSD 12.0-RELEASE-p3 to p4 using:

freebsd-update fetch
freebsd-update install

and use the GENERIC kernel. Upon reboot the system kernel panics when
attempting to mount the filesystem read-write. This also happens in
single-user mode if selected at boot.

Selecting the kernel.old from the boot menu boots the system with 12-p3 and
all works fine. I have uploaded a screenshot here:

https://imagebin.ca/v/4hCc2Kk5YqCX

The computer is an i386 system.

Thanks

wintellect


Re: Panic during reboot involving softclock_call_cc(), nd6_timer() and nd6_dad_start()

2019-04-17 Thread Trond Endrestøl
On Wed, 17 Apr 2019 12:05+0200, Trond Endrestøl wrote:

> On Wed, 17 Apr 2019 12:41+0300, Andrey V. Elsukov wrote:
> 
> > On 15.04.2019 16:31, Trond Endrestøl wrote:
> > > Has anyone else witnessed a panic during reboot involving 
> > > softclock_call_cc(), nd6_timer(), and nd6_dad_start()?
> > > 
> > > The stack trace goes more or less like this:
> > > 
> > > db_trace_self_wrapper()
> > > vpanic()
> > > panic()
> > > trap_fatal()
> > > trap()
> > > calltrap()
> > > nd6_dad_start()
> > > nd6_timer()
> > > softclock_call_cc()
> > > softclock()
> > > ithread_loop()
> > > fork_exit()
> > > fork_trampoline()
> > > 
> > > This was last seen while transitioning from r345628 to r346220 on 
> > > amd64 stable/12.
> > 
> > Hi,
> > 
> > do you have exact panic message and/or backtrace from core dump?
> 
> Here's another system I had to shut down recently:
> 
> root@HOSTNAME:/var/crash # kgdb /boot/kernel/kernel vmcore.0
> [...]
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x410
> fault code  = supervisor read data  , page not present
> instruction pointer = 0x20:0x807ea33d
> stack pointer   = 0x28:0xfe005ad3c8d0
> frame pointer   = 0x28:0xfe005ad3c960
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 12 (swi4: clock (0))
> trap number = 12
> panic: page fault
> cpuid = 0
> time = 1555402802
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x8054125b = 
> db_trace_self_wrapper+0x2b/frame 0xfe005ad3c570
> vpanic() at 0x8080aae4 = vpanic+0x1b4/frame 0xfe005ad3c5d0
> panic() at 0x8080a923 = panic+0x43/frame 0xfe005ad3c630
> trap_fatal() at 0x80b76244 = trap_fatal+0x394/frame 0xfe005ad3c690
> trap_pfault() at 0x80b762a9 = trap_pfault+0x49/frame 
> 0xfe005ad3c6f0
> trap() at 0x80b7588f = trap+0x29f/frame 0xfe005ad3c800
> calltrap() at 0x80b514c5 = calltrap+0x8/frame 0xfe005ad3c800
> --- trap 0xc, rip = 0x807ea33d, rsp = 0xfe005ad3c8d0, rbp = 
> 0xfe005ad3c960 ---
> __mtx_lock_sleep() at 0x807ea33d = __mtx_lock_sleep+0xbd/frame 
> 0xfe005ad3c960
> mld_fasttimo() at 0x80a3ae32 = mld_fasttimo+0x492/frame 
> 0xfe005ad3ca50
> pffasttimo() at 0x80899fa4 = pffasttimo+0x54/frame 0xfe005ad3ca80
> softclock_call_cc() at 0x80824e0e = softclock_call_cc+0x12e/frame 
> 0xfe005ad3cb30
> softclock() at 0x808252f9 = softclock+0x79/frame 0xfe005ad3cb50
> ithread_loop() at 0x807cd824 = ithread_loop+0x1d4/frame 
> 0xfe005ad3cbb0
> fork_exit() at 0x807ca2d3 = fork_exit+0x83/frame 0xfe005ad3cbf0
> fork_trampoline() at 0x80b524be = fork_trampoline+0xe/frame 
> 0xfe005ad3cbf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> Uptime: 34d16h8m2s
> Dumping 4593 out of 12258 
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> This particular system uses lagg0 comprised of bce0, bce1, em0, and 
> em1. Also, it runs a custom kernel.
> 
> > It would be good to submit PR about such problems.
> 
> I'll submit the details in a PR.

PR is 237329.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237329

-- 
Trond.


Re: Panic during reboot involving softclock_call_cc(), nd6_timer() and nd6_dad_start()

2019-04-17 Thread Trond Endrestøl
On Wed, 17 Apr 2019 12:41+0300, Andrey V. Elsukov wrote:

> On 15.04.2019 16:31, Trond Endrestøl wrote:
> > Has anyone else witnessed a panic during reboot involving 
> > softclock_call_cc(), nd6_timer(), and nd6_dad_start()?
> > 
> > The stack trace goes more or less like this:
> > 
> > db_trace_self_wrapper()
> > vpanic()
> > panic()
> > trap_fatal()
> > trap()
> > calltrap()
> > nd6_dad_start()
> > nd6_timer()
> > softclock_call_cc()
> > softclock()
> > ithread_loop()
> > fork_exit()
> > fork_trampoline()
> > 
> > This was last seen while transitioning from r345628 to r346220 on 
> > amd64 stable/12.
> 
> Hi,
> 
> do you have exact panic message and/or backtrace from core dump?

Here's another system I had to shut down recently:

root@HOSTNAME:/var/crash # kgdb /boot/kernel/kernel vmcore.0
[...]
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x807ea33d
stack pointer           = 0x28:0xfe005ad3c8d0
frame pointer           = 0x28:0xfe005ad3c960
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1555402802
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffff8054125b = 
db_trace_self_wrapper+0x2b/frame 0xfe005ad3c570
vpanic() at 0x8080aae4 = vpanic+0x1b4/frame 0xfe005ad3c5d0
panic() at 0x8080a923 = panic+0x43/frame 0xfe005ad3c630
trap_fatal() at 0x80b76244 = trap_fatal+0x394/frame 0xfe005ad3c690
trap_pfault() at 0x80b762a9 = trap_pfault+0x49/frame 0xfe005ad3c6f0
trap() at 0x80b7588f = trap+0x29f/frame 0xfe005ad3c800
calltrap() at 0x80b514c5 = calltrap+0x8/frame 0xfe005ad3c800
--- trap 0xc, rip = 0x807ea33d, rsp = 0xfe005ad3c8d0, rbp = 
0xfe005ad3c960 ---
__mtx_lock_sleep() at 0x807ea33d = __mtx_lock_sleep+0xbd/frame 
0xfe005ad3c960
mld_fasttimo() at 0x80a3ae32 = mld_fasttimo+0x492/frame 
0xfe005ad3ca50
pffasttimo() at 0x80899fa4 = pffasttimo+0x54/frame 0xfe005ad3ca80
softclock_call_cc() at 0x80824e0e = softclock_call_cc+0x12e/frame 
0xfe005ad3cb30
softclock() at 0x808252f9 = softclock+0x79/frame 0xfe005ad3cb50
ithread_loop() at 0x807cd824 = ithread_loop+0x1d4/frame 
0xfe005ad3cbb0
fork_exit() at 0x807ca2d3 = fork_exit+0x83/frame 0xfe005ad3cbf0
fork_trampoline() at 0x80b524be = fork_trampoline+0xe/frame 
0xfe005ad3cbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 34d16h8m2s
Dumping 4593 out of 12258 
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

This particular system uses lagg0 comprised of bce0, bce1, em0, and 
em1. Also, it runs a custom kernel.

> It would be good to submit a PR about such problems.

I'll submit the details in a PR.

-- 
Trond.


Re: Panic during reboot involving softclock_call_cc(), nd6_timer() and nd6_dad_start()

2019-04-17 Thread Andrey V. Elsukov
On 15.04.2019 16:31, Trond Endrestøl wrote:
> Has anyone else witnessed a panic during reboot involving 
> softclock_call_cc(), nd6_timer(), and nd6_dad_start()?
> 
> The stack trace goes more or less like this:
> 
> db_trace_self_wrapper()
> vpanic()
> panic()
> trap_fatal()
> trap()
> calltrap()
> nd6_dad_start()
> nd6_timer()
> softclock_call_cc()
> softclock()
> ithread_loop()
> fork_exit()
> fork_trampoline()
> 
> This was last seen while transitioning from r345628 to r346220 on 
> amd64 stable/12.

Hi,

do you have the exact panic message and/or a backtrace from the core dump?
It would be good to submit a PR about such problems.

-- 
WBR, Andrey V. Elsukov





Re: Panic during reboot involving softclock_call_cc(), nd6_timer() and nd6_dad_start()

2019-04-15 Thread Trond Endrestøl
On Mon, 15 Apr 2019 15:31+0200, Trond Endrestøl wrote:

> Has anyone else witnessed a panic during reboot involving 
> softclock_call_cc(), nd6_timer(), and nd6_dad_start()?
> 
> The stack trace goes more or less like this:
> 
> db_trace_self_wrapper()
> vpanic()
> panic()
> trap_fatal()
> trap()
> calltrap()
> nd6_dad_start()
> nd6_timer()
> softclock_call_cc()
> softclock()
> ithread_loop()
> fork_exit()
> fork_trampoline()
> 
> This was last seen while transitioning from r345628 to r346220 on 
> amd64 stable/12.

The NIC in question is a Chelsio T6225-CR, cxgbe(4), using the cc0 
port only.

-- 
Trond.


Panic during reboot involving softclock_call_cc(), nd6_timer() and nd6_dad_start()

2019-04-15 Thread Trond Endrestøl
Hi,

Has anyone else witnessed a panic during reboot involving 
softclock_call_cc(), nd6_timer(), and nd6_dad_start()?

The stack trace goes more or less like this:

db_trace_self_wrapper()
vpanic()
panic()
trap_fatal()
trap()
calltrap()
nd6_dad_start()
nd6_timer()
softclock_call_cc()
softclock()
ithread_loop()
fork_exit()
fork_trampoline()

This was last seen while transitioning from r345628 to r346220 on 
amd64 stable/12.

-- 
Trond.


Re: more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-02 Thread Miroslav Lachman

Lee Damon wrote on 2019/03/02 01:36:

> On 3/1/19 15:38 , Miroslav Lachman wrote:
>
>> Did you try to boot "safe mode"? (selectable in boot menu).
>
> I completely forgot about safe mode.
>
> Yep. It boots. I'm going to finish the freebsd-update process then
> reboot into safe mode again. I'm out of time to work on this today and
> am only in this lab on Fridays so I'll have to pick up working on this
> problem next Friday.

Glad to know something finally works :)

You can look in /boot/menu-commands.4th; it defines what Safe Mode
disables:



: safemode_enabled? ( -- flag )
	s" kern.smp.disabled" getenv -1 <> dup if
		swap drop ( c-addr flag -- flag )
	then
;

: safemode_enable ( -- )
	s" set kern.smp.disabled=1" evaluate
	s" set hw.ata.ata_dma=0" evaluate
	s" set hw.ata.atapi_dma=0" evaluate
	s" set hw.ata.wc=0" evaluate
	s" set hw.eisa_slots=0" evaluate
	s" set kern.eventtimer.periodic=1" evaluate
	s" set kern.geom.part.check_integrity=0" evaluate
;

You can play with these items one by one to find what is the root cause 
in your case.
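
A minimal sketch of that one-by-one approach, under some assumptions: the 
tunable list is copied from safemode_enable, while the function names 
(safemode_tunables, print_trials) are made up for illustration. It only 
prints what to type at the loader prompt for each trial boot; it does not 
change any setting itself.

```shell
#!/bin/sh
# Tunables that safemode_enable sets, copied from menu-commands.4th.
safemode_tunables() {
    cat <<'EOF'
kern.smp.disabled=1
hw.ata.ata_dma=0
hw.ata.atapi_dma=0
hw.ata.wc=0
hw.eisa_slots=0
kern.eventtimer.periodic=1
kern.geom.part.check_integrity=0
EOF
}

# Print one trial per tunable: the two commands to enter at the loader
# prompt so that only this single safe-mode setting is active for the boot.
print_trials() {
    n=0
    safemode_tunables | while IFS= read -r tunable; do
        n=$((n + 1))
        printf 'Trial %d:\n  set %s\n  boot\n' "$n" "$tunable"
    done
}
```

Calling print_trials lists seven trials. The first trial that boots 
cleanly points at the setting that matters; put that one line into 
/boot/loader.conf afterwards if it turns out to be needed permanently.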


Miroslav Lachman


Re: more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-01 Thread Lee Damon via freebsd-stable

On 3/1/19 15:38 , Miroslav Lachman wrote:

> Did you try to boot "safe mode"? (selectable in boot menu).


I completely forgot about safe mode.

Yep. It boots. I'm going to finish the freebsd-update process then 
reboot into safe mode again. I'm out of time to work on this today and 
am only in this lab on Fridays so I'll have to pick up working on this 
problem next Friday.


Thanks for the help,
nomad


Re: more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-01 Thread Miroslav Lachman

Lee Damon wrote on 2019/03/02 00:06:

> Darn it. I get the same kernel panic with that one.
>
> I'm compiling locally but I don't expect that to make any difference.
> I'll need to go pawing through the release notes and see if there are
> any references to deprecated hardware that might be involved.
>
> I'm attaching a copy of dmesg output from a successful boot into
> 10.4-STABLE. The kernel panic appears to happen around 15% of the way
> into the output, around

I am running 11.2 on a SunFire X2100 M2, but according to your dmesg it 
uses different chips. The X2100 M2 has the nVidia nForce MCP55 chipset for 
ATA devices, nfe for 2 NICs and Broadcom bge for the other 2 NICs.

Did you try to boot "safe mode"? (selectable in boot menu).
Or you can try to disable / enable some settings in the BIOS. Something 
related to USB or onboard VGA etc. may help.


Miroslav Lachman


Re: more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-01 Thread Lee Damon via freebsd-stable

On 3/1/19 14:19 , Miroslav Lachman wrote:
> If you can boot with the old 10.4 kernel and go online, just fetch
> kernel.txz from the net:
> http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/11.2-RELEASE/kernel.txz and
> unpack it to /boot/kernel112 then you can try to reboot and manually
> select to boot this kernel instead of default /boot/kernel.


Darn it. I get the same kernel panic with that one.

I'm compiling locally but I don't expect that to make any difference. 
I'll need to go pawing through the release notes and see if there are 
any references to deprecated hardware that might be involved.


I'm attaching a copy of dmesg output from a successful boot into 
10.4-STABLE. The kernel panic appears to happen around 15% of the way 
into the output, around


...
mvsch13:  at channel 5 on mvs1
mvsch14:  at channel 6 on mvs1
mvsch15:  at channel 7 on mvs1
pcib3:  at device 6.0 on pci0
pci3:  on pcib3
ohci0:  mem 0xfd1fe000-0xfd1fefff irq 19 
at device 0.0 on pci3

usbus0 on ohci0
ohci1:  mem 0xfd1fd000-0xfd1fdfff irq 19 
at device 0.1 on pci3

usbus1 on ohci1
...

(Just before it enumerates vgapci0)

but I can't be sure because the screen moves so fast that even slow-mo 
video is just a blur.


nomad
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.4-STABLE #25 r342947: Fri Jan 11 14:17:40 PST 2019
l...@goose.ee.washington.edu:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Dual Core AMD Opteron(tm) Processor 290 (2792.11-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x20f12  Family=0xf  Model=0x21  Stepping=2
  
Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x3
real memory  = 17179869184 (16384 MB)
avail memory = 16418484224 (15657 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
random:  initialized
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-30 on motherboard
ioapic2  irqs 31-37 on motherboard
ioapic3  irqs 38-44 on motherboard
ioapic4  irqs 45-51 on motherboard
ioapic5  irqs 52-58 on motherboard
ioapic6  irqs 59-65 on motherboard
ioapic7  irqs 66-72 on motherboard
ioapic8  irqs 73-79 on motherboard
ioapic9  irqs 80-86 on motherboard
ioapic10  irqs 87-93 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
acpi0: Power Button (fixed)
cpu0:  on acpi0
cpu1:  on acpi0
cpu2:  on acpi0
cpu3:  on acpi0
attimer0:  port 0x40-0x43 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0:  port 0x70-0x71 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0:  iomem 0xfec01000-0xfec013ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0:  on acpi0
pci0:  on pcib0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
mvs0:  port 0x7c00-0x7cff mem 
0xfae0-0xfaef irq 24 at device 1.0 on pci1
mvs0: Gen-II, 8 3Gbps ports, Port Multiplier supported
mvsch0:  at channel 0 on mvs0
mvsch1:  at channel 1 on mvs0
mvsch2:  at channel 2 on mvs0
mvsch3:  at channel 3 on mvs0
mvsch4:  at channel 4 on mvs0
mvsch5:  at channel 5 on mvs0
mvsch6:  at channel 6 on mvs0
mvsch7:  at channel 7 on mvs0
pcib2:  at device 2.0 on pci0
pci2:  on pcib2
mvs1:  port 0x8c00-0x8cff mem 
0xfb00-0xfb0f irq 32 at device 1.0 on pci2
mvs1: Gen-II, 8 3Gbps ports, Port Multiplier supported
mvsch8:  at channel 0 on mvs1
mvsch9:  at channel 1 on mvs1
mvsch10:  at channel 2 on mvs1
mvsch11:  at channel 3 on mvs1
mvsch12:  at channel 4 on mvs1
mvsch13:  at channel 5 on mvs1
mvsch14:  at channel 6 on mvs1
mvsch15:  at channel 7 on mvs1
pcib3:  at device 6.0 on pci0
pci3:  on pcib3
ohci0:  mem 0xfd1fe000-0xfd1fefff irq 19 at 
device 0.0 on pci3
usbus0 on ohci0
ohci1:  mem 0xfd1fd000-0xfd1fdfff irq 19 at 
device 0.1 on pci3
usbus1 on ohci1
vgapci0:  port 0x9800-0x98ff mem 
0xfc00-0xfcff,0xfd1ff000-0xfd1f irq 16 at device 3.0 on pci3
vgapci0: Boot video device
ohci2:  mem 0xfd1fc000-0xfd1fcfff irq 17 at device 
4.0 on pci3
usbus2 on ohci2
ohci3:  mem 0xfd1fb000-0xfd1fbfff irq 18 at device 
4.1 on pci3
usbus3 on ohci3
ehci0:  mem 0xfd1fac00-0xfd1facff irq 19 at 
device 4.2 on pci3
usbus4: EHCI version 1.0
usbus4 on ehci0
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci0:  port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0:  a

Re: more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-01 Thread Miroslav Lachman

Lee Damon via freebsd-stable wrote on 2019/03/01 22:53:
> After discussion with Bob Bishop (thanks for the help!) I've tried to do
> the following to upgrade one of the old boxes I mentioned previously.
>
> cd /usr/src
> tar ... .
> rm -rf .??* *
> svn checkout https://svn.freebsd.org/base/releng/10.3 /usr/src
> compile, installkernel, installworld...
>
> Now that the host is running RELENG the next step was to update from
> 10.4 to 11.2 via freebsd-update
>
> freebsd-update
> freebsd-install
> freebsd-update upgrade -r 11.2-RELEASE
> freebsd-update install
>
> so far, so good. Now it all falls apart
>
> shutdown -r now
> ... why isn't the host coming back? Oh look, kernel panic.
>
>    Fatal trap 12: page fault while in kernel mode
>    cpuid = 1; apic id = 01
>    fault virtual address = 0x84
>    fault code = supervisor read data, page not present

I went back from freebsd-update to source upgrades a few years ago and now 
use source builds exclusively (I build on a powerful build machine and 
distribute the result to clients through NFS, so clients can just run make 
installkernel and make installworld) because I was bitten by failed 
freebsd-update upgrades many times...

> Google searches find references to the same panic type in VMs running
> 11.1, including https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923
>
> The differences are, that's 11.1 not 11.2 (I would presume the fix made
> it into 11.2 but maybe not) and most notably, that's against VMs and the
> host I'm doing this on is bare iron (Sun x4500).
>
> Still, I gave the two entries in /boot/loader.conf a try, no joy.
> Exactly the same panic. Recording the boot with slow-mo shows the panic
> happening just after the USB devices are enumerated by the kernel. It
> never even tries to mount root.
>
> I am able to boot to kernel.old, which appears to be my old 10.4-STABLE
> kernel. So now I'm kind of stuck. The update has already modified the
> config files as part of the first pass so rolling back may be a problem
> and moving forward seems unwise.
>
> I have only one x4500 but I have three x4540s running 11.2-STABLE (also
> installed from source) just fine.
>
> Anyone have any brilliant suggestions? I'm thinking of trying to compile
> 11.2-RELENG in /usr/src so I can try installing that kernel but that'll
> take several hours at least (it's an old box).


If you can boot with the old 10.4 kernel and go online, just fetch 
kernel.txz from the net: 
http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/11.2-RELEASE/kernel.txz 
and unpack it to /boot/kernel112 then you can try to reboot and manually 
select to boot this kernel instead of default /boot/kernel.

If you cannot access the boot loader prompt you can try "nextboot" command.
1) unpack the kernel
2) set nextboot: nextboot -k kernel112
3) shutdown -r now and hope for luck

If your machine boots fine with 11.2 kernel, you can fetch sources and 
rebuild kernel and userland for 11.2 as usual.
Or you can try to fetch and unpack base.txz 
http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/11.2-RELEASE/base.txz 
over your current files. It can make a mess but you can always clean it 
with "make delete-old && make delete-old-libs"
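
The kernel-only steps above can be sketched roughly as follows. Treat it 
as an illustration under stated assumptions, not a tested procedure: it 
assumes kernel.txz stores the kernel under boot/kernel, it uses a scratch 
directory under /var/tmp, and the names run, try_new_kernel, DRY_RUN and 
SCRATCH are invented here. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: fetch the 11.2-RELEASE kernel, install it beside the current one
# as /boot/kernel112, and boot it once via nextboot(8).
# Assumption: the tarball unpacks to ./boot/kernel (not stated in the mail).
# DRY_RUN defaults to 1, so commands are printed instead of executed.

KERNEL_URL="http://ftp.freebsd.org/pub/FreeBSD/releases/amd64/11.2-RELEASE/kernel.txz"
KERNEL_NAME="kernel112"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

try_new_kernel() {
    scratch="${SCRATCH:-/var/tmp/$KERNEL_NAME.$$}"
    run mkdir -p "$scratch" || return 1
    run fetch -o "$scratch/kernel.txz" "$KERNEL_URL" || return 1
    run tar -xf "$scratch/kernel.txz" -C "$scratch" || return 1
    # /boot/kernel112 must not exist yet, or mv would nest the directory.
    run mv "$scratch/boot/kernel" "/boot/$KERNEL_NAME" || return 1
    # nextboot applies to the next boot only; later boots use /boot/kernel.
    run nextboot -k "$KERNEL_NAME" || return 1
    run shutdown -r now
}
```

Running try_new_kernel as is shows the six commands; with DRY_RUN=0 (as 
root) it executes them and stops at the first failure. Because nextboot 
is one-shot, a panic with the new kernel should leave the following boot 
on the default kernel again.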


Miroslav Lachman


more fun, upgrading from 10.3-STABLE 10.4-RELENG to 11.2-RELENG - kernel panic

2019-03-01 Thread Lee Damon via freebsd-stable
After discussion with Bob Bishop (thanks for the help!) I've tried to do 
the following to upgrade one of the old boxes I mentioned previously.


cd /usr/src
tar ... .
rm -rf .??* *
svn checkout https://svn.freebsd.org/base/releng/10.3 /usr/src
compile, installkernel, installworld...

Now that the host is running RELENG the next step was to update from 
10.4 to 11.2 via freebsd-update


freebsd-update
freebsd-install
freebsd-update upgrade -r 11.2-RELEASE
freebsd-update install

so far, so good. Now it all falls apart

shutdown -r now
... why isn't the host coming back? Oh look, kernel panic.

  Fatal trap 12: page fault while in kernel mode
  cpuid = 1; apic id = 01
  fault virtual address = 0x84
  fault code = supervisor read data, page not present

Google searches find references to the same panic type in VMs running 
11.1, including https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220923


The differences are, that's 11.1 not 11.2 (I would presume the fix made 
it into 11.2 but maybe not) and most notably, that's against VMs and the 
host I'm doing this on is bare iron (Sun x4500).


Still, I gave the two entries in /boot/loader.conf a try, no joy. 
Exactly the same panic. Recording the boot with slow-mo shows the panic 
happening just after the USB devices are enumerated by the kernel. It 
never even tries to mount root.


I am able to boot to kernel.old, which appears to be my old 10.4-STABLE 
kernel. So now I'm kind of stuck. The update has already modified the 
config files as part of the first pass so rolling back may be a problem 
and moving forward seems unwise.


I have only one x4500 but I have three x4540s running 11.2-STABLE (also 
installed from source) just fine.


Anyone have any brilliant suggestions? I'm thinking of trying to compile 
11.2-RELENG in /usr/src so I can try installing that kernel but that'll 
take several hours at least (it's an old box).


nomad


[Bug 193386] [panic] resource_list_alloc: resource entry is busy

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193386

--- Comment #1 from Rodney W. Grimes  ---
Please do not put bugs on stable@, current@, hackers@, etc

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug 193364] [panic] ffs_blkfree_cg: freeing free block

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193364

Rodney W. Grimes  changed:

   What|Removed |Added

 CC|sta...@freebsd.org  |

--- Comment #5 from Rodney W. Grimes  ---
Please do not put bugs on stable@, current@, hackers@, etc



[Bug 193360] [panic] [syscons] random syscons panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193360

Rodney W. Grimes  changed:

   What|Removed |Added

 CC|sta...@freebsd.org  |



[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Rodney W. Grimes  changed:

   What|Removed |Added

 CC|sta...@freebsd.org  |rgri...@freebsd.org

--- Comment #1 from Rodney W. Grimes  ---
Please do not put bugs on stable@, current@, hackers@, etc



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

Rodney W. Grimes  changed:

   What|Removed |Added

 CC|sta...@freebsd.org  |

--- Comment #15 from Rodney W. Grimes  ---
Please do not put bugs on stable@, current@, hackers@, etc



Re: Networking panic on 12 - found the cause

2019-02-12 Thread Pete French

Thanks guys! That was fast

On 12/02/2019 20:13, Kristof Provost wrote:

> On 2019-02-12 13:54:21 (-0600), Eric van Gyzen  wrote:
>> I see the same behavior on head (and stable/12).
>>
>> (kgdb) f
>> #16 0x80ce5331 in ether_output_frame (ifp=0xf80003672800,
>> m=0xf8000c88b100) at /usr/src/sys/net/if_ethersubr.c:468
>> 468 switch (pfil_run_hooks(V_link_pfil_head, &m, ifp, PFIL_OUT,
>>
>> 0x80ce5321 <+81>: mov %gs:0x0,%rax
>> 0x80ce532a <+90>: mov 0x500(%rax),%rax
>> => 0x80ce5331 <+97>: mov 0x28(%rax),%rax
>>
>> I think this is part of the V_link_pfil_head. I'm not very familiar
>> with vnet. Does this need a CURVNET_SET(), maybe in garp_rexmit()?
>
> Yes. I posted a proposed patch in
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235699
>
> Basically we get called through a timer, so there's no vnet context. It
> needs to be set, and then we can safely use any V_ variables.
>
> Regards,
> Kristof



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #14 from Andrey V. Elsukov  ---
(In reply to Sergey Anokhin from comment #13)
> (In reply to Andrey V. Elsukov from comment #11)
> 
> I'd prefer to rebuild the kernel if there is no difference between removing
> VIMAGE from the kernel config and applying the patch, because building the
> kernel is faster than building "world". As far as I understand, you are
> proposing a patch for the "world" component, right?

No, the patch is for kernel.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #13 from Sergey Anokhin  ---
(In reply to Andrey V. Elsukov from comment #11)

I'd prefer to rebuild the kernel if there is no difference between removing
VIMAGE from the kernel config and applying the patch, because building the
kernel is faster than building "world". As far as I understand, you are
proposing a patch for the "world" component, right?



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #12 from Sergey Anokhin  ---
(In reply to Andrey V. Elsukov from comment #9)

Sure, I'm now building a kernel without VIMAGE. I'll let you know the test
results.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #11 from Andrey V. Elsukov  ---
Created attachment 201968
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=201968&action=edit
Proposed patch

Also, you can test this patch instead; it should fix the panic with the
VIMAGE option. The problem is due to the introduction of deferred PCB
destruction via epoch_call(). Since this code is executed from gtaskqueue,
it has no VNET context.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #10 from Sergey Anokhin  ---
(In reply to Jan Bramkamp from comment #6)

Is this OK?

(pts/1)[root@server:~]# sysctl kern.maxssiz=1073741824
kern.maxssiz: 536870912 -> 1073741824
(pts/1)[root@server:~]# /usr/local/etc/rc.d/racoon onestart
Starting racoon.
(pts/1)[root@server:~]# /usr/local/etc/rc.d/racoon onestop
Stopping racoon.
Waiting for PIDS: 5662

kernel panic

By the way, I noticed that the kernel panics while racoon is being stopped.

# kgdb kernel /var/crash/vmcore.last
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from kernel...Reading symbols from
/usr/obj/usr/src/amd64.amd64/sys/SERVER/kernel.debug...done.
done.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x28
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80ecd31d
stack pointer   = 0x28:0xfe003fca7a40
frame pointer   = 0x28:0xfe003fca7a60
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (softirq_2)
trap number = 12
panic: page fault
cpuid = 2
time = 1550009599
KDB: stack backtrace:
#0 0xffff80c531c7 at kdb_backtrace+0x67
#1 0x80c07143 at vpanic+0x1a3
#2 0x80c06f93 at panic+0x43
#3 0x8118d9ff at trap_fatal+0x35f
#4 0x8118da59 at trap_pfault+0x49
#5 0x8118d07e at trap+0x29e
#6 0x81168ac5 at calltrap+0x8
#7 0x80eca240 at ipsec_delete_pcbpolicy+0x20
#8 0x80dbaeec at in_pcbfree_deferred+0x6c
#9 0x80c4db1a at epoch_call_task+0x1ca
#10 0x80c51a54 at gtaskqueue_run_locked+0x144
#11 0x80c516b8 at gtaskqueue_thread_loop+0x98
#12 0x80bc6f23 at fork_exit+0x83
#13 0x81169abe at fork_trampoline+0xe
Uptime: 8m33s
Dumping 950 out of 8077 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(OFFSETOF_CURTHREAD));
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0x80c06d2b in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:446
#3  0xffff80c071a3 in vpanic (fmt=, ap=0xfe003fca7790)
at /usr/src/sys/kern/kern_shutdown.c:872
#4  0x80c06f93 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:799
#5  0x8118d9ff in trap_fatal (frame=0xfe003fca7980, eva=40) at
/usr/src/sys/amd64/amd64/trap.c:929
#6  0x8118da59 in trap_pfault (frame=0xfe003fca7980, usermode=0) at
/usr/src/sys/amd64/amd64/trap.c:765
#7  0x8118d07e in trap (frame=0xfe003fca7980) at
/usr/src/sys/amd64/amd64/trap.c:441
#8  
#9  0x80ecd31d in key_freesp (spp=0xf80211241880) at
/usr/src/sys/netipsec/key.c:1199
#10 0x80eca240 in ipsec_delete_pcbpolicy (inp=0xf800151aa1e8) at
/usr/src/sys/netipsec/ipsec_pcb.c:176
#11 0x80dbaeec in in_pcbfree_deferred (ctx=0xf800151aa3c0) at
/usr/src/sys/netinet/in_pcb.c:1576
#12 0x80c4db1a in epoch_call_task (arg=) at
/usr/src/sys/kern/subr_epoch.c:507
#13 0x80c51a54 in gtaskqueue_run_locked (queue=0xf80003363c00) at
/usr/src/sys/kern/subr_gtaskqueue.c:376
#14 0x80c516b8 in gtaskqueue_thread_loop (arg=) at
/usr/src/sys/kern/subr_gtaskqueue.c:557
#15 0x80bc6f23 in fork_exit (callout=0x80c51620
, arg=0xfe00025f5038, frame=0xfe003fca7c00)
at /usr/src/sys/kern/kern_fork.c:1059
#16 
(kgdb) frame 9
#9  0x80ecd31d in key_freesp (spp=0xf80211241880) at
/usr/src/sys/netipsec/key.c:1199
1199            KEYDBG(IPSEC_STAMP,
(kgdb)



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #9 from Andrey V. Elsukov  ---
Can you try to remove `option VIMAGE` from your kernel config? It looks like
the problem is similar to the one described in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235699



Re: Networking panic on 12 - found the cause

2019-02-12 Thread Kristof Provost
On 2019-02-12 13:54:21 (-0600), Eric van Gyzen  wrote:
> I see the same behavior on head (and stable/12).
> 
> (kgdb) f
> #16 0x80ce5331 in ether_output_frame (ifp=0xf80003672800,
> m=0xf8000c88b100) at /usr/src/sys/net/if_ethersubr.c:468
> 468   switch (pfil_run_hooks(V_link_pfil_head, &m, ifp, 
> PFIL_OUT,
> 
>    0x80ce5321 <+81>:  mov    %gs:0x0,%rax
>    0x80ce532a <+90>:  mov    0x500(%rax),%rax
> => 0x80ce5331 <+97>:  mov    0x28(%rax),%rax
> 
> I think this is part of the V_link_pfil_head.  I'm not very familiar
> with vnet.  Does this need a CURVNET_SET(), maybe in garp_rexmit()?
> 
Yes. I posted a proposed patch in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235699

Basically we get called through a timer, so there's no vnet context. It
needs to be set, and then we can safely use any V_ variables.

Regards,
Kristof


Re: Networking panic on 12 - found the cause

2019-02-12 Thread Eric van Gyzen
On 2/12/19 8:53 AM, Pete French wrote:
> I found my panic. If I take everything out of rc.conf and loader.conf
> and sysctl.conf and boot the system it works fine when I add an IP
> address. If I add this one line to sysctl.conf
> 
> net.link.ether.inet.garp_rexmit_count=2
> 
> Then I get a panic when I configure the interface:
> 
> root@serpentine-passive:~ #  ifconfig igb0 inet 10.32.10.4/16 up
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x80c987f1
> stack pointer           = 0x28:0xfe4d5730
> frame pointer           = 0x28:0xfe4d5750
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (swi4: clock (0))
> trap number             = 12
> panic: page fault
> cpuid = 0
> time = 1549981620
> KDB: stack backtrace:
> #0 0x80bdfdc7 at kdb_backtrace+0x67
> #1 0x80b93fa3 at vpanic+0x1a3
> #2 0x80b93df3 at panic+0x43
> #3 0x8106a7bf at trap_fatal+0x35f
> #4 0x8106a819 at trap_pfault+0x49
> #5 0x81069e3e at trap+0x29e
> #6 0x810450c5 at calltrap+0x8
> #7 0x80c986f6 at ether_output+0x6b6
> #8 0x80d03354 at arprequest+0x4c4
> #9 0x80d0515c at garp_rexmit+0xbc
> #10 0x80bade19 at softclock_call_cc+0x129
> #11 0x80bae2f9 at softclock+0x79
> #12 0x80b57c57 at ithread_loop+0x1a7
> #13 0x80b54da2 at fork_exit+0x82
> #14 0x810460be at fork_trampoline+0xe

I see the same behavior on head (and stable/12).

(kgdb) f
#16 0x80ce5331 in ether_output_frame (ifp=0xf80003672800,
m=0xf8000c88b100) at /usr/src/sys/net/if_ethersubr.c:468
468             switch (pfil_run_hooks(V_link_pfil_head, &m, ifp, PFIL_OUT,

   0x80ce5321 <+81>:    mov    %gs:0x0,%rax
   0x80ce532a <+90>:    mov    0x500(%rax),%rax
=> 0x80ce5331 <+97>:    mov    0x28(%rax),%rax

I think this is part of the V_link_pfil_head.  I'm not very familiar
with vnet.  Does this need a CURVNET_SET(), maybe in garp_rexmit()?

Eric


[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #8 from Sergey Anokhin  ---
(In reply to Jan Bramkamp from comment #6)

Did you mean I should try setting kern.maxssiz in /boot/loader.conf?

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #7 from Sergey Anokhin  ---
By the way, this may be helpful: if the security/ipsec-tools port is built with
default options (make rmconfig), the bug is not reproduced.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

Jan Bramkamp  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cr...@bultmann.eu

--- Comment #6 from Jan Bramkamp  ---
Can you try again with IPSEC_DEBUG and a doubled kernel stack size?



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #5 from Andrey V. Elsukov  ---
(In reply to Sergey Anokhin from comment #4)
> (In reply to Andrey V. Elsukov from comment #3)
> 
> I suspect that if I turn off
> 
> options IPSEC_DEBUG
> 
> the kernel panic will disappear

Disabling IPSEC_DEBUG also reduces the kernel stack size requirement.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #4 from Sergey Anokhin  ---
(In reply to Andrey V. Elsukov from comment #3)

I suspect that if I turn off

options IPSEC_DEBUG

the kernel panic will disappear



Networking panic on 12 - found the cause

2019-02-12 Thread Pete French
I found my panic. If I take everything out of rc.conf and loader.conf 
and sysctl.conf and boot the system it works fine when I add an IP 
address. If I add this one line to sysctl.conf


net.link.ether.inet.garp_rexmit_count=2

Then I get a panic when I configure the interface:

root@serpentine-passive:~ #  ifconfig igb0 inet 10.32.10.4/16 up


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x80c987f1
stack pointer           = 0x28:0xfe4d5730
frame pointer           = 0x28:0xfe4d5750
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock (0))
trap number             = 12
panic: page fault
cpuid = 0
time = 1549981620
KDB: stack backtrace:
#0 0x80bdfdc7 at kdb_backtrace+0x67
#1 0x80b93fa3 at vpanic+0x1a3
#2 0x80b93df3 at panic+0x43
#3 0x8106a7bf at trap_fatal+0x35f
#4 0x8106a819 at trap_pfault+0x49
#5 0x81069e3e at trap+0x29e
#6 0x810450c5 at calltrap+0x8
#7 0x80c986f6 at ether_output+0x6b6
#8 0x80d03354 at arprequest+0x4c4
#9 0x80d0515c at garp_rexmit+0xbc
#10 0x80bade19 at softclock_call_cc+0x129
#11 0x80bae2f9 at softclock+0x79
#12 0x80b57c57 at ithread_loop+0x1a7
#13 0x80b54da2 at fork_exit+0x82
#14 0x810460be at fork_trampoline+0xe
Uptime: 2m6s


[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

--- Comment #3 from Andrey V. Elsukov  ---
The KEYDBG() macro is executed only when net.key.debug is set to a non-zero
value, and it looks like your sysctl.conf does not set it. Also, it looks
impossible to get a page fault with fault address 0x28 at this line of code.
I suspect that you have some sort of memory corruption. I am not sure whether
it is hardware related or whether the memory is being overwritten by some code.



[Bug 235683] [zfs] Panic during data access or scrub on 12.0-STABLE r343904 (blkptr at DVA 0 has invalid OFFSET)

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Andriy Voskoboinyk  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@freebsd.org
            Summary|ZFS kernel panic when       |[zfs] Panic during data
                   |access to data or scrub     |access or scrub on
                   |                            |12.0-STABLE r343904 (blkptr
                   |                            |at DVA 0 has invalid
                   |                            |OFFSET)
           Keywords|                            |panic
           Assignee|b...@freebsd.org            |f...@freebsd.org



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
s are compiled in
options KDTRACE_HOOKS   # Kernel DTrace hooks
options DDB_CTF # Kernel ELF linker loads CTF data
options INCLUDE_CONFIG_FILE # Include this file in kernel
options RACCT   # Resource accounting framework
options RACCT_DEFAULT_TO_DISABLED # Set kern.racct.enable=0 by default
options RCTL    # Resource limits

# Debugging support.  Always need this:
options KDB # Enable kernel debugger support.
options KDB_TRACE   # Print a stack trace for a panic.

# Kernel dump features.
options EKCD    # Support for encrypted kernel dumps
options GZIO    # gzip-compressed kernel and user dumps
options ZSTDIO  # zstd-compressed kernel and user dumps
options NETDUMP # netdump(4) client support

# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel
options EARLY_AP_STARTUP

# CPU frequency control
device  cpufreq

# Bus support.
device  acpi
options ACPI_DMAR
device  pci
options PCI_HP  # PCI-Express native HotPlug
options PCI_IOV # PCI SR-IOV support

# Floppy drives
device  fdc

# ATA controllers
device  ahci    # AHCI-compatible SATA controllers
device  ata     # Legacy ATA/SATA controllers
device  mvs     # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA
device  siis    # SiliconImage SiI3124/SiI3132/SiI3531 SATA

# SCSI Controllers
device  ahc     # AHA2940 and onboard AIC7xxx devices
device  ahd     # AHA39320/29320 and onboard AIC79xx devices
device  esp     # AMD Am53C974 (Tekram DC-390(T))
device  hptiop  # Highpoint RocketRaid 3xxx series
device  isp     # Qlogic family
#device ispfw   # Firmware for QLogic HBAs- normally a module
device  mpt     # LSI-Logic MPT-Fusion
device  mps     # LSI-Logic MPT-Fusion 2
device  mpr     # LSI-Logic MPT-Fusion 3
#device ncr     # NCR/Symbios Logic
device  sym     # NCR/Symbios Logic (newer chipsets + those of `ncr')
device  trm     # Tekram DC395U/UW/F DC315U adapters
device  isci    # Intel C600 SAS controller
device  ocs_fc  # Emulex FC adapters

# ATA/SCSI peripherals
device  scbus   # SCSI bus (required for ATA/SCSI)
device  ch  # SCSI media changers
device  da  # Direct Access (disks)
device  sa  # Sequential Access (tape etc)
device  cd  # CD
device  pass    # Passthrough device (direct ATA/SCSI access)
device  ses # Enclosure Services (SES and SAF-TE)
#device ctl # CAM Target Layer

# RAID controllers interfaced to the SCSI subsystem
device  amr # AMI MegaRAID
device  arcmsr  # Areca SATA II RAID
device  ciss    # Compaq Smart RAID 5*
device  dpt     # DPT Smartcache III, IV - See NOTES for options
device  hptmv   # Highpoint RocketRAID 182x
device  hptnr   # Highpoint DC7280, R750
device  hptrr   # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx
device  hpt27xx # Highpoint RocketRAID 27xx
device  iir     # Intel Integrated RAID
device  ips     # IBM (Adaptec) ServeRAID
device  mly     # Mylex AcceleRAID/eXtremeRAID
device  twa     # 3ware 9000 series PATA/SATA RAID
device  smartpqi        # Microsemi smartpqi driver
device  tws     # LSI 3ware 9750 SATA+SAS 6Gb/s RAID controller

# RAID controllers
device  aac # Adaptec FSA RAID
device  aacp    # SCSI passthrough for aac (requires CAM)
device  aacraid # Adaptec by PMC RAID
device  ida # Compaq Smart RAID
device  mfi # LSI MegaRAID SAS
device  mlx # Mylex DAC960 family
device  mrsas   # LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s
device  pmspcv  # PMC-Sierra SAS/SATA Controller driver
#XXX pointer/int warnings
#device  

[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

Andrey V. Elsukov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |a...@freebsd.org

--- Comment #1 from Andrey V. Elsukov  ---
(In reply to Sergey Anokhin from comment #0)
> I see kernel panic during racoon restart.
> 
> # uname -rv
> 12.0-STABLE FreeBSD 12.0-STABLE r343904 SERVER

Please show the contents of your kernel config and which sysctl variables you
have changed from the default configuration.



[Bug 235684] security/ipsec-tools kernel panic

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235684

Sergey Anokhin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |b...@freebsd.org,
                   |                            |sta...@freebsd.org



[Bug 235683] ZFS kernel panic when access to data or scrub

2019-02-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235683

Sergey Anokhin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sta...@freebsd.org



Repeated kernel panic on 11.2-RELEASE-p7

2019-02-10 Thread Jurij Kovačič via freebsd-stable
Dear list,

After some time, I have again experienced a kernel panic on a (physical)
server running FreeBSD 11.2-RELEASE-p7 with a custom debug kernel and ZFS root.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0x82299013
stack pointer           = 0x28:0xfe0352893ad0
frame pointer           = 0x28:0xfe0352893b10
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 9 (dbuf_evict_thread)
trap number             = 9
panic: general protection fault
cpuid = 2
KDB: stack backtrace:
#0 0x80b3d567 at kdb_backtrace+0x67
#1 0x80af6b07 at vpanic+0x177
#2 0x80af6983 at panic+0x43
#3 0x80f77fdf at trap_fatal+0x35f
#4 0x80f7759e at trap+0x5e
#5 0x80f5807c at calltrap+0x8
#6 0x8229c049 at dbuf_evict_one+0xe9
#7 0x82297a15 at dbuf_evict_thread+0x1a5
#8 0x80aba083 at fork_exit+0x83
#9 0x80f58f9e at fork_trampoline+0xe
Uptime: 20d6h13m55s
Dumping 2593 out of 12248 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

I have used the "crashinfo" utility to generate the text file, which is
available at this URL: http://www.ocpea.com/dump/core-3.txt.

All advice is deeply appreciated as this is a production server. :)

Kind regards,
Jurij


Kernel panic going multiuser under 12 ( was Re: More CARP issues under 12 (maybe not CARP after all))

2019-02-05 Thread Pete French




Just to get the subject correct: I tested this with CARP disabled and I
still see the panic when going multi-user. It is networking related, as the
panic is in the ARP code and seems to happen when the network
interfaces are configured. The machine was using a mix of em and igb
interfaces, but is now igb only.


-pete.


[Bug 193360] [panic] [syscons] random syscons panic

2019-01-27 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193360

Oleksandr Tymoshenko  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |go...@freebsd.org
             Status|New                         |Closed
         Resolution|---                         |Not Enough Information

--- Comment #4 from Oleksandr Tymoshenko  ---
Provided information is not enough to analyze the problem.



Re: Kernel panic on 11.2-RELEASE-p7

2019-01-05 Thread Jurij Kovačič via freebsd-stable
Dear list,

About a week ago, we had a kernel panic on FreeBSD 11.2-RELEASE-p7 with the
GENERIC kernel and ZFS root. As the kernel was not compiled with debug support
enabled, the resulting "vmcore" files were of little use. Consequently, I
recompiled the kernel with debug support:

--- GENERIC 2018-12-29 08:03:04.786846000 +0100
+++ DEBUG   2018-12-29 08:23:36.522966000 +0100
@@ -19,11 +19,16 @@
 # $FreeBSD: releng/11.2/sys/amd64/conf/GENERIC 333417 2018-05-09 16:14:12Z sbruno $

 cpu             HAMMER
-ident           GENERIC
+ident           DEBUG

 makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
 makeoptions     WITH_CTF=1              # Run ctfconvert(1) for DTrace support

+# kernel debugging
+options         KDB
+options         KDB_UNATTENDED
+options         KDB_TRACE
+
 options         SCHED_ULE               # ULE scheduler
 options         PREEMPTION              # Enable kernel thread preemption
 options         INET                    # InterNETworking

and installed it.

After running for about a week, the server crashed again last night.
Unfortunately, there are no "vmcore" files in "/var/crash" this time.

The server has 12GB of RAM installed:
 # sysctl hw.physmem
hw.physmem: 12843053056

and uses 2 swap partitions (2G each):
# swapinfo -h
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p2       2097152     642M     1.4G    31%
/dev/ada1p2       2097152     638M     1.4G    31%
Total             4194304     1.3G     2.7G    31%

Dump device is set in /etc/rc.conf:
# grep dump /etc/rc.conf
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"

There seems to be enough space left in "/var/crash":
 # zfs list | grep crash
zroot/var/crash  857M  17.2G   857M  /var/crash

and, as I said earlier, the system DID create "vmcore" files when crashing
with the GENERIC kernel. Is it possible that the swap partitions are too small
for the memory dump, now that the kernel is compiled with debug support? Or
is some additional configuration needed to make the system save vmcore
files?
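
A rough size check may help frame the question. The sketch below is Python
arithmetic on the figures quoted above only. It assumes the dump has to fit
on a single swap partition and ignores any header overhead; the fits helper
and the 3000 MiB example value are hypothetical. With minidumps the actual
dump size depends on how much kernel memory is in use, so it cannot be
derived from these numbers.

```python
MIB = 1024 * 1024

physmem_bytes = 12843053056   # from "sysctl hw.physmem" above
swap_1k_blocks = 2097152      # size of each swap partition from "swapinfo -h"

physmem_mib = physmem_bytes // MIB
swap_mib = swap_1k_blocks * 1024 // MIB


def fits(dump_mib: int, device_mib: int) -> bool:
    """Hypothetical helper: True if a dump of dump_mib MiB fits on the device."""
    return dump_mib <= device_mib


print(f"RAM: {physmem_mib} MiB, one swap partition: {swap_mib} MiB")
# A full memory dump would need roughly as much space as RAM itself.
print("full dump fits:", fits(physmem_mib, swap_mib))
# Example value only: any dump larger than one partition would not fit either.
print("3000 MiB dump fits:", fits(3000, swap_mib))
```

If the numbers are read this way, a full dump could never fit on one 2G
partition, and a minidump would fit only while it stays below that size.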

Please advise.

Kind regards,
Jurij

On Tue, Dec 25, 2018 at 7:57 AM Jurij Kovačič 
wrote:

> Dear list,
>
> I hope I am posting this to the correct list - if not, I apologize (and
> please advise where to post this instead).
>
> Today I experienced a kernel panic on a (physical) server, running Freebsd
> 11.2-RELEASE-p7 with GENERIC kernel, ZFS root:
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0x82299013
> stack pointer           = 0x28:0xfe0352893ad0
> frame pointer           = 0x28:0xfe0352893b10
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 9 (dbuf_evict_thread)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0x80b3d577 at kdb_backtrace+0x67
> #1 0x80af6b17 at vpanic+0x177
> #2 0x80af6993 at panic+0x43
> #3 0x80f77fdf at trap_fatal+0x35f
> #4 0x80f7759e at trap+0x5e
> #5 0x80f5808c at calltrap+0x8
> #6 0x8229c049 at dbuf_evict_one+0xe9
> #7 0x82297a15 at dbuf_evict_thread+0x1a5
> #8 0x80aba093 at fork_exit+0x83
> #9 0x80f58fae at fork_trampoline+0xe
>
> I have used "crashinfo" utility to generate the text file which is
> available at this URL: http://www.ocpea.com/dump/core.txt
>
> At the time of the crash, the server was probably under more intensive I/O
> load (scheduled backup with rsync).
>
> This is a production server, so naturally, all advice is deeply
> appreciated. :)
>
> Kind regards,
> Jurij
>


Re: Kernel panic on 11.2-RELEASE-p7

2018-12-28 Thread Jurij Kovačič via freebsd-stable
Hi Andriy,

Upon further investigation - I take it the kernel options should probably
be:
...
makeoptions     DEBUG=-g
options         KDB
options         KDB_UNATTENDED
...
?

Thank you!

Kind regards,
Jurij


On Fri, Dec 28, 2018 at 12:07 PM Jurij Kovačič 
wrote:

> Hi Andriy,
>
> Thank you for your reply.
>
> Is what you are suggesting I build and install GENERIC kernel WITH debug
> symbols?
>
> I presume I just update the sources to 11.2 release and build and install
> the GENERIC kernel with added
>
> makeoptions   DEBUG=-g ?
>
>
> Kind regards,
>
> Jurij
>
>
>
> On Fri, Dec 28, 2018 at 11:34 AM Andriy Gapon  wrote:
>
>> On 28/12/2018 12:07, Jurij Kovačič via freebsd-stable wrote:
>> > Dear list,
>> >
>> > This morning the server mentioned in my previous e-mail (Freebsd
>> > 11.2-RELEASE-p7 with GENERIC kernel, ZFS root) experienced another
>> kernel
>> > panic:
>> >
>> > Fatal trap 9: general protection fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > instruction pointer     = 0x20:0x82299013
>> > stack pointer           = 0x28:0xfe0352893ad0
>> > frame pointer           = 0x28:0xfe0352893b10
>> > code segment            = base 0x0, limit 0xf, type 0x1b
>> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags        = interrupt enabled, resume, IOPL = 0
>> > current process         = 9 (dbuf_evict_thread)
>> > trap number             = 9
>> > panic: general protection fault
>> > cpuid = 0
>> > KDB: stack backtrace:
>> > #0 0x80b3d577 at kdb_backtrace+0x67
>> > #1 0x80af6b17 at vpanic+0x177
>> > #2 0x80af6993 at panic+0x43
>> > #3 0x80f77fdf at trap_fatal+0x35f
>> > #4 0x80f7759e at trap+0x5e
>> > #5 0x80f5808c at calltrap+0x8
>> > #6 0x8229c049 at dbuf_evict_one+0xe9
>> > #7 0x82297a15 at dbuf_evict_thread+0x1a5
>> > #8 0x80aba093 at fork_exit+0x83
>> > #9 0x80f58fae at fork_trampoline+0xe
>> >
>> > I have used the "crashinfo" utility to (again) generate the text file
>> which
>> > is available at this URL: http://www.ocpea.com/dump/core-2.txt
>> > <http://www.ocpea.com/dump/core.txt>
>>
>> This is useless because you do not have debug symbols for the kernel.
>>
>> > Does anyone have any idea how we can go about discovering the cause for
>> > this? We would appreciate any suggestion ...
>>
>>
>>
>> --
>> Andriy Gapon
>>
>

