Re: sparc64 -CURRENT in LDOM: ERROR: Last Trap: Fast Data Access Protection
Ax0n wrote:
> FWIW, the kernels running in my -stable guests are considerably larger than
> 8MB, and not much smaller than the -CURRENT kernels.

So it's actually the size of the code in the kernel, not the file size. From your boot message:

Booting /virtual-devices@100/channel-devices@200/disk@0:a/bsd
8381472@0x100+7136@0x17fe420+196864@0x180+3997440@0x1830100

8381472 + 7136 (padding) = 8388608, which is exactly 8MB.
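The arithmetic in the boot message above can be checked mechanically. What follows is a minimal C sketch, not boot loader code: the helper names and the KERNEL_LIMIT constant are invented for illustration, and the only inputs are the text size and padding printed in the boot message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the 8MB limit discussed in this thread. */
#define KERNEL_LIMIT (8u * 1024u * 1024u)

/* Bytes occupied by the text segment once its padding is added. */
static uint32_t
padded_text_size(uint32_t text, uint32_t padding)
{
	/* Guard against wrap-around before adding. */
	assert(text <= UINT32_MAX - padding);
	return text + padding;
}

/* Nonzero when the padded text segment does not exceed the limit. */
static int
fits_in_limit(uint32_t text, uint32_t padding)
{
	return padded_text_size(text, padding) <= KERNEL_LIMIT;
}
```

With the values from the boot message, padded_text_size(8381472, 7136) comes to 8388608, the same number as KERNEL_LIMIT, so the padded text segment fills the 8MB window exactly.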
Re: sparc64 -CURRENT in LDOM: ERROR: Last Trap: Fast Data Access Protection
FWIW, the kernels running in my -stable guests are considerably larger than 8MB, and not much smaller than the -CURRENT kernels.

-- a running LDOM guest -
-bash-4.4$ doas cu -l ttyV0
Connected to /dev/ttyV0 (speed 9600)

OpenBSD/sparc64 (puffyone.ldom.openbsd.local) (console)

login: axon
Password:
Last login: Fri May 26 00:46:47 on console
OpenBSD 6.1 (GENERIC.MP) #58: Sat Apr 1 17:10:24 MDT 2017

Welcome to OpenBSD: The proactively secure Unix-like operating system.
[...]
You have new mail.
$ uname -a
OpenBSD puffyone.ldom.openbsd.local 6.1 GENERIC.MP#58 sparc64
$ ls -la /bsd*
-rw-r--r-- 1 root wheel 9487408 Dec 31 1999 /bsd
-rw-r--r-- 1 root wheel 2739432 Dec 31 1999 /bsd.rd
-rw-r--r-- 1 root wheel 9440853 Dec 31 1999 /bsd.sp

the -CURRENT image (bsd.rd's been copied to bsd for testing) -
-bash-4.4$ doas vnconfig /dev/vnd0c /home/axon/vm/vdisk5
-bash-4.4$ doas mount /dev/vnd0a /mnt
-bash-4.4$ ls -al /mnt/bsd*
-rw-r--r-- 1 root wheel 2749459 May 26 22:02 /mnt/bsd
-rw-r--r-- 1 root wheel 9531028 May 26 22:02 /mnt/bsd.bak
-rw-r--r-- 1 root wheel 2749459 May 24 18:28 /mnt/bsd.rd
-rw-r--r-- 1 root wheel 9480748 May 24 18:28 /mnt/bsd.sp

On Sat, May 27, 2017 at 12:06 AM, Ted Unangst wrote:
> Ax0n wrote:
> > Is this limit specifically for LDOM guests? I have a Sun Blade 1500 I could
> > compile a custom -CURRENT kernel with, if that might help. Though I'm not
> > sure I want to do that with every snapshot I try.
>
> Not specifically, but the limit can vary by hardware. If you want to run a
> snapshot now, a custom kernel with a few devices removed will help. We'll have
> to make a similar long term fix anyway.
fq codel panic: ifq_is_serialized and MP interrupt
Hi,

I am often experiencing the following panic:

panic: kernel diagnostic assertion "ifq_is_serialized(ifq)" failed: ../sys/net/ifq.c, line 394

while running GENERIC.MP patched with mikeb@'s diff:

 if_start(struct ifnet *ifp)
 {
 	KASSERT(ifp->if_qstart == if_qstart_compat);
-	if_qstart_compat(&ifp->if_snd);
+	ifq_start(&ifp->if_snd);
 }

I am reporting it in a separate thread, as I am unsure whether the problem is directly related: the ddb backtrace shows a network interrupt arriving inside ifq_serialize().

ddb{0}> trace
db_enter(x,x,x,x,x) at db_enter+0x7
panic(x,x,x,x,18a) at panic+0x71
__assert(x,x,18a,x,bbd5) at __assert+0x2e
ifq_mfreeml(x,x,x,2bbb,x) at ifq_mfreeml+0x6a
fqcodel_deq_begin(x,x,x,x,x) at fqcodel_decbe+0x186
ifq_deb_begin(x,x,f0,0,x) at ifq_deb_begin+0x37
ifq_dequeue(x,x,x,x,x) at ifq_dequeue+0x17
bce_start(x,20,100,x,200282,bbd5,0) at bce_start+0x11f
bce_intr(x,x,x,2bbb,x) at bce_intr+0xc3
Xintr_ioapic3() at Xintr_ioapic3+0x66
--- interrupt ---
ifq_serialize(x,x,2,x,x) at ifq_serialize+0x1
ether_output(x,x,x,x,0) at ether_output+0x1d2
ip_output(x,0,x,800,0) at ip_output+0x821
tcp_output(x,x,x,x,0) at tcp_output+0x81d
tcp_usrreq(x,8,0,0,0,0) at tcp_usrreq+0x633
soreceive(x,0,x,0,0) at soreceive+0x2da
soo_read(x,x,x,x,0) at soo_read+0x43
dofilereadv(x,3,x,x,1) at dofilereadv+0x1c5
sys_read(x,x,x,0,x) at sys_read+0x8f
syscall() at syscall+0x250
--- syscall (number 2081103872) ---
0x6:
ddb{0}>

The panic seems to occur when a new TCP connection arrives (an incoming ssh session) while the host is already doing some network activity (here it was updating packages using pkg_add).
the host was running with:

queue fq on bce0 flows 1024 default

# ifconfig
lo0: flags=8049 mtu 32768
        index 4 priority 0 llprio 3
        groups: lo
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4
        inet 127.0.0.1 netmask 0xff00
wpi0: flags=8802 mtu 1500
        lladdr 00:13:02:2e:8b:46
        index 1 priority 4 llprio 3
        groups: wlan
        media: IEEE802.11 autoselect
        status: no network
        ieee80211: nwid ""
bce0: flags=208a43 mtu 1500
        lladdr 00:15:c5:0b:8b:7a
        index 2 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 192.168.92.11 netmask 0xff00 broadcast 192.168.92.255
        inet6 fe80::215:c5ff:fe0b:8b7a%bce0 prefixlen 64 scopeid 0x2
        inet6 2001:41d0:fe39:c05c:215:c5ff:fe0b:8b7a prefixlen 64 autoconf pltime 604784 vltime 2591984
        inet6 2001:41d0:fe39:c05c:5057:c993:3ee2:599a prefixlen 64 autoconf autoconfprivacy pltime 85934 vltime 604710
enc0: flags=0<>
        index 3 priority 0 llprio 3
        groups: enc
        status: active
pppoe0: flags=8810 mtu 1492
        index 5 priority 0 llprio 3
        dev: state: initial
        sid: 0x0 PADI retries: 0 PADR retries: 0
        groups: pppoe
pflog0: flags=141 mtu 33172
        index 6 priority 0 llprio 3
        groups: pflog

(note: the pppoe0 is here for testing. it is only created and put in down state).

# dmesg
OpenBSD 6.1-current (GENERIC.MP) #0: Thu May 25 14:00:16 CEST 2017
    semarie@bert.local:/home/openbsd/src/sys/arch/i386/compile/GENERIC.MP
cpu0: Genuine Intel(R) CPU T2400 @ 1.83GHz ("GenuineIntel" 686-class) 1.83 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,MWAIT,VMX,EST,TM2,xTPR,PDCM,PERF,SENSOR
real mem = 2137354240 (2038MB)
avail mem = 2083602432 (1987MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 06/13/07, BIOS32 rev. 0 @ 0xffa10, SMBIOS rev. 2.4 @ 0xf7980 (44 entries)
bios0: vendor Dell Inc. version "A17" date 06/13/2007
bios0: Dell Inc. MM061
acpi0 at bios0: rev 0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC MCFG SLIC BOOT SSDT
acpi0: wakeup devices LID_(S3) PBTN(S4) MBTN(S5) PCI0(S3) USB0(S0) USB1(S0) USB2(S0) USB3(S0) EHCI(S0) AZAL(S3) PCIE(S4) RP01(S4) RP02(S3) RP03(S3) RP04(S3) RP05(S3) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 166MHz
cpu0: mwait min=64, max=64, C-substates=0.2.2.2.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Genuine Intel(R) CPU T2400 @ 1.83GHz ("GenuineIntel" 686-class) 1.83 GHz
cpu1: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,NXE,SSE3,MWAIT,VMX,EST,TM2,xTPR,PDCM,PERF,SENSOR
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpimcfg0 at acpi0 addr 0xf000, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (AGP_)
acpiprt2 at acpi0: bus 3 (PCIE)
acpiprt3 at acpi0: bus 11 (RP01)
acpiprt4 at acpi0: bus -1 (RP02)
acpiprt5 at acpi0: bus -1 (RP03)
acpi
Re: sparc64 -CURRENT in LDOM: ERROR: Last Trap: Fast Data Access Protection
Ax0n wrote:
> Is this limit specifically for LDOM guests? I have a Sun Blade 1500 I could
> compile a custom -CURRENT kernel with, if that might help. Though I'm not
> sure I want to do that with every snapshot I try.

Not specifically, but the limit can vary by hardware. If you want to run a snapshot now, a custom kernel with a few devices removed will help. We'll have to make a similar long term fix anyway.
Re: Kernel panic on 6.1: init dies under load
Thanks for this latest patch; it seems to help. At least, I was able to put a fairly significant amount of load on the machine without a panic. I'll try to load it up more and see where we get, but so far this is positive.

On Wed, May 24, 2017 at 7:37 PM, Mike Belopuhov wrote:
> On Wed, May 24, 2017 at 12:27 -0400, Dan Cross wrote:
> > Thanks for the patch; I just got a few minutes today and I applied it,
> > rebuilt and installed the kernel and rebooted. Sadly, I get a similar
> > panic. Attached is a screenshot of console output. Note that, 'boot sync'
> > from ddb hangs forever.
> >
> > - Dan C.
>
> That's OK. I've discovered more problems related to 64k transfers.
> The reason why we didn't notice anything bad when aborting sleep
> was because sleep has a small memory footprint, but if you dump
> core of a larger (> 64k) program, you'd notice the issue because
> core dump routine like some other places in the kernel assumes
> that 64k transfers always work.
>
> I've attempted to attack this problem from a different angle:
> ensure that xbf(4) can handle 64k transfers. Solutions to this
> problem are notoriously messy and complicated and so far this
> one is no exception. Today I got to the point where the system
> boots multiuser but couldn't test further. I've noticed however
> that "boot dump" from ddb still crashes so I know it's not 100%
> right just yet, but since I won't get around doing anything
> about this until early next week, I'd appreciate a quick test
> if possible.
>
> I'm not attaching the diff since it's rather large:
>
> http://gir.theapt.org/~mike/xbf.diff
>
> Cheers,
> Mike
Re: sparc64 -CURRENT in LDOM: ERROR: Last Trap: Fast Data Access Protection
Is this limit specifically for LDOM guests? I have a Sun Blade 1500 I could compile a custom -CURRENT kernel with, if that might help. Though I'm not sure I want to do that with every snapshot I try.

*musing* I wonder if that's why NetBSD 7.1 is also crashing on boot.

On Fri, May 26, 2017 at 6:31 PM, Ted Unangst wrote:
> Ax0n wrote:
> > I have a SunFire T2000 that I've chopped up into LDOMs. The primary domain
> > and six of the LDOMs are running 6.1-STABLE just fine. I pulled down the
> > May 22 snapshot, and it installs (with a strange error, see bottom of
> > post), but the LDOM crashes upon boot. I just tried again with the May 24th
> > snapshot, and I'm getting the same error. This seems to dump me into
> > OpenBoot, not ddb. I can provide a shell on the primary domain, and serial
> > console (over ssh) access to a developer if needed. I am not subscribed to
> > bugs@, so please copy me off-list.
>
> There's a hardware/software limit that currently restricts the kernel to 8MB.
> Larger than that and bad things happen. Hopefully someone will soon find a way
> to reduce the size of the kernel.
Re: sparc64 -CURRENT in LDOM: ERROR: Last Trap: Fast Data Access Protection
Ax0n wrote:
> I have a SunFire T2000 that I've chopped up into LDOMs. The primary domain
> and six of the LDOMs are running 6.1-STABLE just fine. I pulled down the
> May 22 snapshot, and it installs (with a strange error, see bottom of
> post), but the LDOM crashes upon boot. I just tried again with the May 24th
> snapshot, and I'm getting the same error. This seems to dump me into
> OpenBoot, not ddb. I can provide a shell on the primary domain, and serial
> console (over ssh) access to a developer if needed. I am not subscribed to
> bugs@, so please copy me off-list.

There's a hardware/software limit that currently restricts the kernel to 8MB. Larger than that and bad things happen. Hopefully someone will soon find a way to reduce the size of the kernel.
Re: Backlight brightness not working on Acer 5733Z Series Notebook
A quick follow-up that will hopefully make a fix a bit easier:

On the advice of jcs@, I first tried https://github.com/jcs/intel_backlight_fbsd which was able to adjust my backlight fine once I rebooted with machdep.allowaperture=3

Next, I booted up with acpivout disabled (from boot -c) and after that, xbacklight and wsconsctl can both properly adjust the display brightness. This is in 6.1-STABLE.

On Fri, Nov 18, 2016 at 8:53 AM, Ax0n wrote:
> Anton reminded me about wsconsctl off-list. "wsconsctl
> display.brightness" acts the same as xbacklight. Adjusting xbacklight
> brightness and/or messing with the brightness controls on the keyboard
> affects the value reported by wsconsctl display.brightness, but none of
> these have any impact on the backlight brightness.
>
>> According to your dmesg, acpivout(4) is attached. Have you tried
>> changing the brightness using wsconsctl(1)?
Re: ldapd(8) assertion fails on amd64 Dell PowerEdge R710
"Todd C. Miller" writes:
> I can explain that. The page size is being set based on the file
> system block size.

Yes, I just discovered exactly this. I was looking at the btree.c code and saw:

	if (fstat(fd, &sb) == 0)
		psize = sb.st_blksize;
	else
		psize = PAGESIZE;

On my desktop, from dumpfs(8):

bsize 16384 shift 14 mask 0xc000

And on the server:

bsize 65536 shift 16 mask 0x

> Either indx_t needs to be changed to uint32_t or an upper bound
> needs to be placed on psize, perhaps 0x7fff.
>
> I'm not familiar enough with that code to say which is better.

I naively tried changing indx_t to uint32_t and got:

May 26 10:44:03.382 [27298] opening namespace dc=example,dc=org
btree_read_header:908: header has invalid magic

Currently, BT_MAGIC is #defined as 0xB3DBB3DB but I don't know what comprises that value.

I think my short term workaround is going to be a smaller partition mounted on /var/db/ldap.

Allan
Re: ldapd(8) assertion fails on amd64 Dell PowerEdge R710
On Fri, 26 May 2017 10:52:04 -0400, Allan Streib wrote:
> Note the "page size" is different. On the Dell R710 the message says
> "page size 65536" which is one higher than 0xffff, which seems like a
> red flag? The "upper" and "lower" fields look to be of type indx_t which
> is defined as a uint16_t, but in the bt_head struct, psize is a
> uint32_t. So the line
>
>    mp->page->upper = bt->head.psize;
>
> Is going to result in mp->page->upper being zero, if bt->head.psize is 65536.
>
> I don't understand why the R710 has a different behavior than my desktop
> machine, but that's what I'm seeing.

I can explain that. The page size is being set based on the file system block size. On your desktop this is 16384 which you can verify by running the dumpfs command on the filesystem. You'll see something like this:

magic 11954 (FFS1) time Fri May 26 06:54:56 2017
id [ 57ffcc69 89ea6f31 ]
cylgrp dynamic inodes 4.4BSD fslevel 3
ncg 6 ncyl 6 size 526112 blocks 516263
bsize 16384 shift 14 mask 0xc000
fsize 2048 shift 11 mask 0xf800
frag 8 shift 3 fsbtodb 2
...

However, the R710 probably has a larger file system with bigger blocks. If it is FFS2 it will look something like this:

magic 19540119 (FFS2) time Fri May 26 09:03:51 2017
superblock location 65536 id [ 53f23555 9ac85182 ]
ncg 561 size 234374284 blocks 232529698
bsize 65536 shift 16 mask 0x
fsize 8192 shift 13 mask 0xe000
frag 8 shift 3 fsbtodb 4
...

Either indx_t needs to be changed to uint32_t or an upper bound needs to be placed on psize, perhaps 0x7fff.

I'm not familiar enough with that code to say which is better.

- todd
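The second option above, an upper bound on psize, might look roughly like the sketch below. This is an illustration under assumptions, not a patch for btree.c: choose_psize, MAX_PSIZE and FALLBACK_PSIZE are invented names, the 4096-byte fallback is arbitrary, and the bound of 32768 was picked here only because it is the largest power of two that still fits in a 16-bit index (the message above floats 0x7fff instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen for this sketch only. */
#define FALLBACK_PSIZE	4096u	/* used when no block size is known */
#define MAX_PSIZE	32768u	/* largest power of two below 65536 */

/*
 * Pick a page size from the file system block size, but never return
 * a value that a uint16_t index could not represent.
 */
static uint32_t
choose_psize(uint32_t blksize)
{
	uint32_t psize;

	if (blksize == 0)
		psize = FALLBACK_PSIZE;
	else if (blksize > MAX_PSIZE)
		psize = MAX_PSIZE;
	else
		psize = blksize;

	/* Postcondition: the result survives a store into 16 bits. */
	assert(psize <= UINT16_MAX);
	return psize;
}
```

Under these assumptions a 16384-byte block size passes through unchanged, while a 65536-byte block size is reduced to 32768. Whether an existing database written with a different page size would still open after such a change is not something this sketch addresses.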
Re: ldapd(8) assertion fails on amd64 Dell PowerEdge R710
I've been trying to debug this a bit. 20 years since I did any C programming to any great degree.

I enabled and added some debugging to btree.c:

Index: btree.c
===================================================================
RCS file: /cvs/src/usr.sbin/ldapd/btree.c,v
retrieving revision 1.37
diff -u -p -u -r1.37 btree.c
--- btree.c 2 Dec 2016 05:52:01 - 1.37
+++ btree.c 26 May 2017 13:56:06 -
@@ -36,7 +36,7 @@
 #include "btree.h"

-/* #define DEBUG */
+#define DEBUG

 #ifdef DEBUG
 # define DPRINTF(...) do { fprintf(stderr, "%s:%d: ", __func__, __LINE__); \
@@ -1855,6 +1855,9 @@ btree_new_page(struct btree *bt, uint32_
 	mp->page->flags = flags;
 	mp->page->lower = PAGEHDRSZ;
 	mp->page->upper = bt->head.psize;
+
+	DPRINTF("new mpage %u, page upper %u, page lower %u",
+	    mp->pgno, mp->page->upper, mp->page->lower);
 	if (IS_BRANCH(mp))
 		bt->meta.branch_pages++;

Running ldapd with this extra code I get these messages just before the assertion failure:

.
.
btree_search_page:1470: tree is empty
btree_txn_put:2948: allocating new root leaf page
btree_new_page:1847: allocating new mpage 1, page size 65536
btree_new_page:1860: new mpage 1, page upper 0, page lower 12
btree_txn_put:2962: there are 0 keys, should insert new key at index 0
assertion "p->upper >= p->lower" failed: file "/usr/src/usr.sbin/ldapd/btree.c", line 1952, function "btree_add_node"

Note the debug statement at line 1847 has page size (bt->head.psize) as 65536, while at line 1860 the value of mp->page->upper is 0, but it should have just been assigned the value from bt->head.psize. I'm not seeing anything that should have changed bt->head.psize between those two lines.

If I run this on my local desktop I get the following.

.
.
btree_search_page:1470: tree is empty
btree_txn_put:2948: allocating new root leaf page
btree_new_page:1847: allocating new mpage 1, page size 16384
btree_new_page:1860: new mpage 1, page upper 16384, page lower 12
btree_txn_put:2962: there are 0 keys, should insert new key at index 0
btree_add_node:1957: add node [dc=example,dc=org] to leaf page 1 at index 0, key size 17

Note the "page size" is different. On the Dell R710 the message says "page size 65536" which is one higher than 0xffff, which seems like a red flag? The "upper" and "lower" fields look to be of type indx_t which is defined as a uint16_t, but in the bt_head struct, psize is a uint32_t. So the line

	mp->page->upper = bt->head.psize;

is going to result in mp->page->upper being zero, if bt->head.psize is 65536.

I don't understand why the R710 has a different behavior than my desktop machine, but that's what I'm seeing.

Allan
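The truncation described above can be reproduced outside ldapd. The sketch below is a stand-alone illustration, not code from btree.c: store_upper and page_bounds_ok are made-up helpers, indx_t is declared as a uint16_t to match the description in this message, and the lower bound of 12 is taken from the "page lower 12" debug output.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t indx_t;	/* 16-bit index type, as described above */

#define LOWER_SKETCH 12u	/* "page lower 12" in the debug output */

/*
 * Mimics "mp->page->upper = bt->head.psize": converting a uint32_t to
 * a uint16_t keeps only the low 16 bits, so 65536 becomes 0.
 */
static indx_t
store_upper(uint32_t psize)
{
	assert(sizeof(indx_t) == 2);
	return (indx_t)psize;
}

/* Mirrors the failing check: upper must not be smaller than lower. */
static int
page_bounds_ok(uint32_t psize)
{
	indx_t upper = store_upper(psize);
	indx_t lower = (indx_t)LOWER_SKETCH;

	return upper >= lower;
}
```

Under these assumptions store_upper(16384) keeps its value and the bounds check passes, while store_upper(65536) yields 0 and the check fails, which matches the "page upper 0" line and the assertion seen on the R710.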