Re: Intel NUC 11 very slow, acpi0 kernel thread always take 100%

2022-10-06 Thread Stuart Henderson
On 2022/10/06 11:49, Igor Petruk wrote:
> I've looked at the previous threads on this issue. One explanation was
> that it was a buggy hardware, e.g. cheap motherboard. In my case this
> is Intel NUC. I've just upgraded the BIOS from Feb 2022 version to Aug
> 16th, 2022 version from
> https://www.intel.com/content/www/us/en/download/19698/bios-update-tntgl357.html.
> 
> The problem persists. I've seen in other OSs there are indeed some
> mitigations available, whether via writing to `/sys`, sysctl or kernel
> boot options to block the interrupt. It might be a useful feature for
> OpenBSD to mitigate issues like this until a proper fix on either side
> (hardware or kernel) is implemented.

I found a different suggestion in
https://bugzilla.kernel.org/show_bug.cgi?id=203617#c19 -
do you have any Thunderbird-related BIOS options you can try changing?



Re: Intel NUC 11 very slow, acpi0 kernel thread always take 100%

2022-10-06 Thread Stuart Henderson
On 2022/10/06 12:29, Stuart Henderson wrote:
> On 2022/10/06 11:49, Igor Petruk wrote:
> > I've looked at the previous threads on this issue. One explanation was
> > that it was a buggy hardware, e.g. cheap motherboard. In my case this
> > is Intel NUC. I've just upgraded the BIOS from Feb 2022 version to Aug
> > 16th, 2022 version from
> > https://www.intel.com/content/www/us/en/download/19698/bios-update-tntgl357.html.
> > 
> > The problem persists. I've seen in other OSs there are indeed some
> > mitigations available, whether via writing to `/sys`, sysctl or kernel
> > boot options to block the interrupt. It might be a useful feature for
> > OpenBSD to mitigate issues like this until a proper fix on either side
> > (hardware or kernel) is implemented.
> 
> I found a different suggestion in
> https://bugzilla.kernel.org/show_bug.cgi?id=203617#c19 -
> do you have any Thunderbird-related BIOS options you can try changing?
> 

Thunderbolt, even. (thanks tb :)



Re: Panic on Dell Precision T1600, BIOS A21 (stopped at efi_attach+0x171)

2022-10-19 Thread Stuart Henderson
On 2022/10/19 12:07, Claudio Miranda wrote:
> Greetings,
> 
> I'm getting a kernel panic on a Dell Precision T1600 with BIOS A21
> which is the latest revision from Dell for this system. This all
> started as of the #793 snapshot of -current on Monday, October 17 at
> 10:16:43 MDT. I've attached pictures of the kernel panic on boot as
> well as the panic info, trace info, and dmesg info. Prior to this
> snapshot, the system was booting OpenBSD without issue. Unfortunately,
> I'm only able to provide pictures of the information needed. Any help
> is greatly appreciated.

transcribing the most important bits:

efi0 at bios0: UEFI 2.0
efi0: uvm_fault(0x8233d718, 0xcacc6798, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at efi_attach+0x171: movzwl 0(%rax),%eax
active proc tid 0, pid 0, uid 0 (swapper)

efi_attach(...) at efi_attach+0x171
config_attach(...) at config_attach+0x1f4
bios_attach(...) at bios_attach+0x700
config_attach(...) at config_attach+0x1f4
mainbus_attach(...) at mainbus_attach+0x78
config_attach(...) at config_attach+0x1f4
cpu_configure(...) at cpu_configure+0x33
main(...) at main+0x3a3

it may be helpful to boot the previous kernel and include the information
from "sendbug -P" run as root, which will include the acpi tables.



Re: Had to set 'kern.timecounter.hardware' to 'acpitimer0' to fix system clock going too fast

2022-10-24 Thread Stuart Henderson
On 2022/10/24 08:26, Scott Cheloha wrote:
> > pppoe0: received unexpected PADO
> > [snip]
> > pppoe0: received unexpected PADO
> > pppoe0: LCP keepalive timeout
> > [snip]
> > pppoe0: host unique tag found, but it belongs to a connection in state 3
> > pppoe: received PADO but could not find request for it
> > pppoe0: LCP keepalive timeout
> > pppoe0: received unexpected PADO
> > [snip]
> > pppoe0: received unexpected PADO
> 
> I assume you didn't see these messages on 7.1?

Unlikely to be related to the time issues.

These can happen if the machine is restarted without bringing down the
connection nicely first and the ISP holds onto the old session for a while.

"ifconfig pppoe0 down" in rc.shutdown, possibly followed by a short
sleep, might help with those.



Re: snmpd in 7.2 dies with too many parse errors

2022-10-28 Thread Stuart Henderson
I wonder if there are any sensors which disappear and reappear..

On 2022/10/28 10:01, Martijn van Duren wrote:
> Could you run snmpd with `-vv`? That way I also have the specific
> OIDs being requested and returned (both frontend and backend) and
> might make it a little easier to reproduce.
> 
> Do note that this adds at least 4 log lines for every request
> issued to snmpd, so your logfile might explode a bit.
> 
> martijn@
> 
> On Thu, 2022-10-27 at 14:08 -0700, Ryan Freeman wrote:
> > On Thu, Oct 27, 2022 at 01:46:21PM -0700, Ryan Freeman wrote:
> > > Hello,
> > > After upgrading some virtual machines to OpenBSD 7.2, I started noticing
> > > snmpd dying approx every 6 hours on the upgraded machines.
> > > 
> > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): 
> > > 2506302838 
> > > iso.org.dod.internet.private.enterprises.openBSD.sensorsMIBObjects.sensors.sensorTable.sensorEntry.sensorStatus:
> > >  oids not equal
> > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): 
> > > Closing: Too many parse errors
> > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451/2580462718): 
> > > Closed by snmpd (Too many AgentX parse errors from peer)
> > > Oct 27 13:14:33 mirror snmpd_metrics[88325]: [fd:0 sess:2580462718 
> > > ctx:]: unsupported call: agentx-Close-PDU
> > > Oct 27 13:14:33 mirror snmpd[98795]: AgentX(1268939451): Connection reset 
> > > by peer
> > > Oct 27 13:14:33 mirror snmpd[98795]: snmpe: AgentX(1268939451): 
> > > disappeared unexpected
> > > 
> > > The message is always the same, it tends to be around 1:20am, 7:20am, 
> > > 1:20pm, 7:20pm
> > > I have a script set to check "rcctl ls failed" and notify if something 
> > > has failed.
> > > 
> > > LibreNMS is used to scrape the snmpd instances on the affected VMs.
> > 
> > And, forgot to include the snmpd.conf, apologies.  here it is with minor
> > changes values only:
> > # $OpenBSD: snmpd.conf,v 1.2 2021/08/08 13:43:10 sthen Exp $
> > 
> > # See snmpd.conf(5) for more options (tcp, alternative ports, trap listener)
> > listen on 127.0.0.1
> > 
> > user "changed" auth hmac-sha1 authkey "randomstuff" enc aes enckey 
> > "morerandomstuff"
> > 
> > # Adjust the local system information
> > system contact "Systems Team (syst...@somecompany.com)"
> > #system location "Rack A1-24, Room 13"
> > 
> > # Required by some management software
> > system services 74
> > 
> > LibreNMS then scrapes it using snmpv3 and authPriv mode.
> > no core file is being dropped by snmpd
> > 
> > -Ryan
> > 
> 



Re: Asynchronous wait on fence

2022-10-31 Thread Stuart Henderson
On 2022/10/30 18:22, anointedfig wrote:
> Hi,
> 
> I am new to OpenBSD. X crashed with the following output:
> 
> Asynchronous wait on fence :Xorg[30301]:375f timed out 
> (hint:0x81dc8ab0s)

https://www.openbsd.org/report.html shows the sort of information
you need to include in a report for anyone to be able to help.



Re: snmpd in 7.2 dies with too many parse errors

2022-11-01 Thread Stuart Henderson
On 2022/11/01 16:03, Ryan Freeman wrote:
> On Tue, Nov 01, 2022 at 11:04:03AM +0100, Martijn van Duren wrote:
> > On Mon, 2022-10-31 at 20:14 -0700, Ryan Freeman wrote:
> > > 
> > > I can confirm the snmpd process is no-longer disappearing with this
> > > patch.  Almost 24 hours on one VM and 16 hours on another. Thanks!
> > > 
> > > -Ryan
> > 
> > To be complete, what happens is the following:
> > - snmpd sends a getnext request to the backend on a scalar
> > - libagentx increments the current OID to the OID of the table
> >   column following the scalar, which contains no elements and after
> >   reaching the last column it has reached the end OID of the original
> >   request, resulting in an endOfMibView, but forgets to reset the OID to
> >   the original start OID (as per RFC3416 section 4.2.2)
> > - snmpd validates the output from the backend and sees that the
> >   OID of the EOMV doesn't match the requested OID and decides that
> >   it doesn't trust the backend anymore; It is then closed with a
> >   "too many parse errors" notification.
> > - Upon the closing of the agentx socket the backend shuts itself
> >   down:
> >   1) It gets its fd from snmpd and it doesn't know where to connect
> >  to.
> >   2) We don't want lingering processes if snmpd itself goes away
> > - Once a backend disappears snmpd shuts itself down. Basically for
> >   the fail, fail loud reasons.
> > 
> > Note that this only goes for backends under libexec/snmpd, not for
> > backends that connect over the agentx listener, like vmd or relayd.
> > 
> > So there's no crash, just a backend that's being kicked for returning a
> > non-compliant varbind, which escalated to a premature exit. I also don't
> > expect too many people will actually hit this, because it's quite a
> > specific set of circumstances: I've had to set up an instance under kvm
> > and disable viomb(4) to get an empty sensors table, although there might
> > be other ways to trigger this.
> 
> Ah, there it is.  Our KVM platform is Proxmox, and we go out of our way
> to untick the 'memory ballooning' option every time we make a VM.  Up to
> now, I've been wondering how we managed to have such a unique setup.
> 
> I will probably keep a local build of libagentx for the duration of the
> 7.2 lifetime and fan that out, in lieu of turning on memory ballooning
> just to get a sensor to exist.  Also keep some instances running -current
> in our LibreNMS to help catch this sort of thing before next release.
> 
> Thanks for the detailed explanation, and thanks again for the work to
> figure out cause+solution.
> -Ryan

btw I have VMs under both ESXi and proxmox which run snmpd, and have
LibreNMS pointed at them, but I left that setting at default.
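
For anyone following along, the sequence Martijn describes can be sketched
in C. This is only an illustration, not the libagentx or snmpd code:
`struct oid`, `struct varbind`, `backend_end_of_view()` and
`frontend_accepts()` are made-up names, and an OID is a plain array of
integers purely for the example. The point is just the rule from the
explanation above: when a getnext runs off the end of the requested range,
the endOfMibView varbind has to carry the original start OID, and the
frontend rejects it otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration; not the libagentx structures. */
struct oid {
	uint32_t id[16];
	size_t n;
};

enum vb_type { VB_VALUE, VB_END_OF_MIB_VIEW };

struct varbind {
	struct oid name;
	enum vb_type type;
};

/* Two OIDs are equal when they have the same length and sub-identifiers. */
int
oid_equal(const struct oid *a, const struct oid *b)
{
	return a->n == b->n &&
	    memcmp(a->id, b->id, a->n * sizeof(a->id[0])) == 0;
}

/*
 * Backend side: the walk reached the end of the requested range.
 * Resetting the name to the start OID is the step described as missing.
 */
void
backend_end_of_view(struct varbind *vb, const struct oid *start)
{
	assert(vb != NULL && start != NULL);
	vb->name = *start;
	vb->type = VB_END_OF_MIB_VIEW;
}

/*
 * Frontend side: an endOfMibView is only accepted when its OID matches
 * the one that was requested. Other varbind types are not checked here.
 */
int
frontend_accepts(const struct oid *requested, const struct varbind *vb)
{
	if (vb->type == VB_END_OF_MIB_VIEW)
		return oid_equal(&vb->name, requested);
	return 1;
}
```

A backend that sets the type to endOfMibView but leaves the name at the
last OID it walked to fails the `frontend_accepts()` check, which is the
"oids not equal" case in the logs earlier in the thread.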



Re: [sparc64] dup alloc panic while recompiling base system on 7.2-current

2022-11-07 Thread Stuart Henderson
That's a sign of an unhealthy filesystem. It's just /usr/obj which is 
unlikely to have anything important on, I'd just newfs it.


--
 Sent from a phone, apologies for poor formatting.

On 7 November 2022 01:02:21 Koakuma  wrote:


Synopsis: Kernel panic while recompiling the base system on 7.2-current
Category: kernel sparc64
Environment:

System  : OpenBSD 7.2
Details : OpenBSD 7.2-current (GENERIC.MP) #9: Mon Nov  7 01:33:54 WIB 2022
 k@openbsd:/sys/arch/sparc64/compile/GENERIC.MP

Architecture: OpenBSD.sparc64
Machine : sparc64

Description:

This occurs when I'm updating a 7.2-current installation.
When recompiling the base system after installing an updated kernel,
the kernel panics in the middle of the compilation session.
The install is on a 8 vcore, 8 GiB memory LDOM on a Sun T5120.

How-To-Repeat:

Recompile kernel & base system as usual:
# cd /sys/arch/$(machine)/compile/GENERIC.MP
# make obj
# make config
# make -j8 && make install
# reboot
# cd /usr/src
# make -j8 obj
# make -j8 build # Panics in the middle of this command

Fix:


dmesg:
OpenBSD 7.2-current (GENERIC.MP) #9: Mon Nov  7 01:33:54 WIB 2022
   k@openbsd:/sys/arch/sparc64/compile/GENERIC.MP
real mem = 8589934592 (8192MB)
avail mem = 8408670208 (8019MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root: SPARC Enterprise T5120
cpu0 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu1 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu2 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu3 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu4 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu5 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu6 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
cpu7 at mainbus0: SUNW,UltraSPARC-T2 (rev 0.0) @ 1165.379 MHz
vbus0 at mainbus0
"flashprom" at vbus0 not configured
vrng0 at vbus0
cbus0 at vbus0
vdsk0 at cbus0 chan 0x2: ivec 0x4, 0x5
scsibus1 at vdsk0: 1 targets
sd0 at scsibus1 targ 0 lun 0: 
sd0: 40960MB, 512 bytes/sector, 83886080 sectors
vdsk1 at cbus0 chan 0x3: ivec 0x6, 0x7
scsibus2 at vdsk1: 1 targets
cd0 at scsibus2 targ 0 lun 0: 
vnet0 at cbus0 chan 0x4: ivec 0x8, 0x9, address 00:14:4f:f9:6a:75
vcons0 at vbus0: ivec 0x111: console
vrtc0 at vbus0
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
bootpath: /virtual-devices@100,0/channel-devices@200,0/disk@0,0
root on sd0a (f7061d976cd07cad.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted

usbdevs:
usbdevs: no USB controllers found
cp: /var/db/acpi/*: No such file or directory
b64encode: *: No such file or directory

pcidump:

acpidump:

ddb log:

mode = 0100660, inum = 441884, fs = /usr/obj
panic: ffs_valloc: dup alloc
Stopped at  db_enter+0x8:   nop
    TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
 463321  83767   21     0x13       0    2  as
  92098  92219   21     0x13       0    5  cc1
 258938  85141   21     0x13       0    4  as
 422498  22643   21     0x13       0    0  cc1
 165247  67024   21     0x13       0    6  as
*211105  14170   21     0x13       0   7K  cc1
 299542  30294   21     0x13       0    1  cc1
 458544  71411    0  0x14000   0x200    3  kmthread
ffs_inode_alloc(4000bfc94a0, 81b0, 4000ccfc380, 4007af57f00, fffe, 4007b497ba0) at ffs_inode_alloc+0x3b4
ufs_makeinode(81b0, 4000bf5e8d0, 4007b497ba0, 4007b497bd0, 4000cad5a80, 800) at ufs_makeinode+0x5c
ufs_create(4007b497908, 4007b497b70, 8, 4007b497810, 2000, 8000) at ufs_create+0x3c
VOP_CREATE(4000bf5e8d0, 4007b497ba0, 19ad988, 4007b4979f8, 4007b497918, 19ad988) at VOP_CREATE+0x44
vn_open(4007b4979f8, 602, 1b0, 0, ff9c, fffbe2d7) at vn_open+0x2fc
doopenat(4007b497b70, ff9c, fffbe2d7, 601, 1b0, 4007b497df0) at doopenat+0x16c

syscall(4007b497ed0, 405, 2f11928a8, 2f11928ac, 0, 0) at syscall+0x388
syscall_setup(fffbe2d7, 601, 1b6, 0, 400, 200) at syscall_setup+0x134
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{7}> show panic
*cpu7: ffs_valloc: dup alloc
ddb{7}> mach ddbcpu 0
Stopped at  __mp_lock+0x68: ld  [%o0 + 0x800], %g1
sparc_intr_retry(1c8d160, 2c8adf658, 5, 100, 1ff, 6) at sparc_intr_retry+0x5c
intr_handler(2017ec8, 4000cd04300, 100f191, 100, 1ff, 2025e4000) at intr_handler+0x74
sparc_intr_retry(1c8d160, 4007b413b84, 4007b1a5cf0, 28a732300, 0, 6) at sparc_intr_retry+0x5c

trapsignal(4000cae6dc0, 5, 0, 1, 266798, 1) at trapsignal+0xe8
trap(4007b1a5ed0, 101, 266798, 820012, 6c3560, 128f40) at trap+0x500
Lslowtrap_reenter(0, 0, 0, 0, 28aaba228, 13) at Lslowtrap_reenter+0xf8
ddb{0}> mach ddbcpu 1
panic: trap  type  0x114 (*trap):  ppc=13fb26c  npc=13fb270  pstate=820006

Stopped at  __mp_lock+0x64: rd  %ccr, %

Re: ASUS BR1100CK panics on Boot

2022-11-20 Thread Stuart Henderson
On 2022/11/20 16:38, theprin...@post-scri.pt wrote:
> I bought an ASUS BR1100CK laptop and decided to see if it could run
> OpenBSD 7.2. Installation appeared to go well with the help of the
> FAQ but I'm getting a kernel panic at boot.

at the boot> prompt, "boot -c", disable acpitz, quit - if that then allows you
to boot, use sendbug (often "doas sendbug -P > somefile", and copy the file to a
system which has working mail) to generate a report which includes dmesg and
a copy of the acpi tables.

> The CPU is an Intel Celeron N4500 with an integrated Intel GPU if that
> helps.
> 
> Here is the most that I am able to provide per the guidelines on bug
> reports.
> 
> `show panic`
> cp0: aml_die aml_convert:2095
> 
> `ps`
>    PID   TID  PPID  UID  S    FLAGS  WAIT  COMMAND
>      0     0    -1    0  7  0x10200        swapper
> 
> I've also attached photos of the trace command results for both
> -RELEASE and -CURRENT installations. show panic and ps commands
> are identical on both branch installations.
> 
> If anything else is useful, I'm prepared to attempt to retrieve more 
> information.
> 
> Thanks,
> TP







Re: Bug report: installer freezes in macOS Virtualization.framework virtual machines.

2022-12-02 Thread Stuart Henderson
On 2022/12/01 19:55, Catherine Kelly wrote:
> Bryan Steele wrote:
> > You may need to switch to the framebuffer console.
> >
> > boot> set tty fb0
> 
> I've tried that - it still didn't boot.

I don't know if it's general to all Virtualization.framework VMs, but
to run OpenBSD under UTM I needed to disable viorng in the VM config
otherwise it would hang (I forget where, but RNG is set up early so it
could be at this point).



Re: Bug report: installer freezes in macOS Virtualization.framework virtual machines.

2022-12-02 Thread Stuart Henderson
On 2022/12/02 05:23, Catherine Kelly wrote:
> UTM does not use Virtualization.framework. It is a GUI wrapper for QEMU.

Yes, but doesn't QEMU use Apple's virtualization layer underneath it?
It's not doing machine emulation there. (QEMU+Apple's virt stuff, similar
to QEMU+KVM, or vmd+vmm).

So if you're using whatever viorng maps to in that layer, it could be the
same issue.



Re: Bug report: installer freezes in macOS Virtualization.framework virtual machines.

2022-12-02 Thread Stuart Henderson
On 2022/12/02 06:32, Catherine Kelly wrote:
> No, Hypervisor.framework and Virtualization.framework are two different
> things. Virtualization.framework was introduced in macOS 11 as an
> alternative to Hypervisor.framework.
> 
> Apple's documentation for Hypervisor.framework:
> https://developer.apple.com/documentation/hypervisor
> For Virtualization.framework:
> https://developer.apple.com/documentation/virtualization
> 
> On Fri, Dec 2, 2022, 5:48 AM Bryan Steele  wrote:
> 
> > On Fri, Dec 02, 2022 at 05:23:45AM -0800, Catherine Kelly wrote:
> > > UTM does not use Virtualization.framework. It is a GUI wrapper for QEMU.
> >
> > https://mac.getutm.app/
> >
> >  "UTM employs Apple's Hypervisor virtualization framework to run ARM64
> >   operating systems on Apple Silicon at near native speeds."
> >
> > > On Fri, Dec 2, 2022, 3:11 AM Stuart Henderson 
> > wrote:
> > >
> > > > On 2022/12/01 19:55, Catherine Kelly wrote:
> > > > > Bryan Steele wrote:
> > > > > > You may need to switch to the framebuffer console.
> > > > > >
> > > > > > boot> set tty fb0
> > > > >
> > > > > I've tried that - it still didn't boot.
> > > >
> > > > I don't know if it's general to all Virtualization.framework VMs, but
> > > > to run OpenBSD under UTM I needed to disable viorng in the VM config
> > > > otherwise it would hang (I forget where, but RNG is set up early so it
> > > > could be at this point).
> > > >
> > > >
> >
> >

Well, if you have got the bits of Virtualization.framework that provide
a virtio rng enabled, it might be worth trying to disable them.



Re: installation issue on x86_64

2022-12-09 Thread Stuart Henderson
On 2022/12/09 07:19, Andreas Ehlert wrote:
> hello openbsd folks,
> 
> thanks a lot for your os.
> i have an issue for your interest.
> 
> the install image install72.img have an failure.
> the installation routine can not find the sha256.sig
> file to check the base files with checksum.
> 
> i take a look on the usb stick and i found the sha256
> under 7.2/amd64 but not sha256.sig
> 
> the installation is only possible without verification of the base files.
> 
> i think this is a security issue for a fresh 7.2 installation.

If you have booted the USB stick, it is already too late to check
the crypto-signature; if it was a dodgy malicious file then it could have
already done damage. And the sha256 signature is good enough to detect
bad imaging.

If you have an existing OpenBSD installation, you can use signify to
verify the downloaded image. If it is an installation of 7.1, you already
have the 7.2 keys available. If not, you can either upgrade release by
release to 7.2 (each release having the keys for the subsequent release,
maintaining the chain of authenticity), or copy the public key for
the signature from https://www.openbsd.org/72.html.

If you don't have an existing OpenBSD installation, you can alternatively
use minisign to verify the download. It's packaged in some OSes, or fetch
it from https://jedisct1.github.io/minisign/

> when i make a wish. i wish peace, love and unity for the human race and a 
> installation routine with checksum verification of the base files.

eh, diversity is good too



Re: cc claims ISO C99 support, but %n printf format specifier calls abort()

2022-12-16 Thread Stuart Henderson
On 2022/12/16 10:50, Vincent Lefevre wrote:
> On 2022-12-15 18:56:15 -0700, Theo de Raadt wrote:
> > There are almost no %n left in the software ecosystem.  If we are able
> > to make this crossing, everyone else is also capable, and eventually
> > will.  Just like with gets().
> 
> FYI, this breaks GMP, whose configure script insists on %n being
> available, otherwise GMP uses its own, buggy implementation of
> vsnprintf, which triggers an assertion failure when %a/%A is used
> (and this bug affects MPFR). AFAIK, the GMP developers haven't
> reacted to the bug report sent in October.

btw, that doesn't appear to affect the GMP port; the values passed in from
ports infrastructure via config.cache override the autoconf check for %n
(which appears to be trying to detect a bug in Solaris 2.7 on 64-bit SPARC).

> BTW, if developers use an untrusted format string, then sprintf()
> is unsafe too (possible buffer overflow), and at some point,
> printf() too.
> 
> -- 
> Vincent Lefèvre  - Web: 
> 100% accessible validated (X)HTML - Blog: 
> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
> 
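
As an aside on moving away from %n: code that used %n to find out how
many characters had been written so far can usually get the same number
from the return value of snprintf(). A minimal sketch, assuming a made-up
helper `format_with_offset()` (not taken from GMP, MPFR or the OpenBSD
tree), which formats "label: value" and reports where the value starts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper for illustration. A %n-based version would use a
 * format like "%s: %n%d" and read the offset through the %n argument;
 * here the offset is taken from snprintf's return value instead.
 *
 * Returns the total number of characters written (excluding the NUL),
 * or -1 on an output error or if the buffer is too small.
 */
int
format_with_offset(char *buf, size_t len, const char *label, int value,
    int *value_offset)
{
	int n, m;

	assert(buf != NULL && label != NULL && value_offset != NULL);

	/* First part: the label. n is what %n would have reported. */
	n = snprintf(buf, len, "%s: ", label);
	if (n < 0 || (size_t)n >= len)
		return -1;
	*value_offset = n;

	/* Second part: the value, appended after the label. */
	m = snprintf(buf + n, len - (size_t)n, "%d", value);
	if (m < 0 || (size_t)m >= len - (size_t)n)
		return -1;

	return n + m;
}
```

Splitting the call in two costs an extra pass through the formatter, but
it also gives a natural place to check for truncation, which the %n form
does not.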



Re: feature request: installation image with xfce installed

2022-12-19 Thread Stuart Henderson
On 2022/12/19 17:41, Peter Nicolai Mathias Hansteen wrote:
> > 17. des. 2022 kl. 13:03 skrev Andrey :
> > 
> > 
> > I need an installation image that immediately, in addition to OpenBSD, 
> > install xfce for me and will automatically launch XFCE when the computer is 
> > turned on. For many reasons, it is impossible to download it manually, 
> > especially when you are very far from the Internet, and having a separate 
> > installation image containing xfce will lower the threshold for logging 
> > into OpenBSD for Windows, mac users, etc.
> 
> What you describe can be achieved fairly straightforwardly with a siteNM.tgz 
> that contains the required files and the script to perform the package 
> install at the tail end of the install process.
> 
> Take a few moments to read https://www.openbsd.org/faq/faq4.html#site and 
> you’re quite a bit of the way there.

also, "lower the threshold for Windows, mac users, etc" isn't really a
goal of OpenBSD.



Re: mpv segfaults on -current

2023-01-24 Thread Stuart Henderson
On 2023/01/23 14:21, nat...@blazebone.com wrote:
> >Synopsis:Trying to play videos with mpv segfaults
> >Category:
> >Environment:
>   System  : OpenBSD 7.2
>   Details : OpenBSD 7.2-current (GENERIC.MP) #979: Sun Jan 22 
> 21:51:52 MST 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Right after sysupgrading to the latest snapshot today and after running 
> pkg_add -uU
>   mpv was segfaulting each time I try to play a video.  I tried mp4 and 
> webm, I get the
>   following output:

Very unlikely to be the problem here, but don't use -U when you're
updating packages, it's a shortcut for when you install a new package
and need to get the immediate dependencies updated, but might leave
other installed packages with a broken mixture of libraries. (i.e.
speed hack but might not always give good results).

The normal way to update all packages is just "pkg_add -u".

>  (+) Video --vid=1 (*) 'Presented By EMBER' (hevc 1920x1080 23.976fps)
>  Video --vid=2 [P] 'cover.jpg' (mjpeg)
>  (+) Audio --aid=1 --alang=jpn (*) (aac 2ch 44100Hz)
>  (+) Subs  --sid=1 --slang=eng (*) (ass)
> File tags:
>  Title: S01E05-More Than a Nosebleed, but Less Than a Kiss
> Segmentation fault (core dumped) 

https://www.openbsd.org/faq/ports/ports.html#Backtrace may give us some
clues as to where the problem is.



Re: mpv segfaults on -current

2023-01-24 Thread Stuart Henderson
On 2023/01/24 15:24, nature wrote:
> 
> "Theo de Raadt"  writes:
> 
> > I'm very confident we can make it through this last phase, in which case
> > the next release will ship with xonly.  Otherwise, we'll slow the
> > process down.  Not going to slow it down yet.  Thanks for participating
> > in snapshots and helping us make a better world.
> 
> Thank you Theo for building this wonderful system which is (also in my
> opinion) a step towards a better world!  I am very glad if I can be of
> any help.
> 
> 
> Stuart Henderson  writes:
> 
> > Very unlikely to be the problem here, but don't use -U when you're
> > updating packages, it's a shortcut for when you install a new package
> > and need to get the immediate dependencies updated, but might leave
> > other installed packages with a broken mixture of libraries. (i.e.
> > speed hack but might not always give good results).
> >
> > The normal way to update all packages is just "pkg_add -u".
> 
> Okay, thank you for the explanation, I initially tried with pkg_add -u
> But then I tried -Uu since I thought it was more "thorough", forcing
> everything to update.
> 
> >>  (+) Video --vid=1 (*) 'Presented By EMBER' (hevc 1920x1080 23.976fps)
> >>  Video --vid=2 [P] 'cover.jpg' (mjpeg)
> >>  (+) Audio --aid=1 --alang=jpn (*) (aac 2ch 44100Hz)
> >>  (+) Subs  --sid=1 --slang=eng (*) (ass)
> >> File tags:
> >>  Title: S01E05-More Than a Nosebleed, but Less Than a Kiss
> >> Segmentation fault (core dumped) 
> >
> > https://www.openbsd.org/faq/ports/ports.html#Backtrace may give us some
> > clues as to where the problem is.
> 
> Okay, so to expand, I think the problem seems to be at the level of the
> codec code since not only mpv segfault, but vlc and mplayer... Even more
> so, each time I try to play a video in my web browser (chromium-based),
> the tab crashes as well.
> 
> So followed the FAQ that you linked, fortunately there was a debug-mpv
> package in the ports, so I'll inline the little report I made with it.
> I tried to play files with different video codecs (hevc and vp9) and
> they both produce the following error:
> 

> This is the core after trying to play a video encoded in hevc:
> 
> $ egdb mpv mpv.core
> GNU gdb (GDB) 9.2
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Type "show copying" and "show warranty" for details.
> This GDB was configured as "x86_64-unknown-openbsd7.2".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> 
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from mpv...
> Reading symbols from /usr/local/bin/.debug/mpv.dbg...
> [New process 590111]
> [New process 446975]
> [New process 584337]
> [New process 382906]
> [New process 408449]
> [New process 326071]
> [New process 344625]
> [New process 397508]
> [New process 420988]
> [New process 163934]
> [New process 532396]
> [New process 397225]
> [New process 506725]
> [New process 121303]
> Core was generated by `mpv'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0bdcab34912c in ff_imdct_half_avx.pre () from 
> /usr/local/lib/libavcodec.so.25.0
> [Current thread is 1 (process 590111)]

Thanks - this might be fixed by a recent commit to ports/graphics/ffmpeg
(you can try updating yourself if you have the ports tree installed, or
wait for the next package snapshot). But other changes may be needed
too. This is being actively looked at.

General hint about GDB (though I think at this point we probably don't
need more): if you type "bt" at that point hopefully it will give you a
list of other functions that were called prior to the one which is
immediately printed (ff_imdct_half_avx.pre in this case), which can
sometimes be helpful. And in the case where there are entries coming
from libraries (like libavcodec in this case), installing the
debug-package for the library can be helpful too (this one is from
ffmpeg, and there's a "debug-ffmpeg" package).

> This is the core after trying to play a video file encoded with vp9:
> 
> $ egdb mpv mpv.core
..
> #0  0x099a1ac9ff51 in ff_fft_calc_sse () from 
> /usr/local/lib/libavcodec.so.25.0
> [Current thread is 1 (process 308636)]

I think this one in particular (fft) is very likely to be fixed by the
ffmpeg commit.



Re: Caps Lock influences yubikey

2023-02-07 Thread Stuart Henderson
On 2023/02/07 12:13, Paul de Weerd wrote:
> Should yubikey really depend on the caps lock state of an (external)
> keyboard?  Would it make sense to lower the case of any password

there's complex code to handle 90 odd different keymaps, it would seem
silly to go to that much trouble and not cope with capslock.

> entered by the yubikey (I locked myself out with my first attempt at
> touching libexec/login_yubikey to achieve this, so I won't offer that
> diff here ;)

you could try this, untested as I don't have a private key handy
and don't want to reset my yk, but I think it should work.

Index: yubikey.c
===
RCS file: /cvs/src/libexec/login_yubikey/yubikey.c,v
retrieving revision 1.6
diff -u -p -r1.6 yubikey.c
--- yubikey.c   16 Sep 2017 08:07:15 -  1.6
+++ yubikey.c   7 Feb 2023 12:33:29 -
@@ -308,10 +308,14 @@ uint8_t
 yubikey_keymap_decode(wchar_t *wpassword, char *token, int index)
 {
int c, j, found;
+   wchar_t lower;
for (j=0; j= 0x41 && lower <= 0x5a)
+   lower |= 0x20;
+   if (lower == keymaps[index][c]) {
token[j] = modhex_trans[c];
found++;
break;
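
The idea in the diff, pulled out as a standalone sketch: fold ASCII A-Z to
lowercase before comparing against the keymap row, so a token typed with
caps lock on still matches. `fold_ascii_upper()` and `keymap_index()` are
hypothetical names for illustration, not functions in login_yubikey, and
the row passed in stands in for one entry of the keymaps table.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Fold ASCII 'A'..'Z' (0x41..0x5a) to lowercase by setting bit 0x20.
 * Every other code point is returned unchanged.
 */
wchar_t
fold_ascii_upper(wchar_t wc)
{
	if (wc >= 0x41 && wc <= 0x5a)
		wc |= 0x20;
	return wc;
}

/*
 * Hypothetical lookup: return the index of wc within a keymap row of
 * n entries, comparing after folding, or -1 if it is not present.
 */
int
keymap_index(const wchar_t *row, size_t n, wchar_t wc)
{
	size_t i;

	assert(row != NULL);

	wc = fold_ascii_upper(wc);
	for (i = 0; i < n; i++) {
		if (row[i] == wc)
			return (int)i;
	}
	return -1;
}
```

This only covers the ASCII letters, same as the diff. A keymap whose
entries are non-ASCII letters would need a different folding step, and a
layout where caps lock changes something other than letter case would not
be helped by this at all.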



Re: Debugging a 7.2-current segfault that started around #3033

2023-02-20 Thread Stuart Henderson
On 2023/02/20 10:46, Stephan Somogyi wrote:
> On aarch64 on a RPi3, somewhere between 7.2-current GENERIC.MP#2028 and
> GENERIC.MP#2033, and the package snapshots that happened around the #2033
> time,something changed that causes persistent segfaulting while executing a
> perl script that's been unchanged for a year.

The kernel build numbers aren't really helpful in identifying when these
snapshots were built; dates would be better

First off, assuming you're now on a system which is after the perl
update, make sure you updated all packages and don't have old XS modules
lying around; you should get no results from

grep '@wantlib perl.22.0' /var/db/pkg/*/+CONTENTS

(Or if you've built your own Perl native extensions from outside of
packages then they'll want rebuilding)

> The segfault happens when python3 scripts are invoked from within the perl
> script. I'm invoking the python3 scripts via system(); when I then SIGINT
> via Ctrl-C, the traceback is from within python3.10, suggesting that
> something that python is sub-launching might be causing the problem, but my
> understanding of python internals is basically zero.
> 
> If I manually invoke either of the python scripts with identical
> parameters, everything works, so it's not an innate problem with those
> scripts or their interaction with python3.10; it's something about how
> they're being invoked from perl.
> 
> I've tried building minimal repro case perl scripts and am so far
> unsuccessful; everything works without segfaulting.
> 
> I've done enough isolation work now that I'm running into my limits of
> knowledge and was curious whether this recent behavior rings a bell with
> anyone here. Grasping at straws, but could this have anything to do with
> the xonly work?
> 
> Thanks for any suggestions for how to better identify the root cause and
> thus generate a more useful bug report.

A backtrace from the segfault might give some clues (pkg_add gdb and
use the 'egdb' binary; the version of gdb that is in base is not really
useful in most cases any more)



Re: Support for Banana PI BPI-M5 confusion on ftp.openbsd.org vs www.openbsd.org

2023-03-03 Thread Stuart Henderson
On 2023/03/02 15:48, kod code wrote:
> Hello,
> 
> at
> https://ftp.openbsd.org/pub/OpenBSD/7.2/arm64/INSTALL.arm64
> under
> "OpenBSD System Requirements and Supported Devices:"
> ...
> "Amlogic G12B/SM1"
> the Banana PI BPI-M5 isn't listed.

That's from 7.2 release time, the webpage was updated later.

> Whereas at
> https://www.openbsd.org/arm64.html
> under
> "OpenBSD/arm64 runs on the following hardware:"
> ...
> "Amlogic G12B/SM1"
> it can be found.
> 
> Please inform me, which page is accurate.

I don't think there was any additional code adding support since 7.2
so it should work there as well as in -current.

See https://marc.info/?l=openbsd-arm&m=16775528577&w=2 for info
about the boot loader.



Re: Sierra Wireless MC7750 attaches as ugen(4) on OpenBSD 7.3 #1125 2023-March-25

2023-04-06 Thread Stuart Henderson
On 2023/04/06 09:13, Gerhard Roth wrote:
> 2) query the list of modes with "AT!UDUSBCOMP=?". Example result:
> 
> 0  - reserved                                NOT SUPPORTED
> 1  - DM   AT                                 SUPPORTED
> 2  - reserved                                NOT SUPPORTED
> 3  - reserved                                NOT SUPPORTED
> 4  - reserved                                NOT SUPPORTED
> 5  - reserved                                NOT SUPPORTED
> 6  - DM   NMEA  AT    QMI                    SUPPORTED
> 7  - DM   NMEA  AT    RMNET1 RMNET2 RMNET3   SUPPORTED
> 8  - DM   NMEA  AT    MBIM                   SUPPORTED
> 9  - MBIM                                    SUPPORTED
> 10 - NMEA MBIM                               SUPPORTED
> 11 - DM   MBIM                               SUPPORTED
> 12 - DM   NMEA  MBIM                         SUPPORTED
> 13 - Config1: comp6    Config2: comp8        NOT SUPPORTED
> 14 - Config1: comp6    Config2: comp9        SUPPORTED
> 15 - Config1: comp6    Config2: comp10       NOT SUPPORTED
> 16 - Config1: comp6    Config2: comp11       NOT SUPPORTED
> 17 - Config1: comp6    Config2: comp12       NOT SUPPORTED
> 18 - Config1: comp7    Config2: comp8        NOT SUPPORTED
> 19 - Config1: comp7    Config2: comp9        SUPPORTED
> 20 - Config1: comp7    Config2: comp10       NOT SUPPORTED
> 21 - Config1: comp7    Config2: comp11       NOT SUPPORTED
> 22 - Config1: comp7    Config2: comp12       NOT SUPPORTED
> 
> There is no guarantee that the table doesn't change. And every
> device has a different set of supported modes.
> 
> 3) select the desired mode with "AT!UDUSBCOMP=X"

Take care with this. You can put the modem in a mode where you no
longer have an AT interface to be able to reset it (there may be
a different way to reset it via USB commands rather than AT commands,
but that won't be straightforward from OpenBSD).
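
To make that check concrete, here is a rough Python sketch that reads a mode list in the layout quoted above and reports which supported modes explicitly list an AT interface. The sample rows are copied from the quoted list; the function names are made up for illustration, and the assumption that only modes naming "AT" are safe to select (the Config1/Config2 composites are treated as unknown) is mine, not something the modem documentation is quoted as saying here.

```python
import re

# Sample rows copied from the mode list quoted above; every device differs.
SAMPLE = """\
1  - DM   AT                                 SUPPORTED
6  - DM   NMEA  AT    QMI                    SUPPORTED
8  - DM   NMEA  AT    MBIM                   SUPPORTED
9  - MBIM                                    SUPPORTED
10 - NMEA MBIM                               SUPPORTED
13 - Config1: comp6    Config2: comp8        NOT SUPPORTED
"""

# "<number> - <interface list> <SUPPORTED | NOT SUPPORTED>"
ROW = re.compile(r"^\s*(\d+)\s+-\s+(.*?)\s+(NOT SUPPORTED|SUPPORTED)\s*$")


def parse_modes(text):
    """Return {mode number: (interface tokens, supported flag)} per parsable row."""
    modes = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            modes[int(m.group(1))] = (m.group(2).split(), m.group(3) == "SUPPORTED")
    return modes


def modes_keeping_at(text):
    """Supported modes whose interface list explicitly names AT (assumed safe)."""
    return sorted(
        number
        for number, (interfaces, supported) in parse_modes(text).items()
        if supported and "AT" in interfaces
    )


print(modes_keeping_at(SAMPLE))
```

Anything not returned by modes_keeping_at() is a mode where, going by the list alone, you should expect to lose the AT port after the switch.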


> 4) wait for the device to reset itself
> 
> > 
> > Index: umsm.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/umsm.c,v
> > retrieving revision 1.125
> > diff -u -p -r1.125 umsm.c
> > --- umsm.c  2 Apr 2023 23:57:57 -   1.125
> > +++ umsm.c  6 Apr 2023 08:40:30 -
> > @@ -271,6 +271,7 @@ static const struct umsm_type umsm_devs[
> > {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_340U}, 0},
> > {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_AIRCARD_770S}, 0},
> > {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7455}, 0},
> > +   {{ USB_VENDOR_SIERRA, USB_PRODUCT_SIERRA_MC7700}, 0},
> >  
> > {{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM5320}, 0},
> > {{ USB_VENDOR_SIMCOM, USB_PRODUCT_SIMCOM_SIM7600E}, 0},
> > 
> 




intel T14 gen 3, picom triggers page fault trap in dpt_insert_entries

2023-04-24 Thread Stuart Henderson
Running picom (with no special config or command line flags) on intel
T14 gen 3 fairly easily triggers a crash in drm. If it doesn't fail the
first time, exiting and restarting a few times pretty much always
triggers it.

Full proc listing below after dmesg, Xorg is the only active process
at the time.

xcompmgr hasn't yet triggered it.


uvm_fault(0x824b4570, 0x81e73014, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at  dpt_insert_entries+0xbc:movl0x34(%r8),%r10d
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*459624  48440     35        0x12          0    4K Xorg
dpt_insert_entries(81a1cc00,fd83b9afd178,0,0) at dpt_insert_entries+0xbc
dpt_bind_vma(81a1cc00,0,fd83b9afd178,0,400) at dpt_bind_vma+0x64
i915_vma_bind(81ce4ec0,0,400,0,fd83b9afd178) at i915_vma_bind+0x319
i915_vma_pin_ww(81ce4ec0,800033b78db0,0,20,400) at i915_vma_pin_ww+0x454
intel_plane_pin_fb(81cc9000) at intel_plane_pin_fb+0x25c
intel_prepare_plane_fb(814c7400,81cc9000) at intel_prepare_plane_fb+0x127
drm_atomic_helper_prepare_planes(8044c078,81cda000) at drm_atomic_helper_prepare_planes+0x5b
intel_atomic_commit(8044c078,81cda000,1) at intel_atomic_commit+0xda
drm_atomic_helper_page_flip(814c2800,81e41200,81d55300,1,800033b79048) at drm_atomic_helper_page_flip+0x77
drm_mode_page_flip_ioctl(8044c078,800033b793e0,8195bc00) at drm_mode_page_flip_ioctl+0x466
drm_do_ioctl(8044c078,100,c01864b0,800033b793e0) at drm_do_ioctl+0x29e
drmioctl(15700,c01864b0,800033b793e0,3,800033bba5c8) at drmioctl+0xdc
VOP_IOCTL(fd845bb870f0,c01864b0,800033b793e0,3,fd845efad750,800033bba5c8) at VOP_IOCTL+0x60
vn_ioctl(fd845bd084c0,c01864b0,800033b793e0,800033bba5c8) at vn_ioctl+0x79


OpenBSD 7.3-current (GENERIC.MP) #2: Mon Apr 24 08:24:39 BST 2023
st...@lundy.spacehopper.org:/sys/arch/amd64/compile/GENERIC.MP
real mem = 16814370816 (16035MB)
avail mem = 16285147136 (15530MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.4 @ 0x8d8a3000 (81 entries)
bios0: vendor LENOVO version "N3MET12W (1.11 )" date 02/09/2023
bios0: LENOVO 21AJS4GY00
efi0 at bios0: UEFI 2.7
efi0: Lenovo rev 0x1110
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT SSDT TPM2 HPET APIC MCFG ECDT SSDT 
SSDT SSDT SSDT SSDT SSDT LPIT WSMT SSDT DBGP DBG2 NHLT MSDM SSDT BATB DMAR SSDT 
SSDT SSDT PHAT UEFI FPDT ASF! BGRT
acpi0: wakeup devices PEG0(S4) PEGP(S4) PEGP(S4) PEG2(S4) PEGP(S4) GLAN(S4) 
XHCI(S3) XDCI(S4) HDAS(S4) CNVW(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) 
RP03(S4) PXSX(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 1920 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: 12th Gen Intel(R) Core(TM) i5-1245U, 1720.17 MHz, 06-9a-04
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,WAITPKG,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 12MB 64b/line 12-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 38MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2.0.1.0.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: 12th Gen Intel(R) Core(TM) i5-1245U, 1893.04 MHz, 06-9a-04
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,PKU,PKS,MD_CLEAR,IBT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 48KB 64b/line 12-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
10-way L2 cache, 12MB 64b/line 12-way L3 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 8 (application processor)
cpu2: 12th Gen Intel(R) Core(TM) i5-1245U, 1561.10 MHz, 06-9a-04
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES6

Re: intel T14 gen 3, picom triggers page fault trap in dpt_insert_entries

2023-04-24 Thread Stuart Henderson
On 2023/04/24 23:53, Jonathan Gray wrote:
> On Mon, Apr 24, 2023 at 01:49:32PM +0100, Stuart Henderson wrote:
> > Running picom (with no special config or command line flags) on intel
> > T14 gen 3 fairly easily triggers a crash in drm. If it doesn't fail the
> > first time, exiting and restarting a few times pretty much always
> > triggers it.
> > 
> > Full proc listing below after dmesg, Xorg is the only active process
> > at the time.
> > 
> > xcompmgr hasn't yet triggered it.
> > 
> > 
> > uvm_fault(0x824b4570, 0x81e73014, 0, 1) -> e
> > kernel: page fault trap, code=0
> > Stopped at  dpt_insert_entries+0xbc:movl0x34(%r8),%r10d
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *459624  48440     35        0x12          0    4K Xorg
> > dpt_insert_entries(81a1cc00,fd83b9afd178,0,0) at 
> > dpt_insert_entries+0xbc
> 
> this is line 34 of /sys/dev/pci/drm/i915/i915_scatterlist.h
> 
> 23  static __always_inline struct sgt_iter {
> 24  struct scatterlist *sgp;
> 25  union {
> 26  unsigned long pfn;
> 27  dma_addr_t dma;
> 28  };
> 29  unsigned int curr;
> 30  unsigned int max;
> 31  } __sgt_iter(struct scatterlist *sgl, bool dma) {
> 32  struct sgt_iter s = { .sgp = sgl };
> 33
> 34  if (dma && s.sgp && sg_dma_len(s.sgp) == 0) {
> 35  s.sgp = NULL;
> 36  } else if (s.sgp) {
> 
> sgl is pointing to something that isn't there?
> 
> I have an intel t14 gen 3 but can't reproduce this.
> Running fvwm from xenocara and starting picom from xterm 20 times or so,
> ^C after each.

I'm using i3, though I used to have picom in .xsession which was started
very early - certainly before the wm loaded - and that was crashing too.

Currently this is twin display (internal + an HDMI display connected to
a USB-C dock, attached via DP-3) though I first ran into it with just the
internal display. (Took me a little while to get to a state where I
could type into ddb).

> Looking over the local changes to i915_scatterlist.h the segment size
> could be larger, I'm not sure if that would help.

I will try that and report back after lunch.


> Index: dev/pci/drm/i915/i915_scatterlist.h
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_scatterlist.h,v
> retrieving revision 1.3
> diff -u -p -r1.3 i915_scatterlist.h
> --- dev/pci/drm/i915/i915_scatterlist.h   1 Jan 2023 01:34:54 -   
> 1.3
> +++ dev/pci/drm/i915/i915_scatterlist.h   24 Apr 2023 13:15:46 -
> @@ -153,7 +153,7 @@ static inline unsigned int i915_sg_segme
>  #else
>  static inline unsigned int i915_sg_segment_size(struct device *dev)
>  {
> - return PAGE_SIZE;
> + return round_down(UINT_MAX, PAGE_SIZE);
>  }
>  #endif
>  
> 
> > dpt_bind_vma(81a1cc00,0,fd83b9afd178,0,400) at dpt_bind_vma+0x64
> > i915_vma_bind(81ce4ec0,0,400,0,fd83b9afd178) at 
> > i915_vma_bind+0x319
> > i915_vma_pin_ww(81ce4ec0,800033b78db0,0,20,400) at 
> > i915_vma_pin_ww+0x454
> > intel_plane_pin_fb(81cc9000) at intel_plane_pin_fb+0x25c
> > intel_prepare_plane_fb(814c7400,81cc9000) at 
> > intel_prepare_plane_fb+0x127
> > drm_atomic_helper_prepare_planes(8044c078,81cda000) at 
> > drm_atomic_helper_prepare_planes+0x5b
> > intel_atomic_commit(8044c078,81cda000,1) at 
> > intel_atomic_commit+0xda
> > drm_atomic_helper_page_flip(814c2800,81e41200,81d55300,1,800033b79048)
> >  at drm_atomic_helper_page_flip+0x77
> > drm_mode_page_flip_ioctl(8044c078,800033b793e0,8195bc00)
> >  at drm_mode_page_flip_ioctl+0x466
> > drm_do_ioctl(8044c078,100,c01864b0,800033b793e0) at 
> > drm_do_ioctl+0x29e
> > drmioctl(15700,c01864b0,800033b793e0,3,800033bba5c8) at 
> > drmioctl+0xdc
> > VOP_IOCTL(fd845bb870f0,c01864b0,800033b793e0,3,fd845efad750,800033bba5c8)
> >  at VOP_IOCTL+0x60
> > vn_ioctl(fd845bd084c0,c01864b0,800033b793e0,800033bba5c8) at 
> > vn_ioctl+0x79
> 



Re: intel T14 gen 3, picom triggers page fault trap in dpt_insert_entries

2023-04-24 Thread Stuart Henderson
On 2023/04/24 15:00, Stuart Henderson wrote:
> > Looking over the local changes to i915_scatterlist.h the segment size
> > could be larger, I'm not sure if that would help.
> 
> I will try that and report back after lunch.

Doesn't help.


> 
> > Index: dev/pci/drm/i915/i915_scatterlist.h
> > ===
> > RCS file: /cvs/src/sys/dev/pci/drm/i915/i915_scatterlist.h,v
> > retrieving revision 1.3
> > diff -u -p -r1.3 i915_scatterlist.h
> > --- dev/pci/drm/i915/i915_scatterlist.h 1 Jan 2023 01:34:54 -   
> > 1.3
> > +++ dev/pci/drm/i915/i915_scatterlist.h 24 Apr 2023 13:15:46 -
> > @@ -153,7 +153,7 @@ static inline unsigned int i915_sg_segme
> >  #else
> >  static inline unsigned int i915_sg_segment_size(struct device *dev)
> >  {
> > -   return PAGE_SIZE;
> > +   return round_down(UINT_MAX, PAGE_SIZE);
> >  }
> >  #endif
> >  
> > 
> > > dpt_bind_vma(81a1cc00,0,fd83b9afd178,0,400) at 
> > > dpt_bind_vma+0x64
> > > i915_vma_bind(81ce4ec0,0,400,0,fd83b9afd178) at 
> > > i915_vma_bind+0x319
> > > i915_vma_pin_ww(81ce4ec0,800033b78db0,0,20,400) at 
> > > i915_vma_pin_ww+0x454
> > > intel_plane_pin_fb(81cc9000) at intel_plane_pin_fb+0x25c
> > > intel_prepare_plane_fb(814c7400,81cc9000) at 
> > > intel_prepare_plane_fb+0x127
> > > drm_atomic_helper_prepare_planes(8044c078,81cda000) at 
> > > drm_atomic_helper_prepare_planes+0x5b
> > > intel_atomic_commit(8044c078,81cda000,1) at 
> > > intel_atomic_commit+0xda
> > > drm_atomic_helper_page_flip(814c2800,81e41200,81d55300,1,800033b79048)
> > >  at drm_atomic_helper_page_flip+0x77
> > > drm_mode_page_flip_ioctl(8044c078,800033b793e0,8195bc00)
> > >  at drm_mode_page_flip_ioctl+0x466
> > > drm_do_ioctl(8044c078,100,c01864b0,800033b793e0) at 
> > > drm_do_ioctl+0x29e
> > > drmioctl(15700,c01864b0,800033b793e0,3,800033bba5c8) at 
> > > drmioctl+0xdc
> > > VOP_IOCTL(fd845bb870f0,c01864b0,800033b793e0,3,fd845efad750,800033bba5c8)
> > >  at VOP_IOCTL+0x60
> > > vn_ioctl(fd845bd084c0,c01864b0,800033b793e0,800033bba5c8) at 
> > > vn_ioctl+0x79
> > 
> 



Re: lock order reversal: drmwq and wakeref.mutex

2023-04-24 Thread Stuart Henderson
On 2023/04/24 15:50, Klemens Nanni wrote:
> cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03

ah you got one of the warm CPU versions then :)



Re: lock order reversal: drmwq and wakeref.mutex

2023-04-24 Thread Stuart Henderson
On 2023/04/24 16:05, Klemens Nanni wrote:
> On Mon, Apr 24, 2023 at 04:58:08PM +0100, Stuart Henderson wrote:
> > On 2023/04/24 15:50, Klemens Nanni wrote:
> > > cpu0: 12th Gen Intel(R) Core(TM) i7-1270P, 2095.31 MHz, 06-9a-03
> > 
> > ah you got one of the warm CPU versions then :)
> 
> what does that mean?

there are U and P versions of the intel 12g laptop CPUs,
P are faster but I found a few comments that they can run a little
on the toasty side.



Re: intel t14 gen3: microphone recording does not work

2023-04-26 Thread Stuart Henderson
On 2023/04/25 19:40, Klemens Nanni wrote:
> Speakers work fine, 'aucat -o rec.wav' produces non-zero data,
> but 'aucat -i rec.wav' keeps quiet ('mpv song73.ogg' plays).
> 
> https://www.openbsd.org/faq/faq13.html#enablerec did not help me,
> there is nothing muted and I did not find a knob to tweak to make it work.

Do you mean the internal mic array? I believe it will need sof-firmware
that we don't have support for.



Re: Intel Ethernet (?Synopsys based) on Fitlet3 Elkhart Lake unconfigured on recent snapshot

2023-05-03 Thread Stuart Henderson
On 2023/05/02 19:03, Ted Ri wrote:
> According to Compulab the Fitlet3 onboard Ethernet uses Marvell 88E1512 phys. 
>  The data sheet from Marvell is here:
> 
> https://www.marvell.com/content/dam/marvell/en/public-collateral/phys-transceivers/marvell-phys-transceivers-alaska-88e151x-datasheet.pdf

The PHY is likely already supported or very close to something
already supported, it's the network interface itself that isn't.



Re: OpenBSD 7.3/amd64 on APU4D4 - system drops into ddb

2023-05-15 Thread Stuart Henderson
On 2023/05/15 16:29, Radek wrote:
> Hello,
> continuing the previous topic [1] my APU4D4 router randomly drops into ddb 
> when running 7.3/amd64. At some crashes my serial console is not responding 
> too. 
> I attached dmesg and ddb console output of my last crash. 
> Maybe there is a hardware issue...
> 
> 1. https://marc.info/?t=16609992761&r=1&w=2
> 
> ddb{0}> show panic
> the kernel did not panic

Do you have a log of what was output on the console when the system
entered DDB?



Re: Some programs appear to cause system to leak memory, fill ram

2023-05-15 Thread Stuart Henderson
On 2023/05/15 19:55, bugreport555 wrote:
> Ok, I tested it in various ways and tried to force OOM killer to step in but 
> it never did and all worked fine.

OOM killer? This isn't Linux.



Re: Some programs appear to cause system to leak memory, fill ram

2023-05-16 Thread Stuart Henderson
On 2023/05/16 09:12, Rudolf Leitgeb wrote:
> Lots of people (including myself) come from linux background and use
> OpenBSD for specific security sensitive tasks. Since OpenBSD, like 
> every other desktop&server OS these days, has some strategy to deal
> with OOM conditions, the term "OOM killer" is perfectly clear
> regardless of what the actual implementation in OpenBSD is called.
> 
> On Mon, 2023-05-15 at 21:32 +0100, Stuart Henderson wrote:
> > On 2023/05/15 19:55, bugreport555 wrote:
> > > Ok, I tested it in various ways and tried to force OOM killer to
> > > step in but it never did and all worked fine.
> > 
> > OOM killer? This isn't Linux.
> > 
> 

The strategy is that the sysadmin should configure datasize limits so
that processes hit memory allocation failures if they try to overreach.
Defaults are set up with typical use-cases and machines in mind but you
might know better and adjust.

The kernel doesn't cope particularly well if you actually run out of
memory. Long delays, deadlocks, panics are likely. Yes bugs, but they are
difficult ones, and the above strategy (i.e. use the system's built in
protection mechanisms for userland processes) is not a bad one.

(I understand that even on Linux with "OOM killer" it is often still
advisable to reboot when possible after triggering it.)



Re: Can't sync local usb mounted mirror with openrsync

2023-05-16 Thread Stuart Henderson
On 2023/05/16 22:38, Julian Huhn wrote:
>   I have an external hard drive that I want to use as a storage 
>   location for a local mirror. The initial synchronization of the 
>   mirror went through successfully with openrsync, but each new run 
>   hangs either with no error or with the following error message:
> 
>   openrsync: error: poll: hangup
>   openrsync: error: io_write_nonblocking
>   openrsync: error: io_write_nonblocking
>   openrsync: error: rsync_uploader
>   openrsync: error: rsync_receiver
> 
>   The problem is reproducible for all tested mirrors from the official 
>   list: https://www.openbsd.org/ftp.html#rsync

Even if it had worked, please don't use openrsync for that, it speaks
an old protocol version which requires the mirror to transfer the full
file list in one go. Use the original rsync from packages.



Re: group owner segmentation

2023-05-21 Thread Stuart Henderson
On 2023/05/21 12:49, panpansh wrote:
> Hi, trying this:
> 
> chmod o-rx /usr/bin/ftp; groupadd g_fetch; usermod -G g_fetch _pkgfetch; 
> chown root:g_fetch /usr/bin/ftp
> 
> # pkg_add: can't exec /usr/bin/ftp: permission denied at 
> /usr/libdata/perl5/OpenBSD/PackageRepository.pm line 869
> 
> # of course, setting _pkgfetch as group owner of /usr/bin/ftp raises no error 
> executing pkg_add. But it's restrictive and not the goal.

You don't mention what the goal is. But it's possible that it might be
better solved by using PF "user" and/or "group" rules, which will also
restrict network access from programs other than ftp (since there are
a couple of other programs in base which would allow doing basically
the same thing).



Re: Is it possible to use torrent to distribute OpenBSD?

2023-05-24 Thread Stuart Henderson
On 2023/05/23 00:05, Abel Abraham Camarillo Ojeda wrote:
> In some countries ISPs limit aggressively the throughput of single tcp
> connections

and sometimes this isn't done on purpose but can happen due to problems
in their equipment.

>  and torrents are the only means available to get more than 20%
> your billed speed

not the only way, you can run parallel http downloads from servers that
handle "range" requests. see e.g. aria2
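
For anyone unfamiliar with the idea, the sketch below shows how a download of known size can be cut into byte ranges, one per connection, each sent as its own HTTP "Range" request. It is only an illustration of the arithmetic; the function names are made up, it does no networking, and it is not a description of how aria2 itself splits work.

```python
def split_ranges(total_size, parts):
    """Split [0, total_size) into at most `parts` contiguous inclusive byte ranges."""
    if total_size <= 0 or parts <= 0:
        return []
    parts = min(parts, total_size)          # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_headers(total_size, parts):
    """Value of the HTTP Range request header for each chunk (ends are inclusive)."""
    return [f"bytes={first}-{last}" for first, last in split_ranges(total_size, parts)]


for header in range_headers(10, 3):
    print("Range:", header)
```

Each chunk can then be fetched on a separate TCP connection and written at its own offset in the output file, which is what gets around a per-connection throughput cap.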



Re: pfctl problem with osfp parser

2023-05-25 Thread Stuart Henderson
On 2023/05/25 17:40, Alexandr Nedvedicky wrote:
> Hello,
> 
> I took a look at signatures:
> > https://tools.netsa.cert.org/p0f/p0f.fp.2012032901 signatures file (pf.os),
> 
> This change is not about updating parser it looks like it will
> also require to update matching stuff in kernel. I have not looked
> at all details yet.

as well as PF, this database is used by tcpdump (-o).

quirks were added in p0f v2.x (p0f upstream is now at version 3.x which
has a completely different database).

(p0f 2.x is in ports)

> below is fingerprint entry format as found in etc/pf.os we have
> in current:
> 
> # Fingerprint entry format:
> #
> # wwww:ttt:D:ss:OOO...:OS:Version:Subtype:Details
> #
> # wwww     - window size (can be *, %nnn, Snn or Tnn).  The special values
> #            "S" and "T" which are a multiple of MSS or a multiple of MTU
> #            respectively.
> # ttt      - initial TTL
> # D        - don't fragment bit (0 - not set, 1 - set)
> # ss       - overall SYN packet size
> # OOO      - option value and order specification (see below)
> # OS       - OS genre (Linux, Solaris, Windows)
> # Version  - OS Version (2.0.27 on x86, etc)
> # Subtype  - OS subtype or patchlevel (SP3, lo0)
> # details  - Generic OS details
> 
> and here is a format description from link above
> 
> # Fingerprint entry format:
> #
> # wwww:ttt:D:ss:OOO...:QQ:OS:Details
> #
> # wwww     - window size (can be * or %nnn or Sxx or Txx)
> #            "Snn" (multiple of MSS) and "Tnn" (multiple of MTU) are allowed.
> # ttt      - initial TTL
> # D        - don't fragment bit (0 - not set, 1 - set)
> # ss       - overall SYN packet size (* has a special meaning)
> # OOO      - option value and order specification (see below)
> # QQ       - quirks list (see below)
> # OS       - OS genre (Linux, Solaris, Windows)
> # details  - OS description (2.0.27 on x86, etc)
> 
> 
> # Quirks section is usually an empty list ('.') of oddities or bugs of this
> # particular stack. List items are not separated in any way. Possible values:
> #
> # P - options past EOL,
> # Z - zero IP ID,
> # I - IP options specified,
> # U - urg pointer non-zero,
> # X - unused (x2) field non-zero,
> # A - ACK number non-zero,
> # T - non-zero second timestamp,
> # F - unusual flags (PUSH, URG, etc),
> # D - data payload,
> # ! - broken options segment.
> 
> 
> quirks are new and I think we will also have to update code in kernel too.
> 
> I'm afraid it's more than just fixing the parser.
> 
> regards
> sashan
> 
> On Mon, May 22, 2023 at 06:50:34PM +0300,   wrote:
> > Apologize in advance for my bad english :) I am trying to use this
> > https://tools.netsa.cert.org/p0f/p0f.fp.2012032901 signatures file (pf.os),
> > as far as I understand, it is newer than the one that comes with the OS
> > (and maybe you will update it too). "too short OS description" error
> > appears when trying to apply rules (pfctl -f /etc/pf.conf).
> > 
> > I think the problem is somewhere in parser, judging by the description in
> > the file at the link I provided:
> > 
> > # If OS genre starts with '*', p0f will not show distance, link type
> > # and timestamp data. It is useful for userland TCP/IP stacks of
> > # network scanners and so on, where many settings are randomized or
> > # bogus.
> > #
> > # If OS genre starts with @, it denotes an approximate hit for a group
> > # of operating systems (signature reporting still enabled in this case).
> > # Use this feature at the end of this file to catch cases for which
> > # you don't have a precise match, but can tell it's Windows or FreeBSD
> > # or whatnot by looking at, say, flag layout alone.
> > #
> > # If OS genre starts with - (which can prefix @ or *), the entry is
> > # not considered to be a real operating system (but userland stack
> > # instead). It is important to mark all scanners and so on with -,
> > # so that they are not used for masquerade detection (also add this
> > # prefix for signatures of application-induced behavior, such as
> > # increased window size with Opera browser).
> > 
> > Attaching the dump of ktrace. OpenBSD version: 7.3
> 
> 
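
To illustrate the mismatch between the two layouts quoted above, here is a small Python sketch that splits an entry on colons according to either field list. It is not the pfctl parser and makes no claim about where exactly pfctl gives up; the two sample entries are invented for the example, and "wwww" is used as the name of the window size field.

```python
# Field layouts as described in the two quoted format comments.
# "wwww" is the window size field; the sample entries below are made up.
PF_OS_FIELDS = ("wwww", "ttt", "D", "ss", "OOO", "OS", "Version", "Subtype", "Details")
P0F2_FIELDS = ("wwww", "ttt", "D", "ss", "OOO", "QQ", "OS", "Details")

PF_OS_LINE = "16384:64:1:64:M*,N,N,S:ExampleOS:1.0:base:made-up entry"
P0F2_LINE = "S4:64:1:60:M*,S,T,N,W7:.:ExampleOS:made-up entry"


def split_entry(line, fields):
    """Split one fingerprint line into a dict keyed by the given field names."""
    # Limit the split so colons inside the trailing free-text field survive.
    parts = line.strip().split(":", len(fields) - 1)
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    return dict(zip(fields, parts))


print(split_entry(PF_OS_LINE, PF_OS_FIELDS)["OS"])
print(split_entry(P0F2_LINE, P0F2_FIELDS)["QQ"])

try:
    split_entry(P0F2_LINE, PF_OS_FIELDS)
except ValueError as err:
    print("p0f 2.x line read with the pf.os layout:", err)
```

A p0f 2.x line has one field fewer than the pf.os layout and carries the quirks where pf.os expects the OS genre, so the two files are not interchangeable without converting entries (and, as noted above, teaching the matching code about quirks).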



Re: OpenBSD in QEMU KVM: High QEMU CPU usage when OpenBSD is 100% idle

2023-05-27 Thread Stuart Henderson
On 2023/05/27 06:36, br...@mailbox.org wrote:
> 
> 
> On Sat, 27 May 2023, Mike Larkin wrote:
> 
> > probably IPI traffic then. not sure what else to say. If a few % host 
> > overhead
> > is too much for you with a 16 vCPU VM, I'd suggest reducing that.
> > 
> > What is your workload for a 16 vcpu openbsd VM anyway?
> 
> I would like to use the OpenBSD VM as my main workstation. I also need to
> use Linux for some graphic intensive stuff, so the ideal OpenBSD on host
> with vmm for Linux is not an option unfortunately. I guess I could accept
> that CPU usage price, but of course not having to pay it would be better.

OpenBSD doesn't do brilliantly with that many CPUs yet. Things are
getting better but I think you're likely to find many workloads are a
bit less laggy with half that.



Re: OpenBSD in QEMU KVM: High QEMU CPU usage when OpenBSD is 100% idle

2023-05-27 Thread Stuart Henderson
On 2023/05/27 15:35, Mike Larkin wrote:
> On Sat, May 27, 2023 at 10:29:37AM +0200, Claudio Jeker wrote:
> > On Sat, May 27, 2023 at 09:16:23AM +0100, Stuart Henderson wrote:
> > > On 2023/05/27 06:36, br...@mailbox.org wrote:
> > > >
> > > >
> > > > On Sat, 27 May 2023, Mike Larkin wrote:
> > > >
> > > > > probably IPI traffic then. not sure what else to say. If a few % host 
> > > > > overhead
> > > > > > is too much for you with a 16 vCPU VM, I'd suggest reducing that.
> > > > >
> > > > > What is your workload for a 16 vcpu openbsd VM anyway?
> > > >
> > > > I would like to use the OpenBSD VM as my main workstation. I also need 
> > > > to
> > > > use Linux for some graphic intensive stuff, so the ideal OpenBSD on host
> > > > with vmm for Linux is not an option unfortunately. I guess I could 
> > > > accept
> > > > that CPU usage price, but of course not having to pay it would be 
> > > > better.
> > >
> > > OpenBSD doesn't do brilliantly with that many CPUs yet. Things are
> > > getting better but I think you're likely to find many workloads are a
> > > bit less laggy with half that.
> >
> > Also a few % CPU on the host can be caused by the interrupts caused by the
> > clocks on every CPU. It may be that we do not select a cheap clock source
> > like TSC and the result is much more overhead on the host.
> >
> > --
> > :wq Claudio
> >
> 
> yes I did not think of that; thanks Claudio!
> 
> OP: what is your sysctl kern.timecounter ?

on kvm I would expect pvclock to be preferred if the driver thinks it's
stable, otherwise probably acpihpet. fwiw mine looks like this.

$ sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=acpihpet0
kern.timecounter.choice=i8254(0) pvclock0(500) acpihpet0(1000) acpitimer0(1000)
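
If you want to compare several machines, the choice line is easy to pick apart. The sketch below is a hypothetical helper (the names are mine) that turns the value of kern.timecounter.choice into name/quality pairs and lists the candidates sharing the highest quality. It does not model how the kernel breaks ties or what kern.timecounter.hardware ends up set to.

```python
import re


def parse_choice(value):
    """Map each timecounter name to its quality from a 'name(quality) ...' string."""
    # Qualities can be negative, so allow an optional minus sign.
    return {name: int(quality)
            for name, quality in re.findall(r"(\S+?)\((-?\d+)\)", value)}


def top_quality(value):
    """Names of the timecounters sharing the highest quality, sorted by name."""
    choice = parse_choice(value)
    if not choice:
        return []
    best = max(choice.values())
    return sorted(name for name, quality in choice.items() if quality == best)


# Value taken from the sysctl output above.
CHOICE = "i8254(0) pvclock0(500) acpihpet0(1000) acpitimer0(1000)"
print(parse_choice(CHOICE))
print(top_quality(CHOICE))
```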



Re: On 2013-06-12 snapshot xfce4 session dies immediately after xenodm login

2023-06-12 Thread Stuart Henderson
On 2023/06/12 16:54, Peter N. M. Hansteen wrote:
> On Mon, Jun 12, 2023 at 04:47:41PM +0200, Sebastien Marie wrote:
> > > (gdb)
> > > 
> > > with some instruction I might be able to extract more information.
> > > 
> > 
> > > failing in _start is odd. it looks like the binary wasn't built with 
> > > cf-protection=branch, and the compiler has had it for a few weeks now (since 
> > > 2023-04-26 exactly).
> > 
> > Could you check the signature date of your package ?
> > 
> > $ grep @digital-signature /var/db/pkg/xfce4-session-*/+CONTENTS   
> > @digital-signature signify2:2023-06-10T10:18:49Z:external
> 
> [Mon Jun 12 16:49:33] peter@zaida:~$ grep @digital-signature 
> /var/db/pkg/xfce4-session-*/+CONTENTS
> @digital-signature signify2:2023-04-16T09:46:52Z:external

Please run pkg_add -u and see if it fixes it.



Re: On 2013-06-12 snapshot xfce4 session dies immediately after xenodm login

2023-06-12 Thread Stuart Henderson
On 2023/06/12 19:21, Peter N. M. Hansteen wrote:
> On Mon, Jun 12, 2023 at 05:50:13PM +0100, Stuart Henderson wrote:
> > On 2023/06/12 18:34, Peter N. M. Hansteen wrote:
> > > On Mon, Jun 12, 2023 at 05:08:20PM +0100, Stuart Henderson wrote:
> > > > Ah, the next thing I would have suggested in that case would have
> > > > been pkg_add -vvvu run under typescript, which might have given us a
> > > > clue why it wasn't updated (the system package version, recorded in
> > > > @version lines in /var/db/pkg/*/+CONTENTS, was raised - so pkg_add -u
> > > > _should_ have updated them all).
> > > > 
> > > 
> > > possibly better late than never, there are still packages that are 
> > > failing, so
> > > here is the typescript from just now - 
> > > 
> > > https://nxdomain.no/~peter/2023-06-12_pkg_add_vvvu_output_zaida.bsdly.net.txt
> > > 
> > > All the best,
> > > Peter
> > 
> > ah interesting :)
> > 
> > in which other package/s have you seen a problem, and can you send me a
> > 'head -30 /var/db/pkg/$pkgname/+CONTENTS' for them please?
> >
> 
> Oh, absolutely. Firefox:
> 
> [Mon Jun 12 19:17:23] peter@zaida:~$ head -30 
> /var/db/pkg/firefox-114.0.1/+CONTENTS
> @name firefox-114.0.1
> @url 
> https://cloudflare.cdn.openbsd.org/pub/OpenBSD/snapshots/packages/amd64/firefox-114.0.1.tgz
> @version 10

Thanks - the "@version 10" shows that this was updated.

Here are some of the known problem ports for IBT enforcement:

various mozillas
the various chromium derivatives + anything using v8
ffmpeg
libreoffice

chromium itself is also a problem but there is currently a workaround in
the kernel via a process name check on "chrome"

We have an annotation we can use during ports build to mark binaries to
disable enforcement, this will get added to some ports with known
problems (but I think it may be a bit problematic when it's a _library_
which doesn't yet work with IBT enforcement - in that case AIUI we'll 
need to annotate all binaries using that library..)



> @signer openbsd-73-pkg
> @digital-signature signify2:2023-06-11T20:27:37Z:external
> @option manual-installation
> @comment pkgpath=www/mozilla-firefox ftp=yes
> @arch amd64
> +DESC
> @sha LtQcl2Xn0LDK5kWxuGklXoRyDX/N+y0oXu1piOVnnhE=
> @size 601
> @conflict firefox3-*
> @conflict firefox35-*
> @conflict firefox36-*
> @conflict mozilla-firebird-*
> @conflict mozilla-firefox-*
> @pkgpath www/firefox3
> @pkgpath www/firefox35
> @pkgpath www/firefox36
> @pkgpath www/firefox4
> @pkgpath www/mozilla-firefox,-main
> @depend devel/desktop-file-utils:desktop-file-utils-*:desktop-file-utils-0.26
> @depend devel/nspr:nspr->=4.35:nspr-4.35
> @depend security/nss:nss->=3.84:nss-3.90
> @depend textproc/icu4c,-main:icu4c-*:icu4c-73.1v0
> @depend x11/gtk+3,-main:gtk+3-*:gtk+3-3.24.38
> @depend x11/gtk+4,-guic:gtk4-update-icon-cache-*:gtk4-update-icon-cache-4.10.4
> @wantlib X11-xcb.2.0
> @wantlib X11.18.0
> @wantlib Xcomposite.4.0
> 
> thunderbird:
> 
> [Mon Jun 12 19:17:45] peter@zaida:~$ head -30 
> /var/db/pkg/thunderbird-102.12.0/+CONTENTS
> @name thunderbird-102.12.0
> @url 
> https://cloudflare.cdn.openbsd.org/pub/OpenBSD/snapshots/packages/amd64/thunderbird-102.12.0.tgz
> @version 10
> @signer openbsd-73-pkg
> @digital-signature signify2:2023-06-11T20:33:35Z:external
> @option manual-installation
> @comment pkgpath=mail/mozilla-thunderbird ftp=yes
> @arch amd64
> +DESC
> @sha uGHKT1MHnjf7Tl3oCctE0gRbeeA8UBZ1gWTuc0Fx6L8=
> @size 748
> @conflict mozilla-thunderbird-<74.0
> @conflict lightning-<74.0v0
> @pkgpath mail/mozilla-thunderbird,-main
> @pkgpath mail/mozilla-thunderbird,-lightning
> @depend devel/desktop-file-utils:desktop-file-utils-*:desktop-file-utils-0.26
> @depend devel/libffi:libffi-*:libffi-3.4.4
> @depend devel/nspr:nspr->=4.35:nspr-4.35
> @depend security/nss:nss->=3.84:nss-3.90
> @depend security/rnp:rnp-*:rnp-0.16.3
> @depend x11/gtk+3,-main:gtk+3-*:gtk+3-3.24.38
> @depend x11/gtk+4,-guic:gtk4-update-icon-cache-*:gtk4-update-icon-cache-4.10.4
> @wantlib X11-xcb.2.0
> @wantlib X11.18.0
> @wantlib Xcomposite.4.0
> @wantlib Xcursor.5.0
> @wantlib Xdamage.4.0
> @wantlib Xext.13.0
> @wantlib Xfixes.6.1
> @wantlib Xi.12.2
> 
> There may well be others, but those are ones I use rather frequently. 
> 
> - Peter
> 
> -- 
> Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> https://bsdly.blogspot.com/ https://www.bsdly.net/ https://www.nuug.no/
> "Remember to set the evil bit on all malicious network traffic"
> delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
> 



Re: On 2013-06-12 snapshot xfce4 session dies immediately after xenodm login

2023-06-12 Thread Stuart Henderson
On 2023/06/12 18:52, Sebastien Marie wrote:
> note that instead of using pkg_delete, you could use: `pkg_add -Dinstalled -u'
> to force reinstalling already installed packages.

if it gets to this point for anyone else, we would like to see the state
of /var/db/pkg and try to figure out why things aren't getting updated

specifically on amd64 if you have a bunch of packages left with
"@version 9" in the +CONTENTS file for that package, then it probably
didn't get updated correctly for some reason or other
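
A rough sketch of how one might look for such leftovers. It assumes the
package database is in /var/db/pkg and that each +CONTENTS file carries a
line of the form "@version N", as in the listings quoted earlier in this
thread. The function name is made up for illustration.

```sh
# list_pkgs_with_version DBDIR VERSION
# Print the name of every package directory under DBDIR whose +CONTENTS
# file contains exactly the line "@version VERSION".
list_pkgs_with_version() {
	dbdir=$1
	ver=$2
	for f in "$dbdir"/*/+CONTENTS; do
		# Skip the unexpanded glob when DBDIR has no packages.
		[ -f "$f" ] || continue
		if grep -qx "@version $ver" "$f"; then
			basename "$(dirname "$f")"
		fi
	done
}
```

On an affected amd64 machine this would be run as
"list_pkgs_with_version /var/db/pkg 9"; any names it prints are packages
that still carry the old version marker.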



Re: On 2013-06-12 snapshot xfce4 session dies immediately after xenodm login

2023-06-13 Thread Stuart Henderson
On 2023/06/13 11:57, Matthias Schmidt wrote:
> $ pkg_info -vv screen | head -30 
> Information for inst:screen-4.9.0
> [...]
> Size: 1244302
> Signature: screen-4.9.0,10,c.97.0,curses.14.0,util.16.0
> Packing-list:
> @comment $OpenBSD: PLIST,v 1.24 2019/08/15 21:01:49 naddy Exp $
> @name screen-4.9.0
> @url file:./screen-4.9.0.tgz
> @version 10

I'm at a loss to explain how you've got screen-4.9.0 linked against
libc.so.97.0 with @version 10 but with @comment still in the PLIST.
The only way I can see is building locally with parts of an old and
parts of a new ports tree.

However that happened, I think it is beyond what pkg_add -u can be
expected to deal with.



Re: kernel reordering happily consumes invalid objects

2023-06-13 Thread Stuart Henderson
On 2023/06/14 04:12, Schech, C. W. ("Connor") wrote:
> There's no check of the checksums for all the object files that the
> /rc task consumes
> 
> This can be trivially fixed by generating them in, say
> 
> In /sys/conf/newvars.sh, add the line:
> 
> +sha512 -h /var/db/obj.${id}.sha512 *.o lorder
> 
> above the segment starting with:
> 
> cat >vers.c < 
> [...]
> 
> then the right checksums always persist in /var/db on release or
> between builds, labelled with the {id}
> 
> in /etc/rc or kernel_reorder, before invoking the kernel reordering
> routine, make a guard statement that checks that all the object
> checksums are OK, i.e.,

How do you know the .o files from the build are ok in the first place?
If you don't trust your hardware to keep the installed copy safe, why
would the build be any different?

> Also consider moving the relinking to "only at shutdown", so no other
> jobs are running concurrently in case that causes a random kernel
> fault due to extreme load on faulty hardware, and to make the boot
> time as fast as possible, since the relinked kernel isn't used until
> the boot after AFAIK.

The timing of when this is run has already been considered.
Shutdown doesn't always happen. It can sometimes happen triggered by
a UPS running low on battery, in which case writing out a new kernel
is about the worst thing you can be doing at the time. There's no
"one size fits all" and there are problems with the current timing too,
but it's the least worst option for many cases (and can be disabled and
run manually if it's a big problem for your use).

> Also consider not using a link kit and just scrambling kernel A with
> lorder C into kernel B with lorder D without carrying around any
> object code (by default) in the environment that persists anywhere.

reorder_kernel is also there to support syspatch. (even just the hash
check also needs to take syspatch into account).



Re: ftp(1) will never attempt to set the modification date of any file retrieved by http[s]

2023-06-28 Thread Stuart Henderson
On 2023/06/28 12:19, Theo Buehler wrote:
> > Good catch.  It's the only header where we forget to skip leading
> > blanks.
> 
> This was overlooked in fetch.c r1.209

ah I was wondering about that, because it definitely used to work.

> ok tb

and from me.



> > 
> > I can reproduce and confirm that this does indeed fix the parsing and
> > make ftp set the mtime accordingly to Last-Modified.
> > 
> > > diff --git i/usr.bin/ftp/fetch.c w/usr.bin/ftp/fetch.c
> > > index 0ba7ad4d099..b6d6f4d775a 100644
> > > --- i/usr.bin/ftp/fetch.c
> > > +++ w/usr.bin/ftp/fetch.c
> > > @@ -984,6 +984,7 @@ noslash:
> > >   } else if (strncasecmp(cp, LAST_MODIFIED,
> > >   sizeof(LAST_MODIFIED) - 1) == 0) {
> > >   cp += sizeof(LAST_MODIFIED) - 1;
> > > + cp += strspn(cp, " \t");
> > >   cp[strcspn(cp, "\t")] = '\0';
> > >   if (strptime(cp, "%a, %d %h %Y %T %Z", &lmt) == NULL)
> > >   server_timestamps = 0;
> > 
> > 
> 



Re: Xorg totally freezes after a while while using picom or playing 2d or 3d games

2023-07-02 Thread Stuart Henderson
On 2023/07/02 06:52, chris greek wrote:
> Xorg totally freezes after a while while using picom or playing 2d or 3d
> games
> If I don't use picom it doesn't seem to have a problem, but I tried playing
> a 2d game and it also froze.
> I have an R7 240 with 2 GB of RAM; my PC has 8 GB of RAM.
> I don't know what the problem is, but it always happens.
> Xorg freezes and I can't do anything. If I switch to a console and don't
> kill Xorg, it completely freezes the PC and I can't switch to a console
> anymore.

It may be panicking and entering ddb. See if it reboots if you blindly
type "boot r".

Send a dmesg as well.



Re: dvmrpd reports "routeroute decision engine terminated; signal 11"

2023-07-03 Thread Stuart Henderson
On 2023/07/03 14:52, Why 42? The lists account. wrote:
> 
> Hi All,
> 
> FYI, after patching the kernel (See: discussion from June 7th entitled
> "dvmrpd start causes kernel panic: assertion failed") I am able to run
> the dvmrpd multicast routing daemon and indeed it seems to be doing
> something, I see messages logged regarding multicast IP address groups
> or ranges that are in use, or at least configured.
> 
> Strangely though, the daemon occasionally logs these messages:
> ...
> kmr_shutdown: interface em0
> waiting for children to terminate
> route decision engine terminated; signal 11
> fatal in dvmrpe: msgbuf_write: Broken pipe
> 
> It's unclear to me if this is normal operation or not, but signal 11
> (segmentation violation?) certainly doesn't look typical ...
> 
> Should a signal 11 result in a core file being dumped? I don't find any
> in any of the likely places e.g. the starting directory.
> 
> Thanks for any tips!
> 
> Cheers,
> Robb.
> 

mkdir -m 700 /var/crash/dvmrpd
sysctl kern.nosuidcoredump=3
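
For context, a hedged sketch of checking the result afterwards. It assumes
(as I understand the sysctl) that with kern.nosuidcoredump=3 the core of a
privilege-dropping daemon ends up in a root-owned, mode 700 directory under
/var/crash named after the program. The helper name is hypothetical.

```sh
# check_crash_dir BASE PROG
# Create BASE/PROG with mode 700 if it is missing, then list any core
# files found there (or say that there are none yet).
check_crash_dir() {
	base=$1
	prog=$2
	dir=$base/$prog
	[ -d "$dir" ] || mkdir -m 700 "$dir" || return 1
	ls "$dir" | grep '\.core$' || echo "no core files in $dir yet"
}
```

Run as root with "check_crash_dir /var/crash dvmrpd" after the daemon has
crashed again. To keep the sysctl across reboots, the same
"kern.nosuidcoredump=3" line can go in /etc/sysctl.conf.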



Re: 'scsi_xfer pool exhausted' console spam, system unresponsive

2023-07-05 Thread Stuart Henderson
Are you monitoring memory usage too? My first instinct is that 2GB feels a 
bit low so I'd want to get some stats on that.


(I have reported these on i386 ports builders from time to time too, I 
can't do much about memory use there though..)


--
 Sent from a phone, apologies for poor formatting.

On 5 July 2023 16:09:06 Lucas  wrote:


Synopsis:   'scsi_xfer pool exhausted' console spam, system unresponsive
Category:
Environment:

System  : OpenBSD 7.3
Details : OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

Description:

This machine runs a got (gameoftrees) mirror. Every 15 minutes,
it contacts the main mirror with 'got fetch' (small network and
disk I/O) and then runs 'git gc' (expensive disk I/O, am told by
op@ and stsp@). After some unknown time running, these tasks will
make the CPU spike (graphs in [0] and [1]) until eventually the
system becomes unresponsive. 2 of the 3 times it happened, the
console was being spammed with 'scsi_xfer pool exhausted'
messages.

Of potential importance is that this is Hetzner CAX11 VPS in
their new region, Hillsboro.

[0]: https://glacier.lgv5.net/tmp/lemon-20230626.png
[1]: https://glacier.lgv5.net/tmp/lemon-20230705.png

How-To-Repeat:

No clue. Happens every 1-3 weeks I think.

Fix:

No clue. Can try out patches.

dmesg:
OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2080227328 (1983MB)
avail mem = 1997840384 (1905MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf59d0 (10 entries)
bios0: vendor Hetzner version "2017" date 11/11/2017
bios0: Hetzner vServer
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S5
acpi0: tables DSDT FACP APIC HPET MCFG WAET
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC Processor, 2445.58 MHz, 17-31-00
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: AMD EPYC Processor, 2445.66 MHz, 17-31-00
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,TOPEXT,CPCTR,FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 16MB 64b/line 16-way L3 cache
cpu1: smt 0, core 1, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xb000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
"ACPI0006" at acpi0 not configured
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
com0 at acpi0 COM1 addr 0x3f8/0x8 irq 4: ns16550a, 16 byte fifo
acpicmos0 at acpi0
"PNP0A06" at acpi0 not configured
"PNP0A06" at acpi0 not configured
"QEMU0002" at acpi0 not configured
"ACPI0010" at acpi0 not configured
acpicpu0 at acpi0: C1(@1 halt!)
acpicpu1 at acpi0: C1(@1 halt!)
pvbus0 at mainbus0: KVM
pvclock0 at pvbus0
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82G33 Host" rev 0x00
vga1 at pci0 dev 1 function 0 "Qumranet Virtio 1.x GPU" rev 0x01
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb0 at pci0 dev 2 function 0 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 22
pci1 at ppb0 bus 1
virtio0 at pci1 dev 0 function 0 "Qumranet Virtio 1.x Network" rev 0x01
vio0 at virtio0: address 96:00:01:e4:f0:79
virtio0: msix shared
ppb1 at pci0 dev 2 function 1 vendor "Red Hat", unknown product 0x000c rev 0x00: apic 0 int 22
pci2 at ppb1 bus 2
xhci0 at pci2 dev 0 function 0 vendor "Red Hat", unknown product 0x000d rev 0x01: apic 0 int 22,

Re: could there be a breach of license in efiboot?

2023-07-10 Thread Stuart Henderson
On 2023/07/10 05:22, Peter J. Philipp wrote:
> Redistributions in binary form must reproduce the above copyright
> notice, this list of conditions and the following disclaimer in
> the documentation and/or other materials provided with the
> distribution.

> This should be included on all the efiboot distributions on install disks.

IANAL, but I don't get anything from that text suggesting that it has to
be included _on_ the install image, just "provided with".

Seems to me that the source tree, which includes that list, is provided
with the distribution.

> Here is another license:
> 
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/stand/efi/include/efi.h?rev=1.1&content-type=text/plain
> 
> /*++
> 
> Copyright (c)  1999 - 2002 Intel Corporation. All rights reserved
> This software and associated documentation (if any) is furnished
> under a license and may only be used or copied in accordance
> with the terms of the license. Except as permitted by such
> license, no part of this software or documentation may be
> reproduced, stored in a retrieval system, or transmitted in any
> form or by any means without the express written consent of
> Intel Corporation.

This refers to "a license" but doesn't state it; they're talking about
the same one mentioned above, aren't they? (I'm not sure efi.h really
has anything copyrightable in it anyway though?)



patch crash related to remove_special_lines

2023-07-11 Thread Stuart Henderson
I ran into a segfault with patch(1) in a port, here's a test case with a
minimal reproducer.

$ echo foo > test
$ perl -e 'print "--- test.orig\n+++ test\n@@ -1,1 +1,2 @@\n foo\n+" . 'x' x 32768 . "\n\\ No newline at end of file\n"' > test.patch
$ patch < test.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--
|--- test.orig
|+++ test
--
Patching file test using Plan A...
Segmentation fault (core dumped)
$ egdb -q /usr/src/usr.bin/patch/obj/patch patch.core
Reading symbols from /usr/src/usr.bin/patch/obj/patch...
[New process 276205]
Core was generated by `patch'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0f18920a76e0 in another_hunk () at /usr/src/usr.bin/patch/pch.c:1008
1008                                    s[p_len[filldst - 1]] = 0;
(gdb) list
1003                    p_line[filldst] = s;
1004                    p_len[filldst++] = strlen(s);
1005                    if (fillsrc > p_ptrn_lines) {
1006                            if (remove_special_line()) {
1007                                    p_len[filldst - 1] -= 1;
1008                                    s[p_len[filldst - 1]] = 0;
1009                            }
1010                    }
1011                    break;
1012            default:
(gdb) quit
$



Re: WireGuard(?) issues

2024-05-17 Thread Stuart Henderson
There are problems with wg(4) that people with some workloads have been
seeing after upgrading past 7.3, though looking at this thread from when
it last came up https://marc.info/?t=17094089271&r=1&w=2 I'm not
sure if we'd be expecting to see trouble on non-MP...


On 2024/05/17 00:55, Anthony J. Bentley wrote:
> Hi,
> 
> This week I updated a machine from 7.3 to 7.5. Almost immediately it
> started panicking constantly. The machine runs a webserver on a wg(4)
> interface and receives a mild amount of traffic. I turned off
> wireguard, moved the wg config to a vmm(4) virtual machine, and
> immediately the host stopped crashing and the VM started crashing.
> 
> The problem still occurs on -current. It reliably lands in ddb after a
> few hours (or, sometimes, less than a minute) of uptime.
> 
> Here are four traces from -current. They all look pretty different to me,
> but I don't know what I'm looking at.
> 
> 
> uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at  schedcpu+0xf8:  movzbl  0x344(%rax),%ebx
>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> *400822  72372    0  0x14000   0x200    0  wg_crypt
> schedcpu(0) at schedcpu+0xf8
> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
> softclock(0) at softclock+0x10a
> softintr_dispatch(0) at softintr_dispatch+0xc1
> Xsoftclock() at Xsoftclock+0x27
> memset() at memset+0x5c
> wg_encap_worker(807ee000) at wg_encap_worker+0x79
> taskq_thread(807e8e80) at taskq_thread+0xf0
> end trace frame: 0x0, count: 7
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb> show panic
> *cpu0: uvm_fault(0x825bfa18, 0x344, 0, 1) -> e
> ddb> trace
> schedcpu(0) at schedcpu+0xf8
> softclock_process_tick_timeout(8250cc60,0) at softclock_process_tick_timeout+0xfb
> softclock(0) at softclock+0x10a
> softintr_dispatch(0) at softintr_dispatch+0xc1
> Xsoftclock() at Xsoftclock+0x27
> memset() at memset+0x5c
> wg_encap_worker(807ee000) at wg_encap_worker+0x79
> taskq_thread(807e8e80) at taskq_thread+0xf0
> end trace frame: 0x0, count: -8
> ddb> ps
>    PID     TID   PPID  UID  S       FLAGS  WAIT      COMMAND
>  42950  295310      1    0  3   0x8100083  ttyin     ksh
>  89976  468349      1    0  3  0x18100098  kqread    cron
> *72372  400822      0    0  7     0x14200            wg_crypt
>  34099  494464      0    0  3     0x14200  bored     wg_handshake
>  81076  468124      0    0  3     0x14200  bored     wg_handshake
>  17034   32138      1  110  3  0x18100090  kqread    sndiod
>   3443   18288      1   99  3  0x19100090  kqread    sndiod
>  39555  464562  27824   67  3  0x19100092  kqread    httpd
>  40852  373284  27824   67  3  0x19100092  kqread    httpd
>   9740  249651  27824   67  3  0x19100092  kqread    httpd
>  22641  107130  27824   67  3  0x19100092  kqread    httpd
>  27824  140918      1    0  3  0x18100080  kqread    httpd
>  46973  222762  59115   95  3  0x19100092  kqread    smtpd
>  34463  438292  59115  103  3  0x19100092  kqread    smtpd
>  93000  371803  59115   95  3  0x19100092  kqread    smtpd
>  99364  301284  59115   95  3  0x18100092  kqread    smtpd
>  13911  166137  59115   95  3  0x19100092  kqread    smtpd
>   5160   64906  59115   95  3  0x19100092  kqread    smtpd
>  59115  474304      1    0  3  0x18100080  kqread    smtpd
>  75950  258004  99655   89  3  0x19100092  kqread    relayd
>  39929   22451  99655   89  3  0x19100092  kqread    relayd
>  60312  356080  99655   89  2  0x19100012            relayd
>  23739  153447  99655   89  3  0x19100092  kqread    relayd
>  90077  500816  99655   89  3  0x19100092  kqread    relayd
>  82619  373105  99655   89  3  0x19100092  kqread    relayd
>  71305  347863      1    0  3  0x18100080  kqread    ntpd
>  34900  127546  77355   83  3  0x18100092  kqread    ntpd
>  90622  315749  81226   74  3  0x19100092  bpf       pflogd
>  81226  424867      1    0  3      0x1880  sbwait    pflogd
>  33232  160507      1    0  3  0x18100080  kqread    resolvd
>   9751  395040  51158   77  3  0x18100092  kqread    dhcpleased
>    177  216019  51158   77  3  0x18100092  kqread    dhcpleased
>  51158  427067      1    0  3      0x1880  kqread    dhcpleased
>  93566  387647  43331  115  3  0x18100092  kqread    slaacd
>  68547  195472  43331  115  3  0x18100092  kqread    slaacd
>  43331  513834      1    0  3  0x18100080  kqread    slaacd
>  16278  173604      0    0  3     0x14200  bored     smr
>  44227  432058      0    0  3     0x14200  pgzero    zerothread
>  64719  187250      0    0  3     0x14200  aiodoned

Re: TLS handshake failure during pkg_add

2024-06-19 Thread Stuart Henderson
On 2024/06/19 23:36, Mizsei Zoltán wrote:
> Hi,
> 
> I am facing this issue on my VPS. All other machines are unaffected. All of 
> them are in the same TZ.
> 
> vps# pkg-add -u
> /bin/ksh: pkg-add: not found
> vps#
> vps# pkg_add -u
> quirks-7.14 signed on 2024-06-15T18:27:56Z
> https://cloudflare.cdn.openbsd.org/pub/OpenBSD//7.5/packages/amd64/updatedb-0p0.tgz:
>  TLS handshake failure: handshake failed: error:02FFF00D:system 
> library:func(4095):Permission denied

That looks rather like PF is blocking the outbound connection.



Re: TLS handshake failure during pkg_add

2024-06-20 Thread Stuart Henderson
On 2024/06/20 06:08, Mizsei Zoltán wrote:
> Hi and thanks for your reply.
> 
> Some extra information:
> - If i try pkg_add many times, it will eventually do its job without any 
> error. But it needs many tries.
> Also  switching to other mirror using the /etc/installurl helps *sometimes*...
> I don't have any issue with other networking programs.
> Your suggestion regarding firewall can still be the culprit, I have set up pf 
> according to this blogpost: 
> https://blog.thechases.com/posts/bsd/aggressive-pf-config-for-ssh-protection/
> Do you see any obvious errors here?

Yes, I do. One of the rules in the example file, which isn't explained
in the text, affects http and https connections and is almost certain
to be the cause. It should be obvious when you read through the rules.
(Also, the way that "synproxy state" is used is a bit dubious, though
it isn't responsible for this problem.)
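
One way to narrow the ruleset down might be the sketch below. It only
filters text: it reads rules in the form printed by "pfctl -s rules" and
keeps the lines that mention the web ports by number or by service name.
The function name is invented for this example, and the match is
deliberately loose (an address ending in .80 would also match).

```sh
# web_rules
# Read pf rules on stdin and print those that mention the http/https
# ports, either numerically (80, 443) or by service name (www, https).
web_rules() {
	grep -Ew 'www|https|80|443'
}

# Typical use on the affected machine, as root:
#   pfctl -s rules | web_rules
```

Any block rule, or any rule with connection-rate limits, in that output
would be the first place to look.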



Re: No rtw88 firmware support

2024-07-03 Thread Stuart Henderson
On 2024/07/03 01:31, mederimmedeiros wrote:
> I installed OpenBSD recently and can't get a connection on my laptop, which has
> an RTL8821CE. It seems OpenBSD does not have firmware for it now. Are there any
> plans to support it?
> 
> 

Not a bug - that device is just plain not supported - there's no
driver for it. It isn't a firmware problem.

I suggest either looking for an 11n USB device (in general *not*
11ac/newer for USB) or replacing the card (most Intel cards, including
11ac/newer, do work).



Re: crontab(5) clarification: "~" field is evaluated once at install time

2024-07-03 Thread Stuart Henderson
On 2024/07/03 17:56, K R wrote:
> Hi,
> 
> On Wed, Jul 3, 2024 at 12:31 PM Jason McIntyre  wrote:
> >
> > On Wed, Jul 03, 2024 at 10:52:46AM -0300, K R wrote:
> > > > >Synopsis:  crontab(5) clarification:  "~" field is evaluated once at install time
> > > >Category:  documentation system amd64
> > > >Environment:
> > > System  : OpenBSD 7.5
> > > Details : OpenBSD 7.5-current (GENERIC) #150: Wed Jun 26
> > > 20:30:54 MDT 2024
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > > The crontab(5) manpage could be more explicit about the fact
> > > that a "~" char only gets evaluated to a random value once, at
> > > table install time.
> > >
> > > From the EXAMPLES section:
> > >
> > > > # run hourly at a random time within the first 30 minutes of the hour
> > > > 0~30 * * * *   /usr/libexec/spamd-setup
> > > >
> > > > "run hourly at a random time" could be interpreted as "run
> > > hourly, with a different random minute every hour".
> > > This is not the case and may be unexpected for some users.
> > >
> > > >How-To-Repeat:
> > > # some initial random minute, repeating itself every hour
> > > ~ * * * *   date >> /tmp/LOG
> > >
> > > >Fix:
> > > Just clarify that "~" will be evaluated at table install time
> > > and then be reused.
> > >
> > > Thanks,
> > > --Kor
> > >
> >
> > hi.
> >
> > i agree it might not be totally clear, initially, that it works that
> > way. but if you think about it, if it ran at, for example, different
> > minute intervals, then you could have something run at 59 minutes past
> > the hour, and then at 1 minute past the next hour - a difference of two
> > minutes. that would hardly qualify as "hourly".
> >
> > i think the doc provides enough of a hint ("a random value ... may be
> > obtained") when combined with that logic. i'm not sure that adding the
> > extra text to try and explain that would be worth it. if you feel
> > unconvinced by that, propose a text which you think improves it. but i'm
> > not sure it's needed.
> 
> I believe a single sentence could be added to clarify this.  This
> paragraph is the first place where "~" is explained:
> 
> from:
> 
>  A random value (within the legal range) may be obtained by using the ‘~’
>  character in a field.  The interval of the random value may be specified
>  explicitly, for example “0~30” will result in a random value between 0
>  and 30 inclusive.  If either (or both) of the numbers on either side of
>  the ‘~’ are omitted, the appropriate limit (low or high) for the field
>  will be used.
> 
> to:
> 
>  A random value (within the legal range) may be obtained by using the ‘~’
>  character in a field.  The interval of the random value may be specified
>  explicitly, for example “0~30” will result in a random value between 0
>  and 30 inclusive.  If either (or both) of the numbers on either side of
>  the ‘~’ are omitted, the appropriate limit (low or high) for the field
>  will be used.  The '~' character gets expanded to a random value
>  only once, at table install time.

"only once, at table install time" doesn't seem quite clear to me.
In particular consider "only once" when you edit the crontab once,
then edit it again.

How about this?

The '~' character gets expanded to a random value when the
.Nm crontab
is loaded.

"The allowed values for the fields" above misses the various
possibilities involving ~ too.
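
To illustrate the behaviour being documented, here is a small simulation,
not cron's actual code. It assumes only what the thread describes: a
"LOW~HIGH" field is turned into one concrete value when the crontab is
loaded, and that value is then used for every run until the next load.
The function name is made up.

```sh
# expand_random_field LOW HIGH
# Pick a single value in [LOW, HIGH], the way a "LOW~HIGH" field would be
# resolved once when the table is loaded.
expand_random_field() {
	awk -v lo="$1" -v hi="$2" \
	    'BEGIN { srand(); print lo + int(rand() * (hi - lo + 1)) }'
}

# Simulate one load of "0~30 * * * * /usr/libexec/spamd-setup": the minute
# is chosen here and stays fixed until the crontab is loaded again.
minute=$(expand_random_field 0 30)
echo "effective entry: $minute * * * * /usr/libexec/spamd-setup"
```

Running the last two lines again stands in for a reload (an edit or a
reboot): the minute may change, which is exactly what can shift a weekly
or monthly job unexpectedly.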



Re: crontab(5) clarification: "~" field is evaluated once at install time

2024-07-04 Thread Stuart Henderson
On 2024/07/04 00:01, Jason McIntyre wrote:
> On Wed, Jul 03, 2024 at 10:08:09PM +0100, Stuart Henderson wrote:
> > > 
> > >  A random value (within the legal range) may be obtained by using the ‘~’
> > >  character in a field.  The interval of the random value may be specified
> > >  explicitly, for example “0~30” will result in a random value between 0
> > >  and 30 inclusive.  If either (or both) of the numbers on either side of
> > >  the ‘~’ are omitted, the appropriate limit (low or high) for the field
> > >  will be used.  The '~' character gets expanded to a random value
> > >  only once, at table install time.
> > 
> > "only once, at table install time" doesn't seem quite clear to me.
> > In particular consider "only once" when you edit the crontab once,
> > then edit it again.
> > 
> > How about this?
> > 
> > The '~' character gets expanded to a random value when the
> > .Nm crontab
> > is loaded.
> > 
> 
> well, we already say "A random value ... may be obtained", which i
> think is equivalent (note the singular). and this text does not
> explicitly say that it remains at this value afterwards, which is what
> we are supposed to be addressing.
> 
> i still don't think the complexity of the text is warranted. it works
> how it works. will someone stop using "random" because of this, or
> somehow be caught out (genuinely asking)?

People could get caught out, because another job might run with more
or less than the expected interval. Really depends what it's used for.
For something like the rpki-client example in src/etc/crontab it doesn't
matter much. For scheduling something once a week on a random day, it
needs a bit more careful thought. For once a month on a random date,
~ is very likely to be a bad idea, a reboot or making any change to
the file (including one that doesn't touch the line with ~) will trip
you up.

> i'm not trying to reject the suggestion. i just think that being
> explicit will add a level of complexity that won't be an improvement. i
> did try to rework the text! for example:
> 
>   A random, fixed, value...
> 
> it's still not explicit ;(
> 
>   Once loaded, this value remains constant.

that would be very misleading - if the crontab is loaded again,
the value will change.

> don't know...
> 
> jmc
> 
> > "The allowed values for the fields" above misses the various
> > possibilities involving ~ too.
> > 
> 



Re: Pre-compiled static go binary fails with "undefined symbol 'syscall'"

2024-07-07 Thread Stuart Henderson
On 2024/07/07 22:06, Chris Narkiewicz wrote:
> There is plenty of go applications in ports, so I guess this
> is a solved problem.

Not completely - some of these still try to use syscall and fail at
runtime.



Re: [k...@disroot.org: Re: Pre-compiled static go binary fails with "undefined symbol 'syscall'"]

2024-07-08 Thread Stuart Henderson
On 2024/07/07 21:09, Kian Ali Agheli wrote:
> 
> OpenBSD recently changed access to its syscalls.
> The change which allows upstream Go to build working executables
> for current OpenBSD was committed on May 4th.
> https://github.com/golang/go/commit/8841f50d98b224ecf5ee27d9b7e6f18ad2c98e46

That is not in a Go release yet. It should be in 1.23 when ready. It is,
however, in the OpenBSD ports version of go 1.22.4 (and earlier).

It's a bodge and is there mainly because Go's standard libraries don't
provide OS-related functions that people have been wanting. This code
in Go only handles ioctl/sysctl syscalls; however, some Go software does
all sorts of things using syscalls - they don't want to depend on libc
so they depend on low level kernel interfaces instead. (The equivalent
bodge that we have in perl, see syscall_emulator.c etc, supports a much
wider range.)

> $ git clone https://github.com/kovidgoyal/kitty .
> $ grep -r syscall *
> ...
> kittens/transfer/receive.go:func syscall_mode(i os.FileMode) (o uint32) {
> kittens/transfer/receive.go:  if err := unix.Fchmodat(unix.AT_FDCWD, self.expanded_local_path, syscall_mode(self.permissions), unix.AT_SYMLINK_NOFOLLOW); err == nil || !(errors.Is(err, unix.EINTR) || errors.Is(err, unix.EAGAIN)) {
> kittens/transfer/send.go: "syscall"
> kittens/transfer/send.go: stat, ok := stat_result.Sys().(*syscall.Stat_t)
> kittens/transfer/send.go: stat, ok := st.Sys().(*syscall.Stat_t)
> ...

Those references in kittens seem to be using functions like os.Stat and
unix.Fchmodat which IIRC should work ok as long as they're built against
current versions of go libraries.

The 'syscall' references in those files are I think due to using structs
coming from the Go syscall library but not using syscall() themselves.

There is some direct use of syscall in
https://github.com/kovidgoyal/kitty/blob/2076cd870a3f0a56333151f9df2dcb1719aee61e/tools/utils/filelock.go#L16
but I don't know if that affects kittens.

There's also
https://github.com/kovidgoyal/kitty/blob/2076cd870a3f0a56333151f9df2dcb1719aee61e/tools/utils/shm/shm_syscall.go#L67
but that's not set to build on openbsd.

I didn't check the libraries listed in kitty's go.mod so there might
still be some uses hiding there.

If someone wants to fix this I suggest starting by building "kitten"
with the openbsd packages version of the go compiler and see if things
still fail. It may work as-is, but if not, it's likely to then be a
runtime error rather than outright refusal to run. If it's still broken,
figure out which code is actually using syscall. If it's a library
there may be a newer version or fork which doesn't do that. Or there
might be a more portable alternative method that could be used instead.
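
For the "figure out which code is actually using syscall" step, a sketch
like the following may be more useful than a plain grep for the word
"syscall", since it skips lines that only use types such as
syscall.Stat_t. It assumes the calls of interest are the Syscall,
Syscall6, RawSyscall and RawSyscall6 entry points of the syscall and
golang.org/x/sys/unix packages; the function name is hypothetical.

```sh
# find_direct_syscalls DIR
# List Go source lines under DIR that call the raw syscall entry points,
# as opposed to merely referring to types from the syscall package.
find_direct_syscalls() {
	# /dev/null makes grep print the file name even for a single match.
	find "$1" -type f -name '*.go' \
	    -exec grep -nE '(syscall|unix)\.(Raw)?Syscall6?\(' /dev/null {} +
}
```

Pointing it at the kitty checkout covers the program itself; pointing it
at the Go module cache after a build (the directory reported by
"go env GOMODCACHE") would also cover the libraries from go.mod.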

(btw, the version of kitty in ports is older, from when "kittens" were
written in python. last time we tried updating, we had problems with
the go components as it wanted to download during build, which is not
allowed for ports - that might be fixable but I don't think anyone
wanted to put that much effort in to cope with a terminal emulator -
especially one which seems to take the diametrically opposite view
to OpenBSD regarding complexity vs simplicity!)



Re: sendbug(1) should not delete bug report before confirming it has landed on https://marc.info/?l=openbsd-bugs

2024-07-08 Thread Stuart Henderson
On 2024/07/07 16:55, Qingyao Sun wrote:
>   1. make sendbug(1) block on smtpd(8) until the mail goes through, and then print
> a URL of the message on https://marc.info/?l=openbsd-bugs&m=XX as a
> confirmation.

sendbug *does* block until your local mail system accepts the message,
if that system is not working there's not too much sendbug can do.

Waiting for the message to show up on marc.info is not sanely possible.
Apart from anything else, what happens if the mailing list is down, if
the 3rd party list archive is down or updating slowly, if your internet
connection is broken, etc - should it just sit there retrying until you
hit ^C?

Normal use of sendbug requires working email setup on the machine where
you run it. If mail(1) doesn't work, sendbug(1) won't work either.

If you don't have a working email setup on your machine, run sendbug -P
redirected to a file, move that file to a machine with working email,
and send via a standard mail client.
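
A minimal sketch of that workaround. The helper name save_report and the
default output filename are assumptions made for illustration; only the
-P flag comes from the advice above.

```shell
#!/bin/sh
# save_report is a hypothetical helper, not part of sendbug(1).
# It writes the report template to a file instead of mailing it.
save_report() {
    out=${1:-bugreport.txt}    # assumed default filename
    if ! command -v sendbug >/dev/null 2>&1; then
        echo "sendbug not found; run this on the OpenBSD machine" >&2
        return 1
    fi
    # -P prints the template to standard output rather than sending it
    sendbug -P > "$out"
}
```

Fill in the saved file with an editor, copy it to the machine that has
working mail, and send it from there.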

>   2. move the temporary problem reports to /tmp/sendbug/p.XX and never
> delete anything in /tmp/sendbug. In this case smtpd(8) could still
> silently fail, but at least we keep a copy of the message available
> (until reboot).

It might possibly make sense to have a way to prevent sendbug deleting
its tmpfile, but /tmp is cleaned automatically in various circumstances
so that might not help you. Best save a copy from your editor if you
want to keep it, really.



Re: Xorg hangs and segfaults with today snapshot

2024-08-18 Thread Stuart Henderson
On 2024/08/18 09:09, Walter Alejandro Iglesias wrote:
> I still don't understand how gdb(1) works.  Today I ran the same command
> against the same Xorg.core file but I got different messages.  The whole
> egdb output:
> 
> $ egdb -q $(which X) Xorg.core
> Reading symbols from /usr/X11R6/bin/X...
[...]
> warning: .dynamic section for "/usr/lib/libc.so.100.3" is not at the expected 
> address (wrong library or version mismatch?)

You presumably rebooted between generating the core and trying to use
gdb and triggered the "library_aslr" mechanism, so the libc.so that you
now have is useless for debugging with that core.



Re: exim SIGSEGV on TLS connections on latest amd64 snapshot

2024-08-18 Thread Stuart Henderson
Original message didn't show up.

Is this exim 4.97.1 or 4.98? If it's 4.98 can you try building 4.97.1
('cvs up -D 2024/07/29' in mail/exim) to see whether it was the update
or something else causing it?

On 2024/08/18 14:14, Peter N. M. Hansteen wrote:
> And I should add, the data in the report is from after I did another 
> sysupgrade -s followed by pkg_add -vurm and observing that the problem
> had not gone away.
> 
> I assume and hope there is some relatively obvious fix for this. I look
> forward to reading my backlog of openbsd.org mail :)
> 
> All the best,
> Peter
> 
> -- 
> Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> https://bsdly.blogspot.com/ https://www.bsdly.net/ https://www.nuug.no/
> "Remember to set the evil bit on all malicious network traffic"
> delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
> 



Re: exim SIGSEGV on TLS connections on latest amd64 snapshot

2024-08-18 Thread Stuart Henderson
On 2024/08/18 13:57, Stuart Henderson wrote:
> Original message didn't show up.

Ah it showed up now.

: >Fix:
: To be determined. Likely abi mismatch between exim and libressl

that's unlikely.

> Is this exim 4.97.1 or 4.98? If it's 4.98 can you try building 4.97.1
> ('cvs up -D 2024/07/29' in mail/exim) to see whether it was the update
> or something else causing it?
> 
> On 2024/08/18 14:14, Peter N. M. Hansteen wrote:
> > And I should add, the data in the report is from after I did another 
> > sysupgrade -s followed by pkg_add -vurm and observing that the problem
> > had not gone away.
> > 
> > I assume and hope there is some relatively obvious fix for this. I look
> > forward to reading my backlog of openbsd.org mail :)
> > 
> > All the best,
> > Peter
> > 
> > -- 
> > Peter N. M. Hansteen, member of the first RFC 1149 implementation team
> > https://bsdly.blogspot.com/ https://www.bsdly.net/ https://www.nuug.no/
> > "Remember to set the evil bit on all malicious network traffic"
> > delilah spamd[29949]: 85.152.224.147: disconnected after 42673 seconds.
> > 
> 



Re: exim SIGSEGV on TLS connections on latest amd64 snapshot

2024-08-19 Thread Stuart Henderson
On 2024/08/19 15:26, Theo Buehler wrote:
> On Mon, Aug 19, 2024 at 02:57:28PM +0200, Renaud Allard wrote:
> > 
> > 
> > On 8/19/24 12:04 PM, Peter Nicolai Mathias Hansteen wrote:
> > > 
> > > So quite odd, the whole thing.
> > > 
> > 
> > That's indeed quite odd if connecting with openssl s_client works.
> > I really think you should try out asking exim devs.
> 
> It would be helpful to have a reproducer or a backtrace. It is
> impossible to gather much info from this discussion.

Install the debug-exim package to match the exim package that you've
installed (whether that's self built or from the snapshot - reinstall
the main package if unsure whether they match). Then

# mkdir /var/crash/exim (i assume that is the process name; adjust if not)
# sysctl kern.nosuidcoredump=3

Trigger a crash, see if you get a /var/crash/exim/$PID.core and if so,
point egdb at it.
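
Once a crash has happened, picking out the most recent core can be
sketched like this. The helper name newest_core is made up for
illustration, and the default directory is simply the one assumed above.

```shell
#!/bin/sh
# newest_core is a hypothetical helper: it prints the most recently
# modified *.core file in the given directory, or nothing if none exist.
newest_core() {
    dir=${1:-/var/crash/exim}    # directory assumed from the steps above
    # ls -t sorts by modification time, newest first
    ls -t "$dir"/*.core 2>/dev/null | head -n 1
}
```

Then something along the lines of
egdb -q "$(which exim)" "$(newest_core)"
followed by "bt" at the gdb prompt should give a backtrace, assuming
exim is in your PATH.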

Alternative: bisect around upstream commits touching TLS? these, maybe others?
https://github.com/Exim/exim/commit/5d5ad9fb16a2511ff2e0e7d4528d399f06f608da
https://github.com/Exim/exim/commit/fe105877d57ac7e05a4333e0d072f232d212b9fe

> While it is impossible to be sure where exactly the bug lies, it sure
> looks as if exim had another pretty bad bug in a release. The diff
> doesn't show much information since it's mostly pointless churn.
> 
> I think it is about time to seriously consider removing exim from the
> ports tree for good.

That would be OK with me. Of course people can still fetch from the
Attic and build themselves if they really need it, but the extra
steps needed for that (+ OS updates) will increase the motivation
to port the config across to another MTA.

(If it _does_ stay, perhaps it should switch to using gnutls).



Re: sysupgrade Verifying sets FAILED

2024-08-19 Thread Stuart Henderson
Wait and try again, or try a different mirror. Something is causing 
shearing in some cases i.e. a mixture of files from two different snapshots.


(Ran into the same one myself earlier and discovered that autoinstall 
doesn't reboot in this situation, instead drops back to 
install/upgrade/auto install and sits there)


--
 Sent from a phone, apologies for poor formatting.

On 19 August 2024 19:47:43 Anon Loli  wrote:


Greetings, fellow gentlemen and ladies.
Hereby I want to announce the following issue that I got while running
`torsocks sysupgrade -n` on 1 machine.

Dmesg should be attached below.
I'll summarize the sysupgrade output because the X11 doesn't work right
now so I can't easily copy everything now because of the current bug.


Fetching from [REDACTED for increased anonymity]
SHA256.sig 100% ...
Signature Verified
BUILDINFO 100% ...
INSTALL.amd64 100% ...
[literally everything else from base76.tgz to xshare76.tgz, all 100%]
Verifying sets.
(SHA256) bsd: FAILED
[literally everything else from this bsd to xshare76, all FAILED]

and that's where it ends.
If it were because of the transition to 76, then it would stop immediately,
right?
In all of my time of using OpenBSD, this is the 1st time this has happened
to me.
Does this have something to do with the specific mirror that I use?
Is that mirror being fishy?
I use Tor for anonymity reasons and have done this many times successfully.
The currently used snapshot should be like from within 24 hours.

Speaking of sysupgrade, if I use it with I2P (installurl is a localhost link
to a I2P tunnel that goes to a I2P mirror for OpenBSD), then sometimes it just
quits with 0 output, shortly after starting, OR
case b) it stalls forever, which probably has something to do with what Solene
said for this exact use-case - ftp drops the connection way too soon.
I made the entire process exactly ONCE, and it was such a good feeling!

Nothing better than downloading a snapshot for half a day, at 0-50kb/s :')
seriously, it feels like the good old days, except you're more-less anonymous,
safe and cozy.

It wouldn't be bad to have a verbose option for sysupgrade like some other
utilities do, but it more-less gets the job done so ohwell :/




Re: sysupgrade Verifying sets FAILED

2024-08-20 Thread Stuart Henderson
On 2024/08/20 05:22, UDENIX wrote:
> I wish OpenBSD supported using mirrors hosted on the Tor or I2P networks for
> system and package installation and upgrades. For instance, Debian not only
> supports but also maintains official mirrors on the Tor network.

Not going to happen. If you can find a trustworthy source of files on
one of those networks you could download the img or iso and install
from there, but the install kernel is a very constrained environment -
there's no space to add things like that.



Re: sysupgrade Verifying sets FAILED

2024-08-20 Thread Stuart Henderson
"WireGuard is not included in the base system because it violates OpenBSD's 
copyright policy" huh?


--
 Sent from a phone, apologies for poor formatting.

On 20 August 2024 18:54:37 UDENIX  wrote:


I wish OpenBSD supported using mirrors hosted on the Tor or I2P networks for
system and package installation and upgrades. For instance, Debian not only
supports but also maintains official mirrors on the Tor network.


Not going to happen. If you can find a trustworthy source of files on
one of those networks you could download the img or iso and install
from there, but the install kernel is a very constrained environment -
there's no space to add things like that.


I'm not asking for Tor or I2P to be added to the base system, but rather
for the ability to use mirrors hosted on these networks without having
to go through a lot of steps [1]. Ideally, it should be as simple as
installing the necessary software (even if it's from the ports tree),
configuring and running it, and then adding the mirror address to
installurl(5). This should be sufficient to install and update the
system and packages.

On the other hand, there are kernel drivers for WireGuard, but WireGuard
is not included in the base system because it violates OpenBSD's
copyright policy. In contrast, the Tor and I2Pd routing software is
licensed under the 3-Clause BSD license.

[1]
https://dataswamp.org/~solene/2024-05-25-openbsd-privacy-friendly-mirror.html




Re: relayd TLS handshake failure

2024-08-22 Thread Stuart Henderson
On 2024/08/21 22:54, Omar Polo wrote:
> On 21/08/24 14:49, Kirill A. Korinsky wrote:
> > On Wed, 21 Aug 2024 14:32:34 +0200,
> > David McMackins II  wrote:
> >> rsae_send_imsg: privenc poll timeout, keyop #0
> >> relay gemini, session 1 (1 active), 0, 192.168.1.1 -> :11965, TLS
> >> handshake error: handshake failed: error:1402D438:SSL
> >> routines:ACCEPT_SW_CERT:tlsv1 alert internal error: Invalid argument
> >> relay_dispatch_ca: privenc result after timeout
> >>
> > TLSv1 and TLSv1.1 are disabled by default, and you must enable them to use
> > them; see man for relayd.conf.
> 
> not just disabled, in july 2023 tls 1.0 and 1.1 were completely removed from 
> libtls, and very shortly later also from libssl.
> 
> (plus the Gemini specification actually requires tls 1.2 or 1.3)
> 

This error looks more like something to do with a cert than with the
TLS version.



MBP M2 pro, !cold assert failed drm/include/linux/completion.h line 89 at shutdown

2024-08-27 Thread Stuart Henderson
'Apple MacBook Pro (14-inch, M2 Pro, 2023)', running recent -current.
Not running X. I hit this after running halt -p:

panic: kernel diagnostic assertion "!cold" failed: file 
"/usr/src/sys/dev/pci/drm/include/linux/completion.h", line 89

Transcribed, maybe typos:

TID *328025, PID 16963, UID 0, PRFLAGS 0x3, PFLAGS 0, CPU 0K, COMMAND halt

db_enter at panic
panic at assert
panic at drm_atomic_helper_swap_state+0x57c
commit_tail at drm_atomic_helper_commit+0x1d4
drm_atomic_helper_commit at drm_atomic_commit+0xa4
drm_atomic_commit at drm_client_modeset_commit_atomic+0x158
drm_client_modeset_commit_atomic at drm_client_modeset_commit_locked+0x5c

After restart/fsck I tried halt -p again and it succeeded.


OpenBSD 7.6-beta (GENERIC.MP) #160: Mon Aug 26 05:36:05 MDT 2024
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 16312172544 (15556MB)
avail mem = 15672647680 (14946MB)
random: good seed from bootblocks
mainbus0 at root: Apple MacBook Pro (14-inch, M2 Pro, 2023)
efi0 at mainbus0: UEFI 2.10
efi0: Das U-Boot rev 0x20240100
cpu0 at mainbus0 mpidr 0: Apple Blizzard Pro r1p0
cpu0: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu0: 4096KB 128b/line 16-way L2 cache
cpu0: TLBIOS+IRANGE,TS+AXFLAG,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL,I8MM,BF16,SPECRES,SB,FRINTTS,GPI,LRCPC+LDAPUR,FCMA,JSCVT,API+PAC,DPB+DCCVADP,ECV,SpecSEI,PAN+ATS1E1,LO,HPDS,VH,IDS,AT,CSV3,CSV2,DIT,AdvSIMD+HP,FP+HP,BT,SSBS+MSR
cpu1 at mainbus0 mpidr 1: Apple Blizzard Pro r1p0
cpu1: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu1: 4096KB 128b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 2: Apple Blizzard Pro r1p0
cpu2: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu2: 4096KB 128b/line 16-way L2 cache
cpu3 at mainbus0 mpidr 3: Apple Blizzard Pro r1p0
cpu3: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu3: 4096KB 128b/line 16-way L2 cache
cpu4 at mainbus0 mpidr 10100: Apple Avalanche Pro r1p0
cpu4: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu4: 16384KB 128b/line 16-way L2 cache
cpu5 at mainbus0 mpidr 10101: Apple Avalanche Pro r1p0
cpu5: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu5: 16384KB 128b/line 16-way L2 cache
cpu6 at mainbus0 mpidr 10102: Apple Avalanche Pro r1p0
cpu6: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu6: 16384KB 128b/line 16-way L2 cache
cpu7 at mainbus0 mpidr 10103: Apple Avalanche Pro r1p0
cpu7: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu7: 16384KB 128b/line 16-way L2 cache
cpu8 at mainbus0 mpidr 10200: Apple Avalanche Pro r1p0
cpu8: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu8: 16384KB 128b/line 16-way L2 cache
cpu9 at mainbus0 mpidr 10201: Apple Avalanche Pro r1p0
cpu9: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu9: 16384KB 128b/line 16-way L2 cache
cpu10 at mainbus0 mpidr 10202: Apple Avalanche Pro r1p0
cpu10: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu10: 16384KB 128b/line 16-way L2 cache
cpu11 at mainbus0 mpidr 10203: Apple Avalanche Pro r1p0
cpu11: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu11: 16384KB 128b/line 16-way L2 cache
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"framebuffer" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"region157" at mainbus0 not configured
"region95" at mainbus0 not configured
"region94" at mainbus0 not configured
"region57" at mainbus0 not configured
"dcp_data" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"uat-handoff" at mainbus0 not configured
"uat-pagetables" at mainbus0 not configured
"uat-ttbs" at mainbus0 not configured
"isp-heap" at mainbus0 not configured
apm0 at mainbus0
"opp-table-0" at mainbus0 not configured
"opp-table-1" at mainbus0 not configured
"opp-table-gpu" at mainbus0 not configured
"opp-table-gpu-cs" at mainbus0 not configured
"opp-table-gpu-afr" at mainbus0 not configured
"pmu-e" at mainbus0 not configured
"pmu-p" at mainbus0 not configured
agtimer0 at mainbus0: 24000 kHz
"clock-ref" at mainbus0 not configured
"clock-200m" at mainbus0 not configured
"clock-disp0" at mainbus0 not configured
"clock-dispext0" at mainbus0 not configured
"clock-dispext0_die1" at mainbus0 not configured
"clock-dispext1" at mainbus0 not configured
"clock-dispext1_die1" at mainbus0 not configured
"clock-ref-nco" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
aplpmgr0 at simplebus0
aplpmgr1 at simplebus0
aplpmgr2 at simplebus0
aplpmgr3 at simplebus0
aplintc0 at simplebus0 nirq 1961 ndie 1
apldog0 at simplebus0
aplmbox0 at simplebus0
aplpinctrl0 at simplebus0
aplmbox1 at simplebus0
apldart0 at simplebus0 rev 2.0: 42 bits, bypass
apldart1 at simplebus0 rev 2.0: 42 bits, bypass
aplda

Re: MBP M2 pro, !cold assert failed drm/include/linux/completion.h line 89 at shutdown

2024-08-28 Thread Stuart Henderson

I've not run into this again yet (with a few reboots).

--
 Sent from a phone, apologies for poor formatting.

On 27 August 2024 23:46:46 Stuart Henderson  wrote:


'Apple MacBook Pro (14-inch, M2 Pro, 2023)', running recent -current.
Not running X. I hit this after running halt -p:

panic: kernel diagnostic assertion "!cold" failed: file 
"/usr/src/sys/dev/pci/drm/include/linux/completion.h", line 89


Transcribed, maybe typos:

TID *328025, PID 16963, UID 0, PRFLAGS 0x3, PFLAGS 0, CPU 0K, COMMAND halt

db_enter at panic
panic at assert
panic at drm_atomic_helper_swap_state+0x57c
commit_tail at drm_atomic_helper_commit+0x1d4
drm_atomic_helper_commit at drm_atomic_commit+0xa4
drm_atomic_commit at drm_client_modeset_commit_atomic+0x158
drm_client_modeset_commit_atomic at drm_client_modeset_commit_locked+0x5c

After restart/fsck I tried halt -p again and it succeeded.


OpenBSD 7.6-beta (GENERIC.MP) #160: Mon Aug 26 05:36:05 MDT 2024
   dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 16312172544 (15556MB)
avail mem = 15672647680 (14946MB)
random: good seed from bootblocks
mainbus0 at root: Apple MacBook Pro (14-inch, M2 Pro, 2023)
efi0 at mainbus0: UEFI 2.10
efi0: Das U-Boot rev 0x20240100
cpu0 at mainbus0 mpidr 0: Apple Blizzard Pro r1p0
cpu0: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu0: 4096KB 128b/line 16-way L2 cache
cpu0: TLBIOS+IRANGE,TS+AXFLAG,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL,I8MM,BF16,SPECRES,SB,FRINTTS,GPI,LRCPC+LDAPUR,FCMA,JSCVT,API+PAC,DPB+DCCVADP,ECV,SpecSEI,PAN+ATS1E1,LO,HPDS,VH,IDS,AT,CSV3,CSV2,DIT,AdvSIMD+HP,FP+HP,BT,SSBS+MSR
cpu1 at mainbus0 mpidr 1: Apple Blizzard Pro r1p0
cpu1: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu1: 4096KB 128b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 2: Apple Blizzard Pro r1p0
cpu2: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu2: 4096KB 128b/line 16-way L2 cache
cpu3 at mainbus0 mpidr 3: Apple Blizzard Pro r1p0
cpu3: 128KB 64b/line 4-way L1 PIPT I-cache, 64KB 64b/line 8-way L1 D-cache
cpu3: 4096KB 128b/line 16-way L2 cache
cpu4 at mainbus0 mpidr 10100: Apple Avalanche Pro r1p0
cpu4: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu4: 16384KB 128b/line 16-way L2 cache
cpu5 at mainbus0 mpidr 10101: Apple Avalanche Pro r1p0
cpu5: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu5: 16384KB 128b/line 16-way L2 cache
cpu6 at mainbus0 mpidr 10102: Apple Avalanche Pro r1p0
cpu6: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu6: 16384KB 128b/line 16-way L2 cache
cpu7 at mainbus0 mpidr 10103: Apple Avalanche Pro r1p0
cpu7: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu7: 16384KB 128b/line 16-way L2 cache
cpu8 at mainbus0 mpidr 10200: Apple Avalanche Pro r1p0
cpu8: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu8: 16384KB 128b/line 16-way L2 cache
cpu9 at mainbus0 mpidr 10201: Apple Avalanche Pro r1p0
cpu9: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu9: 16384KB 128b/line 16-way L2 cache
cpu10 at mainbus0 mpidr 10202: Apple Avalanche Pro r1p0
cpu10: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu10: 16384KB 128b/line 16-way L2 cache
cpu11 at mainbus0 mpidr 10203: Apple Avalanche Pro r1p0
cpu11: 192KB 64b/line 6-way L1 PIPT I-cache, 128KB 64b/line 8-way L1 D-cache
cpu11: 16384KB 128b/line 16-way L2 cache
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"framebuffer" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"region157" at mainbus0 not configured
"region95" at mainbus0 not configured
"region94" at mainbus0 not configured
"region57" at mainbus0 not configured
"dcp_data" at mainbus0 not configured
"asc-firmware" at mainbus0 not configured
"uat-handoff" at mainbus0 not configured
"uat-pagetables" at mainbus0 not configured
"uat-ttbs" at mainbus0 not configured
"isp-heap" at mainbus0 not configured
apm0 at mainbus0
"opp-table-0" at mainbus0 not configured
"opp-table-1" at mainbus0 not configured
"opp-table-gpu" at mainbus0 not configured
"opp-table-gpu-cs" at mainbus0 not configured
"opp-table-gpu-afr" at mainbus0 not configured
"pmu-e" at mainbus0 not configured
"pmu-p" at mainbus0 not configured
agtimer0 at mainbus0: 24000 kHz
"clock-ref" at mainbus0 not configured
"clock-200m" at mainbus0 not configured
"clock-disp0" at mainbus0 not configured
"clock-dispext0" at mainbus0 not configured
"clock-dispext0_die1" at mainbus0 not configured
"clock-di

Re: [ksh?] quote missing exit error message

2024-08-30 Thread Stuart Henderson
On 2024/08/29 19:03, Nick Holland wrote:
> On 8/29/24 16:11, Anon Loli wrote:
> > Okay, I have a vague idea about what happens, let me 1st add this to
> > the bug report:
> > 
> > So I launched a half a dozen tmux windows and they had archivemedia script
> > running and while I started all the windows, I went on to edit the
> > script and add a few lines,
> so...you were editing a running script?  I do believe that falls under the
> category, "undefined behavior".  Yeah, I kinda expect things to go bad
> when you do that.

That falls under the "you are damn lucky if things _don't_ break".
Edit a copy and move it into place.
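
A minimal sketch of "edit a copy and move it into place". The helper
name replace_script is hypothetical, and it assumes the copy sits in the
same directory as the script so that mv is a plain rename.

```shell
#!/bin/sh
# replace_script is a hypothetical helper for illustration only.
# It edits a copy of a script and then renames the copy over the original.
replace_script() {
    script=$1
    # temporary copy next to the original, so mv stays on one filesystem
    tmp=$(mktemp "${script}.XXXXXX") || return 1
    cp -p "$script" "$tmp" || { rm -f "$tmp"; return 1; }
    "${EDITOR:-vi}" "$tmp" || { rm -f "$tmp"; return 1; }
    # The rename swaps the directory entry in one step. Shells that are
    # already running the script keep the old file open and keep reading it.
    mv "$tmp" "$script"
}
```

Instances started before the mv finish with the old contents; instances
started afterwards get the new ones.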



Re: Spurious non-deterministic kernel panics

2024-09-02 Thread Stuart Henderson
On 2024/09/01 20:46, Jonathan Kalbfeld wrote:
> >Synopsis: Spurious reboots involving multiple traps and double faults
> >Category:
> >Environment:
> System  : OpenBSD 7.5
> Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> The system will occasionally reboot for no reason.  I have replaced both
> memory and CPU on the system and sometimes it will go 5 days and
> sometimes 1-2 days without a reboot.  Here are some of the kernel panic
> messages:

maybe set ddb.panic back to the default (1) and try to collect more from
ddb? at least ps /o and ps, and a trace from each cpu (use mach ddbcpu
followed by the cpu number to switch - the default radix in ddb is hex,
so as your numbers go above cpu9 use "mach ddbcpu 0t10" etc).
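
With 56 CPUs that is a lot of typing, so a checklist of the commands may
help. This is only a sketch: ddb_cpu_cmds is a hypothetical helper that
prints text for you to type at the ddb prompt, it does not talk to ddb.

```shell
#!/bin/sh
# ddb_cpu_cmds N prints the ddb commands for a trace on CPUs 0..N-1.
# Hypothetical helper: it only prints a checklist.
ddb_cpu_cmds() {
    ncpu=$1
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        # the 0t prefix marks the number as decimal, since ddb defaults to hex
        printf 'mach ddbcpu 0t%d\ntrace\n' "$i"
        i=$((i + 1))
    done
}
```

For the machine above, "ddb_cpu_cmds 56" lists a "mach ddbcpu" and
"trace" pair for cpu0 through cpu55.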

"show mbuf" and "show all pools" are often useful too.

> cpu0: smt 0, core 0, package 0
...
> cpu55: smt 1, core 14, package 1

you're not going to be gaining anything from so many cores on OpenBSD
yet, and extra cores aren't without costs to the os. is there an easy
way to disable one of the two CPU chips ("package"s)?



Re: panic - ffs_write - AMD64/7.4 to current

2024-09-06 Thread Stuart Henderson
On 2024/08/24 12:13, Martin Pieuchot wrote:
> Hugh,
> 
> If you can reproduce this easily, please send a new panic with the
> outputs of:
> - show uvm
> - show bcstats
> - And the traces of all running processes...  In the two reports below we
> only have the trace of pax(1) which is running on CPU2.
> 
> The two panics are due to corruptions of two different global data
> structures related to buffers: the tree of pages and the tree of buffers.
> 
> In both cases it happens when the buffer cache reaches low DMA watermark
> and tries to flip a buffer high.  The fact that global data structures
> are corrupted and the given buffer cannot be found tends to indicate
> there is a race.  And this is coherent with the use of pax | nc which
> are currently running on two different CPUs.
> 
> I fear there's a sleeping point somewhere, we could try converting the
> splbio() to a mutex which should help.
> 
> On 24/08/24(Sat) 01:11, Hugh Graham wrote:
> > On Fri, Aug 23, 2024 at 01:52:52PM -0600, Bob Beck wrote:
> > > > My immediate suspicion would also fall there.  Nothing in here has
> > > > recently changed.
> > > > 
> > > > You should probably share this with a wider audience, like bugs@ or
> > > > tech@ instead of just mailing individuals.
> > 
> > Apologies for the lack of process. I am only barely awake after a
> > long slumber.
> > 
> > It ran all day, but I did manage to reproduce the crash on 7.4,
> > so that absolves a whole bunch of "recent" changes.
> > 
> > Also, as yet, I have only the single machine for testing and
> > can't exclude hardware. If anyone wants to make an independent
> > confirmation, sending a ports tree with plenty of packages and
> > distfiles might be a successful recipe.
> > 
> > pax -w ports | network | pax -r
> > 
> > Where the receiver's network media is forced to 10BaseT, or the
> > sending machine is just that slow. My latest crash was near the
> > 25GB mark, but this varies greatly and is usually sooner. I
> > will confirm this recipe when I see my next crash.
> > 
> > /Hugh
> > 
> > >> OpenBSD/amd64 BOOTX64 3.65
> > boot> boot bsd.mp.74.dist -s
> > > booting hd0a:bsd.mp.74.dist: 17249612+4142096+368672+0+1241088 [1340407+128+1321080+1013316]=0x1973738
> > > entry point at 0x1001000
> > > [ using 3675960 bytes of bsd ELF symbol table ]
> > > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > > The Regents of the University of California.  All rights reserved.
> > > Copyright (c) 1995-2023 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> > 
> > OpenBSD 7.4 (GENERIC.MP) #1397: Tue Oct 10 09:02:37 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 33551818752 (31997MB)
> > ...
> > > panic: kernel diagnostic assertion "tpg != NULL" failed: file "/usr/src/sys/uvm/uvm_page.c", line 855
> > > Stopped at  db_enter+0x14:  popq    %rbp
> > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > *  1154  46918      0        0x13          0    2  pax
> > >  401298  12308      0        0x13          0    3  nc
> > >  404052  38135      0     0x14000      0x200    1  softnet0
> > > db_enter() at db_enter+0x14
> > > panic(820a9e1f) at panic+0xc3
> > > __assert(82122f8e,8207775c,357,8215caa2) at __assert+0x29
> > > uvm_pagerealloc_multi(fd8741fa1218,40,4000,22,8250d8d0) at uvm_pagerealloc_multi+0x2f8
> > > buf_realloc_pages(fd8741fa1158,8250d8d0,2) at buf_realloc_pages+0xbf
> > > buf_flip_high(fd8741fa1158) at buf_flip_high+0x7e
> > > bufcache_recover_dmapages(0,4) at bufcache_recover_dmapages+0x12b
> > > buf_get(fd873280ab58,3be4,4000) at buf_get+0xcb
> > > getblk(fd873280ab58,3be4,4000,0,) at getblk+0x71
> > > ffs2_balloc(fd872b47be18,ef9,2d,fd880dad9ea0,1,8000443a11a8) at ffs2_balloc+0xeef
> > > ffs_write(8000443a1228) at ffs_write+0x229
> > > VOP_WRITE(fd873280ab58,8000443a1388,1,fd880dad9ea0) at VOP_WRITE+0x45
> > > vn_write(fd8718901708,8000443a1388,0) at vn_write+0xcc
> > > dofilewritev(80004436e2b0,6,8000443a1388,0,8000443a1460) at dofilewritev+0x151
> > end trace frame: 0x8000443a13f0, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{2}> 
> > 
> > > 
> > > > Very likely related to some of the changes being made in uvm. 

I've just been sent a photo from a crashed machine (not local to me -
it's running 7.6-beta from Aug 19) with a trace which doesn't look
entirely dissimilar to this first one from Hugh. It would have been
idling at the time with X, mate, possibly chromium running but not
actively used.

Sadly I don't have any further information from DDB beyond what was
on-screen, the machine was already rebooted so I can't get it now,
so I'm afraid this is probably not all that useful a report.

Hand-retyped bel

Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-03-02 Thread Stuart Henderson
[cc's trimmed back to bugs@]

On 2021/03/02 19:00, Mark Schneider wrote:
> On 02.03.21 10:39, Stuart Henderson wrote:
> > On 2021/03/02 00:09, Mark Schneider wrote:
> > > Hi,
> > > 
> > > Thank you for your feedback.
> > > 
> > > Also OpenBSD 6.9beta snapshot is crashing when I setup RAID5 with three
> > > "Samsung PRO 860 1TB" SSDs.
> > > OpenBSD obsd69b.it-infra.org 6.9 GENERIC.MP#368 amd64
> > > 
> > > obsd69b# dmesg | grep  -i bios
> > > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> > > bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> > > bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> > > acpi0 at bios0: ACPI 5.0
> > Can you isolate softraid from the equation? Are the drives reliable with
> > this hardware configuration when not using softraid? I guess it would
> > need testing with simultaneous writes to the 3 drives to give a closer
> > match to the situation with softraid.
> 
> Thanks a lot for all hints Stuart.
> 
> The isolated 1TB SSD Samsung PRO 860 drives have some AHCI errors
> (OpenBSD_6.9beta-RAID5-3x1TB-SSD-isolated.txt in the attachment).
> 
> 
> Writing to an "isolated" drive does not crash OpenBSD even though there are
> AHCI errors and sometimes an I/O error from dd (see directly below).

Thanks. So even if softraid were to not crash, things would not be
in good shape and probably all it could do is mark the component as
failed. Which would be better than a crash but suboptimal ;)

> # ---
> obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
> dd: /ssd1T-sd1a/1GB-urandom.bin: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 0.014 secs (0 bytes/sec)
> 
> obsd69b# dd if=/dev/urandom of=/ssd1T-sd1a/1GB-urandom.bin bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes transferred in 5.156 secs (208228191 bytes/sec)
> 
> # ---
> 
> ahci2: NCQ errored slot 3 is idle (0400 active)
> ahci2: NCQ errored slot 13 is idle (7c0f01eb active)
> ahci2: NCQ errored slot 26 is idle (03fe1e07 active)
> ahci2: NCQ errored slot 30 is idle (03e1e38f active)
> ahci2: NCQ errored slot 28 is idle (03e1fc71 active)
> ahci2: NCQ errored slot 30 is idle (03fc9f81 active)
> ahci2: NCQ errored slot 9 is idle (0f0ee03f active)
> ahci2: NCQ errored slot 16 is idle (70f400ff active)
> ahci2: NCQ errored slot 28 is idle (0f3c407f active)
> ahci2: NCQ errored slot 13 is idle (70dc41fc active)
> ahci2: NCQ errored slot 17 is idle (0f3c1fe0 active)
> ahci2: NCQ errored slot 30 is idle (0f7c181f active)
> 
> 
> Writing to all "isolated" drives simultaneously does not crash OpenBSD even
> though there are AHCI errors

Some layers don't cope well with errors in layers below, especially if
they haven't been bumped into in development or they're relatively
uncommon, so it's not a complete surprise. I don't get the impression
many people are running softraid raid5 to have bumped into bugs before
you. (There's a fair chance that if you used softdep it would run into
problems too, that doesn't cope too well with lower layer failure
either..).

> # OpenBSD 6.9beta is crashing after a dd command writing to the RAID5
> softraid volume (sd4a) and the access to the ddb{4}> prompt is not possible
> to run trace, ps or sh commands (the root console is dead).
> 
> 
> > "trace" and "sh reg" from ddb would give more clues.
> 
> I am not able to run the commands above as the root ddb{4} console is dead
> (I can see only the last error message but I am not able to type in using
> the keyboard)

Things to try for this:

"sysctl machdep.forceukbd=1" may allow the keyboard to work

"sysctl ddb.panic=0" I don't know if this will help as it isn't
technically a panic, but it may show you a stack trace (and then
try to write a kernel coredump to the swap partition which may
or may not work, then reboot). So with a bit of luck you might
be able to at least grab a screenshot.

> I will connect those Samsung PRO 860 1TB SSDs to a Xeon based system
> (another SATA-controller) and check there for AHCI errors.
> 
> Maybe it is worth to mention, that the original RAID tests on Debian buster
> with six of 512GB Samsung PRO 860 (the same drives and RAID6 set with
> mdadm) worked without crashing the OS.

I guess it will be a combination of the newer drives + the controller
+ something unhandled in the ahci driver.

If you're able to (ideally with a serial console arranged to capture the
output), building a kernel with AHCI_DEBUG defined might give clues
(either add "option AHCI_DEBUG" to the kernel config, or just add a
#define in sys/dev/ic/ahci.c).
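
For the "option AHCI_DEBUG" route, a separate kernel config file is
probably tidier than editing GENERIC.MP. A minimal sketch, assuming
amd64 and a made-up config name AHCIDEBUG (saved as
/sys/arch/amd64/conf/AHCIDEBUG):

```
# AHCIDEBUG: GENERIC.MP plus ahci(4) debug output (the name is just an example)
include "arch/amd64/conf/GENERIC.MP"
option	AHCI_DEBUG
```

Then run config(8) on that file and build in the matching compile
directory as usual. The debug output is probably quite verbose, which
is another reason to have the serial console.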



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-03-02 Thread Stuart Henderson
On 2021/03/02 23:09, Mark Schneider wrote:
> Thanks a lot Stuart.
> 
> On 02.03.21 18:25, Stuart Henderson wrote:
> > [cc's trimmed back to bugs@]
> > 
> > On 2021/03/02 19:00, Mark Schneider wrote:
> > > ...
> > > Thanks a lot for all hints Stuart.
> > > 
> > > The isolated 1TB SSD Samsung PRO 860 drives have some AHCI errors
> > > (OpenBSD_6.9beta-RAID5-3x1TB-SSD-isolated.txt in the attachment).
> > > 
> > > 
> > > Writing to an "isolated" drive does not crash OpenBSD even though there
> > > are AHCI errors and sometimes an I/O error from dd (see directly below).
> > Thanks. So even if softraid were to not crash, things would not be
> > in good shape and probably all it could do is mark the component as
> > failed. Which would be better than a crash but suboptimal ;)
> 
> It looks like a "not so perfect" mix of a slightly outdated ASUS mainboard /
> its BIOS, less old Samsung PRO 860 512GB or 1TB SSD drives and the handling
> of I/O errors in OpenBSD 6.8 or 6.9beta.
> 
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> acpi0 at bios0: ACPI 5.0
> 
> The Samsung PRO 860 SSDs are new so I do not expect the problem there (as
> they are working on Linux).
> 
> I have taken two of those Samsung PRO 860 512GB SSD drives and connected
> them to another "P8B WS, BIOS 0704 07/25/2011" Xeon based Asus mainboard and
> there are no AHCI errors showing up. I have tested isolated drives there as
> well as plain RAID1 and encrypted RAID1 (nested, not using the "-c 1C"
> option of bioctl in OpenBSD 6.9beta), writing 1, 2, 10 or 20 GByte files
> to the RAID device, and there was no issue at all.
> 
> The issue is showing up on three Asus FX CPU based mainboards ( 1 x
> "SABERTOOTH 990FX R2.0" and 2 x "CROSSHAIR V FORMULA-Z" - all of them have
> the same SB950 chipset)
> 
> AMD® SB950 Chipset
> Supports AMD® QUAD-GPU CrossFireX™ Technology
> - 6 x SATA 6.0 Gb/s ports with RAID 0, 1, 5 and 10 support
> 
> ASUS_E7335_Sabertooth_990FX_R2_Manual.pdf
> ASUS_E7710_Crosshair_5_Formula-Z_Manual.pdf
> 
> 
> > Some layers don't cope well with errors in layers below, especially if
> > they haven't been bumped into in development or they're relatively
> > uncommon, so it's not a complete surprise. I don't get the impression
> > many people are running softraid raid5 to have bumped into bugs before
> > you. (There's a fair chance that if you used softdep it would run into
> > problems too, that doesn't cope too well with lower layer failure
> > either..).
> 
> I/O errors writing to an additional RAID device should not lead to an OS
> crash anyway.
> 
> I mean it is a good opportunity to check the error handling as I/O errors
> can always happen.
> 
> 
> > > # OpenBSD 6.9beta is crashing after a dd command writing to the RAID5
> > > softraid volume (sd4a) and the access to the ddb{4}> prompt is not possible
> > > to run trace, ps or sh commands (the root console is dead).
> > > 
> > > 
> > > > "trace" and "sh reg" from ddb would give more clues.
> > > I am not able to run the commands above as the root ddb{4} console is dead
> > > (I can see only the last error message but I am not able to type in using
> > > the keyboard)
> > Things to try for this:
> > 
> > "sysctl machdep.forceukbd=1" may allow the keyboard to work
> > 
> > "sysctl ddb.panic=0" I don't know if this will help as it isn't
> > technically a panic, but it may show you a stack trace (and then
> > try to write a kernel coredump to the swap partition which may
> > or may not work, then reboot). So with a bit of luck you might
> > be able to at least grab a screenshot.
> 
> I use a USB keyboard and that seems to be the problem with the ddb{4}> prompt.
> 
> I will check for a PS/2 keyboard or PS/2 to USB adapter to run the "trace" and
> "sh reg" commands after an OS crash.
> 
> 
> > > I will connect those Samsung PRO 860 1TB SSDs to a Xeon based system
> > > (another SATA-controller) and check there for AHCI errors.
> > > 
> > > Maybe it is worth mentioning that the original RAID tests on Debian buster
> > > with six 512GB Samsung PRO 860 drives (the same drives and RAID6 set with
> > > mdadm) worked without crashing the OS.
> > 

gdb on arm64 -> SIGILL in OPENSSL_cpuid_setup

2021-03-02 Thread Stuart Henderson
Don't know if it's important, but if I try to use gdb on a process
using libcrypto on arm64 (rpi4) I get this.

$ egdb ftp
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-unknown-openbsd6.9".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ftp...(no debugging symbols found)...done.
(gdb) run -o- https://www.openbsd.org/
Starting program: /usr/bin/ftp -o- https://www.openbsd.org/

Program received signal SIGILL, Illegal instruction.
_armv8_pmull_probe () at /usr/src/lib/libcrypto/arm64cpuid.S:38
38  pmull   v0.1q, v0.1d, v0.1d
(gdb) bt
#0  _armv8_pmull_probe () at /usr/src/lib/libcrypto/arm64cpuid.S:38
#1  0x001797a70b18 in OPENSSL_cpuid_setup ()
    at /usr/src/lib/libcrypto/armcap.c:69
#2  0x0017309f28c8 in _dl_call_init_recurse (object=0x17840b1c00, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:815
#3  0x0017309f27b8 in _dl_call_init_recurse (object=0x173856b000, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:790
#4  0x0017309f27b8 in _dl_call_init_recurse (object=0x173856b400, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:790
#5  0x0017309fb774 in _dl_call_init (object=0x173856b400)
    at /usr/src/libexec/ld.so/loader.c:760
#6  _dl_boot (argv=<optimized out>, envp=<optimized out>, dyn_loff=99599908864,
    dl_data=0x7d8804) at /usr/src/libexec/ld.so/loader.c:682
#7  0x0017309fbcb4 in _dl_start ()
    at /usr/src/libexec/ld.so/aarch64/ldasm.S:59
Backtrace stopped: previous frame identical to this frame (corrupt stack?)


OpenBSD 6.9-beta (GENERIC.MP) #1048: Tue Mar  2 03:35:37 MST 2021
dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
real mem  = 4081004544 (3891MB)
avail mem = 3923197952 (3741MB)
random: good seed from bootblocks
mainbus0 at root: Raspberry Pi 4 Model B Rev 1.2
psci0 at mainbus0: PSCI 1.1, SMCCC 1.2
cpu0 at mainbus0 mpidr 0: ARM Cortex-A72 r0p3
cpu0: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu0: 1024KB 64b/line 16-way L2 cache
cpu0: CRC32
cpu1 at mainbus0 mpidr 1: ARM Cortex-A72 r0p3
cpu1: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu1: 1024KB 64b/line 16-way L2 cache
cpu1: CRC32
cpu2 at mainbus0 mpidr 2: ARM Cortex-A72 r0p3
cpu2: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu2: 1024KB 64b/line 16-way L2 cache
cpu2: CRC32
cpu3 at mainbus0 mpidr 3: ARM Cortex-A72 r0p3
cpu3: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
cpu3: 1024KB 64b/line 16-way L2 cache
cpu3: CRC32
efi0 at mainbus0: UEFI 2.7
efi0: https://github.com/pftf/RPi4 rev 0x1
smbios0 at efi0: SMBIOS 3.3.0
smbios0: vendor https://github.com/pftf/RPi4 version "UEFI Firmware v1.13" date May 13 2020 10:40:33
smbios0: Sony UK Raspberry Pi 4 Model B
apm0 at mainbus0
"system" at mainbus0 not configured
"axi" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
bcmclock0 at simplebus0
bcmmbox0 at simplebus0
bcmgpio0 at simplebus0
bcmaux0 at simplebus0
ampintc0 at simplebus0 nirq 256, ncpu 4 ipi: 0, 1: "interrupt-controller"
bcmdmac0 at simplebus0: DMA0 DMA2 DMA4 DMA5 DMA6 DMA7
bcmtmon0 at simplebus0
"timer" at simplebus0 not configured
bcmirng0 at simplebus0
pluart0 at simplebus0: console
com0 at simplebus0: ns16550, no working fifo
"local_intc" at simplebus0 not configured
bcmdog0 at simplebus0
"clock" at simplebus0 not configured
simplebus1 at simplebus0: "firmware"
"gpio" at simplebus1 not configured
"power" at simplebus0 not configured
sdhc0 at simplebus0
sdhc0: SDHC 3.0, 250 MHz base clock
sdmmc0 at sdhc0: 4-bit, sd high-speed, mmc high-speed
"gpiomem" at simplebus0 not configured
"fb" at simplebus0 not configured
"vcsm" at simplebus0 not configured
"clocks" at mainbus0 not configured
"phy" at mainbus0 not configured
"clk-108M" at mainbus0 not configured
"firmware-clocks" at mainbus0 not configured
"arm-pmu" at mainbus0 not configured
agtimer0 at mainbus0: 54000 kHz
simplebus2 at mainbus0: "scb"
bcmpcie0 at simplebus2
pci0 at bcmpcie0
ppb0 at pci0 dev 0 function 0 "Broadcom BCM2711" rev 0x10
pci1 at ppb0 bus 1
xhci0 at pci1 dev 0 function 0 "VIA VL805 xHCI" rev 0x01: intx, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "VIA xHCI root hub" rev 3.00/1.00 addr 1
bse0 at simplebus2: address dc:a6:32:8b:e1:b7
brgphy0 at bse0 phy 1: BCM54210E 10/100/1000baseT PHY, rev. 2
"dma" at simplebus2 not configured
"mailbox" at simplebus

Re: relayd SIGBABRT's on start

2021-03-03 Thread Stuart Henderson
On 2021/03/03 10:08, an...@disroot.org wrote:
> Sorry. Here it is:
> 
> table  { 127.0.0.1 }
> 
> protocol ircfilter {
> tls keypair "localhost"
> }
> 
> relay relay4 {
> listen on 127.0.0.1 port 6697 tls 
> protocol ircfilter
> forward to  port 6667 
> }
> 
> relay relay6 {
> listen on ::1 port 6697 tls
> protocol ircfilter 
> forward to  port 6667
> }
> 

I can't replicate the failure with this config with a cert in
/etc/ssl/localhost.crt and a key in /etc/ssl/private/localhost.key.
(I don't have anything handy on 6667 but I pointed it at another
plaintext service and checked that it worked and I could connect ok).

I suggest "ktrace -i relayd -vvvd" and "kdump > kdump.out" and
look toward the end of that file to see what it's trying to open that
it fails on.
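
Once kdump.out exists, something along these lines narrows it down to
the last pathname lookups and failed calls. It is only a sketch:
last_failures is a made-up helper name, and the pattern assumes kdump's
usual "NAMI" and "errno N" output:

```sh
# Print the last few pathname lookups (NAMI) and failed calls (errno N)
# from a kdump text file. Usage: last_failures kdump.out [count]
last_failures() {
    grep -E 'NAMI|errno [0-9]+' "$1" | tail -n "${2:-20}"
}
```

For example "last_failures kdump.out 40"; an errno line shortly after a
NAMI line is often the file in question.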



Re: relayd SIGBABRT's on start

2021-03-03 Thread Stuart Henderson
On 2021/03/03 12:19, an...@disroot.org wrote:
> Here's the contents of kdump.out 

 92196 relayd   GIO   fd 9 read 1858 bytes
   "-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFHzBJBgkqhkiG9w0BBQ0wPDAbBgkqhkiG9w0BBQwwDgQIWbKKi98dRTcCAggA
...
 92196 relayd   NAMI  "/dev/tty"
 92196 relayd   PLDG  open, "rpath", errno 1 Operation not permitted
...
 25477 relayd   GIO   fd 2 wrote 43 bytes
   "lost child: pid 92196 terminated; signal 6

pledged relayd cannot cope with encrypted private keys, it is trying
to ask for the password.

(also even though it's encrypted I would recommend replacing that
key now that it's in list archives)

relayd's pledge will need some thought but for now you can get it working
by using a key which isn't encrypted.
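
One way to get an unencrypted key file, as a sketch only: decrypt_key is
a made-up helper, the file names are examples, and it assumes your
openssl(1) has the pkey subcommand. It prompts for the passphrase unless
extra openssl options are passed after the two file names:

```sh
# Write an unencrypted copy of a passphrase-protected private key.
# Usage: decrypt_key encrypted.key plain.key [extra openssl pkey options]
# The umask keeps a newly created output file readable by its owner only.
decrypt_key() {
    src=$1 dst=$2
    shift 2
    ( umask 077 && openssl pkey -in "$src" -out "$dst" "$@" )
}
```

For example "decrypt_key localhost.key.enc /etc/ssl/private/localhost.key"
as root. Since this particular key is now public, generating a fresh
unencrypted one is the better option; the above only shows the format
change.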



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-03-03 Thread Stuart Henderson
On 2021/03/03 15:10, Karel Gardas wrote:
> 
> On 3/2/21 11:09 PM, Mark Schneider wrote:
> > 
> > It looks like a "not so perfect" mix of a slightly outdated ASUS mainboard /
> > its BIOS, less old Samsung PRO 860 512GB or 1TB SSD drives and the
> > handling of I/O errors in OpenBSD 6.8 or 6.9beta.
> > 
> > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdc312018 (61 entries)
> > bios0: vendor American Megatrends Inc. version "2201" date 03/23/2015
> > bios0: ASUSTeK COMPUTER INC. CROSSHAIR V FORMULA-Z
> > acpi0 at bios0: ACPI 5.0
> > 
> > The Samsung PRO 860 SSDs are new so I do not expect the problem there
> > (as they are working on Linux).
> > 
> > I have taken two of those Samsung PRO 860 512GB SSD drives and connected
> > them to another "P8B WS, BIOS 0704 07/25/2011" Xeon based Asus mainboard
> > and there are no AHCI errors showing up. I have tested isolated drives
> > there as well as plain RAID1 and encrypted RAID1 (nested, not using the
> > "-c 1C" option of bioctl in OpenBSD 6.9beta), writing 1, 2, 10 or 20 GByte
> > files to the RAID device, and there was no issue at all.
> > 
> > The issue is showing up on three Asus FX CPU based mainboards ( 1 x
> > "SABERTOOTH 990FX R2.0" and 2 x "CROSSHAIR V FORMULA-Z" - all of them
> > have the same SB950 chipset)
> > 
> > AMD® SB950 Chipset
> > Supports AMD® QUAD-GPU CrossFireX™ Technology
> > - 6 x SATA 6.0 Gb/s ports with RAID 0, 1, 5 and 10 support
> > 
> 
> Looks like you are not the only one having issue with AMD chipset and new
> samsung SSD:
> 
> https://eu.community.samsung.com/t5/computers-it/860-evo-250gb-causing-freezes-on-amd-system/td-p/575813/page/2
> https://community.amd.com/t5/drivers-software/990fx-sb950-chipset-drivers-not-working-with-samsung-860-pro-ssd/td-p/100175
> 
> And here you even have a trace of issues in Linux kernel bugzilla:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=201693
> 
> If you read all info on those pages, you will find some workarounds for you
> to test, but generally speaking I'm afraid this hardware combination
> does not seem to be too trustworthy...
> 


"So I guess we should consider doing a kernel side quirk where the
kernel disables NCQ on the combination of having a Samsung 860 or 870
SSD with a SATA controller on these older AMD chipsets. This does
require having a list of PCI-ids for the controllers on which to enable
this quirk."

ouch



Re: OpenBSD 6.8 - softraid issue: "uvm_fault(0xffffffff821f5490, 0x40, 0, 1) -> e"

2021-03-04 Thread Stuart Henderson
No need to patch to check this, it is null, see r9 in sh reg (it is %reg9 
in my disassembly)


--
 Sent from a phone, apologies for poor formatting.
On 4 March 2021 10:16:58 Karel Gardas  wrote:


On 3/3/21 7:45 PM, Mark Schneider wrote:

I can run some tests with modified BIOS settings (IDE instead of AHCI)
and the Samsung PRO 860 SSDs.

From my point of view it is important to check and optimize the I/O
error handling as I/O errors can always happen and in such a case the OS
should not crash (other hardware combinations might be impacted as well).



Mark, since you do have hardware to test, are you able to develop and
push the patch to tech@ ?


I am afraid I am currently not able to develop such a patch myself as I
am not familiar enough with the I/O stuff in OpenBSD and the last time I
did some x86 assembler programming was approx 35 years ago (debugging).

What I can do however is just testing as I have the "non working"
system combination with the AMD SB950 chipset and Samsung PRO 860 SSDs.


Mark,

are you able to test the patch below? I'm curious whether you will get
an "illegal xs reference" message printed on the console.

diff --git a/sys/dev/softraid.c b/sys/dev/softraid.c
index 0ffaeb86ca0..c6d2086594e 100644
--- a/sys/dev/softraid.c
+++ b/sys/dev/softraid.c
@@ -4583,6 +4583,12 @@ sr_validate_io(struct sr_workunit *wu, daddr_t *blkno, char *func)
 		goto bad;
 	}
 
+	if (xs == 0) {
+		printf("%s: %s: illegal xs reference for %s\n",
+		    DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);
+		goto bad;
+	}
+
 	if (xs->datalen == 0) {
 		printf("%s: %s: illegal block count for %s\n",
 		    DEVNAME(sd->sd_sc), func, sd->sd_meta->ssd_devname);




Re: dhcplease fails to acquire a lease with a trunk interface

2021-03-06 Thread Stuart Henderson
On 2021/03/06 16:09, Raf Czlonka wrote:
> On Sat, Mar 06, 2021 at 08:59:16AM GMT, Florian Obser wrote:
> > On Sat, Mar 06, 2021 at 06:41:44AM +0000, Adam Steen wrote:
> > > >Synopsis:dhcpleased does not acquire a lease
> > > >Category:networking
> > > >Environment:
> > >   System  : OpenBSD 6.9
> > >   Details : OpenBSD 6.9-beta (GENERIC.MP) #376: Thu Mar  4 21:04:56 MST 2021
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > >   Testing dhcpleased with a trunk interface and it doesn't acquire a lease.
> > > >How-To-Repeat:
> > >   command before new line completed first, then ones after
> > >   terminal 1
> > >   pkill dhcpleased
> > > 
> > >   dhcpleased -dv
> > >   IMSG_OPEN_BPFSOCK
> > >   open_bpfsock: 6
> > >   state_transition Down -> Down, timo: -1
> > >   set_bpfsock: 6 fd: 9
> > >   terminal 2
> > >   ifconfig trunk0 -inet
> > >   ifconfig trunk0
> > >   trunk0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
> > 
> > your interface is not "UP"
> > 
> > >   lladdr f0:de:f1:77:c2:c8
> > >   index 6 priority 0 llprio 3
> > >   trunk: trunkproto failover
> > >   iwn0 port active
> > >   em0 port master
> > >   groups: trunk egress
> > >   media: Ethernet autoselect
> > >   status: active
> > > 
> > >   ifconfig trunk0 inet autoconf
> > 
> > ifconfig trunk0 up
> > 
> > should make it go. There is currently a discussion going on if
> > ifconfig(8) should implicitly bring interfaces up, until that is
> > settled you either need ifconfig $IF up or add "up" to your
> > hostname.if(5) file.
> 
> FWIW, I've made the same mistake thinking that "inet autoconf" and
> "dhcp" are functionally equivalent.

At a very basic level, "inet autoconf" is a signal to something
running in the background and other startup proceeds right away,
whereas "dhcp" is "run a program and wait until either the address
is configured or times out" (which is behaviour some people want).
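
So with the autoconf route, as things stand at the time of this thread,
hostname.if(5) needs the "up" spelled out. A sketch for the trunk0 case
from the report (the trunkport line is assumed from the ifconfig output
above, adjust to taste):

```
# /etc/hostname.trunk0
trunkproto failover trunkport em0 trunkport iwn0
inet autoconf
up
```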

> I based the assumption both on the commit message[0]:
> 
>   The flag can be set by "ifconfig $if inet autoconf" or by
>   adding "inet autoconf" to /etc/hostname.if. An existing
>   "dhcp" line should be removed.
> 
> i.e. no mention of additional "up"; and the fact that "up" is already
> implied when "dhcp" is being used.
> 
> I can understand both sides to the argument, i.e. on the one hand
> you can configure several interfaces without bringing any of them
> up; or why would you configure an interface if you don't want to
> use it,

there's the case of "don't want to use it _yet_", for example you might
want to set up umb ready for use but only bring it up manually when
needed, or you might want to set up pppoe and not have it start
trying to connect immediately at the point it processes "inet", but
wait until you've sent all of the configuration commands.

> not to mention one command less to run or one less option
> to use in hostname.if(5), on the other.

For the hostname.if case, it's easy enough, netstart could do "up"
automatically after configuring if there is no "down" in the file.



Re: dhcplease fails to acquire a lease with a trunk interface

2021-03-06 Thread Stuart Henderson
On 2021/03/06 10:46, Theo de Raadt wrote:
> Mark Kettenis  wrote:
> 
> > > Date: Sat, 6 Mar 2021 16:47:26 +0000
> > > From: Stuart Henderson 
> > > 
> > > there's the case of "don't want to use it _yet_", for example you might
> > > want to setup umb ready for use but only bring it up manually when
> > > needed, or you might want to setup pppoe and don't want it to start
> > > trying to connect immediately at the point it processes "inet" but
> > > wait until you've sent all of the configuration commands.
> > 
> > Right.  I have a few machines where I have wireless interfaces that I
> > configure with a nwid and wpakey such that I can bring them up with a
> > simple "ifconfig bwfm0 up" and deliberately don't bring them up
> > automatically.
> > 
> > > > not to mention one command less to run or one less option
> > > > to use in hostname.if(5), on the other.
> > > 
> > > For the hostname.if case, it's easy enough, netstart could do "up"
> > > automatically after configuring if there is no "down" in the file.
> > 
> > That would break my configurations.
> 
> If we are going to do something, it should just be for
> 
>'address has been configured'
> 
> which would include a static ipv4, or static ipv6, or dynamic ipv6,
> or dynamic ipv4.
> 
> so, bring those up.

For dynamic, the interface needs to be up anyway, so in that case it's
"we expect an address to be configured".

> If the (tail of) the hostname.if file contains "down", it would come up
> and quickly go back down.  The little flip should not hurt anyone.

We already rely on a hack to make pppoe work reliably, there is a race
due to auto-up. dest ought to be configured before the interface is
brought up otherwise we try to negotiate with the ISP before pppoe
knows what the destination address should be or whether it should be a
wildcard.

The trick is to use "inet 0.0.0.0 255.255.255.255 0.0.0.1" which relies
on the fact that "dest" for a point-point interface is stored in the
same place as the broadcast address.

So, for pppoe it doesn't hurt as long as you know the trick, but it is
an example of a (minor) problem with the "auto up on address config".
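
For reference, the trick looks roughly like this in hostname.pppoe0. The
parent interface and credentials are placeholders, not anything from
this thread:

```
# /etc/hostname.pppoe0 (em0, 'user' and 'secret' are placeholders)
# The third address, normally the broadcast, ends up as dest, so dest is
# already set by the time "up" is processed.
inet 0.0.0.0 255.255.255.255 0.0.0.1 pppoedev em0 authproto pap \
	authname 'user' authkey 'secret' up
```

The parent interface still needs to be brought up in its own
hostname.if file.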

> The question is where do we do this "up" operation.  inside netstart?
> Inside ifconfig?  Maybe in the return-side of the ioctl code that sets
> addresses?

I think any change to this is going to improve some use cases and make
others worse..



Re: Raspberry Pi 3B+ panic on changing video0 permissions for motion

2021-03-08 Thread Stuart Henderson
On 2021/03/08 23:21, Marfaba Stewart wrote:
> On Monday, March 8, 2021 3:47 PM, Marcus Glocker  wrote:
> > Does this patch make a difference?
> >
> 
> Hi, I apologize for the dumb question, but I'm not sure
> how to build for the Pi; I've only built for i386
> and amd64.

Exactly the same way. Check out src/sys (no need for all of src if
you're only building a kernel), cd to /sys/arch/arm64/compile/GENERIC
or GENERIC.MP, then run make obj, make config, make, doas make install.
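
Spelled out as a sketch (assuming src/sys is checked out so that /sys
points at it; KERNEL, RUN and build_kernel are just illustrative names,
with RUN=echo printing the commands instead of running them):

```sh
# Build and install an arm64 kernel with the steps listed above.
KERNEL=${KERNEL:-GENERIC.MP}   # or GENERIC
RUN=${RUN:-}                   # set RUN=echo for a dry run

# Runs in a subshell so the caller's working directory is left alone.
build_kernel() (
    $RUN cd "/sys/arch/arm64/compile/$KERNEL" &&
    $RUN make obj &&
    $RUN make config &&
    $RUN make &&
    $RUN doas make install
)
```

Then "build_kernel" and reboot into the new kernel.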

> I'm assuming I would need something more high-powered
> than the Pi to build. Would it be possible to build
> on a Pinebook Pro?
> 
> If not, would you have any suggestions as to the
> correct hardware I should get or donate?
> 
> Thank you very much.
> 

There's just one type of kernel used on OpenBSD/arm64, you can build
on any machine type and copy it over. So yes there would be no problem
building on a Pinebook Pro if you have one available. (Or Raspberry Pi 4
which is pretty quick). Same for userland software / packages / etc.
There's not a general problem building on RPi3 either though (I have
to say I'm surprised you've got this much running on 3B+ already,
I thought the USB controller driver was in worse shape than this).



Re: dnscrypt_proxy fails to start (core dump) after upgrade to tonight snapshot

2021-03-11 Thread Stuart Henderson
Reinstall the package (pkg_delete and pkg_add again).

Same applies for anyone else with problems for software written in go,
package revisions have now been bumped so after the next snapshot is
available pkg_add -u will pick it up, but reinstall will fix it now.


On 2021/03/11 15:17, Chris Jones wrote:
> Synopsis: dnscrypt_proxy fails to start (core dump) after upgrade to tonight 
> snapshot
> Category: Ports package
> Environment:
> 
> System : OpenBSD 6.9
> Details : OpenBSD 6.9-beta (GENERIC.MP) #394: Wed Mar 10 19:59:10 MST 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> 
> 
> Description:
> 
> After upgrading to the snapshot this evening, all daemons came back up as 
> expected except one: dnscrypt_proxy. When I tried to manually start the 
> dnscrypt_proxy service ('rcctl start dnscrypt_proxy') I could see the 
> following message in /var/log/messages:
> 
> Mar 10 21:39:07 minto-fw01 /bsd: [dnscrypt-proxy]62869/394062 pc=46b946 inside 2318ff000-23190c000: bogus syscall
> 
> How-To-Repeat:
> 
> This will repeat every time I try to start dnscrypt_proxy.
> 
> Fix:
> 
> No workaround at this point.
> 
> dmesg:
> OpenBSD 6.9-beta (GENERIC.MP) #394: Wed Mar 10 19:59:10 MST 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8519643136 (8124MB)
> avail mem = 8246059008 (7864MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x7cd0b020 (14 entries)
> bios0: vendor coreboot version "v4.12.0.3" date 10/28/2020
> bios0: Protectli FW4B
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG APIC
> acpi0: wakeup devices XHCI(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xe000, bus 0-255
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Celeron(R) CPU J3160 @ 1.60GHz, 2240.42 MHz, 06-4c-04
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu0: 1MB 64b/line 16-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 79MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.0.0.0.3.3, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Celeron(R) CPU J3160 @ 1.60GHz, 2239.94 MHz, 06-4c-04
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu1: 1MB 64b/line 16-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Celeron(R) CPU J3160 @ 1.60GHz, 2239.95 MHz, 06-4c-04
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu2: 1MB 64b/line 16-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Celeron(R) CPU J3160 @ 1.60GHz, 2239.95 MHz, 06-4c-04
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,TSC_ADJUST,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN
> cpu3: 1MB 64b/line 16-way L2 cache
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 115 pins, remapped
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
> "INT33BD" at acpi0 not configured
> acpicmos0 at acpi0
> chvgpio0 at acpi0 GPSW uid 1 addr 0xfed8/0x8000 irq 49, 56 pins
> chvgpio1 at acpi0 GPNC uid 2 addr 0xfed88000/0x8000 irq 48, 59 pins
> chvgpio2 at acpi0 GPEC uid 3 addr 0xfed9/0x8000 irq 50, 24 pins
> chvgpio3 at acpi0 GPSE uid 4 addr 0xfed98000/0x8000 irq 91, 55 pins
> "BOOT" at acpi0 not configured
> acpicpu0 at acpi0: C3(1@1500 mwait.1@0x52), C2(10@500 mwait.1@0x51), 
> C1(1000@1 mwai

Re: Possible binary incompatible change of struct usb_device_info in 2018

2021-03-13 Thread Stuart Henderson
On 2021/03/13 16:38, enrik.berk...@inka.de wrote:
> >Synopsis:Undocumented change of struct usb_device_info
> >Category:documentation kernel
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   struct usb_device_info has been changed in sys/dev/usb/usb.h from version
>   1.59 to 1.60 on 2018-07-10 most probably resizing it and relocating its
>   last member udi_serial in memory.

That is not unusual, OpenBSD doesn't have particularly stable ABIs.

> This change is not documented in usb(4).

That needs fixing.

Index: usb.4
===================================================================
RCS file: /cvs/src/share/man/man4/usb.4,v
retrieving revision 1.205
diff -u -p -r1.205 usb.4
--- usb.4	4 Feb 2021 16:25:38 -0000	1.205
+++ usb.4	13 Mar 2021 16:19:14 -0000
@@ -431,9 +431,11 @@ struct usb_device_info {
 	u_int8_t	udi_protocol;
 	u_int8_t	udi_config;
 	u_int8_t	udi_speed;
-#define USB_SPEED_LOW	1
-#define USB_SPEED_FULL	2
-#define USB_SPEED_HIGH	3
+#define USB_SPEED_LOW	1
+#define USB_SPEED_FULL	2
+#define USB_SPEED_HIGH	3
+#define USB_SPEED_SUPER	4
+	u_int8_t	udi_port;
 	int		udi_power;	/* power consumption */
 	int		udi_nports;
 	char		udi_devnames[USB_MAX_DEVNAMES]



Re: Possible binary incompatible change of struct usb_device_info in 2018

2021-03-13 Thread Stuart Henderson
On 2021/03/13 09:28, Todd C. Miller wrote:
> On Sat, 13 Mar 2021 16:20:43 +0000, Stuart Henderson wrote:
> 
> > That needs fixing.
> 
> Also need to sync udi_ports[] which is now u_int32_t not u_int8_t.

oh good catch, the USB_PORT_* defines have gone too. I'll fix that and commit.



Re: APU1D4 with OpenBSD 6.8 stable: Kernel Panic db_enter() x86_ipi_db()

2021-03-14 Thread Stuart Henderson
On 2021/03/14 09:34, Fox Steward wrote:
> Hello again.
> 
> in another test I removed the external crypto device (micro sd card) entirely,
> And still I get a kernel panic (see below).
> 
> When I do not start the services httpd, smtpd, dovecot the crash does not
> seem to occur. I will start the services one by one to see which one
> produces the crash. Otherwise, I could also sysupgrade to -current.
> Any support is welcome.
> 
> Thank you.

This one feels like filesystem corruption to me.
I would see if forcing fsck helps anything, and if not then try
recreating the filesystems (backup/newfs/restore).

If you run "ps /o" in ddb (which lists the active processes) that may
give a clue as to what's triggering it, which may point to a particular fs.


> = START DDB LOG =
> 
> ddb{0}> show panic
> bad dir
> ddb{0}> trace
> db_enter() at db_enter+0x10
> panic(81e12148) at panic+0x12a
> ufs_lookup() at ufs_lookup+0xed1
> VOP_LOOKUP(fd817df6b0d0,800021ffa0c8,800021ffa070) at VOP_LOOKUP+0x46
> unveil_find_cover(fd817df6b0d0,800022164a08) at unveil_find_cover+0xff
> unveil_add_vnode(800022164a08,fd817df6b0d0) at unveil_add_vnode+0x168
> unveil_add(800022164a08,800021ffa238,800021ffa313) at unveil_add+0x335
> sys_unveil(800022164a08,800021ffa370,800021ffa3d0) at sys_unveil+0x2b4
> syscall(800021ffa440) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ef7e0, count: -10
> ddb{0}> trace
> db_enter() at db_enter+0x10
> panic(81e12148) at panic+0x12a
> ufs_lookup() at ufs_lookup+0xed1
> VOP_LOOKUP(fd817df6b0d0,800021ffa0c8,800021ffa070) at VOP_LOOKUP+0x46
> unveil_find_cover(fd817df6b0d0,800022164a08) at unveil_find_cover+0xff
> unveil_add_vnode(800022164a08,fd817df6b0d0) at unveil_add_vnode+0x168
> unveil_add(800022164a08,800021ffa238,800021ffa313) at unveil_add+0x335
> sys_unveil(800022164a08,800021ffa370,800021ffa3d0) at sys_unveil+0x2b4
> syscall(800021ffa440) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ef7e0, count: -10
> ddb{0}> show pani
> bad dir
> ddb{0}> show panic
> bad dir
> ddb{0}> trace
> db_enter() at db_enter+0x10
> panic(81e12148) at panic+0x12a
> ufs_lookup() at ufs_lookup+0xed1
> VOP_LOOKUP(fd817df6b0d0,800021ffa0c8,800021ffa070) at VOP_LOOKUP+0x46
> unveil_find_cover(fd817df6b0d0,800022164a08) at unveil_find_cover+0xff
> unveil_add_vnode(800022164a08,fd817df6b0d0) at unveil_add_vnode+0x168
> unveil_add(800022164a08,800021ffa238,800021ffa313) at unveil_add+0x335
> sys_unveil(800022164a08,800021ffa370,800021ffa3d0) at sys_unveil+0x2b4
> syscall(800021ffa440) at syscall+0x389
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7ef7e0, count: -10
> ddb{0}> machine ddbcpu 1
> Stopped at  x86_ipi_db+0x12:leave
> x86_ipi_db(800021f38ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x11f
> sched_idle(800021f38ff0) at sched_idle+0x27e
> end trace frame: 0x0, count: 10
> ddb{1}> trace
> x86_ipi_db(800021f38ff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> acpicpu_idle() at acpicpu_idle+0x11f
> sched_idle(800021f38ff0) at sched_idle+0x27e
> end trace frame: 0x0, count: -5
> ddb{1}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  81780   42576      1      0  7         0x2                getty
>  60611  275784  26394    518  3        0x92  kqread        stats
>  34433  117386  18366      0  3    0x100083  ttyin         ksh
>  45134  320729  26394      0  3        0x92  kqread        config
>  53995  328845  26394      0  3        0x92  kqread        log
>  79218   61722  26394    518  3        0x92  kqread        anvil
>  26394   21383      1      0  3        0x80  kqread        dovecot
>  57755   70198  11653     95  3    0x100092  kqread        smtpd
>  19506  169291  11653    103  3    0x100092  kqread        smtpd
>  61810  276101  11653     95  3    0x100092  kqread        smtpd
>  69218  374657  11653     95  3    0x100092  kqread        smtpd
>  40948  456626  11653     95  3    0x100092  kqread        smtpd
>  21457   75905  11653     95  3    0x100092  kqread        smtpd
>  11653   80772      1      0  3    0x100080  kqread        smtpd
>  88536  295847      1      0  3    0x100080  kqread        httpd
>  18782  144491      1     67  3    0x100092  kqread        httpd
>  17744  118446      1     67  3    0x100092  kqread        httpd
>  83374  401707  57866   1002  3    0x100083  ttyin         ksh
>  57866  248670  18366      0  3    0x10008b  pause         ksh
>  53462  146176  58708   1001  3    0x100083  select        ssh
>  58708   22987  72824   1001  3    0x10008b  pause         sh
>  72824  447139  70163   1001  3    0x10008b  pause         ksh
>  701

Re: Q: pkg_add fails with: TLS handshake failure: ocsp verify failed: Undefined error ...

2021-03-19 Thread Stuart Henderson
In gmane.os.openbsd.misc, li...@y42.org wrote:
>
> Hi All,
>
> What would cause pkg_add -u to report this error?
>> https://ftp.fau.de/pub/OpenBSD/snapshots/packages/amd64/: TLS handshake 
>> failure: ocsp verify failed: Undefined error: 0
>> https://ftp.fau.de/pub/OpenBSD/snapshots/packages/amd64/: empty
>> Couldn't find updates for ... a long list of (all?) installed packages ...
>
> Error 0?

There is some problem doing OCSP validation. It validates OK with openssl
1.0.2u and 1.1.1j but not with libressl. DFN run their own PKI and OCSP
responder so it might hit some edge case that isn't seen with other
responders.

> That directory, on fau.de, is not empty.
>
> I have just rebooted after running sysupgrade to arrive at:
>> OpenBSD mjoelnir.fritz.box 6.9 GENERIC.MP#416 amd64
>
> And as my next step I wanted to then upgrade my installed packages.
>
> Did I miss something?

pkg_add doesn't get a directory index from ftp(1), it's limited in what
it can do at that point.

Workarounds are:

use http (packages are signed anyway)
use a different mirror
set FETCH_CMD="ftp -S noverifytime" in the environment, which disables OCSP
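
The first and third workarounds can be tried from a shell roughly as follows. This is only a sketch: `to_http` is a hypothetical helper introduced here for illustration, the package path is copied from the URL in the error message above, and it assumes pkg_add honours PKG_PATH and FETCH_CMD from the environment as described in pkg_add(1).

```shell
# to_http is a hypothetical helper, not part of pkg_add(1) or ftp(1):
# it rewrites an https:// mirror URL to plain http://.
to_http() {
	printf '%s\n' "$1" | sed 's|^https://|http://|'
}

mirror="https://ftp.fau.de/pub/OpenBSD"

# Workaround 1: same mirror over plain http, for this shell only.
# The path below is the one shown in the error message.
PKG_PATH="$(to_http "$mirror")/snapshots/packages/amd64/"
export PKG_PATH
echo "PKG_PATH=$PKG_PATH"

# Workaround 3: keep https but disable the OCSP check for one run.
# Uncomment on an affected machine:
# env FETCH_CMD="ftp -S noverifytime" pkg_add -u
```

Setting the variables per invocation keeps the change temporary, so the normal mirror configuration is untouched once the responder problem is resolved.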

I've included certs below if someone wants to reproduce to debug it.

$ openssl ocsp -sha1 -issuer fau-ca.crt -cert fau-cert.crt \
    -url http://ocsp.pca.dfn.de/OCSP-Server/OCSP -text -CAfile fau-ca.crt -no_nonce
[...]
Response Verify Failure
3535329314880:error:27FFF065:OCSP routines:CRYPTO_internal:certificate verify error:/usr/src/lib/libcrypto/ocsp/ocsp_vfy.c:141:Verify error:error number 1
fau-cert.crt: good
This Update: Mar 19 12:22:25 2021 GMT
Next Update: Mar 26 12:22:25 2021 GMT

$ eopenssl ocsp -sha1 -issuer fau-ca.crt -cert fau-cert.crt \
    -header host ocsp.pca.dfn.de -url http://ocsp.pca.dfn.de/OCSP-Server/OCSP \
    -text -CAfile fau-ca.crt -no_nonce
Response verify OK
fau-cert.crt: good
This Update: Mar 19 12:22:25 2021 GMT
Next Update: Mar 26 12:22:25 2021 GMT

$ eopenssl11 ocsp -sha1 -issuer fau-ca.crt -cert fau-cert.crt \
    -header host=ocsp.pca.dfn.de -url http://ocsp.pca.dfn.de/OCSP-Server/OCSP \
    -text -CAfile fau-ca.crt -no_nonce
Response verify OK
fau-cert.crt: good
This Update: Mar 19 12:22:25 2021 GMT
Next Update: Mar 26 12:22:25 2021 GMT


cat > fau-cert.crt << EOF
-----BEGIN CERTIFICATE-----
MIIKjTCCCXWgAwIBAgIMIKr6htHOf3G7wcorMA0GCSqGSIb3DQEBCwUAMIGNMQsw
CQYDVQQGEwJERTFFMEMGA1UECgw8VmVyZWluIHp1ciBGb2VyZGVydW5nIGVpbmVz
IERldXRzY2hlbiBGb3JzY2h1bmdzbmV0emVzIGUuIFYuMRAwDgYDVQQLDAdERk4t
UEtJMSUwIwYDVQQDDBxERk4tVmVyZWluIEdsb2JhbCBJc3N1aW5nIENBMB4XDTE5
MDMxNTEwMjI1MVoXDTIxMDYxNjEwMjI1MVowgZMxCzAJBgNVBAYTAkRFMQ8wDQYD
VQQIDAZCYXllcm4xETAPBgNVBAcMCEVybGFuZ2VuMTwwOgYDVQQKDDNGcmllZHJp
Y2gtQWxleGFuZGVyLVVuaXZlcnNpdGFldCBFcmxhbmdlbi1OdWVybmJlcmcxDTAL
BgNVBAsMBFJSWkUxEzARBgNVBAMMCmZ0cC5mYXUuZGUwggIiMA0GCSqGSIb3DQEB
AQUAA4ICDwAwggIKAoICAQDw/LdY8/DG14NOIDqtJOsi14DwF6O7DHw11fqYuJZ6
3OBGOdHBRkTtUe2thjUny0LanvFLmuHqPzpYpDRuayTd156Rdr6dD5BpokVK6O/P
TzQSREYHX0VdGsqN5kLYSsXzVuYxjlWKLJxxWXDmKHQdYJpIePzIyrTM2Y9nQQKv
tq4y7EKaj7vFkRtRrX0opnJat33kip/KaWiAFhbJCIIy7Tjuh2sPJXYy9jigQ9OP
YLrzPNADkoUkOUaYp0LyUOcvIi4lY2/IdQZZfW59Lu9o8PcNSF262OFvTi55IoWP
sbuY6/h88XvycB8eqZTvToXIf9siAa/Hbf7xmTLnllOcegE9v5K6B9FSiuBEgcNe
bXFq0OTYHSjrqOzeohUa8b5n2M7kQyXi1bGjH/JwcnpAbjwkMK7rq3dWs7rnCBlN
fvoW/aSqjKgg5SCphl6YuxD49LqC5NIKqdqH/TbCbiVsXd/guM0HrEkGiAeNmqr+
HKvkRsr3fL7vwKEkitpC4jIG6XoDpqQskeS5bhsl49Sl9VsMfGTbr73Iv+A57Z5e
zQPjG0hBReC5bNP9DOoKYkGNzWMG7Z98sj6XmYO39Jpwo+GmXOX7dr2zQJ8lcTR6
J4uvNFZYDku2UC5Acm2+sbeibOApJCeZgwRUo9bGZx0DYZeHPKfoDwwiI6pqj20W
NQIDAQABo4IF4zCCBd8wWQYDVR0gBFIwUDAIBgZngQwBAgIwDQYLKwYBBAGBrSGC
LB4wDwYNKwYBBAGBrSGCLAEBBDARBg8rBgEEAYGtIYIsAQEEAwkwEQYPKwYBBAGB
rSGCLAIBBAMJMAkGA1UdEwQCMAAwDgYDVR0PAQH/BAQDAgWgMBMGA1UdJQQMMAoG
CCsGAQUFBwMBMB0GA1UdDgQWBBRIst54HQp2KRkBTizEsSfkuCsuZDAfBgNVHSME
GDAWgBRrOpiL+fJTidrgrbIyHgkf6Ko7dDBEBgNVHREEPTA7ggpmdHAuZmF1LmRl
ghhmdHAucnJ6ZS51bmktZXJsYW5nZW4uZGWCE2Z0cC51bmktZXJsYW5nZW4uZGUw
gY0GA1UdHwSBhTCBgjA/oD2gO4Y5aHR0cDovL2NkcDEucGNhLmRmbi5kZS9kZm4t
Y2EtZ2xvYmFsLWcyL3B1Yi9jcmwvY2FjcmwuY3JsMD+gPaA7hjlodHRwOi8vY2Rw
Mi5wY2EuZGZuLmRlL2Rmbi1jYS1nbG9iYWwtZzIvcHViL2NybC9jYWNybC5jcmww
gdsGCCsGAQUFBwEBBIHOMIHLMDMGCCsGAQUFBzABhidodHRwOi8vb2NzcC5wY2Eu
ZGZuLmRlL09DU1AtU2VydmVyL09DU1AwSQYIKwYBBQUHMAKGPWh0dHA6Ly9jZHAx
LnBjYS5kZm4uZGUvZGZuLWNhLWdsb2JhbC1nMi9wdWIvY2FjZXJ0L2NhY2VydC5j
cnQwSQYIKwYBBQUHMAKGPWh0dHA6Ly9jZHAyLnBjYS5kZm4uZGUvZGZuLWNhLWds
b2JhbC1nMi9wdWIvY2FjZXJ0L2NhY2VydC5jcnQwggNcBgorBgEEAdZ5AgQCBIID
TASCA0gDRgB1AG9Tdqwx8DEZ2JkApFEV/3cVHBHZAsEAKQaNsgiaN9kTAAABaYDg
Q5QDAEYwRAIgOHt1Qj3kWYPCYkOE+Yktck4NtASSAmwmyGJiAgUU0IECIE/f
4U8U/djAkLHekTpgIb/+2X/pvv2sZ7a8zr2PJd2zAHYAqucLfzy41WbIbC8Wl5yf
RF9pqw60U1WJsvd6AwEE880AAAFpgOBD1AAABAMARzBFAiANnF5N+jUtfc3NXPwO
4f1hTuQR3k1uPXQClzVqDfPkvwIhAM1NePQ2Ba71eYhQcnm059HMCGHRP8wElbsV
aAyCCOg2AHUAVYHUwhaQNgFK6gubVzxT8MDkOHhwJQgXL6OqHQcT0wwAAAFpgOBE
lQAABAMARjBEAiB/jZNuQ4ctEzWi0evXQR4e0gwWbV/g+Sinqe9xvC16HgIgUgfx
PU7FeIV8s4fnjkHEz2vFFwaoTGhSl9U0LbXhagcAdgC72d+8H4pxtZOUI5eqkntH
OFeVCqtS6BqQlmQ2jh7RhQAAA

wg(4) crash

2021-03-19 Thread Stuart Henderson
Not a great report but I don't have much more to go on; the machine had
ddb.panic=0 and ddb hung while printing the stack trace. Retyped by
hand, may contain typos. Happened a few hours after setting up wg on it.

uvm_fault(0x82204e38, 0x20, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 81752116 cs 8 rflags 10246 cr2 20 cpl 0 rsp 00023b35eb0
gsbase 0x820eaff0 kgsbase 0x0
panic: trap type 6, code=0, pc=81752116
Starting stack trace...
panic(81ddc97a) at panic+0x11d
kerntrap(800023b35e00) at kerntrap+0x114
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
wg_index_drop(812ae000,0) at wg_index_drop+0x96
noise_create_initiation(

OpenBSD 6.9-beta (GENERIC.MP) #383: Sun Mar  7 20:38:08 MST 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34295709696 (32706MB)
avail mem = 33240948736 (31701MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xcf42c000 (99 entries)
bios0: vendor Dell Inc. version "2.9.0" date 12/06/2019
bios0: Dell Inc. PowerEdge R620
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S0 S4 S5
acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT EINJ TCPA PC__ SRAT SSDT
acpi0: wakeup devices PCI0(S5) PCI1(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 2900.44 MHz, 06-3e-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 32 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 1200.01 MHz, 06-3e-04
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: disabling user TSC (skew=135)
cpu1: smt 0, core 0, package 1
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 2900.01 MHz, 06-3e-04
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 34 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 1200.00 MHz, 06-3e-04
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 1, package 1
cpu4 at mainbus0: apid 4 (application processor)
cpu4: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 2900.00 MHz, 06-3e-04
cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu4: 256KB 64b/line 8-way L2 cache
cpu4: smt 0, core 2, package 0
cpu5 at mainbus0: apid 36 (application processor)
cpu5: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz, 1200.00 MHz, 06-3e-04
cpu5: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,

Re: wg(4) crash

2021-03-20 Thread Stuart Henderson
oh, let's cc Matt on this too.

On 2021/03/20 11:17, Martin Pieuchot wrote:
> On 19/03/21(Fri) 20:15, Stuart Henderson wrote:
> > Not a great report but I don't have much more to go on, machine had
> > ddb.panic=0 and ddb hanged while printing the stack trace. Retyped by
> > hand, may contain typos. Happened a few hours after setting up wg on it.
> > 
> > uvm_fault(0x82204e38, 0x20, 0, 1) -> e
> > fatal page fault in supervisor mode
> > trap type 6 code 0 rip 81752116 cs 8 rflags 10246 cr2 20 cpl 0 rsp 00023b35eb0
> > gsbase 0x820eaff0 kgsbase 0x0
> > panic: trap type 6, code=0, pc=81752116
> > Starting stack trace...
> > panic(81ddc97a) at panic+0x11d
> > kerntrap(800023b35e00) at kerntrap+0x114
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > wg_index_drop(812ae000,0) at wg_index_drop+0x96
> > noise_create_initiation(
> 
> This is a NULL dereference at line 1981 of net/if_wg.c:
> 
> wg_index_drop(void *_sc, uint32_t key0)
> {
> 	...
> 	/* We expect a peer */
> 	peer = CONTAINER_OF(iter->i_value, struct wg_peer, p_remote);
> 	...
> }
> 
> Does that mean that `iter' is NULL and `i_value' is at offset 0x20 in that
> struct?
> 

Oh, I am an idiot, I had debug set and there is something other than just
standard messages around that time. Both sides are OpenBSD wg(4). I did not
have debug on the other side.

[...]
18:51:08.041Z  wg2: Sending handshake initiation to peer 3
18:51:08.091Z  wg2: Receiving handshake initiation from peer 3
18:51:08.091Z  wg2: Sending handshake response to peer 3
18:51:08.091Z  wg2: Unknown handshake response
18:51:13.141Z  wg2: Receiving handshake initiation from peer 3
18:51:13.141Z  wg2: Sending handshake response to peer 3
18:51:13.191Z  wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
18:51:13.191Z  wg2: Receiving keepalive packet from peer 3
18:51:13.191Z  wg2: Sending keepalive packe
18:51:13.191Z  t to peer 3
18:52:28.242Z  wg2: Sending keepalive packet to peer 3
18:52:28.342Z  wg2: Receiving keepalive packet from peer 3
18:53:43.343Z  wg2: Sending keepalive packet to peer 3
18:54:58.345Z  wg2: Sending handshake initiation to peer 3
18:54:58.395Z  wg2: Receiving handshake initiation from peer 3
18:54:58.395Z  wg2: Sending handshake response to peer 3
18:54:58.395Z  wg2: Unknown handshake response

wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
wg2: Sending handshake initiation to peer 3
wg2: Sending handshake response to peer 3




Re: 6.8/arm64 on RaspberryPi 4B: kernel panics during booting under devicetree mode due to xhci

2021-03-22 Thread Stuart Henderson
On 2021/03/22 04:09, c...@imap.cc wrote:
> > smbios0 at efi0: SMBIOS 3.3.0
> > smbios0: vendor https://github.com/pftf/RPi4 version "UEFI Firmware v1.24" 
> > date 02/26/2021
> > smbios0: Raspberry Pi Foundation Raspberry Pi 4 Model B

Try different firmware versions. Mine is running 1.13 (and is stable so
I haven't seen a need to try others).



Re: wg(4) crash

2021-04-08 Thread Stuart Henderson

I committed this a couple of weeks ago.

--
 Sent from a phone, apologies for poor formatting.
On 8 April 2021 06:10:25 Klemens Nanni  wrote:


On Mon, Mar 22, 2021 at 12:42:27AM +1100, Matt Dunwoodie wrote:

On Sat, 20 Mar 2021 11:48:52 +
Stuart Henderson  wrote:


oh, let's cc Matt on this too.

On 2021/03/20 11:17, Martin Pieuchot wrote:

On 19/03/21(Fri) 20:15, Stuart Henderson wrote:

Not a great report but I don't have much more to go on, machine
had ddb.panic=0 and ddb hanged while printing the stack trace.
Retyped by hand, may contain typos. Happened a few hours after
setting up wg on it.

uvm_fault(0x82204e38, 0x20, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 81752116 cs 8 rflags 10246 cr2 20
cpl 0 rsp 00023b35eb0 gsbase 0x820eaff0 kgsbase 0x0
panic: trap type 6, code=0, pc=81752116
Starting stack trace...
panic(81ddc97a) at panic+0x11d
kerntrap(800023b35e00) at kerntrap+0x114
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
wg_index_drop(812ae000,0) at wg_index_drop+0x96
noise_create_initiation(


This is a NULL dereference at line 1981 of net/if_wg.c:

wg_index_drop(void *_sc, uint32_t key0)
{
	...
	/* We expect a peer */
	peer = CONTAINER_OF(iter->i_value, struct wg_peer, p_remote);
	...
}

Does that mean that `iter' is NULL and `i_value' is at offset 0x20 in
that struct?


Correct. The issue is we're trying to remove an index that doesn't
exist. wg_index_drop iterates through the list and expects to find a
matching index (perhaps a KASSERT could have been helpful here).
Nevertheless, since index 0 doesn't exist `iter` ends up being NULL.


Oh, I am an idiot, I had debug set and there is something other than
just standard messages around that time. Both sides are OpenBSD
wg(4). I did not have debug on the other side.

[...]
18:51:08.041Z  wg2: Sending handshake initiation to peer 3
18:51:08.091Z  wg2: Receiving handshake initiation from peer 3
18:51:08.091Z  wg2: Sending handshake response to peer 3
18:51:08.091Z  wg2: Unknown handshake response
18:51:13.141Z  wg2: Receiving handshake initiation from peer 3
18:51:13.141Z  wg2: Sending handshake response to peer 3
18:51:13.191Z  wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
18:51:13.191Z  wg2: Receiving keepalive packet from peer 3
18:51:13.191Z  wg2: Sending keepalive packe
18:51:13.191Z  t to peer 3
18:52:28.242Z  wg2: Sending keepalive packet to peer 3
18:52:28.342Z  wg2: Receiving keepalive packet from peer 3
18:53:43.343Z  wg2: Sending keepalive packet to peer 3
18:54:58.345Z  wg2: Sending handshake initiation to peer 3
18:54:58.395Z  wg2: Receiving handshake initiation from peer 3
18:54:58.395Z  wg2: Sending handshake response to peer 3
18:54:58.395Z  wg2: Unknown handshake response

wg2: Handshake for peer 3 did not complete after 5 seconds, retrying (try 2)
wg2: Sending handshake initiation to peer 3
wg2: Sending handshake response to peer 3



With this information, it was possible to reproduce the issue on my
end. There is a race between sending/receiving handshake packets. This
occurs if we consume an initiation, then send an initiation prior to
replying to the consumed initiation.

In particular, when consuming an initiation, we don't generate the
index until creating the response (which is incorrect). If we attempt
to create an initiation between these processes, we drop any
outstanding handshake which in this case has index 0 as set when
consuming the initiation.

The fix attached is to generate the index when consuming the initiation
so that any spurious initiation creation can drop a valid index. The
patch also consolidates setting fields on the handshake.

With this patch applied, I was unable to reproduce the crash.

This looks good and works, OK kn

sthen, do you want to commit this fix?  I think it should make it into
6.9 release.


diff --git net/wg_noise.c net/wg_noise.c
index 86f7823cc83..176c36609fc 100644
--- net/wg_noise.c
+++ net/wg_noise.c
@@ -299,9 +299,6 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,
 	    NOISE_TIMESTAMP_LEN + NOISE_AUTHTAG_LEN, key, hs.hs_hash) != 0)
 		goto error;
 
-	hs.hs_state = CONSUMED_INITIATION;
-	hs.hs_local_index = 0;
-	hs.hs_remote_index = s_idx;
 	memcpy(hs.hs_e, ue, NOISE_PUBLIC_KEY_LEN);
 
 	/* We have successfully computed the same results, now we ensure that
@@ -321,6 +318,9 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,
 
 	/* Ok, we're happy to accept this initiation now */
 	noise_remote_handshake_index_drop(r);
+	hs.hs_state = CONSUMED_INITIATION;
+	hs.hs_local_index = noise_remote_handshake_index_get(r);
+	hs.hs_remote_index = s_idx;
 	r->r_handshake = hs;
 	*rp = r;
 	ret = 0;
@@ -369,7 +369,6 @@ noise_create_response(struct noise_remote *r, uint32_t *s_idx, uint32_t *r_idx,
 	noise_msg_encrypt(en, NULL, 0, key, hs->hs_hash);
 
hs->hs_state = CREATED_RESPONS

Re: Realtek 8168 not working

2021-04-11 Thread Stuart Henderson
I don't think you mentioned the date of the working one, could you include 
that please. It would be helpful to include dmesg from both working and 
non-working (you may still have them both in /var/log/messages*).


--
 Sent from a phone, apologies for poor formatting.
On 11 April 2021 14:33:04 hendri...@gmail.com wrote:


Synopsis:   re interface is not processing data
Category:   amd64
Environment:

System  : OpenBSD 6.9
Details : OpenBSD 6.9-beta (GENERIC.MP) #440: Wed Mar 31 11:13:57 MDT 2021
 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

Description:

The interface is up, but no data is seen on tcpdump.

hendrikm@hpl:~$ ifconfig re0
re0: flags=808843 mtu 1500
   lladdr b4:b5:2f:8b:c7:96
   index 1 priority 0 llprio 3
   media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
   status: active

hendrikm@hpl:~$ doas tcpdump -i re0 -nnnvvv
tcpdump: listening on re0, link-type EN10MB
15:23:16.134778 0.0.0.0.68 > 255.255.255.255.67:  xid:0x198b6516 [|bootp] [tos 0x10] (ttl 128, id 0, len 328)

^C
1 packets received by filter
0 packets dropped by kernel

It is similar when configuring a static ip.
hendrikm@hpl:~$ ifconfig re0
re0: flags=808843 mtu 1500
   lladdr b4:b5:2f:8b:c7:96
   index 1 priority 0 llprio 3
   media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
   status: active
   inet 192.168.101.200 netmask 0xff00 broadcast 192.168.101.255

hendrikm@hpl:~$ doas tcpdump -i re0 -nnnvvv
tcpdump: listening on re0, link-type EN10MB
15:25:25.658162 0.0.0.0.68 > 255.255.255.255.67:  xid:0xc87c5454 [|bootp] [tos 0x10] (ttl 128, id 0, len 328)

^C
1 packets received by filter
0 packets dropped by kernel

systat vm doesn't show any interrupts being processed for re0

  2 users Load 0.46 0.63 0.55 hpl 15:31:25

   memory totals (in KB)PAGING   SWAPPING Interrupts
  real   virtual free   in  out   in  out  309 total
Active   139148139148  5540480   ops100 clock
All 2452044   2452044 14051900   pages2 ipi
 1 acpi0
Proc:r  d  s  wCsw   Trp   Sys   Int   Sof  Flt   forks 
inteldrm

106   462 189   207362   fkppw  11 xhci0
 fksvm ehci0
  0.6%Int   0.1%Spn   0.9%Sys   0.1%Usr  98.2%Idle   pwait azalia0
|||||||||||   relck 190 ral0
= rlkok re0
 noram ehci1
Namei Sys-cacheProc-cacheNo-cache ndcpy   5 ahci0
   Calls hits%hits %miss   % fltcp pckbc0
   22  100   zfod  pckbc0
 cow
Disks   sd0   cd0   66604 fmin
seeks   88805 ftarg
xfers18   itarg
speed  589K  7735 wired
 sec   0.0   pdfre
 pdscn
 pzidl 420 IPKTS
  14 kmape 198 OPKTS


How-To-Repeat:

This seems to have broken in a Feb snapshot

Fix:
	I haven't been able to identify how to fix it. I thought it might have
been because of a change in sys/dev/pci/if_re_pci.c, but reverting it and
building a custom kernel didn't solve the issue. I rolled back to this
snapshot.



dmesg:
OpenBSD 6.9-beta (GENERIC.MP) #440: Wed Mar 31 11:13:57 MDT 2021
   dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8447262720 (8055MB)
avail mem = 8175857664 (7797MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xb9b5 (32 entries)
bios0: vendor Hewlett-Packard version "68IRR Ver. F.43" date 10/07/2013
bios0: Hewlett-Packard HP ProBook 4540s
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC MCFG ASF! SSDT SSDT SLIC MSDM FPDT BGRT SSDT SSDT
acpi0: wakeup devices LANC(S5) EHC1(S3) EHC2(S3) XHC_(S3) PCIB(S5) ECF0(S4) RP03(S4) RP04(S5) WNIC(S5) RP06(S5) NIC_(S5)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-3230M

Re: Realtek 8168 not working

2021-04-12 Thread Stuart Henderson
"zgrep OpenBSD /var/log/messages*" and see if you can find messages from an 
earlier boot. Without more information this report is not useful.
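
To boil that zgrep output down to the distinct kernels that were actually booted, a small filter along these lines may help. It is a sketch only: `kernel_builds` is a hypothetical helper name, and it assumes the boot banner is logged as a "/bsd: OpenBSD ..." line, as is usual in /var/log/messages.

```shell
# kernel_builds: hypothetical helper. Reads syslog lines on stdin and prints
# each distinct kernel banner once, dropping the timestamp/host prefix.
# Only "/bsd: OpenBSD" boot lines are kept; sysupgrade notices are ignored.
kernel_builds() {
	grep '/bsd: OpenBSD' | sed 's|.*/bsd: ||' | sort -u
}

# Typical use on the affected machine (-h suppresses the file name prefix):
# zgrep -h OpenBSD /var/log/messages* | kernel_builds
```

Installer (RAMDISK_CD) banners are logged the same way and will show up in the list too; they can be ignored when looking for the last working kernel.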


--
 Sent from a phone, apologies for poor formatting.
On 12 April 2021 06:25:17 Hendrik Meyburgh  wrote:


Hi.

Unfortunately I didn't keep track of it so I can't give the exact dates, 
apologies for that.


On Sun, Apr 11, 2021 at 06:28:22PM +0100, Stuart Henderson wrote:

I don't think you mentioned the date of the working one, could you include
that please. It would be helpful to include dmesg from both working and
non-working (you may still have them both in /var/log/messages*).

--
 Sent from a phone, apologies for poor formatting.
On 11 April 2021 14:33:04 hendri...@gmail.com wrote:

> > Synopsis: re interface is not processing data
> > Category: amd64
> > Environment:
>	System  : OpenBSD 6.9
>	Details : OpenBSD 6.9-beta (GENERIC.MP) #440: Wed Mar 31 11:13:57 MDT 2021
>	 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>	Architecture: OpenBSD.amd64
>	Machine : amd64
> > Description:
>The interface is up, but no data is seen on tcpdump.
>
> hendrikm@hpl:~$ ifconfig re0
> re0: flags=808843 mtu 1500
>lladdr b4:b5:2f:8b:c7:96
>index 1 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
>
> hendrikm@hpl:~$ doas tcpdump -i re0 -nnnvvv
> tcpdump: listening on re0, link-type EN10MB
> 15:23:16.134778 0.0.0.0.68 > 255.255.255.255.67:  xid:0x198b6516
> [|bootp] [tos 0x10] (ttl 128, id 0, len 328)
> ^C
> 1 packets received by filter
> 0 packets dropped by kernel
>
> It is similar when configuring a static ip.
> hendrikm@hpl:~$ ifconfig re0
> re0: flags=808843 mtu 1500
>lladdr b4:b5:2f:8b:c7:96
>index 1 priority 0 llprio 3
>media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
>status: active
>inet 192.168.101.200 netmask 0xff00 broadcast 192.168.101.255
>
> hendrikm@hpl:~$ doas tcpdump -i re0 -nnnvvv
> tcpdump: listening on re0, link-type EN10MB
> 15:25:25.658162 0.0.0.0.68 > 255.255.255.255.67:  xid:0xc87c5454
> [|bootp] [tos 0x10] (ttl 128, id 0, len 328)
> ^C
> 1 packets received by filter
> 0 packets dropped by kernel
>
> systat vm doesn't show any interrupts being processed for re0
>
>   2 users Load 0.46 0.63 0.55 hpl 
15:31:25

>
>memory totals (in KB)PAGING   SWAPPING Interrupts
>   real   virtual free   in  out   in  out  309 total
> Active   139148139148  5540480   ops100 clock
> All 2452044   2452044 14051900   pages2 ipi
>  1 acpi0
> Proc:r  d  s  wCsw   Trp   Sys   Int   Sof  Flt   forks
> inteldrm
> 106   462 189   207362   fkppw  11 xhci0
>  fksvm ehci0
>   0.6%Int   0.1%Spn   0.9%Sys   0.1%Usr  98.2%Idle   pwait 
azalia0

> |||||||||||   relck 190 ral0
> = rlkok re0
>  noram ehci1
> Namei Sys-cacheProc-cacheNo-cache ndcpy   5 ahci0
>Calls hits%hits %miss   % fltcp pckbc0
>22  100   zfod  pckbc0
>  cow
> Disks   sd0   cd0   66604 fmin
> seeks   88805 ftarg
> xfers18   itarg
> speed  589K  7735 wired
>  sec   0.0   pdfre
>  pdscn
>  pzidl 420 IPKTS
>   14 kmape 198 OPKTS
>
> > How-To-Repeat:
>This seems to have broken in a Feb snapshot
> > Fix:
>I haven't been able to identify how to fix it. I thought it might have
> been because of a change in sys/dev/pci/if_re_pci.c, but reverting it
> didn't solve the issue by building a custom kernel. I rollback to this
> snapshot.
>
>
> dmesg:
> OpenBSD 6.9-beta (GENERIC.MP) #440: Wed Mar 31 11:13:57 MDT 2021
>dera..

Re: Realtek 8168 not working

2021-04-12 Thread Stuart Henderson
On 2021/04/12 11:39, Hendrik Meyburgh wrote:
> Thanks, I have the snapshot numbers but I don't know after which one it broke.

From your comment "This seems to have broken in a Feb snapshot" it
seems like that probably doesn't go far enough back.

> Anything else I can do to make the report more useful?

You could try some old kernels from https://ftp.hostserver.de/archive/
and see if you can narrow it down. My usual method is to fetch a handful:

# ftp -o /bsd.20210201 https://ftp.hostserver.de/archive/2021-02-01-0105/snapshots/amd64/bsd.mp
# ftp -o /bsd.20210215 https://ftp.hostserver.de/archive/2021-02-15-0105/snapshots/amd64/bsd.mp
# ftp -o /bsd.20210301 https://ftp.hostserver.de/archive/2021-03-01-0105/snapshots/amd64/bsd.mp
# ftp -o /bsd.20210315 https://ftp.hostserver.de/archive/2021-03-15-0105/snapshots/amd64/bsd.mp

Test one with e.g. "b bsd.20210201" at the boot> prompt; if it works,
move forwards. Once you've found a pair with one working and one failing,
pick some dates between the two, download a few more and test those.
Getting it to within say 5 days or so (and sending dmesg from working and
non-working) would make it easier to figure out where the problem was
introduced.
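
For the narrowing-down step, the fetch commands for the in-between dates can be generated rather than typed. A rough sketch follows; `snapshot_fetch_cmds` is a hypothetical helper, the dates in the usage line are made-up examples, and it assumes every archive directory ends in "-0105" like the four URLs above, which may not hold for every day.

```shell
# snapshot_fetch_cmds: hypothetical helper. For each date given as YYYY-MM-DD
# it prints (does not run) an ftp(1) command that saves that day's bsd.mp as
# /bsd.YYYYMMDD.
# Assumption: the archive directory is named <date>-0105; check the archive
# index if a fetch fails.
snapshot_fetch_cmds() {
	for d in "$@"; do
		stamp=$(printf '%s' "$d" | sed 's/-//g')
		printf 'ftp -o /bsd.%s https://ftp.hostserver.de/archive/%s-0105/snapshots/amd64/bsd.mp\n' \
		    "$stamp" "$d"
	done
}

# Example: suppose the 2021-02-15 kernel works and 2021-03-01 fails.
snapshot_fetch_cmds 2021-02-18 2021-02-22 2021-02-26
```

Printing the commands instead of running them makes it easy to review the URLs first, then paste the ones you want into a root shell.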


> Here are the logs anyway:
> 
> hendrikm@hpl:~$ zgrep OpenBSD /var/log/messages*  
> 
> /var/log/messages:Mar 25 01:10:33 hpl sysupgrade: installed new /bsd.upgrade. 
> Old kernel version: OpenBSD 6.9-beta (GENERIC.MP) #412: Wed Mar 17 12:47:12 
> MDT 2021 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> /var/log/messages:Mar 25 01:15:18 hpl /bsd: OpenBSD 6.9-beta (RAMDISK_CD) 
> #413: Wed Mar 24 11:19:29 MDT 2021
> /var/log/messages:Mar 25 01:15:18 hpl /bsd: OpenBSD 6.9-beta (GENERIC.MP) 
> #428: Wed Mar 24 11:12:16 MDT 2021
> /var/log/messages:Apr  1 01:23:41 hpl sysupgrade: installed new /bsd.upgrade. 
> Old kernel version: OpenBSD 6.9-beta (GENERIC.MP) #428: Wed Mar 24 11:12:16 
> MDT 2021 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> /var/log/messages:Apr  1 01:28:35 hpl /bsd: OpenBSD 6.9-beta (RAMDISK_CD) 
> #424: Wed Mar 31 11:21:15 MDT 2021
> /var/log/messages:Apr  1 01:28:35 hpl /bsd: OpenBSD 6.9-beta (GENERIC.MP) 
> #440: Wed Mar 31 11:13:57 MDT 2021
> /var/log/messages:Apr 10 09:48:35 hpl /bsd: OpenBSD 6.9 (CUSTOM) #0: Sat Apr 
> 10 09:42:03 SAST 2021
> /var/log/messages:Apr 10 09:55:23 hpl /bsd: OpenBSD 6.9 (CUSTOM) #0: Sat Apr 
> 10 09:42:03 SAST 2021
> /var/log/messages:Apr 10 09:59:48 hpl /bsd: OpenBSD 6.9 (CUSTOM) #0: Sat Apr 
> 10 09:42:03 SAST 2021
> /var/log/messages:Apr 11 15:05:59 hpl /bsd: OpenBSD 6.9-beta (GENERIC.MP) 
> #440: Wed Mar 31 11:13:57 MDT 2021
> /var/log/messages:Apr 11 15:39:23 hpl sysupgrade: installed new /bsd.upgrade. 
> Old kernel version: OpenBSD 6.9-beta (GENERIC.MP) #440: Wed Mar 31 11:13:57 
> MDT 2021 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> /var/log/messages:Apr 11 15:43:24 hpl /bsd: OpenBSD 6.9 (RAMDISK_CD) #443: 
> Sat Apr 10 17:55:44 MDT 2021
> /var/log/messages:Apr 11 15:43:24 hpl /bsd: OpenBSD 6.9 (GENERIC.MP) #460: 
> Sat Apr 10 17:48:37 MDT 2021
> /var/log/messages.0.gz:Mar 11 11:56:07 hpl sysupgrade: installed new 
> /bsd.upgrade. Old kernel version: OpenBSD 6.9-beta (GENERIC.MP) #373: Wed Mar 
>  3 10:47:27 MST 2021 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> /var/log/messages.0.gz:Mar 11 12:00:37 hpl /bsd: OpenBSD 6.9-beta 
> (RAMDISK_CD) #379: Wed Mar 10 20:06:19 MST 2021
> /var/log/messages.0.gz:Mar 11 12:00:37 hpl /bsd: OpenBSD 6.9-beta 
> (GENERIC.MP) #394: Wed Mar 10 19:59:10 MST 2021
> /var/log/messages.0.gz:Mar 18 01:10:06 hpl sysupgrade: installed new 
> /bsd.upgrade. Old kernel version: OpenBSD 6.9-beta (GENERIC.MP) #394: Wed Mar 
> 10 19:59:10 MST 2021 
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> /var/log/messages.0.gz:Mar 18 01:15:12 hpl /bsd: OpenBSD 6.9-beta 
> (RAMDISK_CD) #397: Wed Mar 17 12:54:20 MDT 2021
> /var/log/messages.0.gz:Mar 18 01:15:12 hpl /bsd: OpenBSD 6.9-beta 
> (GENERIC.MP) #412: Wed Mar 17 12:47:12 MDT 2021
> /var/log/messages.2.gz:Mar  9 22:01:41 hpl /bsd: OpenBSD 6.9-beta 
> (GENERIC.MP) #373: Wed Mar  3 10:47:27 MST 2021
> 
> Thank you.
> 
> On Mon, Apr 12, 2021 at 09:26:28AM +0100, Stuart Henderson wrote:
> > "zgrep OpenBSD /var/log/messages*" and see if you can find messages from an
> > earlier boot. Without more information this report is not useful.
> > 
> > -- 
> >  Sent from a phone, apologies for poor formatting.
> > On 12 April 2021 06:25:17 Hendrik Meyburgh  wrote:
> > 
> > > Hi.
> > > 
> > > Unfortunately I 

Re: dhcpleased(8) doesn't handle underlying changes in trunk(4)

2021-04-16 Thread Stuart Henderson
On 2021/04/16 17:19, Florian Obser wrote:
> On Fri, Apr 16, 2021 at 10:42:00AM -0400, Kenneth R Westerback wrote:
> > On Fri, Apr 16, 2021 at 03:17:45PM +0200, Florian Obser wrote:
> > > On Thu, Apr 15, 2021 at 12:54:44AM +, Lucas wrote:
> > > > >Synopsis:  dhcpleased(8) doesn't handle underlying changes in trunk(4)
> > > > >Category:  system
> > > > >Environment:
> > > > System  : OpenBSD 6.9
> > > > Details : OpenBSD 6.9 (GENERIC.MP) #459: Fri Apr  9 
> > > > 11:31:33 MDT 2021
> > > >  
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > > 
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > 
> > > > >Description:
> > > > On a trunk(4) interface for wired-WiFi failover, dhcpleased(8)
> > > > isn't able to ask for a new lease on active physical interface change.
> > > > This is a problem for me, because the networks for wired and wireless
> > > > devices on my LAN are different (172.16.0.0/24 for wired, 172.17.0.0/24
> > > > for wireless).
> > > > 
> > > > >How-To-Repeat:
> > > > Start with a trunk(4) with failover configured as shown below,
> > > > connected over WiFi and without an ethernet cord plugged in. Start
> > > > dhcpleased(8) and get a lease. Now plug ethernet cord and check
> > > > interface status; despite the change on the active physical interface
> > > > dhcpleased(8) doesn't solicit a new lease. dhclient(8) does work in
> > > > this setup.
> > > 
> > > Rough consensus is that this setup is a poor choice when wired and
> > > wireless are on different L2 networks.
> > > 
> > > In that case it's better to not have a trunk(4) but request leases on
> > > wired and wireless.  Interface priorities will pick the correct
> > > network.
> > > 
> > > Trunking is more appropriate when wired and wireless are on the same
> > > L2 network since then the IP address does not change when switching
> > > between wired and wireless and long running sessions (ssh) stay alive.
> > > 
> > > Arguably the point of trunk(4) is to hide changes in network topology
> > > from upper layers. When the active physical interface changes nothing
> > > changes on the trunk(4) port. The MAC address stays the same, the link
> > > state doesn't change and the interface stays running. Therefore
> > > dhcpleased(8) concludes that there is nothing to do.
> > > 
> > > I think this working with dhclient(8) is due to a logic error and not
> > > intentional. It also tries to check if something changed on the
> > > interface but gets this wrong.
> > > 
> > > -- 
> > > I'm not entirely sure you are real.
> > > 
> > 
> > dhclient(8) deliberately behaves the way it does by specific demand from
> > developers using trunk(4). A fairly recent (2019) demand.
> 
> Ha! I had forgotten all about it.
> 
> > 
> > Never having been a user of trunk(4) I have no opinion on the correctness or
> > desirability of the behaviour. Which of course may have changed over time.
> > 
> > As I recall (and as mentioned in the commit message) there were also routing
> > socket changes made by claudio@ at the same time to support this.
> > Unfortunately a quick pre-caffeine scan of Changelogs did not make those
> > changes pop out for me.
> > 
> > I refer to r1.634 of dhclient.c, May 10, 2019.
> 
> Yes, that was for the case of being on the same L2 network.
> So that diff checks if the mac address changed and does
> quit = RESTART;
> I don't quite get what it does about link state changing.
> 
> I think this did that but that's unconditional.
> + if (quit == 0)
> + quit = RESTART;
> If nothing changed try to get a new lease.
> 
> I think I misunderstood the diff in 2019.
> 
> in r1.634 this changed to
> - if (quit == 0)
> + if (oldmtu == ifi->mtu)
>   quit = RESTART;
> 
> So if the mtu did not change request a new lease.
> 
> I think this comes down to:
> RTM_IFINFO is received
>   if the mac address changed or the mtu did not change
>   request a new lease.
> 
> 
> I think we didn't get the "relevant RTM_IFINFO" quite right:
>   Restart the protocol and get a new/renewed lease for any relevant
>   RTM_IFINFO seen. As dhclient no longer commits suicide to restart the
>   protocol this should be very low cost.
> 
> What happens is, whenever something twiddles the interface, dhclient
> requests a new lease, unless it twiddled the interface itself.
> 
> > 
> > I can try scanning my email archives to find a record of any detailed
> > reasoning, but this took place at g2k19 so the discussion may have just
> > been verbal.
> 
> Probably. I remember wandering around suspending and resuming my
> laptop and wondering why I wouldn't get a new lease.
> 
> > 
> >  Ken
> 
> -- 
> I'm not entirely sure you are real.
> 

this one?

-
PatchSet 5687 
Date: 2019/05/11 19:10:45
Author: florian
B

Re: unbound in memory cache does not work when num threads option is set

2021-04-27 Thread Stuart Henderson
On 2021/04/27 11:15, Amado Tucker wrote:
> Hello world,
> when I use num-threads in unbound and set the num-threads option to any
> number other than 1, or if num-threads is commented out, unbound's
> in-memory DNS cache stops working.

This surprised me so I tested on my servers, it works as expected here
on -current (unbound 1.13.1) or 6.8 (1.11.0), both amd64, with either no
num-threads, or set to 1/2/4.

Perhaps something else in your config is triggering it?



Re: i386 pagedaemon panic pg->wire_count == 0

2021-04-29 Thread Stuart Henderson
On 2021/04/27 10:35, Alexander Bluhm wrote:
> On Mon, Apr 26, 2021 at 07:43:29PM +0200, Alexander Bluhm wrote:
> > One of my i386 machines panicked during make -j 9 build.
> 
> This is perfectly reproducible.  Machine crashes while building
> clang.  This time with snapshot kernel.

Same panic here building ports (base build was done with an older
kernel) on 2/4 builders.

I'm now going to try with the "Convert allocations to km_alloc(9)"
commit reverted (i386/pmap.c:1.211->1.212, i386/pmapae.c:1.60->1.61)


> panic: kernel diagnostic assertion "pg->wire_count == 0" failed: file 
> "/usr/src/sys/uvm/uvm_page.c", line 1265
> Stopped at  db_enter+0x4:   popl    %ebp
> TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
>  100962  21754 21 0x3  03  c++
>  165585775 21 0x3  04  c++
>  482715   4014 21 0x3  02  c++
>  104450  78451 21 0x3  06  c++
>  492054   2530 21 0x3  07  c++
>  441463  23628 21 0x3  05  c++
>  266171  31308 21 0x3  00  c++
> *453385  97818  0 0x14000  0x2001K pagedaemon
> db_enter() at db_enter+0x4
> panic(d0bd507b) at panic+0xd3
> __assert(d0c39ae9,d0bad7c6,4f1,d0c4a94c) at __assert+0x19
> uvm_pagedeactivate(d4fd923c) at uvm_pagedeactivate+0x122
> uvmpd_scan() at uvmpd_scan+0x294
> uvm_pageout(d6fc34c8) at uvm_pageout+0x365

Exact same trace.

panic: kernel diagnostic assertion "pg->wire_count == 0" failed: file 
"/usr/src/sys/uvm/uvm_page.c", line 1265
Stopped at  db_enter+0x4:   popl    %ebp
TID  PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
 234486  25502  00x11  02  perl
 214247  55192 55 0x2  0x4000  rustc.bin
 471121  59408  00x13  03  perl
* 31012  31392  0 0x14000  0x2001K pagedaemon

> version:OpenBSD 6.9-current (GENERIC.MP) #802: Mon Apr 26 02:54:36 
> MDT 2021\012
> dera...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP\012

OpenBSD 6.9-current (GENERIC.MP) #0: Wed Apr 28 21:36:47 MDT 2021
st...@i386.ports.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC.MP

Built from a clean CVS checkout (no diffs) done at Thu Apr 29 01:20:58 UTC 2021

> ddb{1}> show register
> ds  0x10
> es  0x10
> fs  0x20
> gs 0
> edi   0xd0bd507b    acx100_txpower_maxim+0xe81a
> esi0x100
> ebp   0xf582662c
> ebx   0xf5826654
> edx0x3fd
> ecx0
> eax  0x1
> eip   0xd08a8874    db_enter+0x4
> cs   0x8
> eflags 0x202
> esp   0xf582662c
> ss  0x10
> db_enter+0x4:   popl    %ebp

same except for

edi   0xd0bc972d    acx100_txpower_maxim+0xc82f
ebp   0xf598b6cc
ebx   0xf598b6f4
eip   0xd0817b04    db_enter+0x4
esp   0xf598b6cc

> ddb{1}> show uvmexp
> Current UVM status:
>   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>   832339 VM pages: 517320 active, 5058 inactive, 1 wired, 147968 free (0 zero)
>   min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>   freemin=27744, free-target=36992, inactive-target=208035, wired-max=277446
>   faults=201515090, traps=201773820, intrs=2524713, ctxswitch=15585835 
> fpuswitch=99375
>   softint=3182921, syscalls=287857563, kmapent=15
>   fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> ok relocks(total)=269519(271199), anget(retries)=141363890(0), 
> amapcopy=24784161
> neighbor anon/obj pg=13486052/85376915, gets(lock/unlock)=27741219/271199
> cases: anon=139829185, anoncow=1534705, obj=27205510, prcopy=534029, 
> przero=32411649
>   daemon and swap counts:
> woke=2, revs=1, scans=101727, obscans=101727, anscans=0
> busy=0, freed=101727, reactivate=0, deactivate=0
> pageouts=0, pending=0, nswget=0
> nswapdev=1
> swpages=917207, swpginuse=0, swpgonly=0 paging=0
>   kernel pointers:
> objs(kern)=0xd0e719ac

Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  564802 VM pages: 298864 active, 47935 inactive, 1 wired, 100404 free (2361 
zero)
  min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
  freemin=18826, free-target=25101, inactive-target=141715, wired-max=188267
  faults=827797786, traps=847312744, intrs=42461394, ctxswitch=169637771 
fpuswitch=1593503
  softint=25918765, syscalls=579364703, kmapent=28
  fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
ok relocks(total)=1345165(1347679), anget(retries)=367906740(0), 
amapcopy=128308912
neighbor anon/obj pg=299836900/426127687, 
gets(lock/unlock)=116789120/1347679
cases: anon=273495004, anoncow=94411736, obj=109145091, prcopy=7641515, 
przero=343104381
  daemon and swap counts:
woke=3, revs=1, scans=78347, obscans

Re: i386 pagedaemon panic pg->wire_count == 0

2021-04-29 Thread Stuart Henderson
On 2021/04/29 16:59, Alexander Bluhm wrote:
> On Thu, Apr 29, 2021 at 04:17:05PM +0200, Martin Pieuchot wrote:
> > On 29/04/21(Thu) 12:07, Alexander Bluhm wrote:
> > > On Thu, Apr 29, 2021 at 11:08:30AM +0200, Mark Kettenis wrote:
> > > > > > panic: kernel diagnostic assertion "pg->wire_count == 0" failed: 
> > > > > > file "/usr/src/sys/uvm/uvm_page.c", line 1265
> > > >
> > > > I suspect pmapae.c rev 1.61 causes this issue.  Does reverting that
> > > > commit "fix" the issue?
> > > >
> > > > It won't really fix the issue as you may still hit the "can't locate PD 
> > > > page"
> > > > panic.
> > >
> > > I think this diff prevents the panic.  But I need one more test run
> > > to be sure.
> 
> One test without and one with this diff.  Either panic or make build
> passes.  I am convinced that this triggers the bug.  And one of my
> i386 regress machines can easily reproduce it.  Console access for
> developers possible.

I only tried backing out the whole commit so far; it's been building
ports for 5h including some huge ram-hungry ones, I think I would have
seen it on at least one of the four machines by now.



Re: pkg_add -u Base64.c: loadable library and perl binaries are mismatched

2021-05-04 Thread Stuart Henderson
On 2021/05/03 21:51, Jon Fineman wrote:
> For cpan/cpanm I had gotten the same base64 error.
> 
> After moving those two directories and rebooting I was able to run pkg_add 
> and now cpanm. However cpan now gets the below error:
> desktop(~)$: cpan
> Encode.c: loadable library and perl binaries are mismatched (got handshake 
> key 0xb60, needed 0xec0)
> desktop(~)$: 

You may be able to get past this by running cpan like this:

  perl -I /usr/libdata/perl5/amd64-openbsd -I /usr/libdata/perl5 `which cpan`

We might be able to tweak things a bit to get some improvements but
fundamentally, if you're installing modules outside of the OS
infrastructure (base OS or packages), there's only so much we can do
to work around such problems.

If you do need to do this it's probably a good idea to use a separate
directory tree (/usr/local should be considered as being under the
control of packages) and set @INC or use "use lib" to pull it in.
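
As a rough illustration of the separate-tree approach (a sketch only: the
directory under $HOME and the PRIVLIB variable name are assumptions made up
for this example, not something the base system or packages set up),
PERL5LIB can be used to put a private tree on @INC without touching
/usr/local:

```shell
#!/bin/sh
# Hypothetical private module tree; the location is an assumption, pick your own.
PRIVLIB="${PRIVLIB:-$HOME/perl5/lib/perl5}"

# Directories listed in PERL5LIB are added to @INC ahead of perl's built-in
# library paths.
PERL5LIB="$PRIVLIB${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB

# Print the resulting search path for inspection (skipped if perl is absent).
if command -v perl >/dev/null 2>&1; then
    perl -e 'print "$_\n" for @INC'
fi
```

Modules would then have to be installed into that tree, for example with
cpanm's -l/--local-lib option pointed at the parent directory; a script can
do the same for itself with "use lib".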



uhidpp-related assertwaitok [linuxjus...@gmail.com: Re: Raspberry Pi 4B randomly kernel panics after upgrading to 6.9]

2021-05-08 Thread Stuart Henderson
Forwarding to bugs@ because this does not look Raspberry Pi specific, but
rather connected with uhidpp (Logitech keyboard driver new in 6.9).

Justin, you can probably bypass this for now if you "boot -c" at the
bootloader prompt and "disable uhidpp". If that works you can modify
an on-disk kernel to disable it with config(8) -ef.

Trace roughly transcribed from the first photo. (It's always easier
to have text rather than photos).

panic: assertwaitok: non-zero mutex count: 1, active process usbtask
assertwaitok at malloc
malloc at taskq_create+0x3c
taskq_create at sensor_task_register+0x50
sensor_task_register at uhidpp_device_connect+0x234
uhidpp_device_connect at uhidpp_task+0x10c


- Forwarded message from Justin Yang  -

From: Justin Yang 
Date: Fri, 7 May 2021 23:58:02 +0800
To: Stuart Henderson 
Cc: Mark Kettenis , "a...@openbsd.org" 

Subject: Re: Raspberry Pi 4B randomly kernel panics after upgrading to 6.9

I found that my previous mail was still incomplete, so I reinstalled the
whole system, and captured again:
https://i.postimg.cc/jjrd07xZ/2021-05-07-23-38-49.jpg
https://i.postimg.cc/qRq8nYPt/2021-05-07-23-40-09.jpg
https://i.postimg.cc/9QM2TYx4/2021-05-07-23-41-45.jpg
https://i.postimg.cc/N0kxcRc4/2021-05-07-23-42-53.jpg
https://i.postimg.cc/FKkf4LJ6/2021-05-07-23-43-18.jpg
https://i.postimg.cc/VLTmcHBY/2021-05-07-23-45-34.jpg

On Friday, May 7, 2021, Justin Yang  wrote:

> OK, I captured the trace and ddbcpu output:
>
> https://i.postimg.cc/hGcQfznj/2021-05-07-20-37-56.jpg
> https://i.postimg.cc/nz54W4jz/2021-05-07-20-38-56.jpg
>
> On Friday, May 7, 2021, Stuart Henderson  wrote:
>
>> At least "trace" from ddb is required.
>>
>> --
>>   Sent from a phone, apologies for poor formatting.
>>
>> On 6 May 2021 15:21:25 Justin Yang  wrote:
>>
>> Hi,
>>> Sorry for the late reply. Here are the dmesg links for both 6.8 and 6.9:
>>>
>>> 6.8:
>>> https://dmesgd.nycbug.org/index.cgi?do=view&id=5924
>>>
>>> 6.9:
>>> https://dmesgd.nycbug.org/index.cgi?do=view&id=6067
>>>
>>> and some ddb output:
>>> https://i.postimg.cc/tTWPLzCB/2021-05-06-21-42-54.jpg
>>>
>>> On Sat, May 1, 2021 at 9:57 PM Mark Kettenis 
>>> wrote:
>>>
>>> From: Justin Yang 
>>>>> Date: Sat, 1 May 2021 21:15:31 +0800
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a Raspberry Pi 4B (8G mem) with OpenBSD 6.8 installed on a USB 3
>>>>> disk drive, and with edk2 firmware v1.22 flashed on SD card. It worked
>>>>> fine until I upgraded to 6.9 Release by 'sysupgrade' today. After this
>>>>> upgrade, the system becomes unstable and crashes randomly and says:
>>>>>
>>>>> panic: assertwaitok: non-zero mutex count: 1
>>>>>
>>>>> The screenshot can be viewed here: https://ibb.co/K6N1DdM
>>>>>
>>>>> Then I flashed edk2 firmware v1.21 to SD card just now, and booted it
>>>>> again, but still, it didn't work and crashed as before. I can confirm
>>>>> there's no such issue in 6.8. Am I missing something or is this a bug?
>>>>>
>>>>
>>>> Please post the full dmesg of both the 6.8 and the 6.9 kernel.  And if
>>>> the panic happens again, follow the instructions pointed at on your
>>>> screen.
>>>>
>>>>
>>>
>>> --
>>> Justin Yang
>>>
>>
>>
>
> --
> Justin Yang
>
>

-- 
Justin Yang

- End forwarded message -



Re: sysupgrade; pkg_add -u 6.8 to 6.9

2021-05-12 Thread Stuart Henderson
A conflict marker was missed that would have set the update order. 
As a workaround, uninstall and reinstall mrtg.


--
 Sent from a phone, apologies for poor formatting.
On 12 May 2021 17:00:56 Glen Gunsalus  wrote:

Attached is edited "sendbug" file from machine showing anomalies during 
upgrade.


Hope this is helpful, especially on the mrtg package update failure.

Regards, Glen Gunsalus




Re: NAT with PF over umb(4) and wireguard traffic from a client blackholed

2021-05-18 Thread Stuart Henderson
This can happen with any long lived UDP-based protocol that is natted to a 
dynamically configured address. I think this is not a bug; PF is doing 
exactly what you have asked it to.


The issue is that you're sending packets frequently (due to the fairly low 
keepalive timer), which refreshes the PF state and keeps the NAT mapping 
alive. Normally this is what you want, it's the whole point of the 
keepalive, but it falls apart when the address changes.


As you've said, flushing all states does the trick (and it's obvious why); 
this could possibly be automated via ifstated. It's a bit heavy-handed to 
flush all states when only those involving one natted IP address actually 
need it, but pf doesn't have a more targeted way to select states using a 
certain NAT address, short of parsing pfctl -ss and killing the individual 
states by id. Alternatively (not a generic solution, but it might work in 
your case), use ifstated to run pfctl -k to delete states from the known 
address of the rpi to the known wireguard endpoint address. I'm hoping that 
the change of address would also involve a link state change so that 
ifstated can trigger on this. However the interface used by pfctl -k is 
buggy and I can't remember if this use will run into a problem or not...
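
For the second idea, the command that ifstated would run could look
something like the sketch below. The addresses are the ones from the report
(rpi and wireguard endpoint); the function name and the DRY_RUN switch are
made up for this example, and given the caveat about pfctl -k this is
untested:

```shell
#!/bin/sh
# Addresses from the report: the rpi behind NAT and the wireguard endpoint.
RPI_ADDR=192.168.1.58
WG_ENDPOINT=5.135.165.132

# kill_states SRC DST: remove only the pf states from SRC to DST rather than
# flushing the whole state table. With DRY_RUN=1 the command is printed
# instead of executed (running it for real needs root on the pf host).
kill_states() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "pfctl -k $1 -k $2"
    else
        pfctl -k "$1" -k "$2"
    fi
}
```

Calling kill_states "$RPI_ADDR" "$WG_ENDPOINT" from whatever ifstated runs
on an umb0 change should then drop just that mapping, so the next keepalive
from the rpi creates a fresh state using the new address.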


I don't know if it might make sense to handle automatically in the kernel; 
it would be convenient for the user but would I think be delicate work, 
especially regarding MP locking.


--
 Sent from a phone, apologies for poor formatting.
On 19 May 2021 01:16:06 Mikolaj Kucharski  wrote:


Forgot to also show PF rules:

pce-0035# grep -ve '^$' -e '^#' /etc/pf.conf
set skip on lo
set limit states 25000
queue q_umb0 on umb0 flows 1024 qlimit 50 quantum 300 default
queue q_athn0 on athn0 flows 1024 qlimit 100 quantum 300 default
match out on umb0 inet from !(umb0) nat-to (umb0:0)
match on umb0 inet all scrub (no-df random-id max-mss 1460)
block return
pass quick proto icmp all
pass quick proto icmp6 all
pass quick on tun0 from (tun0:network) to any keep state (if-bound)
pass in quick proto tcp from any to (self) port ssh
pass in quick proto udp from any to (self) port 51820
pass quick on { athn0 em0 em1 em2 }
pass out


On Wed, May 19, 2021 at 12:10:45AM +, miko...@kucharski.name wrote:

Synopsis: wireguard traffic blackholed after umb(4) changes ip addr
Category: kernel
Environment:

System  : OpenBSD 6.9
Details : OpenBSD 6.9-current (GENERIC.MP) #14: Tue May 11 18:41:12 UTC 
2021

r...@pc1.home.local:/home/mkucharski/openbsd/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64

Description:

This also occurs on vanilla kernel, but at present I'm running
custom kernel, with some athn(4) related changes.

Once in a while umb(4) disconnects from network and reconnects with new
IP address. OpenBSD machine, from this bug report has a wg0 interface
and when umb(4) changes IP address wireguard tunnel from OpenBSD to
external peer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM= is NOT
affected:

pce-0035# ifconfig wg0
wg0: flags=80c3 mtu 1420
index 9 priority 0 llprio 3
wgport 51820
wgpubkey BvWfmzqI94CkkI5TygWcmT10de8+7DUA2cxsl3jPeyo=
wgpeer klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
   wgpsk (present)
   wgpka 25 (sec)
   wgendpoint 5.135.165.132 51820
   tx: 19396164, rx: 2960456
   last handshake: 57 seconds ago
   wgaip fde4:f456:48c2:13c0::/64
groups: wg
inet6 fde4:f456:48c2:13c0::cc35 prefixlen 64

Per above output last handshake is pretty recent. However, an RPi running
Linux is connected via em1, and it also has a wireguard tunnel configured
to the same endpoint:

rpi-0058:~# wg
interface: wg0
public key: QLG8RSrYJ/MUmIo2NJcgwleAnPFnl843HwNDgcd9u0c=
private key: (hidden)
listening port: 51820

peer: klMOiaGJpjMM1bqJouUirOIJRRqcQ8J5QdWOErfj5UM=
preshared key: (hidden)
endpoint: 5.135.165.132:51820
allowed ips: fde4:f456:48c2:13c0::/64
latest handshake: 1 day, 29 minutes, 14 seconds ago
transfer: 424.79 KiB received, 3.09 MiB sent
persistent keepalive: every 25 seconds

Per above output, we see that latest handshake was more than a day ago.
When I look at tcpdump on em(4), which is connected to the RPi, I see the
following:

pce-0035# tcpdump -c5 -ni em1 host 5.135.165.132 and port 51820
tcpdump: listening on em1, link-type EN10MB
23:22:03.803721 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0xec39d550 [tos 0x88]
23:22:08.923739 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x13821437 [tos 0x88]
23:22:14.043830 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x3fda1931 [tos 0x88]
23:22:19.803752 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x1537a4da [tos 0x88]
23:22:25.083788 192.168.1.58.51820 > 5.135.165.132.51820: [wg] initiation from 0x93343c5f [tos 0x88]


Then when I look at tcpdump on umb(4) I see traffic from local machine,
which works fine, but also we see above initiation traffic from RPi,
which uses wrong (old)
