Re: efiboot, serial port order
Hi, On Sun, 23 Feb 2020 12:18:37 +0100 (CET) Mark Kettenis wrote: >> Date: Sat, 22 Feb 2020 10:40:13 +0900 (JST) >> From: YASUOKA Masahiko >> efiboot is using ACPI UID to determine the minor number of comX. (snip) >> I originally wrote this code, because I thought ACPI UID enumeration >> is better than the order of handles by EFI. >> >> On qemu or vmware, 2 serials mappped like the following: >> >> EFI handle ACPI UID I/O addr efiboot kernel >> 0 0 0x3f8 com0 com0 >> 1 1 0x2f8 com1 com1 >> >> EFI handle order and ACPI UID enumeration are same and they also match >> I/O address assignment. >> >> But on "HPE DL20 Gen10", 2 serials mappped like the following: >> >> EFI handle ACPI UID I/O addr efiboot kernel >> 0 1 0x3f8 com1 com0 >> 1 0 0x2f8 com0 com1 >> >> Note that EFI handle order and ACPI UID enumeration is different and >> ACPI UID enumeration doesn't match the order in I/O address >> assignment. In this case, since com0 or com1 are mixed up between >> efiboot and kernel, if serial is usable on efiboot, it becomes not >> usable on kernel. >> >> Fortunately we can use "machine comaddr" to fix up the problem. >> >> >> Also I don't know any actual case such that EFI handle order is wrong >> but ACPI UID is correct. If using ACPI UID is useless, we can apply >> the diff attached at last. >> >> comment? > > I fear using the EFI handle order is going to cause similar problems > on some other machine. What we really need here is the io port > address. Unfortunately we can't map the UID into an address without a > full AML parser, which is not something we want in the bootloader. > > On arm64 we use the SPCR table to select the serial port. This table > does contain the io address of the serial port. Unfortunately not all > machines have an SPCR table so we want to retain some of the existing > logic in case an SPCR table isn't provided. Thank you for the comment. I suppose we need to accept the behavior for amd64 until we found any good way. Also I'd like to commit the diff attached. When working around the problem with the boot.conf like below, machine comaddr 0x3f8 set tty com1 this expects the kernel selects com0 for console. But actually existing com(4) doesn't work so. ok? Update the console device always when attaching the real device driver. The information by the driver is supposed more reliable than the information which was set up earlier. Index: sys/dev/ic/com.c === RCS file: /var/cvs/openbsd/src/sys/dev/ic/com.c,v retrieving revision 1.171 diff -u -p -r1.171 com.c --- sys/dev/ic/com.c5 Feb 2020 10:21:17 - 1.171 +++ sys/dev/ic/com.c2 Mar 2020 06:35:01 - @@ -1498,8 +1498,8 @@ com_attach_subr(struct com_softc *sc) if (cdevsw[maj].d_open == comopen) break; - if (maj < nchrdev && cn_tab->cn_dev == NODEV) - cn_tab->cn_dev = makedev(maj, sc->sc_dev.dv_unit); + KASSERT(maj < nchrdev); + cn_tab->cn_dev = makedev(maj, sc->sc_dev.dv_unit); printf("%s: console\n", sc->sc_dev.dv_xname); }
diff: ipmi initializing watchdog
When I played with ipmi, I found problems in the initializing watchdog. As far as my test on HPE DL20 Gen10, the ipmi device refuses "set watchdog timer" command if "use" value (in first byte) is none. Since existing code uses the value which read from the device, if the value is not set on the device, the command actually fails. The diff makes the driver set the value IPMI_WDOG_USE_SMS_OS explicitly always. Also existing code ignores error when "set watchdog timer" fail. I think this is no good and some actual problems can happen. ok? Set "use" bits to "SMS/OS" always to prevent error. Actually "set watchdog timer" fails on HP DL20 Gen10 with ccode 0xCC (invalid request) if the bits was not set in advance. Also don't update the timer period if setting watchdog timer is failed. Index: sys/dev/ipmi.c === RCS file: /var/cvs/openbsd/src/sys/dev/ipmi.c,v retrieving revision 1.109 diff -u -p -r1.109 ipmi.c --- sys/dev/ipmi.c 18 Feb 2020 00:06:12 - 1.109 +++ sys/dev/ipmi.c 2 Mar 2020 05:39:34 - @@ -130,7 +130,7 @@ voidipmi_cmd_wait_cb(void *); intipmi_watchdog(void *, int); void ipmi_watchdog_tickle(void *); -void ipmi_watchdog_set(void *); +intipmi_watchdog_set(void *, int); struct ipmi_softc *ipmilookup(dev_t dev); @@ -1741,8 +1741,9 @@ ipmi_watchdog(void *arg, int period) if (period < MIN_PERIOD && period > 0) period = MIN_PERIOD; + if (ipmi_watchdog_set(sc, period) == -1) + return (sc->sc_wdog_period); sc->sc_wdog_period = period; - ipmi_watchdog_set(sc); printf("%s: watchdog %sabled\n", DEVNAME(sc), (period == 0) ? "dis" : "en"); return (period); @@ -1766,8 +1767,8 @@ ipmi_watchdog_tickle(void *arg) ipmi_cmd(); } -void -ipmi_watchdog_set(void *arg) +int +ipmi_watchdog_set(void *arg, int period) { struct ipmi_softc *sc = arg; uint8_t wdog[IPMI_GET_WDOG_MAX]; @@ -1785,14 +1786,15 @@ ipmi_watchdog_set(void *arg) ipmi_cmd(); /* Period is 10ths/sec */ - uint16_t timo = htole16(sc->sc_wdog_period * 10); + uint16_t timo = htole16(period * 10); memcpy([IPMI_SET_WDOG_TIMOL], , 2); - wdog[IPMI_SET_WDOG_TIMER] &= ~IPMI_WDOG_DONTSTOP; - wdog[IPMI_SET_WDOG_TIMER] |= (sc->sc_wdog_period == 0) ? - 0 : IPMI_WDOG_DONTSTOP; + wdog[IPMI_SET_WDOG_TIMER] &= ~(IPMI_WDOG_DONTSTOP | IPMI_WDOG_USE_MASK); + wdog[IPMI_SET_WDOG_TIMER] |= IPMI_WDOG_USE_SMS_OS; + if (period != 0) + wdog[IPMI_SET_WDOG_TIMER] |= IPMI_WDOG_DONTSTOP; wdog[IPMI_SET_WDOG_ACTION] &= ~IPMI_WDOG_MASK; - wdog[IPMI_SET_WDOG_ACTION] |= (sc->sc_wdog_period == 0) ? + wdog[IPMI_SET_WDOG_ACTION] |= (period == 0) ? IPMI_WDOG_DISABLED : IPMI_WDOG_REBOOT; c.c_cmd = APP_SET_WATCHDOG_TIMER; @@ -1801,6 +1803,13 @@ ipmi_watchdog_set(void *arg) c.c_rxlen = 0; c.c_data = wdog; ipmi_cmd(); + if (c.c_ccode != 0) { + printf("%s: set watchdog timer failed: %.2x", DEVNAME(sc), + c.c_ccode); + return (-1); + } + + return (0); } #if defined(__amd64__) || defined(__i386__) Index: sys/dev/ipmivar.h === RCS file: /var/cvs/openbsd/src/sys/dev/ipmivar.h,v retrieving revision 1.32 diff -u -p -r1.32 ipmivar.h --- sys/dev/ipmivar.h 19 Dec 2019 09:01:50 - 1.32 +++ sys/dev/ipmivar.h 2 Mar 2020 05:39:34 - @@ -136,6 +136,11 @@ struct ipmi_thread { }; #define IPMI_WDOG_DONTSTOP 0x40 +#define IPMI_WDOG_USE_MASK 0x07 +#define IPMI_WDOG_USE_BIOS_FRB21 +#define IPMI_WDOG_USE_BIOS_POST2 +#define IPMI_WDOG_USE_OS_LOAD 3 +#define IPMI_WDOG_USE_SMS_OS 4 #define IPMI_WDOG_MASK 0x03 #define IPMI_WDOG_DISABLED 0x00
Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing
On Mon, Feb 24 2020 15:33:35 -0500, Ted Unangst wrote: > Martin Pieuchot wrote: > > On 24/02/20(Mon) 11:29, Lauri Tirkkonen wrote: > > > On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote: > > > > On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote: > > > > > I was working on a make jobserver implementation that uses POSIX > > > > > semaphores as job tokens instead of a complicated socket-based > > > > > approach. > > > > > Initially I used named semaphores, which work fine, except if child > > > > > processes with less privileges need to also open the named semaphore > > > > > (eg. 'make build' as root executing 'su build -c make'). For that > > > > > reason > > > > > I wanted to use an unnamed semaphore (sem_init()) which is stored in > > > > > shm > > > > > -- that way I could leave the shm fd open and pass it to children. > > > > > > > > > > But unfortunately, sem_t is currently just a pointer to the opaque > > > > > struct __sem, and sem_int() always calloc()s the storage for the > > > > > struct. > > > > > > > > That's by design. > > > > > > Ok - could you elaborate what the design is? > > > > If the size of a descriptor change, because some fields are added and/or > > removed, it doesn't matter for the application because it only manipulates > > pointers. That means we can change the data types without creating an ABI > > break. > > I think we are approaching the point where we can settle on fixed sized types > now. If we want to be cautious, we can add a reserved padding field, too. But > there are some edge cases which I think can be removed by eliminating the > dynamic allocation paths. > > > > See the followup patch -- sharing the semaphore between processes does > > > work with it. > > > > Well ignoring the `pshared' argument is questionable. Why don't you > > remove the "#if notyet" and start playing with the existing code and > > try to figure out if something is missing for your use case? > > I'm not sure the code in notyet will work. It was based on a misunderstanding > I had of the requirements. Returning control of the sem_t placement to the > application is the right approach. Thanks for the input, and ping - is there still something about this diff that I should fix? -- Lauri Tirkkonen | lotheac @ IRCnet
diff: "ipmi0: sendcmd fails"
Hi, On HPE DL20 Gen10, kernel keeps printing "ipmi0: sendcmd fails" if ipmi0 is enabled. The machine has following 4 sensor devices. 19-P/S 1 Inlet 20-P/S 2 Inlet 21-P/S 1 22-P/S 2 But reading value from these devices fails always. This causes the problem above. The diff makes such the devices disabled if the error happens when probing. ok? Return error value when sending "sensor reading" is failed. This fixes "ipmi0: sendcmd fails" errors when there is a sensor which is enumurated but reading it is failed. Index: sys/dev/ipmi.c === RCS file: /var/cvs/openbsd/src/sys/dev/ipmi.c,v retrieving revision 1.109 diff -u -p -r1.109 ipmi.c --- sys/dev/ipmi.c 18 Feb 2020 00:06:12 - 1.109 +++ sys/dev/ipmi.c 2 Mar 2020 05:38:25 - @@ -1288,6 +1288,11 @@ read_sensor(struct ipmi_softc *sc, struc c.c_data = data; ipmi_cmd(); + if (c.c_ccode != 0) { + dbg_printf(1, "sensor reading command for %s failed: %.2x\n", + psensor->i_sensor.desc, c.c_ccode); + return (rv); + } dbg_printf(10, "values=%.2x %.2x %.2x %.2x %s\n", data[0],data[1],data[2],data[3], psensor->i_sensor.desc); psensor->i_sensor.flags &= ~SENSOR_FINVALID;
Re: diff: init efifb even if VGA is probed.
Hi, On Fri, 21 Feb 2020 18:55:38 -0600 Andrew Daugherity wrote: > On Thu, Feb 20, 2020 at 11:10 PM YASUOKA Masahiko wrote: >> Hello, >> >> I am testing a new hardware, HPE DL20 Gen10. >> >> When efiboot starts the kernel, the video display becomes distorted >> and never recovered until CPU reset. >> [...] >> On HPE DL20 Gen10, probing vga is succeeded on first trial, the kernel >> selects vga for the console, but actually it is broken. On usual >> machines which boot with EFI, the problem doesn't happen since they >> have no vga. >> >> The diff following fixes the problem by initializing efifb console >> even if the VGA is probed. > > This is exciting! Your HP server sounds very much like what I've > experienced on the Dell PowerEdge R230 [1] (probably also affects > other Dell Rx30 in UEFI mode, maybe Rx40 too?), and I would not be > surprised if your diff fixes mine too. > > >> # Also, HP DL20 Gen10 has "UEFI optimized boot" setting on BIOS and >> # disabling the setting avoids the problem happening. But since the >> # setting seems to be for old Windows, I think we should fix our >> # kernel. > > OpenBSD squishes the video to a thin purple line unless I enable the > "Load Legacy Video Option ROM" setting in the Dell BIOS. However, > that setting is described as a compatibility shim for old OSes which > don't support EFI GOP natively, and enabling it restricts the efifb > resolution to 1024x768. Every other OS I tested (including FreeBSD & > Linux) worked properly without the legacy video ROM. > > I got as far a nasty hack of a diff on efifb where if probing failed, > attach it with hardcoded values matching my machine. With said diff > plus vga disabled via 'boot -c' (otherwise vga would steal the > console), kernel output was still scrambled, but userland output from > the boot process displayed correctly. Unfortunately the keyboard > wasn't attached to this console, but I could at least print stuff on > the screen from an ssh session, e.g. 'echo "Hello, world!" > > /dev/console'. At some point I had to actually use this server > though, so I abandoned that effort and put the server into production > (using the legacy video ROM). The problems you are pointing seem to be the same problem. > I'll try to test this diff next week if I can schedule some downtime. Test is appreciated. Also I'd like to know the result of hexdump -C /var/db/acpi/FACP.1 when "Load Legacy Video Option ROM" setting is disabled. Thanks, > -Andrew > > [1] https://marc.info/?l=openbsd-tech=150707255507032=2 > The first half isn't relevant, but the last part covers the R230, and > has (old) dmesg for with and without the legacy video ROM. > I should clarify "X still works in either case": with efifb active, X > will prefer wsfb unless you add an xorg.conf for mga. mga performs > better and can use any resolution, while wsfb is limited to the efifb > resolution.
ldapd: fix return values for illegal passwords
Hi, When trying to bind to ldapd using a dn which has an invalid userPassword entry, e.eg. a “defective” SHA valus like “{SHA}alpha”, ldapd currently returns 1 (Operations Error). The patch below changes this to 49 (Invalid Credentials). There are basically two reasons for this: 1. The userPassword is a multi-valued attribute, i.e. there can be more than one in an ldap entry. Now when you have a valid and a defective entry and the binding user supplies a password which does not match the valid entry you get different results whether the defective entry comes last (return value 1) or not (return value 49). When the supplied password matches the valid entry the bind succeeds independent of order. A change to return value 49 gets consistent results. For a single userPassword entry 49 also is not wrong, as the supplied password still doesn't match the entry (even if it is defective). 2. RFC 4511 defines the result code “Operations Error” as follows: “operationsError (1) Indicates that the operation is not properly sequenced with relation to other operations (of same or different type). For example, this code is returned if the client attempts to StartTLS [RFC4346] while there are other uncompleted operations or if a TLS layer was already installed.” A defective password entry in no way is an operations error of the ldapd server. Also I don't think it is the server's job to take care of the correctness of password entries. There is no provision in the LDAP RFCs to stop one from entering an incorrect entry. I checked this with other directory servers: 389, apache ds, and openldap. All three accept incorrect entries and all three return 49 (Invalid Credentials) on bind attempts. Best regards Robert --- commit e9fe05bf15bf216ea7759cd64f17103008e1b69e (master) from: Robert Klein date: Sun Mar 1 15:49:40 2020 UTC ldapd: fix return values for illegal passwords diff 7a641edbd5c2c2686d4a80fd730967f00e58e681 f62b80b397c780e8273b5cc67d544b951c4c9d1f blob - f8debff7a2d6a0feb9cf984905de385b029b8744 blob + 094a8360e326ac7956c54a7dc8380df81c23efa5 --- usr.sbin/ldapd/auth.c +++ usr.sbin/ldapd/auth.c @@ -218,12 +218,12 @@ check_password(struct request *req, const char *stored int sz; if (stored_passwd == NULL) - return -1; + return 0; if (strncmp(stored_passwd, "{SHA}", 5) == 0) { sz = b64_pton(stored_passwd + 5, tmp, sizeof(tmp)); if (sz != SHA_DIGEST_LENGTH) - return (-1); + return (0); SHA1_Init(); SHA1_Update(, passwd, strlen(passwd)); SHA1_Final(md, ); @@ -231,7 +231,7 @@ check_password(struct request *req, const char *stored } else if (strncmp(stored_passwd, "{SSHA}", 6) == 0) { sz = b64_pton(stored_passwd + 6, tmp, sizeof(tmp)); if (sz <= SHA_DIGEST_LENGTH) - return (-1); + return (0); salt = tmp + SHA_DIGEST_LENGTH; SHA1_Init(); SHA1_Update(, passwd, strlen(passwd)); @@ -241,11 +241,11 @@ check_password(struct request *req, const char *stored } else if (strncmp(stored_passwd, "{CRYPT}", 7) == 0) { encpw = crypt(passwd, stored_passwd + 7); if (encpw == NULL) - return (-1); + return (0); return (strcmp(encpw, stored_passwd + 7) == 0 ? 1 : 0); } else if (strncmp(stored_passwd, "{BSDAUTH}", 9) == 0) { if (send_auth_request(req, stored_passwd + 9, passwd) == -1) - return (-1); + return (0); return 2; /* Operation in progress. */ } else return (strcmp(stored_passwd, passwd) == 0 ? 1 : 0);
Re: rwsleep and stopped process
On Sun, Mar 01, 2020 at 02:16:20PM +0100, Mark Kettenis wrote: > This probably means that msleep(4) has a similar issue. Here is the diff for msleep() and rwsleep(). bluhm Index: kern/kern_synch.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v retrieving revision 1.162 diff -u -p -r1.162 kern_synch.c --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 +++ kern/kern_synch.c 1 Mar 2020 13:50:29 - @@ -259,7 +259,6 @@ msleep(const volatile void *ident, struc sleep_setup(, ident, priority, wmesg); sleep_setup_timeout(, timo); - sleep_setup_signal(); /* XXX - We need to make sure that the mutex doesn't * unblock splsched. This can be made a bit more @@ -268,6 +267,8 @@ msleep(const volatile void *ident, struc spl = MUTEX_OLDIPL(mtx); MUTEX_OLDIPL(mtx) = splsched(); mtx_leave(mtx); + /* signal may stop the process, release mutex before that */ + sleep_setup_signal(); error = sleep_finish_all(, 1); @@ -320,9 +321,10 @@ rwsleep(const volatile void *ident, stru sleep_setup(, ident, priority, wmesg); sleep_setup_timeout(, timo); - sleep_setup_signal(); rw_exit(rwl); + /* signal may stop the process, release rwlock before that */ + sleep_setup_signal(); error = sleep_finish_all(, 1);
Re: rwsleep and stopped process
> Date: Sun, 1 Mar 2020 14:02:53 +0100 > From: Alexander Bluhm > > Hi, > > I had a 6.6 machine where a lot of git processes got stuck sleeping > on "futex". The process holding the futex rwlock was this one. > > 33293 332235 1 2734 30x800483 fsleepgit > > It called mi_switch() from proc_stop() with this trace. > > issignal(80002acc74a8) at issignal+0x2ec > sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf > rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94 > futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180 > sys_futex(530,80002acc74a8,53) at sys_futex+0x80 > syscall(0) at syscall+0x37d > Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128 > > So I would say the process was stopped instead of sleeping and did > not release the lock. Can rwsleep() call rw_exit() before > sleep_setup_signal()? This diff survived a full make regress on > amd64. > > ok? I think that should work. We should be able to release the lock as soon as we grab the scheduler lock. And that happens in sleep_setup(). This probably means that msleep(4) has a similar issue. And maybe other places that do the split sleep_setup()/sleep_finish(). > Index: kern/kern_synch.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v > retrieving revision 1.162 > diff -u -p -r1.162 kern_synch.c > --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 > +++ kern/kern_synch.c 1 Mar 2020 12:11:30 - > @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru > > sleep_setup(, ident, priority, wmesg); > sleep_setup_timeout(, timo); > - sleep_setup_signal(); > - > rw_exit(rwl); > + /* signal may stop the process, release rwlock before that */ > + sleep_setup_signal(); > > error = sleep_finish_all(, 1); > >
rwsleep and stopped process
Hi, I had a 6.6 machine where a lot of git processes got stuck sleeping on "futex". The process holding the futex rwlock was this one. 33293 332235 1 2734 30x800483 fsleepgit It called mi_switch() from proc_stop() with this trace. issignal(80002acc74a8) at issignal+0x2ec sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94 futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180 sys_futex(530,80002acc74a8,53) at sys_futex+0x80 syscall(0) at syscall+0x37d Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128 So I would say the process was stopped instead of sleeping and did not release the lock. Can rwsleep() call rw_exit() before sleep_setup_signal()? This diff survived a full make regress on amd64. ok? bluhm Index: kern/kern_synch.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v retrieving revision 1.162 diff -u -p -r1.162 kern_synch.c --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 +++ kern/kern_synch.c 1 Mar 2020 12:11:30 - @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru sleep_setup(, ident, priority, wmesg); sleep_setup_timeout(, timo); - sleep_setup_signal(); - rw_exit(rwl); + /* signal may stop the process, release rwlock before that */ + sleep_setup_signal(); error = sleep_finish_all(, 1);