[patch] traceroute timeouts
The default traceroute timeout of 5 seconds is excruciatingly long when there are elements of the route that don't respond, and it wasn't allowed to be set lower than 2 seconds. This changes the minimum to 1 second, matching FreeBSD, and also makes that the default, which should be reasonable for the vast majority of users today. The two awk files in this directory are two decades old, and not installed anywhere they can be executed as part of a traceroute pipeline; can they be removed? If the functionality is useful, implementing mean/median reporting as a new option in C would be straightforward. Index: usr.sbin/traceroute/traceroute.8 === RCS file: /cvs/src/usr.sbin/traceroute/traceroute.8,v retrieving revision 1.69 diff -u -p -u -r1.69 traceroute.8 --- usr.sbin/traceroute/traceroute.811 Feb 2020 18:41:39 - 1.69 +++ usr.sbin/traceroute/traceroute.820 Aug 2021 06:33:30 - @@ -201,7 +201,7 @@ and are listed. .It Fl w Ar waittime Set the time, in seconds, to wait for a response to a probe. -The default is 5. +The default is 1. .It Fl x Print the ICMP extended headers if available. This option is not available for IPv6. Index: usr.sbin/traceroute/traceroute.c === RCS file: /cvs/src/usr.sbin/traceroute/traceroute.c,v retrieving revision 1.164 diff -u -p -u -r1.164 traceroute.c --- usr.sbin/traceroute/traceroute.c12 Jul 2021 15:09:21 - 1.164 +++ usr.sbin/traceroute/traceroute.c20 Aug 2021 06:33:30 - @@ -351,7 +351,7 @@ main(int argc, char *argv[]) rcvsock4 = rcvsock6 = sndsock4 = sndsock6 = -1; v4sock_errno = v6sock_errno = 0; - conf->waittime = 5 * 1000; + conf->waittime = 1000; if ((rcvsock6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == -1) v6sock_errno = errno; @@ -554,9 +554,9 @@ main(int argc, char *argv[]) err(1, "setsockopt SO_RTABLE"); break; case 'w': - conf->waittime = strtonum(optarg, 2, INT_MAX, &errstr); + conf->waittime = strtonum(optarg, 1, INT_MAX, &errstr); if (errstr) - errx(1, "wait must be >1 sec."); + errx(1, "wait must be >=1 sec."); conf->waittime *= 1000; break; case 'x':
Re: Poison file names
I would like to be able to clone the github mirror on windows. I do wind up using 7z on the tar file as a workaround, but it would be nice if github "just worked". The com files is what the clone fails on, and those seemed easy enough to address, but if it is actually a deep rat hole, I certainly understand. Original Message Subject: Re: Poison file names From: Daniel Dickman Date: Fri, December 11, 2020 9:42 am To: jo...@armadilloaerospace.com, "tech@openbsd.org" On Wed, Dec 9, 2020 at 4:50 AM Jonathan Gray wrote: > > On Tue, Dec 08, 2020 at 11:36:37PM -0700, jo...@armadilloaerospace.com wrote: > > The game battlestar has source files names com1.c through com7.c, which > > are illegal on windows due to ancient dos com port rules. > > > > I understand there might not be much sympathy for that, but being able > > to have the full source tree on a windows system can be convenient, and > > it should be a painless change. > > > > Any chance someone could do that rename in the repository and change > > the Makefile? > > NetBSD renamed them all to command for that reason. > > https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file > > usr.sbin/pkg_add/pod/OpenBSD::*.pod would also be a problem due to colons > Hi John, maybe you can say more how you're getting the files to a windows system? If we rename the battlestar files, and given it won't completely solve things for you as jsg@ points out, does it make any difference for you? I do this myself too from time to time by extracting the src.tar.gz using 7z (https://www.7-zip.org/). That tool seems to automatically rename the comX.c files to _comX.c. And for the pod files it seems to automatically replace the invalid ":" characters with "_". The only annoyance I have left is that there are a small number of remaining conflicts from upper/lower case on unix vs windows. Bottom line, for me extracting with 7z and just pressing "automatically rename" on the first conflict seems to work as all those conflicting filenames are for things I really don't look at on my windows box. But maybe you could tell us more if changing battlestar would partially improve things for you? Either way, I don't object to syncing battlestar with netbsd's 2001 commit.
Poison file names
The game battlestar has source files names com1.c through com7.c, which are illegal on windows due to ancient dos com port rules. I understand there might not be much sympathy for that, but being able to have the full source tree on a windows system can be convenient, and it should be a painless change. Any chance someone could do that rename in the repository and change the Makefile?
PATCH: better error return for exFAT filesystem
Perform an explicit check for the unsupported exFAT MSDOS filesystem instead of letting it fail mysteriously when it gets cluster sizes of 0 from the normal fields. This causes mount_msdos to report: mount_msdos: /dev/sd1i on /root/mnt: filesystem not supported by kernel Instead of the more obscure: mount_msdos: /dev/sd1i on /root/mnt: Inapropriate file type or format Index: msdosfs_vfsops.c === RCS file: /cvs/src/sys/msdosfs/msdosfs_vfsops.c,v retrieving revision 1.93 diff -u -p -r1.93 msdosfs_vfsops.c --- msdosfs_vfsops.c24 Jan 2020 03:49:34 - 1.93 +++ msdosfs_vfsops.c7 Aug 2020 19:52:04 - @@ -298,6 +298,12 @@ msdosfs_mountfs(struct vnode *devvp, str b50 = (struct byte_bpb50 *)bsp->bs50.bsBPB; b710 = (struct byte_bpb710 *)bsp->bs710.bsBPB; + /* No support for exFAT filesystems */ + if (!strncmp("EXFAT", bsp->bs33.bsOemName, 5)) { + error = EOPNOTSUPP; + goto error_exit; + } + pmp = malloc(sizeof *pmp, M_MSDOSFSMNT, M_WAITOK | M_ZERO); pmp->pm_mountp = mp;
PATCH: iostat spacing
IO rates above 100 MB/s are common with SSD; this patch expands the column so it stays neatly printed. An argument can be made for expanding it one more for fast M.2 drives. ? dkstats.d ? iostat ? iostat.d Index: iostat.c === RCS file: /cvs/src/usr.sbin/iostat/iostat.c,v retrieving revision 1.42 diff -u -p -r1.42 iostat.c --- iostat.c14 Oct 2019 19:22:17 - 1.42 +++ iostat.c7 Aug 2020 19:00:45 - @@ -227,7 +227,7 @@ header(void) if (ISSET(todo, SHOW_TOTALS)) printf(" %18.18s ", cur.dk_name[i]); else - printf(" %16.16s ", cur.dk_name[i]); + printf(" %17.17s ", cur.dk_name[i]); } if (ISSET(todo, SHOW_STATS_2)) for (i = 0; i < dk_ndrive; i++) @@ -252,7 +252,7 @@ header(void) if (ISSET(todo, SHOW_TOTALS)) printf(" KB/t xfr MB "); else - printf(" KB/t t/s MB/s "); + printf(" KB/t t/s MB/s "); } if (ISSET(todo, SHOW_STATS_2)) for (i = 0; i < dk_ndrive; i++) @@ -299,10 +299,7 @@ disk_stats(double etime) (double)(1024 * 1024); else mbps = 0; - if (ISSET(todo, SHOW_TOTALS)) - printf(" %6.2f ", mbps / etime); - else - printf(" %5.2f ", mbps / etime); + printf(" %6.2f ", mbps / etime); } }
exFAT support
I tried to mount a 12TB USB drive, and was getting an "Inappropriate file type or format" error. It turned out to be due to exFAT formatting, but it took me some investigating to figure that out. Would it be reasonable to have the kernel print a more informative warning like "exFAT filesystem not supported" when you try to mount it with mount_msdos, or are additional kernel prints considered bad form? With Microsoft's release of the spec last year, is the path open for kernel support now, when someone gets around to it? I have seen some comments about the reliability of exfat-fuse, will I be sad if I try to use it for something important?
Proposal: integrate wsmoused into kernel
There are several user visible "warts" with wsmoused: Wsmoused conflicts with X's use of the mouse, you can't ctrl-alt-screen between X and text consoles and use the mouse in both of them. The cursor location is in character cells instead of pixels, so the cursor moves twice as fast vertically as horizontally, and you need to scale down the mouse to get a reasonable speed. Wsmoused makes an attempt to address this by setting MSMOUSEIO_SRES to WSMOUSE_RES_MIN, but that only does anything on PS/2 mice. There is also a really strange (not even monotonically increasing or directionally invariant!) coordinate scaling function in normalize_event(). Wsmoused is enabled with rcctl and configuration options are set with wsmoused_flags= in /etc/rc.local instead of wsconsctl / wsconsctl.conf. I propose: Wsdisplay would have a mouse device associated with it as keyboards are associated, with "just works" defaults and wsconsctl options. Console cursor work would be automatically supressed when X is active. Cursor position would be tracked in pixels. If no additional scaling is done in X, the cursor would move at the same speed across the graphics or text console screens. The possibility is also then open to draw a pixel addressed bitmap cursor on any of the rasops consoles, which is more user friendly than a text block cursor for both disambiguation from the typing cursor, and avoidance of the unknown block boundary jitter problem. This would also save a couple thousand lines of code, remove a "moving part" from the system, and allow additional improvements like drag selection into the scrollback to be implemented. This would lose the support wsmoused has for some ancient serial mouse protocols parsed in user space. I would argue that if those mice are actually important, they should get a kernel driver. However, I suspect that the intersection of wsmoused users with 25 year old mice is small enough to neglect. Sound ok to pursue?
Re: Use VGA text mode palette RGB values in rasops(9)
I have also noticed that comments are hard to read in color-vim because of the dark blue on black; increasing the intensity would be a trivial improvement. I have thought about testing a bold attribute implementation by way of shift-and-overlay of the font bitmap so all the colors could be full intensity. The effect would be resolution dependent, but probably superior for at least the 12x24 and 16x32 fonts. 32x64 might need triple striking. An actual bold font for each size would be superior quality and speed at a space cost, of course. Related, I looked through the xterm code to see how hard it would be to use some of it directly in the wscons system for accurate emulation, and it looked pretty challenging to me. Fixing problems piecemeal is probably easier than trying to reuse code. Original Message Subject: Re: Use VGA text mode palette RGB values in rasops(9) From: "Theo de Raadt" Date: Tue, July 07, 2020 9:07 am To: Frederic Cambus Cc: tech@openbsd.org, Jonathan Gray Frederic Cambus wrote: > One can test in frame buffer consoles by doing: > > export TERM=wsvt25 > > And run either vim or colorls -G. > Furthermore the situation is just beyond ridiculous. On the one hand your diff is talking about minimizing the differences between terminals, but your example immediately heads in the opposite direction by SELECTING for divergent configuration -- which I suspect noone except you uses. This termcap mess should never have happened. Up until about 10 years ago, pointless and accidental divergence in console emulators ended up being reflected in termcap entries *which to a large degree noone used, except for the people making the termcap entries*, and the situation is so retarded, because is ENTRENCHES differences rather than recognizing the differences shouldn't have existed. The console emulators SHOULD HAVE become more mundane and less featureful, they should have become closer to a clean blend of vt220 + xterm. But no! People focused on extremely narrow details and hid them behind TERM feature flags, and avoided focusing on the big picture of improving ease of utilization and compatibility. It is very sad.
Re: [PATCH} Optimized rasops32 putchar
I did some more tests, and I think the odd performance I am seeing may be due to TLB thrash on the 32x64 characters with 4k pages, since writing each character will require 64 data TLB. Are huge page mappings supported in OpenBSD? Original Message Subject: Re: [PATCH} Optimized rasops32 putchar From: Mark Kettenis Date: Sat, June 27, 2020 1:30 pm To: Cc: tech@openbsd.org > Content-Type: text/plain; charset="utf-8" > From: > > I was doing my timings with a user mode program after mmmaping the > efifb display, so the mapping might be different in the kernel. That should still give you a write-combining mapping as efifb_mmap() adds the PMAP_WC flag to the physical address. Cachable on x86 means write-back cachable. And using a write-back cachable mapping for a framebuffer often leads to interesting "damage" where pixels in certain cache lines show up "late" on the display. Not sure if you'd see that on recent Intel graphics hardware as the current hardware designs are much more coherent than what was produced in the past. > Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES / > WSDISPLAYIO_SMODE to the drm drivers by consolidating code into > rasops. While the point of the DRM drivers is to get fully hardware > accelerated drawing in X, there isn't any reason why they can't > support dumb framebuffer mappings as well. True. Although there are DRM interfaces that give you a dumb framebuffer as well. Using those interfaces is a bit more complicated though. Centralising the code would be good. That code probably should use bus_space_mmap(4) as the PMAP_WC flag is amd64-specific. Unfortunately the amd64 implementation of bus_space_mmap(4) is incomplete and doesn't actually implement write-combining for mappings with the BUS_SPACE_MAP_PREFETCHABLE flag set. So that has to be fixed as well. > Original Message > Subject: RE: [PATCH} Optimized rasops32 putchar > From: > Date: Sat, June 27, 2020 11:13 am > To: "Mark Kettenis" > Cc: "tech@openbsd.org" > > I believe it is mapped as normally cached right now, rather than > uncached or write combining. > > Reads aren't ultra-slow, and the timings of 48 byte writes appear to > involve a cacheline read. > > 128 byte writes are actually slower than 64 byte writes, which I > guessed might be because of automatic prefetching kicking in and > reading the following cacheline. > > > Original Message > Subject: Re: [PATCH} Optimized rasops32 putchar > From: Mark Kettenis > Date: Sat, June 27, 2020 7:56 am > To: > Cc: tech@openbsd.org > > > From: > > Date: Fri, 26 Jun 2020 07:42:50 -0700 > > > > Optimized 32 bit character rendering with unrolled rows and pairwise > > foreground / background pixel rendering. > > > > If it weren't for the 5x8 font, I would have just assumed everything > > was an even width and made the fallback path also pairwise. > > > > In isolation, the 16x32 character case got 2x faster, but that wasn't > > a huge real world speedup where the space rendering that was already > > at memory bandwidth limits accounted for most of the character > > rendering time. However, in combination with the previous fast > > conditional console scrolling that removes most of the space rendering, > > it becomes significant. > > > > I also found that at least the efi and intel framebuffers are not > > currently mapped write combining, which makes this much slower than > > it should be. > > Hi John, > > The framebuffer should be mapped write-combining. In OpenBSD this is > requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to > bbus_space_map(9) when mapping the framebuffer. > > I'm fairly confident since until last January the initial mapping of > the framebuffer that we used wasn't write-combining. And things were > really, really slow. > > Cheers, > > Mark > > > Index: rasops32.c > > === > > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v > > retrieving revision 1.10 > > diff -u -p -r1.10 rasops32.c > > --- rasops32.c 25 May 2020 09:55:49 - 1.10 > > +++ rasops32.c 26 Jun 2020 14:34:06 - > > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri) > > int > > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t > > attr) > > { > > - int width, height, cnt, fs, fb, clr[2]; > > + int width, height, step, cnt, fs, b, f; > > + uint32_t fb, clr[2]; > > struct rasops_info *ri; > > - int32_t *dp, *rp; > > + int64_t *rp, q; > > + union { > > + int64_t q[4]; > > + int32_t d[4][2]; > > + } u; > > u_char *fr; > > > > ri = (struct rasops_info *)cookie; > > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, > > return 0; > > #endif > > > > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > > > > height = ri->ri_font->fontheight; > > width = ri->ri_font->fontwidth; > > + step = ri->ri_
Re: [PATCH} Optimized rasops32 putchar
I was doing my timings with a user mode program after mmmaping the efifb display, so the mapping might be different in the kernel. Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES / WSDISPLAYIO_SMODE to the drm drivers by consolidating code into rasops. While the point of the DRM drivers is to get fully hardware accelerated drawing in X, there isn't any reason why they can't support dumb framebuffer mappings as well. Original Message Subject: RE: [PATCH} Optimized rasops32 putchar From: Date: Sat, June 27, 2020 11:13 am To: "Mark Kettenis" Cc: "tech@openbsd.org" I believe it is mapped as normally cached right now, rather than uncached or write combining. Reads aren't ultra-slow, and the timings of 48 byte writes appear to involve a cacheline read. 128 byte writes are actually slower than 64 byte writes, which I guessed might be because of automatic prefetching kicking in and reading the following cacheline. Original Message Subject: Re: [PATCH} Optimized rasops32 putchar From: Mark Kettenis Date: Sat, June 27, 2020 7:56 am To: Cc: tech@openbsd.org > From: > Date: Fri, 26 Jun 2020 07:42:50 -0700 > > Optimized 32 bit character rendering with unrolled rows and pairwise > foreground / background pixel rendering. > > If it weren't for the 5x8 font, I would have just assumed everything > was an even width and made the fallback path also pairwise. > > In isolation, the 16x32 character case got 2x faster, but that wasn't > a huge real world speedup where the space rendering that was already > at memory bandwidth limits accounted for most of the character > rendering time. However, in combination with the previous fast > conditional console scrolling that removes most of the space rendering, > it becomes significant. > > I also found that at least the efi and intel framebuffers are not > currently mapped write combining, which makes this much slower than > it should be. Hi John, The framebuffer should be mapped write-combining. In OpenBSD this is requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to bbus_space_map(9) when mapping the framebuffer. I'm fairly confident since until last January the initial mapping of the framebuffer that we used wasn't write-combining. And things were really, really slow. Cheers, Mark > Index: rasops32.c > === > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v > retrieving revision 1.10 > diff -u -p -r1.10 rasops32.c > --- rasops32.c 25 May 2020 09:55:49 - 1.10 > +++ rasops32.c 26 Jun 2020 14:34:06 - > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri) > int > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t > attr) > { > - int width, height, cnt, fs, fb, clr[2]; > + int width, height, step, cnt, fs, b, f; > + uint32_t fb, clr[2]; > struct rasops_info *ri; > - int32_t *dp, *rp; > + int64_t *rp, q; > + union { > + int64_t q[4]; > + int32_t d[4][2]; > + } u; > u_char *fr; > > ri = (struct rasops_info *)cookie; > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, > return 0; > #endif > > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > > height = ri->ri_font->fontheight; > width = ri->ri_font->fontwidth; > + step = ri->ri_stride >> 3; > > - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf]; > - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf]; > + b = ri->ri_devcmap[(attr >> 16) & 0xf]; > + f = ri->ri_devcmap[(attr >> 24) & 0xf]; > + u.d[0][0] = b; u.d[0][1] = b; > + u.d[1][0] = b; u.d[1][1] = f; > + u.d[2][0] = f; u.d[2][1] = b; > + u.d[3][0] = f; u.d[3][1] = f; > > if (uc == ' ') { > + q = u.q[0]; > while (height--) { > - dp = rp; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) > - *dp++ = clr[0]; > + /* the general, pixel-at-a-time case is fast enough */ > + for (cnt = 0; cnt < width; cnt++) > + ((int *)rp)[cnt] = b; > + rp += step; > } > } else { > uc -= ri->ri_font->firstchar; > fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale; > fs = ri->ri_font->stride; > - > - while (height--) { > - dp = rp; > - fb = fr[3] | (fr[2] << 8) | (fr[1] ><< 16) | > - (fr[0] << 24); > - fr += fs; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) { > - *dp++ = clr[(fb >> 31) & 1]; > - fb <<= 1; > - } > + /* double-pixel special cases for the common widths */ > + switch (width) { > + case 8: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + rp += step; > + fr += 1; > + } > + break; > + > + case 12: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; > + rp += step; > + fr += 2;
Re: [PATCH} Optimized rasops32 putchar
I believe it is mapped as normally cached right now, rather than uncached or write combining. Reads aren't ultra-slow, and the timings of 48 byte writes appear to involve a cacheline read. 128 byte writes are actually slower than 64 byte writes, which I guessed might be because of automatic prefetching kicking in and reading the following cacheline. Original Message Subject: Re: [PATCH} Optimized rasops32 putchar From: Mark Kettenis Date: Sat, June 27, 2020 7:56 am To: Cc: tech@openbsd.org > From: > Date: Fri, 26 Jun 2020 07:42:50 -0700 > > Optimized 32 bit character rendering with unrolled rows and pairwise > foreground / background pixel rendering. > > If it weren't for the 5x8 font, I would have just assumed everything > was an even width and made the fallback path also pairwise. > > In isolation, the 16x32 character case got 2x faster, but that wasn't > a huge real world speedup where the space rendering that was already > at memory bandwidth limits accounted for most of the character > rendering time. However, in combination with the previous fast > conditional console scrolling that removes most of the space rendering, > it becomes significant. > > I also found that at least the efi and intel framebuffers are not > currently mapped write combining, which makes this much slower than > it should be. Hi John, The framebuffer should be mapped write-combining. In OpenBSD this is requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to bbus_space_map(9) when mapping the framebuffer. I'm fairly confident since until last January the initial mapping of the framebuffer that we used wasn't write-combining. And things were really, really slow. Cheers, Mark > Index: rasops32.c > === > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v > retrieving revision 1.10 > diff -u -p -r1.10 rasops32.c > --- rasops32.c 25 May 2020 09:55:49 - 1.10 > +++ rasops32.c 26 Jun 2020 14:34:06 - > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri) > int > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t > attr) > { > - int width, height, cnt, fs, fb, clr[2]; > + int width, height, step, cnt, fs, b, f; > + uint32_t fb, clr[2]; > struct rasops_info *ri; > - int32_t *dp, *rp; > + int64_t *rp, q; > + union { > + int64_t q[4]; > + int32_t d[4][2]; > + } u; > u_char *fr; > > ri = (struct rasops_info *)cookie; > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, > return 0; > #endif > > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > > height = ri->ri_font->fontheight; > width = ri->ri_font->fontwidth; > + step = ri->ri_stride >> 3; > > - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf]; > - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf]; > + b = ri->ri_devcmap[(attr >> 16) & 0xf]; > + f = ri->ri_devcmap[(attr >> 24) & 0xf]; > + u.d[0][0] = b; u.d[0][1] = b; > + u.d[1][0] = b; u.d[1][1] = f; > + u.d[2][0] = f; u.d[2][1] = b; > + u.d[3][0] = f; u.d[3][1] = f; > > if (uc == ' ') { > + q = u.q[0]; > while (height--) { > - dp = rp; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) > - *dp++ = clr[0]; > + /* the general, pixel-at-a-time case is fast enough */ > + for (cnt = 0; cnt < width; cnt++) > + ((int *)rp)[cnt] = b; > + rp += step; > } > } else { > uc -= ri->ri_font->firstchar; > fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale; > fs = ri->ri_font->stride; > - > - while (height--) { > - dp = rp; > - fb = fr[3] | (fr[2] << 8) | (fr[1] ><< 16) | > - (fr[0] << 24); > - fr += fs; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) { > - *dp++ = clr[(fb >> 31) & 1]; > - fb <<= 1; > - } > + /* double-pixel special cases for the common widths */ > + switch (width) { > + case 8: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + rp += step; > + fr += 1; > + } > + break; > + > + case 12: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; > + rp += step; > + fr += 2; > + } > + break; > + > + case 16: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; > + rp[6] = u.q[(fb >> 2) & 3]; > + rp[7] = u.q[fb & 3]; > + rp += step; > + fr += 2; > + } > + break; > + case 32: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; >
Re: [PATCH] fast conditional console scrolling
I should have been more rigorous -- I had two different changes running on my system, as well as forcing it to use the 12x24 font for a 160x45 console. If you apply the "Optimized rasops32 putchar" patch I just posted, you should see another significant speedup. Original Message Subject: Re: [PATCH] fast conditional console scrolling From: Paul de Weerd Date: Fri, June 26, 2020 1:23 am To: jo...@armadilloaerospace.com Cc: "tech@openbsd.org" Hi John, I tried your diff. I don't quite see the same 3x improvement that you report, more like 2x. I timed 7 runs of ls -R /usr/ports: Before diff, time ls -R /usr/ports | wc -l 2.897s on average After diff, time ls -R /usr/ports | wc -l 2.707s on average Before diff, time ls -R /usr/ports 2m53.067 on average After diff, time ls -R /usr/ports 1m30.387 on average Note that the 'before diff' runs were with a snapshot kernel. There may be diffs in there that account for the difference between before and after of the no-output runs. See dmesg and full stats below. So, on average, a speed-up of ~48%. Thanks! Paul 'WEiRD' de Weerd
[PATCH} Optimized rasops32 putchar
Optimized 32 bit character rendering with unrolled rows and pairwise foreground / background pixel rendering. If it weren't for the 5x8 font, I would have just assumed everything was an even width and made the fallback path also pairwise. In isolation, the 16x32 character case got 2x faster, but that wasn't a huge real world speedup where the space rendering that was already at memory bandwidth limits accounted for most of the character rendering time. However, in combination with the previous fast conditional console scrolling that removes most of the space rendering, it becomes significant. I also found that at least the efi and intel framebuffers are not currently mapped write combining, which makes this much slower than it should be. Index: rasops32.c === RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v retrieving revision 1.10 diff -u -p -r1.10 rasops32.c --- rasops32.c 25 May 2020 09:55:49 - 1.10 +++ rasops32.c 26 Jun 2020 14:34:06 - @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri) int rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t attr) { - int width, height, cnt, fs, fb, clr[2]; + int width, height, step, cnt, fs, b, f; + uint32_t fb, clr[2]; struct rasops_info *ri; - int32_t *dp, *rp; + int64_t *rp, q; + union { + int64_t q[4]; + int32_t d[4][2]; + } u; u_char *fr; ri = (struct rasops_info *)cookie; @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, return 0; #endif - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); height = ri->ri_font->fontheight; width = ri->ri_font->fontwidth; + step = ri->ri_stride >> 3; - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf]; - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf]; + b = ri->ri_devcmap[(attr >> 16) & 0xf]; + f = ri->ri_devcmap[(attr >> 24) & 0xf]; + u.d[0][0] = b; u.d[0][1] = b; + u.d[1][0] = b; u.d[1][1] = f; + u.d[2][0] = f; u.d[2][1] = b; + u.d[3][0] = f; u.d[3][1] = f; if (uc == ' ') { + q = u.q[0]; while (height--) { - dp = rp; - DELTA(rp, ri->ri_stride, int32_t *); - - for (cnt = width; cnt; cnt--) - *dp++ = clr[0]; + /* the general, pixel-at-a-time case is fast enough */ + for (cnt = 0; cnt < width; cnt++) + ((int *)rp)[cnt] = b; + rp += step; } } else { uc -= ri->ri_font->firstchar; fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale; fs = ri->ri_font->stride; - - while (height--) { - dp = rp; - fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) | - (fr[0] << 24); - fr += fs; - DELTA(rp, ri->ri_stride, int32_t *); - - for (cnt = width; cnt; cnt--) { - *dp++ = clr[(fb >> 31) & 1]; - fb <<= 1; - } + /* double-pixel special cases for the common widths */ + switch (width) { + case 8: + while (height--) { + fb = fr[0]; + rp[0] = u.q[fb >> 6]; + rp[1] = u.q[(fb >> 4) & 3]; + rp[2] = u.q[(fb >> 2) & 3]; + rp[3] = u.q[fb & 3]; + rp += step; + fr += 1; + } + break; + + case 12: + while (height--) { + fb = fr[0]; + rp[0] = u.q[fb >> 6]; + rp[1] = u.q[(fb >> 4) & 3]; + rp[2] = u.q[(fb >> 2) & 3]; + rp[3] = u.q[fb & 3]; + fb = fr[1]; + rp[4] = u.q[fb >> 6]; + rp[5] = u.q[(fb >> 4) & 3]; + rp += step; + fr += 2; + } + break; + + case 16: + while (height--) { +
[PATCH] fast conditional console scrolling
This causes the write-only framebuffer console to only redraw the chars that differ between the start and end positions. 'time ls -R /usr/src/sys' is 3x faster with this, because most of the characters stay the same after a scroll. If this looks good, I can do the same thing for clear rows and copy/ clear columns, although I will need to make a test case for them. It would probably be a good idea to change the rasops interface to have generic block copy and clear oeprations, versus the current full-column / full-row interface, so tmux and friends could get the full acceleration. Index: rasops.c === RCS file: /cvs/src/sys/dev/rasops/rasops.c,v retrieving revision 1.61 diff -u -p -r1.61 rasops.c --- rasops.c25 May 2020 09:55:49 - 1.61 +++ rasops.c26 Jun 2020 04:14:13 - @@ -1627,28 +1627,42 @@ rasops_vcons_copyrows(void *cookie, int struct rasops_info *ri = scr->rs_ri; int cols = ri->ri_cols; int row, col, rc; + int srcofs; + int move; + /* update the scrollback buffer if the entire screen is moving */ if (dst == 0 && (src + num == ri->ri_rows) && scr->rs_sbscreens > 0) memmove(&scr->rs_bs[dst], &scr->rs_bs[src * cols], - ((ri->ri_rows * (scr->rs_sbscreens + 1) * cols) - - (src * cols)) * sizeof(struct wsdisplay_charcell)); - else + ri->ri_rows * scr->rs_sbscreens * cols + * sizeof(struct wsdisplay_charcell)); + + /* copy everything */ + if ((ri->ri_flg & RI_WRONLY) == 0 || !scr->rs_visible) { memmove(&scr->rs_bs[dst * cols + scr->rs_dispoffset], - &scr->rs_bs[src * cols + scr->rs_dispoffset], - num * cols * sizeof(struct wsdisplay_charcell)); + &scr->rs_bs[src * cols + scr->rs_dispoffset], + num * cols * sizeof(struct wsdisplay_charcell)); - if (!scr->rs_visible) - return 0; + if (!scr->rs_visible) + return 0; - if ((ri->ri_flg & RI_WRONLY) == 0) return ri->ri_copyrows(ri, src, dst, num); + } - for (row = dst; row < dst + num; row++) { + /* smart update, only redraw characters that are different */ + srcofs = (src - dst) * cols; + + for (move = 0 ; move < num ; move++) { + row = srcofs > 0 ? dst + move : dst + num - 1 - move; for (col = 0; col < cols; col++) { int off = row * cols + col + scr->rs_dispoffset; - - rc = ri->ri_putchar(ri, row, col, - scr->rs_bs[off].uc, scr->rs_bs[off].attr); + int newc = scr->rs_bs[off+srcofs].uc; + int newa = scr->rs_bs[off+srcofs].attr; + if ( scr->rs_bs[off].uc == newc + && scr->rs_bs[off].attr == newa ) + continue; + scr->rs_bs[off].uc = newc; + scr->rs_bs[off].attr = newa; + rc = ri->ri_putchar(ri, row, col, newc, newa); if (rc != 0) return rc; }
SSE in kernel?
Are SSE instructions allowed in the AMD64 kernel? Is #ifdef __SSE__ a sufficient guard? I have a rasops32 putchar with SSE that is 2x faster.
Re: [patch] remove NPCDISPLAY from AMD64
Ok, I didn't know such things existed. I would think that a coreboot framebuffer would be preferred over a VGA console, unless there was a system that only had the framebuffer in uncached-not-write-combined mapping. Original Message Subject: Re: [patch] remove NPCDISPLAY from AMD64 From: Jonathan Gray Date: Tue, June 16, 2020 9:39 pm To: jo...@armadilloaerospace.com Cc: "tech@openbsd.org" , j...@openbsd.org On Tue, Jun 16, 2020 at 07:15:24PM -0700, jo...@armadilloaerospace.com wrote: > You can't put an ISA CGA/EGA/MGA in an AMD64 system, so these can > go away. While it is incredibly unlikely someone would try, amd64 capable "industrial" motherboards with ISA exist. We don't build pcdisplay(4) on amd64 by default however. > > Does anyone know if there is an ordering reason that the coreboot > efifb_cb_cnattach console is after the VGA attach? Things could be > cleaned up a bit if the efifb entry points just checked for both > efi and coreboot framebuffers in the same function. https://marc.info/?l=openbsd-tech&m=146609528005661&w=2 "The cnattach path still has to be different to allow vga to try to attach before doing the coreboot version" It seems the coreboot framebuffer support was intended to be a last resort if vga and efi gop were not available going by the initial mail when it was proposed as a different driver: https://marc.info/?l=openbsd-tech&m=14655876693&w=2 > > > Index: wscons_machdep.c > === > RCS file: /cvs/src/sys/arch/amd64/amd64/wscons_machdep.c,v > retrieving revision 1.14 > diff -u -p -r1.14 wscons_machdep.c > --- wscons_machdep.c 14 Oct 2017 04:44:43 - 1.14 > +++ wscons_machdep.c 17 Jun 2020 02:05:34 - > @@ -35,18 +35,12 @@ > #include > > #include "vga.h" > -#include "pcdisplay.h" > -#if (NVGA > 0) || (NPCDISPLAY > 0) > +#if (NVGA > 0) > #include > #include > -#if (NVGA > 0) > #include > #include > #endif > -#if (NPCDISPLAY > 0) > -#include > -#endif > -#endif > > #include "wsdisplay.h" > #if NWSDISPLAY > 0 > @@ -146,10 +140,6 @@ wscn_video_init(void) > #endif > #if (NEFIFB > 0) > if (efifb_cb_cnattach() == 0) > - return (0); > -#endif > -#if (NPCDISPLAY > 0) > - if (pcdisplay_cnattach(X86_BUS_SPACE_IO, X86_BUS_SPACE_MEM) == 0) > return (0); > #endif > return (-1); > > >
[patch] remove NPCDISPLAY from AMD64
You can't put an ISA CGA/EGA/MGA in an AMD64 system, so these can go away. Does anyone know if there is an ordering reason that the coreboot efifb_cb_cnattach console is after the VGA attach? Things could be cleaned up a bit if the efifb entry points just checked for both efi and coreboot framebuffers in the same function. Index: wscons_machdep.c === RCS file: /cvs/src/sys/arch/amd64/amd64/wscons_machdep.c,v retrieving revision 1.14 diff -u -p -r1.14 wscons_machdep.c --- wscons_machdep.c14 Oct 2017 04:44:43 - 1.14 +++ wscons_machdep.c17 Jun 2020 02:05:34 - @@ -35,18 +35,12 @@ #include #include "vga.h" -#include "pcdisplay.h" -#if (NVGA > 0) || (NPCDISPLAY > 0) +#if (NVGA > 0) #include #include -#if (NVGA > 0) #include #include #endif -#if (NPCDISPLAY > 0) -#include -#endif -#endif #include "wsdisplay.h" #if NWSDISPLAY > 0 @@ -146,10 +140,6 @@ wscn_video_init(void) #endif #if (NEFIFB > 0) if (efifb_cb_cnattach() == 0) - return (0); -#endif -#if (NPCDISPLAY > 0) - if (pcdisplay_cnattach(X86_BUS_SPACE_IO, X86_BUS_SPACE_MEM) == 0) return (0); #endif return (-1);
Good CoreBoot system?
What would be a good system with CoreBoot to get for OpenBSD testing? It looks like there are some edge cases where the CoreBoot framebuffer support won't behave the same as the EFI framebuffer.
code style questions
Looking for some guidance to avoid proposing any unpopular diffs. Style(9) says not to use static on file-local functions in the kernel, because it interferes with the debugger. They still show up on some functions today; is this still an issue? I usually advocate for directly inlining small functions that are only called from one place, because it removes any doubt about whether it is or ever should be called elsewhere. For instance, these functions: vaddr_t efifb_early_map(paddr_t pa) { return pmap_set_pml4_early(pa); } void efifb_early_cleanup(void) { pmap_clear_pml4_early(); } Is that frowned on? I know I am on the longer-functions side of the spectrum. Also, style(9) says that prototypes should not have variable names associated with the types. I try to use good names in the headers for documentation purposes; what is the thinking behind the rule? int pcdisplay_copycols(void *id, int row, int srccol, int dstcol, int ncols); vs: int pcdisplay_copycols(void *, int, int, int,int);
Shrink efifb.c code
Most of the display drivers wrap the rasops functions, but you can get rid of a lot of code by swapping the cookies and just using the rasops functions directly in wsdisplay_accessops. It is unfortunate that an extra parameter was added to the alloc_screen function, aparently for an obscure driver, so rasops_alloc_screen can't be directly used. If this looks good, I can make similar changes in the intel, amd, and radeon display drivers, which would then justify adding a rasops_alloc_screen2 function with a dummy parameter to replace the wrapper in all four cases. This also consolidates the four separate calls to rasops_init, also fixing the fact that one of them did not include the RI_CENTER flag. Index: efifb.c === RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v retrieving revision 1.32 diff -u -p -r1.32 efifb.c --- efifb.c 28 May 2020 20:26:25 - 1.32 +++ efifb.c 10 Jun 2020 00:08:30 - @@ -92,18 +92,11 @@ struct efifb_softc { int efifb_match(struct device *, void *, void *); voidefifb_attach(struct device *, struct device *, void *); -voidefifb_rasops_preinit(struct efifb *); +voidefifb_rasops_init(struct efifb *, int); int efifb_ioctl(void *, u_long, caddr_t, int, struct proc *); paddr_t efifb_mmap(void *, off_t, int); int efifb_alloc_screen(void *, const struct wsscreen_descr *, void **, int *, int *, uint32_t *); -voidefifb_free_screen(void *, void *); -int efifb_show_screen(void *, void *, int, void (*cb) (void *, int, int), - void *); -int efifb_getchar(void *, int, int, struct wsdisplay_charcell *); -int efifb_list_font(void *, struct wsdisplay_font *); -int efifb_load_font(void *, void *, struct wsdisplay_font *); -voidefifb_scrollback(void *, void *, int lines); voidefifb_efiinfo_init(struct efifb *); voidefifb_cnattach_common(void); vaddr_t efifb_early_map(paddr_t); @@ -132,12 +125,12 @@ struct wsdisplay_accessops efifb_accesso .ioctl = efifb_ioctl, .mmap = efifb_mmap, .alloc_screen = efifb_alloc_screen, - .free_screen = efifb_free_screen, - .show_screen = efifb_show_screen, - .getchar = efifb_getchar, - .load_font = efifb_load_font, - .list_font = efifb_list_font, - .scrollback = efifb_scrollback, + .free_screen = rasops_free_screen, + .show_screen = rasops_show_screen, + .getchar = rasops_getchar, + .load_font = rasops_load_font, + .list_font = rasops_list_font, + .scrollback = rasops_scrollback, }; struct cfdriver efifb_cd = { @@ -198,9 +191,7 @@ efifb_attach(struct device *parent, stru return; } ri->ri_bits = bus_space_vaddr(iot, ioh); - efifb_rasops_preinit(fb); - ri->ri_flg = RI_VCONS | RI_CENTER | RI_WRONLY; - rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH); + efifb_rasops_init(fb, RI_VCONS); efifb_std_descr.ncols = ri->ri_cols; efifb_std_descr.nrows = ri->ri_rows; efifb_std_descr.textops = &ri->ri_ops; @@ -218,22 +209,19 @@ efifb_attach(struct device *parent, stru ccol = ri->ri_ccol; crow = ri->ri_crow; - efifb_rasops_preinit(fb); - ri->ri_flg &= ~RI_CLEAR; - ri->ri_flg |= RI_VCONS | RI_WRONLY; - - rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH); + efifb_rasops_init(fb, RI_VCONS); ri->ri_ops.pack_attr(ri->ri_active, 0, 0, 0, &defattr); wsdisplay_cnattach(&efifb_std_descr, ri->ri_active, ccol, crow, defattr); } + ri->ri_hw = sc; memset(&aa, 0, sizeof(aa)); aa.console = console; aa.scrdata = &efifb_screen_list; aa.accessops = &efifb_accessops; - aa.accesscookie = sc; + aa.accesscookie = ri; aa.defaultscreens = 0; config_found_sm(self, &aa, wsemuldisplaydevprint, @@ -241,7 +229,7 @@ efifb_attach(struct device *parent, stru } void -efifb_rasops_preinit(struct efifb *fb) +efifb_rasops_init(struct efifb *fb, int flags) { #define bmnum(_x) (fls(_x) - ffs(_x) + 1) #define bmpos(_x) (ffs(_x) - 1) @@ -271,14 +259,16 @@ efifb_rasops_preinit(struct efifb *fb) ri->ri_bpos = bmpos(bios_efiinfo->fb_blue_mask); } ri->ri_bs = efifb_bs; + /* if reinitializing, it is important to not clear all the flags */ + ri->ri_flg &= ~RI_CLEAR; + ri->ri_flg |= flags | RI_CENTER | RI_WRONLY; + rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH); } int efifb_ioctl(void *v, u_long cmd, caddr_t data, int flag, struct proc *p) { - struct efifb_softc *sc = v; - struct efifb*fb = sc->sc_fb; - struct rasops_info *ri = &fb->rinfo; + struct rasops_info
Numeric config variables
I think I am missing some basic bit of configuration lore. I hacked my kernel to change the font size selected by rasops, but I would like to make a proper configuration interface for it, something like an integer that biased the default size calculation. It would be nice to be able to use /etc/wsconsctl.conf to set it, but screens can't be reconfigured after they are created (I want to fix that in the future), and it would be better to have it take effect on console startup, well before anything else gets loaded. It feels like what I want is an integer variable that can be set to a default in the kernel config file, then modified on an existing kernel with config(8), or temporarily overridden with a parameter from boot(8). I see how I can make a numeric option in the config file, but it just turns into a #define that can't be modified. Is there any facility like this?
ttymalloc(int baud)
Another trivial code readability comment: I see 24 references to this in the code, 20 of which pass 0, and 4 of which pass 1,000,000 as the parameter. Passing 0 defaults to a baud of 115200. The baud determines the qlen, which is 4096 for 115200 and 8192 for anything larger, so all it is doing now is selecting between two buffer sizes via magic number parameters. It would probably be appropriate today to just always use 8192 and not even take a parameter; it doesn't look like the selection between the two is done in a rigorous way. There is also a comment in umodem.c that has an outdated assumption: /* * These are the maximum number of bytes transferred per frame. * Buffers are this large to deal with high speed wireless devices. * Capped at 1024 as ttymalloc() is limited to this amount. */ #define UMODEMIBUFSIZE 1024 #define UMODEMOBUFSIZE 1024
[PATCH} efifb support for wsmoused + smaller fonts
The efifb driver behaves almost identically to the inteldrm driver for wscons, but did not implement the getchar() accessops, so wsmoused would fail at startup. Separately, I increased the maximum screen dimensions to 160x50 to allow the 12x24 font to be used on an 1920 monitor, which looks great! Unlike the intel driver, efifb has a static buffer of that size, which is a non-trivial amount of memory. Not setting the backing store to save the memory results in an ultra-slow console until the driver gets fully initialized, because it is doing scrolls by reading from write combined memory instead of the redrawing all the characters. I plan on changing that to a dynamic allocation once I am more comfortable with the reinitialization that goes on when autoconf gets around to the console display device, so feel free to punt on that if you want to wait. Index: efifb.c === RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v retrieving revision 1.27 diff -u -p -r1.27 efifb.c --- efifb.c 24 Jan 2020 05:27:31 - 1.27 +++ efifb.c 24 May 2020 23:32:47 - @@ -100,6 +100,7 @@ int efifb_alloc_screen(void *, const st voidefifb_free_screen(void *, void *); int efifb_show_screen(void *, void *, int, void (*cb) (void *, int, int), void *); +int efifb_getchar(void *, int, int, struct wsdisplay_charcell *); int efifb_list_font(void *, struct wsdisplay_font *); int efifb_load_font(void *, void *, struct wsdisplay_font *); voidefifb_scrollback(void *, void *, int lines); @@ -114,8 +115,9 @@ const struct cfattach efifb_ca = { sizeof(struct efifb_softc), efifb_match, efifb_attach, NULL }; -#defineEFIFB_WIDTH 100 -#defineEFIFB_HEIGHT31 +/* support a 12x24 font on a 1920x1200 screen */ +#defineEFIFB_WIDTH 160 +#defineEFIFB_HEIGHT50 struct wsscreen_descr efifb_std_descr = { "std" }; @@ -133,6 +135,7 @@ struct wsdisplay_accessops efifb_accesso .alloc_screen = efifb_alloc_screen, .free_screen = efifb_free_screen, .show_screen = efifb_show_screen, + .getchar = efifb_getchar, .load_font = efifb_load_font, .list_font = efifb_list_font, .scrollback = efifb_scrollback, @@ -371,6 +374,15 @@ efifb_show_screen(void *v, void *cookie, struct rasops_info *ri = &sc->sc_fb->rinfo; return rasops_show_screen(ri, cookie, waitok, cb, cb_arg); +} + +int +efifb_getchar(void *v, int row, int col, struct wsdisplay_charcell *cell) +{ + struct efifb_softc *sc = v; + struct rasops_info *ri = &sc->sc_fb->rinfo; + + return rasops_getchar(ri, row, col, cell); } int
[PATCH] wsconsctl mouse.scaling
Ulf pointed out that you could scale the mouse speed with the cryptic: wsconsctl mouse.param=0:401,1:401 Looking into it more, wsconsctl already has a floating point version of this that sets both X and Y values, but it only shows up and functions if you have a touchpad, not a mouse: wsconsctl mouse.tp.scaling=0.1 Adding another exception to the FLG_DEAD check allow this to work on the mouse, just the way I wanted. I added an aliased parameter name so you could just use "mouse.scaling" instead of specifying something named as a touchpad variable, which also makes it self documenting when the parms are listed. It would make sense to remove the tp.scaling name, since this is generic like the reverse_scrolling line, but there are probably some people with the tp.scaling line in their current configuration files that would need to make a change. I would also suggest that the rawmode and scale parameters be removed, since we don't want the USB level drivers doing (improper) scaling before wscons does the proper scaling on it, and seeing the old scale list in the parameter dump is confusing: mouse.scale=0,0,0,0,0,0,0 mouse.scaling=0.500 Adding a line about mouse.scaling in the man page would be nice, but having it show up in the parameter dump now may be sufficient. Index: mouse.c === RCS file: /cvs/src/sbin/wsconsctl/mouse.c,v retrieving revision 1.20 diff -u -p -r1.20 mouse.c --- mouse.c 19 Aug 2019 21:42:33 - 1.20 +++ mouse.c 20 May 2020 20:16:51 - @@ -54,6 +54,7 @@ struct field mouse_field_tab[] = { { "rawmode", &rawmode, FMT_UINT, FLG_MODIFY|FLG_INIT}, { "scale", &wmcoords, FMT_SCALE, FLG_MODIFY|FLG_INIT}, /* mouse and touchpad configuration (mousecfg): */ +{ "scaling", &cfg_scaling, FMT_CFG,FLG_NORDBACK }, { "reverse_scrolling", &cfg_revscroll, FMT_CFG,FLG_NORDBACK }, /* touchpad-specific options: */ { "tp.tapping",&cfg_tapping, FMT_CFG,FLG_NORDBACK }, @@ -95,6 +96,7 @@ mouse_init(int devfd, int devidx) { for (f = mouse_field_tab; f->name != NULL; f++) if (f->format == FMT_CFG) { if (f->valp != &cfg_param + && f->valp != &cfg_scaling && f->valp != &cfg_revscroll) f->flags |= FLG_DEAD; else
Re: Mouse movement speed
Ok, that works -- I could not tell what that parsing code did in wsconsctl, and I thought the scaling code in wscons only got applied to touchpads. In that case, how about just allowing wsconsctl to accept mouse.speed as a special floating point parameter that sets both the DX and DY scales appropriately? I'll make a patch to the code and man page, and add a comment about the debug interface format. The scaling code in the USB HUD driver is definitely dropping low order bits, and seems to be completely subsumed by the newer parm based code, so removing WSMOUSEIO_SRES still seems like a good idea. Original Message Subject: Re: Mouse movement speed From: Ulf Brosziewski Date: Mon, May 18, 2020 12:16 pm To: jo...@armadilloaerospace.com, "tech@openbsd.org" Hi John, you can slow down mice via wsconsctl, but this possibility is "hidden" in a debug and testing interface. You have seen the WSMOUSECFG_D*_SCALE parameters, two parameters in *.12 fixed-point format that determine the scaling of DX- and DY-deltas. You can read the values with $ wsconsctl mouse.param=0,1 and change them by issuing, for example, $ wsconsctl mouse.param=0:401,1:401 This should reduce the speed to 10% of its "raw" value. (You don't need the device number in the commands if they apply to wsmouse0.) wsmouse tracks remainders when scaling, and slow movements shouldn't cause noticeable rounding errors at this layer. The effects, however, will also depend on what the receiving drivers make of the coordinates. On 5/18/20 01:17, jo...@armadilloaerospace.com wrote: > I enabled wsmoused for console mouse support, but the cursor was > unusably fast. This is a high resolution, high update rate USB gaming > mouse, but it was off by well over an order of magnitude. > > I searched for a global mouse speed setting, but the only thing I > found was a "mouse.scale=0,0,0,0,0,0,0" in wsconsctl, and I couldn't > figure out how to use it even after reading all the source. > > There is an ioctl for mouse resolution, but it only applies to PS/2 > mice: > > /* Set resolution. Not applicable to all mouse types. */ > #define WSMOUSEIO_SRES _IOW('W', 33, u_int) > > The wsmousecfg values for WSMOUSECFG_DX_SCALE and WSMOUSECFG_DY_SCALE > are only used for touchpads, and are zero for mice. > > The USB mouse driver can apply scaling to mouse delta movements if it > is taken out of raw mode and the wsmouse_calibcoords are set to > reasonable values, but the comments don't seem to be correct: > > /* Set/get sample coordinates for calibration */ > #define WSMOUSE_CALIBCOORDS_MAX 16 > #define WSMOUSE_CALIBCOORDS_RESET -1 > struct wsmouse_calibcoords { > int minx, miny; /* minimum value of X/Y */ > int maxx, maxy; /* maximum value of X/Y */ > int swapxy; /* swap X/Y axis */ > int resx, resy; /* X/Y resolution */ > int samplelen; /* number of samples available or > WSMOUSE_CALIBCOORDS_RESET for raw mode */ > struct wsmouse_calibcoord { > int rawx, rawy; /* raw coordinate */ > int x, y; /* translated coordinate */ > } samples[WSMOUSE_CALIBCOORDS_MAX]; /* sample coordinates */ > }; > > No kernel code uses the samples[] array, and the samplelen field is > interpreted as raw mode when non-0, and scaled by the other values > when it is 0. > > This explains wsconsctl's mouse.rawmode and mouse.scale[7] values, > but I still couldn't set them, getting > WSMOUSEIO_SCALIBCOORDS: Invalid argument for everything I tried. > > If I called the ioctl with my own code, I could clear raw mode and > scale the mouse speed down by 10x. > > That slows the mouse down, but the behavior is terrible. > > The scaling applied in hidms_input() would be reasonable for an > absolute referenced touch screen, but it isn't for integer valued > relative motion, because information is permanently lost in the > scaling -- you can slowly move the mouse an arbitrary distance > without the pointer moving at all, as the deltas are rounded to > zero, and it is done per-axis, so slight angles turn into strictly > axial motion. > > Mouse scaling needs to maintain a fractional residual for each axis. > > The user visible behavior I want to see is: > > wsconsctl mouse.speed=0.6 > > This could overload the existing DX_SCALE value, but that may still > need to be used independently for touchpad configuration, so I > think it would be better to define a new WSMOUSECFG_SPEED > parameter with the same *.12 fixed point format as > WSMOUSECFG_DX_SCALE. > > If this sounds reasonable, I will go ahead and make a diff for the > kernel and wsconsctl to implement it that with proper residuals. > > For additional cleanup to make it net-negative-LoC, could everything > relating to struct wsmouse_calibcoords go away completely? It looks > like all the fields are now replicated as wsmousecfg
Wsmoused cursor tracking
With the changes I proposed in the "mouse movement speed" message, the text mode cursor can be slowed down to the point where it is controllable, but it still isn't ideal for a few reasons: It is still anisotropic, with vertical movement feeling faster than horizontal due to the aspect of the character cells. It would be at drastically different "screen crossing speeds" in X versus the console. They could both be addressed by having wscons track the cursor position in pixels, and divide by the font size whenever it needs to get a character cell location. For a framebuffer based console that would make the mouse feel identical in X and the console. A VGA text console would still be a few times faster than a high res X screen, but it would be 8x closer than it is today. The drm framebuffer drivers might even choose to use a pixel addressable cursor in text modes. Even with that change, the console cursor would still feel strange relative to a conventional graphics cursor. Presumably in an attempt to mitigate the crazy-fast movement speed, wsmoused applies a dubious scaling to the mouse deltas: two_power = 1; dy = abs(event->value); while (dy > 2) { two_power++; dy = dy / 2; } event->value = event->value / (NORMALIZE_DIVISOR * two_power); This is sort of an inverse-acceleration, where the faster you move, the more it divides the value down. This has the problem of truncating fractional residuals and scaling the axis unequally, but worse, it makes the speeds not even monotonically increasing. So, the cursor movements are too fast, anisotropic in steady speeds, lose residuals, and have a non-monotonic, axial-biased speed profile. I am very confident that removing the event normalization code in wsmoused and doing the pixel based tracking in wscons will make a much better cursor in general, but it is also likely that there will be some users that have made themselves completely comfortable with the current idiosyncratic behavior, and will be put off by any changes. They could get the raw speed back by setting mouse.speed=8, but if they switch back and forth to X it still won't be "the way it was". If it was necessary to have a compatibility behavior flag, I would rather see it in wsmoused than as a kernel mouse parameter.
Mouse movement speed
I enabled wsmoused for console mouse support, but the cursor was unusably fast. This is a high resolution, high update rate USB gaming mouse, but it was off by well over an order of magnitude. I searched for a global mouse speed setting, but the only thing I found was a "mouse.scale=0,0,0,0,0,0,0" in wsconsctl, and I couldn't figure out how to use it even after reading all the source. There is an ioctl for mouse resolution, but it only applies to PS/2 mice: /* Set resolution. Not applicable to all mouse types. */ #define WSMOUSEIO_SRES _IOW('W', 33, u_int) The wsmousecfg values for WSMOUSECFG_DX_SCALE and WSMOUSECFG_DY_SCALE are only used for touchpads, and are zero for mice. The USB mouse driver can apply scaling to mouse delta movements if it is taken out of raw mode and the wsmouse_calibcoords are set to reasonable values, but the comments don't seem to be correct: /* Set/get sample coordinates for calibration */ #define WSMOUSE_CALIBCOORDS_MAX 16 #define WSMOUSE_CALIBCOORDS_RESET -1 struct wsmouse_calibcoords { int minx, miny; /* minimum value of X/Y */ int maxx, maxy; /* maximum value of X/Y */ int swapxy; /* swap X/Y axis */ int resx, resy; /* X/Y resolution */ int samplelen; /* number of samples available or WSMOUSE_CALIBCOORDS_RESET for raw mode */ struct wsmouse_calibcoord { int rawx, rawy; /* raw coordinate */ int x, y; /* translated coordinate */ } samples[WSMOUSE_CALIBCOORDS_MAX]; /* sample coordinates */ }; No kernel code uses the samples[] array, and the samplelen field is interpreted as raw mode when non-0, and scaled by the other values when it is 0. This explains wsconsctl's mouse.rawmode and mouse.scale[7] values, but I still couldn't set them, getting WSMOUSEIO_SCALIBCOORDS: Invalid argument for everything I tried. If I called the ioctl with my own code, I could clear raw mode and scale the mouse speed down by 10x. That slows the mouse down, but the behavior is terrible. The scaling applied in hidms_input() would be reasonable for an absolute referenced touch screen, but it isn't for integer valued relative motion, because information is permanently lost in the scaling -- you can slowly move the mouse an arbitrary distance without the pointer moving at all, as the deltas are rounded to zero, and it is done per-axis, so slight angles turn into strictly axial motion. Mouse scaling needs to maintain a fractional residual for each axis. The user visible behavior I want to see is: wsconsctl mouse.speed=0.6 This could overload the existing DX_SCALE value, but that may still need to be used independently for touchpad configuration, so I think it would be better to define a new WSMOUSECFG_SPEED parameter with the same *.12 fixed point format as WSMOUSECFG_DX_SCALE. If this sounds reasonable, I will go ahead and make a diff for the kernel and wsconsctl to implement it that with proper residuals. For additional cleanup to make it net-negative-LoC, could everything relating to struct wsmouse_calibcoords go away completely? It looks like all the fields are now replicated as wsmousecfg parms and handled at the wsmouse level for the touchpad case, so doing it at the usb device or other level is redundant, as well as being broken for relative moves. It would be nice to use parms for everything, also getting rid of WSMOUSEIO_GTYPE and WSMOUSEIO_SRES.
Code changes for clarity
(Not sure if this belongs in tech@ or misc@) What is the thinking around non-functional code changes that just improve clarity without functionality changes? I can imagine bad experiences with that, but there is certainly room for improvement. For instance, in the wsdisplay_emulops structure, there are: int (*alloc_attr)(void *c, int fg, int bg, int flags, long *attrp); void(*unpack_attr)(void *c, long attr, int *fg, int *bg, int *ul); And at the end of the structure is this comment, showing that at least someone (other than me) was confused by it: /* XXX need a free_attr() ??? */ I would suggest that alloc_attr should be renamed to pack_attr, but there are 84 matches on alloc_attr across 39 unique files. Similarly, I am taking notes as I read the code, and adding a few comments in various files and functions could help other people get up to speed much faster in the future, but I am hesitant to submit patches. I am hoping to write a decent article about my explorations, but inline comments are more durable documentation.
Improving extended VGA text modes
The process of using 40/50 row vga text modes is pretty cumbersome now: wsfontload -h 8 -e ibm /usr/share/misc/pcvtfonts/vt220l.808 wsconscfg -dF 1 wsconscfg -t 80x50 1 I just reported a bug and a fix with font corruption, but I want to propose a few usability improvements that I am interested in hacking on. Not caring about extended VGA text modes is a perfectly reasonable position today, so wave me off if this wouldn't be desirable. Including a default 8x8 and 8x10 font in the kernel to avoid the need to wsfontload would be nice, and allow it to be used for early boot. Is 4.5k too much for a fringe feature? 2k for just the 8x8? Wsconscfg could allow the -dF options at the same time as the type and emul options, allowing an existing screen to be deleted and reconfigured in a single step. Better yet would be to allow the screen type to be reconfigured on the fly without deleting it at all. If that is done, then wscons could support ctrl-alt-+/- to dynamically change text modes. In the modern world, it would be really nice to have similar ability to change the fonts used by the framebuffer based consoles to get denser text modes. Wsconsctl can report the display.fontwidth/height, but can't change it. Use of different spleen fonts could be defined as a "type" like the VGA modes and take advantage of the other changes.