[patch] traceroute timeouts

2021-08-20 Thread johnc
The default traceroute timeout of 5 seconds is excruciatingly long
when there are elements of the route that don't respond, and it
wasn't allowed to be set lower than 2 seconds.

This changes the minimum to 1 second, matching FreeBSD, and also
makes that the default, which should be reasonable for the vast
majority of users today.

The two awk files in this directory are two decades old, and
not installed anywhere they can be executed as part of a traceroute
pipeline; can they be removed? If the functionality is useful,
implementing mean/median reporting as a new option in C would be
straightforward.

Index: usr.sbin/traceroute/traceroute.8
===
RCS file: /cvs/src/usr.sbin/traceroute/traceroute.8,v
retrieving revision 1.69
diff -u -p -u -r1.69 traceroute.8
--- usr.sbin/traceroute/traceroute.811 Feb 2020 18:41:39 -  1.69
+++ usr.sbin/traceroute/traceroute.820 Aug 2021 06:33:30 -
@@ -201,7 +201,7 @@ and
 are listed.
 .It Fl w Ar waittime
 Set the time, in seconds, to wait for a response to a probe.
-The default is 5.
+The default is 1.
 .It Fl x
 Print the ICMP extended headers if available.
 This option is not available for IPv6.
Index: usr.sbin/traceroute/traceroute.c
===
RCS file: /cvs/src/usr.sbin/traceroute/traceroute.c,v
retrieving revision 1.164
diff -u -p -u -r1.164 traceroute.c
--- usr.sbin/traceroute/traceroute.c12 Jul 2021 15:09:21 -  1.164
+++ usr.sbin/traceroute/traceroute.c20 Aug 2021 06:33:30 -
@@ -351,7 +351,7 @@ main(int argc, char *argv[])
rcvsock4 = rcvsock6 = sndsock4 = sndsock6 = -1;
v4sock_errno = v6sock_errno = 0;
 
-   conf->waittime = 5 * 1000;
+   conf->waittime = 1000;
 
if ((rcvsock6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == -1)
v6sock_errno = errno;
@@ -554,9 +554,9 @@ main(int argc, char *argv[])
err(1, "setsockopt SO_RTABLE");
break;
case 'w':
-   conf->waittime = strtonum(optarg, 2, INT_MAX, &errstr);
+   conf->waittime = strtonum(optarg, 1, INT_MAX, &errstr);
if (errstr)
-   errx(1, "wait must be >1 sec.");
+   errx(1, "wait must be >=1 sec.");
conf->waittime *= 1000;
break;
case 'x':




Re: Poison file names

2020-12-11 Thread johnc
I would like to be able to clone the github mirror on windows.  I do
wind up
using 7z on the tar file as a workaround, but it would be nice if github
"just worked".  The com files is what the clone fails on, and those
seemed
easy enough to address, but if it is actually a deep rat hole, I
certainly
understand.

 Original Message 
Subject: Re: Poison file names
From: Daniel Dickman 
Date: Fri, December 11, 2020 9:42 am
To: jo...@armadilloaerospace.com, "tech@openbsd.org" 

On Wed, Dec 9, 2020 at 4:50 AM Jonathan Gray  wrote:
>
> On Tue, Dec 08, 2020 at 11:36:37PM -0700, jo...@armadilloaerospace.com wrote:
> > The game battlestar has source files names com1.c through com7.c, which
> > are illegal on windows due to ancient dos com port rules.
> >
> > I understand there might not be much sympathy for that, but being able
> > to have the full source tree on a windows system can be convenient, and
> > it should be a painless change.
> >
> > Any chance someone could do that rename in the repository and change
> > the Makefile?
>
> NetBSD renamed them all to command for that reason.
>
> https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
>
> usr.sbin/pkg_add/pod/OpenBSD::*.pod would also be a problem due to colons
>

Hi John, maybe you can say more how you're getting the files to a
windows system? If we rename the battlestar files, and given it won't
completely solve things for you as jsg@ points out, does it make any
difference for you?

I do this myself too from time to time by extracting the src.tar.gz
using 7z (https://www.7-zip.org/). That tool seems to automatically
rename the comX.c files to _comX.c. And for the pod files it seems to
automatically replace the invalid ":" characters with "_".

The only annoyance I have left is that there are a small number of
remaining conflicts from upper/lower case on unix vs windows.

Bottom line, for me extracting with 7z and just pressing
"automatically rename" on the first conflict seems to work as all
those conflicting filenames are for things I really don't look at on
my windows box.

But maybe you could tell us more if changing battlestar would
partially improve things for you? Either way, I don't object to
syncing battlestar with netbsd's 2001 commit.



Poison file names

2020-12-09 Thread johnc
The game battlestar has source files names com1.c through com7.c, which
are illegal on windows due to ancient dos com port rules.

I understand there might not be much sympathy for that, but being able
to have the full source tree on a windows system can be convenient, and
it should be a painless change.

Any chance someone could do that rename in the repository and change
the Makefile?




PATCH: better error return for exFAT filesystem

2020-08-07 Thread johnc
Perform an explicit check for the unsupported exFAT MSDOS filesystem
instead of letting it fail mysteriously when it gets cluster sizes
of 0 from the normal fields.

This causes mount_msdos to report:
mount_msdos: /dev/sd1i on /root/mnt: filesystem not supported by kernel

Instead of the more obscure:
mount_msdos: /dev/sd1i on /root/mnt: Inapropriate file type or format

Index: msdosfs_vfsops.c
===
RCS file: /cvs/src/sys/msdosfs/msdosfs_vfsops.c,v
retrieving revision 1.93
diff -u -p -r1.93 msdosfs_vfsops.c
--- msdosfs_vfsops.c24 Jan 2020 03:49:34 -  1.93
+++ msdosfs_vfsops.c7 Aug 2020 19:52:04 -
@@ -298,6 +298,12 @@ msdosfs_mountfs(struct vnode *devvp, str
b50 = (struct byte_bpb50 *)bsp->bs50.bsBPB;
b710 = (struct byte_bpb710 *)bsp->bs710.bsBPB;
 
+   /* No support for exFAT filesystems */
+   if (!strncmp("EXFAT", bsp->bs33.bsOemName, 5)) {
+   error = EOPNOTSUPP;
+   goto error_exit;
+   }
+
pmp = malloc(sizeof *pmp, M_MSDOSFSMNT, M_WAITOK | M_ZERO);
pmp->pm_mountp = mp;
 




PATCH: iostat spacing

2020-08-07 Thread johnc
IO rates above 100 MB/s are common with SSD; this patch expands the
column so it stays neatly printed.

An argument can be made for expanding it one more for fast M.2 drives.

? dkstats.d
? iostat
? iostat.d
Index: iostat.c
===
RCS file: /cvs/src/usr.sbin/iostat/iostat.c,v
retrieving revision 1.42
diff -u -p -r1.42 iostat.c
--- iostat.c14 Oct 2019 19:22:17 -  1.42
+++ iostat.c7 Aug 2020 19:00:45 -
@@ -227,7 +227,7 @@ header(void)
if (ISSET(todo, SHOW_TOTALS))
printf(" %18.18s ", cur.dk_name[i]);
else
-   printf(" %16.16s ", cur.dk_name[i]);
+   printf(" %17.17s ", cur.dk_name[i]);
}
if (ISSET(todo, SHOW_STATS_2))
for (i = 0; i < dk_ndrive; i++)
@@ -252,7 +252,7 @@ header(void)
if (ISSET(todo, SHOW_TOTALS))
printf("  KB/t   xfr MB ");
else
-   printf("  KB/t  t/s  MB/s ");
+   printf("  KB/t  t/s   MB/s ");
}
if (ISSET(todo, SHOW_STATS_2))
for (i = 0; i < dk_ndrive; i++)
@@ -299,10 +299,7 @@ disk_stats(double etime)
(double)(1024 * 1024);
else
mbps = 0;
-   if (ISSET(todo, SHOW_TOTALS))
-   printf(" %6.2f ", mbps / etime);
-   else
-   printf(" %5.2f ", mbps / etime);
+   printf(" %6.2f ", mbps / etime);
}
 }
 




exFAT support

2020-08-06 Thread johnc
I tried to mount a 12TB USB drive, and was getting an "Inappropriate
file type or format" error.

It turned out to be due to exFAT formatting, but it took me some
investigating to figure that out.  Would it be reasonable to have the
kernel print a more informative warning like "exFAT filesystem not
supported" when you try to mount it with mount_msdos, or are additional
kernel prints considered bad form?

With Microsoft's release of the spec last year, is the path open for
kernel support now, when someone gets around to it?

I have seen some comments about the reliability of exfat-fuse, will I
be sad if I try to use it for something important?




Proposal: integrate wsmoused into kernel

2020-07-14 Thread johnc
There are several user visible "warts" with wsmoused:

Wsmoused conflicts with X's use of the mouse, you can't ctrl-alt-screen
between X and text consoles and use the mouse in both of them.

The cursor location is in character cells instead of pixels, so the
cursor moves twice as fast vertically as horizontally, and you need to
scale down the mouse to get a reasonable speed.  Wsmoused makes an
attempt to address this by setting MSMOUSEIO_SRES to WSMOUSE_RES_MIN,
but that only does anything on PS/2 mice.  There is also a really
strange (not even monotonically increasing or directionally
invariant!) coordinate scaling function in normalize_event().

Wsmoused is enabled with rcctl and configuration options are set with
wsmoused_flags= in /etc/rc.local instead of wsconsctl / wsconsctl.conf.


I propose:

Wsdisplay would have a mouse device associated with it as keyboards are
associated, with "just works" defaults and wsconsctl options.  Console
cursor work would be automatically supressed when X is active.

Cursor position would be tracked in pixels.  If no additional scaling
is done in X, the cursor would move at the same speed across the
graphics or text console screens.  The possibility is also then open
to draw a pixel addressed bitmap cursor on any of the rasops consoles,
which is more user friendly than a text block cursor for both
disambiguation from the typing cursor, and avoidance of the unknown
block boundary jitter problem.

This would also save a couple thousand lines of code, remove a "moving
part" from the system, and allow additional improvements like drag
selection into the scrollback to be implemented.

This would lose the support wsmoused has for some ancient serial mouse
protocols parsed in user space.  I would argue that if those mice are
actually important, they should get a kernel driver.  However, I
suspect that the intersection of wsmoused users with 25 year old mice
is small enough to neglect.


Sound ok to pursue?




Re: Use VGA text mode palette RGB values in rasops(9)

2020-07-07 Thread johnc
I have also noticed that comments are hard to read in color-vim
because of the dark blue on black; increasing the intensity would
be a trivial improvement.

I have thought about testing a bold attribute implementation by
way of shift-and-overlay of the font bitmap so all the colors could
be full intensity.  The effect would be resolution dependent, but
probably superior for at least the 12x24 and 16x32 fonts.  32x64
might need triple striking.  An actual bold font for each size
would be superior quality and speed at a space cost, of course.

Related, I looked through the xterm code to see how hard it would
be to use some of it directly in the wscons system for accurate
emulation, and it looked pretty challenging to me.  Fixing problems
piecemeal is probably easier than trying to reuse code.


 Original Message 
Subject: Re: Use VGA text mode palette RGB values in rasops(9)
From: "Theo de Raadt" 
Date: Tue, July 07, 2020 9:07 am
To: Frederic Cambus 
Cc: tech@openbsd.org, Jonathan Gray 

Frederic Cambus  wrote:

> One can test in frame buffer consoles by doing:
> 
> export TERM=wsvt25
> 
> And run either vim or colorls -G.
> 

Furthermore the situation is just beyond ridiculous.

On the one hand your diff is talking about minimizing the differences
between terminals, but your example immediately heads in the opposite
direction by SELECTING for divergent configuration -- which I suspect
noone except you uses.

This termcap mess should never have happened. Up until about 10 years
ago, pointless and accidental divergence in console emulators ended up
being reflected in termcap entries *which to a large degree noone used,
except for the people making the termcap entries*, and the situation is
so retarded, because is ENTRENCHES differences rather than recognizing
the differences shouldn't have existed.

The console emulators SHOULD HAVE become more mundane and less
featureful, they should have become closer to a clean blend of vt220 +
xterm. But no! People focused on extremely narrow details and hid them
behind TERM feature flags, and avoided focusing on the big picture of
improving ease of utilization and compatibility.

It is very sad.



Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I did some more tests, and I think the odd performance I am seeing
may be due to TLB thrash on the 32x64 characters with 4k pages,
since writing each character will require 64 data TLB.

Are huge page mappings supported in OpenBSD?

 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 1:30 pm
To: 
Cc: tech@openbsd.org

> Content-Type: text/plain; charset="utf-8"
> From: 
> 
> I was doing my timings with a user mode program after mmmaping the
> efifb display, so the mapping might be different in the kernel.

That should still give you a write-combining mapping as efifb_mmap()
adds the PMAP_WC flag to the physical address.

Cachable on x86 means write-back cachable. And using a write-back
cachable mapping for a framebuffer often leads to interesting "damage"
where pixels in certain cache lines show up "late" on the display.
Not sure if you'd see that on recent Intel graphics hardware as the
current hardware designs are much more coherent than what was produced
in the past.

> Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
> WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
> rasops. While the point of the DRM drivers is to get fully hardware
> accelerated drawing in X, there isn't any reason why they can't
> support dumb framebuffer mappings as well.

True. Although there are DRM interfaces that give you a dumb
framebuffer as well. Using those interfaces is a bit more complicated
though.

Centralising the code would be good. That code probably should use
bus_space_mmap(4) as the PMAP_WC flag is amd64-specific.
Unfortunately the amd64 implementation of bus_space_mmap(4) is
incomplete and doesn't actually implement write-combining for mappings
with the BUS_SPACE_MAP_PREFETCHABLE flag set. So that has to be fixed
as well.

>  Original Message 
> Subject: RE: [PATCH} Optimized rasops32 putchar
> From: 
> Date: Sat, June 27, 2020 11:13 am
> To: "Mark Kettenis" 
> Cc: "tech@openbsd.org" 
> 
> I believe it is mapped as normally cached right now, rather than
> uncached or write combining.
> 
> Reads aren't ultra-slow, and the timings of 48 byte writes appear to
> involve a cacheline read.
> 
> 128 byte writes are actually slower than 64 byte writes, which I
> guessed might be because of automatic prefetching kicking in and
> reading the following cacheline.
> 
> 
>  Original Message 
> Subject: Re: [PATCH} Optimized rasops32 putchar
> From: Mark Kettenis 
> Date: Sat, June 27, 2020 7:56 am
> To: 
> Cc: tech@openbsd.org
> 
> > From: 
> > Date: Fri, 26 Jun 2020 07:42:50 -0700
> > 
> > Optimized 32 bit character rendering with unrolled rows and pairwise
> > foreground / background pixel rendering.
> > 
> > If it weren't for the 5x8 font, I would have just assumed everything
> > was an even width and made the fallback path also pairwise.
> > 
> > In isolation, the 16x32 character case got 2x faster, but that wasn't
> > a huge real world speedup where the space rendering that was already
> > at memory bandwidth limits accounted for most of the character
> > rendering time. However, in combination with the previous fast
> > conditional console scrolling that removes most of the space rendering,
> > it becomes significant.
> > 
> > I also found that at least the efi and intel framebuffers are not
> > currently mapped write combining, which makes this much slower than
> > it should be.
> 
> Hi John,
> 
> The framebuffer should be mapped write-combining. In OpenBSD this is
> requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
> bbus_space_map(9) when mapping the framebuffer.
> 
> I'm fairly confident since until last January the initial mapping of
> the framebuffer that we used wasn't write-combining. And things were
> really, really slow.
> 
> Cheers,
> 
> Mark
> 
> > Index: rasops32.c
> > ===
> > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 rasops32.c
> > --- rasops32.c 25 May 2020 09:55:49 - 1.10
> > +++ rasops32.c 26 Jun 2020 14:34:06 -
> > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> > int
> > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> > attr)
> > {
> > - int width, height, cnt, fs, fb, clr[2];
> > + int width, height, step, cnt, fs, b, f;
> > + uint32_t fb, clr[2];
> > struct rasops_info *ri;
> > - int32_t *dp, *rp;
> > + int64_t *rp, q;
> > + union {
> > + int64_t q[4];
> > + int32_t d[4][2];
> > + } u;
> > u_char *fr;
> > 
> > ri = (struct rasops_info *)cookie;
> > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> > return 0;
> > #endif
> > 
> > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > 
> > height = ri->ri_font->fontheight;
> > width = ri->ri_font->fontwidth;
> > + step = ri->ri_

Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I was doing my timings with a user mode program after mmmaping the
efifb display, so the mapping might be different in the kernel.

Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
rasops. While the point of the DRM drivers is to get fully hardware
accelerated drawing in X, there isn't any reason why they can't
support dumb framebuffer mappings as well.


 Original Message 
Subject: RE: [PATCH} Optimized rasops32 putchar
From: 
Date: Sat, June 27, 2020 11:13 am
To: "Mark Kettenis" 
Cc: "tech@openbsd.org" 

I believe it is mapped as normally cached right now, rather than
uncached or write combining.

Reads aren't ultra-slow, and the timings of 48 byte writes appear to
involve a cacheline read.

128 byte writes are actually slower than 64 byte writes, which I
guessed might be because of automatic prefetching kicking in and
reading the following cacheline.


 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 7:56 am
To: 
Cc: tech@openbsd.org

> From: 
> Date: Fri, 26 Jun 2020 07:42:50 -0700
> 
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
> 
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
> 
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
> 
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining. In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bbus_space_map(9) when mapping the framebuffer.

I'm fairly confident since until last January the initial mapping of
the framebuffer that we used wasn't write-combining. And things were
really, really slow.

Cheers,

Mark

> Index: rasops32.c
> ===
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c 25 May 2020 09:55:49 - 1.10
> +++ rasops32.c 26 Jun 2020 14:34:06 -
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> int
> rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> attr)
> {
> - int width, height, cnt, fs, fb, clr[2];
> + int width, height, step, cnt, fs, b, f;
> + uint32_t fb, clr[2];
> struct rasops_info *ri;
> - int32_t *dp, *rp;
> + int64_t *rp, q;
> + union {
> + int64_t q[4];
> + int32_t d[4][2];
> + } u;
> u_char *fr;
> 
> ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> return 0;
> #endif
> 
> - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> 
> height = ri->ri_font->fontheight;
> width = ri->ri_font->fontwidth;
> + step = ri->ri_stride >> 3;
> 
> - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> + u.d[0][0] = b; u.d[0][1] = b;
> + u.d[1][0] = b; u.d[1][1] = f;
> + u.d[2][0] = f; u.d[2][1] = b;
> + u.d[3][0] = f; u.d[3][1] = f;
> 
> if (uc == ' ') {
> + q = u.q[0];
> while (height--) {
> - dp = rp;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--)
> - *dp++ = clr[0];
> + /* the general, pixel-at-a-time case is fast enough */
> + for (cnt = 0; cnt < width; cnt++)
> + ((int *)rp)[cnt] = b;
> + rp += step;
> }
> } else {
> uc -= ri->ri_font->firstchar;
> fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
> fs = ri->ri_font->stride;
> -
> - while (height--) {
> - dp = rp;
> - fb = fr[3] | (fr[2] << 8) | (fr[1] ><< 16) |
> - (fr[0] << 24);
> - fr += fs;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--) {
> - *dp++ = clr[(fb >> 31) & 1];
> - fb <<= 1;
> - }
> + /* double-pixel special cases for the common widths */
> + switch (width) {
> + case 8:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + rp += step;
> + fr += 1;
> + }
> + break;
> + 
> + case 12:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp += step;
> + fr += 2;

Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I believe it is mapped as normally cached right now, rather than
uncached or write combining.

Reads aren't ultra-slow, and the timings of 48 byte writes appear to
involve a cacheline read.

128 byte writes are actually slower than 64 byte writes, which I
guessed might be because of automatic prefetching kicking in and
reading the following cacheline.


 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 7:56 am
To: 
Cc: tech@openbsd.org

> From: 
> Date: Fri, 26 Jun 2020 07:42:50 -0700
> 
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
> 
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
> 
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
> 
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining. In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bbus_space_map(9) when mapping the framebuffer.

I'm fairly confident since until last January the initial mapping of
the framebuffer that we used wasn't write-combining. And things were
really, really slow.

Cheers,

Mark

> Index: rasops32.c
> ===
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c 25 May 2020 09:55:49 - 1.10
> +++ rasops32.c 26 Jun 2020 14:34:06 -
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> int
> rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> attr)
> {
> - int width, height, cnt, fs, fb, clr[2];
> + int width, height, step, cnt, fs, b, f;
> + uint32_t fb, clr[2];
> struct rasops_info *ri;
> - int32_t *dp, *rp;
> + int64_t *rp, q;
> + union {
> + int64_t q[4];
> + int32_t d[4][2];
> + } u;
> u_char *fr;
> 
> ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> return 0;
> #endif
> 
> - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> 
> height = ri->ri_font->fontheight;
> width = ri->ri_font->fontwidth;
> + step = ri->ri_stride >> 3;
> 
> - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> + u.d[0][0] = b; u.d[0][1] = b;
> + u.d[1][0] = b; u.d[1][1] = f;
> + u.d[2][0] = f; u.d[2][1] = b;
> + u.d[3][0] = f; u.d[3][1] = f;
> 
> if (uc == ' ') {
> + q = u.q[0];
> while (height--) {
> - dp = rp;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--)
> - *dp++ = clr[0];
> + /* the general, pixel-at-a-time case is fast enough */
> + for (cnt = 0; cnt < width; cnt++)
> + ((int *)rp)[cnt] = b;
> + rp += step;
> }
> } else {
> uc -= ri->ri_font->firstchar;
> fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
> fs = ri->ri_font->stride;
> -
> - while (height--) {
> - dp = rp;
> - fb = fr[3] | (fr[2] << 8) | (fr[1] ><< 16) |
> - (fr[0] << 24);
> - fr += fs;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--) {
> - *dp++ = clr[(fb >> 31) & 1];
> - fb <<= 1;
> - }
> + /* double-pixel special cases for the common widths */
> + switch (width) {
> + case 8:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + rp += step;
> + fr += 1;
> + }
> + break;
> + 
> + case 12:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp += step;
> + fr += 2;
> + }
> + break;
> + 
> + case 16:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp[6] = u.q[(fb >> 2) & 3];
> + rp[7] = u.q[fb & 3];
> + rp += step;
> + fr += 2;
> + }
> + break; 
> + case 32:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> 

Re: [PATCH] fast conditional console scrolling

2020-06-26 Thread johnc
I should have been more rigorous -- I had two different changes running
on my system, as well as forcing it to use the 12x24 font for a 160x45
console.

If you apply the "Optimized rasops32 putchar" patch I just posted, you
should see another significant speedup.


 Original Message 
Subject: Re: [PATCH] fast conditional console scrolling
From: Paul de Weerd 
Date: Fri, June 26, 2020 1:23 am
To: jo...@armadilloaerospace.com
Cc: "tech@openbsd.org" 

Hi John,

I tried your diff. I don't quite see the same 3x improvement that you
report, more like 2x. I timed 7 runs of ls -R /usr/ports:

Before diff, time ls -R /usr/ports | wc -l 2.897s on average
After diff, time ls -R /usr/ports | wc -l 2.707s on average

Before diff, time ls -R /usr/ports 2m53.067 on average
After diff, time ls -R /usr/ports 1m30.387 on average

Note that the 'before diff' runs were with a snapshot kernel. There
may be diffs in there that account for the difference between before
and after of the no-output runs. See dmesg and full stats below.

So, on average, a speed-up of ~48%.

Thanks!

Paul 'WEiRD' de Weerd




[PATCH} Optimized rasops32 putchar

2020-06-26 Thread johnc
Optimized 32 bit character rendering with unrolled rows and pairwise
foreground / background pixel rendering.

If it weren't for the 5x8 font, I would have just assumed everything
was an even width and made the fallback path also pairwise.

In isolation, the 16x32 character case got 2x faster, but that wasn't
a huge real world speedup where the space rendering that was already
at memory bandwidth limits accounted for most of the character
rendering time.  However, in combination with the previous fast
conditional console scrolling that removes most of the space rendering,
it becomes significant.

I also found that at least the efi and intel framebuffers are not
currently mapped write combining, which makes this much slower than
it should be.


Index: rasops32.c
===
RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
retrieving revision 1.10
diff -u -p -r1.10 rasops32.c
--- rasops32.c  25 May 2020 09:55:49 -  1.10
+++ rasops32.c  26 Jun 2020 14:34:06 -
@@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
 int
 rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
attr)
 {
-   int width, height, cnt, fs, fb, clr[2];
+   int width, height, step, cnt, fs, b, f;
+   uint32_t fb, clr[2];
struct rasops_info *ri;
-   int32_t *dp, *rp;
+   int64_t *rp, q;
+   union {
+   int64_t q[4];
+   int32_t d[4][2];
+   } u;
u_char *fr;
 
ri = (struct rasops_info *)cookie;
@@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
return 0;
 #endif
 
-   rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
+   rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
 
height = ri->ri_font->fontheight;
width = ri->ri_font->fontwidth;
+   step = ri->ri_stride >> 3;
 
-   clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
-   clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
+   b = ri->ri_devcmap[(attr >> 16) & 0xf];
+   f = ri->ri_devcmap[(attr >> 24) & 0xf];
+   u.d[0][0] = b; u.d[0][1] = b;
+   u.d[1][0] = b; u.d[1][1] = f;
+   u.d[2][0] = f; u.d[2][1] = b;
+   u.d[3][0] = f; u.d[3][1] = f;
 
if (uc == ' ') {
+   q = u.q[0];
while (height--) {
-   dp = rp;
-   DELTA(rp, ri->ri_stride, int32_t *);
-
-   for (cnt = width; cnt; cnt--)
-   *dp++ = clr[0];
+   /* the general, pixel-at-a-time case is fast enough */
+   for (cnt = 0; cnt < width; cnt++)
+   ((int *)rp)[cnt] = b;
+   rp += step;
}
} else {
uc -= ri->ri_font->firstchar;
fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
fs = ri->ri_font->stride;
-
-   while (height--) {
-   dp = rp;
-   fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) |
-   (fr[0] << 24);
-   fr += fs;
-   DELTA(rp, ri->ri_stride, int32_t *);
-
-   for (cnt = width; cnt; cnt--) {
-   *dp++ = clr[(fb >> 31) & 1];
-   fb <<= 1;
-   }
+   /* double-pixel special cases for the common widths */
+   switch (width) {
+   case 8:
+   while (height--) {
+   fb = fr[0];
+   rp[0] = u.q[fb >> 6];
+   rp[1] = u.q[(fb >> 4) & 3];
+   rp[2] = u.q[(fb >> 2) & 3];
+   rp[3] = u.q[fb & 3];
+   rp += step;
+   fr += 1;
+   }
+   break;
+   
+   case 12:
+   while (height--) {
+   fb = fr[0];
+   rp[0] = u.q[fb >> 6];
+   rp[1] = u.q[(fb >> 4) & 3];
+   rp[2] = u.q[(fb >> 2) & 3];
+   rp[3] = u.q[fb & 3];
+   fb = fr[1];
+   rp[4] = u.q[fb >> 6];
+   rp[5] = u.q[(fb >> 4) & 3];
+   rp += step;
+   fr += 2;
+   }
+   break;
+   
+   case 16:
+   while (height--) {
+  

[PATCH] fast conditional console scrolling

2020-06-25 Thread johnc
This causes the write-only framebuffer console to only redraw the
chars that differ between the start and end positions.

'time ls -R /usr/src/sys' is 3x faster with this, because most of
the characters stay the same after a scroll.

If this looks good, I can do the same thing for clear rows and copy/
clear columns, although I will need to make a test case for them.

It would probably be a good idea to change the rasops interface to
have generic block copy and clear oeprations, versus the current
full-column / full-row interface, so tmux and friends could get the
full acceleration.

Index: rasops.c
===
RCS file: /cvs/src/sys/dev/rasops/rasops.c,v
retrieving revision 1.61
diff -u -p -r1.61 rasops.c
--- rasops.c25 May 2020 09:55:49 -  1.61
+++ rasops.c26 Jun 2020 04:14:13 -
@@ -1627,28 +1627,42 @@ rasops_vcons_copyrows(void *cookie, int 
struct rasops_info *ri = scr->rs_ri;
int cols = ri->ri_cols;
int row, col, rc;
+   int srcofs;
+   int move;
 
+   /* update the scrollback buffer if the entire screen is moving */
if (dst == 0 && (src + num == ri->ri_rows) && scr->rs_sbscreens > 0)
memmove(&scr->rs_bs[dst], &scr->rs_bs[src * cols],
-   ((ri->ri_rows * (scr->rs_sbscreens + 1) * cols) -
-   (src * cols)) * sizeof(struct wsdisplay_charcell));
-   else
+   ri->ri_rows * scr->rs_sbscreens * cols
+   * sizeof(struct wsdisplay_charcell));
+
+   /* copy everything */
+   if ((ri->ri_flg & RI_WRONLY) == 0 || !scr->rs_visible) {
memmove(&scr->rs_bs[dst * cols + scr->rs_dispoffset],
-   &scr->rs_bs[src * cols + scr->rs_dispoffset],
-   num * cols * sizeof(struct wsdisplay_charcell));
+   &scr->rs_bs[src * cols + scr->rs_dispoffset],
+   num * cols * sizeof(struct wsdisplay_charcell));
 
-   if (!scr->rs_visible)
-   return 0;
+   if (!scr->rs_visible)
+   return 0;
 
-   if ((ri->ri_flg & RI_WRONLY) == 0)
return ri->ri_copyrows(ri, src, dst, num);
+   }
 
-   for (row = dst; row < dst + num; row++) {
+   /* smart update, only redraw characters that are different */
+   srcofs = (src - dst) * cols;
+
+   for (move = 0 ; move < num ; move++) {
+   row = srcofs > 0 ? dst + move : dst + num - 1 - move;
for (col = 0; col < cols; col++) {
int off = row * cols + col + scr->rs_dispoffset;
-
-   rc = ri->ri_putchar(ri, row, col,
-   scr->rs_bs[off].uc, scr->rs_bs[off].attr);
+   int newc = scr->rs_bs[off+srcofs].uc;
+   int newa = scr->rs_bs[off+srcofs].attr;
+   if ( scr->rs_bs[off].uc == newc 
+   && scr->rs_bs[off].attr == newa )
+   continue;
+   scr->rs_bs[off].uc = newc;
+   scr->rs_bs[off].attr = newa;
+   rc = ri->ri_putchar(ri, row, col, newc, newa);
if (rc != 0)
return rc;
}




SSE in kernel?

2020-06-22 Thread johnc
Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
a sufficient guard?

I have a rasops32 putchar with SSE that is 2x faster.




Re: [patch] remove NPCDISPLAY from AMD64

2020-06-16 Thread johnc
Ok, I didn't know such things existed.

I would think that a coreboot framebuffer would be preferred over a
VGA console, unless there was a system that only had the framebuffer
in uncached-not-write-combined mapping.

 Original Message 
Subject: Re: [patch] remove NPCDISPLAY from AMD64
From: Jonathan Gray 
Date: Tue, June 16, 2020 9:39 pm
To: jo...@armadilloaerospace.com
Cc: "tech@openbsd.org" , j...@openbsd.org

On Tue, Jun 16, 2020 at 07:15:24PM -0700, jo...@armadilloaerospace.com
wrote:
> You can't put an ISA CGA/EGA/MGA in an AMD64 system, so these can
> go away.

While it is incredibly unlikely someone would try, amd64 capable
"industrial" motherboards with ISA exist. We don't build
pcdisplay(4) on amd64 by default however.

> 
> Does anyone know if there is an ordering reason that the coreboot
> efifb_cb_cnattach console is after the VGA attach? Things could be
> cleaned up a bit if the efifb entry points just checked for both
> efi and coreboot framebuffers in the same function.

https://marc.info/?l=openbsd-tech&m=146609528005661&w=2

"The cnattach path still has to be different to allow vga to
try to attach before doing the coreboot version"

It seems the coreboot framebuffer support was intended to be a last
resort if vga and efi gop were not available going by the initial mail
when it was proposed as a different driver:

https://marc.info/?l=openbsd-tech&m=14655876693&w=2

> 
> 
> Index: wscons_machdep.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/wscons_machdep.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 wscons_machdep.c
> --- wscons_machdep.c 14 Oct 2017 04:44:43 - 1.14
> +++ wscons_machdep.c 17 Jun 2020 02:05:34 -
> @@ -35,18 +35,12 @@
> #include 
> 
> #include "vga.h"
> -#include "pcdisplay.h"
> -#if (NVGA > 0) || (NPCDISPLAY > 0)
> +#if (NVGA > 0)
> #include 
> #include 
> -#if (NVGA > 0)
> #include 
> #include 
> #endif
> -#if (NPCDISPLAY > 0)
> -#include 
> -#endif
> -#endif
> 
> #include "wsdisplay.h"
> #if NWSDISPLAY > 0
> @@ -146,10 +140,6 @@ wscn_video_init(void)
> #endif
> #if (NEFIFB > 0)
> if (efifb_cb_cnattach() == 0)
> - return (0);
> -#endif
> -#if (NPCDISPLAY > 0)
> - if (pcdisplay_cnattach(X86_BUS_SPACE_IO, X86_BUS_SPACE_MEM) == 0)
> return (0);
> #endif
> return (-1);
> 
> 
>



[patch] remove NPCDISPLAY from AMD64

2020-06-16 Thread johnc
You can't put an ISA CGA/EGA/MGA in an AMD64 system, so these can
go away.

Does anyone know if there is an ordering reason that the coreboot
efifb_cb_cnattach console is after the VGA attach? Things could be
cleaned up a bit if the efifb entry points just checked for both
efi and coreboot framebuffers in the same function.


Index: wscons_machdep.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/wscons_machdep.c,v
retrieving revision 1.14
diff -u -p -r1.14 wscons_machdep.c
--- wscons_machdep.c14 Oct 2017 04:44:43 -  1.14
+++ wscons_machdep.c17 Jun 2020 02:05:34 -
@@ -35,18 +35,12 @@
 #include 
 
 #include "vga.h"
-#include "pcdisplay.h"
-#if (NVGA > 0) || (NPCDISPLAY > 0)
+#if (NVGA > 0)
 #include 
 #include 
-#if (NVGA > 0)
 #include 
 #include 
 #endif
-#if (NPCDISPLAY > 0)
-#include 
-#endif
-#endif
 
 #include "wsdisplay.h"
 #if NWSDISPLAY > 0
@@ -146,10 +140,6 @@ wscn_video_init(void)
 #endif
 #if (NEFIFB > 0)
if (efifb_cb_cnattach() == 0)
-   return (0);
-#endif
-#if (NPCDISPLAY > 0)
-   if (pcdisplay_cnattach(X86_BUS_SPACE_IO, X86_BUS_SPACE_MEM) == 0)
return (0);
 #endif
return (-1);




Good CoreBoot system?

2020-06-13 Thread johnc
What would be a good system with CoreBoot to get for OpenBSD testing?

It looks like there are some edge cases where the CoreBoot framebuffer
support won't behave the same as the EFI framebuffer.




code style questions

2020-06-09 Thread johnc
Looking for some guidance to avoid proposing any unpopular diffs.

Style(9) says not to use static on file-local functions in the
kernel, because it interferes with the debugger.  They still show up
on some functions today; is this still an issue?

I usually advocate for directly inlining small functions that are only
called from one place, because it removes any doubt about whether it
is or ever should be called elsewhere.  For instance, these functions:

vaddr_t
efifb_early_map(paddr_t pa)
{
return pmap_set_pml4_early(pa);
}

void
efifb_early_cleanup(void)
{
pmap_clear_pml4_early();
}

Is that frowned on? I know I am on the longer-functions side of
the spectrum.

Also, style(9) says that prototypes should not have variable names
associated with the types.  I try to use good names in the headers
for documentation purposes; what is the thinking behind the rule?

int pcdisplay_copycols(void *id, int row, int srccol, int dstcol, int
ncols);
vs:
int pcdisplay_copycols(void *, int, int, int,int);





Shrink efifb.c code

2020-06-09 Thread johnc
Most of the display drivers wrap the rasops functions, but you can
get rid of a lot of code by swapping the cookies and just using
the rasops functions directly in wsdisplay_accessops.

It is unfortunate that an extra parameter was added to the
alloc_screen function, aparently for an obscure driver, so
rasops_alloc_screen can't be directly used.

If this looks good, I can make similar changes in the intel, amd,
and radeon display drivers, which would then justify adding a
rasops_alloc_screen2 function with a dummy parameter to replace
the wrapper in all four cases.

This also consolidates the four separate calls to rasops_init, also
fixing the fact that one of them did not include the RI_CENTER flag.

Index: efifb.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v
retrieving revision 1.32
diff -u -p -r1.32 efifb.c
--- efifb.c 28 May 2020 20:26:25 -  1.32
+++ efifb.c 10 Jun 2020 00:08:30 -
@@ -92,18 +92,11 @@ struct efifb_softc {
 
 int efifb_match(struct device *, void *, void *);
 voidefifb_attach(struct device *, struct device *, void *);
-voidefifb_rasops_preinit(struct efifb *);
+voidefifb_rasops_init(struct efifb *, int);
 int efifb_ioctl(void *, u_long, caddr_t, int, struct proc *);
 paddr_t efifb_mmap(void *, off_t, int);
 int efifb_alloc_screen(void *, const struct wsscreen_descr *, void **,
int *, int *, uint32_t *);
-voidefifb_free_screen(void *, void *);
-int efifb_show_screen(void *, void *, int, void (*cb) (void *, int,
int),
-   void *);
-int efifb_getchar(void *, int, int, struct wsdisplay_charcell *);
-int efifb_list_font(void *, struct wsdisplay_font *);
-int efifb_load_font(void *, void *, struct wsdisplay_font *);
-voidefifb_scrollback(void *, void *, int lines);
 voidefifb_efiinfo_init(struct efifb *);
 voidefifb_cnattach_common(void);
 vaddr_t efifb_early_map(paddr_t);
@@ -132,12 +125,12 @@ struct wsdisplay_accessops efifb_accesso
.ioctl = efifb_ioctl,
.mmap = efifb_mmap,
.alloc_screen = efifb_alloc_screen,
-   .free_screen = efifb_free_screen,
-   .show_screen = efifb_show_screen,
-   .getchar = efifb_getchar,
-   .load_font = efifb_load_font,
-   .list_font = efifb_list_font,
-   .scrollback = efifb_scrollback,
+   .free_screen = rasops_free_screen,
+   .show_screen = rasops_show_screen,
+   .getchar = rasops_getchar,
+   .load_font = rasops_load_font,
+   .list_font = rasops_list_font,
+   .scrollback = rasops_scrollback,
 };
 
 struct cfdriver efifb_cd = {
@@ -198,9 +191,7 @@ efifb_attach(struct device *parent, stru
return;
}
ri->ri_bits = bus_space_vaddr(iot, ioh);
-   efifb_rasops_preinit(fb);
-   ri->ri_flg = RI_VCONS | RI_CENTER | RI_WRONLY;
-   rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH);
+   efifb_rasops_init(fb, RI_VCONS);
efifb_std_descr.ncols = ri->ri_cols;
efifb_std_descr.nrows = ri->ri_rows;
efifb_std_descr.textops = &ri->ri_ops;
@@ -218,22 +209,19 @@ efifb_attach(struct device *parent, stru
ccol = ri->ri_ccol;
crow = ri->ri_crow;
 
-   efifb_rasops_preinit(fb);
-   ri->ri_flg &= ~RI_CLEAR;
-   ri->ri_flg |= RI_VCONS | RI_WRONLY;
-
-   rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH);
+   efifb_rasops_init(fb, RI_VCONS);
 
ri->ri_ops.pack_attr(ri->ri_active, 0, 0, 0, &defattr);
wsdisplay_cnattach(&efifb_std_descr, ri->ri_active, ccol, crow,
defattr);
}
 
+   ri->ri_hw = sc;
memset(&aa, 0, sizeof(aa));
aa.console = console;
aa.scrdata = &efifb_screen_list;
aa.accessops = &efifb_accessops;
-   aa.accesscookie = sc;
+   aa.accesscookie = ri;
aa.defaultscreens = 0;
 
config_found_sm(self, &aa, wsemuldisplaydevprint,
@@ -241,7 +229,7 @@ efifb_attach(struct device *parent, stru
 }
 
 void
-efifb_rasops_preinit(struct efifb *fb)
+efifb_rasops_init(struct efifb *fb, int flags)
 {
 #define bmnum(_x) (fls(_x) - ffs(_x) + 1)
 #define bmpos(_x) (ffs(_x) - 1)
@@ -271,14 +259,16 @@ efifb_rasops_preinit(struct efifb *fb)
ri->ri_bpos = bmpos(bios_efiinfo->fb_blue_mask);
}
ri->ri_bs = efifb_bs;
+   /* if reinitializing, it is important to not clear all the flags */
+   ri->ri_flg &= ~RI_CLEAR;
+   ri->ri_flg |= flags | RI_CENTER | RI_WRONLY;
+   rasops_init(ri, EFIFB_HEIGHT, EFIFB_WIDTH);
 }
 
 int
 efifb_ioctl(void *v, u_long cmd, caddr_t data, int flag, struct proc
*p)
 {
-   struct efifb_softc  *sc = v;
-   struct efifb*fb = sc->sc_fb;
-   struct rasops_info  *ri = &fb->rinfo;
+   struct rasops_info 

Numeric config variables

2020-06-05 Thread johnc
I think I am missing some basic bit of configuration lore.

I hacked my kernel to change the font size selected by rasops, but I
would like to make a proper configuration interface for it, something
like an integer that biased the default size calculation.

It would be nice to be able to use /etc/wsconsctl.conf to set it, but
screens can't be reconfigured after they are created (I want to fix
that in the future), and it would be better to  have it take effect on
console startup, well before anything else gets loaded.

It feels like what I want is an integer variable that can be set to a
default in the kernel config file, then modified on an existing kernel
with config(8), or temporarily overridden with a parameter from
boot(8).

I see how I can make a numeric option in the config file, but it just
turns into a #define that can't be modified.

Is there any facility like this?




ttymalloc(int baud)

2020-05-26 Thread johnc
Another trivial code readability comment:

I see 24 references to this in the code, 20 of which pass 0, and 4 of
which pass 1,000,000 as the parameter.

Passing 0 defaults to a baud of 115200.  The baud determines the qlen,
which is 4096 for 115200 and 8192 for anything larger, so all it is
doing now is selecting between two buffer sizes via magic number
parameters.

It would probably be appropriate today to just always use 8192 and not
even take a parameter; it doesn't look like the selection between the
two is done in a rigorous way.

There is also a comment in umodem.c that has an outdated assumption:
/*
 * These are the maximum number of bytes transferred per frame.
 * Buffers are this large to deal with high speed wireless devices.
 * Capped at 1024 as ttymalloc() is limited to this amount.
 */
#define UMODEMIBUFSIZE 1024
#define UMODEMOBUFSIZE 1024





[PATCH} efifb support for wsmoused + smaller fonts

2020-05-24 Thread johnc
The efifb driver behaves almost identically to the inteldrm driver
for wscons, but did not implement the getchar() accessops, so
wsmoused would fail at startup.

Separately, I increased the maximum screen dimensions to 160x50 to
allow the 12x24 font to be used on an 1920 monitor, which looks great!

Unlike the intel driver, efifb has a static buffer of that size,
which is a non-trivial amount of memory.  Not setting the backing
store to save the memory results in an ultra-slow console until the
driver gets fully initialized, because it is doing scrolls by reading
from write combined memory instead of the redrawing all the
characters.

I plan on changing that to a dynamic allocation once I am more
comfortable with the reinitialization that goes on when autoconf
gets around to the console display device, so feel free to punt on
that if you want to wait.

Index: efifb.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/efifb.c,v
retrieving revision 1.27
diff -u -p -r1.27 efifb.c
--- efifb.c 24 Jan 2020 05:27:31 -  1.27
+++ efifb.c 24 May 2020 23:32:47 -
@@ -100,6 +100,7 @@ int  efifb_alloc_screen(void *, const st
 voidefifb_free_screen(void *, void *);
 int efifb_show_screen(void *, void *, int, void (*cb) (void *, int,
int),
void *);
+int efifb_getchar(void *, int, int, struct wsdisplay_charcell *);
 int efifb_list_font(void *, struct wsdisplay_font *);
 int efifb_load_font(void *, void *, struct wsdisplay_font *);
 voidefifb_scrollback(void *, void *, int lines);
@@ -114,8 +115,9 @@ const struct cfattach efifb_ca = {
sizeof(struct efifb_softc), efifb_match, efifb_attach, NULL
 };
 
-#defineEFIFB_WIDTH 100
-#defineEFIFB_HEIGHT31
+/* support a 12x24 font on a 1920x1200 screen */
+#defineEFIFB_WIDTH 160
+#defineEFIFB_HEIGHT50
 
 struct wsscreen_descr efifb_std_descr = { "std" };
 
@@ -133,6 +135,7 @@ struct wsdisplay_accessops efifb_accesso
.alloc_screen = efifb_alloc_screen,
.free_screen = efifb_free_screen,
.show_screen = efifb_show_screen,
+   .getchar = efifb_getchar,
.load_font = efifb_load_font,
.list_font = efifb_list_font,
.scrollback = efifb_scrollback,
@@ -371,6 +374,15 @@ efifb_show_screen(void *v, void *cookie,
struct rasops_info  *ri = &sc->sc_fb->rinfo;
 
return rasops_show_screen(ri, cookie, waitok, cb, cb_arg);
+}
+
+int
+efifb_getchar(void *v, int row, int col, struct wsdisplay_charcell
*cell)
+{
+   struct efifb_softc  *sc = v;
+   struct rasops_info  *ri = &sc->sc_fb->rinfo;
+
+   return rasops_getchar(ri, row, col, cell);
 }
 
 int




[PATCH] wsconsctl mouse.scaling

2020-05-20 Thread johnc
Ulf pointed out that you could scale the mouse speed with the cryptic:

wsconsctl mouse.param=0:401,1:401

Looking into it more, wsconsctl already has a floating point version
of this that sets both X and Y values, but it only shows up and
functions if you have a touchpad, not a mouse:

wsconsctl mouse.tp.scaling=0.1

Adding another exception to the FLG_DEAD check allow this to work on
the mouse, just the way I wanted.

I added an aliased parameter name so you could just use
"mouse.scaling" instead of specifying something named as a touchpad
variable, which also makes it self documenting when the parms are
listed.

It would make sense to remove the tp.scaling name, since this is
generic like the reverse_scrolling line, but there are probably some
people with the tp.scaling line in their current configuration files
that would need to make a change.

I would also suggest that the rawmode and scale parameters be
removed, since we don't want the USB level drivers doing (improper)
scaling before wscons does the proper scaling on it, and seeing the
old scale list in the parameter dump is confusing:

mouse.scale=0,0,0,0,0,0,0
mouse.scaling=0.500

Adding a line about mouse.scaling in the man page would be nice,
but having it show up in the parameter dump now may be sufficient.

Index: mouse.c
===
RCS file: /cvs/src/sbin/wsconsctl/mouse.c,v
retrieving revision 1.20
diff -u -p -r1.20 mouse.c
--- mouse.c 19 Aug 2019 21:42:33 -  1.20
+++ mouse.c 20 May 2020 20:16:51 -
@@ -54,6 +54,7 @@ struct field mouse_field_tab[] = {
 { "rawmode",   &rawmode,   FMT_UINT,   
FLG_MODIFY|FLG_INIT},
 { "scale", &wmcoords,  FMT_SCALE,  
FLG_MODIFY|FLG_INIT},
 /* mouse and touchpad configuration (mousecfg): */
+{ "scaling",   &cfg_scaling,   FMT_CFG,FLG_NORDBACK },
 { "reverse_scrolling", &cfg_revscroll, FMT_CFG,FLG_NORDBACK },
 /* touchpad-specific options: */
 { "tp.tapping",&cfg_tapping,   FMT_CFG,FLG_NORDBACK },
@@ -95,6 +96,7 @@ mouse_init(int devfd, int devidx) {
for (f = mouse_field_tab; f->name != NULL; f++)
if (f->format == FMT_CFG) {
if (f->valp != &cfg_param
+   && f->valp != &cfg_scaling
&& f->valp != &cfg_revscroll)
f->flags |= FLG_DEAD;
else




Re: Mouse movement speed

2020-05-18 Thread johnc
Ok, that works -- I could not tell what that parsing code did in
wsconsctl, and I thought the scaling code in wscons only got applied to
touchpads.
In that case, how about just allowing wsconsctl to accept mouse.speed as
a special floating point parameter that sets both the DX and DY scales
appropriately? I'll make a patch to the code and man page, and add a
comment about the debug interface format.
The scaling code in the USB HUD driver is definitely dropping low order
bits, and seems to be completely subsumed by the newer parm based code,
so removing WSMOUSEIO_SRES still seems like a good idea.

   Original Message 
  Subject: Re: Mouse movement speed
  From: Ulf Brosziewski 
  Date: Mon, May 18, 2020 12:16 pm
  To: jo...@armadilloaerospace.com, "tech@openbsd.org" 

  Hi John,

  you can slow down mice via wsconsctl, but this possibility is
  "hidden"
  in a debug and testing interface.

  You have seen the WSMOUSECFG_D*_SCALE parameters, two parameters in
  *.12 fixed-point format that determine the scaling of DX- and
  DY-deltas.
  You can read the values with

  $ wsconsctl mouse.param=0,1

  and change them by issuing, for example,

  $ wsconsctl mouse.param=0:401,1:401

  This should reduce the speed to 10% of its "raw" value.

  (You don't need the device number  in the commands if they apply
  to
  wsmouse0.)

  wsmouse tracks remainders when scaling, and slow movements shouldn't
  cause
  noticeable rounding errors at this layer. The effects, however, will
  also
  depend on what the receiving drivers make of the coordinates.


  On 5/18/20 01:17, jo...@armadilloaerospace.com wrote:
  > I enabled wsmoused for console mouse support, but the cursor was
  > unusably fast. This is a high resolution, high update rate USB
  gaming
  > mouse, but it was off by well over an order of magnitude.
  >
  > I searched for a global mouse speed setting, but the only thing I
  > found was a "mouse.scale=0,0,0,0,0,0,0" in wsconsctl, and I
  couldn't
  > figure out how to use it even after reading all the source.
  >
  > There is an ioctl for mouse resolution, but it only applies to PS/2
  > mice:
  >
  > /* Set resolution. Not applicable to all mouse types. */
  > #define WSMOUSEIO_SRES _IOW('W', 33, u_int)
  >
  > The wsmousecfg values for WSMOUSECFG_DX_SCALE and
  WSMOUSECFG_DY_SCALE
  > are only used for touchpads, and are zero for mice.
  >
  > The USB mouse driver can apply scaling to mouse delta movements if
  it
  > is taken out of raw mode and the wsmouse_calibcoords are set to
  > reasonable values, but the comments don't seem to be correct:
  >
  > /* Set/get sample coordinates for calibration */
  > #define WSMOUSE_CALIBCOORDS_MAX 16
  > #define WSMOUSE_CALIBCOORDS_RESET -1
  > struct wsmouse_calibcoords {
  > int minx, miny; /* minimum value of X/Y */
  > int maxx, maxy; /* maximum value of X/Y */
  > int swapxy; /* swap X/Y axis */
  > int resx, resy; /* X/Y resolution */
  > int samplelen; /* number of samples available or
  > WSMOUSE_CALIBCOORDS_RESET for raw mode */
  > struct wsmouse_calibcoord {
  > int rawx, rawy; /* raw coordinate */
  > int x, y; /* translated coordinate */
  > } samples[WSMOUSE_CALIBCOORDS_MAX]; /* sample coordinates */
  > };
  >
  > No kernel code uses the samples[] array, and the samplelen field is
  > interpreted as raw mode when non-0, and scaled by the other values
  > when it is 0.
  >
  > This explains wsconsctl's mouse.rawmode and mouse.scale[7] values,
  > but I still couldn't set them, getting
  > WSMOUSEIO_SCALIBCOORDS: Invalid argument for everything I tried.
  >
  > If I called the ioctl with my own code, I could clear raw mode and
  > scale the mouse speed down by 10x.
  >
  > That slows the mouse down, but the behavior is terrible.
  >
  > The scaling applied in hidms_input() would be reasonable for an
  > absolute referenced touch screen, but it isn't for integer valued
  > relative motion, because information is permanently lost in the
  > scaling -- you can slowly move the mouse an arbitrary distance
  > without the pointer moving at all, as the deltas are rounded to
  > zero, and it is done per-axis, so slight angles turn into strictly
  > axial motion.
  >
  > Mouse scaling needs to maintain a fractional residual for each
  axis.
  >
  > The user visible behavior I want to see is:
  >
  > wsconsctl mouse.speed=0.6
  >
  > This could overload the existing DX_SCALE value, but that may still
  > need to be used independently for touchpad configuration, so I
  > think it would be better to define a new WSMOUSECFG_SPEED
  > parameter with the same *.12 fixed point format as
  > WSMOUSECFG_DX_SCALE.
  >
  > If this sounds reasonable, I will go ahead and make a diff for the
  > kernel and wsconsctl to implement it that with proper residuals.
  >
  > For additional cleanup to make it net-negative-LoC, could
  everything
  > relating to struct wsmouse_calibcoords go away completely? It looks
  > like all the fields are now replicated as wsmousecfg 

Wsmoused cursor tracking

2020-05-17 Thread johnc
With the changes I proposed in the "mouse movement speed" message,
the text mode cursor can be slowed down to the point where it is
controllable, but it still isn't ideal for a few reasons:

It is still anisotropic, with vertical movement feeling faster than
horizontal due to the aspect of the character cells.

It would be at drastically different "screen crossing speeds" in X
versus the console.

They could both be addressed by having wscons track the cursor
position in pixels, and divide by the font size whenever it needs to
get a character cell location.  For a framebuffer based console that
would make the mouse feel identical in X and the console.  A VGA
text console would still be a few times faster than a high res X
screen, but it would be 8x closer than it is today.

The drm framebuffer drivers might even choose to use a pixel
addressable cursor in text modes.

Even with that change, the console cursor would still feel strange
relative to a conventional graphics cursor.

Presumably in an attempt to mitigate the crazy-fast movement speed,
wsmoused applies a dubious scaling to the mouse deltas:

two_power = 1;
dy = abs(event->value);
while (dy > 2) {
two_power++;
dy = dy / 2;
}
event->value = event->value / (NORMALIZE_DIVISOR * two_power);

This is sort of an inverse-acceleration, where the faster you move,
the more it divides the value down. This has the problem of truncating
fractional residuals and scaling the axis unequally, but worse, it
makes the speeds not even monotonically increasing.

So, the cursor movements are too fast, anisotropic in steady speeds,
lose residuals, and have a non-monotonic, axial-biased speed profile.

I am very confident that removing the event normalization code in
wsmoused and doing the pixel based tracking in wscons will make a much
better cursor in general, but it is also likely that there will be
some users that have made themselves completely comfortable with the
current idiosyncratic behavior, and will be put off by any changes.

They could get the raw speed back by setting mouse.speed=8, but if
they switch back and forth to X it still  won't be "the way it was".
If it was necessary to have a compatibility behavior flag, I would
rather see it in wsmoused than as a kernel mouse parameter.




Mouse movement speed

2020-05-17 Thread johnc
I enabled wsmoused for console mouse support, but the cursor was
unusably fast. This is a high resolution, high update rate USB gaming
mouse, but it was off by well over an order of magnitude.

I searched for a global mouse speed setting, but the only thing I
found was a "mouse.scale=0,0,0,0,0,0,0" in wsconsctl, and I couldn't
figure out how to use it even after reading all the source.

There is an ioctl for mouse resolution, but it only applies to PS/2
mice:

/* Set resolution. Not applicable to all mouse types. */
#define WSMOUSEIO_SRES _IOW('W', 33, u_int)

The wsmousecfg values for WSMOUSECFG_DX_SCALE and WSMOUSECFG_DY_SCALE
are only used for touchpads, and are zero for mice.

The USB mouse driver can apply scaling to mouse delta movements if it
is taken out of raw mode and the wsmouse_calibcoords are set to
reasonable values, but the comments don't seem to be correct:

/* Set/get sample coordinates for calibration */
#define WSMOUSE_CALIBCOORDS_MAX 16
#define WSMOUSE_CALIBCOORDS_RESET -1
struct wsmouse_calibcoords {
int minx, miny; /* minimum value of X/Y */
int maxx, maxy; /* maximum value of X/Y */
int swapxy; /* swap X/Y axis */
int resx, resy; /* X/Y resolution */
int samplelen; /* number of samples available or
WSMOUSE_CALIBCOORDS_RESET for raw mode */
struct wsmouse_calibcoord {
int rawx, rawy; /* raw coordinate */
int x, y; /* translated coordinate */
} samples[WSMOUSE_CALIBCOORDS_MAX]; /* sample coordinates */
};

No kernel code uses the samples[] array, and the samplelen field is
interpreted as raw mode when non-0, and scaled by the other values
when it is 0.

This explains wsconsctl's mouse.rawmode and mouse.scale[7] values,
but I still couldn't set them, getting
WSMOUSEIO_SCALIBCOORDS: Invalid argument for everything I tried.

If I called the ioctl with my own code, I could clear raw mode and
scale the mouse speed down by 10x.

That slows the mouse down, but the behavior is terrible.

The scaling applied in hidms_input() would be reasonable for an
absolute referenced touch screen, but it isn't for integer valued
relative motion, because information is permanently lost in the
scaling -- you can slowly move the mouse an arbitrary distance
without the pointer moving at all, as the deltas are rounded to
zero, and it is done per-axis, so slight angles turn into strictly
axial motion.

Mouse scaling needs to maintain a fractional residual for each axis.

The user visible behavior I want to see is:

wsconsctl mouse.speed=0.6

This could overload the existing DX_SCALE value, but that may still
need to be used independently for touchpad configuration, so I
think it would be better to define a new WSMOUSECFG_SPEED
parameter with the same *.12 fixed point format as
WSMOUSECFG_DX_SCALE.

If this sounds reasonable, I will go ahead and make a diff for the
kernel and wsconsctl to implement it that with proper residuals.

For additional cleanup to make it net-negative-LoC, could everything
relating to struct wsmouse_calibcoords go away completely? It looks
like all the fields are now replicated as wsmousecfg parms and
handled at the wsmouse level for the touchpad case, so doing it at
the usb device or other level is redundant, as well as being broken
for relative moves.

It would be nice to use parms for everything, also getting rid of
WSMOUSEIO_GTYPE and WSMOUSEIO_SRES.




Code changes for clarity

2020-05-16 Thread johnc
(Not sure if this belongs in tech@ or misc@)

What is the thinking around non-functional code changes that just
improve clarity without functionality changes?  I can imagine bad
experiences with that, but there is certainly room for improvement.

For instance, in the wsdisplay_emulops structure, there are:

int (*alloc_attr)(void *c, int fg, int bg, int flags, long *attrp);
void(*unpack_attr)(void *c, long attr, int *fg, int *bg, int *ul);

And at the end of the structure is this comment, showing that at
least someone (other than me) was confused by it:
/* XXX need a free_attr() ??? */

I would suggest that alloc_attr should be renamed to pack_attr, but
there are 84 matches on alloc_attr across 39 unique files.

Similarly, I am taking notes as I read the code, and adding a few
comments in various files and functions could help other people get
up to speed much faster in the future, but I am hesitant to submit
patches. I am hoping to write a decent article about my explorations,
but inline comments are more durable documentation.




Improving extended VGA text modes

2020-05-13 Thread johnc
The process of using 40/50 row vga text modes is pretty cumbersome now:

wsfontload -h 8 -e ibm /usr/share/misc/pcvtfonts/vt220l.808
wsconscfg -dF 1
wsconscfg -t 80x50 1

I just reported a bug and a fix with font corruption, but I want to
propose a
few usability improvements that I am interested in hacking on.

Not caring about extended VGA text modes is a perfectly reasonable
position today, so wave me off if this wouldn't be desirable.

Including a default 8x8 and 8x10 font in the kernel to avoid the need to
wsfontload would be nice, and allow it to be used for early boot. Is
4.5k
too much for a fringe feature?  2k for just the 8x8?

Wsconscfg could allow the -dF options at the same time as the type and
emul options, allowing an existing screen to be deleted and reconfigured
in
a single step.

Better yet would be to allow the screen type to be reconfigured on the
fly
without deleting it at all.

If that is done, then wscons could support ctrl-alt-+/- to dynamically
change
text modes.

In the modern world, it would be really nice to have similar ability to
change the fonts used by the framebuffer based consoles to get denser
text
modes.  Wsconsctl can report the display.fontwidth/height, but can't
change
it. Use of different spleen fonts could be defined as a "type" like the
VGA
modes and take advantage of the other changes.