Re: [PATCH] fast conditional console scrolling

2020-07-11 Thread Frederic Cambus
On Fri, Jul 10, 2020 at 03:26:16PM +0200, Frederic Cambus wrote:
> On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
> > I should have been more rigorous -- I had two different changes running
> > on my system, as well as forcing it to use the 12x24 font for a 160x45
> > console.
> > 
> > If you apply the "Optimized rasops32 putchar" patch I just posted, you
> > should see another significant speedup.
> 
> Leaving aside rasops32_putchar() optimizations for now, I tried this
> on radeondrm and simplefb (on armv7) with a 1920x1080 monitor and on
> top of what John and Paul are reporting, I'm also seeing improvements
> when cat'ing the text file [1] I usually use for rasops performance
> testing. It's up to 30% faster on both devices, which is significant.
> 
> The diff makes sense to me, and I think this should go in.
> 
> Anyone willing to OK this diff or to commit with my OK?

Committed with some minor style(9) formatting fixes pointed out by jcs@
offlist.

Thanks!



Re: [PATCH] fast conditional console scrolling

2020-07-10 Thread Frederic Cambus
On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
> I should have been more rigorous -- I had two different changes running
> on my system, as well as forcing it to use the 12x24 font for a 160x45
> console.
> 
> If you apply the "Optimized rasops32 putchar" patch I just posted, you
> should see another significant speedup.

Leaving aside rasops32_putchar() optimizations for now, I tried this
on radeondrm and simplefb (on armv7) with a 1920x1080 monitor and on
top of what John and Paul are reporting, I'm also seeing improvements
when cat'ing the text file [1] I usually use for rasops performance
testing. It's up to 30% faster on both devices, which is significant.

The diff makes sense to me, and I think this should go in.

Anyone willing to OK this diff or to commit with my OK?

[1] https://norvig.com/big.txt



Re: [PATCH] fast conditional console scrolling

2020-06-27 Thread Paul de Weerd
Hi John,

With both your diffs applied, I do indeed see closer to the 3x speed-up
on my machine.  The average over 7 runs of ls -R /usr/ports was
64.169s, which makes for just under a 3x improvement.  That's on
1920x1080 with the standard font size for that resolution (120x33
console, so 16x32 font).

Thanks again,

Paul 'WEiRD' de Weerd

On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
| I should have been more rigorous -- I had two different changes running
| on my system, as well as forcing it to use the 12x24 font for a 160x45
| console.
| 
| If you apply the "Optimized rasops32 putchar" patch I just posted, you
| should see another significant speedup.
| 
| 
| -------- Original Message --------
| Subject: Re: [PATCH] fast conditional console scrolling
| From: Paul de Weerd 
| Date: Fri, June 26, 2020 1:23 am
| To: jo...@armadilloaerospace.com
| Cc: "tech@openbsd.org" 
| 
| Hi John,
| 
| I tried your diff. I don't quite see the same 3x improvement that you
| report, more like 2x. I timed 7 runs of ls -R /usr/ports:
| 
| Before diff, time ls -R /usr/ports | wc -l 2.897s on average
| After diff, time ls -R /usr/ports | wc -l 2.707s on average
| 
| Before diff, time ls -R /usr/ports 2m53.067 on average
| After diff, time ls -R /usr/ports 1m30.387 on average
| 
| Note that the 'before diff' runs were with a snapshot kernel. There
| may be diffs in there that account for the difference between before
| and after of the no-output runs. See dmesg and full stats below.
| 
| So, on average, a speed-up of ~48%.
| 
| Thanks!
| 
| Paul 'WEiRD' de Weerd
| 
| 

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



Re: [PATCH] fast conditional console scrolling

2020-06-26 Thread johnc
I should have been more rigorous -- I had two different changes running
on my system, as well as forcing it to use the 12x24 font for a 160x45
console.

If you apply the "Optimized rasops32 putchar" patch I just posted, you
should see another significant speedup.


-------- Original Message --------
Subject: Re: [PATCH] fast conditional console scrolling
From: Paul de Weerd 
Date: Fri, June 26, 2020 1:23 am
To: jo...@armadilloaerospace.com
Cc: "tech@openbsd.org" 

Hi John,

I tried your diff. I don't quite see the same 3x improvement that you
report, more like 2x. I timed 7 runs of ls -R /usr/ports:

Before diff, time ls -R /usr/ports | wc -l 2.897s on average
After diff, time ls -R /usr/ports | wc -l 2.707s on average

Before diff, time ls -R /usr/ports 2m53.067 on average
After diff, time ls -R /usr/ports 1m30.387 on average

Note that the 'before diff' runs were with a snapshot kernel. There
may be diffs in there that account for the difference between before
and after of the no-output runs. See dmesg and full stats below.

So, on average, a speed-up of ~48%.

Thanks!

Paul 'WEiRD' de Weerd




Re: [PATCH] fast conditional console scrolling

2020-06-26 Thread Paul de Weerd
Hi John,

I tried your diff.  I don't quite see the same 3x improvement that you
report, more like 2x.  I timed 7 runs of ls -R /usr/ports:

Before diff, time ls -R /usr/ports | wc -l  2.897s on average
After diff,  time ls -R /usr/ports | wc -l  2.707s on average

Before diff, time ls -R /usr/ports  2m53.067 on average
After diff, time ls -R /usr/ports   1m30.387 on average

Note that the 'before diff' runs were with a snapshot kernel.  There
may be diffs in there that account for the difference between before
and after of the no-output runs.  See dmesg and full stats below.

So, on average, a speed-up of ~48%.
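
The "2x" and "~48%" figures above describe the same measurement in two ways. As a rough illustration (not part of the original message), the arithmetic can be sketched in C using the reported averages converted to seconds; the function names are hypothetical helpers made up for this sketch:

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of old to new wall-clock time: 2.0 means "twice as fast". */
static double
speedup_factor(double before_s, double after_s)
{
	assert(before_s > 0.0 && after_s > 0.0);
	return before_s / after_s;
}

/* Share of the original run time that was saved, in percent. */
static double
time_saved_pct(double before_s, double after_s)
{
	assert(before_s > 0.0 && after_s > 0.0);
	return 100.0 * (before_s - after_s) / before_s;
}

/* Print both metrics for one before/after pair of timings. */
static void
report(const char *label, double before_s, double after_s)
{
	printf("%-12s %.2fx faster, %.1f%% less time\n", label,
	    speedup_factor(before_s, after_s),
	    time_saved_pct(before_s, after_s));
}
```

Calling report("with output", 173.067, 90.387) with the averages quoted above (2m53.067 and 1m30.387) gives a factor of about 1.91 and a run-time reduction of about 47.8%, so "more like 2x" and "~48%" are consistent with each other.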

Thanks!

Paul 'WEiRD' de Weerd

--- full stats ---
pre-diff, no output            post-diff, no output
real     user     system       real     user     system
02.94    00.58    02.40        02.70    00.58    02.12
02.88    00.56    02.37        02.71    00.39    02.32
03.03    00.46    02.60        02.70    00.43    02.26
02.85    00.52    02.36        02.69    00.54    02.18
02.88    00.45    02.43        02.62    00.53    02.10
02.87    00.50    02.38        02.72    00.62    02.11
02.83    00.57    02.29        02.81    00.45    02.36

pre-diff, with output          post-diff, with output
real     user     system       real     user     system
2m53.17  00.90    2m52.27      1m30.81  01.23    1m29.50
2m53.12  00.81    2m52.31      1m30.58  01.33    1m29.30
2m53.01  00.88    2m52.11      1m30.49  01.11    1m29.40
2m53.06  01.03    2m52.00      1m30.53  01.29    1m29.26
2m52.99  00.80    2m52.24      1m30.27  01.08    1m29.19
2m53.11  00.96    2m52.16      1m30.40  01.14    1m29.27
2m53.01  00.79    2m52.28      1m30.33  01.11    1m29.24
--

--- dmesg ---
OpenBSD 6.7-current (GENERIC.MP) #296: Wed Jun 24 11:34:44 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34243903488 (32657MB)
avail mem = 33191059456 (31653MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec410 (88 entries)
bios0: vendor Dell Inc. version "A22" date 02/01/2018
bios0: Dell Inc. OptiPlex 9020
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT SLIC LPIT SSDT SSDT SSDT HPET SSDT MCFG SSDT 
ASF! DMAR
acpi0: wakeup devices UAR1(S3) RP01(S4) PXSX(S4) PXSX(S4) PXSX(S4) RP05(S4) 
PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S3) EHC2(S3) XHC_(S4) 
HDEF(S4) PEG0(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3692.06 MHz, 06-3c-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: TSC skew=0 observed drift=0
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: TSC skew=1 observed drift=0
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: TSC skew=12 observed drift=0
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu3: 

[PATCH] fast conditional console scrolling

2020-06-25 Thread johnc
This causes the write-only framebuffer console to only redraw the
chars that differ between the start and end positions.

'time ls -R /usr/src/sys' is 3x faster with this, because most of
the characters stay the same after a scroll.
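
To illustrate the idea outside the kernel, here is a minimal standalone sketch of a row copy that redraws only the cells that change. It is not the patch itself: struct cell, putchar_fn, copyrows_diff and count_draw are hypothetical names invented for this example, and the draw callback merely stands in for a driver's glyph rendering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one character cell of the backing store. */
struct cell {
	unsigned int uc;	/* character code */
	unsigned int attr;	/* colour/attribute bits */
};

/* Hypothetical draw callback; a real driver would render the glyph. */
typedef int (*putchar_fn)(void *ctx, int row, int col,
    unsigned int uc, unsigned int attr);

/*
 * Move `num` rows from `src` to `dst` in a cols-wide backing store,
 * calling `draw` only for cells whose contents actually change.
 * Returns 0 on success or the first non-zero value from `draw`.
 */
static int
copyrows_diff(struct cell *bs, int cols, int src, int dst, int num,
    putchar_fn draw, void *ctx)
{
	int srcofs = (src - dst) * cols;
	int move, row, col, rc;

	assert(bs != NULL && cols > 0 && num >= 0);

	for (move = 0; move < num; move++) {
		/* Pick the direction so that overlapping rows are
		 * read before they are overwritten. */
		row = srcofs > 0 ? dst + move : dst + num - 1 - move;
		for (col = 0; col < cols; col++) {
			struct cell *to = &bs[row * cols + col];
			const struct cell *from = to + srcofs;

			if (to->uc == from->uc && to->attr == from->attr)
				continue;	/* already on screen */
			*to = *from;
			rc = draw(ctx, row, col, to->uc, to->attr);
			if (rc != 0)
				return rc;
		}
	}
	return 0;
}

/* Example callback that only counts how many cells were redrawn. */
static int
count_draw(void *ctx, int row, int col, unsigned int uc,
    unsigned int attr)
{
	(void)row; (void)col; (void)uc; (void)attr;
	++*(int *)ctx;
	return 0;
}
```

On a 4-column screen holding the rows "ab  ", "ab c" and "xb c", scrolling up by one row (src 1, dst 0, num 2) triggers only two draw calls instead of eight, because six of the eight destination cells already show the right character.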

If this looks good, I can do the same thing for clear rows and copy/
clear columns, although I will need to make a test case for them.

It would probably be a good idea to change the rasops interface to
have generic block copy and clear operations, versus the current
full-column / full-row interface, so tmux and friends could get the
full acceleration.
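
As a purely hypothetical sketch of what such a generic block copy could look like on a character-cell backing store (rasops has no such entry point in this patch; struct cell and copyblock are names invented here), the rectangle is moved row by row in an order that stays correct when source and destination overlap:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one character cell of the backing store. */
struct cell {
	unsigned int uc;	/* character code */
	unsigned int attr;	/* colour/attribute bits */
};

/*
 * Copy an nrows x ncols rectangle of cells from (srow, scol) to
 * (drow, dcol) inside a cols-wide backing store.
 */
static void
copyblock(struct cell *bs, int cols, int srow, int scol,
    int drow, int dcol, int nrows, int ncols)
{
	int i, r;

	assert(bs != NULL && ncols >= 0);
	assert(scol + ncols <= cols && dcol + ncols <= cols);

	for (i = 0; i < nrows; i++) {
		/* Top-down when moving up, bottom-up when moving down,
		 * so a source row is never clobbered before it is read. */
		r = drow <= srow ? i : nrows - 1 - i;
		/* memmove handles horizontal overlap within one row. */
		memmove(&bs[(drow + r) * cols + dcol],
		    &bs[(srow + r) * cols + scol],
		    (size_t)ncols * sizeof(struct cell));
	}
}
```

A full-row copy is then just the special case scol = dcol = 0 with ncols = cols, which is why a rectangle-based interface could cover the existing full-row and full-column operations as well as partial regions such as tmux panes.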

Index: rasops.c
===================================================================
RCS file: /cvs/src/sys/dev/rasops/rasops.c,v
retrieving revision 1.61
diff -u -p -r1.61 rasops.c
--- rasops.c	25 May 2020 09:55:49 -	1.61
+++ rasops.c	26 Jun 2020 04:14:13 -
@@ -1627,28 +1627,42 @@ rasops_vcons_copyrows(void *cookie, int 
 	struct rasops_info *ri = scr->rs_ri;
 	int cols = ri->ri_cols;
 	int row, col, rc;
+	int srcofs;
+	int move;
 
+	/* update the scrollback buffer if the entire screen is moving */
 	if (dst == 0 && (src + num == ri->ri_rows) && scr->rs_sbscreens > 0)
 		memmove(&scr->rs_bs[dst], &scr->rs_bs[src * cols],
-		    ((ri->ri_rows * (scr->rs_sbscreens + 1) * cols) -
-		    (src * cols)) * sizeof(struct wsdisplay_charcell));
-	else
+		    ri->ri_rows * scr->rs_sbscreens * cols
+		    * sizeof(struct wsdisplay_charcell));
+
+	/* copy everything */
+	if ((ri->ri_flg & RI_WRONLY) == 0 || !scr->rs_visible) {
 		memmove(&scr->rs_bs[dst * cols + scr->rs_dispoffset],
-		    &scr->rs_bs[src * cols + scr->rs_dispoffset],
-		    num * cols * sizeof(struct wsdisplay_charcell));
+			&scr->rs_bs[src * cols + scr->rs_dispoffset],
+			num * cols * sizeof(struct wsdisplay_charcell));
 
-	if (!scr->rs_visible)
-		return 0;
+		if (!scr->rs_visible)
+			return 0;
 
-	if ((ri->ri_flg & RI_WRONLY) == 0)
 		return ri->ri_copyrows(ri, src, dst, num);
+	}
 
-	for (row = dst; row < dst + num; row++) {
+	/* smart update, only redraw characters that are different */
+	srcofs = (src - dst) * cols;
+
+	for (move = 0 ; move < num ; move++) {
+		row = srcofs > 0 ? dst + move : dst + num - 1 - move;
 		for (col = 0; col < cols; col++) {
 			int off = row * cols + col + scr->rs_dispoffset;
-
-			rc = ri->ri_putchar(ri, row, col,
-			    scr->rs_bs[off].uc, scr->rs_bs[off].attr);
+			int newc = scr->rs_bs[off+srcofs].uc;
+			int newa = scr->rs_bs[off+srcofs].attr;
+			if ( scr->rs_bs[off].uc == newc 
+			&& scr->rs_bs[off].attr == newa )
+				continue;
+			scr->rs_bs[off].uc = newc;
+			scr->rs_bs[off].attr = newa;
+			rc = ri->ri_putchar(ri, row, col, newc, newa);
 			if (rc != 0)
 				return rc;
 		}