Re: [PATCH] fast conditional console scrolling
On Fri, Jul 10, 2020 at 03:26:16PM +0200, Frederic Cambus wrote:
> On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
>
> > I should have been more rigorous -- I had two different changes running
> > on my system, as well as forcing it to use the 12x24 font for a 160x45
> > console.
> >
> > If you apply the "Optimized rasops32 putchar" patch I just posted, you
> > should see another significant speedup.
>
> Leaving aside rasops32_putchar() optimizations for now, I tried this
> on radeondrm and simplefb (on armv7) with a 1920x1080 monitor and on
> top of what John and Paul are reporting, I'm also seeing improvements
> when cat'ing the text file [1] I usually use for rasops performance
> testing. It's up to 30% faster on both devices, which is significant.
>
> The diff makes sense to me, and I think this should go in.
>
> Anyone willing to OK this diff or to commit with my OK?

Committed with some minor style(9) formatting fixes pointed out by
jcs@ offlist.

Thanks!
Re: [PATCH] fast conditional console scrolling
On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:

> I should have been more rigorous -- I had two different changes running
> on my system, as well as forcing it to use the 12x24 font for a 160x45
> console.
>
> If you apply the "Optimized rasops32 putchar" patch I just posted, you
> should see another significant speedup.

Leaving aside rasops32_putchar() optimizations for now, I tried this
on radeondrm and simplefb (on armv7) with a 1920x1080 monitor and on
top of what John and Paul are reporting, I'm also seeing improvements
when cat'ing the text file [1] I usually use for rasops performance
testing. It's up to 30% faster on both devices, which is significant.

The diff makes sense to me, and I think this should go in.

Anyone willing to OK this diff or to commit with my OK?

[1] https://norvig.com/big.txt
Re: [PATCH] fast conditional console scrolling
Hi John,

With both your diffs applied, I do indeed get closer to a 3x speed-up
on my machine. The average over 7 runs of ls -R /usr/ports was
64.169s, making for just under a 3x increase in speed.

That's on 1920x1080 with the standard font size for that resolution
(120x33 console, so 16x32 font).

Thanks again,

Paul 'WEiRD' de Weerd

On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
| I should have been more rigorous -- I had two different changes running
| on my system, as well as forcing it to use the 12x24 font for a 160x45
| console.
|
| If you apply the "Optimized rasops32 putchar" patch I just posted, you
| should see another significant speedup.
|
|
| Original Message
| Subject: Re: [PATCH] fast conditional console scrolling
| From: Paul de Weerd
| Date: Fri, June 26, 2020 1:23 am
| To: jo...@armadilloaerospace.com
| Cc: "tech@openbsd.org"
|
| Hi John,
|
| I tried your diff. I don't quite see the same 3x improvement that you
| report, more like 2x. I timed 7 runs of ls -R /usr/ports:
|
| Before diff, time ls -R /usr/ports | wc -l      2.897s on average
| After diff, time ls -R /usr/ports | wc -l       2.707s on average
|
| Before diff, time ls -R /usr/ports              2m53.067 on average
| After diff, time ls -R /usr/ports               1m30.387 on average
|
| Note that the 'before diff' runs were with a snapshot kernel. There
| may be diffs in there that account for the difference between before
| and after of the no-output runs. See dmesg and full stats below.
|
| So, on average, a speed-up of ~48%.
|
| Thanks!
|
| Paul 'WEiRD' de Weerd
|
|

--
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
                 http://www.weirdnet.nl/
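The figures quoted in this message mix two ways of stating an improvement: a ratio ("3x", "2x") and a percentage ("~48%"). The following is a minimal sketch, not part of the original mail, that relates the two. The helper names `speedup` and `time_saved_pct` are made up for illustration, and the inputs are the averages as reported above, converted to seconds.

```c
#include <assert.h>

/*
 * Ratio of old to new wall-clock time: 2.0 means "twice as fast".
 * Hypothetical helper, not taken from the thread.
 */
static double
speedup(double before_s, double after_s)
{
	assert(before_s > 0.0 && after_s > 0.0);
	return before_s / after_s;
}

/*
 * Percentage of wall-clock time saved relative to the old time.
 * Hypothetical helper, not taken from the thread.
 */
static double
time_saved_pct(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return 100.0 * (before_s - after_s) / before_s;
}
```

With the reported averages, 2m53.067 (173.067s) before and 1m30.387 (90.387s) with the scrolling diff alone, `speedup()` gives roughly 1.91 and `time_saved_pct()` roughly 47.8, which matches "more like 2x" and "~48%". With both diffs applied (64.169s), `speedup()` gives roughly 2.70, in line with "just under 3x".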
Re: [PATCH] fast conditional console scrolling
I should have been more rigorous -- I had two different changes running
on my system, as well as forcing it to use the 12x24 font for a 160x45
console.

If you apply the "Optimized rasops32 putchar" patch I just posted, you
should see another significant speedup.


Original Message
Subject: Re: [PATCH] fast conditional console scrolling
From: Paul de Weerd
Date: Fri, June 26, 2020 1:23 am
To: jo...@armadilloaerospace.com
Cc: "tech@openbsd.org"

Hi John,

I tried your diff. I don't quite see the same 3x improvement that you
report, more like 2x. I timed 7 runs of ls -R /usr/ports:

Before diff, time ls -R /usr/ports | wc -l      2.897s on average
After diff, time ls -R /usr/ports | wc -l       2.707s on average

Before diff, time ls -R /usr/ports              2m53.067 on average
After diff, time ls -R /usr/ports               1m30.387 on average

Note that the 'before diff' runs were with a snapshot kernel. There
may be diffs in there that account for the difference between before
and after of the no-output runs. See dmesg and full stats below.

So, on average, a speed-up of ~48%.

Thanks!

Paul 'WEiRD' de Weerd
Re: [PATCH] fast conditional console scrolling
Hi John,

I tried your diff. I don't quite see the same 3x improvement that you
report, more like 2x. I timed 7 runs of ls -R /usr/ports:

Before diff, time ls -R /usr/ports | wc -l      2.897s on average
After diff, time ls -R /usr/ports | wc -l       2.707s on average

Before diff, time ls -R /usr/ports              2m53.067 on average
After diff, time ls -R /usr/ports               1m30.387 on average

Note that the 'before diff' runs were with a snapshot kernel. There
may be diffs in there that account for the difference between before
and after of the no-output runs. See dmesg and full stats below.

So, on average, a speed-up of ~48%.

Thanks!

Paul 'WEiRD' de Weerd

--- full stats ---

pre-diff, no output             post-diff, no output
real     user     system        real     user     system
02.94    00.58    02.40         02.70    00.58    02.12
02.88    00.56    02.37         02.71    00.39    02.32
03.03    00.46    02.60         02.70    00.43    02.26
02.85    00.52    02.36         02.69    00.54    02.18
02.88    00.45    02.43         02.62    00.53    02.10
02.87    00.50    02.38         02.72    00.62    02.11
02.83    00.57    02.29         02.81    00.45    02.36

pre-diff, with output           post-diff, with output
real     user     system        real     user     system
2m53.17  00.90    2m52.27       1m30.81  01.23    1m29.50
2m53.12  00.81    2m52.31       1m30.58  01.33    1m29.30
2m53.01  00.88    2m52.11       1m30.49  01.11    1m29.40
2m53.06  01.03    2m52.00       1m30.53  01.29    1m29.26
2m52.99  00.80    2m52.24       1m30.27  01.08    1m29.19
2m53.11  00.96    2m52.16       1m30.40  01.14    1m29.27
2m53.01  00.79    2m52.28       1m30.33  01.11    1m29.24

--- dmesg ---

OpenBSD 6.7-current (GENERIC.MP) #296: Wed Jun 24 11:34:44 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34243903488 (32657MB)
avail mem = 33191059456 (31653MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xec410 (88 entries)
bios0: vendor Dell Inc. version "A22" date 02/01/2018
bios0: Dell Inc. OptiPlex 9020
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT SLIC LPIT SSDT SSDT SSDT HPET SSDT MCFG SSDT ASF! DMAR
acpi0: wakeup devices UAR1(S3) RP01(S4) PXSX(S4) PXSX(S4) PXSX(S4) RP05(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) GLAN(S4) EHC1(S3) EHC2(S3) XHC_(S4) HDEF(S4) PEG0(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3692.06 MHz, 06-3c-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: TSC skew=0 observed drift=0
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: TSC skew=1 observed drift=0
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: TSC skew=12 observed drift=0
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 3691.46 MHz, 06-3c-03
cpu3:
[PATCH] fast conditional console scrolling
This causes the write-only framebuffer console to only redraw the
chars that differ between the start and end positions.

'time ls -R /usr/src/sys' is 3x faster with this, because most of
the characters stay the same after a scroll.

If this looks good, I can do the same thing for clear rows and
copy/clear columns, although I will need to make a test case for them.

It would probably be a good idea to change the rasops interface to
have generic block copy and clear operations, versus the current
full-column / full-row interface, so tmux and friends could get the
full acceleration.

Index: rasops.c
===
RCS file: /cvs/src/sys/dev/rasops/rasops.c,v
retrieving revision 1.61
diff -u -p -r1.61 rasops.c
--- rasops.c	25 May 2020 09:55:49 -	1.61
+++ rasops.c	26 Jun 2020 04:14:13 -
@@ -1627,28 +1627,42 @@ rasops_vcons_copyrows(void *cookie, int
 	struct rasops_info *ri = scr->rs_ri;
 	int cols = ri->ri_cols;
 	int row, col, rc;
+	int srcofs;
+	int move;
 
+	/* update the scrollback buffer if the entire screen is moving */
 	if (dst == 0 && (src + num == ri->ri_rows) && scr->rs_sbscreens > 0)
 		memmove(&scr->rs_bs[dst], &scr->rs_bs[src * cols],
-		    ((ri->ri_rows * (scr->rs_sbscreens + 1) * cols) -
-		    (src * cols)) * sizeof(struct wsdisplay_charcell));
-	else
+		    ri->ri_rows * scr->rs_sbscreens * cols
+		    * sizeof(struct wsdisplay_charcell));
+
+	/* copy everything */
+	if ((ri->ri_flg & RI_WRONLY) == 0 || !scr->rs_visible) {
 		memmove(&scr->rs_bs[dst * cols + scr->rs_dispoffset],
-		    &scr->rs_bs[src * cols + scr->rs_dispoffset],
-		    num * cols * sizeof(struct wsdisplay_charcell));
+		    &scr->rs_bs[src * cols + scr->rs_dispoffset],
+		    num * cols * sizeof(struct wsdisplay_charcell));
 
-	if (!scr->rs_visible)
-		return 0;
+		if (!scr->rs_visible)
+			return 0;
 
-	if ((ri->ri_flg & RI_WRONLY) == 0)
 		return ri->ri_copyrows(ri, src, dst, num);
+	}
 
-	for (row = dst; row < dst + num; row++) {
+	/* smart update, only redraw characters that are different */
+	srcofs = (src - dst) * cols;
+
+	for (move = 0 ; move < num ; move++) {
+		row = srcofs > 0 ? dst + move : dst + num - 1 - move;
 		for (col = 0; col < cols; col++) {
 			int off = row * cols + col + scr->rs_dispoffset;
-
-			rc = ri->ri_putchar(ri, row, col,
-			    scr->rs_bs[off].uc, scr->rs_bs[off].attr);
+			int newc = scr->rs_bs[off+srcofs].uc;
+			int newa = scr->rs_bs[off+srcofs].attr;
+			if ( scr->rs_bs[off].uc == newc
+			    && scr->rs_bs[off].attr == newa )
+				continue;
+			scr->rs_bs[off].uc = newc;
+			scr->rs_bs[off].attr = newa;
+			rc = ri->ri_putchar(ri, row, col, newc, newa);
 			if (rc != 0)
 				return rc;
 		}
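
As a rough illustration of the idea behind the patch, here is a minimal standalone sketch in C. It is not the rasops code: `struct cell`, `draw_fn`, `copyrows_diff`, `count_draw` and `demo_scroll` are hypothetical names invented for this example, and the shadow buffer is a plain rows x cols array with no scrollback and no display offset. It only shows the core technique: compare each destination cell with its source cell and call the (expensive) draw routine only when they differ.

```c
#include <assert.h>

/* Hypothetical stand-in for a character cell in the shadow buffer. */
struct cell {
	unsigned int uc;	/* character code */
	unsigned int attr;	/* colour/attribute bits */
};

/* Hypothetical draw callback; a real driver would render a glyph here. */
typedef int (*draw_fn)(void *ctx, int row, int col,
    unsigned int uc, unsigned int attr);

/*
 * Move `num` rows from `src` to `dst` inside `bs` (rows x cols cells),
 * calling `draw` only for cells whose content actually changes.
 * Returns 0 on success or the first non-zero result from `draw`.
 */
static int
copyrows_diff(struct cell *bs, int rows, int cols, int src, int dst,
    int num, draw_fn draw, void *ctx)
{
	int srcofs = (src - dst) * cols;
	int move, row, col, rc;

	assert(src >= 0 && dst >= 0 && num >= 0);
	assert(src + num <= rows && dst + num <= rows);

	for (move = 0; move < num; move++) {
		/* Pick the direction so overlapping rows are read before they are overwritten. */
		row = srcofs > 0 ? dst + move : dst + num - 1 - move;
		for (col = 0; col < cols; col++) {
			struct cell *to = &bs[row * cols + col];
			const struct cell *from = to + srcofs;

			if (to->uc == from->uc && to->attr == from->attr)
				continue;	/* same glyph already on screen */
			*to = *from;
			rc = draw(ctx, row, col, to->uc, to->attr);
			if (rc != 0)
				return rc;
		}
	}
	return 0;
}

/* Draw callback that only counts how often it is called. */
static int
count_draw(void *ctx, int row, int col, unsigned int uc, unsigned int attr)
{
	(void)row; (void)col; (void)uc; (void)attr;
	++*(int *)ctx;
	return 0;
}

/* Scroll a 3x4 toy screen up by one row; return the number of redraws. */
static int
demo_scroll(void)
{
	static const char text[3][4] = {
		{ 'a', 'b', ' ', ' ' },
		{ 'a', 'b', ' ', 'c' },
		{ 'a', 'b', 'd', 'c' },
	};
	struct cell bs[3 * 4];
	int row, col, redraws = 0;

	for (row = 0; row < 3; row++)
		for (col = 0; col < 4; col++) {
			bs[row * 4 + col].uc = (unsigned char)text[row][col];
			bs[row * 4 + col].attr = 0;
		}

	if (copyrows_diff(bs, 3, 4, 1, 0, 2, count_draw, &redraws) != 0)
		return -1;
	return redraws;
}
```

In this toy case `demo_scroll()` redraws only 2 of the 8 cells that move, since the other six already show the right character. Output like `ls -R`, where consecutive lines often share leading characters or blanks, is the kind of workload where this saving would be expected to add up.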