Re: powerpc64: 64-bit-ize memmove.S

2020-06-27 Thread George Koehler
On Sat, 27 Jun 2020 01:27:14 +0200
Christian Weisgerber  wrote:

> That function simply copies as many (double)words plus a tail of
> bytes as the length argument specifies.  Neither source nor destination
> are checked for alignment, so this will happily run a loop of
> unaligned accesses, which doesn't sound very optimal.

I made a benchmark and concluded that unaligned word copies are slower
than aligned word copies, but faster than byte copies.  In most cases,
memmove.S is faster than memmove.c, but if aligned word copies between
unaligned buffers are possible, then memmove.c is faster.

The benchmark was on a 32-bit macppc G3 with
cpu0 at mainbus0: 750 (Revision 0x202): 400 MHz: 512KB backside cache

The benchmark has 4 implementations of memmove,
  stbu  =>  byte copy with lbzu,stbu loop
  stbx  =>  byte copy with lbzx,stbx,addi loop
  C     =>  aligned word copy or byte copy (libc/string/memmove.c)
  asm   =>  unaligned word copy (libc/arch/powerpc/string/memmove.S)

It shows time measured by mftb (move from timebase).

1st bench: move 1 bytes up by 4 bytes, then down by 4 bytes, in
aligned buffer (offset 0).  asm wins:

$ ./bench 1 4 0
stbu    stbx    C       asm
2639    2814    792     633
2502    2814    784     628
2501    2814    783     627
2501    2814    784     626

2nd bench: unaligned buffer (offset 1), but (src & 3) == (dst & 3), so
C does aligned word copies, while asm does misaligned.  C wins:

$ ./bench 1 4 1
stbu    stbx    C       asm
2638    3006    795     961
2502    2814    786     938
2501    2814    786     939
2501    2813    785     939

3rd bench: move up then down by 5 bytes, src & 3 != dst & 3, can't
align word copies.  C does byte copies.  asm wins:

$ ./bench 1 5 0 
stbu    stbx    C       asm
2675    2815    2514    809
2501    2813    2504    782
2502    2815    2504    782
2501    2814    2503    782

I think that memmove.S is probably better than memmove.c on G3.
I haven't run the bench on POWER9.
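
The alignment condition that separates the 2nd and 3rd benches can be
sketched in C.  This is only an illustration of the (src & 3) == (dst & 3)
test, not the code from memmove.c; the helper name is made up here.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of libc): aligned 4-byte word copies
 * are possible only when src and dst sit at the same offset within a
 * word, i.e. (src & 3) == (dst & 3).  Otherwise aligning one pointer
 * leaves the other misaligned.
 */
static int
can_align_word_copies(const void *dst, const void *src)
{
	return (((uintptr_t)dst ^ (uintptr_t)src) & 3) == 0;
}
```

With a word-aligned buffer, a move by 4 at offset 1 (as in the 2nd bench)
passes this test, while a move by 5 at offset 0 (as in the 3rd bench) does
not, which is where a C-style implementation would fall back to byte copies.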



Re: Stuck in Needbuf state, trying to understand (6.7)

2020-06-27 Thread Bob Beck


No. 

I know *exactly* what needbuf is, but to attempt to diagnose what your
problem is we need exact details, especially:

1) The configuration of your system, including all the details of the
filesystems you have mounted, all options used, etc.

2) The script you are using to generate the problem (not a paraphrase of
what you think the script does), and what filesystems it is using.



On Sat, Jun 27, 2020 at 08:09:18PM -0400, sven falempin wrote:
> On Fri, Jun 26, 2020 at 7:35 PM sven falempin 
> wrote:
> 
> >
> >
> > On Fri, Jun 26, 2020 at 5:22 PM Stuart Henderson 
> > wrote:
> >
> >> On 2020/06/26 15:30, sven falempin wrote:
> >> > behavior confirmed on current.
> >> >
> >> > Once the process stalls,  ( could be anything writing to the vnconfig
> >> disk,
> >> > cp , umount )
> >> > a few other calls like df , or ps, etc may hang, never the same
> >> > sp or mp kernel, reproduced on today's snapshots.
> >>
> >> vnconfig is used as part of "make release", many builds are done every
> >> week using this so it's not a general problem with vnconfig.
> >>
> >> Can you show some commands or a script to trigger the behaviour?
> >>
> >
> > the perl script use the system to call :
> >
> > vnconfig.
> > mount.
> > umount. <- saw hanged
> > cp.<- saw hanged
> > tar.<- saw hanged
> > svn up.<- saw hanged
> > and dd.
> > newfs.
> >
> > really nothing fancy, only stuff writing to disk got stuck.
> >
> > At one point it does a chroot but it never hangs near that , most of the
> > time it hangs before.
> >
> > The script has been used like 1000 times on 6.0 and maybe twice more on
> > 6.4.
> >
> > I have absolutely no idea what the 'needbuf' of top is .
> >
> > the script hangs at random position , always writing into vnconfig.
> >
> > I have no idea how to reproduce outside the perl script , so maybe it is
> > related
> > to some devious perl stdin/stdout buffer .
> >
> > Nevertheless there's like a 5% chance that the script will work (slowly)
> >
> > Most of the system call are inside a routine to log
> >
> > sub debug_system {
> >   $logger->debug('running: '.join(' ', @_));
> >   return system(@_);
> > }
> >
> > so i can easily put things inside to try to understand the issue.
> >
> > It is really a strange behavior, and the device must be shut down
> > electrically.
> > Something really odd, i run syslogd on a buffer, and syslogc buffer is
> > stuck too
> > when the device stuck (but it supposed to be mostly already allocated
> > memory ).
> >
> > It's really like the vm does not want to give anymore bucket (<- i
> > don't know what i m talking about here,
> > but it looks like anything that doesn't malloc is ok , computer replies
> > to ping , can do a few things for a while , and then complete
> > hang )
> >
> > I ran the 6.7 release on a VM somewhere and another device with many perl
> > script and they work.
> >
> > Only this fails 95% of the time and is VERY VERY slow when ok.
> > compared to what i saw in /usr/src the vnconfig is big ,  ( forgot to copy
> > df -h  ),
> > like 2GB
> >
> 
> 
> i put ktrace in front of the perl system call
> 
> And i was able to recover an 800MB trace
> 
> $ kdump -f ./trace.out | tail -20
> kdump: realloc: Cannot allocate memory
>  25955 UNKNOWN(1634890859)
>  72466 ? CALL  syscall()
> 
> 
> could that be of some use ?
> 
> 
> -- 
> --
> -
> Knowing is not enough; we must apply. Willing is not enough; we must do



Re: Stuck in Needbuf state, trying to understand (6.7)

2020-06-27 Thread sven falempin
On Fri, Jun 26, 2020 at 7:35 PM sven falempin 
wrote:

>
>
> On Fri, Jun 26, 2020 at 5:22 PM Stuart Henderson 
> wrote:
>
>> On 2020/06/26 15:30, sven falempin wrote:
>> > behavior confirmed on current.
>> >
>> > Once the process stalls,  ( could be anything writing to the vnconfig
>> disk,
>> > cp , umount )
>> > a few other calls like df , or ps, etc may hang, never the same
>> > sp or mp kernel, reproduced on today's snapshots.
>>
>> vnconfig is used as part of "make release", many builds are done every
>> week using this so it's not a general problem with vnconfig.
>>
>> Can you show some commands or a script to trigger the behaviour?
>>
>
> the perl script use the system to call :
>
> vnconfig.
> mount.
> umount. <- saw hanged
> cp.<- saw hanged
> tar.<- saw hanged
> svn up.<- saw hanged
> and dd.
> newfs.
>
> really nothing fancy, only stuff writing to disk got stuck.
>
> At one point it does a chroot but it never hangs near that , most of the
> time it hangs before.
>
> The script has been used like 1000 times on 6.0 and maybe twice more on
> 6.4.
>
> I have absolutely no idea what the 'needbuf' of top is .
>
> the script hangs at random position , always writing into vnconfig.
>
> I have no idea how to reproduce outside the perl script , so maybe it is
> related
> to some devious perl stdin/stdout buffer .
>
> Nevertheless there's like a 5% chance that the script will work (slowly)
>
> Most of the system call are inside a routine to log
>
> sub debug_system {
>   $logger->debug('running: '.join(' ', @_));
>   return system(@_);
> }
>
> so i can easily put things inside to try to understand the issue.
>
> It is really a strange behavior, and the device must be shut down
> electrically.
> Something really odd, i run syslogd on a buffer, and syslogc buffer is
> stuck too
> when the device stuck (but it supposed to be mostly already allocated
> memory ).
>
> It's really like the vm does not want to give anymore bucket (<- i
> don't know what i m talking about here,
> but it looks like anything that doesn't malloc is ok , computer replies
> to ping , can do a few things for a while , and then complete
> hang )
>
> I ran the 6.7 release on a VM somewhere and another device with many perl
> script and they work.
>
> Only this fails 95% of the time and is VERY VERY slow when ok.
> compared to what i saw in /usr/src the vnconfig is big ,  ( forgot to copy
> df -h  ),
> like 2GB
>


i put ktrace in front of the perl system call

And i was able to recover an 800MB trace

$ kdump -f ./trace.out | tail -20
kdump: realloc: Cannot allocate memory
 25955 UNKNOWN(1634890859)
 72466 ▒▒▒ CALL  syscall()


could that be of some use ?


-- 
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do


Re: ldomctl: Fix init-system with multiple PCIe root complexes

2020-06-27 Thread Klemens Nanni
On Sat, Jun 20, 2020 at 01:05:22AM +0200, Klemens Nanni wrote:
> Opposed to all other (single CPU) machines I have encountered so far,
> the T4-2 has two instead of one PCIe root complexes.
> 
> ldomctl already accounts for this and iterates over them but lacks a
> simple skip condition when iterating over subdevices to avoid linking
> devices in one root complex to those in another.
> 
> This fixes `init-system' on my T4-2 where I have been using a lame
> work-around so far, but the recent report on bugs@ reminded me to look
> into it more closely this time.
> 
> Thanks to tracey for quickly providing details about his hardware for
> quick comparison.
Has anyone tried this (on machines other than T4-2)?  Koakuma on bugs@
reported that this fixes ldomctl on their T4-2 just like expected.

I'd like to commit this soon.
Feedback? OK?


Index: config.c
===
RCS file: /cvs/src/usr.sbin/ldomctl/config.c,v
retrieving revision 1.40
diff -u -p -r1.40 config.c
--- config.c 24 May 2020 22:08:54 -  1.40
+++ config.c 27 Jun 2020 23:35:38 -
@@ -1142,6 +1142,8 @@ hvmd_finalize_pcie_device(struct md *md,
md_link_node(md, node, parent);
 
TAILQ_FOREACH(subdevice, >guest->subdevice_list, link) {
+   if (strncmp(path, subdevice->path, strlen(path)) != 0)
+   continue;
TAILQ_FOREACH(component, , link) {
if (strcmp(subdevice->path, component->path) == 0)
md_link_node(md, parent, component->hv_node);
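
The skip condition added by the diff is a plain string-prefix test on
device paths.  A minimal sketch of that test in isolation follows; the
helper name and the example paths mentioned afterwards are made up for
illustration and are not taken from ldomctl.

```c
#include <string.h>

/*
 * Hypothetical helper: returns nonzero when `path` starts with `prefix`.
 * This mirrors the strncmp(path, subdevice->path, strlen(path)) check in
 * the diff, where a subdevice is skipped unless its path begins with the
 * path of the root complex currently being finalized.
 */
static int
path_has_prefix(const char *path, const char *prefix)
{
	return strncmp(prefix, path, strlen(prefix)) == 0;
}
```

For example, with a root complex at "/@400", a subdevice at "/@400/@1/@0"
would be linked and one at "/@500/@1/@0" would be skipped.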



Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Jesper Wallin
Tested on a "Intel Dual Band Wireless-AC 9260" rev 0x29, msix
(hw rev 0x320, fw ver 34.3125811985.0)

I seem to be getting "iwm0: fatal firmware error" a few seconds after
the 4-way handshake.  I can send a few packets, so it sure connects
and all, but then it fails shortly after.

iwm0: begin active scan
iwm0: INIT -> SCAN
iwm0: end active scan
iwm0: + 70:73:cb:cb:c3:86   40   +45 54M   ess  privacy   rsn  "FRA"
iwm0: SCAN -> AUTH
iwm0: sending auth to 70:73:cb:cb:c3:86 on channel 40 mode 11a
iwm0: AUTH -> ASSOC
iwm0: sending assoc_req to 70:73:cb:cb:c3:86 on channel 40 mode 11a
iwm0: ASSOC -> RUN
iwm0: associated with 70:73:cb:cb:c3:86 ssid "FRA" channel 40 start MCS 0 long preamble short slot time HT enabled
iwm0: missed beacon threshold set to 30 beacons, beacon interval is 100 TU
iwm0: received msg 1/4 of the 4-way handshake from 70:73:cb:cb:c3:86
iwm0: sending msg 2/4 of the 4-way handshake to 70:73:cb:cb:c3:86
iwm0: received msg 3/4 of the 4-way handshake from 70:73:cb:cb:c3:86
iwm0: sending msg 4/4 of the 4-way handshake to 70:73:cb:cb:c3:86
iwm0: sending action to 70:73:cb:cb:c3:86 on channel 40 mode 11n
iwm0: fatal firmware error



Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I did some more tests, and I think the odd performance I am seeing
may be due to TLB thrash on the 32x64 characters with 4k pages,
since writing each character will require 64 data TLB entries.

Are huge page mappings supported in OpenBSD?
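
The arithmetic behind that guess can be sketched as follows.  This is a
rough model, not a measurement: it counts the distinct 4k pages touched
when one glyph is written at 4 bytes per pixel.  The function name and
the 15360-byte stride used in the note below (a 3840-pixel-wide display)
are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u

/*
 * Count the distinct 4k pages touched when writing a glyph of
 * width x height pixels (4 bytes each) starting at byte offset `start`
 * in a framebuffer with `stride` bytes per scanline.  Assumes
 * stride >= width * 4, so pages are visited in nondecreasing order and
 * counting page changes is enough.
 */
static unsigned
glyph_pages_touched(size_t start, size_t stride, unsigned width,
    unsigned height)
{
	unsigned pages = 0;
	size_t last_page = (size_t)-1;	/* sentinel: no page seen yet */

	for (unsigned row = 0; row < height; row++) {
		size_t begin = start + row * stride;
		size_t first = begin / PAGE_SIZE_4K;
		size_t last = (begin + (size_t)width * 4 - 1) / PAGE_SIZE_4K;

		for (size_t p = first; p <= last; p++) {
			if (p != last_page) {
				pages++;
				last_page = p;
			}
		}
	}
	return pages;
}
```

With a stride larger than one page, every row of a 32x64 glyph lands on
its own page, so glyph_pages_touched(0, 15360, 32, 64) gives 64, one
translation per glyph row.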

 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 1:30 pm
To: 
Cc: tech@openbsd.org

> Content-Type: text/plain; charset="utf-8"
> From: 
> 
> I was doing my timings with a user mode program after mmapping the
> efifb display, so the mapping might be different in the kernel.

That should still give you a write-combining mapping as efifb_mmap()
adds the PMAP_WC flag to the physical address.

Cachable on x86 means write-back cachable. And using a write-back
cachable mapping for a framebuffer often leads to interesting "damage"
where pixels in certain cache lines show up "late" on the display.
Not sure if you'd see that on recent Intel graphics hardware as the
current hardware designs are much more coherent than what was produced
in the past.

> Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
> WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
> rasops. While the point of the DRM drivers is to get fully hardware
> accelerated drawing in X, there isn't any reason why they can't
> support dumb framebuffer mappings as well.

True. Although there are DRM interfaces that give you a dumb
framebuffer as well. Using those interfaces is a bit more complicated
though.

Centralising the code would be good. That code probably should use
bus_space_mmap(4) as the PMAP_WC flag is amd64-specific.
Unfortunately the amd64 implementation of bus_space_mmap(4) is
incomplete and doesn't actually implement write-combining for mappings
with the BUS_SPACE_MAP_PREFETCHABLE flag set. So that has to be fixed
as well.

>  Original Message 
> Subject: RE: [PATCH} Optimized rasops32 putchar
> From: 
> Date: Sat, June 27, 2020 11:13 am
> To: "Mark Kettenis" 
> Cc: "tech@openbsd.org" 
> 
> I believe it is mapped as normally cached right now, rather than
> uncached or write combining.
> 
> Reads aren't ultra-slow, and the timings of 48 byte writes appear to
> involve a cacheline read.
> 
> 128 byte writes are actually slower than 64 byte writes, which I
> guessed might be because of automatic prefetching kicking in and
> reading the following cacheline.
> 
> 
>  Original Message 
> Subject: Re: [PATCH} Optimized rasops32 putchar
> From: Mark Kettenis 
> Date: Sat, June 27, 2020 7:56 am
> To: 
> Cc: tech@openbsd.org
> 
> > From: 
> > Date: Fri, 26 Jun 2020 07:42:50 -0700
> > 
> > Optimized 32 bit character rendering with unrolled rows and pairwise
> > foreground / background pixel rendering.
> > 
> > If it weren't for the 5x8 font, I would have just assumed everything
> > was an even width and made the fallback path also pairwise.
> > 
> > In isolation, the 16x32 character case got 2x faster, but that wasn't
> > a huge real world speedup where the space rendering that was already
> > at memory bandwidth limits accounted for most of the character
> > rendering time. However, in combination with the previous fast
> > conditional console scrolling that removes most of the space rendering,
> > it becomes significant.
> > 
> > I also found that at least the efi and intel framebuffers are not
> > currently mapped write combining, which makes this much slower than
> > it should be.
> 
> Hi John,
> 
> The framebuffer should be mapped write-combining. In OpenBSD this is
> requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
> bus_space_map(9) when mapping the framebuffer.
> 
> I'm fairly confident since until last January the initial mapping of
> the framebuffer that we used wasn't write-combining. And things were
> really, really slow.
> 
> Cheers,
> 
> Mark
> 
> > Index: rasops32.c
> > ===
> > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 rasops32.c
> > --- rasops32.c 25 May 2020 09:55:49 - 1.10
> > +++ rasops32.c 26 Jun 2020 14:34:06 -
> > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> > int
> > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> > attr)
> > {
> > - int width, height, cnt, fs, fb, clr[2];
> > + int width, height, step, cnt, fs, b, f;
> > + uint32_t fb, clr[2];
> > struct rasops_info *ri;
> > - int32_t *dp, *rp;
> > + int64_t *rp, q;
> > + union {
> > + int64_t q[4];
> > + int32_t d[4][2];
> > + } u;
> > u_char *fr;
> > 
> > ri = (struct rasops_info *)cookie;
> > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> > return 0;
> > #endif
> > 
> > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > 
> > height = ri->ri_font->fontheight;
> > width = ri->ri_font->fontwidth;
> > + step = 

Re: OpenBSD.calendar patch

2020-06-27 Thread jungle boogie

Hi Again,

Here's a second attempt using git/got. Is this better?

diff 382c05176131a97b161018e0e88f5417f810eb9c /var/git/src
blob - b6b2ef6c918b12164e293c04db2be2dc45ab656a
file + usr.bin/calendar/calendars/calendar.openbsd
--- usr.bin/calendar/calendars/calendar.openbsd
+++ usr.bin/calendar/calendars/calendar.openbsd
@@ -10,15 +10,19 @@
 Jan 06 IPF gets integrated into the OpenBSD kernel, 1996
 Jan 06 NRL IPv6 addition to OpenBSD, 1999
 Jan 09 n2k10: Network hackathon, Melbourne, Australia, 17 developers, 2010
+Jan 12 u2k20: Uckermark hackathon, Uckermark, Germany, 14 developers, 2020
 Jan 13 n2k13: Network hackathon, Dunedin, New Zealand, 17 developers, 2013
+Jan 17 Antipodean hackathon, Wellington, New Zealand, 18 developers, 2019
 Jan 18 n2k14: Mini-hackathon, Dunedin, New Zealand, 15 developers, 2014
 Jan 20 Bind 9 goes into the tree, 2003
+Jan 20 a2k20: Antipodean hackathon, Hobart, Tasmania, 17 developers, 2020
 Jan 26 Anoncvs service inaugurated, 1996
 Jan 26 n2k9: Network hackathon, Basel, Switzerland, 19 developers, 2009
 Jan 27 OpenBSD/amd64 port is added, from NetBSD, 2004
 Jan 29 "second anoncvs server is 100 miles from the first", 1996
 Jan 31 OpenBSD/cats port is added, from NetBSD, 2004
 Feb 03 Describe the ports mechanism [in OpenBSD], 1997
+Feb 05 a2k18: Dunedin, New Zealand, 19 developers, 2018
 Feb 13 Unpatented fast block cipher for new password hashing, 1997
 Feb 14 GNU RCS expired from source tree, replaced with OpenRCS, 2007
 Feb 19 IPsec package by John Ioannidis and Angelos D. Keromytis, 1997
@@ -27,6 +31,7 @@ Feb 26bridge(4) transparent firewall added to OpenBSD
 Feb 28 Cryptographic services framework in OpenBSD, 2000
 Mar 09 Support for the VAX architecture removed, 2016
 Mar 10 OpenBSD/WWW translation started -- German, Spanish, Dutch, 2000
+Mar 28 t2k19: Taipei mini hackathon, Taipei, Taiwan, 16 developers, 2019
 Apr 01 OpenBSD/hppa64 port is added, 2005
 Apr 01 k2k11: Kernel hackathon, Hafnarfjordur, Iceland, 15 developers, 2011
 Apr 10 f2k7: First filesystem hackathon, Vienna, Austria, 14 developers, 2007
@@ -40,10 +45,12 @@ Apr 24  pf2k4: PF hackathon, Sechelt, BC, 12 developers
 Apr 27 i386/PAE work integrated, 2006
 May 01 OpenBSD 3.3 released, exploiting W^X, 2003
 May 05 n2k8: Network hackathon, Ito, Japan, 18 developers, 2008
+May 07 g2k19: General hackathon, Ottawa, Canada, 43 developers, 2019
 May 08 c2k3 General hackathon, Calgary, Alberta, 51 developers, 2003
 May 09 First commit to OpenBSD stable branch, OPENBSD_2_7, 2000
 May 09 OpenBSD/aviion port is added, 2006
 May 19 OpenBSD 2.3 released, including "ports" system, 1998
+May 19 OpenBSD 6.7 released, 48th release, 2020
 May 21 c2k5: General hackathon, Calgary, Alberta, 60 developers, 2005
 May 21 c2k6: General hackathon, Calgary, Alberta, 47 developers, 2006
 May 24 OpenBSD gets a trunk(4), 2005
@@ -62,6 +69,7 @@ Jun 15OpenBSD 2.7 released, including OpenSSH, 2000
 Jun 15 c2k: First general hackathon, Calgary, Alberta, 18 developers, 2000
 Jun 19 c2k4: General hackathon, Calgary, Alberta, 46 developers, 2004
 Jun 21 c2k1: Birth of PF hackathon, Cambridge, MA, 35 developers, 2001
+Jun 21 WireGuard imported into kernel, 2020
 Jun 23 OpenBSD/hppa started, based on Utah Lites and OSF MkLinux, 1998
 Jun 24 PF added. Insane amounts of work done by dhartmei@, 2001
 Jun 25 c2k10: General hackathon, Edmonton, Alberta, 46 developers, 2010
@@ -70,6 +78,7 @@ Jul 01add strlcpy/strlcat, safe and sensible string c
 Jul 02 c2k11: General hackathon, Edmonton, Alberta, Canada, 36 developers, 2011
 Jul 07 g2k12: General hackathon, Budapest, Hungary, 41 developers, 2012
 Jul 08 g2k14: General hackathon, Ljubljana, Slovenia, 49 developers, 2014
+Jul 08 g2k18: General hackathon, Ljubljana, Slovenia, 39 developers, 2018
 Jul 11 OpenBSD goes wireless w/ if_wi addition, 1999
 Jul 23 OpenBSD goes multimedia with Brooktree 848 support, 1998
 Jul 24 Non-executable stack on most architectures, 2002
@@ -83,6 +92,7 @@ Aug 17OpenBSD/sparc64 port is added, from NetBSD, 200
 Aug 28 k2k6: IPsec hackathon, Schloss Kransberg, Germany, 14 developers, 2006
 Sep 01 Support for the sparc (32bit) architecture removed, 2016
 Sep 03 Support for the zaurus architecture removed, 2016
+Sep 06 n2k18: Network hackathon, Usti nad Labem, Czech Republic, 11 developers, 2018
 Sep 16 s2k11: General hackathon, Ljubljana, Slovenia, 25 developers, 2011
 Sep 17 n2k12: Network hackathon, Starnberg, Germany, 23 developers, 2012
 Sep 19 j2k10: Mini-hackathon, Sakae Mura, Nagano, Japan, 19 developers, 2010
@@ -103,7 +113,9 @@ Oct 29  OpenBSD 3.6 released, featuring i386 and amd64
 Oct 30 OpenBSD 3.4 released, implementing W^X on i386 and AES in VIA C3, 2003
 Nov 01 OpenBSD 3.2 released, ftp mirrors preload for the first time, 2002
 Nov 01 v2k5: First ports hackathon, Venice, Italy, 12 developers, 2005
+Nov 03 l2k18: LibreSSL hackathon, Edmonton, Canada, 5 developers, 2018
 Nov 05 a2k11: ARM hackathon, Coimbra, 

Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread Mark Kettenis
> Content-Type: text/plain; charset="utf-8"
> From: 
> 
> I was doing my timings with a user mode program after mmapping the
> efifb display, so the mapping might be different in the kernel.

That should still give you a write-combining mapping as efifb_mmap()
adds the PMAP_WC flag to the physical address.

Cachable on x86 means write-back cachable.  And using a write-back
cachable mapping for a framebuffer often leads to interesting "damage"
where pixels in certain cache lines show up "late" on the display.
Not sure if you'd see that on recent Intel graphics hardware as the
current hardware designs are much more coherent than what was produced
in the past.

> Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
> WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
> rasops. While the point of the DRM drivers is to get fully hardware
> accelerated drawing in X, there isn't any reason why they can't
> support dumb framebuffer mappings as well.

True.  Although there are DRM interfaces that give you a dumb
framebuffer as well.  Using those interfaces is a bit more complicated
though.

Centralising the code would be good.  That code probably should use
bus_space_mmap(4) as the PMAP_WC flag is amd64-specific.
Unfortunately the amd64 implementation of bus_space_mmap(4) is
incomplete and doesn't actually implement write-combining for mappings
with the BUS_SPACE_MAP_PREFETCHABLE flag set.  So that has to be fixed
as well.

>  Original Message 
> Subject: RE: [PATCH} Optimized rasops32 putchar
> From: 
> Date: Sat, June 27, 2020 11:13 am
> To: "Mark Kettenis" 
> Cc: "tech@openbsd.org" 
> 
> I believe it is mapped as normally cached right now, rather than
> uncached or write combining.
> 
> Reads aren't ultra-slow, and the timings of 48 byte writes appear to
> involve a cacheline read.
> 
> 128 byte writes are actually slower than 64 byte writes, which I
> guessed might be because of automatic prefetching kicking in and
> reading the following cacheline.
> 
> 
>  Original Message 
> Subject: Re: [PATCH} Optimized rasops32 putchar
> From: Mark Kettenis 
> Date: Sat, June 27, 2020 7:56 am
> To: 
> Cc: tech@openbsd.org
> 
> > From: 
> > Date: Fri, 26 Jun 2020 07:42:50 -0700
> > 
> > Optimized 32 bit character rendering with unrolled rows and pairwise
> > foreground / background pixel rendering.
> > 
> > If it weren't for the 5x8 font, I would have just assumed everything
> > was an even width and made the fallback path also pairwise.
> > 
> > In isolation, the 16x32 character case got 2x faster, but that wasn't
> > a huge real world speedup where the space rendering that was already
> > at memory bandwidth limits accounted for most of the character
> > rendering time. However, in combination with the previous fast
> > conditional console scrolling that removes most of the space rendering,
> > it becomes significant.
> > 
> > I also found that at least the efi and intel framebuffers are not
> > currently mapped write combining, which makes this much slower than
> > it should be.
> 
> Hi John,
> 
> The framebuffer should be mapped write-combining. In OpenBSD this is
> requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
> bus_space_map(9) when mapping the framebuffer.
> 
> I'm fairly confident since until last January the initial mapping of
> the framebuffer that we used wasn't write-combining. And things were
> really, really slow.
> 
> Cheers,
> 
> Mark
> 
> > Index: rasops32.c
> > ===
> > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 rasops32.c
> > --- rasops32.c 25 May 2020 09:55:49 - 1.10
> > +++ rasops32.c 26 Jun 2020 14:34:06 -
> > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> > int
> > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> > attr)
> > {
> > - int width, height, cnt, fs, fb, clr[2];
> > + int width, height, step, cnt, fs, b, f;
> > + uint32_t fb, clr[2];
> > struct rasops_info *ri;
> > - int32_t *dp, *rp;
> > + int64_t *rp, q;
> > + union {
> > + int64_t q[4];
> > + int32_t d[4][2];
> > + } u;
> > u_char *fr;
> > 
> > ri = (struct rasops_info *)cookie;
> > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> > return 0;
> > #endif
> > 
> > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> > 
> > height = ri->ri_font->fontheight;
> > width = ri->ri_font->fontwidth;
> > + step = ri->ri_stride >> 3;
> > 
> > - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> > - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> > + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> > + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> > + u.d[0][0] = b; u.d[0][1] = b;
> > + u.d[1][0] = b; u.d[1][1] = f;
> > + u.d[2][0] = f; u.d[2][1] = b;
> > + u.d[3][0] = f; u.d[3][1] = f;
> > 
> > if (uc == ' ') {
> > + q = 

Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I was doing my timings with a user mode program after mmapping the
efifb display, so the mapping might be different in the kernel.

Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
rasops. While the point of the DRM drivers is to get fully hardware
accelerated drawing in X, there isn't any reason why they can't
support dumb framebuffer mappings as well.


 Original Message 
Subject: RE: [PATCH} Optimized rasops32 putchar
From: 
Date: Sat, June 27, 2020 11:13 am
To: "Mark Kettenis" 
Cc: "tech@openbsd.org" 

I believe it is mapped as normally cached right now, rather than
uncached or write combining.

Reads aren't ultra-slow, and the timings of 48 byte writes appear to
involve a cacheline read.

128 byte writes are actually slower than 64 byte writes, which I
guessed might be because of automatic prefetching kicking in and
reading the following cacheline.


 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 7:56 am
To: 
Cc: tech@openbsd.org

> From: 
> Date: Fri, 26 Jun 2020 07:42:50 -0700
> 
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
> 
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
> 
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
> 
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining. In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bus_space_map(9) when mapping the framebuffer.

I'm fairly confident since until last January the initial mapping of
the framebuffer that we used wasn't write-combining. And things were
really, really slow.

Cheers,

Mark

> Index: rasops32.c
> ===
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c 25 May 2020 09:55:49 - 1.10
> +++ rasops32.c 26 Jun 2020 14:34:06 -
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> int
> rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> attr)
> {
> - int width, height, cnt, fs, fb, clr[2];
> + int width, height, step, cnt, fs, b, f;
> + uint32_t fb, clr[2];
> struct rasops_info *ri;
> - int32_t *dp, *rp;
> + int64_t *rp, q;
> + union {
> + int64_t q[4];
> + int32_t d[4][2];
> + } u;
> u_char *fr;
> 
> ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> return 0;
> #endif
> 
> - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> 
> height = ri->ri_font->fontheight;
> width = ri->ri_font->fontwidth;
> + step = ri->ri_stride >> 3;
> 
> - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> + u.d[0][0] = b; u.d[0][1] = b;
> + u.d[1][0] = b; u.d[1][1] = f;
> + u.d[2][0] = f; u.d[2][1] = b;
> + u.d[3][0] = f; u.d[3][1] = f;
> 
> if (uc == ' ') {
> + q = u.q[0];
> while (height--) {
> - dp = rp;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--)
> - *dp++ = clr[0];
> + /* the general, pixel-at-a-time case is fast enough */
> + for (cnt = 0; cnt < width; cnt++)
> + ((int *)rp)[cnt] = b;
> + rp += step;
> }
> } else {
> uc -= ri->ri_font->firstchar;
> fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
> fs = ri->ri_font->stride;
> -
> - while (height--) {
> - dp = rp;
> - fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) |
> - (fr[0] << 24);
> - fr += fs;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--) {
> - *dp++ = clr[(fb >> 31) & 1];
> - fb <<= 1;
> - }
> + /* double-pixel special cases for the common widths */
> + switch (width) {
> + case 8:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + rp += step;
> + fr += 1;
> + }
> + break;
> + 
> + case 12:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp += step;
> + fr += 2;

Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread johnc
I believe it is mapped as normally cached right now, rather than
uncached or write combining.

Reads aren't ultra-slow, and the timings of 48 byte writes appear to
involve a cacheline read.

128 byte writes are actually slower than 64 byte writes, which I
guessed might be because of automatic prefetching kicking in and
reading the following cacheline.


 Original Message 
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis 
Date: Sat, June 27, 2020 7:56 am
To: 
Cc: tech@openbsd.org

> From: 
> Date: Fri, 26 Jun 2020 07:42:50 -0700
> 
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
> 
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
> 
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
> 
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining. In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bus_space_map(9) when mapping the framebuffer.

I'm fairly confident this works, since until last January the initial
mapping of the framebuffer that we used wasn't write-combining, and
things were really, really slow.

Cheers,

Mark
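
For readers following along, the pairwise foreground/background rendering
described in the quoted summary can be sketched in portable C. This is only
an illustration under assumptions: build_pairs() and render_row8() are
made-up names, and memcpy() stands in for the 64-bit stores through a union
that the patch uses. Each 2-bit field of a font byte picks one of four
precomputed two-pixel patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the four possible two-pixel patterns for a background (bg) and
 * foreground (fg) colour.  Index bit 1 selects the left pixel and bit 0
 * the right pixel, matching the bit order inside a font byte.
 */
static void
build_pairs(uint32_t bg, uint32_t fg, uint32_t pairs[4][2])
{
	int i;

	for (i = 0; i < 4; i++) {
		pairs[i][0] = (i & 2) ? fg : bg;	/* left pixel */
		pairs[i][1] = (i & 1) ? fg : bg;	/* right pixel */
	}
}

/*
 * Render one 8-pixel-wide glyph row.  Every 2-bit field of the font byte
 * selects a precomputed pair, so the row needs four 8-byte copies
 * instead of eight single-pixel stores.
 */
static void
render_row8(uint8_t fontbyte, uint32_t pairs[4][2], uint32_t *dst)
{
	int shift;

	assert(dst != NULL);
	for (shift = 6; shift >= 0; shift -= 2) {
		memcpy(dst, pairs[(fontbyte >> shift) & 3], sizeof(pairs[0]));
		dst += 2;
	}
}
```

A caller would run build_pairs() once per character and render_row8() once
per font row, advancing dst by the framebuffer stride between rows.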

> Index: rasops32.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c 25 May 2020 09:55:49 -0000 1.10
> +++ rasops32.c 26 Jun 2020 14:34:06 -0000
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
> int
> rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t attr)
> {
> - int width, height, cnt, fs, fb, clr[2];
> + int width, height, step, cnt, fs, b, f;
> + uint32_t fb, clr[2];
> struct rasops_info *ri;
> - int32_t *dp, *rp;
> + int64_t *rp, q;
> + union {
> + int64_t q[4];
> + int32_t d[4][2];
> + } u;
> u_char *fr;
> 
> ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
> return 0;
> #endif
> 
> - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> 
> height = ri->ri_font->fontheight;
> width = ri->ri_font->fontwidth;
> + step = ri->ri_stride >> 3;
> 
> - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> + u.d[0][0] = b; u.d[0][1] = b;
> + u.d[1][0] = b; u.d[1][1] = f;
> + u.d[2][0] = f; u.d[2][1] = b;
> + u.d[3][0] = f; u.d[3][1] = f;
> 
> if (uc == ' ') {
> + q = u.q[0];
> while (height--) {
> - dp = rp;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--)
> - *dp++ = clr[0];
> + /* the general, pixel-at-a-time case is fast enough */
> + for (cnt = 0; cnt < width; cnt++)
> + ((int *)rp)[cnt] = b;
> + rp += step;
> }
> } else {
> uc -= ri->ri_font->firstchar;
> fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
> fs = ri->ri_font->stride;
> -
> - while (height--) {
> - dp = rp;
> - fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) |
> - (fr[0] << 24);
> - fr += fs;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--) {
> - *dp++ = clr[(fb >> 31) & 1];
> - fb <<= 1;
> - }
> + /* double-pixel special cases for the common widths */
> + switch (width) {
> + case 8:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + rp += step;
> + fr += 1;
> + }
> + break;
> + 
> + case 12:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp += step;
> + fr += 2;
> + }
> + break;
> + 
> + case 16:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> + rp[6] = u.q[(fb >> 2) & 3];
> + rp[7] = u.q[fb & 3];
> + rp += step;
> + fr += 2;
> + }
> + break; 
> + case 32:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + fb = fr[1];
> + rp[4] = u.q[fb >> 6];
> + rp[5] = u.q[(fb >> 4) & 3];
> 

Re: [PATCH 3/6] crypto: cast: convert to use new modes 64-bit helpers

2020-06-27 Thread Joerg Sonnenberger
On Sat, Jun 27, 2020 at 10:36:58PM +0300, Dmitry Baryshkov wrote:
> + * 3. All advertising materials mentioning features or use of this software
> + *must display the following acknowledgement:
> + *"This product includes cryptographic software written by
> + * Eric Young (e...@cryptsoft.com)"
> + *The word 'cryptographic' can be left out if the rouines from the 
> library
> + *being used are not cryptographic related :-).

Is the typo in routines necessary?

Joerg



[PATCH 4/6] crypto: IDEA: convert to use new modes 64-bit helpers

2020-06-27 Thread Dmitry Baryshkov
Convert IDEA cipher to use 64-bit modes helper functions.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/idea/i_cbc.c   | 74 +++-
 src/lib/libcrypto/idea/i_cfb64.c | 57 ++--
 src/lib/libcrypto/idea/i_ofb64.c | 47 ++--
 3 files changed, 13 insertions(+), 165 deletions(-)
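
As background for the cfb64 files touched here, this is a minimal sketch of
byte-wise CFB over a 64-bit block. It is an assumption-laden illustration,
not the helper added by this series: sketch_cfb64_encrypt(), the
sketch_block64_f type and the toy XOR "cipher" are hypothetical names made
up for the example, and the toy cipher offers no security.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: encrypt one 8-byte block with `key'. */
typedef void (*sketch_block64_f)(const unsigned char in[8],
    unsigned char out[8], const void *key);

/* Toy, insecure stand-in for a real block cipher: XOR with the key. */
static void
toy_block(const unsigned char in[8], unsigned char out[8], const void *key)
{
	const unsigned char *k = key;
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = in[i] ^ k[i];
}

/*
 * Byte-wise CFB-64.  `num' remembers the position inside the current
 * keystream block, so a message may be processed in several calls.
 * The block function is used in the encrypt direction for both
 * encryption and decryption.
 */
static void
sketch_cfb64_encrypt(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], int *num, int enc,
    sketch_block64_f block)
{
	unsigned int n = (unsigned int)*num;
	size_t i;

	assert(n < 8);
	for (i = 0; i < len; i++) {
		if (n == 0)
			block(ivec, ivec, key);	/* new keystream block */
		if (enc) {
			ivec[n] ^= in[i];	/* ciphertext feeds back */
			out[i] = ivec[n];
		} else {
			unsigned char c = in[i];

			out[i] = ivec[n] ^ c;
			ivec[n] = c;		/* ciphertext feeds back */
		}
		n = (n + 1) % 8;
	}
	*num = (int)n;
}
```

Because the position is carried in `num', encrypting 5 bytes and then 6
bytes gives the same ciphertext as encrypting all 11 bytes in one call.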

diff --git a/src/lib/libcrypto/idea/i_cbc.c b/src/lib/libcrypto/idea/i_cbc.c
index 5bb9640c3403..556a4aa5cbf3 100644
--- a/src/lib/libcrypto/idea/i_cbc.c
+++ b/src/lib/libcrypto/idea/i_cbc.c
@@ -57,81 +57,17 @@
  */
 
 #include 
+#include 
 #include "idea_lcl.h"
 
 void idea_cbc_encrypt(const unsigned char *in, unsigned char *out, long length,
 IDEA_KEY_SCHEDULE *ks, unsigned char *iv, int encrypt)
-   {
-   unsigned long tin0,tin1;
-   unsigned long tout0,tout1,xor0,xor1;
-   long l=length;
-   unsigned long tin[2];
-
+{
if (encrypt)
-   {
-   n2l(iv,tout0);
-   n2l(iv,tout1);
-   iv-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   n2l(in,tin0);
-   n2l(in,tin1);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   idea_encrypt(tin,ks);
-   tout0=tin[0]; l2n(tout0,out);
-   tout1=tin[1]; l2n(tout1,out);
-   }
-   if (l != -8)
-   {
-   n2ln(in,tin0,tin1,l+8);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   idea_encrypt(tin,ks);
-   tout0=tin[0]; l2n(tout0,out);
-   tout1=tin[1]; l2n(tout1,out);
-   }
-   l2n(tout0,iv);
-   l2n(tout1,iv);
-   }
+   CRYPTO_cbc64_encrypt(in, out, length, ks, iv, (block64_f)idea_ecb_encrypt);
else
-   {
-   n2l(iv,xor0);
-   n2l(iv,xor1);
-   iv-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   n2l(in,tin0); tin[0]=tin0;
-   n2l(in,tin1); tin[1]=tin1;
-   idea_encrypt(tin,ks);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2n(tout0,out);
-   l2n(tout1,out);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   if (l != -8)
-   {
-   n2l(in,tin0); tin[0]=tin0;
-   n2l(in,tin1); tin[1]=tin1;
-   idea_encrypt(tin,ks);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2nn(tout0,tout1,out,l+8);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   l2n(xor0,iv);
-   l2n(xor1,iv);
-   }
-   tin0=tin1=tout0=tout1=xor0=xor1=0;
-   tin[0]=tin[1]=0;
-   }
+   CRYPTO_cbc64_decrypt(in, out, length, ks, iv, (block64_f)idea_ecb_encrypt);
+}
 
 void idea_encrypt(unsigned long *d, IDEA_KEY_SCHEDULE *key)
{
diff --git a/src/lib/libcrypto/idea/i_cfb64.c b/src/lib/libcrypto/idea/i_cfb64.c
index b979aaef8669..a74b50d82309 100644
--- a/src/lib/libcrypto/idea/i_cfb64.c
+++ b/src/lib/libcrypto/idea/i_cfb64.c
@@ -57,6 +57,7 @@
  */
 
 #include 
+#include 
 #include "idea_lcl.h"
 
 /* The input and output encrypted as though 64bit cfb mode is being
@@ -67,56 +68,6 @@
 void idea_cfb64_encrypt(const unsigned char *in, unsigned char *out,
long length, IDEA_KEY_SCHEDULE *schedule,
unsigned char *ivec, int *num, int encrypt)
-   {
-   unsigned long v0,v1,t;
-   int n= *num;
-   long l=length;
-   unsigned long ti[2];
-   unsigned char *iv,c,cc;
-
-   iv=(unsigned char *)ivec;
-   if (encrypt)
-   {
-   while (l--)
-   {
-   if (n == 0)
-   {
-   n2l(iv,v0); ti[0]=v0;
-   n2l(iv,v1); ti[1]=v1;
-   idea_encrypt((unsigned long *)ti,schedule);
-   iv=(unsigned char *)ivec;
-   t=ti[0]; l2n(t,iv);
-   t=ti[1]; l2n(t,iv);
-   iv=(unsigned char *)ivec;
-   }
-   c= *(in++)^iv[n];
-   *(out++)=c;
-   iv[n]=c;
-   n=(n+1)&0x07;
-   }
-   }
-  

[PATCH 6/6] crypto: Gost 28147-89: convert to use new modes 64-bit helpers

2020-06-27 Thread Dmitry Baryshkov
Convert Gost 28147-89 cipher to use 64-bit modes helper functions.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/gost/gost2814789.c | 121 ++-
 1 file changed, 9 insertions(+), 112 deletions(-)

diff --git a/src/lib/libcrypto/gost/gost2814789.c b/src/lib/libcrypto/gost/gost2814789.c
index e285413ed460..bbd578ef7010 100644
--- a/src/lib/libcrypto/gost/gost2814789.c
+++ b/src/lib/libcrypto/gost/gost2814789.c
@@ -56,6 +56,7 @@
 #ifndef OPENSSL_NO_GOST
 #include 
 #include 
+#include 
 
 #include "gost_locl.h"
 
@@ -181,15 +182,17 @@ Gost2814789_ecb_encrypt(const unsigned char *in, unsigned char *out,
 }
 
 static inline void
-Gost2814789_encrypt_mesh(unsigned char *iv, GOST2814789_KEY *key)
+Gost2814789_encrypt_mesh(const unsigned char *in, unsigned char *out, GOST2814789_KEY *key)
 {
if (key->key_meshing && key->count == 1024) {
Gost2814789_cryptopro_key_mesh(key);
-   Gost2814789_encrypt(iv, iv, key);
-   key->count = 0;
+   Gost2814789_encrypt(in, out, key);
+   Gost2814789_encrypt(out, out, key);
+   key->count = 8;
+   } else {
+   Gost2814789_encrypt(in, out, key);
+   key->count += 8;
}
-   Gost2814789_encrypt(iv, iv, key);
-   key->count += 8;
 }
 
 static inline void
@@ -209,113 +212,7 @@ Gost2814789_cfb64_encrypt(const unsigned char *in, unsigned char *out,
 size_t len, GOST2814789_KEY *key, unsigned char *ivec, int *num,
 const int enc)
 {
-   unsigned int n;
-   size_t l = 0;
-
-   n = *num;
-
-   if (enc) {
-#if !defined(OPENSSL_SMALL_FOOTPRINT)
-   if (8 % sizeof(size_t) == 0) do { /* always true actually */
-   while (n && len) {
-   *(out++) = ivec[n] ^= *(in++);
-   --len;
-   n = (n + 1) % 8;
-   }
-#ifdef __STRICT_ALIGNMENT
-   if (((size_t)in | (size_t)out | (size_t)ivec) %
-   sizeof(size_t) != 0)
-   break;
-#endif
-   while (len >= 8) {
-   Gost2814789_encrypt_mesh(ivec, key);
-   for (; n < 8; n += sizeof(size_t)) {
-   *(size_t*)(out + n) =
-   *(size_t*)(ivec + n) ^=
-   *(size_t*)(in + n);
-   }
-   len -= 8;
-   out += 8;
-   in  += 8;
-   n = 0;
-   }
-   if (len) {
-   Gost2814789_encrypt_mesh(ivec, key);
-   while (len--) {
-   out[n] = ivec[n] ^= in[n];
-   ++n;
-   }
-   }
-   *num = n;
-   return;
-   } while (0);
-   /* the rest would be commonly eliminated by x86* compiler */
-#endif
-   while (l= 8) {
-   Gost2814789_encrypt_mesh(ivec, key);
-   for (; n < 8; n += sizeof(size_t)) {
-   size_t t = *(size_t*)(in + n);
-   *(size_t*)(out + n) =
-   *(size_t*)(ivec + n) ^ t;
-   *(size_t*)(ivec + n) = t;
-   }
-   len -= 8;
-   out += 8;
-   in  += 8;
-   n = 0;
-   }
-   if (len) {
-   Gost2814789_encrypt_mesh(ivec, key);
-   while (len--) {
-   unsigned char c;
-
-   out[n] = ivec[n] ^ (c = in[n]);
-   ivec[n] = c;
-   ++n;
-   }
-   }
-   *num = n;
-   return;
-   } while (0);
-   /* the rest would be commonly eliminated by x86* compiler */
-#endif
-   while (l < len) {
-   unsigned char c;
-
-   if (n == 0) {
-   Gost2814789_encrypt_mesh(ivec, key);
-   }
-   out[l] = ivec[n] ^ (c = in[l]); ivec[n] = c;
-   ++l;
-   n = (n + 1) % 8;
-   }
-   *num = n;
-   }
+   CRYPTO_cfb64_encrypt(in, out, len, key, 

[PATCH 1/6] modes: add functions implementing common code for 64-bit ciphers

2020-06-27 Thread Dmitry Baryshkov
64-bit ciphers are old, but it would be good to use common code for
their implementations.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/modes/cbc64.c | 202 
 src/lib/libcrypto/modes/cfb64.c | 169 ++
 src/lib/libcrypto/modes/ctr64.c | 174 +++
 src/lib/libcrypto/modes/modes.h |  26 
 src/lib/libcrypto/modes/ofb64.c | 119 +++
 5 files changed, 690 insertions(+)
 create mode 100644 src/lib/libcrypto/modes/cbc64.c
 create mode 100644 src/lib/libcrypto/modes/cfb64.c
 create mode 100644 src/lib/libcrypto/modes/ctr64.c
 create mode 100644 src/lib/libcrypto/modes/ofb64.c
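
To illustrate the idea of sharing one mode implementation between several
64-bit ciphers, here is a minimal CBC sketch driven by a block callback.
It is not the code from this patch: the sketch_* names, the callback type
and the toy XOR "cipher" are assumptions made for the example, only whole
blocks are handled, and the alignment fast path is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: process one 8-byte block with `key'. */
typedef void (*sketch_block64_f)(const unsigned char in[8],
    unsigned char out[8], const void *key);

/* Toy, insecure stand-in for a real block cipher: XOR with the key. */
static void
toy_block(const unsigned char in[8], unsigned char out[8], const void *key)
{
	const unsigned char *k = key;
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = in[i] ^ k[i];
}

/* CBC encryption of whole blocks; len must be a multiple of 8. */
static void
sketch_cbc64_encrypt(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], sketch_block64_f block)
{
	unsigned char chain[8], tmp[8];
	size_t i;

	assert(len % 8 == 0);
	memcpy(chain, ivec, 8);
	for (; len >= 8; len -= 8, in += 8, out += 8) {
		for (i = 0; i < 8; i++)
			tmp[i] = in[i] ^ chain[i];
		block(tmp, chain, key);	/* chain = E(P xor previous C) */
		memcpy(out, chain, 8);
	}
	memcpy(ivec, chain, 8);		/* hand back the last ciphertext */
}

/* CBC decryption; works in place because the ciphertext is saved first. */
static void
sketch_cbc64_decrypt(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], sketch_block64_f block)
{
	unsigned char chain[8], saved[8], tmp[8];
	size_t i;

	assert(len % 8 == 0);
	memcpy(chain, ivec, 8);
	for (; len >= 8; len -= 8, in += 8, out += 8) {
		memcpy(saved, in, 8);
		block(in, tmp, key);	/* tmp = D(C) */
		for (i = 0; i < 8; i++)
			out[i] = tmp[i] ^ chain[i];
		memcpy(chain, saved, 8);
	}
	memcpy(ivec, chain, 8);
}
```

With a real cipher the decrypt call would be given the cipher's block
decryption routine; the toy XOR function happens to be its own inverse.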

diff --git a/src/lib/libcrypto/modes/cbc64.c b/src/lib/libcrypto/modes/cbc64.c
new file mode 100644
index 000000000000..ec65ac5d3468
--- /dev/null
+++ b/src/lib/libcrypto/modes/cbc64.c
@@ -0,0 +1,202 @@
+/* $OpenBSD: cbc64.c,v 1.4 2015/02/10 09:46:30 miod Exp $ */
+/* 
+ * Copyright (c) 2008 The OpenSSL Project.  All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer. 
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in
+ *the documentation and/or other materials provided with the
+ *distribution.
+ *
+ * 3. All advertising materials mentioning features or use of this
+ *software must display the following acknowledgment:
+ *"This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
+ *
+ * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
+ *endorse or promote products derived from this software without
+ *prior written permission. For written permission, please contact
+ *openssl-c...@openssl.org.
+ *
+ * 5. Products derived from this software may not be called "OpenSSL"
+ *nor may "OpenSSL" appear in their names without prior written
+ *permission of the OpenSSL Project.
+ *
+ * 6. Redistributions of any form whatsoever must retain the following
+ *acknowledgment:
+ *"This product includes software developed by the OpenSSL Project
+ *for use in the OpenSSL Toolkit (http://www.openssl.org/)"
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
+ * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE OpenSSL PROJECT OR
+ * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ * 
+ *
+ */
+
+#include 
+#include "modes_lcl.h"
+#include 
+
+#ifndef MODES_DEBUG
+# ifndef NDEBUG
+#  define NDEBUG
+# endif
+#endif
+
+#undef STRICT_ALIGNMENT
+#ifdef __STRICT_ALIGNMENT
+#define STRICT_ALIGNMENT 1
+#else
+#define STRICT_ALIGNMENT 0
+#endif
+
+void CRYPTO_cbc64_encrypt(const unsigned char *in, unsigned char *out,
+   size_t len, const void *key,
+   unsigned char ivec[8], block64_f block)
+{
+   size_t n;
+   const unsigned char *iv = ivec;
+
+#if !defined(OPENSSL_SMALL_FOOTPRINT)
+   if (STRICT_ALIGNMENT &&
+   ((size_t)in|(size_t)out|(size_t)ivec)%sizeof(size_t) != 0) {
+   while (len>=8) {
+   for(n=0; n<8; ++n)
+   out[n] = in[n] ^ iv[n];
+   (*block)(out, out, key);
+   iv = out;
+   len -= 8;
+   in  += 8;
+   out += 8;
+   }
+   } else {
+   while (len>=8) {
+   for(n=0; n<8; n+=sizeof(size_t))
+   *(size_t*)(out+n) =
+   *(size_t*)(in+n) ^ *(size_t*)(iv+n);
+   (*block)(out, out, key);
+   iv = out;
+   len -= 8;
+   in  += 8;
+   out += 8;
+   }
+   }
+#endif
+   while (len) {
+   for(n=0; n<8 && n=8) {
+   (*block)(in, out, key);
+ 

[PATCH 5/6] crypto: RC2: convert to use new modes 64-bit helpers

2020-06-27 Thread Dmitry Baryshkov
Convert RC2 cipher to use 64-bit modes helper functions.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/rc2/rc2.h  |   4 +-
 src/lib/libcrypto/rc2/rc2_cbc.c  | 111 +++
 src/lib/libcrypto/rc2/rc2_locl.h |   7 ++
 src/lib/libcrypto/rc2/rc2cfb64.c |  57 ++--
 src/lib/libcrypto/rc2/rc2ofb64.c |  47 ++---
 5 files changed, 55 insertions(+), 171 deletions(-)
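
The mode helpers take a byte-oriented block callback, while the existing
RC2 primitives work on two `unsigned long' words. The sketch below shows
one way such an adapter can look, loading and storing the words
little-endian as the c2l()/l2c() macros do. All names here are
hypothetical, and toy_words_encrypt() is a made-up stand-in, not RC2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word-oriented primitive standing in for a real cipher. */
static void
toy_words_encrypt(unsigned long d[2], const void *key)
{
	const unsigned long *k = key;

	d[0] = (d[0] + k[0]) & 0xffffffffUL;
	d[1] = (d[1] ^ k[1]) & 0xffffffffUL;
}

/* Load a 32-bit little-endian word, as the c2l() macro does. */
static unsigned long
load_le32(const unsigned char *p)
{
	return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
	    ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Store a 32-bit little-endian word, as the l2c() macro does. */
static void
store_le32(unsigned long v, unsigned char *p)
{
	p[0] = (unsigned char)(v & 0xff);
	p[1] = (unsigned char)((v >> 8) & 0xff);
	p[2] = (unsigned char)((v >> 16) & 0xff);
	p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Byte-oriented adapter: the input is fully read into words before the
 * output is written, so `in' and `out' may point at the same buffer.
 */
static void
toy_block_encrypt(const unsigned char in[8], unsigned char out[8],
    const void *key)
{
	unsigned long d[2];

	assert(in != NULL && out != NULL);
	d[0] = load_le32(in);
	d[1] = load_le32(in + 4);
	toy_words_encrypt(d, key);
	store_le32(d[0], out);
	store_le32(d[1], out + 4);
}
```

An adapter with this signature needs no function pointer cast when it is
handed to a mode helper that expects a byte-oriented callback.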

diff --git a/src/lib/libcrypto/rc2/rc2.h b/src/lib/libcrypto/rc2/rc2.h
index 21511ff36ead..03df1433cc22 100644
--- a/src/lib/libcrypto/rc2/rc2.h
+++ b/src/lib/libcrypto/rc2/rc2.h
@@ -83,8 +83,8 @@ typedef struct rc2_key_st
 void RC2_set_key(RC2_KEY *key, int len, const unsigned char *data,int bits);
 void RC2_ecb_encrypt(const unsigned char *in,unsigned char *out,RC2_KEY *key,
 int enc);
-void RC2_encrypt(unsigned long *data,RC2_KEY *key);
-void RC2_decrypt(unsigned long *data,RC2_KEY *key);
+void RC2_encrypt(unsigned long *data,const RC2_KEY *key);
+void RC2_decrypt(unsigned long *data,const RC2_KEY *key);
 void RC2_cbc_encrypt(const unsigned char *in, unsigned char *out, long length,
RC2_KEY *ks, unsigned char *iv, int enc);
 void RC2_cfb64_encrypt(const unsigned char *in, unsigned char *out,
diff --git a/src/lib/libcrypto/rc2/rc2_cbc.c b/src/lib/libcrypto/rc2/rc2_cbc.c
index a947f1d3c3a1..276f3b3b4d61 100644
--- a/src/lib/libcrypto/rc2/rc2_cbc.c
+++ b/src/lib/libcrypto/rc2/rc2_cbc.c
@@ -57,86 +57,22 @@
  */
 
 #include 
+#include 
 #include "rc2_locl.h"
 
 void RC2_cbc_encrypt(const unsigned char *in, unsigned char *out, long length,
 RC2_KEY *ks, unsigned char *iv, int encrypt)
-   {
-   unsigned long tin0,tin1;
-   unsigned long tout0,tout1,xor0,xor1;
-   long l=length;
-   unsigned long tin[2];
-
+{
if (encrypt)
-   {
-   c2l(iv,tout0);
-   c2l(iv,tout1);
-   iv-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   c2l(in,tin0);
-   c2l(in,tin1);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   RC2_encrypt(tin,ks);
-   tout0=tin[0]; l2c(tout0,out);
-   tout1=tin[1]; l2c(tout1,out);
-   }
-   if (l != -8)
-   {
-   c2ln(in,tin0,tin1,l+8);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   RC2_encrypt(tin,ks);
-   tout0=tin[0]; l2c(tout0,out);
-   tout1=tin[1]; l2c(tout1,out);
-   }
-   l2c(tout0,iv);
-   l2c(tout1,iv);
-   }
+   CRYPTO_cbc64_encrypt(in, out, length, ks, iv, (block64_f)RC2_block_encrypt);
else
-   {
-   c2l(iv,xor0);
-   c2l(iv,xor1);
-   iv-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   c2l(in,tin0); tin[0]=tin0;
-   c2l(in,tin1); tin[1]=tin1;
-   RC2_decrypt(tin,ks);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2c(tout0,out);
-   l2c(tout1,out);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   if (l != -8)
-   {
-   c2l(in,tin0); tin[0]=tin0;
-   c2l(in,tin1); tin[1]=tin1;
-   RC2_decrypt(tin,ks);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2cn(tout0,tout1,out,l+8);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   l2c(xor0,iv);
-   l2c(xor1,iv);
-   }
-   tin0=tin1=tout0=tout1=xor0=xor1=0;
-   tin[0]=tin[1]=0;
-   }
+   CRYPTO_cbc64_decrypt(in, out, length, ks, iv, (block64_f)RC2_block_decrypt);
+}
 
-void RC2_encrypt(unsigned long *d, RC2_KEY *key)
+void RC2_encrypt(unsigned long *d, const RC2_KEY *key)
{
int i,n;
-   RC2_INT *p0,*p1;
+   const RC2_INT *p0,*p1;
RC2_INT x0,x1,x2,x3,t;
unsigned long l;
 
@@ -178,10 +114,10 @@ void RC2_encrypt(unsigned long *d, RC2_KEY *key)
d[1]=(unsigned long)(x2&0xffff)|((unsigned long)(x3&0xffff)<<16L);
}
 
-void RC2_decrypt(unsigned long *d, RC2_KEY *key)
+void RC2_decrypt(unsigned long *d, const RC2_KEY *key)
{
int i,n;
-   RC2_INT *p0,*p1;
+   const RC2_INT *p0,*p1;
RC2_INT x0,x1,x2,x3,t;
unsigned long l;
 
@@ -224,3 +160,32 @@ void RC2_decrypt(unsigned long *d, RC2_KEY 

[PATCH 3/6] crypto: cast: convert to use new modes 64-bit helpers

2020-06-27 Thread Dmitry Baryshkov
Convert CAST cipher to use 64-bit modes helper functions.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/Makefile|   2 +-
 src/lib/libcrypto/cast/c_cbc.c|  75 +
 src/lib/libcrypto/cast/c_cfb64.c  |  56 ++--
 src/lib/libcrypto/cast/c_enc.c| 108 --
 src/lib/libcrypto/cast/c_ofb64.c  |  46 ++---
 src/lib/libcrypto/cast/cast_lcl.h |   8 +++
 6 files changed, 120 insertions(+), 175 deletions(-)
 create mode 100644 src/lib/libcrypto/cast/c_cbc.c

diff --git a/src/lib/libcrypto/Makefile b/src/lib/libcrypto/Makefile
index 291af21965bf..2e20904ab840 100644
--- a/src/lib/libcrypto/Makefile
+++ b/src/lib/libcrypto/Makefile
@@ -89,7 +89,7 @@ SRCS+= buffer.c buf_err.c buf_str.c
 SRCS+= cmll_cfb.c cmll_ctr.c cmll_ecb.c cmll_ofb.c
 
 # cast/
-SRCS+= c_skey.c c_ecb.c c_enc.c c_cfb64.c c_ofb64.c
+SRCS+= c_skey.c c_ecb.c c_enc.c c_cfb64.c c_ofb64.c c_cbc.c
 
 # chacha/
 SRCS+= chacha.c
diff --git a/src/lib/libcrypto/cast/c_cbc.c b/src/lib/libcrypto/cast/c_cbc.c
new file mode 100644
index 000000000000..1dc32ad8ca54
--- /dev/null
+++ b/src/lib/libcrypto/cast/c_cbc.c
@@ -0,0 +1,75 @@
+/* $OpenBSD: c_cbc.c,v 1.5 2014/10/28 07:35:58 jsg Exp $ */
+/* Copyright (C) 1995-1998 Eric Young (e...@cryptsoft.com)
+ * All rights reserved.
+ *
+ * This package is an SSL implementation written
+ * by Eric Young (e...@cryptsoft.com).
+ * The implementation was written so as to conform with Netscapes SSL.
+ * 
+ * This library is free for commercial and non-commercial use as long as
+ * the following conditions are aheared to.  The following conditions
+ * apply to all code found in this distribution, be it the RC4, RSA,
+ * lhash, DES, etc., code; not just the SSL code.  The SSL documentation
+ * included with this distribution is covered by the same copyright terms
+ * except that the holder is Tim Hudson (t...@cryptsoft.com).
+ * 
+ * Copyright remains Eric Young's, and as such any Copyright notices in
+ * the code are not to be removed.
+ * If this package is used in a product, Eric Young should be given attribution
+ * as the author of the parts of the library used.
+ * This can be in the form of a textual message at program startup or
+ * in documentation (online or textual) provided with the package.
+ * 
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. All advertising materials mentioning features or use of this software
+ *must display the following acknowledgement:
+ *"This product includes cryptographic software written by
+ * Eric Young (e...@cryptsoft.com)"
+ *The word 'cryptographic' can be left out if the rouines from the library
+ *being used are not cryptographic related :-).
+ * 4. If you include any Windows specific code (or a derivative thereof) from 
+ *the apps directory (application code) you must include an acknowledgement:
+ *"This product includes software written by Tim Hudson (t...@cryptsoft.com)"
+ * 
+ * THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ * 
+ * The licence and distribution terms for any publically available version or
+ * derivative of this code cannot be changed.  i.e. this code cannot simply be
+ * copied and put under another distribution licence
+ * [including the GNU Public Licence.]
+ */
+
+#include 
+#include 
+#include "cast_lcl.h"
+
+/* The input and output encrypted as though 64bit cbc mode is being
+ * used.
+ */
+
+void CAST_cbc_encrypt(const unsigned char *in, unsigned char *out,
+   long length, const CAST_KEY *schedule, unsigned char *ivec,
+   int enc)
+{
+   if (enc)
+   CRYPTO_cbc64_encrypt(in, out, length, schedule, ivec, (block64_f)CAST_block_encrypt);
+   else
+   CRYPTO_cbc64_decrypt(in, out, length, schedule, ivec, 

[PATCH 2/6] crypto: bf: convert to use new modes 64-bit helpers

2020-06-27 Thread Dmitry Baryshkov
Convert Blowfish cipher to use 64-bit modes helper functions.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/Makefile  |   2 +-
 src/lib/libcrypto/bf/bf_cbc.c   |  83 ++-
 src/lib/libcrypto/bf/bf_cfb64.c |  57 ++--
 src/lib/libcrypto/bf/bf_enc.c   | 114 
 src/lib/libcrypto/bf/bf_locl.h  |   8 +++
 src/lib/libcrypto/bf/bf_ofb64.c |  47 ++---
 6 files changed, 51 insertions(+), 260 deletions(-)

diff --git a/src/lib/libcrypto/Makefile b/src/lib/libcrypto/Makefile
index 9207b93f321d..291af21965bf 100644
--- a/src/lib/libcrypto/Makefile
+++ b/src/lib/libcrypto/Makefile
@@ -65,7 +65,7 @@ SRCS+= evp_asn1.c asn_pack.c p5_pbe.c p5_pbev2.c p8_pkey.c asn_moid.c
 SRCS+= a_time_tm.c
 
 # bf/
-SRCS+= bf_skey.c bf_ecb.c bf_cfb64.c bf_ofb64.c
+SRCS+= bf_skey.c bf_ecb.c bf_cfb64.c bf_ofb64.c bf_cbc.c
 
 # bio/
 SRCS+= bio_lib.c bio_cb.c bio_err.c bio_meth.c
diff --git a/src/lib/libcrypto/bf/bf_cbc.c b/src/lib/libcrypto/bf/bf_cbc.c
index 6f45f9ae4c35..a9d3cf6d5541 100644
--- a/src/lib/libcrypto/bf/bf_cbc.c
+++ b/src/lib/libcrypto/bf/bf_cbc.c
@@ -57,87 +57,14 @@
  */
 
 #include 
+#include 
 #include "bf_locl.h"
 
 void BF_cbc_encrypt(const unsigned char *in, unsigned char *out, long length,
 const BF_KEY *schedule, unsigned char *ivec, int encrypt)
-   {
-   BF_LONG tin0,tin1;
-   BF_LONG tout0,tout1,xor0,xor1;
-   long l=length;
-   BF_LONG tin[2];
-
+{
if (encrypt)
-   {
-   n2l(ivec,tout0);
-   n2l(ivec,tout1);
-   ivec-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   n2l(in,tin0);
-   n2l(in,tin1);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   BF_encrypt(tin,schedule);
-   tout0=tin[0];
-   tout1=tin[1];
-   l2n(tout0,out);
-   l2n(tout1,out);
-   }
-   if (l != -8)
-   {
-   n2ln(in,tin0,tin1,l+8);
-   tin0^=tout0;
-   tin1^=tout1;
-   tin[0]=tin0;
-   tin[1]=tin1;
-   BF_encrypt(tin,schedule);
-   tout0=tin[0];
-   tout1=tin[1];
-   l2n(tout0,out);
-   l2n(tout1,out);
-   }
-   l2n(tout0,ivec);
-   l2n(tout1,ivec);
-   }
+   CRYPTO_cbc64_encrypt(in, out, length, schedule, ivec, (block64_f)BF_block_encrypt);
else
-   {
-   n2l(ivec,xor0);
-   n2l(ivec,xor1);
-   ivec-=8;
-   for (l-=8; l>=0; l-=8)
-   {
-   n2l(in,tin0);
-   n2l(in,tin1);
-   tin[0]=tin0;
-   tin[1]=tin1;
-   BF_decrypt(tin,schedule);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2n(tout0,out);
-   l2n(tout1,out);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   if (l != -8)
-   {
-   n2l(in,tin0);
-   n2l(in,tin1);
-   tin[0]=tin0;
-   tin[1]=tin1;
-   BF_decrypt(tin,schedule);
-   tout0=tin[0]^xor0;
-   tout1=tin[1]^xor1;
-   l2nn(tout0,tout1,out,l+8);
-   xor0=tin0;
-   xor1=tin1;
-   }
-   l2n(xor0,ivec);
-   l2n(xor1,ivec);
-   }
-   tin0=tin1=tout0=tout1=xor0=xor1=0;
-   tin[0]=tin[1]=0;
-   }
-
+   CRYPTO_cbc64_decrypt(in, out, length, schedule, ivec, (block64_f)BF_block_decrypt);
+}
diff --git a/src/lib/libcrypto/bf/bf_cfb64.c b/src/lib/libcrypto/bf/bf_cfb64.c
index 6cc0bb999bd3..463080cb230f 100644
--- a/src/lib/libcrypto/bf/bf_cfb64.c
+++ b/src/lib/libcrypto/bf/bf_cfb64.c
@@ -57,6 +57,7 @@
  */
 
 #include 
+#include 
 #include "bf_locl.h"
 
 /* The input and output encrypted as though 64bit cfb mode is being
@@ -66,56 +67,6 @@
 
 void BF_cfb64_encrypt(const unsigned char *in, unsigned char *out, long length,
 const BF_KEY *schedule, unsigned char *ivec, int *num, int encrypt)
-   {
-   BF_LONG v0,v1,t;
-   int n= *num;
-   long l=length;
-   BF_LONG ti[2];
-   unsigned char *iv,c,cc;
-
-   iv=(unsigned char *)ivec;
-   if (encrypt)
-   {
-   while (l--)
-   {
-  

Re: pipex(4): use reference counters for `ifnet'

2020-06-27 Thread Vitaliy Makkoveev
On Sat, Jun 27, 2020 at 12:41:29PM +0200, Martin Pieuchot wrote:
> On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote:
> > On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote:
> > > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote:
> > > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote:
> > > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote:
> > > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote:
> > > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote:
> > > > > > > > Updated diff. 
> > > > > > > > 
> > > > > > > > OpenBSD uses a 16-bit counter to allocate interface indexes,
> > > > > > > > so we can't store the index in the session and be sure that
> > > > > > > > the `ifnet' returned by if_get(9) is our original `ifnet'.
> > > > > > > 
> > > > > > > Why not?  The point of if_get(9) is to be sure.  If that doesn't
> > > > > > > work for whatever reason then the if_get(9) interface has to be
> > > > > > > fixed.  Which case doesn't work for you?  Do you have a
> > > > > > > reproducer?
> > > > > > > 
> > > > > > > How do sessions stay around if their corresponding interface is
> > > > > > > destroyed?
> > > > > > 
> > > > > > We have `pipexinq' and `pipexoutq', which can store pointers to
> > > > > > sessions. pipexintr() processes these queues. pipexintr() and
> > > > > > pipex_destroy_session() *always* run in different contexts. This
> > > > > > means we *can't* free a pipex(4) session without being sure there
> > > > > > is no reference to it in `pipexinq' or `pipexoutq'. Otherwise this
> > > > > > will cause a use-after-free. Please look at net/pipex.c:846. The
> > > > > > way pppx(4) destroys sessions is wrong. pppac(4) destroys sessions
> > > > > > via pipex_iface_fini(), which is also wrong, because we don't
> > > > > > check the state of `pipexinq' and `pipexoutq'. I have said it
> > > > > > again and again.
> > > > > 
> > > > > I understand.  Why is it a problem?  Using reference counting the
> > > > > way you're suggesting is *one* possible solution to a problem we
> > > > > don't fully understand.  What are we trying to achieve?  Which
> > > > > problem are we trying to solve?
> > > > 
> > > > Sorry, maybe I misunderstand something.
> > > > 
> > > > `pipexoutq' case:
> > > > 
> > > > 1. pppac_start() calls pipex_output()
> > > > 2. pipex_output() calls pipex_ip_output()
> > > > 3. pipex_ip_output() calls pipex_ppp_enqueue()
> > > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add()
> > > > 
> > > > `pipexinq' cases:
> > > > 
> > > > 1.1. ether_input() calls pipex_pppoe_input()
> > > > 1.2. gre_input() calls gre_input_1()
> > > >  gre_input_1() calls pipex_pptp_input()
> > > > 1.3. udp_input() calls pipex_l2tp_input()
> > > > 
> > > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input()
> > > > 3. pipex_common_input() calls schednetisr() which is task_add()
> > > > 
> > > > task_add(9) just schedules the execution of the work specified by `tq'.
> > > > So we can do pipex_destroy_session() *between* schednetisr() and
> > > > pipexintr(). And we can do this right *now*, with our current locking.
> > > > And this is the problem I'm trying to solve.
> > > > 
> > > > My apologies if I'm wrong above. Please point out where I'm wrong.
> > > > 
> > > > Also, before pipex_{pppoe,pptp,l2tp}_input() we call the corresponding
> > > > pipex_{pptp,l2tp}_lookup_session() to obtain a pointer to the pipex(4)
> > > > session. We should be sure `session' is still valid between
> > > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be
> > > > required in future.
> > > 
> > > Why not iterate over the queues and garbage collect the sessions that
> > > are about to be removed?  That's what the network stack was doing with
> > > mbuf queues prior to if_get(9) when interfaces were destroyed.
> > > 
> > 
> > Do you mean net/if.c:1185 and below? These are the queues associated with
> > this `ifp'. But for pipex(4) we should go through all mbufs associated
> > with pipex(4). This can be heavy if we have hundreds of sessions. Also,
> > this would work only as long as session destruction and `pr_input' are
> > serialized.
> > 
> > Please point me to the line in the source so I can see whether I'm wrong
> > about the cleaning of `ifnet's mbuf queues.
> 
> Look at r1.329 of net/if.c.  Prior to this change if_detach_queues() was
> used to free all mbufs when an interface was removed.  Now lazy freeing
> is used every time if_get(9) returns NULL.
> 
> This is possible because we store an index and not a pointer directly in
> the mbuf.
> 
> The advantage of storing a session pointer in `ph_cookie' is that no
> lookup is required in pipexintr(), right?  Maybe we could save an ID
> instead and do a lookup.  How big can the `pipex_session_list' be?
>
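
The idea in the quoted message (carry an ID with the packet and look the
session up when the deferred handler runs) can be sketched in userland C.
This is only an illustration under stated assumptions: every name below
(session_create, session_get, packet_process, the fixed-size table) is
hypothetical and none of it is the pipex(4) code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the pipex(4) code. */
#define MAX_SESSIONS 8

struct session {
	int id;		/* IDs are never reused in this sketch */
	int in_use;
};

struct packet {
	int session_id;	/* an ID, not a pointer, travels with the packet */
};

static struct session table[MAX_SESSIONS];
static int next_id = 1;

/* Create a session and return its ID, or -1 if the table is full. */
static int
session_create(void)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (!table[i].in_use) {
			table[i].in_use = 1;
			table[i].id = next_id++;
			return table[i].id;
		}
	}
	return -1;
}

/* Look a session up by ID; NULL means it was destroyed in the meantime. */
static struct session *
session_get(int id)
{
	for (size_t i = 0; i < MAX_SESSIONS; i++) {
		if (table[i].in_use && table[i].id == id)
			return &table[i];
	}
	return NULL;
}

/* Destroy a session; packets that still carry its ID become stale. */
static void
session_destroy(int id)
{
	struct session *s = session_get(id);

	if (s != NULL)
		s->in_use = 0;
}

/* Deferred handler: returns 1 if processed, 0 if the packet is dropped. */
static int
packet_process(const struct packet *p)
{
	assert(p != NULL);
	return session_get(p->session_id) != NULL;
}
```

A packet queued for a session that is destroyed before the handler runs is
simply dropped, at the cost of one lookup per packet. The linear scan is
the part that the list-size question above is about.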

It's unlimited. In the pppac(4) case you create only one interface and
you can share it among as many sessions as you wish. In my 

OpenBSD.calendar patch

2020-06-27 Thread jungle boogie

Hi Friends,

Here's a small patch to the OpenBSD.calendar. I didn't want to spend too 
much time on this until I find out if it would be accepted.


Here are my changes:


--- /usr/share/calendar/calendar.openbsd	Fri Jun 26 21:01:56 2020
+++ calendar.openbsd	Sat Jun 27 01:37:40 2020
@@ -10,15 +10,19 @@
 Jan 06 IPF gets integrated into the OpenBSD kernel, 1996
 Jan 06 NRL IPv6 addition to OpenBSD, 1999
 Jan 09 n2k10: Network hackathon, Melbourne, Australia, 17 developers, 2010
+Jan 12 u2k20: Uckermark hackathon, Uckermark, Germany, 14 developers, 2020
 Jan 13 n2k13: Network hackathon, Dunedin, New Zealand, 17 developers, 2013
+Jan 17	a2k19: Antipodean hackathon, Wellington, New Zealand, 18 developers, 2019
 Jan 18 n2k14: Mini-hackathon, Dunedin, New Zealand, 15 developers, 2014
 Jan 20 Bind 9 goes into the tree, 2003
+Jan 20 a2k20: Antipodean hackathon, Hobart, Tasmania, 17 developers, 2020
 Jan 26 Anoncvs service inaugurated, 1996
 Jan 26 n2k9: Network hackathon, Basel, Switzerland, 19 developers, 2009
 Jan 27 OpenBSD/amd64 port is added, from NetBSD, 2004
 Jan 29 "second anoncvs server is 100 miles from the first", 1996
 Jan 31 OpenBSD/cats port is added, from NetBSD, 2004
 Feb 03 Describe the ports mechanism [in OpenBSD], 1997
+Feb 05 a2k18: Dunedin, New Zealand, 19 developers, 2018
 Feb 13 Unpatented fast block cipher for new password hashing, 1997
 Feb 14 GNU RCS expired from source tree, replaced with OpenRCS, 2007
 Feb 19 IPsec package by John Ioannidis and Angelos D. Keromytis, 1997
@@ -27,6 +31,7 @@
 Feb 28 Cryptographic services framework in OpenBSD, 2000
 Mar 09 Support for the VAX architecture removed, 2016
 Mar 10 OpenBSD/WWW translation started -- German, Spanish, Dutch, 2000
+Mar 28 t2k19: Taipei mini hackathon, Taipei, Taiwan, 16 developers, 2019
 Apr 01 OpenBSD/hppa64 port is added, 2005
 Apr 01	k2k11: Kernel hackathon, Hafnarfjordur, Iceland, 15 developers, 2011
 Apr 10	f2k7: First filesystem hackathon, Vienna, Austria, 14 developers, 2007
@@ -40,10 +45,12 @@
 Apr 27 i386/PAE work integrated, 2006
 May 01 OpenBSD 3.3 released, exploiting W^X, 2003
 May 05 n2k8: Network hackathon, Ito, Japan, 18 developers, 2008
+May 07 g2k19: General hackathon, Ottawa, Canada, 43 developers, 2019
 May 08 c2k3 General hackathon, Calgary, Alberta, 51 developers, 2003
 May 09 First commit to OpenBSD stable branch, OPENBSD_2_7, 2000
 May 09 OpenBSD/aviion port is added, 2006
 May 19 OpenBSD 2.3 released, including "ports" system, 1998
+May 19 OpenBSD 6.7 released, 48th release, 2020
 May 21 c2k5: General hackathon, Calgary, Alberta, 60 developers, 2005
 May 21 c2k6: General hackathon, Calgary, Alberta, 47 developers, 2006
 May 24 OpenBSD gets a trunk(4), 2005
@@ -57,6 +64,7 @@
 Jun 04 c99: First hackathon (IPsec), Calgary, Alberta, 10 developers, 1999
 Jun 04 c2k2: General hackathon, Calgary, Alberta, 42 developers, 2002
 Jun 06 c2k8: General hackathon, Edmonton, Alberta, 55 developers, 2008
+Jun 21 WireGuard imported into kernel, 2020
 Jun 14 r2k6: First network hackathon, Hamburg, Germany, 6 developers, 2006
 Jun 15 OpenBSD 2.7 released, including OpenSSH, 2000
 Jun 15 c2k: First general hackathon, Calgary, Alberta, 18 developers, 2000
@@ -70,6 +78,7 @@
 Jul 02	c2k11: General hackathon, Edmonton, Alberta, Canada, 36 developers, 2011
 Jul 07 g2k12: General hackathon, Budapest, Hungary, 41 developers, 2012
 Jul 08 g2k14: General hackathon, Ljubljana, Slovenia, 49 developers, 2014
+Jul 08 g2k18: General hackathon, Ljubljana, Slovenia, 39 developers, 2018
 Jul 11 OpenBSD goes wireless w/ if_wi addition, 1999
 Jul 23 OpenBSD goes multimedia with Brooktree 848 support, 1998
 Jul 24 Non-executable stack on most architectures, 2002
@@ -83,6 +92,7 @@
 Aug 28	k2k6: IPsec hackathon, Schloss Kransberg, Germany, 14 developers, 2006
 Sep 01 Support for the sparc (32bit) architecture removed, 2016
 Sep 03 Support for the zaurus architecture removed, 2016
+Sep 06	n2k18: Network hackathon, Usti nad Labem, Czech Republic, 11 developers, 2018
 Sep 16 s2k11: General hackathon, Ljubljana, Slovenia, 25 developers, 2011
 Sep 17 n2k12: Network hackathon, Starnberg, Germany, 23 developers, 2012
 Sep 19	j2k10: Mini-hackathon, Sakae Mura, Nagano, Japan, 19 developers, 2010
@@ -103,7 +113,9 @@
 Oct 30	OpenBSD 3.4 released, implementing W^X on i386 and AES in VIA C3, 2003
 Nov 01 OpenBSD 3.2 released, ftp mirrors preload for the first time, 2002
 Nov 01 v2k5: First ports hackathon, Venice, Italy, 12 developers, 2005
+Nov 03 l2k18: Libressl hackathon, Edmonton, Canada, 5 developers, 2018
 Nov 05 a2k11: ARM hackathon, Coimbra, Portugal, 8 developers, 2011
+Nov 05 p2k19: Ports hackathon, Bucharest, Romania, 18 developers, 2019
 Nov 11 want.html added to OpenBSD/www, 1998
 Nov 12 p2k11: Ports hackathon, Budapest, Hungary, 15 developers, 2011
 Nov 14 c2k12: Coimbra hackathon, Coimbra, Portugal, 10 developers, 2012
@@ -112,6 +124,7 @@
 Nov 21 h2k9: Hardware hackathon, Coimbra, Portugal, 15 developers, 2009
 Nov 22 

Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Denis Fondras
On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> This patch adds support for 11n Tx aggregation to iwm(4).

iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev
0x73, msi

AP is Zyxel USG40W

Before :
bandwidth min/avg/max/std-dev = 9.800/14.000/14.214/0.606 Mbps

After :
bandwidth min/avg/max/std-dev = 8.124/47.270/57.076/8.906 Mbps



Re: fix races in if_clone_create()

2020-06-27 Thread Vitaliy Makkoveev
On Sat, Jun 27, 2020 at 12:10:24PM +0200, Martin Pieuchot wrote:
> On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> > On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > > if_clone_create() has the races caused by context switch.
> > > 
> > > Can you share a backtrace of such race?  Where does the kernel panic?
> > >
> > 
> > This diff was inspired by thread [1]. As I explained in [2], there are 3
> > issues that cause panics produced by the command below:
> > 
> >  cut begin 
> > for i in 1 2 3; do while true; do ifconfig bridge0 create& \
> > ifconfig bridge0 destroy& done& done
> >  cut end 
> 
> Thanks, I couldn't reproduce it on any of the machines I tried.  Did you
> manage to reproduce it with other pseudo-devices or just with bridge0?
> 

In thread [1] you talked about bridge(4), tun(4) and vether(4). At first
I fixed the races in if_clone_destroy(), and I caught the races in
if_clone_create() while running your initial command, but with vether(4):

 cut begin 
for i in 0 1 2 3 4 5 6 7; do while true; \
do cat /dev/vether0& ifconfig vether0 destroy& done& done
 cut end 

This issue is hard to reproduce. My best chances are on bare metal with
8 cores, a fully unloaded system, no X, no active processes, the test
started at the console and all output redirected to /dev/null. Even then
it can take *hours* to catch. I can't reproduce this on 2 cores. I can't
reproduce it on 4 cores under kvm, but it is reproducible under VirtualBox
on OS X, where the host has 8 cores. I can also reproduce it on bare metal
with 4 cores, but that takes time too.

The routine called through `ifc_create' within if_clone_attach() is specific
to each pseudo interface. if_attach() is the only sleep point common to
all of them, but you can also sleep at any point before `ifc_create'
calls if_attach(). For example, you may allocate the software context
with `M_WAITOK'.

bridge(4) is just the easiest way for me to reproduce it.

All `ifnet's are linked to `if_list'. ifunit() does a linear search of
this list, comparing `ifp->if_xname' with the given `name'. So if you
inserted many `ifnet's with the same name into this list, ifunit() will
return the first one. But if_get(9) doesn't work with this list. So in the
case I describe above, if_get(9) and ifunit() are inconsistent. Some places
in the stack use if_get(9) and some use ifunit(), so you work each time
with a different `ifnet' with the same `if_xname'. You can't predict where
an `ifnet' will be corrupted.
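
The inconsistency described above can be shown with a small userland
model. It is only a sketch: struct fake_ifnet, fake_attach(),
lookup_by_name() and lookup_by_index() are hypothetical stand-ins, and the
real if_get(9) uses its own index map rather than a scan of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMESZ 16
#define MAXIF 4

/* Hypothetical stand-in for struct ifnet. */
struct fake_ifnet {
	char xname[NAMESZ];
	unsigned int index;
};

static struct fake_ifnet *if_list[MAXIF];
static size_t if_count;

/* Append without checking for a duplicate name, as described above. */
static void
fake_attach(struct fake_ifnet *ifp)
{
	assert(if_count < MAXIF);
	if_list[if_count++] = ifp;
}

/* Name lookup: linear search, returns the first match only. */
static struct fake_ifnet *
lookup_by_name(const char *name)
{
	for (size_t i = 0; i < if_count; i++) {
		if (strcmp(if_list[i]->xname, name) == 0)
			return if_list[i];
	}
	return NULL;
}

/* Index lookup: each interface has its own index, so it finds either one. */
static struct fake_ifnet *
lookup_by_index(unsigned int index)
{
	for (size_t i = 0; i < if_count; i++) {
		if (if_list[i]->index == index)
			return if_list[i];
	}
	return NULL;
}
```

With two entries named "bridge0", the name lookup always yields the first
one, while an index lookup for the second entry yields the second. Code
that mixes the two lookups therefore operates on different objects.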

> > My system was stable with the last diff I did for thread [1]. But since
> > this final diff [3], which includes fixes for tun(4), is quick and dirty
> > and not for commit, I decided to first make a diff that fixes the races
> > caused by if_clone_create().
> > 
> > I included screenshot with panic.
> 
> Thanks, interesting that the corruption happens on a list that should be
> initialized.  Does that mean the context switch on Thread 1 is happening
> before if_attach_common() is called?
> 

I don't know where it was. if_attach() doesn't check whether an `ifnet'
with the name in `if_xname' is already linked. It inserts the passed
`ifnet' in any case. If you have more than one `ifnet' with an identical
`if_xname', the ifunit() and if_get() logic is broken.

Look at if_attach():

 cut begin 
if_attach(struct ifnet *ifp)
{
if_attach_common(ifp);
NET_LOCK();
TAILQ_INSERT_TAIL(&ifnet, ifp, if_list); /* (1) */
if_attachsetup(ifp);
NET_UNLOCK();
}
 cut end 

You link `ifp' at (1). It is still your `ifp' before and after a context
switch, or without one. You will break it later. The reason is that the
pseudo driver received the same `unit' more than once, so it created
two or more software contexts with an identical `unit', and the pseudo
driver's internal logic is broken. ifunit() and if_get(9) are also
inconsistent now. You can corrupt memory anywhere.

> You said in your previous email that there's a context switch.  Do you know
> when it happens?  You could see that in ddb by looking at the backtrace
> of the other CPU.
> 
> Is the context switch leading to the race common to all pseudo-drivers
> or is it in the bridge(4) driver?

ddb(4) is useless here. The panic occurred while we were trying to if_detach()
an already broken `ifnet'. There are no races at that point. The races happened
*before*, when we inserted two or more `ifnet's with the same name into
`if_list'. That insertion itself does not trigger a panic.

I first caught these races when I joined your thread [1].
I inserted an ifunit() call into if_attach() as below and got a panic, so
I'm sure about the reason:

 cut begin 
if_attach(struct ifnet *ifp)
{
if_attach_common(ifp);
NET_LOCK();
KASSERT(ifunit(ifp->if_xname) == NULL);
TAILQ_INSERT_TAIL(&ifnet, ifp, if_list);
if_attachsetup(ifp);
NET_UNLOCK();
}
 cut end 
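
One way to read the assertion above is as a rule: the name check and the
list insertion must form a single step. The userland sketch below shows
only that rule. It is single-threaded, attach_unique() and struct
fake_ifnet are hypothetical names, and in the kernel both halves would
additionally have to run under the same lock with no sleep point between
them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAXIF 4

/* Hypothetical stand-in for struct ifnet. */
struct fake_ifnet {
	const char *xname;
};

static struct fake_ifnet *if_list[MAXIF];
static size_t if_count;

/*
 * Check for an existing name and insert in one step.
 * Returns 0 on success, EEXIST for a duplicate name, ENOSPC if full.
 */
static int
attach_unique(struct fake_ifnet *ifp)
{
	assert(ifp != NULL && ifp->xname != NULL);

	for (size_t i = 0; i < if_count; i++) {
		if (strcmp(if_list[i]->xname, ifp->xname) == 0)
			return EEXIST;
	}
	if (if_count == MAXIF)
		return ENOSPC;
	if_list[if_count++] = ifp;
	return 0;
}
```

Returning an error instead of asserting lets the loser of a create/create
race fail cleanly rather than leave a second entry with the same name in
the list.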

But in thread [1] you said these races with pseudo interfaces are very
old well know 

Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Tracey Emery
On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> This patch adds support for 11n Tx aggregation to iwm(4).
> 
> Please help with testing if you can by running the patch and using wifi
> as usual. Nothing should change, except that Tx speed may potentially
> improve. If you have time to run before/after performance measurements with
> tcpbench or such, that would be nice. But it's not required for testing.
> 
> If Tx aggregation is active then netstat will show a non-zero output block ack
> agreement counter:
> 
> $ netstat -W iwm0 | grep 'output block'
> 3 new output block ack agreements
>   0 output block ack agreements timed out
> 
> It would be great to get at least one test for all the chipsets the driver
> supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> The behaviour of the access point also matters a great deal. It won't
> hurt to test the same chipset against several different access points.
> 
> I have tested this version on 8265 only so far. I've run older revisions
> of this patch on 7265 so I'm confident that this chip will work, too.
> So far, the APs I have tested against are athn(4) in 11a mode and in 11n
> mode with the 'nomimo' nwflag, and a Sagemcom 11ac AP. All on 5Ghz channels.
> 

Sure you've got plenty of 8265 tests, but the diff tripled my speed
against my apple airport extreme.

iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev
0x78, msi

-- 

Tracey Emery



[PATCH 5/5] pkcs12: add support for GOST PFX files

2020-06-27 Thread Dmitry Baryshkov
Russian standard body has changed the way MAC key is calculated for
PKCS12 files. Generate proper keys depending on the digest type used for
MAC generation.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/pkcs12/p12_key.c  | 18 ++
 src/lib/libcrypto/pkcs12/p12_mutl.c | 28 +---
 src/lib/libcrypto/pkcs12/pkcs12.h   |  5 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/lib/libcrypto/pkcs12/p12_key.c b/src/lib/libcrypto/pkcs12/p12_key.c
index d419a9d83598..9a5297a23131 100644
--- a/src/lib/libcrypto/pkcs12/p12_key.c
+++ b/src/lib/libcrypto/pkcs12/p12_key.c
@@ -195,3 +195,21 @@ end:
EVP_MD_CTX_cleanup(&ctx);
return ret;
 }
+
+int
+PKCS12_key_gen_gost(const char *pass, int passlen, unsigned char *salt,
+int saltlen, int iter, int n, unsigned char *out,
+const EVP_MD *md_type)
+{
+   unsigned char buf[96];
+
+   if (n != PKCS12_GOST_KEY_LEN)
+   return 0;
+
+   if (!PKCS5_PBKDF2_HMAC(pass, passlen, salt, saltlen, iter, md_type, sizeof(buf), buf))
+   return 0;
+
+   memcpy(out, buf + sizeof(buf) - PKCS12_GOST_KEY_LEN, PKCS12_GOST_KEY_LEN);
+
+   return 1;
+}
diff --git a/src/lib/libcrypto/pkcs12/p12_mutl.c b/src/lib/libcrypto/pkcs12/p12_mutl.c
index f3132ec75f68..023bbbd92db1 100644
--- a/src/lib/libcrypto/pkcs12/p12_mutl.c
+++ b/src/lib/libcrypto/pkcs12/p12_mutl.c
@@ -74,6 +74,7 @@ PKCS12_gen_mac(PKCS12 *p12, const char *pass, int passlen,
 unsigned char *mac, unsigned int *maclen)
 {
const EVP_MD *md_type;
+   int md_type_nid;
HMAC_CTX hmac;
unsigned char key[EVP_MAX_MD_SIZE], *salt;
int saltlen, iter;
@@ -97,13 +98,26 @@ PKCS12_gen_mac(PKCS12 *p12, const char *pass, int passlen,
PKCS12error(PKCS12_R_UNKNOWN_DIGEST_ALGORITHM);
return 0;
}
-   md_size = EVP_MD_size(md_type);
-   if (md_size < 0)
-   return 0;
-   if (!PKCS12_key_gen(pass, passlen, salt, saltlen, PKCS12_MAC_ID, iter,
-   md_size, key, md_type)) {
-   PKCS12error(PKCS12_R_KEY_GEN_ERROR);
-   return 0;
+   md_type_nid = EVP_MD_type(md_type);
+   if ((md_type_nid == NID_id_GostR3411_94 ||
+md_type_nid == NID_id_tc26_gost3411_2012_256 ||
+md_type_nid == NID_id_tc26_gost3411_2012_512) &&
+   getenv("LEGACY_GOST_PKCS12") == NULL) {
+   md_size = PKCS12_GOST_KEY_LEN;
+   if (!PKCS12_key_gen_gost(pass, passlen, salt, saltlen, iter,
+   md_size, key, md_type)) {
+   PKCS12error(PKCS12_R_KEY_GEN_ERROR);
+   return 0;
+   }
+   } else {
+   md_size = EVP_MD_size(md_type);
+   if (md_size < 0)
+   return 0;
+   if (!PKCS12_key_gen(pass, passlen, salt, saltlen, PKCS12_MAC_ID, iter,
+   md_size, key, md_type)) {
+   PKCS12error(PKCS12_R_KEY_GEN_ERROR);
+   return 0;
+   }
}
HMAC_CTX_init(&hmac);
if (!HMAC_Init_ex(&hmac, key, md_size, md_type, NULL) ||
diff --git a/src/lib/libcrypto/pkcs12/pkcs12.h b/src/lib/libcrypto/pkcs12/pkcs12.h
index 56635f9d7e0a..4dab109bbc3a 100644
--- a/src/lib/libcrypto/pkcs12/pkcs12.h
+++ b/src/lib/libcrypto/pkcs12/pkcs12.h
@@ -91,6 +91,11 @@ extern "C" {
 #define PKCS12_add_friendlyname PKCS12_add_friendlyname_asc
 #endif
 
+#define PKCS12_GOST_KEY_LEN 32
+int PKCS12_key_gen_gost(const char *pass, int passlen, unsigned char *salt,
+int saltlen, int iter, int n, unsigned char *out,
+const EVP_MD *md_type);
+
 /* MS key usage constants */
 
 #define KEY_EX 0x10
-- 
2.27.0
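
The only non-obvious step in PKCS12_key_gen_gost() above is that it
derives 96 bytes and keeps just the trailing 32. A minimal sketch of that
slicing step alone follows. It does not call libcrypto: take_tail() is a
hypothetical helper, and the two sizes are copied from the patch
(sizeof(buf) and PKCS12_GOST_KEY_LEN).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DERIVED_LEN 96	/* size of buf in the patch */
#define KEY_LEN 32	/* PKCS12_GOST_KEY_LEN in the patch */

/*
 * Copy the last KEY_LEN bytes of a DERIVED_LEN-byte buffer into out.
 * Returns 1 on success and 0 if either length is unexpected, which
 * mirrors the return convention of the patched function.
 */
static int
take_tail(const unsigned char *derived, size_t derived_len,
    unsigned char *out, size_t out_len)
{
	assert(derived != NULL && out != NULL);

	if (derived_len != DERIVED_LEN || out_len != KEY_LEN)
		return 0;
	memcpy(out, derived + derived_len - out_len, out_len);
	return 1;
}
```

For a buffer filled with the byte values 0..95, the resulting key holds
the values 64..95.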



[PATCH 3/5] gost: support new PublicKeyParameters format

2020-06-27 Thread Dmitry Baryshkov
Add support for updated PublicKeyParameters format as defined by
draft-deremin-rfc4491-bis.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/gost/gost_asn1.c |  2 +-
 src/lib/libcrypto/gost/gostr341001_ameth.c | 42 --
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/src/lib/libcrypto/gost/gost_asn1.c b/src/lib/libcrypto/gost/gost_asn1.c
index 2652162777b7..703d64070449 100644
--- a/src/lib/libcrypto/gost/gost_asn1.c
+++ b/src/lib/libcrypto/gost/gost_asn1.c
@@ -190,7 +190,7 @@ static const ASN1_TEMPLATE GOST_KEY_PARAMS_seq_tt[] = {
.item = &ASN1_OBJECT_it,
},
{
-   .flags = 0,
+   .flags = ASN1_TFLG_OPTIONAL,
.tag = 0,
.offset = offsetof(GOST_KEY_PARAMS, hash_params),
.field_name = "hash_params",
diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 0e9521178da5..7cb70ed420ae 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -90,9 +90,33 @@ decode_gost01_algor_params(EVP_PKEY *pkey, const unsigned char **p, int len)
return 0;
}
param_nid = OBJ_obj2nid(gkp->key_params);
-   digest_nid = OBJ_obj2nid(gkp->hash_params);
+   if (gkp->hash_params)
+   digest_nid = OBJ_obj2nid(gkp->hash_params);
+   else {
+   switch (param_nid) {
+   case NID_id_tc26_gost_3410_12_256_paramSetA:
+   case NID_id_tc26_gost_3410_12_256_paramSetB:
+   case NID_id_tc26_gost_3410_12_256_paramSetC:
+   case NID_id_tc26_gost_3410_12_256_paramSetD:
+   digest_nid = NID_id_tc26_gost3411_2012_256;
+   break;
+   case NID_id_tc26_gost_3410_12_512_paramSetTest:
+   case NID_id_tc26_gost_3410_12_512_paramSetA:
+   case NID_id_tc26_gost_3410_12_512_paramSetB:
+   case NID_id_tc26_gost_3410_12_512_paramSetC:
+   digest_nid = NID_id_tc26_gost3411_2012_512;
+   break;
+   default:
+   digest_nid = NID_undef;
+   }
+   }
GOST_KEY_PARAMS_free(gkp);
 
+   if (digest_nid == NID_undef) {
+   GOSTerror(GOST_R_BAD_PKEY_PARAMETERS_FORMAT);
+   return 0;
+   }
+
ec = pkey->pkey.gost;
if (ec == NULL) {
ec = GOST_KEY_new();
@@ -137,7 +161,21 @@ encode_gost01_algor_params(const EVP_PKEY *key)
pkey_param_nid =
EC_GROUP_get_curve_name(GOST_KEY_get0_group(key->pkey.gost));
gkp->key_params = OBJ_nid2obj(pkey_param_nid);
-   gkp->hash_params = OBJ_nid2obj(GOST_KEY_get_digest(key->pkey.gost));
+   switch (pkey_param_nid) {
+   case NID_id_GostR3410_2001_TestParamSet:
+   case NID_id_GostR3410_2001_CryptoPro_A_ParamSet:
+   case NID_id_GostR3410_2001_CryptoPro_B_ParamSet:
+   case NID_id_GostR3410_2001_CryptoPro_C_ParamSet:
+   case NID_id_GostR3410_2001_CryptoPro_XchA_ParamSet:
+   case NID_id_GostR3410_2001_CryptoPro_XchB_ParamSet:
+   case NID_id_tc26_gost_3410_12_512_paramSetA:
+   case NID_id_tc26_gost_3410_12_512_paramSetB:
+   gkp->hash_params = OBJ_nid2obj(GOST_KEY_get_digest(key->pkey.gost));
+   break;
+   default:
+   gkp->hash_params = NULL;
+   break;
+   }
/*gkp->cipher_params = OBJ_nid2obj(cipher_param_nid); */
params->length = i2d_GOST_KEY_PARAMS(gkp, &params->data);
if (params->length <= 0) {
-- 
2.27.0
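
The decode side of the patch above falls back to a digest implied by the
curve parameter set when hash_params is absent. The same decision is
sketched below as a standalone function. The enum values are hypothetical
stand-ins for the NIDs, and pick_digest() is not a libcrypto function.

```c
#include <assert.h>

/* Hypothetical stand-ins for the parameter-set NIDs used in the patch. */
enum paramset {
	PS_2001_CRYPTOPRO_A,
	PS_2012_256_A, PS_2012_256_B, PS_2012_256_C, PS_2012_256_D,
	PS_2012_512_TEST, PS_2012_512_A, PS_2012_512_B, PS_2012_512_C
};

/* Hypothetical stand-ins for the digest NIDs; MD_UNDEF plays NID_undef. */
enum digest { MD_UNDEF, MD_94, MD_2012_256, MD_2012_512 };

/*
 * An explicitly encoded digest wins. Otherwise the digest follows from
 * the parameter set, and MD_UNDEF signals a decoding error.
 */
static enum digest
pick_digest(enum digest explicit_md, enum paramset ps)
{
	if (explicit_md != MD_UNDEF)
		return explicit_md;

	switch (ps) {
	case PS_2012_256_A:
	case PS_2012_256_B:
	case PS_2012_256_C:
	case PS_2012_256_D:
		return MD_2012_256;
	case PS_2012_512_TEST:
	case PS_2012_512_A:
	case PS_2012_512_B:
	case PS_2012_512_C:
		return MD_2012_512;
	default:
		return MD_UNDEF;
	}
}
```

For example, assert(pick_digest(MD_UNDEF, PS_2012_256_B) == MD_2012_256)
holds, while a 2001 parameter set without an explicit digest yields
MD_UNDEF, which corresponds to the GOST_R_BAD_PKEY_PARAMETERS_FORMAT path
in the patch.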



[PATCH 4/5] gostr341001: support unwrapped private keys

2020-06-27 Thread Dmitry Baryshkov
GOST private keys can be wrapped in OCTET STRING, INTEGER or come
unwrapped. Support the latter format.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/gost/gost_asn1.c |  52 ++
 src/lib/libcrypto/gost/gost_asn1.h |  11 ++
 src/lib/libcrypto/gost/gostr341001_ameth.c | 115 +++--
 3 files changed, 169 insertions(+), 9 deletions(-)

diff --git a/src/lib/libcrypto/gost/gost_asn1.c b/src/lib/libcrypto/gost/gost_asn1.c
index 703d64070449..bfd81faa1ee2 100644
--- a/src/lib/libcrypto/gost/gost_asn1.c
+++ b/src/lib/libcrypto/gost/gost_asn1.c
@@ -17,6 +17,58 @@
 #include "gost_locl.h"
 #include "gost_asn1.h"
 
+static const ASN1_TEMPLATE MASKED_GOST_KEY_seq_tt[] = {
+   {
+   .flags = 0,
+   .tag = 0,
+   .offset = offsetof(MASKED_GOST_KEY, masked_priv_key),
+   .field_name = "masked_priv_key",
+   .item = &ASN1_OCTET_STRING_it,
+   },
+   {
+   .flags = 0,
+   .tag = 0,
+   .offset = offsetof(MASKED_GOST_KEY, public_key),
+   .field_name = "public_key",
+   .item = &ASN1_OCTET_STRING_it,
+   },
+};
+
+const ASN1_ITEM MASKED_GOST_KEY_it = {
+   .itype = ASN1_ITYPE_NDEF_SEQUENCE,
+   .utype = V_ASN1_SEQUENCE,
+   .templates = MASKED_GOST_KEY_seq_tt,
+   .tcount = sizeof(MASKED_GOST_KEY_seq_tt) / sizeof(ASN1_TEMPLATE),
+   .funcs = NULL,
+   .size = sizeof(MASKED_GOST_KEY),
+   .sname = "MASKED_GOST_KEY",
+};
+
+MASKED_GOST_KEY *
+d2i_MASKED_GOST_KEY(MASKED_GOST_KEY **a, const unsigned char **in, long len)
+{
+   return (MASKED_GOST_KEY *)ASN1_item_d2i((ASN1_VALUE **)a, in, len,
+   &MASKED_GOST_KEY_it);
+}
+
+int
+i2d_MASKED_GOST_KEY(MASKED_GOST_KEY *a, unsigned char **out)
+{
+   return ASN1_item_i2d((ASN1_VALUE *)a, out, &MASKED_GOST_KEY_it);
+}
+
+MASKED_GOST_KEY *
+MASKED_GOST_KEY_new(void)
+{
+   return (MASKED_GOST_KEY *)ASN1_item_new(&MASKED_GOST_KEY_it);
+}
+
+void
+MASKED_GOST_KEY_free(MASKED_GOST_KEY *a)
+{
+   ASN1_item_free((ASN1_VALUE *)a, &MASKED_GOST_KEY_it);
+}
+
 static const ASN1_TEMPLATE GOST_KEY_TRANSPORT_seq_tt[] = {
{
.flags = 0,
diff --git a/src/lib/libcrypto/gost/gost_asn1.h b/src/lib/libcrypto/gost/gost_asn1.h
index 7cabfc79c965..cdbda7b98b67 100644
--- a/src/lib/libcrypto/gost/gost_asn1.h
+++ b/src/lib/libcrypto/gost/gost_asn1.h
@@ -56,6 +56,17 @@
 
 __BEGIN_HIDDEN_DECLS
 
+typedef struct {
+   ASN1_OCTET_STRING *masked_priv_key;
+   ASN1_OCTET_STRING *public_key;
+} MASKED_GOST_KEY;
+
+MASKED_GOST_KEY *MASKED_GOST_KEY_new(void);
+void MASKED_GOST_KEY_free(MASKED_GOST_KEY *a);
+MASKED_GOST_KEY *d2i_MASKED_GOST_KEY(MASKED_GOST_KEY **a, const unsigned char **in, long len);
+int i2d_MASKED_GOST_KEY(MASKED_GOST_KEY *a, unsigned char **out);
+extern const ASN1_ITEM MASKED_GOST_KEY_it;
+
 typedef struct {
ASN1_OCTET_STRING *encrypted_key;
ASN1_OCTET_STRING *imit;
diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 7cb70ed420ae..880c17ceaab8 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -437,6 +437,70 @@ priv_print_gost01(BIO *out, const EVP_PKEY *pkey, int indent, ASN1_PCTX *pctx)
return pub_print_gost01(out, pkey, indent, pctx);
 }
 
+static BIGNUM *unmask_priv_key(EVP_PKEY *pk,
+   const unsigned char *buf, int len, int num_masks)
+{
+   BIGNUM *pknum_masked = NULL, *q, *mask;
+   const GOST_KEY *key_ptr = pk->pkey.gost;
+   const EC_GROUP *group = GOST_KEY_get0_group(key_ptr);
+   const unsigned char *p = buf + num_masks * len;
+   BN_CTX *ctx;
+
+   pknum_masked = GOST_le2bn(buf, len, NULL);
+   if (!pknum_masked) {
+   GOSTerror(ERR_R_MALLOC_FAILURE);
+   return NULL;
+   }
+
+   if (num_masks == 0)
+   return pknum_masked;
+
+   ctx = BN_CTX_new();
+   if (ctx == NULL) {
+   GOSTerror(ERR_R_MALLOC_FAILURE);
+   goto err;
+   }
+
+   BN_CTX_start(ctx);
+
+   q = BN_CTX_get(ctx);
+   if (!q) {
+   GOSTerror(ERR_R_MALLOC_FAILURE);
+   goto err;
+   }
+
+   mask = BN_CTX_get(ctx);
+   if (!mask) {
+   GOSTerror(ERR_R_MALLOC_FAILURE);
+   goto err;
+   }
+
+   if (EC_GROUP_get_order(group, q, NULL) <= 0) {
+   GOSTerror(ERR_R_EC_LIB);
+   goto err;
+   }
+
+   for (; p != buf; p -= len) {
+   if (GOST_le2bn(p, len, mask) == NULL ||
+   !BN_mod_mul(pknum_masked, pknum_masked, mask, q, ctx)) {
+   GOSTerror(ERR_R_BN_LIB);
+   goto err;
+   }
+   }
+
+   BN_CTX_end(ctx);
+   BN_CTX_free(ctx);
+
+   return pknum_masked;
+
+err:
+   BN_CTX_end(ctx);
+   BN_CTX_free(ctx);
+
+   
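
The arithmetic in unmask_priv_key() above (decode little-endian values,
then multiply the masked key by each mask modulo the group order q) can be
tried with small integers. The sketch below assumes values of at most 4
bytes and q below 2^32 so that plain 64-bit arithmetic suffices. le2u64()
and unmask_small() are hypothetical names, and the real code uses BIGNUMs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode len little-endian bytes (len <= 4 in this sketch). */
static uint64_t
le2u64(const unsigned char *p, size_t len)
{
	uint64_t v = 0;

	assert(len <= 4);
	for (size_t i = len; i > 0; i--)
		v = (v << 8) | p[i - 1];
	return v;
}

/*
 * buf holds (num_masks + 1) values of len bytes each: the masked key
 * first, then the masks. Walk the masks from last to first, as the
 * loop in the patch does, multiplying modulo q at every step.
 */
static uint64_t
unmask_small(const unsigned char *buf, size_t len, size_t num_masks,
    uint64_t q)
{
	uint64_t key;

	assert(q > 0 && q <= UINT32_MAX);
	key = le2u64(buf, len) % q;
	for (size_t i = num_masks; i > 0; i--)
		key = (key * (le2u64(buf + i * len, len) % q)) % q;
	return key;
}
```

With a masked key of 5, masks 3 and 4, and q = 7, the result is
5 * 4 * 3 mod 7 = 4. With no masks the key is returned unchanged, reduced
modulo q.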

[PATCH 2/5] gost: use ECerror to report EC errors

2020-06-27 Thread Dmitry Baryshkov
GOST code uses GOSTerror(EC_R_foo) to report several errors. Use
ECerror(EC_R_foo) instead to make error messages match error code.

Sponsored by ROSA Linux.

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/gost/gostr341001_ameth.c |  2 +-
 src/lib/libcrypto/gost/gostr341001_key.c   | 14 +++---
 src/lib/libcrypto/gost/gostr341001_pmeth.c |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 27a95f2069cd..0e9521178da5 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -547,7 +547,7 @@ param_decode_gost01(EVP_PKEY *pkey, const unsigned char **pder, int derlen)
}
group = EC_GROUP_new_by_curve_name(nid);
if (group == NULL) {
-   GOSTerror(EC_R_EC_GROUP_NEW_BY_NAME_FAILURE);
+   ECerror(EC_R_EC_GROUP_NEW_BY_NAME_FAILURE);
GOST_KEY_free(ec);
return 0;
}
diff --git a/src/lib/libcrypto/gost/gostr341001_key.c b/src/lib/libcrypto/gost/gostr341001_key.c
index 0af39f21bf33..74f8cab9d86c 100644
--- a/src/lib/libcrypto/gost/gostr341001_key.c
+++ b/src/lib/libcrypto/gost/gostr341001_key.c
@@ -121,7 +121,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
return 0;
}
if (EC_POINT_is_at_infinity(key->group, key->pub_key) != 0) {
-   GOSTerror(EC_R_POINT_AT_INFINITY);
+   ECerror(EC_R_POINT_AT_INFINITY);
goto err;
}
if ((ctx = BN_CTX_new()) == NULL)
@@ -131,14 +131,14 @@ GOST_KEY_check_key(const GOST_KEY *key)
 
/* testing whether the pub_key is on the elliptic curve */
if (EC_POINT_is_on_curve(key->group, key->pub_key, ctx) == 0) {
-   GOSTerror(EC_R_POINT_IS_NOT_ON_CURVE);
+   ECerror(EC_R_POINT_IS_NOT_ON_CURVE);
goto err;
}
/* testing whether pub_key * order is the point at infinity */
if ((order = BN_new()) == NULL)
goto err;
if (EC_GROUP_get_order(key->group, order, ctx) == 0) {
-   GOSTerror(EC_R_INVALID_GROUP_ORDER);
+   ECerror(EC_R_INVALID_GROUP_ORDER);
goto err;
}
if (EC_POINT_mul(key->group, point, NULL, key->pub_key, order,
@@ -147,7 +147,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
goto err;
}
if (EC_POINT_is_at_infinity(key->group, point) == 0) {
-   GOSTerror(EC_R_WRONG_ORDER);
+   ECerror(EC_R_WRONG_ORDER);
goto err;
}
/*
@@ -156,7 +156,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
 */
if (key->priv_key != NULL) {
if (BN_cmp(key->priv_key, order) >= 0) {
-   GOSTerror(EC_R_WRONG_ORDER);
+   ECerror(EC_R_WRONG_ORDER);
goto err;
}
if (EC_POINT_mul(key->group, point, key->priv_key, NULL, NULL,
@@ -165,7 +165,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
goto err;
}
if (EC_POINT_cmp(key->group, point, key->pub_key, ctx) != 0) {
-   GOSTerror(EC_R_INVALID_PRIVATE_KEY);
+   ECerror(EC_R_INVALID_PRIVATE_KEY);
goto err;
}
}
@@ -212,7 +212,7 @@ GOST_KEY_set_public_key_affine_coordinates(GOST_KEY *key, BIGNUM *x, BIGNUM *y)
 * out of range.
 */
if (BN_cmp(x, tx) != 0 || BN_cmp(y, ty) != 0) {
-   GOSTerror(EC_R_COORDINATES_OUT_OF_RANGE);
+   ECerror(EC_R_COORDINATES_OUT_OF_RANGE);
goto err;
}
if (GOST_KEY_set_public_key(key, point) == 0)
diff --git a/src/lib/libcrypto/gost/gostr341001_pmeth.c b/src/lib/libcrypto/gost/gostr341001_pmeth.c
index 0eb1d873deaf..0e0cae99e3fc 100644
--- a/src/lib/libcrypto/gost/gostr341001_pmeth.c
+++ b/src/lib/libcrypto/gost/gostr341001_pmeth.c
@@ -246,7 +246,7 @@ pkey_gost01_sign(EVP_PKEY_CTX *ctx, unsigned char *sig, size_t *siglen,
*siglen = 2 * size;
return 1;
} else if (*siglen < 2 * size) {
-   GOSTerror(EC_R_BUFFER_TOO_SMALL);
+   ECerror(EC_R_BUFFER_TOO_SMALL);
return 0;
}
if (tbs_len != 32 && tbs_len != 64) {
-- 
2.27.0



[PATCH 1/5] gost: populate params tables with new curves

2020-06-27 Thread Dmitry Baryshkov
Allow users to specify new curves via strings.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov 
---
 src/lib/libcrypto/gost/gostr341001_params.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/lib/libcrypto/gost/gostr341001_params.c b/src/lib/libcrypto/gost/gostr341001_params.c
index 282a21041999..9764964cdc1e 100644
--- a/src/lib/libcrypto/gost/gostr341001_params.c
+++ b/src/lib/libcrypto/gost/gostr341001_params.c
@@ -94,12 +94,22 @@ static const GostR3410_params GostR3410_256_params[] = {
{ "0",  NID_id_GostR3410_2001_TestParamSet },
{ "XA", NID_id_GostR3410_2001_CryptoPro_XchA_ParamSet },
{ "XB", NID_id_GostR3410_2001_CryptoPro_XchB_ParamSet },
+   { "TCA", NID_id_tc26_gost_3410_12_256_paramSetA },
+   { "TCB", NID_id_tc26_gost_3410_12_256_paramSetB },
+   { "TCC", NID_id_tc26_gost_3410_12_256_paramSetC },
+   { "TCD", NID_id_tc26_gost_3410_12_256_paramSetD },
{ NULL, NID_undef },
 };
 
 static const GostR3410_params GostR3410_512_params[] = {
{ "A",  NID_id_tc26_gost_3410_12_512_paramSetA },
{ "B",  NID_id_tc26_gost_3410_12_512_paramSetB },
+   { "C",  NID_id_tc26_gost_3410_12_512_paramSetC },
+   { "0",  NID_id_tc26_gost_3410_12_512_paramSetTest},
+   /* Duplicates for compatibility with OpenSSL */
+   { "TCA", NID_id_tc26_gost_3410_12_512_paramSetA },
+   { "TCB", NID_id_tc26_gost_3410_12_512_paramSetB },
+   { "TCC", NID_id_tc26_gost_3410_12_512_paramSetC },
{ NULL, NID_undef },
 };
 
-- 
2.27.0



Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Johan Huldtgren
On 2020-06-26 20:11, Johan Huldtgren wrote:
> hello,
> 
> On 2020-06-26 14:45, Stefan Sperling wrote:
> > It would be great to get at least one test for all the chipsets the driver
> > supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> > The behaviour of the access point also matters a great deal. It won't
> > hurt to test the same chipset against several different access points.
> 
> tested on:
> 
> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
> 
> AP is a Ruckus 7363.
> 
> $ netstat -W iwm0 | grep "output block"   
>   
> 
> 6 new output block ack agreements
> 0 output block ack agreements timed out
> 
> Before:
> 
> bandwidth min/avg/max/std-dev = 16.780/18.325/19.939/1.235 Mbps
> 
> After:
> 
> bandwidth min/avg/max/std-dev = 0.000/15.559/51.631/19.548 Mbps

Testing against a slightly different AP (Ruckus 7372):

before patch:

bandwidth min/avg/max/std-dev = 0.092/14.665/22.589/9.992 Mbps

after patch:

bandwidth min/avg/max/std-dev = 7.020/24.596/41.121/11.300 Mbps

This is the reported mode:

media: IEEE802.11 autoselect (HT-MCS13 mode 11n)

.jh



Re: ifconfig.8 Ar/Cm typo

2020-06-27 Thread Jason McIntyre
On Sat, Jun 27, 2020 at 02:48:18AM -0500, Matthew Martin wrote:
> A rule on a bridge interface that uses arp or rarp may be followed with
> a literal "request" or "reply" (cf. sbin/ifconfig/brconfig.c L1041 and
> 1048), so the Ar macro is incorrect as its argument is not
> a placeholder.
> 

right.

> Aside: Is there a rule for when to list alternatives with foo | bar or
> foo Ns | Ns bar? in/out, arp/rarp, and request/reply are all the former
> sans-Ns; however, block/pass uses the Ns macro.
> 

normally we just use arg1 | arg2, but sometimes this becomes ambiguous:

rule block | pass [in | out]

do "in" and "out" go with both "block" and "pass", or just "pass"? so
sometimes we scrunch them up to make it clearer:

rule block|pass [in | out]

hence the need for Ns.

i just committed your diff, but it needed a little more:

Index: ifconfig.8
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
retrieving revision 1.350
diff -u -r1.350 ifconfig.8
--- ifconfig.8  24 Jun 2020 17:40:10 -  1.350
+++ ifconfig.8  27 Jun 2020 15:31:01 -
@@ -751,7 +751,7 @@
 .Bk -words
 .Op Cm tag Ar tagname
 .Oo
-.Cm arp | rarp Op Ar request | reply
+.Cm arp | rarp Op Cm request | reply
 .Op Cm sha Ar lladdr
 .Op Cm spa Ar ipaddr
 .Op Cm tha Ar lladdr
@@ -779,9 +779,9 @@
 keyword for regular packets and
 .Cm rarp
 for reverse arp.
-.Ar request
+.Cm request
 and
-.Ar reply
+.Cm reply
 limit matches to requests or replies.
 The source and target host addresses can be matched with the
 .Cm sha

thanks for the diff!
jmc



Re: [PATCH} Optimized rasops32 putchar

2020-06-27 Thread Mark Kettenis
> From: 
> Date: Fri, 26 Jun 2020 07:42:50 -0700
> 
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
> 
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
> 
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time.  However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
> 
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining.  In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bus_space_map(9) when mapping the framebuffer.

I'm fairly confident about this, since until last January the initial
mapping of the framebuffer that we used wasn't write-combining, and
things were really, really slow.

Cheers,

Mark

> Index: rasops32.c
> ===
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c25 May 2020 09:55:49 -  1.10
> +++ rasops32.c26 Jun 2020 14:34:06 -
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
>  int
>  rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> attr)
>  {
> - int width, height, cnt, fs, fb, clr[2];
> + int width, height, step, cnt, fs, b, f;
> + uint32_t fb, clr[2];
>   struct rasops_info *ri;
> - int32_t *dp, *rp;
> + int64_t *rp, q;
> + union {
> + int64_t q[4];
> + int32_t d[4][2];
> + } u;
>   u_char *fr;
>  
>   ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, 
>   return 0;
>  #endif
>  
> - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
>  
>   height = ri->ri_font->fontheight;
>   width = ri->ri_font->fontwidth;
> + step = ri->ri_stride >> 3;
>  
> - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> + b = ri->ri_devcmap[(attr >> 16) & 0xf];
> + f = ri->ri_devcmap[(attr >> 24) & 0xf];
> + u.d[0][0] = b; u.d[0][1] = b;
> + u.d[1][0] = b; u.d[1][1] = f;
> + u.d[2][0] = f; u.d[2][1] = b;
> + u.d[3][0] = f; u.d[3][1] = f;
>  
>   if (uc == ' ') {
> + q = u.q[0];
>   while (height--) {
> - dp = rp;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--)
> - *dp++ = clr[0];
> + /* the general, pixel-at-a-time case is fast enough */
> + for (cnt = 0; cnt < width; cnt++)
> + ((int *)rp)[cnt] = b;
> + rp += step;
>   }
>   } else {
>   uc -= ri->ri_font->firstchar;
>   fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
>   fs = ri->ri_font->stride;
> -
> - while (height--) {
> - dp = rp;
> - fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) |
> - (fr[0] << 24);
> - fr += fs;
> - DELTA(rp, ri->ri_stride, int32_t *);
> -
> - for (cnt = width; cnt; cnt--) {
> - *dp++ = clr[(fb >> 31) & 1];
> - fb <<= 1;
> - }
> + /* double-pixel special cases for the common widths */
> + switch (width) {
> + case 8:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> + rp[3] = u.q[fb & 3];
> + rp += step;
> + fr += 1;
> + }
> + break;
> + 
> + case 12:
> + while (height--) {
> + fb = fr[0];
> + rp[0] = u.q[fb >> 6];
> + rp[1] = u.q[(fb >> 4) & 3];
> + rp[2] = u.q[(fb >> 2) & 3];
> +   

Re: wg(4): encapsulated transport checksums

2020-06-27 Thread Theo de Raadt
> - Therefore, it's not necessary to check the IP checksum on ingress because:

There is actually a really good reason.

There are various counters (of all packets) which people observe to debug
network problems.

Now, if lower-level packets carrying wg with corruption don't increment
those counters, the statistics will be incorrect.

I think you are arguing to elide mandatory work in a lower layer of
the network stack; isn't it a layer violation to insist like that?



Re: awk FS behaviour change

2020-06-27 Thread Jason McIntyre
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:
> On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:
> 
> > i'm not sure it reads better when we switch the emphasis from whitespace
> > to FS. i think it's better that people see how it normally works, then
> > the gories about FS. so i'd have kept the first part of the sentence,
> > but maybe reworked the FS bit.
> 
> I wasn't sure that was an improvement either.  Does this seem better?
> 
>  - todd
> 

yes, i think this is better. ok by me.
jmc

> Index: usr.bin/awk/awk.1
> ===
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -  1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As as special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
> 



Re: awk FS behaviour change

2020-06-27 Thread Klemens Nanni
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:
> I wasn't sure that was an improvement either.  Does this seem better?
To me it does, thanks.

OK kn

> Index: usr.bin/awk/awk.1
> ===
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 -  1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As as special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
.Pq is probably not needed here; at the end you're also just using
"(default whitespace)".

> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
Missing dot here (with trailing space after "fs").

>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
> 



Re: awk FS behaviour change

2020-06-27 Thread Todd C . Miller
On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:

> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.

I wasn't sure that was an improvement either.  Does this seem better?

 - todd

Index: usr.bin/awk/awk.1
===
RCS file: /cvs/src/usr.bin/awk/awk.1,v
retrieving revision 1.54
diff -u -p -u -r1.54 awk.1
--- usr.bin/awk/awk.1   26 Jun 2020 21:50:06 -  1.54
+++ usr.bin/awk/awk.1   27 Jun 2020 12:29:21 -
@@ -130,26 +130,24 @@ and newlines are used as field separator
 This is convenient when working with multi-line records.
 .Pp
 An input line is normally made up of fields separated by whitespace,
-or by the regular expression
-.Va FS .
+or by the value of the field separator
+.Va FS
+at the time the line is read.
 The fields are denoted
 .Va $1 , $2 , ... ,
 while
 .Va $0
 refers to the entire line.
-If
 .Va FS
-is null, the input line is split into one field per character.
-Lines are split into fields using the value of
+may be set to either a single character or a regular expression.
+As as special case, if
 .Va FS
-at the time the line is read.
-Because of this,
+is a single space
+.Pq the default ,
+fields will be split by one or more whitespace characters.
+If
 .Va FS
-is usually set via the
-.Fl F
-option or inside of a
-.Ic BEGIN
-block.
+is null, the input line is split into one field per character.
 .Pp
 Normally, any number of blanks separate fields.
 In order to set the field separator to a single blank, use the
@@ -171,6 +169,11 @@ as the field separator, use the
 .Fl F
 option with a value of
 .Sq [t] .
+The field separator is usually set via the
+.Fl F
+option or from inside a
+.Ic BEGIN
+block so that it takes effect before the input is read.
 .Pp
 A pattern-action statement has the form:
 .Pp
@@ -407,9 +410,9 @@ The name of the current input file.
 .It Va FNR
 Ordinal number of the current record in the current file.
 .It Va FS
-Regular expression used to separate fields; also settable
-by option
-.Fl F Ar fs .
+Regular expression used to separate fields (default whitespace);
+also settable by option
+.Fl F Ar fs
 .It Va NF
 Number of fields in the current record.
 .Va $NF



Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Tobias Heider
Works for me on a 7260.

[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.1 sec   108 MBytes  90.1 Mbits/sec



Re: pipex(4): use reference counters for `ifnet'

2020-06-27 Thread Martin Pieuchot
On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote:
> > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote:
> > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote:
> > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote:
> > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote:
> > > > > > > Updated diff. 
> > > > > > > 
> > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we
> > > > > > > can't store index in session and be sure if_get(9) returned `ifnet'
> > > > > > > is our original `ifnet'.
> > > > > > 
> > > > > > Why not?  The point of if_get(9) is to be sure.  If that doesn't work
> > > > > > for whatever reason then the if_get(9) interface has to be fixed.
> > > > > > Which case doesn't work for you?  Do you have a reproducer?
> > > > > > 
> > > > > > How does sessions stay around if their corresponding interface is
> > > > > > destroyed?
> > > > > 
> > > > > We have `pipexinq' and `pipexoutq' which can store pointers to session.
> > > > > pipexintr() processes these queues. pipexintr() and
> > > > > pipex_destroy_session() are *always* different contexts. This means we
> > > > > *can't* free a pipex(4) session without being sure there is no reference
> > > > > to this session in `pipexinq' or `pipexoutq'. Otherwise this will cause
> > > > > a use after free issue. Look please at net/pipex.c:846. The way pppx(4)
> > > > > destroys sessions is wrong. While pppac(4) destroys sessions by
> > > > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq'
> > > > > and `pipexoutq' state. I have said it again and again.
> > > > 
> > > > I understand.  Why is it a problem?  Using reference counting the way
> > > > you're suggesting is *one* possible solution to a problem we don't fully
> > > > understand.  What are we trying to achieve?  Which problem are we trying
> > > > to solve?
> > > 
> > > Sorry, may be I misunderstand something.
> > > 
> > > `pipexoutq' case:
> > > 
> > > 1. pppac_start() calls pipex_output()
> > > 2. pipex_output() calls pipex_ip_output()
> > > 3. pipex_ip_output() calls pipex_ppp_enqueue()
> > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add()
> > > 
> > > `pipexinq' cases:
> > > 
> > > 1.1. ether_input() calls pipex_pppoe_input()
> > > 1.2. gre_input() calls gre_input_1()
> > >  gre_input_1() calls pipex_pptp_input()
> > > 1.3. udp_input() calls pipex_l2tp_input()
> > > 
> > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input()
> > > 3. pipex_common_input() calls schednetisr() which is task_add()
> > > 
> > > task_add(9) just schedules the execution of the work specified by `tq'.
> > > So we can do pipex_destroy_session() * between * schednetisr() and
> > > pipexintr(). And we can do this right * now *, with our current locking.
> > > And this is the problem I'm trying to solve.
> > > 
> > > My apologies if I'm wrong above. Please point me where I'm wrong.
> > > 
> > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding
> > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4)
> > > session. We should be sure `session' is still valid between
> > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be
> > > required in future.
> > 
> > Why not iterate over the queues and garbage collect the sessions that
> > are about to be removed?  That's what the network stack was doing with
> > mbuf queues prior to if_get(9) when interfaces were destroyed.
> > 
> 
> Do you mean net/if.c:1185 and below? This is the queues associated with
> this `ifp'. But for pipex(4) we should go through all mbufs associated
> with pipex(4). This can be heavy if we have hundreds of sessions. Also
> this would work until session destruction and `pr_input' are serialized.
> 
> Point me please the line in source to see if I'm wrong about `ifnet's
> mbuf queues cleaning.

Look at r1.329 of net/if.c.  Prior to this change if_detach_queues() was
used to free all mbufs when an interface was removed.  Now lazy freeing
is used every time if_get(9) returns NULL.

This is possible because we store an index and not a pointer directly in
the mbuf.

The advantage of storing a session pointer in `ph_cookie' is that no
lookup is required in pipexintr(), right?  Maybe we could save an ID
instead and do a lookup.  How big can the `pipex_session_list' be?



Re: fix races in if_clone_create()

2020-06-27 Thread Martin Pieuchot
On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > if_clone_create() has the races caused by context switch.
> > 
> > Can you share a backtrace of such race?  Where does the kernel panic?
> >
> 
> This diff was inspired by thread [1]. As I explained [2] there are 3
> issues that cause panics produced by the command below:
> 
>  cut begin 
> for i in 1 2 3; do while true; do ifconfig bridge0 create& \
>   ifconfig bridge0 destroy& done& done
>  cut end 

Thanks, I couldn't reproduce it on any of the machines I tried.  Did you
manage to reproduce it with other pseudo-devices or just with bridge0?

> My system was stable with the last diff I did for thread [1]. But since
> this final diff [3] which include fixes for tun(4) is quick and dirty
> and not for commit I decided to make the diff to fix the races caused by
> if_clone_create() at first.
> 
> I included screenshot with panic.

Thanks, interesting that the corruption happens on a list that should be
initialized.  Does that mean the context switch on Thread 1 is happening
before if_attach_common() is called?

You said in your previous email that there's a context switch.  Do you know
when it happens?  You could see that in ddb by looking at the backtrace
of the other CPU.

Is the context switch leading to the race common to all pseudo-drivers
or is it in the bridge(4) driver?

Regarding your solution, do I understand correctly that the goal is to
serialize all if_clone_create()?  Is it really needed to remember which
unit is being currently created or can't we just serialize all of them?

The fact that a lock is not held over the cloning operation is imho
positive.



Re: wg(4): encapsulated transport checksums

2020-06-27 Thread Jason A. Donenfeld
Hi Richard,

Thanks for the patch. I had problems parsing some terminology in your
description, so I thought I'd lay out my understanding of the matter,
and you can let me know whether or not this corresponds with what you
had in mind:

- On egress, we must compute the packet checksum, because it may well
be forwarded by the receiving end after decapsulation. That doesn't
concern this patch, however.

- On ingress, we've already checked the poly1305 sum, so we have no
doubt that the packet has arrived without corruption.
- Therefore, it's not necessary to check the IP checksum on ingress because:
  * If the packet originated on the peer that did the encapsulation,
there's no chance for corruption;
  * If the packet did not originate on the peer that did the
encapsulation, it was that peer's responsibility to drop it if the
checksum was wrong;
  * If the packet does have an incorrect checksum, because the
originating peer did not check it, and we forward it along, the
machine we forward it to will drop it.

It seemed like from your message that you had a case in mind in which
it actually would be necessary to check the IP checksum on ingress,
but I didn't quite divine what you had in mind.

Jason

On Fri, Jun 26, 2020 at 10:03 PM  wrote:
>
> Hi,
>
> On its receive path, wg(4) uses the same mbuf for both the encrypted
> capsule and its encapsulated packet, which it passes up to the stack. We
> must therefore clear this mbuf's checksum status flags, as although the
> capsule may have been subject to hardware offload, its encapsulated packet
> was not.
>
> This ensures that the transport checksums of packets bound for local
> delivery are verified. That is necessary because, although the tunnel
> provides stronger integrity checks, the tunnel endpoints and the
> transport endpoints needn't coincide.
>
> However, as the network and tunnel endpoints _do_ coincide, it remains
> unnecessary to check the per-hop IPv4 checksum.
>
> ok?
>
> Index: net/if_wg.c
> ===
> RCS file: /cvs/src/sys/net/if_wg.c,v
> retrieving revision 1.7
> diff -u -p -u -p -r1.7 if_wg.c
> --- net/if_wg.c 23 Jun 2020 10:03:49 -  1.7
> +++ net/if_wg.c 27 Jun 2020 02:48:37 -
> @@ -1660,14 +1660,10 @@ wg_decap(struct wg_softc *sc, struct mbu
> goto error;
> }
>
> -   /*
> -* We can mark incoming packet csum OK. We mark all flags OK
> -* irrespective to the packet type.
> -*/
> -   m->m_pkthdr.csum_flags |= (M_IPV4_CSUM_IN_OK | M_TCP_CSUM_IN_OK |
> -   M_UDP_CSUM_IN_OK | M_ICMP_CSUM_IN_OK);
> -   m->m_pkthdr.csum_flags &= ~(M_IPV4_CSUM_IN_BAD | M_TCP_CSUM_IN_BAD |
> -   M_UDP_CSUM_IN_BAD | M_ICMP_CSUM_IN_BAD);
> +   /* tunneled packet was not offloaded */
> +   m->m_pkthdr.csum_flags = 0;
> +   /* optimise: the tunnel provided a stronger integrity check */
> +   m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
>
> m->m_pkthdr.ph_ifidx = sc->sc_if.if_index;
> m->m_pkthdr.ph_rtableid = sc->sc_if.if_rdomain;



Re: [PATCH] fast conditional console scrolling

2020-06-27 Thread Paul de Weerd
Hi John,

With both your diffs applied, I indeed see more like a 3x speed-up
on my machine.  Average over 7 runs ls -R /usr/ports was
64.169s making for just under 3x increase.  That's on 1920x1080 with
the standard font size for that resolution (120x33 console, so 16x32
font).

Thanks again,

Paul 'WEiRD' de Weerd

On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
| I should have been more rigorous -- I had two different changes running
| on my system, as well as forcing it to use the 12x24 font for a 160x45
| console.
| 
| If you apply the "Optimized rasops32 putchar" patch I just posted, you
| should see another significant speedup.
| 
| 
|  Original Message 
| Subject: Re: [PATCH] fast conditional console scrolling
| From: Paul de Weerd 
| Date: Fri, June 26, 2020 1:23 am
| To: jo...@armadilloaerospace.com
| Cc: "tech@openbsd.org" 
| 
| Hi John,
| 
| I tried your diff. I don't quite see the same 3x improvement that you
| report, more like 2x. I timed 7 runs of ls -R /usr/ports:
| 
| Before diff, time ls -R /usr/ports | wc -l 2.897s on average
| After diff, time ls -R /usr/ports | wc -l 2.707s on average
| 
| Before diff, time ls -R /usr/ports 2m53.067 on average
| After diff, time ls -R /usr/ports 1m30.387 on average
| 
| Note that the 'before diff' runs were with a snapshot kernel. There
| may be diffs in there that account for the difference between before
| and after of the no-output runs. See dmesg and full stats below.
| 
| So, on average, a speed-up of ~48%.
| 
| Thanks!
| 
| Paul 'WEiRD' de Weerd
| 
| 

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
 http://www.weirdnet.nl/ 



ifconfig.8 Ar/Cm typo

2020-06-27 Thread Matthew Martin
A rule on a bridge interface that uses arp or rarp may be followed with
a literal "request" or "reply" (cf. sbin/ifconfig/brconfig.c L1041 and
1048), so the Ar macro is incorrect as its argument is not
a placeholder.

Aside: Is there a rule for when to list alternatives with foo | bar or
foo Ns | Ns bar? in/out, arp/rarp, and request/reply are all the former
sans-Ns; however, block/pass uses the Ns macro.


diff --git ifconfig.8 ifconfig.8
index c522491ad45..2d1d2eb1974 100644
--- ifconfig.8
+++ ifconfig.8
@@ -751,7 +751,7 @@ like a hub or a wireless network.
 .Bk -words
 .Op Cm tag Ar tagname
 .Oo
-.Cm arp | rarp Op Ar request | reply
+.Cm arp | rarp Op Cm request | reply
 .Op Cm sha Ar lladdr
 .Op Cm spa Ar ipaddr
 .Op Cm tha Ar lladdr



Re: awk FS behaviour change

2020-06-27 Thread patrick keshishian
On Sat, Jun 27, 2020 at 06:50:39AM +0100, Jason McIntyre wrote:
> On Fri, Jun 26, 2020 at 09:28:00PM -0600, Todd C. Miller wrote:
> > On Fri, 26 Jun 2020 23:56:23 +0200, Klemens Nanni wrote:
> > 
> > > How about adding something like "Therefore, FS should be set with -F or
> > > in a BEGIN block before input is read." as second sentence in this
> > > paragraph?
> > 
> > That whole section is missing important details.  I've tried to add
> > the missing info without being too repetitive.
> > 
> >  - todd
> > 
> > Index: usr.bin/awk/awk.1
> > ===
> > RCS file: /cvs/src/usr.bin/awk/awk.1,v
> > retrieving revision 1.54
> > diff -u -p -u -r1.54 awk.1
> > --- usr.bin/awk/awk.1   26 Jun 2020 21:50:06 -  1.54
> > +++ usr.bin/awk/awk.1   27 Jun 2020 03:25:48 -
> > @@ -129,27 +129,25 @@ and newlines are used as field separator
> >  .Va FS ) .
> >  This is convenient when working with multi-line records.
> >  .Pp
> > -An input line is normally made up of fields separated by whitespace,
> > -or by the regular expression
> > -.Va FS .
> > +An input line is normally made up of fields split based on the value
> > +of the field separator
> > +.Va FS
> > +at the time the line is read.
> 
> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.
> 
> >  The fields are denoted
> >  .Va $1 , $2 , ... ,
> >  while
> >  .Va $0
> >  refers to the entire line.
> > -If
> >  .Va FS
> > -is null, the input line is split into one field per character.
> > -Lines are split into fields using the value of
> > +may be set to either a single character or a regular expression.
> > +As as special case, if
> >  .Va FS
> > -at the time the line is read.
> > -Because of this,
> > +is a single space
> > +.Pq the default ,
> > +fields will be split by one or more whitespace characters.
> > +If
> >  .Va FS
> > -is usually set via the
> > -.Fl F
> > -option or inside of a
> > -.Ic BEGIN
> > -block.
> > +is null, the input line is split into one field per character.
> >  .Pp
> >  Normally, any number of blanks separate fields.
> >  In order to set the field separator to a single blank, use the
> > @@ -171,6 +169,11 @@ as the field separator, use the
> >  .Fl F
> >  option with a value of
> >  .Sq [t] .
> > +The field separator is usually set via the
> > +.Fl F
> > +option or from inside of a
> 
> that sounds odd, but it may be a US/UK thing: i would say either "from
> inside a block" or "from the inside of a block".

Maybe "... from inside of the" rather than "... from inside of a"

--patrick

> 
> jmc
> 
> > +.Ic BEGIN
> > +block so that it takes effect before the input is read.
> >  .Pp
> >  A pattern-action statement has the form:
> >  .Pp
> > @@ -407,9 +410,9 @@ The name of the current input file.
> >  .It Va FNR
> >  Ordinal number of the current record in the current file.
> >  .It Va FS
> > -Regular expression used to separate fields; also settable
> > -by option
> > -.Fl F Ar fs .
> > +Regular expression used to separate fields (default whitespace);
> > +also settable by option
> > +.Fl F Ar fs
> >  .It Va NF
> >  Number of fields in the current record.
> >  .Va $NF
> > 
> 



Re: 11n Tx aggregation for iwm(4)

2020-06-27 Thread Landry Breuil
On Fri, Jun 26, 2020 at 06:14:48PM +0200, Landry Breuil wrote:
> On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> > This patch adds support for 11n Tx aggregation to iwm(4).
> > 
> > Please help with testing if you can by running the patch and using wifi
> > as usual. Nothing should change, except that Tx speed may potentially
> > improve. If you have time to run before/after performance measurements with
> > tcpbench or such, that would be nice. But it's not required for testing.
> > 
> > > If Tx aggregation is active then netstat will show a non-zero output block
> > > ack agreement counter:
> > 
> > $ netstat -W iwm0 | grep 'output block'
> > 3 new output block ack agreements
> > 0 output block ack agreements timed out
> > 
> > It would be great to get at least one test for all the chipsets the driver
> > supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> > The behaviour of the access point also matters a great deal. It won't
> > hurt to test the same chipset against several different access points.
> > 
> > I have tested this version on 8265 only so far. I've run older revisions
> > of this patch on 7265 so I'm confident that this chip will work, too.
> > So far, the APs I have tested against are athn(4) in 11a mode and in 11n
> > mode with the 'nomimo' nwflag, and a Sagemcom 11ac AP. All on 5Ghz channels.
> 
> no difference on X1c3 w/
> iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x59, msi
> iwm0: hw rev 0x210, fw ver 17.3216344376.0,
> 
> using a crappy old fonera as AP, serving as a bridge to gw w/ tcpbench.
> 
> bandwidth min/avg/max/std-dev = 22.519/22.704/22.995/0.162 Mbps
> 
> same bw both ways it seems.

so no change against this old AP, which selects:
media: IEEE802.11 autoselect (OFDM48 mode 11g)
or sometimes
media: IEEE802.11 autoselect (OFDM12 mode 11g)
or
media: IEEE802.11 autoselect (OFDM6 mode 11g)

but if i connect to the ISP's box wifi, which selects:
media: IEEE802.11 autoselect (HT-MCS8 mode 11n)

the performance is horrible, i have a lot of lag, and tcpbench says:
bandwidth min/avg/max/std-dev = 0.000/1.576/10.069/2.781 Mbps

i have some iwm firmware errors in dmesg.

without the patch, its a bit the same:
bandwidth min/avg/max/std-dev = 0.000/1.836/9.846/2.292 Mbps

but no firmware errors afaict.
so dunno if the patch itself changes something, but the perf with the
ISP AP is awful. Can't remember if it was the case before as i seldom
use it with OpenBSD as a client..

Landry