Re: powerpc64: 64-bit-ize memmove.S
On Sat, 27 Jun 2020 01:27:14 +0200 Christian Weisgerber wrote:

> That function simply copies as many (double)words plus a tail of
> bytes as the length argument specifies. Neither source nor destination
> are checked for alignment, so this will happily run a loop of
> unaligned accesses, which doesn't sound very optimal.

I made a benchmark and concluded that unaligned word copies are slower
than aligned word copies, but faster than byte copies. In most cases,
memmove.S is faster than memmove.c, but if aligned word copies between
unaligned buffers are possible, then memmove.c is faster.

The benchmark was on a 32-bit macppc G3 with

cpu0 at mainbus0: 750 (Revision 0x202): 400 MHz: 512KB backside cache

The benchmark has 4 implementations of memmove,

stbu => byte copy with lbzu,stbu loop
stbx => byte copy with lbzx,stbx,addi loop
C    => aligned word copy or byte copy (libc/string/memmove.c)
asm  => unaligned word copy (libc/arch/powerpc/string/memmove.S)

It shows time measured by mftb (move from timebase).

1st bench: move 1 bytes up by 4 bytes, then down by 4 bytes, in
aligned buffer (offset 0). asm wins:

$ ./bench 1 4 0
stbu    stbx    C       asm
2639    2814    792     633
2502    2814    784     628
2501    2814    783     627
2501    2814    784     626

2nd bench: unaligned buffer (offset 1), but (src & 3) == (dst & 3),
so C does aligned word copies, while asm does misaligned. C wins:

$ ./bench 1 4 1
stbu    stbx    C       asm
2638    3006    795     961
2502    2814    786     938
2501    2814    786     939
2501    2813    785     939

3rd bench: move up then down by 5 bytes, src & 3 != dst & 3, can't
align word copies. C does byte copies. asm wins:

$ ./bench 1 5 0
stbu    stbx    C       asm
2675    2815    2514    809
2501    2813    2504    782
2502    2815    2504    782
2501    2814    2503    782

I think that memmove.S is probably better than memmove.c on G3.
I haven't run the bench on POWER9.
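The (src & 3) == (dst & 3) condition from the 2nd and 3rd bench can be sketched in C. This is only an illustration of the idea, not the code in libc/string/memmove.c; the names same_word_offset and copy_forward are made up here, and the sketch handles forward copies only (non-overlapping buffers, or dst below src).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when src and dst sit at the same offset
 * inside a 4-byte word, i.e. (src & 3) == (dst & 3). */
static int
same_word_offset(const void *dst, const void *src)
{
	return ((uintptr_t)src & 3) == ((uintptr_t)dst & 3);
}

/* Hypothetical forward-only copy: aligned words when possible, bytes otherwise. */
static void
copy_forward(unsigned char *dst, const unsigned char *src, size_t len)
{
	if (same_word_offset(dst, src)) {
		/* Leading bytes until both pointers reach a word boundary. */
		while (len > 0 && ((uintptr_t)src & 3) != 0) {
			*dst++ = *src++;
			len--;
		}
		/* Aligned word copies; memcpy of 4 bytes avoids aliasing
		 * problems and normally compiles to one load and one store. */
		while (len >= 4) {
			uint32_t w;

			memcpy(&w, src, 4);
			memcpy(dst, &w, 4);
			src += 4;
			dst += 4;
			len -= 4;
		}
	}
	/* Tail bytes, or the whole buffer when the offsets differ. */
	while (len > 0) {
		*dst++ = *src++;
		len--;
	}
}
```

Moving by 4 bytes keeps the two offsets equal, so the word loop runs; moving by 5 bytes makes them differ, so everything falls through to the byte loop, which matches the pattern in the C column above.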
Re: Stuck in Needbuf state, trying to understand (6.7)
No.

I know *exactly* what needbuf is but to attempt to diagnose what your
problem is we need exact details. especially:

1) The configuration of your system including all the details of the
   filesystems you have mounted, all options used, etc.

2) The script you are using to generate the problem (Not a paraphrasing
   of what you think the script does) What filesystems it is using.

On Sat, Jun 27, 2020 at 08:09:18PM -0400, sven falempin wrote:
> On Fri, Jun 26, 2020 at 7:35 PM sven falempin wrote:
>
> >
> > On Fri, Jun 26, 2020 at 5:22 PM Stuart Henderson wrote:
> >
> >> On 2020/06/26 15:30, sven falempin wrote:
> >> > behavior confirmed on current.
> >> >
> >> > Once the process stalls, ( could be anything writing to the vnconfig
> >> > disk, cp , umount )
> >> > a few other calls like df , or ps, etc may hang, never the same
> >> > sp or mp kernel, reproduced on today's snapshots.
> >>
> >> vnconfig is used as part of "make release", many builds are done every
> >> week using this so it's not a general problem with vnconfig.
> >>
> >> Can you show some commands or a script to trigger the behaviour?
> >>
> >
> > the perl script use the system to call :
> >
> > vnconfig.
> > mount.
> > umount. <- saw hanged
> > cp.<- saw hanged
> > tar.<- saw hanged
> > svn up.<- saw hanged
> > and dd.
> > newfs.
> >
> > really nothing fancy, only stuff writing to disk got stuck.
> >
> > At one point it does a chroot but it never hangs near that , most of the
> > time it hangs before.
> >
> > The script has been used like 1000 times on 6.0 and maybe twice more on
> > 6.4.
> >
> > I have absolutely no idea what the 'needbuf' of top is .
> >
> > the script hangs at random position , always writing into vnconfig.
> >
> > I have no idea how to reproduce outside the perl script , so maybe it is
> > related to some devious perl stdin/stdout buffer .
> >
> > Nevertheless there's like a 5% chance that's the script will work( slowly )
> >
> > Most of the system call are inside a routine to log
> >
> > sub debug_system {
> >     $logger->debug('running: '.join(' ', @_));
> >     return system(@_);
> > }
> >
> > so i can easily put things inside to try to understand the issue.
> >
> > It is really a strange behavior, and the device must be shut down
> > electrically.
> > Something really odd, i run syslogd on a buffer, and syslogc buffer is
> > stuck too
> > when the device stuck (but it supposed to be mostly already allocated
> > memory ).
> >
> > It's really like the vm does not want to give anymore bucket (<- i
> > don't know what i m talking about here,
> > but i looks like that anything that doesn't malloc is ok , computer reply
> > to ping , can do a few things for a while , and then complete
> > hang )
> >
> > I ran the 6.7 release on a VM somewhere and another device with many perl
> > script and they work.
> >
> > Only this fails 95% of the time and is VERY VERY slow when ok.
> > compared to what i saw in /usr/src the vnconfig is big , ( forgot to copy
> > df -h ),
> > like 2GB
> >
>
> i put ktrace in front of the perl system call
> An di was able to recover a 800MB trace
>
> $ kdump -f ./trace.out | tail -20
> kdump: realloc: Cannot allocate memory
>  25955 UNKNOWN(1634890859)
>  72466 ? CALL syscall()
>
> could that be of some use ?
>
> --
> --
> -
> Knowing is not enough; we must apply. Willing is not enough; we must do
Re: Stuck in Needbuf state, trying to understand (6.7)
On Fri, Jun 26, 2020 at 7:35 PM sven falempin wrote:

>
> On Fri, Jun 26, 2020 at 5:22 PM Stuart Henderson wrote:
>
>> On 2020/06/26 15:30, sven falempin wrote:
>> > behavior confirmed on current.
>> >
>> > Once the process stalls, ( could be anything writing to the vnconfig
>> > disk, cp , umount )
>> > a few other calls like df , or ps, etc may hang, never the same
>> > sp or mp kernel, reproduced on today's snapshots.
>>
>> vnconfig is used as part of "make release", many builds are done every
>> week using this so it's not a general problem with vnconfig.
>>
>> Can you show some commands or a script to trigger the behaviour?
>>
>
> the perl script use the system to call :
>
> vnconfig.
> mount.
> umount. <- saw hanged
> cp.<- saw hanged
> tar.<- saw hanged
> svn up.<- saw hanged
> and dd.
> newfs.
>
> really nothing fancy, only stuff writing to disk got stuck.
>
> At one point it does a chroot but it never hangs near that , most of the
> time it hangs before.
>
> The script has been used like 1000 times on 6.0 and maybe twice more on
> 6.4.
>
> I have absolutely no idea what the 'needbuf' of top is .
>
> the script hangs at random position , always writing into vnconfig.
>
> I have no idea how to reproduce outside the perl script , so maybe it is
> related to some devious perl stdin/stdout buffer .
>
> Nevertheless there's like a 5% chance that's the script will work( slowly )
>
> Most of the system call are inside a routine to log
>
> sub debug_system {
>     $logger->debug('running: '.join(' ', @_));
>     return system(@_);
> }
>
> so i can easily put things inside to try to understand the issue.
>
> It is really a strange behavior, and the device must be shut down
> electrically.
> Something really odd, i run syslogd on a buffer, and syslogc buffer is
> stuck too
> when the device stuck (but it supposed to be mostly already allocated
> memory ).
>
> It's really like the vm does not want to give anymore bucket (<- i
> don't know what i m talking about here,
> but i looks like that anything that doesn't malloc is ok , computer reply
> to ping , can do a few things for a while , and then complete
> hang )
>
> I ran the 6.7 release on a VM somewhere and another device with many perl
> script and they work.
>
> Only this fails 95% of the time and is VERY VERY slow when ok.
> compared to what i saw in /usr/src the vnconfig is big , ( forgot to copy
> df -h ),
> like 2GB
>

i put ktrace in front of the perl system call
An di was able to recover a 800MB trace

$ kdump -f ./trace.out | tail -20
kdump: realloc: Cannot allocate memory
 25955 UNKNOWN(1634890859)
 72466 ? CALL syscall()

could that be of some use ?

--
--
-
Knowing is not enough; we must apply. Willing is not enough; we must do
Re: ldomctl: Fix init-system with multiple PCIe root complexes
On Sat, Jun 20, 2020 at 01:05:22AM +0200, Klemens Nanni wrote:
> Opposed to all other (single CPU) machines I have encountered so far,
> the T4-2 has two instead of one PCIe root complexes.
>
> ldomctl already accounts for this and iterates over them but lacks a
> simple skip condition when iterating over subdevices to avoid linking
> devices in one root complex to those in another.
>
> This fixes `init-system' on my T4-2 where I have been using a lame
> work-around so far, but the recent report on bugs@ reminded me to look
> into it more closely this time.
>
> Thanks to tracey for quickly providing details about his hardware for
> quick comparison.

Has anyone tried this (on machines other than T4-2)?

Koakuma on bugs@ reported that this fixes ldomctl on their T4-2 just
like expected.

I'd like to commit this soon.

Feedback? OK?


Index: config.c
===================================================================
RCS file: /cvs/src/usr.sbin/ldomctl/config.c,v
retrieving revision 1.40
diff -u -p -r1.40 config.c
--- config.c	24 May 2020 22:08:54 -0000	1.40
+++ config.c	27 Jun 2020 23:35:38 -0000
@@ -1142,6 +1142,8 @@ hvmd_finalize_pcie_device(struct md *md,
 	md_link_node(md, node, parent);
 
 	TAILQ_FOREACH(subdevice, &guest->subdevice_list, link) {
+		if (strncmp(path, subdevice->path, strlen(path)) != 0)
+			continue;
 		TAILQ_FOREACH(component, , link) {
 			if (strcmp(subdevice->path, component->path) == 0)
 				md_link_node(md, parent, component->hv_node);
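The skip condition in the diff is a plain string-prefix test: a subdevice is only considered when its path starts with the path of the root complex being finalized. A minimal standalone sketch of that test follows; the helper names under_root and count_under and the device paths used with them are invented for illustration and are not taken from ldomctl or from real hardware.

```c
#include <string.h>

/* Hypothetical helper: nonzero when `path` is a leading prefix of `subpath`,
 * the same strncmp() test the diff adds. */
static int
under_root(const char *path, const char *subpath)
{
	return strncmp(path, subpath, strlen(path)) == 0;
}

/* Count the entries of subs[] that would NOT be skipped for root `path`. */
static int
count_under(const char *path, const char *const subs[], int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++) {
		if (!under_root(path, subs[i]))
			continue;	/* belongs to another root complex */
		count++;
	}
	return count;
}
```

With made-up subdevice paths "/@400/@1/@0/@4", "/@500/@1/@0/@6" and "/@400/@2", count_under("/@400", ...) accepts the first and third and skips the second. One property of a bare prefix test worth keeping in mind: "/@400" is also a prefix of a path such as "/@4000/@1", so the check assumes root complex paths are not prefixes of one another.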
Re: 11n Tx aggregation for iwm(4)
Tested on a "Intel Dual Band Wireless-AC 9260" rev 0x29, msix (hw rev
0x320, fw ver 34.3125811985.0)

I seem to be getting "iwm0: fatal firmware error" a few seconds after
the 4-way handshake. I can send a few packets, so it sure connects and
all, but then it fails shortly after.

iwm0: begin active scan
iwm0: INIT -> SCAN
iwm0: end active scan
iwm0: + 70:73:cb:cb:c3:86 40 +45 54M ess privacy rsn "FRA"
iwm0: SCAN -> AUTH
iwm0: sending auth to 70:73:cb:cb:c3:86 on channel 40 mode 11a
iwm0: AUTH -> ASSOC
iwm0: sending assoc_req to 70:73:cb:cb:c3:86 on channel 40 mode 11a
iwm0: ASSOC -> RUN
iwm0: associated with 70:73:cb:cb:c3:86 ssid "FRA" channel 40 start MCS 0 long preamble short slot time HT enabled
iwm0: missed beacon threshold set to 30 beacons, beacon interval is 100 TU
iwm0: received msg 1/4 of the 4-way handshake from 70:73:cb:cb:c3:86
iwm0: sending msg 2/4 of the 4-way handshake to 70:73:cb:cb:c3:86
iwm0: received msg 3/4 of the 4-way handshake from 70:73:cb:cb:c3:86
iwm0: sending msg 4/4 of the 4-way handshake to 70:73:cb:cb:c3:86
iwm0: sending action to 70:73:cb:cb:c3:86 on channel 40 mode 11n
iwm0: fatal firmware error
Re: [PATCH} Optimized rasops32 putchar
I did some more tests, and I think the odd performance I am seeing may
be due to TLB thrash on the 32x64 characters with 4k pages, since
writing each character will require 64 data TLB entries.

Are huge page mappings supported in OpenBSD?

Original Message
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis
Date: Sat, June 27, 2020 1:30 pm
To:
Cc: tech@openbsd.org

> Content-Type: text/plain; charset="utf-8"
> From:
>
> I was doing my timings with a user mode program after mmapping the
> efifb display, so the mapping might be different in the kernel.

That should still give you a write-combining mapping as efifb_mmap()
adds the PMAP_WC flag to the physical address.

Cachable on x86 means write-back cachable. And using a write-back
cachable mapping for a framebuffer often leads to interesting "damage"
where pixels in certain cache lines show up "late" on the display.
Not sure if you'd see that on recent Intel graphics hardware as the
current hardware designs are much more coherent than what was produced
in the past.

> Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
> WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
> rasops. While the point of the DRM drivers is to get fully hardware
> accelerated drawing in X, there isn't any reason why they can't
> support dumb framebuffer mappings as well.

True. Although there are DRM interfaces that give you a dumb
framebuffer as well. Using those interfaces is a bit more complicated
though.

Centralising the code would be good. That code probably should use
bus_space_mmap(9) as the PMAP_WC flag is amd64-specific. Unfortunately
the amd64 implementation of bus_space_mmap(9) is incomplete and
doesn't actually implement write-combining for mappings with the
BUS_SPACE_MAP_PREFETCHABLE flag set. So that has to be fixed as well.
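The estimate of 64 data TLB entries per 32x64 character at the top of this message can be checked with a little arithmetic. The sketch below counts how many distinct 4 KiB pages one glyph touches; the function name pages_touched is made up, and the 7680-byte stride used with it (a 1920-pixel-wide, 32 bits per pixel display) is an assumption for illustration, not a figure from this thread.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* 4k pages, as in the message */

/* Hypothetical helper: count distinct pages written when drawing one glyph.
 * Assumes stride >= glyph_w * bytes_per_pixel, so addresses only increase
 * from one row to the next. */
static unsigned
pages_touched(size_t start, size_t stride, unsigned glyph_w,
    unsigned glyph_h, unsigned bytes_per_pixel)
{
	size_t last_page = (size_t)-1;
	unsigned row, count = 0;

	for (row = 0; row < glyph_h; row++) {
		size_t first = start + (size_t)row * stride;
		size_t last = first + (size_t)glyph_w * bytes_per_pixel - 1;
		size_t p;

		/* A row may straddle a page boundary, so walk every page it covers. */
		for (p = first / SKETCH_PAGE_SIZE; p <= last / SKETCH_PAGE_SIZE; p++) {
			if (p != last_page) {
				count++;
				last_page = p;
			}
		}
	}
	return count;
}
```

With a 7680-byte stride, consecutive rows are more than one page apart, so a 64-row glyph lands on 64 different pages, one per row. With a stride of 2048 bytes, two rows would share each page and the count would drop to 32.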
Re: OpenBSD.calendar patch
Hi Again,

Here's a second attempt using git/got. Is this better?

diff 382c05176131a97b161018e0e88f5417f810eb9c /var/git/src
blob - b6b2ef6c918b12164e293c04db2be2dc45ab656a
file + usr.bin/calendar/calendars/calendar.openbsd
--- usr.bin/calendar/calendars/calendar.openbsd
+++ usr.bin/calendar/calendars/calendar.openbsd
@@ -10,15 +10,19 @@
 Jan 06 IPF gets integrated into the OpenBSD kernel, 1996
 Jan 06 NRL IPv6 addition to OpenBSD, 1999
 Jan 09 n2k10: Network hackathon, Melbourne, Australia, 17 developers, 2010
+Jan 12 u2k20: Uckermark hackathon, Urckermark, Germany, 14 developers, 2020
 Jan 13 n2k13: Network hackathon, Dunedin, New Zealand, 17 developers, 2013
+Jan 17 Antipodean hackathon, Wellington, New Zealand, 18 developers, 2019
 Jan 18 n2k14: Mini-hackathon, Dunedin, New Zealand, 15 developers, 2014
 Jan 20 Bind 9 goes into the tree, 2003
+Jan 20 a2k20: Antipodean hackathon, Hobart, Tasmania, 17 developers, 2020
 Jan 26 Anoncvs service inaugurated, 1996
 Jan 26 n2k9: Network hackathon, Basel, Switzerland, 19 developers, 2009
 Jan 27 OpenBSD/amd64 port is added, from NetBSD, 2004
 Jan 29 "second anoncvs server is 100 miles from the first", 1996
 Jan 31 OpenBSD/cats port is added, from NetBSD, 2004
 Feb 03 Describe the ports mechanism [in OpenBSD], 1997
+Feb 05 a2k18: Dunedin, New Zeland, 19 developers, 2018
 Feb 13 Unpatented fast block cipher for new password hashing, 1997
 Feb 14 GNU RCS expired from source tree, replaced with OpenRCS, 2007
 Feb 19 IPsec package by John Ioannidis and Angelos D. Keromytis, 1997
@@ -27,6 +31,7 @@ Feb 26 bridge(4) transparent firewall added to OpenBSD
 Feb 28 Cryptographic services framework in OpenBSD, 2000
 Mar 09 Support for the VAX architecture removed, 2016
 Mar 10 OpenBSD/WWW translation started -- German, Spanish, Dutch, 2000
+Mar 28 t2k19: Taipei mini hackathon, Taipei, Taiwan, 16 developers, 2019
 Apr 01 OpenBSD/hppa64 port is added, 2005
 Apr 01 k2k11: Kernel hackathon, Hafnarfjordur, Iceland, 15 developers, 2011
 Apr 10 f2k7: First filesystem hackathon, Vienna, Austria, 14 developers, 2007
@@ -40,10 +45,12 @@ Apr 24 pf2k4: PF hackathon, Sechelt, BC, 12 developers
 Apr 27 i386/PAE work integrated, 2006
 May 01 OpenBSD 3.3 released, exploiting W^X, 2003
 May 05 n2k8: Network hackathon, Ito, Japan, 18 developers, 2008
+May 07 g2k19: General hackathon, Ottawa, Canada, 43 developers, 2019
 May 08 c2k3 General hackathon, Calgary, Alberta, 51 developers, 2003
 May 09 First commit to OpenBSD stable branch, OPENBSD_2_7, 2000
 May 09 OpenBSD/aviion port is added, 2006
 May 19 OpenBSD 2.3 released, including "ports" system, 1998
+May 19 OpenBSD 6.7 released, 48th release, 2020
 May 21 c2k5: General hackathon, Calgary, Alberta, 60 developers, 2005
 May 21 c2k6: General hackathon, Calgary, Alberta, 47 developers, 2006
 May 24 OpenBSD gets a trunk(4), 2005
@@ -62,6 +69,7 @@ Jun 15 OpenBSD 2.7 released, including OpenSSH, 2000
 Jun 15 c2k: First general hackathon, Calgary, Alberta, 18 developers, 2000
 Jun 19 c2k4: General hackathon, Calgary, Alberta, 46 developers, 2004
 Jun 21 c2k1: Birth of PF hackathon, Cambridge, MA, 35 developers, 2001
+Jun 21 WireGuard imported into kernel, 2020
 Jun 23 OpenBSD/hppa started, based on Utah Lites and OSF MkLinux, 1998
 Jun 24 PF added. Insane amounts of work done by dhartmei@, 2001
 Jun 25 c2k10: General hackathon, Edmonton, Alberta, 46 developers, 2010
@@ -70,6 +78,7 @@ Jul 01 add strlcpy/strlcat, safe and sensible string c
 Jul 02 c2k11: General hackathon, Edmonton, Alberta, Canada, 36 developers, 2011
 Jul 07 g2k12: General hackathon, Budapest, Hungary, 41 developers, 2012
 Jul 08 g2k14: General hackathon, Ljubljana, Slovenia, 49 developers, 2014
+Jul 08 g2k18: General hackathon, Ljubljana, Slovenia, 39 developers, 2018
 Jul 11 OpenBSD goes wireless w/ if_wi addition, 1999
 Jul 23 OpenBSD goes multimedia with Brooktree 848 support, 1998
 Jul 24 Non-executable stack on most architectures, 2002
@@ -83,6 +92,7 @@ Aug 17 OpenBSD/sparc64 port is added, from NetBSD, 200
 Aug 28 k2k6: IPsec hackathon, Schloss Kransberg, Germany, 14 developers, 2006
 Sep 01 Support for the sparc (32bit) architecture removed, 2016
 Sep 03 Support for the zaurus architecture removed, 2016
+Sep 06 n2k18: Network hackathon, Usti nad Labem, Czech Republic, 11 developers, 2018
 Sep 16 s2k11: General hackathon, Ljubljana, Slovenia, 25 developers, 2011
 Sep 17 n2k12: Network hackathon, Starnberg, Germany, 23 developers, 2012
 Sep 19 j2k10: Mini-hackathon, Sakae Mura, Nagano, Japan, 19 developers, 2010
@@ -103,7 +113,9 @@ Oct 29 OpenBSD 3.6 released, featuring i386 and amd64
 Oct 30 OpenBSD 3.4 released, implementing W^X on i386 and AES in VIA C3, 2003
 Nov 01 OpenBSD 3.2 released, ftp mirrors preload for the first time, 2002
 Nov 01 v2k5: First ports hackathon, Venice, Italy, 12 developers, 2005
+Nov 03 l2k18: Libressl hackathon, Edmonton, Canada, 5 developers, 2018
 Nov 05 a2k11: ARM hackathon, Coimbra,
Re: [PATCH} Optimized rasops32 putchar
> Content-Type: text/plain; charset="utf-8"
> From:
>
> I was doing my timings with a user mode program after mmapping the
> efifb display, so the mapping might be different in the kernel.

That should still give you a write-combining mapping as efifb_mmap()
adds the PMAP_WC flag to the physical address.

Cachable on x86 means write-back cachable. And using a write-back
cachable mapping for a framebuffer often leads to interesting "damage"
where pixels in certain cache lines show up "late" on the display.
Not sure if you'd see that on recent Intel graphics hardware as the
current hardware designs are much more coherent than what was produced
in the past.

> Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
> WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
> rasops. While the point of the DRM drivers is to get fully hardware
> accelerated drawing in X, there isn't any reason why they can't
> support dumb framebuffer mappings as well.

True. Although there are DRM interfaces that give you a dumb
framebuffer as well. Using those interfaces is a bit more complicated
though.

Centralising the code would be good. That code probably should use
bus_space_mmap(9) as the PMAP_WC flag is amd64-specific. Unfortunately
the amd64 implementation of bus_space_mmap(9) is incomplete and
doesn't actually implement write-combining for mappings with the
BUS_SPACE_MAP_PREFETCHABLE flag set. So that has to be fixed as well.

> Original Message
> Subject: RE: [PATCH} Optimized rasops32 putchar
> From:
> Date: Sat, June 27, 2020 11:13 am
> To: "Mark Kettenis"
> Cc: "tech@openbsd.org"
>
> I believe it is mapped as normally cached right now, rather than
> uncached or write combining.
>
> Reads aren't ultra-slow, and the timings of 48 byte writes appear to
> involve a cacheline read.
>
> 128 byte writes are actually slower than 64 byte writes, which I
> guessed might be because of automatic prefetching kicking in and
> reading the following cacheline.
Re: [PATCH} Optimized rasops32 putchar
I was doing my timings with a user mode program after mmapping the
efifb display, so the mapping might be different in the kernel.

Related to that, I was going to add mmap / WSDISPLAYIO_LINEBYTES /
WSDISPLAYIO_SMODE to the drm drivers by consolidating code into
rasops. While the point of the DRM drivers is to get fully hardware
accelerated drawing in X, there isn't any reason why they can't
support dumb framebuffer mappings as well.

Original Message
Subject: RE: [PATCH} Optimized rasops32 putchar
From:
Date: Sat, June 27, 2020 11:13 am
To: "Mark Kettenis"
Cc: "tech@openbsd.org"

I believe it is mapped as normally cached right now, rather than
uncached or write combining.

Reads aren't ultra-slow, and the timings of 48 byte writes appear to
involve a cacheline read.

128 byte writes are actually slower than 64 byte writes, which I
guessed might be because of automatic prefetching kicking in and
reading the following cacheline.

Original Message
Subject: Re: [PATCH} Optimized rasops32 putchar
From: Mark Kettenis
Date: Sat, June 27, 2020 7:56 am
To:
Cc: tech@openbsd.org

> From:
> Date: Fri, 26 Jun 2020 07:42:50 -0700
>
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
>
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
>
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
>
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining.
Re: [PATCH} Optimized rasops32 putchar
I believe it is mapped as normally cached right now, rather than uncached or write combining. Reads aren't ultra-slow, and the timings of 48 byte writes appear to involve a cacheline read. 128 byte writes are actually slower than 64 byte writes, which I guessed might be because of automatic prefetching kicking in and reading the following cacheline. Original Message Subject: Re: [PATCH} Optimized rasops32 putchar From: Mark Kettenis Date: Sat, June 27, 2020 7:56 am To: Cc: tech@openbsd.org > From: > Date: Fri, 26 Jun 2020 07:42:50 -0700 > > Optimized 32 bit character rendering with unrolled rows and pairwise > foreground / background pixel rendering. > > If it weren't for the 5x8 font, I would have just assumed everything > was an even width and made the fallback path also pairwise. > > In isolation, the 16x32 character case got 2x faster, but that wasn't > a huge real world speedup where the space rendering that was already > at memory bandwidth limits accounted for most of the character > rendering time. However, in combination with the previous fast > conditional console scrolling that removes most of the space rendering, > it becomes significant. > > I also found that at least the efi and intel framebuffers are not > currently mapped write combining, which makes this much slower than > it should be. Hi John, The framebuffer should be mapped write-combining. In OpenBSD this is requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to bbus_space_map(9) when mapping the framebuffer. I'm fairly confident since until last January the initial mapping of the framebuffer that we used wasn't write-combining. And things were really, really slow. 
Cheers, Mark > Index: rasops32.c > === > RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v > retrieving revision 1.10 > diff -u -p -r1.10 rasops32.c > --- rasops32.c 25 May 2020 09:55:49 - 1.10 > +++ rasops32.c 26 Jun 2020 14:34:06 - > @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri) > int > rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t > attr) > { > - int width, height, cnt, fs, fb, clr[2]; > + int width, height, step, cnt, fs, b, f; > + uint32_t fb, clr[2]; > struct rasops_info *ri; > - int32_t *dp, *rp; > + int64_t *rp, q; > + union { > + int64_t q[4]; > + int32_t d[4][2]; > + } u; > u_char *fr; > > ri = (struct rasops_info *)cookie; > @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row, > return 0; > #endif > > - rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > + rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale); > > height = ri->ri_font->fontheight; > width = ri->ri_font->fontwidth; > + step = ri->ri_stride >> 3; > > - clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf]; > - clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf]; > + b = ri->ri_devcmap[(attr >> 16) & 0xf]; > + f = ri->ri_devcmap[(attr >> 24) & 0xf]; > + u.d[0][0] = b; u.d[0][1] = b; > + u.d[1][0] = b; u.d[1][1] = f; > + u.d[2][0] = f; u.d[2][1] = b; > + u.d[3][0] = f; u.d[3][1] = f; > > if (uc == ' ') { > + q = u.q[0]; > while (height--) { > - dp = rp; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) > - *dp++ = clr[0]; > + /* the general, pixel-at-a-time case is fast enough */ > + for (cnt = 0; cnt < width; cnt++) > + ((int *)rp)[cnt] = b; > + rp += step; > } > } else { > uc -= ri->ri_font->firstchar; > fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale; > fs = ri->ri_font->stride; > - > - while (height--) { > - dp = rp; > - fb = fr[3] | (fr[2] << 8) | (fr[1] ><< 16) | > - (fr[0] << 24); > - fr += fs; > - DELTA(rp, ri->ri_stride, int32_t *); > - > - for (cnt = width; cnt; cnt--) { > - *dp++ = 
clr[(fb >> 31) & 1]; > - fb <<= 1; > - } > + /* double-pixel special cases for the common widths */ > + switch (width) { > + case 8: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + rp += step; > + fr += 1; > + } > + break; > + > + case 12: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; > + rp += step; > + fr += 2; > + } > + break; > + > + case 16: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; > + rp[6] = u.q[(fb >> 2) & 3]; > + rp[7] = u.q[fb & 3]; > + rp += step; > + fr += 2; > + } > + break; > + case 32: > + while (height--) { > + fb = fr[0]; > + rp[0] = u.q[fb >> 6]; > + rp[1] = u.q[(fb >> 4) & 3]; > + rp[2] = u.q[(fb >> 2) & 3]; > + rp[3] = u.q[fb & 3]; > + fb = fr[1]; > + rp[4] = u.q[fb >> 6]; > + rp[5] = u.q[(fb >> 4) & 3]; >
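
For readers following the diff above, the pairwise lookup can be tried out in userland. This is only a rough sketch of the idea, not the kernel code: the names `pixel_pairs`, `pairs_init` and `render_row8` are made up for illustration, and memcpy() stands in for the 64-bit stores through the union so the sketch does not depend on framebuffer alignment.

```c
#include <stdint.h>
#include <string.h>

/* The four possible 2-pixel patterns for one background/foreground pair. */
struct pixel_pairs {
	uint32_t d[4][2];
};

/*
 * Index bit 1 selects the left pixel and bit 0 the right pixel
 * (0 = background, 1 = foreground), matching the table in the diff.
 */
void
pairs_init(struct pixel_pairs *pp, uint32_t bg, uint32_t fg)
{
	pp->d[0][0] = bg; pp->d[0][1] = bg;
	pp->d[1][0] = bg; pp->d[1][1] = fg;
	pp->d[2][0] = fg; pp->d[2][1] = bg;
	pp->d[3][0] = fg; pp->d[3][1] = fg;
}

/*
 * Render one row of an 8-pixel-wide glyph. The most significant bit of
 * `bits` is the leftmost pixel. Each memcpy() moves two pixels (8 bytes),
 * standing in for one 64-bit store in the diff.
 */
void
render_row8(uint32_t *dst, uint8_t bits, const struct pixel_pairs *pp)
{
	memcpy(dst + 0, pp->d[bits >> 6], sizeof(pp->d[0]));
	memcpy(dst + 2, pp->d[(bits >> 4) & 3], sizeof(pp->d[0]));
	memcpy(dst + 4, pp->d[(bits >> 2) & 3], sizeof(pp->d[0]));
	memcpy(dst + 6, pp->d[bits & 3], sizeof(pp->d[0]));
}
```

The table is built once per character, so the per-row work is four table lookups and four double-pixel copies instead of eight shift-and-test steps.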
Re: [PATCH 3/6] crypto: cast: convert to use new modes 64-bit helpers
On Sat, Jun 27, 2020 at 10:36:58PM +0300, Dmitry Baryshkov wrote:
> + * 3. All advertising materials mentioning features or use of this software
> + *must display the following acknowledgement:
> + *"This product includes cryptographic software written by
> + * Eric Young (e...@cryptsoft.com)"
> + *The word 'cryptographic' can be left out if the rouines from the library
> + *being used are not cryptographic related :-).

Is the typo in routines necessary?

Joreg
[PATCH 4/6] crypto: IDEA: convert to use new modes 64-bit helpers
Convert IDEA cipher to use 64-bit modes helper functions. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/idea/i_cbc.c | 74 +++- src/lib/libcrypto/idea/i_cfb64.c | 57 ++-- src/lib/libcrypto/idea/i_ofb64.c | 47 ++-- 3 files changed, 13 insertions(+), 165 deletions(-) diff --git a/src/lib/libcrypto/idea/i_cbc.c b/src/lib/libcrypto/idea/i_cbc.c index 5bb9640c3403..556a4aa5cbf3 100644 --- a/src/lib/libcrypto/idea/i_cbc.c +++ b/src/lib/libcrypto/idea/i_cbc.c @@ -57,81 +57,17 @@ */ #include +#include #include "idea_lcl.h" void idea_cbc_encrypt(const unsigned char *in, unsigned char *out, long length, IDEA_KEY_SCHEDULE *ks, unsigned char *iv, int encrypt) - { - unsigned long tin0,tin1; - unsigned long tout0,tout1,xor0,xor1; - long l=length; - unsigned long tin[2]; - +{ if (encrypt) - { - n2l(iv,tout0); - n2l(iv,tout1); - iv-=8; - for (l-=8; l>=0; l-=8) - { - n2l(in,tin0); - n2l(in,tin1); - tin0^=tout0; - tin1^=tout1; - tin[0]=tin0; - tin[1]=tin1; - idea_encrypt(tin,ks); - tout0=tin[0]; l2n(tout0,out); - tout1=tin[1]; l2n(tout1,out); - } - if (l != -8) - { - n2ln(in,tin0,tin1,l+8); - tin0^=tout0; - tin1^=tout1; - tin[0]=tin0; - tin[1]=tin1; - idea_encrypt(tin,ks); - tout0=tin[0]; l2n(tout0,out); - tout1=tin[1]; l2n(tout1,out); - } - l2n(tout0,iv); - l2n(tout1,iv); - } + CRYPTO_cbc64_encrypt(in, out, length, ks, iv, (block64_f)idea_ecb_encrypt); else - { - n2l(iv,xor0); - n2l(iv,xor1); - iv-=8; - for (l-=8; l>=0; l-=8) - { - n2l(in,tin0); tin[0]=tin0; - n2l(in,tin1); tin[1]=tin1; - idea_encrypt(tin,ks); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2n(tout0,out); - l2n(tout1,out); - xor0=tin0; - xor1=tin1; - } - if (l != -8) - { - n2l(in,tin0); tin[0]=tin0; - n2l(in,tin1); tin[1]=tin1; - idea_encrypt(tin,ks); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2nn(tout0,tout1,out,l+8); - xor0=tin0; - xor1=tin1; - } - l2n(xor0,iv); - l2n(xor1,iv); - } - tin0=tin1=tout0=tout1=xor0=xor1=0; - tin[0]=tin[1]=0; - } + CRYPTO_cbc64_decrypt(in, out, length, ks, iv, 
(block64_f)idea_ecb_encrypt); +} void idea_encrypt(unsigned long *d, IDEA_KEY_SCHEDULE *key) { diff --git a/src/lib/libcrypto/idea/i_cfb64.c b/src/lib/libcrypto/idea/i_cfb64.c index b979aaef8669..a74b50d82309 100644 --- a/src/lib/libcrypto/idea/i_cfb64.c +++ b/src/lib/libcrypto/idea/i_cfb64.c @@ -57,6 +57,7 @@ */ #include +#include #include "idea_lcl.h" /* The input and output encrypted as though 64bit cfb mode is being @@ -67,56 +68,6 @@ void idea_cfb64_encrypt(const unsigned char *in, unsigned char *out, long length, IDEA_KEY_SCHEDULE *schedule, unsigned char *ivec, int *num, int encrypt) - { - unsigned long v0,v1,t; - int n= *num; - long l=length; - unsigned long ti[2]; - unsigned char *iv,c,cc; - - iv=(unsigned char *)ivec; - if (encrypt) - { - while (l--) - { - if (n == 0) - { - n2l(iv,v0); ti[0]=v0; - n2l(iv,v1); ti[1]=v1; - idea_encrypt((unsigned long *)ti,schedule); - iv=(unsigned char *)ivec; - t=ti[0]; l2n(t,iv); - t=ti[1]; l2n(t,iv); - iv=(unsigned char *)ivec; - } - c= *(in++)^iv[n]; - *(out++)=c; - iv[n]=c; - n=(n+1)&0x07; - } - } -
[PATCH 6/6] crypto: Gost 28147-89: convert to use new modes 64-bit helpers
Convert Gost 28147-89 cipher to use 64-bit modes helper functions. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/gost/gost2814789.c | 121 ++- 1 file changed, 9 insertions(+), 112 deletions(-) diff --git a/src/lib/libcrypto/gost/gost2814789.c b/src/lib/libcrypto/gost/gost2814789.c index e285413ed460..bbd578ef7010 100644 --- a/src/lib/libcrypto/gost/gost2814789.c +++ b/src/lib/libcrypto/gost/gost2814789.c @@ -56,6 +56,7 @@ #ifndef OPENSSL_NO_GOST #include #include +#include #include "gost_locl.h" @@ -181,15 +182,17 @@ Gost2814789_ecb_encrypt(const unsigned char *in, unsigned char *out, } static inline void -Gost2814789_encrypt_mesh(unsigned char *iv, GOST2814789_KEY *key) +Gost2814789_encrypt_mesh(const unsigned char *in, unsigned char *out, GOST2814789_KEY *key) { if (key->key_meshing && key->count == 1024) { Gost2814789_cryptopro_key_mesh(key); - Gost2814789_encrypt(iv, iv, key); - key->count = 0; + Gost2814789_encrypt(in, out, key); + Gost2814789_encrypt(out, out, key); + key->count = 8; + } else { + Gost2814789_encrypt(in, out, key); + key->count += 8; } - Gost2814789_encrypt(iv, iv, key); - key->count += 8; } static inline void @@ -209,113 +212,7 @@ Gost2814789_cfb64_encrypt(const unsigned char *in, unsigned char *out, size_t len, GOST2814789_KEY *key, unsigned char *ivec, int *num, const int enc) { - unsigned int n; - size_t l = 0; - - n = *num; - - if (enc) { -#if !defined(OPENSSL_SMALL_FOOTPRINT) - if (8 % sizeof(size_t) == 0) do { /* always true actually */ - while (n && len) { - *(out++) = ivec[n] ^= *(in++); - --len; - n = (n + 1) % 8; - } -#ifdef __STRICT_ALIGNMENT - if (((size_t)in | (size_t)out | (size_t)ivec) % - sizeof(size_t) != 0) - break; -#endif - while (len >= 8) { - Gost2814789_encrypt_mesh(ivec, key); - for (; n < 8; n += sizeof(size_t)) { - *(size_t*)(out + n) = - *(size_t*)(ivec + n) ^= - *(size_t*)(in + n); - } - len -= 8; - out += 8; - in += 8; - n = 0; - } - if (len) { - Gost2814789_encrypt_mesh(ivec, key); - while (len--) { - 
out[n] = ivec[n] ^= in[n]; - ++n; - } - } - *num = n; - return; - } while (0); - /* the rest would be commonly eliminated by x86* compiler */ -#endif - while (l= 8) { - Gost2814789_encrypt_mesh(ivec, key); - for (; n < 8; n += sizeof(size_t)) { - size_t t = *(size_t*)(in + n); - *(size_t*)(out + n) = - *(size_t*)(ivec + n) ^ t; - *(size_t*)(ivec + n) = t; - } - len -= 8; - out += 8; - in += 8; - n = 0; - } - if (len) { - Gost2814789_encrypt_mesh(ivec, key); - while (len--) { - unsigned char c; - - out[n] = ivec[n] ^ (c = in[n]); - ivec[n] = c; - ++n; - } - } - *num = n; - return; - } while (0); - /* the rest would be commonly eliminated by x86* compiler */ -#endif - while (l < len) { - unsigned char c; - - if (n == 0) { - Gost2814789_encrypt_mesh(ivec, key); - } - out[l] = ivec[n] ^ (c = in[l]); ivec[n] = c; - ++l; - n = (n + 1) % 8; - } - *num = n; - } + CRYPTO_cfb64_encrypt(in, out, len, key,
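
For reference, the byte-at-a-time CFB64 loop that the removed code falls back to can be sketched in plain C as below. This is a hedged illustration only: `cfb64_sketch`, `block64_fn` and `toy_block` are hypothetical names, the toy "cipher" is not secure, and GOST key meshing is deliberately left out.

```c
#include <stddef.h>

/* One 8-byte block in, one 8-byte block out; in and out may alias. */
typedef void (*block64_fn)(const unsigned char in[8], unsigned char out[8],
    const void *key);

/* Toy stand-in for a real block cipher: add a one-byte key. NOT secure. */
void
toy_block(const unsigned char in[8], unsigned char out[8], const void *key)
{
	unsigned char k = *(const unsigned char *)key;
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(in[i] + k);
}

/*
 * CFB with 64-bit feedback. `num` carries the position inside the
 * current keystream block across calls, so data can be fed in pieces.
 */
void
cfb64_sketch(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], int *num, int enc,
    block64_fn block)
{
	unsigned int n = (unsigned int)*num;
	size_t l;

	for (l = 0; l < len; l++) {
		if (n == 0)
			block(ivec, ivec, key);	/* new keystream block */
		if (enc) {
			ivec[n] ^= in[l];	/* ciphertext feeds back */
			out[l] = ivec[n];
		} else {
			unsigned char c = in[l];

			out[l] = ivec[n] ^ c;
			ivec[n] = c;		/* ciphertext feeds back */
		}
		n = (n + 1) % 8;
	}
	*num = (int)n;
}
```

Because the state lives entirely in `ivec` and `num`, encrypting a buffer in one call or in several smaller calls gives the same ciphertext.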
[PATCH 1/6] modes: add functions implementing common code for 64-bit ciphers
64-bit ciphers are old, but it would be good to use common code for their implementations. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/modes/cbc64.c | 202 src/lib/libcrypto/modes/cfb64.c | 169 ++ src/lib/libcrypto/modes/ctr64.c | 174 +++ src/lib/libcrypto/modes/modes.h | 26 src/lib/libcrypto/modes/ofb64.c | 119 +++ 5 files changed, 690 insertions(+) create mode 100644 src/lib/libcrypto/modes/cbc64.c create mode 100644 src/lib/libcrypto/modes/cfb64.c create mode 100644 src/lib/libcrypto/modes/ctr64.c create mode 100644 src/lib/libcrypto/modes/ofb64.c diff --git a/src/lib/libcrypto/modes/cbc64.c b/src/lib/libcrypto/modes/cbc64.c new file mode 100644 index ..ec65ac5d3468 --- /dev/null +++ b/src/lib/libcrypto/modes/cbc64.c @@ -0,0 +1,202 @@ +/* $OpenBSD: cbc64.c,v 1.4 2015/02/10 09:46:30 miod Exp $ */ +/* + * Copyright (c) 2008 The OpenSSL Project. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in + *the documentation and/or other materials provided with the + *distribution. + * + * 3. All advertising materials mentioning features or use of this + *software must display the following acknowledgment: + *"This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit. (http://www.openssl.org/)" + * + * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to + *endorse or promote products derived from this software without + *prior written permission. For written permission, please contact + *openssl-c...@openssl.org. + * + * 5. 
Products derived from this software may not be called "OpenSSL" + *nor may "OpenSSL" appear in their names without prior written + *permission of the OpenSSL Project. + * + * 6. Redistributions of any form whatsoever must retain the following + *acknowledgment: + *"This product includes software developed by the OpenSSL Project + *for use in the OpenSSL Toolkit (http://www.openssl.org/)" + * + * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY + * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR + * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. 
+ * + * + */ + +#include +#include "modes_lcl.h" +#include + +#ifndef MODES_DEBUG +# ifndef NDEBUG +# define NDEBUG +# endif +#endif + +#undef STRICT_ALIGNMENT +#ifdef __STRICT_ALIGNMENT +#define STRICT_ALIGNMENT 1 +#else +#define STRICT_ALIGNMENT 0 +#endif + +void CRYPTO_cbc64_encrypt(const unsigned char *in, unsigned char *out, + size_t len, const void *key, + unsigned char ivec[8], block64_f block) +{ + size_t n; + const unsigned char *iv = ivec; + +#if !defined(OPENSSL_SMALL_FOOTPRINT) + if (STRICT_ALIGNMENT && + ((size_t)in|(size_t)out|(size_t)ivec)%sizeof(size_t) != 0) { + while (len>=8) { + for(n=0; n<8; ++n) + out[n] = in[n] ^ iv[n]; + (*block)(out, out, key); + iv = out; + len -= 8; + in += 8; + out += 8; + } + } else { + while (len>=8) { + for(n=0; n<8; n+=sizeof(size_t)) + *(size_t*)(out+n) = + *(size_t*)(in+n) ^ *(size_t*)(iv+n); + (*block)(out, out, key); + iv = out; + len -= 8; + in += 8; + out += 8; + } + } +#endif + while (len) { + for(n=0; n<8 && n=8) { + (*block)(in, out, key); +
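
The idea behind the helper above, one CBC routine driven by a per-cipher block callback, can be sketched roughly as follows. This is not the patch's code: `cbc64_encrypt_sketch`, `cbc64_decrypt_sketch`, `block64_fn` and `toy_block` are names invented for this example, the toy "cipher" is a plain XOR with no security value, and only whole 8-byte blocks are handled.

```c
#include <stddef.h>
#include <string.h>

/* One 8-byte block in, one 8-byte block out. */
typedef void (*block64_fn)(const unsigned char in[8], unsigned char out[8],
    const void *key);

/* Toy stand-in for a real cipher: XOR with a one-byte key. NOT secure. */
void
toy_block(const unsigned char in[8], unsigned char out[8], const void *key)
{
	unsigned char k = *(const unsigned char *)key;
	size_t i;

	for (i = 0; i < 8; i++)
		out[i] = in[i] ^ k;
}

/* CBC encryption; len is assumed to be a multiple of 8 in this sketch. */
void
cbc64_encrypt_sketch(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], block64_fn block)
{
	unsigned char chain[8], tmp[8];
	size_t i;

	memcpy(chain, ivec, 8);
	while (len >= 8) {
		for (i = 0; i < 8; i++)
			tmp[i] = in[i] ^ chain[i];
		block(tmp, out, key);
		memcpy(chain, out, 8);	/* ciphertext chains forward */
		in += 8; out += 8; len -= 8;
	}
	memcpy(ivec, chain, 8);
}

/* CBC decryption; saving the ciphertext first allows in == out. */
void
cbc64_decrypt_sketch(const unsigned char *in, unsigned char *out, size_t len,
    const void *key, unsigned char ivec[8], block64_fn block)
{
	unsigned char chain[8], saved[8], tmp[8];
	size_t i;

	memcpy(chain, ivec, 8);
	while (len >= 8) {
		memcpy(saved, in, 8);
		block(in, tmp, key);
		for (i = 0; i < 8; i++)
			out[i] = tmp[i] ^ chain[i];
		memcpy(chain, saved, 8);
		in += 8; out += 8; len -= 8;
	}
	memcpy(ivec, chain, 8);
}
```

A real cipher would pass its own encrypt and decrypt routines as the callback; the XOR toy is its own inverse, so the same function serves both directions here.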
[PATCH 5/6] crypto: RC2: convert to use new modes 64-bit helpers
Convert RC2 cipher to use 64-bit modes helper functions. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/rc2/rc2.h | 4 +- src/lib/libcrypto/rc2/rc2_cbc.c | 111 +++ src/lib/libcrypto/rc2/rc2_locl.h | 7 ++ src/lib/libcrypto/rc2/rc2cfb64.c | 57 ++-- src/lib/libcrypto/rc2/rc2ofb64.c | 47 ++--- 5 files changed, 55 insertions(+), 171 deletions(-) diff --git a/src/lib/libcrypto/rc2/rc2.h b/src/lib/libcrypto/rc2/rc2.h index 21511ff36ead..03df1433cc22 100644 --- a/src/lib/libcrypto/rc2/rc2.h +++ b/src/lib/libcrypto/rc2/rc2.h @@ -83,8 +83,8 @@ typedef struct rc2_key_st void RC2_set_key(RC2_KEY *key, int len, const unsigned char *data,int bits); void RC2_ecb_encrypt(const unsigned char *in,unsigned char *out,RC2_KEY *key, int enc); -void RC2_encrypt(unsigned long *data,RC2_KEY *key); -void RC2_decrypt(unsigned long *data,RC2_KEY *key); +void RC2_encrypt(unsigned long *data,const RC2_KEY *key); +void RC2_decrypt(unsigned long *data,const RC2_KEY *key); void RC2_cbc_encrypt(const unsigned char *in, unsigned char *out, long length, RC2_KEY *ks, unsigned char *iv, int enc); void RC2_cfb64_encrypt(const unsigned char *in, unsigned char *out, diff --git a/src/lib/libcrypto/rc2/rc2_cbc.c b/src/lib/libcrypto/rc2/rc2_cbc.c index a947f1d3c3a1..276f3b3b4d61 100644 --- a/src/lib/libcrypto/rc2/rc2_cbc.c +++ b/src/lib/libcrypto/rc2/rc2_cbc.c @@ -57,86 +57,22 @@ */ #include +#include #include "rc2_locl.h" void RC2_cbc_encrypt(const unsigned char *in, unsigned char *out, long length, RC2_KEY *ks, unsigned char *iv, int encrypt) - { - unsigned long tin0,tin1; - unsigned long tout0,tout1,xor0,xor1; - long l=length; - unsigned long tin[2]; - +{ if (encrypt) - { - c2l(iv,tout0); - c2l(iv,tout1); - iv-=8; - for (l-=8; l>=0; l-=8) - { - c2l(in,tin0); - c2l(in,tin1); - tin0^=tout0; - tin1^=tout1; - tin[0]=tin0; - tin[1]=tin1; - RC2_encrypt(tin,ks); - tout0=tin[0]; l2c(tout0,out); - tout1=tin[1]; l2c(tout1,out); - } - if (l != -8) - { - c2ln(in,tin0,tin1,l+8); - tin0^=tout0; - tin1^=tout1; - 
tin[0]=tin0; - tin[1]=tin1; - RC2_encrypt(tin,ks); - tout0=tin[0]; l2c(tout0,out); - tout1=tin[1]; l2c(tout1,out); - } - l2c(tout0,iv); - l2c(tout1,iv); - } + CRYPTO_cbc64_encrypt(in, out, length, ks, iv, (block64_f)RC2_block_encrypt); else - { - c2l(iv,xor0); - c2l(iv,xor1); - iv-=8; - for (l-=8; l>=0; l-=8) - { - c2l(in,tin0); tin[0]=tin0; - c2l(in,tin1); tin[1]=tin1; - RC2_decrypt(tin,ks); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2c(tout0,out); - l2c(tout1,out); - xor0=tin0; - xor1=tin1; - } - if (l != -8) - { - c2l(in,tin0); tin[0]=tin0; - c2l(in,tin1); tin[1]=tin1; - RC2_decrypt(tin,ks); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2cn(tout0,tout1,out,l+8); - xor0=tin0; - xor1=tin1; - } - l2c(xor0,iv); - l2c(xor1,iv); - } - tin0=tin1=tout0=tout1=xor0=xor1=0; - tin[0]=tin[1]=0; - } + CRYPTO_cbc64_decrypt(in, out, length, ks, iv, (block64_f)RC2_block_decrypt); +} -void RC2_encrypt(unsigned long *d, RC2_KEY *key) +void RC2_encrypt(unsigned long *d, const RC2_KEY *key) { int i,n; - RC2_INT *p0,*p1; + const RC2_INT *p0,*p1; RC2_INT x0,x1,x2,x3,t; unsigned long l; @@ -178,10 +114,10 @@ void RC2_encrypt(unsigned long *d, RC2_KEY *key) d[1]=(unsigned long)(x2&0x)|((unsigned long)(x3&0x)<<16L); } -void RC2_decrypt(unsigned long *d, RC2_KEY *key) +void RC2_decrypt(unsigned long *d, const RC2_KEY *key) { int i,n; - RC2_INT *p0,*p1; + const RC2_INT *p0,*p1; RC2_INT x0,x1,x2,x3,t; unsigned long l; @@ -224,3 +160,32 @@ void RC2_decrypt(unsigned long *d, RC2_KEY
[PATCH 3/6] crypto: cast: convert to use new modes 64-bit helpers
Convert CAST cipher to use 64-bit modes helper functions. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/Makefile| 2 +- src/lib/libcrypto/cast/c_cbc.c| 75 + src/lib/libcrypto/cast/c_cfb64.c | 56 ++-- src/lib/libcrypto/cast/c_enc.c| 108 -- src/lib/libcrypto/cast/c_ofb64.c | 46 ++--- src/lib/libcrypto/cast/cast_lcl.h | 8 +++ 6 files changed, 120 insertions(+), 175 deletions(-) create mode 100644 src/lib/libcrypto/cast/c_cbc.c diff --git a/src/lib/libcrypto/Makefile b/src/lib/libcrypto/Makefile index 291af21965bf..2e20904ab840 100644 --- a/src/lib/libcrypto/Makefile +++ b/src/lib/libcrypto/Makefile @@ -89,7 +89,7 @@ SRCS+= buffer.c buf_err.c buf_str.c SRCS+= cmll_cfb.c cmll_ctr.c cmll_ecb.c cmll_ofb.c # cast/ -SRCS+= c_skey.c c_ecb.c c_enc.c c_cfb64.c c_ofb64.c +SRCS+= c_skey.c c_ecb.c c_enc.c c_cfb64.c c_ofb64.c c_cbc.c # chacha/ SRCS+= chacha.c diff --git a/src/lib/libcrypto/cast/c_cbc.c b/src/lib/libcrypto/cast/c_cbc.c new file mode 100644 index ..1dc32ad8ca54 --- /dev/null +++ b/src/lib/libcrypto/cast/c_cbc.c @@ -0,0 +1,75 @@ +/* $OpenBSD: c_cbc.c,v 1.5 2014/10/28 07:35:58 jsg Exp $ */ +/* Copyright (C) 1995-1998 Eric Young (e...@cryptsoft.com) + * All rights reserved. + * + * This package is an SSL implementation written + * by Eric Young (e...@cryptsoft.com). + * The implementation was written so as to conform with Netscapes SSL. + * + * This library is free for commercial and non-commercial use as long as + * the following conditions are aheared to. The following conditions + * apply to all code found in this distribution, be it the RC4, RSA, + * lhash, DES, etc., code; not just the SSL code. The SSL documentation + * included with this distribution is covered by the same copyright terms + * except that the holder is Tim Hudson (t...@cryptsoft.com). + * + * Copyright remains Eric Young's, and as such any Copyright notices in + * the code are not to be removed. 
+ * If this package is used in a product, Eric Young should be given attribution + * as the author of the parts of the library used. + * This can be in the form of a textual message at program startup or + * in documentation (online or textual) provided with the package. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 3. All advertising materials mentioning features or use of this software + *must display the following acknowledgement: + *"This product includes cryptographic software written by + * Eric Young (e...@cryptsoft.com)" + *The word 'cryptographic' can be left out if the rouines from the library + *being used are not cryptographic related :-). + * 4. If you include any Windows specific code (or a derivative thereof) from + *the apps directory (application code) you must include an acknowledgement: + *"This product includes software written by Tim Hudson (t...@cryptsoft.com)" + * + * THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * The licence and distribution terms for any publically available version or + * derivative of this code cannot be changed. i.e. this code cannot simply be + * copied and put under another distribution licence + * [including the GNU Public Licence.] + */ + +#include +#include +#include "cast_lcl.h" + +/* The input and output encrypted as though 64bit cbc mode is being + * used. + */ + +void CAST_cbc_encrypt(const unsigned char *in, unsigned char *out, + long length, const CAST_KEY *schedule, unsigned char *ivec, + int enc) +{ + if (enc) + CRYPTO_cbc64_encrypt(in, out, length, schedule, ivec, (block64_f)CAST_block_encrypt); + else + CRYPTO_cbc64_decrypt(in, out, length, schedule, ivec,
[PATCH 2/6] crypto: bf: convert to use new modes 64-bit helpers
Convert Blowfish cipher to use 64-bit modes helper functions. Signed-off-by: Dmitry Baryshkov --- src/lib/libcrypto/Makefile | 2 +- src/lib/libcrypto/bf/bf_cbc.c | 83 ++- src/lib/libcrypto/bf/bf_cfb64.c | 57 ++-- src/lib/libcrypto/bf/bf_enc.c | 114 src/lib/libcrypto/bf/bf_locl.h | 8 +++ src/lib/libcrypto/bf/bf_ofb64.c | 47 ++--- 6 files changed, 51 insertions(+), 260 deletions(-) diff --git a/src/lib/libcrypto/Makefile b/src/lib/libcrypto/Makefile index 9207b93f321d..291af21965bf 100644 --- a/src/lib/libcrypto/Makefile +++ b/src/lib/libcrypto/Makefile @@ -65,7 +65,7 @@ SRCS+= evp_asn1.c asn_pack.c p5_pbe.c p5_pbev2.c p8_pkey.c asn_moid.c SRCS+= a_time_tm.c # bf/ -SRCS+= bf_skey.c bf_ecb.c bf_cfb64.c bf_ofb64.c +SRCS+= bf_skey.c bf_ecb.c bf_cfb64.c bf_ofb64.c bf_cbc.c # bio/ SRCS+= bio_lib.c bio_cb.c bio_err.c bio_meth.c diff --git a/src/lib/libcrypto/bf/bf_cbc.c b/src/lib/libcrypto/bf/bf_cbc.c index 6f45f9ae4c35..a9d3cf6d5541 100644 --- a/src/lib/libcrypto/bf/bf_cbc.c +++ b/src/lib/libcrypto/bf/bf_cbc.c @@ -57,87 +57,14 @@ */ #include +#include #include "bf_locl.h" void BF_cbc_encrypt(const unsigned char *in, unsigned char *out, long length, const BF_KEY *schedule, unsigned char *ivec, int encrypt) - { - BF_LONG tin0,tin1; - BF_LONG tout0,tout1,xor0,xor1; - long l=length; - BF_LONG tin[2]; - +{ if (encrypt) - { - n2l(ivec,tout0); - n2l(ivec,tout1); - ivec-=8; - for (l-=8; l>=0; l-=8) - { - n2l(in,tin0); - n2l(in,tin1); - tin0^=tout0; - tin1^=tout1; - tin[0]=tin0; - tin[1]=tin1; - BF_encrypt(tin,schedule); - tout0=tin[0]; - tout1=tin[1]; - l2n(tout0,out); - l2n(tout1,out); - } - if (l != -8) - { - n2ln(in,tin0,tin1,l+8); - tin0^=tout0; - tin1^=tout1; - tin[0]=tin0; - tin[1]=tin1; - BF_encrypt(tin,schedule); - tout0=tin[0]; - tout1=tin[1]; - l2n(tout0,out); - l2n(tout1,out); - } - l2n(tout0,ivec); - l2n(tout1,ivec); - } + CRYPTO_cbc64_encrypt(in, out, length, schedule, ivec, (block64_f)BF_block_encrypt); else - { - n2l(ivec,xor0); - n2l(ivec,xor1); - ivec-=8; - for 
(l-=8; l>=0; l-=8) - { - n2l(in,tin0); - n2l(in,tin1); - tin[0]=tin0; - tin[1]=tin1; - BF_decrypt(tin,schedule); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2n(tout0,out); - l2n(tout1,out); - xor0=tin0; - xor1=tin1; - } - if (l != -8) - { - n2l(in,tin0); - n2l(in,tin1); - tin[0]=tin0; - tin[1]=tin1; - BF_decrypt(tin,schedule); - tout0=tin[0]^xor0; - tout1=tin[1]^xor1; - l2nn(tout0,tout1,out,l+8); - xor0=tin0; - xor1=tin1; - } - l2n(xor0,ivec); - l2n(xor1,ivec); - } - tin0=tin1=tout0=tout1=xor0=xor1=0; - tin[0]=tin[1]=0; - } - + CRYPTO_cbc64_decrypt(in, out, length, schedule, ivec, (block64_f)BF_block_decrypt); +} diff --git a/src/lib/libcrypto/bf/bf_cfb64.c b/src/lib/libcrypto/bf/bf_cfb64.c index 6cc0bb999bd3..463080cb230f 100644 --- a/src/lib/libcrypto/bf/bf_cfb64.c +++ b/src/lib/libcrypto/bf/bf_cfb64.c @@ -57,6 +57,7 @@ */ #include +#include #include "bf_locl.h" /* The input and output encrypted as though 64bit cfb mode is being @@ -66,56 +67,6 @@ void BF_cfb64_encrypt(const unsigned char *in, unsigned char *out, long length, const BF_KEY *schedule, unsigned char *ivec, int *num, int encrypt) - { - BF_LONG v0,v1,t; - int n= *num; - long l=length; - BF_LONG ti[2]; - unsigned char *iv,c,cc; - - iv=(unsigned char *)ivec; - if (encrypt) - { - while (l--) - { -
Re: pipex(4): use reference counters for `ifnet'
On Sat, Jun 27, 2020 at 12:41:29PM +0200, Martin Pieuchot wrote: > On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote: > > On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote: > > > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote: > > > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote: > > > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote: > > > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote: > > > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote: > > > > > > > > Updated diff. > > > > > > > > > > > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So > > > > > > > > we can't > > > > > > > > store index in session and be sure if_get(9) returned `ifnet' > > > > > > > > is our > > > > > > > > original `ifnet'. > > > > > > > > > > > > > > Why not? The point of if_get(9) is to be sure. If that doesn't > > > > > > > work > > > > > > > for whatever reason then the if_get(9) interface has to be fixed. > > > > > > > Which > > > > > > > case doesn't work for you? Do you have a reproducer? > > > > > > > > > > > > > > How does sessions stay around if their corresponding interface is > > > > > > > destroyed? > > > > > > > > > > > > We have `pipexinq' and `pipexoutq' which can store pointers to > > > > > > session. > > > > > > pipexintr() process these queues. pipexintr() and > > > > > > pipex_destroy_session() are *always* different context. This mean we > > > > > > *can't* free pipex(4) session without be sure there is no reference > > > > > > to > > > > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will > > > > > > cause use > > > > > > afret free issue. Look please at net/pipex.c:846. The way pppx(4) > > > > > > destroy sessions is wrong. While pppac(4) destroys sessions by > > > > > > pipex_iface_fini() it's also wrong. Because we don't check > > > > > > `pipexinq' > > > > > > and `pipexoutq' state. I'am said it again and again. > > > > > > > > > > I understand. 
Why is it a problem? Using reference counting the way > > > > > you're suggesting is *one* possible solution to a problem we don't > > > > > fully > > > > > understand. What are we trying to achieve? Which problem are we > > > > > trying > > > > > to solve? > > > > > > > > Sorry, may be I misunderstand something. > > > > > > > > `pipexoutq' case: > > > > > > > > 1. pppac_start() calls pipex_output() > > > > 2. pipex_output() calls pipex_ip_output() > > > > 3. pipex_ip_output() calls pipex_ppp_enqueue() > > > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add() > > > > > > > > `pipexinq' cases: > > > > > > > > 1.1. ether_input() calls pipex_pppoe_input() > > > > 1.2. gre_input() calls gre_input_1() > > > > gre_input_1() calls pipex_pptp_input() > > > > 1.3. udp_input() calls pipex_l2tp_input() > > > > > > > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input() > > > > 3. pipex_common_input() calls schednetisr() which is task_add() > > > > > > > > task_add(9) just schedules the execution of the work specified by `tq'. > > > > So we can do pipex_destroy_session() * between * schednetisr() and > > > > pipexintr(). And we can do this right * now *, with our current locking. > > > > And this is the problem I'am trying to solve. > > > > > > > > My apologies if I'am wrong above. Please point me where I'am wrong. > > > > > > > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding > > > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4) > > > > session. We should be shure `session' is still walid between > > > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be > > > > required in future. > > > > > > Why not iterate over the queues and garbage collect the sessions that > > > are about to be removed? That's what the network stack was doing with > > > mbuf queues prior to if_get(9) when interfaces where destroyed. > > > > > > > Do you mean net/if.c:1185 and below? 
This is the queues associated with
> > this `ifp'. But for pipex(4) we should go through all mbufs associated
> > with pipex(4). This can be heavy if we have hundreds of sessions. Also
> > this would work until session destruction and `pr_input' are serialized.
> >
> > Point me please the line in source to see if I'am wrong about `ifnet's
> > mbuf queues claninig.
>
> Look at r1.329 of net/if.c. Prior to this change if_detach_queues() was
> used to free all mbufs when an interface was removed. Now lazy freeing
> is used everytime if_get(9) rerturns NULL.
>
> This is possible because we store an index and not a pointer directly in
> the mbuf.
>
> The advantage of storing a session pointer in `ph_cookie' is that no
> lookup is required in pipexintr(), right? Maybe we could save a ID
> instead and do a lookup. How big can be the `pipex_session_list'?
>

It's unlimited. In the pppac(4) case you create only one interface and
you can share it between as many sessions as you wish. In my
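
To make the "save an ID instead and do a lookup" suggestion quoted above concrete, here is a small single-threaded userland sketch. It is not pipex(4) code: `session_table`, `session_create`, `session_lookup`, `session_destroy` and `deliver` are hypothetical names, the table is a fixed array rather than a list, locking is ignored, and the 32-bit ID is assumed never to wrap.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 8

struct session {
	uint32_t id;		/* unique for the table's lifetime */
	int in_use;
	unsigned int packets;
};

struct session_table {
	struct session slots[MAX_SESSIONS];
	uint32_t next_id;	/* last ID handed out; 0 means "none" */
};

/* Allocate a session and return its ID, or 0 if the table is full. */
uint32_t
session_create(struct session_table *t)
{
	size_t i;

	for (i = 0; i < MAX_SESSIONS; i++) {
		if (!t->slots[i].in_use) {
			t->slots[i].in_use = 1;
			t->slots[i].id = ++t->next_id;
			t->slots[i].packets = 0;
			return t->slots[i].id;
		}
	}
	return 0;
}

/* Find a live session by ID; NULL if it was destroyed meanwhile. */
struct session *
session_lookup(struct session_table *t, uint32_t id)
{
	size_t i;

	for (i = 0; i < MAX_SESSIONS; i++)
		if (t->slots[i].in_use && t->slots[i].id == id)
			return &t->slots[i];
	return NULL;
}

void
session_destroy(struct session_table *t, uint32_t id)
{
	struct session *s = session_lookup(t, id);

	if (s != NULL)
		s->in_use = 0;
}

/*
 * What a queue consumer would do: the queued packet carries only the ID,
 * so a packet for a destroyed session is dropped instead of touching
 * freed memory. Returns 1 if delivered, 0 if dropped.
 */
int
deliver(struct session_table *t, uint32_t id)
{
	struct session *s = session_lookup(t, id);

	if (s == NULL)
		return 0;
	s->packets++;
	return 1;
}
```

The cost is the lookup on every dequeued packet, which is exactly the trade-off raised in the question about how large the session list can grow.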
OpenBSD.calendar patch
Hi Friends,

Here's a small patch to the OpenBSD.calendar. I didn't want to spend too
much time on this until I find out if it would be accepted.

Here are my changes:

--- /usr/share/calendar/calendar.openbsd	Fri Jun 26 21:01:56 2020
+++ calendar.openbsd	Sat Jun 27 01:37:40 2020
@@ -10,15 +10,19 @@
 Jan 06 IPF gets integrated into the OpenBSD kernel, 1996
 Jan 06 NRL IPv6 addition to OpenBSD, 1999
 Jan 09 n2k10: Network hackathon, Melbourne, Australia, 17 developers, 2010
+Jan 12 u2k20: Uckermark hackathon, Uckermark, Germany, 14 developers, 2020
 Jan 13 n2k13: Network hackathon, Dunedin, New Zealand, 17 developers, 2013
+Jan 17 a2k19: Antipodean hackathon, Wellington, New Zealand, 18 developers, 2019
 Jan 18 n2k14: Mini-hackathon, Dunedin, New Zealand, 15 developers, 2014
 Jan 20 Bind 9 goes into the tree, 2003
+Jan 20 a2k20: Antipodean hackathon, Hobart, Tasmania, 17 developers, 2020
 Jan 26 Anoncvs service inaugurated, 1996
 Jan 26 n2k9: Network hackathon, Basel, Switzerland, 19 developers, 2009
 Jan 27 OpenBSD/amd64 port is added, from NetBSD, 2004
 Jan 29 "second anoncvs server is 100 miles from the first", 1996
 Jan 31 OpenBSD/cats port is added, from NetBSD, 2004
 Feb 03 Describe the ports mechanism [in OpenBSD], 1997
+Feb 05 a2k18: Dunedin, New Zealand, 19 developers, 2018
 Feb 13 Unpatented fast block cipher for new password hashing, 1997
 Feb 14 GNU RCS expired from source tree, replaced with OpenRCS, 2007
 Feb 19 IPsec package by John Ioannidis and Angelos D. Keromytis, 1997
@@ -27,6 +31,7 @@
 Feb 28 Cryptographic services framework in OpenBSD, 2000
 Mar 09 Support for the VAX architecture removed, 2016
 Mar 10 OpenBSD/WWW translation started -- German, Spanish, Dutch, 2000
+Mar 28 t2k19: Taipei mini hackathon, Taipei, Taiwan, 16 developers, 2019
 Apr 01 OpenBSD/hppa64 port is added, 2005
 Apr 01 k2k11: Kernel hackathon, Hafnarfjordur, Iceland, 15 developers, 2011
 Apr 10 f2k7: First filesystem hackathon, Vienna, Austria, 14 developers, 2007
@@ -40,10 +45,12 @@
 Apr 27 i386/PAE work integrated, 2006
 May 01 OpenBSD 3.3 released, exploiting W^X, 2003
 May 05 n2k8: Network hackathon, Ito, Japan, 18 developers, 2008
+May 07 g2k19: General hackathon, Ottawa, Canada, 43 developers, 2019
 May 08 c2k3 General hackathon, Calgary, Alberta, 51 developers, 2003
 May 09 First commit to OpenBSD stable branch, OPENBSD_2_7, 2000
 May 09 OpenBSD/aviion port is added, 2006
 May 19 OpenBSD 2.3 released, including "ports" system, 1998
+May 19 OpenBSD 6.7 released, 48th release, 2020
 May 21 c2k5: General hackathon, Calgary, Alberta, 60 developers, 2005
 May 21 c2k6: General hackathon, Calgary, Alberta, 47 developers, 2006
 May 24 OpenBSD gets a trunk(4), 2005
@@ -57,6 +64,7 @@
 Jun 04 c99: First hackathon (IPsec), Calgary, Alberta, 10 developers, 1999
 Jun 04 c2k2: General hackathon, Calgary, Alberta, 42 developers, 2002
 Jun 06 c2k8: General hackathon, Edmonton, Alberta, 55 developers, 2008
+Jun 21 WireGuard imported into kernel, 2020
 Jun 14 r2k6: First network hackathon, Hamburg, Germany, 6 developers, 2006
 Jun 15 OpenBSD 2.7 released, including OpenSSH, 2000
 Jun 15 c2k: First general hackathon, Calgary, Alberta, 18 developers, 2000
@@ -70,6 +78,7 @@
 Jul 02 c2k11: General hackathon, Edmonton, Alberta, Canada, 36 developers, 2011
 Jul 07 g2k12: General hackathon, Budapest, Hungary, 41 developers, 2012
 Jul 08 g2k14: General hackathon, Ljubljana, Slovenia, 49 developers, 2014
+Jul 08 g2k18: General hackathon, Ljubljana, Slovenia, 39 developers, 2018
 Jul 11 OpenBSD goes wireless w/ if_wi addition, 1999
 Jul 23 OpenBSD goes multimedia with Brooktree 848 support, 1998
 Jul 24 Non-executable stack on most architectures, 2002
@@ -83,6 +92,7 @@
 Aug 28 k2k6: IPsec hackathon, Schloss Kransberg, Germany, 14 developers, 2006
 Sep 01 Support for the sparc (32bit) architecture removed, 2016
 Sep 03 Support for the zaurus architecture removed, 2016
+Sep 06 n2k18: Network hackathon, Usti nad Labem, Czech Republic, 11 developers, 2018
 Sep 16 s2k11: General hackathon, Ljubljana, Slovenia, 25 developers, 2011
 Sep 17 n2k12: Network hackathon, Starnberg, Germany, 23 developers, 2012
 Sep 19 j2k10: Mini-hackathon, Sakae Mura, Nagano, Japan, 19 developers, 2010
@@ -103,7 +113,9 @@
 Oct 30 OpenBSD 3.4 released, implementing W^X on i386 and AES in VIA C3, 2003
 Nov 01 OpenBSD 3.2 released, ftp mirrors preload for the first time, 2002
 Nov 01 v2k5: First ports hackathon, Venice, Italy, 12 developers, 2005
+Nov 03 l2k18: LibreSSL hackathon, Edmonton, Canada, 5 developers, 2018
 Nov 05 a2k11: ARM hackathon, Coimbra, Portugal, 8 developers, 2011
+Nov 05 p2k19: Ports hackathon, Bucharest, Romania, 18 developers, 2019
 Nov 11 want.html added to OpenBSD/www, 1998
 Nov 12 p2k11: Ports hackathon, Budapest, Hungary, 15 developers, 2011
 Nov 14 c2k12: Coimbra hackathon, Coimbra, Portugal, 10 developers, 2012
@@ -112,6 +124,7 @@
 Nov 21 h2k9: Hardware hackathon, Coimbra, Portugal, 15 developers, 2009
 Nov 22
Re: 11n Tx aggregation for iwm(4)
On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> This patch adds support for 11n Tx aggregation to iwm(4).

iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7260" rev 0x73, msi

AP is Zyxel USG40W

Before:
bandwidth min/avg/max/std-dev = 9.800/14.000/14.214/0.606 Mbps

After:
bandwidth min/avg/max/std-dev = 8.124/47.270/57.076/8.906 Mbps
Re: fix races in if_clone_create()
On Sat, Jun 27, 2020 at 12:10:24PM +0200, Martin Pieuchot wrote:
> On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> > On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > > if_clone_create() has the races caused by context switch.
> > >
> > > Can you share a backtrace of such race? Where does the kernel panic?
> > >
> >
> > This diff was inspired by thread [1]. As I explained [2] here is 3
> > issues that cause panics produced by command below:
> >
> > cut begin
> > for i in 1 2 3; do while true; do ifconfig bridge0 create& \
> > ifconfig bridge0 destroy& done& done
> > cut end
>
> Thanks, I couldn't reproduce it on any of the machines I tried. Did you
> managed to reproduce it with other pseudo-devices or just with bridge0?
>

In thread [1] you talked about bridge(4), tun(4) and vether(4). At first
I fixed the races in if_clone_destroy(), and I caught the races in
if_clone_create() while running your initial command, but with vether(4):

cut begin
for i in 0 1 2 3 4 5 6 7; do while true; \
do cat /dev/vether0& ifconfig vether0 destroy& done& done
cut end

This issue is hard to reproduce. My best chances are on bare metal with
8 cores: a fully unloaded system, no X, no active processes, the test
started at the console and all output redirected to /dev/null. Even then
it can take *hours* to catch. I can't reproduce it on 2 cores. I can't
reproduce it on 4 cores under kvm, but it is reproducible under
VirtualBox on OS X, where the host has 8 cores. I can reproduce it on
bare metal with 4 cores, but that also takes time.

The routine called through `ifc_create' within if_clone_attach() is very
specific to each pseudo interface. if_attach() is the only sleep point
they have in common, but you can also sleep at any other sleep point
before `ifc_create' calls if_attach(). For example, you may allocate the
software context with `M_WAITOK'. bridge(4) is just the easiest way for
me to reproduce it.

All `ifnet's are linked to `if_list'. ifunit() does a linear search in
this list, comparing `ifp->if_xname' with the given `name'. So if you
insert several `ifnet's with the same name into this list, ifunit()
returns the first one. But if_get(9) doesn't work with this list. So in
the case I describe above, if_get(9) and ifunit() are inconsistent.
Sometimes the stack uses if_get(9) and sometimes it uses ifunit(), so
each time you work with different `ifnet's that have the same
`if_xname'. You can't predict where an `ifnet' will be corrupted.

> > My system was stable with the last diff I did for thread [1]. But since
> > this final diff [3] which include fixes for tun(4) is quick and dirty
> > and not for commit I decided to make the diff to fix the races caused by
> > if_clone_create() at first.
> >
> > I included screenshot with panic.
>
> Thanks, interesting that the corruption happens on a list that should be
> initialized. Does that mean the context switch on Thread 1 is happening
> before if_attach_common() is called?
>

I don't know where it was. if_attach() doesn't check whether an `ifnet'
with the name in `if_xname' is already linked. It inserts the passed
`ifnet' in any case. If you have more than one `ifnet' with an identical
`if_xname', the ifunit() and if_get() logic is broken. Look at
if_attach():

cut begin
if_attach(struct ifnet *ifp)
{
	if_attach_common(ifp);
	NET_LOCK();
	TAILQ_INSERT_TAIL(&ifnet, ifp, if_list); /* (1) */
	if_attachsetup(ifp);
	NET_UNLOCK();
}

You link `ifp' at (1). And it's still your `ifp' before and after the
context switch, or without a context switch. You will break it later.
The reason is that the pseudo driver received the same `unit' more than
once, and it created two or more software contexts with an identical
`unit'. The pseudo driver's internal logic is then broken. Also,
ifunit() and if_get(9) are inconsistent now. You can corrupt memory
anywhere.
cut end

> You said your previous email that there's a context switch. Do you know
> when it happens? You could see that in ddb by looking at the backtrace
> of the other CPU.
>
> Is the context switch leading to the race common to all pseudo-drivers
> or is it in the bridge(4) driver?

ddb(4) is useless here. The panic occurred while we were trying to
if_detach() an already broken `ifnet'. There are no races at that point.
The races happened *before*, when we inserted two or more `ifnet's with
the same name into `if_list'. That insertion does not trigger a panic by
itself. The first time I caught these races was while I was following
your thread [1]. I inserted an ifunit() call into if_attach() as below
and received a panic, so I'm sure about the reason:

cut begin
if_attach(struct ifnet *ifp)
{
	if_attach_common(ifp);
	NET_LOCK();
	KASSERT(ifunit(ifp->if_xname));
	TAILQ_INSERT_TAIL(&ifnet, ifp, if_list);
	if_attachsetup(ifp);
	NET_UNLOCK();
}
cut end

But in thread [1] you said these races with pseudo interfaces are very
old and well known
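The inconsistency described in this message (a name-based lookup and an
index-based lookup disagreeing once two entries share a name) can be
illustrated with a small userland sketch. This is not kernel code: the
toy_* names, the array-backed list and the checked-attach variant are all
assumptions made for illustration, and nothing here is taken from if.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IFS 8

/* Hypothetical stand-in for an interface: a name plus a unique index. */
struct toy_if {
	const char *name;
	int index;
};

/* Hypothetical stand-in for the global interface list. */
struct toy_iflist {
	struct toy_if *slot[TOY_MAX_IFS];
	int count;
};

/* Unchecked attach: appends without looking for an existing name. */
int
toy_attach(struct toy_iflist *l, struct toy_if *ifp)
{
	assert(l != NULL && ifp != NULL);
	if (l->count >= TOY_MAX_IFS)
		return -1;
	l->slot[l->count++] = ifp;
	return 0;
}

/* Name lookup: linear search, returns the first entry that matches. */
struct toy_if *
toy_by_name(const struct toy_iflist *l, const char *name)
{
	int i;

	for (i = 0; i < l->count; i++)
		if (strcmp(l->slot[i]->name, name) == 0)
			return l->slot[i];
	return NULL;
}

/* Index lookup: finds the entry by its unique index, ignoring names. */
struct toy_if *
toy_by_index(const struct toy_iflist *l, int index)
{
	int i;

	for (i = 0; i < l->count; i++)
		if (l->slot[i]->index == index)
			return l->slot[i];
	return NULL;
}

/* Checked attach: refuses a second entry with an already linked name. */
int
toy_attach_checked(struct toy_iflist *l, struct toy_if *ifp)
{
	if (toy_by_name(l, ifp->name) != NULL)
		return -1;
	return toy_attach(l, ifp);
}
```

With two entries named "bridge0" attached through toy_attach(), the name
lookup keeps returning the first one while the index lookup can return the
second, which is the kind of disagreement the message attributes to
duplicate names on the list.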
Re: 11n Tx aggregation for iwm(4)
On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> This patch adds support for 11n Tx aggregation to iwm(4).
>
> Please help with testing if you can by running the patch and using wifi
> as usual. Nothing should change, except that Tx speed may potentially
> improve. If you have time to run before/after performance measurements with
> tcpbench or such, that would be nice. But it's not required for testing.
>
> If Tx aggregation is active then netstat will show a non-zero output block ack
> agreement counter:
>
> $ netstat -W iwm0 | grep 'output block'
> 3 new output block ack agreements
> 0 output block ack agreements timed out
>
> It would be great to get at least one test for all the chipsets the driver
> supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> The behaviour of the access point also matters a great deal. It won't
> hurt to test the same chipset against several different access points.
>
> I have tested this version on 8265 only so far. I've run older revisions
> of this patch on 7265 so I'm confident that this chip will work, too.
> So far, the APs I have tested against are athn(4) in 11a mode and in 11n
> mode with the 'nomimo' nwflag, and a Sagemcom 11ac AP. All on 5Ghz channels.
>

I'm sure you've got plenty of 8265 tests, but the diff tripled my speed
against my Apple AirPort Extreme.

iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi

--
Tracey Emery
[PATCH 5/5] pkcs12: add support for GOST PFX files
Russian standard body has changed the way MAC key is calculated for
PKCS12 files. Generate proper keys depending on the digest type used for
MAC generation.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov
---
 src/lib/libcrypto/pkcs12/p12_key.c  | 18 ++
 src/lib/libcrypto/pkcs12/p12_mutl.c | 28 +---
 src/lib/libcrypto/pkcs12/pkcs12.h   |  5 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/lib/libcrypto/pkcs12/p12_key.c b/src/lib/libcrypto/pkcs12/p12_key.c
index d419a9d83598..9a5297a23131 100644
--- a/src/lib/libcrypto/pkcs12/p12_key.c
+++ b/src/lib/libcrypto/pkcs12/p12_key.c
@@ -195,3 +195,21 @@ end:
 	EVP_MD_CTX_cleanup(&ctx);
 	return ret;
 }
+
+int
+PKCS12_key_gen_gost(const char *pass, int passlen, unsigned char *salt,
+    int saltlen, int iter, int n, unsigned char *out,
+    const EVP_MD *md_type)
+{
+	unsigned char buf[96];
+
+	if (n != PKCS12_GOST_KEY_LEN)
+		return 0;
+
+	if (!PKCS5_PBKDF2_HMAC(pass, passlen, salt, saltlen, iter, md_type, sizeof(buf), buf))
+		return 0;
+
+	memcpy(out, buf + sizeof(buf) - PKCS12_GOST_KEY_LEN, PKCS12_GOST_KEY_LEN);
+
+	return 1;
+}
diff --git a/src/lib/libcrypto/pkcs12/p12_mutl.c b/src/lib/libcrypto/pkcs12/p12_mutl.c
index f3132ec75f68..023bbbd92db1 100644
--- a/src/lib/libcrypto/pkcs12/p12_mutl.c
+++ b/src/lib/libcrypto/pkcs12/p12_mutl.c
@@ -74,6 +74,7 @@ PKCS12_gen_mac(PKCS12 *p12, const char *pass, int passlen,
     unsigned char *mac, unsigned int *maclen)
 {
 	const EVP_MD *md_type;
+	int md_type_nid;
 	HMAC_CTX hmac;
 	unsigned char key[EVP_MAX_MD_SIZE], *salt;
 	int saltlen, iter;
@@ -97,13 +98,26 @@ PKCS12_gen_mac(PKCS12 *p12, const char *pass, int passlen,
 		PKCS12error(PKCS12_R_UNKNOWN_DIGEST_ALGORITHM);
 		return 0;
 	}
-	md_size = EVP_MD_size(md_type);
-	if (md_size < 0)
-		return 0;
-	if (!PKCS12_key_gen(pass, passlen, salt, saltlen, PKCS12_MAC_ID, iter,
-	    md_size, key, md_type)) {
-		PKCS12error(PKCS12_R_KEY_GEN_ERROR);
-		return 0;
+	md_type_nid = EVP_MD_type(md_type);
+	if ((md_type_nid == NID_id_GostR3411_94 ||
+	    md_type_nid == NID_id_tc26_gost3411_2012_256 ||
+	    md_type_nid == NID_id_tc26_gost3411_2012_512) &&
+	    getenv("LEGACY_GOST_PKCS12") == NULL) {
+		md_size = PKCS12_GOST_KEY_LEN;
+		if (!PKCS12_key_gen_gost(pass, passlen, salt, saltlen, iter,
+		    md_size, key, md_type)) {
+			PKCS12error(PKCS12_R_KEY_GEN_ERROR);
+			return 0;
+		}
+	} else {
+		md_size = EVP_MD_size(md_type);
+		if (md_size < 0)
+			return 0;
+		if (!PKCS12_key_gen(pass, passlen, salt, saltlen, PKCS12_MAC_ID, iter,
+		    md_size, key, md_type)) {
+			PKCS12error(PKCS12_R_KEY_GEN_ERROR);
+			return 0;
+		}
 	}
 	HMAC_CTX_init(&hmac);
 	if (!HMAC_Init_ex(&hmac, key, md_size, md_type, NULL) ||
diff --git a/src/lib/libcrypto/pkcs12/pkcs12.h b/src/lib/libcrypto/pkcs12/pkcs12.h
index 56635f9d7e0a..4dab109bbc3a 100644
--- a/src/lib/libcrypto/pkcs12/pkcs12.h
+++ b/src/lib/libcrypto/pkcs12/pkcs12.h
@@ -91,6 +91,11 @@ extern "C" {
 #define PKCS12_add_friendlyname PKCS12_add_friendlyname_asc
 #endif
 
+#define PKCS12_GOST_KEY_LEN 32
+int PKCS12_key_gen_gost(const char *pass, int passlen, unsigned char *salt,
+    int saltlen, int iter, int n, unsigned char *out,
+    const EVP_MD *md_type);
+
 /* MS key usage constants */
 
 #define KEY_EX	0x10
--
2.27.0
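The key selection step in PKCS12_key_gen_gost() above (derive 96 bytes,
keep only the trailing 32) can be tried in isolation. The sketch below
assumes the derived bytes are already available and leaves out the PBKDF2
call entirely; the sketch_* names and SKETCH_* constants are hypothetical
and are not part of libcrypto.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_KDF_OUT_LEN 96	/* bytes requested from the KDF in the patch */
#define SKETCH_KEY_LEN 32	/* bytes actually used as the MAC key */

/*
 * Copy the last key_len bytes of kdf_out into key.
 * Returns 1 on success and 0 if the requested length is not supported
 * or the KDF output is too short, mirroring the 1/0 convention above.
 */
int
sketch_take_tail_key(const unsigned char *kdf_out, size_t kdf_len,
    unsigned char *key, size_t key_len)
{
	assert(kdf_out != NULL && key != NULL);
	if (key_len != SKETCH_KEY_LEN || kdf_len < key_len)
		return 0;
	memcpy(key, kdf_out + kdf_len - key_len, key_len);
	return 1;
}
```

For a 96-byte input the copied range is bytes 64 through 95, which matches
the offset arithmetic buf + sizeof(buf) - PKCS12_GOST_KEY_LEN in the patch.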
[PATCH 3/5] gost: support new PublicKeyParameters format
Add support for updated PublicKeyParameters format as defined by
draft-deremin-rfc4491-bis.

Signed-off-by: Dmitry Baryshkov
---
 src/lib/libcrypto/gost/gost_asn1.c         |  2 +-
 src/lib/libcrypto/gost/gostr341001_ameth.c | 42 --
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/src/lib/libcrypto/gost/gost_asn1.c b/src/lib/libcrypto/gost/gost_asn1.c
index 2652162777b7..703d64070449 100644
--- a/src/lib/libcrypto/gost/gost_asn1.c
+++ b/src/lib/libcrypto/gost/gost_asn1.c
@@ -190,7 +190,7 @@ static const ASN1_TEMPLATE GOST_KEY_PARAMS_seq_tt[] = {
 		.item = &ASN1_OBJECT_it,
 	},
 	{
-		.flags = 0,
+		.flags = ASN1_TFLG_OPTIONAL,
 		.tag = 0,
 		.offset = offsetof(GOST_KEY_PARAMS, hash_params),
 		.field_name = "hash_params",
diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 0e9521178da5..7cb70ed420ae 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -90,9 +90,33 @@ decode_gost01_algor_params(EVP_PKEY *pkey, const unsigned char **p, int len)
 		return 0;
 	}
 	param_nid = OBJ_obj2nid(gkp->key_params);
-	digest_nid = OBJ_obj2nid(gkp->hash_params);
+	if (gkp->hash_params)
+		digest_nid = OBJ_obj2nid(gkp->hash_params);
+	else {
+		switch (param_nid) {
+		case NID_id_tc26_gost_3410_12_256_paramSetA:
+		case NID_id_tc26_gost_3410_12_256_paramSetB:
+		case NID_id_tc26_gost_3410_12_256_paramSetC:
+		case NID_id_tc26_gost_3410_12_256_paramSetD:
+			digest_nid = NID_id_tc26_gost3411_2012_256;
+			break;
+		case NID_id_tc26_gost_3410_12_512_paramSetTest:
+		case NID_id_tc26_gost_3410_12_512_paramSetA:
+		case NID_id_tc26_gost_3410_12_512_paramSetB:
+		case NID_id_tc26_gost_3410_12_512_paramSetC:
+			digest_nid = NID_id_tc26_gost3411_2012_512;
+			break;
+		default:
+			digest_nid = NID_undef;
+		}
+	}
 	GOST_KEY_PARAMS_free(gkp);
 
+	if (digest_nid == NID_undef) {
+		GOSTerror(GOST_R_BAD_PKEY_PARAMETERS_FORMAT);
+		return 0;
+	}
+
 	ec = pkey->pkey.gost;
 	if (ec == NULL) {
 		ec = GOST_KEY_new();
@@ -137,7 +161,21 @@ encode_gost01_algor_params(const EVP_PKEY *key)
 	pkey_param_nid =
 	    EC_GROUP_get_curve_name(GOST_KEY_get0_group(key->pkey.gost));
 	gkp->key_params = OBJ_nid2obj(pkey_param_nid);
-	gkp->hash_params = OBJ_nid2obj(GOST_KEY_get_digest(key->pkey.gost));
+	switch (pkey_param_nid) {
+	case NID_id_GostR3410_2001_TestParamSet:
+	case NID_id_GostR3410_2001_CryptoPro_A_ParamSet:
+	case NID_id_GostR3410_2001_CryptoPro_B_ParamSet:
+	case NID_id_GostR3410_2001_CryptoPro_C_ParamSet:
+	case NID_id_GostR3410_2001_CryptoPro_XchA_ParamSet:
+	case NID_id_GostR3410_2001_CryptoPro_XchB_ParamSet:
+	case NID_id_tc26_gost_3410_12_512_paramSetA:
+	case NID_id_tc26_gost_3410_12_512_paramSetB:
+		gkp->hash_params = OBJ_nid2obj(GOST_KEY_get_digest(key->pkey.gost));
+		break;
+	default:
+		gkp->hash_params = NULL;
+		break;
+	}
 	/*gkp->cipher_params = OBJ_nid2obj(cipher_param_nid); */
 	params->length = i2d_GOST_KEY_PARAMS(gkp, &params->data);
 	if (params->length <= 0) {
--
2.27.0
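The decode-side rule in this patch (use the explicit digest when the
optional field is present, otherwise infer it from the 2012 parameter set,
otherwise fail) can be modelled without libcrypto. In the sketch below the
enums are hypothetical stand-ins for the NID constants and cover only a
few representative parameter sets; it is meant to show the control flow,
not the real identifiers.

```c
/* Hypothetical stand-ins for a few curve parameter sets. */
enum sketch_paramset {
	SKETCH_PS_2001_CRYPTOPRO_A,
	SKETCH_PS_2012_256_A,
	SKETCH_PS_2012_256_B,
	SKETCH_PS_2012_512_A,
	SKETCH_PS_2012_512_TEST
};

/* Hypothetical stand-ins for the digest identifiers. */
enum sketch_digest {
	SKETCH_DG_UNDEF,
	SKETCH_DG_GOST94,
	SKETCH_DG_2012_256,
	SKETCH_DG_2012_512
};

/*
 * Pick the digest for a decoded key.
 * have_field says whether the optional digest field was present;
 * field_digest is only consulted in that case.
 * SKETCH_DG_UNDEF means the caller should reject the parameters.
 */
enum sketch_digest
sketch_pick_digest(int have_field, enum sketch_digest field_digest,
    enum sketch_paramset ps)
{
	if (have_field)
		return field_digest;

	switch (ps) {
	case SKETCH_PS_2012_256_A:
	case SKETCH_PS_2012_256_B:
		return SKETCH_DG_2012_256;
	case SKETCH_PS_2012_512_A:
	case SKETCH_PS_2012_512_TEST:
		return SKETCH_DG_2012_512;
	default:
		/* Older parameter sets carry no implied digest here. */
		return SKETCH_DG_UNDEF;
	}
}
```

A caller would treat SKETCH_DG_UNDEF the way the patch treats NID_undef,
that is, as a malformed-parameters error.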
[PATCH 4/5] gostr341001: support unwrapped private keys
GOST private keys can be wrapped in OCTET STRING, INTEGER or come
unwrapped. Support the latter format.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov
---
 src/lib/libcrypto/gost/gost_asn1.c         |  52 ++
 src/lib/libcrypto/gost/gost_asn1.h         |  11 ++
 src/lib/libcrypto/gost/gostr341001_ameth.c | 115 +++--
 3 files changed, 169 insertions(+), 9 deletions(-)

diff --git a/src/lib/libcrypto/gost/gost_asn1.c b/src/lib/libcrypto/gost/gost_asn1.c
index 703d64070449..bfd81faa1ee2 100644
--- a/src/lib/libcrypto/gost/gost_asn1.c
+++ b/src/lib/libcrypto/gost/gost_asn1.c
@@ -17,6 +17,58 @@
 #include "gost_locl.h"
 #include "gost_asn1.h"
 
+static const ASN1_TEMPLATE MASKED_GOST_KEY_seq_tt[] = {
+	{
+		.flags = 0,
+		.tag = 0,
+		.offset = offsetof(MASKED_GOST_KEY, masked_priv_key),
+		.field_name = "masked_priv_key",
+		.item = &ASN1_OCTET_STRING_it,
+	},
+	{
+		.flags = 0,
+		.tag = 0,
+		.offset = offsetof(MASKED_GOST_KEY, public_key),
+		.field_name = "public_key",
+		.item = &ASN1_OCTET_STRING_it,
+	},
+};
+
+const ASN1_ITEM MASKED_GOST_KEY_it = {
+	.itype = ASN1_ITYPE_NDEF_SEQUENCE,
+	.utype = V_ASN1_SEQUENCE,
+	.templates = MASKED_GOST_KEY_seq_tt,
+	.tcount = sizeof(MASKED_GOST_KEY_seq_tt) / sizeof(ASN1_TEMPLATE),
+	.funcs = NULL,
+	.size = sizeof(MASKED_GOST_KEY),
+	.sname = "MASKED_GOST_KEY",
+};
+
+MASKED_GOST_KEY *
+d2i_MASKED_GOST_KEY(MASKED_GOST_KEY **a, const unsigned char **in, long len)
+{
+	return (MASKED_GOST_KEY *)ASN1_item_d2i((ASN1_VALUE **)a, in, len,
+	    &MASKED_GOST_KEY_it);
+}
+
+int
+i2d_MASKED_GOST_KEY(MASKED_GOST_KEY *a, unsigned char **out)
+{
+	return ASN1_item_i2d((ASN1_VALUE *)a, out, &MASKED_GOST_KEY_it);
+}
+
+MASKED_GOST_KEY *
+MASKED_GOST_KEY_new(void)
+{
+	return (MASKED_GOST_KEY *)ASN1_item_new(&MASKED_GOST_KEY_it);
+}
+
+void
+MASKED_GOST_KEY_free(MASKED_GOST_KEY *a)
+{
+	ASN1_item_free((ASN1_VALUE *)a, &MASKED_GOST_KEY_it);
+}
+
 static const ASN1_TEMPLATE GOST_KEY_TRANSPORT_seq_tt[] = {
 	{
 		.flags = 0,
diff --git a/src/lib/libcrypto/gost/gost_asn1.h b/src/lib/libcrypto/gost/gost_asn1.h
index 7cabfc79c965..cdbda7b98b67 100644
--- a/src/lib/libcrypto/gost/gost_asn1.h
+++ b/src/lib/libcrypto/gost/gost_asn1.h
@@ -56,6 +56,17 @@
 
 __BEGIN_HIDDEN_DECLS
 
+typedef struct {
+	ASN1_OCTET_STRING *masked_priv_key;
+	ASN1_OCTET_STRING *public_key;
+} MASKED_GOST_KEY;
+
+MASKED_GOST_KEY *MASKED_GOST_KEY_new(void);
+void MASKED_GOST_KEY_free(MASKED_GOST_KEY *a);
+MASKED_GOST_KEY *d2i_MASKED_GOST_KEY(MASKED_GOST_KEY **a, const unsigned char **in, long len);
+int i2d_MASKED_GOST_KEY(MASKED_GOST_KEY *a, unsigned char **out);
+extern const ASN1_ITEM MASKED_GOST_KEY_it;
+
 typedef struct {
 	ASN1_OCTET_STRING *encrypted_key;
 	ASN1_OCTET_STRING *imit;
diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 7cb70ed420ae..880c17ceaab8 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -437,6 +437,70 @@ priv_print_gost01(BIO *out, const EVP_PKEY *pkey, int indent, ASN1_PCTX *pctx)
 	return pub_print_gost01(out, pkey, indent, pctx);
 }
 
+static BIGNUM *unmask_priv_key(EVP_PKEY *pk,
+		const unsigned char *buf, int len, int num_masks)
+{
+	BIGNUM *pknum_masked = NULL, *q, *mask;
+	const GOST_KEY *key_ptr = pk->pkey.gost;
+	const EC_GROUP *group = GOST_KEY_get0_group(key_ptr);
+	const unsigned char *p = buf + num_masks * len;
+	BN_CTX *ctx;
+
+	pknum_masked = GOST_le2bn(buf, len, NULL);
+	if (!pknum_masked) {
+		GOSTerror(ERR_R_MALLOC_FAILURE);
+		return NULL;
+	}
+
+	if (num_masks == 0)
+		return pknum_masked;
+
+	ctx = BN_CTX_new();
+	if (ctx == NULL) {
+		GOSTerror(ERR_R_MALLOC_FAILURE);
+		goto err;
+	}
+
+	BN_CTX_start(ctx);
+
+	q = BN_CTX_get(ctx);
+	if (!q) {
+		GOSTerror(ERR_R_MALLOC_FAILURE);
+		goto err;
+	}
+
+	mask = BN_CTX_get(ctx);
+	if (!mask) {
+		GOSTerror(ERR_R_MALLOC_FAILURE);
+		goto err;
+	}
+
+	if (EC_GROUP_get_order(group, q, NULL) <= 0) {
+		GOSTerror(ERR_R_EC_LIB);
+		goto err;
+	}
+
+	for (; p != buf; p -= len) {
+		if (GOST_le2bn(p, len, mask) == NULL ||
+		    !BN_mod_mul(pknum_masked, pknum_masked, mask, q, ctx)) {
+			GOSTerror(ERR_R_BN_LIB);
+			goto err;
+		}
+	}
+
+	BN_CTX_end(ctx);
+	BN_CTX_free(ctx);
+
+	return pknum_masked;
+
+err:
+	BN_CTX_end(ctx);
+	BN_CTX_free(ctx);
+
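The loop in unmask_priv_key() above walks the buffer backwards and
multiplies the masked value by each mask modulo the group order q. A toy
version with machine integers shows the same buffer layout and arithmetic.
It assumes chunks of at most 4 bytes and a modulus below 2^32 so that a
64-bit accumulator cannot overflow; the sketch_* names are hypothetical
and no bignum library is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode len (at most 4) little-endian bytes into an integer. */
static uint32_t
sketch_le2u32(const unsigned char *p, size_t len)
{
	uint32_t v = 0;
	size_t i;

	for (i = len; i > 0; i--)
		v = (v << 8) | p[i - 1];
	return v;
}

/*
 * buf holds (num_masks + 1) chunks of len bytes each: the masked value
 * first, then the masks. The result is the masked value multiplied by
 * every mask, reduced modulo q. Like the loop above, it starts at the
 * last chunk and steps back until it reaches the masked value itself.
 */
uint32_t
sketch_unmask(const unsigned char *buf, size_t len, size_t num_masks,
    uint32_t q)
{
	const unsigned char *p;
	uint64_t acc;

	assert(buf != NULL && len >= 1 && len <= 4 && q != 0);
	acc = sketch_le2u32(buf, len) % q;
	for (p = buf + num_masks * len; p != buf; p -= len)
		acc = (acc * (sketch_le2u32(p, len) % q)) % q;
	return (uint32_t)acc;
}
```

For example, with one-byte chunks {5, 3, 4}, two masks and q = 7, the
result is 5 * 4 * 3 mod 7 = 4; with num_masks = 0 the value 5 comes back
unchanged, matching the early return in the patch.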
[PATCH 2/5] gost: use ECerror to report EC errors
GOST code uses GOSTerror(EC_R_foo) to report several errors. Use
ECerror(EC_R_foo) instead to make error messages match error code.

Sponsored by ROSA Linux.

Signed-off-by: Dmitry Baryshkov
---
 src/lib/libcrypto/gost/gostr341001_ameth.c |  2 +-
 src/lib/libcrypto/gost/gostr341001_key.c   | 14 +++---
 src/lib/libcrypto/gost/gostr341001_pmeth.c |  2 +-
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/lib/libcrypto/gost/gostr341001_ameth.c b/src/lib/libcrypto/gost/gostr341001_ameth.c
index 27a95f2069cd..0e9521178da5 100644
--- a/src/lib/libcrypto/gost/gostr341001_ameth.c
+++ b/src/lib/libcrypto/gost/gostr341001_ameth.c
@@ -547,7 +547,7 @@ param_decode_gost01(EVP_PKEY *pkey, const unsigned char **pder, int derlen)
 	}
 	group = EC_GROUP_new_by_curve_name(nid);
 	if (group == NULL) {
-		GOSTerror(EC_R_EC_GROUP_NEW_BY_NAME_FAILURE);
+		ECerror(EC_R_EC_GROUP_NEW_BY_NAME_FAILURE);
 		GOST_KEY_free(ec);
 		return 0;
 	}
diff --git a/src/lib/libcrypto/gost/gostr341001_key.c b/src/lib/libcrypto/gost/gostr341001_key.c
index 0af39f21bf33..74f8cab9d86c 100644
--- a/src/lib/libcrypto/gost/gostr341001_key.c
+++ b/src/lib/libcrypto/gost/gostr341001_key.c
@@ -121,7 +121,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
 		return 0;
 	}
 	if (EC_POINT_is_at_infinity(key->group, key->pub_key) != 0) {
-		GOSTerror(EC_R_POINT_AT_INFINITY);
+		ECerror(EC_R_POINT_AT_INFINITY);
 		goto err;
 	}
 	if ((ctx = BN_CTX_new()) == NULL)
@@ -131,14 +131,14 @@ GOST_KEY_check_key(const GOST_KEY *key)
 
 	/* testing whether the pub_key is on the elliptic curve */
 	if (EC_POINT_is_on_curve(key->group, key->pub_key, ctx) == 0) {
-		GOSTerror(EC_R_POINT_IS_NOT_ON_CURVE);
+		ECerror(EC_R_POINT_IS_NOT_ON_CURVE);
 		goto err;
 	}
 	/* testing whether pub_key * order is the point at infinity */
 	if ((order = BN_new()) == NULL)
 		goto err;
 	if (EC_GROUP_get_order(key->group, order, ctx) == 0) {
-		GOSTerror(EC_R_INVALID_GROUP_ORDER);
+		ECerror(EC_R_INVALID_GROUP_ORDER);
 		goto err;
 	}
 	if (EC_POINT_mul(key->group, point, NULL, key->pub_key, order,
@@ -147,7 +147,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
 		goto err;
 	}
 	if (EC_POINT_is_at_infinity(key->group, point) == 0) {
-		GOSTerror(EC_R_WRONG_ORDER);
+		ECerror(EC_R_WRONG_ORDER);
 		goto err;
 	}
 	/*
@@ -156,7 +156,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
 	 */
 	if (key->priv_key != NULL) {
 		if (BN_cmp(key->priv_key, order) >= 0) {
-			GOSTerror(EC_R_WRONG_ORDER);
+			ECerror(EC_R_WRONG_ORDER);
 			goto err;
 		}
 		if (EC_POINT_mul(key->group, point, key->priv_key, NULL, NULL,
@@ -165,7 +165,7 @@ GOST_KEY_check_key(const GOST_KEY *key)
 			goto err;
 		}
 		if (EC_POINT_cmp(key->group, point, key->pub_key, ctx) != 0) {
-			GOSTerror(EC_R_INVALID_PRIVATE_KEY);
+			ECerror(EC_R_INVALID_PRIVATE_KEY);
 			goto err;
 		}
 	}
@@ -212,7 +212,7 @@ GOST_KEY_set_public_key_affine_coordinates(GOST_KEY *key, BIGNUM *x, BIGNUM *y)
 	 * out of range.
 	 */
 	if (BN_cmp(x, tx) != 0 || BN_cmp(y, ty) != 0) {
-		GOSTerror(EC_R_COORDINATES_OUT_OF_RANGE);
+		ECerror(EC_R_COORDINATES_OUT_OF_RANGE);
 		goto err;
 	}
 	if (GOST_KEY_set_public_key(key, point) == 0)
diff --git a/src/lib/libcrypto/gost/gostr341001_pmeth.c b/src/lib/libcrypto/gost/gostr341001_pmeth.c
index 0eb1d873deaf..0e0cae99e3fc 100644
--- a/src/lib/libcrypto/gost/gostr341001_pmeth.c
+++ b/src/lib/libcrypto/gost/gostr341001_pmeth.c
@@ -246,7 +246,7 @@ pkey_gost01_sign(EVP_PKEY_CTX *ctx, unsigned char *sig, size_t *siglen,
 		*siglen = 2 * size;
 		return 1;
 	} else if (*siglen < 2 * size) {
-		GOSTerror(EC_R_BUFFER_TOO_SMALL);
+		ECerror(EC_R_BUFFER_TOO_SMALL);
 		return 0;
 	}
 	if (tbs_len != 32 && tbs_len != 64) {
--
2.27.0
[PATCH 1/5] gost: populate params tables with new curves
Allow users to specify new curves via strings.

Sponsored by ROSA Linux

Signed-off-by: Dmitry Baryshkov
---
 src/lib/libcrypto/gost/gostr341001_params.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/lib/libcrypto/gost/gostr341001_params.c b/src/lib/libcrypto/gost/gostr341001_params.c
index 282a21041999..9764964cdc1e 100644
--- a/src/lib/libcrypto/gost/gostr341001_params.c
+++ b/src/lib/libcrypto/gost/gostr341001_params.c
@@ -94,12 +94,22 @@ static const GostR3410_params GostR3410_256_params[] = {
 	{ "0", NID_id_GostR3410_2001_TestParamSet },
 	{ "XA", NID_id_GostR3410_2001_CryptoPro_XchA_ParamSet },
 	{ "XB", NID_id_GostR3410_2001_CryptoPro_XchB_ParamSet },
+	{ "TCA", NID_id_tc26_gost_3410_12_256_paramSetA },
+	{ "TCB", NID_id_tc26_gost_3410_12_256_paramSetB },
+	{ "TCC", NID_id_tc26_gost_3410_12_256_paramSetC },
+	{ "TCD", NID_id_tc26_gost_3410_12_256_paramSetD },
 	{ NULL, NID_undef },
 };
 
 static const GostR3410_params GostR3410_512_params[] = {
 	{ "A", NID_id_tc26_gost_3410_12_512_paramSetA },
 	{ "B", NID_id_tc26_gost_3410_12_512_paramSetB },
+	{ "C", NID_id_tc26_gost_3410_12_512_paramSetC },
+	{ "0", NID_id_tc26_gost_3410_12_512_paramSetTest},
+	/* Duplicates for compatibility with OpenSSL */
+	{ "TCA", NID_id_tc26_gost_3410_12_512_paramSetA },
+	{ "TCB", NID_id_tc26_gost_3410_12_512_paramSetB },
+	{ "TCC", NID_id_tc26_gost_3410_12_512_paramSetC },
 	{ NULL, NID_undef },
 };
 
--
2.27.0
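The tables extended by this patch are NULL-terminated name/identifier
arrays, with several names allowed to map to the same identifier. A
minimal sketch of how such a table can be searched follows. The struct,
the numeric identifiers and the exact-match comparison are assumptions
made for illustration; the real lookup routine in libcrypto is not shown
in this message and may compare names differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a short name and a numeric identifier. */
struct sketch_param {
	const char *name;
	int id;
};

#define SKETCH_ID_UNDEF 0

/* Aliases share an identifier, as with "A" and "TCA" in the patch. */
const struct sketch_param sketch_512_params[] = {
	{ "A", 1 },
	{ "B", 2 },
	{ "C", 3 },
	{ "0", 4 },
	{ "TCA", 1 },
	{ "TCB", 2 },
	{ "TCC", 3 },
	{ NULL, SKETCH_ID_UNDEF },
};

/* Walk the table until the NULL sentinel; exact-match comparison. */
int
sketch_lookup(const struct sketch_param *table, const char *name)
{
	assert(table != NULL && name != NULL);
	for (; table->name != NULL; table++)
		if (strcmp(table->name, name) == 0)
			return table->id;
	return SKETCH_ID_UNDEF;
}
```

Because the sentinel entry ends the walk, adding aliases is just a matter
of appending rows before it, which is all the patch does.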
Re: 11n Tx aggregation for iwm(4)
On 2020-06-26 20:11, Johan Huldtgren wrote:
> hello,
>
> On 2020-06-26 14:45, Stefan Sperling wrote:
> > It would be great to get at least one test for all the chipsets the driver
> > supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> > The behaviour of the access point also matters a great deal. It won't
> > hurt to test the same chipset against several different access points.
>
> tested on:
>
> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi
>
> AP is a Ruckus 7363.
>
> $ netstat -W iwm0 | grep "output block"
>
>
> 6 new output block ack agreements
> 0 output block ack agreements timed out
>
> Before:
>
> bandwidth min/avg/max/std-dev = 16.780/18.325/19.939/1.235 Mbps
>
> After:
>
> bandwidth min/avg/max/std-dev = 0.000/15.559/51.631/19.548 Mbps

Testing against a slightly different AP (Ruckus 7372):

before patch:

bandwidth min/avg/max/std-dev = 0.092/14.665/22.589/9.992 Mbps

after patch:

bandwidth min/avg/max/std-dev = 7.020/24.596/41.121/11.300 Mbps

This is the reported mode:

media: IEEE802.11 autoselect (HT-MCS13 mode 11n)

.jh
Re: ifconfig.8 Ar/Cm typo
On Sat, Jun 27, 2020 at 02:48:18AM -0500, Matthew Martin wrote:
> A rule on a bridge interface that uses arp or rarp may be followed with
> a literal "request" or "reply" (cf. sbin/ifconfig/brconfig.c L1041 and
> 1048), so the Ar macro is incorrect as it's argument is not
> a placeholder.
>

right.

> Aside: Is there a rule for when to list alternatives with foo | bar or
> foo Ns | Ns bar? in/out, arp/rarp, and request/reply are all the former
> sans-Ns; however, block/pass uses the Ns macro.
>

normally we just use arg1 | arg2, but sometimes this becomes ambiguous:

	rule block | pass [in | out]

do "in" and "out" go with both "block" and "pass", or just "pass"? so
sometimes we scrunch them up to make it clearer:

	rule block|pass [in | out]

hence the need for Ns.

i just committed your diff, but it needed a little more:

Index: ifconfig.8
===================================================================
RCS file: /cvs/src/sbin/ifconfig/ifconfig.8,v
retrieving revision 1.350
diff -u -r1.350 ifconfig.8
--- ifconfig.8	24 Jun 2020 17:40:10 -0000	1.350
+++ ifconfig.8	27 Jun 2020 15:31:01 -0000
@@ -751,7 +751,7 @@
 .Bk -words
 .Op Cm tag Ar tagname
 .Oo
-.Cm arp | rarp Op Ar request | reply
+.Cm arp | rarp Op Cm request | reply
 .Op Cm sha Ar lladdr
 .Op Cm spa Ar ipaddr
 .Op Cm tha Ar lladdr
@@ -779,9 +779,9 @@
 keyword for regular packets and
 .Cm rarp
 for reverse arp.
-.Ar request
+.Cm request
 and
-.Ar reply
+.Cm reply
 limit matches to requests or replies.
 The source and target host addresses can be matched with the
 .Cm sha

thanks for the diff!
jmc
Re: [PATCH] Optimized rasops32 putchar
> From:
> Date: Fri, 26 Jun 2020 07:42:50 -0700
>
> Optimized 32 bit character rendering with unrolled rows and pairwise
> foreground / background pixel rendering.
>
> If it weren't for the 5x8 font, I would have just assumed everything
> was an even width and made the fallback path also pairwise.
>
> In isolation, the 16x32 character case got 2x faster, but that wasn't
> a huge real world speedup where the space rendering that was already
> at memory bandwidth limits accounted for most of the character
> rendering time. However, in combination with the previous fast
> conditional console scrolling that removes most of the space rendering,
> it becomes significant.
>
> I also found that at least the efi and intel framebuffers are not
> currently mapped write combining, which makes this much slower than
> it should be.

Hi John,

The framebuffer should be mapped write-combining. In OpenBSD this is
requested by specifying the BUS_SPACE_MAP_PREFETCHABLE flag to
bus_space_map(9) when mapping the framebuffer. I'm fairly confident
since until last January the initial mapping of the framebuffer that we
used wasn't write-combining. And things were really, really slow.

Cheers,

Mark

> Index: rasops32.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/rasops/rasops32.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 rasops32.c
> --- rasops32.c	25 May 2020 09:55:49 -0000	1.10
> +++ rasops32.c	26 Jun 2020 14:34:06 -0000
> @@ -65,9 +65,14 @@ rasops32_init(struct rasops_info *ri)
>  int
>  rasops32_putchar(void *cookie, int row, int col, u_int uc, uint32_t
> attr)
>  {
> -	int width, height, cnt, fs, fb, clr[2];
> +	int width, height, step, cnt, fs, b, f;
> +	uint32_t fb, clr[2];
>  	struct rasops_info *ri;
> -	int32_t *dp, *rp;
> +	int64_t *rp, q;
> +	union {
> +		int64_t q[4];
> +		int32_t d[4][2];
> +	} u;
>  	u_char *fr;
> 
>  	ri = (struct rasops_info *)cookie;
> @@ -81,48 +86,128 @@ rasops32_putchar(void *cookie, int row,
>  		return 0;
>  #endif
> 
> -	rp = (int32_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> +	rp = (int64_t *)(ri->ri_bits + row*ri->ri_yscale + col*ri->ri_xscale);
> 
>  	height = ri->ri_font->fontheight;
>  	width = ri->ri_font->fontwidth;
> +	step = ri->ri_stride >> 3;
> 
> -	clr[0] = ri->ri_devcmap[(attr >> 16) & 0xf];
> -	clr[1] = ri->ri_devcmap[(attr >> 24) & 0xf];
> +	b = ri->ri_devcmap[(attr >> 16) & 0xf];
> +	f = ri->ri_devcmap[(attr >> 24) & 0xf];
> +	u.d[0][0] = b; u.d[0][1] = b;
> +	u.d[1][0] = b; u.d[1][1] = f;
> +	u.d[2][0] = f; u.d[2][1] = b;
> +	u.d[3][0] = f; u.d[3][1] = f;
> 
>  	if (uc == ' ') {
> +		q = u.q[0];
>  		while (height--) {
> -			dp = rp;
> -			DELTA(rp, ri->ri_stride, int32_t *);
> -
> -			for (cnt = width; cnt; cnt--)
> -				*dp++ = clr[0];
> +			/* the general, pixel-at-a-time case is fast enough */
> +			for (cnt = 0; cnt < width; cnt++)
> +				((int *)rp)[cnt] = b;
> +			rp += step;
>  		}
>  	} else {
>  		uc -= ri->ri_font->firstchar;
>  		fr = (u_char *)ri->ri_font->data + uc * ri->ri_fontscale;
>  		fs = ri->ri_font->stride;
> -
> -		while (height--) {
> -			dp = rp;
> -			fb = fr[3] | (fr[2] << 8) | (fr[1] << 16) |
> -			    (fr[0] << 24);
> -			fr += fs;
> -			DELTA(rp, ri->ri_stride, int32_t *);
> -
> -			for (cnt = width; cnt; cnt--) {
> -				*dp++ = clr[(fb >> 31) & 1];
> -				fb <<= 1;
> -			}
> +		/* double-pixel special cases for the common widths */
> +		switch (width) {
> +		case 8:
> +			while (height--) {
> +				fb = fr[0];
> +				rp[0] = u.q[fb >> 6];
> +				rp[1] = u.q[(fb >> 4) & 3];
> +				rp[2] = u.q[(fb >> 2) & 3];
> +				rp[3] = u.q[fb & 3];
> +				rp += step;
> +				fr += 1;
> +			}
> +			break;
> +
> +		case 12:
> +			while (height--) {
> +				fb = fr[0];
> +				rp[0] = u.q[fb >> 6];
> +				rp[1] = u.q[(fb >> 4) & 3];
> +				rp[2] = u.q[(fb >> 2) & 3];
> +
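The "pairwise" rendering in the quoted patch looks up two font bits at a
time in a four-entry table of pre-built pixel pairs (bg/bg, bg/fg, fg/bg,
fg/fg). The sketch below shows that idea for a single 8-pixel row in
plain userland C. It is an illustration only: the function name is
hypothetical, and it copies each pair with memcpy() instead of the
union of 64-bit and 32-bit views used in the patch, so it makes no
assumption about alignment or byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Expand one 8-pixel glyph row into eight 32-bit pixels.
 * The most significant bit of `bits' is the leftmost pixel.
 * Two font bits select one of four pre-built pixel pairs, so the
 * row needs four table lookups instead of eight per-pixel tests.
 */
void
sketch_render_row8(uint32_t *dst, uint8_t bits, uint32_t bg, uint32_t fg)
{
	/* Index 0..3 is the two-bit pattern: left pixel bit, right pixel bit. */
	const uint32_t pair[4][2] = {
		{ bg, bg },
		{ bg, fg },
		{ fg, bg },
		{ fg, fg },
	};
	int i;

	assert(dst != NULL);
	for (i = 0; i < 4; i++) {
		unsigned int idx = (bits >> (6 - 2 * i)) & 3;

		memcpy(dst + 2 * i, pair[idx], sizeof(pair[idx]));
	}
}
```

For the row bits 0xA5 (binary 10100101) the selected pairs are fg/bg,
fg/bg, bg/fg, bg/fg, which is the same indexing as u.q[fb >> 6],
u.q[(fb >> 4) & 3] and so on in the width-8 case of the patch.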
Re: wg(4): encapsulated transport checksums
> - Therefore, it's not necessary to check the IP checksum on ingress because:

There is actually a really good reason.

There are various counters (of all packets) which people observe to
debug network problems.

Now, if lower-level packets carrying wg with corruption don't increment
those counters, the statistics will be incorrect.

I think you are arguing to elide mandatory work in a lower layer of the
network stack. Isn't it a layer violation to insist on that?
Re: awk FS behaviour change
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:
> On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:
>
> > i'm not sure it reads better when we switch the emphasis from whitespace
> > to FS. i think it's better that people see how it normally works, then
> > the gories about FS. so i'd have kept the first part of the sentence,
> > but maybe reworked the FS bit.
>
> I wasn't sure that was an improvement either. Does this seem better?
>
> - todd
>

yes, i think this is better. ok by me.
jmc

> Index: usr.bin/awk/awk.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1	26 Jun 2020 21:50:06 -0000	1.54
> +++ usr.bin/awk/awk.1	27 Jun 2020 12:29:21 -0000
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As as special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,
> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs
>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
>
Re: awk FS behaviour change
On Sat, Jun 27, 2020 at 06:32:11AM -0600, Todd C. Miller wrote:
> I wasn't sure that was an improvement either.  Does this seem better?

To me it does, thanks.
OK kn

> Index: usr.bin/awk/awk.1
> ===
> RCS file: /cvs/src/usr.bin/awk/awk.1,v
> retrieving revision 1.54
> diff -u -p -u -r1.54 awk.1
> --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 - 1.54
> +++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -
> @@ -130,26 +130,24 @@ and newlines are used as field separator
>  This is convenient when working with multi-line records.
>  .Pp
>  An input line is normally made up of fields separated by whitespace,
> -or by the regular expression
> -.Va FS .
> +or by the value of the field separator
> +.Va FS
> +at the time the line is read.
>  The fields are denoted
>  .Va $1 , $2 , ... ,
>  while
>  .Va $0
>  refers to the entire line.
> -If
>  .Va FS
> -is null, the input line is split into one field per character.
> -Lines are split into fields using the value of
> +may be set to either a single character or a regular expression.
> +As as special case, if
>  .Va FS
> -at the time the line is read.
> -Because of this,
> +is a single space
> +.Pq the default ,

.Pq is probably not needed here; at the end you're also just using
"(default whitespace)".

> +fields will be split by one or more whitespace characters.
> +If
>  .Va FS
> -is usually set via the
> -.Fl F
> -option or inside of a
> -.Ic BEGIN
> -block.
> +is null, the input line is split into one field per character.
>  .Pp
>  Normally, any number of blanks separate fields.
>  In order to set the field separator to a single blank, use the
> @@ -171,6 +169,11 @@ as the field separator, use the
>  .Fl F
>  option with a value of
>  .Sq [t] .
> +The field separator is usually set via the
> +.Fl F
> +option or from inside a
> +.Ic BEGIN
> +block so that it takes effect before the input is read.
>  .Pp
>  A pattern-action statement has the form:
>  .Pp
> @@ -407,9 +410,9 @@ The name of the current input file.
>  .It Va FNR
>  Ordinal number of the current record in the current file.
>  .It Va FS
> -Regular expression used to separate fields; also settable
> -by option
> -.Fl F Ar fs .
> +Regular expression used to separate fields (default whitespace);
> +also settable by option
> +.Fl F Ar fs

Missing dot here (with trailing space after "fs").

>  .It Va NF
>  Number of fields in the current record.
>  .Va $NF
>
Re: awk FS behaviour change
On Sat, 27 Jun 2020 06:50:39 +0100, Jason McIntyre wrote:

> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.

I wasn't sure that was an improvement either.  Does this seem better?

 - todd

Index: usr.bin/awk/awk.1
===
RCS file: /cvs/src/usr.bin/awk/awk.1,v
retrieving revision 1.54
diff -u -p -u -r1.54 awk.1
--- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 - 1.54
+++ usr.bin/awk/awk.1 27 Jun 2020 12:29:21 -
@@ -130,26 +130,24 @@ and newlines are used as field separator
 This is convenient when working with multi-line records.
 .Pp
 An input line is normally made up of fields separated by whitespace,
-or by the regular expression
-.Va FS .
+or by the value of the field separator
+.Va FS
+at the time the line is read.
 The fields are denoted
 .Va $1 , $2 , ... ,
 while
 .Va $0
 refers to the entire line.
-If
 .Va FS
-is null, the input line is split into one field per character.
-Lines are split into fields using the value of
+may be set to either a single character or a regular expression.
+As as special case, if
 .Va FS
-at the time the line is read.
-Because of this,
+is a single space
+.Pq the default ,
+fields will be split by one or more whitespace characters.
+If
 .Va FS
-is usually set via the
-.Fl F
-option or inside of a
-.Ic BEGIN
-block.
+is null, the input line is split into one field per character.
 .Pp
 Normally, any number of blanks separate fields.
 In order to set the field separator to a single blank, use the
@@ -171,6 +169,11 @@ as the field separator, use the
 .Fl F
 option with a value of
 .Sq [t] .
+The field separator is usually set via the
+.Fl F
+option or from inside a
+.Ic BEGIN
+block so that it takes effect before the input is read.
 .Pp
 A pattern-action statement has the form:
 .Pp
@@ -407,9 +410,9 @@ The name of the current input file.
 .It Va FNR
 Ordinal number of the current record in the current file.
 .It Va FS
-Regular expression used to separate fields; also settable
-by option
-.Fl F Ar fs .
+Regular expression used to separate fields (default whitespace);
+also settable by option
+.Fl F Ar fs
 .It Va NF
 Number of fields in the current record.
 .Va $NF
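For readers following the wording of the diff, the three non-regex cases it describes (null FS, single space, any other single character) can be sketched in C roughly as below. This is only an illustration of the documented behaviour, not code from awk(1): count_fields() is a hypothetical helper, regular-expression separators are left out, and isspace() is used as an approximation of awk's default whitespace set.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the fields of `line' for a non-regex FS, following the three
 * cases in the manual text.  Hypothetical helper, not taken from awk(1);
 * only the first character of a non-space `fs' is considered.
 */
size_t
count_fields(const char *line, const char *fs)
{
	const char *p = line;
	size_t n = 0;

	/* Null FS: one field per character. */
	if (fs[0] == '\0')
		return strlen(line);

	/* Single space (the default): split on runs of whitespace. */
	if (strcmp(fs, " ") == 0) {
		while (*p != '\0') {
			/* skip leading or separating whitespace */
			while (isspace((unsigned char)*p))
				p++;
			if (*p == '\0')
				break;
			n++;
			/* skip over the field itself */
			while (*p != '\0' && !isspace((unsigned char)*p))
				p++;
		}
		return n;
	}

	/* An empty record has no fields. */
	if (*line == '\0')
		return 0;

	/* Any other single character: every occurrence ends a field. */
	for (n = 1; *p != '\0'; p++)
		if (*p == fs[0])
			n++;
	return n;
}
```

Under these assumptions, leading and trailing blanks are ignored with the default separator, while a separator such as ":" produces empty fields between adjacent occurrences.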
Re: 11n Tx aggregation for iwm(4)
Works for me on a 7260.

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec   108 MBytes  90.1 Mbits/sec
Re: pipex(4): use reference counters for `ifnet'
On 27/06/20(Sat) 01:02, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:15:38PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 17:53, Vitaliy Makkoveev wrote:
> > > On Fri, Jun 26, 2020 at 02:29:03PM +0200, Martin Pieuchot wrote:
> > > > On 26/06/20(Fri) 12:35, Vitaliy Makkoveev wrote:
> > > > > On Fri, Jun 26, 2020 at 10:23:42AM +0200, Martin Pieuchot wrote:
> > > > > > On 25/06/20(Thu) 19:56, Vitaliy Makkoveev wrote:
> > > > > > > Updated diff.
> > > > > > >
> > > > > > > OpenBSD uses 16 bit counter for allocate interface indexes. So we can't
> > > > > > > store index in session and be sure if_get(9) returned `ifnet' is our
> > > > > > > original `ifnet'.
> > > > > >
> > > > > > Why not?  The point of if_get(9) is to be sure.  If that doesn't work
> > > > > > for whatever reason then the if_get(9) interface has to be fixed.  Which
> > > > > > case doesn't work for you?  Do you have a reproducer?
> > > > > >
> > > > > > How does sessions stay around if their corresponding interface is
> > > > > > destroyed?
> > > > >
> > > > > We have `pipexinq' and `pipexoutq' which can store pointers to session.
> > > > > pipexintr() process these queues. pipexintr() and
> > > > > pipex_destroy_session() are *always* different context. This mean we
> > > > > *can't* free pipex(4) session without be sure there is no reference to
> > > > > this session in `pipexinq' or `pipexoutq'. Elsewhere this will cause use
> > > > > after free issue. Look please at net/pipex.c:846. The way pppx(4)
> > > > > destroy sessions is wrong. While pppac(4) destroys sessions by
> > > > > pipex_iface_fini() it's also wrong. Because we don't check `pipexinq'
> > > > > and `pipexoutq' state. I've said it again and again.
> > > >
> > > > I understand.  Why is it a problem?  Using reference counting the way
> > > > you're suggesting is *one* possible solution to a problem we don't fully
> > > > understand.  What are we trying to achieve?  Which problem are we trying
> > > > to solve?
> > >
> > > Sorry, may be I misunderstand something.
> > >
> > > `pipexoutq' case:
> > >
> > > 1. pppac_start() calls pipex_output()
> > > 2. pipex_output() calls pipex_ip_output()
> > > 3. pipex_ip_output() calls pipex_ppp_enqueue()
> > > 4. pipex_ppp_enqueue() calls schednetisr() which is task_add()
> > >
> > > `pipexinq' cases:
> > >
> > > 1.1. ether_input() calls pipex_pppoe_input()
> > > 1.2. gre_input() calls gre_input_1()
> > >      gre_input_1() calls pipex_pptp_input()
> > > 1.3. udp_input() calls pipex_l2tp_input()
> > >
> > > 2. pipex_{pppoe,pptp,l2tp}_input() calls pipex_common_input()
> > > 3. pipex_common_input() calls schednetisr() which is task_add()
> > >
> > > task_add(9) just schedules the execution of the work specified by `tq'.
> > > So we can do pipex_destroy_session() * between * schednetisr() and
> > > pipexintr(). And we can do this right * now *, with our current locking.
> > > And this is the problem I'm trying to solve.
> > >
> > > My apologies if I'm wrong above. Please point me where I'm wrong.
> > >
> > > Also before pipex_{pppoe,pptp,l2tp}_input() we call corresponding
> > > pipex_{pptp,l2tp}_lookup_session() to obtain pointer to pipex(4)
> > > session. We should be sure `session' is still valid between
> > > pipex_*_lookup() and pipex_*_input(). It's not required now, but will be
> > > required in future.
> >
> > Why not iterate over the queues and garbage collect the sessions that
> > are about to be removed?  That's what the network stack was doing with
> > mbuf queues prior to if_get(9) when interfaces were destroyed.
> >
>
> Do you mean net/if.c:1185 and below? This is the queues associated with
> this `ifp'. But for pipex(4) we should go through all mbufs associated
> with pipex(4). This can be heavy if we have hundreds of sessions. Also
> this would work until session destruction and `pr_input' are serialized.
>
> Point me please the line in source to see if I'm wrong about `ifnet's
> mbuf queues cleaning.

Look at r1.329 of net/if.c.  Prior to this change if_detach_queues() was
used to free all mbufs when an interface was removed.  Now lazy freeing is
used every time if_get(9) returns NULL.  This is possible because we store
an index and not a pointer directly in the mbuf.

The advantage of storing a session pointer in `ph_cookie' is that no
lookup is required in pipexintr(), right?  Maybe we could save an ID
instead and do a lookup.  How big can the `pipex_session_list' be?
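To make the "save an ID instead and do a lookup" idea concrete, here is a minimal userland sketch. It is not pipex(4) code: the table, its size and every name in it (session_create(), session_lookup(), session_destroy(), packet_deliver()) are hypothetical, and locking is ignored. The point is only that a queued packet carrying an ID can be dropped lazily once the lookup fails, so no stale pointer is ever dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS	8		/* hypothetical table size */

struct session {
	uint32_t id;		/* 0 means the slot is free */
	int	 npkts;		/* packets delivered to this session */
};

static struct session slots[NSLOTS];
static uint32_t next_id = 1;	/* wraparound is ignored in this sketch */

/* Return the live session with this ID, or NULL if it is gone. */
struct session *
session_lookup(uint32_t id)
{
	size_t i;

	if (id == 0)
		return NULL;
	for (i = 0; i < NSLOTS; i++)
		if (slots[i].id == id)
			return &slots[i];
	return NULL;
}

/* Create a session and return its ID, or 0 if the table is full. */
uint32_t
session_create(void)
{
	size_t i;

	for (i = 0; i < NSLOTS; i++) {
		if (slots[i].id == 0) {
			slots[i].id = next_id++;
			slots[i].npkts = 0;
			return slots[i].id;
		}
	}
	return 0;
}

/* Destroy a session; packets still queued may keep carrying its ID. */
void
session_destroy(uint32_t id)
{
	struct session *s = session_lookup(id);

	if (s != NULL)
		s->id = 0;
}

/*
 * Deferred handler: the queued packet carries only the session ID.
 * Returns 1 if delivered, 0 if dropped because the session is gone.
 */
int
packet_deliver(uint32_t id)
{
	struct session *s = session_lookup(id);

	if (s == NULL)
		return 0;	/* lazy free: drop the packet */
	s->npkts++;
	return 1;
}
```

The cost is one lookup per packet in the deferred handler, which is why the size of the session list matters; a 32-bit ID is assumed here so that a reused slot does not hand out a recently used ID again.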
Re: fix races in if_clone_create()
On 27/06/20(Sat) 00:35, Vitaliy Makkoveev wrote:
> On Fri, Jun 26, 2020 at 09:12:16PM +0200, Martin Pieuchot wrote:
> > On 26/06/20(Fri) 16:56, Vitaliy Makkoveev wrote:
> > > if_clone_create() has the races caused by context switch.
> >
> > Can you share a backtrace of such race?  Where does the kernel panic?
> >
>
> This diff was inspired by thread [1]. As I explained [2] here are 3
> issues that cause panics produced by command below:
>
> cut begin
> for i in 1 2 3; do while true; do ifconfig bridge0 create& \
> ifconfig bridge0 destroy& done& done
> cut end

Thanks, I couldn't reproduce it on any of the machines I tried.  Did you
manage to reproduce it with other pseudo-devices or just with bridge0?

> My system was stable with the last diff I did for thread [1]. But since
> this final diff [3] which include fixes for tun(4) is quick and dirty
> and not for commit I decided to make the diff to fix the races caused by
> if_clone_create() at first.
>
> I included screenshot with panic.

Thanks, interesting that the corruption happens on a list that should be
initialized.  Does that mean the context switch on Thread 1 is happening
before if_attach_common() is called?

You said in your previous email that there's a context switch.  Do you
know when it happens?  You could see that in ddb by looking at the
backtrace of the other CPU.

Is the context switch leading to the race common to all pseudo-drivers
or is it in the bridge(4) driver?

Regarding your solution, do I understand correctly that the goal is to
serialize all if_clone_create()?  Is it really needed to remember which
unit is being currently created or can't we just serialize all of them?

The fact that a lock is not held over the cloning operation is imho
positive.
Re: wg(4): encapsulated transport checksums
Hi Richard,

Thanks for the patch. I had problems parsing some terminology in your
description, so I thought I'd lay out my understanding of the matter,
and you can let me know whether or not this corresponds with what you
had in mind:

- On egress, we must compute the packet checksum, because it may well
  be forwarded by the receiving end after decapsulation. That doesn't
  concern this patch, however.
- On ingress, we've already checked the poly1305 sum, so we have no
  doubt that the packet has arrived without corruption.
- Therefore, it's not necessary to check the IP checksum on ingress because:
  * If the packet originated on the peer that did the encapsulation,
    there's no chance for corruption;
  * If the packet did not originate on the peer that did the
    encapsulation, it was that peer's responsibility to drop it if the
    checksum was wrong;
  * If the packet does have an incorrect checksum, because the
    originating peer did not check it, and we forward it along, the
    machine we forward it to will drop it.

It seemed like from your message that you had a case in mind in which
it actually would be necessary to check the IP checksum on ingress,
but I didn't quite divine what you had in mind.

Jason

On Fri, Jun 26, 2020 at 10:03 PM wrote:
>
> Hi,
>
> On its receive path, wg(4) uses the same mbuf for both the encrypted
> capsule and its encapsulated packet, which it passes up to the stack. We
> must therefore clear this mbuf's checksum status flags, as although the
> capsule may have been subject to hardware offload, its encapsulated packet
> was not.
>
> This ensures that the transport checksums of packets bound for local
> delivery are verified. That is necessary because, although the tunnel
> provides stronger integrity checks, the tunnel endpoints and the
> transport endpoints needn't coincide.
>
> However, as the network and tunnel endpoints _do_ coincide, it remains
> unnecessary to check the per-hop IPv4 checksum.
>
> ok?
>
> Index: net/if_wg.c
> ===
> RCS file: /cvs/src/sys/net/if_wg.c,v
> retrieving revision 1.7
> diff -u -p -u -p -r1.7 if_wg.c
> --- net/if_wg.c 23 Jun 2020 10:03:49 - 1.7
> +++ net/if_wg.c 27 Jun 2020 02:48:37 -
> @@ -1660,14 +1660,10 @@ wg_decap(struct wg_softc *sc, struct mbu
>  		goto error;
>  	}
>
> -	/*
> -	 * We can mark incoming packet csum OK. We mark all flags OK
> -	 * irrespective to the packet type.
> -	 */
> -	m->m_pkthdr.csum_flags |= (M_IPV4_CSUM_IN_OK | M_TCP_CSUM_IN_OK |
> -	    M_UDP_CSUM_IN_OK | M_ICMP_CSUM_IN_OK);
> -	m->m_pkthdr.csum_flags &= ~(M_IPV4_CSUM_IN_BAD | M_TCP_CSUM_IN_BAD |
> -	    M_UDP_CSUM_IN_BAD | M_ICMP_CSUM_IN_BAD);
> +	/* tunneled packet was not offloaded */
> +	m->m_pkthdr.csum_flags = 0;
> +	/* optimise: the tunnel provided a stronger integrity check */
> +	m->m_pkthdr.csum_flags |= M_IPV4_CSUM_IN_OK;
>
>  	m->m_pkthdr.ph_ifidx = sc->sc_if.if_index;
>  	m->m_pkthdr.ph_rtableid = sc->sc_if.if_rdomain;
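As a reading aid, the effect of the new flag handling in the quoted diff can be modelled outside the kernel like this. It is a sketch under assumptions: the CSUM_* bit values are made up for the example (the real M_*_CSUM_IN_* flags are defined in <sys/mbuf.h>), and decap_csum_flags() and needs_tcp_verify() are hypothetical helpers, not functions in if_wg.c.

```c
#include <stdint.h>

/* Illustrative bit values only; not the kernel's actual definitions. */
#define CSUM_IPV4_IN_OK		0x0001
#define CSUM_IPV4_IN_BAD	0x0002
#define CSUM_TCP_IN_OK		0x0004
#define CSUM_TCP_IN_BAD		0x0008
#define CSUM_UDP_IN_OK		0x0010
#define CSUM_UDP_IN_BAD		0x0020

/*
 * Compute the flags for the decapsulated packet.  Whatever the hardware
 * reported in `outer_flags' described the encrypted capsule, so it is
 * discarded; only the IPv4 header checksum is vouched for.
 */
uint16_t
decap_csum_flags(uint16_t outer_flags)
{
	uint16_t flags;

	(void)outer_flags;		/* results for the capsule only */
	flags = 0;			/* tunneled packet was not offloaded */
	flags |= CSUM_IPV4_IN_OK;	/* tunnel gave a stronger check */
	return flags;
}

/* Does the stack still have to verify the TCP checksum in software? */
int
needs_tcp_verify(uint16_t flags)
{
	return (flags & CSUM_TCP_IN_OK) == 0;
}
```

With the old code every *_IN_OK bit was set, so a transport checksum was never verified for tunneled packets; in this model it always is, whatever the offload engine said about the capsule.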
Re: [PATCH] fast conditional console scrolling
Hi John,

With both your diffs applied, results are indeed more like 3x speed-up
that I get on my machine.  Average over 7 runs ls -R /usr/ports was
64.169s making for just under 3x increase.

That's on 1920x1080 with the standard font size for that resolution
(120x33 console, so 16x32 font).

Thanks again,

Paul 'WEiRD' de Weerd

On Fri, Jun 26, 2020 at 07:49:55AM -0700, jo...@armadilloaerospace.com wrote:
| I should have been more rigorous -- I had two different changes running
| on my system, as well as forcing it to use the 12x24 font for a 160x45
| console.
|
| If you apply the "Optimized rasops32 putchar" patch I just posted, you
| should see another significant speedup.
|
|
| Original Message
| Subject: Re: [PATCH] fast conditional console scrolling
| From: Paul de Weerd
| Date: Fri, June 26, 2020 1:23 am
| To: jo...@armadilloaerospace.com
| Cc: "tech@openbsd.org"
|
| Hi John,
|
| I tried your diff. I don't quite see the same 3x improvement that you
| report, more like 2x. I timed 7 runs of ls -R /usr/ports:
|
| Before diff, time ls -R /usr/ports | wc -l 2.897s on average
| After diff, time ls -R /usr/ports | wc -l 2.707s on average
|
| Before diff, time ls -R /usr/ports 2m53.067 on average
| After diff, time ls -R /usr/ports 1m30.387 on average
|
| Note that the 'before diff' runs were with a snapshot kernel. There
| may be diffs in there that account for the difference between before
| and after of the no-output runs. See dmesg and full stats below.
|
| So, on average, a speed-up of ~48%.
|
| Thanks!
|
| Paul 'WEiRD' de Weerd
|

-- 
>[<++>-]<+++.>+++[<-->-]<.>+++[<+
+++>-]<.>++[<>-]<+.--.[-]
http://www.weirdnet.nl/
ifconfig.8 Ar/Cm typo
A rule on a bridge interface that uses arp or rarp may be followed with a
literal "request" or "reply" (cf. sbin/ifconfig/brconfig.c L1041 and 1048),
so the Ar macro is incorrect as its argument is not a placeholder.

Aside: Is there a rule for when to list alternatives with foo | bar or
foo Ns | Ns bar? in/out, arp/rarp, and request/reply are all the former
sans-Ns; however, block/pass uses the Ns macro.

diff --git ifconfig.8 ifconfig.8
index c522491ad45..2d1d2eb1974 100644
--- ifconfig.8
+++ ifconfig.8
@@ -751,7 +751,7 @@ like a hub or a wireless network.
 .Bk -words
 .Op Cm tag Ar tagname
 .Oo
-.Cm arp | rarp Op Ar request | reply
+.Cm arp | rarp Op Cm request | reply
 .Op Cm sha Ar lladdr
 .Op Cm spa Ar ipaddr
 .Op Cm tha Ar lladdr
Re: awk FS behaviour change
On Sat, Jun 27, 2020 at 06:50:39AM +0100, Jason McIntyre wrote:
> On Fri, Jun 26, 2020 at 09:28:00PM -0600, Todd C. Miller wrote:
> > On Fri, 26 Jun 2020 23:56:23 +0200, Klemens Nanni wrote:
> >
> > > How about adding something like "Therefore, FS should be set with -F or
> > > in a BEGIN block before input is read." as second sentence in this
> > > paragraph?
> >
> > That whole section is missing important details.  I've tried to add
> > the missing info without being too repetitive.
> >
> >  - todd
> >
> > Index: usr.bin/awk/awk.1
> > ===
> > RCS file: /cvs/src/usr.bin/awk/awk.1,v
> > retrieving revision 1.54
> > diff -u -p -u -r1.54 awk.1
> > --- usr.bin/awk/awk.1 26 Jun 2020 21:50:06 - 1.54
> > +++ usr.bin/awk/awk.1 27 Jun 2020 03:25:48 -
> > @@ -129,27 +129,25 @@ and newlines are used as field separator
> >  .Va FS ) .
> >  This is convenient when working with multi-line records.
> >  .Pp
> > -An input line is normally made up of fields separated by whitespace,
> > -or by the regular expression
> > -.Va FS .
> > +An input line is normally made up of fields split based on the value
> > +of the field separator
> > +.Va FS
> > +at the time the line is read.
>
> i'm not sure it reads better when we switch the emphasis from whitespace
> to FS. i think it's better that people see how it normally works, then
> the gories about FS. so i'd have kept the first part of the sentence,
> but maybe reworked the FS bit.
>
> >  The fields are denoted
> >  .Va $1 , $2 , ... ,
> >  while
> >  .Va $0
> >  refers to the entire line.
> > -If
> >  .Va FS
> > -is null, the input line is split into one field per character.
> > -Lines are split into fields using the value of
> > +may be set to either a single character or a regular expression.
> > +As as special case, if
> >  .Va FS
> > -at the time the line is read.
> > -Because of this,
> > +is a single space
> > +.Pq the default ,
> > +fields will be split by one or more whitespace characters.
> > +If
> >  .Va FS
> > -is usually set via the
> > -.Fl F
> > -option or inside of a
> > -.Ic BEGIN
> > -block.
> > +is null, the input line is split into one field per character.
> >  .Pp
> >  Normally, any number of blanks separate fields.
> >  In order to set the field separator to a single blank, use the
> > @@ -171,6 +169,11 @@ as the field separator, use the
> >  .Fl F
> >  option with a value of
> >  .Sq [t] .
> > +The field separator is usually set via the
> > +.Fl F
> > +option or from inside of a
>
> that sounds odd, but it may be a US/UK thing: i would say either "from
> inside a block" or "from the inside of a block".

Maybe "... from inside of the" rather than "... from inside of a"

--patrick

>
> jmc
>
> > +.Ic BEGIN
> > +block so that it takes effect before the input is read.
> >  .Pp
> >  A pattern-action statement has the form:
> >  .Pp
> > @@ -407,9 +410,9 @@ The name of the current input file.
> >  .It Va FNR
> >  Ordinal number of the current record in the current file.
> >  .It Va FS
> > -Regular expression used to separate fields; also settable
> > -by option
> > -.Fl F Ar fs .
> > +Regular expression used to separate fields (default whitespace);
> > +also settable by option
> > +.Fl F Ar fs
> >  .It Va NF
> >  Number of fields in the current record.
> >  .Va $NF
> >
>
Re: 11n Tx aggregation for iwm(4)
On Fri, Jun 26, 2020 at 06:14:48PM +0200, Landry Breuil wrote:
> On Fri, Jun 26, 2020 at 02:45:53PM +0200, Stefan Sperling wrote:
> > This patch adds support for 11n Tx aggregation to iwm(4).
> >
> > Please help with testing if you can by running the patch and using wifi
> > as usual. Nothing should change, except that Tx speed may potentially
> > improve. If you have time to run before/after performance measurements with
> > tcpbench or such, that would be nice. But it's not required for testing.
> >
> > If Tx aggregation is active then netstat will show a non-zero output block
> > ack agreement counter:
> >
> > $ netstat -W iwm0 | grep 'output block'
> > 3 new output block ack agreements
> > 0 output block ack agreements timed out
> >
> > It would be great to get at least one test for all the chipsets the driver
> > supports: 7260, 7265, 3160, 3165, 3168, 8260, 8265, 9260, 9560
> > The behaviour of the access point also matters a great deal. It won't
> > hurt to test the same chipset against several different access points.
> >
> > I have tested this version on 8265 only so far. I've run older revisions
> > of this patch on 7265 so I'm confident that this chip will work, too.
> > So far, the APs I have tested against are athn(4) in 11a mode and in 11n
> > mode with the 'nomimo' nwflag, and a Sagemcom 11ac AP. All on 5GHz channels.
>
> no difference on X1c3 w/
> iwm0 at pci2 dev 0 function 0 "Intel Dual Band Wireless AC 7265" rev 0x59, msi
> iwm0: hw rev 0x210, fw ver 17.3216344376.0,
>
> using a crappy old fonera as AP, serving as a bridge to gw w/ tcpbench.
>
> bandwidth min/avg/max/std-dev = 22.519/22.704/22.995/0.162 Mbps
>
> same bw both ways it seems.

so no change against this old AP, which selects:
media: IEEE802.11 autoselect (OFDM48 mode 11g)
or sometimes
media: IEEE802.11 autoselect (OFDM12 mode 11g)
or
media: IEEE802.11 autoselect (OFDM6 mode 11g)

but if i connect to the ISP's box wifi, which selects:
media: IEEE802.11 autoselect (HT-MCS8 mode 11n)

the performance is horrible, i have a lot of lag, and tcpbench says:
bandwidth min/avg/max/std-dev = 0.000/1.576/10.069/2.781 Mbps

i have some iwm firmware errors in dmesg.

without the patch, it's a bit the same:
bandwidth min/avg/max/std-dev = 0.000/1.836/9.846/2.292 Mbps
but no firmware errors afaict.

so dunno if the patch itself changes something, but the perf with the ISP
AP is awful. Can't remember if it was the case before as i seldom use it
with OpenBSD as a client..

Landry