Re: Unlock top part of uvm_fault()
On Thu, 22 Apr 2021 15:38:53 +0200 Martin Pieuchot wrote:
> Diff below remove the KERNEL_LOCK()/UNLOCK() dance from uvm_fault() for
> both amd64 and sparc64.  That means the kernel lock will only be taken
> for lower faults and some amap/anon code will now run without it.

I made a similar diff, below, for macppc and powerpc64.  In my small
trials, the build times were about the same on my 4-core powerpc64 and
slightly faster on my 2-core macppc.

My macppc kernel had a problem: it froze at boot, but the problem went
away after I reordered the kernel.  My guess is that this unlocking diff
isn't the cause of the problem, because the freeze was too early (before
copyright); the other cpu wouldn't have entered the kernel, so the
kernel lock wouldn't prevent the freeze.

My powerpc64 trial was "make -j4" in src/gnu/usr.bin/clang on a 4-core
POWER9 in a Talos II.

Before diff:  117m45.96s real  361m58.87s user  68m44.66s system
After diff:   115m24.53s real  362m02.65s user  58m40.99s system

I see that system time went down by 10 minutes, but real time is about
the same (down by 2 minutes).

My macppc trial was "dpb devel/cmake" on a 2-cpu Power Mac G5.

Before diff:  03:44:50
After diff:   03:33:44

Real time went down by 11 minutes.  Maybe the unlocking diff works
better on a 2-core machine, or maybe the unlocking diff does nothing and
I saved 11 minutes by random luck.

--George

Index: arch/powerpc/powerpc/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc/powerpc/trap.c,v
retrieving revision 1.119
diff -u -p -r1.119 trap.c
--- arch/powerpc/powerpc/trap.c	11 Mar 2021 11:16:59 -0000	1.119
+++ arch/powerpc/powerpc/trap.c	28 Apr 2021 02:49:44 -0000
@@ -282,9 +282,7 @@ trap(struct trapframe *frame)
 		else
 			ftype = PROT_READ;
 
-		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(va), 0, ftype);
-		KERNEL_UNLOCK();
 
 		if (error == 0)
 			return;
@@ -318,10 +316,8 @@ trap(struct trapframe *frame)
 		} else
 			vftype = ftype = PROT_READ;
 
-		KERNEL_LOCK();
 		error = uvm_fault(&p->p_vmspace->vm_map,
 		    trunc_page(frame->dar), 0, ftype);
-		KERNEL_UNLOCK();
 
 		if (error == 0) {
 			uvm_grow(p, frame->dar);
@@ -343,10 +339,8 @@ trap(struct trapframe *frame)
 
 		ftype = PROT_READ | PROT_EXEC;
 
-		KERNEL_LOCK();
 		error = uvm_fault(&p->p_vmspace->vm_map,
 		    trunc_page(frame->srr0), 0, ftype);
-		KERNEL_UNLOCK();
 
 		if (error == 0) {
 			uvm_grow(p, frame->srr0);
Index: arch/powerpc64/powerpc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/powerpc64/powerpc64/trap.c,v
retrieving revision 1.49
diff -u -p -r1.49 trap.c
--- arch/powerpc64/powerpc64/trap.c	9 Jan 2021 13:14:02 -0000	1.49
+++ arch/powerpc64/powerpc64/trap.c	24 Apr 2021 03:21:02 -0000
@@ -126,9 +126,7 @@ trap(struct trapframe *frame)
 			access_type = PROT_READ | PROT_WRITE;
 		else
 			access_type = PROT_READ;
-		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(va), 0, access_type);
-		KERNEL_UNLOCK();
 
 		if (error == 0)
 			return;
@@ -237,9 +235,7 @@ trap(struct trapframe *frame)
 			access_type = PROT_READ | PROT_WRITE;
 		else
 			access_type = PROT_READ;
-		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(va), 0, access_type);
-		KERNEL_UNLOCK();
 
 		if (error == 0)
 			uvm_grow(p, va);
@@ -284,9 +280,7 @@ trap(struct trapframe *frame)
 		map = &p->p_vmspace->vm_map;
 		va = frame->srr0;
 		access_type = PROT_READ | PROT_EXEC;
-		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(va), 0, access_type);
-		KERNEL_UNLOCK();
 
 		if (error == 0)
 			uvm_grow(p, va);
Re: sysctl net.inet.ip.arpqueued read only
Vitaliy Makkoveev writes:
> On Tue, Apr 27, 2021 at 09:12:56PM +0200, Alexander Bluhm wrote:
>> On Tue, Apr 27, 2021 at 12:18:02PM -0600, Theo de Raadt wrote:
>> > Actually, your variation seems pretty good.  Is there any reason to not
>> > use this type of define?
>>
>> This would look like this.
>>
>> I think sysctl_int() and sysctl_rdint() should be the primitive
>> functions.  This brings us back the 4.4BSD implementation.  Then
>> sysctl_int_bounded() builds the magic on top like sysctl_int_lower()
>> does it.  sysctl_bounded_arr() is a wrapper around it.
>>
>> I just added a few defines that my simple grep found.  We could
>> search for more.
>
> This looks much better, I like the direction.

I like this too.  I somehow got the impression that macros are severely
frowned upon and didn't offer this kind of interface before.

If you get this submitted, I can do a pass through the codebase to be
sure we catch them all.

Thanks,
Greg
Re: arp mbuf delist
> On 28 Apr 2021, at 00:45, Alexander Bluhm wrote: > > On Sun, Apr 25, 2021 at 06:08:17PM +1000, Jonathan Matthew wrote: >> On Sun, Apr 25, 2021 at 09:44:16AM +0200, Alexandr Nedvedicky wrote: >> This already exists, it's called mq_delist() > > Here is the diff with mq_delist(). > >> We'd need some other way to do the 'mbuf is back in queue' detection, >> but I agree this seems like a sensible thing to do. > > This part is ugly as before. I have put a printf in the discard > case it and I never hit it. Any idea why we need this and how to > fix it? > We did ARP resolution before this loop and `rt’ contains valid address. Also `la_mq’ manipulations are serialized so I doubt this thread could reinsert packet while performing ifp->if_enqueue(). But I can’t said this never happened. But you are going to make this code run in parallel other thread could fail ARP resolution and insert (not reinsert) packet while this thread performs local `ml’ walkthrough. For this case I doubt we should clean `la_mq’. But can we be sure we are not reinserted packet? > ok? > > bluhm > > Index: netinet/if_ether.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.246 > diff -u -p -r1.246 if_ether.c > --- netinet/if_ether.c26 Apr 2021 07:55:16 - 1.246 > +++ netinet/if_ether.c27 Apr 2021 21:02:27 - > @@ -594,7 +594,9 @@ arpcache(struct ifnet *ifp, struct ether > struct in_addr *spa = (struct in_addr *)ea->arp_spa; > char addr[INET_ADDRSTRLEN]; > struct ifnet *rifp; > + struct mbuf_list ml; > struct mbuf *m; > + unsigned int len; > int changed = 0; > > KERNEL_ASSERT_LOCKED(); > @@ -666,21 +668,17 @@ arpcache(struct ifnet *ifp, struct ether > > la->la_asked = 0; > la->la_refreshed = 0; > - while ((m = mq_dequeue(&la->la_mq)) != NULL) { > - unsigned int len; > - > - atomic_dec_int(&la_hold_total); > - len = mq_len(&la->la_mq); > - > + mq_delist(&la->la_mq, &ml); > + len = ml_len(&ml); > + while ((m = ml_dequeue(&ml)) != NULL) { > ifp->if_output(ifp, m, rt_key(rt), rt); > - > - /* XXXSMP we discard if other CPU enqueues */ > - if (mq_len(&la->la_mq) > len) { > - /* mbuf is back in queue. Discard. */ > - atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > - break; > - } > } > + /* XXXSMP we discard if other CPU enqueues */ > + if (mq_len(&la->la_mq) > 0) { > + /* mbuf is back in queue. Discard. */ > + atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq)); > + } else > + atomic_sub_int(&la_hold_total, len); > > return (0); > } >
Re: [External] : arp mbuf delist
On Wed, Apr 28, 2021 at 03:19:47AM +0300, Vitaliy Makkoveev wrote:
> The code not only breaks loop but also cleans the queue. Should this be
> kept?

In IPv6 nd6_cache_lladdr() we have neither queue nor loop, just a
single mbuf on hold.  But we have that discard logic, too.

	if (ln->ln_hold) {
		struct mbuf *n = ln->ln_hold;
		ln->ln_hold = NULL;
		/*
		 * we assume ifp is not a p2p here, so just
		 * set the 2nd argument as the 1st one.
		 */
		ifp->if_output(ifp, n, rt_key(rt), rt);
		if (ln->ln_hold == n) {
			/* n is back in ln_hold. Discard. */
			m_freem(ln->ln_hold);
			ln->ln_hold = NULL;
		}
	}

So I guess there is a reason to clean up.

bluhm
Re: [External] : arp mbuf delist
On Wed, Apr 28, 2021 at 12:10:31AM +0200, Alexandr Nedvedicky wrote: > Hello, > > On Tue, Apr 27, 2021 at 11:45:19PM +0200, Alexander Bluhm wrote: > > On Sun, Apr 25, 2021 at 06:08:17PM +1000, Jonathan Matthew wrote: > > > On Sun, Apr 25, 2021 at 09:44:16AM +0200, Alexandr Nedvedicky wrote: > > > This already exists, it's called mq_delist() > > > > Here is the diff with mq_delist(). > > > > > We'd need some other way to do the 'mbuf is back in queue' detection, > > > but I agree this seems like a sensible thing to do. > > > > This part is ugly as before. I have put a printf in the discard > > case it and I never hit it. Any idea why we need this and how to > > fix it? > > > > ok? > > > > bluhm > > > > Index: netinet/if_ether.c > > === > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > > retrieving revision 1.246 > > diff -u -p -r1.246 if_ether.c > > --- netinet/if_ether.c 26 Apr 2021 07:55:16 - 1.246 > > +++ netinet/if_ether.c 27 Apr 2021 21:02:27 - > > @@ -594,7 +594,9 @@ arpcache(struct ifnet *ifp, struct ether > > struct in_addr *spa = (struct in_addr *)ea->arp_spa; > > char addr[INET_ADDRSTRLEN]; > > struct ifnet *rifp; > > + struct mbuf_list ml; > > struct mbuf *m; > > + unsigned int len; > > int changed = 0; > > > > KERNEL_ASSERT_LOCKED(); > > @@ -666,21 +668,17 @@ arpcache(struct ifnet *ifp, struct ether > > > > la->la_asked = 0; > > la->la_refreshed = 0; > > - while ((m = mq_dequeue(&la->la_mq)) != NULL) { > > - unsigned int len; > > - > > - atomic_dec_int(&la_hold_total); > > - len = mq_len(&la->la_mq); > > - > > + mq_delist(&la->la_mq, &ml); > > + len = ml_len(&ml); > > + while ((m = ml_dequeue(&ml)) != NULL) { > > ifp->if_output(ifp, m, rt_key(rt), rt); > > - > > - /* XXXSMP we discard if other CPU enqueues */ > > - if (mq_len(&la->la_mq) > len) { > > - /* mbuf is back in queue. Discard. */ > > - atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > > - break; > > - } > > } > > > + /* XXXSMP we discard if other CPU enqueues */ > > + if (mq_len(&la->la_mq) > 0) { > > + /* mbuf is back in queue. Discard. */ > > + atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq)); > > + } else > > + atomic_sub_int(&la_hold_total, len); > > > > I think /* XXXSMP ... */ code-chunk is no longer needed. I think your > change, which introduces ml_dequeue(), turns that into an useless > artifact. > > If I understand old code right this extra check tries to make sure > the while() loop completes. I assume ifp->if_output() calls > ether_output(), > which in turn calls ether_encap() -> ether_resolve(). If ether_resolve() > fails to find ARP entry (for whatever reason, perhaps explicit 'arp -d -a' > or interface down), the packet is inserted back to la->la_mq. This would > keep the while loop spinning. > The code not only breaks loop but also cleans the queue. Should this be kept? > using ml_dequeue() solves that problem, because the while() loop now > consumes a private list of packets, which can not grow. It's granted > the loop will finish. > > So I would just delete those lines. > > > thanks and > regards > sashan > > > > return (0); > > } >
Re: time to add NET_ASSERT_WLOCKED()
On Wed, Apr 28, 2021 at 01:44:42AM +0200, Alexandr Nedvedicky wrote:
> Hello,
>
> with moving towards NET_RLOCK...() shall we add an explicit
> assert to state caller owns netlock exclusively?  I propose
> to introduce NET_ASSERT_WLOCKED()
>
> 	NET_ASSERT_WLOCKED()

I had exactly the same diff.  OK bluhm@

> 8<---8<---8<--8<
> diff --git a/sys/sys/systm.h b/sys/sys/systm.h
> index a26d7f98f21..2dc9d3274fa 100644
> --- a/sys/sys/systm.h
> +++ b/sys/sys/systm.h
> @@ -361,9 +361,17 @@ do {							\
>  		splassert_fail(RW_READ, _s, __func__);		\
>  } while (0)
>  
> +#define	NET_ASSERT_WLOCKED()					\
> +do {								\
> +	int _s = rw_status(&netlock);				\
> +	if ((splassert_ctl > 0) && (_s != RW_WRITE))		\
> +		splassert_fail(RW_WRITE, _s, __func__);		\
> +} while (0)
> +
>  #else /* DIAGNOSTIC */
>  #define	NET_ASSERT_UNLOCKED()	do {} while (0)
>  #define	NET_ASSERT_LOCKED()	do {} while (0)
> +#define	NET_ASSERT_WLOCKED()	do {} while (0)
>  #endif /* !DIAGNOSTIC */
>  
>  __returns_twice int	setjmp(label_t *);
Re: [External] : arp locking
Hello, On Wed, Apr 28, 2021 at 01:49:27AM +0200, Alexander Bluhm wrote: > On Wed, Apr 28, 2021 at 12:46:31AM +0200, Alexandr Nedvedicky wrote: > > looks good in general. I have just two nits/questions. > > mvs@ has asked the same questions. > > > > struct llinfo_arp { > > > - LIST_ENTRY(llinfo_arp) la_list; > > > - struct rtentry *la_rt; /* backpointer to rtentry */ > > > - struct mbuf_queuela_mq; /* packet hold queue */ > > > + LIST_ENTRY(llinfo_arp) la_list; /* [mN] global arp_list */ > > > + struct rtentry *la_rt; /* [I] backpointer to rtentry */ > > > + struct mbuf_queuela_mq; /* [I] packet hold queue */ > > ^^^ > > I think la_mq is not immutable. packets > > get inserted and removed. so the queue changes. it does not seem > > to be immutable. > > Yes, I was confused. I will remove this [I]. thanks > > > > @@ -144,6 +155,8 @@ arp_rtrequest(struct ifnet *ifp, int req > > > struct sockaddr *gate = rt->rt_gateway; > > > struct llinfo_arp *la = (struct llinfo_arp *)rt->rt_llinfo; > > > > > > + NET_ASSERT_LOCKED(); > > > + > > > > is there any case, where arp_rtrequest() is being called as a reader > > on netlock? Looks like arp_rtrequest() is always being called as > > a writer on netlock. > > My multiple nettaskq diff to run forwarding in parallel will change > that to shared lock. > > > I'm not sure we need to grab arp_mtx if we own netlock exclusively here. > > > > Also we should think of introducin NET_ASSERT_WLOCKED() I think. > > I had actually implement it. It proved that this path will be > called shared. > > > > @@ -194,13 +207,17 @@ arp_rtrequest(struct ifnet *ifp, int req > > > rt->rt_flags |= RTF_LLINFO; > > > if ((rt->rt_flags & RTF_LOCAL) == 0) > > > rt->rt_expire = getuptime(); > > > + mtx_enter(&arp_mtx); > > > LIST_INSERT_HEAD(&arp_list, la, la_list); > > > + mtx_leave(&arp_mtx); > > > break; > > > > > > case RTM_DELETE: > > > if (la == NULL) > > > break; > > > + mtx_enter(&arp_mtx); > > > LIST_REMOVE(la, la_list); > > > + mtx_leave(&arp_mtx); > > > rt->rt_llinfo = NULL; > > > rt->rt_flags &= ~RTF_LLINFO; > > > atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > > > > > These mutexes are needed if we want run IP output on multiple CPU. > The quick workaround was to put kernel lock around arpresolve(). > But small mutex results in less contension. > thanks for clarification. I have no further objections/comments. I'm OK with your diff with fixed comment at la_mq llinfo_arp member. regards sashan
Re: [External] : arp locking
On Wed, Apr 28, 2021 at 12:46:31AM +0200, Alexandr Nedvedicky wrote: > looks good in general. I have just two nits/questions. mvs@ has asked the same questions. > > struct llinfo_arp { > > - LIST_ENTRY(llinfo_arp) la_list; > > - struct rtentry *la_rt; /* backpointer to rtentry */ > > - struct mbuf_queuela_mq; /* packet hold queue */ > > + LIST_ENTRY(llinfo_arp) la_list; /* [mN] global arp_list */ > > + struct rtentry *la_rt; /* [I] backpointer to rtentry */ > > + struct mbuf_queuela_mq; /* [I] packet hold queue */ > ^^^ > I think la_mq is not immutable. packets > get inserted and removed. so the queue changes. it does not seem > to be immutable. Yes, I was confused. I will remove this [I]. > > @@ -144,6 +155,8 @@ arp_rtrequest(struct ifnet *ifp, int req > > struct sockaddr *gate = rt->rt_gateway; > > struct llinfo_arp *la = (struct llinfo_arp *)rt->rt_llinfo; > > > > + NET_ASSERT_LOCKED(); > > + > > is there any case, where arp_rtrequest() is being called as a reader > on netlock? Looks like arp_rtrequest() is always being called as > a writer on netlock. My multiple nettaskq diff to run forwarding in parallel will change that to shared lock. > I'm not sure we need to grab arp_mtx if we own netlock exclusively here. > > Also we should think of introducin NET_ASSERT_WLOCKED() I think. I had actually implement it. It proved that this path will be called shared. > > @@ -194,13 +207,17 @@ arp_rtrequest(struct ifnet *ifp, int req > > rt->rt_flags |= RTF_LLINFO; > > if ((rt->rt_flags & RTF_LOCAL) == 0) > > rt->rt_expire = getuptime(); > > + mtx_enter(&arp_mtx); > > LIST_INSERT_HEAD(&arp_list, la, la_list); > > + mtx_leave(&arp_mtx); > > break; > > > > case RTM_DELETE: > > if (la == NULL) > > break; > > + mtx_enter(&arp_mtx); > > LIST_REMOVE(la, la_list); > > + mtx_leave(&arp_mtx); > > rt->rt_llinfo = NULL; > > rt->rt_flags &= ~RTF_LLINFO; > > atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > > These mutexes are needed if we want run IP output on multiple CPU. The quick workaround was to put kernel lock around arpresolve(). But small mutex results in less contension. bluhm
time to add NET_ASSERT_WLOCKED()
Hello,

with moving towards NET_RLOCK...() shall we add an explicit
assert to state caller owns netlock exclusively?  I propose
to introduce NET_ASSERT_WLOCKED()

	NET_ASSERT_WLOCKED()

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sys/sys/systm.h b/sys/sys/systm.h
index a26d7f98f21..2dc9d3274fa 100644
--- a/sys/sys/systm.h
+++ b/sys/sys/systm.h
@@ -361,9 +361,17 @@ do {							\
 		splassert_fail(RW_READ, _s, __func__);		\
 } while (0)
 
+#define	NET_ASSERT_WLOCKED()					\
+do {								\
+	int _s = rw_status(&netlock);				\
+	if ((splassert_ctl > 0) && (_s != RW_WRITE))		\
+		splassert_fail(RW_WRITE, _s, __func__);		\
+} while (0)
+
 #else /* DIAGNOSTIC */
 #define	NET_ASSERT_UNLOCKED()	do {} while (0)
 #define	NET_ASSERT_LOCKED()	do {} while (0)
+#define	NET_ASSERT_WLOCKED()	do {} while (0)
 #endif /* !DIAGNOSTIC */
 
 __returns_twice int	setjmp(label_t *);
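[For reference, a minimal usage sketch with a made-up caller: code that relies
on the net lock being held exclusively would assert it the same way existing
NET_ASSERT_LOCKED() call sites do.  The function name below is hypothetical and
not part of the diff.]

	/* Hypothetical call site, for illustration only. */
	void
	example_modify_netstate(void)
	{
		/* splassert_fail() fires if netlock is not held RW_WRITE */
		NET_ASSERT_WLOCKED();

		/* ... touch state that needs the exclusive net lock ... */
	}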
Re: enable dt(4)
On Mon, Apr 26, 2021 at 05:13:55PM +0200, Sebastien Marie wrote:
> > I can't vouch that it builds for all architectures... Did anyone do
> > that? Number 1 rule: don't break Theo's build.
>
> One test would be to build on i386 (with full release process): we are
> near the limit currently, so it could overflow the available disk
> space very easily.

I did a release build on i386.

> Regarding usefulness on archs, dt(4) has proper frames-skip support
> for amd64, prowerpc64 and sparc64 only (others archs would default to
> "don't skip frames in stack traces").

On i386 dt(4) does not strip the stack trace.  But it builds and the
device works.  The output is useful, the upper layers of the graph are
redundant.
http://bluhm.genua.de/files/kstack-i386.svg

Better commit the architectures where it works instead of waiting
until someone fixes alpha, loongson, ...

ok?

bluhm

Index: conf/GENERIC
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/conf/GENERIC,v
retrieving revision 1.276
diff -u -p -r1.276 GENERIC
--- conf/GENERIC	22 Apr 2021 10:23:07 -0000	1.276
+++ conf/GENERIC	26 Apr 2021 13:37:19 -0000
@@ -82,7 +82,6 @@ pseudo-device msts	1	# MSTS line discipl
 pseudo-device endrun	1	# EndRun line discipline
 pseudo-device vnd	4	# vnode disk devices
 pseudo-device ksyms	1	# kernel symbols device
-#pseudo-device dt		# Dynamic Tracer
 
 # clonable devices
 pseudo-device bpfilter		# packet filter
Index: arch/i386/conf/GENERIC
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/arch/i386/conf/GENERIC,v
retrieving revision 1.855
diff -u -p -r1.855 GENERIC
--- arch/i386/conf/GENERIC	4 Feb 2021 16:25:39 -0000	1.855
+++ arch/i386/conf/GENERIC	26 Apr 2021 13:35:33 -0000
@@ -763,6 +763,7 @@ owctr* at onewire?	# Counter device
 pseudo-device	pctr	1
 pseudo-device	nvram	1
 pseudo-device	hotplug	1	# devices hot plugging
+pseudo-device	dt
 
 # mouse & keyboard multiplexor pseudo-devices
 pseudo-device	wsmux	2
Index: arch/amd64/conf/GENERIC
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/arch/amd64/conf/GENERIC,v
retrieving revision 1.497
diff -u -p -r1.497 GENERIC
--- arch/amd64/conf/GENERIC	4 Feb 2021 16:25:38 -0000	1.497
+++ arch/amd64/conf/GENERIC	26 Apr 2021 13:36:10 -0000
@@ -685,6 +685,7 @@ owctr* at onewire?	# Counter device
 pseudo-device	pctr	1
 pseudo-device	nvram	1
 pseudo-device	hotplug	1	# devices hot plugging
+pseudo-device	dt
 
 # mouse & keyboard multiplexor pseudo-devices
 pseudo-device	wsmux	2
Index: arch/arm64/conf/GENERIC
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/arch/arm64/conf/GENERIC,v
retrieving revision 1.194
diff -u -p -r1.194 GENERIC
--- arch/arm64/conf/GENERIC	7 Apr 2021 17:12:22 -0000	1.194
+++ arch/arm64/conf/GENERIC	26 Apr 2021 13:36:22 -0000
@@ -490,6 +490,7 @@ owctr* at onewire?	# Counter device
 # Pseudo-Devices
 pseudo-device	openprom
 pseudo-device	hotplug	1	# devices hot plugging
+pseudo-device	dt
 
 # mouse & keyboard multiplexor pseudo-devices
 pseudo-device	wsmux	2
Index: arch/powerpc64/conf/GENERIC
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/arch/powerpc64/conf/GENERIC,v
retrieving revision 1.25
diff -u -p -r1.25 GENERIC
--- arch/powerpc64/conf/GENERIC	4 Feb 2021 16:25:39 -0000	1.25
+++ arch/powerpc64/conf/GENERIC	26 Apr 2021 13:37:00 -0000
@@ -186,4 +186,5 @@ brgphy* at mii?	# Broadcom Gigabit PH
 
 # Pseudo-Devices
 pseudo-device	openprom
+pseudo-device	dt
 pseudo-device	wsmux	2
Re: arp locking
On Tue, Apr 27, 2021 at 07:27:43PM +0200, Alexander Bluhm wrote: > Hi, > > I would like to document the locking mechanism of the global variables > in ARP. > > The global arp_list is protected by net lock. This is not sufficent > when we switch to shared netlock. So I added a mutex for insertion > and removal if netlock is not exclusive. > > ok? > > bluhm > Hi. I doubt `la_mq' should be marked as [I]. This is not true. It's mutable but through api, so related locking description should be placed where `mbuf_queue' structure is defined. Just leave `la_mq' without any marks as we do in other similar places. Also do you want to run arp_rtrequest() with shared lock in future? I'm asking because we run in with exclusive lock held. > Index: netinet/if_ether.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.246 > diff -u -p -r1.246 if_ether.c > --- netinet/if_ether.c26 Apr 2021 07:55:16 - 1.246 > +++ netinet/if_ether.c27 Apr 2021 17:23:17 - > @@ -65,10 +65,19 @@ > #include > #endif > > +/* > + * Locks used to protect struct members in this file: > + * a atomic operations > + * I immutable after creation > + * K kernel lock > + * m arp mutex, needed when net lock is shared > + * N net lock > + */ > + > struct llinfo_arp { > - LIST_ENTRY(llinfo_arp) la_list; > - struct rtentry *la_rt; /* backpointer to rtentry */ > - struct mbuf_queuela_mq; /* packet hold queue */ > + LIST_ENTRY(llinfo_arp) la_list; /* [mN] global arp_list */ > + struct rtentry *la_rt; /* [I] backpointer to rtentry */ > + struct mbuf_queuela_mq; /* [I] packet hold queue */ > time_t la_refreshed; /* when was refresh sent */ > int la_asked; /* number of queries sent */ > }; > @@ -76,9 +85,9 @@ struct llinfo_arp { > #define LA_HOLD_TOTAL 100 > > /* timer values */ > -int arpt_prune = (5 * 60); /* walk list every 5 minutes */ > -int arpt_keep = (20 * 60); /* once resolved, cache for 20 minutes */ > -int arpt_down = 20; /* once declared down, don't send for 20 secs */ > +int arpt_prune = (5 * 60); /* [I] walk list every 5 minutes */ > +int arpt_keep = (20 * 60); /* [a] once resolved, cache for 20 minutes */ > +int arpt_down = 20; /* [a] once declared down, don't send for 20 secs */ > > struct mbuf *arppullup(struct mbuf *m); > void arpinvalidate(struct rtentry *); > @@ -92,11 +101,12 @@ void arpreply(struct ifnet *, struct mbu > unsigned int); > > struct niqueue arpinq = NIQUEUE_INITIALIZER(50, NETISR_ARP); > +struct mutex arp_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); > > -LIST_HEAD(, llinfo_arp) arp_list; > -struct pool arp_pool; /* pool for llinfo_arp structures */ > -int arp_maxtries = 5; > -int la_hold_total; > +LIST_HEAD(, llinfo_arp) arp_list; /* [Nm] list of all llinfo_arp structures > */ > +struct pool arp_pool; /* [I] pool for llinfo_arp structures */ > +int arp_maxtries = 5; /* [I] arp requests before set to rejected */ > +int la_hold_total; /* [a] packets currently in the arp queue */ > > #ifdef NFSCLIENT > /* revarp state */ > @@ -117,6 +127,7 @@ arptimer(void *arg) > > NET_LOCK(); > timeout_add_sec(to, arpt_prune); > + /* Net lock is exclusive, no arp mutex needed for arp_list here. 
*/ > LIST_FOREACH_SAFE(la, &arp_list, la_list, nla) { > struct rtentry *rt = la->la_rt; > > @@ -144,6 +155,8 @@ arp_rtrequest(struct ifnet *ifp, int req > struct sockaddr *gate = rt->rt_gateway; > struct llinfo_arp *la = (struct llinfo_arp *)rt->rt_llinfo; > > + NET_ASSERT_LOCKED(); > + > if (ISSET(rt->rt_flags, > RTF_GATEWAY|RTF_BROADCAST|RTF_MULTICAST|RTF_MPLS)) > return; > @@ -194,13 +207,17 @@ arp_rtrequest(struct ifnet *ifp, int req > rt->rt_flags |= RTF_LLINFO; > if ((rt->rt_flags & RTF_LOCAL) == 0) > rt->rt_expire = getuptime(); > + mtx_enter(&arp_mtx); > LIST_INSERT_HEAD(&arp_list, la, la_list); > + mtx_leave(&arp_mtx); > break; > > case RTM_DELETE: > if (la == NULL) > break; > + mtx_enter(&arp_mtx); > LIST_REMOVE(la, la_list); > + mtx_leave(&arp_mtx); > rt->rt_llinfo = NULL; > rt->rt_flags &= ~RTF_LLINFO; > atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); >
Re: [External] : arp locking
Hello, looks good in general. I have just two nits/questions. > ok? > > bluhm > > Index: netinet/if_ether.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.246 > diff -u -p -r1.246 if_ether.c > --- netinet/if_ether.c26 Apr 2021 07:55:16 - 1.246 > +++ netinet/if_ether.c27 Apr 2021 17:23:17 - > @@ -65,10 +65,19 @@ > #include > #endif > > +/* > + * Locks used to protect struct members in this file: > + * a atomic operations > + * I immutable after creation > + * K kernel lock > + * m arp mutex, needed when net lock is shared > + * N net lock > + */ > + > struct llinfo_arp { > - LIST_ENTRY(llinfo_arp) la_list; > - struct rtentry *la_rt; /* backpointer to rtentry */ > - struct mbuf_queuela_mq; /* packet hold queue */ > + LIST_ENTRY(llinfo_arp) la_list; /* [mN] global arp_list */ > + struct rtentry *la_rt; /* [I] backpointer to rtentry */ > + struct mbuf_queuela_mq; /* [I] packet hold queue */ ^^^ I think la_mq is not immutable. packets get inserted and removed. so the queue changes. it does not seem to be immutable. > time_t la_refreshed; /* when was refresh sent */ > int la_asked; /* number of queries sent */ > }; > @@ -144,6 +155,8 @@ arp_rtrequest(struct ifnet *ifp, int req > struct sockaddr *gate = rt->rt_gateway; > struct llinfo_arp *la = (struct llinfo_arp *)rt->rt_llinfo; > > + NET_ASSERT_LOCKED(); > + is there any case, where arp_rtrequest() is being called as a reader on netlock? Looks like arp_rtrequest() is always being called as a writer on netlock. I'm not sure we need to grab arp_mtx if we own netlock exclusively here. Also we should think of introducin NET_ASSERT_WLOCKED() I think. > if (ISSET(rt->rt_flags, > RTF_GATEWAY|RTF_BROADCAST|RTF_MULTICAST|RTF_MPLS)) > return; > @@ -194,13 +207,17 @@ arp_rtrequest(struct ifnet *ifp, int req > rt->rt_flags |= RTF_LLINFO; > if ((rt->rt_flags & RTF_LOCAL) == 0) > rt->rt_expire = getuptime(); > + mtx_enter(&arp_mtx); > LIST_INSERT_HEAD(&arp_list, la, la_list); > + mtx_leave(&arp_mtx); > break; > > case RTM_DELETE: > if (la == NULL) > break; > + mtx_enter(&arp_mtx); > LIST_REMOVE(la, la_list); > + mtx_leave(&arp_mtx); > rt->rt_llinfo = NULL; > rt->rt_flags &= ~RTF_LLINFO; > atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > thanks and regards sashan
iscsictl.8: Update man page to reflect new behavior of reload command
Hi all,

Attached is a diff which updates iscsictl.8 to reflect the recent
change in behavior of iscsictl's reload command.

Thanks,
Ash

diff --git a/usr.sbin/iscsictl/iscsictl.8 b/usr.sbin/iscsictl/iscsictl.8
index 5886a0f8f1b..1d27978eac5 100644
--- a/usr.sbin/iscsictl/iscsictl.8
+++ b/usr.sbin/iscsictl/iscsictl.8
@@ -47,7 +47,9 @@ to communicate with
 The following commands are available:
 .Bl -tag -width Ds
 .It Cm reload
-Reload the configuration file.
+Reload the configuration file and wait to return until iscsid reports all
+connections have completed (successfully or otherwise), or for up to 10
+seconds.
 .It Cm show Op Cm summary
 Show a list of all configured sessions.
 .It Cm show Cm vscsi stats
Re: [External] : arp mbuf delist
Hello, On Tue, Apr 27, 2021 at 11:45:19PM +0200, Alexander Bluhm wrote: > On Sun, Apr 25, 2021 at 06:08:17PM +1000, Jonathan Matthew wrote: > > On Sun, Apr 25, 2021 at 09:44:16AM +0200, Alexandr Nedvedicky wrote: > > This already exists, it's called mq_delist() > > Here is the diff with mq_delist(). > > > We'd need some other way to do the 'mbuf is back in queue' detection, > > but I agree this seems like a sensible thing to do. > > This part is ugly as before. I have put a printf in the discard > case it and I never hit it. Any idea why we need this and how to > fix it? > > ok? > > bluhm > > Index: netinet/if_ether.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v > retrieving revision 1.246 > diff -u -p -r1.246 if_ether.c > --- netinet/if_ether.c26 Apr 2021 07:55:16 - 1.246 > +++ netinet/if_ether.c27 Apr 2021 21:02:27 - > @@ -594,7 +594,9 @@ arpcache(struct ifnet *ifp, struct ether > struct in_addr *spa = (struct in_addr *)ea->arp_spa; > char addr[INET_ADDRSTRLEN]; > struct ifnet *rifp; > + struct mbuf_list ml; > struct mbuf *m; > + unsigned int len; > int changed = 0; > > KERNEL_ASSERT_LOCKED(); > @@ -666,21 +668,17 @@ arpcache(struct ifnet *ifp, struct ether > > la->la_asked = 0; > la->la_refreshed = 0; > - while ((m = mq_dequeue(&la->la_mq)) != NULL) { > - unsigned int len; > - > - atomic_dec_int(&la_hold_total); > - len = mq_len(&la->la_mq); > - > + mq_delist(&la->la_mq, &ml); > + len = ml_len(&ml); > + while ((m = ml_dequeue(&ml)) != NULL) { > ifp->if_output(ifp, m, rt_key(rt), rt); > - > - /* XXXSMP we discard if other CPU enqueues */ > - if (mq_len(&la->la_mq) > len) { > - /* mbuf is back in queue. Discard. */ > - atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq)); > - break; > - } > } > + /* XXXSMP we discard if other CPU enqueues */ > + if (mq_len(&la->la_mq) > 0) { > + /* mbuf is back in queue. Discard. */ > + atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq)); > + } else > + atomic_sub_int(&la_hold_total, len); > I think /* XXXSMP ... */ code-chunk is no longer needed. I think your change, which introduces ml_dequeue(), turns that into an useless artifact. If I understand old code right this extra check tries to make sure the while() loop completes. I assume ifp->if_output() calls ether_output(), which in turn calls ether_encap() -> ether_resolve(). If ether_resolve() fails to find ARP entry (for whatever reason, perhaps explicit 'arp -d -a' or interface down), the packet is inserted back to la->la_mq. This would keep the while loop spinning. using ml_dequeue() solves that problem, because the while() loop now consumes a private list of packets, which can not grow. It's granted the loop will finish. So I would just delete those lines. thanks and regards sashan > return (0); > }
arp mbuf delist
On Sun, Apr 25, 2021 at 06:08:17PM +1000, Jonathan Matthew wrote:
> On Sun, Apr 25, 2021 at 09:44:16AM +0200, Alexandr Nedvedicky wrote:
> This already exists, it's called mq_delist()

Here is the diff with mq_delist().

> We'd need some other way to do the 'mbuf is back in queue' detection,
> but I agree this seems like a sensible thing to do.

This part is ugly as before.  I have put a printf in the discard
case and I never hit it.  Any idea why we need this and how to
fix it?

ok?

bluhm

Index: netinet/if_ether.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.246
diff -u -p -r1.246 if_ether.c
--- netinet/if_ether.c	26 Apr 2021 07:55:16 -0000	1.246
+++ netinet/if_ether.c	27 Apr 2021 21:02:27 -0000
@@ -594,7 +594,9 @@ arpcache(struct ifnet *ifp, struct ether
 	struct in_addr *spa = (struct in_addr *)ea->arp_spa;
 	char addr[INET_ADDRSTRLEN];
 	struct ifnet *rifp;
+	struct mbuf_list ml;
 	struct mbuf *m;
+	unsigned int len;
 	int changed = 0;
 
 	KERNEL_ASSERT_LOCKED();
@@ -666,21 +668,17 @@ arpcache(struct ifnet *ifp, struct ether
 
 	la->la_asked = 0;
 	la->la_refreshed = 0;
-	while ((m = mq_dequeue(&la->la_mq)) != NULL) {
-		unsigned int len;
-
-		atomic_dec_int(&la_hold_total);
-		len = mq_len(&la->la_mq);
-
+	mq_delist(&la->la_mq, &ml);
+	len = ml_len(&ml);
+	while ((m = ml_dequeue(&ml)) != NULL) {
 		ifp->if_output(ifp, m, rt_key(rt), rt);
-
-		/* XXXSMP we discard if other CPU enqueues */
-		if (mq_len(&la->la_mq) > len) {
-			/* mbuf is back in queue. Discard. */
-			atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq));
-			break;
-		}
 	}
+	/* XXXSMP we discard if other CPU enqueues */
+	if (mq_len(&la->la_mq) > 0) {
+		/* mbuf is back in queue. Discard. */
+		atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq));
+	} else
+		atomic_sub_int(&la_hold_total, len);
 
 	return (0);
 }
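[Condensed, the new pattern in the diff above is: take a snapshot of the queue,
then drain the snapshot.  The names are taken from the diff; this sketch leaves
out the XXXSMP discard branch that the thread is debating.  The point is that a
concurrent enqueue lands in la->la_mq and can no longer extend the drain loop.]

	struct mbuf_list ml;
	struct mbuf *m;
	unsigned int len;

	mq_delist(&la->la_mq, &ml);		/* atomically move all held mbufs */
	len = ml_len(&ml);			/* remember how many were taken */
	while ((m = ml_dequeue(&ml)) != NULL)	/* private list, loop is bounded */
		ifp->if_output(ifp, m, rt_key(rt), rt);
	atomic_sub_int(&la_hold_total, len);	/* account for the drained mbufs */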
Re: Respect X-Forwarded-Proto in httpd
On Tue Apr 27, 2021 at 4:55 PM EDT, Stuart Henderson wrote:
> It's the other way round, this (or proto= in the newer standardised
> Forwarded header) would be set by a reverse proxy to indicate the
> protocol that the client request came in on so that something running on
> the webserver could react accordingly (either in URL construction or to
> issue a redirect to https if wanted).

Yeah - out in the wild, reverse proxies rewrite Location themselves (in
potentially more complicated ways, because whatever they're proxying
might think it's http://web-internal-123:).  E.g., nginx proxy_pass does
this, and it can be manually changed with proxy_redirect.

What I don't think nginx does is rewrite its own Location headers based
on incoming forwarding headers, like this diff does.  If I generate a
301 from nginx over nc with X-Forwarded-Proto: https, the Location it
gives me starts with http://.  It can probably be *configured* to do it
somehow, it's a behemoth.
Re: Respect X-Forwarded-Proto in httpd
On 2021/04/27 16:23, Raymond E. Pasco wrote:
> On Tue Apr 27, 2021 at 3:40 PM EDT, Stuart Henderson wrote:
> > How does this work with other web servers? For example, I don't see the
> > string X-Forwarded-Proto in nginx or Apache httpd (and the use of other
> > X-Forwarded headers in them are only for adding to requests when running
> > as a proxy itself, or picking up the client IP from headers rather than
> > TCP).
>
> I think this header is usually set by administrators in a configuration
> file, at least for nginx; something that aims to be more out-of-the-box
> like Caddy sets it automatically.
>
> My understanding is that common reverse proxies can be told to rewrite
> the Location header in this and similar cases, but I haven't looked
> closely at it.

It's the other way round, this (or proto= in the newer standardised
Forwarded header) would be set by a reverse proxy to indicate the
protocol that the client request came in on so that something running on
the webserver could react accordingly (either in URL construction or to
issue a redirect to https if wanted).
Re: Respect X-Forwarded-Proto in httpd
On Tue Apr 27, 2021 at 3:40 PM EDT, Stuart Henderson wrote:
> How does this work with other web servers? For example, I don't see the
> string X-Forwarded-Proto in nginx or Apache httpd (and the use of other
> X-Forwarded headers in them are only for adding to requests when running
> as a proxy itself, or picking up the client IP from headers rather than
> TCP).

I think this header is usually set by administrators in a configuration
file, at least for nginx; something that aims to be more out-of-the-box
like Caddy sets it automatically.

My understanding is that common reverse proxies can be told to rewrite
the Location header in this and similar cases, but I haven't looked
closely at it.
Re: Respect X-Forwarded-Proto in httpd
Stuart Henderson writes:
> On 2021/04/27 10:40, Vincent Lee wrote:
>>
>> Hi all,
>>
>> Consider the following situation. A reverse proxy which performs TLS
>> termination is deployed in front of httpd, which listens unencrypted on
>> localhost.
>>
>> There is code in httpd to handle the case where a directory is accessed,
>> but the path named does not end with a slash. In this case, httpd
>> issues a 301 redirect to the path with a slash appended.
>> The logic here sets the protocol of the redirect path based on
>> whether the httpd virtual server is configured with TLS, but this isn't
>> enough because it will cause redirects to plain http when there is a
>> reverse proxy in front of httpd that performs TLS termination.
>> This will either cause an extra redirect round trip to get back to HTTPS,
>> or break the site if it's not publicly served on plain HTTP.
>>
>> Instead, we should be reading X-Forwarded-Proto, which most reverse proxies
>> add to inform the backing server what the original protocol was.
>> (relayd doesn't expose this to my knowledge, but I will look into doing so)
>>
>> The below attached diff does this for httpd. This is my first diff to the
>> project, so please give feedback on whether I've done everything right.
>
> How does this work with other web servers? For example, I don't see the
> string X-Forwarded-Proto in nginx or Apache httpd (and the use of other
> X-Forwarded headers in them are only for adding to requests when running
> as a proxy itself, or picking up the client IP from headers rather than
> TCP).

I concur in what you're seeing in other codebases.  On the proxy side,
what I'm finding is any support for X-Forwarded-Proto is runtime config
and effectively manual intervention by whomever is configuring the
proxy.

For Nginx, it seems they have an "autoindex" feature, but don't account
for X-Forwarded-Proto like in the proposed diff.  What nginx *does*
support is disabling absolute redirects, changing the way the Location
header gets constructed in the redirect.  (See [1].)

Maybe that makes more sense as a feature in httpd(8)?  It should achieve
the correct end result.  (Maybe it's already possible?  Need to check.)

-dv

[1] https://nginx.org/en/docs/http/ngx_http_core_module.html#absolute_redirect
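[A rough illustration of the relative-redirect alternative mentioned above.
This is a hypothetical sketch only, not an existing httpd(8) feature: if httpd
emitted a relative Location at the spot Vincent's diff touches, it would not
need to guess the client-facing scheme or host at all.  The surrounding names
(url_encode, desc, newpath) are taken from the quoted server_file.c code.]

	/*
	 * Hypothetical sketch, not in httpd(8): emit a relative
	 * Location ("/dir" becomes "/dir/") so no scheme or host has
	 * to be guessed behind a TLS-terminating reverse proxy.
	 */
	if ((encodedpath = url_encode(desc->http_path)) == NULL)
		return (500);
	if (asprintf(&newpath, "%s/", encodedpath) == -1) {
		free(encodedpath);
		return (500);
	}
	free(encodedpath);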
Re: sysctl net.inet.ip.arpqueued read only
On Tue, Apr 27, 2021 at 09:12:56PM +0200, Alexander Bluhm wrote: > On Tue, Apr 27, 2021 at 12:18:02PM -0600, Theo de Raadt wrote: > > Actually, your variation seems pretty good. Is there any reason to not > > use this type of define? > > This would look like this. > > I think sysctl_int() and sysctl_rdint() should be the primitive > functions. This brings us back the 4.4BSD implementation. Then > sysctl_int_bounded() builds the magic on top like sysctl_int_lower() > does it. sysctl_bounded_arr() is a wrapper around it. > > I just added a few defines that my simple grep found. We could > search for more. > > bluhm > This looks much better, I like the direction. > Index: kern/kern_sysctl.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v > retrieving revision 1.390 > diff -u -p -r1.390 kern_sysctl.c > --- kern/kern_sysctl.c23 Apr 2021 07:21:02 - 1.390 > +++ kern/kern_sysctl.c27 Apr 2021 19:09:18 - > @@ -818,7 +818,8 @@ debug_sysctl(int *name, u_int namelen, v > * Reads, or writes that lower the value > */ > int > -sysctl_int_lower(void *oldp, size_t *oldlenp, void *newp, size_t newlen, int > *valp) > +sysctl_int_lower(void *oldp, size_t *oldlenp, void *newp, size_t newlen, > +int *valp) > { > unsigned int oval = *valp, val = *valp; > int error; > @@ -841,35 +842,40 @@ sysctl_int_lower(void *oldp, size_t *old > int > sysctl_int(void *oldp, size_t *oldlenp, void *newp, size_t newlen, int *valp) > { > - return (sysctl_int_bounded(oldp, oldlenp, newp, newlen, valp, 0, 0)); > -} > - > -int > -sysctl_int_bounded(void *oldp, size_t *oldlenp, void *newp, size_t newlen, > -int *valp, int minimum, int maximum) > -{ > int error = 0; > - int val; > > if (oldp && *oldlenp < sizeof(int)) > return (ENOMEM); > if (newp && newlen != sizeof(int)) > return (EINVAL); > *oldlenp = sizeof(int); > - val = *valp; > if (oldp) > - error = copyout(&val, oldp, sizeof(int)); > + error = copyout(valp, oldp, sizeof(int)); > if (error == 0 && newp) > - error = copyin(newp, &val, sizeof(int)); > - if (error) > - return (error); > - if (minimum == maximum || (minimum <= val && val <= maximum)) > - *valp = val; > - else > - error = EINVAL; > + error = copyin(newp, valp, sizeof(int)); > return (error); > } > > +int > +sysctl_int_bounded(void *oldp, size_t *oldlenp, void *newp, size_t newlen, > +int *valp, int minimum, int maximum) > +{ > + int val = *valp; > + int error; > + > + /* read only */ > + if (newp == NULL || minimum > maximum) > + return (sysctl_rdint(oldp, oldlenp, newp, *valp)); > + > + if ((error = sysctl_int(oldp, oldlenp, newp, newlen, &val))) > + return (error); > + /* bounded and not within limits */ > + if (minimum < maximum && (val < minimum || maximum < val)) > + return (EINVAL); > + *valp = val; > + return (0); > +} > + > /* > * As above, but read-only. 
> */ > @@ -901,14 +907,8 @@ sysctl_bounded_arr(const struct sysctl_b > return (ENOTDIR); > for (i = 0; i < valplen; ++i) { > if (valpp[i].mib == name[0]) { > - if (valpp[i].minimum <= valpp[i].maximum) { > - return (sysctl_int_bounded(oldp, oldlenp, newp, > - newlen, valpp[i].var, valpp[i].minimum, > - valpp[i].maximum)); > - } else { > - return (sysctl_rdint(oldp, oldlenp, newp, > - *valpp[i].var)); > - } > + return (sysctl_int_bounded(oldp, oldlenp, newp, newlen, > + valpp[i].var, valpp[i].minimum, valpp[i].maximum)); > } > } > return (EOPNOTSUPP); > Index: kern/kern_tc.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_tc.c,v > retrieving revision 1.71 > diff -u -p -r1.71 kern_tc.c > --- kern/kern_tc.c23 Feb 2021 04:44:31 - 1.71 > +++ kern/kern_tc.c27 Apr 2021 18:57:29 - > @@ -829,7 +829,7 @@ inittimecounter(void) > } > > const struct sysctl_bounded_args tc_vars[] = { > - { KERN_TIMECOUNTER_TICK, &tc_tick, 1, 0 }, > + { KERN_TIMECOUNTER_TICK, &tc_tick, SYSCTL_INT_READONLY }, > { KERN_TIMECOUNTER_TIMESTEPWARNINGS, ×tepwarnings, 0, 1 }, > }; > > Index: kern/sysv_sem.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/sysv_sem.c,v > retrieving revision 1.60 > diff -u -p -r1.60 sysv_sem.c > --- kern/sysv_sem.c 17 Nov 2020 03:23:54 - 1.60 > +++ kern/sysv_se
Re: Respect X-Forwarded-Proto in httpd
On 2021/04/27 10:40, Vincent Lee wrote:
>
> Hi all,
>
> Consider the following situation. A reverse proxy which performs TLS
> termination is deployed in front of httpd, which listens unencrypted on
> localhost.
>
> There is code in httpd to handle the case where a directory is accessed,
> but the path named does not end with a slash. In this case, httpd
> issues a 301 redirect to the path with a slash appended.
> The logic here sets the protocol of the redirect path based on
> whether the httpd virtual server is configured with TLS, but this isn't
> enough because it will cause redirects to plain http when there is a
> reverse proxy in front of httpd that performs TLS termination.
> This will either cause an extra redirect round trip to get back to HTTPS,
> or break the site if it's not publicly served on plain HTTP.
>
> Instead, we should be reading X-Forwarded-Proto, which most reverse proxies
> add to inform the backing server what the original protocol was.
> (relayd doesn't expose this to my knowledge, but I will look into doing so)
>
> The below attached diff does this for httpd. This is my first diff to the
> project, so please give feedback on whether I've done everything right.

How does this work with other web servers? For example, I don't see the
string X-Forwarded-Proto in nginx or Apache httpd (and the use of other
X-Forwarded headers in them are only for adding to requests when running
as a proxy itself, or picking up the client IP from headers rather than
TCP).
Re: Respect X-Forwarded-Proto in httpd
Vincent Lee writes: > Hi all, > > Consider the following situation. A reverse proxy which performs TLS > termination is deployed in front of httpd, which listens unencrypted on > localhost. > > There is code in httpd to handle the case where a directory is accessed, > but the path named does not end with a slash. In this case, httpd > issues a 301 redirect to the path with a slash appended. > The logic here sets the protocol of the redirect path based on > whether the httpd virtual server is configured with TLS, but this isn't > enough because it will cause redirects to plain http when there is a > reverse proxy in front of httpd that performs TLS termination. > This will either cause an extra redirect round trip to get back to HTTPS, > or break the site if it's not publicly served on plain HTTP. > > Instead, we should be reading X-Forwarded-Proto, which most reverse proxies > add to inform the backing server what the original protocol was. > (relayd doesn't expose this to my knowledge, but I will look into doing so) Makes sense to me. > > The below attached diff does this for httpd. This is my first diff to the > project, so please give feedback on whether I've done everything right. > Thanks! Some comments below in the diff. > Thanks! > > PS: the X-Forwarded-* headers seem to now have a standardized form > called "Forwarded", but I opted for the X- version since the rest > of httpd only supports the X- headers, to keep consistency. I only looked briefly, but per relayd.conf.5 it seems relayd(8) doesn't automatically apply these headers but one can do so via configuration of an http relay. Can you try testing your suggested change with that configuration? To me it makes sense that we make sure relayd(8) and httpd(8) work well together. > > Index: server_file.c > === > RCS file: /cvs/src/usr.sbin/httpd/server_file.c,v > retrieving revision 1.69 > diff -u -p -u -p -r1.69 server_file.c > --- server_file.c 16 Mar 2021 06:44:14 - 1.69 > +++ server_file.c 27 Apr 2021 15:25:51 - > @@ -83,11 +83,28 @@ server_file_access(struct httpd *env, st > > /* Redirect to path with trailing "/" */ > if (path[strlen(path) - 1] != '/') { > + const char *proto; *proto should be declared at the top of the function to keep with the style. > + > + proto = srv_conf->flags & SRVFLAG_TLS > + ? "https" : "http"; > + > + /* If request specifies a X-Forwarded-Proto header, > + * respect that. */ See style(8) for the multi-line comment format. > + key.kv_key = "X-Forwarded-Proto"; > + if ((r = kv_find(&desc->http_headers, &key)) != NULL && > + r->kv_value != NULL) { > + if (strcmp("https", r->kv_value) == 0) > + proto = "https"; > + else if (strcmp("http", r->kv_value) == 0) > + proto = "http"; > + else > + return (500); > + } > + > if ((encodedpath = url_encode(desc->http_path)) == NULL) > return (500); > - if (asprintf(&newpath, "http%s://%s%s/", > - srv_conf->flags & SRVFLAG_TLS ? "s" : "", > - desc->http_host, encodedpath) == -1) { > + if (asprintf(&newpath, "%s://%s%s/", > + proto, desc->http_host, encodedpath) == -1) { > free(encodedpath); > return (500); > } -dv
Re: sysctl net.inet.ip.arpqueued read only
On Tue, Apr 27, 2021 at 12:18:02PM -0600, Theo de Raadt wrote: > Actually, your variation seems pretty good. Is there any reason to not > use this type of define? This would look like this. I think sysctl_int() and sysctl_rdint() should be the primitive functions. This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic on top like sysctl_int_lower() does it. sysctl_bounded_arr() is a wrapper around it. I just added a few defines that my simple grep found. We could search for more. bluhm Index: kern/kern_sysctl.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_sysctl.c,v retrieving revision 1.390 diff -u -p -r1.390 kern_sysctl.c --- kern/kern_sysctl.c 23 Apr 2021 07:21:02 - 1.390 +++ kern/kern_sysctl.c 27 Apr 2021 19:09:18 - @@ -818,7 +818,8 @@ debug_sysctl(int *name, u_int namelen, v * Reads, or writes that lower the value */ int -sysctl_int_lower(void *oldp, size_t *oldlenp, void *newp, size_t newlen, int *valp) +sysctl_int_lower(void *oldp, size_t *oldlenp, void *newp, size_t newlen, +int *valp) { unsigned int oval = *valp, val = *valp; int error; @@ -841,35 +842,40 @@ sysctl_int_lower(void *oldp, size_t *old int sysctl_int(void *oldp, size_t *oldlenp, void *newp, size_t newlen, int *valp) { - return (sysctl_int_bounded(oldp, oldlenp, newp, newlen, valp, 0, 0)); -} - -int -sysctl_int_bounded(void *oldp, size_t *oldlenp, void *newp, size_t newlen, -int *valp, int minimum, int maximum) -{ int error = 0; - int val; if (oldp && *oldlenp < sizeof(int)) return (ENOMEM); if (newp && newlen != sizeof(int)) return (EINVAL); *oldlenp = sizeof(int); - val = *valp; if (oldp) - error = copyout(&val, oldp, sizeof(int)); + error = copyout(valp, oldp, sizeof(int)); if (error == 0 && newp) - error = copyin(newp, &val, sizeof(int)); - if (error) - return (error); - if (minimum == maximum || (minimum <= val && val <= maximum)) - *valp = val; - else - error = EINVAL; + error = copyin(newp, valp, sizeof(int)); return (error); } +int +sysctl_int_bounded(void *oldp, size_t *oldlenp, void *newp, size_t newlen, +int *valp, int minimum, int maximum) +{ + int val = *valp; + int error; + + /* read only */ + if (newp == NULL || minimum > maximum) + return (sysctl_rdint(oldp, oldlenp, newp, *valp)); + + if ((error = sysctl_int(oldp, oldlenp, newp, newlen, &val))) + return (error); + /* bounded and not within limits */ + if (minimum < maximum && (val < minimum || maximum < val)) + return (EINVAL); + *valp = val; + return (0); +} + /* * As above, but read-only. 
*/ @@ -901,14 +907,8 @@ sysctl_bounded_arr(const struct sysctl_b return (ENOTDIR); for (i = 0; i < valplen; ++i) { if (valpp[i].mib == name[0]) { - if (valpp[i].minimum <= valpp[i].maximum) { - return (sysctl_int_bounded(oldp, oldlenp, newp, - newlen, valpp[i].var, valpp[i].minimum, - valpp[i].maximum)); - } else { - return (sysctl_rdint(oldp, oldlenp, newp, - *valpp[i].var)); - } + return (sysctl_int_bounded(oldp, oldlenp, newp, newlen, + valpp[i].var, valpp[i].minimum, valpp[i].maximum)); } } return (EOPNOTSUPP); Index: kern/kern_tc.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_tc.c,v retrieving revision 1.71 diff -u -p -r1.71 kern_tc.c --- kern/kern_tc.c 23 Feb 2021 04:44:31 - 1.71 +++ kern/kern_tc.c 27 Apr 2021 18:57:29 - @@ -829,7 +829,7 @@ inittimecounter(void) } const struct sysctl_bounded_args tc_vars[] = { - { KERN_TIMECOUNTER_TICK, &tc_tick, 1, 0 }, + { KERN_TIMECOUNTER_TICK, &tc_tick, SYSCTL_INT_READONLY }, { KERN_TIMECOUNTER_TIMESTEPWARNINGS, ×tepwarnings, 0, 1 }, }; Index: kern/sysv_sem.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/sysv_sem.c,v retrieving revision 1.60 diff -u -p -r1.60 sysv_sem.c --- kern/sysv_sem.c 17 Nov 2020 03:23:54 - 1.60 +++ kern/sysv_sem.c 27 Apr 2021 18:57:33 - @@ -860,10 +860,10 @@ sema_reallocate(int val) } const struct sysctl_bounded_args sysvsem_vars[] = { - { KERN_SEMINFO_SEMUME, &seminfo.semume, 1, 0 }, - { KERN_SEMINFO_SEMUSZ, &seminfo.semusz, 1, 0 }, - { KERN_SEMINF
[PATCH] df(1): add -m, -g to display megabyte and gigabyte blocks
For parity with NetBSD/FreeBSD df(1). I made them incompatible with -P, as -h is; NetBSD also only supports 512/1024 with -P and FreeBSD uses -P to force 512. --- bin/df/df.1 | 30 +- bin/df/df.c | 33 - 2 files changed, 53 insertions(+), 10 deletions(-) diff --git a/bin/df/df.1 b/bin/df/df.1 index 89dd51aac8d..f4a29f67d99 100644 --- a/bin/df/df.1 +++ b/bin/df/df.1 @@ -64,6 +64,14 @@ options, below). .Pp The options are as follows: .Bl -tag -width Ds +.It Fl g +By default, all sizes are reported in 512-byte block counts. +The +.Fl g +option causes the numbers to be reported in gigabyte counts. +This option is incompatible with the +.Fl P +option. .It Fl h "Human-readable" output. Use unit suffixes: Byte, Kilobyte, Megabyte, @@ -88,6 +96,14 @@ Display statistics only about mounted file systems with the flag set. If a non-local file system is given as an argument, a warning is issued and no information is given on that file system. +.It Fl m +By default, all sizes are reported in 512-byte block counts. +The +.Fl m +option causes the numbers to be reported in megabyte counts. +This option is incompatible with the +.Fl P +option. .It Fl n Print out the previously obtained statistics from the file systems. This option should be used if it is possible that one or more @@ -117,9 +133,11 @@ that file system. .Pp It is not an error to specify more than one of the mutually exclusive options -.Fl h +.Fl h , +.Fl k , +.Fl m , and -.Fl k . +.Fl g . Where more than one of these options is specified, the last option given overrides the others. .Sh ENVIRONMENT @@ -128,9 +146,11 @@ the last option given overrides the others. If the environment variable .Ev BLOCKSIZE is set, and the -.Fl h +.Fl h , +.Fl k , +.Fl m , or -.Fl k +.Fl g options are not specified, the block counts will be displayed in units of that size block. .El @@ -159,7 +179,7 @@ utility is compliant with the specification. 
.Pp The flags -.Op Fl hiln , +.Op Fl ghilmn , as well as the .Ev BLOCKSIZE environment variable, diff --git a/bin/df/df.c b/bin/df/df.c index fd51f906f89..b33370792d7 100644 --- a/bin/df/df.c +++ b/bin/df/df.c @@ -63,7 +63,7 @@ extern int e2fs_df(int, char *, struct statfs *); extern int ffs_df(int, char *, struct statfs *); static int raw_df(char *, struct statfs *); -inthflag, iflag, kflag, lflag, nflag, Pflag; +intgflag, hflag, iflag, kflag, lflag, mflag, nflag, Pflag; char **typelist = NULL; int @@ -79,11 +79,19 @@ main(int argc, char *argv[]) if (pledge("stdio rpath", NULL) == -1) err(1, "pledge"); - while ((ch = getopt(argc, argv, "hiklnPt:")) != -1) + while ((ch = getopt(argc, argv, "ghiklmnPt:")) != -1) switch (ch) { + case 'g': + gflag = 1; + hflag = 0; + kflag = 0; + mflag = 0; + break; case 'h': hflag = 1; kflag = 0; + mflag = 0; + gflag = 0; break; case 'i': iflag = 1; @@ -91,10 +99,17 @@ main(int argc, char *argv[]) case 'k': kflag = 1; hflag = 0; + mflag = 0; + gflag = 0; break; case 'l': lflag = 1; break; + case 'm': + mflag = 1; + hflag = 0; + kflag = 0; + break; case 'n': nflag = 1; break; @@ -112,8 +127,8 @@ main(int argc, char *argv[]) argc -= optind; argv += optind; - if ((iflag || hflag) && Pflag) { - warnx("-h and -i are incompatible with -P"); + if ((gflag || iflag || hflag || mflag) && Pflag) { + warnx("-g, -h, -i, and -m are incompatible with -P"); usage(); } @@ -357,6 +372,14 @@ bsdprint(struct statfs *mntbuf, long mntsize, int maxwidth) blocksize = 1024; header = "1K-blocks"; headerlen = strlen(header); + } else if (mflag) { + blocksize = 1048576; + header = "1M-blocks"; + headerlen = strlen(header); + } else if (gflag) { + blocksize = 1073741824; + header = "1G-blocks"; + headerlen = strlen(header); } else header = getbsize(&headerlen, &blocksize); (void)printf("%-*.*s %s Used Avail Capacity", @@ -455,7 +478,7 @@ st
Re: sysctl net.inet.ip.arpqueued read only
Alexander Bluhm wrote:
> On Tue, Apr 27, 2021 at 10:37:25AM -0600, Theo de Raadt wrote:
> > > Would 0, 0 min, max be a simple and obvious way to say "read only" ?
> >
> > That is not as terrible.
>
> Yes.  But it has another undocumented side effect.  I think
> sysctl_bounded_arr() inherits the minimum == maximum check from
> sysctl_int_bounded() which means unbounded.  The latter is only
> used in sysctl_int().
>
> > Or maybe a define like:
> > + #define SYSCTL_BOUNDED_ARR_READONLY 0,0
> > Which can then be used in-place without confusion.
>
> I thought of that, too.  Then reading the code would be easy.  To
> cover all features we would need
>
> #define SYSCTL_INT_UNBOUNDED	0,0
> #define SYSCTL_INT_READONLY	1,0
>
> Both work for sysctl_bounded_arr(),
> sysctl_int_bounded(SYSCTL_INT_READONLY) could be implemented.
>
> Maybe that is too complex.

Actually, your variation seems pretty good.  Is there any reason to not
use this type of define?
Re: sysctl net.inet.ip.arpqueued read only
On Tue, Apr 27, 2021 at 10:37:25AM -0600, Theo de Raadt wrote:
> > Would 0, 0 min, max be a simple and obvious way to say "read only" ?
>
> That is not as terrible.

Yes.  But it has another undocumented side effect.  I think
sysctl_bounded_arr() inherits the minimum == maximum check from
sysctl_int_bounded() which means unbounded.  The latter is only
used in sysctl_int().

> Or maybe a define like:
> + #define SYSCTL_BOUNDED_ARR_READONLY 0,0
> Which can then be used in-place without confusion.

I thought of that, too.  Then reading the code would be easy.  To
cover all features we would need

#define SYSCTL_INT_UNBOUNDED	0,0
#define SYSCTL_INT_READONLY	1,0

Both work for sysctl_bounded_arr(),
sysctl_int_bounded(SYSCTL_INT_READONLY) could be implemented.

Maybe that is too complex.

> But whatever we do, it must be documented clearly.

Yes, please.

bluhm
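[For comparison, this is how a table entry reads once the two defines above are
in place.  The tc_vars entry is taken from the kern_tc.c hunk in bluhm@'s diff
elsewhere in this thread; only the defines themselves are restated here.]

	#define SYSCTL_INT_UNBOUNDED	0,0	/* no min/max check */
	#define SYSCTL_INT_READONLY	1,0	/* writes always rejected */

	const struct sysctl_bounded_args tc_vars[] = {
		{ KERN_TIMECOUNTER_TICK, &tc_tick, SYSCTL_INT_READONLY },
		{ KERN_TIMECOUNTER_TIMESTEPWARNINGS, &timestepwarnings, 0, 1 },
	};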
Respect X-Forwarded-Proto in httpd
Hi all,

Consider the following situation.  A reverse proxy which performs TLS
termination is deployed in front of httpd, which listens unencrypted on
localhost.

There is code in httpd to handle the case where a directory is accessed,
but the path named does not end with a slash.  In this case, httpd
issues a 301 redirect to the path with a slash appended.
The logic here sets the protocol of the redirect path based on
whether the httpd virtual server is configured with TLS, but this isn't
enough because it will cause redirects to plain http when there is a
reverse proxy in front of httpd that performs TLS termination.
This will either cause an extra redirect round trip to get back to HTTPS,
or break the site if it's not publicly served on plain HTTP.

Instead, we should be reading X-Forwarded-Proto, which most reverse proxies
add to inform the backing server what the original protocol was.
(relayd doesn't expose this to my knowledge, but I will look into doing so)

The below attached diff does this for httpd.  This is my first diff to the
project, so please give feedback on whether I've done everything right.

Thanks!

PS: the X-Forwarded-* headers seem to now have a standardized form
called "Forwarded", but I opted for the X- version since the rest
of httpd only supports the X- headers, to keep consistency.

Index: server_file.c
===================================================================
RCS file: /cvs/src/usr.sbin/httpd/server_file.c,v
retrieving revision 1.69
diff -u -p -u -p -r1.69 server_file.c
--- server_file.c	16 Mar 2021 06:44:14 -0000	1.69
+++ server_file.c	27 Apr 2021 15:25:51 -0000
@@ -83,11 +83,28 @@ server_file_access(struct httpd *env, st
 
 	/* Redirect to path with trailing "/" */
 	if (path[strlen(path) - 1] != '/') {
+		const char *proto;
+
+		proto = srv_conf->flags & SRVFLAG_TLS
+		    ? "https" : "http";
+
+		/* If request specifies a X-Forwarded-Proto header,
+		 * respect that. */
+		key.kv_key = "X-Forwarded-Proto";
+		if ((r = kv_find(&desc->http_headers, &key)) != NULL &&
+		    r->kv_value != NULL) {
+			if (strcmp("https", r->kv_value) == 0)
+				proto = "https";
+			else if (strcmp("http", r->kv_value) == 0)
+				proto = "http";
+			else
+				return (500);
+		}
+
 		if ((encodedpath = url_encode(desc->http_path)) == NULL)
 			return (500);
-		if (asprintf(&newpath, "http%s://%s%s/",
-		    srv_conf->flags & SRVFLAG_TLS ? "s" : "",
-		    desc->http_host, encodedpath) == -1) {
+		if (asprintf(&newpath, "%s://%s%s/",
+		    proto, desc->http_host, encodedpath) == -1) {
 			free(encodedpath);
 			return (500);
 		}
arp locking
Hi, I would like to document the locking mechanism of the global variables in ARP. The global arp_list is protected by net lock. This is not sufficent when we switch to shared netlock. So I added a mutex for insertion and removal if netlock is not exclusive. ok? bluhm Index: netinet/if_ether.c === RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/if_ether.c,v retrieving revision 1.246 diff -u -p -r1.246 if_ether.c --- netinet/if_ether.c 26 Apr 2021 07:55:16 - 1.246 +++ netinet/if_ether.c 27 Apr 2021 17:23:17 - @@ -65,10 +65,19 @@ #include #endif +/* + * Locks used to protect struct members in this file: + * a atomic operations + * I immutable after creation + * K kernel lock + * m arp mutex, needed when net lock is shared + * N net lock + */ + struct llinfo_arp { - LIST_ENTRY(llinfo_arp) la_list; - struct rtentry *la_rt; /* backpointer to rtentry */ - struct mbuf_queuela_mq; /* packet hold queue */ + LIST_ENTRY(llinfo_arp) la_list; /* [mN] global arp_list */ + struct rtentry *la_rt; /* [I] backpointer to rtentry */ + struct mbuf_queuela_mq; /* [I] packet hold queue */ time_t la_refreshed; /* when was refresh sent */ int la_asked; /* number of queries sent */ }; @@ -76,9 +85,9 @@ struct llinfo_arp { #define LA_HOLD_TOTAL 100 /* timer values */ -intarpt_prune = (5 * 60); /* walk list every 5 minutes */ -intarpt_keep = (20 * 60); /* once resolved, cache for 20 minutes */ -intarpt_down = 20; /* once declared down, don't send for 20 secs */ +intarpt_prune = (5 * 60); /* [I] walk list every 5 minutes */ +intarpt_keep = (20 * 60); /* [a] once resolved, cache for 20 minutes */ +intarpt_down = 20; /* [a] once declared down, don't send for 20 secs */ struct mbuf *arppullup(struct mbuf *m); void arpinvalidate(struct rtentry *); @@ -92,11 +101,12 @@ void arpreply(struct ifnet *, struct mbu unsigned int); struct niqueue arpinq = NIQUEUE_INITIALIZER(50, NETISR_ARP); +struct mutex arp_mtx = MUTEX_INITIALIZER(IPL_SOFTNET); -LIST_HEAD(, llinfo_arp) arp_list; -struct pool arp_pool; /* pool for llinfo_arp structures */ -intarp_maxtries = 5; -intla_hold_total; +LIST_HEAD(, llinfo_arp) arp_list; /* [Nm] list of all llinfo_arp structures */ +struct pool arp_pool; /* [I] pool for llinfo_arp structures */ +intarp_maxtries = 5; /* [I] arp requests before set to rejected */ +intla_hold_total; /* [a] packets currently in the arp queue */ #ifdef NFSCLIENT /* revarp state */ @@ -117,6 +127,7 @@ arptimer(void *arg) NET_LOCK(); timeout_add_sec(to, arpt_prune); + /* Net lock is exclusive, no arp mutex needed for arp_list here. */ LIST_FOREACH_SAFE(la, &arp_list, la_list, nla) { struct rtentry *rt = la->la_rt; @@ -144,6 +155,8 @@ arp_rtrequest(struct ifnet *ifp, int req struct sockaddr *gate = rt->rt_gateway; struct llinfo_arp *la = (struct llinfo_arp *)rt->rt_llinfo; + NET_ASSERT_LOCKED(); + if (ISSET(rt->rt_flags, RTF_GATEWAY|RTF_BROADCAST|RTF_MULTICAST|RTF_MPLS)) return; @@ -194,13 +207,17 @@ arp_rtrequest(struct ifnet *ifp, int req rt->rt_flags |= RTF_LLINFO; if ((rt->rt_flags & RTF_LOCAL) == 0) rt->rt_expire = getuptime(); + mtx_enter(&arp_mtx); LIST_INSERT_HEAD(&arp_list, la, la_list); + mtx_leave(&arp_mtx); break; case RTM_DELETE: if (la == NULL) break; + mtx_enter(&arp_mtx); LIST_REMOVE(la, la_list); + mtx_leave(&arp_mtx); rt->rt_llinfo = NULL; rt->rt_flags &= ~RTF_LLINFO; atomic_sub_int(&la_hold_total, mq_purge(&la->la_mq));
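A rough sketch of the intended access pattern, not part of the diff: writers always take arp_mtx, while a walk may still skip it as long as the net lock is held exclusively, as arptimer() does above. The shared-netlock walker below is an assumption about how future code could look once the net lock is no longer exclusive on this path.

/* Removal mirrors the RTM_DELETE hunk: always serialized by arp_mtx. */
void
arp_list_remove(struct llinfo_arp *la)
{
	mtx_enter(&arp_mtx);
	LIST_REMOVE(la, la_list);
	mtx_leave(&arp_mtx);
}

/* Hypothetical walker running with only a shared net lock. */
void
arp_list_walk_shared(void (*visit)(struct llinfo_arp *))
{
	struct llinfo_arp *la;

	/* A shared net lock is not enough on its own... */
	NET_ASSERT_LOCKED();
	/* ...so order the walk against concurrent insert/remove. */
	mtx_enter(&arp_mtx);
	LIST_FOREACH(la, &arp_list, la_list)
		visit(la);
	mtx_leave(&arp_mtx);
}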
Re: sysctl net.inet.ip.arpqueued read only
Chris Cappuccio wrote: > Vitaliy Makkoveev [m...@openbsd.org] wrote: > > > > > > > On 26 Apr 2021, at 01:43, Theo de Raadt wrote: > > > > > > I am not a fan of this strange behaviour, where the min+max values > > > have additional behaviours. It is too surprising, and surprising > > > often turns into error-prone. > > > > Agreed. Also according sysctl_int_bounded() code this behaviour looks > > like non obvious side effect. > > > > Would 0, 0 min, max be a simple and obvious way to say "read only" ? That is not as terrible. Or maybe a define like: + #define SYSCTL_BOUNDED_ARR_READONLY 0,0 int sysctl_bounded_arr(const struct sysctl_bounded_args *, u_int, int *, u_int, void *, size_t *, void *, size_t); Which can then be used in-place without confusion. But whatever we do, it must be documented clearly.
Re: sysctl net.inet.ip.arpqueued read only
Vitaliy Makkoveev [m...@openbsd.org] wrote: > > > > On 26 Apr 2021, at 01:43, Theo de Raadt wrote: > > > > I am not a fan of this strange behaviour, where the min+max values > > have additional behaviours. It is too surprising, and surprising > > often turns into error-prone. > > Agreed. Also according sysctl_int_bounded() code this behaviour looks > like non obvious side effect. > Would 0, 0 min, max be a simple and obvious way to say "read only" ?
softraid(4) crypto/raid1c refactoring
Refactor softraid crypto code to allow use of a discipline-specific data structure for RAID1C volumes, as requested by jsing@ during review of my initial RAID1C patch. This patch should effectively be a cosmetic change. The whole point of this patch is to allow the data structure changes made here in softraidvar.h. It works in my testing but more testing would be very welcome, given that this touches the disk I/O path of machines using softraid crypto. ok? diff d5cea33885618bf7e096efc36fffbecc9b13ed21 fcfd3d6487eca3ffe994e6a46e37df66b37e80d1 blob - d143a2398b5aba3070dc25bafd02e38f8f10a0c1 blob + 48bd613f374bc6ad085ae5d9e0eeef50a6367941 --- sys/dev/softraid_crypto.c +++ sys/dev/softraid_crypto.c @@ -54,30 +54,44 @@ #include -struct sr_crypto_wu *sr_crypto_prepare(struct sr_workunit *, int); -intsr_crypto_create_keys(struct sr_discipline *); -intsr_crypto_get_kdf(struct bioc_createraid *, - struct sr_discipline *); +struct sr_crypto_wu *sr_crypto_prepare(struct sr_workunit *, + struct sr_crypto *, int); intsr_crypto_decrypt(u_char *, u_char *, u_char *, size_t, int); intsr_crypto_encrypt(u_char *, u_char *, u_char *, size_t, int); -intsr_crypto_decrypt_key(struct sr_discipline *); +intsr_crypto_decrypt_key(struct sr_discipline *, + struct sr_crypto *); intsr_crypto_change_maskkey(struct sr_discipline *, - struct sr_crypto_kdfinfo *, struct sr_crypto_kdfinfo *); + struct sr_crypto *, struct sr_crypto_kdfinfo *, + struct sr_crypto_kdfinfo *); intsr_crypto_create(struct sr_discipline *, struct bioc_createraid *, int, int64_t); intsr_crypto_meta_create(struct sr_discipline *, - struct bioc_createraid *); + struct sr_crypto *, struct bioc_createraid *); +intsr_crypto_set_key(struct sr_discipline *, struct sr_crypto *, + struct bioc_createraid *, int, void *); intsr_crypto_assemble(struct sr_discipline *, struct bioc_createraid *, int, void *); +void sr_crypto_free_sessions(struct sr_discipline *, + struct sr_crypto *); +intsr_crypto_alloc_resources_internal(struct sr_discipline *, + struct sr_crypto *); intsr_crypto_alloc_resources(struct sr_discipline *); +void sr_crypto_free_resources_internal(struct sr_discipline *, + struct sr_crypto *); void sr_crypto_free_resources(struct sr_discipline *); +intsr_crypto_ioctl_internal(struct sr_discipline *, + struct sr_crypto *, struct bioc_discipline *); intsr_crypto_ioctl(struct sr_discipline *, struct bioc_discipline *); +intsr_crypto_meta_opt_handler_internal(struct sr_discipline *, + struct sr_crypto *, struct sr_meta_opt_hdr *); intsr_crypto_meta_opt_handler(struct sr_discipline *, struct sr_meta_opt_hdr *); void sr_crypto_write(struct cryptop *); intsr_crypto_rw(struct sr_workunit *); intsr_crypto_dev_rw(struct sr_workunit *, struct sr_crypto_wu *); +void sr_crypto_done_internal(struct sr_workunit *, + struct sr_crypto *); void sr_crypto_done(struct sr_workunit *); void sr_crypto_read(struct cryptop *); void sr_crypto_calculate_check_hmac_sha1(u_int8_t *, int, @@ -85,7 +99,7 @@ void sr_crypto_calculate_check_hmac_sha1(u_int8_t *, void sr_crypto_hotplug(struct sr_discipline *, struct disk *, int); #ifdef SR_DEBUG0 -voidsr_crypto_dumpkeys(struct sr_discipline *); +voidsr_crypto_dumpkeys(struct sr_crypto *); #endif /* Discipline initialisation. 
*/ @@ -129,7 +143,7 @@ sr_crypto_create(struct sr_discipline *sd, struct bioc sd->sd_meta->ssdi.ssd_size = coerced_size; - rv = sr_crypto_meta_create(sd, bc); + rv = sr_crypto_meta_create(sd, &sd->mds.mdd_crypto, bc); if (rv) return (rv); @@ -138,7 +152,8 @@ sr_crypto_create(struct sr_discipline *sd, struct bioc } int -sr_crypto_meta_create(struct sr_discipline *sd, struct bioc_createraid *bc) +sr_crypto_meta_create(struct sr_discipline *sd, struct sr_crypto *mdd_crypto, +struct bioc_createraid *bc) { struct sr_meta_opt_item *omi; int rv = EINVAL; @@ -158,19 +173,19 @@ sr_crypto_meta_create(struct sr_discipline *sd, struct omi->omi_som->som_type = SR_OPT_CRYPTO; omi->omi_som->som_length = sizeof(struct sr_meta_crypto); SLIST_INSERT_HEAD(&sd->sd_meta_opt, omi, omi_link); - sd->mds.mdd_crypto.scr_meta = (struct sr_meta_crypto *)omi->omi_som; + mdd_crypto->scr_meta = (struct sr_m
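The point of threading a struct sr_crypto * through the helpers is easiest to see from the call sites. A rough sketch follows: the CRYPTO side matches the signatures in the diff, while the RAID1C side and its member names are assumptions about the follow-up work, not code from this patch.

/* CRYPTO discipline: state still lives in the mdd union, as before. */
int
sr_crypto_ioctl(struct sr_discipline *sd, struct bioc_discipline *bd)
{
	return sr_crypto_ioctl_internal(sd, &sd->mds.mdd_crypto, bd);
}

/* Hypothetical RAID1C discipline embedding its own struct sr_crypto. */
int
sr_raid1c_ioctl(struct sr_discipline *sd, struct bioc_discipline *bd)
{
	struct sr_raid1c *r1c = &sd->mds.mdd_raid1c;	/* assumed layout */

	return sr_crypto_ioctl_internal(sd, &r1c->r1c_crypto, bd);
}

Because the helpers no longer reach into sd->mds.mdd_crypto themselves, the same encryption code can serve a volume whose discipline-specific structure also carries RAID1 state.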
Re: Driver for Zynq-7000 system-level control registers
On Mon, Apr 26, 2021 at 05:25:18PM +0200, Mark Kettenis wrote: > > Date: Mon, 26 Apr 2021 14:19:38 + > > From: Visa Hankala > > > > The following diff adds a preliminary driver for the system-level > > control registers of Xilinx Zynq-7000. It enables system reset. It also > > adds clock bits for use with the SDIO and Gigabit Ethernet controllers. > > > > On some arm64 and armv7 platforms, there are separate drivers for clocks > > and resets. However, on Zynq-7000 it looks more natural to use a single > > driver. Below is an outline of the relevant part of the device tree: > > > > slcr: slcr@f800 { > > #address-cells = <1>; > > #size-cells = <1>; > > compatible = "xlnx,zynq-slcr", "syscon", > > "simple-mfd"; > > reg = <0xF800 0x1000>; > > ranges; > > clkc: clkc@100 { > > #clock-cells = <1>; > > compatible = "xlnx,ps7-clkc"; > > fclk-enable = <0>; > > clock-output-names = "armpll", "ddrpll", > > ...; > > reg = <0x100 0x100>; > > }; > > > > > > rstc: rstc@200 { > > compatible = "xlnx,zynq-reset"; > > reg = <0x200 0x48>; > > #reset-cells = <1>; > > syscon = <&slcr>; > > }; > > > > pinctrl0: pinctrl@700 { > > compatible = "xlnx,pinctrl-zynq"; > > reg = <0x700 0x200>; > > syscon = <&slcr>; > > }; > > }; > > > > OK? > > Hmm, I'm not sure. Your driver doesn't provide pinctrl support. I'm > not sure how much code you'd need for that, but if it is a significant > amount of code, having separate clock and pinctrl drivers would make > sense. I see, adding pinctrl logic would be easier if clocks and resets were handled in separate drivers. I have now reorganized the code to reflect this. Both the clock and reset drivers access the control registers by using common routines. I have put them in the reset driver file. However, if this looks too ugly, I can add them in a separate .c file. The use of the mutex might be overzealous. The main point is to prevent accidental interleaving when lifting the write protection for register update. OK? Index: share/man/man4/man4.armv7/Makefile === RCS file: src/share/man/man4/man4.armv7/Makefile,v retrieving revision 1.28 diff -u -p -r1.28 Makefile --- share/man/man4/man4.armv7/Makefile 10 Apr 2020 22:26:46 - 1.28 +++ share/man/man4/man4.armv7/Makefile 27 Apr 2021 12:42:20 - @@ -6,7 +6,7 @@ MAN=agtimer.4 amdisplay.4 ampintc.4 amp omap.4 omclock.4 omcm.4 omdog.4 omgpio.4 ommmc.4 omrng.4 omsysc.4 \ omwugen.4 prcm.4 \ sxie.4 sxiintc.4 \ - sxitimer.4 sxits.4 sysreg.4 + sxitimer.4 sxits.4 sysreg.4 zqclock.4 zqreset.4 MANSUBDIR=armv7 Index: share/man/man4/man4.armv7/zqclock.4 === RCS file: share/man/man4/man4.armv7/zqclock.4 diff -N share/man/man4/man4.armv7/zqclock.4 --- /dev/null 1 Jan 1970 00:00:00 - +++ share/man/man4/man4.armv7/zqclock.4 27 Apr 2021 12:42:20 - @@ -0,0 +1,37 @@ +.\"$OpenBSD$ +.\" +.\" Copyright (c) 2021 Visa Hankala +.\" +.\" Permission to use, copy, modify, and distribute this software for any +.\" purpose with or without fee is hereby granted, provided that the above +.\" copyright notice and this permission notice appear in all copies. +.\" +.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES +.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF +.\" MERCHANTABILITY AND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR +.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES +.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN +.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF +.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. +.\" +.Dd $Mdocdate$ +.Dt ZQCLOCK 4 +.Os +.Sh NAME +.Nm zqclock +.Nd Xilinx Zynq-7000 clock controller +.Sh SYNOPSIS +.Cd "zqclock* at fdt?" +.Sh DESCRIPTION +The +.Nm +driver controls the clock signals for the integrated components +of Zynq-7000 SoCs. +.Sh SEE ALSO +.Xr intro 4 , +.Xr zqreset 4 +.Sh HISTORY +The +.Nm +driver first appeared in +.Ox 7.0 . Index: share/man/man4/man4.armv7/zqreset.4 =
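To illustrate the interleaving the mutex is meant to prevent, here is a rough sketch of a serialized register update. The function and variable names are placeholders rather than the driver's, the IPL is a guess, and the lock/unlock keys are the values documented for the Zynq-7000 SLCR.

#define SLCR_LOCK	0x0004		/* write SLCR_LOCK_KEY to re-arm protection */
#define SLCR_UNLOCK	0x0008		/* write SLCR_UNLOCK_KEY to lift protection */
#define SLCR_LOCK_KEY	0x767b
#define SLCR_UNLOCK_KEY	0xdf0d

struct mutex zq_slcr_mtx = MUTEX_INITIALIZER(IPL_HIGH);

void
zq_slcr_write(bus_space_tag_t iot, bus_space_handle_t ioh, bus_size_t off,
    uint32_t val)
{
	/* Unlock, update and re-lock as one critical section, so two
	 * callers cannot interleave and leave the SLCR unlocked or see
	 * their update land while the registers are still protected. */
	mtx_enter(&zq_slcr_mtx);
	bus_space_write_4(iot, ioh, SLCR_UNLOCK, SLCR_UNLOCK_KEY);
	bus_space_write_4(iot, ioh, off, val);
	bus_space_write_4(iot, ioh, SLCR_LOCK, SLCR_LOCK_KEY);
	mtx_leave(&zq_slcr_mtx);
}

With both the clock and the reset driver calling into a routine like this, the shared registers stay consistent even though the two attach as separate devices.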
net80211 RA: gather more stats per probe attempt
This patch tweaks the heuristic RA is using to decide whether enough statistics have been gathered for a candidate Tx rate. The goal is to avoid Tx rate choices that might turn out to be too optimistic. In my testing RA now scales upwards a little bit more slowly while the distance towards the AP is decreasing. The difference is not huge and taking the extra time to gather more reliable stats should be worth it. In practice this only affects the case where we probe upwards. If the current Tx rate starts seeing loss we will still scale down very quickly. Tests on iwn, iwm, and athn would be appreciated. This patch is based on a larger collection of patches by Christian Ehrhardt. I have made stylistic tweaks for consistency. diff a9cfc08b1ea80b0c0dc35851c61760cf3d3657ba /usr/src blob - ec419b8e52930370c9fb4ced48fca3382243b9d6 file + sys/net80211/ieee80211_ra.c --- sys/net80211/ieee80211_ra.c +++ sys/net80211/ieee80211_ra.c @@ -41,8 +41,6 @@ void ieee80211_ra_probe_next_rateset(struct ieee80211_ struct ieee80211_node *, const struct ieee80211_ht_rateset *); intieee80211_ra_next_mcs(struct ieee80211_ra_node *, struct ieee80211_node *); -intieee80211_ra_probe_valid(struct ieee80211_ra_node *, - struct ieee80211_node *); void ieee80211_ra_probe_done(struct ieee80211_ra_node *); intieee80211_ra_intra_mode_ra_finished( struct ieee80211_ra_node *, struct ieee80211_node *); @@ -57,6 +55,7 @@ void ieee80211_ra_probe_next_rate(struct ieee80211_ra_ intieee80211_ra_valid_tx_mcs(struct ieee80211com *, int); uint32_t ieee80211_ra_valid_rates(struct ieee80211com *, struct ieee80211_node *); +intieee80211_ra_probe_valid(struct ieee80211_ra_goodput_stats *); /* We use fixed point arithmetic with 64 bit integers. */ #define RA_FP_SHIFT21 @@ -155,12 +154,6 @@ ieee80211_ra_get_txrate(int mcs, int sgi20) /* A rate's goodput has to be at least this much larger to be "better". */ #define IEEE80211_RA_RATE_THRESHOLD(RA_FP_1 / 64) /* ~ 0.015 */ -/* Number of (sub-)frames which render a probe valid. */ -#define IEEE80211_RA_MIN_PROBE_FRAMES 8 - -/* Number of Tx retries which, alternatively, render a probe valid. */ -#define IEEE80211_RA_MAX_PROBE_RETRIES 4 - int ieee80211_ra_next_lower_intra_rate(struct ieee80211_ra_node *rn, struct ieee80211_node *ni) @@ -345,13 +338,6 @@ ieee80211_ra_next_mcs(struct ieee80211_ra_node *rn, return next; } -int -ieee80211_ra_probe_valid(struct ieee80211_ra_node *rn, -struct ieee80211_node *ni) -{ - return rn->valid_probes & (1UL << ni->ni_txmcs); -} - void ieee80211_ra_probe_clear(struct ieee80211_ra_node *rn, struct ieee80211_node *ni) @@ -536,6 +522,21 @@ ieee80211_ra_valid_rates(struct ieee80211com *ic, stru return valid_mcs; } +int +ieee80211_ra_probe_valid(struct ieee80211_ra_goodput_stats *g) +{ + /* 128 packets make up a valid probe in any case. */ + if (g->nprobe_pkts >= 128) + return 1; + + /* 8 packets with > 75% loss make a valid probe, too. 
*/ + if (g->nprobe_pkts >= 8 && + g->nprobe_pkts - g->nprobe_fail < g->nprobe_pkts / 4) + return 1; + + return 0; +} + void ieee80211_ra_add_stats_ht(struct ieee80211_ra_node *rn, struct ieee80211com *ic, struct ieee80211_node *ni, @@ -562,11 +563,11 @@ ieee80211_ra_add_stats_ht(struct ieee80211_ra_node *rn g->nprobe_pkts += total; g->nprobe_fail += fail; - if (g->nprobe_pkts < IEEE80211_RA_MIN_PROBE_FRAMES && -g->nprobe_fail < IEEE80211_RA_MAX_PROBE_RETRIES) { + if (!ieee80211_ra_probe_valid(g)) { splx(s); return; } + rn->valid_probes |= 1U << mcs; if (g->nprobe_fail > g->nprobe_pkts) { DPRINTF(("%s fail %u > pkts %u\n", @@ -577,7 +578,6 @@ ieee80211_ra_add_stats_ht(struct ieee80211_ra_node *rn sfer = g->nprobe_fail << RA_FP_SHIFT; sfer /= g->nprobe_pkts; - rn->valid_probes |= 1U << mcs; g->nprobe_fail = 0; g->nprobe_pkts = 0; @@ -615,7 +615,7 @@ ieee80211_ra_choose(struct ieee80211_ra_node *rn, stru if (rn->probing) { /* Probe another rate or settle at the best rate. */ - if (!ieee80211_ra_probe_valid(rn, ni)) { + if (!(rn->valid_probes & (1UL << ni->ni_txmcs))) { splx(s); return; }
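Restated outside the driver with a couple of sample counters; the thresholds are taken verbatim from the diff, the numbers in main() are just examples.

#include <stdio.h>

struct stats { unsigned int nprobe_pkts, nprobe_fail; };

/* Same decision as ieee80211_ra_probe_valid() in the diff. */
static int
probe_valid(const struct stats *g)
{
	if (g->nprobe_pkts >= 128)
		return 1;
	if (g->nprobe_pkts >= 8 &&
	    g->nprobe_pkts - g->nprobe_fail < g->nprobe_pkts / 4)
		return 1;
	return 0;
}

int
main(void)
{
	struct stats a = { 8, 2 };	/* 25% loss after 8 frames: keep gathering */
	struct stats b = { 8, 7 };	/* heavy loss: 8 frames are already conclusive */
	struct stats c = { 130, 5 };	/* 128+ frames are always enough */

	printf("%d %d %d\n", probe_valid(&a), probe_valid(&b), probe_valid(&c));
	/* prints: 0 1 1 */
	return 0;
}

So a probe that is merely doing well must now accumulate far more frames before the rate is trusted, while a clearly failing probe is still abandoned after a handful of frames, which is the asymmetry described above.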
iwn/iwm/iwx: keep track of beacon parameter changes
Christian Ehrhardt reported an issue where changes in the ERP protection settings in beacons caused noticeable packet loss on iwm(4). I've found that there are a few parameters in beacons which can change at run-time but don't get updated in hardware, simply because the drivers do not implement the corresponding hooks which are provided by net80211. The patch below ensures that the following parameters will be kept up-to-date at run-time by iwn, iwm, and iwx: - HT protection settings (this was already implemented) - ERP (11g) protection setting - short slottime setting - short preamble setting - EDCA (QoS) parameters I am renaming ic_update_htprot() to ic_updateprot() since it now includes ERP. The node parameter of this function is always ic->ic_bss so I've dropped it. Tested on iwn 6250, iwm 8265, and iwx ax200. No regressions found. A few additional test reports would be nice. For most people this patch should simply not change anything. ERP protection changes should only occur when 802.11b devices start or stop using the access point's channel. diff refs/heads/master refs/heads/updateprot blob - 1d7c376ff8cdf661401c89e6f13e4b2a819d70c2 blob + c243e2932471c805ea6f6b846d103bb054becc36 --- sys/dev/pci/if_iwm.c +++ sys/dev/pci/if_iwm.c @@ -328,8 +328,10 @@ void iwm_init_channel_map(struct iwm_softc *, const ui const uint8_t *nvm_channels, int nchan); intiwm_mimo_enabled(struct iwm_softc *); void iwm_setup_ht_rates(struct iwm_softc *); -void iwm_htprot_task(void *); -void iwm_update_htprot(struct ieee80211com *, struct ieee80211_node *); +void iwm_mac_ctxt_task(void *); +void iwm_updateprot(struct ieee80211com *); +void iwm_updateslot(struct ieee80211com *); +void iwm_updateedca(struct ieee80211com *); void iwm_init_reorder_buffer(struct iwm_reorder_buffer *, uint16_t, uint16_t); void iwm_clear_reorder_buffer(struct iwm_softc *, struct iwm_rxba_data *); @@ -3170,7 +3172,7 @@ iwm_sta_rx_agg(struct iwm_softc *sc, struct ieee80211_ } void -iwm_htprot_task(void *arg) +iwm_mac_ctxt_task(void *arg) { struct iwm_softc *sc = arg; struct ieee80211com *ic = &sc->sc_ic; @@ -3183,30 +3185,42 @@ iwm_htprot_task(void *arg) return; } - /* This call updates HT protection based on in->in_ni.ni_htop1. */ err = iwm_mac_ctxt_cmd(sc, in, IWM_FW_CTXT_ACTION_MODIFY, 1); if (err) - printf("%s: could not change HT protection: error %d\n", - DEVNAME(sc), err); + printf("%s: failed to update MAC\n", DEVNAME(sc)); refcnt_rele_wake(&sc->task_refs); splx(s); } -/* - * This function is called by upper layer when HT protection settings in - * beacons have changed. 
- */ void -iwm_update_htprot(struct ieee80211com *ic, struct ieee80211_node *ni) +iwm_updateprot(struct ieee80211com *ic) { struct iwm_softc *sc = ic->ic_softc; - /* assumes that ni == ic->ic_bss */ - iwm_add_task(sc, systq, &sc->htprot_task); + if (ic->ic_state == IEEE80211_S_RUN) + iwm_add_task(sc, systq, &sc->mac_ctxt_task); } void +iwm_updateslot(struct ieee80211com *ic) +{ + struct iwm_softc *sc = ic->ic_softc; + + if (ic->ic_state == IEEE80211_S_RUN) + iwm_add_task(sc, systq, &sc->mac_ctxt_task); +} + +void +iwm_updateedca(struct ieee80211com *ic) +{ + struct iwm_softc *sc = ic->ic_softc; + + if (ic->ic_state == IEEE80211_S_RUN) + iwm_add_task(sc, systq, &sc->mac_ctxt_task); +} + +void iwm_ba_task(void *arg) { struct iwm_softc *sc = arg; @@ -8026,7 +8040,7 @@ iwm_newstate(struct ieee80211com *ic, enum ieee80211_s if (ic->ic_state == IEEE80211_S_RUN) { timeout_del(&sc->sc_calib_to); iwm_del_task(sc, systq, &sc->ba_task); - iwm_del_task(sc, systq, &sc->htprot_task); + iwm_del_task(sc, systq, &sc->mac_ctxt_task); for (i = 0; i < nitems(sc->sc_rxba_data); i++) { struct iwm_rxba_data *rxba = &sc->sc_rxba_data[i]; iwm_clear_reorder_buffer(sc, rxba); @@ -8808,7 +8822,7 @@ iwm_stop(struct ifnet *ifp) task_del(systq, &sc->init_task); iwm_del_task(sc, sc->sc_nswq, &sc->newstate_task); iwm_del_task(sc, systq, &sc->ba_task); - iwm_del_task(sc, systq, &sc->htprot_task); + iwm_del_task(sc, systq, &sc->mac_ctxt_task); KASSERT(sc->task_refs.refs >= 1); refcnt_finalize(&sc->task_refs, "iwmstop"); @@ -10250,7 +10264,7 @@ iwm_attach(struct device *parent, struct device *self, task_set(&sc->init_task, iwm_init_task, sc); task_set(&sc->newstate_task, iwm_newstate_task, sc); task_set(&sc->ba_task, iwm_ba_task, sc); - task_set(&sc->htprot_task, iwm_htprot_task, sc); + task_set(&sc->mac_ctxt_task, iwm_mac_ctxt_task, sc); ic->ic_node_alloc = iwm_node_alloc;
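The tail of iwm_attach() is cut off above. For completeness, wiring up the new hooks would look roughly like the sketch below; the helper function is purely illustrative, the exact placement in the real diff may differ, and ic_updateprot is the hook this patch renames from ic_update_htprot.

/* Illustrative only: in the patch these assignments sit in iwm_attach(). */
void
iwm_wire_update_hooks(struct ieee80211com *ic)
{
	/* All three parameter changes funnel into the same deferred
	 * MAC context update scheduled via iwm_add_task(). */
	ic->ic_updateprot = iwm_updateprot;
	ic->ic_updateslot = iwm_updateslot;
	ic->ic_updateedca = iwm_updateedca;
}

Since the hooks only schedule sc->mac_ctxt_task while in RUN state, repeated beacon parameter changes coalesce into a single IWM_FW_CTXT_ACTION_MODIFY command instead of issuing firmware commands from interrupt context.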