date:20231004





• Vitaliy Makkoveev [2023-10-05 01:10]:

On Thu, Oct 05, 2023 at 12:08:55AM +0200, Kirill Miazine wrote:

• Vitaliy Makkoveev [2023-10-05 00:02]:

On 5 Oct 2023, at 00:56, Kirill Miazine  wrote:

new diff doesn't prevent hang in test scenario either.



Which one?


I meant to say new diffS, as I had applied both... what I have now is this:



Understood.

The problem lays here:

ifq_start_task(void *p)
{
 struct ifqueue *ifq = p;
 struct ifnet *ifp = ifq->ifq_if;

 if (!ISSET(ifp->if_flags, IFF_RUNNING) ||
 ifq_empty(ifq) || ifq_is_oactive(ifq))
 return;

 ifp->if_qstart(ifq);
}

wg_qstart(struct ifqueue *ifq)
{
 /* [...] */
 while ((m = ifq_dequeue(ifq)) != NULL) {
 /* [...] */
}

wg_peer_destroy(struct wg_peer *peer)
{
 /* [...] */
 NET_LOCK();
 while (!ifq_empty(&sc->sc_if.if_snd)) {
 NET_UNLOCK();
 tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
 NET_LOCK();
 }
 NET_UNLOCK();
 /* [...] */
}

1. wg_output() placed some packets to sc->sc_if.if_snd and scheduled
ifq_start_task() to run.

2. You performed "ifconfig wg1 down", so wg_down() cleared IFF_RUNNING
flag.

3. ifq_start_task() started to run, IFF_RUNNING is not set, so
wg_qstart() will not be called as the ifq_dequeue(). Packets rests
within sc->sc_if.if_snd. The interface is down, so nothing would
schedule ifq_start_task() to run.

4. You performed "ifconfig wg1 destroy". The while(!ifq_empty()) loop is
infinite because nothing would empty sc->sc_if.if_snd at this point.

The unlocked !ISSET(ifp->if_flags, IFF_RUNNING), ifq_empty() and
ifq_is_oactive() are bad, but netlock dances provide caches
synchronisation.

I have no quick solution for this. Probably we should rethink
ifq_start_task().

This diff checks IFF_RUNNING flag within while (!ifq_empty()) loop of
wg_peer_destroy(). If the flag is not set queue will be purged and check
performed again. I intentionally keep netlock to prevent ifconfig
manipulations on the interface.


I confirm that just the diff below solved the issue


Index: sys/net/if_wg.c
===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ sys/net/if_wg.c 4 Oct 2023 23:09:14 -
@@ -509,6 +509,13 @@ wg_peer_destroy(struct wg_peer *peer)
  
  	NET_LOCK();

while (!ifq_empty(&sc->sc_if.if_snd)) {
+   /*
+* XXX: `if_snd' of stopped interface could still packets
+*/
+   if (!ISSET(sc->sc_if.if_flags, IFF_RUNNING)) {
+   ifq_purge(&sc->sc_if.if_snd);
+   continue;
+   }
NET_UNLOCK();
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
NET_LOCK();

Re: [patch] [arm64] cpu.c patch based on amd64 idea, provides more debug for multicore kernel

2023-10-04 Thread Klemens Nanni

On Mon, Sep 25, 2023 at 01:33:31PM +, Klemens Nanni wrote:
> On Tue, Jul 25, 2023 at 01:30:43PM +0300, Slava Voronzoff wrote:
> > Hi, pinging and refreshing this patch
> > 
> > What it does:
> > allow arm64 cpus to break from the loop of waiting to start core and
> > drop to DDB or OS.
> > 
> > Patch based on same concept in amd64 cpu.c
> > 
> > Any suggestions? Good to go?
> 
> So instead of waiting possibly forever for secondary CPUs to come up,
> you can continue debug the system and/or continue boot with less CPUs.
> 
> Apart from the trailing empty line you introduce, the approach does
> match amd64 (down to the for loop lacking a space after semicolon).
> 
> That allows making progress on these machines and I don't see a downside,
> so OK kn modulo the empty line.
> 
> Any input from our arm64 hackers?
> 
> Same diff ten days ago: 
> https://marc.info/?l=openbsd-bugs&m=169465443200821&w=2

Anyone?


Index: sys/arch/arm64//arm64/cpu.c
===
RCS file: /cvs/src/sys/arch/arm64/arm64/cpu.c,v
retrieving revision 1.98
diff -u -p -r1.98 cpu.c
--- sys/arch/arm64//arm64/cpu.c 10 Aug 2023 19:29:32 -  1.98
+++ sys/arch/arm64//arm64/cpu.c 25 Sep 2023 13:24:39 -
@@ -1096,6 +1096,8 @@ cpu_start_secondary(struct cpu_info *ci,
 void
 cpu_boot_secondary(struct cpu_info *ci)
 {
+   int i;
+
atomic_setbits_int(&ci->ci_flags, CPUF_GO);
__asm volatile("dsb sy; sev" ::: "memory");
 
@@ -1105,8 +1107,16 @@ cpu_boot_secondary(struct cpu_info *ci)
 */
arm_send_ipi(ci, ARM_IPI_NOP);
 
-   while ((ci->ci_flags & CPUF_RUNNING) == 0)
+   for (i = 1000; (!(ci->ci_flags & CPUF_RUNNING)) && i>0;i--) {
__asm volatile("wfe");
+   }
+   if (! (ci->ci_flags & CPUF_RUNNING)) {
+   printf("cpu %d failed to start\n", ci->ci_cpuid);
+#if defined(MPDEBUG) && defined(DDB)
+   printf("dropping into debugger; continue from here to resume 
boot\n");
+   db_enter();
+#endif
+   }
 }
 
 void

Re: wg destroy hangs

On Thu, Oct 05, 2023 at 12:08:55AM +0200, Kirill Miazine wrote:
> • Vitaliy Makkoveev [2023-10-05 00:02]:
> > > On 5 Oct 2023, at 00:56, Kirill Miazine  wrote:
> > > 
> > > new diff doesn't prevent hang in test scenario either.
> > > 
> > 
> > Which one?
> 
> I meant to say new diffS, as I had applied both... what I have now is this:
> 

Understood.

The problem lays here:

ifq_start_task(void *p)
{
struct ifqueue *ifq = p;
struct ifnet *ifp = ifq->ifq_if;

if (!ISSET(ifp->if_flags, IFF_RUNNING) ||
ifq_empty(ifq) || ifq_is_oactive(ifq))
return;

ifp->if_qstart(ifq);
}

wg_qstart(struct ifqueue *ifq)
{
/* [...] */ 
while ((m = ifq_dequeue(ifq)) != NULL) {
/* [...] */ 
}

wg_peer_destroy(struct wg_peer *peer)
{
/* [...] */ 
NET_LOCK();
while (!ifq_empty(&sc->sc_if.if_snd)) {
NET_UNLOCK();
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
NET_LOCK();
}
NET_UNLOCK();
/* [...] */ 
}

1. wg_output() placed some packets to sc->sc_if.if_snd and scheduled
ifq_start_task() to run.

2. You performed "ifconfig wg1 down", so wg_down() cleared IFF_RUNNING
flag.

3. ifq_start_task() started to run, IFF_RUNNING is not set, so
wg_qstart() will not be called as the ifq_dequeue(). Packets rests
within sc->sc_if.if_snd. The interface is down, so nothing would
schedule ifq_start_task() to run.

4. You performed "ifconfig wg1 destroy". The while(!ifq_empty()) loop is
infinite because nothing would empty sc->sc_if.if_snd at this point.

The unlocked !ISSET(ifp->if_flags, IFF_RUNNING), ifq_empty() and 
ifq_is_oactive() are bad, but netlock dances provide caches
synchronisation.

I have no quick solution for this. Probably we should rethink
ifq_start_task().

This diff checks IFF_RUNNING flag within while (!ifq_empty()) loop of
wg_peer_destroy(). If the flag is not set queue will be purged and check
performed again. I intentionally keep netlock to prevent ifconfig
manipulations on the interface.

Index: sys/net/if_wg.c
===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ sys/net/if_wg.c 4 Oct 2023 23:09:14 -
@@ -509,6 +509,13 @@ wg_peer_destroy(struct wg_peer *peer)

NET_LOCK();
while (!ifq_empty(&sc->sc_if.if_snd)) {
+   /*
+* XXX: `if_snd' of stopped interface could still packets
+*/
+   if (!ISSET(sc->sc_if.if_flags, IFF_RUNNING)) {
+   ifq_purge(&sc->sc_if.if_snd);
+   continue;
+   }
NET_UNLOCK();
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
NET_LOCK();

Re: syslogd retry dns lookup

On Sun, Sep 03, 2023 at 02:00:46AM +0200, Alexander Bluhm wrote:
> When DNS lookup for remote loghost in @ line in syslog.conf does
> not work at startup, retry in intervals.

Thanks Paul de Weerd for testing my diff.  Together we improved the
debug output to make clear what is going on.  I also added generic
loghost_resolve() and loghost_retry() functions for UDP and TCP.

I am looking for ok now.

bluhm

Index: usr.sbin/syslogd/privsep.c
===
RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/syslogd/privsep.c,v
retrieving revision 1.76
diff -u -p -r1.76 privsep.c
--- usr.sbin/syslogd/privsep.c  11 Aug 2023 04:45:06 -  1.76
+++ usr.sbin/syslogd/privsep.c  2 Oct 2023 16:48:55 -
@@ -742,8 +742,8 @@ priv_config_parse_done(void)
 /* Name/service to address translation.  Response is placed into addr.
  * Return 0 for success or < 0 for error like getaddrinfo(3) */
 int
-priv_getaddrinfo(char *proto, char *host, char *serv, struct sockaddr *addr,
-size_t addr_len)
+priv_getaddrinfo(const char *proto, const char *host, const char *serv,
+struct sockaddr *addr, size_t addr_len)
 {
char protocpy[5], hostcpy[NI_MAXHOST], servcpy[NI_MAXSERV];
int cmd, ret_len;
Index: usr.sbin/syslogd/syslogd.c
===
RCS file: /data/mirror/openbsd/cvs/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.277
diff -u -p -r1.277 syslogd.c
--- usr.sbin/syslogd/syslogd.c  16 Mar 2023 18:22:08 -  1.277
+++ usr.sbin/syslogd/syslogd.c  2 Oct 2023 18:46:27 -
@@ -156,9 +156,12 @@ struct filed {
struct sockaddr_storage  f_addr;
struct buffertls f_buftls;
struct bufferevent  *f_bufev;
+   struct event f_ev;
struct tls  *f_ctx;
+   char*f_ipproto;
char*f_host;
-   int  f_reconnectwait;
+   char*f_port;
+   int  f_retrywait;
} f_forw;   /* forwarding address */
charf_fname[PATH_MAX];
struct {
@@ -319,7 +322,9 @@ void tcp_dropcb(struct bufferevent *, v
 voidtcp_writecb(struct bufferevent *, void *);
 voidtcp_errorcb(struct bufferevent *, short, void *);
 voidtcp_connectcb(int, short, void *);
-voidtcp_connect_retry(struct bufferevent *, struct filed *);
+int loghost_resolve(struct filed *);
+voidloghost_retry(struct filed *);
+voidudp_resolvecb(int, short, void *);
 int tcpbuf_countmsg(struct bufferevent *bufev);
 voiddie_signalcb(int, short, void *);
 voidmark_timercb(int, short, void *);
@@ -962,12 +967,15 @@ socket_bind(const char *proto, const cha
res->ai_socktype | SOCK_NONBLOCK, res->ai_protocol)) == -1)
continue;
 
-   if (getnameinfo(res->ai_addr, res->ai_addrlen, hostname,
+   error = getnameinfo(res->ai_addr, res->ai_addrlen, hostname,
sizeof(hostname), servname, sizeof(servname),
NI_NUMERICHOST | NI_NUMERICSERV |
-   (res->ai_socktype == SOCK_DGRAM ? NI_DGRAM : 0)) != 0) {
-   log_debug("Malformed bind address");
-   hostname[0] = servname[0] = '\0';
+   (res->ai_socktype == SOCK_DGRAM ? NI_DGRAM : 0));
+   if (error) {
+   log_warnx("malformed bind address host \"%s\": %s",
+   host, gai_strerror(error));
+   strlcpy(hostname, hostname_unknown, sizeof(hostname));
+   strlcpy(servname, hostname_unknown, sizeof(servname));
}
if (shutread && shutdown(*fdp, SHUT_RD) == -1) {
log_warn("shutdown SHUT_RD "
@@ -1130,7 +1138,7 @@ acceptcb(int lfd, short event, void *arg
socklen_tsslen;
char hostname[NI_MAXHOST], servname[NI_MAXSERV];
char*peername;
-   int  fd;
+   int  fd, error;
 
sslen = sizeof(ss);
if ((fd = reserve_accept4(lfd, event, ev, tcp_acceptcb,
@@ -1143,17 +1151,21 @@ acceptcb(int lfd, short event, void *arg
}
log_debug("Accepting tcp connection");
 
-   if (getnameinfo((struct sockaddr *)&ss, sslen, hostname,
+   error = getnameinfo((struct sockaddr *)&ss, sslen, hostname,
sizeof(hostname), servname, sizeof(servname),
-   NI_NUMERICHOST | NI_NUMERICSERV) != 0 ||
-   asprintf(&peername, ss.ss_family == AF_INET6 ?
+   NI_NUMERICHOST | NI_NUMERICSERV);
+   if (error) {
+

tcp syn cache unlock

Hi,

This is a first step to unlock TCP syn cache.  The timer function
is independent of the socket code.  That makes it easy to start
there.

Introduce tcp syn cache mutex.  Document fields protected by net
lock and mutex.  Devide timer function in parts protected by mutex
and sending with netlock.  I had to split the flags field in dynamic
flags protected by mutex and fixed flags set during initialization.
Note that sc_dynflags is u_int to prevent the compiler puts both
flags into the same 32 bit access.  Needed on alpha and maybe
elsewhere.

Still missing:
- Use shared net lock for sending.  Basically route cache prevents
  that.  But I have to solve that for shared sending in general.
- Allow shared net lock for calls from tcp_input().
- Run timer without kernel lock.  I am not aware of such a feature.
  There is already some network code that could benefit from that.
  Can we get timer without kernel lock like TASKQ_MPSAFE implements
  it for tasks?

ok?

bluhm

Index: netinet/tcp_input.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.391
diff -u -p -r1.391 tcp_input.c
--- netinet/tcp_input.c 3 Sep 2023 21:37:17 -   1.391
+++ netinet/tcp_input.c 4 Oct 2023 22:40:05 -
@@ -3084,15 +3084,24 @@ tcp_mss_adv(struct mbuf *m, int af)
  * state for SYN_RECEIVED.
  */
 
+/*
+ * Locks used to protect global data and struct members:
+ * N   net lock
+ * S   syn_cache_mtx   tcp syn cache global mutex
+ */
+
 /* syn hash parameters */
-inttcp_syn_hash_size = TCP_SYN_HASH_SIZE;
-inttcp_syn_cache_limit = TCP_SYN_HASH_SIZE*TCP_SYN_BUCKET_SIZE;
-inttcp_syn_bucket_limit = 3*TCP_SYN_BUCKET_SIZE;
-inttcp_syn_use_limit = 10;
+inttcp_syn_hash_size = TCP_SYN_HASH_SIZE;  /* [N] size of hash table */
+inttcp_syn_cache_limit =   /* [N] global entry limit */
+   TCP_SYN_HASH_SIZE * TCP_SYN_BUCKET_SIZE;
+inttcp_syn_bucket_limit =  /* [N] per bucket limit */
+   3 * TCP_SYN_BUCKET_SIZE;
+inttcp_syn_use_limit = 10; /* [N] reseed after uses */
 
 struct pool syn_cache_pool;
 struct syn_cache_set tcp_syn_cache[2];
 int tcp_syn_cache_active;
+struct mutex syn_cache_mtx = MUTEX_INITIALIZER(IPL_SOFTNET);
 
 #define SYN_HASH(sa, sp, dp, rand) \
(((sa)->s_addr ^ (rand)[0]) *   \
@@ -3134,7 +3143,10 @@ do { 
\
 void
 syn_cache_rm(struct syn_cache *sc)
 {
-   sc->sc_flags |= SCF_DEAD;
+   MUTEX_ASSERT_LOCKED(&syn_cache_mtx);
+
+   KASSERT(!ISSET(sc->sc_dynflags, SCF_DEAD));
+   SET(sc->sc_dynflags, SCF_DEAD);
TAILQ_REMOVE(&sc->sc_buckethead->sch_bucket, sc, sc_bucketq);
sc->sc_tp = NULL;
LIST_REMOVE(sc, sc_tpq);
@@ -3151,11 +3163,10 @@ syn_cache_put(struct syn_cache *sc)
if (refcnt_rele(&sc->sc_refcnt) == 0)
return;
 
+   /* Dealing with last reference, no lock needed. */
m_free(sc->sc_ipopts);
-   if (sc->sc_route4.ro_rt != NULL) {
-   rtfree(sc->sc_route4.ro_rt);
-   sc->sc_route4.ro_rt = NULL;
-   }
+   rtfree(sc->sc_route4.ro_rt);
+
pool_put(&syn_cache_pool, sc);
 }
 
@@ -3190,6 +3201,7 @@ syn_cache_insert(struct syn_cache *sc, s
int i;
 
NET_ASSERT_LOCKED();
+   MUTEX_ASSERT_LOCKED(&syn_cache_mtx);
 
/*
 * If there are no entries in the hash table, reinitialize
@@ -,12 +3345,10 @@ syn_cache_timer(void *arg)
uint64_t now;
int lastref;
 
-   NET_LOCK();
-   if (sc->sc_flags & SCF_DEAD)
+   mtx_enter(&syn_cache_mtx);
+   if (ISSET(sc->sc_dynflags, SCF_DEAD))
goto freeit;
 
-   now = tcp_now();
-
if (__predict_false(sc->sc_rxtshift == TCP_MAXRXTSHIFT)) {
/* Drop it -- too many retransmissions. */
goto dropit;
@@ -3353,18 +3363,22 @@ syn_cache_timer(void *arg)
if (sc->sc_rxttot >= tcptv_keep_init)
goto dropit;
 
-   tcpstat_inc(tcps_sc_retransmitted);
-   (void) syn_cache_respond(sc, NULL, now);
-
/* Advance the timer back-off. */
sc->sc_rxtshift++;
TCPT_RANGESET(sc->sc_rxtcur,
TCPTV_SRTTDFLT * tcp_backoff[sc->sc_rxtshift], TCPTV_MIN,
TCPTV_REXMTMAX);
-   if (!timeout_add_msec(&sc->sc_timer, sc->sc_rxtcur))
-   syn_cache_put(sc);
+   if (timeout_add_msec(&sc->sc_timer, sc->sc_rxtcur))
+   refcnt_take(&sc->sc_refcnt);
+   mtx_leave(&syn_cache_mtx);
 
+   NET_LOCK();
+   now = tcp_now();
+   (void) syn_cache_respond(sc, NULL, now);
+   tcpstat_inc(tcps_sc_retransmitted);
NET_UNLOCK();
+
+   syn_cache_put(sc);
return;
 
  dropit:
@@ -3375,8 +3389,8 @@ syn_cache_timer(void *arg)
KASSERT(lastref == 0);

Re: wg destroy hangs


• Vitaliy Makkoveev [2023-10-05 00:02]:

On 5 Oct 2023, at 00:56, Kirill Miazine  wrote:

new diff doesn't prevent hang in test scenario either.



Which one?


I meant to say new diffS, as I had applied both... what I have now is this:

===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ net/if_wg.c 4 Oct 2023 22:05:05 -
@@ -507,13 +507,8 @@ wg_peer_destroy(struct wg_peer *peer)

noise_remote_clear(&peer->p_remote);

-   NET_LOCK();
-   while (!ifq_empty(&sc->sc_if.if_snd)) {
-   NET_UNLOCK();
+   while (!ifq_empty(&sc->sc_if.if_snd))
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
-   NET_LOCK();
-   }
-   NET_UNLOCK();

taskq_barrier(wg_crypt_taskq);
taskq_barrier(net_tq(sc->sc_if.if_index));
@@ -2580,6 +2575,7 @@ wg_down(struct wg_softc *sc)
wg_unbind(sc);
rw_exit_read(&sc->sc_lock);
NET_LOCK();
+   ifq_purge(&sc->sc_if.if_snd);
 }

 int
Index: net/ifq.c
===
RCS file: /cvs/src/sys/net/ifq.c,v
retrieving revision 1.50
diff -u -p -r1.50 ifq.c
--- net/ifq.c   30 Jul 2023 05:39:52 -  1.50
+++ net/ifq.c   4 Oct 2023 22:05:05 -
@@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
return (len);
 }

+void
+ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
+{
+   mtx_enter(&ifq->ifq_mtx);
+   ifq->ifq_maxlen = maxlen;
+   mtx_leave(&ifq->ifq_mtx);
+}
+
 unsigned int
 ifq_purge(struct ifqueue *ifq)
 {
Index: net/ifq.h
===
RCS file: /cvs/src/sys/net/ifq.h,v
retrieving revision 1.38
diff -u -p -r1.38 ifq.h
--- net/ifq.h   30 Jul 2023 05:39:52 -  1.38
+++ net/ifq.h   4 Oct 2023 22:05:05 -
@@ -435,6 +435,7 @@ void ifq_deq_commit(struct ifqueue *,
 voidifq_deq_rollback(struct ifqueue *, struct mbuf *);
 struct mbuf*ifq_dequeue(struct ifqueue *);
 int ifq_hdatalen(struct ifqueue *);
+voidifq_set_maxlen(struct ifqueue *, unsigned int);
 voidifq_mfreem(struct ifqueue *, struct mbuf *);
 voidifq_mfreeml(struct ifqueue *, struct mbuf_list *);
 unsigned intifq_purge(struct ifqueue *);
@@ -448,9 +449,8 @@ int  ifq_deq_sleep(struct ifqueue *, st
 const char *, volatile unsigned int *,
 volatile unsigned int *);

-#defineifq_len(_ifq)   ((_ifq)->ifq_len)
-#defineifq_empty(_ifq) (ifq_len(_ifq) == 0)
-#defineifq_set_maxlen(_ifq, _l)((_ifq)->ifq_maxlen = (_l))
+#define ifq_len(_ifq)  READ_ONCE((_ifq)->ifq_len)
+#define ifq_empty(_ifq)(ifq_len(_ifq) == 0)

 static inline int
 ifq_is_priq(struct ifqueue *ifq)
@@ -490,8 +490,8 @@ int  ifiq_input(struct ifiqueue *, stru
 int ifiq_enqueue(struct ifiqueue *, struct mbuf *);
 voidifiq_add_data(struct ifiqueue *, struct if_data *);

-#defineifiq_len(_ifiq) ml_len(&(_ifiq)->ifiq_ml)
-#defineifiq_empty(_ifiq)   ml_empty(&(_ifiq)->ifiq_ml)
+#define ifiq_len(_ifiq)READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
+#define ifiq_empty(_ifiq)  (ifiq_len(_ifiq) == 0)

 #endif /* _KERNEL */

Re: wg destroy hangs

> On 5 Oct 2023, at 00:56, Kirill Miazine  wrote:
> 
> new diff doesn't prevent hang in test scenario either.
> 

Which one?

Re: wg destroy hangs





• Vitaliy Makkoveev [2023-10-04 23:38]:

On 5 Oct 2023, at 00:31, Alexander Bluhm  wrote:

On Wed, Oct 04, 2023 at 11:03:27PM +0300, Vitaliy Makkoveev wrote:

On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:

On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:

If it happns again, could you send an 'ps axlww | grep ifconifg'
output?  Then we see the wait channel where it hangs in the kernel.

$ ps axlww
   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME COMMAND


Here it happened again:

 0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00
ifconfig wg1 destroy


wg_peer_destroy()
...
NET_LOCK();
while (!ifq_empty(&sc->sc_if.if_snd)) {
NET_UNLOCK();
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
NET_LOCK();
}
NET_UNLOCK();

This net lock dance looks fishy.  And the sleep has a timeout of 1
milli second.  But that is may be per packet.  So if you have a
long queue or the queue refills somehow, it will take forever.

I think the difference in the usage is constant traffic that keeps
the send queue full.  The timeout hides the problem when there are
only a few packets.



This should ensure wg_qstart() will not dereference dying `peer'. Looks
crappy and potentially could block forever, but should work. However
netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
while netlock released before tsleep(), so it serializes nothing but
stops packets processing.

Kirill, does this diff help?


I doubt that it changes much.  When netlock is not taken, the queue
can still be filled with packets.

Removing this ugly netlock makes sense anyway.  But without any
synchronisation just reading a variable feels wrong.  Can we add a
read once like for mq_len in sys/mbuf.h?  And the ifq_set_maxlen()
also looks very unsafe.  For mbuf queues I added a mutex, interface
queues should do the same.

ok?



I guess this is uniprocessor machine, so synchronisation is not
related.

new diff doesn't prevent hang in test scenario either.

wg destroy would hang on both UP and MP machines -- the fresh Vultr test 
machine is MP.



diff is ok mvs.


bluhm

Index: net/ifq.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
retrieving revision 1.50
diff -u -p -r1.50 ifq.c
--- net/ifq.c   30 Jul 2023 05:39:52 -  1.50
+++ net/ifq.c   4 Oct 2023 21:04:20 -
@@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
return (len);
}

+void
+ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
+{
+   mtx_enter(&ifq->ifq_mtx);
+   ifq->ifq_maxlen = maxlen;
+   mtx_leave(&ifq->ifq_mtx);
+}
+
unsigned int
ifq_purge(struct ifqueue *ifq)
{
Index: net/ifq.h
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.h,v
retrieving revision 1.38
diff -u -p -r1.38 ifq.h
--- net/ifq.h   30 Jul 2023 05:39:52 -  1.38
+++ net/ifq.h   4 Oct 2023 21:09:04 -
@@ -435,6 +435,7 @@ void ifq_deq_commit(struct ifqueue *,
void ifq_deq_rollback(struct ifqueue *, struct mbuf *);
struct mbuf *ifq_dequeue(struct ifqueue *);
int  ifq_hdatalen(struct ifqueue *);
+voidifq_set_maxlen(struct ifqueue *, unsigned int);
void ifq_mfreem(struct ifqueue *, struct mbuf *);
void ifq_mfreeml(struct ifqueue *, struct mbuf_list *);
unsigned int ifq_purge(struct ifqueue *);
@@ -448,9 +449,8 @@ int  ifq_deq_sleep(struct ifqueue *, st
 const char *, volatile unsigned int *,
 volatile unsigned int *);

-#defineifq_len(_ifq)   ((_ifq)->ifq_len)
-#defineifq_empty(_ifq) (ifq_len(_ifq) == 0)
-#defineifq_set_maxlen(_ifq, _l)((_ifq)->ifq_maxlen = (_l))
+#define ifq_len(_ifq)  READ_ONCE((_ifq)->ifq_len)
+#define ifq_empty(_ifq)(ifq_len(_ifq) == 0)

static inline int
ifq_is_priq(struct ifqueue *ifq)
@@ -490,8 +490,8 @@ int  ifiq_input(struct ifiqueue *, stru
int  ifiq_enqueue(struct ifiqueue *, struct mbuf *);
void ifiq_add_data(struct ifiqueue *, struct if_data *);

-#defineifiq_len(_ifiq) ml_len(&(_ifiq)->ifiq_ml)
-#defineifiq_empty(_ifiq)   ml_empty(&(_ifiq)->ifiq_ml)
+#define ifiq_len(_ifiq)READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
+#define ifiq_empty(_ifiq)  (ifiq_len(_ifiq) == 0)

#endif /* _KERNEL */

Re: wg destroy hangs

> On 5 Oct 2023, at 00:31, Alexander Bluhm  wrote:
> 
> On Wed, Oct 04, 2023 at 11:03:27PM +0300, Vitaliy Makkoveev wrote:
>> On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
>>> On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> If it happns again, could you send an 'ps axlww | grep ifconifg'
> output?  Then we see the wait channel where it hangs in the kernel.
> 
> $ ps axlww
>   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME 
> COMMAND
 
 Here it happened again:
 
 0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00 
 ifconfig wg1 destroy
>>> 
>>> wg_peer_destroy()
>>> ...
>>>NET_LOCK();
>>>while (!ifq_empty(&sc->sc_if.if_snd)) {
>>>NET_UNLOCK();
>>>tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
>>>NET_LOCK();
>>>}
>>>NET_UNLOCK();
>>> 
>>> This net lock dance looks fishy.  And the sleep has a timeout of 1
>>> milli second.  But that is may be per packet.  So if you have a
>>> long queue or the queue refills somehow, it will take forever.
>>> 
>>> I think the difference in the usage is constant traffic that keeps
>>> the send queue full.  The timeout hides the problem when there are
>>> only a few packets.
>>> 
>> 
>> This should ensure wg_qstart() will not dereference dying `peer'. Looks
>> crappy and potentially could block forever, but should work. However
>> netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
>> while netlock released before tsleep(), so it serializes nothing but
>> stops packets processing.
>> 
>> Kirill, does this diff help? 
> 
> I doubt that it changes much.  When netlock is not taken, the queue
> can still be filled with packets.
> 
> Removing this ugly netlock makes sense anyway.  But without any
> synchronisation just reading a variable feels wrong.  Can we add a
> read once like for mq_len in sys/mbuf.h?  And the ifq_set_maxlen()
> also looks very unsafe.  For mbuf queues I added a mutex, interface
> queues should do the same.
> 
> ok?
> 

I guess this is uniprocessor machine, so synchronisation is not
related.

diff is ok mvs.

> bluhm
> 
> Index: net/ifq.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
> retrieving revision 1.50
> diff -u -p -r1.50 ifq.c
> --- net/ifq.c 30 Jul 2023 05:39:52 -  1.50
> +++ net/ifq.c 4 Oct 2023 21:04:20 -
> @@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
>   return (len);
> }
> 
> +void
> +ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
> +{
> + mtx_enter(&ifq->ifq_mtx);
> + ifq->ifq_maxlen = maxlen;
> + mtx_leave(&ifq->ifq_mtx);
> +}
> +
> unsigned int
> ifq_purge(struct ifqueue *ifq)
> {
> Index: net/ifq.h
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.h,v
> retrieving revision 1.38
> diff -u -p -r1.38 ifq.h
> --- net/ifq.h 30 Jul 2023 05:39:52 -  1.38
> +++ net/ifq.h 4 Oct 2023 21:09:04 -
> @@ -435,6 +435,7 @@ void   ifq_deq_commit(struct ifqueue *, 
> void   ifq_deq_rollback(struct ifqueue *, struct mbuf *);
> struct mbuf   *ifq_dequeue(struct ifqueue *);
> intifq_hdatalen(struct ifqueue *);
> +void  ifq_set_maxlen(struct ifqueue *, unsigned int);
> void   ifq_mfreem(struct ifqueue *, struct mbuf *);
> void   ifq_mfreeml(struct ifqueue *, struct mbuf_list *);
> unsigned int   ifq_purge(struct ifqueue *);
> @@ -448,9 +449,8 @@ intifq_deq_sleep(struct ifqueue *, st
>const char *, volatile unsigned int *,
>volatile unsigned int *);
> 
> -#define  ifq_len(_ifq)   ((_ifq)->ifq_len)
> -#define  ifq_empty(_ifq) (ifq_len(_ifq) == 0)
> -#define  ifq_set_maxlen(_ifq, _l)((_ifq)->ifq_maxlen = (_l))
> +#define ifq_len(_ifq)READ_ONCE((_ifq)->ifq_len)
> +#define ifq_empty(_ifq)  (ifq_len(_ifq) == 0)
> 
> static inline int
> ifq_is_priq(struct ifqueue *ifq)
> @@ -490,8 +490,8 @@ intifiq_input(struct ifiqueue *, stru
> intifiq_enqueue(struct ifiqueue *, struct mbuf *);
> void   ifiq_add_data(struct ifiqueue *, struct if_data *);
> 
> -#define  ifiq_len(_ifiq) ml_len(&(_ifiq)->ifiq_ml)
> -#define  ifiq_empty(_ifiq)   ml_empty(&(_ifiq)->ifiq_ml)
> +#define ifiq_len(_ifiq)  READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
> +#define ifiq_empty(_ifiq)(ifiq_len(_ifiq) == 0)
> 
> #endif /* _KERNEL */

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 11:03:27PM +0300, Vitaliy Makkoveev wrote:
> On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
> > On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> > > > If it happns again, could you send an 'ps axlww | grep ifconifg'
> > > > output?  Then we see the wait channel where it hangs in the kernel.
> > > > 
> > > > $ ps axlww
> > > >UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME 
> > > > COMMAND
> > > 
> > > Here it happened again:
> > > 
> > >  0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00 
> > > ifconfig wg1 destroy
> > 
> > wg_peer_destroy()
> > ...
> > NET_LOCK();
> > while (!ifq_empty(&sc->sc_if.if_snd)) {
> > NET_UNLOCK();
> > tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
> > NET_LOCK();
> > }
> > NET_UNLOCK();
> > 
> > This net lock dance looks fishy.  And the sleep has a timeout of 1
> > milli second.  But that is may be per packet.  So if you have a
> > long queue or the queue refills somehow, it will take forever.
> > 
> > I think the difference in the usage is constant traffic that keeps
> > the send queue full.  The timeout hides the problem when there are
> > only a few packets.
> > 
> 
> This should ensure wg_qstart() will not dereference dying `peer'. Looks
> crappy and potentially could block forever, but should work. However
> netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
> while netlock released before tsleep(), so it serializes nothing but
> stops packets processing.
> 
> Kirill, does this diff help? 

I doubt that it changes much.  When netlock is not taken, the queue
can still be filled with packets.

Removing this ugly netlock makes sense anyway.  But without any
synchronisation just reading a variable feels wrong.  Can we add a
read once like for mq_len in sys/mbuf.h?  And the ifq_set_maxlen()
also looks very unsafe.  For mbuf queues I added a mutex, interface
queues should do the same.

ok?

bluhm

Index: net/ifq.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
retrieving revision 1.50
diff -u -p -r1.50 ifq.c
--- net/ifq.c   30 Jul 2023 05:39:52 -  1.50
+++ net/ifq.c   4 Oct 2023 21:04:20 -
@@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
return (len);
 }
 
+void
+ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
+{
+   mtx_enter(&ifq->ifq_mtx);
+   ifq->ifq_maxlen = maxlen;
+   mtx_leave(&ifq->ifq_mtx);
+}
+
 unsigned int
 ifq_purge(struct ifqueue *ifq)
 {
Index: net/ifq.h
===
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.h,v
retrieving revision 1.38
diff -u -p -r1.38 ifq.h
--- net/ifq.h   30 Jul 2023 05:39:52 -  1.38
+++ net/ifq.h   4 Oct 2023 21:09:04 -
@@ -435,6 +435,7 @@ void ifq_deq_commit(struct ifqueue *, 
 voidifq_deq_rollback(struct ifqueue *, struct mbuf *);
 struct mbuf*ifq_dequeue(struct ifqueue *);
 int ifq_hdatalen(struct ifqueue *);
+voidifq_set_maxlen(struct ifqueue *, unsigned int);
 voidifq_mfreem(struct ifqueue *, struct mbuf *);
 voidifq_mfreeml(struct ifqueue *, struct mbuf_list *);
 unsigned intifq_purge(struct ifqueue *);
@@ -448,9 +449,8 @@ int  ifq_deq_sleep(struct ifqueue *, st
 const char *, volatile unsigned int *,
 volatile unsigned int *);
 
-#defineifq_len(_ifq)   ((_ifq)->ifq_len)
-#defineifq_empty(_ifq) (ifq_len(_ifq) == 0)
-#defineifq_set_maxlen(_ifq, _l)((_ifq)->ifq_maxlen = (_l))
+#define ifq_len(_ifq)  READ_ONCE((_ifq)->ifq_len)
+#define ifq_empty(_ifq)(ifq_len(_ifq) == 0)
 
 static inline int
 ifq_is_priq(struct ifqueue *ifq)
@@ -490,8 +490,8 @@ int  ifiq_input(struct ifiqueue *, stru
 int ifiq_enqueue(struct ifiqueue *, struct mbuf *);
 voidifiq_add_data(struct ifiqueue *, struct if_data *);
 
-#defineifiq_len(_ifiq) ml_len(&(_ifiq)->ifiq_ml)
-#defineifiq_empty(_ifiq)   ml_empty(&(_ifiq)->ifiq_ml)
+#define ifiq_len(_ifiq)READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
+#define ifiq_empty(_ifiq)  (ifiq_len(_ifiq) == 0)
 
 #endif /* _KERNEL */

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 11:07:24PM +0200, Kirill Miazine wrote:
> 
> 
> • Vitaliy Makkoveev [2023-10-04 22:03]:
> > On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
> > > On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> > > > > If it happns again, could you send an 'ps axlww | grep ifconifg'
> > > > > output?  Then we see the wait channel where it hangs in the kernel.
> > > > > 
> > > > > $ ps axlww
> > > > > UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   
> > > > > TIME COMMAND
> > > > 
> > > > Here it happened again:
> > > > 
> > > >   0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00
> > > > ifconfig wg1 destroy
> > > 
> > > wg_peer_destroy()
> > >   ...
> > >  NET_LOCK();
> > >  while (!ifq_empty(&sc->sc_if.if_snd)) {
> > >  NET_UNLOCK();
> > >  tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
> > >  NET_LOCK();
> > >  }
> > >  NET_UNLOCK();
> > > 
> > > This net lock dance looks fishy.  And the sleep has a timeout of 1
> > > milli second.  But that is may be per packet.  So if you have a
> > > long queue or the queue refills somehow, it will take forever.
> > > 
> > > I think the difference in the usage is constant traffic that keeps
> > > the send queue full.  The timeout hides the problem when there are
> > > only a few packets.
> > > 
> > 
> > This should ensure wg_qstart() will not dereference dying `peer'. Looks
> > crappy and potentially could block forever, but should work. However
> > netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
> > while netlock released before tsleep(), so it serializes nothing but
> > stops packets processing.
> > 
> > Kirill, does this diff help?
> 
> nope, same hang.
> 
> tested on a fresh Vultr VM with -current and patch below. VM got added to my
> normal WG network, and VM was accessed by SSH over that WG network.
> 
> then:
> 
>  # ifconfig wg1 down (from ssh -- connection to ssh session disappears)
>  # ifconfig wg1 delete (from console)
>  # ifconfig wg1 destroy" (from console -- command hangs)
> 
> interestingly, destroy works fine from ssh when commands are entered in a
> tmux session and executed immediately after each other:
> 
>   # ifconfig wg1 down; ifconfig wg1 delete; ifconfig wg1 destroy
> 
> looks like a timing issue.
> 

Looks like packet stook in `if_snd'. Hypothetically this hack should
help. Please note, even it works, I don't want to commit it. Someone
should introduce reference counter to wg_peer and remove this crap from
wg_peer_destroy().

Index: sys/net/if_wg.c
===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ sys/net/if_wg.c 4 Oct 2023 21:21:40 -
@@ -507,13 +507,8 @@ wg_peer_destroy(struct wg_peer *peer)
 
noise_remote_clear(&peer->p_remote);
 
-   NET_LOCK();
-   while (!ifq_empty(&sc->sc_if.if_snd)) {
-   NET_UNLOCK();
+   while (!ifq_empty(&sc->sc_if.if_snd))
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
-   NET_LOCK();
-   }
-   NET_UNLOCK();
 
taskq_barrier(wg_crypt_taskq);
taskq_barrier(net_tq(sc->sc_if.if_index));
@@ -2580,6 +2575,7 @@ wg_down(struct wg_softc *sc)
wg_unbind(sc);
rw_exit_read(&sc->sc_lock);
NET_LOCK();
+   ifq_purge(&sc->sc_if.if_snd);
 }
 
 int

Re: wg destroy hangs





• Vitaliy Makkoveev [2023-10-04 22:03]:

On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:

On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:

If it happns again, could you send an 'ps axlww | grep ifconifg'
output?  Then we see the wait channel where it hangs in the kernel.

$ ps axlww
UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME COMMAND


Here it happened again:

  0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00
ifconfig wg1 destroy


wg_peer_destroy()
...
 NET_LOCK();
 while (!ifq_empty(&sc->sc_if.if_snd)) {
 NET_UNLOCK();
 tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
 NET_LOCK();
 }
 NET_UNLOCK();

This net lock dance looks fishy.  And the sleep has a timeout of 1
milli second.  But that is may be per packet.  So if you have a
long queue or the queue refills somehow, it will take forever.

I think the difference in the usage is constant traffic that keeps
the send queue full.  The timeout hides the problem when there are
only a few packets.



This should ensure wg_qstart() will not dereference dying `peer'. Looks
crappy and potentially could block forever, but should work. However
netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
while netlock released before tsleep(), so it serializes nothing but
stops packets processing.

Kirill, does this diff help?


nope, same hang.

tested on a fresh Vultr VM with -current and patch below. VM got added 
to my normal WG network, and VM was accessed by SSH over that WG network.


then:

 # ifconfig wg1 down (from ssh -- connection to ssh session disappears)
 # ifconfig wg1 delete (from console)
 # ifconfig wg1 destroy" (from console -- command hangs)

interestingly, destroy works fine from ssh when commands are entered in 
a tmux session and executed immediately after each other:


  # ifconfig wg1 down; ifconfig wg1 delete; ifconfig wg1 destroy

looks like a timing issue.



Index: sys/net/if_wg.c
===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ sys/net/if_wg.c 4 Oct 2023 20:01:16 -
@@ -507,13 +507,8 @@ wg_peer_destroy(struct wg_peer *peer)
  
  	noise_remote_clear(&peer->p_remote);
  
-	NET_LOCK();

-   while (!ifq_empty(&sc->sc_if.if_snd)) {
-   NET_UNLOCK();
+   while (!ifq_empty(&sc->sc_if.if_snd))
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
-   NET_LOCK();
-   }
-   NET_UNLOCK();
  
  	taskq_barrier(wg_crypt_taskq);

taskq_barrier(net_tq(sc->sc_if.if_index));

rpki-client 8.6 has been released

2023-10-04 Thread Sebastian Benoit

rpki-client 8.6 has just been released and will be available in the
rpki-client directory of any OpenBSD mirror soon.

rpki-client is a FREE, easy-to-use implementation of the Resource
Public Key Infrastructure (RPKI) for Relying Parties (RP) to
facilitate validation of BGP announcements. The program queries the
global RPKI repository system and validates untrusted network inputs.
The program outputs validated ROA payloads, BGPsec Router keys, and
ASPA payloads in configuration formats suitable for OpenBGPD and BIRD,
and supports emitting CSV and JSON for consumption by other routing
stacks.

See RFC 6480 and RFC 6811 for a description of how RPKI and BGP Prefix
Origin Validation help secure the global Internet routing system.

rpki-client was primarily developed by Kristaps Dzonsons, Claudio
Jeker, Job Snijders, Theo Buehler, Theo de Raadt and Sebastian Benoit
as part of the OpenBSD Project.

This release includes the following changes to the previous release:

- A compliance check was added to ensure the X.509 Subject only
  contains commonName and optionally serialNumber.

- A compliance check was added to ensure the CMS SignedData and
  SignerInfo versions to be 3.

- Fisher-Yates shuffle the order in which Manifest entries are
  processed. Previously, work items were enqueued in the order the CA
  intended them to appear on a Manifest. However, there is no obvious
  benefit to third parties deciding the order in which things are
  processed. Now the Manifest ordering is randomized (as the order has
  no meaning anyway), and the number of concurrent repository
  synchronization operations is limited & timeboxed.

- Various refactoring work.

rpki-client works on all operating systems with a libcrypto library
based on OpenSSL 1.1 or LibreSSL 3.5, a libtls library compatible
with LibreSSL 3.5 or later, and zlib.

rpki-client is known to compile and run on at least the following
operating systems: Alpine, CentOS, Debian, Fedora, FreeBSD, Red Hat,
Rocky, Ubuntu, macOS, and of course OpenBSD!

It is our hope that packagers take interest and help adapt
rpki-client-portable to more distributions.

The mirrors where rpki-client can be found are on
https://www.rpki-client.org/portable.html

Reporting Bugs:
===

General bugs may be reported to tech@openbsd.org

Portable bugs may be filed at
https://github.com/rpki-client/rpki-client-portable

We welcome feedback and improvements from the broader community.
Thanks to all of the contributors who helped make this release
possible.

Assistance to coordinate security issues is available via
secur...@openbsd.org.

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
> On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> > > If it happns again, could you send an 'ps axlww | grep ifconifg'
> > > output?  Then we see the wait channel where it hangs in the kernel.
> > > 
> > > $ ps axlww
> > >UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME 
> > > COMMAND
> > 
> > Here it happened again:
> > 
> >  0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00 
> > ifconfig wg1 destroy
> 
> wg_peer_destroy()
>   ...
> NET_LOCK();
> while (!ifq_empty(&sc->sc_if.if_snd)) {
> NET_UNLOCK();
> tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
> NET_LOCK();
> }
> NET_UNLOCK();
> 
> This net lock dance looks fishy.  And the sleep has a timeout of 1
> milli second.  But that is may be per packet.  So if you have a
> long queue or the queue refills somehow, it will take forever.
> 
> I think the difference in the usage is constant traffic that keeps
> the send queue full.  The timeout hides the problem when there are
> only a few packets.
> 

This should ensure wg_qstart() will not dereference dying `peer'. Looks
crappy and potentially could block forever, but should work. However
netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
while netlock released before tsleep(), so it serializes nothing but
stops packets processing.

Kirill, does this diff help? 

Index: sys/net/if_wg.c
===
RCS file: /cvs/src/sys/net/if_wg.c,v
retrieving revision 1.31
diff -u -p -r1.31 if_wg.c
--- sys/net/if_wg.c 26 Sep 2023 15:16:44 -  1.31
+++ sys/net/if_wg.c 4 Oct 2023 20:01:16 -
@@ -507,13 +507,8 @@ wg_peer_destroy(struct wg_peer *peer)
 
noise_remote_clear(&peer->p_remote);
 
-   NET_LOCK();
-   while (!ifq_empty(&sc->sc_if.if_snd)) {
-   NET_UNLOCK();
+   while (!ifq_empty(&sc->sc_if.if_snd))
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
-   NET_LOCK();
-   }
-   NET_UNLOCK();
 
taskq_barrier(wg_crypt_taskq);
taskq_barrier(net_tq(sc->sc_if.if_index));

Re: missing FreeBSD stdio patch

2023-10-04 Thread enh

On Wed, Oct 4, 2023 at 12:46 PM Todd C. Miller  wrote:
>
> On Wed, 04 Oct 2023 11:53:51 -0600, Todd C. Miller wrote:
>
> > Yes, this should be fixed.  One difference is that in FreeBSD,
> > __swsetup() sets errno whereas in OpenBSD we set errno in the caller
> > when cantwrite() is true.  I think it makes sense to set both errno
> > and the __SERR flag in the same place.  We just need to decide which
> > place that is :-)
>
> I decided that it is simplest to set errno and __SERR in __swsetup()
> like FreeBSD has done instead of in the callers.  This is consistent
> with how fread(3) relies on __srefill() to set both errno and the
> error flag.

yeah, love it!

(writing the _other_ missing tests was on my to-do list, so i'd only
have been back in a week or two when i noticed that putc() etc had the
same problem :-) )

> We have an additional check in __swsetup() where we return EOF on
> error that also needs to set errno and __SERR.  There is no actual
> fd in that case so I've set errno to EINVAL.
>
>  - todd
>
> Index: lib/libc/stdio/fvwrite.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/fvwrite.c,v
> retrieving revision 1.20
> diff -u -p -u -r1.20 fvwrite.c
> --- lib/libc/stdio/fvwrite.c17 Mar 2017 16:06:33 -  1.20
> +++ lib/libc/stdio/fvwrite.c4 Oct 2023 18:28:33 -
> @@ -34,7 +34,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include "local.h"
>  #include "fvwrite.h"
> @@ -58,10 +57,8 @@ __sfvwrite(FILE *fp, struct __suio *uio)
> if ((len = uio->uio_resid) == 0)
> return (0);
> /* make sure we can write */
> -   if (cantwrite(fp)) {
> -   errno = EBADF;
> +   if (cantwrite(fp))
> return (EOF);
> -   }
>
>  #defineMIN(a, b) ((a) < (b) ? (a) : (b))
>  #defineCOPY(n)   (void)memcpy(fp->_p, p, n)
> Index: lib/libc/stdio/putc.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/putc.c,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 putc.c
> --- lib/libc/stdio/putc.c   31 Aug 2015 02:53:57 -  1.13
> +++ lib/libc/stdio/putc.c   4 Oct 2023 18:28:37 -
> @@ -32,7 +32,6 @@
>   */
>
>  #include 
> -#include 
>  #include "local.h"
>
>  /*
> @@ -43,10 +42,8 @@
>  int
>  putc_unlocked(int c, FILE *fp)
>  {
> -   if (cantwrite(fp)) {
> -   errno = EBADF;
> +   if (cantwrite(fp))
> return (EOF);
> -   }
> _SET_ORIENTATION(fp, -1);
> return (__sputc(c, fp));
>  }
> Index: lib/libc/stdio/vfprintf.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/vfprintf.c,v
> retrieving revision 1.81
> diff -u -p -u -r1.81 vfprintf.c
> --- lib/libc/stdio/vfprintf.c   8 Sep 2021 15:57:27 -   1.81
> +++ lib/libc/stdio/vfprintf.c   4 Oct 2023 16:40:18 -
> @@ -457,10 +457,8 @@ __vfprintf(FILE *fp, const char *fmt0, _
>
> _SET_ORIENTATION(fp, -1);
> /* sorry, fprintf(read_only_file, "") returns EOF, not 0 */
> -   if (cantwrite(fp)) {
> -   errno = EBADF;
> +   if (cantwrite(fp))
> return (EOF);
> -   }
>
> /* optimise fprintf(stderr) (and other unbuffered Unix files) */
> if ((fp->_flags & (__SNBF|__SWR|__SRW)) == (__SNBF|__SWR) &&
> Index: lib/libc/stdio/vfwprintf.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/vfwprintf.c,v
> retrieving revision 1.22
> diff -u -p -u -r1.22 vfwprintf.c
> --- lib/libc/stdio/vfwprintf.c  8 Sep 2021 15:57:27 -   1.22
> +++ lib/libc/stdio/vfwprintf.c  4 Oct 2023 16:40:18 -
> @@ -451,10 +451,8 @@ __vfwprintf(FILE * __restrict fp, const
>
> _SET_ORIENTATION(fp, 1);
> /* sorry, fwprintf(read_only_file, "") returns EOF, not 0 */
> -   if (cantwrite(fp)) {
> -   errno = EBADF;
> +   if (cantwrite(fp))
> return (EOF);
> -   }
>
> /* optimise fwprintf(stderr) (and other unbuffered Unix files) */
> if ((fp->_flags & (__SNBF|__SWR|__SRW)) == (__SNBF|__SWR) &&
> Index: lib/libc/stdio/wbuf.c
> ===
> RCS file: /cvs/src/lib/libc/stdio/wbuf.c,v
> retrieving revision 1.13
> diff -u -p -u -r1.13 wbuf.c
> --- lib/libc/stdio/wbuf.c   31 Aug 2015 02:53:57 -  1.13
> +++ lib/libc/stdio/wbuf.c   4 Oct 2023 18:28:45 -
> @@ -32,7 +32,6 @@
>   */
>
>  #include 
> -#include 
>  #include "local.h"
>
>  /*
> @@ -54,10 +53,8 @@ __swbuf(int c, FILE *fp)
>  * calls might wrap _w from negative to positive.
>  */
> fp->_w = fp->_lbfsize;
> -   if (cantwrite(fp)) {
> -   errno = EBADF;
> +   if (cantwrite(fp))
> return (EOF);
> -   }
> c = (unsigned char)c;
>
> /*
> Index: lib/l

Re: missing FreeBSD stdio patch

2023-10-04 Thread Todd C . Miller

On Wed, 04 Oct 2023 11:53:51 -0600, Todd C. Miller wrote:

> Yes, this should be fixed.  One difference is that in FreeBSD,
> __swsetup() sets errno whereas in OpenBSD we set errno in the caller
> when cantwrite() is true.  I think it makes sense to set both errno
> and the __SERR flag in the same place.  We just need to decide which
> place that is :-)

I decided that it is simplest to set errno and __SERR in __swsetup()
like FreeBSD has done instead of in the callers.  This is consistent
with how fread(3) relies on __srefill() to set both errno and the
error flag.

We have an additional check in __swsetup() where we return EOF on
error that also needs to set errno and __SERR.  There is no actual
fd in that case so I've set errno to EINVAL.

 - todd

Index: lib/libc/stdio/fvwrite.c
===
RCS file: /cvs/src/lib/libc/stdio/fvwrite.c,v
retrieving revision 1.20
diff -u -p -u -r1.20 fvwrite.c
--- lib/libc/stdio/fvwrite.c17 Mar 2017 16:06:33 -  1.20
+++ lib/libc/stdio/fvwrite.c4 Oct 2023 18:28:33 -
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "local.h"
 #include "fvwrite.h"
@@ -58,10 +57,8 @@ __sfvwrite(FILE *fp, struct __suio *uio)
if ((len = uio->uio_resid) == 0)
return (0);
/* make sure we can write */
-   if (cantwrite(fp)) {
-   errno = EBADF;
+   if (cantwrite(fp))
return (EOF);
-   }
 
 #defineMIN(a, b) ((a) < (b) ? (a) : (b))
 #defineCOPY(n)   (void)memcpy(fp->_p, p, n)
Index: lib/libc/stdio/putc.c
===
RCS file: /cvs/src/lib/libc/stdio/putc.c,v
retrieving revision 1.13
diff -u -p -u -r1.13 putc.c
--- lib/libc/stdio/putc.c   31 Aug 2015 02:53:57 -  1.13
+++ lib/libc/stdio/putc.c   4 Oct 2023 18:28:37 -
@@ -32,7 +32,6 @@
  */
 
 #include 
-#include 
 #include "local.h"
 
 /*
@@ -43,10 +42,8 @@
 int
 putc_unlocked(int c, FILE *fp)
 {
-   if (cantwrite(fp)) {
-   errno = EBADF;
+   if (cantwrite(fp))
return (EOF);
-   }
_SET_ORIENTATION(fp, -1);
return (__sputc(c, fp));
 }
Index: lib/libc/stdio/vfprintf.c
===
RCS file: /cvs/src/lib/libc/stdio/vfprintf.c,v
retrieving revision 1.81
diff -u -p -u -r1.81 vfprintf.c
--- lib/libc/stdio/vfprintf.c   8 Sep 2021 15:57:27 -   1.81
+++ lib/libc/stdio/vfprintf.c   4 Oct 2023 16:40:18 -
@@ -457,10 +457,8 @@ __vfprintf(FILE *fp, const char *fmt0, _
 
_SET_ORIENTATION(fp, -1);
/* sorry, fprintf(read_only_file, "") returns EOF, not 0 */
-   if (cantwrite(fp)) {
-   errno = EBADF;
+   if (cantwrite(fp))
return (EOF);
-   }
 
/* optimise fprintf(stderr) (and other unbuffered Unix files) */
if ((fp->_flags & (__SNBF|__SWR|__SRW)) == (__SNBF|__SWR) &&
Index: lib/libc/stdio/vfwprintf.c
===
RCS file: /cvs/src/lib/libc/stdio/vfwprintf.c,v
retrieving revision 1.22
diff -u -p -u -r1.22 vfwprintf.c
--- lib/libc/stdio/vfwprintf.c  8 Sep 2021 15:57:27 -   1.22
+++ lib/libc/stdio/vfwprintf.c  4 Oct 2023 16:40:18 -
@@ -451,10 +451,8 @@ __vfwprintf(FILE * __restrict fp, const 
 
_SET_ORIENTATION(fp, 1);
/* sorry, fwprintf(read_only_file, "") returns EOF, not 0 */
-   if (cantwrite(fp)) {
-   errno = EBADF;
+   if (cantwrite(fp))
return (EOF);
-   }
 
/* optimise fwprintf(stderr) (and other unbuffered Unix files) */
if ((fp->_flags & (__SNBF|__SWR|__SRW)) == (__SNBF|__SWR) &&
Index: lib/libc/stdio/wbuf.c
===
RCS file: /cvs/src/lib/libc/stdio/wbuf.c,v
retrieving revision 1.13
diff -u -p -u -r1.13 wbuf.c
--- lib/libc/stdio/wbuf.c   31 Aug 2015 02:53:57 -  1.13
+++ lib/libc/stdio/wbuf.c   4 Oct 2023 18:28:45 -
@@ -32,7 +32,6 @@
  */
 
 #include 
-#include 
 #include "local.h"
 
 /*
@@ -54,10 +53,8 @@ __swbuf(int c, FILE *fp)
 * calls might wrap _w from negative to positive.
 */
fp->_w = fp->_lbfsize;
-   if (cantwrite(fp)) {
-   errno = EBADF;
+   if (cantwrite(fp))
return (EOF);
-   }
c = (unsigned char)c;
 
/*
Index: lib/libc/stdio/wsetup.c
===
RCS file: /cvs/src/lib/libc/stdio/wsetup.c,v
retrieving revision 1.7
diff -u -p -u -r1.7 wsetup.c
--- lib/libc/stdio/wsetup.c 8 Aug 2005 08:05:36 -   1.7
+++ lib/libc/stdio/wsetup.c 4 Oct 2023 18:23:48 -
@@ -31,6 +31,7 @@
  * SUCH DAMAGE.
  */
 
+#include 
 #include 
 #include 
 #include "local.h"
@@ -38,7 +39,7 @@
 /*
  * Various output routines call wsetup to be sure it is

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
> > If it happns again, could you send an 'ps axlww | grep ifconifg'
> > output?  Then we see the wait channel where it hangs in the kernel.
> > 
> > $ ps axlww
> >UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME 
> > COMMAND
> 
> Here it happened again:
> 
>  0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00 
> ifconfig wg1 destroy

wg_peer_destroy()
...
NET_LOCK();
while (!ifq_empty(&sc->sc_if.if_snd)) {
NET_UNLOCK();
tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
NET_LOCK();
}
NET_UNLOCK();

This net lock dance looks fishy.  And the sleep has a timeout of 1
milli second.  But that is may be per packet.  So if you have a
long queue or the queue refills somehow, it will take forever.

I think the difference in the usage is constant traffic that keeps
the send queue full.  The timeout hides the problem when there are
only a few packets.

bluhm

Re: wg destroy hangs

• Kirill Miazine [2023-10-04 20:42]:

I saw some changes to wg recently, so I wanted to report the issue in
case recent commit changed something in time for release. I understand
the issue is probably a year old by now. I guess I hadn't destroyed wg
for a while, although I do believe I have...

• Alexander Bluhm [2023-10-04 16:31]:

On Wed, Oct 04, 2023 at 10:08:01AM -0400, Sonic wrote:

See the post:
"Uninterruptible D State after ifconfig wg0 destroy" Oct. 31, 2022 in
the

Bugs archive.

I have a test regress/sys/net/wg that configures a wg(4), sends
some traffic, and destroys it. I have never seen this bug. There
must be something special to trigger it.

If it happns again, could you send an 'ps axlww | grep ifconifg'
output? Then we see the wait channel where it hangs in the kernel.

$ ps axlww
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TT
TIME COMMAND

Here it happened again:

0 75339 23922 0 10 0 360 296 wg_ifq D+U p0 0:00.00
ifconfig wg1 destroy

The WCHAN string can be found in the kernel sources and gives hints.
More sophisticated would be to break into ddb and show the kernel
stack trace of the ifconfig process. If you want to do that, I can
give some advice. But I recommend a serial console for that.

Any idea what you did specially to trigger the problem?

I did a down, delete and destroy sequence on a newly booted system.

Actually, after testing more, I see that doing a down, delete and
destroy of a wg interface works fine from console.

Also, I tried this command sequence in a tmux session in ssh over a
connection though the same wg interface:

root@fika ~ # ifconfig wg1 down; ifconfig wg1 delete; ifconfig wg1 destroy

That sequence didn't give an error.

But then when I did wg1 down via ssh over wg1, and _then_ delete and
destroy from console, destroy in console hanged.

Interestingly, when doing "ifconfig wg1 down; ifconfig wg1 delete;" in
tmux and then destroy in console, there's no hang either.

OpenBSD 7.4 (GENERIC) #1332: Wed Oct 4 01:00:54 MDT 2023

bluhm

Re: wg destroy hangs




I saw some changes to wg recently, so I wanted to report the issue in 
case recent commit changed something in time for release. I understand 
the issue is probably a year old by now. I guess I hadn't destroyed wg 
for a while, although I do believe I have...


• Alexander Bluhm [2023-10-04 16:31]:

On Wed, Oct 04, 2023 at 10:08:01AM -0400, Sonic wrote:

See the post:
"Uninterruptible D State after ifconfig wg0 destroy" Oct. 31, 2022 in the
Bugs archive.


I have a test regress/sys/net/wg that configures a wg(4), sends
some traffic, and destroys it.  I have never seen this bug.  There
must be something special to trigger it.

If it happns again, could you send an 'ps axlww | grep ifconifg'
output?  Then we see the wait channel where it hangs in the kernel.

$ ps axlww
   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME COMMAND


Here it happened again:

0 75339 23922   0  10   0   360   296 wg_ifq  D+Up00:00.00 
ifconfig wg1 destroy



The WCHAN string can be found in the kernel sources and gives hints.
More sophisticated would be to break into ddb and show the kernel
stack trace of the ifconfig process.  If you want to do that, I can
give some advice.  But I recommend a serial console for that.

Any idea what you did specially to trigger the problem?


I did a down, delete and destroy sequence on a newly booted system.

OpenBSD 7.4 (GENERIC) #1332: Wed Oct  4 01:00:54 MDT 2023



bluhm

Re: missing FreeBSD stdio patch

2023-10-04 Thread Todd C . Miller

On Tue, 03 Oct 2023 16:09:12 -0700, enh wrote:

> looks like OpenBSD is missing this patch from FreeBSD? llvm libc++ hit
> this (a test worked on other OSes, including FreeBSD-based macOS) but
> failed on OpenBSD-based Android.
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=127335 was the
> original FreeBSD bug report. POSIX seems to explicitly require this
> behavior (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwrite.h
> tml:
> "The fwrite() function shall return the number of elements
> successfully written, which may be less than nitems if a write error
> is encountered. If size or nitems is 0, fwrite() shall return 0 and
> the state of the stream remains unchanged. Otherwise, if a write error
> occurs, the error indicator for the stream shall be set, and errno
> shall be set to indicate the error.").

Yes, this should be fixed.  One difference is that in FreeBSD,
__swsetup() sets errno whereas in OpenBSD we set errno in the caller
when cantwrite() is true.  I think it makes sense to set both errno
and the __SERR flag in the same place.  We just need to decide which
place that is :-)

 - todd

Re: wg destroy hangs

It works currently on my own firewall. The problem occured on a client's
firewall, which I can't test right now. It's an amd64 system (older
Supermicro) and as I was shelled in remotely I was getting worried about
recovering and then I discovered "reboot -q" - nothing else was working to
get the interface back up.
Was running -current at the time as well as now.


On Wed, Oct 4, 2023 at 12:05 PM Alexander Bluhm 
wrote:

> On Wed, Oct 04, 2023 at 10:53:30AM -0400, Sonic wrote:
> > When it happened to me back then (2022) I'm pretty sure I did a "down"
> > followed by a "delete" and then the "destroy".
>
> root@ot6:.../~# cd /usr/src/regress/sys/net/wg
> root@ot6:.../wg# make ifconfig
> ...
> root@ot6:.../wg# ifconfig wg11
> wg11: flags=80c3 rdomain 11 mtu 1420
> index 42 priority 0 llprio 3
> wgport 211
> wgpubkey uQP9F5afOHni9RObVahSPxeJgbsrqGw/P4t5Balpmkc=
> wgpeer beT/atjwFPBo3Pv8IvFO5Wf/uVXfgZ5QLSSQIGm/sSc=
> wgendpoint 127.0.0.1 212
> tx: 0, rx: 0
> wgaip fdd7:e83e:66bc:46::2/128
> wgaip 10.188.44.2/32
> groups: wg
> inet 10.188.44.1 netmask 0xff00 broadcast 10.188.44.255
> inet6 fdd7:e83e:66bc:46::1 prefixlen 64
> root@ot6:.../wg# ifconfig wg11 down
> root@ot6:.../wg# ifconfig wg11 delete
> root@ot6:.../wg# ifconfig wg11 destroy
> root@ot6:.../wg#
>
> For me it works.  Tested on i386 and amd64.
>
> > Have not tried to recreate since then.
>
> Can you try it again?  What is different in your setup?
>
> bluhm
>

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 10:53:30AM -0400, Sonic wrote:
> When it happened to me back then (2022) I'm pretty sure I did a "down"
> followed by a "delete" and then the "destroy".

root@ot6:.../~# cd /usr/src/regress/sys/net/wg
root@ot6:.../wg# make ifconfig
...
root@ot6:.../wg# ifconfig wg11
wg11: flags=80c3 rdomain 11 mtu 1420
index 42 priority 0 llprio 3
wgport 211
wgpubkey uQP9F5afOHni9RObVahSPxeJgbsrqGw/P4t5Balpmkc=
wgpeer beT/atjwFPBo3Pv8IvFO5Wf/uVXfgZ5QLSSQIGm/sSc=
wgendpoint 127.0.0.1 212
tx: 0, rx: 0
wgaip fdd7:e83e:66bc:46::2/128
wgaip 10.188.44.2/32
groups: wg
inet 10.188.44.1 netmask 0xff00 broadcast 10.188.44.255
inet6 fdd7:e83e:66bc:46::1 prefixlen 64
root@ot6:.../wg# ifconfig wg11 down
root@ot6:.../wg# ifconfig wg11 delete
root@ot6:.../wg# ifconfig wg11 destroy
root@ot6:.../wg#

For me it works.  Tested on i386 and amd64.

> Have not tried to recreate since then.

Can you try it again?  What is different in your setup?

bluhm

Re: wg destroy hangs

When it happened to me back then (2022) I'm pretty sure I did a "down"
followed by a "delete" and then the "destroy".
Have not tried to recreate since then.


On Wed, Oct 4, 2023 at 10:31 AM Alexander Bluhm 
wrote:

> On Wed, Oct 04, 2023 at 10:08:01AM -0400, Sonic wrote:
> > See the post:
> > "Uninterruptible D State after ifconfig wg0 destroy" Oct. 31, 2022 in the
> > Bugs archive.
>
> I have a test regress/sys/net/wg that configures a wg(4), sends
> some traffic, and destroys it.  I have never seen this bug.  There
> must be something special to trigger it.
>
> If it happns again, could you send an 'ps axlww | grep ifconifg'
> output?  Then we see the wait channel where it hangs in the kernel.
>
> $ ps axlww
>   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME
> COMMAND
>
> The WCHAN string can be found in the kernel sources and gives hints.
> More sophisticated would be to break into ddb and show the kernel
> stack trace of the ifconfig process.  If you want to do that, I can
> give some advice.  But I recommend a serial console for that.
>
> Any idea what you did specially to trigger the problem?
>
> bluhm
>

Re: wg destroy hangs

On Wed, Oct 04, 2023 at 10:08:01AM -0400, Sonic wrote:
> See the post:
> "Uninterruptible D State after ifconfig wg0 destroy" Oct. 31, 2022 in the
> Bugs archive.

I have a test regress/sys/net/wg that configures a wg(4), sends
some traffic, and destroys it.  I have never seen this bug.  There
must be something special to trigger it.

If it happns again, could you send an 'ps axlww | grep ifconifg'
output?  Then we see the wait channel where it hangs in the kernel.

$ ps axlww
  UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT   TIME COMMAND

The WCHAN string can be found in the kernel sources and gives hints.
More sophisticated would be to break into ddb and show the kernel
stack trace of the ifconfig process.  If you want to do that, I can
give some advice.  But I recommend a serial console for that.

Any idea what you did specially to trigger the problem?

bluhm

Re: wg destroy hangs

See the post:
"Uninterruptible D State after ifconfig wg0 destroy" Oct. 31, 2022 in the
Bugs archive.



On Wed, Oct 4, 2023 at 10:04 AM Sonic  wrote:

> This goes back a ways, to at least 7.2 in October 2022.
>
>
> On Wed, Oct 4, 2023 at 8:54 AM Kirill Miazine  wrote:
>
>> Recently on snapshots I have noticed that ifconfig wgN destroy would
>> just hang there, without any way to get back the control. Power reset
>> would be the only way to reboot and regain control.
>>
>> I don't have exact date when it happened first, maybe some 10 days ago,
>> but problem is present on most recent snapshot (amd64).
>>
>> -- Kirill
>>
>>

Re: wg destroy hangs

This goes back a ways, to at least 7.2 in October 2022.


On Wed, Oct 4, 2023 at 8:54 AM Kirill Miazine  wrote:

> Recently on snapshots I have noticed that ifconfig wgN destroy would
> just hang there, without any way to get back the control. Power reset
> would be the only way to reboot and regain control.
>
> I don't have exact date when it happened first, maybe some 10 days ago,
> but problem is present on most recent snapshot (amd64).
>
> -- Kirill
>
>

Re: wg destroy hangs

2023-10-04 Thread Chris Cappuccio

Can you try compiling without this:

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/if_wg.c.diff?r1=1.29&r2=1.30

Kirill Miazine [k...@krot.org] wrote:
> Recently on snapshots I have noticed that ifconfig wgN destroy would just
> hang there, without any way to get back the control. Power reset would be
> the only way to reboot and regain control.
> 
> I don't have exact date when it happened first, maybe some 10 days ago, but
> problem is present on most recent snapshot (amd64).
> 
> -- Kirill

wg destroy hangs