I committed this a couple of weeks ago.

--
 Sent from a phone, apologies for poor formatting.
On 8 April 2021 06:10:25 Klemens Nanni <k...@openbsd.org> wrote:

On Mon, Mar 22, 2021 at 12:42:27AM +1100, Matt Dunwoodie wrote:
On Sat, 20 Mar 2021 11:48:52 +0000
Stuart Henderson <s...@spacehopper.org> wrote:

oh, let's cc Matt on this too.

On 2021/03/20 11:17, Martin Pieuchot wrote:
On 19/03/21(Fri) 20:15, Stuart Henderson wrote:
Not a great report but I don't have much more to go on, machine
had ddb.panic=0 and ddb hanged while printing the stack trace.
Retyped by hand, may contain typos. Happened a few hours after
setting up wg on it.

uvm_fault(0xffffffff82204e38, 0x20, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff81752116 cs 8 rflags 10246 cr2 20
cpl 0 rsp 00023b35eb0 gsbase 0xffffffff820eaff0 kgsbase 0x0
panic: trap type 6, code=0, pc=ffffffff81752116
Starting stack trace...
panic(ffffffff81ddc97a) at panic+0x11d
kerntrap(ffff800023b35e00) at kerntrap+0x114
alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
wg_index_drop(ffff8000012ae000,0) at wg_index_drop+0x96
noise_create_initiation(

This is a NULL dereference at line 1981 of net/if_wg.c:

wg_index_drop(void *_sc, uint32_t key0)
{
...
/* We expect a peer */
   peer = CONTAINER_OF(iter->i_value, struct wg_peer,
p_remote); ...
}

Does that mean that `iter' is NULL and i_value' is at ofset 0x20 in
that struct?

Correct. The issue is we're trying to remove an index that doesn't
exist. wg_index_drop iterates through the list and expects to find a
matching index (perhaps a KASSERT could have been helpful here).
Nevertheless, since index 0 doesn't exist `iter` ends up being NULL.

Oh, I am an idiot, I had debug set and there is something other than
just standard messages around that time. Both sides are OpenBSD
wg(4). I did not have debug on the other side.

[...]
18:51:08.041Z  wg2: Sending handshake initiation to peer 3
18:51:08.091Z  wg2: Receiving handshake initiation from peer 3
18:51:08.091Z  wg2: Sending handshake response to peer 3
18:51:08.091Z  wg2: Unknown handshake response
18:51:13.141Z  wg2: Receiving handshake initiation from peer 3
18:51:13.141Z  wg2: Sending handshake response to peer 3
18:51:13.191Z  wg2: Handshake for peer 3 did not complete after 5
seconds, retrying (try 2) 18:51:13.191Z  wg2: Receiving keepalive
packet from peer 3 18:51:13.191Z  wg2: Sending keepalive packe
18:51:13.191Z  t to peer 3
18:52:28.242Z  wg2: Sending keepalive packet to peer 3
18:52:28.342Z  wg2: Receiving keepalive packet from peer 3
18:53:43.343Z  wg2: Sending keepalive packet to peer 3
18:54:58.345Z  wg2: Sending handshake initiation to peer 3
18:54:58.395Z  wg2: Receiving handshake initiation from peer 3
18:54:58.395Z  wg2: Sending handshake response to peer 3
18:54:58.395Z  wg2: Unknown handshake response
<syslog stops here, rest retyped>
wg2: Handshake for peer 3 did not complete after 5 seconds, retrying
(try 2) wg2: Sending handshake initiation to peer 3
wg2: Sending handshake response to peer 3
<null deref here>

With this information, it was possible to reproduce the issue on my
end. There is a race between sending/receiving handshake packets. This
occurs if we consume an initiation, then send an initiation prior to
replying to the consumed initiation.

In particular, when consuming an initiation, we don't generate the
index until creating the response (which is incorrect). If we attempt
to create an initiation between these processes, we drop any
outstanding handshake which in this case has index 0 as set when
consuming the initiation.

The fix attached is to generate the index when consuming the initiation
so that any spurious initiation creation can drop a valid index. The
patch also consolidates setting fields on the handshake.

With this patch applied, I was unable to reproduce the crash.
This looks good and works, OK kn

sthen, do you want to commit this fix?  I think it should make it into
6.9 release.

diff --git net/wg_noise.c net/wg_noise.c
index 86f7823cc83..176c36609fc 100644
--- net/wg_noise.c
+++ net/wg_noise.c
@@ -299,9 +299,6 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,
NOISE_TIMESTAMP_LEN + NOISE_AUTHTAG_LEN, key, hs.hs_hash) != 0)
goto error;

- hs.hs_state = CONSUMED_INITIATION;
- hs.hs_local_index = 0;
- hs.hs_remote_index = s_idx;
memcpy(hs.hs_e, ue, NOISE_PUBLIC_KEY_LEN);

/* We have successfully computed the same results, now we ensure that
@@ -321,6 +318,9 @@ noise_consume_initiation(struct noise_local *l, struct noise_remote **rp,

/* Ok, we're happy to accept this initiation now */
noise_remote_handshake_index_drop(r);
+ hs.hs_state = CONSUMED_INITIATION;
+ hs.hs_local_index = noise_remote_handshake_index_get(r);
+ hs.hs_remote_index = s_idx;
r->r_handshake = hs;
*rp = r;
ret = 0;
@@ -369,7 +369,6 @@ noise_create_response(struct noise_remote *r, uint32_t *s_idx, uint32_t *r_idx,
noise_msg_encrypt(en, NULL, 0, key, hs->hs_hash);

hs->hs_state = CREATED_RESPONSE;
- hs->hs_local_index = noise_remote_handshake_index_get(r);
*r_idx = hs->hs_remote_index;
*s_idx = hs->hs_local_index;
ret = 0;

Reply via email to