On 06/29/2018 06:45 AM, Daniel Borkmann wrote: > On 06/25/2018 05:34 PM, John Fastabend wrote: > [...] >> @@ -2302,9 +2347,12 @@ static int sock_hash_ctx_update_elem(struct >> bpf_sock_ops_kern *skops, >> goto bucket_err; >> } >> >> - e->hash_link = l_new; >> - e->htab = container_of(map, struct bpf_htab, map); >> + rcu_assign_pointer(e->hash_link, l_new); >> + rcu_assign_pointer(e->htab, >> + container_of(map, struct bpf_htab, map)); >> + write_lock_bh(&sock->sk_callback_lock); >> list_add_tail(&e->list, &psock->maps); >> + write_unlock_bh(&sock->sk_callback_lock); >> >> /* add new element to the head of the list, so that >> * concurrent search will find it before old elem >> @@ -2314,8 +2362,10 @@ static int sock_hash_ctx_update_elem(struct >> bpf_sock_ops_kern *skops, >> psock = smap_psock_sk(l_old->sk); >> >> hlist_del_rcu(&l_old->hash_node); >> + write_lock_bh(&l_old->sk->sk_callback_lock); >> smap_list_hash_remove(psock, l_old); >> smap_release_sock(psock, l_old->sk); >> + write_unlock_bh(&l_old->sk->sk_callback_lock); >> free_htab_elem(htab, l_old); >> } >> raw_spin_unlock_bh(&b->lock); > > Rather a question for clarification that came up while staring at this part > in sock_hash_ctx_update_elem(): > > In the 'err' label, wouldn't we also need the sk_callback_lock given we access > a potential psock, and modify its state (refcnt) would this be needed as well, > or are we good here since we're under RCU context anyway? Hmm, if latter is > indeed the case, then I'm wondering why we would need the above /sock/'s > sk_callback_lock for modifying list handling inside the psock and couldn't use > some locking that is only in scope of the actual struct smap_psock? >
So... originally I used sk_callback_lock to cover both the maps list and modifying sk callbacks (its real intention!) but with the addition of sockhash that has gotten increasingly clumsy. At this point it seems like best/cleanest option would be to move sk_callback_lock to _only_ cover callback updates and create a new lock to protect maps. Then we avoid this pain. Question is do we want to do this in bpf or bpf-next? I'm thinking we respin this series, yet again, with the above and hopefully at that point the locking will be cleaner. Your observation about err label is correct, see below. > My other question is, if someone calls the update helper with BPF_NOEXIST > which > we seem to support with bailing out via -EEXIST, and if there's currently a > psock attached already to the sock, then we drop this one off from the socket? > Is that correct / expected? I have a patch I'll submit shortly to fix this. This was missed fallout from allowing socks in multiple maps. Checking to see if it has a psock is not sufficient to know if we can dec the refcnt. It used to be in earlier versions before the multiple map support. Also because the psock can be referenced still from another map the sk_callback_lock (or new map specific lock) is needed to protect maps. > > Thanks, > Daniel >