Re: [PATCH bpf-next v2 3/7] bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall

Mauricio Vasquez Tue, 16 Oct 2018 19:29:54 -0700



On 10/16/2018 06:20 PM, Alexei Starovoitov wrote:

On Tue, Oct 16, 2018 at 04:16:39PM -0500, Mauricio Vasquez wrote:


On 10/11/2018 06:51 PM, Alexei Starovoitov wrote:

On Wed, Oct 10, 2018 at 05:50:01PM -0500, Mauricio Vasquez wrote:

Does it make sense to you?

I reread the other patch, and found it does NOT use the following logic for
queue and stack:

                  rcu_read_lock();
                  ptr = map->ops->map_lookup_and_delete_elem(map, key);
                  if (ptr)
                          memcpy(value, ptr, value_size);

I guess this part is not used at all? Can we just remove it?

Thanks,
Song

This is the base code for map_lookup_and_delete support, it is not used in
queue/stack maps.

I think we can leave it there, so when somebody implements lookup_and_delete
for other maps doesn't have to care about implementing also this.

The code looks useful to me, but I also agree with Song. And in the kernel
we don't typically add 'almost dead code'.
May be provide an implementation of the lookup_and_delete for hash map
so it's actually used ?

I haven't written any code but I think there is a potential problem here.
Current lookup_and_delete returns a pointer to the element, hence deletion
of the element should be done using call_rcu to guarantee this is valid
after returning.
In the hashtab, the deletion only uses call_rcu when there is not prealloc,
otherwise the element is pushed on the list of free elements immediately.
If we move the logic to push elements into the free list under a call_rcu
invocation, it could happen that the free list is empty because the call_rcu
is still pending (a similar issue we had with the queue/stack maps when they
used a pass by reference API).

There is another way to implement it without this issue, in syscall.c:
l = ops->lookup(key);
memcpy(l, some_buffer)
ops->delete(key)

The point here is that the lookup_and_delete operation is not being used at
all, so, is lookup + delete = lookup_and_delete?, can it be generalized?
If this is true, then what is the sense of having lookup_and_delete
syscall?,

I though of lookup_and_delete command as atomic operation.
Only in such case it would make sense.
Otherwise there is no point in having additional cmd.
In case of hash map the implementation would be similar to delete:
   raw_spin_lock_irqsave(&b->lock, flags);
   l = lookup_elem_raw(head, hash, key, key_size);
   if (l) {
           hlist_nulls_del_rcu(&l->hash_node);
           bpf_long_memcpy(); // into temp kernel area
           free_htab_elem(htab, l);
           ret = 0;
   }
   raw_spin_unlock_irqrestore(&b->lock, flags);
   copy_to_user();

Well, this is new approach, currently operations have no enough info toperform the copy_to_user directly, btw, is there any technical reasonwhy a double mem copy is done? (from the map value into a temp kernelbuffer and then to userspace?)


there is a chance that some other cpu is doing lookup in parallel
and may be modifying value, so bpf_long_mempcy() isn't fully atomic.

I think we already have that case, if an eBPF program is updating themap, a lookup from userspace could see a partially updated value.

But bpf side is written together with user space side,
so above almost-atomic lookup_and_delete is usable instead
of lookup and then delete which races too much.

Having said that I think it's fine to defer this new ndo for now
and leave lookup_and_delete syscall cmd for stack/queue map only.

I agree, just a question, should we remove the "almost dead code" orleave it there?


Thanks,
Mauricio.

Re: [PATCH bpf-next v2 3/7] bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall

Reply via email to