On Wed, 2018-10-17 at 16:18 +0200, Daniel Borkmann wrote:
> > Sadly I don't have a reproducer other than I've seen it happen on
> > several distinct boxes and deployments.
> > 
> > We use several hash maps and I don't know how to tell from the oops
> > which one caused this, but the two largest (100-600k entries) work
> > like:
> > 
> > - eBPF program creates entries as it sees flows on the network and
> > counts metrics (bytes, packets, etc) into each entry and updates a
> > timestamp on each update
> 
> Does that happen via map update helper from kernel side in your prog?

Yes, when a new entry is added there is a map_update_elem() call.
However, when an existing entry is updated, we modify the struct
returned by map_lookup_elem() directly, without another call to
map_update_elem().

We had another thread a while back on whether or not map_update_elem()
needs to be called every time and I'm still not really clear on the
requirements for safe concurrency in this area.
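
To make the pattern concrete, here is a minimal sketch in plain
user-space C. It is only an illustration, not our actual program: the
struct fields, the function names and the small fixed-size table
standing in for the BPF hash map are all hypothetical, and the atomic
adds are an assumption about how the counters could be incremented
(inside a BPF program an atomic add is commonly written with
__sync_fetch_and_add()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow value; field names are illustrative only. */
struct flow_stats {
    uint64_t bytes;
    uint64_t packets;
    uint64_t last_seen_ns;
};

#define MAX_FLOWS 8

/* Stand-in for the BPF hash map: a tiny fixed table keyed by flow id. */
struct flow_slot {
    int used;
    uint32_t key;
    struct flow_stats val;
};
static struct flow_slot flow_table[MAX_FLOWS];

/* Stand-in for map_lookup_elem(): pointer to the live value, or NULL. */
static struct flow_stats *flow_lookup(uint32_t key)
{
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (flow_table[i].used && flow_table[i].key == key)
            return &flow_table[i].val;
    }
    return NULL;
}

/* Stand-in for map_update_elem(): stores a copy of *val.
 * Returns 0 on success, -1 if the table is full. */
static int flow_insert(uint32_t key, const struct flow_stats *val)
{
    struct flow_stats *existing = flow_lookup(key);
    if (existing) {
        *existing = *val;
        return 0;
    }
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        if (!flow_table[i].used) {
            flow_table[i].used = 1;
            flow_table[i].key = key;
            flow_table[i].val = *val;
            return 0;
        }
    }
    return -1;
}

/* Per-packet accounting: insert on first sight, otherwise modify the
 * looked-up value in place without a second update call. */
static int account_packet(uint32_t key, uint64_t len, uint64_t now_ns)
{
    struct flow_stats *st = flow_lookup(key);
    if (!st) {
        struct flow_stats init = {
            .bytes = len, .packets = 1, .last_seen_ns = now_ns,
        };
        return flow_insert(key, &init);
    }
    /* Assumption: atomic adds, so increments from several CPUs are not lost. */
    __sync_fetch_and_add(&st->bytes, len);
    __sync_fetch_and_add(&st->packets, 1);
    st->last_seen_ns = now_ns;  /* plain store, last writer wins */
    return 0;
}
```

The point of the sketch is that the pointer returned by the lookup
refers to the value stored in the map, so writes through it need no
further update call. Whether that is sufficient for safe concurrency
is the part I'm unsure about.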

> > - User space iterates over the map every 10s and pulls out the
> > metrics for further processing. If the timestamp on the entry is
> > over a threshold, the user space process deletes the entry (kernel
> > side never deletes)
> 
> And the map iterate + delete, are there multiple threads walking it
> at the same time or just single one (just making sure ...)?

There is a single goroutine that does this work, so there should be
only one walker at a time (we use https://github.com/aterlo/puregobpf).
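
Roughly, the periodic sweep has the shape below. This is a sketch in
plain C over a stand-in table rather than our Go code or the real map
syscalls; the names (sweep_flows, flow_slot, max_idle_ns) are
hypothetical and only illustrate the collect-then-expire logic run
from a single walker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow value; field names are illustrative only. */
struct flow_stats {
    uint64_t bytes;
    uint64_t packets;
    uint64_t last_seen_ns;
};

/* Stand-in for one hash map entry. */
struct flow_slot {
    int used;
    uint32_t key;
    struct flow_stats val;
};

/* Walk every entry once: add its byte counter to *total_bytes, then
 * delete it if it has been idle for longer than max_idle_ns.
 * Returns the number of entries deleted. */
static size_t sweep_flows(struct flow_slot *table, size_t n,
                          uint64_t now_ns, uint64_t max_idle_ns,
                          uint64_t *total_bytes)
{
    size_t deleted = 0;

    for (size_t i = 0; i < n; i++) {
        if (!table[i].used)
            continue;

        /* Pull out the metrics first, so an expired entry is still counted. */
        *total_bytes += table[i].val.bytes;

        uint64_t last = table[i].val.last_seen_ns;
        /* Guard against a timestamp newer than now_ns (unsigned underflow). */
        if (now_ns > last && now_ns - last > max_idle_ns) {
            table[i].used = 0;  /* stand-in for deleting the map entry */
            deleted++;
        }
    }
    return deleted;
}
```

In the real deployment the walk and the delete go through the map
syscalls, and the kernel side may be updating entries while the sweep
runs, which this single-threaded stand-in does not model.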

> > bpftool map output:
> > 
> > 22: hash  flags 0x0
> 
> Ok, you are using map prealloc here. How often does the BUG trigger
> in general on your machines? Was there an older kernel where you
> haven't been able to trigger it?

We've seen these for some time, going back several kernel versions. I'm
pretty sure I've also seen it on 4.18 but don't have an example handy.
I can't say definitively how often it happens per box, but we probably
see one of these every week or so across many boxes. I initially
thought it was related to bad hardware but now doubt that.

