On Wed, Jun 21, 2017 at 6:40 AM, Douglas Caetano dos Santos via iovisor-dev
<iovisor-dev@lists.iovisor.org> wrote:
>
> Hi,
>
> We're starting to extensively use eBPF on our servers and we've got a
> couple of questions on specific internals that I hope you can help clarify.
>
> The problem we're trying to solve is this: we want to mark packets from
> selected incoming TCP flows to act on them in different ways. This marking
> must be done before the packets enter the IP stack.
This should work; just be aware that other subsystems may also mark the
packet (nft, etc.), so your system configuration should be sane.

> We plan on receiving around 1Mpps, so we're doing the decision making with
> eBPF.
>
> Our idea is to use an eBPF hash table whose keys are tuples and values are
> some flags. Once we receive a SYN packet, we decide whether to start
> marking this new flow from then on. If no marking is going to be done, we
> do nothing; but if we decide to mark, we add this tuple to the hash table
> with some internal flags (to be used by a user-space program) and add a
> mark to this packet. For other packets, we check if the tuple exists in
> the hash table and, if it does, we add a mark to them.
>
> So our questions are:
>
> 1) Considering we're going to receive 1Mpps, are eBPF hash tables
> appropriate for this task?

Yes. Though I haven't recently done specific microbenchmarking (others have;
maybe they can chime in?), I can say anecdotally that a use case with, say,
10Mpps across 16 cores and an insert/delete for each packet would be
achievable. My use case was doing a bunch of other stuff at the time, so I
can't say how much of that workload was hash-table-specific time.

> and 2) Are the values written into the hash table, or the
> insertion/deletion of entries, immediately propagated to other CPUs? This
> is important to avoid a race condition where two packets of the same flow
> are received on different CPUs, and one could get marked while the other
> doesn't.

The hash table implementation uses RCU semantics to perform the
synchronization. There is a fair amount of documentation on that subject,
but in a nutshell: writes on one core will be made available to other cores
after a small grace period. Reads will always see a consistent view of the
hash table, but might read slightly older data.

If ntuple filtering is turned on in your NIC, I would expect that all
packets for the same flow arrive on the same queue/CPU.
Also, note that there are two flavors of hash table: one that is shared
across all CPUs (with the semantics just described) and one that is per-CPU
(faster, but not shared). I expect you want to use the former.

> (Thanks, Brendan, for pointing me to the iovisor mailing list.)
>
> Thanks for your help with these questions.
> Best regards,
> Douglas Santos.

_______________________________________________
iovisor-dev mailing list
iovisor-dev@lists.iovisor.org
https://lists.iovisor.org/mailman/listinfo/iovisor-dev