I modified the dpdkdump project's code (which uses the pdump framework) to
scale it to the highest throughput possible. I enabled the pdump
framework (which installs rx/tx hooks in the background) with a large
rte_ring and then fanned packets out to several rte_rings (I tried up to
10), each polled continuously by a separate process running on its own
core and writing the packets. When my primary application was handling
around 1 million pps in each direction, I could get around 2 million pps
(in+out) from the main rte_ring. But when I increased the load to
2 million pps in each direction, I was only getting around 2.6-2.8
million pps from the hook, though I should be getting around 4 million
pps (in+out). I am seeing a large "ring full" count. Is that because my
fan-out is slow, or does pdump itself have a bottleneck? Please let me
know.
I took a look at dumpcap, which aims to capture at 10 Gbit/s, but as far
as I understand it needs to be run as a primary process. I need a
secondary DPDK application that can capture around 5-10 million pps.
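
To put numbers on the shortfall described above, here is a rough
back-of-the-envelope sketch. It only does arithmetic on the figures quoted
in this post; the helper names are made up for illustration and are not
part of DPDK or pdump. It assumes a single core drains the main rte_ring,
which may not match the real setup.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not DPDK APIs.

def drop_fraction(offered_pps: float, captured_pps: float) -> float:
    """Fraction of offered packets that never reached the capture ring."""
    return 1.0 - captured_pps / offered_pps


def per_packet_budget_ns(pps: float) -> float:
    """Time available per packet if ONE core must keep up with `pps`."""
    return 1e9 / pps


if __name__ == "__main__":
    offered = 4_000_000  # 2 Mpps in + 2 Mpps out, as stated in the post
    for captured in (2_600_000, 2_800_000):  # observed range from the hook
        missing = drop_fraction(offered, captured)
        print(f"captured {captured / 1e6:.1f} Mpps: {missing:.0%} missing")

    # Assumption: one core dequeues the main ring and fans packets out.
    budget = per_packet_budget_ns(offered)
    print(f"budget at 4 Mpps: {budget:.0f} ns per packet")
```

With these figures, roughly 30 to 35 percent of the packets are missing,
and a single dequeuing core would have about 250 ns per packet at 4 million
pps. That does not say which stage is the bottleneck, only how tight the
budget is.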
