This is a good discussion and I hope Intel can see and benefit from it. For my use case, I don't necessarily need round robin at a per-packet level, but simply some normalized distribution among core queues that has nothing to do with anything inside the packet. A good solution could perhaps be to allow the NIC to switch to another core's queue after a certain number of packets have been received... perhaps using something like the burst rate. I see this as among the most fundamental pieces of functionality, and it is lacking in the DPDK. I'm sure there are many use cases that don't involve routing/forwarding/switching/etc. but of course still need to maximize throughput.
- Mike

On Thu, Dec 5, 2013 at 9:29 AM, Prashant Upadhyaya <
prashant.upadhyaya at aricent.com> wrote:
> Hi,
>
> Well, GTP is the main usecase.
> We end up with a GTP tunnel between the two machines.
> And ordinarily with 82599, all the data will land up on a single queue and
> therefore must be polled on a single core. Bottleneck.
>
> But in general, if I want to employ all the CPU cores' horsepower
> simultaneously to pick up the packets from the NIC, then it is natural that
> I drop a queue for every core into the NIC, and if the NIC does a round
> robin then it naturally fans out and I can use all the cores to lift
> packets from the NIC in a load-balanced fashion.
>
> Imagine a theoretical usecase where I have to lift the packets from the
> NIC, inspect them myself in the application and then switch them to the
> right core for further processing. So my cores have two jobs: one is to
> poll the NIC, and the other is to switch the packets to the right core.
> Here I would simply love to poll the queue and the intercore ring from
> each core to achieve the processing. No single core will become the
> bottleneck as far as polling the NIC is concerned. You might argue on what
> basis I switch to the relevant core for further processing, but that's
> _my_ usecase and my headache to distribute equally amongst the cores.
>
> Imagine an LTE usecase where I am on the core side (SGW): the packets come
> over GTP from thousands of mobiles (via eNB). I can employ all the cores
> to pick up the GTP packets (if the NIC gives me round robin) and then,
> based on the inner IP packet's source IP address (the mobile's IP
> address), take it to the relevant core for further processing. This way I
> get complete load balancing, not only for polling from the NIC but also
> for processing of the inner IP packets.
>
> I have also worked a lot on Cavium processors.
> Those of you who are familiar with those would know that the POW scheduler
> gives the packets to whichever core is requesting work, so the packets can
> go to any core in the Cavium Octeon processor. The only way to achieve
> similar functionality in DPDK is to drop a queue per core into the NIC and
> then let the NIC do round robin on those queues blindly. What's the harm
> if this feature is added? Let those who want to use it, use it, and those
> who hate it or think it is useless, ignore it.
>
> Regards
> -Prashant
>
> -----Original Message-----
> From: François-Frédéric Ozog [mailto:ff at ozog.com]
> Sent: Thursday, December 05, 2013 2:16 PM
> To: Prashant Upadhyaya
> Cc: 'Michael Quicquaro'; 'Stephen Hemminger'; dev at dpdk.org
> Subject: RE: [dpdk-dev] generic load balancing
>
> Hi,
>
> If the traffic you manage is above MPLS or GTP encapsulations, then you
> can use cards that provide flexible hash functions. Chelsio cxgb5 provides
> a combination of "offset", length and tuple that may help.
>
> The only reason I would have loved to get a pure round robin feature was
> to pass certain "Breaking Point" (http://www.ixiacom.com/breakingpoint)
> tests where the traffic issued was multicast from a single source... But
> that is not real-life traffic.
>
> If you could share the use case...
>
> François-Frédéric
>
> > -----Original Message-----
> > From: Prashant Upadhyaya [mailto:prashant.upadhyaya at aricent.com]
> > Sent: Thursday, December 5, 2013 06:30
> > To: Stephen Hemminger
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> > Subject: RE: [dpdk-dev] generic load balancing
> >
> > Hi Stephen,
> >
> > The awfulness depends upon the 'usecase'.
> > I have, e.g., a usecase where I want this round-robin behaviour.
> >
> > I just want the NIC to give me a facility to use this.
> >
> > Regards
> > -Prashant
> >
> >
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> > Sent: Thursday, December 05, 2013 10:25 AM
> > To: Prashant Upadhyaya
> > Cc: François-Frédéric Ozog; Michael Quicquaro; dev at dpdk.org
> > Subject: Re: [dpdk-dev] generic load balancing
> >
> > Round robin would actually be awful for any protocol because it would
> > cause out-of-order packets.
> > That is why flow-based algorithms like Flow Director and RSS work much
> > better.
> >
> > On Wed, Dec 4, 2013 at 8:31 PM, Prashant Upadhyaya
> > <prashant.upadhyaya at aricent.com> wrote:
> > > Hi,
> > >
> > > It's a real pity that the Intel 82599 NIC (and possibly others) doesn't
> > > have simple round-robin scheduling of packets on the configured queues.
> > >
> > > I have requested Intel earlier, and using this forum I am requesting
> > > again -- please, please put this facility in the NIC: if I drop N
> > > queues there and configure the NIC for round-robin scheduling on the
> > > queues, then the NIC should simply put the received packets one by one
> > > on queue 1, then on queue 2, ..., then on queue N, and then back on
> > > queue 1.
> > > The above is very useful in a lot of load-balancing cases.
> > >
> > > Regards
> > > -Prashant
> > >
> > >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> > > François-Frédéric Ozog
> > > Sent: Thursday, December 05, 2013 2:35 AM
> > > To: 'Michael Quicquaro'
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] generic load balancing
> > >
> > > Hi,
> > >
> > > As far as I can tell, this is really hardware dependent. Some hash
> > > functions allow uplink and downlink packets of the same "session" to
> > > go to the same queue (I know Chelsio can do this).
> > >
> > > For the Intel card, you may find what you want in:
> > > http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html
> > >
> > > Other cards require an NDA or other agreements to get details of RSS.
> > >
> > > If you have a performance problem, may I suggest you use kernel 3.10,
> > > then monitor system activity with the "perf" command. For instance,
> > > you can start with "perf top -a"; this will give you nice information.
> > > Then your creativity will do the rest ;-) You may be surprised by what
> > > comes out as the top hot points...
> > > (the most unexpected hot function I found here was the Linux syscall
> > > gettimeofday!!!)
> > >
> > > François-Frédéric
> > >
> > >> -----Original Message-----
> > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael
> > >> Quicquaro
> > >> Sent: Wednesday, December 4, 2013 18:53
> > >> To: dev at dpdk.org
> > >> Subject: [dpdk-dev] generic load balancing
> > >>
> > >> Hi all,
> > >> I am writing a dpdk application that will receive packets from one
> > >> interface and process them. It does not forward packets in the
> > >> traditional sense. However, I do need to process them at full line
> > >> rate and therefore need more than one core. The packets can be
> > >> somewhat generic in nature and can be nearly identical (especially at
> > >> the beginning of the packet).
> > >> I've used the rxonly function of testpmd as a model.
> > >>
> > >> I've run into problems in processing a full line rate of data since
> > >> the nature of the data causes all the data to be presented to only
> > >> one core. I get a large percentage of dropped packets (this shows up
> > >> as Rx-Errors in "port stats") because of this.
> > >> I've tried modifying the data so that packets have different UDP
> > >> ports, and that seems to work when I use --rss-udp.
> > >>
> > >> My questions are:
> > >> 1) Is there a way to configure RSS so that it alternates packets to
> > >> all configured cores regardless of the packet data?
> > >>
> > >> 2) Where is the best place to learn more about RSS and how to
> > >> configure it? I have not found much in the DPDK documentation.
> > >>
> > >> Thanks for the help,
> > >> - Mike
> > >
> > >
> > > ===========================================================================
> > > Please refer to http://www.aricent.com/legal/email_disclaimer.html
> > > for important disclosures regarding this electronic communication.
> > > ===========================================================================
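On question 1 above, the short answer from the rest of the thread is no: RSS is by definition a hash over packet header fields, so it cannot alternate queues independently of packet data. For question 2, besides the 82599 datasheet linked earlier, enabling RSS in an application looks roughly like the fragment below (field and flag names recalled from `rte_ethdev.h` of that era, worth checking against your DPDK version):

```c
/* Sketch of an RSS-enabled port configuration; exact flag names
 * (e.g. ETH_RSS_UDP vs. ETH_RSS_IPV4_UDP) vary between DPDK
 * releases, so verify against your rte_ethdev.h. */
struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,  /* spread RX across multiple queues */
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,                        /* driver default key */
            .rss_hf  = ETH_RSS_IPV4 | ETH_RSS_UDP,  /* hash IP + UDP fields */
        },
    },
};
```

With this hash set, packets that differ only in UDP port land on different queues, which matches the behaviour Michael saw with testpmd's --rss-udp.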