Since I haven't received a response from anyone else, I'll answer with my own findings so far. Doesn't everybody have this problem when using TNAPI?
I've now traced the performance drop down to whether or not the E1000_MRQC_RSS_FIELD_IPV4 bit in the MRQC configuration register is active; it has nothing to do with the Linux stack, as I initially thought. It is the board that limits the performance within the RSS scheduling. This holds even when there is only a single RSS queue. When using the igb drivers patched with pure PF_RING, this bit can be active without this performance loss, even when separating work onto different queues.

- Do you have TNAPI running at high speeds (> 300 kpackets/s) on 82576 boards?
- Which version of the kernel are you then using?
- Which versions of dca and ioatdma?
- Are there different revisions of the board? Is there any upgradeable firmware on them?
- Is there a race condition inside the board, since the TNAPI thread is "rather aggressive" on the rx ring?

It would be nice if someone could help me answer some of these questions; best of all would be if someone also gives me a solution. :-)

Best regards
/Mathias Björklund

My info:
- Quad 1G 82576 board - Ethernet controller [0200]: Intel Corporation Device [8086:10e8] (rev 01)
- Kernel: 2.6.27 from Ubuntu 8.10
- dca version 1.4
- ioatdma 3.30
- Dual Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
- TNAPI as of 2010-01-19
- PF_RING 4.1

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 3 February 2010 08:47
To: [email protected]
Subject: Re: [Ntop-misc] Performance problem when using TN_API

Here is some additional information around my problem. It gets stranger and stranger; there must be something very basic I'm missing or doing wrong.

I've modified the TNAPI thread loop so that it defines use_ring = 1 and commented out the call to ring_handler(skb..... As far as I can see, this should remove the copying into PF_RING and also prevent the packet from going to the Linux kernel.
Still: when starting pf_ring/userland/examples/pfcount -i eth3, I receive 150k packets/s when receiving IP packets and 500k packets/s when receiving other Ethernet packets. (pfcount of course reports 0 packets, since the call into PF_RING never takes place now.) The driver goes into promisc mode when starting the tool. All time is spent in the TNAPI thread as system time.

Best regards
/Mathias Björklund

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 2 February 2010 14:24
To: [email protected]
Subject: Re: [Ntop-misc] Performance problem when using TN_API

I was evidently not clear enough with my question: it only concerns reception. I use a different computer with a non-TNAPI driver for transmission. The performance of the snooping reception depends on which kind of packet is received (i.e., what the other machine transmits) if I use TNAPI+PF_RING, but not if using only PF_RING. The strange thing is that PF_RING with IP packets gives good performance, whereas TNAPI+PF_RING with IP packets gives bad performance. With non-IP packets both of them perform well. The key point actually seems to be whether there is a non-zero IP address in the packet.

I've used pf_ring both with transparent_mode=2 and the default 0, but there is no difference.

I have set all interrupts for this interface towards core #0 (echo 1 > /proc/irq/204/smp_affinity). I verify that the TNAPI thread is set to core #0. I run pfcount on core #0 (pfcount -i eth3). It works very well with non-IP packets but poorly with IP packets.

I attach a debug trace from within pf_ring, one with an IP packet and one without, but I see no major difference... Am I missing something basic? Who is looking at the IP packet, since pf_ring should drop it? The processing power is used within the TNAPI thread when the performance is bad.

I use kernel 2.6.27. Should I upgrade?
dca is version 1.4, PF_RING 4.1, TNAPI 19/1-2010.

Best regards
/Mathias Björklund

PF_RING + TNAPI + VALID IP PACKET (approx 150 kpackets/s)
=========================================================
[447495.178511] [PF_RING] get_remove_slot(0)
[447495.180025] [PF_RING] poll returning 0
[447495.968286] [PF_RING] --> skb_ring_handler() [channel_id=0/1]
[447495.971108] [PF_RING] skb_ring_handler() [skb=f5222900][0.0][len=98][dev=eth3][csum=0]
[447495.974025] is_ip 1
[447495.974799] [PF_RING] --> add_skb_to_ring(len=98) [channel_id=0/1]
[447495.977124] [PF_RING] add_skb_to_ring: [<NULL>][displ=0][len=98][caplen=98][is_ip_pkt=1][0 -> 0][f5512000/f5512000]
[447495.981438] [PF_RING] add_skb_to_ring(skb) [len=98][tot=0][insertIdx=0][pkt_type=0][cloned=0]
[447495.984632] [PF_RING] --> add_pkt_to_ring(len=98) [pfr->channel_id=255][channel_id=0]
[447495.987577] [PF_RING] get_insert_slot(0): returned slot [slot_state=0]
[447495.990062] [PF_RING] --> [caplen=98][len=98][displ=0][parsed_header_len=0][bucket_len=128][sizeof=84]
[447495.993432] [PF_RING] ==> insert_idx=1
[447495.994763] [PF_RING] [pfr->slots_info->insert_idx=1]
[447495.994769] [PF_RING] poll called (non DNA device)
[447495.994771] [PF_RING] get_remove_slot(0)
[447495.994773] [PF_RING] poll returning 1

PF_RING + TNAPI + INVALID IP PACKET (approx 500 kpackets/s)
=============================================================
[447440.511732] [PF_RING] poll called (non DNA device)
[447440.513530] [PF_RING] get_remove_slot(0)
[447440.514935] [PF_RING] poll returning 0
[447440.680982] [PF_RING] --> skb_ring_handler() [channel_id=0/1]
[447440.685235] [PF_RING] skb_ring_handler() [skb=f5222840][0.0][len=98][dev=eth3][csum=0]
[447440.688195] is_ip 0
[447440.688913] [PF_RING] --> add_skb_to_ring(len=98) [channel_id=0/1]
[447440.691368] [PF_RING] add_skb_to_ring: [<NULL>][displ=0][len=98][caplen=98][is_ip_pkt=0][0 -> 0][f5512000/f5512000]
[447440.695260] [PF_RING] add_skb_to_ring(skb) [len=98][tot=0][insertIdx=0][pkt_type=0][cloned=0]
[447440.698315] [PF_RING] --> add_pkt_to_ring(len=98) [pfr->channel_id=255][channel_id=0]
[447440.702068] [PF_RING] get_insert_slot(0): returned slot [slot_state=0]
[447440.704476] [PF_RING] --> [caplen=98][len=98][displ=0][parsed_header_len=0][bucket_len=128][sizeof=84]
[447440.707843] [PF_RING] ==> insert_idx=1
[447440.709198] [PF_RING] [pfr->slots_info->insert_idx=1]
[447440.709204] [PF_RING] poll called (non DNA device)
[447440.709205] [PF_RING] get_remove_slot(0)
[447440.709207] [PF_RING] poll returning 1
[447440.709214] [PF_RING] poll called (non DNA device)

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Luca Deri
Sent: 1 February 2010 22:46
To: [email protected]
Subject: Re: [Ntop-misc] Performance problem when using TN_API

Mathias,
TNAPI has been designed for packet capture, not for transmission. This means that the optimization is on RX. Do not expect to see any performance improvement (or perhaps even some slow-down, as the code has been tuned for RX) on TX.

Luca

On Feb 1, 2010, at 8:43 AM, <[email protected]> wrote:

> Hi!
>
> I'm using TN_API and I'm having a performance problem when using pfcount/pcount.
>
> I'm using my own stub to send packets to the machine; it sends raw Ethernet
> packets (100 bytes long). I'm using the igb driver.
>
> When I send non-IP packets, or packets that don't have a valid IP address in
> the destination or source, I get the expected results: 500 k packets/s. But as soon
> as I send IP packets with a non-zero destination or source address, I get
> results in the range of 150 k packets/s. When using only PF_RING I get
> 400 k packets/s independently of valid IP address or not. Does anybody have
> a clue why it behaves like this? According to 'top' the CPU time is spent
> inside the TNAPI thread.
> I have similar results independently of pfcount or pcount.
>
> PERFORMANCE             INVALID IP-ADDRESS    VALID IP-ADDRESS
>                         (or non-ip)
> TNAPI with PFRING       500 kpacket/s         150 kpacket/s
> TNAPI without PFRING    500 kpacket/s         150 kpacket/s
> PFRING aware driver     400 kpacket/s         400 kpacket/s
>
> I don't have any iptables installed.
> No IP address is configured on the interface.
> I believe that I've stopped all (other) processes that listen on this
> interface.
>
> Ubuntu 8.10 using the 2.6.27-16-generic kernel.
> PF_RING 4.1
> TN_API as of 2010-01-19
> Intel(R) Xeon(R) CPU E5450 @ 3.00GHz (most of the work is limited to
> one processor, though)
>
> Can anybody help me in some direction where to look?
>
> Best regards
>
> /Mathias Björklund
> (This is a resend, as the first one seems to have been dropped by the moderator
> because I was not subscribed to the list; sorry if you get it twice.)
>
> _______________________________________________
> Ntop-misc mailing list
> [email protected]
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc

---
Keep looking, don't settle - Steve Jobs

_______________________________________________
Ntop-misc mailing list
[email protected]
http://listgateway.unipi.it/mailman/listinfo/ntop-misc
