Hello again,

Now I'm sending you the statistics I collected in the tests I've done. But first, another problem I didn't mention before, because it doesn't always happen, and it's quite strange. I have used pktgen on quite a few machines, and on all of them, setting clone_skb=1000 or so boosts performance. On the Pentium 4, however, it makes no difference at all:

  clone_skb=0      --> approx. 400 kpps
  clone_skb=100000 --> approx. 400 kpps

On the Pentium III, on the other hand, I can see the performance boost in all its essence (from 100 kpps with clone_skb=0 to 400 kpps with clone_skb=100000).
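(In case it matters, this is roughly how I drive pktgen on each injector through its /proc interface; the interface name eth0, the thread number and the destination addresses below are just placeholders:)

  # Attach the device to a pktgen kernel thread
  modprobe pktgen
  echo "rem_device_all"  > /proc/net/pktgen/kpktgend_0
  echo "add_device eth0" > /proc/net/pktgen/kpktgend_0

  # Per-device parameters (clone_skb is what I vary in the test above)
  echo "count 20000000"   > /proc/net/pktgen/eth0
  echo "clone_skb 100000" > /proc/net/pktgen/eth0
  echo "pkt_size 60"      > /proc/net/pktgen/eth0
  echo "delay 0"          > /proc/net/pktgen/eth0
  echo "dst 10.0.0.2"     > /proc/net/pktgen/eth0
  echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0

  # Start transmission on all pktgen threads
  echo "start" > /proc/net/pktgen/pgctrl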
Why do I get these results? From what you have already told me, I guess the bottleneck on the Pentium 4 is the 33 MHz PCI bus, which can't send any faster. Apparently the machine has enough time to allocate new skbs before the packets go out, so the only perceptible difference between the two settings would be the idle time. Am I right?

Now, let's look at the statistics. There are two injectors and a receiving machine, all connected to a switch. Both injectors send at the same time, in order to achieve an aggregated throughput: if both send at 400 kpps, the receiver gets 800 kpps. There is a slice of time during which only one of them is running (one finishes its count first), but that interval is considerably shorter than the total measured time.

Global pktgen parameters:

  pkt_size=60
  delay=0
  clone_skb=1000000
  count=20000000

Injector A (Pentium 4 / e1000)
------------------------------------------------------------
pktgen.packet_count:       20000000
pktgen.packet_rate (pps):  411767
pktgen.throughput (Mbps):  197
pktgen.total_time (us):    48571152
pktgen.work_time (us):     35601946
pktgen.idle_time (us):     12969206

Injector B (Dual Pentium III / e1000)
------------------------------------------------------------
pktgen.packet_count:       20000000
pktgen.packet_rate (pps):  466700
pktgen.throughput (Mbps):  224
pktgen.total_time (us):    42854078
pktgen.work_time (us):     34770557
pktgen.idle_time (us):     8083521
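(For reference, the receiver-side counters below are the tg3 driver statistics; I read them with something like the following, where eth1 is a placeholder for the receiving interface. If it helps, I understand ethtool -a should also show the negotiated pause settings, though I'm not sure how to interpret them together with these counters:)

  # Per-driver NIC statistics (rx_discards, dma_writeq_full, ...)
  ethtool -S eth1

  # Pause parameter state: Autonegotiate / RX / TX
  ethtool -a eth1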
Receiver (Dual AMD Opteron / tg3)
------------------------------------------------------------
ifstats.uname: Linux bipt176 2.6.13ksensor #11 SMP Mon Dec 26 14:15:52 CET 2005 x86_64 GNU/Linux
ifstats.nr_cpus: 2
ifstats.cpu_speed (MHz): 1792.654
ifstats.arch_bits: 64-bit
ifstats.if_driver: tg3
ifstats.if_speed (Mbps): 1000Mb/s
ifstats.rx_octets: 2.56e+09
ifstats.rx_fragments: 0
ifstats.rx_ucast_packets: 40000003
ifstats.rx_mcast_packets: 0
ifstats.rx_bcast_packets: 2
ifstats.rx_fcs_errors: 0
ifstats.rx_align_errors: 0
ifstats.rx_xon_pause_rcvd: 0
ifstats.rx_xoff_pause_rcvd: 0
ifstats.rx_mac_ctrl_rcvd: 0
ifstats.rx_xoff_entered: 0
ifstats.rx_frame_too_long_errors: 0
ifstats.rx_jabbers: 0
ifstats.rx_undersize_packets: 0
ifstats.rx_in_length_errors: 0
ifstats.rx_out_length_errors: 0
ifstats.rx_64_or_less_octet_packets: 40000005
ifstats.rx_65_to_127_octet_packets: 0
ifstats.rx_128_to_255_octet_packets: 0
ifstats.rx_256_to_511_octet_packets: 0
ifstats.rx_512_to_1023_octet_packets: 0
ifstats.rx_1024_to_1522_octet_packets: 0
ifstats.rx_1523_to_2047_octet_packets: 0
ifstats.rx_2048_to_4095_octet_packets: 0
ifstats.rx_4096_to_8191_octet_packets: 0
ifstats.rx_8192_to_9022_octet_packets: 0
ifstats.tx_octets: 9024
ifstats.tx_collisions: 0
ifstats.tx_xon_sent: 0
ifstats.tx_xoff_sent: 0
ifstats.tx_flow_control: 0
ifstats.tx_mac_errors: 0
ifstats.tx_single_collisions: 0
ifstats.tx_mult_collisions: 0
ifstats.tx_deferred: 0
ifstats.tx_excessive_collisions: 0
ifstats.tx_late_collisions: 0
ifstats.tx_collide_2times: 0
ifstats.tx_collide_3times: 0
ifstats.tx_collide_4times: 0
ifstats.tx_collide_5times: 0
ifstats.tx_collide_6times: 0
ifstats.tx_collide_7times: 0
ifstats.tx_collide_8times: 0
ifstats.tx_collide_9times: 0
ifstats.tx_collide_10times: 0
ifstats.tx_collide_11times: 0
ifstats.tx_collide_12times: 0
ifstats.tx_collide_13times: 0
ifstats.tx_collide_14times: 0
ifstats.tx_collide_15times: 0
ifstats.tx_ucast_packets: 98
ifstats.tx_mcast_packets: 0
ifstats.tx_bcast_packets: 1
ifstats.tx_carrier_sense_errors: 0
ifstats.tx_discards: 0
ifstats.tx_errors: 0
ifstats.dma_writeq_full: 30066024
ifstats.dma_write_prioq_full: 0
ifstats.rxbds_empty: 0
ifstats.rx_discards: 13517210
ifstats.rx_errors: 0
ifstats.rx_threshold_hit: 5057812
ifstats.dma_readq_full: 0
ifstats.dma_read_prioq_full: 0
ifstats.tx_comp_queue_full: 0
ifstats.ring_set_send_prod_index: 99
ifstats.ring_status_update: 5404635
ifstats.nic_irqs: 141204
ifstats.nic_avoided_irqs: 5263431
ifstats.nic_tx_threshold_hit: 0

I hope these stats (combined with the information I provided in the previous email) will let you determine whether the receiving machine has HW_FLOW on or off.

Last question: there are two stats of interest, dma_writeq_full and rx_discards (both specific to the tg3 card):

  ifstats.dma_writeq_full: 30066024
  ifstats.rx_discards:     13517210

As far as I can understand, dma_writeq_full means that the card found the rx_ring full and overwrote a previous packet (so that packet was lost). How, then, can the rx_discards (packets discarded) counter be less than the dma_writeq_full counter?

Thank you.

Regards,
Aritz