> > We are doing pure IPv4 forwarding between two Ethernet
> > interfaces:
> >
> > IXIA port A<--->System Under Test<--->IXIA Port B
> >
> > Traffic has two IP destinations for each direction and
> > L4 protocol is UDP. There are two static ARP entries
> > and only interface routes. Two tests are identical
> > except that we switch from one driver to another.
> >
> > Ethernet ports on the SUT are oversubscribed -- I'm
> > sending 60% of line rate (of 256-byte packets) and
> > measuring percentage of pass-through traffic which
> > makes to the IXIA port on the other side. Traffic is
> > bidirectional and system load is close to 100%.
> >
>
> Could you post the profiles? Hopefully, others have good ideas
> as well.
>
> 256 bytes is the size where the copybreak optimization kicks in
> so you might want to experiment with the copybreak module option
> to the sky2 driver. copybreak=0 would cause no packets to be copied,
> copybreak=1514 would cause all packets to be copied. Copying is
> an optimization that helps when receiving small packets locally,
> but may slow down the forwarding path.
>
Profiles were attached to a previous posting in the
thread. I'm pasting them in plain text now at the end.
There are four profiles: two for vmlinux and two
for the sky2 and sk98lin drivers.
Regarding the copybreak parameter: it appears that it
kicks in starting from 128 bytes by default?
...
static int copybreak __read_mostly = 128;
module_param(copybreak, int, 0);
MODULE_PARM_DESC(copybreak, "Receive copy threshold");
...
Anyway, I tried copybreak settings of both 0 and 1500:
there is a significant slowdown when copybreak is set to
1500 with 256-byte traffic. Another clarification:
"256-byte packets" refers to the entire Ethernet frame
including FCS, so by the time packets reach the driver
they are 252 bytes long. I also tried switching the
driver from MSI to IRQ mode (sk98lin is running in IRQ
mode) -- that did not have any significant effect on
forwarding performance.
Oprofile results:
profile for vmlinux 2.6.21.3 running with sk98lin
driver:
CPU: PIII, speed 2000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is
not halted) with a unit mask of 0x00 (No unit mask)
count 10
samples % symbol name
1626 14.3222 _raw_spin_trylock
935 8.2357 dev_hard_start_xmit
756 6.6590 sub_preempt_count
574 5.0559 __alloc_skb
507 4.4658 _raw_spin_unlock
462 4.0694 add_preempt_count
452 3.9813 dev_queue_xmit
432 3.8052 ip_output
416 3.6642 ip_rcv
406 3.5761 preempt_schedule
380 3.3471 netif_receive_skb
364 3.2062 __qdisc_run
283 2.4927 skb_release_data
274 2.4135 debug_smp_processor_id
265 2.3342 kfree
219 1.9290 kmem_cache_free
211 1.8585 __kmalloc
181 1.5943 ip_route_input
177 1.5591 pfifo_fast_dequeue
164 1.4446 ip_forward
150 1.3212 kmem_cache_alloc
141 1.2420 __kfree_skb
128 1.1275 ide_insw
121 1.0658 rt_hash_code
100 0.8808 pfifo_fast_requeue
96 0.8456 nf_iterate
94 0.8280 pfifo_fast_enqueue
91 0.8016 eth_type_trans
80 0.7047 nf_hook_slow
78 0.6870 cache_alloc_refill
72 0.6342 dev_kfree_skb_any
68 0.5990 local_bh_enable
58 0.5109 kfree_skb
58 0.5109 kfree_skbmem
52 0.4580 free_block
49 0.4316 selinux_ipv4_postroute_last
48 0.4228 delay_tsc
38 0.3347 page_fault
36 0.3171 kunmap_atomic
33 0.2907 memcpy
27 0.2378 __handle_mm_fault
27 0.2378 __netif_schedule
27 0.2378 cache_flusharray
26 0.2290 do_wp_page
25 0.2202 net_rx_action
21 0.1850 __d_lookup
16 0.1409 __copy_to_user_ll
16 0.1409 unmap_vmas
15 0.1321 default_idle
15 0.1321 kmap_atomic
14 0.1233 get_page_from_freelist
12 0.1057 __link_path_walk
12 0.1057 flush_tlb_mm
12 0.1057 strnlen_user
11 0.0969 avc_has_perm_noaudit
11 0.0969 do_page_fault
11 0.0969 sysenter_past_esp
10 0.0881 inode_has_perm
10 0.0881 net_tx_action
10 0.0881 selinux_inode_permission
9 0.0793 __might_sleep
9 0.0793 filemap_nopage
8 0.0705 cache_reap
8 0.0705 find_get_page
8 0.0705 find_vma
8 0.0705 local_bh_disable
7 0.0617 _atomic_dec_and_lock
6 0.0528 __copy_from_user_ll
6 0.0528 do_lookup
6 0.0528 do_timer
6 0.0528 free_hot_cold_page
6 0.0528 hrtimer_run_queues
6 0.0528 run_rebalance_domains
5 0.0440 apic_timer_interrupt
5 0.0440 error_code
5 0.0440 find_busiest_group
5 0.0440 task_rq_lock
4 0.0352 __do_softirq
4 0.0352 _spin_lock_irq
4 0.0352 copy_page_range
4 0.0352 do_mmap_pgoff
4