Andi Kleen writes:

 > > If not, you loose.
 > 
 > It all depends on if the higher levels on the trie are small
 > enough to be kept in cache. Even with two cache misses it might
 > still break even, but have better scalability.

 Yes the trick to keep root large to allow a very flat tree and few 
 cache misses. Stefan Nilsson (author of LC-trie) and were able to 
 improve the the LC-trie quite a bit we called this trie+hash ->trash

 The paper discusses trie/hash... (you've seen it)

 http://www.nada.kth.se/~snilsson/public/papers/trash/

 > Another advantage would be to eliminate the need for large memory
 > blocks, which cause problems too e.g. on NUMA. It certainly would
 > save quite some memory if the tree levels are allocated on demand
 > only. However breaking it up might also cost more TLB misses, 
 > but those could be eliminated by preallocating the tree in
 > the same way as the hash today. Don't know if it's needed or not.
 > 
 > I guess someone needs to code it up and try it.
 
 I've implemented trie/trash as replacement for the dst hash to full
 key lookup for ipv4 (unicache) to start with. And is still is focusing 
 on the nasty parts, packet forwarding, as we don't want break this.... 
 So the benfits of full flow lookup is not accounted. I.e the full flow
 lookup could give socket at no cost and do some conntrack support
 like Evgeniy did in netchannels pathes.

 Below, some recent comparisions and profiles for the packet forwardning
 
Input 2 * 65k concurrent flows eth0->eth1, eth2->eth3 in forwarding
On separate CPU Opteron 2218 (2.6 GHZ)  net-2.6.21 git. Numbers are very 
approximative but should still be representative. Profiles are 
collected. 

Performance comparison
----------------------
Table below holds: dst-entries in use, lookup hits, slow path and total pps

Flowlen 40

250k 1020 + 21 = 1041 pps  Vanilla rt_hash=32k
1M    950 + 29 =  979 pps  Vanilla rt_hash=131k
260k  980 + 24 = 1004 pps  Unicache

Flowlen 4 (rdos)

290k 560 + 162 = 722 kpps  Vanilla  rt_hash=32k
1M   400 + 165 = 565 kpps  Vanilla  rt_hash=131k
230k 570 + 170 = 740 kpps  Unicache 


unicache flen=4 pkts
--------------------
c02df84f 5257     7.72078     tkey_extract_bits
c023151a 5230     7.68112     e1000_clean_rx_irq
c02df908 3306     4.85541     tkey_equals
c014cf31 3296     4.84072     kfree
c02f8c3b 3067     4.5044      ip_route_input
c02fbdf0 2948     4.32963     ip_forward
c023024e 2809     4.12548     e1000_xmit_frame
c02e06f1 2792     4.10052     trie_lookup
c02fd764 2159     3.17085     ip_output
c032591c 1965     2.88593     fn_trie_lookup
c014cd82 1456     2.13838     kmem_cache_alloc
c02fa941 1337     1.96361     ip_rcv
c014ced0 1334     1.9592      kmem_cache_free
c02e1538 1289     1.89311     unicache_tcp_establish
c02e2d70 1218     1.78884     dev_queue_xmit
c02e31af 1074     1.57735     netif_receive_skb
c02f8484 1053     1.54651     ip_route_input_slow
c02db552 987      1.44957     __alloc_skb
c02e626f 913      1.34089     dst_alloc
c02edaad 828      1.21606     __qdisc_run
c0321ccf 810      1.18962     fib_get_table
c02e14c1 782      1.1485      match_pktgen
c02e6375 766      1.125       dst_destroy
c02e10e8 728      1.06919     unicache_hash_code
c0231242 647      0.950227    e1000_clean_tx_irq
c02f7d23 625      0.917916    ipv4_dst_destroy

unicache flen=40 pkts
---------------------
c023151a 6742     10.3704     e1000_clean_rx_irq
c02df908 4553     7.00332     tkey_equals
c02fbdf0 4455     6.85258     ip_forward
c02e06f1 4067     6.25577     trie_lookup
c02f8c3b 3951     6.07734     ip_route_input
c02df84f 3929     6.0435      tkey_extract_bits
c023024e 3538     5.44207     e1000_xmit_frame
c014cf31 3152     4.84834     kfree
c02fd764 2711     4.17        ip_output
c02e1538 1930     2.96868     unicache_tcp_establish
c02fa941 1696     2.60875     ip_rcv
c02e31af 1466     2.25497     netif_receive_skb
c02e2d70 1412     2.17191     dev_queue_xmit
c014cd82 1397     2.14883     kmem_cache_alloc
c02db552 1394     2.14422     __alloc_skb
c02edaad 1032     1.5874      __qdisc_run
c02ed6b8 957      1.47204     eth_header
c02e15dd 904      1.39051     unicache_garbage_collect_active
c02db94e 861      1.32437     kfree_skb
c0231242 794      1.22131     e1000_clean_tx_irq
c022fd58 778      1.1967      e1000_tx_map
c014ce73 756      1.16286     __kmalloc
c014ced0 740      1.13825     kmem_cache_free
c02e14c1 701      1.07826     match_pktgen
c023002c 621      0.955208    e1000_tx_queue
c02e78fa 519      0.798314    neigh_resolve_output

Vanilla w. flen=4 pkts rt_hash=32k
----------------------------------
c02f6852 15704    22.9102     ip_route_input
c023151a 5324     7.76705     e1000_clean_rx_irq
c02f84a1 4457     6.5022      ip_rcv
c02f9950 3065     4.47145     ip_forward
c023024e 2630     3.83684     e1000_xmit_frame
c0323380 2343     3.41814     fn_trie_lookup
c02fb2c4 2181     3.1818      ip_output
c02f4a3b 1839     2.68287     rt_intern_hash
c02f4480 1762     2.57054     rt_may_expire
c02f6046 1697     2.47571     ip_route_input_slow
c014ced0 1615     2.35608     kmem_cache_free
c014cf31 1522     2.22041     kfree
c02e0bcb 1321     1.92717     netif_receive_skb
c02e078c 1251     1.82505     dev_queue_xmit
c02e3c8b 1089     1.58871     dst_alloc
c02eb0d4 1010     1.47346     eth_header
c02e3d91 973      1.41948     dst_destroy
c02db552 972      1.41803     __alloc_skb
c02eb4c9 883      1.28819     __qdisc_run
c031f733 798      1.16418     fib_get_table
c01c0fa2 749      1.0927      memcmp
c02f59a5 703      1.02559     ipv4_dst_destroy
c014cd82 673      0.981822    kmem_cache_alloc
c0324fb1 670      0.977446    fib_lookup
c02db94e 670      0.977446    kfree_skb
c022fd58 631      0.92055     e1000_tx_map

Vanilla w. flen=40 pkts rt_hash=32k
-----------------------------------
c02f6852 17445    25.507      ip_route_input
c023151a 7657     11.1956     e1000_clean_rx_irq
c02f84a1 7004     10.2408     ip_rcv
c02f9950 4851     7.09283     ip_forward
c023024e 3874     5.66432     e1000_xmit_frame
c02fb2c4 2905     4.24751     ip_output
c014cf31 2065     3.01931     kfree
c02e078c 1874     2.74005     dev_queue_xmit
c02e0bcb 1712     2.50318     netif_receive_skb
c02db552 1352     1.97681     __alloc_skb
c02eb4c9 1204     1.76041     __qdisc_run
c02eb0d4 980      1.4329      eth_header
c02db94e 846      1.23697     kfree_skb
c022fd58 833      1.21796     e1000_tx_map
c014ced0 819      1.19749     kmem_cache_free
c014cd82 806      1.17848     kmem_cache_alloc
c014ce73 761      1.11269     __kmalloc
c0231242 717      1.04835     e1000_clean_tx_irq
c023002c 595      0.869972    e1000_tx_queue
c02e5316 563      0.823184    neigh_resolve_output
c02eb221 498      0.728145    eth_type_trans
c0231ea2 471      0.688667    e1000_alloc_rx_buffers
c0323380 459      0.671121    fn_trie_lookup
c02ef8fd 456      0.666735    qdisc_dequeue_head
c02e06dd 422      0.617022    dev_hard_start_xmit
c02f4074 404      0.590704    rt_hash_code

Vanilla w. flen=4 pkts rt_hash=131k
------------------------------------
c02f6852 9679     17.3758     ip_route_input
c023151a 3895     6.99232     e1000_clean_rx_irq
c02f84a1 2858     5.13069     ip_rcv
c0323380 2507     4.50057     fn_trie_lookup
c023024e 2233     4.00869     e1000_xmit_frame
c02f4480 2075     3.72505     rt_may_expire
c02f9950 1966     3.52937     ip_forward
c014ced0 1659     2.97824     kmem_cache_free
c02f4a3b 1654     2.96927     rt_intern_hash
c02f6046 1523     2.73409     ip_route_input_slow
c02fb2c4 1500     2.6928      ip_output
c014cf31 1375     2.4684      kfree
c02e078c 1098     1.97113     dev_queue_xmit
c02e3c8b 1081     1.94061     dst_alloc
c02e3d91 947      1.70006     dst_destroy
c014cc12 890      1.59773     free_block
c02e0bcb 874      1.56901     netif_receive_skb
c02eb4c9 861      1.54567     __qdisc_run
c02db552 777      1.39487     __alloc_skb
c02f59a5 705      1.26562     ipv4_dst_destroy
c031f733 689      1.2369      fib_get_table
c02ef8fd 669      1.20099     qdisc_dequeue_head
c01c0fa2 665      1.19381     memcmp
c0231242 642      1.15252     e1000_clean_tx_irq
c0324fb1 636      1.14175     fib_lookup
c014c8f1 620      1.11303     slab_put_obj

Vanilla w. flen=40 pkts rt_hash=131k
------------------------------------
c02f6852 9647     16.9913     ip_route_input
c023151a 6879     12.116      e1000_clean_rx_irq
c02f84a1 6102     10.7475     ip_rcv
c02f9950 4248     7.48203     ip_forward
c023024e 3625     6.38474     e1000_xmit_frame
c02fb2c4 2501     4.40503     ip_output
c014cf31 2349     4.13731     kfree
c02e078c 1714     3.01888     dev_queue_xmit
c02e0bcb 1461     2.57327     netif_receive_skb
c02eb4c9 1286     2.26504     __qdisc_run
c02db552 1239     2.18226     __alloc_skb
c0231242 1034     1.82119     e1000_clean_tx_irq
c02ef8fd 936      1.64858     qdisc_dequeue_head
c014ced0 888      1.56404     kmem_cache_free
c014cd82 877      1.54467     kmem_cache_alloc
c022fd58 828      1.45836     e1000_tx_map
c014ce73 789      1.38967     __kmalloc
c02db94e 741      1.30513     kfree_skb
c023002c 622      1.09553     e1000_tx_queue
c02eb221 524      0.922925    eth_type_trans
c0323380 473      0.833098    fn_trie_lookup
c02e039b 449      0.790827    dev_kfree_skb_any
c0231ea2 442      0.778498    e1000_alloc_rx_buffers
c02e06dd 420      0.739749    dev_hard_start_xmit
c0230227 391      0.688671    e1000_maybe_stop_tx
c02f4074 336      0.591799    rt_hash_code


Some comments:

To some extent we compare apples and bananas but still interesting 
uniccahe does the full key lookup and we see the lookup cost itself 
is not expensive (trie_lookup) but we see  key handling costs a bit. 
tkey_extract_bits is when me make the key. The corresponding 
for the hash would be when we create the actual hash.

tkey_equals is to verify the lookup also this we do with hash
maybe it should be possible i case of routing to optimize a bit.
Only if packet is for localhost we would need to do the full
tkey_equals - this could possible save some cache misses.


Cheers.
                                        --ro


 
 
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to