I've been doing some network TX performance measurements on a Dell 1850 dual-Xeon
box, and I see that DMA mapping my buffers seems to be incredibly
costly: each mapping takes >8us, some >16us.

I dug into the ddi_dma_addr_bind_handle() operation, and it seems to be
hat_getpfnum() that's slow. I used the following bit of D...
[efsys_txq_map_packet takes a packet passed down by the stack and
tries to DMA map all its dblks]
/* Flag when we're inside the driver's TX mapping path and the bind it triggers. */
fbt::efsys_txq_map_packet:entry
{
        self->in_tx = 1;
}

fbt::rootnex_dma_bindhdl:entry
/self->in_tx == 1/
{
        self->in_bind = 1;
}

fbt::rootnex_dma_bindhdl:return
/self->in_bind == 1/
{
        self->in_bind = 0;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

/* Time hat_getpfnum() only when it is called from one of those binds. */
fbt::hat_getpfnum:entry
/self->in_bind == 1/
{
        self->ts = timestamp;
}

fbt::hat_getpfnum:return
/self->ts != 0/
{
        @time["getpfnum"] = quantize(timestamp - self->ts);
        self->ts = 0;
}
... and got the following on a 60-second run of a tight loop (in
kernel) which allocates a single 1500-byte dblk, makes it look like an
Ethernet packet and then passes it to my TX code (roughly the loop
sketched at the end of this mail):
  getpfnum
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |@@@@@@@@@@@@@@@@@@@@@                    592964
            8192 |@@@@@@@@@@@@@@@@@@@                      538248
           16384 |                                         826
           32768 |                                         27
           65536 |                                         0
That seems incredibly slow. Based on these sorts of numbers, bcopy()ing
network packets will probably be faster, even with a jumbo MTU.
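
To put rough numbers on that (the copy rate below is an assumption, not a
measurement): with 4KB pages, a 9000-byte jumbo frame spans 3 pages (4 if it
isn't page-aligned), so binding it means at least 3 hat_getpfnum() calls at
4-16us each, i.e. upwards of 12us in page lookups before any other bind
overhead. Even at a modest ~1GB/s, bcopy()ing those 9000 bytes into a
pre-mapped buffer would be only ~9us.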
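
To pin down how much of the total bind cost the page lookups account for, the
same probes could be reused to time the whole rootnex_dma_bindhdl() call and
count the hat_getpfnum() calls inside it. An untested sketch:

fbt::efsys_txq_map_packet:entry
{
        self->in_tx = 1;
}

fbt::efsys_txq_map_packet:return
{
        self->in_tx = 0;
}

/* Time the whole bind, and count the page lookups it does. */
fbt::rootnex_dma_bindhdl:entry
/self->in_tx/
{
        self->bind_ts = timestamp;
        self->npfn = 0;
}

fbt::hat_getpfnum:entry
/self->bind_ts/
{
        self->npfn++;
}

fbt::rootnex_dma_bindhdl:return
/self->bind_ts/
{
        @bind["bindhdl"] = quantize(timestamp - self->bind_ts);
        @pfns["getpfnum calls per bind"] = lquantize(self->npfn, 0, 8, 1);
        self->bind_ts = 0;
        self->npfn = 0;
}

Comparing the bindhdl distribution with the getpfnum one above should show
whether the page lookups really dominate the bind.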
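
For reference, the tight loop is roughly the following. This is only a sketch:
my_tx_post() is a made-up stand-in for the driver entry point that eventually
reaches efsys_txq_map_packet(), and the real code differs in the details.

#include <sys/types.h>
#include <sys/stream.h>
#include <sys/byteorder.h>
#include <sys/ethernet.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

extern void my_tx_post(mblk_t *);       /* hypothetical TX entry point */

static void
tx_tight_loop(uint_t iterations)
{
        uint_t i;

        for (i = 0; i < iterations; i++) {
                mblk_t *mp = allocb(1500, BPRI_HI);
                struct ether_header *ehp;

                if (mp == NULL)
                        continue;

                /* Fake up a minimal Ethernet header; the payload is left as-is. */
                ehp = (struct ether_header *)mp->b_rptr;
                bzero(ehp, sizeof (*ehp));
                ehp->ether_type = htons(ETHERTYPE_IP);
                mp->b_wptr = mp->b_rptr + 1500;

                my_tx_post(mp);         /* consumes mp */
        }
}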
Paul
--
Paul Durrant
http://www.linkedin.com/in/pdurrant