Bosko Milekic writes:

> > I'm a bit worried about other devices.  Traditionally, mbufs have
> > never crossed page boundaries, so most drivers never bother to check
> > for a transmit mbuf crossing a page boundary.  Using physically
> > discontiguous mbufs could lead to a lot of subtle data corruption.
>
> I assume here that when you say "mbuf" you mean "jumbo buffer attached
> to an mbuf."

Yes.

> In that case, yeah, all that we need to make sure of is
> that the driver knows that it's dealing with non-physically-contiguous
> pages.  As for regular 2K mbuf clusters, as well as the 256 byte mbufs
> themselves, they never cross page boundaries, so this should not be a
> problem for those drivers that do not use jumbo clusters.

But it would be problematic if we used the 10K jumbo cluster in
so_send() like I'd like to.  Ah, the pains of legacy code... :-(

I'm also having a little trouble convincing myself that page-crossing
jumbo clusters would be safe in all scenarios.  I suppose if you were
to make the transmit logic in all drivers which support jumbo frames
clueful, then using them for receives would be safe.
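
For concreteness, here is a minimal sketch of the kind of segmentation a
"clueful" transmit path would need, assuming FreeBSD-style vtophys(),
PAGE_SIZE and PAGE_MASK; the segment structure and helper name are made
up for illustration, and a real driver would fill in its own DMA
descriptor format instead:

    /*
     * Hypothetical helper: split one virtually contiguous buffer into
     * physically contiguous DMA segments, cutting at every page
     * boundary.  Assumes roughly <sys/param.h>, <vm/vm.h> and
     * <vm/pmap.h> for PAGE_SIZE, PAGE_MASK and vtophys().
     */
    struct dma_seg {
            vm_offset_t ds_paddr;   /* physical start of segment */
            int         ds_len;     /* bytes in this segment */
    };

    static int
    buf_to_segs(caddr_t va, int len, struct dma_seg *segs, int maxsegs)
    {
            int nsegs = 0;

            while (len > 0) {
                    /* Bytes left before the next page boundary. */
                    int chunk = PAGE_SIZE - ((vm_offset_t)va & PAGE_MASK);

                    if (chunk > len)
                            chunk = len;
                    if (nsegs == maxsegs)
                            return (-1);    /* too many segments */
                    segs[nsegs].ds_paddr = vtophys(va);
                    segs[nsegs].ds_len = chunk;
                    nsegs++;
                    va += chunk;
                    len -= chunk;
            }
            return (nsegs);
    }

Hardware that can only take a couple of segments per packet would have
to fall back to copying into a physically contiguous bounce buffer
instead.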

> > One question.  I've observed some really anomalous behaviour under
> > -stable with my Myricom GM driver (2Gb/s + 2Gb/s link speed, dual
> > 1GHz PIII).  When I use 4K mbufs for receives, the best speed I see
> > is about 1300 Mb/sec.  However, if I use private 9K physically
> > contiguous buffers I see 1850 Mb/sec (iperf TCP).
> >
> > The obvious conclusion is that there's a lot of overhead in setting
> > up the DMA engines, but that's not the case; we have a fairly quick
> > chain DMA engine.  I've provided a "control" by breaking my
> > contiguous buffers down into 4K chunks so that I do the same number
> > of DMAs in both cases, and I still see ~1850 Mb/sec for the 9K
> > buffers.
> >
> > A coworker suggested that the problem was that when doing copyouts
> > to userspace, the PIII was doing speculative reads and loading the
> > cache with the next page.  However, we then start copying from a
> > totally different address using discontiguous buffers, so we
> > effectively take 2x the number of cache misses we'd need to.  Does
> > that sound reasonable to you?  I need to try malloc'ing virtually
> > contiguous and physically discontiguous buffers & see if I get the
> > same (good) performance...
>
> I believe that the Intel chips do "virtual page caching" and that the
> logic that does the virtual -> physical address translation sits
> between the L2 cache and RAM.  If that is indeed the case, then your
> idea of testing with virtually contiguous pages is a good one.
> Unfortunately, I don't know if the PIII is doing speculative
> cache-loads, but it could very well be the case.  If it is, and if in
> fact the chip does caching based on virtual addresses, then providing
> it with virtually contiguous address space may yield better results.
> If you try this, please let me know.  I'm extremely interested in
> seeing the results!

Thanks for your input.  I'll post the results when I get to it.  I'm
working on an AIX driver right now and I need to finish that before I
have any playtime.  (AIX is utterly bizarre: pageable kernel,
misleading docs, etc., etc.)

Thanks again,

Drew
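
For reference, a minimal sketch of the allocation side of that test,
assuming FreeBSD 4.x-era malloc(9) and contigmalloc(9); the
contigmalloc() argument list shown here is approximate, and the
variable and function names are invented for illustration:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/errno.h>

    #define JUMBO_LEN   (9 * 1024)

    static void *jumbo_virt;  /* virtually contiguous, pages may be scattered */
    static void *jumbo_phys;  /* physically contiguous */

    static int
    alloc_test_buffers(void)
    {
            /* malloc(9) guarantees kernel-virtual contiguity only. */
            jumbo_virt = malloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT);

            /* contigmalloc(9) additionally forces physical contiguity. */
            jumbo_phys = contigmalloc(JUMBO_LEN, M_DEVBUF, M_NOWAIT,
                0ul, ~0ul, PAGE_SIZE, 0ul);

            if (jumbo_virt == NULL || jumbo_phys == NULL)
                    return (ENOMEM);
            return (0);
    }

Pointing the receive path at one buffer or the other and re-running the
same iperf test would then separate "virtually contiguous" from
"physically contiguous" in the copyout numbers.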