On Mon, 2011-01-24 at 11:14 -0500, Bart Van Assche wrote:
> On Mon, Jan 24, 2011 at 4:32 PM, Or Gerlitz <ogerl...@voltaire.com> wrote:
> > David Dillow wrote:
> >>> if we look on the 50% for SAS/1M IOs that you're presenting, can you tell
> >>> what made the difference, srp went from sg_tablesize of 255 to 256 so the
> >>> upper layers where able to provide 1M as one IO
> >
> >> This win is from sg_tablesize going from 255 to 256 in this case; the HW
> >> really likes that better than getting two requests -- one for 1020 KB
> >> and one for 4 KB.
> >
> > Its always nice to find the simplest explanation to the greatest
> > improvement... going to the 2nd largest gains
> >
> >> SAS 2M 520 MB/s 861 MB/s
> >> SAS 4M 529 MB/s 921 MB/s
> >> SAS 8M 600 MB/s 951 MB/s
> >
> > I wonder what made the difference here? it can't be only the 255 -->
> > 256 sg_tablesize change, for the 2M case
> > the change to use 512 pages FMRs could let you use one rkey/fmr for
> > the whole IO but not for 4M/8M
>
> I think it would be interesting to have performance measurements with
> a RAM disk as target too because it is hard to tell for someone not
> familiar with the internals of the target used in this test which
> performance gain is due to the initiator changes and which is due to
> the target behavior.

I think it is pretty obvious that the gain is due to the initiator
changes allowing us to drive the target the way it likes to be driven,
but perhaps I haven't given you enough information.

The HW is backed by a RAID6 (really RAID3 + two parity drives). Each
4 KB block is striped as eight 512-byte sectors, one per backend
storage device, and there is no write combining when the write cache is
disabled. So, when we split 1 MB into a 1020 KB and a 4 KB request,
that translates into a 127.5 KB and a 512-byte request to each backend
storage device. With the patches, that remains a single 128 KB request,
or 256 KB for 2M, etc. The low-level drives can optimize that much
better.

I did runs against my IOP test harness, and it showed better
performance there as well, though that was unexpected -- I figured we'd
see a slight decline in IOPS.

I have not yet investigated further, but you have the code and are
welcome to run tests and report results.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
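
For anyone who wants to check the per-device arithmetic above, here is
a rough sketch. It is not code from the patches. It assumes one 4 KB
page per scatter/gather entry and eight data devices per stripe (taken
from the 4 KB block -> 8 x 512-byte sector layout); the function names
split_io and per_drive_requests are made up for illustration.

```python
KB = 1024
PAGE_SIZE = 4 * KB   # assumption: each S/G entry maps one 4 KB page
DATA_DRIVES = 8      # assumption: 4 KB block striped as 8 x 512-byte sectors


def split_io(total_bytes, sg_tablesize, page_size=PAGE_SIZE):
    """Split one IO into requests of at most sg_tablesize pages each."""
    max_bytes = sg_tablesize * page_size
    requests = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, max_bytes)
        requests.append(chunk)
        remaining -= chunk
    return requests


def per_drive_requests(request_sizes, data_drives=DATA_DRIVES):
    """Size in bytes that each data device sees for each host request."""
    return [size // data_drives for size in request_sizes]


if __name__ == "__main__":
    for sg_tablesize in (255, 256):
        host = split_io(1024 * KB, sg_tablesize)
        drive = per_drive_requests(host)
        print(sg_tablesize,
              [size / KB for size in host],    # host requests in KB
              [size / KB for size in drive])   # per-device requests in KB
```

With sg_tablesize 255 the 1 MB IO comes out as 1020 KB + 4 KB at the
host, i.e. 127.5 KB + 0.5 KB per device; with 256 it stays a single
1024 KB request, i.e. 128 KB per device.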