On Mon, 2011-01-24 at 11:14 -0500, Bart Van Assche wrote:
> On Mon, Jan 24, 2011 at 4:32 PM, Or Gerlitz <ogerl...@voltaire.com> wrote:
> > David Dillow wrote:
> >>> if we look at the 50% for SAS/1M IOs that you're presenting, can you tell
> >>> what made the difference? srp went from an sg_tablesize of 255 to 256, so the
> >>> upper layers were able to provide 1M as one IO
> >
> >> This win is from sg_tablesize going from 255 to 256 in this case; the HW
> >> really likes that better than getting two requests -- one for 1020 KB
> >> and one for 4 KB.
> >
> > It's always nice to find the simplest explanation for the greatest
> > improvement... going to the 2nd largest gains
> >
> >> SAS   2M      520 MB/s        861 MB/s
> >> SAS   4M      529 MB/s        921 MB/s
> >> SAS   8M      600 MB/s        951 MB/s
> >
> > I wonder what made the difference here? It can't be only the 255 -->
> > 256 sg_tablesize change. For the 2M case, the change to use 512-page
> > FMRs could let you use one rkey/FMR for the whole IO, but not for
> > 4M/8M.
> 
> I think it would be interesting to have performance measurements with
> a RAM disk as target too because it is hard to tell for someone not
> familiar with the internals of the target used in this test which
> performance gain is due to the initiator changes and which is due to
> the target behavior.

I think it is pretty obvious that the gain is due to the initiator
changes allowing us to drive the target the way it likes to be driven,
but perhaps I haven't given you enough information. The HW is backed by
a RAID6 (really RAID3 + two parity drives). Each 4 KB block is striped
across eight 512-byte sectors, and there is no write combining when the
write cache is disabled. So, when we're splitting 1 MB into a 1020 KB
and a 4 KB request, that translates into a 127.5 KB and a 512-byte
request to each backend storage device. With the patches, that remains
a single 128 KB request, or 256 KB for 2M, etc. The low-level drives
can optimize that much better.
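
The arithmetic above can be checked with a rough sketch. This is only an
illustration, not code from the patches: the function names are
hypothetical, and it assumes 4 KB per scatter/gather entry and eight
data devices per stripe, which is what the figures above imply.

```python
PAGE_KB = 4       # assumption: one 4 KB page per scatter/gather entry
DATA_DRIVES = 8   # assumption: eight data devices share each stripe


def split_request(io_kb, sg_tablesize, page_kb=PAGE_KB):
    """Split an I/O into requests of at most sg_tablesize pages each."""
    max_kb = sg_tablesize * page_kb
    pieces = []
    remaining = io_kb
    while remaining > 0:
        piece = min(remaining, max_kb)
        pieces.append(piece)
        remaining -= piece
    return pieces


def per_drive_kb(pieces, data_drives=DATA_DRIVES):
    """Size of the request each backend device sees for every piece."""
    return [piece / data_drives for piece in pieces]


if __name__ == "__main__":
    for sg_tablesize in (255, 256):
        pieces = split_request(1024, sg_tablesize)
        print(sg_tablesize, pieces, per_drive_kb(pieces))
```

With sg_tablesize 255, a 1 MB I/O splits into 1020 KB and 4 KB, which
is 127.5 KB and 0.5 KB (512 bytes) per device. With 256 it stays a
single 1024 KB request, or 128 KB per device.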

I did runs against my IOP test harness, and it showed better performance
there as well, though that was unexpected -- I figured we'd see a slight
decline in IOPS. I have not yet investigated further, but you have the
code and are welcome to run tests and report results.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
