Re: [PATCH] megasas: Update to version 1.01

2010-06-09 Thread Nicholas A. Bellinger
On Wed, 2010-06-09 at 12:32 +0200, Hannes Reinecke wrote:
> Nicholas A. Bellinger wrote:
> > Hi Hannes,
> > 
> > I applied your changes and everything looks good with the exception of
> > the new MEGASAS_DEFAULT_SGE=80 setting..
> > 
> >> diff --git a/hw/megasas.c b/hw/megasas.c
> >> index 250c3fb..19569a8 100644
> >> --- a/hw/megasas.c
> >> +++ b/hw/megasas.c
> >> @@ -40,38 +40,17 @@ do { fprintf(stderr, "megasas: error: " fmt , ## __VA_ARGS__);} while (0)
> >>  #endif
> >>  
> >>  /* Static definitions */
> >> -#define MEGASAS_MAX_FRAMES 1000
> >> -#define MEGASAS_MAX_SGE 8
> > 
> > 
> > 
> >> +#define MEGASAS_VERSION "1.01"
> >> +#define MEGASAS_MAX_FRAMES 2048   /* Firmware limit at 65535 */
> >> +#define MEGASAS_DEFAULT_FRAMES 1000   /* Windows requires this */
> >> +#define MEGASAS_MAX_SGE 255   /* Firmware limit */
> >> +#define MEGASAS_DEFAULT_SGE 80
> > 
> > Ok, I have been running some LTP disktest raw bandwidth benchmarks with a
> > 256K blocksize with megasas -> TCM_Loop -> TCM/RAMDISK_DR LUNs into a
> > v2.6.26 x86_64 Linux guest (4 VCPUs and 2048 memory) and I noticed
> > something interesting..
> > 
> > With the new MEGASAS_DEFAULT_SGE 80 setting for fw_sge, read/write tests
> > have dropped from the original ~1050 MB/sec to roughly 400 MB/sec.
> > Passing in the new qdev option using the old default of max_sge=8 the
> > speed jumps back up to the range that was previously observed w/o this
> > patch.  Going a bit further, using max_sge=16 pushes bandwidth up to
> > ~1600 MB/sec, and max_sge=24 takes it up to ~2200 MB/sec..!  Using
> > max_sge=32 then sharply drops back to ~800 MB/sec, and increasing to
> > larger values brings bandwidth down lower and lower..
> > 
> > Taking a look at the megaraid_sas LLD in the KVM guest, the struct
> > scsi_host is being registered with sg_tablesize=28 which appears to be
> > where the sharp dropoff for max_sge > 28 begins to occur.  I see that
> > MFI_DCMD_CTRL_GET_INFO is returning the configured fw_sge to the guest,
> > but AFAICT megaraid_sas does not adjust itself to use the larger value
> > reported by GET_INFO.
> > 
> Thanks for confirmation. You just confirmed _why_ I made
> the SGE setting configurable.
> 
> The SGE default setting as found on 'real' HBAs is in fact 80,
> hence this value.
> However, I always suspected that we will have problems with
> direct SGL mapping if the settings from the underlying hardware
> and the emulation don't match.
> Which was the reason for the LSF discussion topic, if you remember :-)
> So thanks for the confirmation here.

Indeed, I was looking at best-case large block bandwidth with
TCM/RAMDISK_DR and zero-copy struct scatterlist mapping with
can_queue and max_sectors both set to 1024.   Having a TCM IBLOCK/FILEIO/pSCSI
backstore for a real backend struct block_device is going to have a
certain overhead compared to raw struct page ramdisk, but I think the
RAMDISK_DR subsystem plugin gives us a good idea of where we are at with
TCM_Loop struct scsi_devices..  ;)

> 
> Hence I made the SGE setting configurable, so that it can be
> adjusted (manually for starters) to the underlying hardware.
> If you do a:
> 
> -device megasas,id=megasas,max_sge=28,mode=jbod
> 
> you have the desired behaviour.

Perfect.. I will check out mode=jbod as well..

> 
> Currently we cannot do this tuning automatically; we just have
> _one_ setting for the entire HBA emulation whereas the underlying
> disks connected to the megasas might have different settings.
> 
> Again, the proper handling here should be discussed on the LSF.
> 



> > So that said, I think we want to use MEGASAS_DEFAULT_SGE 28 to match
> > what the Linux driver is using.  I have not checked what the equivalent
> > of sg_tablesize is for the MSFT LLD, but it appears we need to err
> > on the conservative side here.  What do you think..?
> > 
> As said, this is _not_ what Linux is using. This is what your particular
> HBA is using. On one of my machines I have:
> 
> cat /sys/class/scsi_host/host?/sg_tablesize 
> 128
> 128
> 128
> 64
> 64
> 128
> 128
> 
> So maybe you should consider updating your HBA ...
> 

Yes, my mistake.  megaraid_sas is actually querying for its struct
scsi_host->sg_tablesize..
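
For anyone who wants to check the same sysfs attribute from a program
rather than with cat, here is a minimal sketch.  The helper name
example_read_sg_tablesize() is made up for illustration and is not part
of megaraid_sas or QEMU; it only parses the single integer that the
sg_tablesize attribute file contains.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysfs attribute file
 * such as /sys/class/scsi_host/host0/sg_tablesize.
 * Returns the parsed value, or -1 if the file cannot be opened or
 * does not start with an integer.
 */
static int example_read_sg_tablesize(const char *path)
{
    FILE *f;
    int value = -1;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* The attribute is a single decimal number followed by a newline. */
    if (fscanf(f, "%d", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

Calling it with "/sys/class/scsi_host/host0/sg_tablesize" (host0 being
just one example host) should return the same number that cat prints
for that host.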

> I would advocate setting it to the real HBA setting of
> 80 (which works just fine for file-based backends)
> and have it adjusted manually if an sg-based backend
> is used.
> 

Hmm, then it appears that there is a known bottleneck somewhere in the
v2.6.26 Linux guest stack, or perhaps somewhere else, or something with
SG_IO..?

I am still using include/scsi/sg.h:SG_MAX_QUEUE 128, but I am not sure
if this would be affected by the larger max_sge too..?  I am also
wondering if the conversion to use BSG here will have an effect with the
larger max_sge values..?
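
As a rough aid for reasoning about the max_sge values discussed above,
the sketch below counts how many commands a single large I/O would be
split into for a given max_sge.  It rests on two assumptions that are
not established in this thread: 4 KiB pages, and exactly one page per
scatter-gather element (no merging of adjacent pages).  All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each scatter-gather element maps exactly one 4 KiB page. */
#define EXAMPLE_PAGE_SIZE 4096u

/* Largest transfer, in bytes, that one command can carry with max_sge elements. */
static size_t example_max_xfer_bytes(unsigned int max_sge)
{
    return (size_t)max_sge * EXAMPLE_PAGE_SIZE;
}

/* Number of commands needed to move io_bytes (ceiling division). */
static size_t example_cmds_per_io(size_t io_bytes, unsigned int max_sge)
{
    size_t per_cmd;

    assert(max_sge > 0);

    per_cmd = example_max_xfer_bytes(max_sge);
    return (io_bytes + per_cmd - 1) / per_cmd;
}
```

Under those assumptions a 256K I/O needs 8 commands at max_sge=8, 4 at
16, 3 at 24 or 28, 2 at 32, and 1 at 64 or above.  The split count
alone does not predict the peak measured at max_sge=24, so it only
narrows down where to look.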

Best,

--nab

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org

Re: [PATCH] megasas: Update to version 1.01

2010-06-09 Thread Nicholas A. Bellinger
On Wed, 2010-06-09 at 03:14 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2010-06-08 at 16:15 +0200, Hannes Reinecke wrote:
> > This patch updates the megasas HBA emulation to version 1.01.
> > It fixes the following issues:
> > 
> > - Remove hand-crafted inquiry command
> > - Remove bounce-buffer for direct commands
> > - Implement qdev properties to set 'max_sge' and 'max_cmds'
> > - Implement JBOD mode
> > - Improve direct command handling
> > - Minor cleanups
> > 
> > Signed-off-by: Hannes Reinecke 
> > 
> 
> Hi Hannes,
> 
> I applied your changes and everything looks good with the exception of
> the new MEGASAS_DEFAULT_SGE=80 setting..
> 
> > diff --git a/hw/megasas.c b/hw/megasas.c
> > index 250c3fb..19569a8 100644
> > --- a/hw/megasas.c
> > +++ b/hw/megasas.c
> > @@ -40,38 +40,17 @@ do { fprintf(stderr, "megasas: error: " fmt , ## __VA_ARGS__);} while (0)
> >  #endif
> >  
> >  /* Static definitions */
> > -#define MEGASAS_MAX_FRAMES 1000
> > -#define MEGASAS_MAX_SGE 8
> 
> 
> 
> > +#define MEGASAS_VERSION "1.01"
> > +#define MEGASAS_MAX_FRAMES 2048   /* Firmware limit at 65535 */
> > +#define MEGASAS_DEFAULT_FRAMES 1000   /* Windows requires this */
> > +#define MEGASAS_MAX_SGE 255   /* Firmware limit */
> > +#define MEGASAS_DEFAULT_SGE 80
> 
> Ok, I have been running some LTP disktest raw bandwidth benchmarks with a
> 256K blocksize with megasas -> TCM_Loop -> TCM/RAMDISK_DR LUNs into a
> v2.6.26 x86_64 Linux guest (4 VCPUs and 2048 memory) and I noticed
> something interesting..
> 
> With the new MEGASAS_DEFAULT_SGE 80 setting for fw_sge, read/write tests
> have dropped from the original ~1050 MB/sec to roughly 400 MB/sec.
> Passing in the new qdev option using the old default of max_sge=8 the
> speed jumps back up to the range that was previously observed w/o this
> patch.  Going a bit further, using max_sge=16 pushes bandwidth up to
> ~1600 MB/sec, and max_sge=24 takes it up to ~2200 MB/sec..!  Using
> max_sge=32 then sharply drops back to ~800 MB/sec, and increasing to
> larger values brings bandwidth down lower and lower..
> 
> Taking a look at the megaraid_sas LLD in the KVM guest, the struct
> scsi_host is being registered with sg_tablesize=28 which appears to be
> where the sharp dropoff for max_sge > 28 begins to occur.  I see that
> MFI_DCMD_CTRL_GET_INFO is returning the configured fw_sge to the guest,
> but AFAICT megaraid_sas does not adjust itself to use the larger value
> reported by GET_INFO.
> 

Actually scratch that last part about the megaraid_sas LLD..

drivers/scsi/megaraid/megaraid_sas.c:megasas_io_attach() is setting the
MFI GET_INFO returned fw_sge value to struct scsi_host->sg_tablesize
before calling scsi_add_host(), but there still appears to be a
performance issue somewhere when using max_sge > 28..

Note for TCM_Loop LLD running on KVM host we are currently using a
default of sg_tablesize=256.  Running the same LTP disktest benchmark
with TCM/RAMDISK LUNs as above for a Linux guest on a baremetal KVM host
(5500 series Nehalem) with v2.6.34 I am now seeing ~12900 MB/sec (yes,
roughly 100 Gbit/sec) of sustained read/write bandwidth into the same single
Linux/SCSI LUN.  So AFAICT the limiting factor with the larger megasas
max_sge mentioned above appears to be outside of any bottlenecks that
may exist in Linux/SCSI or TCM_Loop+SG_IO backstore used with megasas.
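
The MB/sec to Gbit/sec conversion used above can be checked with a
one-line helper.  This is only a sketch and assumes decimal units
(1 MB = 10^6 bytes, 1 Gbit = 10^9 bits); the function name is made up
for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a throughput in MB/sec to Gbit/sec.
 * Assumes decimal units: 1 MB = 10^6 bytes, 1 Gbit = 10^9 bits.
 */
static double example_mb_per_sec_to_gbit_per_sec(double mb_per_sec)
{
    assert(mb_per_sec >= 0.0);

    /* bytes -> bits is a factor of 8; mega -> giga is a factor of 1/1000 */
    return mb_per_sec * 8.0 / 1000.0;
}
```

With the ~12900 MB/sec figure this comes out at a little over
103 Gbit/sec, which is where the round 100 Gbit/sec number comes from.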

Best,

--nab

