On Wed, 2010-06-09 at 12:32 +0200, Hannes Reinecke wrote:
> Nicholas A. Bellinger wrote:
> > Hi Hannes,
> >
> > I applied your changes and everything looks good with the exception of
> > the new MEGASAS_DEFAULT_SGE=80 setting..
> >
> >> diff --git a/hw/megasas.c b/hw/megasas.c
> >> index 250c3fb..19569a8 100644
> >> --- a/hw/megasas.c
> >> +++ b/hw/megasas.c
> >> @@ -40,38 +40,17 @@ do { fprintf(stderr, "megasas: error: " fmt , ## __VA_ARGS__);} while (0)
> >> #endif
> >>
> >> /* Static definitions */
> >> -#define MEGASAS_MAX_FRAMES 1000
> >> -#define MEGASAS_MAX_SGE 8
> >
> >
> >
> >> +#define MEGASAS_VERSION "1.01"
> >> +#define MEGASAS_MAX_FRAMES 2048 /* Firmware limit at 65535 */
> >> +#define MEGASAS_DEFAULT_FRAMES 1000 /* Windows requires this */
> >> +#define MEGASAS_MAX_SGE 255 /* Firmware limit */
> >> +#define MEGASAS_DEFAULT_SGE 80
> >
> > Ok, I have been running some LTP disktest raw bandwidth benchmarks with a
> > 256K blocksize with megasas -> TCM_Loop -> TCM/RAMDISK_DR LUNs into a
> > v2.6.26 x86_64 Linux guest (4 VCPUs and 2048 MB of memory) and I noticed
> > something interesting..
> >
> > With the new MEGASAS_DEFAULT_SGE 80 setting for fw_sge, read/write tests
> > have dropped from the original ~1050 MB/sec to ~400 MB/sec.
> > Passing in the new qdev option with the old default of max_sge=8, the
> > speed jumps back up to the range that was previously observed w/o this
> > patch. Going a bit further, max_sge=16 raises bandwidth to
> > ~1600 MB/sec, and max_sge=24 takes it up to ~2200 MB/sec..! Using
> > max_sge=32 then sharply drops back to ~800 MB/sec, and increasing to
> > larger values brings bandwidth down lower and lower..
> >
> > Taking a look at the megaraid_sas LLD in the KVM guest, the struct
> > scsi_host is being registered with sg_tablesize=28 which appears to be
> > where the sharp dropoff for max_sge > 28 begins to occur. I see that
> > MFI_DCMD_CTRL_GET_INFO is returning the configured fw_sge to the guest,
> > but AFAICT megaraid_sas does not adjust itself to use the larger value
> > reported by GET_INFO.
> >
> Thanks for the confirmation. You just confirmed _why_ I made
> the SGE setting configurable.
>
> The SGE default setting as found on 'real' HBAs is in fact 80,
> hence this value.
> However, I always suspected that we will have problems with
> direct SGL mapping if the settings from the underlying hardware
> and the emulation don't match.
> Which was the reason for the LSF discussion topic, if you remember :-)
> So thanks for the confirmation here.
Indeed, I was looking at best-case large-block bandwidth with
TCM/RAMDISK_DR and zero-copy struct scatterlist mapping, with
can_queue and max_sectors set to 1024. Having a TCM IBLOCK/FILEIO/pSCSI
backstore for a real backend struct block_device is going to have a
certain overhead compared to raw struct page ramdisk, but I think the
RAMDISK_DR subsystem plugin gives us a good idea of where we are at with
TCM_Loop struct scsi_devices.. ;)
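Hannes's concern about direct SGL mapping can be made concrete with a minimal sketch. This is an illustration only, not QEMU or TCM code: sgl_fits_backend() and backend_chunks_needed() are hypothetical helpers. A guest scatter-gather list can be handed to the backend unchanged only if its segment count fits within the backend's sg_tablesize; otherwise the request would have to be split or bounced. Whether that is what causes the drop above 28 segments has not been established here.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: can a guest SGL with guest_nsegs entries be
 * passed straight through to a backend that accepts at most
 * backend_sg_tablesize entries per request?
 */
static bool sgl_fits_backend(size_t guest_nsegs, size_t backend_sg_tablesize)
{
    return guest_nsegs <= backend_sg_tablesize;
}

/*
 * Hypothetical helper: if the SGL does not fit, how many
 * backend-sized pieces would a split need? Returns 0 for an
 * invalid (zero) limit, which the caller must reject.
 */
static size_t backend_chunks_needed(size_t guest_nsegs, size_t backend_sg_tablesize)
{
    if (backend_sg_tablesize == 0)
        return 0;
    /* ceiling division */
    return (guest_nsegs + backend_sg_tablesize - 1) / backend_sg_tablesize;
}
```

With the numbers discussed in this thread, an 80-entry guest SGL against a 28-entry limit would need three backend-sized pieces, while 24 entries fit in one.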
>
> Hence I made the SGE setting configurable, so that it can be
> adjusted (manually for starters) to the underlying hardware.
> If you do a:
>
> -device megasas,id=megasas,max_sge=28,mode=jbod
>
> you have the desired behaviour.
Perfect.. I will check out mode=jbod as well..
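For illustration only, here is a minimal sketch of how a user-supplied max_sge, as in the -device line quoted above, might be bounded before it is reported to the guest through MFI_DCMD_CTRL_GET_INFO. The two constants are copied from the quoted diff. megasas_effective_sge() and the convention that 0 means "not set" are assumptions. The patch's actual property handling is not shown in this thread.

```c
#include <stdint.h>

/* Constants as defined in the quoted patch. */
#define MEGASAS_MAX_SGE     255 /* Firmware limit */
#define MEGASAS_DEFAULT_SGE 80

/*
 * Hypothetical helper: map the max_sge property (0 = not set, an
 * assumption) to the value reported to the guest, clamped to the
 * firmware limit.
 */
static uint32_t megasas_effective_sge(uint32_t requested)
{
    if (requested == 0)
        return MEGASAS_DEFAULT_SGE;
    if (requested > MEGASAS_MAX_SGE)
        return MEGASAS_MAX_SGE;
    return requested;
}
```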
>
> Currently we cannot do this tuning automatically; we just have
> _one_ setting for the entire HBA emulation whereas the underlying
> disks connected to the megasas might have different settings.
>
> Again, the proper handling here should be discussed on the LSF.
>
> > So that said, I think we want to use MEGASAS_DEFAULT_SGE 28 to match
> > what the Linux driver is using. I have not checked what the equivalent
> > sg_tablesize for the MSFT LLD is, but it appears we need to err
> > on the conservative side here. What do you think..?
> >
> As said, this is _not_ what Linux is using. This is what your particular
> HBA is using. On one of my machines I have:
>
> cat /sys/class/scsi_host/host?/sg_tablesize
> 128
> 128
> 128
> 64
> 64
> 128
> 128
>
> So maybe you should consider updating your HBA ...
>
Yes, my mistake. megaraid_sas is actually querying for its struct
scsi_host->sg_tablesize..
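The per-host values listed above come from sysfs. As a minimal sketch for picking a max_sge value by hand, the following C code reads every file matching a glob pattern and returns the smallest sg_tablesize. min_sg_tablesize() is a hypothetical helper. It uses only POSIX glob() and stdio.

```c
#include <glob.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the smallest positive integer found in
 * the files matching pattern (e.g. the sg_tablesize attribute of each
 * SCSI host in sysfs), or -1 if nothing matched or could be read.
 */
static int min_sg_tablesize(const char *pattern)
{
    glob_t g;
    int min = -1;

    if (glob(pattern, 0, NULL, &g) != 0)
        return -1; /* no match, or glob() error */

    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        int val;

        if (!f)
            continue; /* unreadable entry: skip it */
        if (fscanf(f, "%d", &val) == 1 && val > 0 &&
            (min < 0 || val < min))
            min = val;
        fclose(f);
    }
    globfree(&g);
    return min;
}
```

Called as min_sg_tablesize("/sys/class/scsi_host/host*/sg_tablesize"), this would return 64 for the listing above.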
> I would advocate setting it to the real HBA setting of
> 80 (which works just fine for file-based backends)
> and have it adjusted manually if an sg-based backend
> is used.
>
Hmm, then it appears that there is a known bottleneck somewhere in the
v2.6.26 Linux guest stack, or elsewhere, perhaps in SG_IO..?
I am still using include/scsi/sg.h:SG_MAX_QUEUE 128, but I am not sure
whether this would be affected by the larger max_sge too..? I am also
wondering whether the conversion to BSG here will have an effect with the
larger max_sge values..?
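Hannes notes above that this tuning cannot currently be done automatically. Purely to illustrate the policy he suggests (keep the real-HBA default of 80 for file-based backends, and cap it to the host HBA's sg_tablesize for sg-based ones), here is a minimal sketch. The enum and megasas_pick_sge() are hypothetical and are not existing QEMU code.

```c
#include <stdint.h>

#define MEGASAS_DEFAULT_SGE 80 /* default from the quoted patch */

/* Hypothetical backend classification; not an existing QEMU enum. */
enum megasas_backend_kind {
    MEGASAS_BACKEND_FILE, /* file-based: the default works */
    MEGASAS_BACKEND_SG,   /* SG_IO passthrough: respect host HBA limit */
};

/*
 * Hypothetical helper: keep the default for file-based backends and
 * cap it to the host HBA's sg_tablesize for sg-based ones.
 * host_sg_tablesize <= 0 means "unknown" and leaves the default.
 */
static uint32_t megasas_pick_sge(enum megasas_backend_kind kind,
                                 int host_sg_tablesize)
{
    if (kind == MEGASAS_BACKEND_SG && host_sg_tablesize > 0 &&
        (uint32_t)host_sg_tablesize < MEGASAS_DEFAULT_SGE)
        return (uint32_t)host_sg_tablesize;
    return MEGASAS_DEFAULT_SGE;
}
```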
Best,
--nab
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org