On that RAID 6, what is the logical RAID block size? 128K, 256K, other?
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations
Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
ESS & client adoption teams
Mobile Phone: +358503112585
 
 
Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland

"If you always give you will always have" --  Anonymous
 
 
 
----- Original message -----
From: Giovanni Bracco <giovanni.bra...@enea.it>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: Jan-Frode Myklebust <janfr...@tanso.net>, gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Cc: Agostino Funel <agostino.fu...@enea.it>
Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Date: Thu, Jun 11, 2020 10:53
 
Comments and updates in the text:

On 05/06/20 19:02, Jan-Frode Myklebust wrote:
> On Fri, 5 Jun 2020 at 15:53, Giovanni Bracco
> <giovanni.bra...@enea.it> wrote:
>
>     answer in the text
>
>     On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>      >
>      > Could maybe be interesting to drop the NSD servers, and let all nodes
>      > access the storage via SRP?
>
>     no, we cannot: the production clusters' fabric is a mix of a QDR-based
>     cluster and an OPA-based cluster, and the NSD nodes provide the service
>     to both.
>
>
> You could potentially still do SRP from the QDR nodes, and go via NSD for
> your Omni-Path nodes. Going via NSD seems like a bit of a pointless indirection.

not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
the same Spectrum Scale/GPFS data lake, so the NSD servers provide the
flexibility of the setup.

The NSD servers make use of an IB SAN fabric (Mellanox FDR switch) to which,
at the moment, three different generations of DDN storage systems are
connected: 9900/QDR, 7700/FDR and 7990/EDR. The idea was to be able to add
some less expensive storage, to be used when performance is not the first
priority.

>
>
>
>      >
>      > Maybe turn off readahead, since it can cause performance degradation
>      > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
>      > always reads too much. This might be the cause of the slow read seen —
>      > maybe you’ll also overflow it if reading from both NSD-servers at the
>      > same time?
>
>     I have switched the readahead off and this produced a small (~10%)
>     increase in performance when reading from an NSD server, but no change
>     in the bad behaviour of the GPFS clients.
>
>
>      >
>      >
>      > Plus.. it’s always nice to give a bit more pagepool to the clients
>      > than the default.. I would prefer to start with 4 GB.
>
>     we'll do also that and we'll let you know!
>
>
> Could you show your mmlsconfig? Likely you should set maxMBpS to
> indicate what kind of throughput a client can do (affects GPFS
> readahead/writebehind). You would typically also increase workerThreads
> on your NSD servers.

At the moment this is the output of mmlsconfig:

# mmlsconfig
Configuration data for cluster GPFSEXP.portici.enea.it:
-------------------------------------------------------
clusterName GPFSEXP.portici.enea.it
clusterId 13274694257874519577
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.0.4.0
ccrEnabled yes
cipherList AUTHONLY
verbsRdma enable
verbsPorts qib0/1
[cresco-gpfq7,cresco-gpfq8]
verbsPorts qib0/2
[common]
pagepool 4G
adminMode central

File systems in cluster GPFSEXP.portici.enea.it:
------------------------------------------------
/dev/vsd_gexp2
/dev/vsd_gexp3
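
(For reference, a minimal sketch of how the tuning suggested above could be
applied; the values are placeholders for illustration rather than tested
recommendations, and "nsdNodes" assumes the built-in node class of that name:)

# raise maxMBpS towards the throughput a node can actually sustain and
# increase workerThreads on the NSD servers; exact values depend on the HW
mmchconfig maxMBpS=6000 -N nsdNodes
mmchconfig workerThreads=512 -N nsdNodes
# check the resulting values (some changes only take effect after a restart
# of the daemon on the affected nodes)
mmlsconfig maxMBpS
mmlsconfig workerThreads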


>
>
> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
> When you write one GPFS block, less than half a RAID stripe is written,
> which means you need to read back some data to calculate the new parities.
> I would prefer a 4 MB block size, and maybe also change to 8+p+q so that
> one GPFS block is a multiple of a full 2 MB stripe.
>
>
>     -jf

we have now added another file system based on 2 NSDs on RAID6 8+p+q,
keeping the 1 MB block size just to avoid changing too many things at the
same time, but there is no substantial change in the very low read
performance, which is still on the order of 50 MB/s, while write performance
is 1000 MB/s.

Any other suggestions are welcome!

Giovanni



--
Giovanni Bracco
phone  +39 351 8804788
E-mail  giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco 

 
 

Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
