8 data * 256K = 2MB does not align to your 1MB block size. RAID 6 is already not the best option for writes. I would look into using a block size that is a multiple of the 2MB full stripe.
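The alignment point is simple arithmetic; a quick sketch (the 8+p+q geometry and 256K strip size are the ones mentioned in this thread):

```python
# Full-stripe arithmetic for a RAID 6 volume with 8 data disks + 2 parity
# (8+p+q) and a 256 KiB logical strip size, as discussed in the thread.
STRIP_KIB = 256
DATA_DISKS = 8

full_stripe_kib = DATA_DISKS * STRIP_KIB  # 8 * 256 = 2048 KiB = 2 MiB

# A 1 MiB GPFS block covers only half of a full stripe, so every block
# write forces the controller into a read-modify-write cycle to update
# parity; 2 MiB or 4 MiB blocks cover whole stripes.
for block_kib in (1024, 2048, 4096):
    aligned = block_kib % full_stripe_kib == 0
    print(f"{block_kib} KiB block: "
          f"{'full-stripe write' if aligned else 'read-modify-write'}")
```

Expected: 1024 KiB is misaligned, 2048 and 4096 KiB are full-stripe multiples.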
--
Cheers

> On 11. Jun 2020, at 17.07, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>
> 256K
>
> Giovanni
>
>> On 11/06/20 10:01, Luis Bolinches wrote:
>> On that RAID 6 what is the logical RAID block size? 128K, 256K, other?
>> --
>> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
>> Luis Bolinches
>> Consultant IT Specialist
>> IBM Spectrum Scale development
>> ESS & client adoption teams
>> Mobile Phone: +358503112585
>> https://www.youracclaim.com/user/luis-bolinches
>> Ab IBM Finland Oy
>> Laajalahdentie 23
>> 00330 Helsinki
>> Uusimaa - Finland
>>
>> "If you always give you will always have" -- Anonymous
>>
>> ----- Original message -----
>> From: Giovanni Bracco <giovanni.bra...@enea.it>
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> To: Jan-Frode Myklebust <janfr...@tanso.net>, gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Cc: Agostino Funel <agostino.fu...@enea.it>
>> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
>> Date: Thu, Jun 11, 2020 10:53
>>
>> Comments and updates in the text:
>>
>>> On 05/06/20 19:02, Jan-Frode Myklebust wrote:
>>> On Fri, 5 Jun 2020 at 15:53, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>>>
>>>> answer in the text
>>>>
>>>>> On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>>>>>
>>>>> Could maybe be interesting to drop the NSD servers, and let all
>>>>> nodes access the storage via SRP?
>>>>
>>>> no, we cannot: the production clusters' fabric is a mix of a QDR-based
>>>> cluster and an OPA-based cluster, and the NSD nodes provide the
>>>> service to both.
>>> You could potentially still do SRP from the QDR nodes, and go via NSD
>>> for your Omni-Path nodes. Going via NSD seems like a bit of a pointless
>>> indirection.
>>
>> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
>> the same data lake in Spectrum Scale/GPFS, so the NSD servers provide the
>> flexibility of the setup.
>>
>> The NSD servers make use of an IB SAN fabric (Mellanox FDR switch) where
>> at the moment 3 different generations of DDN storage are connected:
>> 9900/QDR, 7700/FDR and 7990/EDR. The idea was to be able to add some
>> less expensive storage, to be used when performance is not the first
>> priority.
>>
>>> Maybe turn off readahead, since it can cause performance degradation
>>> when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
>>> always reads too much. This might be the cause of the slow read seen;
>>> maybe you'll also overflow it if reading from both NSD servers at the
>>> same time?
>>
>> I have switched the readahead off and this produced a small (~10%)
>> increase in performance when reading from an NSD server, but no change
>> in the bad behaviour for the GPFS clients.
>>
>>> Plus, it's always nice to give a bit more pagepool to the clients than
>>> the default. I would prefer to start with 4 GB.
>>
>> we'll do that too and we'll let you know!
>>
>>> Could you show your mmlsconfig? Likely you should set maxMBpS to
>>> indicate what kind of throughput a client can do (affects GPFS
>>> readahead/writebehind). Would typically also increase workerThreads on
>>> your NSD servers.
>>
>> At this moment this is the output of mmlsconfig:
>>
>> # mmlsconfig
>> Configuration data for cluster GPFSEXP.portici.enea.it:
>> -------------------------------------------------------
>> clusterName GPFSEXP.portici.enea.it
>> clusterId 13274694257874519577
>> autoload no
>> dmapiFileHandleSize 32
>> minReleaseLevel 5.0.4.0
>> ccrEnabled yes
>> cipherList AUTHONLY
>> verbsRdma enable
>> verbsPorts qib0/1
>> [cresco-gpfq7,cresco-gpfq8]
>> verbsPorts qib0/2
>> [common]
>> pagepool 4G
>> adminMode central
>>
>> File systems in cluster GPFSEXP.portici.enea.it:
>> ------------------------------------------------
>> /dev/vsd_gexp2
>> /dev/vsd_gexp3
>>
>>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
>>> When you write one GPFS block, less than half a RAID stripe is written,
>>> which means you need to read back some data to calculate the new
>>> parities. I would prefer 4 MB block size, and maybe also change to
>>> 8+p+q so that one GPFS block is a multiple of a full 2 MB stripe.
>>>
>>> -jf
>>
>> we have now added another file system based on 2 NSDs on RAID6 8+p+q,
>> keeping the 1MB block size just not to change too many things at the
>> same time, but there is no substantial change in the very low read
>> performance, which is still of the order of 50 MB/s, while write
>> performance is 1000 MB/s.
>>
>> Any other suggestion is welcome!
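To isolate the read path from the application when chasing the 50 MB/s vs 1000 MB/s gap, a minimal sequential-read timing loop can help. This is only a sketch; the mount path in the comment is hypothetical, and the file must be much larger than the pagepool and OS page cache or you will measure cache hits instead of the storage path:

```python
import time


def sequential_read_mbps(path, block_bytes=1024 * 1024):
    """Read `path` sequentially in GPFS-block-sized chunks; return MB/s.

    Use a test file much larger than the pagepool (and the OS page
    cache), otherwise this measures cache throughput, not the NSD path.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_bytes):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6


# Hypothetical test file on the GPFS mount; adjust path, and compare the
# result when run on an NSD server vs on a client:
# print(f"{sequential_read_mbps('/vsd_gexp2/testfile'):.0f} MB/s")
```

Running the same loop on an NSD server and on a client would show whether the slowdown is in the client-side read path (readahead, pagepool, maxMBpS) or in the storage itself.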
>>
>> Giovanni
>>
>> --
>> Giovanni Bracco
>> phone +39 351 8804788
>> E-mail giovanni.bra...@enea.it
>> WWW http://www.afs.enea.it/bracco
>>
>> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
>> Oy IBM Finland Ab
>> PL 265, 00101 Helsinki, Finland
>> Business ID, Y-tunnus: 0195876-3
>> Registered in Finland
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss