Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
Of course, you might also be interested in our upcoming webinar on 22nd September (which I haven't advertised yet):

https://www.spectrumscaleug.org/event/ssugdigital-deep-dive-in-spectrum-scale-core/ ...

This presentation will discuss selected improvements in Spectrum V5, focusing on improvements for inode management, VCPU scaling and considerations for NUMA.

Simon
Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
On 02/09/2020 23:28, Andrew Beattie wrote:
> Giovanni, I have clients in Australia that are running AMD ROME
> processors in their Visualisation nodes connected to scale 5.0.4
> clusters with no issues. Spectrum Scale doesn't differentiate between
> x86 processor technologies -- it only looks at x86_64 (OS support
> more than anything else)

While true, bear in mind there are limits on the number of cores, and it might be quite easy to exceed them on a high-end multi-CPU AMD machine :-) See question 5.3 in the FAQ:

https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.pdf

192 is the largest tested limit for the number of cores and there is a hard limit at 1536 cores. From memory, these limits are lower in older versions of GPFS. I think the "tested" limit in 4.2 is 64 cores (or was at the time of release), but it works just fine on 80 cores as far as I can tell.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
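As a rough illustration of the limits mentioned above, here is a minimal Python sketch that compares a node's CPU count with the 192 (tested) and 1536 (hard) figures quoted from the FAQ. The function name and labels are hypothetical, the figures may differ between releases, and it is an assumption that the logical CPU count reported by the OS is what the FAQ means by "cores".

```python
import os

# Figures quoted in the message above (FAQ question 5.3).
# They are release-dependent, so treat them as placeholders.
TESTED_CORE_LIMIT = 192
HARD_CORE_LIMIT = 1536


def classify_core_count(cores, tested=TESTED_CORE_LIMIT, hard=HARD_CORE_LIMIT):
    """Return a coarse label for a core count relative to the quoted limits."""
    if cores > hard:
        return "over hard limit"
    if cores > tested:
        return "untested"
    return "tested"


if __name__ == "__main__":
    # Assumption: os.cpu_count() reports logical CPUs (SMT threads included),
    # which may not match how the FAQ counts cores.
    cores = os.cpu_count() or 0
    print(cores, classify_core_count(cores))
```

A dual-socket machine with 64 cores per socket and SMT enabled reports 256 logical CPUs, which this sketch would label "untested"; whether that matters in practice is a question for the FAQ or IBM support.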
Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
I don't currently have any x86-based servers to do that kind of performance testing, but the PCIe Gen 4 advantages alone mean that the AMD server options have significant benefits over current Intel processor platforms. There are, however, only a limited number of storage controllers and network adapters that can make full use of PCIe Gen 4.

In terms of NSD architecture, there are many variables that you also have to take into consideration:

Are you looking at storage-rich servers?
Are you looking at SAN-attached flash?
Are you looking at a Scale ECE type deployment?

As an IBM employee and someone familiar with ESS 5000 and the differences and benefits of the 5K architecture: unless you're planning on building a Scale ECE type cluster with AMD processors, storage class memory, and NVMe flash modules, I would seriously consider the ESS 5K over an x86-based NL-SAS storage topology, including AMD.

Sent from my iPhone

> On 3 Sep 2020, at 17:44, Giovanni Bracco wrote:
>
> OK from client side, but I would like to know if the same is also for
> NSD servers with AMD EPYC, do they operate with good performance
> compared to Intel CPUs?
>
> Giovanni
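To put the PCIe Gen 4 point in numbers, the following Python sketch estimates one-direction link bandwidth from the published per-lane signalling rates (8 GT/s for Gen 3, 16 GT/s for Gen 4, both with 128b/130b encoding). The function name is hypothetical and the result ignores protocol overhead, so real adapters will deliver somewhat less.

```python
# Per-lane raw signalling rate in GT/s; Gen 3 and Gen 4 both use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0}


def pcie_bandwidth_gb_s(gen, lanes):
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    usable_gbit_per_lane = GT_PER_S[gen] * 128 / 130
    return usable_gbit_per_lane / 8 * lanes


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"Gen {gen} x16: {pcie_bandwidth_gb_s(gen, 16):.2f} GB/s")
```

By this estimate an x16 slot carries roughly 15.75 GB/s on Gen 3 and 31.5 GB/s on Gen 4, so a single 200 Gb/s (25 GB/s) network port only fits within a Gen 4 x16 slot.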
Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
OK from the client side, but I would like to know whether the same also holds for NSD servers with AMD EPYC: do they operate with good performance compared to Intel CPUs?

Giovanni

On 03/09/20 00:28, Andrew Beattie wrote:
> Giovanni,
> I have clients in Australia that are running AMD ROME processors in
> their Visualisation nodes connected to scale 5.0.4 clusters with no issues.
> Spectrum Scale doesn't differentiate between x86 processor technologies
> -- it only looks at x86_64 (OS support more than anything else)

--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
Giovanni,

I have clients in Australia that are running AMD ROME processors in their Visualisation nodes connected to Scale 5.0.4 clusters with no issues. Spectrum Scale doesn't differentiate between x86 processor technologies -- it only looks at x86_64 (OS support more than anything else).

Andrew Beattie
File and Object Storage Technical Specialist - A/NZ
IBM Systems - Storage
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com
Re: [gpfsug-discuss] tsgskkm stuck---> what about AMD epyc support in GPFS?
I am curious to know about AMD epyc support by GPFS: what is the status?

Giovanni Bracco

On 28/08/20 14:25, Frederick Stock wrote:
> Not sure that Spectrum Scale has stated it supports the AMD epyc (Rome?)
> processors. You may want to open a help case to determine the cause of
> this problem.
> Note that Spectrum Scale 4.2.x goes out of service on September 30, 2020
> so you may want to consider upgrading your cluster. And should Scale
> officially support the AMD epyc processor it would not be on Scale 4.2.x.
>
> Fred
> __
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> sto...@us.ibm.com
>
> - Original message -
> From: Philipp Helo Rehs
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: [EXTERNAL] [gpfsug-discuss] tsgskkm stuck
> Date: Fri, Aug 28, 2020 5:52 AM
>
> Hello,
>
> we have a gpfs v4 cluster running with 4 nsds and i am trying to add
> some clients:
>
> mmaddnode -N hpc-storage-1-ib:client:hpc-storage-1
>
> this commands hangs and do not finish
>
> When i look into the server, i can see the following processes which
> never finish:
>
> root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote checkNewClusterNode3
> lc/setupClient
> %%%%:00_VERSION_LINE::1709:3:1::lc:gpfs3.hilbert.hpc.uni-duesseldorf.de::0:/bin/ssh:/bin/scp:5362040003754711198:lc2:1597757602::HPCStorage.hilbert.hpc.uni-duesseldorf.de:2:1:1:2:A:::central:0.0:
> %%home%%:20_MEMBER_NODE::5:20:hpc-storage-1
> root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00
> /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl setupClient 2
> 21479
> 1=gpfs3-ib.hilbert.hpc.uni-duesseldorf.de:1191,2=gpfs4-ib.hilbert.hpc.uni-duesseldorf.de:1191,4=gpfs6-ib.hilbert.hpc.uni-duesseldorf.de:1191,3=gpfs5-ib.hilbert.hpc.uni-duesseldorf.de:1191
> 0 1191
> root 38212 100 0.0 35544 5752 ? R 11:32 9:40
> /usr/lpp/mmfs/bin/tsgskkm store --cert
> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.cert --priv
> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.priv --out
> /var/mmfs/ssl/stage/tmpKeyData.mmremote.38169.keystore --fips off
>
> The node is an AMD epyc.
>
> Any idea what could cause the issue?
>
> ssh is possible in both directions and firewall is disabled.
>
> Kind regards
>
> Philipp Rehs

--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
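The process listing quoted above shows one process in state R at 100% CPU while its parents sit idle. As a minimal sketch, assuming the eleven-column layout of `ps aux` (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND), the Python below picks out such spinning processes from captured output. The function names are hypothetical and the sample lines are abridged from the message. It only identifies which process is busy; it says nothing about why `tsgskkm` does not finish.

```python
def parse_ps_line(line):
    """Parse one `ps aux`-style line into a dict, or return None if it does not fit."""
    fields = line.split(None, 10)
    if len(fields) < 11:
        return None
    _user, pid, cpu, _mem, _vsz, _rss, _tty, stat, _start, cpu_time, command = fields
    try:
        return {"pid": int(pid), "cpu": float(cpu), "stat": stat,
                "time": cpu_time, "command": command}
    except ValueError:
        # Header line or otherwise non-numeric PID/%CPU column.
        return None


def busy_processes(lines, min_cpu=90.0):
    """Return parsed entries whose %CPU is at least min_cpu."""
    parsed = (parse_ps_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["cpu"] >= min_cpu]


# Abridged from the listing in the message; command arguments are shortened.
SAMPLE = [
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
    "root 38138 0.0 0.0 123048 10376 ? Ss 11:32 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote checkNewClusterNode3",
    "root 38169 0.0 0.0 123564 10892 ? S 11:32 0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmremote ccrctl setupClient 2",
    "root 38212 100 0.0 35544 5752 ? R 11:32 9:40 /usr/lpp/mmfs/bin/tsgskkm store --fips off",
]

if __name__ == "__main__":
    for proc in busy_processes(SAMPLE):
        print(proc["pid"], proc["time"], proc["command"])
```

Running the same check twice a few minutes apart and comparing the TIME column would show whether the process is still accumulating CPU time, which is useful detail to include when opening a support case.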