Curious — could you say something about how you ended up with page pool values that high on the client side? For what use cases does 64 GB, for example, make a difference?
--
#BlackLivesMatter
 ____ ||    \\UTGERS,   |---------------------------*O*---------------------------
 ||_// the State        | Ryan Novosielski - novos...@rutgers.edu
 || \\ University       | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
 ||  \\    of NJ        | Office of Advanced Research Computing - MSB A555B, Newark
      `'

On Mar 8, 2024, at 11:32, Wahl, Edward <ew...@osc.edu> wrote:

Yikes! Those must be some mighty large memory compute nodes! That is an OK setting for a large-memory ESS/DSS server, but NOT for the compute nodes at my site, as that value is in bytes (so ~324 GB). Even on our 1 TB+ memory machines we do not tune it that high.

You can set pagepool per nodeclass, for example for all of your compute nodes, but pagepool is one of those settings where you will have to restart the clients for it to take effect (as with most of the RDMA settings, etc.). You should look into creating a "nodeclass" for each of your "node types" if you have not already, so you can avoid OOM issues from the pagepool alone, and tune other settings per node type (RDMA/network settings, etc.).

I would address this here, rather than on the Slurm side. Then you can advertise (total memory minus the pagepool) to Slurm as the overall memory addressable by user jobs. Leave some spare memory for the system itself, or you will see more memory issues and whatnot when users get close to OOM, even in their cgroup.

Example from a cross-mounted compute-side cluster. Default is 1GB:

[root@nostorage-manager1 ~]# mmlsconfig pagepool
pagepool 1024M
pagepool 4G [k8,pitzer]
pagepool 64G [ascend]
pagepool 16G [ib-spire-login,owenslogin,pitzerlogin]
pagepool 48G [dm]
pagepool 4G [cardinal]
pagepool 64G [cardinal_quadport]

Example from the ESS/DSS server side. Later ESS versions set things by mmvdisk groups, rather than by server type:
# mmlsconfig pagepool
pagepool 32G
pagepool 358G [gss_ppc64]
pagepool 16384M [ibmems11-hs,ems]
pagepool 324383477760 [ess3200_mmvdisk_ibmessio13_hs_ibmessio14_hs,ess3200_mmvdisk_ibmessio15_hs_ibmessio16_hs,ess3200_mmvdisk_ibmessio17_hs_ibmessio18_hs]
pagepool 64G [sp]
pagepool 384399572992 [ibmgssio1_hsibmgssio2_hs,ibmgssio3_hsibmgssio4_hs,ibmgssio5_hsibmgssio6_hs]
pagepool 573475966156 [ess5k_mmvdisk_ibmessio11_hs_ibmessio12_hs]
pagepool 96G [ces]

Example of nodeclasses used to address other settings, such as which InfiniBand port(s) to use:

# mmlsconfig verbsports
verbsPorts mlx5_0
verbsPorts mlx5_0 mlx5_2 [pitzer_dualport]
verbsPorts mlx4_1/1 mlx4_1/2 [dm]
verbsPorts mlx5_0 mlx5_2 [k8_dualport]
verbsPorts mlx5_0 mlx5_1 mlx5_2 mlx5_3 [cardinal_quadport]

Ed Wahl
Ohio Supercomputer Center

From: gpfsug-discuss <gpfsug-discuss-boun...@gpfsug.org> On Behalf Of Iban Cabrillo
Sent: Friday, March 8, 2024 9:40 AM
To: gpfsug-discuss <gpfsug-disc...@spectrumscale.org>
Subject: [gpfsug-discuss] pagepool

Good afternoon,

We are new to DSS system configuration. Reviewing the configuration, I have seen that the default pagepool is set to this value:

pagepool 323908133683

and not only on the DSS servers, but also on the rest of the HPC nodes, and I don't know if it is an excessive value. We are noticing that some jobs are dying with "Memory cgroup out of memory: Killed process XXX", and my doubt is whether this pagepool is reserving too much memory for the mmfs process, to the detriment of job execution.
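[Editor's sketch] For readers wondering how large that raw pagepool value actually is: mmlsconfig reports it in bytes when no unit suffix was used, and a quick conversion (plain Python, using only the value quoted above) makes it readable:

```python
# Convert the raw pagepool value (bytes) into decimal and binary units.
pagepool_bytes = 323908133683

gb = pagepool_bytes / 10**9    # decimal gigabytes
gib = pagepool_bytes / 2**30   # binary gibibytes

print(f"{gb:.1f} GB")    # ~323.9 GB, matching Ed's "~324 GB"
print(f"{gib:.1f} GiB")  # ~301.7 GiB
```

That is roughly a third of a terabyte pinned by the GPFS daemon on every node carrying this setting, which would readily explain cgroup OOM kills on compute nodes with less headroom.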
Any advice is welcomed,
Regards, I
--
================================================================
Ibán Cabrillo Bartolomé
Instituto de Física de Cantabria (IFCA-CSIC)
Santander, Spain
Tel: +34942200969/+34669930421
Responsible for advanced computing service (RSC)
=========================================================================================
All our suppliers must know and accept IFCA policy available at:
https://confluence.ifca.es/display/IC/Information+Security+Policy+for+External+Suppliers
==========================================================================================
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
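[Editor's sketch] Putting Ed's advice together for anyone following along: the node names and class name below are hypothetical, the mm* commands are the standard Spectrum Scale administration commands, and the restart is included because pagepool does not take effect on running clients. This is a sketch of the approach, not something to paste blindly into a production cluster:

```shell
# Group the compute nodes into a nodeclass (hypothetical node names).
mmcrnodeclass compute -N node001,node002,node003

# Give that class a modest client-side pagepool instead of the ~324 GB default.
mmchconfig pagepool=4G -N compute

# pagepool changes require a daemon restart on the affected nodes.
mmshutdown -N compute
mmstartup -N compute

# Verify the per-class override.
mmlsconfig pagepool
```

On the Slurm side, the memory advertised for jobs would then be roughly total RAM minus the pagepool minus an OS reserve; for example (illustrative numbers), a 512 GB node with a 4 GB pagepool might advertise something like `RealMemory=500000` (MB) in slurm.conf, keeping some headroom for the system itself.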