OK, so hold on … NOW what’s going on???  I deleted the filesystem … went to 
lunch … came back an hour later … recreated the filesystem with a metadata 
block size of 4 MB … and I STILL have a 1 MB block size in the system pool and 
the wrong fragment size in other pools…
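
For what it’s worth, the -f values in the mmlsfs output below are at least self-consistent with the 128 subblocks per full block that it reports.  A quick sketch of that arithmetic (the Python below is just illustration, nothing GPFS-specific):

# Sanity check: fragment (subblock) size = block size / subblocks per full block
def fragment_size(block_size_bytes, subblocks_per_block=128):
    return block_size_bytes // subblocks_per_block

MiB = 1024 * 1024
print(fragment_size(1 * MiB))   # 8192  -- matches -f for the system pool
print(fragment_size(4 * MiB))   # 32768 -- matches -f for the other pools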

Kevin

/root/gpfs
root@testnsd1# mmdelfs gpfs5
All data on the following disks of gpfs5 will be destroyed:
    test21A3nsd
    test21A4nsd
    test21B3nsd
    test21B4nsd
    test23Ansd
    test23Bnsd
    test23Cnsd
    test24Ansd
    test24Bnsd
    test24Cnsd
    test25Ansd
    test25Bnsd
    test25Cnsd
Completed deletion of file system /dev/gpfs5.
mmdelfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M

The following disks of gpfs5 will be formatted on node testnsd3:
    test21A3nsd: size 953609 MB
    test21A4nsd: size 953609 MB
    test21B3nsd: size 953609 MB
    test21B4nsd: size 953609 MB
    test23Ansd: size 15259744 MB
    test23Bnsd: size 15259744 MB
    test23Cnsd: size 1907468 MB
    test24Ansd: size 15259744 MB
    test24Bnsd: size 15259744 MB
    test24Cnsd: size 1907468 MB
    test25Ansd: size 15259744 MB
    test25Bnsd: size 15259744 MB
    test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
  12 % complete on Thu Aug  2 13:16:26 2018
  25 % complete on Thu Aug  2 13:16:31 2018
  38 % complete on Thu Aug  2 13:16:36 2018
  50 % complete on Thu Aug  2 13:16:41 2018
  62 % complete on Thu Aug  2 13:16:46 2018
  74 % complete on Thu Aug  2 13:16:52 2018
  85 % complete on Thu Aug  2 13:16:57 2018
  96 % complete on Thu Aug  2 13:17:02 2018
 100 % complete on Thu Aug  2 13:17:03 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Thu Aug  2 13:17:09 2018
  28 % complete on Thu Aug  2 13:17:15 2018
  53 % complete on Thu Aug  2 13:17:20 2018
  78 % complete on Thu Aug  2 13:17:26 2018
 100 % complete on Thu Aug  2 13:17:27 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  98 % complete on Thu Aug  2 13:17:34 2018
 100 % complete on Thu Aug  2 13:17:34 2018
Formatting Allocation Map for storage pool raid1
  52 % complete on Thu Aug  2 13:17:39 2018
 100 % complete on Thu Aug  2 13:17:43 2018
Formatting Allocation Map for storage pool raid6
  24 % complete on Thu Aug  2 13:17:48 2018
  50 % complete on Thu Aug  2 13:17:53 2018
  74 % complete on Thu Aug  2 13:17:58 2018
  99 % complete on Thu Aug  2 13:18:03 2018
 100 % complete on Thu Aug  2 13:18:03 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
/root/gpfs
root@testnsd1# mmlsfs gpfs5
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    32768                    Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 2                        Default number of metadata replicas
 -M                 3                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 3                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    4194304                  Block size (other pools)
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Thu Aug  2 13:16:47 2018 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      101095424                Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128              Number of subblocks per full block
 -P                 system;raid1;raid6       Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 
test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd
  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs5                   Default mount point
 --mount-priority   0                        Mount priority
/root/gpfs
root@testnsd1#


—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - (615)875-9633


On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:

Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion 
that the most significant thing that is going to come out of this thread is the 
knowledge that Sven has left IBM for DDN.  ;-) or :-( or :-O depending on your 
perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is 
best for those workloads that use “normal” sized files.  However, we - like 
many similar institutions - support a mixed workload, so the 128K fragment size 
that comes with that is not optimal for the primarily biomedical type 
applications that literally create millions of very small files.  That’s why we 
settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and an 8K fragment size.  I’m recreating my test cluster filesystem now with that config … so a 4 MB block size on the metadata-only system pool, too.
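
Just to spell out the arithmetic we’re working from (a rough sketch; the fixed 32 subblocks per block is the pre-5.0 behavior mentioned further down in this thread, and the 512 figure is simply what an 8K fragment on a 4 MB block implies):

MiB = 1024 * 1024
KiB = 1024

# Pre-GPFS-5: a fixed 32 subblocks per full block.
print((4 * MiB) // 32)          # 131072 -- the 128K fragments that hurt small-file workloads
print((1 * MiB) // 32)          # 32768  -- the 1 MB compromise we have been running

# What we are hoping to see from GPFS 5: 4 MB blocks with 8K fragments,
# i.e. 512 subblocks per full block.
print((4 * MiB) // (8 * KiB))   # 512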

Thanks to all who took the time to respond to this thread.  I hope it’s been 
beneficial to others as well…

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 7:11 PM, Andrew Beattie <abeat...@au1.ibm.com> wrote:

I too would second the comment about doing testing specific to your environment.

We recently deployed a number of ESS building blocks into a customer site that 
was specifically being used for a mixed HPC workload.

We spent more than a week playing with different block sizes for both data and metadata, trying to identify which variation would provide the best mix of metadata and data performance.  One thing we noticed very early on is that MDtest and IOR respond very differently as you play with both block size and subblock size.  What works for one use case may be a very poor option for another.

Interestingly enough, it turned out that the best overall option for our particular use case was an 8 MB block size with 32K subblocks -- as that gave us good metadata performance and good sequential data performance.

Which is probably why the 32K subblock was the default for so many years ....

Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com


----- Original message -----
From: "Marc A Kaplan" <makap...@us.ibm.com>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, 
difference the settings that are available make in performance and/or storage 
utilization.

Secondly, as I and others have hinted at, deeper in the system, there may be 
additional parameters and settings.  Sometimes they are available via commands, 
and/or configuration settings, sometimes not.

Sometimes that's just because we didn't want to overwhelm you or ourselves with 
yet more "tuning knobs".

Sometimes it's because we made some component more tunable than we really 
needed, but did not make all the interconnected components equally or as widely 
tunable.
Sometimes it's because we want to save you from making ridiculous settings that 
would lead to problems...

OTOH, as I wrote before, if a burning requirement surfaces, things may change 
from release to release... Just as for so many years subblocks per block seemed 
forever frozen at the number 32.  Now it varies... and then the discussion 
shifts to why can't it be even more flexible?



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
