OK, so hold on … NOW what’s going on??? I deleted the filesystem … went to lunch … came back an hour later … recreated the filesystem with a metadata block size of 4 MB … and I STILL have a 1 MB block size in the system pool and the wrong fragment size in other pools…
Kevin

/root/gpfs root@testnsd1# mmdelfs gpfs5
All data on the following disks of gpfs5 will be destroyed:
    test21A3nsd
    test21A4nsd
    test21B3nsd
    test21B4nsd
    test23Ansd
    test23Bnsd
    test23Cnsd
    test24Ansd
    test24Bnsd
    test24Cnsd
    test25Ansd
    test25Bnsd
    test25Cnsd
Completed deletion of file system /dev/gpfs5.
mmdelfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
/root/gpfs root@testnsd1# mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 4M
The following disks of gpfs5 will be formatted on node testnsd3:
    test21A3nsd: size 953609 MB
    test21A4nsd: size 953609 MB
    test21B3nsd: size 953609 MB
    test21B4nsd: size 953609 MB
    test23Ansd: size 15259744 MB
    test23Bnsd: size 15259744 MB
    test23Cnsd: size 1907468 MB
    test24Ansd: size 15259744 MB
    test24Bnsd: size 15259744 MB
    test24Cnsd: size 1907468 MB
    test25Ansd: size 15259744 MB
    test25Bnsd: size 15259744 MB
    test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
  12 % complete on Thu Aug 2 13:16:26 2018
  25 % complete on Thu Aug 2 13:16:31 2018
  38 % complete on Thu Aug 2 13:16:36 2018
  50 % complete on Thu Aug 2 13:16:41 2018
  62 % complete on Thu Aug 2 13:16:46 2018
  74 % complete on Thu Aug 2 13:16:52 2018
  85 % complete on Thu Aug 2 13:16:57 2018
  96 % complete on Thu Aug 2 13:17:02 2018
 100 % complete on Thu Aug 2 13:17:03 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Thu Aug 2 13:17:09 2018
  28 % complete on Thu Aug 2 13:17:15 2018
  53 % complete on Thu Aug 2 13:17:20 2018
  78 % complete on Thu Aug 2 13:17:26 2018
 100 % complete on Thu Aug 2 13:17:27 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  98 % complete on Thu Aug 2 13:17:34 2018
 100 % complete on Thu Aug 2 13:17:34 2018
Formatting Allocation Map for storage pool raid1
  52 % complete on Thu Aug 2 13:17:39 2018
 100 % complete on Thu Aug 2 13:17:43 2018
Formatting Allocation Map for storage pool raid6
  24 % complete on Thu Aug 2 13:17:48 2018
  50 % complete on Thu Aug 2 13:17:53 2018
  74 % complete on Thu Aug 2 13:17:58 2018
  99 % complete on Thu Aug 2 13:18:03 2018
 100 % complete on Thu Aug 2 13:18:03 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
/root/gpfs root@testnsd1# mmlsfs gpfs5
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    32768                    Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 2                        Default number of metadata replicas
 -M                 3                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 3                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    4194304                  Block size (other pools)
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Thu Aug 2 13:16:47 2018  File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      101095424                Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128              Number of subblocks per full block
 -P                 system;raid1;raid6       Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs5                   Default mount point
 --mount-priority   0                        Mount priority
/root/gpfs root@testnsd1#

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - (615)875-9633

On Aug 2, 2018, at 3:31 PM, Buterbaugh, Kevin L <kevin.buterba...@vanderbilt.edu> wrote:

Hi All,

Thanks for all the responses on this, although I have the sneaking suspicion that the most significant thing that is going to come out of this thread is the knowledge that Sven has left IBM for DDN. ;-) or :-( or :-O depending on your perspective.

Anyway … we have done some testing which has shown that a 4 MB block size is best for workloads that use “normal” sized files. However, we, like many similar institutions, support a mixed workload, so the 128K fragment size that comes with a 4 MB block (at the traditional 32 subblocks per block) is not optimal for the primarily biomedical applications that literally create millions of very small files. That’s why we settled on 1 MB as a compromise.

So we’re very eager to now test with GPFS 5, a 4 MB block size, and an 8K fragment size. I’m recreating my test cluster filesystem now with that config … so a 4 MB block size on the metadata-only system pool, too.

Thanks to all who took the time to respond to this thread.
I hope it’s been beneficial to others as well…

Kevin

On Aug 1, 2018, at 7:11 PM, Andrew Beattie <abeat...@au1.ibm.com> wrote:

I would second the comment about doing testing specific to your environment.

We recently deployed a number of ESS building blocks at a customer site that was specifically being used for a mixed HPC workload. We spent more than a week experimenting with different block sizes for both data and metadata, trying to identify which combination would provide the best mix of metadata performance and data performance.

One thing we noticed very early on is that mdtest and IOR respond very differently as you vary block size and subblock size. What works for one use case may be a very poor option for another.

Interestingly enough, the best overall option for our particular use case turned out to be an 8MB block size with 32k subblocks, as that gave us good metadata performance and good sequential data performance, which is probably why a 32k subblock was the default for so many years.

Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: abeat...@au1.ibm.com

----- Original message -----
From: "Marc A Kaplan" <makap...@us.ibm.com>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Cc:
Subject: Re: [gpfsug-discuss] Sub-block size not quite as expected on GPFS 5 filesystem?
Date: Thu, Aug 2, 2018 10:01 AM

Firstly, I do suggest that you run some tests and see how much, if any, difference the available settings make in performance and/or storage utilization.

Secondly, as I and others have hinted, deeper in the system there may be additional parameters and settings. Sometimes they are available via commands and/or configuration settings, sometimes not.

Sometimes that's just because we didn't want to overwhelm you or ourselves with yet more "tuning knobs".

Sometimes it's because we made some component more tunable than we really needed, but did not make all the interconnected components equally or as widely tunable.

Sometimes it's because we want to save you from making ridiculous settings that would lead to problems...

On the other hand, as I wrote before, if a burning requirement surfaces, things may change from release to release... Just as, for so many years, the number of subblocks per block seemed forever frozen at 32. Now it varies... and then the discussion shifts to why it can't be even more flexible.
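The block/fragment arithmetic running through this thread can be sketched in a few lines. This is a minimal Python sketch, assuming the fragment (subblock) size is simply the block size divided by the subblocks-per-full-block value; the mmlsfs gpfs5 output earlier in the thread is consistent with that (1048576 / 128 = 8192 for the system pool, 4194304 / 128 = 32768 for the other pools). The helper names are hypothetical and are not part of any Spectrum Scale command or API, and the small-file figure uses a deliberately simplified model that ignores data stored in the inode, replication and metadata overhead.

```python
# Hypothetical helpers illustrating GPFS block/subblock arithmetic.
# Assumption: fragment size = block size / subblocks per full block.

KIB = 1024
MIB = 1024 * KIB


def fragment_size(block_size: int, subblocks_per_block: int) -> int:
    """Return the fragment (subblock) size in bytes for one storage pool."""
    if block_size % subblocks_per_block != 0:
        raise ValueError("block size must be a multiple of the subblock count")
    return block_size // subblocks_per_block


def small_file_allocation(file_size: int, frag: int) -> int:
    """Bytes allocated to a file smaller than one full block.

    Simplified model (assumption): the file is rounded up to whole fragments.
    Data stored in the inode, replication and metadata are ignored.
    """
    return -(-file_size // frag) * frag  # ceiling division, then scale back up


if __name__ == "__main__":
    # 128 subblocks per full block, as reported by mmlsfs gpfs5.
    print(fragment_size(1 * MIB, 128) // KIB)  # system pool: 8
    print(fragment_size(4 * MIB, 128) // KIB)  # other pools: 32
    # The long-standing fixed value of 32 subblocks per block.
    print(fragment_size(4 * MIB, 32) // KIB)   # 128, the "128K fragment"
    # Subblocks per block implied by 4 MB blocks with 8K fragments.
    print((4 * MIB) // (8 * KIB))              # 512
    # Subblocks per block implied by 8 MB blocks with 32K subblocks.
    print((8 * MIB) // (32 * KIB))             # 256
    # A 20 KiB file under 128K versus 8K fragments (in KiB).
    print(small_file_allocation(20 * KIB, 128 * KIB) // KIB)  # 128
    print(small_file_allocation(20 * KIB, 8 * KIB) // KIB)    # 24
```

Under these assumptions, the same 128 subblocks per block that give the 1 MB system pool its 8K fragments give the 4 MB pools 32K fragments, and an 8K fragment on a 4 MB pool would correspond to 512 subblocks per block.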
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss