Re: [gpfsug-discuss] kernel 3.10.0-1160.36.2.el7.x86_64 (CVE-2021-33909) not compatible with DB2 (for TSM, HPSS, possibly other IBM apps)
Hey Jonathan,

3.10.0-1160.31.1 seems to be one of the last kernel releases prior to the CVE-2021-33909 exploit. 3.10.0-1160.36.2.el7.x86_64 seems to be the first on the Red Hat repo that fixes the exploit, but it's not working for our combination of TSM/DB2 versions:
* TSM 8.1.8
* DB2 v11.1.4.4

I'll just keep one eye on the repo for the next kernel available and try it again. Until then I'll stick with 3.10.0-1062.18.1.

On the HPSS side 3.10.0-1160.36.2.el7.x86_64 worked fine with DB2 11.5, but not with 10.5.

Thanks
Jaime

On 7/30/2021 07:27:49, Jonathan Buzzard wrote:

On 30/07/2021 05:16, Jaime Pinto wrote:
An alert for sysadmins managing TSM/DB2 servers and those responsible for applying security patches, in particular kernel 3.10.0-1160.36.2.el7.x86_64, despite the security concerns raised by CVE-2021-33909: please hold off on upgrading your Red Hat systems (possibly CentOS too). I just found out the hard way that kernel 3.10.0-1160.36.2.el7.x86_64 is not compatible with DB2; after the node reboot DB2 would not work anymore, not only on TSM but also on HPSS. I had to revert the kernel to 3.10.0-1062.18.1.el7.x86_64 to get DB2 working properly again.

For the record I have been running Spectrum Protect Extended Edition 8.1.12 on 3.10.0-1160.31.1 (genuine RHEL 7.9) since the 11th of June this year. I would say therefore there is no need to roll back quite so far as 3.10.0-1062.18.1, which is quite ancient now. Can't test anything newer as I am literally in the middle of migrating our TSM server to new hardware and a RHEL 8.4 install. Spent yesterday in the data centre re-cabling the disk arrays to the new server; neat, tidy and labelled this time :-)

JAB.

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] kernel 3.10.0-1160.36.2.el7.x86_64 (CVE-2021-33909) not compatible with DB2 (for TSM, HPSS, possibly other IBM apps)
An alert for sysadmins managing TSM/DB2 servers and those responsible for applying security patches, in particular kernel 3.10.0-1160.36.2.el7.x86_64, despite the security concerns raised by CVE-2021-33909: please hold off on upgrading your Red Hat systems (possibly CentOS too). I just found out the hard way that kernel 3.10.0-1160.36.2.el7.x86_64 is not compatible with DB2; after the node reboot DB2 would not work anymore, not only on TSM but also on HPSS. I had to revert the kernel to 3.10.0-1062.18.1.el7.x86_64 to get DB2 working properly again.

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
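For anyone who needs to stay on a known-good kernel until a fixed release is validated, here is a minimal sketch of pinning the kernel with the yum versionlock plugin. The kernel release shown is just the one mentioned above; adjust it to whatever level works with your DB2, and treat the whole thing as an illustration rather than a recommendation.

~~~
# install the versionlock plugin on RHEL/CentOS 7
yum install -y yum-plugin-versionlock

# pin the kernel packages at the last release known (locally) to work with DB2
yum versionlock add kernel-3.10.0-1062.18.1.el7 kernel-devel-3.10.0-1062.18.1.el7

# if a newer kernel is already installed, make the older one the default boot entry
grubby --set-default /boot/vmlinuz-3.10.0-1062.18.1.el7.x86_64

# review or clear the locks later, once a fixed kernel has been validated
yum versionlock list
yum versionlock clear
~~~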
Re: [gpfsug-discuss] SS 5.0.x and quota issues
So Bob,

Yes, we too have observed an uncharacteristic lag in the correction of the internal quota accounting on GPFS since we updated from version 3.3 to version 4.x some 7-8 years ago. That lag remains through version 5.0.x as well, and it has persisted through several appliances (DDN, G200, GSS, ESS and now DSS-G). In our university environment there is also a lot of data churning, in particular small files.

The workaround has always been to periodically run mmcheckquota on the top independent fileset to expedite that correction (I have a crontab script that measures the relative size of the in-doubt columns on the mmrepquota report, size and inodes, and if either exceeds 2% for any USR/GRP/FILESET I run mmcheckquota; a rough sketch follows at the end of this message).

We have opened support calls with IBM about this issue in the past, and we never got a solution, such as a GPFS configuration parameter we could adjust to have this correction done automatically. We gave up.

And that begs the question: what do you mean by "... 5.0.4-4 ... that has a fix for mmcheckquota"? Isn't mmcheckquota zeroing the in-doubt columns when you run it? The fix should be for GPFS itself (something buried in the code over many versions). As far as I can tell there has never been anything wrong with mmcheckquota.

Thanks
Jaime

On 5/18/2020 08:59:09, Cregan, Bob wrote:
Hi, At Imperial we have been experiencing an issue with SS 5.0.x and quotas. The main symptom is a slow decay in the accuracy of reported quota usage when compared to the actual usage as reported by "du". This discrepancy can be as little as a few percent and as much as several hundred percent. We also sometimes see bizarre effects such as negative file number counts being reported. We have been working with IBM and have put in the latest 5.0.4-4 (that has a fix for mmcheckquota) that we have been pinning our hopes on, but this has not worked. Is anyone else experiencing similar issues? We need to try and get an idea if this is an issue peculiar to our set up or a more general SS problem. We are using user and group quotas in a fileset context.

Thanks

Bob Cregan
HPC Systems Analyst
Information & Communication Technologies
Imperial College London, South Kensington Campus London, SW7 2AZ
T: 07712388129 E: b.cre...@imperial.ac.uk W: www.imperial.ac.uk/ict/rcs
@imperialRCS @imperialRSE

___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
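For what it's worth, a minimal sketch of the kind of cron-driven check described above. The 2% threshold comes from the message; the device name, the mmrepquota column positions and the "relative to allocated KB" interpretation are assumptions, so verify them against your own mmrepquota output before trusting it.

~~~
#!/bin/bash
# Sketch: run mmcheckquota when any in-doubt value exceeds ~2% of usage.
FS=fs1          # hypothetical device name
THRESHOLD=2     # percent

# Assumed per-fileset report layout: name fileset type KB quota limit in_doubt grace
offender=$(/usr/lpp/mmfs/bin/mmrepquota -u -g -j "$FS" | awk -v t="$THRESHOLD" '
    $3 ~ /^(USR|GRP|FILESET)$/ && $4 > 0 {
        if (100 * $7 / $4 > t) { print; exit }
    }')

if [ -n "$offender" ]; then
    echo "in-doubt above ${THRESHOLD}% (${offender}); running mmcheckquota on ${FS}"
    /usr/lpp/mmfs/bin/mmcheckquota "$FS"
fi
~~~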
Re: [gpfsug-discuss] Odd networking/name resolution issue
The rationale for my suggestion doesn't have much to do with the central DNS server, but everything to do with the DNS client side of the service. If you have a very busy cluster at times, with a number of nodes really busy with 1000+ IOPS for instance, so much that the OS on the client can barely spare a cycle to ask the DNS server for the IP associated with the interface name leading to the GPFS infrastructure, or even process that response when it returns, on the same interface where it's having contention and trying to process all the GPFS data transactions, you can have temporary catch-22 situations. This can generate a backlog of waiters, and the eventual expelling of some nodes when the cluster managers don't hear from them in a reasonable time. It doesn't really matter if you have a central DNS server on steroids.

Jaime

On 5/10/2020 03:35:29, TURNER Aaron wrote:
Following on from Jonathan Buzzard's comments, I'd also like to point out that I've never known a central DNS failure in a UK HEI for as long as I can remember, and it was certainly not my intention to suggest that, as I think a central DNS issue is highly unlikely. And indeed, as I originally noted, the standard command-line tools on the nodes resolve the names as expected, so whatever is going on looks like it affects GPFS only. It may even be that the repetition of the domain names in the logs is just a function of something it is doing when logging when a node is failing to connect for some other reason entirely. It's just not something I recall having seen before and wanted to see if anyone else had seen it.

From: gpfsug-discuss-boun...@spectrumscale.org on behalf of Jonathan Buzzard
Sent: 09 May 2020 23:22
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Odd networking/name resolution issue

On 09/05/2020 12:06, Jaime Pinto wrote:
DNS shouldn't be relied upon on a GPFS cluster for internal communication/management or data.

The 1980s have called and want their lack of IP resolution protocols back :-)

I would kindly disagree. If your DNS is not working then your cluster is fubar anyway and a zillion other things will also break very rapidly. For us at least half of the running jobs would be dead in a few minutes as failure to contact license servers would cause the software to stop. All authentication and account lookup is also going to fail as well. You could distribute a hosts file, but frankly outside of a storage-only cluster (as opposed to one with hundreds if not thousands of compute nodes) that is madness and will inevitably come to bite you in the ass because they *will* get out of sync. The only hosts entry we have is for the Salt Stack host, because it tries to do things before the DNS resolvers have been set up and consequently breaks otherwise. Which IMHO is duff on its behalf. I would add I can't think of a time in the last 16 years where internal DNS at any University I have worked at has stopped working for even one millisecond. If DNS is that flaky at your institution then I suggest sacking the people responsible for its maintenance as being incompetent twits. It is just such a vanishingly remote possibility that it's not worth bothering about. Frankly, an aircraft falling out of the sky and squishing your data centre seems more likely to me. Finally, in a world of IPv6, anything other than DNS is utter madness IMHO.

JAB.

--
Jonathan A. Buzzard Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow.
G4 0NG ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org
Re: [gpfsug-discuss] Odd networking/name resolution issue
DNS shouldn't be relied upon on a GPFS cluster for internal communication/management or data. As a starting point, make sure the IPs and names of all managers/quorum nodes and clients have *unique* entries in the hosts files of all other nodes in the clusters, matching how they were joined and licensed in the first place. If you issue 'mmlscluster' on the cluster manager for the servers and clients, those results should be used to build the common hosts file for all nodes involved (a rough sketch follows at the end of this message).

Also, all nodes should have a common ntp configuration, pointing to the same *internal* ntp server, easily accessible via name/IP also in the hosts file.

And obviously, you need a stable network, eth or IB. Have a good monitoring tool in place, to rule out the network as a possible culprit. In the particular case of IB, check that the fabric managers are doing their jobs properly. And keep one eye on the 'tail -f /var/mmfs/gen/mmfslog' output of the managers and the nodes being expelled for other clues.

Jaime

On 5/9/2020 06:25:28, TURNER Aaron wrote:
Dear All, We are getting, on an intermittent basis with currently no obvious pattern, an issue with GPFS nodes reporting that they are rejecting nodes of the form: nodename.domain.domain.domain. DNS resolution using the standard command-line tools of the IP address present in the logs does not repeat the domain, and so far it seems isolated to GPFS. Ultimately the nodes are rejected as not responding on the network. Has anyone seen this sort of behaviour before?

Regards
Aaron Turner

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
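As a rough illustration of the hosts-file step, the fragment below pulls the daemon node names and IPs out of 'mmlscluster'. The field positions are an assumption (check them against the output on your release), and how you distribute the fragment to the nodes is left to your configuration management tool.

~~~
# Build a candidate hosts-file fragment from the cluster manager's view.
# Assumed mmlscluster columns: node number, daemon node name, IP address, admin node name, designation
/usr/lpp/mmfs/bin/mmlscluster | awk '
    $1 ~ /^[0-9]+$/ { print $3, $2 }   # "IP  daemon-node-name"
' > /tmp/gpfs-hosts-fragment

# review it, then append to /etc/hosts on every node in the cluster(s)
cat /tmp/gpfs-hosts-fragment
~~~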
[gpfsug-discuss] GPFS vulnerability with possible root exploit on versions prior to 5.0.4.3 (and 4.2.3.21)
In case you missed it (the forum has been pretty quiet about this one), CVE-2020-4273 had an update yesterday:
https://www.ibm.com/support/pages/node/6151701?myns=s033=OCSTXKQY=E_sp=s033-_-OCSTXKQY-_-E

If you can't do the upgrade now, at least apply the mitigation to the client nodes generally exposed to unprivileged users.

Check the setuid bits:
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'

Apply the mitigation:
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("chmod u-s /usr/lpp/mmfs/bin/"$9)}'

Verification (should return nothing once the setuid bits are gone):
ls -l /usr/lpp/mmfs/bin | grep r-s | awk '{system("ls -l /usr/lpp/mmfs/bin/"$9)}'

All the best
Jaime

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
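If you do apply the mitigation, it may be worth recording which binaries were touched so the setuid bits can be restored once you are on a fixed level (a package update will normally reset them anyway). A minimal sketch, with the list file path being an arbitrary choice:

~~~
# record the current setuid binaries before the mitigation
ls -l /usr/lpp/mmfs/bin | awk '/r-s/ {print $9}' > /root/gpfs-setuid-list.txt

# apply the mitigation from the saved list
while read -r f; do chmod u-s "/usr/lpp/mmfs/bin/$f"; done < /root/gpfs-setuid-list.txt

# after upgrading to a fixed level, restore the bits if needed
while read -r f; do chmod u+s "/usr/lpp/mmfs/bin/$f"; done < /root/gpfs-setuid-list.txt
~~~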
Re: [gpfsug-discuss] fast search for archivable data sets
Hi Jim,

If you never worked with policy rules before, you may want to start small and build up your confidence with them. In the /usr/lpp/mmfs/samples/ilm path you will find several examples of templates that you can use to play around with. I would start with the 'list' rules first. Some of those templates are a bit complex, so here is one script that I use on a regular basis to detect files larger than 1MB (you can even exclude specific filesets):

~~~
dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-large

/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define an external list */
RULE EXTERNAL LIST 'largefiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files that have more than 1MB of space allocated. */
RULE 'r2' LIST 'largefiles'
    SHOW('-u' || vc(USER_ID) || ' -s' || vc(FILE_SIZE))
    /*FROM POOL 'system'*/
    FROM POOL 'data'
    /*FOR FILESET('root')*/
    WEIGHT(FILE_SIZE)
    WHERE KB_ALLOCATED > 1024

/* Files in special filesets, such as mmpolicyRules, are never moved or deleted */
RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('mmpolicyRules','todelete','tapenode-stuff','toarchive')
~~~

And here is another to detect files not looked at for more than 6 months. I found it more effective to use atime and ctime. You could combine this with the one above to detect file size as well.

~~~
dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-atime-ctime-gt-6months

/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define an external list */
RULE EXTERNAL LIST 'accessedfiles' EXEC '/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files, directories, plus all other file system objects, like symlinks,
   named pipes, etc, accessed prior to a certain date AND that are not owned by root.
   Include the owner's id with each object and sort them by the owner's id */

/* Files in special filesets, such as mmpolicyRules, are never moved or deleted */
RULE 'ExcSpecialFile' EXCLUDE FOR FILESET ('scratch-root','todelete','root')

RULE 'r5' LIST 'accessedfiles' DIRECTORIES_PLUS
    FROM POOL 'data'
    SHOW('-u' || vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -c' || vc(CREATION_TIME) || ' -s ' || vc(FILE_SIZE))
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 183)
      AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME) > 183)
      AND NOT USER_ID = 0
      AND NOT (PATH_NAME LIKE '/gpfs/fs0/scratch/r/root/%')
~~~

Note that both these scripts work on a system-wide (or root fileset) basis, and will not give you specific directories, unless you run them several times on specific directories (not very efficient). To produce general lists per directory you would need to do some post-processing on the lists, with 'awk' or some other scripting language. If you need some samples I can send you a few.

And finally, you need to be more specific about what you mean by 'archivable'. Once you produce the lists you can do several things with them, or leverage the rules to actually execute things, such as move, delete, or HSM stuff. The /usr/lpp/mmfs/samples/ilm path has some samples as well.

On 4/3/2020 18:25:33, Jim Kavitsky wrote:
Hello everyone, I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of inodes, and I'm looking for the best way to locate archivable directories. For example, these might be directories whose contents were greater than 5 or 10TB, and whose contents had atimes greater than two years. Has anyone found a great way to do this with a policy engine run?
If not, is there another good way that anyone would recommend? Thanks in advance,

Yes, there is another way: the 'mmfind' utility, also in the same samples path. You have to compile it for your OS (see mmfind.README). This is a very powerful canned procedure that lets you run the "-exec" option just as in the normal Linux version of 'find'. I use it very often, and it's just as efficient as the other policy-rules-based alternative.

Good luck. Keep safe and confined.
Jaime

Jim Kavitsky
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
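On the post-processing point above, a rough sketch of rolling a policy list file up into per-directory totals. The list-record layout assumed here (the SHOW output followed by ' -- ' and the full path, with '-s<bytes>' coming from the SHOW clause in the rules above) and the aggregation depth are assumptions; check them against your own list files first, and note that 'list.largefiles' is just a placeholder name.

~~~
# Aggregate a policy list file per directory (totals in GB).
awk -F' -- ' '{
    path = $2
    split(path, p, "/")
    dir = p[2] "/" p[3] "/" p[4] "/" p[5]          # aggregation depth; adjust to taste
    if (match($1, /-s *[0-9]+/)) {
        bytes = substr($1, RSTART + 2, RLENGTH - 2) + 0
        total[dir] += bytes
    }
}
END {
    for (d in total) printf "%14.1f GB  /%s\n", total[d] / 2^30, d
}' list.largefiles | sort -rn
~~~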
Re: [gpfsug-discuss] mmbackup monitoring
Additionally, mmbackup creates by default a .mmbackupCfg directory on the root of the fileset where it dumps several files and directories with the progress of the backup, for instance: expiredFiles/, prepFiles/, updatedFiles/, dsminstr.log, ... You may then create a script to search these directories for logs/lists of what has happened, and generate a more detailed report of what happened during the backup. In our case I generate a daily report of how many files and how much data have been sent to the TSM server and deleted for each user, including their paths. You can do more tricks if you want.

Jaime

On 3/25/2020 10:15:59, Skylar Thompson wrote:
We execute mmbackup via a regular TSM client schedule with an incremental action, with a virtualmountpoint set to an empty, local "canary" directory. mmbackup runs as a preschedule command, and the client -domain parameter is set only to back up the canary directory. dsmc will back up the canary directory as a filespace only if mmbackup succeeds (exits with 0). We can then monitor the canary and infer the status of the associated GPFS filespace or fileset.

On Wed, Mar 25, 2020 at 10:01:04AM +, Jonathan Buzzard wrote:
What is the best way of monitoring whether or not mmbackup has managed to complete a backup successfully? Traditionally one uses a TSM monitoring solution of choice to make sure nodes were backing up (I am assuming mmbackup is being used in conjunction with TSM here). However mmbackup does not update the backup_end column in the filespaceview table (at least in 4.2), which makes things rather more complicated. The best I can come up with is querying the events table to see if the client schedule completed, but that gives a false sense of security, as the schedule completing does not mean the backup completed as far as I know. What solutions are you all using, or does mmbackup in 5.x update the filespaceview table?

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
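For anyone wanting to copy Skylar's canary approach, a rough sketch of the client-side pieces involved. The server name, paths, and the wrapper script are placeholders; the option names are standard BA-client dsm.sys options, but check them against your client level.

~~~
* dsm.sys (server stanza) -- canary-based mmbackup monitoring sketch
SERVERNAME         TSMSERVER1
  COMMMETHOD       TCPIP
  TCPSERVERADDRESS tsm.example.org
  PASSWORDACCESS   GENERATE
  SCHEDMODE        PROMPTED

  * expose an empty local directory as its own filespace
  VIRTUALMOUNTPOINT /gpfs/fs1/.backup-canary

  * only the canary is in the schedule's domain; mmbackup does the real work
  DOMAIN            /gpfs/fs1/.backup-canary

  * run mmbackup before the scheduled incremental; a non-zero exit code
  * stops the schedule, so the canary filespace only gets backed up on success
  PRESCHEDULECMD    "/usr/local/sbin/run-mmbackup-fs1.sh"
~~~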
Re: [gpfsug-discuss] mmbackup [--tsm-servers TSMServer[, TSMServer...]]
Hi Marc,

Just a follow-up to your suggestion a few months ago. I finally got to a point where I do 2 independent backups of the same path to 2 servers, and they are pretty even, finishing within 4 hours each when serialized. I now would just like to use one mmbackup instance for the 2 servers at the same time, with the --tsm-servers option, however it's not being accepted/recognized (see below). So, what is the proper syntax for this option?

Thanks
Jaime

# /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ‐‐tsm‐servers TAPENODE3,TAPENODE4 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2
mmbackup: Incorrect extra argument: ‐‐tsm‐servers
Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile]

Changing the order of the options/arguments makes no difference. Even when I explicitly specify only one server, mmbackup still doesn't seem to recognize the ‐‐tsm‐servers option (it thinks it's some kind of argument):

# /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib ‐‐tsm‐servers TAPENODE3 -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog --scope inodespace -v -a 8 -L 2
mmbackup: Incorrect extra argument: ‐‐tsm‐servers
Usage: mmbackup {Device | Directory} [-t {full | incremental}] [-N {Node[,Node...] | NodeFile | NodeClass}] [-g GlobalWorkDirectory] [-s LocalWorkDirectory] [-S SnapshotName] [-f] [-q] [-v] [-d] [-a IscanThreads] [-n DirThreadLevel] [-m ExecThreads | [[--expire-threads ExpireThreads] [--backup-threads BackupThreads]]] [-B MaxFiles | [[--max-backup-count MaxBackupCount] [--max-expire-count MaxExpireCount]]] [--max-backup-size MaxBackupSize] [--qos QosClass] [--quote | --noquote] [--rebuild] [--scope {filesystem | inodespace}] [--backup-migrated | --skip-migrated] [--tsm-servers TSMServer[,TSMServer...]] [--tsm-errorlog TSMErrorLogFile] [-L n] [-P PolicyFile]

I defined the 2 server stanzas as follows:

# cat dsm.sys
SERVERNAME TAPENODE3
  SCHEDMODE PROMPTED
  ERRORLOGRETENTION 0 D
  TCPSERVERADDRESS 10.20.205.51
  NODENAME home
  COMMMETHOD TCPIP
  TCPPort 1500
  PASSWORDACCESS GENERATE
  TXNBYTELIMIT 1048576

SERVERNAME TAPENODE4
  SCHEDMODE PROMPTED
  ERRORLOGRETENTION 0 D
  TCPSERVERADDRESS 192.168.94.128
  NODENAME home
  COMMMETHOD TCPIP
  TCPPort 1500
  PASSWORDACCESS GENERATE
  TXNBYTELIMIT 1048576
  TCPBuffsize 512

On 2019-11-03 8:56 p.m., Jaime Pinto wrote:
On 11/3/2019 20:24:35, Marc A Kaplan wrote:
Please show us the 2 or 3 mmbackup commands that you would like to run concurrently.

Hey Marc,
They would be pretty similar, with the only difference being the target TSM server, determined by sourcing a different dsmenv1 (2 or 3) prior to the start of each instance, each with its own dsm.sys (3 wrappers).
(source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg1 --scope inodespace -v -a 8 -L 2)
(source dsmenv2; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg2 --scope inodespace -v -a 8 -L 2)
(source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg3 --scope inodespace -v -a 8 -L 2)

I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for multiple target servers. It may be just what I need. I'll try this next.
Thank you very much,
Jaime

Peeking into the script, I find:

if [[ $scope == "inode-space" ]]
then
  deviceSuffix="${deviceName}.${filesetName}"
else
  deviceSuffix="${deviceName}"

I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, or different filesystems...
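One thing worth ruling out here: in the failing commands as pasted above, the dashes in front of 'tsm-servers' render as non-ASCII hyphen characters rather than the plain ASCII "--" that the usage text itself shows, and mmbackup would report such a token as an incorrect extra argument. Whether that is an artifact of the mail client or of what the shell actually received is an assumption to verify, but a retyped invocation would look like this (node names and paths exactly as in the original message):

~~~
/usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib \
    --tsm-servers TAPENODE3,TAPENODE4 \
    -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog \
    --scope inodespace -v -a 8 -L 2
~~~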
Re: [gpfsug-discuss] mmbackup ‐g GlobalWorkDirectory not being followed
On 11/3/2019 20:24:35, Marc A Kaplan wrote:
Please show us the 2 or 3 mmbackup commands that you would like to run concurrently.

Hey Marc,
They would be pretty similar, with the only difference being the target TSM server, determined by sourcing a different dsmenv1 (2 or 3) prior to the start of each instance, each with its own dsm.sys (3 wrappers).

(source dsmenv1; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg1 --scope inodespace -v -a 8 -L 2)
(source dsmenv2; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg2 --scope inodespace -v -a 8 -L 2)
(source dsmenv3; /usr/lpp/mmfs/bin/mmbackup /gpfs/fs1/home -N tapenode3-ib -s /dev/shm --tsm-errorlog $tmpDir/home-tsm-errorlog -g /gpfs/fs1/home/.mmbackupCfg3 --scope inodespace -v -a 8 -L 2)

I was playing with the -L (to control the policy), but you bring up a very good point I had not experimented with, such as a single traverse for multiple target servers. It may be just what I need. I'll try this next.
Thank you very much,
Jaime

Peeking into the script, I find:

if [[ $scope == "inode-space" ]]
then
  deviceSuffix="${deviceName}.${filesetName}"
else
  deviceSuffix="${deviceName}"

I believe mmbackup is designed to allow concurrent backup of different independent filesets within the same filesystem, or different filesystems... And a single mmbackup instance can drive several TSM servers, which can be named with an option or in the dsm.sys file:

# --tsm-servers TSMserver[,TSMserver...]
# List of TSM servers to use instead of the servers in the dsm.sys file.

From: Jaime Pinto
To: gpfsug main discussion list
Date: 11/01/2019 07:40 PM
Subject: [EXTERNAL] [gpfsug-discuss] mmbackup ‐g GlobalWorkDirectory not being followed
Sent by: gpfsug-discuss-boun...@spectrumscale.org
--
How can I force secondary processes to use the folder instructed by the -g option?

I started an mmbackup with ‐g /gpfs/fs1/home/.mmbackupCfg1 and another with ‐g /gpfs/fs1/home/.mmbackupCfg2 (and another with ‐g /gpfs/fs1/home/.mmbackupCfg3 ...)

However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely cannot happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers.

See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup):

DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance -a 8 -P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope inodespace

Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the different mmbackup instances tripping over each other. The serializing is very undesirable.
Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] mmbackup ‐g GlobalWorkDirectory not being followed
How can I force secondary processes to use the folder instructed by the -g option? I started a mmbackup with ‐g /gpfs/fs1/home/.mmbackupCfg1 and another with ‐g /gpfs/fs1/home/.mmbackupCfg2 (and another with ‐g /gpfs/fs1/home/.mmbackupCfg3 ...) However I'm still seeing transient files being worked into a "/gpfs/fs1/home/.mmbackupCfg" folder (created by magic !!!). This absolutely can not happen, since it's mixing up workfiles from multiple mmbackup instances for different target TSM servers. See below the "-f /gpfs/fs1/home/.mmbackupCfg/prepFiles" created by mmapplypolicy (forked by mmbackup): DEBUGtsbackup33: /usr/lpp/mmfs/bin/mmapplypolicy "/gpfs/fs1/home" -g /gpfs/fs1/home/.mmbackupCfg2 -N tapenode3-ib -s /dev/shm -L 2 --qos maintenance -a 8 -P /var/mmfs/mmbackup/.mmbackupRules.fs1.home -I prepare -f /gpfs/fs1/home/.mmbackupCfg/prepFiles --irule0 --sort-buffer-size=5% --scope inodespace Basically, I don't want a "/gpfs/fs1/home/.mmbackupCfg" folder to ever exist. Otherwise I'll be forced to serialize these backups, to avoid the different mmbackup instances tripping over each other. The serializing is very undesirable. Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] mmbackup: how to keep list(expiredFiles, updatedFiles) files
How can I instruct mmbackup to *NOT* delete the temporary directories and files created inside the FILESET/.mmbackupCfg folder? I can see that during the process the folders expiredFiles & updatedFiles are there, and contain the lists I'm interested in for post-analysis. Thanks Jaime --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] mmapplypolicy on nested filesets ...
It took a while before I could get back to this issue, but I want to confirm that Marc's suggestions worked like a charm, and did exactly what I hoped for:

* remove any FOR FILESET(...) specifications
* mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ...

I didn't have to do anything else but exclude a few filesets from the scan.

Thanks
Jaime

Quoting "Marc A Kaplan":
I suggest you remove any FOR FILESET(...) specifications from your rules and then run

mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ...

See also the (RTFineM) for the --scope option and the Directory argument of the mmapplypolicy command. That is the best, most efficient way to scan all the files that are in a particular inode-space. Also, you must have all filesets of interest "linked" and the file system must be mounted.

Notice that "independent" means that the fileset name is used to denote both a fileset and an inode-space, where said inode-space contains the fileset of that name and possibly other "dependent" filesets...

IF one wished to search the entire file system for files within several different filesets, one could use rules with

FOR FILESET('fileset1','fileset2','and-so-on')

Or even more flexibly

WHERE FILESET_NAME LIKE 'sql-like-pattern-with-%s-and-maybe-_s'

Or even more powerfully

WHERE regex(FILESET_NAME, 'extended-regular-.*-expression')

From: "Jaime Pinto"
To: "gpfsug main discussion list"
Date: 04/18/2018 01:00 PM
Subject: [gpfsug-discuss] mmapplypolicy on nested filesets ...
Sent by: gpfsug-discuss-boun...@spectrumscale.org

A few months ago I asked about limits and dynamics of traversing dependent vs. independent filesets on this forum. I used the information provided to make decisions and set up our new DSS-based GPFS storage system. Now I have a problem I couldn't yet figure out how to make work:

'project' and 'scratch' are top *independent* filesets of the same file system.
'proj1', 'proj2' are dependent filesets nested under 'project'
'scra1', 'scra2' are dependent filesets nested under 'scratch'

I would like to run a purging policy on all contents under 'scratch' (which includes 'scra1', 'scra2'), and TSM backup policies on all contents under 'project' (which includes 'proj1', 'proj2').

HOWEVER: When I run the purging policy on the whole gpfs device (with both 'project' and 'scratch' filesets)
* if I use FOR FILESET('scratch') on the list rules, the 'scra1' and 'scra2' filesets under scratch are excluded (totally unexpected)
* if I use FOR FILESET('scra1') I get an error that scra1 is a dependent fileset (OK, that is expected)
* if I use /*FOR FILESET('scratch')*/, all contents under 'project', 'proj1', 'proj2' are traversed as well, and I don't want that (it takes too much time)
* if I use /*FOR FILESET('scratch')*/, and instead of the whole device I apply the policy to the /scratch mount point only, the policy still traverses all the content of 'project', 'proj1', 'proj2', which I don't want. (again, totally unexpected)

QUESTION: How can I craft the syntax of the mmapplypolicy in combination with the RULE filters, so that I can traverse all the contents under the 'scratch' independent fileset, including the nested dependent filesets 'scra1','scra2', and NOT traverse the other independent filesets at all (since this takes too much time)?

Thanks
Jaime

PS: FOR FILESET('scra*') does not work.
TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials=DwICAg=jf_iaSHvJObTbx-siA1ZOg=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE=IpwHlr0YNr7rgV7gI8Y2sxIELLIwA15KK4nBnv9BYWk= --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE=aff0vMJkKd-Z3pw3-jckmI3ejqXh8aSr8rxkKf3OGdk= TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials
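For readers landing on this thread later, a condensed sketch of the working combination confirmed above: the policy file carries no FOR FILESET(...) clauses, and mmapplypolicy is pointed at the junction path of the independent fileset with --scope inodespace. The paths, the age threshold and the external-list EXEC script are placeholders.

~~~
cat > /tmp/purge-scratch.pol <<'EOF'
/* no FOR FILESET(...) anywhere; the inode-space scope does the restricting */
RULE EXTERNAL LIST 'purgecands' EXEC '/path/to/mmpolicyExec-list'
RULE 'oldscratch' LIST 'purgecands'
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 60
EOF

# scan only the 'scratch' inode-space (including scra1, scra2) by running
# against the fileset junction path with --scope inodespace
/usr/lpp/mmfs/bin/mmapplypolicy /gpfs/fs0/scratch \
    --scope inodespace -P /tmp/purge-scratch.pol -L 2
~~~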
Re: [gpfsug-discuss] Capacity pool filling
I think the restore is bringing back a lot of material with atime > 90, so it is passing through gpfs23data and going directly to gpfs23capacity. I also think you may not have stopped the crontab script as you believe you did.

Jaime

Quoting "Buterbaugh, Kevin L":
Hi All,

First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious ... with that disclaimer out of the way...

We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated to by a script that runs out of cron each weekend).

However ... this morning the free space in the gpfs23capacity pool is dropping ... I'm down to 0.5 TB free in a 582 TB pool ... and I cannot figure out why. The migration script is NOT running ... in fact, it's currently disabled. So I can only think of two possible explanations for this:

1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that ... i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool. Marc Kaplan - can mmfind do that?? ;-)

2. We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable. We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need. But shouldn't all of that be going to the gpfs23data pool? I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool???

Is there a third explanation I'm not thinking of?

Thanks...

Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
kevin.buterba...@vanderbilt.edu - (615)875-9633

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477

This message was sent using IMP at SciNet Consortium, University of Toronto.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
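On question 1, a hedged sketch of the policy-engine way to surface files already resident in the capacity pool that have been modified recently. The pool name and device name are taken from the message; the empty-EXEC external list, the 7-day window and the output paths are assumptions.

~~~
cat > /tmp/recent-in-capacity.pol <<'EOF'
/* files in gpfs23capacity modified within the last 7 days */
RULE EXTERNAL LIST 'recentcap' EXEC ''
RULE 'r1' LIST 'recentcap'
    FROM POOL 'gpfs23capacity'
    SHOW(VARCHAR(USER_ID) || ' ' || VARCHAR(KB_ALLOCATED))
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) < 7
EOF

# '-I defer -f <prefix>' writes list.recentcap files instead of calling an EXEC script
/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -P /tmp/recent-in-capacity.pol \
    -I defer -f /tmp/recentcap -L 2
~~~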
Re: [gpfsug-discuss] mmapplypolicy on nested filesets ...
Ok Marc and Frederick, there is hope. I'll conduct more experiments and report back Thanks for the suggestions. Jaime Quoting "Marc A Kaplan" <makap...@us.ibm.com>: I suggest you remove any FOR FILESET(...) specifications from your rules and then run mmapplypolicy /path/to/the/root/directory/of/the/independent-fileset-you-wish-to-scan ... --scope inodespace -P your-policy-rules-file ... See also the (RTFineM) for the --scope option and the Directory argument of the mmapplypolicy command. That is the best, most efficient way to scan all the files that are in a particular inode-space. Also, you must have all filesets of interest "linked" and the file system must be mounted. Notice that "independent" means that the fileset name is used to denote both a fileset and an inode-space, where said inode-space contains the fileset of that name and possibly other "dependent" filesets... IF one wished to search the entire file system for files within several different filesets, one could use rules with FOR FILESET('fileset1','fileset2','and-so-on') Or even more flexibly WHERE FILESET_NAME LIKE 'sql-like-pattern-with-%s-and-maybe-_s' Or even more powerfully WHERE regex(FILESET_NAME, 'extended-regular-.*-expression') From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 04/18/2018 01:00 PM Subject:[gpfsug-discuss] mmapplypolicy on nested filesets ... Sent by:gpfsug-discuss-boun...@spectrumscale.org A few months ago I asked about limits and dynamics of traversing depended .vs independent filesets on this forum. I used the information provided to make decisions and setup our new DSS based gpfs storage system. Now I have a problem I couldn't' yet figure out how to make it work: 'project' and 'scratch' are top *independent* filesets of the same file system. 'proj1', 'proj2' are dependent filesets nested under 'project' 'scra1', 'scra2' are dependent filesets nested under 'scratch' I would like to run a purging policy on all contents under 'scratch' (which includes 'scra1', 'scra2'), and TSM backup policies on all contents under 'project' (which includes 'proj1', 'proj2'). HOWEVER: When I run the purging policy on the whole gpfs device (with both 'project' and 'scratch' filesets) * if I use FOR FILESET('scratch') on the list rules, the 'scra1' and 'scra2' filesets under scratch are excluded (totally unexpected) * if I use FOR FILESET('scra1') I get error that scra1 is dependent fileset (Ok, that is expected) * if I use /*FOR FILESET('scratch')*/, all contents under 'project', 'proj1', 'proj2' are traversed as well, and I don't want that (it takes too much time) * if I use /*FOR FILESET('scratch')*/, and instead of the whole device I apply the policy to the /scratch mount point only, the policy still traverses all the content of 'project', 'proj1', 'proj2', which I don't want. (again, totally unexpected) QUESTION: How can I craft the syntax of the mmapplypolicy in combination with the RULE filters, so that I can traverse all the contents under the 'scratch' independent fileset, including the nested dependent filesets 'scra1','scra2', and NOT traverse the other independent filesets at all (since this takes too much time)? Thanks Jaime PS: FOR FILESET('scra*') does not work. 
TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials=DwICAg=jf_iaSHvJObTbx-siA1ZOg=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE=IpwHlr0YNr7rgV7gI8Y2sxIELLIwA15KK4nBnv9BYWk= --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=cvpnBBH0j41aQy0RPiG2xRL_M8mTc1izuQD3_PmtjZ8=y0aRzkzp0QA9QR8eh3XtN6PETqWYDCNvItdihzdueTE=aff0vMJkKd-Z3pw3-jckmI3ejqXh8aSr8rxkKf3OGdk= TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto - Storage Analyst SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 Universi
Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
We are considering moving from user/group-based quotas to path-based quotas with nested filesets. We are also facing challenges in traversing 'Dependent Filesets' for daily TSM backups of projects and for purging the scratch area.

We're about to deploy a new GPFS storage cluster, some 12-15PB, 13K+ users and 5K+ groups as the baseline, with substantial scaling up expected within the next 3-5 years in all dimensions. Therefore, decisions we make now under GPFS v4.x through v5.x will have consequences in the very near future, if they are not the proper ones.

Thanks
Jaime

Quoting "Daniel Kidger" <daniel.kid...@uk.ibm.com>:

Jamie, I believe at least one of those limits is 'maximum supported' rather than an architectural limit. Is your use case one which would push these boundaries? If so, care to describe what you would wish to do?
Daniel

DR DANIEL KIDGER
IBM Technical Sales Specialist
Software Defined Solution Sales
+44-(0)7818 522 266
daniel.kid...@uk.ibm.com

----- Original message -----
From: "Jaime Pinto" <pi...@scinet.utoronto.ca>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>, "Truong Vu" <truo...@us.ibm.com>
Cc: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Date: Mon, Feb 5, 2018 2:56 PM

Thanks Truong
Jaime

Quoting "Truong Vu" <truo...@us.ibm.com>:
Hi Jamie, The limits are the same in 5.0.0. We'll look into the FAQ. Thanks, Tru.

From: gpfsug-discuss-requ...@spectrumscale.org
To: gpfsug-discuss@spectrumscale.org
Date: 02/05/2018 07:00 AM
Subject: gpfsug-discuss Digest, Vol 73, Issue 9
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Send gpfsug-discuss mailing list submissions to gpfsug-discuss@spectrumscale.org
To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s= or, via email, send a message with subject or body 'help' to gpfsug-discuss-requ...@spectrumscale.org
You can reach the person managing the list at gpfsug-discuss-ow...@spectrumscale.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..."

Today's Topics:
1. Maximum Number of filesets on GPFS v5? (Jaime Pinto)

Message: 1
Date: Sun, 04 Feb 2018 14:58:39 -0500
From: "Jaime Pinto" <pi...@scinet.utoronto.ca>
To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Message-ID: <20180204145839.77101pngtlr3q...@support.scinet.utoronto.ca>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed"

Here is what I found for versions 4 & 3.5:
* Maximum Number of Dependent Filesets: 10,000
* Maximum Number of Independent Filesets: 1,000
https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets

I'm having some difficulty finding published documentation on limitations for version 5:
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm

Any hints?
Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto This message was sent using IMP at SciNet Consortium, University of Toronto. -- ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s=[6] End of gpfsug-discuss Digest, Vol 73, Issue 9 * TELL US ABOUT YOUR SUCCESS STORIES https://urldefense.proofpoint.com/v2/url?u=http-3A__www.scinethpc.ca_testimonials=DwICAg=jf_iaSHvJObTbx-siA1ZOg=HlQDuUjgJx4p54QzcXd0_zTwf4Cr2t3NINalNhLTA2E=xnPNZO_v81jNbr_IcbbyLPUpPdAFjKIzptnqTnmqaFQ=Dln7axLq9ej2KttpKZJwLKuvxfS
Re: [gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Thanks Truong Jaime Quoting "Truong Vu" <truo...@us.ibm.com>: Hi Jamie, The limits are the same in 5.0.0. We'll look into the FAQ. Thanks, Tru. From: gpfsug-discuss-requ...@spectrumscale.org To: gpfsug-discuss@spectrumscale.org Date: 02/05/2018 07:00 AM Subject:gpfsug-discuss Digest, Vol 73, Issue 9 Sent by:gpfsug-discuss-boun...@spectrumscale.org Send gpfsug-discuss mailing list submissions to gpfsug-discuss@spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s= or, via email, send a message with subject or body 'help' to gpfsug-discuss-requ...@spectrumscale.org You can reach the person managing the list at gpfsug-discuss-ow...@spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Maximum Number of filesets on GPFS v5? (Jaime Pinto) -- Message: 1 Date: Sun, 04 Feb 2018 14:58:39 -0500 From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Subject: [gpfsug-discuss] Maximum Number of filesets on GPFS v5? Message-ID: <20180204145839.77101pngtlr3q...@support.scinet.utoronto.ca> Content-Type: text/plain;charset=ISO-8859-1; DelSp="Yes"; format="flowed" Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto This message was sent using IMP at SciNet Consortium, University of Toronto. -- ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss=DwICAg=jf_iaSHvJObTbx-siA1ZOg=HQmkdQWQHoc1Nu6Mg_g8NVugim3OiUUy5n0QgLQcbkM=doLWvSNAkaAwsGv0OWEMdk4umwTUPj5qHjnchKlkNE4=ptDCYhJK4ltkJaYKCaTThZHUXCFrHGIIPVCgBD-VH8s= End of gpfsug-discuss Digest, Vol 73, Issue 9 * TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] Maximum Number of filesets on GPFS v5?
Here is what I found for versions 4 & 3.5: * Maximum Number of Dependent Filesets: 10,000 * Maximum Number of Independent Filesets: 1,000 https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#filesets I'm having some difficulty finding published documentation on limitations for version 5: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/6027-2699.htm https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1pdg_increasefilesetspace.htm Any hints? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] Fileset quotas enforcement
Did you try to run mmcheckquota on the device? I have observed that in the most recent versions (for the last 3 years) there is a really long lag for GPFS to process the internal accounting, so there is a slippage effect that skews quota operations. mmcheckquota is supposed to reset and zero all those cumulative deltas effective immediately (a minimal example follows at the end of this message).

Jaime

Quoting "Emmanuel Barajas Gonzalez" <vanfa...@mx1.ibm.com>:

Hello Spectrum Scale team! I'm working on the implementation of quotas per fileset and I followed the basic instructions described in the documentation. Currently the gpfs device has per-fileset quotas and there is one fileset with a block soft and a hard limit set. My problem is that I'm able to write more and more files beyond the quota (the grace period has expired as well). How can I make sure quotas will be enforced and that no user will be able to consume more space than specified?

mmrepquota smfslv0
                            Block Limits
Name     fileset  type       KB  quota  limit  in_doubt  grace
root     root     USR       512      0      0         0  none
root     cp1      USR     64128      0      0         0  none
system   root     GRP       512      0      0         0  none
system   cp1      GRP     64128      0      0         0  none
valid    root     GRP         0      0      0         0  none
root     root     FILESET   512      0      0         0  none
cp1      root     FILESET 64128   2048   2048         0  expired

Thanks in advance!

Best regards,
__
Emmanuel Barajas Gonzalez
TRANSPARENT CLOUD TIERING FOR DS8000
Phone: 52-33-3669-7000 x5547
E-mail: vanfa...@mx1.ibm.com
Follow me: @van_falen
2200 Camino A El Castillo El Salto, JAL 45680 Mexico

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477

This message was sent using IMP at SciNet Consortium, University of Toronto.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
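For reference, the reset itself is a single command against the device (name taken from the output above), and the per-fileset figures can be re-checked afterwards:

~~~
# recompute the quota accounting and clear the accumulated in-doubt values
/usr/lpp/mmfs/bin/mmcheckquota smfslv0

# confirm the per-fileset numbers afterwards
/usr/lpp/mmfs/bin/mmrepquota -j smfslv0
~~~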
Re: [gpfsug-discuss] Quota and hardlimit enforcement
Renar,

For as long as the usage is below the hard limit (space or inodes) and you are within the grace period, you'll be able to write. I don't think you can set the grace period to a specific value as a quota parameter, such as none; that is set at filesystem creation time. BTW, the grace period limit has been a mystery to me for many years. My impression is that GPFS keeps changing it internally depending on the position of the moon. I think ours is 2 hours, but at times I can see users writing for longer.

Jaime

Quoting "Grunenberg, Renar" <renar.grunenb...@huk-coburg.de>:

Hallo All, we are on version 4.2.3.2 and see some misunderstanding in the enforcement of hard limit definitions on a fileset quota. What we see is that we put some 200 GB files on the following quota definitions: quota 150 GB, limit 250 GB, grace none. After creating one 200 GB file we hit the soft quota limit; that's ok. But after the second file was created!! we expected an I/O error, but it didn't happen. We defined all the well-known parameters (-Q, ...) on the filesystem. Is this a bug or a feature? mmcheckquota was already run first.

Regards Renar.

Renar Grunenberg
Abteilung Informatik - Betrieb
HUK-COBURG
Bahnhofsplatz
96444 Coburg
Telefon: 09561 96-44110
Telefax: 09561 96-44104
E-Mail: renar.grunenb...@huk-coburg.de
Internet: www.huk.de

HUK-COBURG Haftpflicht-Unterstützungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg
Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021
Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg
Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin.
Vorstand: Klaus-Jürgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Herøy, Dr. Jörg Rheinländer (stv.), Sarah Rössler, Daniel Thomas (stv.).

This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden.

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials

---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1
P: 416-978-2755 C: 416-505-1477

This message was sent using IMP at SciNet Consortium, University of Toronto.
___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
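Purely as a pointer, and very much an assumption to be verified against the man page on your release: the block/file grace periods are reportedly editable per quota type with mmedquota, along the lines of the sketch below.

~~~
# Assumption: 'mmedquota -t' opens the grace-period settings in an editor;
# confirm the exact syntax with 'man mmedquota' before relying on it.
mmedquota -t -j     # fileset quota grace periods
mmedquota -t -u     # user quota grace periods
mmedquota -t -g     # group quota grace periods
~~~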
Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - SpaceManagement (GPFS HSM)
It has been a while since I used HSM with GPFS via TSM, but as far as I can remember, unprivileged users can run dsmmigrate and dsmrecall. Based on the instructions on the link, dsmrecall may now leverage the Recommended Access Order (RAO) available on enterprise drives, however root would have to be the one to invoke that feature. In that case we may have to develop a middleware/wrapper for dsmrecall that will run as root and act on behalf of the user when optimization is requested. Someone here more familiar with the latest version of TSM-HSM may be able to give us some hints on how people are doing this in practice. Jaime Quoting "Andrew Beattie" <abeat...@au1.ibm.com>: Thanks Jaime, How do you get around Optimised recalls? from what I can see the optimised recall process needs a root level account to retrieve a list of files https://www.ibm.com/support/knowledgecenter/SSSR2R_7.1.1/com.ibm.itsm.hsmul.doc/c_recall_optimized_tape.html[1] Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeat...@au1.ibm.com[2] - Original message ----- From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>, "Andrew Beattie" <abeat...@au1.ibm.com> Cc: gpfsug-discuss@spectrumscale.org Subject: Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - Space Management (GPFS HSM) Date: Fri, Jun 2, 2017 7:28 PM We have that situation. Users don't need to login to NSD's What you need is to add at least one gpfs client to the cluster (or multi-cluster), mount the DMAPI enabled file system, and use that node as a gateway for end-users. They can access the contents on the mount point with their own underprivileged accounts. Whether or not on a schedule, the moment an application or linux command (such as cp, cat, vi, etc) accesses a stub, the file will be staged. Jaime Quoting "Andrew Beattie" <abeat...@au1.ibm.com>: Quick question, Does anyone have a Scale / GPFS environment (HPC) where users need the ability to recall data sets after they have been stubbed, but only System Administrators are permitted to log onto the NSD servers for security purposes. And if so how do you provide the ability for the users to schedule their data set recalls? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeat...@au1.ibm.com[1] Links: -- [1] mailto:abeat...@au1.ibm.com[3] TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials[4] --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
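For what it's worth, the wrapper I have in mind would be along the lines of the sketch below: a root-owned script exposed through sudo, so the recall runs with the privilege the RAO path needs while acting only on files the calling user can read. The script name and the -filelist form of dsmrecall are assumptions on my part, so check them against your Spectrum Protect for Space Management client level:

#!/bin/bash
# hypothetical wrapper, invoked as: sudo recall-wrapper /path/to/filelist
LIST="$1"
CALLER="${SUDO_USER:?must be invoked through sudo}"
CLEAN=$(mktemp)
# keep only entries the calling user can actually read, so the wrapper
# cannot be abused to stage somebody else's data
while IFS= read -r f; do
    sudo -u "$CALLER" test -r "$f" && printf '%s\n' "$f" >> "$CLEAN"
done < "$LIST"
# assumption: dsmrecall accepts a file list for tape-optimized recalls
dsmrecall -filelist="$CLEAN"
rm -f "$CLEAN"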
Re: [gpfsug-discuss] Spectrum Scale - Spectrum Protect - Space Management (GPFS HSM)
We have that situation. Users don't need to login to NSD's What you need is to add at least one gpfs client to the cluster (or multi-cluster), mount the DMAPI enabled file system, and use that node as a gateway for end-users. They can access the contents on the mount point with their own underprivileged accounts. Whether or not on a schedule, the moment an application or linux command (such as cp, cat, vi, etc) accesses a stub, the file will be staged. Jaime Quoting "Andrew Beattie" <abeat...@au1.ibm.com>: Quick question, Does anyone have a Scale / GPFS environment (HPC) where users need the ability to recall data sets after they have been stubbed, but only System Administrators are permitted to log onto the NSD servers for security purposes. And if so how do you provide the ability for the users to schedule their data set recalls? Regards, Andrew Beattie Software Defined Storage - IT Specialist Phone: 614-2133-7927 E-mail: abeat...@au1.ibm.com[1] Links: -- [1] mailto:abeat...@au1.ibm.com TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] mmbackup with TSM INCLUDE/EXCLUDE was Re: What is an independent fileset? was: mmbackup with fileset : scope errors
Mon May 29 15:54:52 2017 mmbackup:Determining file system changes for wosgpfs [TAPENODE3].
Mon May 29 15:54:52 2017 mmbackup:changed=3, expired=0, unsupported=0 for server [TAPENODE3]
Mon May 29 15:54:52 2017 mmbackup:Sending files to the TSM server [3 changed, 0 expired].
mmbackup: TSM Summary Information:
Total number of objects inspected: 3
Total number of objects backed up: 3
Total number of objects updated: 0
Total number of objects rebound: 0
Total number of objects deleted: 0
Total number of objects expired: 0
Total number of objects failed: 0
Total number of objects encrypted: 0
Total number of bytes inspected: 4096
Total number of bytes transferred: 512
-- mmbackup: Backup of /wosgpfs completed successfully at Mon May 29 15:54:56 EDT 2017. --
real 0m9.276s
user 0m2.906s
sys 0m3.212s

Thanks for all the help, Jaime

From: Jez Tucker <jtuc...@pixitmedia.com> To: gpfsug-discuss@spectrumscale.org Date: 05/18/2017 03:33 PM Subject: Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors Sent by: gpfsug-discuss-boun...@spectrumscale.org

Hi, When mmbackup has passed the preflight stage (pretty quickly) you'll find the autogenerated ruleset as /var/mmfs/mmbackup/.mmbackupRules* Best, Jez

On 18/05/17 20:02, Jaime Pinto wrote: Ok Mark, I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use?
-L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values:
3 Displays the same information as 2, plus each candidate file and the applicable rule.
4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule.
5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files.
6 Displays the same information as 5, plus non-candidate files and their attributes.
Thanks Jaime

Quoting "Marc A Kaplan" <makap...@us.ibm.com>: 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air. Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan: Plan your Test and Test your Plan. Then do some dry run recoveries before you really "need" to do a real recovery. I only even suggest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. 
From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "Marc A Kaplan" <makap...@us.ibm.com> Cc: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 05/18/2017 12:36 PM Subject:Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three exter
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Ok Mark I'll follow your option 2) suggestion, and capture what mmbackup is using as a rule first, then modify it. I imagine by 'capture' you are referring to the -L n level I use? -L n Controls the level of information displayed by the mmbackup command. Larger values indicate the display of more detailed information. n should be one of the following values: 3 Displays the same information as 2, plus each candidate file and the applicable rule. 4 Displays the same information as 3, plus each explicitly EXCLUDEed or LISTed file, and the applicable rule. 5 Displays the same information as 4, plus the attributes of candidate and EXCLUDEed or LISTed files. 6 Displays the same information as 5, plus non-candidate files and their attributes. Thanks Jaime Quoting "Marc A Kaplan" <makap...@us.ibm.com>: 1. As I surmised, and I now have verification from Mr. mmbackup, mmbackup wants to support incremental backups (using what it calls its shadow database) and keep both your sanity and its sanity -- so mmbackup limits you to either full filesystem or full inode-space (independent fileset.) If you want to do something else, okay, but you have to be careful and be sure of yourself. IBM will not be able to jump in and help you if and when it comes time to restore and you discover that your backup(s) were not complete. 2. If you decide you're a big boy (or woman or XXX) and want to do some hacking ... Fine... But even then, I suggest you do the smallest hack that will mostly achieve your goal... DO NOT think you can create a custom policy rules list for mmbackup out of thin air Capture the rules mmbackup creates and make small changes to that -- And as with any disaster recovery plan. Plan your Test and Test your Plan Then do some dry run recoveries before you really "need" to do a real recovery. I only even sugest this because Jaime says he has a huge filesystem with several dependent filesets and he really, really wants to do a partial backup, without first copying or re-organizing the filesets. HMMM otoh... if you have one or more dependent filesets that are smallish, and/or you don't need the backups -- create independent filesets, copy/move/delete the data, rename, voila. From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "Marc A Kaplan" <makap...@us.ibm.com> Cc: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 05/18/2017 12:36 PM Subject:Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. 
Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" <makap...@us.ibm.com>: Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental ba
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Marc The -P option may be a very good workaround, but I still have to test it. I'm currently trying to craft the mm rule, as minimalist as possible, however I'm not sure about what attributes mmbackup expects to see. Below is my first attempt. It would be nice to get comments from somebody familiar with the inner works of mmbackup. Thanks Jaime /* A macro to abbreviate VARCHAR */ define([vc],[VARCHAR($1)]) /* Define three external lists */ RULE EXTERNAL LIST 'allfiles' EXEC '/scratch/r/root/mmpolicyRules/mmpolicyExec-list' /* Generate a list of all files, directories, plus all other file system objects, like symlinks, named pipes, etc. Include the owner's id with each object and sort them by the owner's id */ RULE 'r1' LIST 'allfiles' DIRECTORIES_PLUS SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -m' || vc(MODIFICATION_TIME) || ' -s ' || vc(FILE_SIZE)) FROM POOL 'system' FOR FILESET('sysadmin3') /* Files in special filesets, such as those excluded, are never traversed */ RULE 'ExcSpecialFile' EXCLUDE FOR FILESET('scratch3','project3') Quoting "Marc A Kaplan" <makap...@us.ibm.com>: Jaime, While we're waiting for the mmbackup expert to weigh in, notice that the mmbackup command does have a -P option that allows you to provide a customized policy rules file. So... a fairly safe hack is to do a trial mmbackup run, capture the automatically generated policy file, and then augment it with FOR FILESET('fileset-I-want-to-backup') clauses Then run the mmbackup for real with your customized policy file. mmbackup uses mmapplypolicy which by itself is happy to limit its directory scan to a particular fileset by using mmapplypolicy /path-to-any-directory-within-a-gpfs-filesystem --scope fileset However, mmbackup probably has other worries and for simpliciity and helping make sure you get complete, sensible backups, apparently has imposed some restrictions to preserve sanity (yours and our support team! ;-) ) ... (For example, suppose you were doing incremental backups, starting at different paths each time? -- happy to do so, but when disaster strikes and you want to restore -- you'll end up confused and/or unhappy!) "converting from one fileset to another" --- sorry there is no such thing. Filesets are kinda like little filesystems within filesystems. Moving a file from one fileset to another requires a copy operation. There is no fast move nor hardlinking. --marc From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>, "Marc A Kaplan" <makap...@us.ibm.com> Date: 05/18/2017 09:58 AM Subject:Re: [gpfsug-discuss] What is an independent fileset? was: mmbackupwith fileset : scope errors Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. 
Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" <makap...@us.ibm.com>: When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset If you didn't say otherwise, inodes come from the default "root" fileset Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to
Re: [gpfsug-discuss] What is an independent fileset? was: mmbackup with fileset : scope errors
Thanks for the explanation Mark and Luis, It begs the question: why filesets are created as dependent by default, if the adverse repercussions can be so great afterward? Even in my case, where I manage GPFS and TSM deployments (and I have been around for a while), didn't realize at all that not adding and extra option at fileset creation time would cause me huge trouble with scaling later on as I try to use mmbackup. When you have different groups to manage file systems and backups that don't read each-other's manuals ahead of time then we have a really bad recipe. I'm looking forward to your explanation as to why mmbackup cares one way or another. I'm also hoping for a hint as to how to configure backup exclusion rules on the TSM side to exclude fileset traversing on the GPFS side. Is mmbackup smart enough (actually smarter than TSM client itself) to read the exclusion rules on the TSM configuration and apply them before traversing? Thanks Jaime Quoting "Marc A Kaplan" <makap...@us.ibm.com>: When I see "independent fileset" (in Spectrum/GPFS/Scale) I always think and try to read that as "inode space". An "independent fileset" has all the attributes of an (older-fashioned) dependent fileset PLUS all of its files are represented by inodes that are in a separable range of inode numbers - this allows GPFS to efficiently do snapshots of just that inode-space (uh... independent fileset)... And... of course the files of dependent filesets must also be represented by inodes -- those inode numbers are within the inode-space of whatever the containing independent fileset is... as was chosen when you created the fileset If you didn't say otherwise, inodes come from the default "root" fileset Clear as your bath-water, no? So why does mmbackup care one way or another ??? Stay tuned BTW - if you look at the bits of the inode numbers carefully --- you may not immediately discern what I mean by a "separable range of inode numbers" -- (very technical hint) you may need to permute the bit order before you discern a simple pattern... From: "Luis Bolinches" <luis.bolinc...@fi.ibm.com> To: gpfsug-discuss@spectrumscale.org Cc: gpfsug-discuss@spectrumscale.org Date: 05/18/2017 02:10 AM Subject:Re: [gpfsug-discuss] mmbackup with fileset : scope errors Sent by:gpfsug-discuss-boun...@spectrumscale.org Hi There is no direct way to convert the one fileset that is dependent to independent or viceversa. I would suggest to take a look to chapter 5 of the 2014 redbook, lots of definitions about GPFS ILM including filesets http://www.redbooks.ibm.com/abstracts/sg248254.html?Open Is not the only place that is explained but I honestly believe is a good single start point. It also needs an update as does nto have anything on CES nor ESS, so anyone in this list feel free to give feedback on that page people with funding decisions listen there. So you are limited to either migrate the data from that fileset to a new independent fileset (multiple ways to do that) or use the TSM client config. - Original message - From: "Jaime Pinto" <pi...@scinet.utoronto.ca> Sent by: gpfsug-discuss-boun...@spectrumscale.org To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>, "Jaime Pinto" <pi...@scinet.utoronto.ca> Cc: Subject: Re: [gpfsug-discuss] mmbackup with fileset : scope errors Date: Thu, May 18, 2017 4:43 AM There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. 
independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 Will post the outcome. Jaime Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca>: Quoting "Luis Bolinches" <luis.bolinc...@fi.ibm.com>: Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however
Re: [gpfsug-discuss] mmbackup with fileset : scope errors
There is hope. See reference link below: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.1.1/com.ibm.spectrum.scale.v4r11.ins.doc/bl1ins_tsm_fsvsfset.htm The issue has to do with dependent vs. independent filesets, something I didn't even realize existed until now. Our filesets are dependent (for no particular reason), so I have to find a way to turn them into independent. The proper option syntax is "--scope inodespace", and the error message actually flagged that out, however I didn't know how to interpret what I saw: # mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 mmbackup: Backup of /gpfs/sgfs1/sysadmin3 begins at Wed May 17 21:27:43 EDT 2017. Wed May 17 21:27:45 2017 mmbackup:mmbackup: Backing up *dependent* fileset sysadmin3 is not supported Wed May 17 21:27:45 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 Will post the outcome. Jaime Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca>: Quoting "Luis Bolinches" <luis.bolinc...@fi.ibm.com>: Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 17-05-17 23:44 Subject:[gpfsug-discuss] mmbackup with fileset : scope errors Sent by:gpfsug-discuss-boun...@spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 
These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edellä ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland
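Since there is no in-place conversion, the migration route Luis and the knowledge-center page point at would go roughly like the sketch below; the temporary fileset name "sysadmin3new" is made up, and quotas, ACLs and downtime are glossed over, so plan carefully:

# create a new independent fileset (its own inode space) and link it
mmcrfileset sgfs1 sysadmin3new --inode-space new
mmlinkfileset sgfs1 sysadmin3new -J /gpfs/sgfs1/sysadmin3new
# copy the data across (plain rsync does not carry GPFS ACLs)
rsync -aH /gpfs/sgfs1/sysadmin3/ /gpfs/sgfs1/sysadmin3new/
# once verified, retire the old dependent fileset and take over its junction
mmunlinkfileset sgfs1 sysadmin3
mmdelfileset sgfs1 sysadmin3 -f
mmunlinkfileset sgfs1 sysadmin3new
mmlinkfileset sgfs1 sysadmin3new -J /gpfs/sgfs1/sysadmin3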
Re: [gpfsug-discuss] mmbackup with fileset : scope errors
Quoting "Luis Bolinches" <luis.bolinc...@fi.ibm.com>: Hi have you tried to add exceptions on the TSM client config file? Hey Luis, That would work as well (mechanically), however it's not elegant or efficient. When you have over 1PB and 200M files on scratch it will take many hours and several helper nodes to traverse that fileset just to be negated by TSM. In fact exclusion on TSM are just as inefficient. Considering that I want to keep project and sysadmin on different domains then it's much worst, since we have to traverse and exclude scratch & (project|sysadmin) twice, once to capture sysadmin and again to capture project. If I have to use exclusion rules it has to rely sole on gpfs rules, and somehow not traverse scratch at all. I suspect there is a way to do this properly, however the examples on the gpfs guide and other references are not exhaustive. They only show a couple of trivial cases. However my situation is not unique. I suspect there are may facilities having to deal with backup of HUGE filesets. So the search is on. Thanks Jaime Assuming your GPFS dir is /IBM/GPFS and your fileset to exclude is linked on /IBM/GPFS/FSET1 dsm.sys ... DOMAIN /IBM/GPFS EXCLUDE.DIR /IBM/GPFS/FSET1 From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 17-05-17 23:44 Subject:[gpfsug-discuss] mmbackup with fileset : scope errors Sent by:gpfsug-discuss-boun...@spectrumscale.org I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. 
[gpfsug-discuss] mmbackup with fileset : scope errors
I have a g200 /gpfs/sgfs1 filesystem with 3 filesets: * project3 * scratch3 * sysadmin3 I have no problems mmbacking up /gpfs/sgfs1 (or sgfs1), however we have no need or space to include *scratch3* on TSM. Question: how to craft the mmbackup command to backup /gpfs/sgfs1/project3 and/or /gpfs/sgfs1/sysadmin3 only? Below are 3 types of errors: 1) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. 2) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope inodespace --tsm-errorlog $logfile -L 2 ERROR: Wed May 17 16:27:11 2017 mmbackup:mmbackup: Backing up dependent fileset sysadmin3 is not supported Wed May 17 16:27:11 2017 mmbackup:This fileset is not suitable for fileset level backup. exit 1 3) mmbackup /gpfs/sgfs1/sysadmin3 -N tsm-helper1-ib0 -s /dev/shm --scope filesystem --tsm-errorlog $logfile -L 2 ERROR: mmbackup: Options /gpfs/sgfs1/sysadmin3 and --scope filesystem cannot be specified at the same time. These examples don't really cover my case: https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmbackup.htm#mmbackup__mmbackup_examples Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation
Just bumping up. When I first posted this subject at the end of March there was a UG meeting that drew people's attention. I hope to get some comments now.

Thanks Jaime

Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca>: In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota once a month, usually after the massive monthly purge. I noticed that starting with the GSS and ESS appliances under 3.5 I needed to run mmcheckquota more often, at least once a week, or as often as daily, to clear the slippage errors in the accounting information; otherwise users complained that they were hitting their quotas, even though they had deleted a lot of stuff. More recently we adopted a G200 appliance (1.8PB), with v4.1, and now things have gotten worse, and I have to run it twice daily, just in case. So, what am I missing? Is there a parameter since 3.5 and through 4.1 that we can set, so that GPFS will reconcile the quota accounting internally more often and on its own? Thanks Jaime
TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
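In the meantime the workaround stays a cron job along these lines (a sketch only: the filesystem name is made up, the field positions are read off the mmrepquota -Y header and should be double-checked on your release, and the 2% threshold is arbitrary):

#!/bin/bash
# run mmcheckquota when in-doubt values drift above ~2% of reported usage
FS=gpfs0
NEED=$(mmrepquota -Y "$FS" | awk -F: '
    $3 == "HEADER" { next }
    $11+0 > 0 && $14/$11 > 0.02 { bad++ }   # blockInDoubt vs blockUsage
    $16+0 > 0 && $19/$16 > 0.02 { bad++ }   # filesInDoubt vs filesUsage
    END { print bad+0 }')
if [ "$NEED" -gt 0 ]; then
    mmcheckquota "$FS"
fi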
Re: [gpfsug-discuss] help with multi-cluster setup: Network isunreachable
As it turned out, the 'authorized_keys' file placed in the /var/mmfs/ssl directory of the NDS for the new storage cluster 4 (4.1.1-14) needed an explicit entry of the following format for the bracket associated with clients on cluster 0: nistCompliance=off Apparently the default for 4.1.x is: nistCompliance=SP800-131A I just noticed that on cluster 3 (4.1.1-7) that entry is also present for the bracket associated with clients cluster 0. I guess the Seagate fellows that helped us install the G200 in our facility had that figured out. The original "TLS handshake" error message kind of gave me a hint of the problem, however the 4.1 installation manual specifically mentioned that this could be an issue only on 4.2 onward. The troubleshoot guide for 4.2 has this excerpt: "Ensure that the configurations of GPFS and the remote key management (RKM) server are compatible when it comes to the version of the TLS protocol used upon key retrieval (GPFS uses the nistCompliance configuration variable to control that). In particular, if nistCompliance=SP800-131A is set in GPFS, ensure that the TLS v1.2 protocol is enabled in the RKM server. If this does not resolve the issue, contact the IBM Support Center.". So, how am I to know that nistCompliance=off is even an option? For backward compatibility with the older storage clusters on 3.5 the clients cluster need to have nistCompliance=off I hope this helps the fellows in mixed versions environments, since it's not obvious from the 3.5/4.1 installation manuals or the troubleshoots guide what we should do. Thanks everyone for the help. Jaime Quoting "Uwe Falke" <uwefa...@de.ibm.com>: Hi, Jaime, I'd suggest you trace a client while trying to connect and check what addresses it is going to talk to actually. It is a bit tedious, but you will be able to find this in the trace report file. You might also get an idea what's going wrong... Mit freundlichen Grüßen / Kind regards Dr. Uwe Falke IT Specialist High Performance Computing Services / Integrated Technology Services / Data Center Services --- IBM Deutschland Rathausstr. 7 09111 Chemnitz Phone: +49 371 6978 2165 Mobile: +49 175 575 2877 E-Mail: uwefa...@de.ibm.com --- IBM Deutschland Business & Technology Services GmbH / Geschäftsführung: Andreas Hasse, Thomas Wolter Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 17122 From: "Jaime Pinto" <pi...@scinet.utoronto.ca> To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org> Date: 05/08/2017 06:06 PM Subject:[gpfsug-discuss] help with multi-cluster setup: Network is unreachable Sent by:gpfsug-discuss-boun...@spectrumscale.org We have a setup in which "cluster 0" is made up of clients only on gpfs v4.1, ie, no NDS's or formal storage on this primary membership. All storage for those clients come in a multi-cluster fashion, from clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7). We recently added a new storage cluster 4 (4.1.1-14), and for some obscure reason we keep getting "Network is unreachable" during mount by clients, even though there were no issues or errors with the multi-cluster setup, ie, 'mmremotecluster add' and 'mmremotefs add' worked fine, and all clients have an entry in /etc/fstab for the file system associated with the new cluster 4. The weird thing is that we can mount cluster 3 fine (also 4.1). Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. 
As far as the IB network goes there are no problems routing/pinging around all the clusters. So this must be internal to GPFS. None of the clusters have the subnet parameter set explicitly at configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem we need to. All have cipherList AUTHONLY. One difference is that cluster 4 has DMAPI enabled (don't think it matters). Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients during mount (10.20.179.1 is one of the NDS on cluster 4): Mon May 8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side). Mon May 8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0 Mon May 8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs Mon May 8 11:35:28.783 2017: Network is unreachable I see this reference to "TLS handshake&
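For anyone else hitting this, the knob itself can at least be inspected and flipped per cluster (a sketch; whether a given connection also needs the authorized_keys entry described above, or a daemon restart, is worth confirming with support):

# show the current setting on this cluster
mmlsconfig nistCompliance
# relax it for backward compatibility with remote clusters still on 3.5
mmchconfig nistCompliance=off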
Re: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable
Quoting valdis.kletni...@vt.edu: On Mon, 08 May 2017 12:06:22 -0400, "Jaime Pinto" said: Another piece og information is that as far as GPFS goes all clusters are configured to communicate exclusively over Infiniband, each on a different 10.20.x.x network, but broadcast 10.20.255.255. As far as Have you verified that broadcast setting actually works, and packets aren't being discarded as martians? Yes, we have. They are fine. I'm seeing "failure to join the cluster" messages prior to the "network unreachable" in the mmfslog files, so I'm starting to suspect minor disparities between older releases of 3.5.x.x at one end and newer 4.1.x.x at the other. I'll dig a little more and report the findings. Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ******** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable
I only ask that we look beyond the trivial. The existing multi-cluster setup with mixed versions of servers already works fine with 4000+ clients on 4.1. We still have 3 legacy servers on 3.5, and we already have another server on 4.1 serving fine. The brand new 4.1 server we added last week seems to be at odds for some reason that is not that obvious.

Thanks Jaime

Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>: Hi Eric, Jaime, Interesting comment as we do exactly the opposite! I always make sure that my servers are running a particular version before I upgrade any clients. Now we never mix and match major versions (i.e. 4.x and 3.x) for long - those kinds of upgrades we do rapidly. But right now I've got clients running 4.2.0-3 talking just fine to 4.2.2.3 servers. To be clear, I'm not saying I'm right and Eric's wrong at all - just an observation / data point. YMMV. Kevin On May 8, 2017, at 11:34 AM, J. Eric Wonderley <eric.wonder...@vt.edu<mailto:eric.wonder...@vt.edu>> wrote: Hi Jaime: I think typically you want to keep the clients ahead of the server in version. I would advance the version of your client nodes. New clients can communicate with older versions of server nsds. Vice versa... not so much. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org<http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss - Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu> - (615)875-9633 TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
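Two quick checks that help when comparing notes across mixed-version clusters (illustrative only; run them on a node of each cluster involved):

mmdiag --version             # the daemon build actually running on this node
mmlsconfig minReleaseLevel   # the oldest daemon level this cluster still accepts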
Re: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable
Sorry, I made a mistake on the original description: all our clients are already on 4.1.1-7. Jaime Quoting "J. Eric Wonderley" <eric.wonder...@vt.edu>: Hi Jamie: I think typically you want to keep the clients ahead of the server in version. I would advance the version of you client nodes. New clients can communicate with older versions of server nsds. Vice versa...no so much. TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] BIG LAG since 3.5 on quota accounting reconciliation
In the old days of DDN 9900 and gpfs 3.4 I only had to run mmcheckquota once a month, usually after the massive monthly purge. I noticed that starting with the GSS and ESS appliances under 3.5 I needed to run mmcheckquota more often, at least once a week, or as often as daily, to clear the slippage errors in the accounting information; otherwise users complained that they were hitting their quotas, even though they had deleted a lot of stuff. More recently we adopted a G200 appliance (1.8PB), with v4.1, and now things have gotten worse, and I have to run it twice daily, just in case. So, what am I missing? Is there a parameter since 3.5 and through 4.1 that we can set, so that GPFS will reconcile the quota accounting internally more often and on its own?

Thanks Jaime

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] fix mmrepquota report format during grace periods
Aah! Another one of those options not so well documented or exposed:

Usage: mmrepquota [-u] [-g] [-e] [-q] [-n] [-v] [-t] [--block-size {BlockSize | auto}] {-a | Device[:Fileset] ...} or mmrepquota -j [-e] [-q] [-n] [-v] [-t] [--block-size {BlockSize | auto}] {-a | Device ...}

I agree, this way it is easier for a script to deal with fields that contain spaces, using ':' as the field separator. However it mangles all the information together, which makes it hard for a sysadmin to read in its raw form. I'll take it under consideration for the scripted version (many of those scripts are due for revision), however the best outcome would still be for the original plain reports to be consistent.

Thanks Jaime

Quoting "Oesterlin, Robert" <robert.oester...@nuance.com>: Try running it with the -Y option; it returns an easily parsed output: mmrepquota -Y dns mmrepquota::HEADER:version:reserved:reserved:filesystemName:quotaType:id:name:blockUsage:blockQuota:blockLimit:blockInDoubt:blockGrace:filesUsage:filesQuota:filesLimit:filesInDoubt:filesGrace:remarks:quota:defQuota:fid:filesetname: mmrepquota::0:1:::dns:USR:0:root:0:0:0:0:none:1:0:0:0:none:i:on:off:0:root: mmrepquota::0:1:::dns:USR:0:root:0:0:0:0:none:1:0:0:0:none:i:on:off:1:users: mmrepquota::0:1:::dns:GRP:0:root:0:0:0:0:none:1:0:0:0:none:i:on:off:0:root: mmrepquota::0:1:::dns:GRP:0:root:0:0:0:0:none:1:0:0:0:none:i:on:off:1:users: mmrepquota::0:1:::dns:FILESET:0:root:0:0:0:0:none:1:0:0:0:none:i:on:off::: mmrepquota::0:1:::dns:FILESET:1:users:0:4294967296:4294967296:0:none:1:0:0:0:none:e:on:off::: Bob Oesterlin Sr Principal Storage Engineer, Nuance

On 3/28/17, 9:47 AM, "gpfsug-discuss-boun...@spectrumscale.org on behalf of Jaime Pinto" <gpfsug-discuss-boun...@spectrumscale.org on behalf of pi...@scinet.utoronto.ca> wrote: Any chance you guys in the GPFS devel team could patch the mmrepquota code so that during grace periods the report column for "none" would still be replaced with >>>*ONE*<<< word? By that I mean, instead of "2 days" for example, just print "2-days" or "2days" or "2_days", and so on. I have a number of scripts that fail for users when they are over their quotas under grace periods, because the report shifts the remaining information for that user one column to the right. Obviously it would cost me absolutely nothing to patch my scripts to deal with this, however the principle here is that the reports generated by GPFS should be the ones staying consistent. Thanks Jaime
TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto.
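As an aside, the colon-separated -Y output is easy to slice once the HEADER positions are trusted; a small sketch against the sample above (verify the field positions on your own release before scripting against them):

# per-fileset usage summary built from the machine-readable output
mmrepquota -Y dns | awk -F: '
    $3 == "HEADER" { next }
    $8 == "FILESET" {
        printf "%-12s blocks=%s quota=%s limit=%s indoubt=%s grace=%s\n",
               $10, $11, $12, $13, $14, $15
    }'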
[gpfsug-discuss] fix mmrepquota report format during grace periods
Any chance you guys in the GPFS devel team could patch the mmrepquota code so that during grace periods the report column for "none" would still be replaced with >>>*ONE*<<< word? By that I mean, instead of "2 days" for example, just print "2-days" or "2days" or "2_days", and so on. I have a number of scripts that fail for users when they are over their quotas under grace periods, because the report shifts the remaining information for that user one column to the right. Obviously it would cost me absolutely nothing to patch my scripts to deal with this, however the principle here is that the reports generated by GPFS should be the ones staying consistent.

Thanks Jaime

TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] replicating ACLs across GPFS's?
Great guys!!! Just what I was looking for. Everyone is always so helpful on this forum. Thanks a lot.

Jaime

Quoting "Laurence Horrocks-Barlow" <laure...@qsplace.co.uk>: Are you talking about the GPFSUG github? https://github.com/gpfsug/gpfsug-tools The patched rsync there I believe was done by Orlando. -- Lauz

On 05/01/2017 22:01, Buterbaugh, Kevin L wrote: Hi Jaime, IBM developed a patch for rsync that can replicate ACLs - we've used it and it works great - can't remember where we downloaded it from, though. Maybe someone else on the list who *isn't* having a senior moment can point you to it? Kevin

On Jan 5, 2017, at 3:53 PM, Jaime Pinto <pi...@scinet.utoronto.ca> wrote: Does anyone know of a functional standalone tool to systematically and recursively find and replicate ACLs that works well with GPFS?
* We're currently using rsync, which will replicate permissions fine, however it leaves the ACLs behind. The --perms option for rsync is blind to ACLs.
* The native linux trick below works well with ext4 after an rsync, but makes a mess on GPFS.
% getfacl -R /path/to/source > /root/perms.acl
% setfacl --restore=/root/perms.acl
* The native GPFS mmgetacl/mmputacl pair does not have a built-in recursive option.
Any ideas?
Thanks Jaime

--- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.ca University of Toronto 661 University Ave. (MaRS), Suite 1140 Toronto, ON, M5G1M1 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
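Until the patched rsync is in place, a brute-force recursive copy with the native tools would look something like the sketch below; it is slow on large trees, assumes the destination already mirrors the source's relative paths, and the two paths are made up:

#!/bin/bash
SRC=/gpfs/fs0/source
DST=/gpfs/fs0/destination
cd "$SRC" || exit 1
find . -print | while IFS= read -r f; do
    TMP=$(mktemp)
    # dump the ACL of each object and re-apply it on the matching path
    mmgetacl "$SRC/$f" > "$TMP" && mmputacl -i "$TMP" "$DST/$f"
    rm -f "$TMP"
done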
Re: [gpfsug-discuss] quota on secondary groups for a user?
OK More info: Users can apply the 'sg group1' or 'sq group2' command from a shell or script to switch the group mask from that point on, and dodge the quota that may have been exceeded on a group. However, as the group owner or other member of the group on the limit, I could not find a tool they can use on their own to find out who is(are) the largest user(s); 'du' takes too long, and some users don't give read permissions on their directories. As part of the puzzle solution I have to come up with a root wrapper that can make the contents of the mmrepquota report available to them. Jaime Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>: Hi Jaime, Thank you so much for doing this and reporting back the results! They?re in line with what I would expect to happen. I was going to test this as well, but we have had to extend our downtime until noontime tomorrow, so I haven?t had a chance to do so yet. Now I don?t have to? ;-) Kevin On Aug 4, 2016, at 10:59 AM, Jaime Pinto <pi...@scinet.utoronto.ca<mailto:pi...@scinet.utoronto.ca>> wrote: Since there were inconsistencies in the responses, I decided to rig a couple of accounts/groups on our LDAP to test "My interpretation", and determined that I was wrong. When Kevin mentioned it would mean a bug I had to double-check: If a user hits the hard quota or exceeds the grace period on the soft quota on any of the secondary groups that user will be stopped from further writing to those groups as well, just as in the primary group. I hope this clears the waters a bit. I still have to solve my puzzle. Thanks everyone for the feedback. Jaime Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca<mailto:pi...@scinet.utoronto.ca>>: Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu<mailto:kevin.buterba...@vanderbilt.edu>>: Hi Sven, Wait - am I misunderstanding something here? Let?s say that I have ?user1? who has primary group ?group1? and secondary group ?group2?. And let?s say that they write to a directory where the bit on the directory forces all files created in that directory to have group2 associated with them. Are you saying that those files still count against group1?s group quota??? Thanks for clarifying? Kevin Not really, My interpretation is that all files written with group2 will count towards the quota on that group. However any users with group2 as the primary group will be prevented from writing any further when the group2 quota is reached. However the culprit user1 with primary group as group1 won't be detected by gpfs, and can just keep going on writing group2 files. As far as the individual user quota, it doesn't matter: group1 or group2 it will be counted towards the usage of that user. It would be interesting if the behavior was more as expected. I just checked with my Lustre counter-parts and they tell me whichever secondary group is hit first, however many there may be, the user will be stopped. The problem then becomes identifying which of the secondary groups hit the limit for that user. Jaime On Aug 3, 2016, at 11:35 AM, Sven Oehme <oeh...@gmail.com<mailto:oeh...@gmail.com><mailto:oeh...@gmail.com>> wrote: Hi, quotas are only counted against primary group sven On Wed, Aug 3, 2016 at 9:22 AM, Jaime Pinto <pi...@scinet.utoronto.ca<mailto:pi...@scinet.utoronto.ca><mailto:pi...@scinet.utoronto.ca>> wrote: Suppose I want to set both USR and GRP quotas for a user, however GRP is not the primary group. Will gpfs enforce the secondary group quota for that user? 
What I mean is, if the user keeps writing files with secondary group as the attribute, and that overall group quota is reached, will that user be stopped by gpfs? Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca<http://www.scinet.utoronto.ca><http://www.scinet.utoronto.ca/> - www.computecanada.org<http://www.computecanada.org><http://www.computecanada.org/> University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org<http://spectrumscale.org> http://gpfsug.org/mailman/listinfo/gpfsug-discuss ******** TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/
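For completeness, the user-side switch and a root-side check look roughly like this (sg comes from shadow-utils; the filesystem, paths and group names are made up):

# write a test file under the secondary group and confirm its group ownership
sg group2 -c 'dd if=/dev/zero of=/gpfs/gpfs0/scratch/grptest bs=1M count=10'
ls -l /gpfs/gpfs0/scratch/grptest
# as root: watch the usage land against group2 rather than the primary group
mmrepquota -g gpfs0 | grep -w group2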
Re: [gpfsug-discuss] quota on secondary groups for a user?
Since there were inconsistencies in the responses, I decided to rig a couple of accounts/groups on our LDAP to test "My interpretation", and determined that I was wrong. When Kevin mentioned it would mean a bug I had to double-check: If a user hits the hard quota or exceeds the grace period on the soft quota on any of the secondary groups that user will be stopped from further writing to those groups as well, just as in the primary group. I hope this clears the waters a bit. I still have to solve my puzzle. Thanks everyone for the feedback. Jaime Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca>: Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>: Hi Sven, Wait - am I misunderstanding something here? Let?s say that I have ?user1? who has primary group ?group1? and secondary group ?group2?. And let?s say that they write to a directory where the bit on the directory forces all files created in that directory to have group2 associated with them. Are you saying that those files still count against group1?s group quota??? Thanks for clarifying? Kevin Not really, My interpretation is that all files written with group2 will count towards the quota on that group. However any users with group2 as the primary group will be prevented from writing any further when the group2 quota is reached. However the culprit user1 with primary group as group1 won't be detected by gpfs, and can just keep going on writing group2 files. As far as the individual user quota, it doesn't matter: group1 or group2 it will be counted towards the usage of that user. It would be interesting if the behavior was more as expected. I just checked with my Lustre counter-parts and they tell me whichever secondary group is hit first, however many there may be, the user will be stopped. The problem then becomes identifying which of the secondary groups hit the limit for that user. Jaime On Aug 3, 2016, at 11:35 AM, Sven Oehme <oeh...@gmail.com<mailto:oeh...@gmail.com>> wrote: Hi, quotas are only counted against primary group sven On Wed, Aug 3, 2016 at 9:22 AM, Jaime Pinto <pi...@scinet.utoronto.ca<mailto:pi...@scinet.utoronto.ca>> wrote: Suppose I want to set both USR and GRP quotas for a user, however GRP is not the primary group. Will gpfs enforce the secondary group quota for that user? What I mean is, if the user keeps writing files with secondary group as the attribute, and that overall group quota is reached, will that user be stopped by gpfs? Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca<http://www.scinet.utoronto.ca/> - www.computecanada.org<http://www.computecanada.org/> University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] quota on secondary groups for a user?
Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>: Hi Sven, Wait - am I misunderstanding something here? Let?s say that I have ?user1? who has primary group ?group1? and secondary group ?group2?. And let?s say that they write to a directory where the bit on the directory forces all files created in that directory to have group2 associated with them. Are you saying that those files still count against group1?s group quota??? Thanks for clarifying? Kevin Not really, My interpretation is that all files written with group2 will count towards the quota on that group. However any users with group2 as the primary group will be prevented from writing any further when the group2 quota is reached. However the culprit user1 with primary group as group1 won't be detected by gpfs, and can just keep going on writing group2 files. As far as the individual user quota, it doesn't matter: group1 or group2 it will be counted towards the usage of that user. It would be interesting if the behavior was more as expected. I just checked with my Lustre counter-parts and they tell me whichever secondary group is hit first, however many there may be, the user will be stopped. The problem then becomes identifying which of the secondary groups hit the limit for that user. Jaime On Aug 3, 2016, at 11:35 AM, Sven Oehme <oeh...@gmail.com<mailto:oeh...@gmail.com>> wrote: Hi, quotas are only counted against primary group sven On Wed, Aug 3, 2016 at 9:22 AM, Jaime Pinto <pi...@scinet.utoronto.ca<mailto:pi...@scinet.utoronto.ca>> wrote: Suppose I want to set both USR and GRP quotas for a user, however GRP is not the primary group. Will gpfs enforce the secondary group quota for that user? What I mean is, if the user keeps writing files with secondary group as the attribute, and that overall group quota is reached, will that user be stopped by gpfs? Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca<http://www.scinet.utoronto.ca/> - www.computecanada.org<http://www.computecanada.org/> University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] GPFS on ZFS! ... ?
Like Marc, I also have questions related to performance. Assuming we let ZFS take care of the underlying software raid, what would be the difference between GPFS and Lustre, for instance, for the "parallel serving" at scale part of the file system? What would keep GPFS from performing or functioning just as well? Thanks Jaime Quoting "Marc A Kaplan" <makap...@us.ibm.com>: How do you set the size of a ZFS file that is simulating a GPFS disk? How do you "tell" GPFS about that? How efficient is this layering, compared to just giving GPFS direct access to the same kind of LUNs that ZFS is using? Hmmm... to partially answer my question, I do something similar, but strictly for testing non-performance-critical GPFS functions. On any file system one can:
dd if=/dev/zero of=/fakedisks/d3 count=1 bs=1M seek=3000 # create a fake 3GB disk for GPFS
Then use a GPFS nsd configuration record like this:
%nsd: nsd=d3
  device=/fakedisks/d3
  usage=dataOnly
  pool=xtra
  servers=bog-xxx
Which starts out as sparse, and the filesystem will dynamically "grow" it as GPFS writes to it... But I have no idea how well this will work for a critical "production" system... tx, marc kaplan. From: "Allen, Benjamin S." <bsal...@alcf.anl.gov> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 06/13/2016 12:34 PM Subject: Re: [gpfsug-discuss] GPFS on ZFS? Sent by: gpfsug-discuss-boun...@spectrumscale.org Jaime, See https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_nsddevices.htm . An example I have for adding /dev/nvme* devices: * GPFS doesn't know that /dev/nvme* are valid block devices; use a user exit script to let it know about them:
cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices
* Edit /var/mmfs/etc/nsddevices, and add to the Linux section:
if [[ $osName = Linux ]]
then
  : # Add function to discover disks in the Linux environment.
  for dev in $( cat /proc/partitions | grep nvme | awk '{print $4}' )
  do
    echo $dev generic
  done
fi
* Copy the edited nsddevices to the rest of the nodes at the same path:
for host in n01 n02 n03 n04; do
  scp /var/mmfs/etc/nsddevices ${host}:/var/mmfs/etc/nsddevices
done
Ben On Jun 13, 2016, at 11:26 AM, Jaime Pinto <pi...@scinet.utoronto.ca> wrote: Hi Chris As I understand it, GPFS likes to 'see' the block devices, even on a hardware raid solution such as DDN's. How is that accomplished when you use ZFS for software raid? On page 4, I see this info, and I'm trying to interpret it: General Configuration ... * zvols * nsddevices - echo "zdX generic" Thanks Jaime Quoting "Hoffman, Christopher P" <cphof...@lanl.gov>: Hi Jaime, What in particular would you like explained more? I'd be more than happy to discuss things further. Chris ____ From: gpfsug-discuss-boun...@spectrumscale.org [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jaime Pinto [pi...@scinet.utoronto.ca] Sent: Monday, June 13, 2016 10:11 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] GPFS on ZFS? I just came across this presentation on "GPFS with underlying ZFS block devices", by Christopher Hoffman, Los Alamos National Lab, although some of the implementation remains obscure. http://files.gpfsug.org/presentations/2016/anl-june/LANL_GPFS_ZFS.pdf It would be great to have more details, in particular the possibility of straight use of GPFS on ZFS, instead of the 'archive' use case described in the presentation. 
Thanks Jaime Quoting "Jaime Pinto" <pi...@scinet.utoronto.ca>: Since we can not get GNR outside ESS/GSS appliances, is anybody using ZFS for software raid on commodity storage? Thanks Jaime TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials --- Jaime Pinto SciNet HPC Consortium - Comp
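Going by the "zvols" and "nsddevices - echo zdX generic" hints in the slides, the plumbing is presumably something like the sketch below; the pool, zvol and NSD names are made up, and the zvol device paths should be verified on the target distro:

  # carve a fixed-size zvol out of the ZFS pool; on Linux it shows up as /dev/zd*
  # and under /dev/zvol/<pool>/<name>
  zfs create -V 10T tank/gpfs-nsd0

  # teach GPFS that zd* devices are usable block devices, as in Ben's nvme example:
  cp /usr/lpp/mmfs/samples/nsddevices.sample /var/mmfs/etc/nsddevices
  # and add to its Linux section:
  for dev in $( awk '$4 ~ /^zd/ {print $4}' /proc/partitions )
  do
    echo $dev generic
  done

  # then define the NSD from the zvol as usual, e.g. /tmp/nsd.stanza containing:
  #   %nsd: nsd=zvol_nsd0
  #     device=/dev/zvol/tank/gpfs-nsd0
  #     usage=dataAndMetadata
  #     servers=nsdserver1
  mmcrnsd -F /tmp/nsd.stanza

This would also answer Marc's sizing question: with a zvol the size is fixed up front by the -V argument, rather than growing sparsely like the dd-created file.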
[gpfsug-discuss] GPFS on ZFS?
Since we can not get GNR outside ESS/GSS appliances, is anybody using ZFS for software raid on commodity storage? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] backup and disaster recovery solutions
Hi Marc Personally, I'm aware of the HSM features. However I was specifically referring to TSM backup restore. I was told the new GUI for unprivileged users looks identical to what root would see, but unprivileged users would only be able to see material for which they have read permissions, and restore only to paths on which they have write permissions. The GUI is supposed to be a different platform than the Java/WebSphere one we have seen in the past to manage TSM. I'm looking forward to it as well. Jaime Quoting Marc A Kaplan <makap...@us.ibm.com>: IBM HSM products have always supported unprivileged, user-triggered recall of any file. I am not familiar with any particular GUI, but from the CLI it's easy enough:
dd if=/pathtothefileyouwantrecalled of=/dev/null bs=1M count=2 & # pulling the first few blocks will trigger a complete recall if the file happens to be on HSM
We also had IBM HSM for mainframe MVS, years and years ago, which is now called DFHSM for z/OS. (I remember using this from TSO...) If the file has been migrated to a tape archive, accessing the file will trigger a tape mount, which can take a while, depending on how fast your tape mounting (robot?) operates and what other requests may be queued ahead of yours! TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
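For bulk recalls the same trick can be scripted, or the Space Management client's own tools can be used where they are installed; a rough sketch (the file names are made up, and dsmrecall's exact options should be checked on the installed release):

  # check residency state first (resident / premigrated / migrated)
  dsmls /gpfs0/project/bigfile.dat

  # explicit recall through the HSM client
  dsmrecall /gpfs0/project/bigfile.dat

  # or the "poor man's" version of the dd trick above, over a list of paths
  while read -r f; do
    dd if="$f" of=/dev/null bs=1M count=2 &
  done < /tmp/files-to-recall.txt
  wait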
Re: [gpfsug-discuss] backup and disaster recovery solutions
I heard as recently as last Friday from IBM support/vendors/developers of GPFS/TSM/HSM that the newest release of Spectrum Protect (7.11) offers a GUI interface that is user centric, and will allow for unprivileged users to restore their own material via a newer WebGUI (one that also works with Firefox, Chrome and on linux, not only IE on Windows). Users may authenticate via AD or LDAP, and traverse only what they would be allowed to via linux permissions and ACLs. Jaime Quoting Jonathan Buzzard <jonat...@buzzard.me.uk>: On Mon, 2016-04-11 at 10:34 -0400, Jaime Pinto wrote: Do you want backups or periodic frozen snapshots of the file system? Backups can entail some level of version control, so that you or end-users can get files back on certain points in time, in case of accidental deletions. Besides 1.5PB is a lot of material, so you may not want to take full snapshots that often. In that case, a combination of daily incremental backups using TSM with GPFS's mmbackup can be a good option. TSM also does a very good job at controlling how material is distributed across multiple tapes, and that is something that requires a lot of micro-management if you want a home grown solution of rsync+LTFS. Is there any other viable option other than TSM for backing up 1.5PB of data? All other backup software does not handle this at all well. On the other hand, you could use gpfs built-in tools such a mmapplypolicy to identify candidates for incremental backup, and send them to LTFS. Just more micro management, and you may have to come up with your own tool to let end-users restore their stuff, or you'll have to act on their behalf. I was not aware of a way of letting end users restore their stuff from *backup* for any of the major backup software while respecting the file system level security of the original file system. If you let the end user have access to the backup they can restore any file to any location which is generally not a good idea. I do have a concept of creating a read only Fuse mounted file system from a TSM point in time synthetic backup, and then using the shadow copy feature of Samba to enable restores using the "Previous Versions" feature of windows file manager. I got as far as getting a directory tree you could browse through but then had an enforced change of jobs and don't have access to a TSM server any more to continue development. Note if anyone from IBM is listening that would be a super cool feature. JAB. -- Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk Fife, United Kingdom. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
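For the mmapplypolicy route mentioned above, a bare-bones candidate-selection sketch might look like this; the rule names and the one-day window are arbitrary, and whatever ships the resulting list to LTFS (or to dsmc) is left out:

  # /tmp/backup-candidates.pol, containing for example:
  #   RULE EXTERNAL LIST 'candidates' EXEC ''
  #   RULE 'newfiles' LIST 'candidates'
  #     WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '1' DAYS

  # "-I defer" only writes the candidate list rather than acting on it
  mmapplypolicy gpfs0 -P /tmp/backup-candidates.pol -f /tmp/gpfs0 -I defer
  # the candidates end up in /tmp/gpfs0.list.candidates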
Re: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup)
OK, that is good to know. I'll give it a try with snapshot then. We already have 3.5 almost everywhere, and planing for 4.2 upgrade (reading the posts with interest) Thanks Jaime Quoting Yuri L Volobuev <volob...@us.ibm.com>: Under both 3.2 and 3.3 mmbackup would always lock up our cluster when using snapshot. I never understood the behavior without snapshot, and the lock up was intermittent in the carved-out small test cluster, so I never felt confident enough to deploy over the larger 4000+ clients cluster. Back then, GPFS code had a deficiency: migrating very large files didn't work well with snapshots (and some operation mm commands). In order to create a snapshot, we have to have the file system in a consistent state for a moment, and we get there by performing a "quiesce" operation. This is done by flushing all dirty buffers to disk, stopping any new incoming file system operations at the gates, and waiting for all in-flight operations to finish. This works well when all in-flight operations actually finish reasonably quickly. That assumption was broken if an external utility, e.g. mmapplypolicy, used gpfs_restripe_file API on a very large file, e.g. to migrate the file's blocks to a different storage pool. The quiesce operation would need to wait for that API call to finish, as it's an in-flight operation, but migrating a multi-TB file could take a while, and during this time all new file system ops would be blocked. This was solved several years ago by changing the API and its callers to do the migration one block range at a time, thus making each individual syscall short and allowing quiesce to barge in and do its thing. All currently supported levels of GPFS have this fix. I believe mmbackup was affected by the same GPFS deficiency and benefited from the same fix. yuri TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
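With that fix in place, the snapshot-based flow should be back to the textbook pattern below; the snapshot name is arbitrary, and the -S option for handing an existing snapshot to mmbackup should be double-checked against the man page of the installed release:

  snap=backup_$(date +%Y%m%d)

  # global snapshot to give mmbackup a consistent view
  mmcrsnapshot gpfs0 $snap

  # incremental backup driven from that snapshot
  mmbackup gpfs0 -t incremental -S $snap --tsm-servers TSMSERVER1

  # drop the snapshot once the backup completes
  mmdelsnapshot gpfs0 $snap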
[gpfsug-discuss] Use of commodity HDs on large GPFS client base clusters?
I'd like to hear about performance considerations from sites that may be using "non-IBM sanctioned" storage hardware, as opposed to appliances such as DDN, GSS or ESS (we have all of these). For instance, how would that compare with ESS, which I understand has some sort of "dispersed parity" feature that substantially diminishes rebuild time in case of HD failures? I'm particularly interested in HPC sites with 5000+ clients mounting such a commodity NSDs+HDs setup. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority
Hey Dominic Just submitted a new request: Headline: GPFS+TSM+HSM: staging vs. migration priority ID: 85292 Thank you Jaime Quoting Dominic Mueller-Wicke01 <dominic.muel...@de.ibm.com>: Hi Jaime, I received the same request from other customers as well. could you please open a RFE for the theme and send me the RFE ID? I will discuss it with the product management then. RFE Link: https://www.ibm.com/developerworks/rfe/execute?use_case=changeRequestLanding_ID=0_ID=360=11=12 Greetings, Dominic. __ Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead | +49 7034 64 32794 | dominic.muel...@de.ibm.com Vorsitzende des Aufsichtsrats: Martina Koederitz; Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen; Registergericht: Amtsgericht Stuttgart, HRB 243294 From: Jaime Pinto <pi...@scinet.utoronto.ca> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>, Marc A Kaplan <makap...@us.ibm.com> Cc: Dominic Mueller-Wicke01/Germany/IBM@IBMDE Date: 09.03.2016 16:22 Subject:Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority Interesting perspective Mark. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan <makap...@us.ibm.com>: For a write or create operation ENOSPC would make some sense. But if the file already exists and I'm just opening for read access I would be very confused by ENOSPC. How should the system respond: "Sorry, I know about that file, I have it safely stored away in HSM, but it is not available right now. Try again later!" EAGAIN or EBUSY might be the closest in ordinary language... But EAGAIN is used when a system call is interrupted and can be retried right away... So EBUSY? The standard return codes in Linux are: #define EPERM1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH3 /* No such process */ #define EINTR4 /* Interrupted system call */ #define EIO 5 /* I/O error */ #define ENXIO6 /* No such device or address */ #define E2BIG7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF9 /* Bad file number */ #define ECHILD 10 /* No child processes */ #define EAGAIN 11 /* Try again */ #define ENOMEM 12 /* Out of memory */ #define EACCES 13 /* Permission denied */ #define EFAULT 14 /* Bad address */ #define ENOTBLK 15 /* Block device required */ #define EBUSY 16 /* Device or resource busy */ #define EEXIST 17 /* File exists */ #define EXDEV 18 /* Cross-device link */ #define ENODEV 19 /* No such device */ #define ENOTDIR 20 /* Not a directory */ #define EISDIR 21 /* Is a directory */ #define EINVAL 22 /* Invalid argument */ #define ENFILE 23 /* File table overflow */ #define EMFILE 24 /* Too many open files */ #define ENOTTY 25 /* Not a typewriter */ #define ETXTBSY 26 /* Text file busy */ #define EFBIG 27 /* File too large */ #define ENOSPC 28 /* No space left on device */ #define ESPIPE 29 /* Illegal seek */ #define EROFS 30 /* Read-only file system */ #define EMLINK 31 /* Too many links */ #define EPIPE 32 /* Broken pipe */ #define EDOM33 /* Math argument out of domain of func */ #define ERANGE 34 /* Math result not representable */ TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. 
TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at
Re: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup)
Quoting Yaron Daniel <y...@il.ibm.com>: Hi Did u use mmbackup with TSM ? https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm I have used mmbackup in test mode a few times before, while under gpfs 3.2 and 3.3, but not under 3.5 yet or the 4.x series (not installed in our facility yet). Under both 3.2 and 3.3 mmbackup would always lock up our cluster when using snapshots. I never understood the behavior without snapshots, and the lock-up was intermittent in the carved-out small test cluster, so I never felt confident enough to deploy over the larger 4000+ client cluster. Another issue was that the version of mmbackup then would not let me choose the client environment associated with a particular gpfs file system, fileset or path, and the equivalent storage pool and/or policy on the TSM side. With the native TSM client we can do this by configuring the dsmenv file, and even the NODENAME/ASNODENAME, etc, with which to access TSM, so we can keep the backups segregated on different pools/tapes if necessary (by user, by group, by project, etc). The problem we all agree on is that TSM client traversing is VERY SLOW, and cannot be parallelized. I always knew that the mmbackup client was supposed to replace the TSM client for the traversing, and then pass the "necessary parameters" and file lists to the native TSM client, so it could then take over for the remainder of the workflow. Therefore, the remaining problems are as follows:
* I never understood the snapshot-induced lockup, and how to fix it. Was it due to the size of our cluster or the version of GPFS? Has it been addressed under the 3.5 or 4.x series? Without the snapshot, how would mmbackup know what has already gone to backup since the previous incremental backup? Does it check each file against what is already on TSM to build the list of candidates? What is the experience out there?
* In the v4r2 version of the manual for the mmbackup utility we still don't seem to be able to determine which TSM BA Client dsmenv to use as a parameter. All we can do is choose the --tsm-servers TSMServer[,TSMServer...] option. I can only conclude that all the contents of any backup on the GPFS side will always end up in a default storage pool and use the standard TSM policy if nothing else is done. I'm now wondering if it would be ok to simply 'source dsmenv' from a shell for each instance of mmbackup we fire up, in addition to setting the other MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP, etc, variables as described on the man page.
* What about the restore side of things? Most mm* commands can only be executed by root. Would we still have to rely on the TSM BA Client (dsmc|dsmj) if unprivileged users want to restore their own stuff?
I guess I'll have to conduct more experiments. Please also review this: http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf This is pretty good as a high-level overview. Much better than a few others I've seen with the release of the Spectrum Suite, since it focuses entirely on GPFS/TSM/backup|(HSM). It would be nice to have some typical implementation examples. Thanks a lot for the references Yaron, and again thanks for any further comments. 
Jaime Regards Yaron Daniel 94 Em Ha'Moshavot Rd Server, Storage and Data Services - Team Leader Petach Tiqva, 49527 Global Technology Services Israel Phone: +972-3-916-5672 Fax: +972-3-916-5672 Mobile: +972-52-8395593 e-mail: y...@il.ibm.com IBM Israel gpfsug-discuss-boun...@spectrumscale.org wrote on 03/09/2016 09:56:13 PM: From: Jaime Pinto <pi...@scinet.utoronto.ca> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org> Date: 03/09/2016 09:56 PM Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup) Sent by: gpfsug-discuss-boun...@spectrumscale.org Here is another area where I've been reading material from several sources for years, and in fact trying one solution over the other from time-to-time in a test environment. However, to date I have not been able to find a one-piece-document where all these different IBM alternatives for backup are discussed at length, with the pos and cons well explained, along with the how-to's. I'm currently using TSM(built-in backup client), and over the years I developed a set of tricks to rely on disk based volumes as intermediate cache, and multiple backup client nodes, to split the load and substantially improve the performance of the backup compared to when I first deployed this solution. However I suspect it could still be improved further if I was to apply tools from the GPFS side of the equation. I would appreciate any comments/pointers. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto
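The 'source dsmenv per mmbackup instance' experiment described above might look roughly like the sketch below; the paths, node names and server name are made up, and whether every knob is honoured by a given mmbackup level should be confirmed by testing:

  # point the TSM BA client at the option files for this particular backup stream
  # (this is essentially what a dsmenv file does)
  export DSM_DIR=/opt/tivoli/tsm/client/ba/bin
  export DSM_CONFIG=/opt/tivoli/tsm/client/ba/bin/dsm.opt.projX
  export DSM_LOG=/var/log/tsm/projX

  # extra options passed along to the dsmc invocations mmbackup makes
  export MMBACKUP_DSMC_MISC="-asnodename=PROJX_NODE"
  export MMBACKUP_DSMC_BACKUP="-quiet"

  # back up one independent fileset against a specific TSM server
  mmbackup /gpfs0/projX --scope inodespace -t incremental --tsm-servers TSMSERVER1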
[gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup)
Here is another area where I've been reading material from several sources for years, and in fact trying one solution or another from time to time in a test environment. However, to date I have not been able to find a single document where all these different IBM alternatives for backup are discussed at length, with the pros and cons well explained, along with the how-to's. I'm currently using TSM (built-in backup client), and over the years I developed a set of tricks to rely on disk-based volumes as intermediate cache, and multiple backup client nodes, to split the load and substantially improve the performance of the backup compared to when I first deployed this solution. However I suspect it could still be improved further if I were to apply tools from the GPFS side of the equation. I would appreciate any comments/pointers. Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
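For what it's worth, the "multiple backup client nodes" part of that trick is something mmbackup can take care of natively; a short sketch, with node names made up:

  # spread the scan and the dsmc sessions over several client nodes;
  # each listed node needs the TSM BA client installed and configured
  mmbackup gpfs0 -t incremental -N tsm01,tsm02,tsm03 --tsm-servers TSMSERVER1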
Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority
Interesting perspective, Marc. I'm inclined to think EBUSY would be more appropriate. Jaime Quoting Marc A Kaplan <makap...@us.ibm.com>: For a write or create operation ENOSPC would make some sense. But if the file already exists and I'm just opening for read access I would be very confused by ENOSPC. How should the system respond: "Sorry, I know about that file, I have it safely stored away in HSM, but it is not available right now. Try again later!" EAGAIN or EBUSY might be the closest in ordinary language... But EAGAIN is used when a system call is interrupted and can be retried right away... So EBUSY? The standard return codes in Linux are:
#define EPERM 1 /* Operation not permitted */
#define ENOENT 2 /* No such file or directory */
#define ESRCH 3 /* No such process */
#define EINTR 4 /* Interrupted system call */
#define EIO 5 /* I/O error */
#define ENXIO 6 /* No such device or address */
#define E2BIG 7 /* Argument list too long */
#define ENOEXEC 8 /* Exec format error */
#define EBADF 9 /* Bad file number */
#define ECHILD 10 /* No child processes */
#define EAGAIN 11 /* Try again */
#define ENOMEM 12 /* Out of memory */
#define EACCES 13 /* Permission denied */
#define EFAULT 14 /* Bad address */
#define ENOTBLK 15 /* Block device required */
#define EBUSY 16 /* Device or resource busy */
#define EEXIST 17 /* File exists */
#define EXDEV 18 /* Cross-device link */
#define ENODEV 19 /* No such device */
#define ENOTDIR 20 /* Not a directory */
#define EISDIR 21 /* Is a directory */
#define EINVAL 22 /* Invalid argument */
#define ENFILE 23 /* File table overflow */
#define EMFILE 24 /* Too many open files */
#define ENOTTY 25 /* Not a typewriter */
#define ETXTBSY 26 /* Text file busy */
#define EFBIG 27 /* File too large */
#define ENOSPC 28 /* No space left on device */
#define ESPIPE 29 /* Illegal seek */
#define EROFS 30 /* Read-only file system */
#define EMLINK 31 /* Too many links */
#define EPIPE 32 /* Broken pipe */
#define EDOM 33 /* Math argument out of domain of func */
#define ERANGE 34 /* Math result not representable */
TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials **** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
[gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority
I'm wondering whether the new version of the "Spectrum Suite" will allow us set the priority of the HSM migration to be higher than staging. I ask this because back in 2011 when we were still using Tivoli HSM with GPFS, during mixed requests for migration and staging operations, we had a very annoying behavior in which the staging would always take precedence over migration. The end-result was that the GPFS would fill up to 100% and induce a deadlock on the cluster, unless we identified all the user driven stage requests in time, and killed them all. We contacted IBM support a few times asking for a way fix this, and were told it was built into TSM. Back then we gave up IBM's HSM primarily for this reason, although performance was also a consideration (more to this on another post). We are now reconsidering HSM for a new deployment, however only if this issue has been resolved (among a few others). What has been some of the experience out there? Thanks Jaime --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
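On the "file system fills up to 100%" part of the problem, the usual defence nowadays is threshold-driven migration in the policy engine, something along the lines of the sketch below; the external-pool script is whatever the HSM integration installs locally and is only a placeholder here, the callback helper name should be checked against the samples shipped with the installed release, and none of this changes the relative priority of recalls versus migrations:

  # policy file, e.g. /tmp/hsm-migrate.pol, containing:
  #   RULE EXTERNAL POOL 'hsm' EXEC '/var/mmfs/etc/hsmScript' OPTS '-v'   /* placeholder script */
  #   RULE 'spill' MIGRATE FROM POOL 'system'
  #     THRESHOLD(90,80)
  #     WEIGHT(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))
  #     TO POOL 'hsm'

  # install it and wire the low-space event to the policy engine so it fires on its own
  mmchpolicy gpfs0 /tmp/hsm-migrate.pol
  mmaddcallback hsmSpill --command /usr/lpp/mmfs/bin/mmstartpolicy --event lowDiskSpace,noDiskSpace --parms "%eventName %fsName"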
Re: [gpfsug-discuss] mmlsnode: Unable to determine the local node identity.
Quoting "Buterbaugh, Kevin L" <kevin.buterba...@vanderbilt.edu>: Hi Jaime, Have you tried wiping out /var/mmfs/gen/* and /var/mmfs/etc/* on the old nodeA? Kevin That did the trick. Thanks Kevin and all that responded privately. Jaime On Feb 10, 2016, at 1:26 PM, Jaime Pinto <pi...@scinet.utoronto.ca> wrote: Dear group I'm trying to deal with this in the most elegant way possible: Once upon the time there were nodeA and nodeB in the cluster, on a 'onDemand manual HA' fashion. * nodeA died, so I migrated the whole OS/software/application stack from backup over to 'nodeB', IP/hostname, etc, hence 'old nodeB' effectively became the new nodeA. * Getting the new nodeA to rejoin the cluster was already a pain, but through a mmdelnode and mmaddnode operation we eventually got it to mount gpfs. Well ... * Old nodeA is now fixed and back on the network, and I'd like to re-purpose it as the new standby nodeB (IP and hostname already applied). As the subject say, I'm now facing node identity issues. From the FSmgr I already tried to del/add nodeB, even nodeA, etc, however GPFS seems to keep some information cached somewhere in the cluster. * At this point I even turned old nodeA into a nodeC with a different IP, etc, but that doesn't help either. I can't even start gpfs on nodeC. Question: what is the appropriate process to clean this mess from the GPFS perspective? I can't touch the new nodeA. It's highly committed in production already. Thanks Jaime ******** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss TELL US ABOUT YOUR SUCCESS STORIES http://www.scinethpc.ca/testimonials ******** --- Jaime Pinto SciNet HPC Consortium - Compute/Calcul Canada www.scinet.utoronto.ca - www.computecanada.org University of Toronto 256 McCaul Street, Room 235 Toronto, ON, M5T1W5 P: 416-978-2755 C: 416-505-1477 This message was sent using IMP at SciNet Consortium, University of Toronto. ___ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss