We do have different amounts of free space across the failure groups in the system pool, which is the pool the changes were applied to:
[root@scg4-hn01 ~]# mmdf gsfs0 -P system
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 3.6 TB)
VD000               377487360      100 Yes      No      143109120 ( 38%)     35708688 ( 9%)
DMD_NSD_804         377487360      100 Yes      No       79526144 ( 21%)      2924584 ( 1%)
VD002               377487360      100 Yes      No      143067136 ( 38%)     35713888 ( 9%)
DMD_NSD_802         377487360      100 Yes      No       79570432 ( 21%)      2926672 ( 1%)
VD004               377487360      100 Yes      No      143107584 ( 38%)     35727776 ( 9%)
DMD_NSD_805         377487360      200 Yes      No       79555584 ( 21%)      2940040 ( 1%)
VD001               377487360      200 Yes      No      142964992 ( 38%)     35805384 ( 9%)
DMD_NSD_803         377487360      200 Yes      No       79580160 ( 21%)      2919560 ( 1%)
VD003               377487360      200 Yes      No      143132672 ( 38%)     35764200 ( 9%)
DMD_NSD_801         377487360      200 Yes      No       79550208 ( 21%)      2915232 ( 1%)
                -------------                        -------------------- -------------------
(pool total)       3774873600                          1113164032 ( 29%)    193346024 ( 5%)

and mmlsdisk shows that there is a problem with replication:

...
Number of quorum disks: 5
Read quorum value:      3
Write quorum value:     3
Attention: Due to an earlier configuration change the file system is no longer properly replicated.

I thought an 'mmrestripefs -r' was supposed to fix this; surely I don't have to repair the replication some other way before I can restripe?
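(As a quick cross-check on the failure group question below: summing the "free KB in full blocks" column per failure group from that mmdf output, for example with a rough one-liner like the one below (the column positions are simply taken from the output above), gives 588,380,416 KB free in failure group 100 versus 524,783,616 KB in failure group 200. So the groups are not identical, but neither one is close to full.)

  # sketch only: total free KB in full blocks per failure group in the system pool
  mmdf gsfs0 -P system | awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { free[$3] += $6 }
      END { for (fg in free) printf "failure group %s: %d KB free in full blocks\n", fg, free[fg] }'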
jbh

On Thu, Nov 2, 2017 at 9:45 AM, Frederick Stock <sto...@us.ibm.com> wrote:
> Assuming you are replicating data and metadata, have you confirmed that all failure groups have the same free space? That is, could it be that one of your failure groups has less space than the others? You can verify this with the output of mmdf; look at the NSD sizes and the space available.
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> sto...@us.ibm.com
>
>
> From: John Hanks <griz...@gmail.com>
> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> Date: 11/02/2017 12:20 PM
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> ------------------------------
>
> Addendum to last message:
>
> We haven't upgraded recently as far as I know (I just inherited this a couple of months ago), but I am planning an outage soon to upgrade from 4.2.0-4 to 4.2.3-5.
>
> My growing collection of output files generally contains something like:
>
> This inode list was generated in the Parallel Inode Traverse on Thu Nov  2 08:34:22 2017
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
> 53506 0:0 0 1 0 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
> with that inode number varying slightly.
>
> jbh
>
> On Thu, Nov 2, 2017 at 8:55 AM, Scott Fadden <sfad...@us.ibm.com> wrote:
> Sorry, just reread as I hit send and saw this was mmrestripefs; in my case it was mmdeldisk.
>
> Did you try running the command on just one pool, or using -B instead?
>
> What is the file it is complaining about in "/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711"?
>
> Looks like it could be related to the max feature level of the cluster. Have you recently upgraded? Is everything up to the same level?
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfad...@us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
>
> ----- Original message -----
> From: Scott Fadden/Portland/IBM
> To: gpfsug-discuss@spectrumscale.org
> Cc: gpfsug-discuss@spectrumscale.org
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:44 AM
>
> I opened a defect on this the other day; in my case it was an incorrect error message. What it meant to say was, "The pool is not empty." Are you trying to remove the last disk in a pool? If so, did you empty the pool with a MIGRATE policy first?
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfad...@us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
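(For anyone who lands on this thread later: emptying a pool with a MIGRATE policy, as Scott describes above, amounts to something like the minimal sketch below. The pool and file system names are just the ones from this cluster, and a real rule may want a WHERE clause or LIMIT, so treat this as an outline rather than the exact policy that was used here.)

  /* empty_system.pol : sketch, move all file data out of the system pool into sas0 */
  RULE 'empty_system' MIGRATE FROM POOL 'system' TO POOL 'sas0'

  # dry-run first, then apply
  mmapplypolicy gsfs0 -P empty_system.pol -I test
  mmapplypolicy gsfs0 -P empty_system.pol -I yes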
> ----- Original message -----
> From: John Hanks <griz...@gmail.com>
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> Cc:
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:34 AM
>
> We have no snapshots (they were the first things to go when we initially hit the full metadata NSDs).
>
> I've increased quotas so that no filesets have hit a space quota.
>
> Verified that there are no inode quotas anywhere.
>
> mmdf shows the least amount of free space on any NSD to be 9% free.
>
> Still getting this error:
>
> [root@scg-gs0 ~]# mmrestripefs gsfs0 -r -N scg-gs0,scg-gs1,scg-gs2,scg-gs3
> Scanning file system metadata, phase 1 ...
> Scan completed successfully.
> Scanning file system metadata, phase 2 ...
> Scanning file system metadata for sas0 storage pool
> Scanning file system metadata for sata0 storage pool
> Scan completed successfully.
> Scanning file system metadata, phase 3 ...
> Scan completed successfully.
> Scanning file system metadata, phase 4 ...
> Scan completed successfully.
> Scanning user file metadata ...
> Error processing user file metadata.
> No space left on device
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711' on scg-gs0 for inodes with broken disk addresses or failures.
> mmrestripefs: Command failed. Examine previous error messages to determine cause.
>
> I should note too that this fails almost immediately, far too quickly to fill up any location it could be trying to write to.
>
> jbh
>
> On Thu, Nov 2, 2017 at 7:57 AM, David Johnson <david_john...@brown.edu> wrote:
> One thing that may be relevant is that, if you have snapshots, then depending on your release level the inodes in the snapshot may be considered immutable and will not be migrated. Once the snapshots have been deleted, the inodes are freed up and you won't see the (somewhat misleading) message about no space.
>
> — ddj
> Dave Johnson
> Brown University
>
> On Nov 2, 2017, at 10:43 AM, John Hanks <griz...@gmail.com> wrote:
> Thanks all for the suggestions.
>
> Having our metadata NSDs fill up was what prompted this exercise, but space was previously freed up on those by switching them from metadata+data to metadataOnly and using a policy to migrate files out of that pool. So these now have about 30% free space (more if you include fragmented space). The restripe attempt is just to make a final move of any remaining data off those devices. All the NSDs now have free space on them.
>
> df -i shows inode usage at about 84%, so plenty of free inodes for the filesystem as a whole.
>
> We did have old .quota files lying around, but removing them didn't have any impact.
>
> mmlsfileset fs -L -i is taking a while to complete; I'll let it simmer while getting to work.
>
> mmrepquota does show about a half-dozen filesets that have hit their quota for space (we don't set quotas on inodes). Once I'm settled in this morning I'll try giving them a little extra space and see what happens.
>
> jbh
>
>
> On Thu, Nov 2, 2017 at 4:19 AM, Oesterlin, Robert <robert.oester...@nuance.com> wrote:
> One thing that I've run into before is that on older file systems you had the "*.quota" files in the file system root. If you upgraded the file system to a newer version (so these files aren't used), there was a bug at one time where these didn't get properly migrated during a restripe. The solution was to just remove them.
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
>
>
> From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of John Hanks <griz...@gmail.com>
> Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> Date: Wednesday, November 1, 2017 at 5:55 PM
> To: gpfsug <gpfsug-discuss@spectrumscale.org>
> Subject: [EXTERNAL] [gpfsug-discuss] mmrestripefs "No space left on device"
>
> Hi all,
>
> I'm trying to do a restripe after setting some NSDs to metadataOnly and I keep running into this error:
>
> Scanning user file metadata ...
>    0.01 % complete on Wed Nov  1 15:36:01 2017  (  40960 inodes with total  531689 MB data processed)
> Error processing user file metadata.
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779708' on scg-gs0 for inodes with broken disk addresses or failures.
> mmrestripefs: Command failed. Examine previous error messages to determine cause.
>
> The file it points to says:
>
> This inode list was generated in the Parallel Inode Traverse on Wed Nov  1 15:36:06 2017
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
> 53504 0:0 0 1 0 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
> /var on the node I am running this on has more than 128 GB free, all the NSDs have plenty of free space, the filesystem being restriped has plenty of free space, and if I watch the node while running this, no filesystem on it even starts to get full. Could someone tell me where mmrestripefs is attempting to write and/or how to point it at a different location?
>
> Thanks,
>
> jbh
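(An aside on the 'illreplicated' flag in those inode lists: for an individual suspect file, mmlsattr will report the file's current metadata and data replication factors, which can help confirm whether something really is still under-replicated. The path below is only a placeholder, and mapping the inode numbers from the interestingInodes file back to actual paths is a separate exercise.)

  # sketch: show replication attributes for one file (path is hypothetical)
  mmlsattr -L /srv/gsfs0/path/to/suspect/file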
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss