Yep, looks like Robert Oesterlin was right: it was the old quota files causing the snag. Not sure how "mv *.quota" managed to move the group file and not the user file, but I'll let that remain a mystery of the universe. In any case I have a restripe running now and have learned a LOT about all the bits in the process. Many thanks to everyone who replied; I learn something from this list every time I get near it.

Thank you,
jbh
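For the archives, the fix boils down to moving the stale pre-upgrade quota files out of the filesystem root and re-running the repair. A minimal sketch, assuming a mount point of /srv/gsfs0 (hypothetical) and a destination outside the filesystem:

    # Old-style quota files live in the filesystem root; list what's there.
    ls -l /srv/gsfs0/*.quota

    # Move them out of the GPFS filesystem entirely, naming each file
    # explicitly (a bare "mv *.quota" is what missed user.quota here).
    mkdir -p /root/quota-backup
    mv /srv/gsfs0/user.quota /srv/gsfs0/group.quota /root/quota-backup/

    # Re-run the replication repair.
    mmrestripefs gsfs0 -r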
On Thu, Nov 2, 2017 at 11:14 AM, John Hanks <griz...@gmail.com> wrote:
> tsfindinode tracked the file to user.quota, which somehow escaped my
> previous attempt to "mv *.quota /elsewhere/". I've moved that now and
> verified it is actually gone, and will retry once the current restripe
> on the sata0 pool is wrapped up.
>
> jbh
>
> On Thu, Nov 2, 2017 at 10:57 AM, Frederick Stock <sto...@us.ibm.com> wrote:
>> Did you run the tsfindinode command to see where that file is located?
>> Also, what does mmdf show for your other pools, notably the sas0
>> storage pool?
>>
>> Fred
>> __________________________________________________
>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
>> sto...@us.ibm.com
>>
>> From: John Hanks <griz...@gmail.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: 11/02/2017 01:17 PM
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>>
>> We do have different amounts of space in the system pool, which had
>> the changes applied:
>>
>> [root@scg4-hn01 ~]# mmdf gsfs0 -P system
>> disk                disk size  failure holds    holds              free KB             free KB
>> name                    in KB    group metadata data        in full blocks        in fragments
>> --------------- ------------- -------- -------- ----- -------------------- -------------------
>> Disks in storage pool: system (Maximum disk size allowed is 3.6 TB)
>> VD000               377487360      100 Yes      No       143109120 ( 38%)     35708688 ( 9%)
>> DMD_NSD_804         377487360      100 Yes      No        79526144 ( 21%)      2924584 ( 1%)
>> VD002               377487360      100 Yes      No       143067136 ( 38%)     35713888 ( 9%)
>> DMD_NSD_802         377487360      100 Yes      No        79570432 ( 21%)      2926672 ( 1%)
>> VD004               377487360      100 Yes      No       143107584 ( 38%)     35727776 ( 9%)
>> DMD_NSD_805         377487360      200 Yes      No        79555584 ( 21%)      2940040 ( 1%)
>> VD001               377487360      200 Yes      No       142964992 ( 38%)     35805384 ( 9%)
>> DMD_NSD_803         377487360      200 Yes      No        79580160 ( 21%)      2919560 ( 1%)
>> VD003               377487360      200 Yes      No       143132672 ( 38%)     35764200 ( 9%)
>> DMD_NSD_801         377487360      200 Yes      No        79550208 ( 21%)      2915232 ( 1%)
>>                 -------------                         -------------------- -------------------
>> (pool total)       3774873600                          1113164032 ( 29%)    193346024 ( 5%)
>>
>> and mmlsdisk shows that there is a problem with replication:
>>
>> ...
>> Number of quorum disks: 5
>> Read quorum value:      3
>> Write quorum value:     3
>> Attention: Due to an earlier configuration change the file system
>> is no longer properly replicated.
>>
>> I thought 'mmrestripefs -r' would fix this, not that I'd have to fix
>> it first before restriping?
>>
>> jbh
>>
>> On Thu, Nov 2, 2017 at 9:45 AM, Frederick Stock <sto...@us.ibm.com> wrote:
>> Assuming you are replicating data and metadata, have you confirmed
>> that all failure groups have the same free space? That is, could it be
>> that one of your failure groups has less space than the others? You
>> can verify this with the output of mmdf: look at the NSD sizes and
>> space available.
>>
>> Fred
>> __________________________________________________
>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
>> sto...@us.ibm.com
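A quick way to compare free space across failure groups from a listing like the one above (a rough sketch: the awk field positions and the VD*/DMD* name patterns are taken from this cluster's mmdf output and won't match other clusters):

    # Sum the "free KB in full blocks" column per failure group.
    # $3 = failure group, $6 = free KB in full blocks.
    mmdf gsfs0 -P system | awk '
        /^(VD|DMD)/ { free[$3] += $6 }
        END { for (fg in free) printf "failure group %s: %d KB free\n", fg, free[fg] }'

If one failure group comes back much smaller than the others, a replication-restoring restripe can hit "no space" on that group even when the pool as a whole looks comfortable.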
>> From: John Hanks <griz...@gmail.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: 11/02/2017 12:20 PM
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>>
>> Addendum to last message:
>>
>> We haven't upgraded recently as far as I know (I just inherited this a
>> couple of months ago), but I am planning an outage soon to upgrade
>> from 4.2.0-4 to 4.2.3-5.
>>
>> My growing collection of output files generally contains something
>> like:
>>
>> This inode list was generated in the Parallel Inode Traverse on Thu Nov  2 08:34:22 2017
>> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>> 53506        0:0        0           1                 0                  illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>>
>> with that inode number varying slightly.
>>
>> jbh
>>
>> On Thu, Nov 2, 2017 at 8:55 AM, Scott Fadden <sfad...@us.ibm.com> wrote:
>> Sorry, just reread as I hit send and saw this was mmrestripefs; in my
>> case it was mmdeldisk.
>>
>> Did you try running the command on just one pool, or using -B instead?
>>
>> What is the file it is complaining about in
>> "/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711"?
>>
>> Looks like it could be related to the maxfeaturelevel of the cluster.
>> Have you recently upgraded? Is everything up to the same level?
>>
>> Scott Fadden
>> Spectrum Scale - Technical Marketing
>> Phone: (503) 880-5833
>> sfad...@us.ibm.com
>> http://www.ibm.com/systems/storage/spectrum/scale
>>
>> ----- Original message -----
>> From: Scott Fadden/Portland/IBM
>> To: gpfsug-discuss@spectrumscale.org
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Date: Thu, Nov 2, 2017 8:44 AM
>>
>> I opened a defect on this the other day; in my case it was an
>> incorrect error message. What it meant to say was, "The pool is not
>> empty." Are you trying to remove the last disk in a pool? If so, did
>> you empty the pool with a MIGRATE policy first?
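For readers following along: emptying a pool ahead of deleting its last disk is a one-rule migration policy plus mmapplypolicy. A minimal sketch, using the pool names from this thread and an arbitrary rule-file name:

    # drain-system.pol -- move all file data out of the system pool.
    # Pool names are the ones discussed in this thread; adjust to taste.
    RULE 'drain' MIGRATE FROM POOL 'system' TO POOL 'sas0'

Run it with -I test first to preview what would move, then for real:

    mmapplypolicy gsfs0 -P drain-system.pol -I test
    mmapplypolicy gsfs0 -P drain-system.pol -I yes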
>> ----- Original message -----
>> From: John Hanks <griz...@gmail.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Date: Thu, Nov 2, 2017 8:34 AM
>>
>> We have no snapshots (they were the first to go when we initially hit
>> the full metadata NSDs).
>>
>> I've increased quotas so that no filesets have hit a space quota.
>>
>> Verified that there are no inode quotas anywhere.
>>
>> mmdf shows the least amount of free space on any NSD to be 9% free.
>>
>> Still getting this error:
>>
>> [root@scg-gs0 ~]# mmrestripefs gsfs0 -r -N scg-gs0,scg-gs1,scg-gs2,scg-gs3
>> Scanning file system metadata, phase 1 ...
>> Scan completed successfully.
>> Scanning file system metadata, phase 2 ...
>> Scanning file system metadata for sas0 storage pool
>> Scanning file system metadata for sata0 storage pool
>> Scan completed successfully.
>> Scanning file system metadata, phase 3 ...
>> Scan completed successfully.
>> Scanning file system metadata, phase 4 ...
>> Scan completed successfully.
>> Scanning user file metadata ...
>> Error processing user file metadata.
>> No space left on device
>> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711' on
>> scg-gs0 for inodes with broken disk addresses or failures.
>> mmrestripefs: Command failed. Examine previous error messages to
>> determine cause.
>>
>> I should note too that this fails almost immediately, far too quickly
>> to fill up any location it could be trying to write to.
>>
>> jbh
>>
>> On Thu, Nov 2, 2017 at 7:57 AM, David Johnson <david_john...@brown.edu> wrote:
>> One thing that may be relevant is if you have snapshots: depending on
>> your release level, inodes in the snapshot may be considered immutable
>> and will not be migrated. Once the snapshots have been deleted, the
>> inodes are freed up and you won't see the (somewhat misleading)
>> message about no space.
>>
>> -- ddj
>> Dave Johnson
>> Brown University
>>
>> On Nov 2, 2017, at 10:43 AM, John Hanks <griz...@gmail.com> wrote:
>> Thanks all for the suggestions.
>>
>> Having our metadata NSDs fill up was what prompted this exercise, but
>> space was previously freed up on those by switching them from
>> metadata+data to metadataOnly and using a policy to migrate files out
>> of that pool. So these now have about 30% free space (more if you
>> include fragmented space). The restripe attempt is just to make a
>> final move of any remaining data off those devices. All the NSDs now
>> have free space on them.
>>
>> df -i shows inode usage at about 84%, so plenty of free inodes for the
>> filesystem as a whole.
>>
>> We did have old .quota files laying around, but removing them didn't
>> have any impact.
>>
>> mmlsfileset fs -L -i is taking a while to complete; I'll let it simmer
>> while getting to work.
>>
>> mmrepquota does show about a half-dozen filesets that have hit their
>> quota for space (we don't set quotas on inodes). Once I'm settled in
>> this morning I'll try giving them a little extra space and see what
>> happens.
>>
>> jbh
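The diagnostic that eventually cracked this (see the top of the thread) was mapping the flagged inode back to a path. A sketch of that loop -- tsfindinode is one of the undocumented ts* helpers, so the exact invocation may vary by release, and the mount point /srv/gsfs0 here is assumed:

    # Pull the flagged inode numbers out of the PIT output file.
    awk '/illreplicated/ { print $1 }' \
        /var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711

    # Map an inode number back to a path name.
    tsfindinode -i 53506 /srv/gsfs0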
>> On Thu, Nov 2, 2017 at 4:19 AM, Oesterlin, Robert <robert.oester...@nuance.com> wrote:
>> One thing that I've run into before is that on older file systems you
>> had the "*.quota" files in the file system root. If you upgraded the
>> file system to a newer version (so these files aren't used), there was
>> a bug at one time where these didn't get properly migrated during a
>> restripe. The solution was to just remove them.
>>
>> Bob Oesterlin
>> Sr Principal Storage Engineer, Nuance
>>
>> From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of John Hanks <griz...@gmail.com>
>> Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: Wednesday, November 1, 2017 at 5:55 PM
>> To: gpfsug <gpfsug-discuss@spectrumscale.org>
>> Subject: [EXTERNAL] [gpfsug-discuss] mmrestripefs "No space left on device"
>>
>> Hi all,
>>
>> I'm trying to do a restripe after setting some NSDs to metadataOnly
>> and I keep running into this error:
>>
>> Scanning user file metadata ...
>>    0.01 % complete on Wed Nov  1 15:36:01 2017  (    40960 inodes with total     531689 MB data processed)
>> Error processing user file metadata.
>> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779708' on
>> scg-gs0 for inodes with broken disk addresses or failures.
>> mmrestripefs: Command failed. Examine previous error messages to
>> determine cause.
>>
>> The file it points to says:
>>
>> This inode list was generated in the Parallel Inode Traverse on Wed Nov  1 15:36:06 2017
>> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>> 53504        0:0        0           1                 0                  illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>>
>> /var on the node I am running this on has > 128 GB free, all the NSDs
>> have plenty of free space, the filesystem being restriped has plenty
>> of free space, and if I watch the node while running this, no
>> filesystem on it even starts to get full. Could someone tell me where
>> mmrestripefs is attempting to write and/or how to point it at a
>> different location?
>>
>> Thanks,
>>
>> jbh
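Scott's "one pool at a time" suggestion from earlier in the thread, spelled out (a sketch; -P limits the repair to a single storage pool, and the filesystem, pool, and node names are the ones from this thread):

    # Restore replication for just the sata0 pool, using the same helper nodes.
    mmrestripefs gsfs0 -r -P sata0 -N scg-gs0,scg-gs1,scg-gs2,scg-gs3

A smaller scope both shortens the scan and helps narrow down which pool the failing inode lives in.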
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss