Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Jason Williams
Chad hit the nail on the head.  I thought about the fact that it was still 
deactivated yesterday but was afraid to reactivate it until I verified the 
space was free.


FWIW, the wiki page on handling full OSTs does not mention that the space 
will not be freed until you reactivate the OST.  It actually implies the 
opposite.


http://wiki.lustre.org/Handling_Full_OSTs




--
Jason Williams
Assistant Director
Systems and Data Center Operations.
Maryland Advanced Research Computing Center (MARCC)
Johns Hopkins University
jas...@jhu.edu




From: Chad DeWitt 
Sent: Thursday, January 17, 2019 3:07 PM
To: Jason Williams
Cc: Alexander I Kulyavtsev; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Migrating files doesn't free space on the OST

Hi Jason,

I do not know if this will help you or not, but I had a situation in 2.8.0 
where an OST filled up and I marked it as disabled on the MDS:

lctl dl | grep osc
...Grab the device_id of the full OST and then deactivate it...
lctl --device device_id deactivate

IIRC, this allowed the data to be read, but deletes were not processed.  When I 
re-activated the OST, then the deletes were processed and space started 
clearing.  I think you stated you had the OST deactivated.  If you still do, 
try to reactivate it.

lctl --device device_id activate

Once you reactivate the OST, the deletes will start processing within 10 - 30 
seconds...  Just use lfs df -h to watch...

-cd




Chad DeWitt, CISSP

UNC Charlotte | ITS – University Research Computing

9201 University City Blvd. | Charlotte, NC 28223

ccdew...@uncc.edu | www.uncc.edu




If you are not the intended recipient of this transmission or a person 
responsible for delivering it to the intended recipient, any disclosure, 
copying, distribution, or other use of any of the information in this 
transmission is strictly prohibited. If you have received this transmission in 
error, please notify me immediately by reply email or by telephone at 
704-687-7802. Thank you.



Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Mohr Jr, Richard Frank (Rick Mohr)


> On Jan 17, 2019, at 2:38 PM, Jason Williams wrote:
> 
> - I just looked for lfsck but I don't seem to have it.  We are running 2.10.4 
> so I don't know what version that appeared in.

lfsck is handled as a subcommand for lctl.

http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.lfsckadmin
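
For reference, a minimal sketch of driving LFSCK through lctl, as described in the manual section linked above. The target name testfs-MDT0000 is a placeholder, and the run helper with its DRY_RUN variable is a hypothetical convenience for previewing commands, not part of Lustre. Check the options against the manual for your release before running this on a production file system.

```shell
#!/bin/sh
# Sketch only: run on the MDS. testfs-MDT0000 is a placeholder target name.

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Start a layout LFSCK on the given MDT (the layout check is the one that
# deals with MDT/OST object consistency).
lfsck_layout_start() {
    run lctl lfsck_start -M "$1" -t layout
}

# Show the status of the layout LFSCK on the given MDT.
lfsck_layout_status() {
    run lctl get_param -n "mdd.$1.lfsck_layout"
}

# Example session:
#   DRY_RUN=1 lfsck_layout_start testfs-MDT0000   # preview the command
#   lfsck_layout_start testfs-MDT0000
#   lfsck_layout_status testfs-MDT0000
```

With DRY_RUN set, each function only prints the lctl command it would run, which makes it easy to review before touching a live MDS.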

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Chad DeWitt
Hi Jason,

I do not know if this will help you or not, but I had a situation in 2.8.0
where an OST filled up and I marked it as disabled on the MDS:

lctl dl | grep osc
...Grab the device_id of the full OST and then deactivate it...
lctl --device device_id deactivate

IIRC, this allowed the data to be read, but deletes were not processed.
When I re-activated the OST, then the deletes were processed and space
started clearing.  I think you stated you had the OST deactivated.  If you
still do, try to reactivate it.

lctl --device device_id activate

Once you reactivate the OST, the deletes will start processing within 10 -
30 seconds...  Just use lfs df -h to watch...
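
The steps above can be sketched as a small script. This is an illustration under stated assumptions, not a tested procedure: the OST name testfs-OST0001 is a placeholder, the column layout assumed for `lctl dl` output (index, status, type, name) should be checked on your MDS, and the run helper with its DRY_RUN variable is a hypothetical convenience.

```shell
#!/bin/sh
# Sketch only: run on the MDS. Names such as testfs-OST0001 are placeholders.

# Print the device number of the MDS-side device for one OST, reading
# `lctl dl` output on stdin. Assumed columns: index, status, type, name, ...
ost_device_id() {
    awk -v ost="$1" 'index($4, ost "-osc-") == 1 { print $1 }'
}

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Usage: ost_set_state DEVICE_ID activate|deactivate
ost_set_state() {
    run lctl --device "$1" "$2"
}

# Example session:
#   dev=$(lctl dl | ost_device_id testfs-OST0001)
#   ost_set_state "$dev" activate
#   lfs df -h    # where the file system is mounted; repeat to watch the space drop
```

Looking the device up by OST name rather than copying the number by hand avoids activating or deactivating the wrong device when the device list is long.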

-cd







Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Jason Williams
Hello Alexander,


Thank you for your reply.

- We are not using zfs, it's an LDISKFS backing store, so no snapshots.

- I have re-run lfs getstripe to make sure the file is indeed moving

- I just looked for lfsck but I don't seem to have it.  We are running 2.10.4 
so I don't know what version that appeared in.

- I will try to have a look into the jobstats and see what I can find, but I 
made sure the files I moved were not in use when I moved them.







From: Alexander I Kulyavtsev 
Sent: Thursday, January 17, 2019 12:56 PM
To: Jason Williams; lustre-discuss@lists.lustre.org
Subject: Re: Migrating files doesn't free space on the OST


- You can re-run the command that finds files residing on the OST to see 
whether the files it reports are new or old.

- ZFS may hold snapshots if you ever took any; they take up space.

- Removing data or snapshots takes some time (tens of minutes) to release the 
blocks, but I guess that has completed by now.

- There can be orphan objects on the OST if you had crashes. On older Lustre 
versions, if the OST was emptied out, you can mount the underlying fs as ext4 
or zfs, set the mount to read-only and browse the OST objects; you may see 
whether any orphan objects are left. On newer Lustre releases you can probably 
run lfsck (the Lustre file system checker).

- To find which hosts / jobs are currently writing to Lustre you can enable 
Lustre jobstats, clear the counters and parse the stats files in /proc. There 
was an xltop tool on GitHub for older Lustre versions that lacked jobstats, but 
it has not been updated for a while.

- The implementation of lfs migrate differs depending on your Lustre version. 
The older version copied the file under another name to another OST, renamed 
the files and removed the old file. If the migration is done on a file that an 
application has open for write, the data will not be released until the file 
is closed (and the data in the new file will be wrong). The recent 
implementation of migrate swaps the file objects with the file layout lock 
taken. I cannot tell whether it is safe for active writes.

- Not releasing space can be a bug - did you check Jira at Whamcloud? What 
version of Lustre do you have? Is it ldiskfs or ZFS based? Which ZFS version?
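
As an illustration of the jobstats suggestion above, here is a minimal sketch that ranks jobs by bytes written to one OST. It assumes the YAML-like job_stats layout shown in the Lustre manual (a "- job_id:" line followed by a "write_bytes:" or "write:" line carrying a "sum:" field); the exact field names vary between releases, so check the output on your OSS first. The function name top_writers and the target name testfs-OST0001 are hypothetical.

```shell
#!/bin/sh
# Sketch only: rank jobs by bytes written, reading job_stats text on stdin.
# Assumed input shape (one block per job):
#   - job_id:          dd.0
#     write_bytes:     { samples: 2, unit: bytes, min: 4, max: 8, sum: 12 }
top_writers() {
    awk '
        $1 == "-" && $2 == "job_id:" { job = $3 }
        $1 == "write_bytes:" || $1 == "write:" {
            # the byte total is the field that follows "sum:"
            for (i = 2; i < NF; i++) {
                if ($i == "sum:") {
                    bytes = $(i + 1)
                    sub(/,$/, "", bytes)   # drop a trailing comma, if any
                    print bytes, job
                }
            }
        }
    ' | sort -rn
}

# Example session:
#   lctl set_param jobid_var=procname_uid        # on the clients: enable jobstats
#   lctl set_param obdfilter.*.job_stats=clear   # on the OSS: clear the counters
#   lctl get_param -n obdfilter.testfs-OST0001.job_stats | top_writers
```

The output is one line per job, "bytes job_id", largest writer first, which is usually enough to tell whether something is still filling the OST.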


Alex.






Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Robin Humble
On Wed, Jan 16, 2019 at 04:25:25PM +, Jason Williams wrote:
>I am trying to migrate files I know are not in use off of the full OST that I 
>have using lfs migrate.  I have verified up and down that the files I am 
>moving are on that OST and that after the migrate lfs getstripe indeed shows 
>they are no longer on that OST since it's disabled in the MDS.
>
>The problem is, the used space on the OST is not going down.
>
>I see one of at least two issues:
>
>- the OST is just not freeing the space for some reason or another ( I don't 
>know)

if you are using an older Lustre version (eg. IEEL) then you may have
to re-enable the OST on the MDS to allow deletes to occur on the OST.
then check no new files went there while it was enabled, and possibly
loop and repeat.

the newer ways of disabling file creation on OSTs in recent Lustre
versions don't have this problem.
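
The newer mechanism is not named here. One candidate (an assumption, not something stated in this thread) is the max_create_count parameter on the MDS-side osp device, which the Lustre manual describes for 2.9 and later: setting it to 0 stops new object allocation on that OST while the OST stays active, so deletes keep being processed. A minimal sketch, with testfs-OST0001 as a placeholder and the run helper and DRY_RUN variable as hypothetical conveniences:

```shell
#!/bin/sh
# Sketch only: run on the MDS. testfs-OST0001 is a placeholder OST name.

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Show the current value so it can be restored later.
ost_get_create_count() {
    run lctl get_param -n "osp.$1-osc-MDT*.max_create_count"
}

# Usage: ost_set_create_count OST_NAME COUNT
# COUNT=0 stops new object creation on the OST; restore the saved value afterwards.
ost_set_create_count() {
    run lctl set_param "osp.$1-osc-MDT*.max_create_count=$2"
}

# Example session:
#   saved=$(ost_get_create_count testfs-OST0001)
#   ost_set_create_count testfs-OST0001 0
#   ... migrate files off the OST ...
#   ost_set_create_count testfs-OST0001 "$saved"
```

Saving the previous value before zeroing it avoids hard-coding a default that may differ between releases.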

>- Or someone is writing to existing files just as fast as I am clearing the 
>data (possible, but kind of hard to find)
>
>Is there possibly something else I am missing? Also, does anyone know a good 
>way to see if some client is writing to that OST and determine who it is if 
>it's more probable that that is what is going on?

perhaps check 'lsof' on every client.
if a client has a file open then it can't be deleted.

cheers,
robin


[lustre-discuss] Migrating files doesn't free space on the OST

2019-01-16 Thread Jason Williams
I am trying to migrate files I know are not in use off of the full OST that I 
have using lfs migrate.  I have verified up and down that the files I am moving 
are on that OST and that after the migrate lfs getstripe indeed shows they are 
no longer on that OST since it's disabled in the MDS.


The problem is, the used space on the OST is not going down.


I see one of at least two issues:

- the OST is just not freeing the space for some reason or another (I don't 
know)

- Or someone is writing to existing files just as fast as I am clearing the 
data (possible, but kind of hard to find)


Is there possibly something else I am missing? Also, does anyone know a good 
way to see if some client is writing to that OST and determine who it is if 
it's more probable that that is what is going on?


