Re: [lustre-discuss] Full OST

2021-09-16 Thread Alastair Basden

Hi Cory,

Servers and clients are all 2.12.6, and were installed as such (i.e. 
haven't been updated from an older version).


Cheers,
Alastair.

On Thu, 16 Sep 2021, Spitz, Cory James wrote:



What versions do you have on your servers and clients?  Do you have some wide 
gap in versions?  Is your server very old?

There was a change to the object deletion protocol that you may need to 
contend with.  It was related to LU-5814.  If you don't have an older 
server then this is not your problem.


But if that's the case you'll need to keep manually deleting objects 
unless or until you get the older software replaced (or change your 
clients to operate the old way).


-Cory


On 9/16/21, 3:45 AM, "lustre-discuss on behalf of Alastair Basden" wrote:

   Hi all,

   We mounted as ext4, removed the files, and then remounted as lustre (and
   did the lfsck scans).

   All seemed fine, and the OST went back into production.

   However, it again has the same problem - it is filling up.  Currently
   lfs df reports it as 89% full with 4.8TB used.

   However, an lfs find --ost=... can only account for 268GB.

   So I again suspect that there are unlinked/deleted files, which aren't
   actually being deleted.

   Does anyone have any idea how to get it deleting files correctly?  All the
   other OSTs are behaving perfectly fine (including those served by the same
   OSS).

   Cheers,
   Alastair.



   On Thu, 9 Sep 2021, Andreas Dilger wrote:

   > On Sep 8, 2021, at 04:42, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:
   >
   > Next step would be to unmount OST004e, run a full e2fsck, and then check
   > lost+found and/or a regular "find /mnt/ost -type f -size +1M" or similar
   > to find where the files are.
   >
   > Thanks.  e2fsck returns clean (on its own, with -p and with -f).
   >
   > Now, the find command does return a large number of files belonging to
   > usera - and of sufficient size to fill up the disk.
   >
   > e.g. /mnt/ost/O/0/d3/29379 has a size 2.3G.
   >
   > If you run 'll_decode_filter_fid /mnt/ost/O/0/d3/29379' or
   > 'debugfs -c -R "stat O/0/d3/29379" /dev/' it will print the *parent*
   > (MDT) FID suitable for "lfs fid2path" on a client.  This probably won't
   > work, but worth a try anyway.
   >
   > So it would seem that these files are getting deleted from the mds, but
   > not from this OST.  Has this been seen before?  The other OSTs seem fine -
   > stuff getting deleted as expected.
   >
   > Based on the very low object number, I would guess that these are old
   > files and relate to some kind of issue seen in the past (e.g. MDT
   > corruption where e2fsck cleared some inodes, or similar).  The "debugfs
   > stat" command above will also print the object creation time along with
   > the normal timestamps.
   >
   > Is it safe to simply remove all these files, and then remount etc?  How
   > can we ensure that new files will be deleted from the OST in the future?
   >
   > If they are not referenced by any in-use file (per fid2path) then yes.
   >
   > Cheers, Andreas
   > --
   > Andreas Dilger
   > Lustre Principal Architect
   > Whamcloud
   ___
   lustre-discuss mailing list
   lustre-discuss@lists.lustre.org
   http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



Re: [lustre-discuss] Full OST

2021-09-16 Thread Spitz, Cory James via lustre-discuss
What versions do you have on your servers and clients?  Do you have some wide 
gap in versions?  Is your server very old?

There was a change to the object deletion protocol that you may need to contend 
with.  It was related to LU-5814.  If you don't have an older server then this 
is not your problem.

But if that's the case you'll need to keep manually deleting objects unless or 
until you get the older software replaced (or change your clients to operate 
the old way).

-Cory


On 9/16/21, 3:45 AM, "lustre-discuss on behalf of Alastair Basden" wrote:

Hi all,

We mounted as ext4, removed the files, and then remounted as lustre (and 
did the lfsck scans).

All seemed fine, and the OST went back into production.

However, it again has the same problem - it is filling up.  Currently
lfs df reports it as 89% full with 4.8TB used.

However, an lfs find --ost=... can only account for 268GB.

So I again suspect that there are unlinked/deleted files, which aren't 
actually being deleted.

Does anyone have any idea how to get it deleting files correctly?  All the 
other OSTs are behaving perfectly fine (including those served by the same 
OSS).

Cheers,
Alastair.



On Thu, 9 Sep 2021, Andreas Dilger wrote:

> On Sep 8, 2021, at 04:42, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:
>
> Next step would be to unmount OST004e, run a full e2fsck, and then check
> lost+found and/or a regular "find /mnt/ost -type f -size +1M" or similar
> to find where the files are.
>
> Thanks.  e2fsck returns clean (on its own, with -p and with -f).
>
> Now, the find command does return a large number of files belonging to
> usera - and of sufficient size to fill up the disk.
>
> e.g. /mnt/ost/O/0/d3/29379 has a size 2.3G.
>
> If you run 'll_decode_filter_fid /mnt/ost/O/0/d3/29379' or
> 'debugfs -c -R "stat O/0/d3/29379" /dev/' it will print the *parent*
> (MDT) FID suitable for "lfs fid2path" on a client.  This probably won't
> work, but worth a try anyway.
>
> So it would seem that these files are getting deleted from the mds, but
> not from this OST.  Has this been seen before?  The other OSTs seem fine -
> stuff getting deleted as expected.
>
> Based on the very low object number, I would guess that these are old
> files and relate to some kind of issue seen in the past (e.g. MDT
> corruption where e2fsck cleared some inodes, or similar).  The "debugfs
> stat" command above will also print the object creation time along with
> the normal timestamps.
>
> Is it safe to simply remove all these files, and then remount etc?  How
> can we ensure that new files will be deleted from the OST in the future?
>
> If they are not referenced by any in-use file (per fid2path) then yes.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud


Re: [lustre-discuss] Full OST

2021-09-16 Thread Alastair Basden

Hi all,

We mounted as ext4, removed the files, and then remounted as lustre (and 
did the lfsck scans).


All seemed fine, and the OST went back into production.

However, it again has the same problem - it is filling up.  Currently
lfs df reports it as 89% full with 4.8TB used.

However, an lfs find --ost=... can only account for 268GB.

So I again suspect that there are unlinked/deleted files, which aren't 
actually being deleted.
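
To put rough numbers on that gap, here is a small sketch. It assumes the
lfs df figure is in binary units (1 TB taken as 1024 GB), which may not
match your output, and unaccounted_gb is a hypothetical helper name, not a
Lustre tool.

```shell
#!/bin/sh
# Hypothetical helper: space used on an OST that visible files do not explain.
#   $1 = used space in TB (as reported by "lfs df")
#   $2 = space accounted for by "lfs find --ost=..." in GB
# Assumption: 1 TB = 1024 GB; change the factor if your tools use decimal units.
unaccounted_gb() {
    awk -v used_tb="$1" -v found_gb="$2" 'BEGIN {
        used_gb = used_tb * 1024
        missing = used_gb - found_gb
        printf "%.1f GB unaccounted (%.1f%% of used space)\n",
               missing, 100 * missing / used_gb
    }'
}

# Figures quoted in this message: 4.8TB used, 268GB found.
unaccounted_gb 4.8 268
```

With the figures above, roughly 94% of the used space on the OST does not
belong to any file that lfs find can see.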


Does anyone have any idea how to get it deleting files correctly?  All the 
other OSTs are behaving perfectly fine (including those served by the same 
OSS).


Cheers,
Alastair.



On Thu, 9 Sep 2021, Andreas Dilger wrote:




On Sep 8, 2021, at 04:42, Alastair Basden <a.g.bas...@durham.ac.uk> wrote:


Next step would be to unmount OST004e, run a full e2fsck, and then check lost+found 
and/or a regular "find /mnt/ost -type f -size +1M" or similar to find where the 
files are.
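
The find-based check suggested above can be wrapped so that it also totals
the space involved. This is only a sketch: it assumes GNU find (for -printf)
and an OST mounted read-only as ldiskfs/ext4 at the path you pass in, and
sum_large_files is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count regular files larger than 1 MiB under a
# directory and total their apparent sizes in bytes.
# Requires GNU find for the -printf action.
sum_large_files() {
    find "$1" -type f -size +1M -printf '%s\n' |
        awk '{ n++; total += $1 }
             END { printf "%d files, %.0f bytes\n", n, total }'
}

# Example, using the mount point from the thread:
#   sum_large_files /mnt/ost
```

If the total is close to the used space reported for the OST, the large
objects found this way are the likely cause of the usage.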


Thanks.  e2fsck returns clean (on its own, with -p and with -f).

Now, the find command does return a large number of files belonging to usera - 
and of sufficient size to fill up the disk.

e.g. /mnt/ost/O/0/d3/29379 has a size 2.3G.
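
The d3 component of that path matches the usual ldiskfs OST layout, in
which objects are spread over 32 subdirectories by object ID modulo 32
(29379 mod 32 = 3). The thread does not state this, so treat it as an
assumption about the on-disk layout. A sketch of the mapping, with
ost_object_path as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: build the expected on-disk path of an OST object.
#   $1 = object sequence (0 in the example from the thread)
#   $2 = object ID
# Assumption: objects live under O/<sequence>/d<objid mod 32>/<objid>.
ost_object_path() {
    sequence=$1
    objid=$2
    printf 'O/%s/d%d/%s\n' "$sequence" "$((objid % 32))" "$objid"
}

# Object from the thread:
ost_object_path 0 29379
```

For the object quoted above this prints O/0/d3/29379, which can be passed to
the debugfs "stat" command relative to the OST root.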

If you run 'll_decode_filter_fid /mnt/ost/O/0/d3/29379' or 'debugfs -c -R "stat O/0/d3/29379" 
/dev/' it will print the *parent* (MDT) FID suitable for "lfs fid2path" on a 
client.  This probably won't work, but worth a try anyway.

So it would seem that these files are getting deleted from the mds, but not 
from this OST.  Has this been seen before?  The other OSTs seem fine - stuff 
getting deleted as expected.

Based on the very low object number, I would guess that these are old files and relate to 
some kind of issue seen in the past (e.g. MDT corruption where e2fsck cleared some 
inodes, or similar).  The "debugfs stat" command above will also print the 
object creation time along with the normal timestamps.

Is it safe to simply remove all these files, and then remount etc?  How can we 
ensure that new files will be deleted from the OST in the future?

If they are not referenced by any in-use file (per fid2path) then yes.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud











Re: [lustre-discuss] lustre file system installation issues

2021-09-16 Thread Andreas Dilger via lustre-discuss
You are trying to build the 2.14.54 (development) branch against a very old 
kernel.  You are much more likely to have success with the b2_12/2.12.7 release 
for this kernel.

On Sep 13, 2021, at 11:06, Nagmat Nazarov <nag...@nevada.unr.edu> wrote:

Dear Lustre file system community

I am trying to install the Lustre file system according to 
"https://kojiwell.github.io/blog/2018/02/27/lustre-on-centos72#sidenotes". When 
I run "sudo make rpms" I get the error message in the attached file. Can anyone 
help me install the Lustre file system?

Kind regards
Nagmat



Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud






