Re: [lustre-discuss] failed OST recover

2020-11-30 Thread Sergey Zhumatiy

> Many years ago, when I was using Lustre-1.8.X, I suffered the
> same nightmare as you are having now. The following procedure saved
> me, but I am not sure whether it will work for you.

  Thank you! I had found this recipe, but it does not work with newer
Lustre versions: ll_recover_lost_found_objs no longer exists. I have
2.12.2 installed.
  As I understand it, its function is now integrated into the lfsck
procedure, but that does not work as I expected.


  Can anybody give me a clue how to force this procedure? Should I stop
all clients and run lfsck with the broken OST enabled? I do not want to
experiment while I have tens of users, and a week of Lustre
unavailability without significant results looks very bad for me.
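
For reference, a minimal sketch of how a layout LFSCK with orphan handling is usually started on the MDS in 2.12, and how its status is read back. The file system name `testfs` and the MDT index 0000 are placeholders, not values from this thread, and the script only prints the commands unless DRY_RUN=0 is set. Whether this reattaches objects sitting in lost+found on a re-created OST is exactly the open question here, so it is a starting point rather than a known fix.

```shell
#!/bin/sh
# Sketch only: FSNAME and the MDT index are placeholders (assumptions).
FSNAME="${FSNAME:-testfs}"
MDT="${FSNAME}-MDT0000"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start a layout LFSCK on the MDS; -o asks it to handle orphan OST objects.
start_layout_lfsck() {
    run lctl lfsck_start -M "$MDT" -t layout -o
}

# Read the layout LFSCK status (phase and repair counters) on the MDS.
show_layout_lfsck() {
    run lctl get_param -n "mdd.${MDT}.lfsck_layout"
}

start_layout_lfsck
show_layout_lfsck
```

Run it once with the default dry-run setting to check the target name before setting DRY_RUN=0 on the MDS.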



> 1. umount all the clients, umount OST.
>
> 2. mount OST as ldiskfs:
>
>    mount -t ldiskfs /dev/ /mnt
>
> 3. Run the command:
>
>    ll_recover_lost_found_objs -d 
>
> On that occasion it restored about 70% of the data.
>
>
> In case you want to remove the files that were lost on the OST, but
> "rm -f " does not work:
>
> 1. Record the full paths of the files which you want to remove.
>
> 2. umount all clients, the OST, and the MDT.
>
> 3. Mount MDT as ldiskfs:
>
>    mount -t ldiskfs /dev/ /mnt
>
> 4. Go to /mnt/ROOT/. You will find the complete directory tree of
>    your Lustre file system, but without the file contents. You can
>    remove the files you want from here.
>
>
> Cheers,
>
> T.H.Hsieh


> On Mon, Nov 30, 2020 at 01:09:07PM +0300, Sergey Zhumatiy wrote:
>
> >   Hello!
> >   Please help me resolve this. One OST in my Lustre installation has
> > failed. It lost all of its filesystem metadata, so I could not mount it as
> > a Lustre filesystem. I checked it with e2fsck, and all the data was moved
> > into the lost+found folder. I then moved this folder to other storage,
> > re-created the OST (with the old target index), and put the lost+found
> > folder back.
> >
> >   After mounting this OST as Lustre, I started lfsck on the MDS. After
> > several hours I disabled this OST because no client could work. Lustre
> > then became healthy, and I started lfs_migrate from this OST.
> >
> >   But it seems that the data was not restored by lfsck: lfs_migrate moved
> > only a few files, and the rest fail with 'endpoint not connected'.
> >
> >   How can I restore some of the data and delete the unrecoverable data?
> >
> > --
> >   With respect
> >    Serg.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



--
  With respect
   Serg.


Re: [lustre-discuss] failed OST recover

2020-11-30 Thread Tung-Han Hsieh
Dear Serg,

Many years ago, when I was using Lustre-1.8.X, I suffered the
same nightmare as you are having now. The following procedure saved
me, but I am not sure whether it will work for you.

1. umount all the clients, umount OST.

2. mount OST as ldiskfs:

mount -t ldiskfs /dev/ /mnt

3. Run the command:

   ll_recover_lost_found_objs -d 

On that occasion it restored about 70% of the data.


In case you want to remove the files that were lost on the OST, but
"rm -f " does not work:

1. Record the full paths of the files which you want to remove.

2. umount all clients, the OST, and the MDT.

3. Mount MDT as ldiskfs:

   mount -t ldiskfs /dev/ /mnt

4. Go to /mnt/ROOT/. You will find the complete directory tree of
   your Lustre file system, but without the file contents. You can
   remove the files you want from here.
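
Step 1 of the removal procedure can be sketched from a client while the file system is still mounted. This is a minimal sketch under assumptions: the client mount point `/mnt/lustre`, OST index 4, and the output file name are all placeholders, and `lfs find --ost` (which also accepts an OST UUID) is used to list the files that have objects on that OST. The script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholders (assumptions): client mount point, OST index, output file.
MOUNTPOINT="${MOUNTPOINT:-/mnt/lustre}"
OST_INDEX="${OST_INDEX:-4}"
LIST="${LIST:-/tmp/files_on_ost${OST_INDEX}.txt}"

# Record the paths of regular files that have objects on the given OST.
# In dry-run mode (the default) the command is printed, not executed.
list_files_on_ost() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ lfs find $MOUNTPOINT --ost $OST_INDEX --type f > $LIST"
    else
        lfs find "$MOUNTPOINT" --ost "$OST_INDEX" --type f > "$LIST"
    fi
}

list_files_on_ost
```

The resulting list gives the paths to look for under /mnt/ROOT/ in step 4.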


Cheers,

T.H.Hsieh


On Mon, Nov 30, 2020 at 01:09:07PM +0300, Sergey Zhumatiy wrote:
>   Hello!
>   Please help me resolve this. One OST in my Lustre installation has
> failed. It lost all of its filesystem metadata, so I could not mount it as
> a Lustre filesystem. I checked it with e2fsck, and all the data was moved
> into the lost+found folder. I then moved this folder to other storage,
> re-created the OST (with the old target index), and put the lost+found
> folder back.
>
>   After mounting this OST as Lustre, I started lfsck on the MDS. After
> several hours I disabled this OST because no client could work. Lustre
> then became healthy, and I started lfs_migrate from this OST.
>
>   But it seems that the data was not restored by lfsck: lfs_migrate moved
> only a few files, and the rest fail with 'endpoint not connected'.
>
>   How can I restore some of the data and delete the unrecoverable data?
> 
> -- 
>   With respect
>Serg.


Re: [lustre-discuss] The safe path for upgrading servers from 2.5.3 to 2.12.x

2020-11-30 Thread Tung-Han Hsieh
Dear Nguyen,

Usually the upgrade procedure is the following:

1. Shut down the Lustre file system completely.
   (umount all the clients and servers)

2. On all the clients and servers, install the new version of the Lustre
   software. If your servers use the ldiskfs backend, please also
   install the recommended version of the e2fsprogs package.
   (see https://www.lustre.org/download/)

3. In all the servers and clients, unload the Lustre modules by:

lustre_rmmod

   Or you can reboot these machines instead.

4. In each of the servers, run the following command to upgrade:

tunefs.lustre --writeconf /dev/

   Note that you have to do this for every MGS, MDT, and OST device.

5. When running tunefs.lustre, it may prompt you to turn on some options
   of the ldiskfs file system on the corresponding device using the
   e2fsprogs utilities. Just follow the instructions it prints.

6. Then the upgrade is completed. You can try to restart the Lustre file
   system.

I have upgraded from version 2.5.X directly to 2.10.X and to 2.12.X.
Everything worked fine for me.
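
Steps 3 and 4 above can be sketched as a small script run on each server once the new packages are installed. This is only a sketch: the device paths are hypothetical placeholders, and by default the script prints the commands instead of running them. The writeconf procedure in the Lustre manual handles the MDT before the OSTs and restarts targets in the order MGS/MDT, OSTs, then clients, so it is worth keeping that order across servers.

```shell
#!/bin/sh
# Hypothetical device list for this server; replace with the real
# MGS/MDT/OST block devices before running with DRY_RUN=0.
DEVICES="${DEVICES:-/dev/sdX1 /dev/sdX2}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unload the old Lustre modules, then regenerate the configuration
# logs on every Lustre target hosted on this server.
upgrade_targets() {
    run lustre_rmmod
    for dev in $DEVICES; do
        run tunefs.lustre --writeconf "$dev"
    done
}

upgrade_targets
```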

Cheers,

T.H.Hsieh


On Mon, Nov 30, 2020 at 11:17:03PM +0700, Nguyen Viet Cuong wrote:
> Hi there,
> 
> Can anyone advise me on a safe way to upgrade Lustre servers from 2.5.3 to
> 2.10.x or 2.12.x? I am running 2.5.3 on CentOS 6.5 with an FDR card. I now
> have to upgrade the card to EDR or HDR.
> 
> Also, has anyone successfully connected servers with a mix of EDR
> and HDR200? If so, how?
> 
> Best regards,
> Nguyen Viet Cuong



[lustre-discuss] The safe path for upgrading servers from 2.5.3 to 2.12.x

2020-11-30 Thread Nguyen Viet Cuong
Hi there,

Can anyone advise me on a safe way to upgrade Lustre servers from 2.5.3 to
2.10.x or 2.12.x? I am running 2.5.3 on CentOS 6.5 with an FDR card. I now
have to upgrade the card to EDR or HDR.

Also, has anyone successfully connected servers with a mix of EDR
and HDR200? If so, how?

Best regards,
Nguyen Viet Cuong


[lustre-discuss] Bad performance lustre

2020-11-30 Thread Benjamin GALINA
Hi Lustre community,

Could you help me solve a Lustre performance problem?
I have just set up several Lustre gateways between two InfiniBand networks, A
and B.

Node A         <-->  Gateway        <-->  Node B
Lustre 2.12.2        Lustre 2.13.3        Lustre 2.12.3

I am using Mellanox ConnectX-5 cards on servers with AMD EPYC processors.

I have tested and validated the speeds at the InfiniBand level (ib_read_bw and
ib_write_bw) and at the TCP/IP level (iperf3).

From the Lustre point of view (LNet selftest), the rates are correct between
the nodes of A and the gateways.
On the other hand, the performance between the B nodes and the gateways is
very, very low:

[LNet Rates of servers]
[R] Avg: 26   RPC/s Min: 26   RPC/s Max: 26   RPC/s
[W] Avg: 36   RPC/s Min: 36   RPC/s Max: 36   RPC/s
[LNet Bandwidth of servers]
[R] Avg: 8.61 MiB/s Min: 8.61 MiB/s Max: 8.61 MiB/s
[W] Avg: 9.59 MiB/s Min: 9.59 MiB/s Max: 9.59 MiB/s
[LNet Rates of servers]
[R] Avg: 33   RPC/s Min: 33   RPC/s Max: 33   RPC/s
[W] Avg: 42   RPC/s Min: 42   RPC/s Max: 42   RPC/s
[LNet Bandwidth of servers]
[R] Avg: 8.62 MiB/s Min: 8.62 MiB/s Max: 8.62 MiB/s
[W] Avg: 8.81 MiB/s Min: 8.81 MiB/s Max: 8.81 MiB/s

Do you have any idea what causes these ridiculously low rates?
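
To compare saved runs like the two above, the average bandwidth lines can be pulled out of the LNet selftest output with a short filter. This is a minimal sketch: the function name and the 1000 MiB/s threshold are illustrative assumptions, not a known-good figure for this hardware.

```shell
#!/bin/sh
# Print "<direction> <avg MiB/s> <LOW|ok>" for each bandwidth line on stdin.
# $1 is an illustrative threshold in MiB/s (an assumption, default 1000).
# Lines reporting RPC/s rates are skipped.
lst_bandwidth_summary() {
    awk -v min="${1:-1000}" '
        $2 == "Avg:" && $4 == "MiB/s" {
            flag = ($3 + 0 < min + 0) ? "LOW" : "ok"
            print $1, $3, flag
        }'
}

# Example with two lines copied from the output above.
printf '%s\n' \
    '[R] Avg: 26   RPC/s Min: 26   RPC/s Max: 26   RPC/s' \
    '[R] Avg: 8.61 MiB/s Min: 8.61 MiB/s Max: 8.61 MiB/s' |
    lst_bandwidth_summary 1000
```

Feeding it the output captured on the A side and on the B side makes the gap between the two networks easy to see at a glance.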

Regards,








[lustre-discuss] failed OST recover

2020-11-30 Thread Sergey Zhumatiy

  Hello!
  Please help me resolve this. One OST in my Lustre installation has
failed. It lost all of its filesystem metadata, so I could not mount it as
a Lustre filesystem. I checked it with e2fsck, and all the data was moved
into the lost+found folder. I then moved this folder to other storage,
re-created the OST (with the old target index), and put the lost+found
folder back.

  After mounting this OST as Lustre, I started lfsck on the MDS. After
several hours I disabled this OST because no client could work. Lustre
then became healthy, and I started lfs_migrate from this OST.

  But it seems that the data was not restored by lfsck: lfs_migrate moved
only a few files, and the rest fail with 'endpoint not connected'.

  How can I restore some of the data and delete the unrecoverable data?

--
  With respect
   Serg.


Re: [lustre-discuss] Quota related (Anilkumar Naik)

2020-11-30 Thread 肖正刚
Sorry, I made a typo.
Replace "qouta" with "quota" in both commands.
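
With the spelling corrected, the two commands can be sketched as below. `testfs` is a placeholder for the real file system name, the commands are meant to be run on the MGS node, and the script only prints them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Placeholder file system name (assumption); set FSNAME to the real one.
FSNAME="${FSNAME:-testfs}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable user quota enforcement on the MDTs and OSTs (run on the MGS).
enable_user_quota() {
    run lctl conf_param "${FSNAME}.quota.mdt=u"
    run lctl conf_param "${FSNAME}.quota.ost=u"
}

enable_user_quota
```

Afterwards, `lctl get_param osd-*.*.quota_slave.info` on the MDS and OSS nodes should show whether enforcement is enabled on each target, as described in the quota chapter of the manual.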

On Mon, Nov 30, 2020 at 2:41 PM, Anilkumar Naik wrote:

> The commands below give errors for me. Based on our Lustre details, could
> you please provide the exact commands to run on our server? Thank you.
>
> Regards,
> Anilkumar
>
> On Mon, 30 Nov, 2020, 6:59 am 肖正刚 wrote:
>
>> Hi,
>> You can enable user quota on the MGS with:
>> "
>> lctl conf_param your_fsname.qouta.mdt=u
>> lctl conf_param your_fsname.qouta.ost=u
>> "
>> Details about quotas are in chapter 25 of the Lustre manual:
>> https://doc.lustre.org/lustre_manual.xhtml#configuringquotas
>>
>>
>>


Re: [lustre-discuss] 2.12 Client connecting to 2.5 Server

2020-11-30 Thread Nguyen Viet Cuong
Thanks!

I will try it and let you know the result.

Nguyen Viet Cuong


On Mon, Nov 30, 2020 at 4:00 PM Tung-Han Hsieh <
thhs...@twcp1.phys.ntu.edu.tw> wrote:

> Hello,
>
> It is OK. We have a cluster whose servers run Lustre 2.5.3, and
> clients running Lustre 2.5.3, 2.10.7, and 2.12.5 all mount those
> 2.5.3 servers. So far there have been no problems.
>
> Cheers,
>
> T.H.Hsieh
>
> On Mon, Nov 30, 2020 at 03:48:07PM +0700, Nguyen Viet Cuong wrote:
> > Hi there,
> >
> > Has anyone tried using a 2.12 client with a 2.5 server? Are they
> > compatible? If I test this on a live system, is there any risk?
> >
> > Thanks!
> > Cuong Nguyen
>


Re: [lustre-discuss] 2.12 Client connecting to 2.5 Server

2020-11-30 Thread Tung-Han Hsieh
Hello,

It is OK. We have a cluster whose servers run Lustre 2.5.3, and
clients running Lustre 2.5.3, 2.10.7, and 2.12.5 all mount those
2.5.3 servers. So far there have been no problems.

Cheers,

T.H.Hsieh

On Mon, Nov 30, 2020 at 03:48:07PM +0700, Nguyen Viet Cuong wrote:
> Hi there,
> 
> Has anyone tried using a 2.12 client with a 2.5 server? Are they
> compatible? If I test this on a live system, is there any risk?
> 
> Thanks!
> Cuong Nguyen



[lustre-discuss] 2.12 Client connecting to 2.5 Server

2020-11-30 Thread Nguyen Viet Cuong
Hi there,

Has anyone tried using a 2.12 client with a 2.5 server? Are they
compatible? If I test this on a live system, is there any risk?

Thanks!
Cuong Nguyen