Re: [lustre-discuss] Data migration from one OST to another

2019-03-10 Thread Andreas Dilger
Note that the "max_create_count=0" feature only works with newer versions 
of Lustre - 2.10 and later.
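
For reference, on a 2.10+ MDS the same setting is usually applied via lctl on the
MDS-side OSP device rather than by writing /proc directly; a minimal sketch, assuming
a single MDT named MDT0000 (the MDT index is not shown in this thread) and the OST
name used elsewhere in this thread:

   root@mds# lctl set_param osp.chome-OST0028-osc-MDT0000.max_create_count=0   # stop new object creation on that OST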

It is recommended to upgrade to a release newer than 2.5 in any case. 

Cheers, Andreas

> On Mar 5, 2019, at 10:33, Tung-Han Hsieh wrote:
> 
> Dear All,
> 
> We have found the answer. Starting from Lustre-2.4, the OST will stop
> any update actions if we deactivate it. Hence during data migration, if
> we deactivate the OST chome-OST0028_UUID and copy data out via:
> 
>    cp -a <file> <file>.tmp
>    mv <file>.tmp <file>
> 
> The "junk" still remains in chome-OST0028_UUID unless we restart the
> MDT. Restarting the MDT will clean out the junk residing on the previously
> deactivated OSTs.
> 
> Another way to perform the data migration for chome-OST0028_UUID is:
> 
> root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/max_create_count
> 
> Thus the OST is still active, but just not creating new objects. So during
> the data migration we can see its space being continuously released.
> 
> But here we encounter another problem. In our Lustre file system we have
> 41 OSTs, of which 8 are full and need their data migrated off. So we
> blocked these OSTs from creating new objects. But during the data
> migration, suddenly the whole Lustre file system hung, and the MDS
> server logged a lot of the following dmesg messages:
> 
> ---
> [960570.287161] Lustre: chome-OST001a-osc-MDT: slow creates, 
> last=[0x1001a:0x3ef241:0x0], next=[0x1001a:0x3ef241:0x0], reserved=0, 
> syn_changes=0, syn_rpc_in_progress=0, status=0
> [960570.287244] Lustre: Skipped 2 previous similar messages
> ---
> 
> where chome-OST001a-osc-MDT is one of the blocked OSTs. It looks like
> the MDT still wants to store data on the blocked OSTs, but since they
> are blocked, the whole file system hangs.
> 
> Could anyone give us suggestions on how to solve this?
> 
> Best Regards,
> 
> T.H.Hsieh
> 
>> On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
>> Dear All,
>> 
>> We have a problem of data migration from one OST to another.
>> 
>> We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
>> on the clients. We want to migrate some data from one OST to another in
>> order to re-balance the data occupation among the OSTs. In the beginning we
>> followed the old method (i.e., the method found in the Lustre-1.8.X manuals)
>> for the data migration. Suppose we have two OSTs:
>> 
>> root@client# /opt/lustre/bin/lfs df
>> UUID                   1K-blocks        Used   Available Use% Mounted on
>> chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
>> chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]
>> 
>> and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
>> Our procedures are:
>> 
>> 1. We deactivate chome-OST0028_UUID:
>>   root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active
>> 
>> 2. We find all files located in chome-OST0028_UUID:
>>   root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list
>> 
>> 3. For each file listed in the file "list", we did:
>>
>>    cp -a <file> <file>.tmp
>>    mv <file>.tmp <file>
>> 
>> During the migration, we really saw more and more data being written into
>> chome-OST002a_UUID, but we did not see any disk space released in
>> chome-OST0028_UUID. In Lustre-1.8.X, doing it this way we did see that
>> chome-OST002a_UUID got more data coming in, and chome-OST0028_UUID gained
>> more and more free space.
>> 
>> It looks like the data files referenced by the MDT have been copied to
>> chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
>> Even after we reactivate chome-OST0028_UUID after the migration, the
>> situation is still the same:
>> 
>> root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active
>> 
>> Is there any way to cure this problem?
>> 
>> 
>> Thanks very much.
>> 
>> T.H.Hsieh
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Data migration from one OST to another

2019-03-05 Thread Tung-Han Hsieh
Dear All,

We have found the answer. Starting from Lustre-2.4, the OST will stop
any update actions if we deactivate it. Hence during data migration, if
we deactivate the OST chome-OST0028_UUID and copy data out via:

cp -a <file> <file>.tmp
mv <file>.tmp <file>

The "junk" still remains in chome-OST0028_UUID unless we restart the
MDT. Restarting the MDT will clean out the junk residing on the previously
deactivated OSTs.

Another way to perform the data migration for chome-OST0028_UUID is:

root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/max_create_count

Thus the OST is still active, but just not creating new objects. So during
the data migration we can see its space being continuously released.
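
Once the migration is done, creates can be re-enabled through the same file.
A minimal sketch, assuming the usual default limit of 20000 for this Lustre
generation (checking an untouched OST such as chome-OST002a first is a safer
way to confirm the right value):

    root@mds# cat /opt/lustre/fs/osc/chome-OST002a-osc-MDT/max_create_count    # read the default from an unmodified OST
    root@mds# echo 20000 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/max_create_count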

But here we encounter another problem. In our Lustre file system we have
41 OSTs, of which 8 are full and need their data migrated off. So we
blocked these OSTs from creating new objects. But during the data
migration, suddenly the whole Lustre file system hung, and the MDS
server logged a lot of the following dmesg messages:

---
[960570.287161] Lustre: chome-OST001a-osc-MDT: slow creates, 
last=[0x1001a:0x3ef241:0x0], next=[0x1001a:0x3ef241:0x0], reserved=0, 
syn_changes=0, syn_rpc_in_progress=0, status=0
[960570.287244] Lustre: Skipped 2 previous similar messages
---

where chome-OST001a-osc-MDT is one of the blocked OSTs. It looks like
the MDT still wants to store data on the blocked OSTs, but since they
are blocked, the whole file system hangs.
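
As a quick sanity check, the same /proc paths used above can be listed for
every MDS-side OSC device to confirm which OSTs actually have creates
disabled (the wildcard simply matches the device names shown in this thread):

    root@mds# grep . /opt/lustre/fs/osc/chome-OST*-osc-MDT*/max_create_count   # prints device path : current limit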

Could anyone give us suggestions on how to solve this?

Best Regards,

T.H.Hsieh

On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
> Dear All,
> 
> We have a problem of data migration from one OST to another.
> 
> We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
> on the clients. We want to migrate some data from one OST to another in
> order to re-balance the data occupation among the OSTs. In the beginning we
> followed the old method (i.e., the method found in the Lustre-1.8.X manuals)
> for the data migration. Suppose we have two OSTs:
> 
> root@client# /opt/lustre/bin/lfs df
> UUID                   1K-blocks        Used   Available Use% Mounted on
> chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
> chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]
> 
> and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
> Our procedures are:
> 
> 1. We deactivate chome-OST0028_UUID:
>root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active
> 
> 2. We find all files located in chome-OST0028_UUID:
>root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list
> 
> 3. For each file listed in the file "list", we did:
>
>   cp -a <file> <file>.tmp
>   mv <file>.tmp <file>
> 
> During the migration, we really saw more and more data being written into
> chome-OST002a_UUID, but we did not see any disk space released in chome-OST0028_UUID.
> In Lustre-1.8.X, doing it this way we did see that chome-OST002a_UUID got
> more data coming in, and chome-OST0028_UUID gained more and more free space.
> 
> It looks like the data files referenced by the MDT have been copied to
> chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
> Even after we reactivate chome-OST0028_UUID after the migration, the
> situation is still the same:
> 
> root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active
> 
> Is there any way to cure this problem?
> 
> 
> Thanks very much.
> 
> T.H.Hsieh
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Data migration from one OST to another

2019-03-03 Thread Patrick Farrell
Hsieh,

This sounds similar to a bug with pre-2.5 servers and 2.7 (or newer) clients.
The client and server have a disagreement about which side does the delete, and the
delete doesn’t happen. Since you’re running 2.5, I don’t think you should see
this, but the symptoms are the same. You can temporarily fix things by
restarting/remounting your OST(s), which will trigger orphan cleanup. But if
that works, the only long-term fix is to upgrade your servers to a version that
is expected to work with your clients. (The 2.10 maintenance release is nice
if you are not interested in the newest features; otherwise, 2.12 is also an
option.)
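
In practice, restarting/remounting an OST just means unmounting and remounting its
backing device on the OSS; a minimal sketch, with a hypothetical device and mount
point (substitute your own):

    root@oss# umount /mnt/lustre/ost0028
    root@oss# mount -t lustre /dev/mapper/ost0028 /mnt/lustre/ost0028   # orphan cleanup runs once the MDT reconnects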

I would also recommend, where possible, keeping clients and servers in sync 
- we do interop testing, but running the same version on both is much more widely used.

- Patrick

From: lustre-discuss on behalf of Tung-Han Hsieh
Sent: Sunday, March 3, 2019 4:00:17 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Data migration from one OST to another

Dear All,

We have a problem of data migration from one OST to another.

We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
on the clients. We want to migrate some data from one OST to another in
order to re-balance the data occupation among the OSTs. In the beginning we
followed the old method (i.e., the method found in the Lustre-1.8.X manuals)
for the data migration. Suppose we have two OSTs:

root@client# /opt/lustre/bin/lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]

and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
Our procedures are:

1. We deactivate chome-OST0028_UUID:
   root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

2. We find all files located in chome-OST0028_UUID:
   root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

3. For each file listed in the file "list", we did:

cp -a <file> <file>.tmp
mv <file>.tmp <file>

During the migration, we really saw more and more data being written into
chome-OST002a_UUID, but we did not see any disk space released in chome-OST0028_UUID.
In Lustre-1.8.X, doing it this way we did see that chome-OST002a_UUID got
more data coming in, and chome-OST0028_UUID gained more and more free space.

It looks like the data files referenced by the MDT have been copied to
chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
Even after we reactivate chome-OST0028_UUID after the migration, the
situation is still the same:

root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

Is there any way to cure this problem?


Thanks very much.

T.H.Hsieh
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Data migration from one OST to another

2019-03-03 Thread Tung-Han Hsieh
Dear All,

We have a problem of data migration from one OST to another.

We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
on the clients. We want to migrate some data from one OST to another in
order to re-balance the data occupation among the OSTs. In the beginning we
followed the old method (i.e., the method found in the Lustre-1.8.X manuals)
for the data migration. Suppose we have two OSTs:

root@client# /opt/lustre/bin/lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
chome-OST0028_UUID    7692938224  7246709148    55450156  99% /work[OST:40]
chome-OST002a_UUID   14640306852  7094037956  6813847024  51% /work[OST:42]

and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID.
Our procedures are:

1. We deactivate chome-OST0028_UUID:
   root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

2. We find all files located in chome-OST0028_UUID:
   root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

3. For each file listed in the file "list", we did (a minimal loop sketch follows below):

cp -a <file> <file>.tmp
mv <file>.tmp <file>
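
For illustration, step 3 can be driven over every entry in "list" with a simple
shell loop; a minimal sketch, assuming path names without embedded newlines:

    while read -r f; do
        cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"   # rewrite the file so its objects land on active OSTs
    done < list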

During the migration, we really saw more and more data being written into
chome-OST002a_UUID, but we did not see any disk space released in chome-OST0028_UUID.
In Lustre-1.8.X, doing it this way we did see that chome-OST002a_UUID got
more data coming in, and chome-OST0028_UUID gained more and more free space.

It looks like the data files referenced by the MDT have been copied to
chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID.
Even after we reactivate chome-OST0028_UUID after the migration, the
situation is still the same:

root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

Is there any way to cure this problem?


Thanks very much.

T.H.Hsieh
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org