Re: [lustre-discuss] Data migration from one OST to another
Note that the "max_create_count=0" feature only works with newer versions of Lustre - 2.10 and later. It is recommended to upgrade to a newer release than 2.5 in any case.

Cheers, Andreas

> On Mar 5, 2019, at 10:33, Tung-Han Hsieh wrote:
> [...]
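For reference, a minimal sketch of how the same thing can be expressed on 2.10 and later with lctl set_param; on those versions the parameter lives under osp.* on the MDS rather than the osc.* proc path used below, and the MDT index (MDT0000) and the default create count are assumptions to verify against your own system:

    # On the MDS: stop new object allocation on the full OST
    root@mds# lctl set_param osp.chome-OST0028-osc-MDT0000.max_create_count=0

    # After the migration, allow creates again (20000 is the usual default;
    # check "lctl get_param osp.*.max_create_count" on your version)
    root@mds# lctl set_param osp.chome-OST0028-osc-MDT0000.max_create_count=20000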
Re: [lustre-discuss] Data migration from one OST to another
Dear All,

We have found the answer. Starting from Lustre-2.4, the OST will stop any update actions if we deactivate it. Hence during data migration, if we deactivate the OST chome-OST0028_UUID and copy the data out via:

    cp -a <file> <file>.tmp
    mv <file>.tmp <file>

the "junk" still remains in chome-OST0028_UUID unless we restart the MDT. Restarting the MDT will clean out the junk residing on the previously deactivated OSTs.

Another way to perform the data migration for chome-OST0028_UUID is:

    root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/max_create_count

Thus the OST is still active, but just not creating new objects. While doing the data migration we can see its space continuously released.

But here we encounter another problem. In our Lustre file system we have 41 OSTs, of which 8 OSTs are full and we want to do data migration. So we blocked these OSTs from creating new objects. But during the data migration, suddenly the whole Lustre file system hangs, and the MDS server has a lot of the following dmesg messages:

---
[960570.287161] Lustre: chome-OST001a-osc-MDT: slow creates, last=[0x1001a:0x3ef241:0x0], next=[0x1001a:0x3ef241:0x0], reserved=0, syn_changes=0, syn_rpc_in_progress=0, status=0
[960570.287244] Lustre: Skipped 2 previous similar messages
---

where chome-OST001a-osc-MDT is one of the blocked OSTs. It looks like the MDT still wants to store data on the blocked OSTs, but since they are blocked, the whole file system hangs.

Could anyone give us suggestions on how to solve it?

Best Regards,

T.H.Hsieh

On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
> [...]
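As a concrete sketch of the per-file copy described above (the file list name and the .tmp suffix are just illustrative; on newer clients the lfs_migrate script, if available, wraps essentially this copy-and-rename step):

    # On a client: list files with objects on the full OST
    root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

    # Re-copy each file so its objects are allocated on other OSTs, then rename back
    root@client# while read f; do cp -a "$f" "$f.tmp" && mv "$f.tmp" "$f"; done < list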
Re: [lustre-discuss] Data migration from one OST to another
Hsieh,

This sounds similar to a bug with pre-2.5 servers and 2.7 (or newer) clients. The client and server have a disagreement about which one does the delete, and the delete doesn't happen. Since you're running 2.5, I don't think you should see this, but the symptoms are the same.

You can temporarily fix things by restarting/remounting your OST(s), which will trigger orphan cleanup. But even if that works, the only long-term fix is to upgrade your servers to a version that is expected to work with your clients. (The 2.10 maintenance release is nice if you are not interested in the newest features; otherwise, 2.12 is also an option.)

I would also recommend, where possible, that you keep clients and servers in sync - we do interop testing, but the same version on both is much more widely used.

- Patrick

From: lustre-discuss on behalf of Tung-Han Hsieh
Sent: Sunday, March 3, 2019 4:00:17 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Data migration from one OST to another

[...]
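A minimal sketch of the restart/remount step mentioned above, with a hypothetical backing device and mount point - substitute the ones actually used on your OSS, and follow your site's failover procedure if the OST is part of an HA pair:

    # On the OSS serving the affected OST
    root@oss# umount /mnt/ost0028
    root@oss# mount -t lustre /dev/sdX /mnt/ost0028
    # After the OST reconnects to the MDT, orphan cleanup runs; watch dmesg
    # for "deleting orphan objects" messages and lfs df for space being freed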
[lustre-discuss] Data migration from one OST to another
Dear All,

We have a problem of data migration from one OST to another.

We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8 on the clients. We want to migrate some data from one OST to another in order to re-balance the data occupation among OSTs. In the beginning we followed the old method (i.e., the method found in the Lustre-1.8.X manuals) for the data migration. Suppose we have two OSTs:

    root@client# /opt/lustre/bin/lfs df
    UUID                 1K-blocks        Used   Available Use% Mounted on
    chome-OST0028_UUID  7692938224  7246709148    55450156  99% /work[OST:40]
    chome-OST002a_UUID 14640306852  7094037956  6813847024  51% /work[OST:42]

and we want to migrate data from chome-OST0028_UUID to chome-OST002a_UUID. Our procedure is:

1. We deactivate chome-OST0028_UUID:

    root@mds# echo 0 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

2. We find all files located in chome-OST0028_UUID:

    root@client# /opt/lustre/bin/lfs find --obd chome-OST0028_UUID /work > list

3. For each file listed in the file "list", we did:

    cp -a <file> <file>.tmp
    mv <file>.tmp <file>

During the migration, we really saw more and more data written into chome-OST002a_UUID, but we did not see any disk space released in chome-OST0028_UUID. In Lustre-1.8.X, doing it this way we did see chome-OST002a_UUID get more data coming in, and chome-OST0028_UUID get more and more free space.

It looks like the data files referenced by the MDT have been copied to chome-OST002a_UUID, but the junk still remains in chome-OST0028_UUID. Even after we reactivate chome-OST0028_UUID after the migration, the situation is still the same:

    root@mds# echo 1 > /opt/lustre/fs/osc/chome-OST0028-osc-MDT/active

Is there any way to cure this problem?

Thanks very much.

T.H.Hsieh
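To make the symptom concrete: the copy allocates brand-new objects for the file on other OSTs, and the old objects on OST0028 are only destroyed once the MDT processes the unlink of the original file, which a deactivated OST never sees. A quick way to check which OST a file's objects live on before and after the copy (the file path is illustrative):

    # obdidx 40 corresponds to chome-OST0028_UUID, 42 to chome-OST002a_UUID
    root@client# /opt/lustre/bin/lfs getstripe /work/some/file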