Re: [lustre-discuss] Coordinating cluster start and shutdown?

2023-12-10 Thread Tung-Han Hsieh via lustre-discuss
Dear All,

I can contribute a few simple scripts to coordinate the start / stop of the
whole Lustre file system. Everyone is welcome to use or modify them to fit
your system. Sorry that I did not prepare a complete document
for these scripts; here I only briefly describe their usage. If you
are interested in more details, I will be happy to answer here.

- server:/opt/lustre/etc/cfs-chome:
   The configuration file, for a Lustre file system named "chome".
The head node is named "server", which is also one of the Lustre clients.
This file lists all the MGS, MDS, OSS, and Lustre clients. If the MGS and MDS
have both Ethernet and InfiniBand networks, you can specify their IPs
explicitly. If the MDTs or OSTs were formatted with ZFS, you can list them as
well.

- server:/opt/lustre/etc/cfsd:
   The main script to coordinate the start / stop / shutdown (emergency
shutdown) of the Lustre system, running on the head node. The usage is:
   # cd /opt/lustre/etc/
   # ./cfsd start chome
   # ./cfsd stop chome
   # ./cfsd shutdown

   When doing "start", it performs the following procedure (the script will
ssh into each file server and client to do the mounts):
   1. If some of the MDTs/OSTs are based on ZFS, start ZFS on those
MDTs/OSTs first.
   2. Mount the MGT, MDTs, and OSTs, in that order.
   3. Mount all the clients.

   When doing "stop", it reverses the above procedure to unmount.

   "shutdown" is usually used when the air conditioner of the computer
room is broken and the whole room is in an emergency state, so that we need
to shut down the whole system as fast as possible:
   1. Immediately shut down all the clients (for the head node, only unmount
Lustre without shutting down).
   2. Unmount all the OSTs, MDTs, and the MGT, and then shut down these servers.
   3. Shut down the head node.
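The start / stop ordering above can be sketched as a dry-run shell script. This is only an illustration, not the attached cfsd script: the host names (mds1, oss1, client1), device paths, and the run() helper are assumptions, and every command is merely printed, never executed.

```shell
#!/bin/sh
# Dry-run sketch of the ordering cfsd enforces. run() records the step name
# and prints the command instead of executing it.
ORDER=""
run() { ORDER="$ORDER $1"; shift; echo "would run: $*"; }

start_fs() {
    run zpool  ssh oss1 "zpool import -a"                          # 1. ZFS pools first
    run mgt    ssh mds1 "mount -t lustre /dev/md0 /mnt/mdt"        # 2. MGT/MDT ...
    run ost    ssh oss1 "mount -t lustre ostpool/ost0 /mnt/ost0"   #    ... then OSTs
    run client ssh client1 "mount -t lustre mds1@o2ib:/chome /home" # 3. clients last
}

stop_fs() {  # exact reverse of start_fs
    run uclient ssh client1 "umount /home"
    run uost    ssh oss1 "umount /mnt/ost0"
    run umdt    ssh mds1 "umount /mnt/mdt"
}

start_fs
stop_fs
```

A real implementation would loop over the servers and clients listed in cfs-chome instead of hard-coding hosts.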

- client:/etc/init.d/lustre_mnt:
   Sometimes a client has to be rebooted, and we want it to mount
Lustre automatically, and to unmount Lustre correctly during shutdown. This
script does that work. It reads /opt/lustre/etc/cfs-chome to check whether
all the file servers are alive, determines whether it should mount Lustre
over Ethernet or InfiniBand, and does the mount. When stopping, after the
unmount it also unloads all the Lustre kernel modules. The usage is:
   # /etc/init.d/lustre_mnt start
   # /etc/init.d/lustre_mnt stop
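A minimal sketch of the client-side logic just described. This is assumed behavior, not the attached lustre_mnt script: the MGS NIDs, the ib0 interface check, and the dry-run echoes are all illustrative.

```shell
#!/bin/sh
# Sketch of a client mount script: pick InfiniBand if available, else
# Ethernet; unload Lustre modules after unmounting. echo = dry run only.
MGS_IB=192.168.32.240@o2ib     # example NIDs, not real configuration
MGS_TCP=192.168.1.240@tcp
MNT=/home

have_ib() {
    # Assume InfiniBand is usable when an ib0 interface exists.
    ip link show ib0 >/dev/null 2>&1
}

start() {
    if have_ib; then nid=$MGS_IB; else nid=$MGS_TCP; fi
    echo mount -t lustre "$nid:/chome" "$MNT"
}

stop() {
    echo umount "$MNT"
    # After unmounting, unload all Lustre kernel modules.
    echo lustre_rmmod
}

case "$1" in start) start ;; stop) stop ;; esac
```

The real script additionally parses cfs-chome and verifies that the file servers respond before attempting the mount.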

- client:/etc/systemd/system/sysinit.target.wants/lustre_mnt.service:
   If the client has an InfiniBand network, it is very annoying that systemd
stops OpenIB quite early, before the Lustre mounts are stopped, and the
system then hangs without powering off. Hence, this file tells systemd to
wait for "/etc/init.d/lustre_mnt stop" before proceeding with the shutdown
of OpenIB.
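A hypothetical unit file illustrating that ordering (the openibd.service name and timeout are assumptions; adjust to your site). Because systemd stops units in the reverse of their start order, "After=openibd.service" ensures lustre_mnt is stopped before the InfiniBand stack goes down:

```ini
[Unit]
Description=Mount Lustre at boot; unmount before the IB stack stops
# Started after openibd => stopped before openibd at shutdown.
After=network-online.target openibd.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/init.d/lustre_mnt start
ExecStop=/etc/init.d/lustre_mnt stop
TimeoutStopSec=300

[Install]
WantedBy=sysinit.target
```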

Please note that these scripts may have bugs when used in a variety of
environments. Also note that they do not implement Lustre HA (because we
don't have it). Any suggestions would be greatly appreciated, and I will be
very happy if you find them useful.

Cheers,

T.H.Hsieh

Bertschinger, Thomas Andrew Hjorth via lustre-discuss <
lustre-discuss@lists.lustre.org> wrote on Thu, Dec 7, 2023 at 12:01 AM:

> Hello Jan,
>
> You can use the Pacemaker / Corosync high-availability software stack for
> this: specifically, ordering constraints [1] can be used.
>
> Unfortunately, Pacemaker is probably over-the-top if you don't need HA --
> its configuration is complex and difficult to get right, and it
> significantly complicates system administration. One downside of Pacemaker
> is that it is not easy to decouple the Pacemaker service from the Lustre
> services, meaning if you stop the Pacemaker service, it will try to stop
> all of the Lustre services. This might make it inappropriate for use cases
> that don't involve HA.
>
> Given those downsides, if others in the community have suggestions on
> simpler means to accomplish this, I'd love to see other tools that can be
> used here (especially officially supported ones, if they exist).
>
> [1]
> https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/constraints.html#specifying-the-order-in-which-resources-should-start-stop
>
> - Thomas Bertschinger
>
> 
> From: lustre-discuss  on behalf
> of Jan Andersen 
> Sent: Wednesday, December 6, 2023 3:27 AM
> To: lustre
> Subject: [EXTERNAL] [lustre-discuss] Coordinating cluster start and
> shutdown?
>
> Are there any tools for coordinating the start and shutdown of lustre
> filesystem, so that the OSS systems don't attempt to mount disks before the
> MGT and MDT are online?
> ___
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


cfs-chome
Description: Binary data


cfsd
Description: Binary data


lustre_mnt
Description: Binary data


lustre_mnt.service
Description: Binary data

Re: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-10-01 Thread Tung-Han Hsieh via lustre-discuss
d ZFS version anymore. The command to upgrade ZFS pool is (for all
> the MDT and OST pools):
> ```
> zpool upgrade 
> zpool status 
> ```
>   Checking zpool status again, the warning messages had
> disappeared.
> 3) I checked the Lustre Operation Manual again, and saw that recent
> versions do not require running the tunefs.lustre --writeconf command.
> Sorry, that was my fault. But in our case, we run this command whenever
> we upgrade Lustre. Note that it has to be run on all the
> Lustre MDT and OST pools. It clears out the config logs, which are
> regenerated when Lustre is mounted.
>
> 2. One of our clusters has Lustre-2.12.6, with the MDT on an ldiskfs backend
> and the OSTs on a ZFS backend, where ZFS is 0.7.13. We had to upgrade it to
> e2fsprogs-1.47.0, ZFS-2.0.7 and Lustre-2.15.3. The upgrade of the OST part is
> exactly the same as the above, so I won't repeat it. The major challenge is
> the MDT with ldiskfs. What I did was:
> 1) After installing all the new versions of the software, run
> tunefs.lustre --writeconf (for all MDTs and OSTs). Probably this is the wrong
> step for the upgrade to Lustre-2.15.X.
> 2) According to Lustre Operation manual chapter 17, to upgrade Lustre
> from 2.13.0 and before to 2.15.X, we should run
>
> tune2fs -O ea_inode /dev/*mdtdev*
>
> After that, as I have posted, we encountered a problem mounting the MDT. We
> then cured it by following section 18 of the Lustre Operation manual.
>
> My personal suggestions are:
> 1. In the future, before doing a major version upgrade of our production
> systems (say, 2.12.X to 2.15.X, or 2.15.X to 2.16 or later), I will
> set up a small testing system, installing exactly the same software as the
> production system, and test the upgrade, to make sure that every step is
> correct. We did this for upgrading Lustre with the ZFS backend. But this time,
> due to time pressure, we skipped this step for Lustre with the ldiskfs
> backend. I think that no matter the situation, it is worth doing this
> step in order to avoid any risks.
>
> 2. Currently, compiling Lustre with the ldiskfs backend is still a nightmare.
> The ldiskfs code is not self-contained, stand-alone code. It actually
> copies code from the kernel's ext4, applies a lot of patches, and then does
> the compilation, on the fly. So we have to be very careful to select the
> Linux kernel, choosing one compatible with both our hardware and the Lustre
> version. The ZFS backend is much cleaner: it is stand-alone,
> self-contained code, and we don't need to patch it on the fly. So I would
> like to suggest that the Lustre developers consider making ldiskfs
> stand-alone and self-contained in a future release. That would be
> a great convenience.
>
> I hope the above experiences are useful to our community.
>
> ps. Lustre Operation Manual:
> https://doc.lustre.org/lustre_manual.xhtml#Upgrading_2.x
>
> Best Regards,
>
> T.H.Hsieh
>
> Audet, Martin wrote on Wed, Sep 27, 2023 at 3:44 AM:
>
>> Hello all,
>>
>>
>> I would appreciate it if the community would give more attention to this
>> issue, because upgrading from 2.12.x to 2.15.x, two LTS versions, is
>> something that we can expect many cluster admins will try to do in the next
>> few months...
>>
>>
>> We ourselves plan to upgrade a small Lustre (production) system from
>> 2.12.9 to 2.15.3 in the next couple of weeks...
>>
>>
>> After seeing problem reports like this, we are starting to feel a bit nervous...
>>
>>
>> The documentation for doing this major update does not appear very
>> specific to me...
>>
>>
>> In this document for example,
>> https://doc.lustre.org/lustre_manual.xhtml#upgradinglustre , the update
>> process appears not so difficult and there is no mention of using 
>> "tunefs.lustre
>> --writeconf" for this kind of update.
>>
>>
>> Or am I missing something ?
>>
>>
>> Thanks in advance for providing more tips for this kind of update.
>>
>>
>> Martin Audet
>> --
>> *From:* lustre-discuss  on
>> behalf of Tung-Han Hsieh via lustre-discuss <
>> lustre-discuss@lists.lustre.org>
>> *Sent:* September 23, 2023 2:20 PM
>> *To:* lustre-discuss@lists.lustre.org
>> *Subject:* [lustre-discuss] Cannot mount MDT after upgrading from Lustre
>> 2.12.6 to 2.15.3
>>
>>
>> Attention*** This email originated from outside of the NRC.
>> ***Attention*** Ce courriel provient de l'extérieur du CNRC.*
>> Dear All,
>>
>> Today we tried to upgrade Lustre file system from version 2.12.6

Re: [lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-09-26 Thread Tung-Han Hsieh via lustre-discuss
://doc.lustre.org/lustre_manual.xhtml#upgradinglustre , the update
> process appears not so difficult and there is no mention of using 
> "tunefs.lustre
> --writeconf" for this kind of update.
>
>
> Or am I missing something ?
>
>
> Thanks in advance for providing more tips for this kind of update.
>
>
> Martin Audet
> --
> *From:* lustre-discuss  on
> behalf of Tung-Han Hsieh via lustre-discuss <
> lustre-discuss@lists.lustre.org>
> *Sent:* September 23, 2023 2:20 PM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] Cannot mount MDT after upgrading from Lustre
> 2.12.6 to 2.15.3
>
>
> Dear All,
>
> Today we tried to upgrade Lustre file system from version 2.12.6 to
> 2.15.3. But after the work, we cannot mount MDT successfully. Our MDT is
> ldiskfs backend. The procedure of upgrade is
>
> 1. Install the new version of e2fsprogs-1.47.0
> 2. Install Lustre-2.15.3
> 3. After reboot, run: tunefs.lustre --writeconf /dev/md0
>
> Then when mounting MDT, we got the error message in dmesg:
>
> ===
> [11662.434724] LDISKFS-fs (md0): mounted filesystem with ordered data
> mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> [11662.584593] Lustre: 3440:0:(scrub.c:189:scrub_file_load())
> chome-MDT: reset scrub OI count for format change (LU-16655)
> [11666.036253] Lustre: MGS: Logs for fs chome were removed by user
> request.  All servers must be restarted in order to regenerate the logs: rc
> = 0
> [11666.523144] Lustre: chome-MDT: Imperative Recovery not enabled,
> recovery window 300-900
> [11666.594098] LustreError: 3440:0:(mdd_device.c:1355:mdd_prepare())
> chome-MDD: get default LMV of root failed: rc = -2
> [11666.594291] LustreError:
> 3440:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start
> targets: -2
> [11666.594951] Lustre: Failing over chome-MDT
> [11672.868438] Lustre: 3440:0:(client.c:2295:ptlrpc_expire_one_request())
> @@@ Request sent has timed out for slow reply: [sent 1695492248/real
> 1695492248]  req@5dfd9b53 x1777852464760768/t0(0)
> o251->MGC192.168.32.240@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl
> 1695492254 ref 2 fl Rpc:XNQr/0/ rc 0/-1 job:''
> [11672.925905] Lustre: server umount chome-MDT complete
> [11672.926036] LustreError: 3440:0:(super25.c:183:lustre_fill_super())
> llite: Unable to mount : rc = -2
> [11872.893970] LDISKFS-fs (md0): mounted filesystem with ordered data
> mode. Opts: (null)
> 
>
> Could anyone help solve this problem? Sorry, it is really urgent.
>
> Thank you very much.
>
> T.H.Hsieh
>


Re: [lustre-discuss] [EXTERNAL] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-09-26 Thread Tung-Han Hsieh via lustre-discuss
We did run "tunefs.lustre --writeconf " for all the MDT and OST
partitions on each file server. But after that, trying to mount the MDT
resulted in that error message. Note that running tunefs.lustre --writeconf
as the final step of a Lustre upgrade followed the instructions in the
Lustre documentation.
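For reference, the writeconf procedure in the Lustre manual touches every target, not just the MDT. A dry-run sketch of those steps follows; the device names (/dev/md0, /dev/sdb), mount points, and the mds host are illustrative assumptions, and every command is only echoed.

```shell
#!/bin/sh
# Dry-run sketch of regenerating Lustre configuration logs with writeconf:
# unmount everything, writeconf every target, remount in order.
writeconf_steps() {
    echo "umount /home                          # 1. on every client"
    echo "umount /mnt/ost0                      # 2. on every OSS, all OSTs"
    echo "umount /mnt/mdt                       # 3. on the MDS, MGT/MDT last"
    echo "tunefs.lustre --writeconf /dev/md0    # 4. the MDT, on the MDS"
    echo "tunefs.lustre --writeconf /dev/sdb    # 4. every OST, on its OSS"
    echo "mount -t lustre /dev/md0 /mnt/mdt     # 5. remount MGT/MDT first"
    echo "mount -t lustre /dev/sdb /mnt/ost0    # 6. then every OST"
    echo "mount -t lustre mds@o2ib:/chome /home # 7. clients last"
}
writeconf_steps
```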

After a lot of struggle, we finally cured this problem with the following
procedure, based on two assumptions:
1. During the upgrade, somehow some important config files on the MDT were
not generated and were missing.
2. During the upgrade, somehow some important config files on the MDT were
corrupted, but hopefully they could be regenerated.

Therefore, we followed the procedure for moving the MDT data to another
device via an ldiskfs mount. Since this procedure involves rsync'ing all the
data from the original MDT to a newly created MDT partition, we hoped that
if either of the above assumptions held, we could get back the
missing / corrupted config this way. The procedure exactly follows the
Lustre documentation for manually backing up an MDT to another device:
1. Mount the broken MDT on /mnt via ldiskfs.
2. Find an empty partition and use mkfs.lustre to create a new MDT with an
ldiskfs backend, with the same file system name and the same index as the
broken MDT. Then mount it on /mnt2 via ldiskfs.
3. Use getfattr to extract the extended file attributes of all files in
/mnt.
4. Use rsync -av --sparse to back up everything from /mnt to /mnt2.
5. Restore the extended file attributes of all files in /mnt2 with setfattr.
6. Remove the log files in /mnt2/, i.e., rm -rf oi.16* lfsck_* LFSCK CATALOG
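The six steps above can be sketched as follows. This is a dry run (every command is only echoed), and the device names /dev/md0 and /dev/md1, the mgs NID, and the getfattr/setfattr flags (taken from the Lustre manual's ldiskfs file-level backup procedure) are assumptions to be adapted:

```shell
#!/bin/sh
# Dry-run sketch of copying a broken ldiskfs MDT to a fresh one.
mdt_rescue_steps() {
    echo "mount -t ldiskfs /dev/md0 /mnt       # 1. the broken MDT"
    echo "mkfs.lustre --mdt --backfstype=ldiskfs --fsname=chome --index=0 --mgsnode=mgs@o2ib /dev/md1  # 2. same name/index"
    echo "mount -t ldiskfs /dev/md1 /mnt2"
    echo "cd /mnt && getfattr -R -d -m '.*' -e hex -P . > /tmp/ea.bak   # 3. save EAs"
    echo "rsync -av --sparse /mnt/ /mnt2/      # 4. copy everything"
    echo "cd /mnt2 && setfattr --restore=/tmp/ea.bak                    # 5. restore EAs"
    echo "cd /mnt2 && rm -rf oi.16* lfsck_* LFSCK CATALOG               # 6. drop old logs"
    echo "umount /mnt /mnt2"
}
mdt_rescue_steps
```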

Then we unmounted /mnt and /mnt2 and tried to mount the newly created MDT.
The error message said that index 0 was already assigned to the original
MDT and that we should run tunefs.lustre --writeconf to clear it again.
After running tunefs.lustre, we were very lucky to get the MDT mounted back.

Now we have recovered the whole Lustre file system. But I still worry that
there might be lingering issues, since I am not sure whether I solved this
problem correctly. So I am still watching the system closely. If we were
really so lucky that this problem was cured this way, then it could perhaps
offer a way of rescuing a broken MDT when no other solution can be found.

Best Regards,

T.H.Hsieh

Mohr, Rick wrote on Tue, Sep 26, 2023 at 2:08 PM:

> Typically after an upgrade you do not need to perform a writeconf.  Did
> you perform the writeconf only on the MDT?  If so, that could be your
> problem.  When you do a writeconf to regenerate the lustre logs, you need
> to follow the whole procedure listed in the lustre manual.  You can try
> that to see if it fixes your issue.
>
> --Rick
>
> On 9/23/23, 2:22 PM, "lustre-discuss on behalf of Tung-Han Hsieh via
> lustre-discuss" <lustre-discuss-boun...@lists.lustre.org on behalf of
> lustre-discuss@lists.lustre.org> wrote:
>
>
> Dear All,
>
>
> Today we tried to upgrade Lustre file system from version 2.12.6 to
> 2.15.3. But after the work, we cannot mount MDT successfully. Our MDT is
> ldiskfs backend. The procedure of upgrade is
>
> 1. Install the new version of e2fsprogs-1.47.0
> 2. Install Lustre-2.15.3
> 3. After reboot, run: tunefs.lustre --writeconf /dev/md0
>
> Then when mounting MDT, we got the error message in dmesg:
>
> ===
> [11662.434724] LDISKFS-fs (md0): mounted filesystem with ordered data
> mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
> [11662.584593] Lustre: 3440:0:(scrub.c:189:scrub_file_load())
> chome-MDT: reset scrub OI count for format change (LU-16655)
> [11666.036253] Lustre: MGS: Logs for fs chome were removed by user
> request. All servers must be restarted in order to regenerate the logs: rc
> = 0
> [11666.523144] Lustre: chome-MDT: Imperative Recovery not enabled,
> recovery window 300-900
> [11666.594098] LustreError: 3440:0:(mdd_device.c:1355:mdd_prepare())
> chome-MDD: get default LMV of root failed: rc = -2
> [11666.594291] LustreError:
> 3440:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start
> targets: -2
> [11666.594951] Lustre: Failing over chome-MDT
> [11672.868438] Lustre: 3440:0:(client.c:2295:ptlrpc_expire_one_request())
> @@@ Request sent has timed out for slow reply: [sent 1695492248/real
> 1695492248] req@5dfd9b53 x1777852464760768/t0(0)
> o251->MGC192.168.32.240@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl
> 1695492254 ref 2 fl Rpc:XNQr/0/ rc 0/-1 job:''
> [11672.925905] Lustre: server umount chome-MDT complete
> [11672.926036] LustreError: 3440:0:(super25.c:183:lustre_fill_super())
> llite: Unable to mount : rc = -2
> [11872.893970] 

[lustre-discuss] Cannot mount MDT after upgrading from Lustre 2.12.6 to 2.15.3

2023-09-23 Thread Tung-Han Hsieh via lustre-discuss
Dear All,

Today we tried to upgrade Lustre file system from version 2.12.6 to 2.15.3.
But after the work, we cannot mount MDT successfully. Our MDT is ldiskfs
backend. The procedure of upgrade is

1. Install the new version of e2fsprogs-1.47.0
2. Install Lustre-2.15.3
3. After reboot, run: tunefs.lustre --writeconf /dev/md0

Then when mounting MDT, we got the error message in dmesg:

===
[11662.434724] LDISKFS-fs (md0): mounted filesystem with ordered data mode.
Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[11662.584593] Lustre: 3440:0:(scrub.c:189:scrub_file_load())
chome-MDT: reset scrub OI count for format change (LU-16655)
[11666.036253] Lustre: MGS: Logs for fs chome were removed by user
request.  All servers must be restarted in order to regenerate the logs: rc
= 0
[11666.523144] Lustre: chome-MDT: Imperative Recovery not enabled,
recovery window 300-900
[11666.594098] LustreError: 3440:0:(mdd_device.c:1355:mdd_prepare())
chome-MDD: get default LMV of root failed: rc = -2
[11666.594291] LustreError:
3440:0:(obd_mount_server.c:2027:server_fill_super()) Unable to start
targets: -2
[11666.594951] Lustre: Failing over chome-MDT
[11672.868438] Lustre: 3440:0:(client.c:2295:ptlrpc_expire_one_request())
@@@ Request sent has timed out for slow reply: [sent 1695492248/real
1695492248]  req@5dfd9b53 x1777852464760768/t0(0)
o251->MGC192.168.32.240@o2ib@0@lo:26/25 lens 224/224 e 0 to 1 dl 1695492254
ref 2 fl Rpc:XNQr/0/ rc 0/-1 job:''
[11672.925905] Lustre: server umount chome-MDT complete
[11672.926036] LustreError: 3440:0:(super25.c:183:lustre_fill_super())
llite: Unable to mount : rc = -2
[11872.893970] LDISKFS-fs (md0): mounted filesystem with ordered data mode.
Opts: (null)


Could anyone help solve this problem? Sorry, it is really urgent.

Thank you very much.

T.H.Hsieh


Re: [lustre-discuss] How to remove an OST completely

2021-03-08 Thread Tung-Han Hsieh via lustre-discuss
Dear Zeeshan,

In our case, it looks like the removed OST has disappeared. However,
sometimes we feel that some "shadow" of the removed OST still exists in
the Lustre file system.

In our system, running "lctl get_param osc.*.ost_conn_uuid" shows:

osc.chome-OST-osc-MDT.ost_conn_uuid=192.168.32.242@o2ib
osc.chome-OST0001-osc-MDT.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0002-osc-MDT.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0003-osc-MDT.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0008-osc-MDT.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0010-osc-MDT.ost_conn_uuid=192.168.32.241@o2ib
osc.chome-OST0011-osc-MDT.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0012-osc-MDT.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0013-osc-MDT.ost_conn_uuid=192.168.32.243@o2ib
osc.chome-OST0014-osc-MDT.ost_conn_uuid=192.168.32.243@o2ib

Note that OST0008 is the one we removed before. In fact, the server
used to have the following OSTs:

OST0008, OST0009, OST000a, OST000b, OST000c, OST000d

which were all ldiskfs-backend partitions. We wanted to convert them to
ZFS-backend partitions. So, one by one, we locked them to prevent creating
new files, followed the lfs_migrate procedure to move all the data out,
and deactivated them with:

lctl conf_param chome-OST0008-osc-MDT.osc.active=0

and finally unmounted them from the OST server. Whenever an OST was
successfully unmounted, we verified that no data was lost. The whole
process took several months because we kept the whole system in production
without stopping.
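The lock / drain / deactivate sequence described above can be sketched as a dry run. The archive has stripped the real target indices, so the MDT0000 index, the /home mount point, and the OSS mount path below are illustrative assumptions; every command is only echoed.

```shell
#!/bin/sh
# Dry-run sketch of retiring one OST (chome-OST0008) without data loss.
ost_retire_steps() {
    echo "lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0  # lock: no new objects"
    echo "lfs find --obd chome-OST0008_UUID /home | lfs_migrate -y         # drain its files"
    echo "lctl conf_param chome-OST0008-osc-MDT0000.osc.active=0           # deactivate"
    echo "umount /mnt/ost8                      # on the OSS, after verifying the data"
}
ost_retire_steps
```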

After the final one, OST0008, was removed, we rebooted the OST server,
reinstalled Lustre with the ZFS backend, repartitioned the storage,
reformatted the partitions, and remounted them back as OST0011, OST0012,
OST0013, and OST0014. Then, just as we were happily celebrating having
finally finished this complicated and long task, we suddenly found that
the "shadow" of OST0008 was still there. But the other old OSTs seem to
have really disappeared.

It is still unclear why only OST0008 behaves differently. We guess we may
need to shut down the Lustre file system completely, reboot the MDT server,
and probably run "tunefs.lustre --writeconf" for all the MDTs and OSTs in
order to clear out OST0008 completely. But we need to find a chance to do
that, because a lot of users are quite busy on our system.

Best Regards,

T.H.Hsieh


On Mon, Mar 08, 2021 at 11:18:53AM +0300, Zeeshan Ali Shah wrote:
> Dear Tung-Han, even after all the above steps, does the OST still appear? We
> did the same for 3 OSTs in our centre and they disappeared correctly.
> 
> 
> Zeehan
> 
> On Mon, Mar 8, 2021 at 11:15 AM Tung-Han Hsieh <
> thhs...@twcp1.phys.ntu.edu.tw> wrote:
> 
> > Dear Zeeshan,
> >
> > Yes. We did lfs_migrate to move data out of the OST which we are
> > going to remove, and then deactivate it, and then unmount it. We
> > have verified that after the whole process, no data lost, but only
> > the total space of the Lustre file system decreased due to the
> > removed OST.
> >
> > Best Regards,
> >
> > T.H.Hsieh
> >
> >
> > On Mon, Mar 08, 2021 at 11:08:26AM +0300, Zeeshan Ali Shah wrote:
> > > Did you unmount the OST ? remember to lfs_migrate the data otherwise old
> > > data would give errors
> > >
> > > On Fri, Mar 5, 2021 at 11:59 AM Etienne Aujames via lustre-discuss <
> > > lustre-discuss@lists.lustre.org> wrote:
> > >
> > > > Hello,
> > > >
> > > > There is some process/work in progress on the LU-7668 to remove the OST
> > > > directly on the MGS configuration.
> > > >
> > > > In the comment section Andreas describes a way to remove an OST with
> > > > llog_print and llog_cancel (see https://review.whamcloud.com/41449).
> > > >
> > > > Stephane Thiell have submitted a patch (
> > > > https://review.whamcloud.com/41449/) to implement this process
> > directly
> > > > inside a lctl command "del_ost".
> > > >
> > > > This process could be applied live, the changes will take effect only
> > > > after whole system remount (when MGS configuration is read by
> > > > clients/MDT).
> > > >
> > > > This process does not replace the migrate/locking parts.
> > > >
> > > > We tested this process in production, but maybe for now this is bit
> > > > risky. So I recommend to backup the MGS configuration.
> > > >
> > > > Best regards.
> > > >
> > > > Etienne AUJAMES
> > > >
> > > > On Fri, 2021-03-05 at 12:56 +0800, Tung-Han Hsieh via lustre-

Re: [lustre-discuss] How to remove an OST completely

2021-03-08 Thread Tung-Han Hsieh via lustre-discuss
Dear Etienne,

Thank you very much for this information, which is very interesting.

Will it be available in Lustre version 2.14? Hopefully the Lustre manual
can also be updated with the corresponding usage. Then we would be happy
to test it.

Best Regards,

T.H.Hsieh

On Fri, Mar 05, 2021 at 08:59:17AM +, Etienne Aujames wrote:
> Hello,
> 
> There is some process/work in progress on the LU-7668 to remove the OST
> directly on the MGS configuration.
> 
> In the comment section Andreas describes a way to remove an OST with
> llog_print and llog_cancel (see https://review.whamcloud.com/41449).
> 
> Stephane Thiell have submitted a patch (
> https://review.whamcloud.com/41449/) to implement this process directly
> inside a lctl command "del_ost".
> 
> This process could be applied live, the changes will take effect only
> after whole system remount (when MGS configuration is read by
> clients/MDT).
> 
> This process does not replace the migrate/locking parts.
> 
> We tested this process in production, but maybe for now this is bit
> risky. So I recommend to backup the MGS configuration.
> 
> Best regards.
> 
> Etienne AUJAMES
> 
> On Fri, 2021-03-05 at 12:56 +0800, Tung-Han Hsieh via lustre-discuss
> wrote:
> > Dear Angelos,
> > 
> > On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-
> > discuss wrote:
> > > Hi TH,
> > > 
> > > I think you'll have to set max_create_count=2 after step 7
> > > unless you
> > > unmount and remount your MDT.
> > 
> > Yes. You are right. We have to set max_create_count=2 for the
> > replaced
> > OST, otherwise it will not accept newly created files.
> > 
> > > And for step 4, I used conf_param instead of set_param during my
> > > drill and I
> > > noticed this might be more resilient if you are using a HA pair for
> > > the MDT
> > > because the MDS might try to activate the inactive OST during
> > > failover as
> > > set_param is only changing run time option?
> > > 
> > > Regards,
> > > Angelos
> > 
> > I am concerned that, sometimes, the replacement of an OST may take a
> > long time. In between, we may encounter some other events that require
> > rebooting the MDT servers. I am only sure that we can deactivate /
> > reactivate the OST by conf_param when the MDT server has not been
> > rebooted. Once the MDT server is rebooted after setting conf_param=0
> > on the OST, I am not sure whether it can be recovered.
> > 
> > So probably I missed another step. Between step 6 and 7, we need to
> > reactivate the old OST before mounting the new OST ?
> > 
> > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> >option, and set the index to the old OST index (e.g., 0x8):
> >
> > 
> > 6.5. Reactivate the old OST index:
> > 
> >lctl set_param osc.chome-OST0008-osc-MDT.active=1
> > 
> > 7. Mount the new OST (run in the new OST server).
> > 
> > 8. Release the new OST for accepting new objects:
> > 
> >lctl set_param osc.chome-OST0008-osc-
> > MDT.max_create_count=2
> > 
> > 
> > Cheers,
> > 
> > T.H.Hsieh
> > 
> > 
> > > On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > > > Dear Hans,
> > > > 
> > > > Thank you very much. Replacing the OST is new to me and very very
> > > > useful. We will try it next time.
> > > > 
> > > > So, according to the description of the manual, to replace the
> > > > OST
> > > > we probably need to:
> > > > 
> > > > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> > > > create new files (run in the MDT server):
> > > > 
> > > > lctl set_param osc.chome-OST0008-osc-
> > > > MDT.max_create_count=0
> > > > 
> > > > 2. Locate the list of files in the old OST: (e.g., chome-
> > > > OST0008):
> > > > (run in the client):
> > > > 
> > > > lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> > > > 
> > > > 3. Migrate the listed files in /tmp/OST0008.txt out of the old
> > > > OST.
> > > > (run in the client).
> > > > 
> > > > 4. Remove the old OST temporarily (run in the MDT server):
> > > > 
> > > > lctl set_param osc.chome-OST0008-osc-MDT.active=0
> 

Re: [lustre-discuss] How to remove an OST completely

2021-03-08 Thread Tung-Han Hsieh via lustre-discuss
Dear Zeeshan,

Yes. We did lfs_migrate to move data out of the OST which we are
going to remove, and then deactivate it, and then unmount it. We
have verified that after the whole process, no data lost, but only
the total space of the Lustre file system decreased due to the
removed OST.

Best Regards,

T.H.Hsieh


On Mon, Mar 08, 2021 at 11:08:26AM +0300, Zeeshan Ali Shah wrote:
> Did you unmount the OST ? remember to lfs_migrate the data otherwise old
> data would give errors
> 
> On Fri, Mar 5, 2021 at 11:59 AM Etienne Aujames via lustre-discuss <
> lustre-discuss@lists.lustre.org> wrote:
> 
> > Hello,
> >
> > There is some process/work in progress on the LU-7668 to remove the OST
> > directly on the MGS configuration.
> >
> > In the comment section Andreas describes a way to remove an OST with
> > llog_print and llog_cancel (see https://review.whamcloud.com/41449).
> >
> > Stephane Thiell have submitted a patch (
> > https://review.whamcloud.com/41449/) to implement this process directly
> > inside a lctl command "del_ost".
> >
> > This process could be applied live, the changes will take effect only
> > after whole system remount (when MGS configuration is read by
> > clients/MDT).
> >
> > This process does not replace the migrate/locking parts.
> >
> > We tested this process in production, but maybe for now this is bit
> > risky. So I recommend to backup the MGS configuration.
> >
> > Best regards.
> >
> > Etienne AUJAMES
> >
> > On Fri, 2021-03-05 at 12:56 +0800, Tung-Han Hsieh via lustre-discuss
> > wrote:
> > > Dear Angelos,
> > >
> > > On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-
> > > discuss wrote:
> > > > Hi TH,
> > > >
> > > > I think you'll have to set max_create_count=2 after step 7
> > > > unless you
> > > > unmount and remount your MDT.
> > >
> > > Yes. You are right. We have to set max_create_count=2 for the
> > > replaced
> > > OST, otherwise it will not accept newly created files.
> > >
> > > > And for step 4, I used conf_param instead of set_param during my
> > > > drill and I
> > > > noticed this might be more resilient if you are using a HA pair for
> > > > the MDT
> > > > because the MDS might try to activate the inactive OST during
> > > > failover as
> > > > set_param is only changing run time option?
> > > >
> > > > Regards,
> > > > Angelos
> > >
> > > I am concerned that, sometimes, the replacement of an OST may take a
> > > long time. In between, we may encounter some other events that require
> > > rebooting the MDT servers. I am only sure that we can deactivate /
> > > reactivate the OST by conf_param when the MDT server has not been
> > > rebooted. Once the MDT server is rebooted after setting conf_param=0
> > > on the OST, I am not sure whether it can be recovered.
> > >
> > > So probably I missed another step. Between step 6 and 7, we need to
> > > reactivate the old OST before mounting the new OST ?
> > >
> > > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> > >option, and set the index to the old OST index (e.g., 0x8):
> > >
> > >
> > > 6.5. Reactivate the old OST index:
> > >
> > >lctl set_param osc.chome-OST0008-osc-MDT.active=1
> > >
> > > 7. Mount the new OST (run in the new OST server).
> > >
> > > 8. Release the new OST for accepting new objects:
> > >
> > >lctl set_param osc.chome-OST0008-osc-
> > > MDT.max_create_count=2
> > >
> > >
> > > Cheers,
> > >
> > > T.H.Hsieh
> > >
> > >
> > > > On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > > > > Dear Hans,
> > > > >
> > > > > Thank you very much. Replacing the OST is new to me and very very
> > > > > useful. We will try it next time.
> > > > >
> > > > > So, according to the description of the manual, to replace the
> > > > > OST
> > > > > we probably need to:
> > > > >
> > > > > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> > > > > create new files (run in the MDT server):
> > > > >
> > > > > lctl

Re: [lustre-discuss] How to remove an OST completely

2021-03-04 Thread Tung-Han Hsieh via lustre-discuss
Dear Angelos,

On Fri, Mar 05, 2021 at 12:15:19PM +0800, Angelos Ching via lustre-discuss 
wrote:
> Hi TH,
> 
> I think you'll have to set max_create_count=2 after step 7 unless you
> unmount and remount your MDT.

Yes. You are right. We have to set max_create_count=2 for the replaced
OST, otherwise it will not accept newly created files.

> And for step 4, I used conf_param instead of set_param during my drill, and I
> noticed this might be more resilient if you are using an HA pair for the MDT,
> because the MDS might try to activate the inactive OST during failover, as
> set_param only changes a runtime option?
> 
> Regards,
> Angelos

I am concerned that, sometimes, the replacement of the OST may take a
long time. In between we may encounter some other event that requires
rebooting the MDT servers. I am only sure that we can deactivate /
reactivate the OST via conf_param while the MDT server is not rebooted.
Once the MDT server is rebooted after setting conf_param=0 on the OST,
I am not sure whether it can be recovered.

So probably I missed another step: between steps 6 and 7, do we need to
reactivate the old OST before mounting the new one?

6. Prepare the new OST for replacement by mkfs.lustre with --replace
   option, and set the index to the old OST index (e.g., 0x8):
   

6.5. Reactivate the old OST index:

   lctl set_param osc.chome-OST0008-osc-MDT.active=1

7. Mount the new OST (run in the new OST server).

8. Release the new OST for accepting new objects:

   lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=2
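
If it helps, the ordering proposed in steps 6.5-8 can be written as a
dry-run shell sketch: nothing is executed, each command is only printed.
The target names (chome-OST0008 and the MDT name, which appears truncated
in the archive, as does the max_create_count value) are taken from the
thread; the device and mount point in step 7 are hypothetical placeholders:

```shell
#!/bin/sh
# Dry-run of steps 6.5-8: reactivate the old OST index on the MDS before
# mounting the replacement OST, then re-enable object creation on it.
run() { echo "WOULD RUN: $*"; }

run lctl set_param osc.chome-OST0008-osc-MDT.active=1            # 6.5: on the MDS
run mount -t lustre /dev/new_ost_dev /mnt/ost                    # 7: on the new OSS (placeholder paths)
run lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=2  # 8: on the MDS
```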


Cheers,

T.H.Hsieh


> On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear Hans,
> > 
> > Thank you very much. Replacing the OST is new to me and very very
> > useful. We will try it next time.
> > 
> > So, according to the description of the manual, to replace the OST
> > we probably need to:
> > 
> > 1. Lock the old OST (e.g., chome-OST0008) such that it will not
> > create new files (run in the MDT server):
> > 
> > lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0
> > 
> > 2. Locate the list of files in the old OST: (e.g., chome-OST0008):
> > (run in the client):
> > 
> > lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt
> > 
> > 3. Migrate the listed files in /tmp/OST0008.txt out of the old OST.
> > (run in the client).
> > 
> > 4. Remove the old OST temporarily (run in the MDT server):
> > 
> > lctl set_param osc.chome-OST0008-osc-MDT.active=0
> > 
> > (Note: should use "set_param" instead of "conf_param")
> > 
> > 5. Unmount the old OST partition (run in the old OST server)
> > 
> > 6. Prepare the new OST for replacement by mkfs.lustre with --replace
> > option, and set the index to the old OST index (e.g., 0x8):
> > (run in the new OST server)
> > 
> > mkfs.lustre --ost --mgsnode=XX --index=0x8 --replace 
> > 
> > 7. Mount the new OST (run in the new OST server).
> > 
> > 
> > Best Regards,
> > 
> > T.H.Hsieh
> > 
> > 
> > On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via 
> > lustre-discuss wrote:
> > > Hi,
> > > 
> > > The manual describes this:
> > > 
> > > https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> > > 
> > > There is a note telling you that it will still be there, but can be
> > > replaced.
> > > 
> > > Hope you migrated your data away from the OST also. Otherwise you would
> > > have lost it.
> > > 
> > > Cheers,
> > > Hans Henrik
> > > 
> > > On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > > > Dear All,
> > > > 
> > > > Here is a question about how to remove an OST completely without
> > > > restarting the Lustre file system. Our Lustre version is 2.12.6.
> > > > 
> > > > We did the following steps to remove the OST:
> > > > 
> > > > 1. Lock the OST (e.g., chome-OST0008) such that it will not create
> > > > new files (run in the MDT server):
> > > > 
> > > > lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0
> > > > 
> > > > 2. Locate the list of files in the target OST: (e.g., chome-OST0008):
> > > > (run in the client):
> > > > 
> > > > lfs find --obd chome-OST0008_UUID /home
> > > > 
> > > > 3. Remove OST (run in the MDT server):
> > > > lctl conf_param osc.chome-OST

Re: [lustre-discuss] How to remove an OST completely

2021-03-04 Thread Tung-Han Hsieh via lustre-discuss
Dear Hans,

Thank you very much. Replacing the OST is new to me and very very
useful. We will try it next time.

So, according to the description of the manual, to replace the OST
we probably need to:

1. Lock the old OST (e.g., chome-OST0008) such that it will not
   create new files (run in the MDT server):

   lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0

2. Locate the list of files in the old OST: (e.g., chome-OST0008):
   (run in the client):

   lfs find --obd chome-OST0008_UUID /home > /tmp/OST0008.txt

3. Migrate the listed files in /tmp/OST0008.txt out of the old OST.
   (run in the client).

4. Remove the old OST temporarily (run in the MDT server):

   lctl set_param osc.chome-OST0008-osc-MDT.active=0

   (Note: should use "set_param" instead of "conf_param")

5. Unmount the old OST partition (run in the old OST server)

6. Prepare the new OST for replacement by mkfs.lustre with --replace
   option, and set the index to the old OST index (e.g., 0x8):
   (run in the new OST server)

   mkfs.lustre --ost --mgsnode=XX --index=0x8 --replace 

7. Mount the new OST (run in the new OST server).
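
Step 3 above leaves the actual migration command implicit; here is a
self-contained dry-run sketch, assuming the step-2 listing was saved as
/tmp/OST0008.txt. The sample paths are invented for illustration, and
lfs_migrate is the helper script shipped with Lustre; drop the echo to
actually migrate:

```shell
#!/bin/sh
# Dry-run of step 3: feed every path from the step-2 listing to
# lfs_migrate. 'echo' keeps this from touching a live filesystem.
LIST=/tmp/OST0008.txt
# Build a tiny stand-in listing so the sketch runs on its own.
printf '%s\n' /home/userA/data.bin /home/userB/out.log > "$LIST"

migrate_dry() {
    while IFS= read -r f; do
        echo "lfs_migrate -y $f"   # -y: skip the per-file confirmation prompt
    done < "$1"
}
migrate_dry "$LIST"
```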


Best Regards,

T.H.Hsieh


On Thu, Mar 04, 2021 at 04:59:54PM +0100, Hans Henrik Happe via lustre-discuss 
wrote:
> Hi,
> 
> The manual describes this:
> 
> https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost
> 
> There is a note telling you that it will still be there, but can be
> replaced.
> 
> Hope you migrated your data away from the OST also. Otherwise you would
> have lost it.
> 
> Cheers,
> Hans Henrik
> 
> On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear All,
> >
> > Here is a question about how to remove an OST completely without
> > restarting the Lustre file system. Our Lustre version is 2.12.6.
> >
> > We did the following steps to remove the OST:
> >
> > 1. Lock the OST (e.g., chome-OST0008) such that it will not create
> >new files (run in the MDT server):
> >
> >lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0
> >
> > 2. Locate the list of files in the target OST: (e.g., chome-OST0008):
> >(run in the client):
> >
> >lfs find --obd chome-OST0008_UUID /home
> >
> > 3. Remove OST (run in the MDT server):
> >lctl conf_param osc.chome-OST0008-osc-MDT.active=0
> >
> > 4. Unmount the OST partition (run in the OST server)
> >
> > After that, the total size of the Lustre file system decreased, and
> > everything looks fine. However, without restarting (i.e., rebooting the
> > Lustre MDT / OST servers), it still seems that the removed OST
> > still exists. For example, on the MDT:
> >
> > # lctl get_param osc.*.active
> > osc.chome-OST-osc-MDT.active=1
> > osc.chome-OST0001-osc-MDT.active=1
> > osc.chome-OST0002-osc-MDT.active=1
> > osc.chome-OST0003-osc-MDT.active=1
> > osc.chome-OST0008-osc-MDT.active=0
> > osc.chome-OST0010-osc-MDT.active=1
> > osc.chome-OST0011-osc-MDT.active=1
> > osc.chome-OST0012-osc-MDT.active=1
> > osc.chome-OST0013-osc-MDT.active=1
> > osc.chome-OST0014-osc-MDT.active=1
> >
> > We still see chome-OST0008. And in dmesg of MDT, we see a lot of:
> >
> > LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) 
> > chome-OST0008-osc-MDT:osp_attr_get update error 
> > [0x10008:0x10a54c:0x0]: rc = -108
> >
> > In addition, when running LFSCK in the MDT server:
> >
> > lctl lfsck_start -A
> >
> > even after all the work on the MDT and OSTs has completed, we still see
> > (run in the MDT server):
> >
> > lctl get_param mdd.*.lfsck_layout
> >
> > the status is not completed:
> >
> > mdd.chome-MDT.lfsck_layout=
> > name: lfsck_layout
> > magic: 0xb1732fed
> > version: 2
> > status: partial
> > flags: incomplete
> > param: all_targets
> > last_completed_time: 1614762495
> > time_since_last_completed: 4325 seconds
> > 
> >
> > We suspect that the "incomplete" part might be due to the already removed
> > chome-OST0008.
> >
> > Is there any way to completely remove chome-OST0008 from the Lustre
> > file system, given that the OST device has already been reformatted for
> > other use?
> >
> > Thanks very much.
> >
> >
> > T.H.Hsieh
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] How to remove an OST completely

2021-03-03 Thread Tung-Han Hsieh via lustre-discuss
Dear All,

Here is a question about how to remove an OST completely without
restarting the Lustre file system. Our Lustre version is 2.12.6.

We did the following steps to remove the OST:

1. Lock the OST (e.g., chome-OST0008) such that it will not create
   new files (run in the MDT server):

   lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0

2. Locate the list of files in the target OST: (e.g., chome-OST0008):
   (run in the client):

   lfs find --obd chome-OST0008_UUID /home

3. Remove OST (run in the MDT server):
   lctl conf_param osc.chome-OST0008-osc-MDT.active=0

4. Unmount the OST partition (run in the OST server)
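
For reference, the four steps above can be replayed as a dry-run shell
sketch that only prints each command. Target names follow this message
(the MDT name appears truncated in the archive); /mnt/ost is a
hypothetical mount point:

```shell
#!/bin/sh
# Dry-run of the OST removal steps 1-4; nothing is executed.
run() { echo "WOULD RUN: $*"; }

run lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0  # 1: on the MDS, stop new objects
run lfs find --obd chome-OST0008_UUID /home                      # 2: on a client, list affected files
run lctl conf_param osc.chome-OST0008-osc-MDT.active=0           # 3: on the MGS/MDS, deactivate
run umount /mnt/ost                                              # 4: on the OSS (placeholder mount point)
```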

After that, the total size of the Lustre file system decreased, and
everything looks fine. However, without restarting (i.e., rebooting the
Lustre MDT / OST servers), it still seems that the removed OST
still exists. For example, on the MDT:

# lctl get_param osc.*.active
osc.chome-OST-osc-MDT.active=1
osc.chome-OST0001-osc-MDT.active=1
osc.chome-OST0002-osc-MDT.active=1
osc.chome-OST0003-osc-MDT.active=1
osc.chome-OST0008-osc-MDT.active=0
osc.chome-OST0010-osc-MDT.active=1
osc.chome-OST0011-osc-MDT.active=1
osc.chome-OST0012-osc-MDT.active=1
osc.chome-OST0013-osc-MDT.active=1
osc.chome-OST0014-osc-MDT.active=1

We still see chome-OST0008. And in dmesg of MDT, we see a lot of:

LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) 
chome-OST0008-osc-MDT:osp_attr_get update error [0x10008:0x10a54c:0x0]: 
rc = -108
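
The lingering, deactivated target can also be picked out of the
lctl get_param osc.*.active output mechanically. A small self-contained
sketch, run over sample lines copied from the listing above:

```shell
#!/bin/sh
# List OSC devices whose 'active' flag is 0, from saved get_param output.
cat > /tmp/osc_active.txt <<'EOF'
osc.chome-OST0003-osc-MDT.active=1
osc.chome-OST0008-osc-MDT.active=0
osc.chome-OST0010-osc-MDT.active=1
EOF
# Split each line on '.' and '='; field 2 is the target, the last field
# is the active flag.
inactive="$(awk -F'[.=]' '$NF == 0 {print $2}' /tmp/osc_active.txt)"
echo "inactive targets: $inactive"
```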

In addition, when running LFSCK in the MDT server:

lctl lfsck_start -A

even after all the work on the MDT and OSTs has completed, we still see
(run in the MDT server):

lctl get_param mdd.*.lfsck_layout

the status is not completed:

mdd.chome-MDT.lfsck_layout=
name: lfsck_layout
magic: 0xb1732fed
version: 2
status: partial
flags: incomplete
param: all_targets
last_completed_time: 1614762495
time_since_last_completed: 4325 seconds


We suspect that the "incomplete" part might be due to the already removed
chome-OST0008.

Is there any way to completely remove chome-OST0008 from the Lustre
file system, given that the OST device has already been reformatted for
other use?

Thanks very much.


T.H.Hsieh
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Cannot move data after upgrading to Lustre 2.12.6

2021-02-25 Thread Tung-Han Hsieh via lustre-discuss
Dear Cory,

Thank you very much for your reply, and sorry for my delayed report;
these days we did some tests to narrow down the problem.

I am not sure whether the "mv" problem we found is related to the
bug report:

https://jira.whamcloud.com/browse/LU-13392

or not. But these days we tried running LFSCK, which helped a lot.
Here I report our test results:

1. Create Lustre file system 1.8.8 (ldiskfs based), and store some
   data trees in it.

2. Upgrade to Lustre file system 2.12.6 (still ldiskfs based).

3. Enter the Lustre file system mount point (in the client side),
   running:

mv dir1/file dir2/

   failed with

mv: cannot move 'dir1/file' to 'dir2/file': No data available

4. Running LFSCK by the following two ways:

   - LFSCK runs for all MDT and OSTs:
 lctl lfsck_start -A

   - LFSCK runs for only the MDT:
 lctl lfsck_start -M 

   No matter which way, the above "mv" problem is fixed. In addition,
   after running LFSCK for all MDT and OSTs, we checked the final
   reports of LFSCK via:

   - In MDT server:
lctl get_param -n mdd.*.lfsck_namespace
lctl get_param -n mdd.*.lfsck_layout

   - In OST server:
lctl get_param -n obdfilter.*.lfsck_layout

   and saw that only the MDT had repair records (in failed_phase1, dirent_repaired,
   and linkea_repaired); the OSTs did not. So we conclude that the "mv" problem
   only occurs on the MDT. As a result, running LFSCK at least on the MDT devices
   would be an important SOP after upgrading a Lustre file system.

5. However, there is one "mv" problem still remaining. Suppose in the
   client side the Lustre file system is mounted at /lustre, then

mv /lustre/file /lustre/dir/

   still failed. That is, we still cannot move a file under the "ROOT"
   of the Lustre file system into other sub-directories. It seems that the
   LFSCK fixes overlooked the Lustre "ROOT" directory itself.
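
A quick way to check mechanically for the "partial / incomplete" state
after an LFSCK run is to parse the lfsck_layout report. A minimal sketch
over a saved sample shaped like the report quoted in this thread:

```shell
#!/bin/sh
# Check whether an lfsck_layout report reached status "completed".
# The sample fields mirror the report format quoted in this thread.
cat > /tmp/lfsck_layout.txt <<'EOF'
name: lfsck_layout
status: partial
flags: incomplete
EOF
status="$(awk '$1 == "status:" {print $2}' /tmp/lfsck_layout.txt)"
if [ "$status" = "completed" ]; then
    echo "lfsck_layout OK"
else
    echo "lfsck_layout NOT complete: status=$status"
fi
```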

So far we still do not find any way to fix the final problem mentioned
in 5. Any idea is very welcome. If there is no further input, we will
probably report this as a bug to the Jira Lustre Bug tracking website.

Cheers,

T.H.Hsieh


On Mon, Feb 22, 2021 at 01:57:37PM +, Spitz, Cory James wrote:
> Hello, T.H.Hsieh.
> 
> Your report sounds familiar to me.  Although you are concerned about upgrades 
> from 1.8.x, there were some other troubles reported when updating from 
> earlier 2.x.  You might want to take a closer look at 
> https://jira.whamcloud.com/browse/LU-13392.  I didn’t review it deeply and 
> maybe it isn’t even closely related to your trouble, but you may find it 
> helpful.  In any case since you seem so willing to experiment, I’m curious 
> what happens if you run LFSCK.  LFSCK ought to be able to add and check 
> FID-in-dirent and linkEA entries, both of which won’t exist in a 1.8.x 
> filesystem.  I think Xyratex even released an upgrade tool to make these 
> sorts of updates prior to mounting under 2.x for the first time.
> 
> -Cory
> 
> On 2/22/21, 1:22 AM, "lustre-discuss" 
>  wrote:
> 
> 
> Dear All,
> 
> After some tests in these days, I want to report what I have found
> about the "moving data error" in more detail.
> 
> As long as the Lustre file system was upgraded from the very old version
> 1.8.8 to 2.12.6, the problem appears, where the MDT is ldiskfs based.
> Probably nobody cares about a version as old as 1.8.8, but in case some
> people encounter a similar scenario, this message may provide some useful
> information.
> 
> The problem I have found is: For any directories A/ and B/ created under
> Lustre-1.8.8, then after upgrading to Lustre-2.12.6, running the following
> "mv" command:
> 
> mv A/file B/
> 
> i.e., moving a file from A/ to B/, there is an error message and file
> moving failed:
> 
>  mv: cannot move 'A/file' to 'B/file': No data available
> 
> 
> I tested the following upgrade procedures:
> 
> 1. Lustre-1.8.8 -> Lustre-2.10.7 -> Lustre-2.12.6 (has problem)
>- Lustre file system created with Lustre-1.8.8, and directories A/ and
>  B/ are stored in the Lustre file system (A/ and B/ have some files).
> 
>- Lustre-1.8.8 -> Lustre-2.10.7:
>  After installing 2.10.7 of Lustre software and corresponding e2fsprogs:
>  $ tunefs.lustre --writeconf /dev/sda1  (the MDT partition)
>  $ tunefs.lustre --writeconf /dev/sda2  (the OST partition)
>  $ tune2fs -O dirdata /dev/sda1
>  $ tune2fs -O dirdata /dev/sda2
> 
>  Then mounting Lustre file system in the client, no problem at all.
> 
>- Lustre-2.10.7 -> Lustre-2.12.6:
>  After installing 2.12.6 of Lustre software and corresponding e2fsprogs:
>  $ tunefs.lustre --writeconf /dev/sda1  (the MDT partition)
>  $ tunefs.lustre --writeconf /dev/sda2  (the OST partition)
> 
>  Then mounting Lustre file system in the client, the "mv" problem 
> appeared.
> 
> 2. Lustre-1.8.8 -> Lustre-2.12.6 (has problem)
>- Lustre file