On one of our filesystems, we add a few new OSTs almost every month with no downtime, which is very convenient. The only thing I would recommend is to avoid doing this during a peak of I/O on your filesystem (we usually do it as early as possible in the morning), as the added OSTs will immediately receive a heavy I/O load, likely because they are empty.

Best,
Stephane

> On Feb 22, 2019, at 2:03 PM, Andreas Dilger <adil...@whamcloud.com> wrote:
>
> This is not really correct.
>
> Lustre clients can handle the addition of OSTs to a running filesystem. The
> MGS will register the new OSTs, and the clients will be notified by the MGS
> that the OSTs have been added, so no need to unmount the clients during this
> process.
>
> Cheers, Andreas
>
> On Feb 21, 2019, at 19:23, Raj <rajgau...@gmail.com> wrote:
>
>> Hello Raj,
>> It’s best and safe to unmount from all the clients and then do the upgrade.
>> Your FS is getting more OSTs and changing conf in the existing ones, so your
>> client needs to get the new layout by remounting it.
>> Also, you mentioned client eviction: during eviction the client has to
>> drop its dirty pages, and all the open file descriptors in the FS will be
>> gone.
>>
>> On Thu, Feb 21, 2019 at 12:25 PM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> What can I expect to happen to the jobs that are suspended during the file
>> system restart?
>> Will the processes holding an open file handle die when I unsuspend them
>> after the filesystem restart?
>>
>> Thanks!
>> -Raj
>>
>>
>> On Thu, Feb 21, 2019 at 12:52 PM Colin Faber <cfa...@gmail.com> wrote:
>> Ah yes,
>>
>> If you're adding to an existing OSS, then you will need to reconfigure the
>> file system, which requires a writeconf event.
>>
>> On Thu, Feb 21, 2019 at 10:00 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> The new OSTs will be added to the existing file system (the OSS nodes are
>> already part of the filesystem), so I will have to re-configure the current HA
>> resource configuration to tell it about the 4 new OSTs.
>> Our exascaler's HA monitors the individual OSTs and I need to re-configure
>> the HA on the existing filesystem.
>>
>> Our vendor support has confirmed that we would have to restart the
>> filesystem if we want to regenerate the HA configs to include the new OSTs.
>>
>> Thanks,
>> -Raj
>>
>>
>> On Thu, Feb 21, 2019 at 11:23 AM Colin Faber <cfa...@gmail.com> wrote:
>> It seems to me that steps may still be missing?
>>
>> You're going to rack/stack and provision the OSS nodes with new OSTs.
>>
>> Then you're going to introduce failover options somewhere? new osts?
>> existing system? etc?
>>
>> If you're introducing failover with the new OSTs and leaving the existing
>> system in place, you should be able to accomplish this without bringing the
>> system offline.
>>
>> If you're going to be introducing failover to your existing system then you
>> will need to reconfigure the file system to accommodate the new failover
>> settings (failover nodes, etc.)
>>
>> -cf
>>
>>
>> On Thu, Feb 21, 2019 at 9:13 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> Our upgrade strategy is as follows:
>>
>> 1) Load all disks into the storage array.
>> 2) Create RAID pools and virtual disks.
>> 3) Create lustre file system using mkfs.lustre command. (I still have to
>> figure out all the parameters used on the existing OSTs).
>> 4) Create mount points on all OSSs.
>> 5) Mount the lustre OSTs.
>> 6) Maybe rebalance the filesystem.
>> My understanding is that the above can be done without bringing the
>> filesystem down. I want to create the HA configuration (corosync and
>> pacemaker) for the new OSTs. This step requires the filesystem to be down. I
>> want to know what would happen to the suspended processes across the cluster
>> when I bring the filesystem down to re-generate the HA configs.
>>
>> Thanks,
>> -Raj
>>
>> On Thu, Feb 21, 2019 at 12:59 AM Colin Faber <cfa...@gmail.com> wrote:
>> Can you provide more details on your upgrade strategy? In some cases
>> expanding your storage shouldn't impact client / job activity at all.
>>
>> On Wed, Feb 20, 2019, 11:09 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> Hello,
>>
>> We are planning on expanding our storage by adding more OSTs to our lustre
>> file system. It looks like it would be easier to expand if we bring the
>> filesystem down and perform the necessary operations. We are planning to
>> suspend all the jobs running on the cluster. We originally planned to add
>> new OSTs to the live filesystem.
>>
>> We are trying to determine the potential impact to the suspended jobs if we
>> bring down the filesystem for the upgrade.
>> One of the questions we have is what would happen to the suspended processes
>> that hold an open file handle in the lustre file system when the filesystem
>> is brought down for the upgrade?
>> Will they recover from the client eviction?
>>
>> We do have vendor support and have engaged them. I wanted to ask the
>> community and get some feedback.
>>
>> Thanks,
>> -Raj
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
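
As a rough illustration of the rebalancing concern raised in this thread (newly added OSTs start out empty while the older ones are already filled), here is a minimal Python sketch that parses `lfs df`-style text and reports how full each OST is. The sample text, the filesystem name `testfs`, the 10% threshold, and the function names `parse_ost_usage` and `emptiest_osts` are all made up for illustration, and the column layout is assumed to match what `lfs df` prints on your Lustre version, so check it against real output before relying on it.

```python
import re

# Hypothetical sample in the style of `lfs df` output; all values are made up.
SAMPLE = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
testfs-MDT0000_UUID      4705408       49600     4340832   1% /mnt/testfs[MDT:0]
testfs-OST0000_UUID    100000000    80000000    20000000  80% /mnt/testfs[OST:0]
testfs-OST0001_UUID    100000000    75000000    25000000  75% /mnt/testfs[OST:1]
testfs-OST0002_UUID    100000000     1000000    99000000   1% /mnt/testfs[OST:2]
"""

# One OST row: target name, then total/used/available in 1K blocks.
# Assumption: OST targets are named <fsname>-OSTxxxx_UUID with a 4-digit hex index.
OST_ROW = re.compile(r"^(\S+-OST[0-9a-fA-F]{4})_UUID\s+(\d+)\s+(\d+)\s+(\d+)\s")


def parse_ost_usage(text):
    """Return {ost_name: used_fraction} for every OST row found in the text."""
    usage = {}
    for line in text.splitlines():
        match = OST_ROW.match(line)
        if match:
            name = match.group(1)
            total = int(match.group(2))
            used = int(match.group(3))
            if total > 0:
                usage[name] = used / total
    return usage


def emptiest_osts(usage, threshold=0.10):
    """List OSTs whose used fraction is below the threshold, sorted by name."""
    return sorted(name for name, frac in usage.items() if frac < threshold)


if __name__ == "__main__":
    usage = parse_ost_usage(SAMPLE)
    for name, frac in sorted(usage.items()):
        print(f"{name}: {frac:.0%} used")
    print("nearly empty:", emptiest_osts(usage))
```

On a real client you would feed the function the captured output of `lfs df <mountpoint>` instead of `SAMPLE`. A large gap between the old and new OSTs is one hint that new writes will concentrate on the new targets for a while, which is why adding them during a quiet period is suggested above.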