Got it. I'd rather be safe than sorry. This is my first time doing a Lustre configuration change.
Raj

On Thu, Feb 21, 2019, 11:55 PM Raj <rajgau...@gmail.com> wrote:

> I also agree with Colin's comment.
> If the current OSTs are not touched, and you are only adding new OSTs to
> existing OSS nodes and adding new ost-mount resources to your existing
> (already running) Pacemaker configuration, you can achieve the upgrade with
> no downtime. If your Corosync/Pacemaker configuration is working correctly,
> you can fail over and fail back, taking turns rebooting each OSS node. But
> the chance of human error is high in doing this.
>
> On Thu, Feb 21, 2019 at 10:30 PM Raj Ayyampalayam <ans...@gmail.com> wrote:
>
>> Hi Raj,
>>
>> Thanks for the explanation. We will have to rethink our upgrade process.
>>
>> Thanks again.
>> Raj
>>
>> On Thu, Feb 21, 2019, 10:23 PM Raj <rajgau...@gmail.com> wrote:
>>
>>> Hello Raj,
>>> It's best and safest to unmount the filesystem from all the clients and
>>> then do the upgrade. Your FS is getting more OSTs and the configuration
>>> of the existing ones is changing, so your clients need to pick up the
>>> new layout by remounting it.
>>> You also mentioned client eviction: during an eviction the client has to
>>> drop its dirty pages, and all the open file descriptors in the FS will
>>> be gone.
>>>
>>> On Thu, Feb 21, 2019 at 12:25 PM Raj Ayyampalayam <ans...@gmail.com> wrote:
>>>
>>>> What can I expect to happen to the jobs that are suspended during the
>>>> file system restart?
>>>> Will the processes holding an open file handle die when I unsuspend
>>>> them after the filesystem restart?
>>>>
>>>> Thanks!
>>>> -Raj
>>>>
>>>> On Thu, Feb 21, 2019 at 12:52 PM Colin Faber <cfa...@gmail.com> wrote:
>>>>
>>>>> Ah yes,
>>>>>
>>>>> If you're adding to an existing OSS, then you will need to reconfigure
>>>>> the file system, which requires a writeconf event.
>>>>>
>>>>> On Thu, Feb 21, 2019 at 10:00 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>>>>>
>>>>>> The new OSTs will be added to the existing file system (the OSS nodes
>>>>>> are already part of the filesystem), so I will have to reconfigure the
>>>>>> current HA resource configuration to tell it about the 4 new OSTs.
>>>>>> Our ExaScaler's HA monitors the individual OSTs, and I need to
>>>>>> reconfigure the HA on the existing filesystem.
>>>>>>
>>>>>> Our vendor support has confirmed that we would have to restart the
>>>>>> filesystem if we want to regenerate the HA configs to include the new
>>>>>> OSTs.
>>>>>>
>>>>>> Thanks,
>>>>>> -Raj
>>>>>>
>>>>>> On Thu, Feb 21, 2019 at 11:23 AM Colin Faber <cfa...@gmail.com> wrote:
>>>>>>
>>>>>>> It seems to me that steps may still be missing?
>>>>>>>
>>>>>>> You're going to rack/stack and provision the OSS nodes with the new
>>>>>>> OSTs.
>>>>>>>
>>>>>>> Then you're going to introduce failover options somewhere? On the
>>>>>>> new OSTs? On the existing system? Etc.?
>>>>>>>
>>>>>>> If you're introducing failover with the new OSTs and leaving the
>>>>>>> existing system in place, you should be able to accomplish this
>>>>>>> without bringing the system offline.
>>>>>>>
>>>>>>> If you're going to be introducing failover to your existing system,
>>>>>>> then you will need to reconfigure the file system to accommodate the
>>>>>>> new failover settings (failover nodes, etc.).
>>>>>>>
>>>>>>> -cf
>>>>>>>
>>>>>>> On Thu, Feb 21, 2019 at 9:13 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Our upgrade strategy is as follows:
>>>>>>>>
>>>>>>>> 1) Load all disks into the storage array.
>>>>>>>> 2) Create RAID pools and virtual disks.
>>>>>>>> 3) Create the new Lustre OSTs using the mkfs.lustre command. (I
>>>>>>>> still have to figure out all the parameters used on the existing
>>>>>>>> OSTs.)
>>>>>>>> 4) Create mount points on all OSSs.
>>>>>>>> 5) Mount the Lustre OSTs.
>>>>>>>> 6) Maybe rebalance the filesystem.
>>>>>>>>
>>>>>>>> My understanding is that the above can be done without bringing the
>>>>>>>> filesystem down. I then want to create the HA configuration
>>>>>>>> (Corosync and Pacemaker) for the new OSTs; this step requires the
>>>>>>>> filesystem to be down. I want to know what would happen to the
>>>>>>>> suspended processes across the cluster when I bring the filesystem
>>>>>>>> down to regenerate the HA configs.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> -Raj
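
[For concreteness, a minimal sketch of steps 3)-5) above plus the
corresponding ost-mount Pacemaker resource. Every value is a placeholder
(device path, fsname, OST index, MGS and service-node NIDs, mount point,
resource name), and a generic ocf:heartbeat:Filesystem agent is shown; the
real parameters must match the existing OSTs (tunefs.lustre --dryrun on an
existing target prints them) and the agent the ExaScaler HA stack actually
uses may differ.

  # Format one of the new OSTs; fsname, index, and NIDs below are
  # placeholders and must match the existing filesystem.
  mkfs.lustre --ost --fsname=lfs0 --index=16 \
      --mgsnode=10.0.0.1@o2ib \
      --servicenode=10.0.0.11@o2ib --servicenode=10.0.0.12@o2ib \
      /dev/mapper/ost0016

  # Create the mount point and do a one-off manual mount on the primary
  # OSS to verify the new target registers with the MGS.
  mkdir -p /lustre/ost0016
  mount -t lustre /dev/mapper/ost0016 /lustre/ost0016
  umount /lustre/ost0016

  # Hand the target over to Pacemaker as a new ost-mount resource
  # (hypothetical resource name; location/colocation constraints omitted).
  pcs resource create lfs0-ost0016 ocf:heartbeat:Filesystem \
      device=/dev/mapper/ost0016 directory=/lustre/ost0016 fstype=lustre \
      op monitor interval=120s timeout=120s

Declaring both OSS peers with --servicenode at format time is what lets the
ost-mount resource fail over later without touching the target's
configuration again.]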
>>>>>>>> On Thu, Feb 21, 2019 at 12:59 AM Colin Faber <cfa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Can you provide more details on your upgrade strategy? In some
>>>>>>>>> cases expanding your storage shouldn't impact client / job
>>>>>>>>> activity at all.
>>>>>>>>>
>>>>>>>>> On Wed, Feb 20, 2019, 11:09 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> We are planning on expanding our storage by adding more OSTs to
>>>>>>>>>> our Lustre file system. It looks like it would be easier to
>>>>>>>>>> expand if we bring the filesystem down and perform the necessary
>>>>>>>>>> operations. We are planning to suspend all the jobs running on
>>>>>>>>>> the cluster. We originally planned to add the new OSTs to the
>>>>>>>>>> live filesystem.
>>>>>>>>>>
>>>>>>>>>> We are trying to determine the potential impact to the suspended
>>>>>>>>>> jobs if we bring down the filesystem for the upgrade.
>>>>>>>>>> One of the questions we have is: what would happen to the
>>>>>>>>>> suspended processes that hold an open file handle in the Lustre
>>>>>>>>>> file system when the filesystem is brought down for the upgrade?
>>>>>>>>>> Will they recover from the client eviction?
>>>>>>>>>>
>>>>>>>>>> We do have vendor support and have engaged them. I wanted to ask
>>>>>>>>>> the community and get some feedback.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Raj
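
[For reference, the full-shutdown path the vendor describes maps roughly
onto the writeconf procedure from the Lustre manual. A sketch only, with
placeholder device names and mount points; on an ExaScaler the exact
sequence should come from the vendor's own tooling.

  # 1) Unmount every client.
  umount -a -t lustre              # on each client

  # 2) Stop all targets: OSTs first, then MDT(s), then the MGS
  #    (or let Pacemaker stop the resources).
  umount /lustre/ost*              # on each OSS
  umount /lustre/mdt0              # on the MDS
  umount /lustre/mgs               # on the MGS, if it is separate

  # 3) Regenerate the configuration logs on every target,
  #    MGS/MDT first, then all OSTs, old and new.
  tunefs.lustre --writeconf /dev/mapper/mdt0
  tunefs.lustre --writeconf /dev/mapper/ost0000
  # ... repeat for every OST ...

  # 4) Remount in order: MGS, MDT(s), all OSTs, then the clients.

After the restart, the "maybe rebalance" step is typically done with
lfs_migrate on existing files so that old data, not just new writes, spreads
across the added OSTs.]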
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org