Re: [lustre-discuss] Suspended jobs and rebooting lustre servers
Can you provide more details on your upgrade strategy? In some cases expanding your storage shouldn't impact client / job activity at all.

On Wed, Feb 20, 2019, 11:09 AM Raj Ayyampalayam wrote:
> Hello,
>
> We are planning on expanding our storage by adding more OSTs to our lustre
> file system. It looks like it would be easier to expand if we bring the
> filesystem down and perform the necessary operations. We are planning to
> suspend all the jobs running on the cluster. We originally planned to add
> new OSTs to the live filesystem.
>
> We are trying to determine the potential impact to the suspended jobs if
> we bring down the filesystem for the upgrade.
> One of the questions we have is what would happen to the suspended
> processes that hold an open file handle in the lustre file system when the
> filesystem is brought down for the upgrade?
> Will they recover from the client eviction?
>
> We do have vendor support and have engaged them. I wanted to ask the
> community and get some feedback.
>
> Thanks,
> -Raj

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Suspended jobs and rebooting lustre servers
Hi Raj,

You can add the OSTs online; we have been doing that. If you are expanding the storage array, though, think about what is physically involved, such as cabling, which depends on the recommendation from your storage vendor. We added an expansion to every Dell storage array last year, and because of the physical location of that storage we needed a full shutdown: we created a maintenance reservation and shut down the whole filesystem.

On many occasions when we performed Lustre maintenance we suspended jobs, but only when we knew the filesystem would stay online. Some clients might get evicted during the failover, but they reconnect when the jobs are resumed.

In your case, if you want to do a full filesystem shutdown, you will have to unmount all the Lustre clients, which means the jobs will need to be killed so that the filesystem can be unmounted. We always use cat /proc/sys/lnet/peers or lshowmount to make sure that no other clients are connected before doing the full FS shutdown.

Hope it helps.

--
Gin Tan
MASSIVE support and consulting services
Monash eResearch Centre
Monash University
15 Innovation Walk
Ground Floor, Room G22
Clayton Campus
Wellington Road
Clayton VIC 3800
Australia
T: +61 3 9902 0245
E: gin@monash.edu
Z: https://monash.zoom.us/my/gintan
www.monash.edu.au/eresearch
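The pre-shutdown check described above (looking at the LNet peer list to confirm nobody is still connected) could be scripted roughly as follows. This is only a sketch: the function name `remote_peers`, the `server_nids` parameter, and the sample NIDs are all hypothetical, and the column layout of the peers file is assumed to be a header row starting with `nid` followed by one peer per line. The peer list can also contain the other Lustre servers, so the sketch lets you exclude NIDs you already know belong to servers.

```python
def remote_peers(peers_text, server_nids=()):
    """Return the NIDs found in LNet peers output.

    Skips the header row, the loopback NID (anything ending in "@lo"),
    and any NID listed in server_nids (known servers, not clients).
    """
    known_servers = set(server_nids)
    nids = []
    for line in peers_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "nid":
            continue  # blank line or header row
        nid = fields[0]
        if nid.endswith("@lo") or nid in known_servers:
            continue
        nids.append(nid)
    return nids


# Hypothetical sample; real input would be the text read from the
# peers file mentioned in the message above.
SAMPLE = """\
nid                      refs state  last   max   rtr   min    tx   min queue
0@lo                        1    NA    -1     0     0     0     0     0 0
10.0.0.21@o2ib              1    NA    -1     8     8     8     8     6 0
10.0.0.22@o2ib              1    NA    -1     8     8     8     8     7 0
"""

if __name__ == "__main__":
    leftover = remote_peers(SAMPLE, server_nids={"10.0.0.21@o2ib"})
    if leftover:
        print("Peers still listed:", ", ".join(leftover))
    else:
        print("No remaining peers listed")
```

An empty result is only a hint that clients have gone away; peer entries may linger or be missing depending on the Lustre version, so it seems sensible to cross-check with lshowmount as suggested above.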
[lustre-discuss] Suspended jobs and rebooting lustre servers
Hello,

We are planning on expanding our storage by adding more OSTs to our lustre file system. It looks like it would be easier to expand if we bring the filesystem down and perform the necessary operations. We are planning to suspend all the jobs running on the cluster. We originally planned to add new OSTs to the live filesystem.

We are trying to determine the potential impact to the suspended jobs if we bring down the filesystem for the upgrade. One of the questions we have is what would happen to the suspended processes that hold an open file handle in the lustre file system when the filesystem is brought down for the upgrade? Will they recover from the client eviction?

We do have vendor support and have engaged them. I wanted to ask the community and get some feedback.

Thanks,
-Raj
Re: [lustre-discuss] Migrate MGS to ZFS
Thank you Andreas.

I will try to migrate the MGS according to my previous idea, based on the Lustre Operations Manual section on separating a combined MDT/MGS.

I agree that a dd backup of the current combined MDT/MGS is mandatory before trying to perform the migration.

Regards.

=
Fernando Pérez
Institut de Ciències del Mar (CSIC)
Departament Oceanografía Física i Tecnològica
Passeig Marítim de la Barceloneta,37-49
08003 Barcelona
Phone: (+34) 93 230 96 35
=

On 2/20/19 4:33 AM, Andreas Dilger wrote:
> PS: it is always a good idea to make a backup of your MDT, since it is
> relatively small compared to the rest of the filesystem. A full-device
> "dd" copy doesn't take too long and is the most accurate backup for
> ldiskfs.
>
> Cheers, Andreas
>
> On Feb 19, 2019, at 19:31, Andreas Dilger wrote:
>> Yes, it is possible to migrate the MGS files to another device as you
>> propose. I don't think there is any particular difference if you move
>> it to a separate ldiskfs or ZFS target. One caveat is that we don't
>> test combined ZFS and ldiskfs targets on the same node, though in
>> theory it would work.
>>
>> Migrating the MDT from ldiskfs to ZFS is also possible with newer
>> versions of Lustre (2.12 for sure, I don't recall if it is in 2.10 or
>> not). You need to follow a special process to do this, please see the
>> Lustre Operations Manual for details.
>>
>> Cheers, Andreas
>>
>> On Feb 19, 2019, at 17:48, Fernando Pérez wrote:
>>> Dear lustre experts.
>>>
>>> What is the best way to migrate an MGS device to ZFS? Copy the
>>> CONFIGS/filesystem_name-* files from the old ldiskfs device to the
>>> new ZFS MGS device?
>>>
>>> Currently we have a combined MDT/MGT under ldiskfs with lustre
>>> 2.10.4. We want to upgrade to lustre 2.12.0 and then separate the
>>> combined MDT/MGT and migrate MDT and MGT to separate ZFS devices.
>>>
>>> Regards.
>>>
>>> =
>>> Fernando Pérez
>>> Institut de Ciències del Mar (CMIMA-CSIC)
>>> Departament Oceanografía Física i Tecnològica
>>> Passeig Marítim de la Barceloneta,37-49
>>> 08003 Barcelona
>>> Phone: (+34) 93 230 96 35
>>> =
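For readers who want to see what the full-device backup discussed in this thread amounts to, here is a minimal sketch of a byte-for-byte copy with a checksum check afterwards. It is not the procedure from the Lustre Operations Manual and not a replacement for dd; the function names `copy_and_checksum` and `file_checksum` are hypothetical, the demo runs on a temporary file rather than a real device, and it assumes the source is not being modified while it is read (for an MDT that would mean the target is stopped first).

```python
import hashlib
import os
import tempfile


def copy_and_checksum(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path in fixed-size chunks.

    Returns the SHA-256 hex digest of the bytes that were read.
    """
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of file or device
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def file_checksum(path, chunk_size=4 * 1024 * 1024):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demo on a small temporary file standing in for the device.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.img")
    backup = os.path.join(workdir, "backup.img")
    with open(source, "wb") as handle:
        handle.write(os.urandom(1024 * 1024))
    copied = copy_and_checksum(source, backup)
    if copied == file_checksum(backup):
        print("backup verified")
    else:
        print("backup does NOT match the source")
```

Re-reading the backup and comparing digests catches a truncated or corrupted copy before the original device is touched by the migration.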