Hi Lewis,

it's difficult to tell how much of the data loss was actually related to the Lustre upgrade itself. We upgraded 6 file systems, and we had to do it more or less in one shot, because at that time they were using a common MGS server. All servers of one file system must be on the same level (at least for the major 1.8 to 2.x upgrade; there are rolling upgrades for minor versions in the Lustre 2 branch now, but I have no experience with those).
In any case the file systems should be clean before starting the upgrade, so I would recommend running e2fsck on all targets and repairing them before you start. We did so, but unfortunately our e2fsprogs were not really up to date, and after our Lustre upgrade a lot of fixes for e2fsprogs were committed to Whamcloud's e2fsprogs git. So some errors were probably still present on the file systems, but unnoticed when we did the upgrade.

Lustre 2 introduces the FID, which is something like an inode number: Lustre 1.8 used the inode number of the underlying ldiskfs, but with the possibility of having several MDTs in one file system, a replacement was needed. The FID is stored in the inode, but you can also activate storing the FIDs in the directory entries, which makes lookups faster, especially when there are many files in a directory. However, there were bugs in the code that takes care of adding the FID to the directory entry when a file system is converted from 1.8 to 2.x, so I would recommend using a version in which these bugs are solved; we went to 2.4.1 at the time. By default this fid_in_dirent feature is not automatically enabled. However, it is the only point where a performance boost may be expected... so we took the risk of enabling it... and ran into some bugs.

We had other file systems still on 1.8, so with the server upgrade we didn't upgrade the clients, because Lustre 2 clients wouldn't have been able to mount the 1.8 file systems. And we use quotas, and for this you need the 1.8.9 client with a patch that corrects a defect of the 1.8.9 client when it talks to 2.x servers (LU-3067). However, older 1.8 clients don't support the Lustre 2 quota (which came in 2.2 or 2.4, I'm not 100% sure). BTW, quota still runs out of sync from time to time, but the limits seem to be fine now; it's just the numbers the users see. lfs quota prints numbers that are too low, and users run out of quota earlier than they expect...
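To make the pre-upgrade check concrete, here is a minimal sketch. The device names are made up; substitute your real MDT/OST block devices, make sure every target is unmounted while it is checked, and use the e2fsck from Whamcloud's e2fsprogs, not the distribution one:

```shell
# Hypothetical device names -- replace with your actual MDT/OST devices.
# Targets must be unmounted before running e2fsck on them.
for dev in /dev/mapper/mdt0 /dev/mapper/ost0 /dev/mapper/ost1; do
    e2fsck -fn "$dev"    # -f: force full check, -n: read-only dry run
done

# If the dry run reports problems, repair (answers default to "yes"):
# e2fsck -fy /dev/mapper/mdt0
```

Running the read-only pass first gives you a log of what would change before you commit to repairs.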
It's better in the latest 2.5 versions now.

Here is an unsorted(!) list of bugs we hit during the Lustre upgrade. For most of them we weren't the first ones, but I guess you could wait forever for the version in which all bugs are resolved :-)

- LU-3067: already mentioned above, a patch for 1.8.9 clients interoperating with 2.x servers; 1.8.9 is needed to have quota working. Without this patch clients become unresponsive at 100% CPU load, then just hang and devices become unavailable; reboot doesn't work, so a power cycle is needed, but after a while the problem reappears.

- LU-4504: e2fsck noticed quota issues similar to this bug on the OSTs. Use the latest e2fsprogs, check again, and the ldiskfs backend no longer runs into this.

- e2fsck noticed quota issues on the MDT ("Problem in HTREE directory inode 21685465: block #16 not referenced"); however, these could be fixed by e2fsck.

- LU-5626, MDT becomes read-only: on one file system the MDT had been corrupted at an earlier stage and obviously not fully repaired. It LBUGed upon MDT mount and could only be mounted with the noscrub option.

- The MDT group_upcall (which can be configured with tunefs.lustre) used to be /usr/sbin/l_getgroups in Lustre 1.8 and was set by default. The program is called l_getidentity now and is not configured by default anymore. You should either change it with tunefs.lustre or put an appropriate symlink in place as a fallback. Anyhow, Lustre 2 file systems don't use it by default anymore; they just trust the client. That also means users/groups are no longer needed on the Lustre servers. (We had local passwd/group files there so that secondary groups worked properly; alternatively you could configure LDAP. Without the group_upcall, all of this is handled by the Lustre client.)

- LU-5626 and LU-2627: ".." directory entries were damaged by adding the FID. Once all old directories were converted and all files somehow recovered (in several consecutive attempts), the problem was gone.
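As a sketch of the upcall change just described, either of the following should work; the device path is hypothetical, and you should check the exact parameter name against the manual for your Lustre version:

```shell
# Option 1: point the identity upcall at the renamed 2.x binary
# (run against the unmounted MDT device; /dev/mapper/mdt0 is hypothetical):
tunefs.lustre --param mdt.identity_upcall=/usr/sbin/l_getidentity /dev/mapper/mdt0

# Option 2: keep an old configuration that still references the 1.8 name
# working, by linking the old name to the new binary as a fallback:
ln -s /usr/sbin/l_getidentity /usr/sbin/l_getgroups
```

The symlink is the less invasive fallback, but fixing the parameter with tunefs.lustre keeps the configuration honest about what is actually being run.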
The number of emergency maintenances is basically limited by the depth of your directory structure. It could be repaired by running e2fsck, followed by manually moving everything back (save the e2fsck log, which tells you the relation between the objects in lost+found and their original paths!).

- LU-4504, quota out of sync: turn off quota, run e2fsck, turn it on again. I believe that's something that has to be done quite often anyhow, because there is no quotacheck anymore; it runs in the background when quotas are enabled, but the file systems have to be unmounted for the e2fsck. Related to quota, there is a change in the lfs setquota command: the manual says that soft limits must be < hard limits, but you now have to specify them. You could put a zero, but in later versions the value must be present on the command line. In 1.8, lfs setquota was more relaxed, but it simply didn't initialize some values properly. This change caused our quota management to fail; however, after fixing the call it worked fine again.

- LU-3861, quota severely broken: it was not possible to move files for some users/groups while it worked for others; copying, on the other hand, seemed to work. Maybe this was in combination with one of the first attempts to fix the FID issue. However, neither e2fsck nor tune2fs could fix the problem. We had to upgrade to e2fsprogs 1.42.7, which contained improvements that made e2fsck able to fix this and allowed ldiskfs to run more stably afterwards.

- LU-3917: during the upgrade we needed to re-create the PENDING directory at the ldiskfs level on one of our file systems.

- LU-4743: we had to remove the CATALOGS file on another file system (otherwise the MDT wouldn't mount).

- And if you upgrade to 2.5: there was a bug that caused the MDS to crash when large_xattr (for wide striping) is not set and a user tries to use it anyway. But you probably don't have that many OSTs, because the number was limited in 1.8 anyway.
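The quota re-sync and the stricter setquota call might look roughly like this. The file system name, mount point, user, and limits are all invented, and the conf_param syntax shown is the 2.4-era form, so verify it against the manual for your version:

```shell
# Disable quota enforcement, repair, then re-enable; the space accounting
# check runs in the background once the targets come back up:
lctl conf_param testfs.quota.mdt=none
lctl conf_param testfs.quota.ost=none
# ... unmount the targets and run e2fsck here ...
lctl conf_param testfs.quota.mdt=ug
lctl conf_param testfs.quota.ost=ug

# 2.x lfs setquota wants all four limits spelled out; 0 means "no limit".
# -b/-B are block soft/hard limits (kbytes), -i/-I are inode soft/hard limits:
lfs setquota -u someuser -b 0 -B 104857600 -i 0 -I 1000000 /mnt/testfs
```

Spelling out all four limits explicitly, even the zeros, is exactly the stricter behavior that broke our old quota-management scripts.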
A couple of other problems were related to the software our supplier uses to manage the Lustre servers, but that's not a Lustre issue; it's just how a large number of servers is booted, maintained and configured. Anyhow, fighting these problems on top didn't make things easier ;-)

That was a very much shortened list of our upgrade trouble (shortened not in the number of issues, but by leaving out the log messages, discussions, and attempts to repair things...). Later, we also configured a separate MGS for each file system, upgraded once more to 2.5, and reworked the LNet configuration; all of that was much less trouble than the upgrade from 1.8 to 2.4.1. Maybe, looking back, that was a bad version, but at some point you have to decide on a target version, and maybe I would do exactly the same step again, now with the knowledge of what can happen and which things I must keep an eye on. I wouldn't enable the fid_in_dirent feature, though, and I would for sure update e2fsprogs as a first step.

best regards, Martin

On 09/09/2015 03:16 PM, Lewis Hyatt wrote:
> OK thanks for sharing your experience. Unfortunately I can't see a way
> for us to get duplicate hardware, so we will have to give it a shot;
> we were going to try the artificial test first as well. If you don't
> mind taking another minute, I'd be curious what was the nature of the
> problems you ran into... was it potential data loss, or just issues
> getting it to perform the upgrade? Thanks again.
>
> -lewis
_______________________________________________ lustre-discuss mailing list lustre-discuss@lists.lustre.org http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org