Re: [lustre-discuss] Lustre OSS and clients on same physical server
Cory,

For what it’s worth, the existing tests and framework run in the single-node configuration without any special steps (or at least did within the last year or so). You just build Lustre, run llmount to get the servers up and a client mounted, and then run tests/sanity.sh. You then get varying results each time you do this. Some tests are themselves flawed (i.e., racy); other tests are fine but fail intermittently because of some more general problem, such as memory management issues. In my experience, the issues that arise typically aren't easy to diagnose. The problem is resources: spending them to investigate this behavior instead of on better testing in the typical multi-node configuration, implementing new features, doing code cleanup, and so on. In other words, sadly, the usual problem.

-Olaf

On 7/15/16, 1:38 PM, "lustre-discuss on behalf of Cory Spitz" wrote:

>Good input, Chris. Thanks.
>
>It sounds like we might need to move this over to lustre-devel.
>
>Someday, I’d like to see us address some of these things and then add
>some test framework tests that co-locate clients with servers. Not
>necessarily because we expect co-located services, but because it could
>be a useful driver of keeping Lustre a good memory manager.
>
>-Cory
>
>--
>
>On 7/15/16, 3:17 PM, "Christopher J. Morrone" wrote:
>
>On 07/15/2016 12:11 PM, Cory Spitz wrote:
>> Chris,
>>
>> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J.
>> Morrone" morro...@llnl.gov wrote:
>>
>>> If you put both the client and server code on the same node and do any
>>> serious amount of IO, it has been pretty easy in the past to get that
>>> node to go completely out to lunch thrashing on memory issues
>>
>> Chris, you wrote "in the past." How current is your experience? I’m
>> sure it is still a good word of caution, but I’d venture that modern
>> Lustre (on a modern kernel) might fare a tad bit better. Does anyone
>> have experience on current releases?
>
>Pretty recent.
>
>We have had memory management issues with servers and clients
>independently at pretty much all periods of time, recent history
>included. Putting the components together only exacerbates the issues.
>
>Lustre still has too many of its own caches with fixed, or nearly fixed,
>cache sizes, and places where it does not play well with the kernel
>memory reclaim mechanisms. There are too many places where Lustre
>ignores the kernel's requests for memory reclaim, and often goes on to
>use even more memory. That significantly impedes the kernel's ability
>to keep things responsive when memory contention arises.
>
>> I understand that it isn’t a design goal for us, but perhaps we should
>> pay some attention to this possibility? Perhaps we’ll have interest in
>> co-locating clients on servers in the near future as part of a
>> replication, network striping, or archiving capability?
>
>There is going to need to be a lot of work to have Lustre's memory usage
>be more dynamic, more aware of changing conditions on the system, and
>more responsive to the kernel's requests to free memory. I imagine it
>won't be terribly easy, especially in areas such as dirty and unstable
>data which cannot be freed until it is safe on disk. But even for that,
>there are no doubt ways to make things better.
>
>Chris

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
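Olaf's single-node recipe can be sketched as a small script. This is a hypothetical harness, not part of the thread: the `LUSTRE_SRC` variable and the guard around a missing source tree are my assumptions; `llmount.sh`, `sanity.sh`, and `llmountcleanup.sh` are the scripts shipped under `lustre/tests` in the lustre-release source tree.

```shell
#!/bin/sh
# Sketch of a single-node Lustre smoke test, per Olaf's recipe.
# LUSTRE_SRC is an illustrative assumption; point it at a built tree.
LUSTRE_SRC=${LUSTRE_SRC:-$HOME/lustre-release}
TESTS_DIR=$LUSTRE_SRC/lustre/tests

if [ -d "$TESTS_DIR" ]; then
    # Format and mount servers plus a client, all on this one node.
    sh "$TESTS_DIR/llmount.sh"
    # Run the sanity suite; expect varying, sometimes racy, results.
    sh "$TESTS_DIR/sanity.sh"
    # Tear the single-node setup back down.
    sh "$TESTS_DIR/llmountcleanup.sh"
else
    echo "No Lustre tests directory at $TESTS_DIR; nothing to run."
fi
```

As the thread notes, failures seen this way are not always test bugs; some are real memory-management problems that only surface when client and servers share one node.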
Re: [lustre-discuss] Lustre OSS and clients on same physical server
Good input, Chris. Thanks.

It sounds like we might need to move this over to lustre-devel.

Someday, I’d like to see us address some of these things and then add some test framework tests that co-locate clients with servers. Not necessarily because we expect co-located services, but because it could be a useful driver of keeping Lustre a good memory manager.

-Cory

--

On 7/15/16, 3:17 PM, "Christopher J. Morrone" wrote:

On 07/15/2016 12:11 PM, Cory Spitz wrote:
> Chris,
>
> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J. Morrone"
> wrote:
>
>> If you put both the client and server code on the same node and do any
>> serious amount of IO, it has been pretty easy in the past to get that
>> node to go completely out to lunch thrashing on memory issues
>
> Chris, you wrote "in the past." How current is your experience? I’m sure it
> is still a good word of caution, but I’d venture that modern Lustre (on a
> modern kernel) might fare a tad bit better. Does anyone have experience on
> current releases?

Pretty recent.

We have had memory management issues with servers and clients independently at pretty much all periods of time, recent history included. Putting the components together only exacerbates the issues.

Lustre still has too many of its own caches with fixed, or nearly fixed, cache sizes, and places where it does not play well with the kernel memory reclaim mechanisms. There are too many places where Lustre ignores the kernel's requests for memory reclaim, and often goes on to use even more memory. That significantly impedes the kernel's ability to keep things responsive when memory contention arises.

> I understand that it isn’t a design goal for us, but perhaps we should pay
> some attention to this possibility? Perhaps we’ll have interest in
> co-locating clients on servers in the near future as part of a replication,
> network striping, or archiving capability?

There is going to need to be a lot of work to have Lustre's memory usage be more dynamic, more aware of changing conditions on the system, and more responsive to the kernel's requests to free memory. I imagine it won't be terribly easy, especially in areas such as dirty and unstable data which cannot be freed until it is safe on disk. But even for that, there are no doubt ways to make things better.

Chris
Re: [lustre-discuss] Lustre OSS and clients on same physical server
On 07/15/2016 12:11 PM, Cory Spitz wrote:
> Chris,
>
> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J. Morrone"
> wrote:
>
>> If you put both the client and server code on the same node and do any
>> serious amount of IO, it has been pretty easy in the past to get that
>> node to go completely out to lunch thrashing on memory issues
>
> Chris, you wrote "in the past." How current is your experience? I’m sure it
> is still a good word of caution, but I’d venture that modern Lustre (on a
> modern kernel) might fare a tad bit better. Does anyone have experience on
> current releases?

Pretty recent.

We have had memory management issues with servers and clients independently at pretty much all periods of time, recent history included. Putting the components together only exacerbates the issues.

Lustre still has too many of its own caches with fixed, or nearly fixed, cache sizes, and places where it does not play well with the kernel memory reclaim mechanisms. There are too many places where Lustre ignores the kernel's requests for memory reclaim, and often goes on to use even more memory. That significantly impedes the kernel's ability to keep things responsive when memory contention arises.

> I understand that it isn’t a design goal for us, but perhaps we should pay
> some attention to this possibility? Perhaps we’ll have interest in
> co-locating clients on servers in the near future as part of a replication,
> network striping, or archiving capability?

There is going to need to be a lot of work to have Lustre's memory usage be more dynamic, more aware of changing conditions on the system, and more responsive to the kernel's requests to free memory. I imagine it won't be terribly easy, especially in areas such as dirty and unstable data which cannot be freed until it is safe on disk. But even for that, there are no doubt ways to make things better.

Chris
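One practical, if partial, mitigation when a client must share a node with an OSS is to cap how much page cache the Lustre client keeps. This sketch is my addition, not something proposed in the thread: `llite.*.max_cached_mb` is a real `lctl` tunable on Lustre clients, but the 4096 MB figure is purely illustrative and the right value depends on the node's memory.

```shell
#!/bin/sh
# Hypothetical mitigation on a combined client+OSS node: limit the
# client-side cache so it competes less with server memory needs.
# The value 4096 is illustrative, not a recommendation.
if command -v lctl >/dev/null 2>&1; then
    # Show the current per-mount client cache limit, then lower it.
    lctl get_param llite.*.max_cached_mb
    lctl set_param llite.*.max_cached_mb=4096
else
    echo "lctl not available on this machine; skipping."
fi
```

This only bounds one of the caches Chris mentions; it does nothing for the fixed-size server-side caches or the reclaim-ignoring paths, which is why he frames the real fix as substantial development work.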
Re: [lustre-discuss] Lustre OSS and clients on same physical server
Chris,

On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J. Morrone" wrote:

> If you put both the client and server code on the same node and do any
> serious amount of IO, it has been pretty easy in the past to get that
> node to go completely out to lunch thrashing on memory issues

Chris, you wrote "in the past." How current is your experience? I’m sure it is still a good word of caution, but I’d venture that modern Lustre (on a modern kernel) might fare a tad bit better. Does anyone have experience on current releases?

I understand that it isn’t a design goal for us, but perhaps we should pay some attention to this possibility? Perhaps we’ll have interest in co-locating clients on servers in the near future as part of a replication, network striping, or archiving capability?

-Cory
Re: [lustre-discuss] Error on a zpool underlying an OST
Hi Bob,

Thank you for the notes. I began examining the zpool before obtaining the new LSI card; I was unable to start Lustre without it. Once I installed the replacement and re-examined the zpools, the resilvered pool was re-scrubbed, exported and reimported, and, to my surprise, repaired. As a further test, I removed the spare disk that had replaced the "apparent" bad disk and re-added the disk that had been removed. The zpool resilvered OK and scrubbed clean. Lustre mounted and cleaned a few orphaned blocks but appeared fully functional from the client side. However, without a "snapshot" (file list, md5sums; though ZFS does internal checksums) of the prior status, I cannot be sure whether a data file was lost. This is something I'll need to address. Maybe Robinhood can help with this?

Thanks again for the notes. They will likely be useful in a similar scenario.

Kevin

On 07/12/2016 09:10 AM, Bob Ball wrote:

The answer came offline, and I guess I never replied back to the original posting. This is what I learned. It deals with only a single file, not thousands.

--bob

---

On Mon, 14 Mar 2016, Bob Ball wrote:

OK, it would seem the affected user has already deleted this file, as "lfs fid2path" returns:

[root@umt3int01 ~]# lfs fid2path /lustre/umt3 [0x22582:0xb5c0:0x0]
fid2path: error on FID [0x22582:0xb5c0:0x0]: No such file or directory

I verified I could do it back and forth using a different file. I am making one last check with the OST re-activated (I had set it inactive on our MDT/MGS to keep new files off while figuring this out). Nope, gone. Time to do the clear and remove the snapshot. Thanks for your help on this.

bob

On 3/14/2016 10:45 AM, Don Holmgren wrote:

No, no downside. The snapshot really is just used so that I can do this sort of repair live. Once you've found the Lustre OID with "find", for ll_decode_filter_fid to work you'll have to umount the OST and remount it as type lustre. Good luck!

Don

Thank you! This is very helpful.
I have no space to make a snapshot, so I will just umount this OST for a bit and remount it zfs. Our users can take some off-time if we are not busy just then. It will be an interesting process. I'm all set to drain and remake, though, should this method not work; I was putting that off until later today as I've other issues just now. Since it would take me 2-3 days total to drain, remake, and refill, your detailed method is far more likeable for me. Just to be certain: other than the temporary unavailability of the Lustre file system, do you see any downside to not working from a snapshot?

bob

On 3/14/2016 10:21 AM, Don Holmgren wrote:

Hi Bob -

I only get the lustre-discuss digest, so am not sure how to reply to that whole list, but I can reply directly to you regarding your posting (copied at the bottom).

In the ZFS error message

errors: Permanent errors have been detected in the following files:
ost-007/ost0030:<0x2c90f>

0x2c90f is the ZFS inode number of the damaged item. To turn this into a Lustre filename, do the following:

1. First, use "find" with that inode number to get the corresponding Lustre object ID. I do this via a ZFS snapshot, something like:

zfs snapshot ost-007/ost0030@mar14
mount -t zfs ost-007/ost0030@mar14 /mnt/snapshot
find /mnt/snapshot/O -inum 182543

(note 0x2c90f = 182543 decimal). This may return something like

/mnt/snapshot/O/0/d22/54

if indeed the damaged item is a file object.

2. OK, assuming the "find" did return a file object like the above (in this case the Lustre OID of the object is 54), you need to find the parent "FID" of that OID. Do this as follows on the OSS where you've mounted the snapshot:

[root@lustrenew3 ~]# ll_decode_filter_fid /mnt/snapshot/O/0/d22/54
/mnt/snapshot/O/0/d22/54: parent=[0x204010a:0x0:0x0] stripe=0

3. That string "0x204010a:0x0:0x0" is related to the Lustre FID. You can use "lfs fid2path" to convert this to a filename. "lfs fid2path" must be executed on a client of your Lustre filesystem.

And, on our Lustre, the return string must be slightly altered (chopped up differently):

[root@client ~]# lfs fid2path /djhzlus [0x20400:0x10a:0x0]
/djhzlus/test/copy1/l6496f21b7075m00155m031/gauge/Coulomb/l6496f21b7075m00155m031-Coul_002

Here /djhzlus was where the Lustre filesystem was mounted on my client. fid2path takes three numbers; in my case the first was the first 9 hex digits of the return from ll_decode_filter_fid, the second was the last 5 hex digits (I suppressed the leading zeros), and the third was 0x0 (not sure whether this was the 2nd or 3rd field from ll_decode_filter_fid). You can always use "lfs path2fid" on your Lustre client against another file in your filesystem to find the
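Don's steps above can be strung together as one sketch. The pool name, snapshot name, inode number, and mount point are the ones from his example and would differ on another system; the guard at the bottom is my addition so the commands are only attempted where the ZFS and Lustre utilities actually exist.

```shell
#!/bin/sh
# Sketch of the inode -> object -> FID -> filename procedure above.
# Values (pool, dataset, inode) are from Don's example, not generic.
POOL_DS=ost-007/ost0030
INODE_HEX=0x2c90f

# The ZFS error reports the inode in hex; find wants decimal
# (0x2c90f is 182543).
INODE_DEC=$(printf '%d' "$INODE_HEX")
echo "damaged inode: $INODE_DEC"

if command -v zfs >/dev/null 2>&1 && \
   command -v ll_decode_filter_fid >/dev/null 2>&1; then
    # 1. Snapshot the OST dataset and mount it so the search is live-safe.
    zfs snapshot "$POOL_DS@repair"
    mkdir -p /mnt/snapshot
    mount -t zfs "$POOL_DS@repair" /mnt/snapshot
    # 2. Map the inode to a Lustre object and print its parent FID.
    find /mnt/snapshot/O -inum "$INODE_DEC" -exec ll_decode_filter_fid {} \;
    # 3. On a Lustre *client*, feed the (re-split) FID to "lfs fid2path"
    #    to recover the user-visible filename.
else
    echo "zfs/ll_decode_filter_fid not available; commands shown for reference."
fi
```

Note Don's caveat that how the parent value splits into the `[seq:oid:ver]` triple for `lfs fid2path` was not obvious even to him; comparing against `lfs path2fid` output for a healthy file, as he suggests, is the safe way to confirm the split on your own system.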