Hello,

I experimented with swap on Lustre in as many ways as possible (without touching the code), including with the shortest possible path to swap, all to no avail. The code cannot handle it at all, and the system hung every time.
Without serious code rewrites, this isn't going to work for you.

-Jason

-----Original Message-----
From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of John Hanks
Sent: Thursday, 18 August 2011 05:55
To: land...@scalableinformatics.com
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Swap over lustre

On Wed, Aug 17, 2011 at 8:57 PM, Joe Landman <land...@scalableinformatics.com> wrote:
> On 08/17/2011 10:43 PM, John Hanks wrote:
>
> As a rule of thumb, you should try to keep the path to swap as simple as
> possible. No memory/buffer allocations on the way to a paging event if
> you can possibly do this.

I do have a long path there, will try simplifying that and see if it helps.

> The lustre client (and most NFS or even network block devices) all do
> memory allocation of buffers ... which is anathema to migrating pages
> out to disk. You can easily wind up in a "death spiral" race condition
> (and it sounds like you are there). You might be able to do something
> with iSCSI or SRP (though these also do block allocations and could
> trigger death spirals). If you can limit the number of buffers they
> allocate, and then force them to allocate the buffers at startup (by
> forcing some activity to the block device, and then pin this memory so
> that they can't be ejected ...) you might have a chance to do it as a
> block device. I think SRP can do this, not sure if iSCSI initiators can
> pin buffers in RAM.
>
> You might look at the swapz patches (we haven't integrated them into our
> kernel yet, but have been looking at it) to compress swap pages and
> store them ... in RAM. This may not work for you, but it could be an
> option.

I wasn't aware of swapz, that sounds really interesting. The codes that run the nodes out of memory tend to be sequencing applications, which seem like good candidates for memory compression.
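
A minimal user-space sketch of the "allocate the buffers at startup and pin them" idea quoted above, for illustration only. It is not what the Lustre client, SRP, or iSCSI initiators do internally (that happens in kernel code); it only shows the pattern on Linux using anonymous mmap regions and the libc mlock() call through ctypes. The function name make_pinned_pool and the pool layout are hypothetical, and mlock() may be refused if RLIMIT_MEMLOCK is too low, so the sketch records whether pinning succeeded rather than assuming it did.

```python
import ctypes
import ctypes.util
import mmap

PAGE = mmap.PAGESIZE


def make_pinned_pool(n_buffers, pages_per_buffer=1):
    """Hypothetical helper: allocate every buffer up front, touch each
    page, then ask the kernel to keep the pages resident via mlock()."""
    # Load libc; if find_library returns None, CDLL(None) still exposes
    # the libc symbols of the running process on Linux.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    size = pages_per_buffer * PAGE
    pool = []
    for _ in range(n_buffers):
        buf = mmap.mmap(-1, size)  # anonymous, page-aligned mapping
        # Write to every page so it is actually backed by memory now,
        # not lazily at the moment the system is already under pressure.
        for off in range(0, size, PAGE):
            buf[off] = 0
        # Get the mapping's address so it can be passed to mlock().
        view = (ctypes.c_char * size).from_buffer(buf)
        rc = libc.mlock(ctypes.addressof(view), size)
        # rc != 0 usually means the RLIMIT_MEMLOCK limit was exceeded.
        pool.append({"buf": buf, "view": view, "pinned": rc == 0})
    return pool


if __name__ == "__main__":
    pool = make_pinned_pool(n_buffers=4)
    for i, entry in enumerate(pool):
        print(f"buffer {i}: {len(entry['buf'])} bytes, pinned={entry['pinned']}")
```

The point of the pattern is that nothing on the I/O path has to allocate memory later: the pool is fixed in size, already faulted in, and (where mlock succeeds) not eligible to be paged out.
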

> Is there any particular reason you can't use a local drive for this
> (such as you don't have local drives, or they aren't big/fast enough)?

We're doing this on diskless nodes. I'm not looking to get a huge amount of swap, just enough to provide a place for the root filesystem to page out of the tmpfs so we can squeeze out all the RAM possible for applications. Since I don't expect it to get heavily used, I'm considering running vblade on a server and carving out small AoE LUNs. It seems logical that if a host can boot off of iSCSI or AoE, you could have a swap space there, but I've never tried it with either protocol.

FWIW, mounting a file on Lustre via loopback to provide a local scratch filesystem works really well.

jbh
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss