Re: Is 275GB of VDISK stupid?
>>> On Tue, Dec 4, 2007 at 9:15 AM, in message <[EMAIL PROTECTED]>, "Mrohs, Ray" <[EMAIL PROTECTED]> wrote:
> Hi,
> Here's a current swap status on SLES10 with 400M.
>
> swapon -s
>
> Filename       Type        Size     Used    Priority
> /dev/dasdf1    partition   74988    63932   -1
> /dev/dasdg1    partition   149988   23064   -2
> /dev/dasdh1    partition   224988   23088   -3
>
> Does this imply that dasdg1 completely filled up before using dasdh1?

I'm unsure about how negative priorities work, but yes, the fact that they are different priorities implies that at some point, the first swap space filled up, then the second swap space, and then some was used of the third.

If you're not seeing significant paging *rates* then this isn't necessarily a problem. It could just be that some really huge amount of startup code got paged out over time. If you are seeing significant rates, then it's time to bump up the amount of storage assigned to this system.

Mark Post
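As a rough illustration of how to read that listing, here is a minimal Python sketch that parses `swapon -s` style output and reports how full each area is, in the order Linux would fill them (the numerically higher priority goes first, so -1 before -2 before -3). The function names and the hard-coded sample are assumptions made for this example; sizes are in 1 KB blocks, as `swapon -s` reports them.

```python
def parse_swapon(text):
    """Parse `swapon -s` style output into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, kind, size, used, prio = line.split()
        rows.append({
            "name": name,
            "type": kind,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(prio),
        })
    return rows


def fill_order(rows):
    """Order areas the way Linux fills them: higher priority value first."""
    return sorted(rows, key=lambda r: r["priority"], reverse=True)


def percent_used(row):
    """Current utilisation of one swap area, as a percentage."""
    return 100.0 * row["used_kb"] / row["size_kb"]


# Sample taken from the message above (hard-coded for illustration).
SAMPLE = """\
Filename Type Size Used Priority
/dev/dasdf1 partition 74988 63932 -1
/dev/dasdg1 partition 149988 23064 -2
/dev/dasdh1 partition 224988 23088 -3
"""

for row in fill_order(parse_swapon(SAMPLE)):
    print(f'{row["name"]}: {percent_used(row):.1f}% used '
          f'(priority {row["priority"]})')
```

Note that a snapshot like this only shows current occupancy. It cannot show that an area was full at some earlier point, which is why the paging rates matter more than the "Used" column.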
Re: Is 275GB of VDISK stupid?
Hi,
Here's a current swap status on SLES10 with 400M.

swapon -s

Filename       Type        Size     Used    Priority
/dev/dasdf1    partition   74988    63932   -1
/dev/dasdg1    partition   149988   23064   -2
/dev/dasdh1    partition   224988   23088   -3

Does this imply that dasdg1 completely filled up before using dasdh1?

Ray Mrohs
U.S. Department of Justice
202-307-6896

> -Original Message-
> From: The IBM z/VM Operating System
> [mailto:[EMAIL PROTECTED] On Behalf Of Mark Post
> Sent: Monday, December 03, 2007 5:29 PM
> To: IBMVM@LISTSERV.UARK.EDU
> Subject: Re: Is 275GB of VDISK stupid?
>
> >>> On Mon, Dec 3, 2007 at 1:05 PM, in message
> <[EMAIL PROTECTED]>,
> "Romanowski, John (OFT)" <[EMAIL PROTECTED]> wrote:
> > Rob said earlier that after linux starts using a lower priority swap
> > area it doesn't "migrate back from swap2 to swap1 when stuff is freed
> > later."
>
> To be more explicit, if swap1 fills up, then swap2 starts
> being used. If pages on swap1 get freed up, the pages that
> were written to swap2 will never be migrated to swap1, even
> if they are paged in by Linux and then paged out again.
>
> > So do you find after swapoff/on a high priority VDISK that linux starts
> > using it? or does it ignore it and keep filling the dasd swap?
>
> Yes, but you could force the same behavior by doing a
> swapoff/swapon on the lower priority disk. Since there are
> (presumably the reason why you did this) free pages on the
> VDISK, they'll be used first.
>
> Mark Post
Re: Is 275GB of VDISK stupid?
>>> On Mon, Dec 3, 2007 at 1:43 PM, in message <[EMAIL PROTECTED]>, Leland Lucius <[EMAIL PROTECTED]> wrote:
> On 12/3/07 12:15 PM, "Jim Bohnsack" <[EMAIL PROTECTED]> wrote:
>
>> Leland Lucius wrote:
>> It sounds like a good idea and since Linux is open source, I suspect
>> that if you wrote it, Leland, we might use it.
>>
> The option would have to be on a per device basis since we'd still want
> normal disk to use the ring approach.
>
> Unfortunately, I don't see it getting much use unless it were accepted into
> the main tree since it would require a kernel rebuild. I don't think most
> shops would care to do this. ;-)

If the patch were written in such a way as to only affect s390 (and didn't introduce its own performance problems), you might have a shot at getting it accepted into the official source. That route is now readily available, what with the git390 server out there. (Even if you don't use it, just submit the patch and see where it goes.)

Mark Post
Re: Is 275GB of VDISK stupid?
>>> On Mon, Dec 3, 2007 at 1:05 PM, in message <[EMAIL PROTECTED]>, "Romanowski, John (OFT)" <[EMAIL PROTECTED]> wrote:
> Rob said earlier that after linux starts using a lower priority swap
> area it doesn't "migrate back from swap2 to swap1 when stuff is freed
> later."

To be more explicit, if swap1 fills up, then swap2 starts being used. If pages on swap1 get freed up, the pages that were written to swap2 will never be migrated to swap1, even if they are paged in by Linux and then paged out again.

> So do you find after swapoff/on a high priority VDISK that linux starts
> using it? or does it ignore it and keep filling the dasd swap?

Yes, but you could force the same behavior by doing a swapoff/swapon on the lower priority disk. Since there are (presumably the reason why you did this) free pages on the VDISK, they'll be used first.

Mark Post
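The behavior described here can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `SwapArea` class, the slot counts, and the priorities are made up for the example, and the real kernel allocator is considerably more involved. The point it shows is that new swap-outs go to the highest-priority area with free space, while pages already sitting on the lower-priority area stay where they are.

```python
class SwapArea:
    """Toy swap area: a fixed number of slots plus a priority."""

    def __init__(self, name, priority, slots):
        self.name = name
        self.priority = priority
        self.slots = slots
        self.used = set()  # slot numbers currently holding a page

    def has_free(self):
        return len(self.used) < self.slots

    def allocate(self):
        # Take any free slot; which one does not matter for this model.
        slot = next(s for s in range(self.slots) if s not in self.used)
        self.used.add(slot)
        return slot


def swap_out(areas):
    """Place one page in the highest-priority area that has a free slot."""
    for area in sorted(areas, key=lambda a: a.priority, reverse=True):
        if area.has_free():
            return area, area.allocate()
    raise MemoryError("all swap areas are full")


# Hypothetical setup: a small fast VDISK and a larger DASD area behind it.
swap1 = SwapArea("swap1 (VDISK)", priority=10, slots=4)
swap2 = SwapArea("swap2 (DASD)", priority=5, slots=8)
areas = [swap1, swap2]

for _ in range(6):          # fills swap1 (4 pages), spills 2 pages to swap2
    swap_out(areas)

swap1.used.discard(0)       # two pages on swap1 are swapped in and freed
swap1.used.discard(1)

area, _slot = swap_out(areas)   # the next swap-out lands on swap1 again
print(area.name, len(swap1.used), len(swap2.used))
```

Nothing in the model ever moves the two pages off swap2. In this picture, only a swapoff of swap2 (which forces those pages back into memory) would empty it.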
Re: Is 275GB of VDISK stupid?
On Dec 3, 2007 7:16 PM, Brian Nielsen <[EMAIL PROTECTED]> wrote:

> The swap off/on makes it look brand new by wiping out all prior knowledge

Correct. That forces Linux to migrate pages off that disk. If there's a fair number of blocks in use (according to Linux) you will find that it takes some time for the swapoff to complete (while Linux swaps pages back in). Once you've done this, you could vary the disk offline, detach it, and get a new VDISK from VM (and thus let VM free up all those pages). I've actually done this automagically with a workload that was predictable, but I'm not sure it's worth the trouble.

It's interesting to see what happens to "free" when you do this. Part of this magic is in "swap cache" (pages both in memory and on swap disk, because they were swapped back in but not modified yet).

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: Is 275GB of VDISK stupid?
On 12/3/07 12:15 PM, "Jim Bohnsack" <[EMAIL PROTECTED]> wrote:

> Leland Lucius wrote:
> It sounds like a good idea and since Linux is open source, I suspect
> that if you wrote it, Leland, we might use it.
>

The option would have to be on a per device basis since we'd still want normal disk to use the ring approach.

Unfortunately, I don't see it getting much use unless it were accepted into the main tree since it would require a kernel rebuild. I don't think most shops would care to do this. ;-)

Leland
Re: Is 275GB of VDISK stupid?
After the swap off/on linux uses that swap area again. I believe what Rob said/meant is that it doesn't reuse individual pages that it otherwise could/should. The swap off/on makes it look brand new by wiping out all prior knowledge.

Brian Nielsen

On Mon, 3 Dec 2007 13:05:57 -0500, Romanowski, John (OFT) <[EMAIL PROTECTED]> wrote:

>Rob said earlier that after linux starts using a lower priority swap
>area it doesn't "migrate back from swap2 to swap1 when stuff is freed
>later."
>
>So do you find after swapoff/on a high priority VDISK that linux starts
>using it? or does it ignore it and keep filling the dasd swap?
>
>This e-mail, including any attachments, may be confidential, privileged or otherwise legally protected. It is intended only for the addressee. If you received this e-mail in error or from someone who was not authorized to send it to you, do not disseminate, copy or otherwise use this e-mail or its attachments. Please notify the sender immediately by reply e-mail and delete the e-mail from your system.
>
>-Original Message-
>From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
>Behalf Of Brian Nielsen
>Sent: Monday, December 03, 2007 12:53 PM
>To: IBMVM@LISTSERV.UARK.EDU
>Subject: Re: Is 275GB of VDISK stupid?
>
>On Mon, 3 Dec 2007 08:43:45 -0500, Romanowski, John (OFT)
><[EMAIL PROTECTED]> wrote:
>
>>Now that the swap topic's open again:
>>
>>What is the basis for advising z/VM VDISK users to have a hierarchy of
>>multiple linux swap areas of increasing sizes? Are there feature(s) of
>>the swapping algorithm that make that hierarchy principle optimal?
>
>The configuration we use includes swap space on real DASD at a lower
>priority than the VDISK swap areas. Over time Linux will swap more to
>the real DASD than the VDISKs. At this point doing a swap off and then
>on of a VDISK swap area frees up the fast VDISK. Having various VDISK
>sizes allows the flexibility of migrating smaller amounts of swap data
>during busy periods and larger amounts during slow periods.
>
>Brian Nielsen
Re: Is 275GB of VDISK stupid?
Leland Lucius wrote:

It sounds like a good idea and since Linux is open source, I suspect that if you wrote it, Leland, we might use it.

Jim

I realize that VDISK is special in the world of Linux, but why doesn't someone give us the option of preventing this? Looks to me like adding one line in swapfile.c would allow pages to cluster at the beginning of a disk instead of running to the end and starting over at the beginning.

        si->flags += SWP_SCANNING;
--->    goto lowest;
        if (unlikely(!si->cluster_nr)) {

So, just make this a configurable option via procfs and let us decide. :-)

Leland

--
Jim Bohnsack
Cornell University
(607) 255-1760
[EMAIL PROTECTED]
Re: Is 275GB of VDISK stupid?
Rob said earlier that after linux starts using a lower priority swap area it doesn't "migrate back from swap2 to swap1 when stuff is freed later."

So do you find after swapoff/on a high priority VDISK that linux starts using it? or does it ignore it and keep filling the dasd swap?

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Brian Nielsen
Sent: Monday, December 03, 2007 12:53 PM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Is 275GB of VDISK stupid?

On Mon, 3 Dec 2007 08:43:45 -0500, Romanowski, John (OFT) <[EMAIL PROTECTED]> wrote:

>Now that the swap topic's open again:
>
>What is the basis for advising z/VM VDISK users to have a hierarchy of
>multiple linux swap areas of increasing sizes? Are there feature(s) of
>the swapping algorithm that make that hierarchy principle optimal?

The configuration we use includes swap space on real DASD at a lower priority than the VDISK swap areas. Over time Linux will swap more to the real DASD than the VDISKs. At this point doing a swap off and then on of a VDISK swap area frees up the fast VDISK. Having various VDISK sizes allows the flexibility of migrating smaller amounts of swap data during busy periods and larger amounts during slow periods.

Brian Nielsen
Re: Is 275GB of VDISK stupid?
On Mon, 3 Dec 2007 08:43:45 -0500, Romanowski, John (OFT) <[EMAIL PROTECTED]> wrote:

>Now that the swap topic's open again:
>
>What is the basis for advising z/VM VDISK users to have a hierarchy of
>multiple linux swap areas of increasing sizes? Are there feature(s) of
>the swapping algorithm that make that hierarchy principle optimal?

The configuration we use includes swap space on real DASD at a lower priority than the VDISK swap areas. Over time Linux will swap more to the real DASD than the VDISKs. At this point doing a swap off and then on of a VDISK swap area frees up the fast VDISK. Having various VDISK sizes allows the flexibility of migrating smaller amounts of swap data during busy periods and larger amounts during slow periods.

Brian Nielsen
Re: Is 275GB of VDISK stupid?
On Dec 3, 2007 4:51 PM, Romanowski, John (OFT) <[EMAIL PROTECTED]> wrote:

> Leland,
> If you're looking at code for that swapping algorithm:
> what happens when highest priority swap area (swap1) gets to the end,
> swap1 has free slots and the next higher priority swap area (swap2) has
> free clusters?
> Does linux start over at the beginning of swap1 and fill swap1 before
> allocating from swap2?

That's the point of priority of the swap device. You make Linux re-use swap1 before spilling to swap2. Note that Linux will not migrate back from swap2 to swap1 when stuff is freed later.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: Is 275GB of VDISK stupid?
On Dec 3, 2007 4:25 PM, Leland Lucius <[EMAIL PROTECTED]> wrote:

> I realize that VDISK is special in the world of Linux, but why doesn't
> someone give us the option of preventing this? Looks to me like adding one
> line in swapfile.c would allow pages to cluster at the beginning of a disk
> instead of running to the end and starting over at the beginning.

It may not be a good idea to do sequential scanning of swap slots, but a push down stack of free slots might be cute.

An even better alternative that we discussed on linux-390 is to have a facility to make Linux tell VM to drop the page from disk (which also makes sense for COW devices). But this is chicken & egg: there's nothing now and if you make it, there's nothing that uses it...

Some restrictions that Linux puts on I/O requests are self-imposed and not all necessary on ECKD, and certainly not on VDISK. But again, changes to the main kernel sources just for one architecture will not come easily.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
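The "push down stack of free slots" idea might look something like the following Python sketch. It is purely illustrative: the class name `StackSlotAllocator` and the `touched` set (a stand-in for the pages z/VM would have to back) are assumptions for this example, not anything that exists in the kernel.

```python
class StackSlotAllocator:
    """Toy swap-slot allocator that reuses the most recently freed slot."""

    def __init__(self, nslots):
        # Reverse order so that slot 0 sits on top of the stack initially.
        self.free = list(range(nslots - 1, -1, -1))
        self.touched = set()  # every slot that has ever been written

    def alloc(self):
        slot = self.free.pop()      # take the slot on top of the stack
        self.touched.add(slot)
        return slot

    def release(self, slot):
        self.free.append(slot)      # freed slot goes back on top


alloc = StackSlotAllocator(8)
first = [alloc.alloc() for _ in range(3)]   # takes slots 0, 1, 2
alloc.release(1)                            # page in slot 1 is swapped in
reused = alloc.alloc()                      # slot 1 is handed out again
print(first, reused, sorted(alloc.touched))
```

Because a freed slot is reused before any untouched slot, the set of slots ever written stays as small as the peak number of swapped-out pages, which is the property a VDISK would benefit from. The cost is that consecutive swap-outs no longer land on adjacent slots, which matters on real disk.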
Re: Is 275GB of VDISK stupid?
Leland,
If you're looking at code for that swapping algorithm:
what happens when highest priority swap area (swap1) gets to the end, swap1 has free slots and the next higher priority swap area (swap2) has free clusters?
Does linux start over at the beginning of swap1 and fill swap1 before allocating from swap2?

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Leland Lucius
Sent: Monday, December 03, 2007 10:26 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Is 275GB of VDISK stupid?

On 12/3/07 2:55 AM, "Rob van der Heij" <[EMAIL PROTECTED]> wrote:

>
> Because of the Linux algorithm for using swap, a VDISK used for swap
> even a little will eventually be used completely.
>

I realize that VDISK is special in the world of Linux, but why doesn't someone give us the option of preventing this? Looks to me like adding one line in swapfile.c would allow pages to cluster at the beginning of a disk instead of running to the end and starting over at the beginning.

        si->flags += SWP_SCANNING;
--->    goto lowest;
        if (unlikely(!si->cluster_nr)) {

So, just make this a configurable option via procfs and let us decide. :-)

Leland
Re: Is 275GB of VDISK stupid?
On 12/3/07 2:55 AM, "Rob van der Heij" <[EMAIL PROTECTED]> wrote:

>
> Because of the Linux algorithm for using swap, a VDISK used for swap
> even a little will eventually be used completely.
>

I realize that VDISK is special in the world of Linux, but why doesn't someone give us the option of preventing this? Looks to me like adding one line in swapfile.c would allow pages to cluster at the beginning of a disk instead of running to the end and starting over at the beginning.

        si->flags += SWP_SCANNING;
--->    goto lowest;
        if (unlikely(!si->cluster_nr)) {

So, just make this a configurable option via procfs and let us decide. :-)

Leland
Re: Is 275GB of VDISK stupid?
On Dec 3, 2007 2:43 PM, Romanowski, John (OFT) <[EMAIL PROTECTED]> wrote:

> It seems hasty to say that "Because of the Linux algorithm for using
> swap, a VDISK used for swap even a little will eventually be used
> completely".
> That's the same as saying a linux swap area used even a little will
> eventually be used completely. Why would linux do that? That's not
> what my SLES9 guests do.

Maybe our idea of "eventually" is different. ;-) But yes, in order to optimize the Linux I/O (reduce seek times, allow I/Os to be merged, etc) Linux prefers to pick "virgin" pages in the VDISK rather than ones that have been freed by swap-in. In the view of z/VM, the freed pages are still "used" because there is something in them and Linux has not told VM that it can forget them. So with some amount of swapping going on, eventually all pages of the VDISK have been used and VM views them as in-use, even though Linux still has only a small number of pages swapped out.

If your performance monitor shows
- linux number of swapped pages
- vdisk number of resident pages
- vdisk paging rates
then it becomes very clear that this is happening.

> Now that the swap topic's open again:
>
> What is the basis for advising z/VM VDISK users to have a hierarchy of
> multiple linux swap areas of increasing sizes? Are there feature(s) of
> the swapping algorithm that make that hierarchy principle optimal?

Exactly the thing above. When you have one big VDISK and the oldest frames get paged out by VM, every page that Linux selects for swap-out will first require a page-in by z/VM (useless, because Linux does not need that data).

Ideally you want your top swap disk to be large enough that it does not overflow even when Linux needs most memory, and small enough that it remains resident on z/VM. If there are different levels of utilization in Linux during the day, you may need multiple levels of VDISK to fit those requirements. At the beginning of such a period of high resource requirements you will find z/VM paging in the VDISK, but then it remains resident during the period of high usage.

The idea with the stack of VDISKs of different sizes (and with different swap priorities) is to get started when you have no clue about the requirements. When you have measured, you can probably come up with something smarter.

Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
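To make the "eventually used completely" effect concrete, here is a small Python simulation. It is a deliberately simplified model, not the kernel's actual slot-scanning code: the `simulate` function, the slot counts, and the workload (a fixed number of pages swapped out at any time, oldest freed first) are all assumptions chosen for illustration. It compares a ring-style allocator that keeps moving forward to fresh slots with one that always takes the lowest free slot, and counts how many distinct slots were ever written, which is roughly what z/VM would have to keep backing.

```python
def simulate(policy, nslots=1000, live=50, steps=5000):
    """Return how many distinct slots were ever written.

    policy: "ring" advances a cursor through the disk and wraps around;
            "lowest" always takes the lowest-numbered free slot.
    At most `live` pages are swapped out at once (oldest is freed first).
    """
    in_use = []       # slots holding a page, oldest first
    used = set()      # same slots, for fast membership tests
    touched = set()   # every slot written at least once
    cursor = 0
    for _ in range(steps):
        if len(in_use) == live:
            # A swap-in frees the oldest slot before the next swap-out.
            used.discard(in_use.pop(0))
        if policy == "ring":
            while cursor in used:               # skip slots still in use
                cursor = (cursor + 1) % nslots
            slot = cursor
            cursor = (cursor + 1) % nslots
        else:
            slot = next(s for s in range(nslots) if s not in used)
        in_use.append(slot)
        used.add(slot)
        touched.add(slot)
    return len(touched)


print("ring:  ", simulate("ring"))
print("lowest:", simulate("lowest"))
```

In this model Linux never has more than 50 pages swapped out, yet the ring policy ends up writing every slot of the 1000-slot disk, while the lowest-first policy stays within the first 50 slots. That is the gap between what Linux reports as swap in use and what VM sees as used VDISK pages.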
Re: Is 275GB of VDISK stupid?
It seems hasty to say that "Because of the Linux algorithm for using swap, a VDISK used for swap even a little will eventually be used completely".
That's the same as saying a linux swap area used even a little will eventually be used completely. Why would linux do that? That's not what my SLES9 guests do.

Now that the swap topic's open again:

What is the basis for advising z/VM VDISK users to have a hierarchy of multiple linux swap areas of increasing sizes? Are there feature(s) of the swapping algorithm that make that hierarchy principle optimal?

-Original Message-
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Rob van der Heij
Sent: Monday, December 03, 2007 3:56 AM
To: IBMVM@LISTSERV.UARK.EDU
Subject: Re: Is 275GB of VDISK stupid?

Because of the Linux algorithm for using swap, a VDISK used for swap even a little will eventually be used completely. So you need to prepare for all of these disks to end up in z/VM paging space. If you see z/VM page in your VDISK on a constant basis, you should look at making the VDISK smaller.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
Re: Is 275GB of VDISK stupid?
On Dec 3, 2007 7:13 AM, Leland Lucius <[EMAIL PROTECTED]> wrote:

> But, I like a little excitement every so often, so I got this crazy idea to
> replace all secondary swap with VDISK and just boost up the VM paging
> volumes.

That seems like a good idea to me. But what else can I say, since we have been promoting this for a while. As long as a VDISK does not get used, the cost is negligible. When you set up proper monitoring to detect when it gets used, you could get away with less than the maximum amount of paging space for VM.

> We don't actually hit Linux swap all that much so probably 15% or so of that
> 275GB is ever really in use. (Yes, I know...we're probably oversizing our
> guests, but that's a different story.)

> I know I'd have to boost up the number of paging volumes, but does VM have
> to map all of that storage even if it doesn't get used?

You need to provide enough z/VM paging space for what is being used. And we say ideally a factor of 2 over that to allow for efficient paging. If you have 15% of the 275G in use at 50% full, then one or two servers misbehaving would not yet cause you too much trouble. But do monitor it. If you don't monitor you must provide space for what might possibly get used (which is 6 times as much in your case).

Because of the Linux algorithm for using swap, a VDISK used for swap even a little will eventually be used completely. So you need to prepare for all of these disks to end up in z/VM paging space. If you see z/VM page in your VDISK on a constant basis, you should look at making the VDISK smaller.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/
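The sizing rule of thumb in this reply can be written out as a short calculation. The function below is only a sketch of that arithmetic; the name `paging_space_gb` and its parameters are made up for the example, and the 15% figure is Leland's own estimate rather than a measurement.

```python
def paging_space_gb(vdisk_total_gb, fraction_in_use, headroom=2.0,
                    monitored=True):
    """Rough z/VM paging space needed to back VDISK swap.

    With monitoring: cover what is actually in use, times a headroom
    factor (2 means the paging space runs about 50% full).
    Without monitoring: cover everything that might possibly get used.
    """
    if monitored:
        return vdisk_total_gb * fraction_in_use * headroom
    return vdisk_total_gb


in_use = 275 * 0.15                                   # about 41 GB touched
with_monitoring = paging_space_gb(275, 0.15)          # about 82.5 GB
without = paging_space_gb(275, 0.15, monitored=False) # the full 275 GB

print(f"in use: {in_use:.2f} GB")
print(f"with monitoring: {with_monitoring:.1f} GB")
print(f"without monitoring: {without:.0f} GB "
      f"({without / in_use:.1f}x what is in use)")
```

The ratio of the full 275 GB to the estimated in-use amount comes out between 6 and 7, which matches the "6 times as much" in the reply.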
Is 275GB of VDISK stupid?
Okay, I'm pretty sure I know the answer to that, but stupid and me sit next to each other fairly often... :-)

Anyway, we're in the midst of refreshing our DASD and part of that will be to get rid of MOD3s. This happens to be what we use for secondary swap on each Linux guest so a little rethinking is in order. (We use DCSS for primary swap.)

The boring alternative would be to do a one for one swap of a MOD9 for a MOD3. Or define a pool of MOD9s and dole out a MOD3's worth using DIRMAINT.

But, I like a little excitement every so often, so I got this crazy idea to replace all secondary swap with VDISK and just boost up the VM paging volumes.

We don't actually hit Linux swap all that much so probably 15% or so of that 275GB is ever really in use. (Yes, I know...we're probably oversizing our guests, but that's a different story.)

I know I'd have to boost up the number of paging volumes, but does VM have to map all of that storage even if it doesn't get used?

So what do y'all think? Have I been drinking too much of Adam's cough syrup?

Leland