Re: Is 275GB of VDISK stupid?

2007-12-04 Thread Mrohs, Ray
Hi,
Here's a current swap status on SLES10 with 400M.

swapon -s

Filename        Type        Size    Used    Priority
/dev/dasdf1     partition   74988   63932   -1
/dev/dasdg1     partition   149988  23064   -2
/dev/dasdh1     partition   224988  23088   -3

Does this imply that dasdg1 completely filled up before using dasdh1?
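
For what it's worth, here is a small sketch that turns a listing like the one above into per-device utilization. The sample text simply repeats the numbers from this post (swapon -s reports sizes in KiB); the function name is made up for illustration. It only describes the current state and cannot, by itself, show in which order the areas filled up.

```python
# Sample text repeats the swapon -s listing from this post (sizes in KiB).
SAMPLE = """\
Filename        Type        Size    Used    Priority
/dev/dasdf1     partition   74988   63932   -1
/dev/dasdg1     partition   149988  23064   -2
/dev/dasdh1     partition   224988  23088   -3
"""


def swap_utilization(text):
    """Return (device, priority, percent_used) tuples, highest priority first."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue                        # ignore blank or malformed lines
        name, _type, size, used, prio = fields
        rows.append((name, int(prio), 100.0 * int(used) / int(size)))
    return sorted(rows, key=lambda r: r[1], reverse=True)


for name, prio, pct in swap_utilization(SAMPLE):
    print(f"{name}  prio={prio}  used={pct:.1f}%")
```

With these numbers dasdf1 is about 85% used while dasdg1 and dasdh1 are at roughly 15% and 10%, so none of the three is full right now.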


Ray Mrohs
U.S. Department of Justice
202-307-6896
 

 


Re: Is 275GB of VDISK stupid?

2007-12-04 Thread Mark Post
 On Tue, Dec 4, 2007 at  9:15 AM, in message
[EMAIL PROTECTED],
Mrohs, Ray [EMAIL PROTECTED] wrote: 
 Hi,
 Here's a current swap status on SLES10 with 400M.
 
 swapon -s
 
 Filename        Type        Size    Used    Priority
 /dev/dasdf1     partition   74988   63932   -1
 /dev/dasdg1     partition   149988  23064   -2
 /dev/dasdh1     partition   224988  23088   -3
 
 Does this imply that dasdg1 completely filled up before using dasdh1?

I'm unsure about how negative priorities work, but yes, the fact that they are 
different priorities implies that at some point, the first swap space filled 
up, then the second swap space, and then some was used of the third.  If you're 
not seeing significant paging *rates* then this isn't necessarily a problem.  
It could just be that some really huge amount of startup code got paged out 
over time.  If you are seeing significant rates, then it's time to bump up the 
amount of storage assigned to this system.


Mark Post


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Rob van der Heij
On Dec 3, 2007 7:13 AM, Leland Lucius [EMAIL PROTECTED] wrote:

 But, I like a little excitement every so often, so I got this crazy idea to
 replace all secondary swap with VDISK and just boost up the VM paging
 volumes.

That seems like a good idea to me. But what else can I say, since we
have been promoting this for a while. As long as a VDISK does not get
used, the cost is negligible. When you set up proper monitoring to
detect when it gets used, you could get away with less than the
maximum amount of paging space for VM.

 We don't actually hit Linux swap all that much so probably 15% or so of that
 275GB is ever really in use.  (Yes, I know...we're probably oversizing our
 guests, but that's a different story.)

 I know I'd have to boost up the number of paging volumes, but does VM have
 to map all of that storage even if it doesn't get used?

You need to provide enough z/VM paging space for what is being used,
and ideally a factor of 2 over that to allow for efficient paging.
If the 15% of the 275G that is in use leaves your paging space 50%
full, then one or two servers misbehaving would not yet cause you too
much trouble. But do monitor it. If you don't monitor, you must provide
space for what might possibly get used (which is 6 times as much in
your case).
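
The arithmetic behind those figures, as a rough sketch. The 275 GB, the 15% and the factor of 2 come from the thread; the variable names are only for illustration, and the result is a rule-of-thumb estimate rather than a sizing recommendation.

```python
TOTAL_VDISK_GB = 275.0      # total VDISK defined, from the thread
IN_USE_FRACTION = 0.15      # "probably 15% or so", from the thread
HEADROOM_FACTOR = 2.0       # "ideally a factor 2 over that"

in_use_gb = TOTAL_VDISK_GB * IN_USE_FRACTION        # VDISK actually touched
monitored_paging_gb = in_use_gb * HEADROOM_FACTOR   # paging space if you monitor
unmonitored_paging_gb = TOTAL_VDISK_GB              # all of it might get used
ratio = unmonitored_paging_gb / in_use_gb           # the "6 times" figure

print(f"in use now:           {in_use_gb:.2f} GB")
print(f"with monitoring:      {monitored_paging_gb:.2f} GB of paging space")
print(f"without monitoring:   {unmonitored_paging_gb:.2f} GB of paging space")
print(f"worst case vs in use: {ratio:.1f}x")
```

So about 41 GB is in use, about 82 GB of paging space covers it with headroom, and the unmonitored worst case is between 6 and 7 times the amount in use.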

Because of the Linux algorithm for using swap, a VDISK used for swap
even a little will eventually be used completely. So you need to
prepare for all of these disks to end up in z/VM paging space. If you
see z/VM page in your VDISK on a constant basis, you should look at
making the VDISK smaller.

Rob
-- 
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Romanowski, John (OFT)
It seems hasty to say that "Because of the Linux algorithm for using
swap, a VDISK used for swap even a little will eventually be used
completely."
That's the same as saying "a linux swap area used even a little will
eventually be used completely."  Why would linux do that?   That's not
what my SLES9 guests do.

Now that the swap topic's open again:

What is the basis for advising z/VM VDISK users to have a hierarchy of
multiple linux swap areas of increasing sizes?   Are there feature(s) of
the swapping algorithm that make that hierarchy principle optimal?   



This e-mail, including any attachments, may be confidential, privileged or 
otherwise legally protected. It is intended only for the addressee. If you 
received this e-mail in error or from someone who was not authorized to send it 
to you, do not disseminate, copy or otherwise use this e-mail or its 
attachments.  Please notify the sender immediately by reply e-mail and delete 
the e-mail from your system.




Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Rob van der Heij
On Dec 3, 2007 2:43 PM, Romanowski, John (OFT)
[EMAIL PROTECTED] wrote:

 It seems hasty to say that Because of the Linux algorithm for using
 swap, a VDISK used for swap even a little will eventually be used
 completely.
  That's the same as saying a linux swap area used even a little will
 eventually be used completely.  Why would linux do that?   That's not
 what my SLES9 guests do.

Maybe our idea of eventually is different. ;-)  But yes, in order to
optimize the Linux I/O (reduce seek times, allow I/Os to be merged,
etc.) Linux prefers to pick fresh, never-used pages in the VDISK rather
than ones that have been freed by swap-in. In the view of z/VM, the
freed pages are still used because there is something in them and Linux
has not told VM that it can forget them. So with some amount of swapping
going on, eventually all pages of the VDISK have been used and VM views
them as in-use, even though Linux still has only a small number of pages
swapped out.

If your performance monitor shows
 - linux number of swapped pages
 - vdisk number of resident pages
 - vdisk paging rates
then it becomes very clear that this is happening.
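
A minimal sketch of how those three numbers could be compared. The function name, argument names and thresholds are all assumptions for illustration; they do not correspond to fields of any particular monitor product.

```python
def assess_vdisk(linux_swapped_pages, vdisk_resident_pages, vdisk_pagein_rate,
                 stale_ratio=2.0, rate_limit=0.0):
    """Return a list of findings for one VDISK swap device.

    stale_ratio and rate_limit are hypothetical thresholds, not recommendations.
    """
    findings = []
    # z/VM keeps backing every block Linux ever wrote, so a large gap between
    # the two counts means the VDISK is mostly holding stale data.
    if vdisk_resident_pages > stale_ratio * max(linux_swapped_pages, 1):
        findings.append("stale: z/VM holds far more VDISK pages than Linux has swapped out")
    # A sustained page-in rate on the VDISK means z/VM is paging it.
    if vdisk_pagein_rate > rate_limit:
        findings.append("paging: z/VM is paging the VDISK in; consider a smaller VDISK")
    return findings


print(assess_vdisk(linux_swapped_pages=1000, vdisk_resident_pages=50000,
                   vdisk_pagein_rate=0.0))
```

The example call reports only the "stale" finding: Linux has 1000 pages swapped out, yet z/VM is holding 50000 pages for the disk.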

 Now that the swap topic's open again:

 What is the basis for advising z/VM VDISK users to have a hierarchy of
 multiple linux swap areas of increasing sizes?   Are there feature(s) of
 the swapping algorithm that make that hierarchy principle optimal?

Exactly the thing above. When you have one big VDISK and the oldest
frames get paged out by VM, every page that Linux selects for swap-out
will first require a page-in by z/VM (useless, because Linux does not
need that data).
Ideally you want your top swap disk to be large enough that it does
not overflow even when Linux needs most memory. And small enough that
it remains resident on z/VM. If there's different levels of
utilization in Linux during the day, you may need multiple levels of
VDISK to fit those requirements. At the beginning of such a level of
high resource requirements you will find z/VM page in the VDISK, but
then it remains resident during the period of high usage.
The idea with the stack of VDISKs of different sizes (and with
different swap priority) is to get started when you have no clue about
the requirements. When you have measured, you can probably come up
with something smarter.

Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Leland Lucius
On 12/3/07 2:55 AM, Rob van der Heij [EMAIL PROTECTED]
wrote:
 
 Because of the Linux algorithm for using swap, a VDISK used for swap
 even a little will eventually be used completely.

I realize that VDISK is special in the world of Linux, but why doesn't
someone give us the option of preventing this?  Looks to me like adding one
line in swapfile.c would allow pages to cluster at the beginning of a disk
instead of running to the end and starting over at the beginning.

        si->flags += SWP_SCANNING;
--->    goto lowest;
        if (unlikely(!si->cluster_nr)) {

So, just make this a configurable option via procfs and let us decide.   :-)
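
To illustrate the difference this would make, here is a toy model, not the kernel's scan_swap_map(). It assumes a steady workload that keeps a fixed number of pages swapped out and always swaps the oldest one back in; "ring" keeps scanning forward and wraps, "lowest" always takes the lowest free slot. The slot counts and names are arbitrary.

```python
from collections import deque


def simulate(policy, slots=1000, live=50, steps=5000):
    """Return how many distinct slots were ever written.

    policy: "ring" continues from the last allocation and wraps around;
            "lowest" always takes the lowest-numbered free slot.
    """
    in_use = set()
    touched = set()
    order = deque()                  # allocation order, oldest first
    cursor = 0
    for _ in range(steps):
        if len(in_use) >= live:
            in_use.discard(order.popleft())   # swap-in frees the oldest slot
        if policy == "ring":
            while cursor in in_use:           # skip slots still holding data
                cursor = (cursor + 1) % slots
            slot = cursor
            cursor = (cursor + 1) % slots
        else:
            slot = next(s for s in range(slots) if s not in in_use)
        in_use.add(slot)
        touched.add(slot)
        order.append(slot)
    return len(touched)


print("ring:  ", simulate("ring"))     # 1000
print("lowest:", simulate("lowest"))   # 50
```

In this model the ring policy ends up writing every slot on the disk although only 50 are ever live, while lowest-first never goes beyond the first 50. On a VDISK the touched slots are what z/VM has to back.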

Leland


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Romanowski, John (OFT)
Leland,
If you're looking at code for that swapping algorithm: 
what happens when highest priority swap area (swap1) gets to the end,
swap1 has free slots and the next lower priority swap area (swap2) has
free clusters?
 Does linux start over at the beginning of swap1 and fill swap1 before
allocating from swap2? 





Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Rob van der Heij
On Dec 3, 2007 4:25 PM, Leland Lucius [EMAIL PROTECTED] wrote:

 I realize that VDISK is special in the world of Linux, but why doesn't
 someone give us the option of preventing this?  Looks to me like adding one
 line in swapfile.c would allow pages to cluster at the beginning of a disk
 instead of running to the end and starting over at the beginning.

It may not be a good idea to do sequential scanning of swap slots,
but a push-down stack of free slots might be cute.
An even better alternative that we discussed on linux-390 is to have a
facility to make Linux tell VM to drop the page from disk (which also
makes sense for COW devices). But this is chicken and egg: there's
nothing now and if you make it, there's nothing that uses it...
Some restrictions that Linux puts on I/O requests are self-imposed and
not all necessary on ECKD, and certainly not on VDISK. But again,
changes to the main kernel sources just for one architecture will not
come easily.

Rob
-- 
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Rob van der Heij
On Dec 3, 2007 4:51 PM, Romanowski, John (OFT)
[EMAIL PROTECTED] wrote:
 Leland,
 If you're looking at code for that swapping algorithm:
 what happens when highest priority swap area (swap1) gets to the end,
 swap1 has free slots and the next lower priority swap area (swap2) has
 free clusters?
  Does linux start over at the beginning of swap1 and fill swap1 before
 allocating from swap2?

That's the point of priority of the swap device. You make Linux re-use
swap1 before spilling to swap2. Note that Linux will not migrate back
from swap2 to swap1 when stuff is freed later.
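
A small model of that behaviour, as a sketch only: the class and function names are invented, the sizes are tiny on purpose, and real Linux allocates in clusters rather than single slots.

```python
class SwapArea:
    def __init__(self, name, priority, size):
        self.name, self.priority, self.size = name, priority, size
        self.pages = set()          # page ids currently stored here

    def free_slots(self):
        return self.size - len(self.pages)


def swap_out(areas, page):
    """Store page on the highest-priority area that still has a free slot."""
    for area in sorted(areas, key=lambda a: a.priority, reverse=True):
        if area.free_slots() > 0:
            area.pages.add(page)
            return area.name
    raise MemoryError("all swap areas are full")


def swap_in(areas, page):
    """Remove page from whichever area holds it."""
    for area in areas:
        area.pages.discard(page)


swap1 = SwapArea("swap1", priority=-1, size=4)
swap2 = SwapArea("swap2", priority=-2, size=8)
areas = [swap1, swap2]

placed = [swap_out(areas, p) for p in range(6)]    # pages 0-3 fill swap1, 4-5 spill
swap_in(areas, 0)
swap_in(areas, 1)                                  # two slots on swap1 become free
after_free = (len(swap1.pages), len(swap2.pages))  # swap2 still holds pages 4 and 5
next_target = swap_out(areas, 6)                   # new swap-out reuses swap1

print(placed, after_free, next_target)
```

After the two swap-ins, pages 4 and 5 stay on swap2 even though swap1 has room again; only the new swap-out (page 6) lands on swap1.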

Rob
-- 
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Romanowski, John (OFT)
Rob said earlier that after linux starts using a lower priority swap
area it doesn't migrate back from swap2 to swap1 when stuff is freed
later.

So do you find after swapoff/on a high priority VDISK that linux starts
using it? or does it ignore it and keep filling the dasd swap?





Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Brian Nielsen
On Mon, 3 Dec 2007 08:43:45 -0500, Romanowski, John (OFT) 
[EMAIL PROTECTED] wrote:

Now that the swap topic's open again:

What is the basis for advising z/VM VDISK users to have a hierarchy of
multiple linux swap areas of increasing sizes?   Are there feature(s) of
the swapping algorithm that make that hierarchy principle optimal?

The configuration we use includes swap space on real DASD at a lower
priority than the VDISK swap areas.  Over time Linux will swap more to the
real DASD than the VDISKs.  At this point doing a swap off and then on of
a VDISK swap area frees up the fast VDISK.  Having various VDISK sizes
allows the flexibility of migrating smaller amounts of swap data during
busy periods and larger amounts during slow periods.

Brian Nielsen


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Brian Nielsen
After the swap off/on linux uses that swap area again.  I believe what Rob
said/meant is that it doesn't reuse individual pages that it otherwise
could/should.

The swap off/on makes it look brand new by wiping out all prior knowledge.

Brian Nielsen

On Mon, 3 Dec 2007 13:05:57 -0500, Romanowski, John (OFT) 
[EMAIL PROTECTED] wrote:

Rob said earlier that after linux starts using a lower priority swap
area it doesn't migrate back from swap2 to swap1 when stuff is freed
later.

So do you find after swapoff/on a high priority VDISK that linux starts
using it? or does it ignore it and keep filling the dasd swap?





Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Jim Bohnsack

Leland Lucius wrote:
It sounds like a good idea and since Linux is open source, I suspect 
that if you wrote it, Leland, we might use it.


Jim

I realize that VDISK is special in the world of Linux, but why doesn't
someone give us the option of preventing this?  Looks to me like adding one
line in swapfile.c would allow pages to cluster at the beginning of a disk
instead of running to the end and starting over at the beginning.

        si->flags += SWP_SCANNING;
--->    goto lowest;
        if (unlikely(!si->cluster_nr)) {

So, just make this a configurable option via procfs and let us decide.   :-)

Leland

  



--
Jim Bohnsack
Cornell University
(607) 255-1760
[EMAIL PROTECTED]


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Leland Lucius
On 12/3/07 12:15 PM, Jim Bohnsack [EMAIL PROTECTED] wrote:

 Leland Lucius wrote:
 It sounds like a good idea and since Linux is open source, I suspect
 that if you wrote it, Leland, we might use it.
 
The option would have to be on a per device basis since we'd still want
normal disk to use the ring approach.

Unfortunately, I don't see it getting much use unless it were accepted into
the main tree since it would require a kernel rebuild.  I don't think most
shops would care to do this.  ;-)

Leland


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Rob van der Heij
On Dec 3, 2007 7:16 PM, Brian Nielsen [EMAIL PROTECTED] wrote:

 The swap off/on makes it look brand new by wiping out all prior knowledge

Correct. That forces Linux to migrate pages off that disk. If there's
a fair amount of blocks in-use (according to Linux) you will find that
it takes some time for the swapoff to complete (while Linux swaps
pages back in). Once you've done this, you could vary the disk
offline, detach it, and get a new VDISK from VM (and thus let VM free
up all those pages). I've actually done this automagically with a
workload that was predictable, but I'm not sure it's worth the
trouble.
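
The sequence described above could look roughly like the plan below. This is a sketch under assumptions: the device node, bus ID, virtual device number, block count and priority are all hypothetical, and the exact commands (chccwdev and vmcp from s390-tools, CP DEFINE VFB-512) should be checked against your own distribution and z/VM level. The function only builds the list of commands for review; it does not run anything.

```python
def vdisk_refresh_plan(dev_node, bus_id, vdev, blocks, priority):
    """Return the shell commands, in order, for replacing a used VDISK.

    Nothing is executed here; the list is meant to be reviewed first.
    """
    return [
        f"swapoff {dev_node}",                          # Linux pages everything back in
        f"chccwdev -d {bus_id}",                        # vary the device offline
        f"vmcp detach {vdev}",                          # z/VM can free the old VDISK pages
        f"vmcp define vfb-512 as {vdev} blk {blocks}",  # get a new, empty VDISK
        f"chccwdev -e {bus_id}",                        # bring the device back online
        f"mkswap {dev_node}",                           # write a fresh swap signature
        f"swapon -p {priority} {dev_node}",             # re-enable at the old priority
    ]


# All argument values below are made-up examples.
plan = vdisk_refresh_plan("/dev/dasdf1", "0.0.0207", "0207", 150000, 10)
for cmd in plan:
    print(cmd)
```

The swapoff step is the one that can take a while, since Linux has to swap the in-use pages back in before the device can go offline.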

It's interesting to see what the free command reports when you do this. Part
of this magic is in swap cache (pages both in memory and on swap
disk, because they were swapped back in but not modified yet).

Rob
-- 
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Mark Post
 On Mon, Dec 3, 2007 at  1:05 PM, in message
[EMAIL PROTECTED],
Romanowski, John (OFT) [EMAIL PROTECTED] wrote: 
 Rob said earlier that after linux starts using a lower priority swap
 area it doesn't migrate back from swap2 to swap1 when stuff is freed
 later.

To be more explicit, if swap1 fills up, then swap2 starts being used.  If pages 
on swap1 get freed up, the pages that were written to swap2 will never be 
migrated to swap1, even if they are paged in by Linux and then paged out 
again.

 So do you find after swapoff/on a high priority VDISK that linux starts
 using it? or does it ignore it and keep filling the dasd swap?

Yes, but you could force the same behavior by doing a swapoff/swapon on the 
lower priority disk.  Since there are (presumably the reason why you did this) 
free pages on the VDISK, they'll be used first.


Mark Post


Re: Is 275GB of VDISK stupid?

2007-12-03 Thread Mark Post
 On Mon, Dec 3, 2007 at  1:43 PM, in message
[EMAIL PROTECTED], Leland Lucius [EMAIL PROTECTED]
wrote: 
 On 12/3/07 12:15 PM, Jim Bohnsack [EMAIL PROTECTED] wrote:
 
 Leland Lucius wrote:
 It sounds like a good idea and since Linux is open source, I suspect
 that if you wrote it, Leland, we might use it.
 
 The option would have to be on a per device basis since we'd still want
 normal disk to use the ring approach.
 
 Unfortunately, I don't see it getting much use unless it were accepted into
 the main tree since it would require a kernel rebuild.  I don't think most
 shops would care to do this.  ;-)

If the patch were written in such a way as to affect only s390 (and didn't 
introduce its own performance problems), you might have a shot at getting it 
accepted into the official source.  That route is now fairly accessible, what 
with the git390 server out there.  (Even if you don't use it, just submit the 
patch and see where it goes.)


Mark Post