Re: swapoff?

2002-10-23 Thread David Schultz
Thus spake Matthew Dillon [EMAIL PROTECTED]:
 :The concern was that there could be a race where the process is
 :swapped out again after I have swapped it back in but before I can
 :dirty its pages.  (Perhaps I need to hold the process lock a bit
 :longer.)  Under heavy swapping load, swapoff() is failing to find
 :a single page about one time out of ten, and I thought that might
 :be the cause.
 
 Have you definitively tracked down the missing page?  It
 ought to be fairly easy to do from a kernel core with a
 gdb macro.

No, I've tested it extensively, and I haven't been able to
reproduce the problem since I updated my sources.  (It was hard to
reproduce beforehand.)  I did two more runs with one swap device
and two runs with two swap devices, and it worked even when the
system was thrashing.

The latest patches are at

http://www.CSUA.Berkeley.EDU/~das/swapoff.patch4

Performance is now much better when there are multiple swap
devices.  Instead of effectively having to wait for each hash
chain to become quiescent, swapoff now skips busy objects, then
does a complete rescan if it missed anything.  Only a few rescans
are required, even with multiple active swap devices.
A clustering optimization might still be worthwhile, but that can
be done another day.
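
The skip-and-rescan strategy described above can be modelled in userland. This is only a minimal sketch, not code from the patch: struct sim_object, scan_once() and drain_device() are hypothetical names, and a "busy" object is simulated by a counter instead of a real paging-in-progress count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a swap-backed VM object (not the kernel struct). */
struct sim_object {
    int pages_on_device;  /* pages still swapped to the device being removed */
    int busy_passes;      /* object counts as busy for this many more visits */
};

/*
 * One pass over all objects: drain every idle object and skip busy ones.
 * Returns how many objects were skipped while still holding pages.
 */
static size_t
scan_once(struct sim_object *objs, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (objs[i].pages_on_device == 0)
            continue;                    /* nothing to do for this object */
        if (objs[i].busy_passes > 0) {
            objs[i].busy_passes--;       /* simulated I/O completes later */
            skipped++;
            continue;
        }
        objs[i].pages_on_device = 0;     /* "page in" the whole object */
    }
    return skipped;
}

/* Rescan until a pass skips nothing.  Returns the number of passes made. */
static int
drain_device(struct sim_object *objs, size_t n)
{
    int passes = 0;
    size_t skipped;

    do {
        skipped = scan_once(objs, n);
        passes++;
        /* Real code would sleep here when skipped > 0, so it yields the CPU. */
    } while (skipped > 0);

    for (size_t i = 0; i < n; i++)
        assert(objs[i].pages_on_device == 0);
    return passes;
}
```

In this model the number of passes is bounded by how long the busiest object stays busy, which matches the observation that only a few rescans are needed.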

(Sorry for the delay in getting back to you.  I've been too busy
and sick for the last week to work on this.)

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: swapoff?

2002-10-23 Thread Matthew Dillon
This is looking really good.  I'm going to start running it on my
-current boxes.  I think it could be committed after the 5.0 release
rolls, as well as MFCd to -stable (which I would be happy to do the
work for).

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:No, I've tested it extensively, and I haven't been able to
:reproduce the problem since I updated my sources.  (It was hard to
:reproduce beforehand.)  I did two more runs with one swap device
:and two runs with two swap devices, and it worked even when the
:system was thrashing.
:
:The latest patches are at
:
:   http://www.CSUA.Berkeley.EDU/~das/swapoff.patch4
:
:Performance is now much better when there are multiple swap
:devices.  Instead of effectively having to wait for each hash
:chain to become quiescent, swapoff now skips busy objects, then
:does a complete rescan if it missed anything.  Only a few rescans
:are required, even with multiple active swap devices.
:A clustering optimization might still be worthwhile, but that can
:be done another day.
:
:(Sorry for the delay in getting back to you.  I've been too busy
:and sick for the last week to work on this.)




Re: swapoff?

2002-10-15 Thread David Schultz

Thus spake Matthew Dillon [EMAIL PROTECTED]:
 :The concern was that there could be a race where the process is
 :swapped out again after I have swapped it back in but before I can
 :dirty its pages.  (Perhaps I need to hold the process lock a bit
 :longer.)  Under heavy swapping load, swapoff() is failing to find
 :a single page about one time out of ten, and I thought that might
 :be the cause.
 
 Have you definitively tracked down the missing page?  It
 ought to be fairly easy to do from a kernel core with a
 gdb macro.

I haven't figured out gdb macros yet, and I'm kinda busy
throughout the week, but I'll try to track down the page this
weekend.

 This sounds reasonable, it's certainly worth a shot.  Another
 thing you may be able to do is to try to cluster the pageins
 on an object-by-object basis since our swap-scanning is 
 probably resulting in having to wait for the PIP count of
 the same object many times (for large objects with lots of 
 swap blocks). 
 
 Hmm.  Since we have to track down every swap block and we can
 (do?) get a count of pages that need to be swapped in, we could
 remove the PIP wait code and simply loop on the swap hash table
 over and over again until we've found all the swap blocks that
 we know have been allocated to that device.  Or something like that.

The latter idea was more or less what I had in mind. The only
catch is that you have to wait at least once for each pass or
you'll never give up the processor.  It solves the basic problem
with the current approach, which is that you get livelock if
there's too much swapping activity to other swap devices.




Re: swapoff?

2002-10-14 Thread David Schultz

[ Latest patches at http://csua.berkeley.edu/~das/swapoff.patch3 ]

Thus spake Matthew Dillon [EMAIL PROTECTED]:
 :I'm worried that vm_proc_swapin_all() has a similar race with the
 :swapout daemon.  Presently I assume that my references to the
 :UPAGES object and the associated pages remain valid after the
 :faultin(), and that I can use swap_pager_freeswapspace() to free
 :the correct metadata, instead of calling swap_pager_unswapped() on
 :each page.  Should I just hold the process lock until the metadata
 :are freed?
 
 Hmm.  Well, the proc lock is not held during vm_proc_swapin()
 (but the PS_SWAPPINGIN flag is set).  The proc lock is held during
 vm_proc_swapout().
 
 In your vm_proc_swapin_all() you seem to be doing the right thing
 with regard to the mutexes and retry, and you have already marked
 the device as SW_CLOSING, so if something does get in there and
 tries to swap the process back in, it shouldn't allocate swap you are
 trying to free.
 
 I think you may be ok.

The concern was that there could be a race where the process is
swapped out again after I have swapped it back in but before I can
dirty its pages.  (Perhaps I need to hold the process lock a bit
longer.)  Under heavy swapping load, swapoff() is failing to find
a single page about one time out of ten, and I thought that might
be the cause.

I have tweaked swap_pager.c as you suggested earlier.  It runs
about an order of magnitude slower under load now, since it's
doing a vm_object_pip_wait() on every swap-backed object in the
system that's currently paging, even for objects that are paging
to a different swap device.  Unless you have a better idea, I
think one way to improve performance might be to skip the busy
objects, and after the whole hash has been scanned, rescan
starting at the first index that was skipped.  Of course, it would
have to wait for at least one object on each iteration so it
doesn't get into a tight loop.

Another important optimization is to page in the entire block at
once, rather than doing it a page at a time.  I tried to do this
with the following algorithm:

- grab SWAP_META_PAGES pages
- note which ones are already in core using a bitmap
- call getpages() to retrieve the entire range
- re-lookup all of the pages at the appropriate offset
  within the object in case they've changed or gone away
- dirty them, move them to the appropriate queue (based
  on the values in the bitmap computed earlier), and
  remove their backing store

This didn't work, and it produced all sorts of interesting panics
for reasons I haven't yet figured out.  My latest patch has some
remnants of some of my attempts in swp_pager_force_pagein(), but
I'll probably leave that optimization for another day unless you
can see an obvious flaw in my approach.

BTW, thanks for all of your help!




Re: swapoff?

2002-10-14 Thread Matthew Dillon

:The concern was that there could be a race where the process is
:swapped out again after I have swapped it back in but before I can
:dirty its pages.  (Perhaps I need to hold the process lock a bit
:longer.)  Under heavy swapping load, swapoff() is failing to find
:a single page about one time out of ten, and I thought that might
:be the cause.

Have you definitively tracked down the missing page?  It
ought to be fairly easy to do from a kernel core with a
gdb macro.

:I have tweaked swap_pager.c as you suggested earlier.  It runs
:about an order of magnitude slower under load now, since it's
:doing a vm_object_pip_wait() on every swap-backed object in the
:system that's currently paging, even for objects that are paging
:to a different swap device.  Unless you have a better idea, I
:think one way to improve performance might be to skip the busy
:objects, and after the whole hash has been scanned, rescan
:starting at the first index that was skipped.  Of course, it would
:have to wait for at least one object on each iteration so it
:doesn't get into a tight loop.

This sounds reasonable, it's certainly worth a shot.  Another
thing you may be able to do is to try to cluster the pageins
on an object-by-object basis since our swap-scanning is 
probably resulting in having to wait for the PIP count of
the same object many times (for large objects with lots of 
swap blocks). 

Hmm.  Since we have to track down every swap block and we can
(do?) get a count of pages that need to be swapped in, we could
remove the PIP wait code and simply loop on the swap hash table
over and over again until we've found all the swap blocks that
we know have been allocated to that device.  Or something like that.

:Another important optimization is to page in the entire block at
:once, rather than doing it a page at a time.  I tried to do this
:with the following algorithm:
:
:   - grab SWAP_META_PAGES pages
:   - note which ones are already in core using a bitmap
:   - call getpages() to retrieve the entire range
:...
:
:This didn't work, and it produced all sorts of interesting panics
:for reasons I haven't yet figured out.  My latest patch has some
:remnants of some of my attempts in swp_pager_force_pagein(), but
:I'll probably leave that optimization for another day unless you
:can see an obvious flaw in my approach.

I can probably deal with that post-commit.  Let's ignore it for now.
The main goal at the moment should be robustness.

:BTW, thanks for all of your help!

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-10-11 Thread Matthew Dillon

:
:Thanks, your solution looks pretty good.  I guess as part of the
:try_to_page_it_all_in, I'll want to call swap_pager_unswapped() on
:each page.  Now I really wish I had noticed swap_pager_unswapped()
:earlier; it would have made my job much easier!

As long as you properly check and dirty the page you can get rid
of the backing swap.

:I'm worried that vm_proc_swapin_all() has a similar race with the
:swapout daemon.  Presently I assume that my references to the
:UPAGES object and the associated pages remain valid after the
:faultin(), and that I can use swap_pager_freeswapspace() to free
:the correct metadata, instead of calling swap_pager_unswapped() on
:each page.  Should I just hold the process lock until the metadata
:are freed?

Hmm.  Well, the proc lock is not held during vm_proc_swapin()
(but the PS_SWAPPINGIN flag is set).  The proc lock is held during
vm_proc_swapout().

In your vm_proc_swapin_all() you seem to be doing the right thing
with regard to the mutexes and retry, and you have already marked
the device as SW_CLOSING, so if something does get in there and
tries to swap the process back in, it shouldn't allocate swap you are
trying to free.

I think you may be ok.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-10-11 Thread David Schultz
Thus spake Matthew Dillon [EMAIL PROTECTED]:
 This is a sticky situation because both the VM object and the
 swblocks may be manipulated by other processes when you block.  I
 think what you need to try to do is this (it's a mess, if you can think
 of a better solution definitely go another route!)
 
 while ((swap = *pswap) != NULL) {
     if (anything_is_swapped_to_the_device) {
         try_to_page_it_all_in
         (note that the swblock structure is invalid the moment you
         block, so swp_pager_force_pagein() should be given
         the whole range).
         /* fall through to retry */
     } else if (the_related_object_pip_count_is_not_zero) {
         vm_object_pip_sleep(...)
         /* fall through to retry */
     } else if (swap->swb_count <= 0) {
         free the swap block
         *pswap = swap->swb_hnext;
     }
 }

Thanks, your solution looks pretty good.  I guess as part of the
try_to_page_it_all_in, I'll want to call swap_pager_unswapped() on
each page.  Now I really wish I had noticed swap_pager_unswapped()
earlier; it would have made my job much easier!

I'm worried that vm_proc_swapin_all() has a similar race with the
swapout daemon.  Presently I assume that my references to the
UPAGES object and the associated pages remain valid after the
faultin(), and that I can use swap_pager_freeswapspace() to free
the correct metadata, instead of calling swap_pager_unswapped() on
each page.  Should I just hold the process lock until the metadata
are freed?




Re: swapoff?

2002-10-08 Thread David Schultz

Rather than spamming the list with another 37K patch, I posted a
revised version at

http://csua.berkeley.edu/~das/swapoff.patch2

Thus spake Matthew Dillon [EMAIL PROTECTED]:
 The swap_pager_isswapped() function may not be doing
 a sufficient test:
[...]
 It is quite possible for a VM page to be present but invalid,
 meaning that the swap is still valid.  You could incorrectly
 return that the object is not swapped when in fact it is.
 
 BUT, since you only appear to be making this call on
 the process's UPAGES object, there may not be a problem.
 Perhaps the best thing to do is to not do the vm_page_lookup()
 call and instead just unconditionally faultin() the uarea
 if it looks like there might be a problem.

I revised the patches to do as you suggested.  It turns out that a
couple of extra lines are needed, because when the scan over all
processes restarts, pagein_all() will no longer automagically skip
over processes it has already swapped in.  It has to immediately
free the swap metadata for the UPAGES object and dirty the
associated pages (as opposed to letting swap_pager_swapoff do it);
otherwise it will loop forever trying to swap in the same process.

 You may need a master lock to ensure that only swapon() or 
 swapoff() is 'in progress' at any given moment.

Added.  (This was a deficiency in the original swapon() as well.)

 The vm_page_grab() call below may block, I think:
[...]
 I think you may want to do the pip_add before calling vm_page_grab().

Yep, fixed.

I also tweaked the calculation that determines whether there is
enough virtual memory to remove the device, but it doesn't seem to
detect when there is insufficient space.  (I actually thought it
was right the first time.)  Can you see anything obviously wrong
with my math?

The code works fine in all of my tests, except that calling
swapoff() when the system is under heavy paging load and has
multiple swap devices sometimes leads to a few pages being missed
by the scan.  I think the problem is that some process allocates
some swap and starts paging out just before the device is marked
as off-limits.  Am I missing a simple solution to this problem?
(For now, I kludge around the issue by rescanning if there are
still blocks remaining.)




Re: swapoff?

2002-10-08 Thread David Schultz

Thus spake Nate Lawson [EMAIL PROTECTED]:
 Nice, thanks for doing this.  How about some more accurate names for the
 userland routines instead of this_is_swapoff and twiddle?

Sure, suggest something and I'll change it.  I shamelessly stole
'this_is_swapoff' from w / uptime, but you can blame me for
'twiddle'.  The function was originally called 'add', I think, but
now it adds or removes depending on whether it's being called as
swapon or swapoff...




Re: swapoff?

2002-10-08 Thread Matthew Dillon

:The code works fine in all of my tests, except that calling
:swapoff() when the system is under heavy paging load and has
:multiple swap devices sometimes leads to a few pages being missed
:by the scan.  I think the problem is that some process allocates
:some swap and starts paging out just before the device is marked
:as off-limits.  Am I missing a simple solution to this problem?
:(For now, I kludge around the issue by rescanning if there are
:still blocks remaining.)

Hmm.  Yes, I think the issue here is that you may be missing
pages in objects which are undergoing I/O.  You may need to
wait for other paging on the object (the pip count) to go to
zero.  I will review that code more carefully in a little bit
and give you a definitive answer.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-10-08 Thread Matthew Dillon


:...
:detect when there is insufficient space.  (I actually thought it
:was right the first time.)  Can you see anything obviously wrong
:with my math?
:
:The code works fine in all of my tests, except that calling
:swapoff() when the system is under heavy paging load and has
:multiple swap devices sometimes leads to a few pages being missed
:by the scan.  I think the problem is that some process allocates
:some swap and starts paging out just before the device is marked
:as off-limits.  Am I missing a simple solution to this problem?
:(For now, I kludge around the issue by rescanning if there are
:still blocks remaining.)

Ok, I think the problem is in swap_pager_swapoff() and
swp_pager_force_pagein().  Another process may be manipulating
the swblock (or a prior swblock) while swp_pager_force_pagein()
is blocked.

In fact, the swap block can be ripped out from under
swap_pager_swapoff() if swp_pager_force_pagein() blocks.  i.e.
the 'swap' structure may be invalid after you call
swp_pager_force_pagein().

This is a sticky situation because both the VM object and the
swblocks may be manipulated by other processes when you block.  I
think what you need to try to do is this (it's a mess, if you can think
of a better solution definitely go another route!)

while ((swap = *pswap) != NULL) {
    if (anything_is_swapped_to_the_device) {
        try_to_page_it_all_in
        (note that the swblock structure is invalid the moment you
        block, so swp_pager_force_pagein() should be given
        the whole range).
        /* fall through to retry */
    } else if (the_related_object_pip_count_is_not_zero) {
        vm_object_pip_sleep(...)
        /* fall through to retry */
    } else if (swap->swb_count <= 0) {
        free the swap block
        *pswap = swap->swb_hnext;
    }
}
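
The retry loop above can be exercised as a userland model. This is a sketch under stated assumptions, not the kernel code: struct sim_swblock, its on_device and pip fields, and drain_chain() are hypothetical, sleeping on the pip count is simulated by decrementing a counter, and the final else branch (step past a block that still holds swap on other devices) is an addition the pseudocode does not spell out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one swblock on a hash chain. */
struct sim_swblock {
    struct sim_swblock *swb_hnext;  /* next block on the same hash chain */
    int swb_count;                  /* swap pages recorded in this block */
    int on_device;                  /* of those, pages on the dying device */
    int pip;                        /* simulated paging-in-progress count */
};

/*
 * Walk one hash chain, re-reading *pswap on every iteration because the
 * block may change whenever the real code would have blocked.
 * Returns the number of blocks unlinked.
 */
static int
drain_chain(struct sim_swblock **pswap)
{
    struct sim_swblock *swap;
    int freed = 0;

    assert(pswap != NULL);
    while ((swap = *pswap) != NULL) {
        if (swap->on_device > 0) {
            /* Page in the whole range at once, then retry this slot. */
            swap->swb_count -= swap->on_device;
            swap->on_device = 0;
        } else if (swap->pip > 0) {
            /* Stand-in for sleeping until other paging I/O finishes. */
            swap->pip--;
        } else if (swap->swb_count <= 0) {
            /* Empty block: unlink it (a real implementation frees it). */
            *pswap = swap->swb_hnext;
            freed++;
        } else {
            /* Assumption: block only holds swap elsewhere, so move on. */
            pswap = &swap->swb_hnext;
        }
    }
    return freed;
}
```

Because the loop always restarts from *pswap, it never dereferences a pointer saved before a blocking step, which is the point of the suggested structure.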

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-10-07 Thread Matthew Dillon


:I'm resurrecting this thread because I finally got around to
:finishing up the patches to implement swapoff.  I would appreciate
:some review of them, particularly to verify that I have done the
:right thing WRT synchronization.  I have not optimized it to do
:read clustering, but I have ensured that such an optimization
:could be made.  Other than that, I don't know of any deficiencies.

This is great, David.  The code is about as clean as it's
possible to make in a swapoff implementation.  There are
some minor inefficiencies.. some shortcuts that can be 
taken in the blist code for example, but I don't think we
have to worry about them for this initial implementation.

The SW_CLOSING test is an excellent solution to dealing
with the swap bitmap when paging in from the dying
swap area.

The swap_pager_isswapped() function may not be doing
a sufficient test:


:+    pswap = swp_pager_hash(object, index);
:+
:+    if ((swap = *pswap) != NULL) {
:+        for (i = 0; i < SWAP_META_PAGES; ++i) {
:+            daddr_t v = swap->swb_pages[i];
:+            if (v != SWAPBLK_NONE &&
:+                BLK2DEVIDX(v) == devidx &&
:+                !vm_page_lookup(object, swap->swb_index+i))
:+                return 1;
:+        }
:+    }

It is quite possible for a VM page to be present but invalid,
meaning that the swap is still valid.  You could incorrectly
return that the object is not swapped when in fact it is.

BUT, since you only appear to be making this call on
the process's UPAGES object, there may not be a problem.
Perhaps the best thing to do is to not do the vm_page_lookup()
call and instead just unconditionally faultin() the uarea
if it looks like there might be a problem.

You may need a master lock to ensure that only swapon() or 
swapoff() is 'in progress' at any given moment.

The vm_page_grab() call below may block, I think:

:+
:+    if (object->type != OBJT_SWAP)
:+        panic("swp_pager_force_pagein: object not backed by swap");
:+
:+    m = vm_page_grab(object, pindex, VM_ALLOC_NORMAL | VM_ALLOC_RETRY);
:+    if (m->valid == VM_PAGE_BITS_ALL) {
:+        /*
:+         * The page is already in memory, but must be
:+         * dirtied, since we're taking away its backing store.
:+         */
:+        vm_page_lock_queues();
:+        vm_page_activate(m);
:+        vm_page_dirty(m);
:+        vm_page_wakeup(m);
:+        vm_page_unlock_queues();
:+        return 1;
:+    }
:+
:+    vm_object_pip_add(object, 1);

I think you may want to do the pip_add before calling vm_page_grab().

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-10-07 Thread Nate Lawson

Nice, thanks for doing this.  How about some more accurate names for the
userland routines instead of this_is_swapoff and twiddle?

-Nate





Re: swapoff?

2002-07-13 Thread Matthew Dillon


:BTW, NetBSD's new UVM code has the ability to do this.  Perhaps
:it's worth looking in to how difficult it would really be in FreeBSD...

Someone got it mostly working a year or two ago if I remember right
but I don't know what happened to it finally.

Implementing swapoff is a bunch of grunt-work but not too hard in
concept.  Basically the work involved is this:

* Make a calculation to be sure that it is possible to turn off
  the swap device and not run the system out of VM.  If it is not
  possible do not allow the swapoff.

* Allocate all the free bitmap bits related to the swap device you
  are trying to remove to prevent pageouts to the device you are
  removing.

* Flag the swap device being removed and then scan all OBJT_SWAP
  VM Objects looking for swap blocks associated with the device,
  and force a page-in of those blocks.  The getpages code for the
  swap backing store would detect the flag and not clear the swap
  bitmap bits as it pages-in the data.

  (Forcing a pagein may force pages to cycle back out to another
  swap device, so special treatment of the paged-in pages (like
  immediately placing them in the VM page cache instead of the
  active or inactive queues) is necessary to reduce load effects
  on the system.)

* The swap device being removed can now be closed and the related
  swap device index marked free.
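
The first step, checking that the system will not run out of VM, amounts to simple page arithmetic. The sketch below is one plausible form of that check, not the calculation used in any patch: struct vm_snapshot, its fields, and the reserve parameter are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of VM accounting, all values in pages. */
struct vm_snapshot {
    uint64_t free_mem;         /* free physical pages */
    uint64_t other_swap_free;  /* free swap on the devices that remain */
    uint64_t dev_used;         /* pages in use on the device to remove */
};

/*
 * Return true if everything currently stored on the device would fit
 * in free memory plus the other swap devices, keeping `reserve` pages
 * of headroom.  The comparison is arranged to avoid unsigned underflow.
 */
static bool
swapoff_is_feasible(const struct vm_snapshot *s, uint64_t reserve)
{
    uint64_t avail;

    assert(s != NULL);
    avail = s->free_mem + s->other_swap_free;
    if (avail < reserve)
        return false;
    return s->dev_used <= avail - reserve;
}
```

A check like this is only a snapshot. It guards against obvious mistakes at the moment of the call and says nothing about memory demand that arrives while the device is being drained.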

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-07-13 Thread David Schultz

Thus spake Matthew Dillon [EMAIL PROTECTED]:
 Implementing swapoff is a bunch of grunt-work but not too hard in
 concept.  Basically the work involved is this:

Sounds like a plan, and not too tricky.  Perhaps I'll see if I can
figure it out when I have some free time.

   * Make a calculation to be sure that it is possible to turn off
     the swap device and not run the system out of VM.  If it is not
     possible do not allow the swapoff.

Can't you have a race condition here where you decide that you
have enough space, and by the time you've deallocated half of the
swapfile that's no longer the case?  It seems like the correct
thing to do in that case is abort the system call (which could be
painful).  Perhaps the best thing to do in this case is wait for
vm_pageout_scan to kill a few pigs.




Re: swapoff?

2002-07-13 Thread Matthew Dillon


:Can't you have a race condition here where you decide that you
:have enough space, and by the time you've deallocated half of the
:swapfile that's no longer the case?  It seems like the correct
:thing to do in that case is abort the system call (which could be
:painful).  Perhaps the best thing to do in this case is wait for
:vm_pageout_scan to kill a few pigs.

I wouldn't worry about it.  Nobody turns off swap on a running system
at a whim.  It just needs to prevent stupid mistakes like trying to
remove a swap device without having adequate memory + other swap to
take care of the data.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: swapoff?

2002-07-13 Thread Peter Wemm

David Schultz wrote:
 Thus spake Matthew Dillon [EMAIL PROTECTED]:
  Implementing swapoff is a bunch of grunt-work but not too hard in
  concept.  Basically the work involved is this:
 
 Sounds like a plan, and not too tricky.  Perhaps I'll see if I can
 figure it out when I have some free time.
 
   * Make a calculation to be sure that it is possible to turn off
     the swap device and not run the system out of VM.  If it is not
     possible do not allow the swapoff.
 
 Can't you have a race condition here where you decide that you
 have enough space, and by the time you've deallocated half of the
 swapfile that's no longer the case?  It seems like the correct
 thing to do in that case is abort the system call (which could be
 painful).  Perhaps the best thing to do in this case is wait for
 vm_pageout_scan to kill a few pigs.

One system I used to use years and years ago separated this process into
stages.  The swap(1M) command could be used to enable, disable and 'weight'
allocation to swap areas.  The add was easy.  'delete' would attempt to
page in everything on the device, but if the system looked like it was
going to run out of resources it would fail and stop right there.  You could
either turn allocation back on, or kill processes, or wait for the pager to
catch up with moving stuff out to other swap spaces.  When (if) it finally
hit zero inuse, it would be deleted.

It did manage multiple swap spaces as separate entities with different fill
levels etc [rather than one giant logical swap area], so doing it this way
kinda made sense.  I did actually use it once and it even worked. :-)
(I cannibalized my /tmp file system and used it for swap for a project, and
 then turned it off and re-mkfs'ed it)

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5





Re: swapoff?

2002-07-13 Thread Andrey Alekseyev

Also this would probably be useful in the situation when you need
to change the swap device on a running system.  We had to do this once
or twice on a very busy commercial mail server running Solaris.  We
needed to dismount the current swap device and use it for another
purpose after switching paging/swapping to another disk.

 I wouldn't worry about it.  Nobody turns off swap on a running system
 at a whim.  It just needs to prevent stupid mistakes like trying to
 remove a swap device without having adequate memory + other swap to
 take care of the data.
 
   -Matt
   Matthew Dillon 
   [EMAIL PROTECTED]


-- 
Andrey Alekseyev. Zenon N.S.P.




Re: swapoff?

2002-07-13 Thread Terry Lambert

Matthew Dillon wrote:
 * Flag the swap device being removed and then scan all OBJT_SWAP
   VM Objects looking for swap blocks associated with the device,
   and force a page-in of those blocks.  The getpages code for the
   swap backing store would detect the flag and not clear the swap
   bitmap bits as it pages-in the data.
 
   (Forcing a pagein may force pages to cycle back out to another
   swap device, so special treatment of the paged-in pages (like
   immediately placing them in the VM page cache instead of the
   active or inactive queues) is necessary to reduce load effects
   on the system.)

Uh... so you set the bit that tells you it's allocated to prevent
it being allocated?

When I swap something in and the bit is set, how do I know that it's
in, except that it's not allocated?

In other words, I do what you say... how do I know when the device
has been drained out, vs. being in use?

I think you have to disable swapping to the device some other way,
and then return from the swapoff only when the bitmap is all zero.

-- Terry




Re: swapoff?

2002-07-13 Thread Jon Mini

On Sat, Jul 13, 2002 at 03:17:33AM -0700, Terry Lambert wrote:
 Matthew Dillon wrote:
  * Flag the swap device being removed and then scan all OBJT_SWAP
    VM Objects looking for swap blocks associated with the device,
    and force a page-in of those blocks.  The getpages code for the
    swap backing store would detect the flag and not clear the swap
    bitmap bits as it pages-in the data.
  
    (Forcing a pagein may force pages to cycle back out to another
    swap device, so special treatment of the paged-in pages (like
    immediately placing them in the VM page cache instead of the
    active or inactive queues) is necessary to reduce load effects
    on the system.)
 
 Uh... so you set the bit that tells you it's allocated to prevent
 it being allocated?
 
 When I swap something in and the bit is set, how do I know that it's
 in, except that it's not allocated?

 In other words, I do what you say... how do I know when the device
 has been drained out, vs. being in use?
 
 I think you have to disable swapping to the device some other way,
 and then return from the swapoff only when the bitmap is all zero.
 

Well, I think that disabling the device another way would probably
be a better approach, but you don't *have* to do it another way.
You can:

  1) Flag the device as being removed.
  2) Scan through the bitmap, and for each page allocated, add it to a list
     of pages to be paged in. For each page free, set its bit to keep it from
     being used.
  3) Once you've set all of the bits to 1, force a pagein of every page on the
     list. If the size of the list is prohibitive, you can force the pagein
     every time the list becomes full (a low/high watermark system would
     probably be the most effective).
  4) When pages get paged in, check the 'device being removed' flag
     and only clear the bit in the bitmap if the flag isn't set. Also,
     decrement a counter of the total pages in use on the device.
  5) When the counter reaches zero, remove the device.

One would need to increment the counter when they page out, obviously, but
that is not a problem.
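
The bitmap-plus-counter scheme in the steps above can be sketched as a small userland model. Everything here is an illustrative assumption: struct sim_swapdev, SIM_NBLOCKS and the sim_* functions are hypothetical, the bitmap is a plain bool array, and a page-in is reduced to bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NBLOCKS 8  /* hypothetical, tiny swap device */

struct sim_swapdev {
    bool   allocated[SIM_NBLOCKS];  /* true = not available to allocator */
    bool   closing;                 /* "device being removed" flag */
    size_t in_use;                  /* blocks that hold paged-out data */
};

/*
 * Flag the device and walk the bitmap once: blocks that are already
 * allocated go on the page-in list, free blocks are claimed so nothing
 * new can be paged out to them.  Returns the length of the list.
 */
static size_t
sim_begin_removal(struct sim_swapdev *dev, size_t *pagein_list)
{
    size_t n = 0;

    dev->closing = true;
    for (size_t i = 0; i < SIM_NBLOCKS; i++) {
        if (dev->allocated[i])
            pagein_list[n++] = i;
        else
            dev->allocated[i] = true;
    }
    return n;
}

/*
 * Account for one block being paged in.  The bit is cleared only when
 * the device is not being removed; the counter drops either way.
 */
static void
sim_pagein(struct sim_swapdev *dev, size_t blk)
{
    assert(blk < SIM_NBLOCKS && dev->allocated[blk] && dev->in_use > 0);
    if (!dev->closing)
        dev->allocated[blk] = false;
    dev->in_use--;
}

/* The device can go away once it is flagged and the counter is zero. */
static bool
sim_can_remove(const struct sim_swapdev *dev)
{
    return dev->closing && dev->in_use == 0;
}
```

The counter is what distinguishes "drained" from "in use" once every bit in the bitmap has been set, which is the question raised earlier in the thread.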

This works, but it has the potential problem that if the list of pages to be
paged in can't grow large enough, you might page pages out to the device you're
trying to disable, only to page them back in basically immediately. This is
rather silly, but would still function.

Personally, I think it would be more intuitive to add a check to the allocation
algorithm that forces it to not consider devices flagged for removal, and
mark each page as free after it comes in. When the bitmap is clean you're done.
However, you're still going to want to count the pages allocated on a device
because it's a lot easier to check the counter than to scan the whole bitmap.
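
A minimal sketch of the flag-plus-counter idea in the paragraph above, written in Python for brevity rather than as kernel code. The names SwapDevice, page_out, removing and in_use are hypothetical, and the per-device bitmap is an assumption made for the sketch; it is not how the FreeBSD swap pager is organized.

```python
class SwapDevice:
    """Toy model of one swap device: a bitmap plus an in-use counter."""

    def __init__(self, name, nblocks):
        self.name = name
        self.bitmap = [False] * nblocks  # True means the block is allocated
        self.in_use = 0                  # number of allocated blocks
        self.removing = False            # set once removal has been requested

    def alloc(self):
        """Return a free block index, or None if full or flagged for removal."""
        if self.removing:
            return None
        for blk, used in enumerate(self.bitmap):
            if not used:
                self.bitmap[blk] = True
                self.in_use += 1
                return blk
        return None

    def free(self, blk):
        """Mark a block free after its page has been paged back in."""
        if self.bitmap[blk]:
            self.bitmap[blk] = False
            self.in_use -= 1

    def drained(self):
        """Checking the counter is O(1); scanning the bitmap would be O(n)."""
        return self.removing and self.in_use == 0


def page_out(devices):
    """Allocate a block on the first device that will accept it."""
    for dev in devices:
        blk = dev.alloc()
        if blk is not None:
            return dev, blk
    raise MemoryError("no swap space available")
```

Once removing is set, page_out skips the device, and drained() becomes true as soon as the last page has come back in.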

-- 
Jonathan Mini [EMAIL PROTECTED]
http://www.freebsd.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Re: swapoff?

2002-07-13 Thread Terry Lambert

Jon Mini wrote:
 Personally, I think it would be more intuitive to add a check to the allocation
 algorithm that forces it to not consider devices flagged for removal, and
 mark each page as free after it comes in. When the bitmap is clean you're done.
 However, you're still going to want to count the pages allocated on a device
 because it's a lot easier to check the counter than to scan the whole bitmap.

Flagging is how you would have to do it.

Counting is fairly useless, since it assumes that, given a bit in
the bitmap, you can reverse lookup the page that points to it, and
there is not a reverse mapping for this, only a forward (swap
mappings swapped out don't write swap metadata to the disk).  I
guess if you did it over and over again, you could know when you
are done.

I think that what has to happen is that you are going to have to
scan all page mappings in the system to find anyone using the
device, and page it in that way.  My reasoning on this is that the
page mappings exist only in the context of the current process
address space.  This means that unless you force in each process
in turn so that the mappings are relevant to its address space,
and then force out what's in core (to a new device) in order to
get the pages swapped to the disappearing device, the only really
clean way to do it is to rewrite the mappings after manually
copying the page from one device to the other using non-swappable
pages in the kernel address space as temporary buffers.

It's doable, but it's ugly... it's nearly the same problem you'd
face if you wanted to defrag kernel memory so that you could do
a contigmalloc very late in the game, only a bit easier (the
kernel does not set ELF attributes to indicate pages containing
code or data not in the paging path for you to do this).

I'm not positive if you would need to lock the pages you are moving,
or not.  I think so.  The problem is when you have a program that
has something like a writeable data page from a shared library,
which has been copied via copy-on-write, and then the program that
did it forks (this happens all the time in any fork for libc and
similar offset fixups).  You would end up needing to lock all
processes which referenced a page that had been swapped out,
with multiple vm_object_t's.

Maybe I'm missing something, and there really is a way to get this
information back from a known bit in the bitmap... Matt?  It was
your suggestion?  Am I not seeing a function right in front of me?

-- Terry




Re: swapoff?

2002-07-13 Thread David Schultz

Thus spake Peter Wemm [EMAIL PROTECTED]:
 One system I used to use years and years ago separated this process into
 stages.  The swap(1M) command could be used to enable, disable and 'weight'
 allocation to swap areas.  The add was easy.  'delete' would cause the
 device to be attempted to be paged in, but if the system looked like it was
 going to run out of resources it would fail and stop right there.  You could
 either turn allocation back on, or kill processes or wait for the pager to catch
 up with moving stuff out to other swap spaces.  When (if) it finally hit
 zero inuse, it would be deleted.
 
 It did manage multiple swap spaces as separate entities with different fill
 levels etc [rather than one giant logical swap area], so doing it this way
 kinda made sense.  I did actually use it once and it even worked. :-)
 (I cannibalized my /tmp file system and used it for swap for a project, and
  then turned it off and re-mkfs'ed it)

The weight idea is very interesting.  NetBSD does this using
priorities; all the swap devices of a given priority are filled
round robin before devices of lower priority, the idea being that
the slower ones are a last resort (e.g. NFS).  On the other hand,
this design allows large and fast swap devices to start swapping
to death before the `backup' devices see any action.  It isn't
clear to me whether priorities or fill levels are better.
(Certainly a hybrid is possible, that is, weights within priority
levels.)
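
To make the priority scheme concrete, here is a small Python sketch of "round robin within a priority level, lower levels only as a last resort". It is an illustration of the behaviour described above, not NetBSD's code; the PrioritySwap class is hypothetical, and treating a lower number as "used first" is a convention chosen for the sketch.

```python
from collections import deque


class PrioritySwap:
    """Fill devices round robin within a priority level before moving on."""

    def __init__(self):
        self.levels = {}  # priority -> deque of [name, free_blocks]

    def add(self, name, priority, nblocks):
        self.levels.setdefault(priority, deque()).append([name, nblocks])

    def alloc(self):
        """Return the name of the device that received the block, or None."""
        for prio in sorted(self.levels):  # lower number is tried first
            ring = self.levels[prio]
            for _ in range(len(ring)):
                dev = ring[0]
                ring.rotate(-1)  # the next allocation starts at the following device
                if dev[1] > 0:
                    dev[1] -= 1
                    return dev[0]
        return None
```

With two fast devices at level 0 and an NFS device at level 1, the NFS device receives nothing until both fast devices are completely full, which is exactly the weakness noted above.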

This may be a better project for me than swapoff in the immediate
future because I won't have to understand how to track down the
appropriate VM objects and handle them in a kosher manner.
Implementing weights/priorities will also involve dynamically
allocating struct swdevt's, which should be done anyway and will
only be harder after swapoff() is written.

BTW, I believe the comment about swfree() in vm_swap.c is outdated
as of rev. 1.17, and nothing uses SW_FREED anymore.  This means
that technically, swap devices don't have any flags right now, but
that could change with swapoff().




Re: swapoff?

2002-07-13 Thread Terry Lambert

David Schultz wrote:
 The weight idea is very interesting.  NetBSD does this using
 priorities; all the swap devices of a given priority are filled
 round robin before devices of lower priority, the idea being that
 the slower ones are a last resort (e.g. NFS).  On the other hand,
 this design allows large and fast swap devices to start swapping
 to death before the `backup' devices see any action.  It isn't
 clear to me whether priorities or fill levels are better.
 (Certainly a hybrid is possible, that is, weights within priority
 levels.)

I like the idea of a moving average on time-from-request-to-service.
8-).  Works great for Server Load Balancing, too.  The moving
average takes load into account, without explicit load notification
(i.e. no need to have a load notification protocol between NFS
clients and servers, etc.).
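
One way to read the moving-average idea is an exponentially weighted average of request-to-service time per device, with new work sent to whichever device currently looks fastest. The Python sketch below assumes that reading; LatencyTracker and the smoothing factor alpha are hypothetical names, not anything from an existing implementation.

```python
class LatencyTracker:
    """Exponentially weighted moving average of service time per device."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha  # weight given to the newest sample
        self.avg = {}       # device name -> smoothed service time in seconds

    def record(self, dev, seconds):
        """Fold one observed request-to-service time into the average."""
        old = self.avg.get(dev)
        if old is None:
            self.avg[dev] = seconds
        else:
            self.avg[dev] = old + self.alpha * (seconds - old)

    def best(self):
        """Return the device with the lowest smoothed service time, or None."""
        return min(self.avg, key=self.avg.get) if self.avg else None
```

A device that becomes loaded sees its average rise and stops being chosen, without any explicit load reports.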


 This may be a better project for me than swapoff in the immediate
 future because I won't have to understand how to track down the
 appropriate VM objects and handle them in a kosher manner.
 Implementing weights/priorities will also involve dynamically
 allocating struct swdevt's, which should be done anyway and will
 only be harder after swapoff() is written.

8-).  Now that everyone is talking about it, better get my
hacks in first, so that other people have to integrate with my
changes, instead of the other way around...

Actually, I think it's a nice idea for an incremental project.

-- Terry




Re: swapoff?

2002-07-13 Thread Matthew Dillon

We are not going to be doing any sort of weighting.  It's an idea whose
time has come... and gone again.  It might have been useful 8 years ago
but it is not useful today.

Also, please note that it is not possible to reverse-lookup a swap bitmap
block and get the VM object / page number.  The OBJT_SWAP VM objects have
to be scanned to get the swap bitmap blocks.  Nor does it make much sense
to try to 'record' the blocks somewhere, there could be hundreds of 
thousands of blocks and memory is not normally a luxury in this situation.

All you need to do is prevent new blocks from being allocated from the
old swap device.  Since the radix tree bitmap code cannot make a
distinction between devices the easiest way to do this is to simply
allocate all the free bits associated with the device (which you can do),
and prevent any existing allocated blocks from being freed from the
bitmap (which is a simple calculation) ... and of course mark the page
dirty again since its backing store is being ripped out from under it.
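
The approach in the paragraph above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: SwapBitmap is a flat list rather than a radix tree, on_device is a caller-supplied predicate standing in for the "simple calculation", and objects is a plain dictionary standing in for the OBJT_SWAP objects that have to be scanned.

```python
class SwapBitmap:
    """Toy stand-in for a single allocation bitmap shared by all swap devices."""

    def __init__(self, nblocks):
        self.allocated = [False] * nblocks
        self.closing = None  # predicate blk -> bool for the device being removed

    def alloc(self):
        """Return the lowest free block, or None if nothing is free."""
        for blk, used in enumerate(self.allocated):
            if not used:
                self.allocated[blk] = True
                return blk
        return None

    def free(self, blk):
        """Free a block, unless it lives on the device being removed."""
        if self.closing is not None and self.closing(blk):
            return  # keep the bit set so the block can never be handed out again
        self.allocated[blk] = False

    def begin_swapoff(self, on_device):
        """Allocate every free bit that belongs to the departing device."""
        self.closing = on_device
        for blk in range(len(self.allocated)):
            if on_device(blk):
                self.allocated[blk] = True


def swapoff(bitmap, objects, on_device):
    """Scan the forward mappings and pull every affected page back in.

    objects maps an object name to {page index: swap block}. Returns the
    (object, page index) pairs that were paged in and must be marked dirty.
    """
    bitmap.begin_swapoff(on_device)
    dirtied = []
    for name, pages in objects.items():
        for pindex, blk in list(pages.items()):
            if on_device(blk):
                del pages[pindex]   # the page no longer has a swap copy
                bitmap.free(blk)    # deliberately a no-op for this device
                dirtied.append((name, pindex))
    return dirtied
```

Because the device's bits are never cleared, the allocator needs no knowledge of devices at all; it simply never finds a free block there.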

-Matt





Re: swapoff?

2002-07-13 Thread David Schultz

Thus spake Matthew Dillon [EMAIL PROTECTED]:
 We are not going to be doing any sort of weighting.  It's an idea whose
 time has come... and gone again.  It might have been useful 8 years ago
 but it is not useful today.
 
 Also, please note that it is not possible to reverse-lookup a swap bitmap
 block and get the VM object / page number.  The OBJT_SWAP VM objects have
 to be scanned to get the swap bitmap blocks.  Nor does it make much sense
 to try to 'record' the blocks somewhere, there could be hundreds of 
 thousands of blocks and memory is not normally a luxury in this situation.

I'm aware of that.  That's why swapoff is a harder project; it
requires working at more levels of abstraction, not all of which I
fully understand yet.  At least most of the VM stuff is
well-documented now. ;-)

 All you need to do is prevent new blocks from being allocated from the
 old swap device.  Since the radix tree bitmap code cannot make a
 distinction between devices the easiest way to do this is to simply
 allocate all the free bits associated with the device (which you can do),
 and prevent any existing allocated blocks from being freed from the
 bitmap (which is a simple calculation) ... and of course mark the page
 dirty again since its backing store is being ripped out from under it.

This makes sense.  I was originally thinking of marking the device
as off-limits to new allocations, but I realize now why that would
not work.  As long as the logical swap blocks that correspond to
the device are still fair game for the swap pager, swapdev_strategy
will still have to swap out to the device.
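
To illustrate why a per-device flag alone is not enough, here is a Python sketch of how a logical swap block might map onto a device. It assumes the devices are interleaved in fixed-size stripes; the function names and the stripe and ndevs parameters are hypothetical, and the actual layout used by swapdev_strategy should be checked against the source.

```python
def locate(blk, stripe, ndevs):
    """Map a logical swap block to (device index, block offset on that device)."""
    seg, off = divmod(blk, stripe)      # which stripe, and where inside it
    dev, row = seg % ndevs, seg // ndevs
    return dev, row * stripe + off


def blocks_on_device(dev, nblocks, stripe, ndevs):
    """List the logical blocks that land on one device."""
    return [b for b in range(nblocks) if locate(b, stripe, ndevs)[0] == dev]
```

Any logical block the pager is allowed to allocate resolves to a device through a calculation like this, so the blocks themselves have to be taken out of circulation.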




Re: swapoff?

2002-07-13 Thread Peter Wemm

Matthew Dillon wrote:
 We are not going to be doing any sort of weighting.  It's an idea whose
 time has come... and gone again.  It might have been useful 8 years ago
 but it is not useful today.

Thank goodness! :-)

Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
All of this is for nothing if we don't go to the stars - JMS/B5





swapoff?

2002-07-12 Thread Sean Kelly

Not sure if this is offtopic here or not, so apologies ahead of time if so.

It has been many years since I used Linux, but one thing I recall is that
there was a `swapoff` command in Linux to complement the `swapon` command.
Are there any patches or plans to implement such a thing in FreeBSD?

My familiarity with the workings of FreeBSD is still pretty minimal. Are
there certain reasons that there currently is no way to stop paging to a
device/file?

-- 
Sean Kelly | PGP KeyID: 77042C7B
[EMAIL PROTECTED] | http://www.zombie.org




Re: swapoff?

2002-07-12 Thread David Schultz

Thus spake Sean Kelly [EMAIL PROTECTED]:
 My familiarity with the workings of FreeBSD is still pretty minimal. Are
 there certain reasons that there currently is no way to stop paging to a
 device/file?

I imagine the implementation of this would be complicated, as it
is in Linux.  You'd have to prevent further allocations on the
swap device, then figure out where to evict the pages already
allocated on the device.  You also have to be able to back out if
you run out of space to put things in the process.  Maybe someone
who is familiar with the race conditions involved will implement
it some day, but swapoff would only occasionally be useful... at
least until everyone is using hot-swappable swap.  ;-)
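
The "back out if you run out of space" concern can be reduced to a check made before any page is moved. The Python sketch below is only an illustration of that check; can_swapoff and its parameters are hypothetical, all quantities are in pages, and reserve is an assumed safety margin.

```python
def can_swapoff(used_on_device, free_ram_pages, free_on_other_devices, reserve=0):
    """Return True if the pages on the device have somewhere else to go."""
    room = free_ram_pages + free_on_other_devices
    return used_on_device + reserve <= room
```

A real implementation would have to repeat the check, or be prepared to undo partial work, since the amount of free memory changes while the pages are being brought in.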




Re: swapoff?

2002-07-12 Thread David Schultz

BTW, NetBSD's new UVM code has the ability to do this.  Perhaps
it's worth looking into how difficult it would really be in FreeBSD...
