Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Andrea Arcangeli

On Fri, May 25, 2001 at 10:01:37PM -0400, Ben LaHaise wrote:
> On Sat, 26 May 2001, Andrea Arcangeli wrote:
> 
> > On Fri, May 25, 2001 at 09:38:36PM -0400, Ben LaHaise wrote:
> > > You're missing a few subtle points:
> > >
> > >   1. reservations are against a specific zone
> >
> > A single zone is not used for only one thing, period. In my previous
> > email I spelled out the only conditions under which a reserved pool can
> > avoid a deadlock.
> 
> Well, until we come up with a better design for a zone allocator that
> doesn't involve walking lists and polluting the cache all over the place,
> it'll be against a single zone.

I meant that each zone is used by more than one user; that definitely
won't change.

> > >   2. try_to_free_pages uses the swap reservation
> >
> > try_to_free_pages has a huge stack of allocators under it; bounce
> > buffers/loop/nbd/whatever are just some of them.
> 
> Fine, then add one to the bounce buffer allocation code, it's all of about
> 3 lines added.

Yes, you would have to add it to the bounce buffers, to loop, to nbd,
to whatever, and then remove it from all the other places you put it
into right now. That's why I'm saying your patch, as it stood in its
first revision, won't fix anything (only hide it).

> I never said you didn't.  But Ingo's patch DOES NOT PROTECT AGAINST
> DEADLOCKS CAUSED BY INTERRUPT ALLOCATIONS.  Heck, it doesn't even fix the

It does, but only for create_bounces(). As I said, if you want to "fix",
not to "hide", you need to address every single case; a generic reserved
pool is just useless. Now try to get a deadlock in 2.4.5 with tasks
locked up in create_bounces(), if you say it does not protect against
irqs. See?

> That said, the reservation concept is generic code, which the bounce
> buffer patch most certainly isn't.  It can even be improved to overlap

What I said is that instead of handling the pool by hand in every single
place, one could write an API to generalize it. But very often a simple
API hooked into the page allocator may not be flexible enough to
guarantee the kind of allocations we need; highmem is just one example
where we need more flexibility, not just for the pages but also for the
bhs (though OK, that's mostly an implementation issue: if you do the
math right it's harder, but you can still use a generic API).
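
A minimal sketch of the kind of generic API being discussed, in the
spirit of the emergency pool in the patch further down. All names here
(mem_pool, mem_pool_alloc, mem_pool_free) are hypothetical, not Ben's
actual code; 2.4-era primitives are assumed:

#include <linux/mm.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/sched.h>

struct mem_pool {
    spinlock_t lock;
    int count;                        /* objects currently in the reserve */
    struct list_head reserve;         /* each object embeds a list_head first */
    void *(*alloc_fn)(int gfp_mask);  /* the normal, non-reserved allocator */
};

/*
 * Try the normal allocator first, then fall back to the reserve; if
 * both fail, sleep and retry.  This only avoids deadlock under the
 * condition Andrea states: every object taken from the reserve must
 * eventually come back via mem_pool_free(), so a sleeping caller is
 * guaranteed to make progress sooner or later.  A real version might
 * also queue the waiters FIFO to prevent starvation.
 */
static void *mem_pool_alloc(struct mem_pool *pool, int gfp_mask)
{
    for (;;) {
        struct list_head *entry = NULL;
        void *obj = pool->alloc_fn(gfp_mask & ~__GFP_WAIT);
        if (obj)
            return obj;

        spin_lock_irq(&pool->lock);
        if (!list_empty(&pool->reserve)) {
            entry = pool->reserve.next;
            list_del(entry);
            pool->count--;
        }
        spin_unlock_irq(&pool->lock);
        if (entry)
            return entry;             /* the object starts with its list_head */

        if (!(gfp_mask & __GFP_WAIT))
            return NULL;              /* atomic callers get no guarantee */
        current->policy |= SCHED_YIELD;
        __set_current_state(TASK_RUNNING);
        schedule();                   /* wait for a reserved object to return */
    }
}

/* Freed objects refill the reserve first; once the reserve is full a
 * real version would fall through to the normal free path, exactly as
 * bounce_end_io() does in the patch below. */
static void mem_pool_free(struct mem_pool *pool, struct list_head *obj)
{
    unsigned long flags;

    spin_lock_irqsave(&pool->lock, flags);
    list_add(obj, &pool->reserve);
    pool->count++;
    spin_unlock_irqrestore(&pool->lock, flags);
}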

> with the page cache pages in the zone, so it isn't even really "free" ram
> as currently implemented.

Yes, that would be a very nice property. Again, I'm not against a
generic interface for creating reserved pools of memory (I mentioned
that possibility before reading your patch, after all). It's just the
implementation (mainly the overwritten per-task hook) that didn't
convince me, and the usage, which looked obviously wrong to my eyes.

> Re-read the above and reconsider.  The reservation doesn't need to be
> permitted until after page_alloc has blocked.  Heck, do a schedule before

I don't see what you mean here, could you elaborate?

> Atomicity isn't what I care about.  It's about being able to keep memory
> around so that certain allocations can proceed, and those pools cannot be
> eaten into by other tasks.  [..]

Those pools need to be global unless you want to allocate them at
fork() for every single task, like you did for some of the kernel
threads; and if you make them global, per-zone or per-whatever, not
every single case is covered and it will deadlock. Or rather, it will
work by luck: it proceeds until you hit the case where 32 reserved
pages weren't enough and you needed 33 reserved pages to avoid the
deadlock. It's along the same lines as the pci_map_* brokenness, in
some sense.

Allocating those pools per-task is just a total waste. Those are
"security" pools: in 99% of cases you won't need them and you will
allocate the memory dynamically; they just need to be there for
correctness, to avoid the deadlock that very seldom happens.

Andrea



Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Andrea Arcangeli

On Fri, May 25, 2001 at 09:38:36PM -0400, Ben LaHaise wrote:
> You're missing a few subtle points:
> 
>   1. reservations are against a specific zone

A single zone is not used for only one thing, period. In my previous
email I spelled out the only conditions under which a reserved pool can
avoid a deadlock.

>   2. try_to_free_pages uses the swap reservation

try_to_free_pages has a huge stack of allocators under it; bounce
buffers/loop/nbd/whatever are just some of them.

>   3. irqs can no longer eat memory allocations that are needed for
>  swap

You don't even need irqs to still deadlock.

> Note that with this patch the current garbage in the zone structure with
> pages_min (which doesn't work reliably) becomes obsolete.

The "garbage" is just a heuristic that lets atomic allocations work in
the common case, dynamically. Anything deadlock-related cannot rely on
pages_min.
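
For reference, that heuristic amounts to roughly the following (a
sketch, not the exact 2.4.5 code; the thresholds are illustrative):

/*
 * pages_min keeps a cushion of free pages: ordinary sleeping
 * allocations stop above it, and only atomic allocations (which
 * cannot wait) may dig part-way into it.  Nothing here is a
 * guarantee: several concurrent atomic callers can still drain
 * the cushion, which is why deadlock avoidance cannot rely on it.
 */
if (z->free_pages > z->pages_low)
    return rmqueue(z, order);        /* the common, easy case */
if (!(gfp_mask & __GFP_WAIT) && z->free_pages > z->pages_min / 2)
    return rmqueue(z, order);        /* atomic: allowed below pages_low */
/* otherwise wake kswapd / run try_to_free_pages() and retry */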

I am talking about fixing the thing. Of course I know perfectly well
you can hide it pretty well, but I definitely hate those kinds of
hiding patches.

> > The only case where a reserved pool makes sense is when you know that
> > waiting (i.e. running a task queue, scheduling and trying again to
> > allocate later) will make the allocation succeed for sure eventually
> > (possibly with a FIFO policy to make starvation impossible too, not
> > only deadlocks). If you don't have that guarantee those pools
> > automatically become only a source code and runtime waste; possibly
> > they could hide core bugs in the allocator or stuff like that.
> 
> You're completely wrong here.

I don't think so.

Andrea



Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

On Sat, 26 May 2001, Andrea Arcangeli wrote:

> Please merge this one in 2.4 for now (originally from Ingo, I only
> improved it), this is a real definitive fix

With the only minor detail being that it DOESN'T WORK.

You're not solving the problem of GFP_BUFFER allocators
looping forever in __alloc_pages(); the deadlock can still
happen.

You've only solved one specific case, highmem.c getting
a page for bounce buffers, but you'll happily let the thing
deadlock while trying to get buffer heads for a normal low
memory page!

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Andrea Arcangeli

On Fri, May 25, 2001 at 08:29:38PM -0400, Ben LaHaise wrote:
> amount of bounce buffers to guarantee progress while submitting io.  The
> -ac kernels have a patch from Ingo that provides private pools for bounce
> buffers and buffer_heads.  I went a step further and have a memory
> reservation patch that provides for memory pools being reserved against a
> particular zone.  This is needed to prevent the starvation that irq
> allocations can cause.
> 
> Some of these cleanups are 2.5 fodder, but we really need something in 2.4
> right now, so...

Please merge this one in 2.4 for now (originally from Ingo, I only
improved it). This is a real, definitive fix, and there's no nicer way
to handle it unless you want to generalize an API for people to create
private anti-deadlock ("make sure we always make progress") memory
pools:

diff -urN 2.4.4/mm/highmem.c highmem-deadlock/mm/highmem.c
--- 2.4.4/mm/highmem.c  Sat Apr 28 05:24:48 2001
+++ highmem-deadlock/mm/highmem.c   Sat Apr 28 18:21:24 2001
@@ -159,6 +159,19 @@
 	spin_unlock(&kmap_lock);
 }
 
+#define POOL_SIZE 32
+
+/*
+ * This lock gets no contention at all, normally.
+ */
+static spinlock_t emergency_lock = SPIN_LOCK_UNLOCKED;
+
+int nr_emergency_pages;
+static LIST_HEAD(emergency_pages);
+
+int nr_emergency_bhs;
+static LIST_HEAD(emergency_bhs);
+
 /*
  * Simple bounce buffer support for highmem pages.
  * This will be moved to the block layer in 2.5.
@@ -203,17 +216,72 @@
 
 static inline void bounce_end_io (struct buffer_head *bh, int uptodate)
 {
+	struct page *page;
 	struct buffer_head *bh_orig = (struct buffer_head *)(bh->b_private);
+	unsigned long flags;
 
 	bh_orig->b_end_io(bh_orig, uptodate);
-	__free_page(bh->b_page);
+
+	page = bh->b_page;
+
+	spin_lock_irqsave(&emergency_lock, flags);
+	if (nr_emergency_pages >= POOL_SIZE)
+		__free_page(page);
+	else {
+		/*
+		 * We are abusing page->list to manage
+		 * the highmem emergency pool:
+		 */
+		list_add(&page->list, &emergency_pages);
+		nr_emergency_pages++;
+	}
+
+	if (nr_emergency_bhs >= POOL_SIZE) {
 #ifdef HIGHMEM_DEBUG
-	/* Don't clobber the constructed slab cache */
-	init_waitqueue_head(&bh->b_wait);
+		/* Don't clobber the constructed slab cache */
+		init_waitqueue_head(&bh->b_wait);
 #endif
-	kmem_cache_free(bh_cachep, bh);
+		kmem_cache_free(bh_cachep, bh);
+	} else {
+		/*
+		 * Ditto in the bh case, here we abuse b_inode_buffers:
+		 */
+		list_add(&bh->b_inode_buffers, &emergency_bhs);
+		nr_emergency_bhs++;
+	}
+	spin_unlock_irqrestore(&emergency_lock, flags);
 }
 
+static __init int init_emergency_pool(void)
+{
+	spin_lock_irq(&emergency_lock);
+	while (nr_emergency_pages < POOL_SIZE) {
+		struct page * page = alloc_page(GFP_ATOMIC);
+		if (!page) {
+			printk("couldn't refill highmem emergency pages");
+			break;
+		}
+		list_add(&page->list, &emergency_pages);
+		nr_emergency_pages++;
+	}
+	while (nr_emergency_bhs < POOL_SIZE) {
+		struct buffer_head * bh = kmem_cache_alloc(bh_cachep, SLAB_ATOMIC);
+		if (!bh) {
+			printk("couldn't refill highmem emergency bhs");
+			break;
+		}
+		list_add(&bh->b_inode_buffers, &emergency_bhs);
+		nr_emergency_bhs++;
+	}
+	spin_unlock_irq(&emergency_lock);
+	printk("allocated %d pages and %d bhs reserved for the highmem bounces\n",
+	       nr_emergency_pages, nr_emergency_bhs);
+
+	return 0;
+}
+
+__initcall(init_emergency_pool);
+
 static void bounce_end_io_write (struct buffer_head *bh, int uptodate)
 {
bounce_end_io(bh, uptodate);
@@ -228,6 +296,82 @@
bounce_end_io(bh, uptodate);
 }
 
+struct page *alloc_bounce_page (void)
+{
+	struct list_head *tmp;
+	struct page *page;
+
+repeat_alloc:
+	page = alloc_page(GFP_BUFFER);
+	if (page)
+		return page;
+	/*
+	 * No luck. First, kick the VM so it doesnt idle around while
+	 * we are using up our emergency rations.
+	 */
+	wakeup_bdflush(0);
+
+	/*
+	 * Try to allocate from the emergency pool.
+	 */
+	tmp = &emergency_pages;
+	spin_lock_irq(&emergency_lock);
+	if (!list_empty(tmp)) {
+		page = list_entry(tmp->next, struct page, list);
+		list_del(tmp->next);
+		nr_emergency_pages--;
+	}
+	spin_unlock_irq(&emergency_lock);
+	if (page)
+		return page;
+
+	/* we need to wait for I/O completion */
+	run_task_queue(&tq_disk);
+
+	current->policy |= SCHED_YIELD;
+	__set_current_state(TASK_RUNNING);
+	schedule();
+	goto repeat_alloc;
+}
+
+struct 

Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Fri, 25 May 2001, Ben LaHaise wrote:
>
> Highmem systems currently manage to hang themselves quite completely upon
> running out of memory in the normal zone.  One of the failure modes is
> looping in __alloc_pages from get_unused_buffer_head to map a dirty page.
> Another results in looping on allocation of a bounce page for writing a
> dirty highmem page back to disk.

That's not the part of the patch I object to - fixing that is fine.

What I object to is that it special-cases the zone names, even though
that doesn't necessarily make any sense at all.

What about architectures that have other zones? THAT is the kind of
fundamental design mistake that special-casing DMA and NORMAL is
horrible for.

alloc_pages() doesn't have that kind of problem. To alloc_pages(),
GFP_BUFFER is not "oh, DMA or NORMAL". There, it is simply "oh, use the
zonelist pointed to by GFP_BUFFER". No special casing, no stupid #ifdef
CONFIG_HIGHMEM.

THAT is what I object to.

Linus
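
What Linus describes looks roughly like this (a simplified sketch;
alloc_pages_sketch is a made-up name and the zonelist selection details
are approximate):

typedef struct zonelist_struct {
	zone_t *zones[MAX_NR_ZONES + 1];	/* fallback order, NULL-delimited */
} zonelist_t;

/*
 * The caller picks a zonelist from a prebuilt per-node table indexed
 * by the zone bits of its gfp_mask; GFP_BUFFER's list simply omits
 * the highmem zone.  The allocator just walks whatever list it is
 * handed and never needs to test zone names or CONFIG_HIGHMEM.
 */
static struct page *alloc_pages_sketch(zonelist_t *zonelist, unsigned int order)
{
	zone_t **zp = zonelist->zones;
	zone_t *z;

	while ((z = *zp++) != NULL) {
		struct page *page = rmqueue(z, order);
		if (page)
			return page;
	}
	return NULL;
}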




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Fri, 25 May 2001, Rik van Riel wrote:
>
> The function do_try_to_free_pages() also gets called when we're
> only short on inactive pages, but we still have TONS of free
> memory. In that case, I don't think we'd actually want to steal
> free memory from anyone.

Well, kmem_cache_reap() doesn't even steal memory from anybody: it just
makes this "tagged for xxx" memory be available to "non-xxx" users too.

And the fact that we're calling do_try_to_free_pages() at all obviously
implies that even if we have memory free, it isn't in the right hands..

Linus




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

On Fri, 25 May 2001, Linus Torvalds wrote:

> Oh, also: the logic behind the change of the kmem_cache_reap() - instead
> of making it conditional on the _reverse_ test of what it has historically
> been, why isn't it just completely unconditional? You've basically
> dismissed the only valid reason for it to have been (illogically)
> conditional, so I'd have expected that just _removing_ the test is better
> than reversing it like your patch does..
>
> No?

The function do_try_to_free_pages() also gets called when we're
only short on inactive pages, but we still have TONS of free
memory. In that case, I don't think we'd actually want to steal
free memory from anyone.

Moving it into the same if() conditional the other memory
freers are in would make sense, though ...

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Sat, 26 May 2001, Alan Cox wrote:
>
> But Linus is right I think - VM changes often prove 'interesting'. Test it in
> -ac, get some figures for real world use, then plan further

.. on the other hand, thinking more about this, I'd rather be called
"stupid" than "stodgy".

So I think I'll buy some experimentation. That HIGHMEM change is too ugly
to live, though, I'd really like to hear more about why something that
ugly is necessary.

Linus




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

On Sat, 26 May 2001, Alan Cox wrote:

> But Linus is right I think - VM changes often prove
> 'interesting'. Test it in -ac, get some figures for real
> world use, then plan further

Oh well. As long as he takes the patch to page_alloc.c, otherwise
everybody _will_ have to "experiment" with the -ac kernels just
to have a system with highmem which doesn't deadlock ;)

cheers,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Fri, 25 May 2001, Rik van Riel wrote:
>
> Without the patch my workstation (with ~180MB RAM) usually has
> between 50 and 80MB of inode/dentry cache and real usable stuff
> gets  swapped out.

All I want is more people giving feedback.

It's clear that neither my nor your machine is a good thing to base things
on.

Linus




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Fri, 25 May 2001, Rik van Riel wrote:
>
> Yeah, I guess the way Linux 2.2 balances things is way too
> experimental ;)

Ehh.. Take a look at the other differences between the VM's. Which may
make a 2.2.x approach completely bogus.

And take a look at how long the 2.2.x VM took to stabilize, and how
INCREDIBLY BAD some of those kernels were.

Linus




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

On Fri, 25 May 2001, Linus Torvalds wrote:
> On Fri, 25 May 2001, Rik van Riel wrote:
> >
> > OK, shoot me.  Here it is again, this time _with_ patch...
>
> I'm not going to apply this as long as it plays experimental
> games with "shrink_icache()" and friends. I haven't seen anybody
> comment on the performance on this, and I can well imagine that
> it would potentially shrink the dcache too aggressively when
> there are lots of inactive-dirty pages around, where
> page_launder is the right thing to do, and shrinking
> icache/dcache might not be.

Your analysis exactly describes what happens in your current
code, though I have to admit that my patch doesn't change it.

Without the patch my workstation (with ~180MB RAM) usually has
between 50 and 80MB of inode/dentry cache and real usable stuff
gets  swapped out.

Either you can go make Linux 2.4 usable again for normal people,
or you can go buy us all a gig of ram.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

On Fri, 25 May 2001, Linus Torvalds wrote:
> On Fri, 25 May 2001, Rik van Riel wrote:
> >
> > OK, shoot me.  Here it is again, this time _with_ patch...
>
> I'm not going to apply this as long as it plays experimental games with
> "shrink_icache()" and friends. I haven't seen anybody comment on the
> performance on this,

Yeah, I guess the way Linux 2.2 balances things is way too
experimental ;)


Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/




Re: [with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Linus Torvalds



On Fri, 25 May 2001, Rik van Riel wrote:
>
> OK, shoot me.  Here it is again, this time _with_ patch...

I'm not going to apply this as long as it plays experimental games with
"shrink_icache()" and friends. I haven't seen anybody comment on the
performance on this, and I can well imagine that it would potentially
shrink the dcache too aggressively when there are lots of inactive-dirty
pages around, where page_launder is the right thing to do, and shrinking
icache/dcache might not be.

I'd really like to avoid having the MM stuff fluctuate too much. Have
people tested this under different loads etc?

Linus




[with-PATCH-really] highmem deadlock removal, balancing & cleanup

2001-05-25 Thread Rik van Riel

OK, shoot me.  Here it is again, this time _with_ patch...

-- Forwarded message --
Date: Fri, 25 May 2001 16:53:38 -0300 (BRST)
From: Rik van Riel <[EMAIL PROTECTED]>

Hi Linus,

the following patch does:

1) Remove GFP_BUFFER and HIGHMEM related deadlocks, by letting
   these allocations fail instead of looping forever in
   __alloc_pages() when they cannot make any progress there.

   Now Linux no longer hangs on highmem machines with heavy
   write loads.

2) Clean up the __alloc_pages() / __alloc_pages_limit() code
   a bit, moving the direct reclaim condition from the latter
   function into the former so we run it less often ;)

3) Remove the superfluous wakeups from __alloc_pages(), not
   only are the tests a real CPU eater, they also have the
   potential of waking up bdflush in a situation where it
   shouldn't run in the first place.  The kswapd wakeup didn't
   seem to have any effect either.

4) Do make sure GFP_BUFFER allocations NEVER eat into the
   very last pages of the system. It is important to preserve
   the following ordering (a sketch of it follows this list):
- normal allocations
- GFP_BUFFER
- atomic allocations
- other recursive allocations

   Using this ordering, we can be pretty sure that e.g. a
   GFP_BUFFER allocation to swap something out to an
   encrypted device won't eat the memory the device driver
   will need to perform its functions. It also means that
   a gigabit network flood won't eat those pages...

5) Change nr_free_buffer_pages() a bit to not return pages
   which cannot be used as buffer pages; this makes a BIG
   difference on highmem machines (which now DO have working
   write throttling again).

6) Simplify the refill_inactive() loop enough that it actually
   works again. Calling page_launder() and shrink_i/d_memory()
   under the same if() condition means that the different caches
   get balanced against each other again.

   The illogical argument for not shrinking the slab cache
   while we're under a free shortage turned out to be very
   much illogical too.  All needed buffer heads will have been
   allocated in page_launder() and shrink_i/d_memory() before
   we get here, and we can be pretty sure that these functions
   will keep re-using those same buffer heads as soon as the
   IO finishes.
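
A sketch of the ordering in point 4 (the class names and thresholds
here are illustrative, not the patch's actual numbers):

/*
 * The later a class appears here, the deeper it may dig into the
 * last free pages of a zone, so the more critical allocators always
 * find something left over from the less critical ones.
 */
enum alloc_class { ALLOC_NORMAL, ALLOC_BUFFER, ALLOC_ATOMIC, ALLOC_RECURSIVE };

static int enough_free(zone_t *z, enum alloc_class class)
{
	switch (class) {
	case ALLOC_NORMAL:	/* ordinary __GFP_WAIT allocations */
		return z->free_pages > z->pages_min;
	case ALLOC_BUFFER:	/* GFP_BUFFER: buffer heads, bounce pages */
		return z->free_pages > z->pages_min * 3 / 4;
	case ALLOC_ATOMIC:	/* GFP_ATOMIC: irq context, cannot sleep */
		return z->free_pages > z->pages_min / 2;
	case ALLOC_RECURSIVE:	/* PF_MEMALLOC: the swap-out path itself */
		return z->free_pages > 0;
	}
	return 0;
}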

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com/



--- linux-2.4.5-pre6/mm/page_alloc.c.orig   Fri May 25 16:13:39 2001
+++ linux-2.4.5-pre6/mm/page_alloc.c    Fri May 25 16:35:50 2001
@@ -251,10 +251,10 @@
 		water_mark = z->pages_high;
 	}
 
-	if (z->free_pages + z->inactive_clean_pages > water_mark) {
+	if (z->free_pages + z->inactive_clean_pages >= water_mark) {
 		struct page *page = NULL;
 		/* If possible, reclaim a page directly. */
-		if (direct_reclaim && z->free_pages < z->pages_min + 8)
+		if (direct_reclaim)
 			page = reclaim_page(z);
 		/* If that fails, fall back to rmqueue. */
 		if (!page)
@@ -299,21 +299,6 @@
 	if (order == 0 && (gfp_mask & __GFP_WAIT))
 		direct_reclaim = 1;
 
-	/*
-	 * If we are about to get low on free pages and we also have
-	 * an inactive page shortage, wake up kswapd.
-	 */
-	if (inactive_shortage() > inactive_target / 2 && free_shortage())
-		wakeup_kswapd();
-	/*
-	 * If we are about to get low on free pages and cleaning
-	 * the inactive_dirty pages would fix the situation,
-	 * wake up bdflush.
-	 */
-	else if (free_shortage() && nr_inactive_dirty_pages > free_shortage()
-			&& nr_inactive_dirty_pages >= freepages.high)
-		wakeup_bdflush(0);
-
 try_again:
/*
 * First, see if we have any zones with lots of free memory.
@@ -329,7 +314,7 @@
 	if (!z->size)
 		BUG();
 
-	if (z->free_pages >= z->pages_low) {
+	if (z->free_pages >= z->pages_min + 8) {
 		page = rmqueue(z, order);
 		if (page)
 			return page;
@@ -443,18 +428,26 @@
 	}
 	/*
 	 * When we arrive here, we are really tight on memory.
+	 * Since kswapd didn't succeed in freeing pages for us,
+	 * we try to help it.
+	 *
+	 * Single page allocs loop until the allocation succeeds.
+	 * Multi-page allocs can fail due to memory fragmentation;
+	 * in that case we bail out to prevent infinite loops and
+	 * hanging device drivers
