RE: How to fix malloc.

2002-02-25 Thread John Baldwin


On 23-Feb-02 Alfred Perlstein wrote:
 * Matthew Dillon [EMAIL PROTECTED] [020223 12:51] wrote:
 
 :Here is the most up-to-date version of pgrp/session lock (at Change 6700):
 :
 :http://people.FreeBSD.org/~tanimura/patches/pgrp10.diff.gz
 :
 :I would like to commit this on the next Sunday. Otherwise, my patch
 :would conflict with other patches, especially tty.
 :
 :-- 
 :Seigo Tanimura [EMAIL PROTECTED] [EMAIL PROTECTED]
 
 Do you have any plans to get pgdelete() out from under Giant?  That
 would allow leavepgrp(), doenterpgrp(), enterpgrp(), enterthispgrp(),
 setsid() (mostly) to be taken out from under Giant, and perhaps a few
 others.
 
 I was thinking of simply having a free list of sessions and process
 groups, locked by PGRPSESS_XLOCK().  pgdelete() would then not have
 to call FREE() and setsid() would almost always be able to pull a new
 structure off the appropriate free list and thus not have to obtain Giant
 for the MALLOC.
 
 All these per-subsystem free-lists are making me nervous in both
 complexity and wasted code...
 
 Ok, instead of keeping all these per-subsystem free-lists here's what
 we do:
 
 In kern_malloc:free() right at the point of
 if (size > MAXALLOCSAVE) we check if we have Giant or not.
 If we do not then we simply queue the memory;
 however, if we do then we call into kmem_free with all the queued memory.
 
 This ought to solve the issue without making us keep all these
 per-cpu caches.
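
 The scheme above can be sketched in userspace C. This is a minimal
 illustration, not the actual kern_malloc.c code; the names
 deferred_head, giant_held, and real_free are hypothetical stand-ins
 for the Giant mutex check and kmem_free().

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the deferred-free idea: when "Giant" is not
 * held, free() just threads the block onto a queue; once a free happens
 * with Giant held, the whole queue is drained into the real allocator.
 * giant_held stands in for mtx_owned(&Giant), real_free() for kmem_free().
 */
struct dead_block {
	struct dead_block *next;	/* link stored in the freed block itself */
};

static struct dead_block *deferred_head;
static int giant_held;
static int real_frees;			/* frees that reached the "real" allocator */

static void
real_free(void *ptr)
{
	(void)ptr;			/* would be a kmem_free() call */
	real_frees++;
}

static void
queued_free(void *ptr)
{
	struct dead_block *b = ptr;

	if (!giant_held) {
		/* Cannot call into the VM system: just queue the memory. */
		b->next = deferred_head;
		deferred_head = b;
		return;
	}
	/* Giant is held: free this block and drain everything queued. */
	real_free(b);
	while (deferred_head != NULL) {
		struct dead_block *n = deferred_head->next;
		real_free(deferred_head);
		deferred_head = n;
	}
}
```

 The point of the sketch is that no lock beyond what free() already
 holds is needed on the queueing path; the expensive work is batched
 onto the next caller that happens to hold Giant.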
 
 By the way, since MAXALLOCSAVE is some multiple of PAGE_SIZE, we
 really don't have to worry about it when freeing small structures,
 although that puts evilness into malloc(9) consumers.
 
 Can you please consider that instead of continuing down this path
 of per-subsystem caches?

Just as a general heads up, the slab memory allocator Jeff Roberson is working
on will not need Giant for the free path, so can we just live with Giant
locking around free() for now rather than adding in a bunch of temporary hacks
to reinvent the wheel that we have to go back and remove later?

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-24 Thread Terry Lambert

Jake Burkholder wrote:
 Jeff Roberson (jeff@) has been working on a slab allocator that goes
 a long way to making malloc(), free() and the zone allocator not require
 giant.  I've reviewed what he's got so far and it looks pretty damn good
 to me, I'll see about getting him to post it.  He's working on adding the
 per-cpu queues now.

A design like this resolves my objection to the pure SLAB
allocator; Vahalia suggests this as a potential enhancement
in his book, and the authors of the SLAB allocator mention
it in their second paper (~1996/1997).

-- Terry




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-24 Thread Terry Lambert

Alfred Perlstein wrote:
 * Matthew Dillon [EMAIL PROTECTED] [020223 14:43] wrote:
  This is approximately what I am thinking.  Note that this gives us the
  flexibility to create a larger infrastructure around the bucket cache,
  such as implement per-cpu caches and so on and so forth.  What I have
  here is the minimal implementation.
 
 I strongly object to this implementation right now, please do not
 commit it.  I already explained to you how to make the problem
 go away but instead you insist on some complex api that pins
 memory down like the damn zone allocator.  It's not needed, so
 please don't do it.

Actually, the zone allocator is not far off, I think.

Imagine if the entire possible KVA space (real RAM + swap) was
preallocated PTEs.  Allocations could treat it as anonymous
memory, for which a mapping process was not required, and
all allocations would be interrupt safe by default, without
having to special case the code one way or the other.

This seems, to me, to be the right idea.

The only issue left is that the maps take real memory that is
wired down.  This raises the possibility of adding to the swap
where swap + RAM < KVA && swap + RAM + new_swap >= KVA, which
would imply mappings being required on the adding of swap (via
swapon).

Not that painful, but it does imply a 1:1000 limit ratio on
real vs. virtual RAM to get to the page mapping overhead.  4M
pages would cover some of that problem... but making allocations
swappable is often desirable, even in the kernel, so you would
need to special case those mappings... and 4M and 4K pages play
badly together, unless you know what you are doing, and you know
the undocumented bugs (c.v. the recent AMD AGP thing).

-- Terry




Success! critical_enter()/critical_exit() revamp (was Re: malloc_bucket() idea (was Re: How to fix malloc.))

2002-02-24 Thread Matthew Dillon

I'm going to wait until tomorrow before posting it.  I have a bunch of
cleanup to do.

Bruce, your sys.dif was *invaluable*!  It would have taken me a lot 
longer to figure out the interrupt disablement requirement around
the icu_lock, and the statclock and hardclock forwarding issues (which I
got working by the way!).

It turns out that putting the pending masks in the per-cpu area was
the right solution!  It made statclock and hardclock forwarding easy
(I can see why you didn't even try in your patch set, doing it with
a global mask would be nearly impossible).  In fact, it made everything
unbelievably easy.

Apart from all the assembly coding, there were two serious issues.
fork_exit() assumes that interrupts are hard-disabled on entry.  I
readjusted the code such that the trampoline assembly initialized
the td_critnest count so it could STI prior to calling fork_exit().

The second issue is that cpu_switch() does not save/restore the 
PSL_I (interrupt disablement mask).  I added a PSL word to the PCB
structure to take care of the problem.  Without this if you have
a thread switch away while holding interrupts hard-disabled, the
next thread will inherit the hard disablement.  I saw the sti's
you added in various places but I figured the best solution was
to simply save/restore the state.  The original code didn't have
this problem because interrupts were always hard-disabled while
holding the sched_lock and, of course, cpu_switch() is always
called while holding sched_lock.  (Similarly, icu_lock needed 
hard-disablements wrapped around it for the same reason, because
a level interrupt still needs to be able to get icu_lock in order to
defer itself).

In any case, I have successfully built the world in a -current
SMP + apic_vector system.  Tomorrow I will cleanup on the UP icu_vector
code to match the apic_vector code and post the results.  I also
have a sysctl that allows developers to toggle the mode for testing
purposes (old way or new way).

Finally, I took your suggestion in regards to not trying to combine
the masks together.  I have a 'some sort of interrupt is pending'
variable then I have fpending (for fast interrupts), ipending (for
normal interrupt threads), and spending (which I use for the stat and
hardclock forwarding).  They're all per-cpu entities, of course.
unpend() prioritizes them.
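
The per-cpu pending bookkeeping described above can be sketched as
follows. The struct layout, the enum, and unpend_next() are illustrative
names only, not code from the actual patch set; they just capture the
fpending > ipending > spending priority order that unpend() applies.

```c
#include <stdint.h>

/* One pending mask per interrupt class, kept in the per-cpu area. */
struct pcpu_pending {
	uint32_t fpending;	/* fast interrupts */
	uint32_t ipending;	/* normal interrupt threads */
	uint32_t spending;	/* statclock/hardclock forwarding */
};

enum unpend_class {
	UNPEND_NONE = -1,
	UNPEND_FAST = 0,
	UNPEND_ITHREAD = 1,
	UNPEND_STAT = 2
};

/* Which class would unpend() service next?  Fast beats ithread beats stat. */
static enum unpend_class
unpend_next(const struct pcpu_pending *p)
{
	if (p->fpending != 0)
		return (UNPEND_FAST);
	if (p->ipending != 0)
		return (UNPEND_ITHREAD);
	if (p->spending != 0)
		return (UNPEND_STAT);
	return (UNPEND_NONE);
}
```

Because every field lives in the per-cpu area, none of the updates need
locked bus cycles; only cross-cpu stealing would.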

In any case, I'll post it all tomorrow after I've got icu_vector cleaned
up.  One of the best things about this patch set is that it is really
flexible.  We should be able to really make interrupts fly.  In fact,
it should even be possible for one cpu to steal another cpu's pending
interrupt(s) fairly trivially, without having to resort to IPIs.

-Matt





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-24 Thread Robert Watson


On Sat, 23 Feb 2002, Alfred Perlstein wrote:

 Usually when I see diff(1) output from you I expect a commit
 within the next half hour or so, I just wanted to make myself clear on
 the issue.  No worries. :) 
 
 Yes, and hopefully JeffR's allocator will fix our problems, that is if
 it ever makes it out of p4. :)

That assumes that it was actually in P4 in the first place.  Jeffr's
commit bit was approved in the last two days, and he's begun posting his
patches to -arch.  I hope to see his memory allocator patch there soon
:-).

Robert N M Watson FreeBSD Core Team, TrustedBSD Project
[EMAIL PROTECTED]  NAI Labs, Safeport Network Services






Re: Success! critical_enter()/critical_exit() revamp (was Re:malloc_bucket() idea (was Re: How to fix malloc.))

2002-02-24 Thread Bruce Evans

On Sun, 24 Feb 2002, Matthew Dillon wrote:

 Bruce, your sys.dif was *invaluable*!  It would have taken me a lot
 longer to figure out the interrupt disablement requirement around
 the icu_lock, and the statclock and hardclock forwarding issues (which I
 got working by the way!).

Thanks.

 It turns out that putting the pending masks in the per-cpu area was
 the right solution!  It made statclock and hardclock forwarding easy
 (I can see why you didn't even try in your patch set, doing it with
 a global mask would be nearly impossible).  In fact, it made everything
 unbelievably easy.

 ...

  [This paragraph reordered]
 up.  One of the best things about this patch set is that it is really
 flexible.  We should be able to really make interrupts fly.  In fact,
 it should even be possible for one cpu to steal another cpu's pending
 interrupt(s) fairly trivially, without having to resort to IPIs.

Good idea.  Stealing would be even easier if the mask were global :-).

 The second issue is that cpu_switch() does not save/restore the
 PSL_I (interrupt disablement mask).  I added a PSL word to the PCB
 structure to take care of the problem.  Without this if you have
 a thread switch away while holding interrupts hard-disabled, the
 next thread will inherit the hard disablement.  I saw the sti's
 you added in various places but I figured the best solution was
 to simply save/restore the state.  The original code didn't have

cpu_switch() certainly needs to do this if it can be called with the
interrupt enable flag[s] in different states.  I need the sti's (actually
enable_intr()'s) because I don't want fast interrupts to be disabled
during context switches.  This works because enabling interrupts is sure
to be safe, since we might be switching to a thread that will enable
them.  Some sort of lock is needed to prevent interrupts interfering
with the switch.  I think soft-masking them in critical_enter() is
sufficient in your version too.

Bruce





Re: Success! critical_enter()/critical_exit() revamp (was Re:malloc_bucket() idea (was Re: How to fix malloc.))

2002-02-24 Thread Matthew Dillon

: up.  One of the best things about this patch set is that it is really
: flexible.  We should be able to really make interrupts fly.  In fact,
: it should even be possible for one cpu to steal another cpu's pending
: interrupt(s) fairly trivially, without having to resort to IPIs.
:
:Good idea.  Stealing would be even easier if the mask were global :-).
 
Well, for the moment I am attempting to avoid having to use lock;'d
bus cycles to keep things efficient.  When we get to the stealing part
we might wind up using lock;, but I'll deal with that when the time
comes.

: The second issue is that cpu_switch() does not save/restore the
: PSL_I (interrupt disablement mask).  I added a PSL word to the PCB
: structure to take care of the problem.  Without this if you have
: a thread switch away while holding interrupts hard-disabled, the
: next thread will inherit the hard disablement.  I saw the sti's
: you added in various places but I figured the best solution was
: to simply save/restore the state.  The original code didn't have
:
:cpu_switch() certainly needs to do this if it can be called with the
:interrupt enable flag[s] in different states.  I need the sti's (actually
:enable_intr()'s) because I don't want fast interrupts to be disabled
:during context switches.  This works because enabling interrupts is sure
:to be safe, since we might be switching to a thread that will enable
:them.  Some sort of lock is needed to prevent interrupts interfering
:with the switch.  I think soft-masking them in critical_enter() is
:sufficient in your version too.
:
:Bruce

I don't think we want to make sched_lock any more complex than it
already is, so at least for the foreseeable future we are not
going to be able to actually execute an interrupt handler until
the sched_lock is released in (typically) msleep().  I am rather
annoyed that two levels of procedure have to be called with the
sched_lock held (mi_switch() and cpu_switch()), leaving interrupts
disabled for a fairly long period of time, but I don't see any way
around it right now.

Eventually (presumably) we will have per-cpu run queues.  That combined
with interrupt stealing may resolve the problem for us.   I am still
not convinced that making the various *pending* flags globals will
be more efficient, because it introduces significant cache mastership
issues.  It might be easier to do this:

interrupt_vector 
{
if (td->td_critnest) {
... try to forward the interrupt to another cpu that is not
in a critical section ...
... else set bit in local ipending mask.
} else {
... take or schedule interrupt ...
}
}

Or possibly:

interrupt_vector
{
if (td->td_critnest) {
if (!mtx_owned(&sched_lock) && mtx_trylock(&sched_lock)) {
... we can still schedule the interrupt even
though we are in a critical section, we
just can't switch to it yet ...
} else {
... try to forward the interrupt to a cpu that is not
in a critical section ...
... else set bit in local ipending mask.
}
} else {
... take or schedule interrupt ...
}
}

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: Success! critical_enter()/critical_exit() revamp (was Re:malloc_bucket() idea (was Re: How to fix malloc.))

2002-02-24 Thread Bruce Evans

On Sun, 24 Feb 2002, Matthew Dillon wrote:

 :cpu_switch() certainly needs to do this if it can be called with the
 :interrupt enable flag[s] in different states.  I need the sti's (actually
 :enable_intr()'s) because I don't want fast interrupts to be disabled
 :during context switches.  This works because enabling interrupts is sure
 :to be safe, since we might be switching to a thread that will enable
 :them.  Some sort of lock is needed to prevent interrupts interfering
 :with the switch.  I think soft-masking them in critical_enter() is
 :sufficient in your version too.

 I don't think we want to make sched_lock any more complex than it
 already is, so at least for the foreseeable future we are not
 going to be able to actually execute an interrupt handler until
 the sched_lock is released in (typically) msleep().  I am rather

Well, my kernel has been executing fast interrupt handlers while sched_lock
is held for almost a year.  It's actually less complicated with respect to
sched_lock but more complicated with respect to fast interrupt handlers.

 annoyed that two levels of procedure have to be called with the
 sched_lock held (mi_switch() and cpu_switch()), leaving interrupts
 disabled for a fairly long period of time, but I don't see any way
 around it right now.

The worst offenders for interrupt latency seemed to be calcru() and/or
the sched_locking related to fork and/or exit.  Latency was many thousand
instructions (reasonable only on 100+ MIPS machines).  sched_locking for
calcru() is mostly bogus and should be easy to avoid, but not so for
context switching.

 Eventually (presumably) we will have per-cpu run queues.  That combined
 with interrupt stealing may resolve the problem for us.   I am still
 not convinced that making the various *pending* flags globals will
 be more efficient, because it introduces significant cache mastership
 issues.  It might be easier to do this:

OK.  I don't care about this yet.

Bruce





Re: How to fix malloc.

2002-02-23 Thread Terry Lambert

Alfred Perlstein wrote:
 All these per-subsystem free-lists are making me nervous in both
 complexity and wasted code...

Me too.

 Ok, instead of keeping all these per-subsystem free-lists here's what
 we do:
 
 In kern_malloc:free() right at the point of
 if (size > MAXALLOCSAVE) we check if we have Giant or not.
 if we do not then we simply queue the memory
 however, if we do then we call into kmem_free with all the queued memory.
 
 This ought to solve the issue without making us keep all these
 per-cpu caches.

One modification: limit the number that are freed per invocation
to some number small enough that there won't be a big latency.

Once everything gets to this point, though, there will be nothing
to trigger a free with giant held, and you'll just queue things
up forever.

Really, we need counted queues -- queues that know the number of
elements on them.  This is a requirement for RED queueing, and
it will let us know when the queue gets deep... then you can grab
giant, and flush down the queue if it hits a high watermark.
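
A counted queue with a high-watermark flush might look like the sketch
below. This is illustrative only; the freeq names and the FREEQ_HIWAT
value are made up, and in the kernel the flush is where Giant would be
taken and the entries handed to kmem_free().

```c
#include <stddef.h>

#define FREEQ_HIWAT	4	/* illustrative high watermark */

struct freeq_entry {
	struct freeq_entry *next;
};

struct freeq {
	struct freeq_entry *head;
	int count;		/* the "counted" part: current queue depth */
};

static int flushed;		/* entries drained by flushes, for illustration */

static void
freeq_flush(struct freeq *q)
{
	/* Here the kernel would grab Giant and call kmem_free() on each. */
	while (q->head != NULL) {
		q->head = q->head->next;
		q->count--;
		flushed++;
	}
}

static void
freeq_put(struct freeq *q, struct freeq_entry *e)
{
	e->next = q->head;
	q->head = e;
	if (++q->count >= FREEQ_HIWAT)
		freeq_flush(q);
}
```

Tracking the depth is what makes the watermark check O(1); without the
count you would have to walk the list to decide when to flush.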

Obviously, the correct way to handle this is per CPU memory pools
that don't have any need for lock contention at all on the real
frees.

-- Terry




Re: How to fix malloc.

2002-02-23 Thread Matthew Dillon

:All these per-subsystem free-lists are making me nervous in both
:complexity and wasted code...
:
:Ok, instead of keeping all these per-subsystem free-lists here's what
:we do:
:
:In kern_malloc:free() right at the point of
:if (size > MAXALLOCSAVE) we check if we have Giant or not.
:if we do not then we simply queue the memory
:however, if we do then we call into kmem_free with all the queued memory.
:
:This ought to solve the issue without making us keep all these
:per-cpu caches.

Hmm.  I don't like generalizing it to that extent.  What if we simply
assert that Giant is held if size > MAXALLOCSAVE in free()?  i.e. for
the moment we require that any 'large allocations' use Giant.  Then 
we can freely use free() to free smaller allocations without Giant.
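
The proposed rule reduces to a simple predicate. The userspace sketch
below is only an illustration (the MAXALLOCSAVE value and free_allowed
name are hypothetical); in the kernel this would be a mtx_assert on
Giant at the top of the large-allocation branch of free().

```c
/*
 * Sketch: only "large" frees (size > MAXALLOCSAVE) go back through
 * kmem_free() and therefore require Giant; small frees stay Giant-free.
 * PAGE_SIZE/MAXALLOCSAVE values here are illustrative only.
 */
#define PAGE_SIZE	4096
#define MAXALLOCSAVE	(16 * PAGE_SIZE)

static int giant_held;		/* stands in for mtx_owned(&Giant) */

/* Returns nonzero if a free of this size would pass the assertion. */
static int
free_allowed(unsigned long size)
{
	if (size > MAXALLOCSAVE)
		return (giant_held);
	return (1);
}
```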

There are three other problems: (1) malloc() itself will call kmem_malloc()
even if M_NOWAIT is passed, so we can't optimize malloc()-without-giant
unless we add a new flag to deal with this situation.  

And, (2) we have a single mutex, malloc_mtx.  This is Bosko's code so
I am adding him to the To: list.  Bosko, it looks like with a simple
cleanup to the msleep() call we can use a pool lock instead of malloc_mtx,
which would give us the ability to lock malloc() on a bucket-by-bucket
basis.  The addition of another malloc flag will allow us to tell
malloc() to return NULL if it can't do the operation with its own mutex.

And, (3) we wind up getting and releasing two mutexes instead of one
in the code I was originally referring to, because setsid() already gets
an SX lock and I was just going to fold the free list into that.  Still,
I don't mind doing it via malloc().

What do people think?

-Matt
Matthew Dillon 
[EMAIL PROTECTED]

:By the way, since MAXALLOCSAVE is some multiple of PAGE_SIZE, we
:really don't have to worry about it when freeing small structures
:although that puts evilness into malloc(9) consumers.
:
:Can you please consider that instead of continuing down this path
:of per-subsystem caches?
:
:thanks,
:-- 
:-Alfred Perlstein [[EMAIL PROTECTED]]




malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon

This is approximately what I am thinking.  Note that this gives us the
flexibility to create a larger infrastructure around the bucket cache,
such as implement per-cpu caches and so on and so forth.  What I have
here is the minimal implementation.

-Matt

Index: kern/kern_malloc.c
===
RCS file: /home/ncvs/src/sys/kern/kern_malloc.c,v
retrieving revision 1.93
diff -u -r1.93 kern_malloc.c
--- kern/kern_malloc.c  12 Sep 2001 08:37:44 -  1.93
+++ kern/kern_malloc.c  23 Feb 2002 22:41:56 -
@@ -116,6 +116,63 @@
 #endif /* INVARIANTS */
 
 /*
+ * init_malloc_bucket:
+ *
+ * Initialize a malloc bucket
+ */
+void
+init_malloc_bucket(struct malloc_bucket *bucket, struct malloc_type *type, int size)
+{
+   bzero(bucket, sizeof(struct malloc_bucket));
+   bucket->b_size = size;
+   bucket->b_type = type;
+   bucket->b_mtx = mtx_pool_find(bucket);
+}
+
+/*
+ * malloc_bucket:
+ *
+ * Allocate a structure from the supplied low-latency cache.  NULL is
+ * returned if the cache is empty.
+ */
+void *
+malloc_bucket(struct malloc_bucket *bucket)
+{
+   void *ptr = NULL;
+
+   if (bucket->b_next) {
+   mtx_lock(bucket->b_mtx);
+   if ((ptr = bucket->b_next) != NULL) {
+   bucket->b_next = *(caddr_t *)ptr;
+   KASSERT(bucket->b_count > 0, ("bucket count mismatch"));
+   --bucket->b_count;
+   *(caddr_t *)ptr = WEIRD_ADDR;
+   } else {
+   KASSERT(bucket->b_count == 0, ("bucket count mismatch"));
+   }
+   mtx_unlock(bucket->b_mtx);
+   }
+   return(ptr);
+}
+
+/*
+ * free_bucket:
+ *
+ * Free a structure to the low latency cache.
+ *
+ */
+void
+free_bucket(struct malloc_bucket *bucket, void *ptr)
+{
+   mtx_lock(bucket->b_mtx);
+   *(caddr_t *)ptr = bucket->b_next;
+   bucket->b_next = (caddr_t)ptr;
+   ++bucket->b_count;
+   mtx_unlock(bucket->b_mtx);
+   /* XXX if b_count > something, wakeup our cleaner? */
+}
+
+/*
  * malloc:
  *
  * Allocate a block of memory.
Index: sys/malloc.h
===
RCS file: /home/ncvs/src/sys/sys/malloc.h,v
retrieving revision 1.54
diff -u -r1.54 malloc.h
--- sys/malloc.h10 Aug 2001 06:37:04 -  1.54
+++ sys/malloc.h23 Feb 2002 22:36:09 -
@@ -109,6 +109,18 @@
long    kb_couldfree;   /* over high water mark and could free */
 };
 
+/*
+ * malloc_bucket holder for fast low-latency malloc_bucket() and free_bucket()
+ * calls.
+ */
+struct malloc_bucket {
+   caddr_t b_next;
+   int b_count;
+   int b_size;
+   struct mtx *b_mtx;
+   struct malloc_type *b_type;
+};
+
 #ifdef _KERNEL
#define MINALLOCSIZE    (1 << MINBUCKET)
 #define BUCKETINDX(size) \




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Terry Lambert

Matthew Dillon wrote:
 This is approximately what I am thinking.  Note that this gives us the
 flexibility to create a larger infrastructure around the bucket cache,
 such as implement per-cpu caches and so on and so forth.  What I have
 here is the minimal implementation.

Uh, freeing the mutex of the new bucket after holding
the one on the next list is probably not what you wanted to
happen.  8-).  I think you would need a separate head, and
then use the head-containing structure's mutex, instead.

I understand what you were trying to do, though...

I guess you are relying on the use of different buckets to spread
the pain?  Right now the pain is centered arouns one struct type
anyway (the cred one that started this dicussion), so you are still
going to contend the same resource for each (just bucket hash mtx
instead of giant, for the bucket type).  Also, you probably want
it to be a STAILQ.

I'm not sure that this is actually a win?!?  I guess if there are
a bunch of these buckets, you would end up with different hash
values for the mutexes... though relying on a hash seems wrong,
since you can't really control collision domains that way; eventually,
you will get people with widely different performance based on
where the linker linked their bucket list head to, and that's
probably not good.

OTOH, A per CPU bucket list means no bucket mtx would be necessary;
without that, it's just replacing one type of contention for another,
I think, until you start making mutex selection indeterminate.  8^(.

Really, this delayed freeing idea is starting to get uglier than
just going to per CPU pools...

-- Terry




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon


:OTOH, A per CPU bucket list means no bucket mtx would be necessary;
:without that, it's just replacing one type of contention for another,
:I think, until you start making mutex selection indeterminate.  8^(.
:
:Really, this delayed freeing idea is starting to get uglier than
:just going to per CPU pools...
:
:-- Terry

Well, if we want to rewrite malloc() a per-cpu pool inside a critical
section would not be that difficult.  The 'bucket' array would
simply be placed in the per-cpu area.  However, with malloc() we still
have a serious problem with the malloc_type structure statistics.  There
would have to be per-cpu information for those as well.

critical_enter() isn't much better than a mutex.  It can
be optimized to get rid of the inline cli/sti however.  Actually, it
can be optimized trivially for i386, all we have to do is check the
critical nesting count from the interrupt code and set the 
interrupts-disabled bit in the returned (supervisor mode) frame.

critical_enter() and critical_exit() would then be nearly perfect.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Jake Burkholder

Apparently, On Sat, Feb 23, 2002 at 02:43:35PM -0800,
Matthew Dillon said words to the effect of;

 This is approximately what I am thinking.  Note that this gives us the
 flexibility to create a larger infrastructure around the bucket cache,
 such as implement per-cpu caches and so on and so forth.  What I have
 here is the minimal implementation.
 
   -Matt

Jeff Roberson (jeff@) has been working on a slab allocator that goes
a long way to making malloc(), free() and the zone allocator not require
giant.  I've reviewed what he's got so far and it looks pretty damn good
to me, I'll see about getting him to post it.  He's working on adding the
per-cpu queues now.

Jake




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bruce Evans

On Sat, 23 Feb 2002, Matthew Dillon wrote:

 critical_enter() isn't much better than a mutex.  It can
 be optimized to get rid of the inline cli/sti however.  Actually, it
 can be optimized trivially for i386, all we have to do is check the
 critical nesting count from the interrupt code and set the
 interrupts-disabled bit in the returned (supervisor mode) frame.

 critical_enter() and critical_exit() would then be nearly perfect.

My version of it does less than this.  I only use it to help implement
spinlocks.

Index: kern_switch.c
===
RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
retrieving revision 1.20
diff -u -2 -r1.20 kern_switch.c
--- kern_switch.c   11 Feb 2002 20:37:51 -  1.20
+++ kern_switch.c   13 Feb 2002 05:34:20 -
@@ -70,14 +70,21 @@
 }

-/* Critical sections that prevent preemption. */
+/*-
+ * Critical section handling.
+ * XXX doesn't belong here.
+ *
+ * Entering a critical section only blocks non-fast interrupts.
+ * critical_enter() is similar to splhigh() in a 2-level spl setup under
+ * old versions of FreeBSD.
+ *
+ * Exiting from all critical sections unblocks non-fast interrupts and runs
+ * the handlers of any that were blocked.  critical_exit() is similar to
+ * spl(old_level) in a 2-level spl setup under old versions of FreeBSD.
+ */
 void
 critical_enter(void)
 {
-   struct thread *td;

-   td = curthread;
-   if (td->td_critnest == 0)
-   td->td_savecrit = cpu_critical_enter();
-   td->td_critnest++;
+   curthread->td_critnest++;
 }

@@ -85,12 +92,7 @@
 critical_exit(void)
 {
-   struct thread *td;

-   td = curthread;
-   if (td->td_critnest == 1) {
-   td->td_critnest = 0;
-   cpu_critical_exit(td->td_savecrit);
-   } else
-   td->td_critnest--;
+   if (--curthread->td_critnest == 0 && (ipending | spending) != 0)
+   unpend();
 }

Bruce





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon

:My version of it does less than this.  I only use it to help implement
:spinlocks.

You put together infrastructure to deal with pending pci interrupts?
If so, then why not commit it (or at least commit a version #ifdef'd
for the i386 architecture).

-Matt
Matthew Dillon 
[EMAIL PROTECTED]


:Index: kern_switch.c
:===
:RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
:retrieving revision 1.20
:diff -u -2 -r1.20 kern_switch.c
:--- kern_switch.c  11 Feb 2002 20:37:51 -  1.20
:+++ kern_switch.c  13 Feb 2002 05:34:20 -
:@@ -70,14 +70,21 @@
: }
:
:-/* Critical sections that prevent preemption. */
:+/*-
:+ * Critical section handling.
:+ * XXX doesn't belong here.
:+ *
:+ * Entering a critical section only blocks non-fast interrupts.
:+ * critical_enter() is similar to splhigh() in a 2-level spl setup under
:+ * old versions of FreeBSD.
:+ *
:+ * Exiting from all critical sections unblocks non-fast interrupts and runs
:+ * the handlers of any that were blocked.  critical_exit() is similar to
:+ * spl(old_level) in a 2-level spl setup under old versions of FreeBSD.
:+ */
: void
: critical_enter(void)
: {
:-  struct thread *td;
:
:-  td = curthread;
:-  if (td->td_critnest == 0)
:-  td->td_savecrit = cpu_critical_enter();
:-  td->td_critnest++;
:+  curthread->td_critnest++;
: }
:
:@@ -85,12 +92,7 @@
: critical_exit(void)
: {
:-  struct thread *td;
:
:-  td = curthread;
:-  if (td->td_critnest == 1) {
:-  td->td_critnest = 0;
:-  cpu_critical_exit(td->td_savecrit);
:-  } else
:-  td->td_critnest--;
:+  if (--curthread->td_critnest == 0 && (ipending | spending) != 0)
:+  unpend();
: }
:
:Bruce




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Alfred Perlstein

* Matthew Dillon [EMAIL PROTECTED] [020223 14:43] wrote:
 This is approximately what I am thinking.  Note that this gives us the
 flexibility to create a larger infrastructure around the bucket cache,
 such as implement per-cpu caches and so on and so forth.  What I have
 here is the minimal implementation.

I strongly object to this implementation right now, please do not
commit it.  I already explained to you how to make the problem
go away but instead you insist on some complex api that pins
memory down like the damn zone allocator.  It's not needed, so
please don't do it.

-Alfred




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon


:
:* Matthew Dillon [EMAIL PROTECTED] [020223 14:43] wrote:
: This is approximately what I am thinking.  Note that this gives us the
: flexibility to create a larger infrastructure around the bucket cache,
: such as implement per-cpu caches and so on and so forth.  What I have
: here is the minimal implementation.
:
:I strongly object to this implementation right now, please do not
:commit it.  I already explained to you how to make the problem
:go away but instead you insist on some complex api that pins
:memory down like the damn zone allocator.  It's not needed, so
:please don't do it.
:
:-Alfred

Whoa!  Timeout!  I'm not planning on committing any sort of malloc thingy.
That was a 10 second thought experiment.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Alfred Perlstein

* Matthew Dillon [EMAIL PROTECTED] [020223 16:41] wrote:
 
 :
 :* Matthew Dillon [EMAIL PROTECTED] [020223 14:43] wrote:
 : This is approximately what I am thinking.  Note that this gives us the
 : flexibility to create a larger infrastructure around the bucket cache,
 : such as implement per-cpu caches and so on and so forth.  What I have
 : here is the minimal implementation.
 :
 :I strongly object to this implementation right now, please do not
 :commit it.  I already explained to you how to make the problem
 :go away but instead you insist on some complex api that pins
 :memory down like the damn zone allocator.  It's not needed, so
 :please don't do it.
 :
 :-Alfred
 
 Whoa!  Timeout!  I'm not planning on committing any sort of malloc thingy.
 That was a 10 second thought experiment.

Usually when I see diff(1) output from you I expect a commit within
the next half hour or so, so I just wanted to make myself clear on the
issue.  No worries. :)

Yes, and hopefully JeffR's allocator will fix our problems, that
is if it ever makes it out of p4. :)

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
'Instead of asking why a piece of software is using 1970s technology,
 start asking why software is ignoring 30 years of accumulated wisdom.'
Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bruce Evans

On Sat, 23 Feb 2002, Matthew Dillon wrote:

 :My version of it does less than this.  I only use it to help implement
 :spinlocks.

 You put together infrastructure to deal with pending pci interrupts?
 If so, then why not commit it (or at least commit a version #ifdef'd
 for the i386 architecture).

It's too messy and unfinished (doesn't work right for SMP or irqs >= 16),
and difficult to untangle from my other patches.  I posted these partial
ones to attempt to inhibit() recomplication of the current critical*
functions in directions that I don't want to go :-).

 :Index: kern_switch.c
 :===================================================================
 :RCS file: /home/ncvs/src/sys/kern/kern_switch.c,v
 :retrieving revision 1.20
 :diff -u -2 -r1.20 kern_switch.c
 :--- kern_switch.c11 Feb 2002 20:37:51 -  1.20
 :+++ kern_switch.c13 Feb 2002 05:34:20 -
 :@@ -70,14 +70,21 @@
 : }
 :
 :-/* Critical sections that prevent preemption. */
 :+/*-
 :+ * Critical section handling.
 :+ * XXX doesn't belong here.
 :+ *
 :+ * Entering a critical section only blocks non-fast interrupts.
 :+ * critical_enter() is similar to splhigh() in a 2-level spl setup under
 :+ * old versions of FreeBSD.
 :+ *
 :+ * Exiting from all critical sections unblocks non-fast interrupts and runs
 :+ * the handlers of any that were blocked.  critical_exit() is similar to
 :+ * spl(old_level) in a 2-level spl setup under old versions of FreeBSD.
 :+ */
 : void
 : critical_enter(void)
 : {
 :-	struct thread *td;
 :
 :-	td = curthread;
 :-	if (td->td_critnest == 0)
 :-		td->td_savecrit = cpu_critical_enter();
 :-	td->td_critnest++;
 :+	curthread->td_critnest++;
 : }
 :
 :@@ -85,12 +92,7 @@
 : critical_exit(void)
 : {
 :-	struct thread *td;
 :
 :-	td = curthread;
 :-	if (td->td_critnest == 1) {
 :-		td->td_critnest = 0;
 :-		cpu_critical_exit(td->td_savecrit);
 :-	} else
 :-		td->td_critnest--;
 :+	if (--curthread->td_critnest == 0 && (ipending | spending) != 0)
 :+		unpend();
 : }

ipending here works much as in RELENG_4.  It is ORed into by sched_ithd()
if curthread->td_critnest != 0.  Nothing special is needed for pci
(the ICU masks pending interrupts).

Bruce





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon

Bruce, I've looked at the vector assembly and it looks like cleaning
up critical_enter() and critical_exit() for i386 ought to be simple.

If you have a complete patch set I would like to see it posted for 
review or committed, or if you don't want to deal with the commit I
would love to have it so I can bang it into shape for commit.

Having a sleek critical_enter()/critical_exit() would double sched_lock's
performance.

-Matt





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon


:
:On Sat, 23 Feb 2002, Matthew Dillon wrote:
:
: :My version of it does less than this.  I only use it to help implement
: :spinlocks.
:
: You put together infrastructure to deal with pending pci interrupts?
: If so, then why not commit it (or at least commit a version #ifdef'd
: for the i386 architecture).
:
:It's too messy and unfinished (doesn't work right for SMP or irqs >= 16),
:and difficult to untangle from my other patches.  I posted these partial
:ones to attempt to inhibit() recomplication of the current critical*
:functions in directions that I don't want to go :-).

Oh, ok.  Hmm.  Well, do you mind if I throw together an interrupt-set-cli-
in-return-frame patch set?  I think it would wind up being just as fast.

-Matt




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon

:It's too messy and unfinished (doesn't work right for SMP or irqs >= 16),
:and difficult to untangle from my other patches.  I posted these partial
:ones to attempt to inhibit() recomplication of the current critical*
:functions in directions that I don't want to go :-).
:...
:
:ipending here works much as in RELENG_4.  It is ORed into by sched_ithd()
:if curthread->td_critnest != 0.  Nothing special is needed for pci
:(the ICU masks pending interrupts).
:
:Bruce

Ok, I have most of it coded up.  Could you explain what 'spending'
was for?

I am doing it slightly differently than you.  To avoid having to use
locked instructions I have placed ipending in the thread structure:

void
critical_enter(void)
{
	struct thread *td = curthread;
	++td->td_critnest;
}

void
critical_exit(void)
{
	struct thread *td = curthread;
	KASSERT(td->td_critnest > 0, ("bad td_critnest value!"));
	if (--td->td_critnest == 0) {
		if (td->td_ipending)
			unpend();
	}
}

The apic and icu vector code will do a simple td_critnest test and
OR the irq bit into td->td_ipending (it conveniently already has
the struct thread in %ebx).  And the vector code will also check and
handle any non-zero td_ipending if critnest is 0, handling the
1->0 transition/preemption race.

I'll post the patch when it gets a little further along.

How should I deal with fast interrupts?

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bruce Evans

On Sat, 23 Feb 2002, Matthew Dillon wrote:

 Bruce, I've looked at the vector assembly and it looks like cleaning
 up critical_enter() and critical_exit() for i386 ought to be simple.

 If you have a complete patch set I would like to see it posted for
 review or committed, or if you don't want to deal with the commit I
 would love to have it so I can bang it into shape for commit.

Patches for most of my sys tree are in sys.tar.gz in my home directory
on freefall.

 Having a sleek critical_enter()/critical_exit() would double sched_lock's
 performance.

I'm not sure about that.  Perhaps I've already optimized sched_lock
enough that I don't notice.  My changes were actually directed at
avoiding sched_lock's interrupt latency, and it wasn't until the
critical_enter()/exit() calls pessimized mtx_{lock,unlock}_spin in
-current that I managed to share enough code (the critnest
increment/decrement) to avoid losing efficiency.

Bruce





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bruce Evans

On Sat, 23 Feb 2002, Matthew Dillon wrote:

 :It's too messy and unfinished (doesn't work right for SMP or irqs >= 16),
 :and difficult to untangle from my other patches.  I posted these partial
 :ones to attempt to inhibit() recomplication of the current critical*
 :functions in directions that I don't want to go :-).

 Oh, ok.  Hmm.  Well, do you mind if I throw together an interrupt-set-cli-
 in-return-frame patch set?  I think it would wind up being just as fast.

I don't see how that would work.  We need to call critical_enter() from
ordinary code, so there is no free frame pop.

Bruce





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Matthew Dillon


:
:On Sat, 23 Feb 2002, Matthew Dillon wrote:
:
: :It's too messy and unfinished (doesn't work right for SMP or irqs >= 16),
: :and difficult to untangle from my other patches.  I posted these partial
: :ones to attempt to inhibit() recomplication of the current critical*
: :functions in directions that I don't want to go :-).
:
: Oh, ok.  Hmm.  Well, do you mind if I throw together an interrupt-set-cli-
: in-return-frame patch set?  I think it would wind up being just as fast.
:
:I don't see how that would work.  We need to call critical_enter() from
:ordinary code, so there is no free frame pop.
:
:Bruce

I'm trying it your way, with an ipending variable.

The set-cli-in-return-frame mechanism would only work well for level
interrupts.  Basically the interrupt occurs and you set PS_I in the
return frame (that was pushed on entry to the interrupt).  
critical_exit() would then detect the 1->0 transition and simply STI.

But since we have a combination of level and edge it gets too messy for
my tastes, so I'm trying it with an ipending variable.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]




Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bruce Evans

On Sat, 23 Feb 2002, Matthew Dillon wrote:

 :ipending here works much as in RELENG_4.  It is ORed into by sched_ithd()
 :if curthread->td_critnest != 0.  Nothing special is needed for pci
 :(the ICU masks pending interrupts).
 :
 :Bruce

 Ok, I have most of it coded up.  Could you explain what 'spending'
 was for?

Like the SWI bits in ipending in RELENG_4.  In RELENG_4, we had to pass
around all the bits to spl*(), so we had to pack them in at least some
places (not required in ipending, but efficient).  In -current, packing
is not so important even if it is possible, so I didn't attempt it.

 I am doing it slightly differently than you.  To avoid having to use
 locked instructions I have placed ipending in the thread structure:

Mine is global partly for historical reasons.  BTW, I've found that
variables in the thread struct are more fragile than per-cpu globals.
You have to remember to deal with them all before you switch threads.
I needed to clone td_critnest to the next thread in one or two places
(at least one related to the complications for forking).


 void
 critical_enter(void)
 {
 	struct thread *td = curthread;
 	++td->td_critnest;
 }

 void
 critical_exit(void)
 {
 	struct thread *td = curthread;
 	KASSERT(td->td_critnest > 0, ("bad td_critnest value!"));
 	if (--td->td_critnest == 0) {
 		if (td->td_ipending)
 			unpend();
 	}
 }

 The apic and icu vector code will do a simple td_critnest test and
 OR the irq bit into td->td_ipending (it conveniently already has
 the struct thread in %ebx).  And the vector code will also check and

I do this at the start of sched_ithd().  This is efficient enough because
the nested case is rare.

 handle any non-zero td_ipending if critnest is 0, handling the
 1->0 transition/preemption race.

 I'll post the patch when it gets a little further along.

 How should I deal with fast interrupts?

Hmm, this gets complicated.  First, you need the td_critnest test
in Xfastintr* (do it before Xfastintr* calls critical_enter()).  It is
important in -current (although bad for latency) that critical_enter()
keeps masking even fast interrupts (although it doesn't do it in software).
Next, you need to mask the fast interrupt.  It probably needs to be
masked in the ICU/APIC on i386's, so it will become not-so-fast.  Finally,
unpend() needs to do more for fast interrupts:
- give them highest priority
- unmask them in the ICU/APIC/wherever, either before or after calling the
  handler.

Bruce





Re: malloc_bucket() idea (was Re: How to fix malloc.)

2002-02-23 Thread Bosko Milekic


On Sat, Feb 23, 2002 at 03:12:36PM -0800, Matthew Dillon wrote:
 
 :OTOH, A per CPU bucket list means no bucket mtx would be necessary;
 :without that, it's just replacing one type of contention for another,
 :I think, until you start making mutex selection indeterminate.  8^(.
 :
 :Really, this delayed freeing idea is starting to get uglier than
 :just going to per CPU pools...
 :
 :-- Terry
 
 Well, if we want to rewrite malloc(), a per-cpu pool inside a critical
 section would not be that difficult.  The 'bucket' array would
 simply be placed in the per-cpu area.  However, with malloc() we still
 have a serious problem with the malloc_type structure statistics.  There
 would have to be per-cpu information for those as well.

  Someone (jeffr) is already working on a new allocator to hopefully
[at least] eventually replace malloc(9) and various other kernel
allocators. It will support per CPU working-sets and some cache friendly
goodies.

-- 
Bosko Milekic
[EMAIL PROTECTED]
[EMAIL PROTECTED]

