Does anyone have Phong Vo <[email protected]>'s new email address?
The following work (i.e. the per thread caching etc) in Illumos might
be very interesting...

Forwarded conversation
Subject: [developer] 4489-4491 ptcumem and other libumem enhancements
------------------------

From: Robert Mustacchi <[email protected]>
Date: Thu, Jan 16, 2014 at 3:59 AM
To: illumos Developer <[email protected]>


This combines work done over the past few years at Joyent enhancing
libumem. It's broken down into two commits. One which adds 128k slabs
and makes VM_BESTFIT the default, the other which adds per thread
caching umem. For more on umem, see
http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/.

https://us-east.manta.joyent.com/rmustacc/public/webrevs/4489/index.html

Robert


-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175106-197fca96
Modify Your Subscription:
https://www.listbox.com/member/?member_id=21175106&id_secret=21175106-3d886b59
Powered by Listbox: http://www.listbox.com

----------
From: Dan McDonald <[email protected]>
Date: Mon, Jan 20, 2014 at 4:33 AM
To: Robert Mustacchi <[email protected]>
Cc: illumos Developer <[email protected]>



On Jan 15, 2014, at 9:59 PM, Robert Mustacchi <[email protected]> wrote:

> This combines work done over the past few years at Joyent enhancing
> libumem. It's broken down into two commits. One which adds 128k slabs
> and makes VM_BESTFIT the default, the other which adds per thread
> caching umem. For more on umem, see
> http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/.
>
> https://us-east.manta.joyent.com/rmustacc/public/webrevs/4489/index.html

I mostly understood this except for one constant, and I suspect that's
me being dense.  I also didn't find much wrong, save for one possible
attack-surface reduction.

Dan

-----------

exception_lists/check_rtime
usr/src/cmd/mdb/common/modules/libc/libc.c
usr/src/cmd/mdb/common/modules/libumem/libumem.c
usr/src/cmd/mdb/common/modules/libumem/umem.c
usr/src/cmd/mdb/intel/amd64/libumem/Makefile
usr/src/cmd/mdb/intel/ia32/libumem/Makefile
usr/src/cmd/mdb/sparc/v7/libumem/Makefile
usr/src/cmd/mdb/sparc/v9/libumem/Makefile
usr/src/lib/libc/amd64/Makefile
usr/src/lib/libc/i386/Makefile.com
usr/src/lib/libc/port/mapfile-vers
usr/src/lib/libc/port/threads/thr.c
usr/src/lib/libc/sparc/Makefile.com
usr/src/lib/libc/sparcv9/Makefile.com
usr/src/lib/libumem/Makefile.com
usr/src/lib/libumem/common/envvar.c
usr/src/lib/libumem/common/malloc.c
usr/src/lib/libumem/common/stub_stand.c
usr/src/lib/libumem/common/umem_base.h
usr/src/lib/libumem/common/umem_impl.h
usr/src/lib/libumem/common/vmem_base.c
usr/src/lib/libumem/i386/asm_subr.s
usr/src/lib/libc/port/threads/tmem.c
usr/src/man/man3malloc/umem_alloc.3malloc

* File look good, no comments needed.


usr/src/lib/libc/inc/thr_uberdata.h

* Why 16 for tm_roots?  Ahhh, it's on line 405-408 of umem.c.  Mention
  the big-theory statement, please?


usr/src/lib/libumem/amd64/umem_genasm.c

* Line 567: Grammar nit:  "it's" ("it is", not "single-byte addl's
  MAX_UINT32")


usr/src/lib/libumem/common/mapfile-vers

* Have you thought through the security considerations of allowing
  asm_subr.o's text segment to be writable?  Or am I worrying about nothing?
  Worst case, you create a new .o that JUST has these writeable malloc
  linkages in them, I think.


usr/src/lib/libumem/common/umem.c

* Sorry for being dense, but 16 for TMNUMROOTS still seems arbitrary.  Please
  enlighten me, or tell me where in here or in umem_genasm.c I missed it?!


usr/src/lib/libumem/i386/umem_genasm.c

* Lines 32 & 557:  it's/its grammar.


usr/src/lib/libumem/sparc/umem_genasm.c

* WHAT?!? NO SPARC?!?  okay okay... put the baseball bat down... just
  kidding.  :)

----------
From: Josef 'Jeff' Sipek <[email protected]>
Date: Mon, Jan 20, 2014 at 4:30 PM
To: Robert Mustacchi <[email protected]>
Cc: illumos Developer <[email protected]>


On Wed, Jan 15, 2014 at 06:59:29PM -0800, Robert Mustacchi wrote:
> This combines work done over the past few years at Joyent enhancing
> libumem. It's broken down into two commits. One which adds 128k slabs
> and makes VM_BESTFIT the default, the other which adds per thread
> caching umem. For more on umem, see
> http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/.
>
> https://us-east.manta.joyent.com/rmustacc/public/webrevs/4489/index.html

Fascinating read in umem.c...

I'm not exactly thrilled about some of the arch specifics in common code.
Yes, I know we support only x86 and sparc (and no one actually uses sparc)
and the ptcumem changes are supported only on x86.  For example, line 489
mentions 5-byte jump.  Should some of this architecture specific
documentation move into the architecture specific files?  Ah, I see that
there's some there.  I feel like having arch specific details in the common
files is begging for contradicting comments down the line as one gets
updated but the other doesn't.

umem.c:552: is that supposed to be s/makefile/mapfile/?

would it make sense to make umem_genasm_supported const?

stub_stand.c:151: is atomic_swap_64 supposed to be atomic? (I'm not familiar
with the requirements for stubs.)

malloc.c: Is there any "config" in illumos so that places like this can
check for a (compile time) feature instead of an architecture?

Jeff.

--
Once you have their hardware. Never give it back.
(The First Rule of Hardware Acquisition)

----------
From: Robert Mustacchi <[email protected]>
Date: Mon, Jan 20, 2014 at 5:42 PM
To: Josef 'Jeff' Sipek <[email protected]>
Cc: illumos Developer <[email protected]>


On 01/20/2014 07:30 AM, Josef 'Jeff' Sipek wrote:
> On Wed, Jan 15, 2014 at 06:59:29PM -0800, Robert Mustacchi wrote:
>> This combines work done over the past few years at Joyent enhancing
>> libumem. It's broken down into two commits. One which adds 128k slabs
>> and makes VM_BESTFIT the default, the other which adds per thread
>> caching umem. For more on umem, see
>> http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/.
>>
>> https://us-east.manta.joyent.com/rmustacc/public/webrevs/4489/index.html
>
> Fascinating read in umem.c...
>
> I'm not exactly thrilled about some of the arch specifics in common code.
> Yes, I know we support only x86 and sparc (and no one actually uses sparc)
> and the ptcumem changes are supported only on x86.  For example, line 489
> mentions 5-byte jump.  Should some of this architecture specific
> documentation move into the architecture specific files?  Ah, I see that
> there's some there.  I feel like having arch specific details in the common
> files is begging for contradicting comments down the line as one gets
> updated but the other doesn't.

It's not a perfect world, and I don't think there's a great answer. The
vast majority of that block of text that I added into the big theory
statement is equally viable across all architectures that illumos does
or may some day support. If someone wanted to port this to sparc, which
I'd welcome, or we enter the mystical future where there are other
architectures that illumos supports, then the contents there tell the
writer how they should go about doing this, the parts about breaking
down the generated asm into different portions are the same as what
you'd do across any architecture.

There are a few places where we delve into x86 specifics (8.2 and 8.3),
which I've updated to try to generalize them. I'd really rather not
break out the vast majority of this into the per-architecture
implementations as otherwise I fear we'll hit your fears much sooner, as
we'll actually be duplicating content.

> umem.c:552: is that supposed to be s/makefile/mapfile/?

Yeah, it is, thanks.

> would it make sense to make umem_genasm_supported const?

I don't think it really changes anything, but sure, I can do that.

> stub_stand.c:151: is atomic_swap_64 supposed to be atomic? (I'm not familiar
> with the requirements for stubs.)

So, no, not really. These are basically just used for kmdb and aren't
atomic. See what we're doing with mutex_lock and umem_atomic_add_64 in
cmd/mdb/common/kmdb/kmdb_umemglue.c. If you feel it will be more
consistent, I could do the same rename tricks and move its stub
implementation into there and do the same rename tricks.

> malloc.c: Is there any "config" in illumos so that places like this can
> check for a (compile time) feature instead of an architecture?

We only have that in terms of features of the architecture and processor
or langauge, see <sys/isa_defs.h> and <sys/feature_tests.h>. So, for
better or worse, this is what we do in the rest of the system.

I've updated the webrev in place.

Robert

----------
From: Josef 'Jeff' Sipek <[email protected]>
Date: Mon, Jan 20, 2014 at 8:16 PM
To: Robert Mustacchi <[email protected]>
Cc: illumos Developer <[email protected]>


Generalizing is what I was looking for.

> > umem.c:552: is that supposed to be s/makefile/mapfile/?
>
> Yeah, it is, thanks.
>
> > would it make sense to make umem_genasm_supported const?
>
> I don't think it really changes anything, but sure, I can do that.

.data vs .rodata?  Since it's not a tunable might as well have the compiler
and kernel (assuming non-writable .rodata) tell us about any bad behavior.

Should umem_base.h be updated too?

> > stub_stand.c:151: is atomic_swap_64 supposed to be atomic? (I'm not familiar
> > with the requirements for stubs.)
>
> So, no, not really. These are basically just used for kmdb and aren't
> atomic. See what we're doing with mutex_lock and umem_atomic_add_64 in
> cmd/mdb/common/kmdb/kmdb_umemglue.c. If you feel it will be more
> consistent, I could do the same rename tricks and move its stub
> implementation into there and do the same rename tricks.

I don't think I quite have the big picture here to decide which way makes
more sense, so I leave it up to you to decide.  (I just made sure to mention
it in case it did need to be atomic.)

> > malloc.c: Is there any "config" in illumos so that places like this can
> > check for a (compile time) feature instead of an architecture?
>
> We only have that in terms of features of the architecture and processor
> or langauge, see <sys/isa_defs.h> and <sys/feature_tests.h>. So, for
> better or worse, this is what we do in the rest of the system.

Yeah, that's what I thought.

Some more feedback:

umem_genasm.c:116: 0x0 -> 0x00

Overall, LGTM.

Jeff.

--
Mankind invented the atomic bomb, but no mouse would ever construct a
mousetrap.
                - Albert Einstein

----------
From: Robert Mustacchi <[email protected]>
Date: Mon, Jan 20, 2014 at 9:08 PM
To: Josef 'Jeff' Sipek <[email protected]>
Cc: illumos Developer <[email protected]>


On 1/20/14 11:16 , Josef 'Jeff' Sipek wrote:
> Should umem_base.h be updated too?

Yes, I forgot about that. Thanks.

>>> stub_stand.c:151: is atomic_swap_64 supposed to be atomic? (I'm not familiar
>>> with the requirements for stubs.)
>>
>> So, no, not really. These are basically just used for kmdb and aren't
>> atomic. See what we're doing with mutex_lock and umem_atomic_add_64 in
>> cmd/mdb/common/kmdb/kmdb_umemglue.c. If you feel it will be more
>> consistent, I could do the same rename tricks and move its stub
>> implementation into there and do the same rename tricks.
>
> I don't think I quite have the big picture here to decide which way makes
> more sense, so I leave it up to you to decide.  (I just made sure to mention
> it in case it did need to be atomic.)

I've opted to do the latter to be more consistent.

> Some more feedback:
>
> umem_genasm.c:116: 0x0 -> 0x00

Fixed.

Webrev updated in place.

Robert

----------
From: Josef 'Jeff' Sipek <[email protected]>
Date: Mon, Jan 20, 2014 at 9:24 PM
To: Robert Mustacchi <[email protected]>
Cc: illumos Developer <[email protected]>


On Mon, Jan 20, 2014 at 12:08:49PM -0800, Robert Mustacchi wrote:
> On 1/20/14 11:16 , Josef 'Jeff' Sipek wrote:
> > Should umem_base.h be updated too?
>
> Yes, I forgot about that. Thanks.
>
> >>> stub_stand.c:151: is atomic_swap_64 supposed to be atomic? (I'm not 
> >>> familiar
> >>> with the requirements for stubs.)
> >>
> >> So, no, not really. These are basically just used for kmdb and aren't
> >> atomic. See what we're doing with mutex_lock and umem_atomic_add_64 in
> >> cmd/mdb/common/kmdb/kmdb_umemglue.c. If you feel it will be more
> >> consistent, I could do the same rename tricks and move its stub
> >> implementation into there and do the same rename tricks.
> >
> > I don't think I quite have the big picture here to decide which way makes
> > more sense, so I leave it up to you to decide.  (I just made sure to mention
> > it in case it did need to be atomic.)
>
> I've opted to do the latter to be more consistent.

Turns out that I have no idea how the rename tricks work.  Assuming that it
works, LGTM.

Jeff.

> > Some more feedback:
> >
> > umem_genasm.c:116: 0x0 -> 0x00
>
> Fixed.
>
> Webrev updated in place.
>
> Robert

--
*NOTE: This message is ROT-13 encrypted twice for extra protection*

----------
From: Robert Mustacchi <[email protected]>
Date: Tue, Jan 21, 2014 at 12:21 AM
To: Dan McDonald <[email protected]>
Cc: illumos Developer <[email protected]>


Hey Dan,

Responses inline.

On 1/19/14 19:33 , Dan McDonald wrote:
> usr/src/lib/libc/inc/thr_uberdata.h
>
> * Why 16 for tm_roots?  Ahhh, it's on line 405-408 of umem.c.  Mention
>   the big-theory statement, please?

I've added a comment to direct folks there.

> usr/src/lib/libumem/common/mapfile-vers
>
> * Have you thought through the security considerations of allowing
>   asm_subr.o's text segment to be writable?  Or am I worrying about nothing?
>   Worst case, you create a new .o that JUST has these writeable malloc
>   linkages in them, I think.

If we wanted to do anything, we could mprotect it after the work by
umem_genasm(). That'd actually solve the problems that this could cause.
I think that's a reasonable low priority follow up RFE.

> usr/src/lib/libumem/common/umem.c
>
> * Sorry for being dense, but 16 for TMNUMROOTS still seems arbitrary.  Please
>   enlighten me, or tell me where in here or in umem_genasm.c I missed it?!

Well, it isn't properly explained, so I'll try and add something to the
comments. It is slightly arbitrary. But it was done based on the size of
the allocations and what sizes would be covered by the per-thread cache
and the likelihood we've seen for applications to allocate those sizes.
This means <= 256 bytes and <=448 bytes will be cached for i386 and
amd64 respectively. Basically, beyond those sizes you start getting to
more serious sized objects being allocated which are much more
noticeable in other systems. So beyond that size we basically felt that
the efficacy of this was going to decrease and that folks who are doing
larger sized allocations are going to be much more noticeable and less
likely to just do it every time by accident.

I've updated the webrev in place.

Robert






-- 
Irek
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers

Reply via email to