Re: Request for testing malloc and multi-threaded applications
On Tue, Sep 27, 2022 at 03:31:12PM +0200, Renaud Allard wrote:

> On 1/16/19 19:09, Otto Moerbeek wrote:
> > On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:
> >
> > > On 2019/01/04 08:09, Otto Moerbeek wrote:
> > > > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > > >
> > > > > Very little feedback so far. This diff can only give me valid feedback
> > > > > if the coverage of systems and use cases is wide. If I do not get
> > > > > more feedback, I have to base my decisions on my own testing, which
> > > > > will benefit my systems and use cases, but might harm yours.
> > > > >
> > > > > So, ladies and gentlemen, start your tests!
> > > >
> > > > Another reminder. I like to make progress on this. That means I need
> > > > tests for various use-cases.
> > >
> > > I have a map based website I use that is quite good at stressing things
> > > (high spin% cpu) and have been timing from opening chromium (I'm using
> > > this for the test because it typically performs less well than firefox).
> > > Time is real time from starting the browser set to 'start with previously
> > > opened windows' and the page open, until when the page reports that it's
> > > finished loading (i.e. fetching data from the server and rendering it).
> > >
> > > It's not a perfect test - depends on network/server conditions etc - and
> > > it's a visualisation of conditions in a game so may change slightly from
> > > run to run but there shouldn't be huge changes between the times I've
> > > run it - but is a bit more repeatable than a subjective "does the browser
> > > feel slow".
> > >
> > > 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> > >
> > > I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> > > more like +++, -, '', -, ++ etc.
> > >
> > > +++	90 98 68
> > > ++	85 82
> > > +	87 56 71
> > > ''	76 60 69 88
> > > -	77 74 85
> > > --	48 86 77 67
> > >
> > > So while it's not very consistent, the fastest times I've seen are on
> > > runs with fewer pools, and the slowest times on runs with more pools,
> > > with '' possibly seeming a bit more consistent from run to run. But
> > > there's not enough consistency with any of it to be able to make any
> > > clear conclusion (and I get the impression it would be hard to
> > > tell without some automated test that can be repeated many times
> > > and carrying out a statistical analysis on results).
> >
> > Thanks for testing. To be clear: this is with the diff I posted and not the
> > committed code, right? (There is a small change in the committed code
> > to change the default to what 1 plus was with the diff).
> >
> > 	-Otto
>
> Hello,
>
> Given that the code has been in base for about 4 years, shouldn't the man
> page be modified to add an explanation for those ++--? Or is there a reason
> why it's not documented?
>
> Best Regards

No, this is for internal/development use only and might be removed any
time. It's undocumented on purpose.

	-Otto
Re: Request for testing malloc and multi-threaded applications
On 1/16/19 19:09, Otto Moerbeek wrote:
> On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:
>
> > On 2019/01/04 08:09, Otto Moerbeek wrote:
> > > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > >
> > > > Very little feedback so far. This diff can only give me valid feedback
> > > > if the coverage of systems and use cases is wide. If I do not get
> > > > more feedback, I have to base my decisions on my own testing, which
> > > > will benefit my systems and use cases, but might harm yours.
> > > >
> > > > So, ladies and gentlemen, start your tests!
> > >
> > > Another reminder. I like to make progress on this. That means I need
> > > tests for various use-cases.
> >
> > I have a map based website I use that is quite good at stressing things
> > (high spin% cpu) and have been timing from opening chromium (I'm using
> > this for the test because it typically performs less well than firefox).
> > Time is real time from starting the browser set to 'start with previously
> > opened windows' and the page open, until when the page reports that it's
> > finished loading (i.e. fetching data from the server and rendering it).
> >
> > It's not a perfect test - depends on network/server conditions etc - and
> > it's a visualisation of conditions in a game so may change slightly from
> > run to run but there shouldn't be huge changes between the times I've
> > run it - but is a bit more repeatable than a subjective "does the browser
> > feel slow".
> >
> > 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> >
> > I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> > more like +++, -, '', -, ++ etc.
> >
> > +++	90 98 68
> > ++	85 82
> > +	87 56 71
> > ''	76 60 69 88
> > -	77 74 85
> > --	48 86 77 67
> >
> > So while it's not very consistent, the fastest times I've seen are on
> > runs with fewer pools, and the slowest times on runs with more pools,
> > with '' possibly seeming a bit more consistent from run to run. But
> > there's not enough consistency with any of it to be able to make any
> > clear conclusion (and I get the impression it would be hard to
> > tell without some automated test that can be repeated many times
> > and carrying out a statistical analysis on results).
>
> Thanks for testing. To be clear: this is with the diff I posted and not the
> committed code, right? (There is a small change in the committed code
> to change the default to what 1 plus was with the diff).
>
> 	-Otto

Hello,

Given that the code has been in base for about 4 years, shouldn't the man
page be modified to add an explanation for those ++--? Or is there a reason
why it's not documented?

Best Regards
Re: Request for testing malloc and multi-threaded applications
On Fri, Jan 18, 2019 at 08:41:57AM +0100, Alexandr Nedvedicky wrote:

> Hello Otto,
>
> I gave it a try with firefox. According to my subjective tests
> I could not spot any differences with various settings.
>
> I've decided to try with some memory benchmarks I could find on github [1]. I
> did create a fork [2] with my own test runner to try out your diff. To run it
> just do something like:
>
>     git clone https://github.com/Sashan/Hoard.git
>     cd Hoard/benchmarks/
>     make
>
> The benchmarks are from the 90's. A description can be found in the paper
> kept alongside the Hoard project [3].
>
> The box where I ran the tests has 4 CPUs:
>     cpu0: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz, \
>         2997.38 MHz, 06-17-0a
> with 8GB of RAM.
>
> I used time(1) to measure the running time of test-run.sh with a particular
> MALLOC_OPTIONS set. The results are as follows:
>
> Running with MALLOC_OPTIONS=
>      1730.27 real      3289.41 user      3574.28 sys
> Running with MALLOC_OPTIONS=-
>      1726.16 real      3279.37 user      3575.26 sys
> Running with MALLOC_OPTIONS=+
>      1712.40 real      3296.65 user      3483.03 sys
> Running with MALLOC_OPTIONS=--
>      1741.42 real      3290.89 user      3616.37 sys
> Running with MALLOC_OPTIONS=++
>      1765.02 real      3287.75 user      3665.30 sys
> Running with MALLOC_OPTIONS=+++
>      1758.06 real      3300.00 user      3631.57 sys
>
> As you can see, the differences are insignificant; the spread is ~1 minute.
> One round of tests took ~30 minutes.
>
> regards
> sashan

Thanks,

	-Otto

>
> [1] https://github.com/emeryberger/Hoard/tree/master/benchmarks
>
> [2] https://github.com/Sashan/Hoard
>
> [3] https://github.com/emeryberger/Hoard/blob/master/doc/berger-asplos2000.pdf
>
> On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
> > On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> >
> > > Hi,
> > >
> > > This diff implements a more flexible approach for the number of pools
> > > malloc uses in the multi-threaded case. At the moment I do not intend
> > > to commit this as-is, I first need this to get some feedback on what
> > > the proper default should be.
> > >
> > > Currently the number of pools is fixed at 4. More pools mean less
> > > contention for allocations, but free becomes more expensive since a
> > > thread might need to check other pools increasing contention.
> > >
> > > I'd like to know how this diff behaves using your favorite
> > > multi-threaded application. Often this will be a web-browser I guess.
> > >
> > > Test instructions:
> > >
> > > 0. Make sure you are running current.
> > >
> > > 1. Do a baseline test of your application.
> > >
> > > 2. Apply diff, build and install userland.
> > >
> > > 3. Run your test application with MALLOC_OPTIONS=value, where value is:
> > > "", +, -, ++, -- and +++.
> > >
> > > e.g.
> > >
> > > MALLOC_OPTIONS=++ chrome
> > >
> > > Note performance. Do multiple tests to get better statistics.
> > >
> > > If you're not able to do full tests, at least general observations are
> > > welcome. Tell a bit about the system you tested on (e.g. number of
> > > cores). Note that due to randomization, different runs might show
> > > different performance numbers since the pools shared by subsets of
> > > threads can turn out differently.
> > >
> > > Thanks,
> > >
> > > 	-Otto
> >
> > New diff with problem noted by Janne Johansson fixed.
> >
> > Index: include/thread_private.h
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.33
> > diff -u -p -r1.33 thread_private.h
> > --- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
> > +++ include/thread_private.h	19 Dec 2018 10:18:38 -
> > @@ -7,7 +7,7 @@
> >  
> >  #include <stdio.h>	/* for FILE and __isthreaded */
> >  
> > -#define _MALLOC_MUTEXES 4
> > +#define _MALLOC_MUTEXES 32
> >  void _malloc_init(int);
> >  #ifdef __LIBC__
> >  PROTO_NORMAL(_malloc_init);
> > Index: stdlib/malloc.c
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > retrieving revision 1.257
> > diff -u -p -r1.257 malloc.c
> > --- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
> > +++ stdlib/malloc.c	19 Dec 2018 10:18:38 -
> > @@ -143,6 +143,8 @@ struct dir_info {
> >  	size_t cheap_reallocs;
> >  	size_t malloc_used;	/* bytes allocated */
> >  	size_t malloc_guarded;	/* bytes used for guards */
> > +	size_t pool_searches;	/* searches for pool */
> > +	size_t other_pool;	/* searches in other pool */
> >  #define STATS_ADD(x,y) ((x) += (y))
> >  #define STATS_SUB(x,y) ((x) -= (y))
> >  #define STATS_INC(x) ((x)++)
> > @@ -179,7 +181,9 @@ struct chunk_info {
> >  };
> >  
> >  struct malloc_readonly {
> > -	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main book
Re: Request for testing malloc and multi-threaded applications
Hello Otto,

I gave it a try with firefox. According to my subjective tests
I could not spot any differences with various settings.

I've decided to try with some memory benchmarks I could find on github [1]. I
did create a fork [2] with my own test runner to try out your diff. To run it
just do something like:

    git clone https://github.com/Sashan/Hoard.git
    cd Hoard/benchmarks/
    make

The benchmarks are from the 90's. A description can be found in the paper
kept alongside the Hoard project [3].

The box where I ran the tests has 4 CPUs:
    cpu0: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz, \
        2997.38 MHz, 06-17-0a
with 8GB of RAM.

I used time(1) to measure the running time of test-run.sh with a particular
MALLOC_OPTIONS set. The results are as follows:

Running with MALLOC_OPTIONS=
     1730.27 real      3289.41 user      3574.28 sys
Running with MALLOC_OPTIONS=-
     1726.16 real      3279.37 user      3575.26 sys
Running with MALLOC_OPTIONS=+
     1712.40 real      3296.65 user      3483.03 sys
Running with MALLOC_OPTIONS=--
     1741.42 real      3290.89 user      3616.37 sys
Running with MALLOC_OPTIONS=++
     1765.02 real      3287.75 user      3665.30 sys
Running with MALLOC_OPTIONS=+++
     1758.06 real      3300.00 user      3631.57 sys

As you can see, the differences are insignificant; the spread is ~1 minute.
One round of tests took ~30 minutes.

regards
sashan

[1] https://github.com/emeryberger/Hoard/tree/master/benchmarks

[2] https://github.com/Sashan/Hoard

[3] https://github.com/emeryberger/Hoard/blob/master/doc/berger-asplos2000.pdf

On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
> On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
>
> > Hi,
> >
> > This diff implements a more flexible approach for the number of pools
> > malloc uses in the multi-threaded case. At the moment I do not intend
> > to commit this as-is, I first need this to get some feedback on what
> > the proper default should be.
> >
> > Currently the number of pools is fixed at 4. More pools mean less
> > contention for allocations, but free becomes more expensive since a
> > thread might need to check other pools increasing contention.
> >
> > I'd like to know how this diff behaves using your favorite
> > multi-threaded application. Often this will be a web-browser I guess.
> >
> > Test instructions:
> >
> > 0. Make sure you are running current.
> >
> > 1. Do a baseline test of your application.
> >
> > 2. Apply diff, build and install userland.
> >
> > 3. Run your test application with MALLOC_OPTIONS=value, where value is:
> > "", +, -, ++, -- and +++.
> >
> > e.g.
> >
> > MALLOC_OPTIONS=++ chrome
> >
> > Note performance. Do multiple tests to get better statistics.
> >
> > If you're not able to do full tests, at least general observations are
> > welcome. Tell a bit about the system you tested on (e.g. number of
> > cores). Note that due to randomization, different runs might show
> > different performance numbers since the pools shared by subsets of
> > threads can turn out differently.
> >
> > Thanks,
> >
> > 	-Otto
>
> New diff with problem noted by Janne Johansson fixed.
>
> Index: include/thread_private.h
> ===================================================================
> RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> retrieving revision 1.33
> diff -u -p -r1.33 thread_private.h
> --- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
> +++ include/thread_private.h	19 Dec 2018 10:18:38 -
> @@ -7,7 +7,7 @@
>  
>  #include <stdio.h>	/* for FILE and __isthreaded */
>  
> -#define _MALLOC_MUTEXES 4
> +#define _MALLOC_MUTEXES 32
>  void _malloc_init(int);
>  #ifdef __LIBC__
>  PROTO_NORMAL(_malloc_init);
> Index: stdlib/malloc.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.257
> diff -u -p -r1.257 malloc.c
> --- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
> +++ stdlib/malloc.c	19 Dec 2018 10:18:38 -
> @@ -143,6 +143,8 @@ struct dir_info {
>  	size_t cheap_reallocs;
>  	size_t malloc_used;	/* bytes allocated */
>  	size_t malloc_guarded;	/* bytes used for guards */
> +	size_t pool_searches;	/* searches for pool */
> +	size_t other_pool;	/* searches in other pool */
>  #define STATS_ADD(x,y) ((x) += (y))
>  #define STATS_SUB(x,y) ((x) -= (y))
>  #define STATS_INC(x) ((x)++)
> @@ -179,7 +181,9 @@ struct chunk_info {
>  };
>  
>  struct malloc_readonly {
> -	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main bookkeeping information */
> +	/* Main bookkeeping information */
> +	struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> +	u_int malloc_mutexes;	/* how much in actual use? */
>  	int malloc_mt;	/* multi-threaded mode? */
>  	int malloc_freecheck;	/* Extensive dou
Re: Request for testing malloc and multi-threaded applications
On 2019/01/16 19:09, Otto Moerbeek wrote:
> On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:
>
> > On 2019/01/04 08:09, Otto Moerbeek wrote:
> > > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> > >
> > > > Very little feedback so far. This diff can only give me valid feedback
> > > > if the coverage of systems and use cases is wide. If I do not get
> > > > more feedback, I have to base my decisions on my own testing, which
> > > > will benefit my systems and use cases, but might harm yours.
> > > >
> > > > So, ladies and gentlemen, start your tests!
> > >
> > > Another reminder. I like to make progress on this. That means I need
> > > tests for various use-cases.
> >
> > I have a map based website I use that is quite good at stressing things
> > (high spin% cpu) and have been timing from opening chromium (I'm using
> > this for the test because it typically performs less well than firefox).
> > Time is real time from starting the browser set to 'start with previously
> > opened windows' and the page open, until when the page reports that it's
> > finished loading (i.e. fetching data from the server and rendering it).
> >
> > It's not a perfect test - depends on network/server conditions etc - and
> > it's a visualisation of conditions in a game so may change slightly from
> > run to run but there shouldn't be huge changes between the times I've
> > run it - but is a bit more repeatable than a subjective "does the browser
> > feel slow".
> >
> > 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
> >
> > I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> > more like +++, -, '', -, ++ etc.
> >
> > +++	90 98 68
> > ++	85 82
> > +	87 56 71
> > ''	76 60 69 88
> > -	77 74 85
> > --	48 86 77 67
> >
> > So while it's not very consistent, the fastest times I've seen are on
> > runs with fewer pools, and the slowest times on runs with more pools,
> > with '' possibly seeming a bit more consistent from run to run. But
> > there's not enough consistency with any of it to be able to make any
> > clear conclusion (and I get the impression it would be hard to
> > tell without some automated test that can be repeated many times
> > and carrying out a statistical analysis on results).
>
> Thanks for testing. To be clear: this is with the diff I posted and not the
> committed code, right? (There is a small change in the committed code
> to change the default to what 1 plus was with the diff).
>
> 	-Otto

Ah I missed that it was committed (and thought that the diff as sent
was in snapshots) - this was the committed version then.

(It took a while to test as I was trying to think of something where
I actually had a chance of noticing a difference!).
Re: Request for testing malloc and multi-threaded applications
On Wed, Jan 16, 2019 at 01:25:25PM +, Stuart Henderson wrote:

> On 2019/01/04 08:09, Otto Moerbeek wrote:
> > On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
> >
> > > Very little feedback so far. This diff can only give me valid feedback
> > > if the coverage of systems and use cases is wide. If I do not get
> > > more feedback, I have to base my decisions on my own testing, which
> > > will benefit my systems and use cases, but might harm yours.
> > >
> > > So, ladies and gentlemen, start your tests!
> >
> > Another reminder. I like to make progress on this. That means I need
> > tests for various use-cases.
>
> I have a map based website I use that is quite good at stressing things
> (high spin% cpu) and have been timing from opening chromium (I'm using
> this for the test because it typically performs less well than firefox).
> Time is real time from starting the browser set to 'start with previously
> opened windows' and the page open, until when the page reports that it's
> finished loading (i.e. fetching data from the server and rendering it).
>
> It's not a perfect test - depends on network/server conditions etc - and
> it's a visualisation of conditions in a game so may change slightly from
> run to run but there shouldn't be huge changes between the times I've
> run it - but is a bit more repeatable than a subjective "does the browser
> feel slow".
>
> 4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).
>
> I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
> more like +++, -, '', -, ++ etc.
>
> +++	90 98 68
> ++	85 82
> +	87 56 71
> ''	76 60 69 88
> -	77 74 85
> --	48 86 77 67
>
> So while it's not very consistent, the fastest times I've seen are on
> runs with fewer pools, and the slowest times on runs with more pools,
> with '' possibly seeming a bit more consistent from run to run. But
> there's not enough consistency with any of it to be able to make any
> clear conclusion (and I get the impression it would be hard to
> tell without some automated test that can be repeated many times
> and carrying out a statistical analysis on results).

Thanks for testing. To be clear: this is with the diff I posted and not the
committed code, right? (There is a small change in the committed code
to change the default to what 1 plus was with the diff).

	-Otto
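
For readers decoding the +/- shorthand used in the results above: going by the omalloc_parseopt() hunk of the posted diff, each '+' doubles and each '-' halves the pool count, clamped to the range 1..32, starting from the diff's default of 4. The sketch below only illustrates that arithmetic and is not the libc code. The names pools_for_options, pool_index, POOLS_MAX and POOLS_DEFAULT are hypothetical, and the sketch models a single option string rather than every place malloc reads its options from. The committed code uses a different default, as the reply above points out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration only. The values 32 and 4 are taken from the posted diff
 * (_MALLOC_MUTEXES and the default set in omalloc_init()); the macro and
 * function names here are made up for this sketch.
 */
#define POOLS_MAX	32u
#define POOLS_DEFAULT	4u

/* Pool count selected by an option string such as "++" or "-". */
unsigned int
pools_for_options(const char *opts)
{
	unsigned int n = POOLS_DEFAULT;

	if (opts == NULL)
		return n;
	for (; *opts != '\0'; opts++) {
		switch (*opts) {
		case '+':		/* double, capped at the maximum */
			n <<= 1;
			if (n > POOLS_MAX)
				n = POOLS_MAX;
			break;
		case '-':		/* halve, but keep at least one pool */
			n >>= 1;
			if (n < 1)
				n = 1;
			break;
		default:		/* other letters do not change the count */
			break;
		}
	}
	return n;
}

/*
 * Pool index for a thread id, using the same mask as getpool() in the
 * diff. The mask only works when n is a power of two.
 */
unsigned int
pool_index(unsigned int tid, unsigned int n)
{
	assert(n != 0 && (n & (n - 1)) == 0);
	return tid & (n - 1);
}
```

Under these assumptions pools_for_options("++") yields 16 and pools_for_options("--") yields 1. The mask in pool_index is the reason the count moves in powers of two rather than by one.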
Re: Request for testing malloc and multi-threaded applications
On 2019/01/04 08:09, Otto Moerbeek wrote:
> On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:
>
> > Very little feedback so far. This diff can only give me valid feedback
> > if the coverage of systems and use cases is wide. If I do not get
> > more feedback, I have to base my decisions on my own testing, which
> > will benefit my systems and use cases, but might harm yours.
> >
> > So, ladies and gentlemen, start your tests!
>
> Another reminder. I like to make progress on this. That means I need
> tests for various use-cases.

I have a map based website I use that is quite good at stressing things
(high spin% cpu) and have been timing from opening chromium (I'm using
this for the test because it typically performs less well than firefox).
Time is real time from starting the browser set to 'start with previously
opened windows' and the page open, until when the page reports that it's
finished loading (i.e. fetching data from the server and rendering it).

It's not a perfect test - depends on network/server conditions etc - and
it's a visualisation of conditions in a game so may change slightly from
run to run but there shouldn't be huge changes between the times I've
run it - but is a bit more repeatable than a subjective "does the browser
feel slow".

4x "real" cores, Xeon E3-1225v3, 16GB ram (not going into swap).

I've mixed up the test orders so it's not 3x +++, 2x ++, 3x + etc in order,
more like +++, -, '', -, ++ etc.

+++	90 98 68
++	85 82
+	87 56 71
''	76 60 69 88
-	77 74 85
--	48 86 77 67

So while it's not very consistent, the fastest times I've seen are on
runs with fewer pools, and the slowest times on runs with more pools,
with '' possibly seeming a bit more consistent from run to run. But
there's not enough consistency with any of it to be able to make any
clear conclusion (and I get the impression it would be hard to
tell without some automated test that can be repeated many times
and carrying out a statistical analysis on results).
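
As a first step toward the statistical analysis mentioned above, the per-setting mean and sample variance of the timings in the table can be computed with a few lines of C. This is a sketch only: the helper names sample_mean and sample_variance are made up for illustration, and the arrays simply copy the numbers from the table. With two to four samples per setting, the results are rough indications at most.

```c
#include <assert.h>
#include <stddef.h>

/* Timings copied from the table above, one array per MALLOC_OPTIONS value. */
const double t_ppp[] = { 90, 98, 68 };		/* +++ */
const double t_pp[]  = { 85, 82 };		/* ++  */
const double t_p[]   = { 87, 56, 71 };		/* +   */
const double t_def[] = { 76, 60, 69, 88 };	/* ''  */
const double t_m[]   = { 77, 74, 85 };		/* -   */
const double t_mm[]  = { 48, 86, 77, 67 };	/* --  */

#define NELEM(a)	(sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n samples; n must be at least 1. */
double
sample_mean(const double *x, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Unbiased sample variance (n - 1 denominator); n must be at least 2. */
double
sample_variance(const double *x, size_t n)
{
	double m, d, ss = 0.0;
	size_t i;

	assert(n > 1);
	m = sample_mean(x, n);
	for (i = 0; i < n; i++) {
		d = x[i] - m;
		ss += d * d;
	}
	return ss / (double)(n - 1);
}
```

For example, sample_mean(t_def, NELEM(t_def)) gives 73.25 and sample_mean(t_mm, NELEM(t_mm)) gives 69.5. With variances this large relative to the differences between means, many more automated runs would be needed before drawing conclusions.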
Re: Request for testing malloc and multi-threaded applications
On Thu, Dec 27, 2018 at 09:39:56AM +0100, Otto Moerbeek wrote:

> Very little feedback so far. This diff can only give me valid feedback
> if the coverage of systems and use cases is wide. If I do not get
> more feedback, I have to base my decisions on my own testing, which
> will benefit my systems and use cases, but might harm yours.
>
> So, ladies and gentlemen, start your tests!

Another reminder. I like to make progress on this. That means I need
tests for various use-cases.

Thanks

	-Otto

>
> On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:
>
> > On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
> >
> > > Hi,
> > >
> > > This diff implements a more flexible approach for the number of pools
> > > malloc uses in the multi-threaded case. At the moment I do not intend
> > > to commit this as-is, I first need this to get some feedback on what
> > > the proper default should be.
> > >
> > > Currently the number of pools is fixed at 4. More pools mean less
> > > contention for allocations, but free becomes more expensive since a
> > > thread might need to check other pools increasing contention.
> > >
> > > I'd like to know how this diff behaves using your favorite
> > > multi-threaded application. Often this will be a web-browser I guess.
> > >
> > > Test instructions:
> > >
> > > 0. Make sure you are running current.
> > >
> > > 1. Do a baseline test of your application.
> > >
> > > 2. Apply diff, build and install userland.
> > >
> > > 3. Run your test application with MALLOC_OPTIONS=value, where value is:
> > > "", +, -, ++, -- and +++.
> > >
> > > e.g.
> > >
> > > MALLOC_OPTIONS=++ chrome
> > >
> > > Note performance. Do multiple tests to get better statistics.
> > >
> > > If you're not able to do full tests, at least general observations are
> > > welcome. Tell a bit about the system you tested on (e.g. number of
> > > cores). Note that due to randomization, different runs might show
> > > different performance numbers since the pools shared by subsets of
> > > threads can turn out differently.
> > >
> > > Thanks,
> > >
> > > 	-Otto
> >
> > New diff with problem noted by Janne Johansson fixed.
> >
> > Index: include/thread_private.h
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.33
> > diff -u -p -r1.33 thread_private.h
> > --- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
> > +++ include/thread_private.h	19 Dec 2018 10:18:38 -
> > @@ -7,7 +7,7 @@
> >  
> >  #include <stdio.h>	/* for FILE and __isthreaded */
> >  
> > -#define _MALLOC_MUTEXES 4
> > +#define _MALLOC_MUTEXES 32
> >  void _malloc_init(int);
> >  #ifdef __LIBC__
> >  PROTO_NORMAL(_malloc_init);
> > Index: stdlib/malloc.c
> > ===================================================================
> > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > retrieving revision 1.257
> > diff -u -p -r1.257 malloc.c
> > --- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
> > +++ stdlib/malloc.c	19 Dec 2018 10:18:38 -
> > @@ -143,6 +143,8 @@ struct dir_info {
> >  	size_t cheap_reallocs;
> >  	size_t malloc_used;	/* bytes allocated */
> >  	size_t malloc_guarded;	/* bytes used for guards */
> > +	size_t pool_searches;	/* searches for pool */
> > +	size_t other_pool;	/* searches in other pool */
> >  #define STATS_ADD(x,y) ((x) += (y))
> >  #define STATS_SUB(x,y) ((x) -= (y))
> >  #define STATS_INC(x) ((x)++)
> > @@ -179,7 +181,9 @@ struct chunk_info {
> >  };
> >  
> >  struct malloc_readonly {
> > -	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main bookkeeping information */
> > +	/* Main bookkeeping information */
> > +	struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> > +	u_int malloc_mutexes;	/* how much in actual use? */
> >  	int malloc_mt;	/* multi-threaded mode? */
> >  	int malloc_freecheck;	/* Extensive double free check */
> >  	int malloc_freeunmap;	/* mprotect free pages PROT_NONE? */
> > @@ -267,7 +271,7 @@ getpool(void)
> >  		return mopts.malloc_pool[0];
> >  	else
> >  		return mopts.malloc_pool[TIB_GET()->tib_tid &
> > -		    (_MALLOC_MUTEXES - 1)];
> > +		    (mopts.malloc_mutexes - 1)];
> >  }
> >  
> >  static __dead void
> > @@ -316,6 +320,16 @@ static void
> >  omalloc_parseopt(char opt)
> >  {
> >  	switch (opt) {
> > +	case '+':
> > +		mopts.malloc_mutexes <<= 1;
> > +		if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
> > +			mopts.malloc_mutexes = _MALLOC_MUTEXES;
> > +		break;
> > +	case '-':
> > +		mopts.malloc_mutexes >>= 1;
> > +		if (mopts.malloc_mutexes < 1)
> > +			mopts.malloc_mutexes = 1;
> > +		break;
> >  	ca
Re: Request for testing malloc and multi-threaded applications
Very little feedback so far. This diff can only give me valid feedback
if the coverage of systems and use cases is wide. If I do not get
more feedback, I have to base my decisions on my own testing, which
will benefit my systems and use cases, but might harm yours.

So, ladies and gentlemen, start your tests!

	-Otto

On Wed, Dec 19, 2018 at 11:20:19AM +0100, Otto Moerbeek wrote:

> On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:
>
> > Hi,
> >
> > This diff implements a more flexible approach for the number of pools
> > malloc uses in the multi-threaded case. At the moment I do not intend
> > to commit this as-is, I first need this to get some feedback on what
> > the proper default should be.
> >
> > Currently the number of pools is fixed at 4. More pools mean less
> > contention for allocations, but free becomes more expensive since a
> > thread might need to check other pools increasing contention.
> >
> > I'd like to know how this diff behaves using your favorite
> > multi-threaded application. Often this will be a web-browser I guess.
> >
> > Test instructions:
> >
> > 0. Make sure you are running current.
> >
> > 1. Do a baseline test of your application.
> >
> > 2. Apply diff, build and install userland.
> >
> > 3. Run your test application with MALLOC_OPTIONS=value, where value is:
> > "", +, -, ++, -- and +++.
> >
> > e.g.
> >
> > MALLOC_OPTIONS=++ chrome
> >
> > Note performance. Do multiple tests to get better statistics.
> >
> > If you're not able to do full tests, at least general observations are
> > welcome. Tell a bit about the system you tested on (e.g. number of
> > cores). Note that due to randomization, different runs might show
> > different performance numbers since the pools shared by subsets of
> > threads can turn out differently.
> >
> > Thanks,
> >
> > 	-Otto
>
> New diff with problem noted by Janne Johansson fixed.
>
> Index: include/thread_private.h
> ===================================================================
> RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> retrieving revision 1.33
> diff -u -p -r1.33 thread_private.h
> --- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
> +++ include/thread_private.h	19 Dec 2018 10:18:38 -
> @@ -7,7 +7,7 @@
>  
>  #include <stdio.h>	/* for FILE and __isthreaded */
>  
> -#define _MALLOC_MUTEXES 4
> +#define _MALLOC_MUTEXES 32
>  void _malloc_init(int);
>  #ifdef __LIBC__
>  PROTO_NORMAL(_malloc_init);
> Index: stdlib/malloc.c
> ===================================================================
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.257
> diff -u -p -r1.257 malloc.c
> --- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
> +++ stdlib/malloc.c	19 Dec 2018 10:18:38 -
> @@ -143,6 +143,8 @@ struct dir_info {
>  	size_t cheap_reallocs;
>  	size_t malloc_used;	/* bytes allocated */
>  	size_t malloc_guarded;	/* bytes used for guards */
> +	size_t pool_searches;	/* searches for pool */
> +	size_t other_pool;	/* searches in other pool */
>  #define STATS_ADD(x,y) ((x) += (y))
>  #define STATS_SUB(x,y) ((x) -= (y))
>  #define STATS_INC(x) ((x)++)
> @@ -179,7 +181,9 @@ struct chunk_info {
>  };
>  
>  struct malloc_readonly {
> -	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main bookkeeping information */
> +	/* Main bookkeeping information */
> +	struct dir_info *malloc_pool[_MALLOC_MUTEXES];
> +	u_int malloc_mutexes;	/* how much in actual use? */
>  	int malloc_mt;	/* multi-threaded mode? */
>  	int malloc_freecheck;	/* Extensive double free check */
>  	int malloc_freeunmap;	/* mprotect free pages PROT_NONE? */
> @@ -267,7 +271,7 @@ getpool(void)
>  		return mopts.malloc_pool[0];
>  	else
>  		return mopts.malloc_pool[TIB_GET()->tib_tid &
> -		    (_MALLOC_MUTEXES - 1)];
> +		    (mopts.malloc_mutexes - 1)];
>  }
>  
>  static __dead void
> @@ -316,6 +320,16 @@ static void
>  omalloc_parseopt(char opt)
>  {
>  	switch (opt) {
> +	case '+':
> +		mopts.malloc_mutexes <<= 1;
> +		if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
> +			mopts.malloc_mutexes = _MALLOC_MUTEXES;
> +		break;
> +	case '-':
> +		mopts.malloc_mutexes >>= 1;
> +		if (mopts.malloc_mutexes < 1)
> +			mopts.malloc_mutexes = 1;
> +		break;
>  	case '>':
>  		mopts.malloc_cache <<= 1;
>  		if (mopts.malloc_cache > MALLOC_MAXCACHE)
> @@ -395,6 +409,7 @@ omalloc_init(void)
>  	/*
>  	 * Default options
>  	 */
> +	mopts.malloc_mutexes = 4;
>  	mopts.malloc_junk = 1;
>  	mopts.malloc_cache = MALLOC_DEFAULT_CACHE;
>  
> @@ -485,7 +500,7 @@ omalloc_poolinit(struct
Re: Request for testing malloc and multi-threaded applications
On Wed, Dec 19, 2018 at 10:52:03AM +0100, Otto Moerbeek wrote:

> Hi,
>
> This diff implements a more flexible approach for the number of pools
> malloc uses in the multi-threaded case. At the moment I do not intend
> to commit this as-is, I first need this to get some feedback on what
> the proper default should be.
>
> Currently the number of pools is fixed at 4. More pools mean less
> contention for allocations, but free becomes more expensive since a
> thread might need to check other pools increasing contention.
>
> I'd like to know how this diff behaves using your favorite
> multi-threaded application. Often this will be a web-browser I guess.
>
> Test instructions:
>
> 0. Make sure you are running current.
>
> 1. Do a baseline test of your application.
>
> 2. Apply diff, build and install userland.
>
> 3. Run your test application with MALLOC_OPTIONS=value, where value is:
> "", +, -, ++, -- and +++.
>
> e.g.
>
> MALLOC_OPTIONS=++ chrome
>
> Note performance. Do multiple tests to get better statistics.
>
> If you're not able to do full tests, at least general observations are
> welcome. Tell a bit about the system you tested on (e.g. number of
> cores). Note that due to randomization, different runs might show
> different performance numbers since the pools shared by subsets of
> threads can turn out differently.
>
> Thanks,
>
> 	-Otto

New diff with problem noted by Janne Johansson fixed.

Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.33
diff -u -p -r1.33 thread_private.h
--- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
+++ include/thread_private.h	19 Dec 2018 10:18:38 -
@@ -7,7 +7,7 @@

 #include /* for FILE and __isthreaded */

-#define _MALLOC_MUTEXES 4
+#define _MALLOC_MUTEXES 32
 void _malloc_init(int);
 #ifdef __LIBC__
 PROTO_NORMAL(_malloc_init);
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.257
diff -u -p -r1.257 malloc.c
--- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
+++ stdlib/malloc.c	19 Dec 2018 10:18:38 -
@@ -143,6 +143,8 @@ struct dir_info {
 	size_t cheap_reallocs;
 	size_t malloc_used;		/* bytes allocated */
 	size_t malloc_guarded;		/* bytes used for guards */
+	size_t pool_searches;		/* searches for pool */
+	size_t other_pool;		/* searches in other pool */
 #define STATS_ADD(x,y)	((x) += (y))
 #define STATS_SUB(x,y)	((x) -= (y))
 #define STATS_INC(x)	((x)++)
@@ -179,7 +181,9 @@ struct chunk_info {
 };

 struct malloc_readonly {
-	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main bookkeeping information */
+	/* Main bookkeeping information */
+	struct dir_info *malloc_pool[_MALLOC_MUTEXES];
+	u_int	malloc_mutexes;		/* how much in actual use? */
 	int	malloc_mt;		/* multi-threaded mode? */
 	int	malloc_freecheck;	/* Extensive double free check */
 	int	malloc_freeunmap;	/* mprotect free pages PROT_NONE? */
@@ -267,7 +271,7 @@ getpool(void)
 		return mopts.malloc_pool[0];
 	else
 		return mopts.malloc_pool[TIB_GET()->tib_tid &
-		    (_MALLOC_MUTEXES - 1)];
+		    (mopts.malloc_mutexes - 1)];
 }

 static __dead void
@@ -316,6 +320,16 @@ static void
 omalloc_parseopt(char opt)
 {
 	switch (opt) {
+	case '+':
+		mopts.malloc_mutexes <<= 1;
+		if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
+			mopts.malloc_mutexes = _MALLOC_MUTEXES;
+		break;
+	case '-':
+		mopts.malloc_mutexes >>= 1;
+		if (mopts.malloc_mutexes < 1)
+			mopts.malloc_mutexes = 1;
+		break;
 	case '>':
 		mopts.malloc_cache <<= 1;
 		if (mopts.malloc_cache > MALLOC_MAXCACHE)
@@ -395,6 +409,7 @@ omalloc_init(void)
 	/*
 	 * Default options
 	 */
+	mopts.malloc_mutexes = 4;
 	mopts.malloc_junk = 1;
 	mopts.malloc_cache = MALLOC_DEFAULT_CACHE;

@@ -485,7 +500,7 @@ omalloc_poolinit(struct dir_info **dp)
 		for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
 			LIST_INIT(&d->chunk_dir[i][j]);
 	}
-	STATS_ADD(d->malloc_used, regioninfo_size);
+	STATS_ADD(d->malloc_used, regioninfo_size + 3 * MALLOC_PAGESIZE);
 	d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
 	d->canary2 = ~d->canary1;

@@ -1196,7 +1211,7 @@ _malloc_init(int from_rthreads)
 	if (!mopts.malloc_canary)
 		omalloc_init();

-	max = from_rthreads ? _MALLOC_MUTEXES : 1;
+	max = from_rthr
Request for testing malloc and multi-threaded applications
Hi,

This diff implements a more flexible approach for the number of pools
malloc uses in the multi-threaded case. At the moment I do not intend
to commit this as-is, I first need this to get some feedback on what
the proper default should be.

Currently the number of pools is fixed at 4. More pools mean less
contention for allocations, but free becomes more expensive since a
thread might need to check other pools, increasing contention.

I'd like to know how this diff behaves using your favorite
multi-threaded application. Often this will be a web-browser I guess.

Test instructions:

0. Make sure you are running current.

1. Do a baseline test of your application.

2. Apply diff, build and install userland.

3. Run your test application with MALLOC_OPTIONS=value, where value is:
"", +, -, ++, -- and +++.

e.g.

MALLOC_OPTIONS=++ chrome

Note performance. Do multiple tests to get better statistics.

If you're not able to do full tests, at least general observations are
welcome. Tell a bit about the system you tested on (e.g. number of
cores). Note that due to randomization, different runs might show
different performance numbers since the pools shared by subsets of
threads can turn out differently.
Thanks,

-Otto

Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.33
diff -u -p -r1.33 thread_private.h
--- include/thread_private.h	5 Dec 2017 13:45:31 -	1.33
+++ include/thread_private.h	19 Dec 2018 06:52:07 -
@@ -7,7 +7,7 @@

 #include /* for FILE and __isthreaded */

-#define _MALLOC_MUTEXES 4
+#define _MALLOC_MUTEXES 32
 void _malloc_init(int);
 #ifdef __LIBC__
 PROTO_NORMAL(_malloc_init);
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.257
diff -u -p -r1.257 malloc.c
--- stdlib/malloc.c	10 Dec 2018 07:57:49 -	1.257
+++ stdlib/malloc.c	19 Dec 2018 06:52:07 -
@@ -143,6 +143,8 @@ struct dir_info {
 	size_t cheap_reallocs;
 	size_t malloc_used;		/* bytes allocated */
 	size_t malloc_guarded;		/* bytes used for guards */
+	size_t pool_searches;		/* searches for pool */
+	size_t other_pool;		/* searches in other pool */
 #define STATS_ADD(x,y)	((x) += (y))
 #define STATS_SUB(x,y)	((x) -= (y))
 #define STATS_INC(x)	((x)++)
@@ -179,7 +181,9 @@ struct chunk_info {
 };

 struct malloc_readonly {
-	struct dir_info *malloc_pool[_MALLOC_MUTEXES];	/* Main bookkeeping information */
+	/* Main bookkeeping information */
+	struct dir_info *malloc_pool[_MALLOC_MUTEXES];
+	u_int	malloc_mutexes;		/* how much in actual use? */
 	int	malloc_mt;		/* multi-threaded mode? */
 	int	malloc_freecheck;	/* Extensive double free check */
 	int	malloc_freeunmap;	/* mprotect free pages PROT_NONE? */
@@ -267,7 +271,7 @@ getpool(void)
 		return mopts.malloc_pool[0];
 	else
 		return mopts.malloc_pool[TIB_GET()->tib_tid &
-		    (_MALLOC_MUTEXES - 1)];
+		    (mopts.malloc_mutexes - 1)];
 }

 static __dead void
@@ -316,6 +320,16 @@ static void
 omalloc_parseopt(char opt)
 {
 	switch (opt) {
+	case '+':
+		mopts.malloc_mutexes <<= 1;
+		if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
+			mopts.malloc_mutexes = _MALLOC_MUTEXES;
+		break;
+	case '-':
+		mopts.malloc_mutexes >>= 1;
+		if (mopts.malloc_mutexes < 1)
+			mopts.malloc_mutexes = 1;
+		break;
 	case '>':
 		mopts.malloc_cache <<= 1;
 		if (mopts.malloc_cache > MALLOC_MAXCACHE)
@@ -395,6 +409,7 @@ omalloc_init(void)
 	/*
 	 * Default options
 	 */
+	mopts.malloc_mutexes = 4;
 	mopts.malloc_junk = 1;
 	mopts.malloc_cache = MALLOC_DEFAULT_CACHE;

@@ -485,7 +500,7 @@ omalloc_poolinit(struct dir_info **dp)
 		for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
 			LIST_INIT(&d->chunk_dir[i][j]);
 	}
-	STATS_ADD(d->malloc_used, regioninfo_size);
+	STATS_ADD(d->malloc_used, regioninfo_size + 3 * MALLOC_PAGESIZE);
 	d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
 	d->canary2 = ~d->canary1;

@@ -1196,7 +1211,7 @@ _malloc_init(int from_rthreads)
 	if (!mopts.malloc_canary)
 		omalloc_init();

-	max = from_rthreads ? _MALLOC_MUTEXES : 1;
+	max = from_rthreads ? mopts.malloc_mutexes : 1;
 	if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
 		mprotect(&malloc_readonly, sizeof(malloc_readonly),
 		    PROT_READ