Re: [naviserver-devel] Quest for malloc
This is most probably the best variant so far, and not complicated, so an optimizer can do the right thing easily. Sorry for the many versions.. -gustaf

{
    unsigned register int s = (size - 1) >> 3;
    while (s > 1) {
        s >>= 1;
        bucket++;
    }
}
if (bucket > NBUCKETS) {
    bucket = NBUCKETS;
}
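For context, here is the loop above as a compilable sketch. NBUCKETS and the assumption that bucket 0 serves requests of up to 16 bytes are illustrative, not taken from the allocator's actual configuration:

```c
#include <stddef.h>

#define NBUCKETS 11   /* illustrative; the real allocator defines its own */

/* Power-of-two bucket lookup: for size >= 1, bucket 0 covers sizes up
 * to 16 bytes, bucket 1 up to 32, and so on, clamped at NBUCKETS. */
static int
SizeToBucket(size_t size)
{
    int bucket = 0;
    unsigned int s = (unsigned int) (size - 1) >> 3;

    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    if (bucket > NBUCKETS) {
        bucket = NBUCKETS;
    }
    return bucket;
}
```

A handful of shifts per call, and, as Gustaf notes, simple enough that a compiler can do the right thing with it.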
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 10:37, Stephen Deasey wrote: Can you import this into CVS? Top level. You mean the tclThreadAlloc.c file at the top level of the naviserver project?
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed.
Re: [naviserver-devel] Quest for malloc
On 1/16/07, Stephen Deasey [EMAIL PROTECTED] wrote: On 1/16/07, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed. Rocking. I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? Vlad's already on the ball... http://freshmeat.net/projects/vtmalloc/
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 15:41, Stephen Deasey wrote: I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? The plan was to beat this beast first in the family, then go to the next village (aol-list) and then visit the next town (tcl-core list), in that sequence. You see, even we (i.e. Mike) noticed one glitch in the test program that made Zippy look ridiculous on the Mac, although it wasn't. So we now have enough experience to go visit our neighbours and see what they'll say. On positive feedback, the next stop is the Tcl core list. There I expect the fiercest opposition to any change (which is understandable, given the size of the group of people involved and the kind of change). Cheers Zoran
Re: [naviserver-devel] Quest for malloc
Yes, it is a combined version, but the Tcl version is slightly different and Zoran took it over to maintain; in my tarball I include both. We do experiments in different directions and then combine the best results. Also, the intention was to try to include it in Tcl itself. Stephen Deasey wrote: On 1/16/07, Stephen Deasey [EMAIL PROTECTED] wrote: On 1/16/07, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed. Rocking. I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? Vlad's already on the ball... http://freshmeat.net/projects/vtmalloc/ - Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV ___ naviserver-devel mailing list naviserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Vlad Seryakov 571 262-8608 office [EMAIL PROTECTED] http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
Gustaf Neumann wrote: This is most probably the best variant so far, and not complicated, so an optimizer can do the right thing easily. sorry for the many versions.. -gustaf { unsigned register int s = (size-1) >> 3; while (s > 1) { s >>= 1; bucket++; } } if (bucket > NBUCKETS) { bucket = NBUCKETS; } I don't think anyone has pointed this out yet, but this is a logarithm in base 2 (log2), and there are a fair number of implementations of this available; for maximum performance there are assembly implementations using 'bsr' on x86 architectures, such as this one from google's tcmalloc:

// Return floor(log2(n)) for n > 0.
#if (defined __i386__ || defined __x86_64__) && defined __GNUC__
static inline int LgFloor(size_t n) {
  // "ro" for the input spec means the input can come from either a
  // register (r) or offsetable memory (o).
  size_t result;
  __asm__("bsr %1, %0"
          : "=r" (result)   // Output spec
          : "ro" (n)        // Input spec
          : "cc"            // Clobbers condition-codes
          );
  return result;
}
#else
// Note: the following only works for ns that fit in 32-bits, but
// that is fine since we only use it for small sizes.
static inline int LgFloor(size_t n) {
  int log = 0;
  for (int i = 4; i >= 0; --i) {
    int shift = (1 << i);
    size_t x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  ASSERT(n == 1);
  return log;
}
#endif

(Disclaimer - this comment is based on my explorations of zippy, not vt, so the logic may be entirely different.) If this log2(requested_size) is used directly as an index into the bucket table, that necessarily restricts you to having power-of-2 bucket sizes, meaning you allocate on average nearly 50% more than requested (i.e., nearly 33% of allocated memory is overhead/wasted). Adding more, closer-spaced buckets adds to the base footprint but possibly reduces the max usage by dropping the wasted space. I believe tcmalloc uses buckets spaced so that the average waste is only 12.5%. -J
Re: [naviserver-devel] Quest for malloc
Hi Jeff, we are aware that the function is essentially an integer log2. The chosen C-based variant is actually faster and more general than what you have included (it needs only max 2 shift operations for the relevant range), but the assembler-based variant is hard to beat and yields another 3% for the performance of the benchmark on top of the fastest C version. Thanks for that! -gustaf Jeff Rogers wrote: I don't think anyone has pointed this out yet, but this is a logarithm in base 2 (log2), and there are a fair number of implementations of this available; for maximum performance there are assembly implementations using 'bsr' on x86 architectures, such as this one from google's tcmalloc:
Re: [naviserver-devel] Quest for malloc
a) The test program Zoran includes biases the comparison against Zippy, which it does not do for VT. The following patch corrects this behavior:

+++ memtest.c Sun Jan 14 16:43:23 2007
@@ -211,6 +211,7 @@
 } else {
 size = 0x3FFF; /* Limit to 16K */
 }
+if (size > 16000) size = 16000;
 *toallocptr++ = size;
 }
 }

First of all, I wanted to give Zippy a fair chance. If I increase the max allocation size, Zippy becomes even slower than it is. And, Zippy handles 16K pages, whereas we handle 32K pages. Hence the size = 0x3FFF; /* Limit to 16K */ which limits the allocation size to 16K max. Increasing that would hit Zippy even more than us. Zoran, I believe you misunderstood. The patch above limits blocks allocated by your tester to 16000 instead of 16384 bytes. The reason for this is that Zippy's largest bucket is configured to be 16284-sizeof(Block) bytes (note the 2 in 16_2_84 is _NOT_ a typo). By making uniformly random request sizes up to 16_3_84, you are causing Zippy to fall back to system malloc for a small fraction of requests, substantially penalizing its performance in these cases. The following patch allows Zippy to be a lot less aggressive in putting blocks into the shared pool, bringing the performance of Zippy much closer to VT, at the expense of substantially higher memory waste:

@@ -128,12 +174,12 @@
 { 64, 256, 128, NULL},
 { 128, 128, 64, NULL},
 { 256, 64, 32, NULL},
-{ 512, 32, 16, NULL},
-{ 1024, 16, 8, NULL},
-{ 2048, 8, 4, NULL},
-{ 4096, 4, 2, NULL},
-{ 8192, 2, 1, NULL},
-{16284, 1, 1, NULL},
+{ 512, 64, 32, NULL},
+{ 1024, 64, 32, NULL},
+{ 2048, 64, 32, NULL},
+{ 4096, 64, 32, NULL},
+{ 8192, 64, 32, NULL},
+{16284, 64, 32, NULL},

I cannot comment on that. Possibly you are right, but I do not see much benefit of that except speeding up Zippy to be on par with VT, whereas the most important VT feature is not the speed, it is the memory handling. You wanted to know why Zippy is slower on your test; this is the reason.
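For readers unfamiliar with the table being patched, the three numeric columns correspond roughly to block size, free blocks kept per thread cache, and blocks moved to/from the shared pool at a time. A hedged sketch of its shape (the field names are my own, and the fourth column, a per-bucket lock pointer, is omitted; see Tcl's tclThreadAlloc.c for the real definition):

```c
#include <stddef.h>

/* Rough shape of Zippy's bucket configuration table. */
struct bucket_cfg {
    size_t blocksize;   /* bytes served from this bucket */
    int    maxblocks;   /* free blocks kept per thread before spilling
                         * to the shared pool */
    int    nmove;       /* blocks moved to/from the shared pool per
                         * lock acquisition */
};

static const struct bucket_cfg buckets[] = {
    {    64, 256, 128 },
    {   128, 128,  64 },
    {   256,  64,  32 },
    {   512,  32,  16 },
    {  1024,  16,   8 },
    {  2048,   8,   4 },
    {  4096,   4,   2 },
    {  8192,   2,   1 },
    { 16284,   1,   1 },  /* 16284, not 16384: header bytes are carved out */
};
```

Mike's patch raises the last two columns for the large buckets, so each thread holds on to more free blocks before it has to take the shared-pool lock.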
This has substantial impact on FreeBSD and Linux, and my guess is that it will have a dramatic effect on Mac OSX. VT releases the memory held in a thread's local pool when a thread terminates. Since it uses mmap by default, this means that de-allocated storage is actually released to the operating system, forcing new threads to call mmap() again to get memory, thereby incurring system call overhead that could be avoided in some cases if the system malloc implementation did not lower the sbrk point at each deallocation. Using malloc() in the VT allocator should give it much more uniform and consistent performance. Not necessarily. We'd shoot ourselves in the foot by doing so, because most OS allocators never return memory to the system and one of our major benefits would be gone. What we could do: timestamp each page, return all pages to the global cache and prune older ones. Or, put a size constraint on the global cache. But then you'd have yet another knob to adjust and the difficulty would be to find the right setup. VT is simpler in that it does not offer you ANY knobs you can trim (for better or for worse). In some early stages of the design we had a number of knobs and were not certain how to adjust them. So we threw that away and redesigned all parts to be self-adjusting where possible. The benefit of mmap() is being able to for sure release memory back to the system. The drawback is that it always incurs a substantial syscall overhead compared to malloc. You decide which you prefer (I think I would lean slightly toward mmap() for long-lived applications, but not by much, since the syscall introduces a lot of variance and an average performance degradation). e) Both allocators use an O(n) algorithm to compute the power-of-two bucket for the allocated size. This is just plain silly since an O(log n) algorithm will offer non-negligible speed up in both allocators.
This is the current O(n) code:

while (bucket < NBUCKETS && globalCache.sizes[bucket].blocksize < size) {
    ++bucket;
}

How about adding this into the code? I think the most obvious replacement is just using an if tree:

if (size > 0xff) bucket += 8, size >>= 8;
if (size > 0xf)  bucket += 4, size >>= 4;
...

it takes a minute to get the math right, but the performance gain should be substantial. f) Zippy uses Ptr2Block and Block2Ptr functions whereas VT uses macros for this. Zippy also does more checks on MAGIC numbers on each allocation, which VT only performs on de-allocation. I am not sure if current compilers are smart enough to inline the functions in Zippy; I did not test this. When compiled with -O0 with gcc, changing Zippy to use macros instead of function calls offers non-trivial
Re: [naviserver-devel] Quest for malloc
On 15.01.2007, at 22:22, Mike wrote: Zoran, I believe you misunderstood. The patch above limits blocks allocated by your tester to 16000 instead of 16384 bytes. The reason for this is that Zippy's largest bucket is configured to be 16284-sizeof(Block) bytes (note the 2 in 16_2_84 is _NOT_ a typo). By making uniformly random request sizes up to 16_3_84, you are causing Zippy to fall back to system malloc for a small fraction of requests, substantially penalizing its performance in these cases. Ah! That's right. I will fix that. You wanted to know why Zippy is slower on your test, this is the reason. This has substantial impact on FreeBSD and Linux, and my guess is that it will have a dramatic effect on Mac OSX. I will check that tomorrow on my machines. The benefit of mmap() is being able to for sure release memory back to the system. The drawback is that it always incurs a substantial syscall overhead compared to malloc. You decide which you prefer (I think I would lean slightly toward mmap() for long-lived applications, but not by much, since the syscall introduces a lot of variance and an average performance degradation). Yep. I agree. I would avoid it if possible. But I know of no other sure memory-returning call! I see that most (all?) of the allocators I know just keep everything allocated and never return it. How about adding this into the code? I think the most obvious replacement is just using an if tree: if (size > 0xff) bucket += 8, size >>= 8; if (size > 0xf) bucket += 4, size >>= 4; ... it takes a minute to get the math right, but the performance gain should be substantial. Well, I can test that all right. I have the feeling that a tight loop like that (it will mostly spin 5-12 times) gets compiled to good machine code, but it is better to test. In my tests, due to the frequency of calls of these functions, they contribute 10% to 15% overhead in performance. Yes. That is what I was also getting.
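A complete version of the if-tree Mike sketches, written as a hypothetical standalone helper (the real allocators would fold this into their own bucket lookup):

```c
/* Integer floor(log2(n)) for 32-bit n > 0, using a fixed number of
 * comparisons instead of a shift loop: O(log n) rather than O(n). */
static int
Log2Floor(unsigned int n)
{
    int log = 0;

    if (n > 0xffff) { log += 16; n >>= 16; }
    if (n > 0xff)   { log += 8;  n >>= 8;  }
    if (n > 0xf)    { log += 4;  n >>= 4;  }
    if (n > 0x3)    { log += 2;  n >>= 2;  }
    if (n > 0x1)    { log += 1; }
    return log;
}
```

With power-of-two bucket sizes, a bucket index can then be derived from this value with a subtraction and a clamp, replacing the linear scan over the bucket table.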
OTOH, the speed difference between VT and Zippy was sometimes several orders of magnitude, so I simply ignored that. Ha! It is pretty simple: you can atomically check pointer equivalence without risking a core (at least this is my experience). You are not expected to make far-reaching decisions based on it, though. In this particular example, even if the test was false, there would be no harm done, just a suboptimal path would be selected. I have marked that dirty read to draw people's attention to that place. And I succeeded, obviously :-) The dirty read I have no problem with. It's the possibility of taking the head element, which could be placed there by another thread, that bothers me. Ah, this will not happen. As I take the global mutex at that point, the pagePtr->p_cachePtr cannot be changed under our feet. If that block was allocated by the current thread, the p_cachePtr will not be changed by anybody. So no harm. If it is not, then we must lock the global mutex to prevent anybody fiddling with that element. It is tricky but it should work. It sounds like you are in the best position to test this change to see if it fixes the unbounded growth problem. Yes! Indeed. The only thing I'd have to check is how much more memory this will take. But it is certainly worth trying out, as it will be a temporary relief for our users until we stress-test VT to the max so I can include it in our standard distro.
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 06:17, Mike wrote: I'm happy to offer ssh access to a test box where you can reproduce these results. Oh, that is very fine! Can you give me the access data? You can send me the login details in a separate private mail. Thanks, Zoran
Re: [naviserver-devel] Quest for malloc
I downloaded the code in the previous mail. After some minor path adjustments, I was able to get the test program to compile and link under FreeBSD 6.1 running on a dual-processor PIII system, linked against a threaded tcl 8.5a. I could get this program to consistently do one of two things: - dump core - hang seemingly forever but absolutely nothing else. Mike, when Zoran announced the version, I downloaded it and had similar experiences. Fault 1 turned out to be: the link from Zoran led to a premature version of the software, not the real thing (the right version untars to a directory containing the version numbers). Then Zoran corrected the link, I refetched, and .. well, no makefile. Just compile and try: same effect. The fault was that I did not read the README (I read the first one) and compiled without -DTCL_THREADS. I had exactly the same symptoms. Correcting these configuration issues, the program works VERY well. I tried on 32-bit and 64-bit machines (a minor complaint about the memtest program: casting 32-bit int to ClientData and vice versa) -gustaf PS: I could get access to a 64-bit AMD FreeBSD machine on Monday, if there is still need... PPS: strangely, the only thing making me suspicious is the huge amount of improvement, especially on Mac OS X. I can't remember having seen such a drastic performance increase from a relatively small code change, especially in an area which is usually carefully fine-tuned, and on which many CS grads from all over the world write their theses. I would recommend that Vlad and Zoran write a technical paper about the new allocator and analyze the properties and differences.
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 10:45, Gustaf Neumann wrote: PPS: strangely, the only thing making me suspicious is the huge amount of improvement, especially on Mac OS X. Look... Running the test program unmodified (on a Mac Pro box):

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 35096360 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I modify the memtest.c program at line 146 to read:

if (dorealloc && (allocptr > tdata[tid].allocs) && (r & 1)) {
    allocptr[-1] = reallocs[whichmalloc](allocptr[-1], *toallocptr);
} else {
    allocptr[0] = mallocs[whichmalloc](*toallocptr);
    /*--*/
    memset(allocptr[0], 0, *toallocptr > 64 ? 64 : *toallocptr);
    allocptr++;
}

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 28377808 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I memset the whole memory area, not just the first 64 bytes:

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 14862477 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

BUT, guess what! The system allocator gives me (using the same test data, i.e. memsetting the whole allocated chunk):

Test standard allocator with 4 threads, 16000 records ...
This allocator achieves 869716 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

So we are still 14862477/869716 = 17 times faster. With increasing thread count we get faster and faster whereas the system allocator stays at the same (low) level or gets slower. Now, I would really like to know why! Perhaps the fact that we are using mmap() instead of god-knows-what Apple is using... Anyways... either we have some very big error there (in which case I'd like to know where, as everything is working as it should!) or we have found a much better way to handle memory on Mac OSX :-) Cheers Zoran
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 10:45, Gustaf Neumann wrote: The fault was that I did not read the README (I read the first one) and compiled without -DTCL_THREADS. In that case, the fault was that on FreeBSD you need to explicitly pass -pthread when linking the test program, regardless of the fact that libtcl8.4.so was already linked with it. That alone did the trick. Speed was (as expected, and still not clear why) at least 2 times better than anything else. In some rough cases it was _significantly_ faster. But... I believe we should not fixate on the speed of the allocator. It was not our intention to make something faster. Our intention was to release memory early enough so we don't bloat the system as a long-running process. I admit, speed of the code is always the most interesting and tempting issue for engineers, but in this case it was really the memory savings for long-running programs that we were after. Having said that, I must again repeat that we'd like to get some field experience with the allocator before we take any further steps. This means that we are thankful for any feedback. Cheers, zoran
Re: [naviserver-devel] Quest for malloc
I've been on a search for an allocator that is fast enough and not as memory hungry as the allocator built into Tcl. Unfortunately, as it usually goes, it turned out that I had to write my own. Vlad has written an allocator that uses mmap to obtain memory from the system and munmaps that memory on thread exit, if possible. I have spent more than 3 weeks fiddling with it and discussing it with Vlad, and this is what we both came to: http://www.archiware.com/downloads/vtmalloc-0.0.1.tar.gz I believe we have solved most of my needs. Below is an excerpt from the README file for the curious. Would anybody care to test it in his/her own environment? If all goes well, I might TIP this to be included in the Tcl core as a replacement for (or addition to) the zippy allocator. Zoran, Because I am quite biased here, to avoid later being branded as biased, I want to explicitly state my bias up front: In my experience, very little good comes out of people writing their own memory allocators. There is a small number of people in this world for whom this privilege should be reserved (outside of a classroom exercise, of course), and the rest of us humble folk should help them when we can but generally stay out of the way - setting out to reinvent the wheel is not a good thing. I downloaded the code in the previous mail. After some minor path adjustments, I was able to get the test program to compile and link under FreeBSD 6.1 running on a dual-processor PIII system, linked against a threaded tcl 8.5a. I could get this program to consistently do one of two things: - dump core - hang seemingly forever but absolutely nothing else. Running this program under the latest version of valgrind (using the memcheck or helgrind tools) reveals numerous errors from valgrind, which I suspect (although I did not confirm) are the reason for the core dumps and infinite hangs when it is run on its own.
I have no time to debug this myself; however, in the interest of science and general progress, I'm happy to offer ssh access to a test box where you can reproduce these results. I strongly advise against using a benchmark with the above characteristics to make any decisions about speed or memory consumption improvements or problems. --- After toying around with this briefly, I was able to run the test program under valgrind after specifying a -rec value of 1000 or less. Despite some errors reported by valgrind, the test program does run to completion and report its results in these cases.

standard allocator: This allocator achieves 43982 ops/sec under 4 threads
tcl allocator: This allocator achieves 21251 ops/sec under 4 threads
improved tcl allocator: This allocator achieves 21308 ops/sec under 4 threads

But again, I would not draw any serious conclusions from these numbers.
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 01:10, Stephen Deasey wrote: This program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). It is also not the case with nedmalloc, as it specifically tracks that usage pattern. The block being free'd knows to which so-called mspace it belongs, regardless of which thread frees it. So, I'd say nedmalloc is OK in this respect. I have given it a purify run and it runs cleanly. Our application is noticeably faster on Mac and bloats less. But this is only the tip of the iceberg. We have yet to give it a real stress test in the field, yet I'm reluctant to do this now and will have to wait for a major release somewhere in spring next year.
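The producer/consumer free pattern under discussion can be shown in a few lines (a sketch with plain pthreads; the 4096-byte size is arbitrary):

```c
#include <pthread.h>
#include <stdlib.h>

/* One thread allocates, another frees.  An allocator that parks freed
 * blocks only in the freeing thread's local cache could bloat under
 * this pattern; nedmalloc routes the block back to its home mspace. */
static void *
alloc_in_worker(void *arg)
{
    (void) arg;
    return malloc(4096);   /* block is handed to the joining thread */
}
```

The joining thread receives the pointer from pthread_join and calls free() on it, so allocation and deallocation happen in different threads.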
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 15:57, Vlad Seryakov wrote: Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem. I have a Tcl library compiled with nedmalloc, and when I link against it and make #define MemAlloc Tcl_Alloc #define MemFree Tcl_Free it runs fine. Should I make the Solaris test?
Re: [naviserver-devel] Quest for malloc
Yes, please. Zoran Vasiljevic wrote: On 19.12.2006, at 15:57, Vlad Seryakov wrote: Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem. I have a Tcl library compiled with nedmalloc, and when I link against it and make #define MemAlloc Tcl_Alloc #define MemFree Tcl_Free it runs fine. Should I make the Solaris test?
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 16:06, Vlad Seryakov wrote: Yes, please (I appended the code to the nedmalloc test program and renamed their main to main1)

bash-2.03$ gcc -O3 -o tcltest tcltest.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g
bash-2.03$ gdb ./tcltest
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.8"...
(gdb) run
Starting program: /space/homes/zv/nedmalloc_tcl/tcltest
[New LWP 1]
[New LWP 2]
[New LWP 3]
[New LWP 4]
[New LWP 5]
[New LWP 6]
[New LWP 7]
[New LWP 8]
[LWP 7 exited]
[New LWP 7]
[LWP 4 exited]
[New LWP 4]
[LWP 8 exited]
[New LWP 8]
Program exited normally.
(gdb) quit
Re: [naviserver-devel] Quest for malloc
gdb may slow down concurrency; does it run without gdb, and also does it run with Solaris malloc? Zoran Vasiljevic wrote: On 19.12.2006, at 16:06, Vlad Seryakov wrote: Yes, please (I appended the code to the nedmalloc test program and renamed their main to main1) bash-2.03$ gcc -O3 -o tcltest tcltest.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g bash-2.03$ gdb ./tcltest [...] (gdb) run Starting program: /space/homes/zv/nedmalloc_tcl/tcltest [New LWP 1] [New LWP 2] [New LWP 3] [New LWP 4] [New LWP 5] [New LWP 6] [New LWP 7] [New LWP 8] [LWP 7 exited] [New LWP 7] [LWP 4 exited] [New LWP 4] [LWP 8 exited] [New LWP 8] Program exited normally. (gdb) quit
Re: [naviserver-devel] Quest for malloc
I was suspecting Linux malloc; it looks like it has problems with high concurrency. I tried to replace MemAlloc/MemFree with mmap/munmap, and it crashes as well.

#define MemAlloc mmalloc
#define MemFree(ptr) mfree(ptr, gSize)

void *mmalloc(size_t size)
{
    return mmap(NULL, size, PROT_READ|PROT_WRITE|PROT_EXEC,
                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
}

void mfree(void *ptr, size_t size)
{
    munmap(ptr, size);
}

Zoran Vasiljevic wrote: On 19.12.2006, at 16:15, Vlad Seryakov wrote: gdb may slow down concurrency; does it run without gdb, and also does it run with Solaris malloc? No problems. Runs with malloc and nedmalloc with or w/o gdb. The same on Mac.
Re: [naviserver-devel] Quest for malloc
yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes I need to run it several times. It looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in NaviServer; I wonder if AOL has random crashes. Stephen Deasey wrote: Is this really the shortest test case you can make for this problem? - Does it crash if you allocate blocks of size 1024 rather than random size? Does for me. Strip it out. - Does it crash if you run 2 threads instead of 4? Does for me. Strip it out. Some times it crashes, some times it doesn't. Clearly it's timing related. The root cause is not going to be identified by injecting a whole bunch of random! Make this program shorter. On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried nedmalloc with LD_PRELOAD for my little test and it crashed even before the start. Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem.

#include <tcl.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 5;
static int nthreads = 4;
static void *gPtr = NULL;
static Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Tcl_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *) malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

Zoran Vasiljevic wrote: On 19.12.2006, at 01:10, Stephen Deasey wrote: This
program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). It is also not the case with nedmalloc, as it specifically tracks that usage pattern. The block being free'd knows to which so-called mspace it belongs, regardless of which thread frees it. So, I'd say nedmalloc is OK in this respect. I have given it a purify run and it runs cleanly. Our application is noticeably faster on Mac and bloats less. But this is only the tip of the iceberg. We have yet to give it a real stress test in the field, yet I'm reluctant to do this now and will have to wait for a major release somewhere in spring next year.
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
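For anyone wanting to reproduce the LD_PRELOAD run Vlad mentions above, a minimal sketch; the binary name and the library install path are hypothetical, substitute wherever your nedmalloc build landed:

```shell
# Hypothetical paths: ./tcltest is the test binary above, and
# libnedmalloc.so is wherever your nedmalloc build installed it.
#
#   LD_PRELOAD=/usr/local/lib/libnedmalloc.so ./tcltest
#
# The preload mechanism itself can be sanity-checked with any program:
LD_PRELOAD="" /bin/echo "preload mechanism ok"
```

The preloaded library's malloc/free override the libc symbols for the whole process, so the test binary does not need to be recompiled.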
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 16:35, Vlad Seryakov wrote:

yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

Concurrency or not, I'm running it on the fastest Mac you can buy, tweaked to 16 threads and with the loop count increased from 5 to 50, and get this:

(with nedmalloc)
Blitzer:~/nedmalloc_tcl root# time ./tcltest
real    0m2.036s
user    0m4.652s
sys     0m1.823s

(with standard malloc)
Blitzer:~/nedmalloc_tcl root# time ./tcltest
real    0m9.140s
user    0m17.319s
sys     0m17.397s

So that's about 4 times faster. I cannot reproduce any crash, whatever I try.
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote:

yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

You're still using Tcl threads. Strip it out. Make the loops and block size command line parameters. If you think you've found a bug you'll want the most concise test case so you can report it to the glibc maintainers. #glibc on irc.freenode.net
Re: [naviserver-devel] Quest for malloc
I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc.

Stephen Deasey wrote:

On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

You're still using Tcl threads. Strip it out. Make the loops and block size command line parameters. If you think you've found a bug you'll want the most concise test case so you can report it to the glibc maintainers. #glibc on irc.freenode.net

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
I have no idea, I spent too much time on this, still without realizing what I am doing and what to expect :-)))

Zoran Vasiljevic wrote:

On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc.

Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc. Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

Simply loading the Tcl library initialises a bunch of thread stuff, right? Also, the Tcl mutexes are self-initialising, which includes calling down into the global Tcl mutex. Lots of stuff going on behind the scenes...

NaviServer mutexes are also self-initialising, but they call down to the pthread_ functions without touching any Tcl code, which may explain why the server isn't crashing all the time.

So here's a test: what happens when you compile the test program to use Ns_Mutex and Ns_ThreadCreate etc.? Pthreads work, Tcl doesn't, how about NaviServer?
Re: [naviserver-devel] Quest for malloc
Right, with the Ns_ functions it does not crash.

Stephen Deasey wrote:

On 12/19/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc. Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

Simply loading the Tcl library initialises a bunch of thread stuff, right? Also, the Tcl mutexes are self-initialising, which includes calling down into the global Tcl mutex. Lots of stuff going on behind the scenes...

NaviServer mutexes are also self-initialising, but they call down to the pthread_ functions without touching any Tcl code, which may explain why the server isn't crashing all the time.

So here's a test: what happens when you compile the test program to use Ns_Mutex and Ns_ThreadCreate etc.? Pthreads work, Tcl doesn't, how about NaviServer?
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/

/*
 * gcc -I/usr/local/ns/include -g ttest.c -o ttest -lpthread /usr/local/ns/lib/libnsthread.so
 */
#include <ns.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 15;
static int nthreads = 12;
static void *gPtr = NULL;
static Ns_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Ns_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Ns_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Ns_Thread *tids;

    if (argc > 1) { nthreads = atoi(argv[1]); }
    if (argc > 2) { nloops = atoi(argv[2]); }
    if (argc > 3) { nbuffer = atoi(argv[3]); }

    tids = (Ns_Thread *)malloc(sizeof(Ns_Thread) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Ns_ThreadCreate(MemThread, 0, 0, &tids[i]);
    }
    for (i = 0; i < nthreads; ++i) {
        Ns_ThreadJoin(&tids[i], NULL);
    }
}
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Right, with Ns_ functions it does not crash. Zoran will be happy... :-)
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 20:42, Stephen Deasey wrote: On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Right, with the Ns_ functions it does not crash. Zoran will be happy... :-)

Not at all! So, I would like to know exactly how to reproduce the problem (what OS, machine, etc.). Furthermore, I need all your test code and, if possible, a gdb trace of the crash, to start with. Can you get all that for me?
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: But if speed is not important to you, you can supply Tcl without zippy, then no bloat, the system memory is returned with reasonable speed; at least on Linux, ptmalloc is not that bad.

OK. I think I've reached peace of mind with all these alternate malloc implementations... This is what I found:

On all platforms (except Mac OSX), it really does not pay to use anything else besides the system's native malloc. I mean, you can gain a few percent of speed with hoard/tcmalloc/nedmalloc/zippy and friends, but you pay for it with memory bloat. If you can afford it, then go ahead. I believe, at least from what I've seen in my tests, that zippy is quite fast and you gain very little, if anything (speed-wise), by replacing it. You can gain somewhat less memory fragmentation by using something else, but this is not a thing that would make me say: Wow!

The exception to that is really Mac OSX. The native Mac OSX malloc sucks tremendously. The speed increases from zippy and nedmalloc are so high that you can really see (without any fancy measurements) how your application flies! nedmalloc also bloats less than zippy (normally, as it clears the per-thread cache on thread exit). So for the Mac (at least for us) I will stick with nedmalloc. It is lightning fast and reasonably conservative with memory fragmentation.

Conclusion: Linux/Solaris = use system malloc. Mac OSX = use nedmalloc.

Ah, yes... Windows... this I haven't tested, but the nedmalloc author shows some very interesting numbers on his site. I somehow tend to believe them, as some I have seen myself when experimenting on unix platforms. So, most probably the outcome will be: Windows = use nedmalloc.

What does this mean to all of us? I would say: very little. We know that zippy is bloating, and now we know that it is reasonably fast and on par with most of the other solutions out there. For people concerned with speed, I believe this is the right solution.
For people concerned with speed AND memory fragmentation (in that order), the best choice is some alternative malloc routine. For people concerned mainly with fragmentation, the best is to stay with the system malloc; exception: Mac OSX. There you just need to use something else, and nedmalloc is the only thing that compiles (and works) there, to my knowledge. I hope this report helps somebody. Cheers Zoran
Re: [naviserver-devel] Quest for malloc
I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then.

#include <tcl.h>

#define MemAlloc ckalloc
#define MemFree ckfree

int nbuffer = 16384;
int nloops = 5;
int nthreads = 4;
int gAllocs = 0;
void *gPtr = NULL;
Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        // Testing inter-thread alloc/free
        if (n % 5 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
            }
            gPtr = MemAlloc(n);
            gAllocs++;
            Tcl_MutexUnlock(&gLock);
        }
    }
    if (ptr != NULL) {
        MemFree(ptr);
    }
    if (gPtr != NULL) {
        MemFree(gPtr);
    }
}

void MemTime()
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *)malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

int main(int argc, char **argv)
{
    MemTime();
}

Doesn't zippy also clear its per-thread cache on exit? It puts blocks into a shared queue which other threads can re-use. But the shared cache never gets returned, so conn thread exits will not help with memory bloat.

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

Stephen Deasey wrote:

On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then.

I don't think allocate in one thread, free in another is an unusual strategy. Googling around I see a lot of people doing it. There must be some bugs in your program. Here's one: at the end of MemThread() gPtr is checked and freed, but the gMutex is not held. This thread may have finished its tight loop, but the other 3 threads could still be running. Also, gPtr is not set to NULL after the free(), leading to a double free when the next thread checks it.

[snip: test program quoted above]
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

OK. Still doesn't mean your program is bug free :-)

There's a lot of extra stuff going on in your example program that makes it hard to see what's going on. I simplified it to this:

#include <tcl.h>
#include <stdlib.h>
#include <assert.h>

#define MemAlloc ckalloc
#define MemFree ckfree

void *gPtr = NULL; /* Global pointer to memory. */

void Thread(void *arg)
{
    assert(gPtr != NULL);
    MemFree(gPtr);
    gPtr = NULL;
}

int main(int argc, char **argv)
{
    Tcl_ThreadId tid;
    int i;

    for (i = 0; i < 10; ++i) {
        gPtr = MemAlloc(1024);
        assert(gPtr != NULL);
        Tcl_CreateThread(&tid, Thread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
        Tcl_JoinThread(tid, NULL);
        assert(gPtr == NULL);
    }
}

Works for me. I say you can allocate memory in one thread and free it in another. Let me know what the bug turns out to be..!

Stephen Deasey wrote: On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then. I don't think allocate in one thread, free in another is an unusual strategy. Googling around I see a lot of people doing it. There must be some bugs in your program. Here's one: at the end of MemThread() gPtr is checked and freed, but the gMutex is not held. This thread may have finished its tight loop, but the other 3 threads could still be running. Also, gPtr is not set to NULL after the free(), leading to a double free when the next thread checks it.
[snip: test program quoted above]

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 18.12.2006, at 22:08, Stephen Deasey wrote: Works for me. I say you can allocate memory in one thread and free it in another. Nice. Well I can say that nedmalloc works, that is, that small program runs to end w/o coring when compiled with nedmalloc. Does this prove anything?
Re: [naviserver-devel] Quest for malloc
On 18.12.2006, at 19:57, Stephen Deasey wrote: Are you saying you tested your app on Linux with native malloc and experienced no fragmentation/bloating?

No. I have seen bloating, but less than with zippy. I saw some bloating and fragmentation with all optimizing allocators I have tested.

I think some people are experiencing fragmentation problems with ptmalloc -- the Squid and OpenLDAP guys, for example. There's also the malloc-in-one-thread, free-in-another problem, which if your threads don't exit is basically a leak.

Really a leak? Why? Wouldn't that depend on the implementation?

Doesn't zippy also clear its per-thread cache on exit?

No. It shovels all the rest to the shared pool. The shared pool is never freed. Hence lots of bloating.

Actually, did you experiment with exiting the conn threads after X requests? Seems to be one of the things AOL is recommending.

Most of our threads are Tcl threads, not conn threads. We create them to do lots of different tasks. They are all rather short-lived. Still, the mem footprint grows and grows...

One thing I wonder about this is, how do requests average out across all threads? If you set the conn threads to exit after 10,000 requests, will they all quit at roughly the same time, causing an extreme load on the server? Also, this is only an option for conn threads. With scheduled proc threads, job threads etc. you get nothing.

Well, if they all start to exit at the same time, they will serialize at the point where the per-thread cache is pushed to the shared pool.
Re: [naviserver-devel] Quest for malloc
I suspect i am doing something wrong, but still it crashes and i do not see why.

#include <tcl.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 5;
static int nthreads = 4;
static void *gPtr = NULL;
static Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Tcl_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *)malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

Stephen Deasey wrote:

On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

OK. Still doesn't mean your program is bug free :-) There's a lot of extra stuff going on in your example program that makes it hard to see what's going on. I simplified it to this:

[snip: simplified program quoted above]

Works for me. I say you can allocate memory in one thread and free it in another.
Let me know what the bug turns out to be..!

Stephen Deasey wrote:

[snip: analysis and test program quoted above]
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/18/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 18.12.2006, at 19:57, Stephen Deasey wrote: One thing I wonder about this is, how do requests average out across all threads? If you set the conn threads to exit after 10,000 requests, will they all quit at roughly the same time causing an extreme load on the server? Also, this is only an option for conn threads. With scheduled proc threads, job threads etc. you get nothing. Well, if they all start to exit at the same time, they will serialize at the point where per-thread cache is pushed to the shared pool. I was worried more about things like all the Tcl procs needing to be recompiled in the new interp for the thread, and all the other stuff which is cached. If threads exit regularly, say after 10,000 requests, and the requests average out over all threads, then your site will regularly go down, effectively. It would be nice if we could make sure the thread exits were spread out. Anyway... I think some people are experiencing fragmentation problems with ptmalloc -- the Squid and OpenLDAP guys, for example. There's also the malloc-in-one-thread, free-in-another problem, which if your threads don't exit is basically a leak. Really a leak? Why? Wouln't that depend on the implementation? Yes, and I thought that was the case with Linux ptmalloc, but maybe I got it wrong or this is old news... This program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). 
#include <tcl.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define MemAlloc malloc
#define MemFree free

void *gPtr = NULL;

static void Thread(void *arg);
static void PrintMemUsage(const char *msg);

int main(int argc, char **argv)
{
    Tcl_ThreadId tid;
    int i;

    PrintMemUsage("start");
    for (i = 0; i < 10; ++i) {
        Tcl_CreateThread(&tid, Thread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
        Tcl_JoinThread(tid, NULL);
        MemFree(gPtr);
        gPtr = NULL;
    }
    PrintMemUsage("stop");
}

static void Thread(void *arg)
{
    assert(gPtr == NULL);
    gPtr = MemAlloc(1024);
    assert(gPtr != NULL);
}

static void PrintMemUsage(const char *msg)
{
    FILE *f;
    int m;

    f = fopen("/proc/self/statm", "r");
    if (f == NULL) {
        perror("fopen failed: ");
        exit(-1);
    }
    if (fscanf(f, "%d", &m) != 1) {
        perror("fscanf failed: ");
        exit(-1);
    }
    fclose(f);
    printf("%s: %d\n", msg, m);
}
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html

Hm... not bad at all:

This was under Solaris 2.8 on a Sun Blade 2500 (Sparc), 1GB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 2098770.683107 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 1974570.587561 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 1449969.176647 ops/sec under 8 threads

Now on SuSE Linux, a 1.8GHz Intel:

Testing standard allocator with 8 threads ...
This allocator achieves 1752893.072620 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 2114564.246869 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 1460851.824732 ops/sec under 8 threads

The Tcl library was compiled for threads and uses the zippy allocator. This is how I compiled the test program from the nedmalloc package:

gcc -O -g -o test test.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g

I had to make some tweaks as they have a problem in the pthread_islocked() private call. Also, I expanded the testsuite to include Tcl_Alloc/Tcl_Free in addition.

If I run this same thing on other platforms I get more or less the same results, with one notable exception:

o. nedmalloc is always faster than standard or zippy, except on Sun Sparc, where the built-in malloc is the fastest
o. the zippy (Tcl) allocator is always the slowest among the three

Now, I imagine, the nedmalloc test program may not be telling all the truth (i.e. may be biased towards nedmalloc)... It would be interesting to see some other metrics...

Cheers Zoran
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 15:00, Zoran Vasiljevic wrote: On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html Hm... not bad at all:

This was on an iMac with an Intel Dual Core 1.83 GHz and 512 MB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 319503.459835 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 1687884.294403 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 294571.750823 ops/sec under 8 threads

Hey! I think our customers will love it! I will now try to ditch the zippy and replace it with nedmalloc... Too bad that Tcl as-is does not allow easy snap-in of alternate memory allocators. I think this should be lobbied for.

[snip: Solaris and Linux results quoted above]
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html Hm... not bad at all: [...] Now, I imagine, the nedmalloc test program may not be telling all the truth (i.e. it may be biased towards nedmalloc)... It would be interesting to see some other metrics... Some other metrics: http://archive.netbsd.se/?ml=OpenLDAP-devela=2006-07t=2172728 They seem, in the end, to have gone for Google tcmalloc. It wasn't the absolute fastest for their particular set of tests, but it had dramatically lower memory usage.
Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this.
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: Hey! I think our customers will love it! I will now try to ditch the zippy and replace it with nedmalloc... Too bad that Tcl as-is does not allow easy snap-in of alternate memory allocators. I think this should be lobbied for. It would be nice to at least have a configure switch for the zippy allocator rather than having to hack up the Makefile.
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 16:25, Stephen Deasey wrote: The seem, in the end, to go for Google tcmalloc. It wasn't the absolute fastest for their particular set of tests, but had dramatically lower memory usage. The down side of tcmalloc: only Linux port. The nedmalloc does them all (win, solaris, linux, macosx) as it is written in ANSI-C and designed to be portable. I tested all our Unix boxes and was able to get it running on all of them. And the integration is rather simple, just add: #include nedmalloc.c #define malloc nedmalloc #define realloc nedrealloc #define freenedfree I believe this needs to be done in just one Tcl source file. Trickier part: you need to call neddisablethreadcache(0) at every thread exit. The lower memory usage is important of course. Here I have no experience yet. Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this. Are you sure? AFAIK, we just go down to Tcl_Alloc in Tcl library. The allocator there will not allow you that. There were some discussions on comp.lang.tcl about it (Jeff Hobbs knows better). As they (Tcl) just inherited what aolserver had at that time (I believe V4.0) the same what applies to AS applies to Tcl and indirectly to us.
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: Will try this one. To aid you (and others): http://www.archiware.com/downloads/nedmalloc_tcl.tar.gz Download and peek at the README file. This compiles on all machines I tested and works pretty well in terms of speed. I haven't tested the memory size, nor do I have any idea about fragmentation, but the speed is pretty good. Just look what this does on the Mac Pro (http://www.apple.com/macpro), which is currently the fastest Mac available: Testing standard allocator with 5 threads ... This allocator achieves 531241.923013 ops/sec under 5 threads Testing Tcl allocator with 5 threads ... This allocator achieves 439181.119284 ops/sec under 5 threads Testing nedmalloc with 5 threads ... This allocator achieves 4137423.021490 ops/sec under 5 threads nedmalloc allocator is 7.788209 times faster than standard Tcl allocator is 0.826706 times faster than standard nedmalloc is 9.420767 times faster than Tcl allocator Hm... if I was not able to get the same/similar results on other Macs, I'd say this is a cheat. But it isn't. Zoran
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: Are you sure? AFAIK, we just go down to Tcl_Alloc in Tcl library. The allocator there will not allow you that. There were some discussions on comp.lang.tcl about it (Jeff Hobbs knows better). As they (Tcl) just inherited what aolserver had at that time (I believe V4.0) the same what applies to AS applies to Tcl and indirectly to us. Yeah, pretty sure. You can only use Tcl objects within a single interp, which is restricted to a single thread, but general ns_malloc'd memory chunks can be passed around between threads. It would suck pretty hard if that wasn't the case. We have a bunch of reference counted stuff, cache values for example, which we share among threads and delete when the reference count drops to zero. You can ns_register_proc from any thread, which needs to ns_free the old value... Here's the (a?) problem: http://www.bozemanpass.com/info/linux/malloc/Linux_Heap_Contention.html
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 17:15, Stephen Deasey wrote: Yeah, pretty sure. You can only use Tcl objects within a single interp, which is restricted to a single thread, but general ns_malloc'd memory chunks can be passed around between threads. It would suck pretty hard if that wasn't the case. Interesting... I could swear I read that you can't just alloc in one thread and free in another using the Tcl allocator. Well, regarding nedmalloc, I do not know, but I can find out...
Re: [naviserver-devel] Quest for malloc
Instead of using threadspeed or other simple malloc/free tests, I used naviserver and Tcl pages as the test for allocators. Using ab from apache and stress-testing it with thousands of requests, I tested several allocators. And with everything the same except LD_PRELOAD, the difference seems pretty clear. Hoard/TCmalloc/Ptmalloc2 are all slower than zippy, no doubt. Using threadtest, though, tcmalloc was faster than zippy, but in real life it behaves differently. So, I would suggest you try hitting naviserver with nedmalloc. If it is always faster than zippy, then you have got what you want. Other things to watch: after each test, see the size of the nsd process. I will try nedmalloc as well later today Stephen Deasey wrote: [...]
Re: [naviserver-devel] Quest for malloc
You can, it moves Tcl_Obj structs between thread and shared pools; the same goes for other memory blocks. On thread exit, all memory goes to the shared pool. Zoran Vasiljevic wrote: [...]
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 17:29, Vlad Seryakov wrote: [...] I will try nedmalloc as well later today Indeed, the best way is to check out the real application. No test program can give you a better picture! As far as this is concerned, I do plan to make this test, but it takes some time! I spent the whole day getting nedmalloc to compile OK on all platforms that we use (solaris sparc/x86, mac ppc/x86, linux/x86, win). The next step is to snap it into the Tcl library and try the real application...
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 16:25, Stephen Deasey wrote: Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this. I'm still not 100% through reading the code, but: The Tcl allocator just puts the free'd memory in the cache of the current thread that calls free(). On thread exit, or if the size of the cache exceeds some limit, the content of the cache is appended to the shared cache. The memory is never returned to the system, unless it is allocated as a chunk larger than 16K. nedmalloc does the same, but does not move freed memory between the per-thread cache and the shared repository. Instead, the thread cache is emptied (freed) when a thread exits. This must be explicitly called by the user. As I see it: all is green. But I will pay more attention to that by reading the code more carefully... Perhaps there is some gotcha there which I would not like to discover at a customer site ;-) In nedmalloc you can disable the per-thread cache usage by defining -DTHREADCACHEMAX=0 during compilation. This makes some difference: Testing nedmalloc with 5 threads ... This allocator achieves 16194016.581962 ops/sec under 5 threads without the cache, versus Testing nedmalloc with 5 threads ... This allocator achieves 18895753.973492 ops/sec under 5 threads with the cache. THREADCACHEMAX defines the size of the allocations which go into the cache, similarly to zippy. The default is 8K (vs. 16K with zippy). The above figures were done with a max 8K size. If you increase it to 16K, the malloc cores :-( Too bad. Still, I believe that for long-running processes, the approach of never releasing memory to the OS, as zippy is doing, is suboptimal. Speed here or there, I'd rather save myself process reboots if possible... The bad thing is that the Tcl allocator (aka zippy) will not allow me any choice but bloat. And this is becoming more and more important. At some customer sites I have observed process sizes of 1.5 GB, whereas we started with about 80 MB. Eh!
Re: [naviserver-devel] Quest for malloc
But if speed is not important to you, you can supply Tcl without zippy; then there is no bloat, memory is returned to the system, and the speed is reasonable; at least on Linux, ptmalloc is not that bad Zoran Vasiljevic wrote: [...] -- Vlad Seryakov 571 262-8608 office [EMAIL PROTECTED] http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: But if speed is not important to you, you can supply Tcl without zippy, then no bloat, memory is returned to the system with reasonable speed; at least on Linux, ptmalloc is not that bad Eh... Vlad... On the Mac, nedmalloc outperforms the standard allocator by about 25-30 times! The same with zippy. All tested with the supplied test program. I have yet to test the real app... On other platforms (Linux, Solaris), yes, I can stay with the standard allocator. As a matter of fact, they are close to nedmalloc, +/- about 10-30% (in favour of nedmalloc, except on Sun/sparc). One shoe does not fit all, unfortunately... What I absolutely do not understand is: WHY? I mean, why do I get a 30-times difference!? It just makes no sense, but it is really true. I am absolutely confused :-((
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: Linux, ptmalloc is not that bad Interestingly, ptmalloc3 (http://www.malloc.de/) and nedmalloc both derive from the dlmalloc (http://gee.cs.oswego.edu/malloc.h) library from Doug Lea. Consequently, their performance is similar (nedmalloc being slightly faster). I have been able to verify this on the Linux box.
[naviserver-devel] Quest for malloc
Hi! I've tried libumem as Stephen suggested, but it is slower than the regular system malloc. libumem is really geared toward integration with mdb (the Solaris modular debugger) for memory debugging and analysis. But I've found: http://www.nedprod.com/programs/portable/nedmalloc/index.html and this looks more promising. I have run its (supplied) test and it seems that, at least speed-wise, the code is faster than the native OS malloc. I will now try to make it work on all platforms that we use (admittedly, it will not run correctly if you do not set -DNDEBUG to silence some assertions; this is of course not right and I have to see why/what). Anyway, perhaps a thing to try out... If you get any breath-taking news with the above, share it here. On my PPC PowerBook (1.5 GHz PPC, 512 MB memory) I get improvements over the built-in allocator of a factor of 3 (3 times better) with far less system overhead. I cannot say anything about the fragmentation; this has yet to be tested. Cheers Zoran
Re: [naviserver-devel] Quest for malloc
I also tried Hoard, Google tcmalloc, umem and some other rare mallocs I could find. Still, zippy beats everybody; I ran my speed test, not threadtest. Will try this one. Zoran Vasiljevic wrote: [...]
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: I also tried Hoard, Google tcmalloc, umem and some other rare mallocs I could find. Still, zippy beats everybody; I ran my speed test, not threadtest. Will try this one. Important: it is not only raw speed that matters, but also memory fragmentation (i.e. the lack of it). In our app we must frequently reboot the server (every couple of days), otherwise it just bloats. And... we made sure there are no leaks (we have Purified all the libs that we use)... I now have some experience with the (zippy) fragmentation, and I will try to make a testbed with this allocator and run it for several days to get some experience. Cheers Zoran