Re: [naviserver-devel] Quest for malloc
This is most probably the best variant so far, and not complicated, so an optimizer can do the right thing easily. Sorry for the many versions.. -gustaf

{
    unsigned register int s = (size - 1) >> 3;
    while (s > 1) {
        s >>= 1;
        bucket++;
    }
}
if (bucket > NBUCKETS) {
    bucket = NBUCKETS;
}
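For context, here is the loop above as a compilable sketch. NBUCKETS and the assumption that bucket 0 serves requests of up to 16 bytes are illustrative, not taken from the allocator's actual configuration:

```c
#include <stddef.h>

#define NBUCKETS 11   /* illustrative; the real allocator defines its own */

/* Power-of-two bucket lookup: for size >= 1, bucket 0 covers sizes up
 * to 16 bytes, bucket 1 up to 32, and so on, clamped at NBUCKETS. */
static int
SizeToBucket(size_t size)
{
    int bucket = 0;
    unsigned int s = (unsigned int) (size - 1) >> 3;

    while (s > 1) {
        s >>= 1;
        bucket++;
    }
    if (bucket > NBUCKETS) {
        bucket = NBUCKETS;
    }
    return bucket;
}
```

A handful of shifts per call, and, as Gustaf notes, simple enough that a compiler can do the right thing with it.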
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 10:37, Stephen Deasey wrote: Can you import this into CVS? Top level. You mean the tclThreadAlloc.c file at the top level of the naviserver project?
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed.
Re: [naviserver-devel] Quest for malloc
On 1/16/07, Stephen Deasey [EMAIL PROTECTED] wrote: On 1/16/07, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed. Rocking. I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? Vlad's already on the ball... http://freshmeat.net/projects/vtmalloc/
Re: [naviserver-devel] Quest for malloc
On 16.01.2007, at 15:41, Stephen Deasey wrote: I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? The plan was to beat this beast first in the family, then go to the next village (aol-list) and then visit the next town (tcl-core list), in that sequence. You see, even we (i.e. Mike) noticed one glitch in the test program that made Zippy look ridiculous on the Mac, although it wasn't. So we now have enough experience to go visit our neighbours and see what they'll say. On positive feedback, the next stop is the Tcl core list. There I expect the fiercest opposition to any change (which is understandable, given the size of the group of people involved and the kind of change). Cheers Zoran
Re: [naviserver-devel] Quest for malloc
Yes, it is a combined version, but the Tcl version is slightly different and Zoran took it over to maintain; in my tarball I include both. We do experiments in different directions and then combine the best results. Also, the intention was to try to include it in Tcl itself. Stephen Deasey wrote: On 1/16/07, Stephen Deasey [EMAIL PROTECTED] wrote: On 1/16/07, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 16.01.2007, at 12:18, Stephen Deasey wrote: vtmalloc -- add this It's there. Everybody can now contribute, if needed. Rocking. I suggest putting the 0.0.3 tarball up on SourceForge, announcing on Freshmeat, and cross-posting on the aolserver list. You really want random people with their random workloads on random OSes to beat on this. I don't know if the pool of people here is large enough for that... I'm sure there are a lot of other people who would be interested in this, if they knew about it. Should probably cross-post here, for example: http://wiki.tcl.tk/9683 - Why Do Programs Take Up So Much Memory? Vlad's already on the ball... http://freshmeat.net/projects/vtmalloc/ - Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV ___ naviserver-devel mailing list naviserver-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/naviserver-devel -- Vlad Seryakov 571 262-8608 office [EMAIL PROTECTED] http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
Gustaf Neumann wrote: This is most probably the best variant so far, and not complicated, so an optimizer can do the right thing easily. sorry for the many versions.. -gustaf { unsigned register int s = (size-1) >> 3; while (s > 1) { s >>= 1; bucket++; } } if (bucket > NBUCKETS) { bucket = NBUCKETS; } I don't think anyone has pointed this out yet, but this is a logarithm in base 2 (log2), and there are a fair number of implementations of this available; for maximum performance there are assembly implementations using 'bsr' on x86 architectures, such as this one from google's tcmalloc:

// Return floor(log2(n)) for n > 0.
#if (defined __i386__ || defined __x86_64__) && defined __GNUC__
static inline int LgFloor(size_t n) {
  // "ro" for the input spec means the input can come from either a
  // register (r) or offsetable memory (o).
  size_t result;
  __asm__("bsr %1, %0"
          : "=r" (result)   // Output spec
          : "ro" (n)        // Input spec
          : "cc"            // Clobbers condition-codes
          );
  return result;
}
#else
// Note: the following only works for ns that fit in 32-bits, but
// that is fine since we only use it for small sizes.
static inline int LgFloor(size_t n) {
  int log = 0;
  for (int i = 4; i >= 0; --i) {
    int shift = (1 << i);
    size_t x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  ASSERT(n == 1);
  return log;
}
#endif

(Disclaimer - this comment is based on my explorations of zippy, not vt, so the logic may be entirely different.) If this log2(requested_size) is used directly as an index into the bucket table, that necessarily restricts you to having power-of-2 bucket sizes, meaning you allocate on average nearly 50% more than requested (i.e., nearly 33% of allocated memory is overhead/wasted). Adding more, closer-spaced buckets adds to the base footprint but possibly reduces the max usage by dropping the wasted space. I believe tcmalloc uses buckets spaced so that the average waste is only 12.5%. -J
Re: [naviserver-devel] Quest for malloc
Hi Jeff, we are aware that the function is essentially an integer log2. The chosen C-based variant is actually faster and more general than what you have included (it needs only max 2 shift operations for the relevant range), but the assembler-based variant is hard to beat and yields another 3% for the performance of the benchmark on top of the fastest C version. Thanks for that! -gustaf Jeff Rogers wrote: I don't think anyone has pointed this out yet, but this is a logarithm in base 2 (log2), and there are a fair number of implementations of this available; for maximum performance there are assembly implementations using 'bsr' on x86 architectures, such as this one from google's tcmalloc:
Re: [naviserver-devel] Quest for malloc
a) The test program Zoran includes biases the comparison against Zippy, which it does not do for VT. The following patch corrects this behavior:

+++ memtest.c Sun Jan 14 16:43:23 2007
@@ -211,6 +211,7 @@
 } else {
 size = 0x3FFF; /* Limit to 16K */
 }
+if (size > 16000) size = 16000;
 *toallocptr++ = size;
 }
 }

First of all, I wanted to give Zippy a fair chance. If I increase the max allocation size, Zippy becomes even slower than it is. And, Zippy handles 16K pages, whereas we handle 32K pages. Hence the size = 0x3FFF; /* Limit to 16K */ which limits the allocation size to 16K max. Increasing that would hit Zippy even more than us. Zoran, I believe you misunderstood. The patch above limits blocks allocated by your tester to 16000 instead of 16384 bytes. The reason for this is that Zippy's largest bucket is configured to be 16284-sizeof(Block) bytes (note the 2 in 16_2_84 is _NOT_ a typo). By making uniformly random request sizes up to 16_3_84, you are causing Zippy to fall back to system malloc for a small fraction of requests, substantially penalizing its performance in these cases. The following patch allows Zippy to be a lot less aggressive in putting blocks into the shared pool, bringing the performance of Zippy much closer to VT, at the expense of substantially higher memory waste:

@@ -128,12 +174,12 @@
 { 64, 256, 128, NULL},
 { 128, 128, 64, NULL},
 { 256, 64, 32, NULL},
-{ 512, 32, 16, NULL},
-{ 1024, 16, 8, NULL},
-{ 2048, 8, 4, NULL},
-{ 4096, 4, 2, NULL},
-{ 8192, 2, 1, NULL},
-{16284, 1, 1, NULL},
+{ 512, 64, 32, NULL},
+{ 1024, 64, 32, NULL},
+{ 2048, 64, 32, NULL},
+{ 4096, 64, 32, NULL},
+{ 8192, 64, 32, NULL},
+{16284, 64, 32, NULL},

I cannot comment on that. Possibly you are right, but I do not see much benefit of that except speeding up Zippy to be on par with VT, whereas the most important VT feature is not the speed, it is the memory handling. You wanted to know why Zippy is slower on your test; this is the reason.
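For readers unfamiliar with the table being patched, the three numeric columns correspond roughly to block size, free blocks kept per thread cache, and blocks moved to/from the shared pool at a time. A hedged sketch of its shape (the field names are my own, and the fourth column, a per-bucket lock pointer, is omitted; see Tcl's tclThreadAlloc.c for the real definition):

```c
#include <stddef.h>

/* Rough shape of Zippy's bucket configuration table. */
struct bucket_cfg {
    size_t blocksize;   /* bytes served from this bucket */
    int    maxblocks;   /* free blocks kept per thread before spilling
                         * to the shared pool */
    int    nmove;       /* blocks moved to/from the shared pool per
                         * lock acquisition */
};

static const struct bucket_cfg buckets[] = {
    {    64, 256, 128 },
    {   128, 128,  64 },
    {   256,  64,  32 },
    {   512,  32,  16 },
    {  1024,  16,   8 },
    {  2048,   8,   4 },
    {  4096,   4,   2 },
    {  8192,   2,   1 },
    { 16284,   1,   1 },  /* 16284, not 16384: header bytes are carved out */
};
```

Mike's patch raises the last two columns for the large buckets, so each thread holds on to more free blocks before it has to take the shared-pool lock.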
This has substantial impact on FreeBSD and Linux, and my guess is that it will have a dramatic effect on Mac OSX. VT releases the memory held in a thread's local pool when a thread terminates. Since it uses mmap by default, this means that de-allocated storage is actually released to the operating system, forcing new threads to call mmap() again to get memory, thereby incurring system call overhead that could be avoided in some cases if the system malloc implementation did not lower the sbrk point at each deallocation. Using malloc() in the VT allocator should give it much more uniform and consistent performance. Not necessarily. We'd shoot ourselves in the foot by doing so, because most OS allocators never return memory to the system and one of our major benefits would be gone. What we could do: timestamp each page, return all pages to the global cache and prune older ones. Or, put a size constraint on the global cache. But then you'd have yet another knob to adjust and the difficulty would be to find the right setup. VT is simpler in that it does not offer you ANY knobs you can trim (for better or for worse). In some early stages of the design we had a number of knobs and were not certain how to adjust them. So we threw that away and redesigned all parts to be self-adjusting where possible. The benefit of mmap() is being able to for sure release memory back to the system. The drawback is that it always incurs a substantial syscall overhead compared to malloc. You decide which you prefer (I think I would lean slightly toward mmap() for long-lived applications, but not by much, since the syscall introduces a lot of variance and an average performance degradation). e) Both allocators use an O(n) algorithm to compute the power-of-two bucket for the allocated size. This is just plain silly since an O(log n) algorithm will offer non-negligible speed up in both allocators.
This is the current O(n) code:

while (bucket < NBUCKETS && globalCache.sizes[bucket].blocksize < size) {
    ++bucket;
}

How about adding this into the code? I think the most obvious replacement is just using an if tree:

if (size > 0xff) bucket += 8, size >>= 8;
if (size > 0xf)  bucket += 4, size >>= 4;
...

it takes a minute to get the math right, but the performance gain should be substantial. f) Zippy uses Ptr2Block and Block2Ptr functions whereas VT uses macros for this. Zippy also does more checks on MAGIC numbers on each allocation, which VT only performs on de-allocation. I am not sure if current compilers are smart enough to inline the functions in Zippy; I did not test this. When compiled with -O0 with gcc, changing Zippy to use macros instead of function calls offers non-trivial
Re: [naviserver-devel] Quest for malloc
On 15.01.2007, at 22:22, Mike wrote: Zoran, I believe you misunderstood. The patch above limits blocks allocated by your tester to 16000 instead of 16384 bytes. The reason for this is that Zippy's largest bucket is configured to be 16284-sizeof(Block) bytes (note the 2 in 16_2_84 is _NOT_ a typo). By making uniformly random request sizes up to 16_3_84, you are causing Zippy to fall back to system malloc for a small fraction of requests, substantially penalizing its performance in these cases. Ah! That's right. I will fix that. You wanted to know why Zippy is slower on your test, this is the reason. This has substantial impact on FreeBSD and Linux, and my guess is that it will have a dramatic effect on Mac OSX. I will check that tomorrow on my machines. The benefit of mmap() is being able to for sure release memory back to the system. The drawback is that it always incurs a substantial syscall overhead compared to malloc. You decide which you prefer (I think I would lean slightly toward mmap() for long-lived applications, but not by much, since the syscall introduces a lot of variance and an average performance degradation). Yep. I agree. I would avoid it if possible. But I know of no other sure memory-returning call! I see that most (all?) of the allocators I know just keep everything allocated and never return it. How about adding this into the code? I think the most obvious replacement is just using an if tree: if (size > 0xff) bucket += 8, size >>= 8; if (size > 0xf) bucket += 4, size >>= 4; ... it takes a minute to get the math right, but the performance gain should be substantial. Well, I can test that all right. I have the feeling that a tight loop like that (it will mostly spin 5-12 times) gets compiled to good machine code, but it is better to test. In my tests, due to the frequency of calls of these functions, they contribute 10% to 15% overhead in performance. Yes. That is what I was also getting.
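A complete version of the if-tree Mike sketches, written as a hypothetical standalone helper (the real allocators would fold this into their own bucket lookup):

```c
/* Integer floor(log2(n)) for 32-bit n > 0, using a fixed number of
 * comparisons instead of a shift loop: O(log n) rather than O(n). */
static int
Log2Floor(unsigned int n)
{
    int log = 0;

    if (n > 0xffff) { log += 16; n >>= 16; }
    if (n > 0xff)   { log += 8;  n >>= 8;  }
    if (n > 0xf)    { log += 4;  n >>= 4;  }
    if (n > 0x3)    { log += 2;  n >>= 2;  }
    if (n > 0x1)    { log += 1; }
    return log;
}
```

With power-of-two bucket sizes, a bucket index can then be derived from this value with a subtraction and a clamp, replacing the linear scan over the bucket table.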
OTOH, the speed difference between VT and Zippy was sometimes several orders of magnitude, so I simply ignored that. Ha! It is pretty simple: you can atomically check pointer equivalence without risking a core (at least this is my experience). You are not expected to make far-reaching decisions based on it, though. In this particular example, even if the test was false, there would be no harm done, just a suboptimal path would be selected. I have marked that dirty read to draw people's attention to that place. And I succeeded, obviously :-) The dirty read I have no problem with. It's the possibility of taking the head element, which could be placed there by another thread, that bothers me. Ah, this will not happen. As I take the global mutex at that point, the pagePtr->p_cachePtr cannot be changed under our feet. If that block was allocated by the current thread, the p_cachePtr will not be changed by anybody. So no harm. If it is not, then we must lock the global mutex to prevent anybody fiddling with that element. It is tricky but it should work. It sounds like you are in the best position to test this change to see if it fixes the unbounded growth problem. Yes! Indeed. The only thing I'd have to check is how much more memory this will take. But it is certainly worth trying out, as it will be a temporary relief for our users until we stress-test VT to the max so I can include it in our standard distro.
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 06:17, Mike wrote: I'm happy to offer ssh access to a test box where you can reproduce these results. Oh, that is very fine! Can you give me the access data? You can send me the login details in a separate private mail. Thanks, Zoran
Re: [naviserver-devel] Quest for malloc
I downloaded the code in the previous mail. After some minor path adjustments, I was able to get the test program to compile and link under FreeBSD 6.1 running on a dual-processor PIII system, linked against a threaded tcl 8.5a. I could get this program to consistently do one of two things: - dump core - hang seemingly forever but absolutely nothing else. Mike, when Zoran announced the version, I downloaded it and had similar experiences. Fault 1 turned out to be: the link from Zoran led to a premature version of the software, not the real thing (the right version untars to a directory containing the version numbers). Then Zoran corrected the link, I refetched, and .. well, no makefile. Just compile and try: same effect. The fault was that I did not read the README (I read the first one) and compiled without -DTCL_THREADS. I had exactly the same symptoms. Correcting these configuration issues, the program works VERY well. I tried on 32-bit and 64-bit machines (a minor complaint about the memtest program: casting 32-bit int to ClientData and vice versa) -gustaf PS: I could get access to a 64-bit AMD FreeBSD machine on Monday, if there is still need... PPS: strangely, the only thing making me suspicious is the huge amount of improvement, especially on Mac OS X. I can't remember having seen such a drastic performance increase from a relatively small code change, especially in an area which is usually carefully fine-tuned, and on which many CS grads from all over the world write their theses. I would recommend that Vlad and Zoran write a technical paper about the new allocator and analyze the properties and differences.
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 10:45, Gustaf Neumann wrote: PPS: strangely, the only thing making me suspicious is the huge amount of improvement, especially on Mac OS X. Look... Running the test program unmodified (on a Mac Pro box):

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 35096360 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I modify the memtest.c program at line 146 to read:

if (dorealloc && (allocptr > tdata[tid].allocs) && (r & 1)) {
    allocptr[-1] = reallocs[whichmalloc](allocptr[-1], *toallocptr);
} else {
    allocptr[0] = mallocs[whichmalloc](*toallocptr);
    /*--*/
    memset(allocptr[0], 0, *toallocptr > 64 ? 64 : *toallocptr);
    allocptr++;
}

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 28377808 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

If I memset the whole memory area, not just the first 64 bytes:

Test Tcl allocator with 4 threads, 16000 records ...
This allocator achieves 14862477 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

BUT, guess what! The system allocator gives me (using the same test data, i.e. memsetting the whole allocated chunk):

Test standard allocator with 4 threads, 16000 records ...
This allocator achieves 869716 ops/sec under 4 threads
Press return to exit (observe the current memory footprint!)

So we are still 14862477/869716 = 17 times faster. With increasing thread count we get faster and faster whereas the system allocator stays at the same (low) level or gets slower. Now, I would really like to know why! Perhaps the fact that we are using mmap() instead of god-knows-what Apple is using... Anyways... either we have some very big error there (in which case I'd like to know where, as everything is working as it should!) or we have found a much better way to handle memory on Mac OSX :-) Cheers Zoran
Re: [naviserver-devel] Quest for malloc
On 13.01.2007, at 10:45, Gustaf Neumann wrote: The fault was that I did not read the README (I read the first one) and compiled without -DTCL_THREADS. In that case, the fault was that on FreeBSD you need to explicitly pass -pthread when linking the test program, regardless of the fact that libtcl8.4.so was already linked with it. That alone did the trick. Speed was (as expected, and still not clear why) at least 2 times better than anything else. In some rough cases it was _significantly_ faster. But... I believe we should not fixate on the speed of the allocator. It was not our intention to make something faster. Our intention was to release memory early enough so we don't bloat the system as a long-running process. I admit, speed of the code is always the most interesting and tempting issue for engineers, but in this case it was really the memory savings for long-running programs that we were after. Having said that, I must again repeat that we'd like to get some field experience with the allocator before we take any further steps. This means that we are thankful for any feedback. Cheers, zoran
Re: [naviserver-devel] Quest for malloc
I've been on a search for an allocator that is fast enough and not as memory hungry as the allocator built into Tcl. Unfortunately, as it usually goes, it turned out that I had to write my own. Vlad has written an allocator that uses mmap to obtain memory from the system and munmaps that memory on thread exit, if possible. I have spent more than 3 weeks fiddling with it and discussing it with Vlad, and this is what we both came to: http://www.archiware.com/downloads/vtmalloc-0.0.1.tar.gz I believe we have solved most of my needs. Below is an excerpt from the README file for the curious. Would anybody care to test it in his/her own environment? If all goes well, I might TIP this to be included in the Tcl core as a replacement for (or addition to) the zippy allocator. Zoran, Because I am quite biased here, to avoid later being branded as biased, I want to explicitly state my bias up front: In my experience, very little good comes out of people writing their own memory allocators. There is a small number of people in this world for whom this privilege should be reserved (outside of a classroom exercise, of course), and the rest of us humble folk should help them when we can but generally stay out of the way - setting out to reinvent the wheel is not a good thing. I downloaded the code in the previous mail. After some minor path adjustments, I was able to get the test program to compile and link under FreeBSD 6.1 running on a dual-processor PIII system, linked against a threaded tcl 8.5a. I could get this program to consistently do one of two things: - dump core - hang seemingly forever but absolutely nothing else. Running this program under the latest version of valgrind (using the memcheck or helgrind tools) reveals numerous errors from valgrind, which I suspect (although I did not confirm) are the reason for the core dumps and infinite hangs when it is run on its own.
I have no time to debug this myself; however, in the interest of science and general progress, I'm happy to offer ssh access to a test box where you can reproduce these results. I strongly advise against using a benchmark with the above characteristics to make any decisions about speed or memory consumption improvements or problems. --- After toying around with this briefly, I was able to run the test program under valgrind after specifying a -rec value of 1000 or less. Despite some errors reported by valgrind, the test program does run to completion and report its results in these cases.

standard allocator: This allocator achieves 43982 ops/sec under 4 threads
tcl allocator: This allocator achieves 21251 ops/sec under 4 threads
improved tcl allocator: This allocator achieves 21308 ops/sec under 4 threads

But again, I would not draw any serious conclusions from these numbers.
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 01:10, Stephen Deasey wrote: This program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). It is also not the case with nedmalloc, as it specifically tracks that usage pattern. The block being free'd knows to which so-called mspace it belongs, regardless of which thread frees it. So, I'd say nedmalloc is OK in this respect. I have given it a purify run and it runs cleanly. Our application is noticeably faster on Mac and bloats less. But this is only the tip of the iceberg. We have yet to give it a real stress test in the field, yet I'm reluctant to do this now and will have to wait for a major release somewhere in spring next year.
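The producer/consumer free pattern under discussion can be shown in a few lines (a sketch with plain pthreads; the 4096-byte size is arbitrary):

```c
#include <pthread.h>
#include <stdlib.h>

/* One thread allocates, another frees.  An allocator that parks freed
 * blocks only in the freeing thread's local cache could bloat under
 * this pattern; nedmalloc routes the block back to its home mspace. */
static void *
alloc_in_worker(void *arg)
{
    (void) arg;
    return malloc(4096);   /* block is handed to the joining thread */
}
```

The joining thread receives the pointer from pthread_join and calls free() on it, so allocation and deallocation happen in different threads.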
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 15:57, Vlad Seryakov wrote: Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem. I have a Tcl library compiled with nedmalloc, and when I link against it and make #define MemAlloc Tcl_Alloc #define MemFree Tcl_Free it runs fine. Should I make the Solaris test?
Re: [naviserver-devel] Quest for malloc
Yes, please. Zoran Vasiljevic wrote: On 19.12.2006, at 15:57, Vlad Seryakov wrote: Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem. I have a Tcl library compiled with nedmalloc, and when I link against it and make #define MemAlloc Tcl_Alloc #define MemFree Tcl_Free it runs fine. Should I make the Solaris test?
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 16:06, Vlad Seryakov wrote: Yes, please (I appended the code to the nedmalloc test program and renamed their main to main1)

bash-2.03$ gcc -O3 -o tcltest tcltest.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g
bash-2.03$ gdb ./tcltest
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "sparc-sun-solaris2.8"...
(gdb) run
Starting program: /space/homes/zv/nedmalloc_tcl/tcltest
[New LWP 1]
[New LWP 2]
[New LWP 3]
[New LWP 4]
[New LWP 5]
[New LWP 6]
[New LWP 7]
[New LWP 8]
[LWP 7 exited]
[New LWP 7]
[LWP 4 exited]
[New LWP 4]
[LWP 8 exited]
[New LWP 8]
Program exited normally.
(gdb) quit
Re: [naviserver-devel] Quest for malloc
gdb may slow down concurrency; does it run without gdb, and also does it run with Solaris malloc? Zoran Vasiljevic wrote: On 19.12.2006, at 16:06, Vlad Seryakov wrote: Yes, please (I appended the code to the nedmalloc test program and renamed their main to main1) bash-2.03$ gcc -O3 -o tcltest tcltest.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g bash-2.03$ gdb ./tcltest [...] (gdb) run Starting program: /space/homes/zv/nedmalloc_tcl/tcltest [New LWP 1] [New LWP 2] [New LWP 3] [New LWP 4] [New LWP 5] [New LWP 6] [New LWP 7] [New LWP 8] [LWP 7 exited] [New LWP 7] [LWP 4 exited] [New LWP 4] [LWP 8 exited] [New LWP 8] Program exited normally. (gdb) quit
Re: [naviserver-devel] Quest for malloc
I was suspecting Linux malloc; it looks like it has problems with high concurrency. I tried to replace MemAlloc/MemFree with mmap/munmap, and it crashes as well.

#define MemAlloc mmalloc
#define MemFree(ptr) mfree(ptr, gSize)

void *mmalloc(size_t size)
{
    return mmap(NULL, size, PROT_READ|PROT_WRITE|PROT_EXEC,
                MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
}

void mfree(void *ptr, size_t size)
{
    munmap(ptr, size);
}

Zoran Vasiljevic wrote: On 19.12.2006, at 16:15, Vlad Seryakov wrote: gdb may slow down concurrency; does it run without gdb, and also does it run with Solaris malloc? No problems. Runs with malloc and nedmalloc with or w/o gdb. The same on Mac.
Re: [naviserver-devel] Quest for malloc
yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes I need to run it several times. It looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in NaviServer; I wonder if AOL has random crashes. Stephen Deasey wrote: Is this really the shortest test case you can make for this problem? - Does it crash if you allocate blocks of size 1024 rather than random size? Does for me. Strip it out. - Does it crash if you run 2 threads instead of 4? Does for me. Strip it out. Some times it crashes, some times it doesn't. Clearly it's timing related. The root cause is not going to be identified by injecting a whole bunch of random! Make this program shorter. On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried nedmalloc with LD_PRELOAD for my little test and it crashed even before the start. Zoran, can you test it on Solaris and OSX so we'd know that it is not a Linux-related problem.

#include <tcl.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 5;
static int nthreads = 4;
static void *gPtr = NULL;
static Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Tcl_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *) malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

Zoran Vasiljevic wrote: On 19.12.2006, at 01:10, Stephen Deasey wrote: This
program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). It is also not the case with nedmalloc, as it specifically tracks that usage pattern. The block being free'd knows to which so-called mspace it belongs, regardless of which thread frees it. So, I'd say nedmalloc is OK in this respect. I have given it a purify run and it runs cleanly. Our application is noticeably faster on Mac and bloats less. But this is only the tip of the iceberg. We have yet to give it a real stress test in the field, yet I'm reluctant to do this now and will have to wait for a major release somewhere in spring next year.
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
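For anyone wanting to reproduce the LD_PRELOAD run Vlad mentions above, a minimal sketch; the binary name and the library install path are hypothetical, substitute wherever your nedmalloc build landed:

```shell
# Hypothetical paths: ./tcltest is the test binary above, and
# libnedmalloc.so is wherever your nedmalloc build installed it.
#
#   LD_PRELOAD=/usr/local/lib/libnedmalloc.so ./tcltest
#
# The preload mechanism itself can be sanity-checked with any program:
LD_PRELOAD="" /bin/echo "preload mechanism ok"
```

The preloaded library's malloc/free override the libc symbols for the whole process, so the test binary does not need to be recompiled.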
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 16:35, Vlad Seryakov wrote:

yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

Concurrency or not, I'm running it on the fastest Mac you can buy, tweaked to 16 threads and with the loop count increased from 5 to 50, and get this:

(with nedmalloc)
Blitzer:~/nedmalloc_tcl root# time ./tcltest
real    0m2.036s
user    0m4.652s
sys     0m1.823s

(with standard malloc)
Blitzer:~/nedmalloc_tcl root# time ./tcltest
real    0m9.140s
user    0m17.319s
sys     0m17.397s

So that's about 4 times faster. I cannot reproduce any crash, whatever I try.
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote:

yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

You're still using Tcl threads. Strip it out. Make the loops and block size command line parameters. If you think you've found a bug you'll want the most concise test case so you can report it to the glibc maintainers. #glibc on irc.freenode.net
Re: [naviserver-devel] Quest for malloc
I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc.

Stephen Deasey wrote:

On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: yes, it crashes when the number of threads is more than 1, with any size, but not all the time; sometimes i need to run it several times. looks like it is random, some combination, not sure of what. I guess we never got that high concurrency in Naviserver, i wonder if AOL has random crashes.

You're still using Tcl threads. Strip it out. Make the loops and block size command line parameters. If you think you've found a bug you'll want the most concise test case so you can report it to the glibc maintainers. #glibc on irc.freenode.net

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
I have no idea, I spent too much time on this, still without realizing what I am doing and what to expect :-)))

Zoran Vasiljevic wrote:

On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc.

Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc. Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

Simply loading the Tcl library initialises a bunch of thread stuff, right? Also, the Tcl mutexes are self-initialising, which includes calling down into the global Tcl mutex. Lots of stuff going on behind the scenes...

NaviServer mutexes are also self-initialising, but they call down to the pthread_ functions without touching any Tcl code, which may explain why the server isn't crashing all the time.

So here's a test: what happens when you compile the test program to use Ns_Mutex and Ns_ThreadCreate etc.? Pthreads work, Tcl doesn't, how about NaviServer?
Re: [naviserver-devel] Quest for malloc
Right, with the Ns_ functions it does not crash.

Stephen Deasey wrote:

On 12/19/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 19.12.2006, at 17:08, Vlad Seryakov wrote: I converted it all to use pthreads directly instead of the Tcl wrappers, and now it does not crash anymore. Will continue testing, but it looks like Tcl is the problem here, not ptmalloc. Where does it crash? I see you are just using Tcl_CreateThread, Tcl_MutexLock/Unlock, Tcl_JoinThread. Those just fall back to the underlying pthread lib. It makes no real sense, I believe.

Simply loading the Tcl library initialises a bunch of thread stuff, right? Also, the Tcl mutexes are self-initialising, which includes calling down into the global Tcl mutex. Lots of stuff going on behind the scenes...

NaviServer mutexes are also self-initialising, but they call down to the pthread_ functions without touching any Tcl code, which may explain why the server isn't crashing all the time.

So here's a test: what happens when you compile the test program to use Ns_Mutex and Ns_ThreadCreate etc.? Pthreads work, Tcl doesn't, how about NaviServer?
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/

/*
 * gcc -I/usr/local/ns/include -g ttest.c -o ttest -lpthread /usr/local/ns/lib/libnsthread.so
 */
#include <ns.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 15;
static int nthreads = 12;
static void *gPtr = NULL;
static Ns_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Ns_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Ns_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Ns_Thread *tids;

    if (argc > 1) { nthreads = atoi(argv[1]); }
    if (argc > 2) { nloops = atoi(argv[2]); }
    if (argc > 3) { nbuffer = atoi(argv[3]); }

    tids = (Ns_Thread *)malloc(sizeof(Ns_Thread) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Ns_ThreadCreate(MemThread, 0, 0, &tids[i]);
    }
    for (i = 0; i < nthreads; ++i) {
        Ns_ThreadJoin(&tids[i], NULL);
    }
}
Re: [naviserver-devel] Quest for malloc
On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Right, with Ns_ functions it does not crash. Zoran will be happy... :-)
Re: [naviserver-devel] Quest for malloc
On 19.12.2006, at 20:42, Stephen Deasey wrote: On 12/19/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Right, with the Ns_ functions it does not crash. Zoran will be happy... :-)

Not at all! So, I would like to know exactly how to reproduce the problem (what OS, machine, etc.). Furthermore, I need all your test code and, if possible, a gdb trace of the crash, to start with. Can you get all that for me?
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: But if speed is not important to you, you can supply Tcl without zippy, then no bloat, the system memory is returned with reasonable speed; at least on Linux, ptmalloc is not that bad.

OK. I think I've reached peace of mind with all these alternate malloc implementations... This is what I found:

On all platforms (except Mac OSX), it really does not pay to use anything else besides the system's native malloc. I mean, you can gain a few percent of speed with hoard/tcmalloc/nedmalloc/zippy and friends, but you pay for it with memory bloat. If you can afford it, then go ahead. I believe, at least from what I've seen in my tests, that zippy is quite fast and you gain very little, if anything (speed-wise), by replacing it. You can gain somewhat less memory fragmentation by using something else, but this is not a thing that would make me say: Wow!

The exception to that is really Mac OSX. The native Mac OSX malloc sucks tremendously. The speed increases from zippy and nedmalloc are so high that you can really see (without any fancy measurements) how your application flies! nedmalloc also bloats less than zippy (normally, as it clears the per-thread cache on thread exit). So for the Mac (at least for us) I will stick with nedmalloc. It is lightning fast and reasonably conservative with memory fragmentation.

Conclusion: Linux/Solaris = use system malloc. Mac OSX = use nedmalloc.

Ah, yes... Windows... this I haven't tested, but the nedmalloc author shows some very interesting numbers on his site. I somehow tend to believe them, as some I have seen myself when experimenting on unix platforms. So, most probably the outcome will be: Windows = use nedmalloc.

What does this mean to all of us? I would say: very little. We know that zippy is bloating, and now we know that it is reasonably fast and on par with most of the other solutions out there. For people concerned with speed, I believe this is the right solution.
For people concerned with speed AND memory fragmentation (in that order), the best choice is some alternative malloc routine. For people concerned mainly with fragmentation, the best is to stay with the system malloc; exception: Mac OSX. There you just need to use something else, and nedmalloc is the only thing that compiles (and works) there, to my knowledge. I hope this report helps somebody. Cheers Zoran
Re: [naviserver-devel] Quest for malloc
I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then.

#include <tcl.h>

#define MemAlloc ckalloc
#define MemFree ckfree

int nbuffer = 16384;
int nloops = 5;
int nthreads = 4;
int gAllocs = 0;
void *gPtr = NULL;
Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        // Testing inter-thread alloc/free
        if (n % 5 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
            }
            gPtr = MemAlloc(n);
            gAllocs++;
            Tcl_MutexUnlock(&gLock);
        }
    }
    if (ptr != NULL) {
        MemFree(ptr);
    }
    if (gPtr != NULL) {
        MemFree(gPtr);
    }
}

void MemTime()
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *)malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

int main(int argc, char **argv)
{
    MemTime();
}

Doesn't zippy also clear its per-thread cache on exit? It puts blocks into a shared queue which other threads can re-use. But the shared cache never gets returned, so conn thread exits will not help with memory bloat.

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

Stephen Deasey wrote:

On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then.

I don't think allocate in one thread, free in another is an unusual strategy. Googling around I see a lot of people doing it. There must be some bugs in your program. Here's one: at the end of MemThread() gPtr is checked and freed, but the gMutex is not held. This thread may have finished its tight loop, but the other 3 threads could still be running. Also, gPtr is not set to NULL after the free(), leading to a double free when the next thread checks it.

[snip: test program quoted above]
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

OK. Still doesn't mean your program is bug free :-)

There's a lot of extra stuff going on in your example program that makes it hard to see what's going on. I simplified it to this:

#include <tcl.h>
#include <stdlib.h>
#include <assert.h>

#define MemAlloc ckalloc
#define MemFree ckfree

void *gPtr = NULL; /* Global pointer to memory. */

void Thread(void *arg)
{
    assert(gPtr != NULL);
    MemFree(gPtr);
    gPtr = NULL;
}

int main(int argc, char **argv)
{
    Tcl_ThreadId tid;
    int i;

    for (i = 0; i < 10; ++i) {
        gPtr = MemAlloc(1024);
        assert(gPtr != NULL);
        Tcl_CreateThread(&tid, Thread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
        Tcl_JoinThread(tid, NULL);
        assert(gPtr == NULL);
    }
}

Works for me. I say you can allocate memory in one thread and free it in another. Let me know what the bug turns out to be..!

Stephen Deasey wrote: On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: I tried to run this program; it crashes with all allocators on free when the memory was allocated in another thread. Zippy does it as well; i am not sure how Naviserver works then. I don't think allocate in one thread, free in another is an unusual strategy. Googling around I see a lot of people doing it. There must be some bugs in your program. Here's one: at the end of MemThread() gPtr is checked and freed, but the gMutex is not held. This thread may have finished its tight loop, but the other 3 threads could still be running. Also, gPtr is not set to NULL after the free(), leading to a double free when the next thread checks it.
[snip: test program quoted above]

--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 18.12.2006, at 22:08, Stephen Deasey wrote: Works for me. I say you can allocate memory in one thread and free it in another. Nice. Well I can say that nedmalloc works, that is, that small program runs to end w/o coring when compiled with nedmalloc. Does this prove anything?
Re: [naviserver-devel] Quest for malloc
On 18.12.2006, at 19:57, Stephen Deasey wrote: Are you saying you tested your app on Linux with native malloc and experienced no fragmentation/bloating?

No. I have seen bloating, but less than with zippy. I saw some bloating and fragmentation with all optimizing allocators I have tested.

I think some people are experiencing fragmentation problems with ptmalloc -- the Squid and OpenLDAP guys, for example. There's also the malloc-in-one-thread, free-in-another problem, which if your threads don't exit is basically a leak.

Really a leak? Why? Wouldn't that depend on the implementation?

Doesn't zippy also clear its per-thread cache on exit?

No. It shovels all the rest to the shared pool. The shared pool is never freed. Hence lots of bloating.

Actually, did you experiment with exiting the conn threads after X requests? Seems to be one of the things AOL is recommending.

Most of our threads are Tcl threads, not conn threads. We create them to do lots of different tasks. They are all rather short-lived. Still, the mem footprint grows and grows...

One thing I wonder about this is, how do requests average out across all threads? If you set the conn threads to exit after 10,000 requests, will they all quit at roughly the same time, causing an extreme load on the server? Also, this is only an option for conn threads. With scheduled proc threads, job threads etc. you get nothing.

Well, if they all start to exit at the same time, they will serialize at the point where the per-thread cache is pushed to the shared pool.
Re: [naviserver-devel] Quest for malloc
I suspect i am doing something wrong, but still it crashes and i do not see why.

#include <tcl.h>
#include <stdlib.h>
#include <memory.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>

#define MemAlloc malloc
#define MemFree free

static int nbuffer = 16384;
static int nloops = 5;
static int nthreads = 4;
static void *gPtr = NULL;
static Tcl_Mutex gLock;

void MemThread(void *arg)
{
    int i, n;
    void *ptr = NULL;

    for (i = 0; i < nloops; ++i) {
        n = 1 + (int) (nbuffer * (rand() / (RAND_MAX + 1.0)));
        if (ptr != NULL) {
            MemFree(ptr);
        }
        ptr = MemAlloc(n);
        if (n % 50 == 0) {
            Tcl_MutexLock(&gLock);
            if (gPtr != NULL) {
                MemFree(gPtr);
                gPtr = NULL;
            } else {
                gPtr = MemAlloc(n);
            }
            Tcl_MutexUnlock(&gLock);
        }
    }
}

int main(int argc, char **argv)
{
    int i;
    Tcl_ThreadId *tids;

    tids = (Tcl_ThreadId *)malloc(sizeof(Tcl_ThreadId) * nthreads);
    for (i = 0; i < nthreads; ++i) {
        Tcl_CreateThread(&tids[i], MemThread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
    }
    for (i = 0; i < nthreads; ++i) {
        Tcl_JoinThread(tids[i], NULL);
    }
}

Stephen Deasey wrote:

On 12/18/06, Vlad Seryakov [EMAIL PROTECTED] wrote: Still, even without the last free and with a mutex around it, it core dumps in free(gPtr) during the loop.

OK. Still doesn't mean your program is bug free :-) There's a lot of extra stuff going on in your example program that makes it hard to see what's going on. I simplified it to this:

[snip: simplified program quoted above]

Works for me. I say you can allocate memory in one thread and free it in another.
Let me know what the bug turns out to be..!

Stephen Deasey wrote:

[snip: analysis and test program quoted above]
--
Vlad Seryakov
571 262-8608 office
[EMAIL PROTECTED]
http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 12/18/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 18.12.2006, at 19:57, Stephen Deasey wrote: One thing I wonder about this is, how do requests average out across all threads? If you set the conn threads to exit after 10,000 requests, will they all quit at roughly the same time causing an extreme load on the server? Also, this is only an option for conn threads. With scheduled proc threads, job threads etc. you get nothing. Well, if they all start to exit at the same time, they will serialize at the point where per-thread cache is pushed to the shared pool. I was worried more about things like all the Tcl procs needing to be recompiled in the new interp for the thread, and all the other stuff which is cached. If threads exit regularly, say after 10,000 requests, and the requests average out over all threads, then your site will regularly go down, effectively. It would be nice if we could make sure the thread exits were spread out. Anyway... I think some people are experiencing fragmentation problems with ptmalloc -- the Squid and OpenLDAP guys, for example. There's also the malloc-in-one-thread, free-in-another problem, which if your threads don't exit is basically a leak. Really a leak? Why? Wouln't that depend on the implementation? Yes, and I thought that was the case with Linux ptmalloc, but maybe I got it wrong or this is old news... This program allocates memory in a worker thread and frees it in the main thread. If all free()'s put memory into a thread-local cache then you would expect this program to bloat, but it doesn't, so I guess it's not a problem (at least not on Fedora Core 5). 
#include <tcl.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define MemAlloc malloc
#define MemFree free

void *gPtr = NULL;

static void Thread(void *arg);
static void PrintMemUsage(const char *msg);

int main(int argc, char **argv)
{
    Tcl_ThreadId tid;
    int i;

    PrintMemUsage("start");
    for (i = 0; i < 10; ++i) {
        Tcl_CreateThread(&tid, Thread, NULL,
                         TCL_THREAD_STACK_DEFAULT, TCL_THREAD_JOINABLE);
        Tcl_JoinThread(tid, NULL);
        MemFree(gPtr);
        gPtr = NULL;
    }
    PrintMemUsage("stop");
}

static void Thread(void *arg)
{
    assert(gPtr == NULL);
    gPtr = MemAlloc(1024);
    assert(gPtr != NULL);
}

static void PrintMemUsage(const char *msg)
{
    FILE *f;
    int m;

    f = fopen("/proc/self/statm", "r");
    if (f == NULL) {
        perror("fopen failed: ");
        exit(-1);
    }
    if (fscanf(f, "%d", &m) != 1) {
        perror("fscanf failed: ");
        exit(-1);
    }
    fclose(f);
    printf("%s: %d\n", msg, m);
}
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html

Hm... not bad at all:

This was under Solaris 2.8 on a Sun Blade 2500 (Sparc), 1GB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 2098770.683107 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 1974570.587561 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 1449969.176647 ops/sec under 8 threads

Now on SuSE Linux, a 1.8GHz Intel:

Testing standard allocator with 8 threads ...
This allocator achieves 1752893.072620 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 2114564.246869 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 1460851.824732 ops/sec under 8 threads

The Tcl library was compiled for threads and uses the zippy allocator. This is how I compiled the test program from the nedmalloc package:

gcc -O -g -o test test.c -lpthread -DNDEBUG -DTCL_THREADS -I/usr/local/include -L/usr/local/lib -ltcl8.4g

I had to make some tweaks as they have a problem in the pthread_islocked() private call. Also, I expanded the testsuite to include Tcl_Alloc/Tcl_Free in addition.

If I run this same thing on other platforms I get more or less the same results, with one notable exception:

o. nedmalloc is always faster than standard or zippy, except on Sun Sparc, where the built-in malloc is the fastest
o. the zippy (Tcl) allocator is always the slowest among the three

Now, I imagine, the nedmalloc test program may not be telling all the truth (i.e. may be biased towards nedmalloc)... It would be interesting to see some other metrics...

Cheers Zoran
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 15:00, Zoran Vasiljevic wrote: On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html Hm... not bad at all:

This was on an iMac with an Intel Dual Core 1.83 GHz and 512 MB memory:

Testing standard allocator with 8 threads ...
This allocator achieves 319503.459835 ops/sec under 8 threads
Testing nedmalloc with 8 threads ...
This allocator achieves 1687884.294403 ops/sec under 8 threads
Testing Tcl alloc with 8 threads ...
This allocator achieves 294571.750823 ops/sec under 8 threads

Hey! I think our customers will love it! I will now try to ditch the zippy and replace it with nedmalloc... Too bad that Tcl as-is does not allow easy snap-in of alternate memory allocators. I think this should be lobbied for.

[snip: Solaris and Linux results quoted above]
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: On 15.12.2006, at 19:59, Vlad Seryakov wrote: http://www.nedprod.com/programs/portable/nedmalloc/index.html Hm... not bad at all: [...] Now, I imagine, the nedmalloc test program may not be telling all the truth (i.e. it may be biased towards nedmalloc)... It would be interesting to see some other metrics... Some other metrics: http://archive.netbsd.se/?ml=OpenLDAP-devela=2006-07t=2172728 They seem, in the end, to have gone for Google tcmalloc. It wasn't the absolute fastest for their particular set of tests, but it had dramatically lower memory usage.
Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this.
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: Hey! I think our customers will love it! I will now try to ditch the zippy and replace it with nedmalloc... Too bad that Tcl as-is does not allow easy snap-in of alternate memory allocators. I think this should be lobbied for. It would be nice to at least have a configure switch for the zippy allocator rather than having to hack up the Makefile.
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 16:25, Stephen Deasey wrote: The seem, in the end, to go for Google tcmalloc. It wasn't the absolute fastest for their particular set of tests, but had dramatically lower memory usage. The down side of tcmalloc: only Linux port. The nedmalloc does them all (win, solaris, linux, macosx) as it is written in ANSI-C and designed to be portable. I tested all our Unix boxes and was able to get it running on all of them. And the integration is rather simple, just add: #include nedmalloc.c #define malloc nedmalloc #define realloc nedrealloc #define freenedfree I believe this needs to be done in just one Tcl source file. Trickier part: you need to call neddisablethreadcache(0) at every thread exit. The lower memory usage is important of course. Here I have no experience yet. Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this. Are you sure? AFAIK, we just go down to Tcl_Alloc in Tcl library. The allocator there will not allow you that. There were some discussions on comp.lang.tcl about it (Jeff Hobbs knows better). As they (Tcl) just inherited what aolserver had at that time (I believe V4.0) the same what applies to AS applies to Tcl and indirectly to us.
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: Will try this one. To aid you (and others): http://www.archiware.com/downloads/nedmalloc_tcl.tar.gz Download and peek at the README file. This compiles on all machines I tested and works pretty well in terms of speed. I haven't tested the memory size, nor do I have any idea about fragmentation, but the speed is pretty good. Just look what this does on the Mac Pro (http://www.apple.com/macpro), which is currently the fastest Mac available: Testing standard allocator with 5 threads ... This allocator achieves 531241.923013 ops/sec under 5 threads Testing Tcl allocator with 5 threads ... This allocator achieves 439181.119284 ops/sec under 5 threads Testing nedmalloc with 5 threads ... This allocator achieves 4137423.021490 ops/sec under 5 threads nedmalloc allocator is 7.788209 times faster than standard Tcl allocator is 0.826706 times faster than standard nedmalloc is 9.420767 times faster than Tcl allocator Hm... if I was not able to get the same/similar results on other Macs, I'd say this is a cheat. But it isn't. Zoran
Re: [naviserver-devel] Quest for malloc
On 12/16/06, Zoran Vasiljevic [EMAIL PROTECTED] wrote: Are you sure? AFAIK, we just go down to Tcl_Alloc in Tcl library. The allocator there will not allow you that. There were some discussions on comp.lang.tcl about it (Jeff Hobbs knows better). As they (Tcl) just inherited what aolserver had at that time (I believe V4.0) the same what applies to AS applies to Tcl and indirectly to us. Yeah, pretty sure. You can only use Tcl objects within a single interp, which is restricted to a single thread, but general ns_malloc'd memory chunks can be passed around between threads. It would suck pretty hard if that wasn't the case. We have a bunch of reference counted stuff, cache values for example, which we share among threads and delete when the reference count drops to zero. You can ns_register_proc from any thread, which needs to ns_free the old value... Here's the (a?) problem: http://www.bozemanpass.com/info/linux/malloc/Linux_Heap_Contention.html
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 17:15, Stephen Deasey wrote: Yeah, pretty sure. You can only use Tcl objects within a single interp, which is restricted to a single thread, but general ns_malloc'd memory chunks can be passed around between threads. It would suck pretty hard if that wasn't the case. Interesting... I could swear I read that you can't just alloc in one thread and free in another using the Tcl allocator. Well, regarding nedmalloc, I do not know, but I can find out...
Re: [naviserver-devel] Quest for malloc
Instead of using threadspeed or other simple malloc/free tests, I used naviserver and Tcl pages as the test for allocators. Using ab from apache and stress-testing it with thousands of requests, I tested several allocators. And with everything the same except LD_PRELOAD, the difference seems pretty clear. Hoard/TCmalloc/Ptmalloc2 are all slower than zippy, no doubt. Using threadtest, though, tcmalloc was faster than zippy, but in real life it behaves differently. So, I would suggest you try hitting naviserver with nedmalloc. If it is always faster than zippy, then you have got what you want. Other things to watch: after each test, see the size of the nsd process. I will try nedmalloc as well later today Stephen Deasey wrote: [...]
Re: [naviserver-devel] Quest for malloc
You can, it moves Tcl_Obj structs between thread and shared pools; the same goes for other memory blocks. On thread exit, all memory goes to the shared pool. Zoran Vasiljevic wrote: [...]
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 17:29, Vlad Seryakov wrote: [...] I will try nedmalloc as well later today Indeed, the best way is to check out the real application. No test program can give you a better picture! As far as this is concerned, I do plan to make this test, but it takes some time! I spent the whole day getting nedmalloc to compile OK on all platforms that we use (solaris sparc/x86, mac ppc/x86, linux/x86, win). The next step is to snap it into the Tcl library and try the real application...
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 16:25, Stephen Deasey wrote: Something to think about: does the nedmalloc test include allocating memory in one thread and freeing it in another? Apparently this is tough for some allocators, such as Linux ptmalloc. Naviserver does this. I'm still not 100% through reading the code, but: The Tcl allocator just puts the free'd memory in the cache of the current thread that calls free(). On thread exit, or if the size of the cache exceeds some limit, the content of the cache is appended to the shared cache. The memory is never returned to the system, unless it is allocated as a chunk larger than 16K. nedmalloc does the same, but does not move freed memory between the per-thread cache and the shared repository. Instead, the thread cache is emptied (freed) when a thread exits. This must be explicitly called by the user. As I see it: all is green. But I will pay more attention to that by reading the code more carefully... Perhaps there is some gotcha there which I would not like to discover at a customer site ;-) In nedmalloc you can disable the per-thread cache usage by defining -DTHREADCACHEMAX=0 during compilation. This makes some difference: Testing nedmalloc with 5 threads ... This allocator achieves 16194016.581962 ops/sec under 5 threads without the cache, versus Testing nedmalloc with 5 threads ... This allocator achieves 18895753.973492 ops/sec under 5 threads with the cache. THREADCACHEMAX defines the size of the allocations which go into the cache, similarly to zippy. The default is 8K (vs. 16K with zippy). The above figures were done with a max 8K size. If you increase it to 16K, the malloc cores :-( Too bad. Still, I believe that for long-running processes, the approach of never releasing memory to the OS, as zippy is doing, is suboptimal. Speed here or there, I'd rather save myself process reboots if possible... The bad thing is that the Tcl allocator (aka zippy) will not allow me any choice but bloat. And this is becoming more and more important. At some customer sites I have observed process sizes of 1.5 GB, whereas we started with about 80 MB. Eh!
Re: [naviserver-devel] Quest for malloc
But if speed is not important to you, you can supply Tcl without zippy; then there is no bloat, memory is returned to the system, and the speed is reasonable; at least on Linux, ptmalloc is not that bad Zoran Vasiljevic wrote: [...] -- Vlad Seryakov 571 262-8608 office [EMAIL PROTECTED] http://www.crystalballinc.com/vlad/
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: But if speed is not important to you, you can supply Tcl without zippy, then no bloat, memory is returned to the system with reasonable speed; at least on Linux, ptmalloc is not that bad Eh... Vlad... On the Mac, nedmalloc outperforms the standard allocator by about 25-30 times! The same with zippy. All tested with the supplied test program. I have yet to test the real app... On other platforms (Linux, Solaris), yes, I can stay with the standard allocator. As a matter of fact, they are close to nedmalloc, +/- about 10-30% (in favour of nedmalloc, except on Sun/sparc). One shoe does not fit all, unfortunately... What I absolutely do not understand is: WHY? I mean, why do I get a 30-times difference!? It just makes no sense, but it is really true. I am absolutely confused :-((
Re: [naviserver-devel] Quest for malloc
On 16.12.2006, at 19:31, Vlad Seryakov wrote: Linux, ptmalloc is not that bad Interestingly, ptmalloc3 (http://www.malloc.de/) and nedmalloc both derive from the dlmalloc (http://gee.cs.oswego.edu/malloc.h) library from Doug Lea. Consequently, their performance is similar (nedmalloc being slightly faster). I have been able to verify this on the Linux box.
[naviserver-devel] Quest for malloc
Hi! I've tried libumem as Stephen suggested, but it is slower than the regular system malloc. libumem is really geared toward integration with mdb (the Solaris modular debugger) for memory debugging and analysis. But I've found: http://www.nedprod.com/programs/portable/nedmalloc/index.html and this looks more promising. I have run its (supplied) test and it seems that, at least speed-wise, the code is faster than the native OS malloc. I will now try to make it work on all platforms that we use (admittedly, it will not run correctly if you do not set -DNDEBUG to silence some assertions; this is of course not right and I have to see why/what). Anyway, perhaps a thing to try out... If you get any breath-taking news with the above, share it here. On my PPC PowerBook (1.5 GHz PPC, 512 MB memory) I get improvements over the built-in allocator of a factor of 3 (3 times better) with far less system overhead. I cannot say anything about the fragmentation; this has yet to be tested. Cheers Zoran
Re: [naviserver-devel] Quest for malloc
I also tried Hoard, Google tcmalloc, umem and some other rare mallocs I could find. Still, zippy beats everybody; I ran my speed test, not threadtest. Will try this one. Zoran Vasiljevic wrote: [...]
Re: [naviserver-devel] Quest for malloc
On 15.12.2006, at 19:59, Vlad Seryakov wrote: I also tried Hoard, Google tcmalloc, umem and some other rare mallocs I could find. Still, zippy beats everybody; I ran my speed test, not threadtest. Will try this one. Important: it is not only raw speed that matters, but also memory fragmentation (i.e. the lack of it). In our app we must frequently reboot the server (every couple of days), otherwise it just bloats. And... we made sure there are no leaks (we have Purified all the libs that we use)... I now have some experience with the (zippy) fragmentation, and I will try to make a testbed with this allocator and run it for several days to get some experience. Cheers Zoran