Well, if you're not starting your timer until after you initialize your array, 
the VM manager shouldn't be what's slowing you down with a larger number of
threads.  One thing to remember, though, is that just doing a new or malloc()
or similar really only gives you a pointer to virtual memory reserved for your
process.  The OS doesn't give your process actual physical memory until it
tries to modify the contents.  That's true on most any modern OS.  (You'd be
surprised how much that can get in your way - compare reading a gigabyte file
into memory from a 4-Gbps Fibre Channel SAN, once into a buffer you've already
run memset() over, and once straight into a new buffer off the heap that
hasn't been touched yet.)
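
If you want to see the first-touch cost on your own box, here's a minimal
sketch, assuming a POSIX system with clock_gettime().  The helper name
time_first_touch() is made up for illustration.  It runs memset() over a fresh
malloc() block twice and times each pass; the first pass is the one that has
to get physical pages from the VM system.  Use a big block (tens of MB or
more), since a small malloc() may just hand back heap memory your process has
already touched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: allocate len bytes, run memset() over the block twice,
 * and report how long each pass took.  Returns 0 on success, -1 if len is 0
 * or malloc() fails.
 */
static int time_first_touch(size_t len, double *first, double *second)
{
    /* Call memset() through a volatile pointer so the compiler cannot
     * drop or merge the two passes. */
    void *(*volatile fill)(void *, int, size_t) = memset;
    unsigned char *buf;
    double t0, t1, t2;

    assert(first != NULL && second != NULL);
    if (len == 0)
        return -1;
    buf = malloc(len);
    if (buf == NULL)
        return -1;

    t0 = now_sec();
    fill(buf, 1, len);  /* first touch: pages have to be faulted in */
    t1 = now_sec();
    fill(buf, 2, len);  /* second pass: pages are already mapped */
    t2 = now_sec();

    *first = t1 - t0;
    *second = t2 - t1;
    free(buf);
    return 0;
}
```

Call it with something like time_first_touch((size_t)256 << 20, &a, &b) and
print both numbers.  How big the gap is depends on the OS, the allocator and
the page size, so treat the result as a rough indication only.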

There's also the possibility that with a large number of threads you're
overwhelming the virtual memory manager as it tries to create memory pages for
each of your threads.  Depending on how each thread goes about processing, they
may not modify their stack memory until they actually start processing data.
If all your threads begin processing at about the same time and they all
suddenly need physical pages for their stacks, the virtual memory manager is
going to be swamped.  You can eliminate this problem by explicitly creating the
stack memory for each thread yourself, and doing a memset() on the block before
starting the respective thread.  (And memset() is going to be a lot faster than
any loop you write yourself.)
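
Here's a rough sketch of the pre-touched stack idea, assuming POSIX threads.
start_prefaulted_thread(), sample_worker() and the 1 MB STACK_SIZE are names
and values I picked for illustration, not anything from your code.  It
allocates a page-aligned block, runs memset() over it, and hands it to the
thread with pthread_attr_setstack().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stack size for this sketch; size it for the real workload. */
#define STACK_SIZE ((size_t)1 << 20) /* 1 MB */

/* Example worker, only here so there is something to run. */
static void *sample_worker(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}

/*
 * Start fn(arg) on a stack that was allocated and touched up front.
 * On success returns 0 and stores the stack block in *stack_out; the caller
 * frees it after pthread_join().  On failure returns an errno-style code.
 */
static int start_prefaulted_thread(pthread_t *tid, void *(*fn)(void *),
                                   void *arg, void **stack_out)
{
    pthread_attr_t attr;
    void *stack = NULL;
    long page = sysconf(_SC_PAGESIZE);
    int rc;

    if (page <= 0)
        return EINVAL;

    /* Page-aligned block to serve as the thread's stack. */
    rc = posix_memalign(&stack, (size_t)page, STACK_SIZE);
    if (rc != 0)
        return rc;

    /* Touch every page now, before the thread exists. */
    memset(stack, 0, STACK_SIZE);

    rc = pthread_attr_init(&attr);
    if (rc == 0) {
        rc = pthread_attr_setstack(&attr, stack, STACK_SIZE);
        if (rc == 0)
            rc = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
    }
    if (rc != 0) {
        free(stack);
        return rc;
    }
    *stack_out = stack;
    return 0;
}
```

Start each thread with start_prefaulted_thread(&tid, sample_worker, &value,
&stack), then pthread_join() it and free(stack).  Keep in mind you get no
guard page when you supply the stack yourself, so a stack overflow will
quietly scribble on whatever sits next to the block.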

If not that, you could also be seeing a constant processing time for smaller
numbers of threads because your process is limited by the amount of memory
you're going through, which if I read your description correctly is a constant. 
 If my guess is true, doubling your RAM from 256MB to 512MB should cause the 
flat part of your performance plot to double from about 10 seconds to about 20 
seconds, and then you should see your times increasing once you get above 4k 
threads or so.

Above that number of threads, you're handling smaller and smaller chunks of 
data, and I suspect your message processing might be more limiting than the 
amount of data you're having all your threads process.

And yes, those hypotheses are based on the thread scheduler not being your 
problem.  What does running 16K threads get you on a 2-CPU host?  Resources on 
any computer are limited, and oversubscribing any one of them will slow you 
down.
 
 
This message posted from opensolaris.org