On 15.01.2007 at 22:22, Mike wrote:


Zoran, I believe you misunderstood.  The "patch" above limits blocks
allocated by your tester to 16000 instead of 16384 bytes.  The reason
for this is that Zippy's "largest bucket" is configured to be
16284-sizeof(Block) bytes (note the "2" in 16_2_84 is _NOT_ a typo).
By making uniformly random request sizes up to 16_3_84, you are
causing Zippy to fall back to system malloc for a small fraction of
requests, substantially penalizing its performance in those cases.
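
To make that concrete, the cap amounts to something like this (a
minimal sketch of the tester change with hypothetical names, not the
actual patch; 16000 keeps every request under 16284-sizeof(Block)):

    #include <stdlib.h>

    #define MAX_REQUEST 16000   /* stays below Zippy's largest bucket */

    /* Uniformly random request size in [1, MAX_REQUEST], so no
     * request ever falls through to system malloc. */
    static size_t
    RandomRequestSize(void)
    {
        return (size_t)(rand() % MAX_REQUEST) + 1;
    }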

Ah! That's right. I will fix that.


You wanted to know why Zippy is slower in your test; this is the
reason.  It has a substantial impact on FreeBSD and Linux, and my
guess is that it will have a dramatic effect on Mac OS X.

I will check that tomorrow on my machines.

The benefit of mmap() is being able to release memory back to the
system "for sure".  The drawback is that it always incurs substantial
syscall overhead compared to malloc.  You decide which you prefer (I
think I would lean slightly toward mmap() for long-lived applications,
but not by much, since the syscall introduces a lot of variance and an
average performance degradation).
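
For concreteness, the mechanics look roughly like this (a minimal
sketch of the tradeoff, not NaviServer code):

    #include <stddef.h>
    #include <sys/mman.h>

    /* An mmap()-backed block can always be handed back to the
     * system; the price is a syscall on every alloc and release. */
    static void *
    AllocReleasable(size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        return (p == MAP_FAILED) ? NULL : p;
    }

    static void
    FreeReleasable(void *p, size_t len)
    {
        munmap(p, len);   /* pages go back to the OS immediately */
    }

With malloc()/free(), the same block would merely return to the
allocator's free list, and the process footprint rarely shrinks.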

Yep. I agree. I would avoid it if possible. But I know of no other
call that reliably returns memory! I see that most (all?) of the
allocators I know just keep everything allocated and never return it.


How about adding this to the code?

I think the most obvious replacement is just using an if "tree":
if (size > 0xff) bucket += 8, size >>= 8;
if (size > 0xf)  bucket += 4, size >>= 4;
...
it takes a minute to get the math right, but the performance gain
should be substantial.
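
Here is an untested sketch of the full tree (my own illustration,
assuming power-of-two buckets and sizes up to 64K; the math works
out by operating on size-1 so the result matches the loop exactly):

    #include <assert.h>
    #include <stddef.h>

    /* Loop version, as in the current code: find the smallest
     * bucket whose block size (1 << bucket) holds the request. */
    static int
    BucketLoop(size_t size)
    {
        int bucket = 0;
        while (((size_t)1 << bucket) < size) {
            ++bucket;
        }
        return bucket;
    }

    /* Branch-tree version: the same index in at most four
     * comparisons for sizes from 2 up to 64K, with no loop. */
    static int
    BucketTree(size_t size)
    {
        size_t n = size - 1;    /* assumes size >= 2 */
        int bucket = 1;
        if (n > 0xff) { bucket += 8; n >>= 8; }
        if (n > 0xf)  { bucket += 4; n >>= 4; }
        if (n > 0x3)  { bucket += 2; n >>= 2; }
        if (n > 0x1)  { bucket += 1; }
        return bucket;
    }

    int
    main(void)
    {
        size_t s;
        for (s = 2; s <= 65536; ++s) {
            assert(BucketLoop(s) == BucketTree(s));
        }
        return 0;
    }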

Well, I can test that all right. I have the feeling that a tight
loop like that (it will mostly spin 5-12 times) compiles to quite
good machine code, but it is better to test.


In my tests, because these functions are called so frequently, they
contribute 10% to 15% performance overhead.

Yes. That is what I was also getting. OTOH, the speed difference
between VT and Zippy was sometimes several orders of magnitude,
so I simply ignored that.

Ha! It is pretty simple: you can atomically check pointer equivalence
without risking a core dump (at least this is my experience). You are
not expected to make far-reaching decisions based on it, though.
In this particular example, even if the test were false, there would
be no "harm" done; just a suboptimal path would be selected.
I have marked that "Dirty read" to draw people's attention to that
place. And I obviously succeeded :-)

The dirty read I have no problem with.  It's the possibility of
taking the head element, which could be placed there by another
thread, that bothers me.

Ah, this will not happen, as I take the global mutex at that point,
so the pagePtr->p_cachePtr cannot be changed under our feet.
If that block was allocated by the current thread, the p_cachePtr
will not be changed by anybody, so no harm done. If it is not, then
we must lock the global mutex to prevent anybody from fiddling with
that element. It is tricky, but it should work.
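
In sketch form, the pattern is roughly this (made-up names, not the
actual allocator code):

    #include <pthread.h>

    typedef struct Cache Cache;   /* hypothetical per-thread cache */
    typedef struct Page {
        Cache *p_cachePtr;        /* cache that owns this page */
    } Page;

    static pthread_mutex_t globalLock = PTHREAD_MUTEX_INITIALIZER;

    static void
    FreeBlock(Page *pagePtr, Cache *myCachePtr)
    {
        if (pagePtr->p_cachePtr == myCachePtr) {
            /* Dirty read: comparing the pointer cannot fault, and
             * a stale answer only costs us the fast path.  Our own
             * cache pointer cannot change under us, so no harm. */
            /* ... put block on this thread's free list ... */
        } else {
            /* Foreign block: take the global mutex so p_cachePtr
             * cannot change under our feet while we hand it back. */
            pthread_mutex_lock(&globalLock);
            /* ... return block to its owning cache ... */
            pthread_mutex_unlock(&globalLock);
        }
    }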

It sounds like you are in the best position to test this change to see
if it fixes the "unbounded" growth problem.


Yes! Indeed. The only thing I'd have to check is how much more
memory this will take. But it is certainly worth trying out,
as it will be a temporary relief for our users until we stress-test
the VT to the max so I can include it in our standard distro.



