At 10:25 PM 2/8/2009, you wrote:
>Don't get me started guys!
>
>Suspects (might have no improvement, might be all the difference in the
>world):
>- Their 'all in one binary' means that aside from the math functions they
>specifically program, 90% of code is generated for a featureless i486
>platform. Providing an advanced binary that's all -msse2 -mssse3
>-ffast-math/etc might have a massive improvement.

I think compiling with -fPIC limits how far the compiler can 
optimize, at least on older GCC.
I need to do an analysis of that on an older gcc to see what is different.

>- fPIC might be lagging the call heavy binary and clobbering a register
>causing issues.

Indeed. Clobbering %ebx does more harm than good, even though people 
will say otherwise.

>- The engine spends about 8% of its time in bitbuffers. Reading an int out of
>a bit buffer requires loading 5 bytes, applying a mask, bitshifting them,
>adding them. A more intelligent data structure might reduce this by a factor
>of 10 (or, again, have no effect)
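
For what it's worth, here is a rough sketch of what a cheaper read could 
look like: one unaligned 64-bit load, one shift, one mask. This is NOT the 
engine's bitbuffer code. read_bits() is a made-up name, and it assumes a 
little-endian host, LSB-first bit order, and at least 8 readable bytes 
from the byte the read starts in (so the buffer needs padding at the end).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the engine's API.
 * Reads nbits (1..32) starting at bit offset bitpos.
 * Assumes: little-endian host, LSB-first bit order, and that
 * 8 bytes starting at buf + bitpos/8 are readable (padded buffer). */
static uint32_t read_bits(const uint8_t *buf, size_t bitpos, unsigned nbits)
{
    uint64_t word;

    /* memcpy is the portable way to do an unaligned load;
     * compilers turn it into a single mov on x86. */
    memcpy(&word, buf + (bitpos >> 3), sizeof word);

    /* Drop the bits below the start position (0..7 of them). */
    word >>= (bitpos & 7);

    /* Keep the low nbits; nbits <= 32, so the 64-bit shift cannot overflow. */
    return (uint32_t)(word & ((1ull << nbits) - 1));
}
```

Whether this buys anything depends on how the real buffer is laid out, so 
it would have to be measured.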

The engine also reads the clock HZ times per second. Do you really 
need to call the clock that many times? Solution:
cache the last timestamp and update it to the engine only when 
necessary. Or use rdtsc or open /dev/hpet and read it. (The former is 
the better choice; most newer systems have TSCs that don't require a 
cpuid instruction.)
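
A minimal sketch of the caching idea, assuming the engine has some 
once-per-frame hook to call it from. The names frame_clock_update() and 
frame_clock_now() are made up for illustration; nothing here is taken 
from the engine.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Last timestamp handed to the engine, in nanoseconds. */
static uint64_t cached_now_ns;

/* Hypothetical per-frame hook: reads the monotonic clock once
 * and stores the result. Returns the fresh timestamp. */
static uint64_t frame_clock_update(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    cached_now_ns = (uint64_t)ts.tv_sec * 1000000000ull
                  + (uint64_t)ts.tv_nsec;
    return cached_now_ns;
}

/* Cheap accessor for everything else in the frame:
 * a plain memory read, no clock access at all. */
static inline uint64_t frame_clock_now(void)
{
    return cached_now_ns;
}
```

The tradeoff is that everything inside one frame sees the same time, 
which is only fine for code that doesn't need sub-frame resolution.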

>- Realtime scheduling in linux yields absolutely massive performance
>increases, though the linux scheduler is generally regarded as pretty good.
>Something about how their frameloop expects/yields CPU time could probably
>use some tweaking.

They are still using usleep(), which is only good down to 1e-4 or 
1e-5 seconds. They should just use nanosleep() and avoid a couple of
extra paths (not really an optimization, it just saves a couple of 
steps that glibc does). But extra accuracy increases CPU overhead :P
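
Roughly what I mean, as a sketch. sleep_ns() is a made-up wrapper name, 
and the retry on EINTR is my own assumption about what the frame loop 
would want, not something the engine is known to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical wrapper: sleep for ns nanoseconds using nanosleep().
 * If a signal interrupts the sleep, resume with the remaining time.
 * Returns 0 on success, -1 on any error other than EINTR. */
static int sleep_ns(long ns)
{
    struct timespec req = { ns / 1000000000L, ns % 1000000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* continue with what was left of the sleep */
    }
    return 0;
}
```

How close the actual wakeup is to the requested time still depends on the 
kernel's timer resolution and the scheduler, not on which call you pick.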

A better solution would be to just completely profile the engine 
to see what it's spending its time doing.

>- Real 64-bit binaries would eliminate context switches, making gettimeofday
>a virtual syscall, and probably do everyone a lot of favours. Oh, and fPIC
>on 64-bit has almost no performance penalty.

Indeed. 64-bit code will run better overall; 
vdso/vgettimeofday/vclock_gettime allow a syscall without an 
interrupt. Not really a performance increase, but
calling syscalls so often eats up CPU cycles and causes cacheline bounces.


_______________________________________________
To unsubscribe, edit your list preferences, or view the list archives, please 
visit:
http://list.valvesoftware.com/mailman/listinfo/hlds_linux
