At 10:25 PM 2/8/2009, you wrote:
>Don't get me started guys!
>
>Suspects (might have no improvement, might be all the difference in the
>world):
>- Their 'all in one binary' means that aside from the math functions they
>specifically program, 90% of code is generated for a featureless i486
>platform. Providing an advanced binary that's all -msse2 -mssse3
>-ffast-math/etc might have a massive improvement.
I think compiling with -fPIC reduces the ability to optimize further, at least on older GCC. I need to analyze that on an older GCC to see what is different.

>- fPIC might be lagging the call heavy binary and clobbering a register
>causing issues.

Indeed. Tying up %ebx (on 32-bit x86, -fPIC reserves it as the GOT pointer, leaving one fewer general-purpose register) does more harm than good, even though people will say differently.

>- The engine spends about 8% of its time in bitbuffers. Reading an int out of
>a bit buffer requires loading 5 bytes, applying a mask, bitshifting them,
>adding them. A more intelligent data structure might reduce this by a factor
>of 10 (or, again, have no effect)

The engine also reads the clock HZ times a second. Do you really need to call the clock that many times? Solution: cache the last timestamp and update it in the engine only when necessary. Or use rdtsc, or open /dev/hpet and read it (the former is the better choice; most newer systems have TSCs that don't require a cpuid instruction).

>- Realtime scheduling in linux yields absolutely massive performance
>increases, though the linux scheduler is generally regarded as pretty good.
>Something about how their frameloop expects/yields CPU time could probably
>use some tweaking.

They are still using usleep()s, which are only good down to about 10^-4 or 10^-5 seconds. They should just use nanosleep() and avoid a couple of extra code paths (not really an optimization; it just saves a couple of steps that glibc does). But extra accuracy increases CPU overhead :P

A better solution would be to profile the engine completely to see what it's spending its time doing.

>- Real 64-bit binaries would eliminate context switches, making gettimeofday
>a virtual syscall, and probably do everyone a lot of favours. Oh, and fPIC
>on 64-bit has almost no performance penalty.

Indeed. 64-bit code will run better overall; vdso/vgettimeofday/vclock_gettime let those time calls run without a syscall interrupt. It's not really a performance increase by itself, but calling syscalls that often eats up CPU cycles and causes cacheline bounces.
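For reference, the 5-byte read described in the quoted bitbuffer point looks roughly like the sketch below. It is not the engine's code: it assumes a hypothetical LSB-first layout (bit 0 is the low bit of byte 0), and read_bits() is a made-up name. A 32-bit field starting at a non-byte-aligned offset can span 5 bytes, which is where the 5 loads come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit buffer: bit 0 is the low bit of buf[0].
 * Reads nbits (1..32) starting at absolute bit offset bitpos.
 * A field that does not start on a byte boundary can straddle up to
 * 5 bytes, so those bytes are loaded, shifted into place, combined,
 * shifted down and masked. */
static uint32_t read_bits(const uint8_t *buf, size_t bitpos, unsigned nbits)
{
    size_t byte = bitpos >> 3;                  /* first byte touched */
    unsigned shift = (unsigned)(bitpos & 7);    /* bit offset inside that byte */
    unsigned nbytes = (shift + nbits + 7) / 8;  /* 1..5 bytes */
    uint64_t acc = 0;

    /* Assemble the bytes little-endian into a 64-bit accumulator
     * (OR equals add here because the byte ranges do not overlap). */
    for (unsigned i = 0; i < nbytes; i++)
        acc |= (uint64_t)buf[byte + i] << (8 * i);

    acc >>= shift;
    /* Avoid the undefined 1u << 32 when nbits == 32. */
    uint32_t mask = (nbits == 32) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    return (uint32_t)(acc & mask);
}
```

A "more intelligent" layout would aim to replace that per-byte loop with fewer, wider loads (for example, fields aligned to byte boundaries), but whether that helps depends on the engine's real format.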
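A minimal sketch of the "cache the last timestamp" idea, assuming a POSIX system: read CLOCK_MONOTONIC once per frame and have everything else read the cached value. The names (g_frame_time_ns, now_ns, frame_clock_update, frame_clock_get) are hypothetical. On x86-64 Linux, clock_gettime() is normally served from the vDSO, so even the one real call per frame is cheap. An rdtsc-based now_ns() could be swapped in, but TSC ticks would first have to be calibrated to nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical per-frame time cache: the clock is read once per frame
 * and every subsystem uses the cached value instead of its own call. */
static uint64_t g_frame_time_ns;

/* One real clock read, in nanoseconds since an arbitrary start point. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Call once at the top of each frame. */
static void frame_clock_update(void)
{
    g_frame_time_ns = now_ns();
}

/* Cheap read used everywhere else during the frame: no syscall, no vDSO call. */
static uint64_t frame_clock_get(void)
{
    return g_frame_time_ns;
}
```

The trade-off is that everything within a frame sees the same timestamp, which is usually what a tick-based server wants anyway.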
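And a sketch of a frame loop built on nanosecond sleeps. This swaps in POSIX clock_nanosleep() with TIMER_ABSTIME rather than plain nanosleep(): sleeping to an absolute deadline keeps the time spent inside each frame from accumulating as drift, which a fixed usleep() after every frame cannot do. frame_loop() and run_frame are hypothetical names, not the engine's API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by period_ns, keeping tv_nsec normalised. */
static void timespec_add_ns(struct timespec *t, long period_ns)
{
    t->tv_nsec += period_ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Hypothetical frame loop: run_frame stands in for one server tick.
 * Each frame's deadline is start + i * period, so a slow frame simply
 * shortens the following sleep instead of pushing every later frame back. */
static void frame_loop(long tick_rate_hz, int frames, void (*run_frame)(void))
{
    const long period_ns = NSEC_PER_SEC / tick_rate_hz;
    struct timespec deadline;
    clock_gettime(CLOCK_MONOTONIC, &deadline);

    for (int i = 0; i < frames; i++) {
        run_frame();
        timespec_add_ns(&deadline, period_ns);
        /* clock_nanosleep() returns the error number directly. On EINTR,
         * retry with the same absolute deadline; no remaining-time
         * bookkeeping is needed. A deadline already in the past returns at once. */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL) == EINTR)
            ;
    }
}
```

As noted above, finer sleeps are not free: waking more precisely and more often costs CPU, so the tick rate passed in is the real knob.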
_______________________________________________ To unsubscribe, edit your list preferences, or view the list archives, please visit: http://list.valvesoftware.com/mailman/listinfo/hlds_linux