My experience tuning our Java VM is similar, and it runs on a lot of
different CPUs.  Still, there is reason to be suspicious of very small
changes, which might be repeatable for our benchmark set, yet have no
real meaning for normal use.  And there is reason to be careful not to
waste time pursuing 3% tweaks instead of going for 100% breakthroughs.

On Thursday, May 8, 2003, at 09:11 America/Denver, Darin Adler wrote:
On Thursday, May 8, 2003, at 07:04 AM, Beman Dawes wrote:

A 2-3% timing difference probably isn't reliably repeatable in real code.

How code and data happen to land in hardware caches can easily swamp such a small difference. The version-to-version or step-to-step differences in CPUs, memory, compilers, or operating systems can cause that much difference in a given program. Differences need to get up into the 20-30% range before they are likely to be reliably repeatable across different systems.

At least that's been my experience.

That has not been my recent experience. While working on my current project (the Safari web browser), we have routinely made 1% speedups that are measurable and have an effect across multiple machines and compilers (same basic CPU type and operating system), and we have also detected 1% slowdowns when we inadvertently introduced them.


They add up. Ten 1% speedups result in a 9.5% speedup.
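A quick check of the compounding arithmetic. This sketch assumes each change is independent and scales whatever cost remains by a fixed factor; the helper names are illustrative, not from any library. Ten changes that each cut run time by 1% leave 0.99^10 ≈ 0.904 of the original time, roughly a 9.6% reduction, in line with the figure above. Ten 1% throughput gains compound to 1.01^10 ≈ 1.105, about 10.5% faster.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the original run time left after `n` independent changes,
// each of which removes `fraction` of the time remaining (0.01 for 1%).
double remaining_time(int n, double fraction) {
    assert(n >= 0 && fraction >= 0.0 && fraction < 1.0);
    return std::pow(1.0 - fraction, n);
}

// Total throughput gain after `n` independent changes that each raise
// throughput by `fraction` (0.01 for 1%).
double throughput_gain(int n, double fraction) {
    assert(n >= 0 && fraction >= 0.0);
    return std::pow(1.0 + fraction, n) - 1.0;
}
```

Either way of counting, small wins multiply rather than merely add, which is why a string of 1% changes is worth tracking.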

It's true that differences in CPUs, memory, compilers, and operating systems can have a large effect on performance, but that does not mean that changes yielding such small increases are not worthwhile.

In our project, a 3% speed increase is considered a cause for celebration.

I'm not sure, though, if this negates your point, Beman. Something that gives a 2-3% speedup for one Boost user might not be worth any level of obfuscation unless we can prove it provides a similar speedup for other Boost users.

-- Darin

PS: On the occasions where you can fix an algorithm in a way that gives a 10x speed increase, or a 25% one, that's even more exciting. To give you an idea of what I'm talking about, here's a log from a week I spent increasing the speed of our JavaScript library:

Monday, November 18, 2002
- sped up JavaScript iBench by 70% by using a better sort algorithm and reducing the number of UString allocations
- sped up JavaScript iBench by 6% by turning ExecState into a simple object instead of a two-level abstraction
- sped up JavaScript iBench by 7% by turning the property map into a hash table and improving String instance handling
- sped up JavaScript iBench by 6% by hoisting the property map into ObjectImp and doing less ref/unref
- sped up JavaScript iBench by 1.5% by converting integers into strings with custom code rather than sprintf
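The last Monday item is easy to illustrate. This is a sketch, not the library's actual code, and the function name int_to_string is hypothetical: a dedicated integer formatter can write digits backwards into a small stack buffer and skip sprintf's format-string parsing entirely.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: format an int as decimal text by peeling off digits
// from the right, with no format string to parse.
std::string int_to_string(int value) {
    // digits10 + 1 digits, one sign character, one spare.
    char buffer[std::numeric_limits<int>::digits10 + 3];
    char* const end = buffer + sizeof buffer;
    char* p = end;

    // Negate in unsigned arithmetic so the most negative int does not overflow.
    unsigned int magnitude = value < 0 ? 0u - static_cast<unsigned int>(value)
                                       : static_cast<unsigned int>(value);
    do {
        *--p = static_cast<char>('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    if (value < 0)
        *--p = '-';
    assert(p >= buffer);  // the buffer is sized for the widest int
    return std::string(p, end);
}
```

Whether this beats the standard facilities on a particular platform is exactly the kind of thing to confirm with the benchmark rather than assume.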


Tuesday, November 19, 2002
- sped up JavaScript iBench by 2% by using masking instead of modulus in the property map hash table code
- sped up JavaScript iBench by 3% by improving the implementation of the "perfect hashing" hash tables used for static properties
- sped up JavaScript iBench by 1.5% by storing computed hash values in the UString representation so we don't recompute them
- sped up JavaScript iBench by 6.5% by atomizing property identifiers
- sped up JavaScript iBench by 1.5% by not clearing and rebuilding the list each time during sorting
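Two of the Tuesday items, masking instead of modulus and caching computed hash values, combine naturally. The sketch below is a hypothetical open-addressing string set (StringSet is an invented name), not the property map itself, and std::hash stands in for whatever hash function a real engine would use. Keeping the capacity a power of two makes hash & (capacity - 1) equal to hash % capacity, so a division becomes a single AND. Storing each key's hash in its slot means lookups compare hashes before strings and growing the table never rehashes a string.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical open-addressing string set (not the real property map).
// Capacity is always a power of two, so `hash & mask_` picks the bucket
// without a modulus, and each slot caches its key's hash.
class StringSet {
public:
    explicit StringSet(std::size_t capacity = 8)
        : slots_(capacity), mask_(capacity - 1) {
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    bool insert(const std::string& key) {
        if ((count_ + 1) * 2 > slots_.size())  // keep load factor <= 1/2
            grow();
        const std::size_t hash = std::hash<std::string>{}(key);
        std::size_t i = hash & mask_;          // same as hash % capacity
        while (slots_[i].used) {
            // Compare cached hashes first; compare strings only on a match.
            if (slots_[i].hash == hash && slots_[i].key == key)
                return false;
            i = (i + 1) & mask_;               // linear probing, wrapping by mask
        }
        slots_[i] = Slot{true, hash, key};
        ++count_;
        return true;
    }

    bool contains(const std::string& key) const {
        const std::size_t hash = std::hash<std::string>{}(key);
        for (std::size_t i = hash & mask_; slots_[i].used; i = (i + 1) & mask_)
            if (slots_[i].hash == hash && slots_[i].key == key)
                return true;
        return false;
    }

    std::size_t size() const { return count_; }

private:
    struct Slot {
        bool used = false;
        std::size_t hash = 0;  // cached so grow() never rehashes the string
        std::string key;
    };

    void grow() {
        std::vector<Slot> old = std::move(slots_);
        slots_ = std::vector<Slot>(old.size() * 2);
        mask_ = slots_.size() - 1;
        for (Slot& s : old) {
            if (!s.used)
                continue;
            std::size_t i = s.hash & mask_;    // reuse the cached hash
            while (slots_[i].used)
                i = (i + 1) & mask_;
            slots_[i] = std::move(s);
        }
    }

    std::vector<Slot> slots_;
    std::size_t mask_;
    std::size_t count_ = 0;
};
```

The load factor is capped at one half so probing always finds an empty slot; a real table would tune that limit by measurement.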


Wednesday, November 20, 2002
- sped up JavaScript iBench by 5% by decreasing the amount of ref/deref done by changing interfaces so they can deal directly with ValueImp
- sped up JavaScript iBench by 1% by making lists ref/unref less
- sped up JavaScript iBench by 7.5% by creating argument list objects only on demand rather than for each function call
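The on-demand argument list can be sketched as lazy initialization. CallFrame and ArgumentsObject are hypothetical names, doubles stand in for script values, and 0.0 stands in for undefined; the point is only that the heap object is built the first time a function body asks for it, so calls that never touch it never pay for it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a script-visible `arguments` object.
struct ArgumentsObject {
    std::vector<double> values;
};

// Hypothetical call frame. The raw argument values are always stored, but
// the heap-allocated ArgumentsObject is built only the first time the
// function body asks for it.
class CallFrame {
public:
    explicit CallFrame(std::vector<double> args) : args_(std::move(args)) {}

    // Fast path used by ordinary parameter access: no object is created.
    // Missing arguments read as 0.0 here, standing in for `undefined`.
    double argument(std::size_t i) const {
        return i < args_.size() ? args_[i] : 0.0;
    }

    // Slow path: create the object on demand and reuse it afterwards.
    ArgumentsObject& arguments() {
        if (!argumentsObject_) {
            argumentsObject_ = std::make_unique<ArgumentsObject>(ArgumentsObject{args_});
            ++objectsCreated;
        }
        return *argumentsObject_;
    }

    static int objectsCreated;  // allocation counter, for illustration only

private:
    std::vector<double> args_;
    std::unique_ptr<ArgumentsObject> argumentsObject_;
};

int CallFrame::objectsCreated = 0;
```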


Thursday, November 21, 2002
- sped up JavaScript iBench by 3% by allocating the ActivationImp objects on the stack rather than in the garbage-collected heap
- sped up JavaScript iBench by 11% by turning the scope chain into a singly-linked list that shares tails rather than a non-sharing doubly-linked list with subtly different semantics
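A rough illustration of the scope-chain change, assuming std::shared_ptr for node lifetime (a real engine would more likely rely on its own reference counting or its garbage collector). Scope, ScopeChainNode, push, and depth are hypothetical names. Because nodes are immutable and each points only to its parent, two functions defined in the same scope can extend one chain without copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical scope record; a real engine would hold variable bindings.
struct Scope {
    std::string name;
};

// Immutable singly-linked list node. A chain is a pointer to its innermost
// node; pushing a scope allocates one node whose `next` is the old chain,
// so chains that extend the same parent share all of the parent's nodes.
struct ScopeChainNode {
    std::shared_ptr<Scope> scope;
    std::shared_ptr<const ScopeChainNode> next;
};

using ScopeChain = std::shared_ptr<const ScopeChainNode>;

// O(1): nothing in the existing chain is copied or modified.
ScopeChain push(const ScopeChain& chain, std::shared_ptr<Scope> scope) {
    return std::make_shared<ScopeChainNode>(ScopeChainNode{std::move(scope), chain});
}

// Walk from the innermost scope outwards, as name lookup would.
std::size_t depth(ScopeChain chain) {
    std::size_t n = 0;
    for (; chain; chain = chain->next)
        ++n;
    return n;
}
```

A doubly-linked, non-sharing list would instead need a private copy of the chain for each closure, which is the cost this structure avoids.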


Friday, November 22, 2002
- sped up JavaScript iBench by 10% by changing the List class from a linked list to a vector, and using a pool of instances
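The Friday change pairs contiguous storage with recycling. In this minimal sketch, ValueList and ListPool are hypothetical names and doubles stand in for script values: lists are plain std::vector objects, and a small pool keeps released lists so their buffers are reused instead of being freed and reallocated on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical list of script values stored contiguously in a vector
// instead of as linked nodes: one allocation for many elements and
// cache-friendly iteration.
using ValueList = std::vector<double>;

// Hypothetical pool that recycles lists: release() clears a list and keeps
// it, and acquire() hands it out again, so its buffer is reused.
class ListPool {
public:
    std::unique_ptr<ValueList> acquire() {
        if (free_.empty())
            return std::make_unique<ValueList>();
        std::unique_ptr<ValueList> list = std::move(free_.back());
        free_.pop_back();
        return list;
    }

    void release(std::unique_ptr<ValueList> list) {
        list->clear();  // destroys the elements; capacity is left unchanged
        free_.push_back(std::move(list));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<ValueList>> free_;
};
```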


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
