2010/1/29 Nick Coghlan <ncogh...@gmail.com>

> I wouldn't consider changing from bytecode to wordcode uncontroversial -
> the potential to have an effect on cache hit ratios means it needs to be
> benchmarked (the U-S performance tests should be helpful there).

It's quite strange, but the tests made so far suggest that wpython performs better on older architectures (such as my Athlon64 socket 754), which have fewer resources such as caches. It'll be interesting to check how it works on more limited ISAs. I'm especially curious about ARMs.

> It's the same basic problem where any changes to the ceval loop can have
> surprising performance effects due to the way they affect the compiled
> switch statements ability to fit into the cache and other low level
> processor weirdness.
>
> Cheers,
> Nick.

Sure, but consider that with wpython wordcodes require less space on average. Also, fewer instructions are executed inside the ceval loop, thanks to some natural instruction grouping.

For example, in wpython 1.1 I recently introduced a new opcode to handle generator expressions more efficiently. It's mapped as a unary operator, so it exposes interesting properties which I'll show you with an example.

    def f(a):
        return sum(x for x in a)

With CPython 2.6.4 it generates:

     0 LOAD_GLOBAL              0 (sum)
     3 LOAD_CONST               1 (<code object <genexpr> at 00512EC8, file "<stdin>", line 1>)
     6 MAKE_FUNCTION            0
     9 LOAD_FAST                0 (a)
    12 GET_ITER
    13 CALL_FUNCTION            1
    16 CALL_FUNCTION            1
    19 RETURN_VALUE

With wpython 1.1:

     0 LOAD_GLOBAL              0 (sum)
     1 LOAD_CONST               1 (<code object <genexpr> at 01F13208, file "<stdin>", line 1>)
     2 MAKE_FUNCTION            0
     3 FAST_BINOP get_generator a
     5 QUICK_CALL_FUNCTION      1
     6 RETURN_VALUE

The new opcode is GET_GENERATOR, which is equivalent (but more efficient, since it uses a faster internal function call) to:

    GET_ITER
    CALL_FUNCTION            1

The compiler initially generated the following opcodes:

    LOAD_FAST                0 (a)
    GET_GENERATOR

then the peepholer recognized the pattern UNARY(FAST), and produced the single opcode:

    FAST_BINOP get_generator a

In the end, the ceval loop executes a single instruction instead of three (LOAD_FAST, GET_ITER and CALL_FUNCTION). The wordcode requires 14 bytes to be stored instead of 20, so it will use 1 data cache line instead of 2 on CPUs with 16-byte data cache lines.

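As a quick sanity check of the sizes quoted above, here is a minimal Python sketch. It is not part of wpython or CPython. The helper names `code_size` and `cache_lines` are made up for illustration. It assumes that RETURN_VALUE takes 1 byte in CPython 2.6 and one 16-bit word in wpython (which matches the 20 and 14 byte totals above), and that the code string starts on a cache line boundary unless `start` says otherwise.

```python
# Illustrative check only; not part of wpython or CPython.

# Offset of the last instruction (RETURN_VALUE) in each listing above.
CPYTHON_LAST_OFFSET = 19   # counted in bytes
WPYTHON_LAST_OFFSET = 6    # counted in 16-bit words


def code_size(last_offset, last_length, unit_bytes):
    """Total size in bytes, given the last instruction's offset and
    length (both in units) and the size of one unit in bytes."""
    return (last_offset + last_length) * unit_bytes


def cache_lines(size, line_size=16, start=0):
    """Number of cache lines touched by `size` bytes that begin at
    byte address `start` (assumption: start=0 means line-aligned)."""
    first = start // line_size
    last = (start + size - 1) // line_size
    return last - first + 1


# Assumption: RETURN_VALUE is 1 byte (CPython 2.6) and 1 word (wpython).
cpython_bytes = code_size(CPYTHON_LAST_OFFSET, 1, 1)
wpython_bytes = code_size(WPYTHON_LAST_OFFSET, 1, 2)

print(cpython_bytes, cache_lines(cpython_bytes))
print(wpython_bytes, cache_lines(wpython_bytes))
```

The one-line-versus-two result depends on alignment: with the same helper, `cache_lines(wpython_bytes, start=8)` also spans two 16-byte lines, so the figure above should be read as the aligned case.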
The same grouping behavior happens with binary operators as well. Opcode aggregation is a natural and useful concept with the new wordcode structure.

Cheers,
Cesare
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com