Hi all, I spent the last few days experimenting with alternative/faster ways to do attribute lookup.
For simplicity, I didn't touch the interpreter; I simply wrote a few RPython programs that do only the last part of the attribute lookup, i.e. accessing the underlying RPython dictionary. Moreover, I played only with dicts of strings, since this is how objects are implemented when multidict is enabled.

My first experiment was with precomputed hashes for keys: I introduced a new method get_fast on dictionaries, which accepts both the key and its hash value, so it does not need to compute the hash itself. Of course this is just a quick hack, because programs using get_fast work only when translated and can't run on top of CPython, but it was the simplest way to experiment. You can find both the get_fast patch and the benchmark here:

http://codespeak.net/svn/user/antocuni/lookup-hack/get_fast.patch
http://codespeak.net/svn/user/antocuni/lookup-hack/targethashcache.py

The interesting thing is the result of the benchmark: since RPython strings already cache their hash, I didn't expect get_fast to be much faster than get, but the benchmark says the opposite; on my machine::

    [EMAIL PROTECTED] lookup-hack $ ./targethashcache-c
    50000000 iterations
    get:      3.820000 seconds
    get_fast: 2.520000 seconds
    get_fast: 1.515873 times faster

It is 50% faster. I really don't know why; maybe there is something wrong in the benchmark, or maybe something is wrong with the hash cache of RPython strings.

The second and more interesting experiment is a bit more involved. The idea is that for most Python objects we can guess a set of "likely attributes" at compile time:

- for modules, by inspecting the global namespace;
- for classes, by inspecting the namespace of the class definition;
- for instances, by looking for LOAD_FAST 0, STORE_ATTR x inside the methods of the class.

Note that we don't pretend to catch all the possible attributes, just the most likely.
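As an aside, the get_fast idea from the first experiment can be sketched in plain Python; the class and method names below are mine for illustration, not the actual RPython patch::

```python
class FastDict:
    """Toy chained hash table sketching the get_fast idea: get_fast
    takes a precomputed hash, so the lookup itself never hashes the key."""

    def __init__(self, nbuckets=8):
        self.mask = nbuckets - 1          # nbuckets must be a power of two
        self.buckets = [[] for _ in range(nbuckets)]

    def setitem(self, key, value):
        bucket = self.buckets[hash(key) & self.mask]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # regular path: computes the hash on every call
        return self.get_fast(key, hash(key))

    def get_fast(self, key, h):
        # fast path: the caller supplies h, computed once ahead of time
        for k, v in self.buckets[h & self.mask]:
            if k == key:
                return v
        raise KeyError(key)

d = FastDict()
d.setitem('attr', 42)
h = hash('attr')          # in the real setup this would be precomputed
assert d.get('attr') == d.get_fast('attr', h) == 42
```

In the translated RPython version the saving is the whole hash computation per lookup; here it only saves one hash() call, but the shape of the API is the same.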
The assumption is that it's very uncommon to have a "non-likely attribute", and that for most objects the set of attributes at runtime is a subset of the likely attributes. The idea is to consistently speed up all accesses to likely attributes, at the cost of a slight slowdown when accessing the non-likely ones.

Once we have collected the set of likely attributes, we compute a perfect hash function for this set; for my experiment, I used a hacked version of this algorithm:

http://www.amk.ca/files/python/perfect_hash.txt

The perfect hash function looks like this, where N and G vary from one set of likely attributes to another::

    def perfect_hash(N, G, key):
        h = hash(key)  # normal hash
        i = h & N-1
        j = i+1
        res = G[i] + G[j]
        res = res & N-1
        return res

Moreover, since h depends only on key, we could also precompute it at compile time, as we did for get_fast::

    def perfect_hash(N, G, key, h):
        ...

Finally, we insert a fast path for the common case in the lookup code::

    def lookup(d, key, h):
        # h is the precomputed hash
        index = perfect_hash(d.N, d.G, key, h)
        entry = d.entries[index]
        if entry.is_valid() and entry.key is key:
            return entry.value
        else:
            # do a full lookup

Note that in the fast path we use 'is' instead of == to compare the keys; since the W_Strings storing the names of the attributes are interned, this should be enough. Note also that G and N are stored on the dict (thus we can have different perfect hashes for different sets of likely attributes, as we need).

If my assumption about likely attributes is true, most LOAD_ATTRs will follow the fast path, resulting in a considerable speedup: if the object only uses likely attributes, there will never be a collision. Most importantly, this does not break anything, because whenever the fast path is not taken (for example when we do {get,set}attr(obj, 'attr')), the full lookup still works.
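To make the scheme above concrete, here is a self-contained sketch in plain Python. It is not the algorithm from the link: build_G is a naive brute-force search that only works for tiny key sets, str_hash is a deterministic stand-in for RPython's cached string hash (CPython's hash() is randomized across runs), and the attribute names are made up. The shapes of perfect_hash and the fast-path lookup follow the snippets in the post::

```python
import itertools
import sys

def str_hash(s):
    # Deterministic string hash, standing in for RPython's cached hash.
    h = 0
    for c in s:
        h = (h * 31 + ord(c)) & 0xFFFFFFFF
    return h

def perfect_hash(N, G, key, h=None):
    # The function from the post; N is a power of two and G has N+1
    # entries, because j = i + 1 can index position N.
    if h is None:
        h = str_hash(key)        # in the real scheme h is precomputed
    i = h & (N - 1)
    j = i + 1
    return (G[i] + G[j]) & (N - 1)

def build_G(keys, N):
    # Naive brute-force search for G; the linked algorithm constructs
    # G far more cleverly, this just makes the sketch runnable.
    for G in itertools.product(range(N), repeat=N + 1):
        slots = {perfect_hash(N, G, k) for k in keys}
        if len(slots) == len(keys):   # injective on the likely attributes
            return list(G)
    raise ValueError("no perfect hash found; try a larger N")

def make_entries(N, G, d, likely):
    # Place each likely attribute the object actually has at its slot.
    entries = [None] * N
    for k in likely:
        if k in d:
            entries[perfect_hash(N, G, k)] = (k, d[k])
    return entries

def lookup(entries, N, G, full_dict, key, h):
    slot = entries[perfect_hash(N, G, key, h)]
    if slot is not None and slot[0] is key:   # 'is' is safe: keys interned
        return slot[1]                        # fast path
    return full_dict[key]                     # slow path: full lookup

likely_attrs = [sys.intern(s) for s in ('x', 'y', 'name')]  # hypothetical
N = 4
G = build_G(likely_attrs, N)
slots = [perfect_hash(N, G, k) for k in likely_attrs]
assert len(set(slots)) == len(likely_attrs)   # collision-free by construction

# An object with two likely attributes and one non-likely one.
obj = {sys.intern('x'): 1, sys.intern('y'): 2, sys.intern('color'): 3}
entries = make_entries(N, G, obj, likely_attrs)
key = sys.intern('x')
assert lookup(entries, N, G, obj, key, str_hash(key)) == 1   # fast path
key = sys.intern('color')                                    # non-likely
assert lookup(entries, N, G, obj, key, str_hash(key)) == 3   # slow path
```

Note how the non-likely attribute 'color' simply falls through to the full dictionary lookup, which is the property that makes the scheme safe.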
You can find the benchmark here (this doesn't need get_fast.patch):

http://codespeak.net/svn/user/antocuni/lookup-hack/targetperfectdict.py

For the test, I implemented perfect_dict as an RPython class, so it might be slower than what we could get by implementing it as a real RPython type in the style of r_dict; in particular, RPython does index-checking when calculating G[i] and G[j], but that's not necessary.

Although perfect_dict is not as fast as possible, the result of the benchmark is very interesting::

    [EMAIL PROTECTED] lookup-hack $ ./targetperfectdict-c
    50000000 iterations
    get:   3.740000 seconds
    pdict: 1.450000 seconds
    pdict: 2.579310 times faster

What do you think about this idea? Is there any obvious bug/wrong assumption I failed to catch? Do you think it might be worth implementing? (It would be a nice sprint topic.)

ciao
Anto

_______________________________________________
[email protected]
http://codespeak.net/mailman/listinfo/pypy-dev
