On 23.12.20 14:42, PIERRE AUGIER wrote:
I wrote another very simple benchmark that should not depend on auto-vectorization. The bench function is:

```python
def sum_x(positions):
    result = 0.0
    for i in range(len(positions)):
        result += positions[i].x
    return result
```
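To have something concrete to refer to, here is a self-contained sketch of that benchmark. The Point3D class is an assumption here (plain attributes x, y and z), since its definition isn't part of the quoted snippet; the list size is arbitrary too.

```python
class Point3D:
    # Assumed layout: three plain attributes holding floats.
    # This class is a stand-in, not the definition from the original mail.
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def sum_x(positions):
    # Same function as in the quoted benchmark.
    result = 0.0
    for i in range(len(positions)):
        result += positions[i].x
    return result


# A list of ordinary Python objects; each element is a separate heap object.
positions = [Point3D(float(i), 0.0, 0.0) for i in range(1000)]
total = sum_x(positions)
print(total)
```

Note that the result is returned and printed, so the loop does real work that can't be thrown away.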
This benchmark probably really shows the crux of the problem. In Python, the various Points instances (whether with lists, or with direct attributes) are vastly more complex beasts than the structs in Julia. There, you can declare a struct with a certain number of Float64 fields and be done. Thus, reading .x from such a struct is just a pointer dereference.

In Python, due to dynamic typing, the ability to add more fields later, and even the ability to change the class of an instance, the actual memory layout of a Point3D instance is much more complex, with various indirections and boxing. Reading .x out of such a thing is done in several steps:

1) check that positions[i] is an instance
2) check that it's an instance of Point3D
3) read its x field
4) check that the field is a float
5) read the float's value

All of these steps involve a pointer read.

Improving this situation is probably possible (there's even a paper on how to get rid of steps 1 and 2: https://www.csl.cornell.edu/~cbatten/pdfs/cheng-type-freezing-cgo2020.pdf but the work wasn't merged). But there are problems:

- basically every single one of these steps needs to be addressed, and every one is its own optimization
- it's extremely delicate to get the balance and the trade-offs right, because the object system is so central in getting good performance for Python code across a wide variety of areas (not just numerical algorithms).

Another approach would indeed be (as you say in the other mail) to add support for telling PyPy explicitly that some list can contain only instances of a specific class and (more importantly) that a class is not to be considered "dynamic", meaning that its fields are fixed and of specific types. So far, we have not really gone in such directions, because that is language design and we leave that to the CPython devs ;-).

Note that some of your other benchmarks are not measuring what you hope!
E.g. I suspect that get_objects, get_xs and loop_over_list_of_objects from your other mail get completely removed by the Julia compiler, since they don't have side effects and don't compute anything. PyPy isn't actually able to remove empty loops. So you are comparing empty loops in PyPy with no code at all in Julia.

Cheers,

Carl Friedrich
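PS: A small sketch of the dynamic features mentioned above (adding fields later, storing a non-float in a field, changing the class of an instance), in case it helps to see why the checks can't simply be skipped. The classes here (Point3D with plain attributes, and Other) are hypothetical and only for illustration.

```python
class Point3D:
    # Hypothetical class with plain attributes, for illustration only.
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


class Other:
    # Hypothetical unrelated class, used to show that an instance can
    # be switched to a different class after it was created.
    pass


p = Point3D(1.0, 2.0, 3.0)

# A new field can be added after construction.
p.w = 4.0

# An existing field can be rebound to something that is not a float.
p.x = "not a float"

# Even the class of the instance can be changed.
p.__class__ = Other

# And a list is free to hold anything, not only Point3D instances.
mixed = [Point3D(0.0, 0.0, 0.0), 42, p]

print(type(p).__name__, p.w, p.x, len(mixed))
```

Since any of this can happen at any point in a program, reading .x can't rely on a fixed layout without some form of check.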
