STINNER Victor added the comment:

Serhiy Storchaka: "My patches were just samples. I'm glad that Yury 
incorporated the main idea and that this helps."

Oh, if even Serhiy prefers Yury's patches, I should read them again :-)

--

I read fastint5.patch one more time and I finally understood the following 
macros:

+#define NB_SLOT(slot) offsetof(PyNumberMethods, slot)
+#define NB_BINOP(nb_methods, slot) \
+    (*(binaryfunc*)(& ((char*)nb_methods)[NB_SLOT(slot)]))
+#define PY_LONG_CALL_BINOP(slot, left, right) \
+    (NB_BINOP(PyLong_Type.tp_as_number, slot))(left, right)
+#define PY_FLOAT_CALL_BINOP(slot, left, right) \
+    (NB_BINOP(PyFloat_Type.tp_as_number, slot))(left, right)

In short, a+b calls long_add(a, b) through these macros. On first read, I 
thought they cast the objects to C long or C double (don't ask me why).
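
For anyone who misreads it the same way, here is a minimal sketch (mine, not 
code from the patch) of what PY_LONG_CALL_BINOP(nb_add, left, right) boils 
down to, assuming I now read the macros correctly:

#include <Python.h>
#include <stddef.h>

/* Sketch: fetch the function pointer stored in the nb_add slot of
   PyLong_Type's PyNumberMethods and call it.  Only the slot pointer is
   cast; the operands are never converted to C long or C double. */
static PyObject *
call_long_nb_add(PyObject *left, PyObject *right)
{
    char *methods = (char *)PyLong_Type.tp_as_number;
    binaryfunc add = *(binaryfunc *)(methods + offsetof(PyNumberMethods, nb_add));
    return add(left, right);   /* effectively long_add(left, right) */
}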


I see a difference between fastint5.patch and fastintfloat_alt.patch: 
fastint5.patch resolves the address of long_add() at runtime, whereas 
fastintfloat_alt.patch gets a direct pointer to _PyLong_Add() at compile 
time. I expected a subtle speedup, but I'm unable to see it in benchmarks 
(again, both patches have the same speed).
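
To make the difference concrete, here is a hedged comparison of the two call 
styles (my own wrapper names; _PyLong_Add() only exists in 
fastintfloat_alt.patch, not in the stock headers):

#include <Python.h>

/* fastint5.patch style: the binaryfunc is loaded from
   PyLong_Type.tp_as_number at runtime, then called. */
static PyObject *
add_via_slot(PyObject *left, PyObject *right)
{
    return PyLong_Type.tp_as_number->nb_add(left, right);
}

/* fastintfloat_alt.patch style: a direct call, resolved at
   compile/link time.  The prototype comes from the patch. */
PyObject *_PyLong_Add(PyObject *left, PyObject *right);

static PyObject *
add_via_direct_call(PyObject *left, PyObject *right)
{
    return _PyLong_Add(left, right);
}

In theory the slot version pays one extra pointer load per call; apparently 
that gets lost in the noise.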

The float path is simpler in fastint5.patch because it uses the same code 
whether right is a float or an int, but it adds more checks on the slow path. 
Neither patch seems to have a real impact on the slow path. Is it worth 
changing the second if to PyFloat_CheckExact() and then checking the type of 
right inside the if body, to avoid extra checks on the slow path?

(C type checks look very cheap, so I think I already answered my own question 
:-))
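
For the record, this is the check ordering I have in mind, written as a 
standalone sketch rather than the actual ceval.c code (the helper name and 
the fallback through the public slots are mine):

#include <Python.h>

/* Sketch of a BINARY_ADD fast path: test the exact type of `left` first
   and only look at `right` inside that branch, so an operand of any other
   type falls through to the generic slow path after at most two failed
   type checks. */
static PyObject *
binary_add_sketch(PyObject *left, PyObject *right)
{
    if (PyLong_CheckExact(left)) {
        if (PyLong_CheckExact(right)) {
            /* int fast path: both operands are exact ints */
            return PyLong_Type.tp_as_number->nb_add(left, right);
        }
    }
    else if (PyFloat_CheckExact(left)) {
        if (PyFloat_CheckExact(right) || PyLong_CheckExact(right)) {
            /* float fast path: float_add() also accepts an int operand */
            return PyFloat_Type.tp_as_number->nb_add(left, right);
        }
    }
    /* slow path: the generic protocol handles everything else */
    return PyNumber_Add(left, right);
}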

--

fastint5.patch optimizes a+b, a-b, a*b, a/b and a//b. Why not the other 
operators? Here is the list of operators from my constant folding 
optimization in fatoptimizer:

* int, float: a+b, a-b, a*b, a/b, +x, -x, ~x, a//b, a%b, a**b
* int only: a<<b, a>>b, a&b, a|b, a^b

If we optimize a//b, I suggest also optimizing a%b, for consistency. For 
integers, a**b, a<<b and a>>b would make sense too. Coming from the C 
language, I would rather write a<<b and a>>b than a*2**k or a//2**k, since I 
expect better performance.

For floats, -x and +x may be common, but probably less common than a+b, a-b, 
a*b and a/b.

Well, what I'm trying to say is: if we choose the fastintfloat_alt.patch 
design, we will have to expose a lot of new C functions in headers and 
duplicate a lot of code.

To support more than 4 operators, we need a macro.

If we use a macro, it's cheap (in terms of code maintenance) to use it for 
most or even all operators.
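
Something along these lines is what I mean by "a macro" (my own naming and 
shape, not code from either patch; only the int fast path is shown to keep 
it short):

#include <Python.h>

/* One generic fast-path macro, instantiated once per operator.  The
   operands are evaluated more than once, which is fine in ceval where
   they are plain local variables. */
#define FAST_INT_BINOP(slot, left, right, slow_call)              \
    (PyLong_CheckExact(left) && PyLong_CheckExact(right)          \
        ? PyLong_Type.tp_as_number->slot((left), (right))         \
        : (slow_call))

/* Adding an operator then costs one line per use site: */
static PyObject *
fast_add(PyObject *a, PyObject *b)
{
    return FAST_INT_BINOP(nb_add, a, b, PyNumber_Add(a, b));
}

static PyObject *
fast_lshift(PyObject *a, PyObject *b)
{
    return FAST_INT_BINOP(nb_lshift, a, b, PyNumber_Lshift(a, b));
}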

--

> But I don't quite understand why it adds any gain. Is this just due to 
> overhead of calling PyNumber_Add?

Hmm, that's a good question.


> Then we should test with other compilers and with the LTO option.

There are projects for that (I don't recall the issue number), but I would 
prefer to handle LTO separately. Python supports platforms and compilers 
which don't implement LTO.


> fastint5.patch adds an overhead for type checks and increases the size of 
> the ceval loop. What outweighs this overhead?

I stopped guessing at speedups just by reading the code or a patch. I only 
trust benchmarks :-)

Advice: don't trust yourself! Only trust benchmarks.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21955>
_______________________________________