Hello All,
I have been toying with OpenMP through f2py and ctypes. On the whole, 
the results of my efforts have been very encouraging. That said, some 
results are a bit perplexing.

I have written identical routines that I run both directly as a compiled 
C executable and through ctypes as a shared library. I am running the 
tests on a dual-Xeon Ubuntu system with 12 cores and 24 threads. The C 
executable is slightly faster than the ctypes version at lower thread 
counts, but the C eventually reaches a speedup ratio of 12+, while the 
Python version caps out at 7.7, as shown below:

threads  C-speedup  Python-speedup
 1        1.00       1.00
 2        2.07       1.98
 3        3.10       2.96
 4        4.11       3.93
 5        4.97       4.75
 6        5.94       5.54
 7        6.83       6.53
 8        7.78       7.30
 9        8.68       7.68
10        9.62       7.42
11       10.38       7.51
12       10.44       7.26
13        7.19       6.04
14        7.70       5.73
15        8.27       6.03
16        8.81       6.29
17        9.37       6.55
18        9.90       6.67
19       10.36       6.90
20       10.98       7.01
21       11.45       6.97
22       11.92       7.10
23       12.20       7.08

These ratios are quite consistent from 100 KB double arrays up to 100 MB 
double arrays, so I do not think this reflects Python call overhead. 
There is no question the routine is memory-bandwidth constrained, and I 
feel lucky to squeeze out the eventual 12+ ratio, but I am very 
perplexed as to why the Python-invoked routine's speedup caps out.

Does anyone have an explanation for the cap? Am I seeing some effect 
from ctypes, or from the Python interpreter, or something else?

Cheers,
Eric

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
