On 29.10.2015 21:50, Daπid wrote:
>
> On 29 October 2015 at 20:25, Julian Taylor
> <jtaylor.deb...@googlemail.com> wrote:
>
>> should be possible by putting this into: ~/.numpy-site.cfg
>>
>> [openblas]
>> libraries = openblasp
>>
>> LD_PRELOAD the file should also work.
>
> Thanks!
>
> I did some timings on a dot product of a square matrix of size 10000
> with LD_PRELOADing the different versions. I checked that all the cores
> were crunching whenever a library other than plain libopenblas/64 was
> selected. Here are the timings in seconds:
>
> Intel i5-3317U:
> /usr/lib64/libopenblaso.so    86.3651878834
> /usr/lib64/libopenblasp64.so  96.8817200661
> /usr/lib64/libopenblas.so     114.60265708
> /usr/lib64/libopenblasp.so    107.927740097
> /usr/lib64/libopenblaso64.so  97.5418870449
> /usr/lib64/libopenblas64.so   109.000799179
>
> Intel i7-4770:
> /usr/lib64/libopenblas.so     37.9794859886
> /usr/lib64/libopenblasp.so    12.3455951214
> /usr/lib64/libopenblas64.so   38.0571939945
> /usr/lib64/libopenblasp64.so  12.5558650494
> /usr/lib64/libopenblaso64.so  12.4118559361
> /usr/lib64/libopenblaso.so    13.4787950516
>
> Both computers have the same software and OS. So it seems that OpenBLAS
> doesn't get a significant advantage from going parallel on the older i5,
> while the i7, using all its cores (4 cores + 4 hyperthreads), gains a 3x
> speedup, and there is no big difference between OpenMP and pthreads.
>
> I am particularly puzzled by the i5 results: shouldn't threads give a
> noticeable speedup?
>
> /David.
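A minimal sketch of the kind of benchmark described above, assuming NumPy 1.17 or newer (for `default_rng`). The file name bench_dot.py and the helpers `time_dot` and `speedup` are hypothetical, not part of NumPy or OpenBLAS. Which BLAS does the work is decided outside the script: by how NumPy was built, or by a library injected with LD_PRELOAD before Python starts. The thread count is likewise read from the environment when OpenBLAS loads.

```python
import os
import sys
import time

import numpy as np


def time_dot(n, dtype=np.float64, seed=0):
    """Return wall-clock seconds for one n x n matrix product via np.dot.

    The BLAS that runs is whatever NumPy is linked against, or whatever
    library was injected with LD_PRELOAD before the interpreter started.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n)).astype(dtype, copy=False)
    b = rng.standard_normal((n, n)).astype(dtype, copy=False)
    start = time.perf_counter()
    np.dot(a, b)
    return time.perf_counter() - start


def speedup(serial_seconds, parallel_seconds):
    """Ratio of serial to parallel runtime; > 1 means the parallel run is faster."""
    return serial_seconds / parallel_seconds


# Only benchmark when a matrix size is passed on the command line (e.g. 10000).
if __name__ == "__main__" and len(sys.argv) > 1:
    n = int(sys.argv[1])
    # OpenBLAS picks up its thread count from these variables at load time,
    # so they must be set in the environment before Python is launched.
    threads = os.environ.get("OPENBLAS_NUM_THREADS") or os.environ.get(
        "OMP_NUM_THREADS", "unset"
    )
    print(f"n={n} threads={threads} seconds={time_dot(n):.3f}")
```

For example, run it as LD_PRELOAD=/usr/lib64/libopenblasp.so OMP_NUM_THREADS=2 python bench_dot.py 10000 (the three float64 matrices need about 2.4 GB). Applied to the quoted numbers, speedup(37.9794859886, 12.3455951214) is about 3.08 on the i7. The best i5 ratio, libopenblas.so against libopenblaso.so, is only about 1.33.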
Try with only the 2 physical cores instead of the 2+2 (cores plus hyperthreads) by setting OMP_NUM_THREADS=2; it's possible the hyperthreading is just leading to cache thrashing. Also, when only one core is active, the CPUs overclock themselves a bit (Intel Turbo Boost), which decreases the relative parallelization speedup.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion