Hi, in case you have not heard: I'm currently working on the PPC and S390X ports for micronumpy. Thanks to IBM for funding this work.
I have implemented roughly 50% of the required PPC operations. The goal is to turn this optimization on by default in the micronumpy module.

I recently had the idea of enhancing the JIT driver by giving it more information about parallel execution. I'm *not* talking about the main interpreter loop. Having a vectorized loop that executes in parallel across threads would certainly improve micronumpy performance. Has anybody already tried something similar?

I think it is a challenge, but it should be possible (with a reasonable amount of work) to get a simple thread fork/join model such as the one OpenMP provides.

Cheers,
Richard
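
P.S. To make the fork/join idea concrete, here is a rough sketch in plain Python of the control flow I have in mind. It is not PyPy or JIT driver code: `kernel` and `parallel_add` are hypothetical names, and `kernel` only stands in for a compiled vectorized loop body. The iteration space is split into contiguous chunks (fork), each chunk runs on its own thread, and the caller waits for all of them (join), roughly what an OpenMP `parallel for` does.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(a, b, out, start, stop):
    # Hypothetical stand-in for the vectorized loop body,
    # restricted to the index range [start, stop).
    for i in range(start, stop):
        out[i] = a[i] + b[i]


def parallel_add(a, b, num_threads=4):
    n = len(a)
    out = [0.0] * n
    # Ceiling division; at least 1 so range() never gets a zero step.
    chunk = max(1, (n + num_threads - 1) // num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Fork: one task per contiguous chunk of the iteration space.
        futures = [
            pool.submit(kernel, a, b, out, start, min(start + chunk, n))
            for start in range(0, n, chunk)
        ]
        # Join: wait for every chunk and re-raise any worker exception.
        for future in futures:
            future.result()
    return out
```

Because the chunks write to disjoint index ranges, the workers need no locking. With a global interpreter lock this sketch gains no speed; it only shows the structure that the generated code would have to follow.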
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev