—snip—

> 1) Once NumPy adds the framework and initial set of Universal Intrinsics, if 
> contributors want to leverage a new architecture-specific SIMD instruction, 
> will they be expected to add a software implementation of this instruction 
> for all other architectures too?

In my opinion, if lower instruction-set levels exist within the same 
architecture family, then yes. For example, one cannot add an AVX-512 
implementation without also adding, say, AVX-256, AVX-128, and SSE* 
equivalents. However, I would not expect one person or team to be an expert in 
every architecture's assembly, so intrinsics for one architecture can be 
developed independently of those for another.
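
To make that concrete, here is a rough sketch of the pattern (the v_* names
are made up for illustration, not our actual universal-intrinsic API): one
portable name per operation, mapped at compile time onto whichever extension
is available, with a plain scalar loop covering everything else.

#include <stddef.h>

#if defined(__AVX__)
    #include <immintrin.h>
    #define V_NLANES 8
    typedef __m256 v_f32;
    #define v_load_f32(p)     _mm256_loadu_ps(p)
    #define v_add_f32(a, b)   _mm256_add_ps(a, b)
    #define v_store_f32(p, v) _mm256_storeu_ps(p, v)
#elif defined(__SSE2__)
    #include <emmintrin.h>
    #define V_NLANES 4
    typedef __m128 v_f32;
    #define v_load_f32(p)     _mm_loadu_ps(p)
    #define v_add_f32(a, b)   _mm_add_ps(a, b)
    #define v_store_f32(p, v) _mm_storeu_ps(p, v)
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
    #define V_NLANES 4
    typedef float32x4_t v_f32;
    #define v_load_f32(p)     vld1q_f32(p)
    #define v_add_f32(a, b)   vaddq_f32(a, b)
    #define v_store_f32(p, v) vst1q_f32(p, v)
#endif

/* The loop body is written once against the portable names; the scalar
   loop covers the tail and any architecture with no mapping above. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#ifdef V_NLANES
    for (; i + V_NLANES <= n; i += V_NLANES)
        v_store_f32(out + i, v_add_f32(v_load_f32(a + i), v_load_f32(b + i)));
#endif
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}

Adding an AVX-512 branch would then mean one more #elif at the top, while
every architecture that lacks it keeps working through its own branch or the
scalar loop.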

> 2) On whom does the burden lie to ensure that new implementations are 
> benchmarked and show benefits on every architecture? What happens if 
> optimizing a ufunc leads to improving performance on one architecture and 
> worsening performance on another?

I would look at this from a maintainability point of view. If we are 
increasing the code size by 20% for a certain ufunc, there must be a 
demonstrable 20% increase in performance on any CPU. That is to say, 
micro-optimisation will be unwelcome, and code readability will be preferred. 
Usually we ask the submitter of the PR to test it on a machine they have on 
hand, and I would be inclined to keep this practice of self-reporting. Of 
course, if someone else then came along and reported a performance regression 
of, say, 10% on their machine, we would have increased the code size by 20% 
for only a net 5% gain in performance (averaging the +20% on one machine 
against the -10% on the other), and the PR would have to be reverted.
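
For what a self-reported number might look like, here is a rough, entirely
hypothetical harness (the sizes, repetition count, and the add_f32/add_scalar
names are just for illustration) that times the SIMD-capable routine sketched
above against a plain scalar loop and prints the ratio a submitter could paste
into the PR:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    ((size_t)1 << 20)   /* 1M floats, an arbitrary benchmark size */
#define REPS 200

/* add_f32 is assumed to be the universal-intrinsic routine sketched above */
void add_f32(const float *a, const float *b, float *out, size_t n);

static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

static double time_it(void (*fn)(const float *, const float *, float *, size_t),
                      const float *a, const float *b, float *out)
{
    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++)
        fn(a, b, out, N);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    float *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
          *out = malloc(N * sizeof *out);
    if (!a || !b || !out)
        return 1;
    for (size_t i = 0; i < N; i++) { a[i] = (float)i; b[i] = 1.0f; }

    double t_scalar = time_it(add_scalar, a, b, out);
    double t_simd   = time_it(add_f32, a, b, out);
    printf("scalar: %.3fs  simd: %.3fs  speedup: %.2fx\n",
           t_scalar, t_simd, t_scalar / t_simd);

    free(a); free(b); free(out);
    return 0;
}

Someone with a different machine could run the same thing and report their own
ratio, which is where a regression like the hypothetical 10% above would show
up.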

—snip—