Hi Matti,

Thanks for your questions :-)

> This seems like it would improve performance on aarch64. Would the routines 
> also work with the Apple silicon?

Yip, I can't see a reason why that wouldn't be the case.

> If these are new routines, it would be better to implement them in terms of 
> the numpy universal intrinsics rather than adding a new submodule.

These would be the same routines as seen in SVML (integrated here: 
https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_umath_fp.dispatch.c.src#L67),
which use the universal intrinsics before using the SVML library. The actual 
surface area is minimal, so I'd propose we follow a similar path with our 
existing routines and then aim to apply universal intrinsics if that's possible 
in the future. Does that sound like a good approach?

Cheers,
Chris