Hi Matti,

Thanks for your questions :-)

> This seems like it would improve performance on aarch64. Would the routines
> also work with the Apple silicon?

Yip, I can't see a reason why that wouldn't be the case.

> If these are new routines, it would be better to implement them in terms of
> the numpy universal intrinsics rather than adding a new submodule.

These would be the same routines as seen in SVML (integrated here:
https://github.com/numpy/numpy/blob/main/numpy/core/src/umath/loops_umath_fp.dispatch.c.src#L67),
which use the universal intrinsics before using the SVML library. The actual
surface area is minimal, so I'd propose we follow a similar path with our
existing routines and then aim to apply universal intrinsics if that's
possible in the future. Does that sound like a good approach?

Cheers,
Chris