On Monday, 19 November 2012 at 15:48:23 UTC, Manu wrote:
This wouldn't strictly retain half precision though, it would be slightly higher precision since the intermediates were full precision (which is surely preferable?).
I would think it's actually not preferable. Imagine you developed and tuned all the code on x86 and everything is fine. Then you run it on ARM and suddenly all computations are inaccurate.
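
To make the concern concrete, here is a minimal sketch of how the two behaviours can diverge. It is written in Python rather than D only because Python's standard `struct` module can round to IEEE 754 binary16 (format code `e`). The helper names `to_half`, `sum_half_intermediates` and `sum_full_intermediates` are made up for illustration, and the assumption is simply that one target rounds to half after every operation while another keeps intermediates in a wider type and rounds once at the end. It says nothing about what any particular compiler or CPU actually does.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (returned as a float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_half_intermediates(values):
    """Hypothetical target A: every intermediate result is rounded to half."""
    total = 0.0
    for v in values:
        total = to_half(total + to_half(v))
    return total


def sum_full_intermediates(values):
    """Hypothetical target B: intermediates stay wide, one rounding at the end."""
    total = 0.0
    for v in values:
        total += to_half(v)
    return to_half(total)


ones = [1.0] * 4096
print(sum_half_intermediates(ones))  # 2048.0
print(sum_full_intermediates(ones))  # 4096.0
```

With an 11-bit significand, half precision cannot represent 2049, so once the running total reaches 2048 each further `+ 1.0` rounds back to 2048. The version with wide intermediates never hits that wall. Code tuned against the second behaviour would silently produce a different answer under the first.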