On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner <hen...@gramner.com> wrote:
>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
>> read, nobody ever remembers the ordering of operands.
>
> 1. It is very easy to remember: take fmadd231pd x, y, z for instance.
> This means 2*3 + 1, so x = y*z+x. How the macro is more readable is
> beyond me; especially with some side cases that are undocumented, see
> below.
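A scalar C sketch of the digit convention from point 1, for readers who want to check it. `fma3_model` is a hypothetical helper, not part of x86inc or any intrinsics header. In vfmaddXYZ op1, op2, op3, operands X and Y are multiplied, operand Z is added, and the result always overwrites op1.

```c
#include <assert.h>

/* Hypothetical scalar model of the FMA3 "vfmaddXYZ op1, op2, op3" naming:
 * result = op[X] * op[Y] + op[Z], always written back to op1, which is
 * why FMA3 is destructive. Not an x86inc macro or compiler intrinsic. */
static float fma3_model(int form, float *op1, float op2, float op3)
{
    float op[4] = { 0.0f, *op1, op2, op3 }; /* slots 1..3; slot 0 unused */
    int x = form / 100, y = form / 10 % 10, z = form % 10;

    /* Only the 132, 213 and 231 forms exist for vfmadd. */
    assert(form == 132 || form == 213 || form == 231);
    *op1 = op[x] * op[y] + op[z];
    return *op1;
}
```

With op1 = 2, op2 = 3, op3 = 5, the 132 form gives 2*5 + 3 = 13, the 213 form gives 3*2 + 5 = 11, and the 231 form gives 3*5 + 2 = 17, matching the "2*3 + 1" reading of fmadd231pd.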
fmaddps dst, src1, src2, src3 is always going to be easier for anyone to
read, without having to think about which number belongs to which
operation and which operand. And it will output either FMA4 or FMA3
depending on the value passed to INIT_[XY]MM.

> 2. If anything, the macro is harder, since it is not Intel supported,

Of course it won't be there; it's not defined by them. Non-destructive
four-operand FMA is defined by AMD.

> I can't look it up at
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf.

Neither are any of the dozens of other compat macros in x86utils, and
many of them are also undocumented within x86utils. This point is absurd.

> 3. The macro does not seem to take care of the mov's (if any), still
> requiring explicit thought on the part of the programmer.

Yes, and? It's not an emulation macro like the uppercase ones that expand
into several instructions. It translates a single FMA4-like instruction
into either an FMA4 or an FMA3 one.

    fmaddps xmm0, xmm0, xmm1, xmm2

becomes

    vfmaddps xmm0, xmm0, xmm1, xmm2     if FMA4
    vfmadd132ps xmm0, xmm2, xmm1        if FMA3

If you try to use it with four different operands, it will work with
FMA4 but not with FMA3, since, as I said, it's not trying to emulate
anything.

> 4. The macro lacks documentation. In particular, it is not a thorough
> fma4 emulation in the spirit of
> https://gist.github.com/rygorous/22180ced9c7a00bd68dd.
>
> Or put in other words, IMO not good.

No, it's good, and it's what every other asm file does precisely because
it is more flexible and readable. This matters especially because it lets
you write both FMA4 and FMA3 functions without duplicating code.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
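To make the translation in the reply above concrete, here is a small C model that prints the instruction the FMA4-style form would turn into. `translate_fmaddps` and `have_fma4` are hypothetical names, and this is not x86inc's actual macro: it handles only the dst == src1 case from the example and rejects everything else in FMA3 mode, so it makes no claim about which other operand patterns the real macro accepts.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model (not x86inc's macro) of translating the FMA4-style
 * "fmaddps dst, src1, src2, src3" (dst = src1*src2 + src3) into a single
 * real instruction. Writes the text into out and returns 0, or returns -1
 * when this sketch cannot produce one FMA3 instruction. */
static int translate_fmaddps(int have_fma4, const char *dst, const char *src1,
                             const char *src2, const char *src3,
                             char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    if (have_fma4) {
        /* FMA4 is non-destructive: all four operands map straight through. */
        snprintf(out, outlen, "vfmaddps %s, %s, %s, %s", dst, src1, src2, src3);
        return 0;
    }
    if (strcmp(dst, src1) == 0) {
        /* vfmadd132ps a, b, c computes a = a*c + b, so with dst == src1
         * this is exactly dst = src1*src2 + src3. */
        snprintf(out, outlen, "vfmadd132ps %s, %s, %s", dst, src3, src2);
        return 0;
    }
    /* FMA3 always overwrites its first operand; no mov is inserted here. */
    return -1;
}
```

Returning an error instead of quietly adding a mov reflects the point made in the reply: the macro renames one instruction, and register allocation stays the programmer's job.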