I've already answered these on IRC, but for the sake of completeness I'll include the answers here as well.

On Sat, Feb 13, 2016 at 10:26:58PM -0300, James Almer wrote:
> On 2/13/2016 9:27 PM, Timothy Gu wrote:
> > ---
> >
> > The reason why this function uses SSE4.1 is the roundps instruction. Would
> > love to find a way to truncate a float to integer in SSE2.
>
> CVTTPS2DQ: Convert with Truncation Packed Single-Precision FP Values to
> Packed Dword Integers
>
> > +    punpcklwd m0, m2 ; 000x000x
> > +    punpcklwd m1, m2
> > +
> > +    cvtdq2ps  m0, m0
> > +    cvtdq2ps  m1, m1
> > +    divps     m0, m1 ; a / b
> > +    mulps     m0, m3 ; a / b * 255
> > +    roundps   m0, m0, 3 ; truncate
> > +    minps     m0, m3
>
> Are these two really needed? After a quick glance GCC seems to simply
> generate more or less the same code you're using here sans these two.
> (convert to float, div, mul, convert to int, saturate to uint8_t).

roundps becomes unnecessary after cvttps2dq. minps is needed for divide-by-0
cases.

Timothy
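
For reference, here is a rough C sketch of the SSE2-only sequence discussed above: divide, scale by 255, clamp with minps, then truncate with cvttps2dq instead of roundps. It is only an illustration, not the code from the patch. The helper name `divide_scale_4` and the four-lane layout are made up for this example, and the scalar fallback is there only so the snippet builds on non-x86 targets.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 intrinsics */
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical helper (not from the patch): for four lanes of 8-bit
 * inputs, compute a / b * 255, clamp to 255 and truncate toward zero. */
static void divide_scale_4(const uint8_t a[4], const uint8_t b[4],
                           uint8_t out[4])
{
#ifdef SKETCH_HAVE_SSE2
    __m128 fa  = _mm_set_ps(a[3], a[2], a[1], a[0]);
    __m128 fb  = _mm_set_ps(b[3], b[2], b[1], b[0]);
    __m128 max = _mm_set1_ps(255.0f);

    /* divps + mulps: a / b * 255 (x/0 gives +inf, 0/0 gives NaN) */
    __m128 q = _mm_mul_ps(_mm_div_ps(fa, fb), max);

    /* minps: clamps +inf to 255; when an operand is NaN it returns the
     * second operand, so 0/0 also becomes 255. Without this step,
     * cvttps2dq would turn inf/NaN into 0x80000000. */
    q = _mm_min_ps(q, max);

    /* cvttps2dq: convert to int32 with truncation, no roundps needed */
    __m128i r = _mm_cvttps_epi32(q);

    int32_t tmp[4];
    _mm_storeu_si128((__m128i *)tmp, r);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)tmp[i];
#else
    /* Scalar model of the same steps, assuming IEEE 754 float division. */
    for (int i = 0; i < 4; i++) {
        float q = (float)a[i] / (float)b[i] * 255.0f;
        q = (q < 255.0f) ? q : 255.0f; /* same NaN behaviour as minps */
        out[i] = (uint8_t)(int)q;      /* C cast truncates toward zero */
    }
#endif
}
```

Calling it with a = {1, 3, 0, 10} and b = {2, 4, 0, 0} exercises both the truncation (127.5 and 191.25 lose their fractional part) and the two divide-by-0 cases that minps is there to catch.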