On 07/12/2016 03:45 AM, Matt Turner wrote:
On Thu, Jul 7, 2016 at 2:18 AM, Pengfei Qu<pengfei...@intel.com>  wrote:
+/* Compare three word data to get the min value */
+word_imin:
+       cmp.le.f0.0 (1)         null:w          INPUT_ARG0.0<0,1,0>:w   INPUT_ARG0.4<0,1,0>:w {align1};
+       (f0.0) mov  (1)         TEMP_VAR0.0<1>:w INPUT_ARG0.0<0,1,0>:w          {align1};
+       (-f0.0) mov (1)         TEMP_VAR0.0<1>:w INPUT_ARG0.4<0,1,0>:w          {align1};
+       cmp.le.f0.0 (1)         null:w          TEMP_VAR0.0<0,1,0>:w    INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov  (1)         RET_ARG<1>:w TEMP_VAR0.0<0,1,0>:w               {align1};
+       (-f0.0) mov (1)         RET_ARG<1>:w INPUT_ARG0.8<0,1,0>:w              {align1};

I think each of these groups of cmp/mov/mov can be replaced with a single sel.

Hi, Matt

    Thanks for your suggestion.

The cmp/mov/mov sequence above can't be replaced with a single sel. Only the mov/mov pair can be replaced with one sel, because the condition used by the select instruction comes from the cmp instruction.

Another reason is that the current shader is derived from the one for H264, which has already been verified to work well. Also, since this code is not performance-critical, I don't think we need to spend much effort on replacing mov/mov with SEL.

Thanks
   Yakui


sel.l.f0.0 (1) TEMP_VAR0.0<1>:w  INPUT_ARG0.0<0,1,0>:w INPUT_ARG0.4<0,1,0>:w {align1};
sel.l.f0.0 (1) RET_ARG<1>:w         TEMP_VAR0.0<1>:w INPUT_ARG0.8<0,1,0>:w {align1};
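
For readers comparing the two variants, the intended arithmetic can be modeled on the host. This is only a rough Python sketch, not part of the patch: the function names are hypothetical, and it assumes that sel with the .l modifier picks the smaller of its two sources. The first function mirrors the cmp.le/mov/mov groups in word_imin, the second mirrors the two chained sel.l instructions.

```python
def imin3_cmp_mov(a, b, c):
    """Model of word_imin as written: cmp.le sets the flag, two predicated movs pick a value."""
    flag = a <= b                # cmp.le.f0.0 null, a, b
    tmp = a if flag else b       # (f0.0) mov tmp, a ; (-f0.0) mov tmp, b
    flag = tmp <= c              # cmp.le.f0.0 null, tmp, c
    return tmp if flag else c    # (f0.0) mov ret, tmp ; (-f0.0) mov ret, c


def imin3_sel(a, b, c):
    """Model of the suggested form, ASSUMING sel.l returns the smaller source."""
    tmp = min(a, b)              # sel.l tmp, a, b
    return min(tmp, c)           # sel.l ret, tmp, c
```

Under that assumption both functions return the same result for any three signed words, so the choice between the forms is about instruction count, not behavior. The word_imax case is symmetric, with >= and max in place of <= and min.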

+       RETURN          {align1};
+
+/* Compare three word data to get the max value */
+word_imax:
+       cmp.ge.f0.0 (1)         null:w          INPUT_ARG0.0<0,1,0>:w   INPUT_ARG0.4<0,1,0>:w {align1};
+       (f0.0) mov  (1)         TEMP_VAR0.0<1>:w INPUT_ARG0.0<0,1,0>:w          {align1};
+       (-f0.0) mov (1)         TEMP_VAR0.0<1>:w INPUT_ARG0.4<0,1,0>:w          {align1};
+       cmp.ge.f0.0 (1)         null:w          TEMP_VAR0.0<0,1,0>:w    INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov  (1)         RET_ARG<1>:w TEMP_VAR0.0<0,1,0>:w               {align1};
+       (-f0.0) mov (1)         RET_ARG<1>:w INPUT_ARG0.8<0,1,0>:w              {align1};

Same here I expect.

+       RETURN          {align1};
+
+word_imedian:
+       cmp.ge.f0.0 (1) null:w INPUT_ARG0.0<0,1,0>:w INPUT_ARG0.4<0,1,0>:w {align1};
+       (f0.0)  jmpi (1) cmp_a_ge_b;
+       cmp.ge.f0.0 (1) null:w INPUT_ARG0.0<0,1,0>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.0<0,1,0>:w {align1};
+       (f0.0) jmpi (1) cmp_end;
+       cmp.ge.f0.0 (1) null:w INPUT_ARG0.4<0,1,0>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (-f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.4<0,1,0>:w {align1};
+       jmpi (1) cmp_end;
+cmp_a_ge_b:
+       cmp.ge.f0.0 (1) null:w INPUT_ARG0.4<0,1,0>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.4<0,1,0>:w {align1};
+       (f0.0) jmpi (1) cmp_end;
+       cmp.ge.f0.0 (1) null:w INPUT_ARG0.0<0,1,0>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.8<0,1,0>:w {align1};
+       (-f0.0) mov (1) RET_ARG<1>:w INPUT_ARG0.0<0,1,0>:w {align1};

And here.
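
One possible reading of the suggestion for word_imedian, sketched as a host-side Python model rather than shader code: the branching version follows the jmpi structure of the patch, and the second version expresses the same median with min/max steps only, each of which is assumed to map onto a single sel. The function names are hypothetical and nothing here comes from the patch itself.

```python
def imedian3_branchy(a, b, c):
    """Model of word_imedian as written, with the jmpi branches as if/else."""
    if a >= b:                       # jump to cmp_a_ge_b
        if b >= c:
            return b                 # a >= b >= c
        return c if a >= c else a    # b < c: c if a >= c, otherwise a
    if a >= c:
        return a                     # c <= a < b
    return c if b >= c else b        # a < c: c if b >= c, otherwise b


def imedian3_minmax(a, b, c):
    """Branch-free form: ASSUMES each min/max corresponds to one sel.l / sel.ge."""
    lo = min(a, b)
    hi = max(a, b)
    return max(lo, min(hi, c))       # clamp c into the range [lo, hi]
```

The min/max form needs four selections and no jumps. Whether that is worth changing in a shader that is already verified is the trade-off discussed above.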
_______________________________________________
Libva mailing list
Libva@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libva