https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117804
Bug ID: 117804
Summary: RISC-V: Worse codegen in mc_chroma of x264
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---
#include <stdint.h>
#include <math.h>
void mc_chroma( uint8_t *dst, int i_dst_stride,
                uint8_t *src, int i_src_stride,
                int mvx, int mvy,
                int i_width, int i_height )
{
    uint8_t *srcp;
    int d8x = mvx&0x07;
    int d8y = mvy&0x07;
    int cA = (8-d8x)*(8-d8y);
    int cB = d8x    *(8-d8y);
    int cC = (8-d8x)*d8y;
    int cD = d8x    *d8y;
    src += (mvy >> 3) * i_src_stride + (mvx >> 3);
    srcp = &src[i_src_stride];
    for( int y = 0; y < i_height; y++ )
    {
        for( int x = 0; x < i_width; x++ )
            dst[x] = ( cA*src[x] + cB*src[x+1] + cC*srcp[x] + cD*srcp[x+1] +
                       32 ) >> 6;
        dst += i_dst_stride;
        src = srcp;
        srcp += i_src_stride;
    }
}
https://godbolt.org/z/6xncTjo88
GCC:
vzext.vf2 v8,v4
vzext.vf2 v6,v3
vzext.vf2 v4,v2
vmadd.vv v8,v16,v18
vzext.vf2 v2,v1
vmadd.vv v6,v14,v8
vmadd.vv v4,v12,v6
vmadd.vv v2,v10,v4
Clang:
vwmulu.vx v16, v8, s7
vwmulu.vx v20, v12, t3
vwmaccu.vx v20, t2, v14
vwmaccu.vx v16, s8, v10
Ideally, we should be able to combine these instructions into vwmacc and
transform the vmv.v.x + vv sequences into vx instructions.
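
A reduced, single-row form of the inner loop may be easier to work with when
checking whether the combine fires. The sketch below is not part of the
original report: the function name chroma_row and its parameter layout are
hypothetical, and whether it shows the same codegen difference as the full
mc_chroma has not been verified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced kernel: one row of the mc_chroma inner loop.
   Each output pixel is a weighted sum of four 8-bit neighbours with
   scalar coefficients, i.e. the shape that could map to widening
   multiply-accumulate with a scalar operand. */
void chroma_row( uint8_t *restrict dst,
                 const uint8_t *src, const uint8_t *srcp,
                 int cA, int cB, int cC, int cD, int i_width )
{
    assert( i_width >= 0 );
    /* src and srcp must each provide i_width + 1 readable bytes. */
    for( int x = 0; x < i_width; x++ )
        dst[x] = ( cA*src[x] + cB*src[x+1] + cC*srcp[x] + cD*srcp[x+1] +
                   32 ) >> 6;
}
```

With cA = 64 and the other coefficients 0 (mvx and mvy both multiples of 8),
the row is copied unchanged, which gives a quick sanity check on the output.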